AI Engineer: Device Parallelism Researcher
(Ph.D./MS NCG position: Modern C++, CUDA device-level parallelism)
KLA is hiring engineers for its Advanced Computing Labs (ACL) in Chennai, India. KLA ACL is located at our new research center in the IITM Research Park. The goal of the center is to conduct computational research in parallel and distributed sub-systems and deploy the results to KLA's advanced semiconductor platforms, which are used for inspection and metrology tasks in leading fabs. These efforts are part of a larger global initiative at KLA to scale up its AI + HPC + cloud infrastructure.
What will you be responsible for?
As part of this elite R&D team, you will need to understand core algorithms that have to be expressed in various parallel computing constructs, particularly on HPC accelerators such as GPUs. The first step in optimizing will be to break down the algorithm and model it theoretically in terms of available memory bandwidth, computational FLOPS, etc. The implementation steps will include CUDA-level programming along with performance tuning to ensure that we come close to the performance predicted by the theoretical model. The developer will be exposed to a variety of image processing, signal processing and deep learning workloads that have to be optimized. A complementary stage of optimization includes exploring existing libraries and programming in higher-level constructs such as C++ parallel programming.
While the initial focus of the team will be on NVIDIA GPUs, the R&D team will also be looking at GPU accelerators from other vendors as well as FPGA acceleration.
What we would like to see:
New or recent college graduates with a Ph.D. (preferred), Dual Degree or MS in EE, CS or CSE. Bachelor's graduates will also be considered.
A strong background in computer architecture, in particular with a focus on high-performance parallel processing at the device level (GPUs, CPUs/SIMD or FPGAs).
The candidate should have a very strong mental model of computational loads and mapping different algorithms to parallel architectures.
Strong programming skills in C/Modern C++ and Python.
Experience in analyzing and tuning applications using profiling tools such as NVIDIA Nsight or Intel VTune.
Reasonable comfort level with the Linux operating system at the user level.
Exposure to multiprocessor and multithreading concepts.
Some familiarity with GPU programming such as CUDA, OpenCL or SYCL.
The position also requires a person with strong communication skills, initiative and the ability to navigate from relatively high-level requirements to low-level computational models.
Any prior experience in KLA domains such as wafer inspection, coupled with programming in CUDA or AVX, will be a major plus. Additionally, any experience in optimizing large-scale signal processing or computer vision algorithms would also be a major plus.
Experience in FPGA programming, while not essential, will also be a major plus.
Experience in large-scale distributed HPC systems, proven experience in Docker and container orchestration, and any expertise in AI frameworks (TensorFlow) will also be welcome.
Finally, a strong background in Modern C++ concepts (C++11 through C++17) and the STL would also be a way to stand out from the crowd.
We offer a competitive, family-friendly total rewards package. We design our programs to reflect our commitment to an inclusive environment, while ensuring we provide benefits that meet the diverse needs of our employees.
KLA is proud to be an equal opportunity employer.
To apply for this job please visit kla.wd1.myworkdayjobs.com.