
Description
WHAT YOU DO AT AMD CHANGES EVERYTHING
We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.
AMD together we advance_
THE ROLE:
We are seeking an engineer who will thrive in a fast-paced work environment, using effective communication, problem-solving, and prioritization skills. Individuals who are well organized, show great attention to detail, and employ critical thinking are well suited for our team.
THE PERSON:
This AMD (Advanced Micro Devices) team is looking for a senior-level person who can help guide the team, mentor upcoming developers, provide long-range strategy, and jump in to help resolve issues quickly. You will be involved in all areas that impact the team, including performance, automation, and development. The right candidate will stay informed on the latest trends and be prepared to give consultative direction to senior management.
KEY RESPONSIBILITIES:
Diagnose, troubleshoot, and resolve complex issues related to GPU communication, performance bottlenecks, and scalability.
Enhance and maintain RCCL (Radeon Collective Communication Library).
Collaborate with hardware engineers and other software developers to optimize performance across various interconnects.
Work closely with data scientists and AI teams to integrate RCCL with major deep learning frameworks like TensorFlow and PyTorch.
Contribute to the improvement of documentation and user guides for RCCL and associated software components.
Stay informed of the latest advancements in GPU technologies, collective communication strategies, and HPC (High-Performance Computing) trends.
Mentor engineers and technical leaders, fostering a culture of innovation and excellence. Help develop the next generation of leaders through coaching, training, and feedback.
PREFERRED EXPERIENCE:
12+ years of total experience, with deep expertise in distributed programming models (MPI, SHMEM) and in the implementation and optimization of collective communication algorithms.
Deep expertise with RoCE, RDMA, and network topologies.
Experience with system software development in C/C++, GPU software development, parallel programming, and GPU architectures.
Strong programming skills in C++, CUDA, HIP, MPI, OpenMP, or similar parallel computing languages and programming models.
Familiarity with ROCm ecosystem and open-source development processes is a plus.
Strong problem-solving skills and the ability to work in a collaborative environment.
Excellent written and verbal communication skills.
ACADEMIC CREDENTIALS:
Bachelor's or Master's degree in Computer Science or a closely related field.
Benefits offered are described in AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.