Software Development Manager, LLM Inference Model Enablement, Neuron SDK | Cupertino, California | Remote-Friendly | $166,400 - $287,700
We're working with Annapurna Labs (U.S.) Inc. on this exciting opportunity.
This role offers a unique opportunity to lead a team of expert AI/ML engineers in optimizing and enabling state-of-the-art open-source and customer LLMs on custom AWS machine learning accelerators. You'll drive innovation in model enablement speed and inference usability, working across a vertically integrated system stack that includes PyTorch, the Neuron compiler, and the runtime.
Key Responsibilities
- Lead a team of expert AI/ML engineers to onboard and optimize open-source and customer LLMs for inference on AWS Trainium and Inferentia accelerators using the Neuron SDK.
- Drive improvements in model enablement speed and overall experience.
- Advance inference usability and quality through new features, infrastructure optimization, tools, and automation.
- Define and deliver model enablement and performance optimization for the latest state-of-the-art LLMs in collaboration with senior management.
What You'll Need
- 3+ years of engineering team management experience.
- Strong background in LLM architectures, performance optimizations, and inference techniques using distributed inference libraries.
- Ability to manage demanding, fast-changing priorities in a dynamic environment.
- Strong technical ability to understand and deliver across a vertically integrated system stack, including the PyTorch inference library, Neuron compiler, runtime, and collectives.
What's On Offer
- Opportunities for mentorship and career growth within AWS.
- A focus on work-life harmony and flexibility.
Apply via Haystack today!