About the Role
As a Senior Infrastructure Technical Program Manager (TPM) at Together AI, you will be at the core of building, optimizing, and scaling the global GPU resources needed for a pioneering AI infrastructure company. Your role is crucial in ensuring that the backbone of our AI models, thousands of GPUs distributed around the world, operates efficiently and reliably, enabling cutting-edge AI advancements that democratize access to AI technology globally. You will drive cross-functional excellence by streamlining critical workflows and enhancing communication across internal and external teams. Join top engineers, researchers, and innovators to shape the future of AI infrastructure and power the next generation of AI-driven solutions.
Responsibilities
- Product Development: Design and build products for AI researchers, developers, and enterprise customers, translating technical requirements into product features and collaborating with research, engineering, and design teams. Develop and execute strategic plans for the Observability, Storage, Network Engineering, and Security infrastructure teams.
- End-to-End Product Ownership: Own a comprehensive product roadmap, detailing key features, enhancements, and releases. Drive end-to-end product development, manage development and testing, and lead launches.
- Stakeholder Engagement: Engage with stakeholders to understand their needs, pain points, and feedback. Drive initiatives to enhance customer satisfaction and loyalty through product improvements and innovative solutions.
- Cross-Functional Execution: Lead and align diverse cross-functional teams — including Research, Engineering, DevOps, SRE, and Go-to-Market — to ensure seamless project delivery and organizational success.
Requirements
- ML Product or Infrastructure Experience: 5+ years of experience building and scaling AI/ML-powered products and infrastructure, specifically collaborating with research and engineering teams.
- Proven experience with large-scale technology deployments, including cloud computing platforms, decentralized cloud infrastructure, and distributed systems (e.g., containerization and orchestration tools).
- Familiarity with the technical domains of Observability, Storage, Network Engineering, and Security for infrastructure.
- Familiarity with cloud-based technologies (e.g., AWS, Google Cloud, or Azure).
- Technical Foundation: Bachelor's or Master's degree in Machine Learning, Computer Science, Engineering, or a related field.
- Exceptional analytical and problem-solving skills, with a demonstrated ability to identify and proactively mitigate technical risks.
- Experience using AI tools, such as Claude Code or similar, to accelerate analytical work.
- Executive and Organizational Acumen: Proven ability to thrive in a fast-paced, ambiguous startup environment, prioritizing complex tasks and managing multiple simultaneous projects.
- Strong organizational abilities to build cross-functional alignment and establish clear, focused priorities.
- A proactive and collaborative team-oriented approach, demonstrating a willingness to drive necessary outcomes across the company.
- Excellent communication and program management skills for effective collaboration with both internal stakeholders and external vendors.
About Together AI
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.
Compensation
We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $225k to $265k + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge. This is a hybrid role based in the Bay Area.
Equal Opportunity
Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
Please see our privacy policy at https://www.together.ai/privacy