AI Inference Engineer (San Francisco)

Perplexity AI
Job Description

We are looking for an AI Inference Engineer to join our growing team. Our current stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference that will be used by both internal and external customers

  • Benchmark and address bottlenecks throughout our inference stack

  • Improve the reliability and observability of our systems and respond to system outages

  • Explore novel research and implement LLM inference optimizations

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)

  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)

  • Understanding of GPU architectures, or experience with GPU kernel programming using CUDA

 

Job Details

Type: Full Time
Location: San Francisco
Work: On-site
Posted: Jun 10, 2024
Source: Perplexity AI
About Perplexity AI

Industry: Technology / AI
Company size: Small
Description: AI search engine

