AI Inference Engineer (San Francisco)

Perplexity AI
Job Description

We are looking for an AI Inference Engineer to join our growing team. Our current stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference that will be used by both internal and external customers

  • Benchmark and address bottlenecks throughout our inference stack

  • Improve the reliability and observability of our systems and respond to system outages

  • Explore novel research and implement LLM inference optimizations

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)

  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)

  • Understanding of GPU architectures, or experience with GPU kernel programming using CUDA

 

Job Details

Type: Full Time
Location: San Francisco
Work: On-site
Posted: Jun 10, 2024
Source: Perplexity AI
About Perplexity AI

Industry: Technology / AI
Company size: Small
Description: AI search engine

