FULL TIME | Remote

AI Inference Engineer (London)

Perplexity AI
Remote
about 1 month ago
Job Description

We are looking for an AI Inference Engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference that will be used by both internal and external customers

  • Benchmark and address bottlenecks throughout our inference stack

  • Improve the reliability and observability of our systems and respond to system outages

  • Explore novel research and implement LLM inference optimizations

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)

  • Familiarity with common LLM architectures and inference optimization techniques (e.g., continuous batching, quantization)

  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA

Final offer amounts are determined by multiple factors, including experience and expertise.

Equity: In addition to the base salary, equity may be part of the total compensation package.


Type: FULL TIME
Remote: Yes
Posted: Nov 17
