Job Description

Machine Learning Engineer - Model Performance at inference.net

Inference.net is seeking a Machine Learning Engineer to optimize the performance of our AI inference systems. You will deploy state-of-the-art large language models at scale and tune them to run efficiently, increasing throughput and enabling new features. The role offers the chance to collaborate closely with our engineering team and make significant contributions to open source projects such as SGLang and vLLM.

About Inference.net
We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute for running large language models like DeepSeek and Llama 4. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.

We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems. We primarily work in person from our office in downtown San Francisco. Our investors include a16z CSX and Multicoin. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard and deeply enjoy what we do.

Responsibilities

Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models
Deploy and maintain large language models at scale in production environments
Deploy new models as they are released by frontier labs
Implement techniques like quantization, speculative decoding, and KV cache reuse
Contribute regularly to open source projects such as SGLang and vLLM
Dive deep into the underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues
Collaborate with the engineering team to bring new features and capabilities to our inference platform
Develop robust and scalable infrastructure for AI model serving
Create and maintain technical documentation for inference systems

Requirements

3+ years of experience writing high-performance, production-quality code
Strong proficiency with Python and deep learning frameworks, particularly PyTorch
Demonstrated experience with LLM inference optimization techniques
Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred
Familiarity with Docker and Kubernetes for containerized deployments
Experience with CUDA programming and GPU optimization
Strong understanding of distributed systems and scalability challenges
Proven track record of optimizing AI models for production environments

Nice to Have

Familiarity with TensorRT and TensorRT-LLM
Knowledge of vision models and multimodal AI systems
Experience implementing techniques like quantization and speculative decoding
Contributions to open source machine learning projects
Experience with large-scale distributed computing

Compensation
The base salary range for this role is $180,000 - $250,000, plus equity in a high-growth startup and comprehensive benefits, including:

Full healthcare coverage
Quarterly offsites
Flexible PTO

Equal Opportunity

Inference.net is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're passionate about building the next generation of high-performance systems that push the boundaries of what's possible with large language models, we want to hear from you!

Seniority level
Not Applicable

Employment type
Full-time

Job function
Engineering and Information Technology

Industries
Software Development

Job Tags

Full time, Work at office, Flexible hours

San Francisco, CA

Full Time

2025-09-25

2025-10-25