LLM/ML Engineer (Inference)

About the role

About Us

The vast majority of enterprise data is in files like PDFs and spreadsheets. That includes everything from financial statements to medical records. Reducto helps AI teams turn those really complex documents into LLM-ready inputs with exceptional accuracy. This means they can build more reliable products while saving engineering time.

Our Traction

In less than a year we've scaled to 7 figures in ARR, serving customers from ambitious startups to Fortune 10 enterprises. We're now processing tens of millions of pages monthly.

The core work will include:

  • Architecting and implementing robust, scalable inference systems for serving state-of-the-art AI models

  • Optimizing model serving infrastructure for high throughput and low latency at scale (see the serving sketch after this list)

  • Developing and integrating advanced inference optimization techniques

  • Working closely with our research team to bring cutting-edge capabilities into production

  • Building developer tools and infrastructure to support rapid experimentation and deployment
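
To make the serving and optimization bullets above concrete, here is a minimal sketch of batched inference with vLLM, one of the inference engines named in the experience requirements below. The model name and sampling settings are placeholders rather than anything Reducto has specified, and a production deployment would run the engine as a server rather than offline like this.

# Minimal sketch of batched inference with vLLM. The model name and sampling
# settings below are placeholders, not Reducto's actual configuration.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the attached financial statement.",
    "List every line item on this invoice.",
]
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# vLLM batches these requests internally (continuous batching), which is the
# main lever for throughput when serving at scale.
llm = LLM(model="facebook/opt-125m")  # placeholder model
outputs = llm.generate(prompts, sampling_params)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)

For online serving, the same engine is typically launched as a standalone server process that accepts requests over HTTP instead of being called offline as above.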

We would love to meet you if you:

  • Philosophy: You are your own worst critic. You have a high bar for quality and don’t rest until the job is done right—no settling for 90%. We want someone who ships fast, with high agency, and who doesn't just voice problems but actively jumps in to fix them.

  • Experience: You have deep expertise in Python and PyTorch, with a strong foundation in low-level operating systems concepts including multi-threading, memory management, networking, storage, performance, and scale. You're experienced with modern inference systems like TGI, vLLM, TensorRT-LLM, and Optimum, and comfortable creating custom tooling for testing and optimization.

  • Approach: You combine technical expertise with practical problem-solving. You're methodical in debugging complex systems and can rapidly prototype and validate solutions.

Bonus points if you:

  • Have experience with low-level systems programming (CUDA, Triton) and compiler optimization

  • Are passionate about open-source contributions and staying current with ML infrastructure developments

  • Bring practical experience with high-performance computing and distributed systems

  • Have worked in early-stage environments where you helped shape technical direction

  • Are energized by solving complex technical challenges in a collaborative environment

This is an in-person role at our office in SF. We’re an early-stage company, which means the role requires working hard and moving quickly. Please only apply if that excites you.

About Reducto

Nearly 80% of enterprise data is in unstructured formats like PDFs

PDFs are the status quo for enterprise knowledge in nearly every industry. Insurance claims, financial statements, invoices, and health records are all stored in a structure that’s simply impractical for use in digital workflows. This isn’t an inconvenience—it’s a critical bottleneck that leads to dozens of wasted hours every week.

Traditional approaches fail at reliably extracting information from complex PDFs

OCR and even more sophisticated ML approaches work for simple text documents but are unreliable for anything more complex. Text from different columns is jumbled together, figures are ignored, and tables are a nightmare to get right. Overcoming this usually requires a large engineering effort dedicated to building specialized pipelines for every document type you work with.

Reducto breaks document layouts into subsections and then contextually parses each one depending on the type of content. This is made possible by a combination of vision models, LLMs, and a suite of heuristics we built over time (a simplified sketch of this idea follows the list below). Put simply, we can help you:

  • Accurately extract text and tables even with nonstandard layouts

  • Automatically convert graphs to tabular data and summarize images in documents

  • Extract important fields from complex forms with simple, natural language instructions

  • Build powerful retrieval pipelines using Reducto’s document metadata

  • Intelligently chunk information using the document’s layout data
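
As referenced above, the sketch below illustrates the region-by-region idea in the simplest possible terms. Every name in it is hypothetical; it is not Reducto's actual pipeline or API.

# Hypothetical illustration only: none of these names are Reducto's real API.
# It shows the shape of "segment the layout, then parse each region by type".
from dataclasses import dataclass

@dataclass
class Region:
    kind: str      # e.g. "text", "table", "figure"
    content: str   # stand-in for the raw crop of that page region

def detect_regions(page: str) -> list[Region]:
    # Stand-in for a layout/vision model that segments the page.
    return [Region("text", page)]

def parse_region(region: Region) -> str:
    # Route each region to a parser suited to its content type.
    if region.kind == "table":
        return "<table reconstructed as rows and columns>"
    if region.kind == "figure":
        return "<figure summarized by a vision model>"
    return region.content  # plain text falls through to OCR / extraction

def parse_page(page: str) -> list[str]:
    return [parse_region(r) for r in detect_regions(page)]

print(parse_page("Example page text"))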

Average salary estimate

$120,000 / year (estimated)
Range: $100,000 (min) to $140,000 (max)

What You Should Know About LLM/ML Engineer (Inference), Reducto

Are you excited about the rapidly evolving field of machine learning and AI? Reducto is looking for a talented LLM/ML Engineer (Inference) to join our San Francisco team! We know that enterprise data is primarily locked away in complex files like PDFs and spreadsheets, and we're on a mission to unlock that data and enhance the power of AI.

As an LLM/ML Engineer at Reducto, your main focus will be architecting and implementing robust inference systems that serve our state-of-the-art AI models. You’ll be optimizing our model serving infrastructure to ensure high throughput and low latency at scale. You’ll collaborate closely with our research team to push the boundaries of what’s possible and bring cutting-edge capabilities into production.

If you have deep expertise in Python and PyTorch and are experienced with systems like TGI and TensorRT-LLM, we want to meet you! We value high-quality work—you're your own worst critic and understand the importance of delivering top-notch results without compromises. Bonus points if you have experience in low-level systems programming and are passionate about open-source contributions! If you're ready to roll up your sleeves and contribute to solving complex technical challenges in a dynamic startup environment, Reducto is the place for you. We’re excited to see your application and hopefully welcome you to our innovative team that’s making strides in document data processing.

Frequently Asked Questions (FAQs) for LLM/ML Engineer (Inference) Role at Reducto
What are the primary responsibilities of the LLM/ML Engineer (Inference) at Reducto?

The LLM/ML Engineer (Inference) at Reducto is primarily responsible for architecting and implementing scalable inference systems. You'll optimize our model serving infrastructure for high throughput and low latency while developing advanced optimization techniques. Collaboration with the research team to bring cutting-edge capabilities into production is also a key aspect of the role.

What qualifications do I need to apply for the LLM/ML Engineer (Inference) position at Reducto?

To apply for the LLM/ML Engineer (Inference) role at Reducto, you should have deep expertise in Python and PyTorch, along with a solid understanding of operating systems concepts such as multi-threading and memory management. Experience with modern inference systems like TGI and TensorRT-LLM is essential, as is the ability to create custom tooling for testing and optimization.

Why is the LLM/ML Engineer (Inference) role important at Reducto?

This role is critical as it focuses on transforming how enterprise data embedded in complex documents is processed and utilized. By optimizing our inference systems, you will directly contribute to helping AI teams extract valuable information more accurately and efficiently, ultimately enhancing product reliability for our diverse client base.

What type of work culture can I expect as an LLM/ML Engineer (Inference) at Reducto?

At Reducto, you can expect a vibrant, fast-paced work culture where collaboration and continuous improvement are valued. As an early-stage company, we encourage our engineers to take ownership of their work, tackle challenges, and develop solutions that shape our technical direction, making your contributions truly impactful.

What technologies will I work with as an LLM/ML Engineer (Inference) at Reducto?

You will work with advanced technologies such as Python, PyTorch, and various inference systems like TGI, vLLM, and TensorRT-LLM. Additionally, familiarity with low-level systems programming tools like CUDA and Triton would be beneficial as Reducto continues to innovate and expand its capabilities.

Common Interview Questions for LLM/ML Engineer (Inference)
Can you explain your experience with optimization techniques in machine learning?

Discuss specific optimization techniques you've implemented in previous projects, such as model compression or performance tuning. Highlight any tools you’ve used and the impact they had on model accuracy and efficiency.
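
For example, one compression technique that is easy to demonstrate is post-training dynamic quantization in PyTorch. The toy model below is only there to show the shape of the API; the layer sizes are arbitrary.

# Toy example of post-training dynamic quantization in PyTorch
# (one of several model-compression techniques an interviewer might probe).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Replace Linear layers with int8 dynamically quantized versions.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, smaller and faster linears on CPU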

How do you approach debugging complex inference systems?

Describe your methodical approach to debugging, including the tools you use and your process for isolating issues. Provide a specific example of a challenge you faced and how you resolved it.

What is your experience with model serving infrastructure?

Share any experience you have in building or maintaining model serving systems. Discuss the technologies you've used, such as TensorFlow Serving or Flask, and any performance metrics you were able to improve.

How do you stay updated with the latest advancements in machine learning infrastructure?

Talk about the resources you utilize, such as research papers, online courses, or participation in relevant communities. Mention any recent trends or technologies in ML infrastructure that you're particularly excited about.

Describe how you've collaborated with research teams in the past.

Provide an example of a project where you worked hand-in-hand with a research team. Explain how you contributed to bringing experimental models into production and the challenges that arose during this process.

What strategies do you utilize for high-throughput model serving?

Discuss specific strategies you have implemented, like batching requests or utilizing asynchronous APIs. Highlight any performance benchmarks you achieved as a result.
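
If you want something concrete to walk through, the sketch below shows the core of dynamic batching with asyncio. It is a minimal sketch, not a production design: model_forward is a stand-in for a real batched model call, and the batch size and wait window are arbitrary.

# Bare-bones dynamic batching: park each request on a queue, let a worker
# collect a batch, run one batched "model call", and resolve the futures.
import asyncio

MAX_BATCH = 8        # arbitrary batch-size cap
MAX_WAIT_S = 0.01    # arbitrary window for collecting a batch

async def model_forward(batch: list[str]) -> list[str]:
    # Stand-in for a real batched model call (e.g. one GPU forward pass).
    await asyncio.sleep(0.05)
    return [f"result for {req}" for req in batch]

async def handle(queue: asyncio.Queue, request: str) -> str:
    # Each request parks a future on the queue and waits for the worker.
    fut = asyncio.get_running_loop().create_future()
    await queue.put((request, fut))
    return await fut

async def batch_worker(queue: asyncio.Queue) -> None:
    while True:
        items = [await queue.get()]
        loop = asyncio.get_running_loop()
        deadline = loop.time() + MAX_WAIT_S
        # Collect more requests until the batch is full or the window closes.
        while len(items) < MAX_BATCH and loop.time() < deadline:
            try:
                items.append(await asyncio.wait_for(queue.get(), deadline - loop.time()))
            except asyncio.TimeoutError:
                break
        results = await model_forward([req for req, _ in items])
        for (_, fut), res in zip(items, results):
            fut.set_result(res)

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    asyncio.create_task(batch_worker(queue))
    print(await asyncio.gather(*(handle(queue, f"req-{i}") for i in range(20))))

asyncio.run(main())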

Can you walk us through a project where you implemented low-level systems programming?

Select a project that showcases your skills in low-level programming, such as CUDA or Triton. Detail the project's objectives, your implementation process, and the results you achieved.

How do you ensure quality in your machine learning models?

Mention the methodologies you employ for model validation and testing, such as cross-validation or A/B testing. Discuss how you track performance over time and ensure models meet your quality standards before deployment.

What tools do you prefer for rapid experimentation in machine learning?

List the tools and platforms that have helped you accelerate your experimentation process, such as Jupyter notebooks for prototyping or MLflow for tracking experiments. Share an instance where these tools led to a successful outcome.
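
As one concrete example on the tracking side, a minimal MLflow run might log parameters and metrics like this; the experiment name, parameters, and values are arbitrary placeholders.

# Minimal MLflow tracking example; names and values are placeholders.
import mlflow

mlflow.set_experiment("inference-tuning")  # illustrative experiment name

with mlflow.start_run():
    mlflow.log_param("batch_size", 8)
    mlflow.log_param("dtype", "int8")
    mlflow.log_metric("p50_latency_ms", 42.0)
    mlflow.log_metric("throughput_rps", 310.0)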

What do you find most challenging about working in an early-stage startup environment, and how do you overcome it?

Reflect on the unique challenges you’ve faced in a startup context, such as resource limitations or ambiguity in role definitions. Share your strategies for embracing change and flexibility to maintain productivity.

Employment type: Full-time, on-site
Date posted: January 10, 2025
