
LLM/ML Engineer (Inference)

About the role

About Us

The vast majority of enterprise data is in files like PDFs and spreadsheets, everything from financial statements to medical records. Reducto helps AI teams turn those complex documents into LLM-ready inputs with exceptional accuracy, so they can build more reliable products while saving engineering time.

Our Traction

In less than a year we've scaled to 7 figures in ARR, serving customers from ambitious startups to Fortune 10 enterprises. We're now processing tens of millions of pages monthly.

The core work will include:

  • Architecting and implementing robust, scalable inference systems for serving state-of-the-art AI models

  • Optimizing model serving infrastructure for high throughput and low latency at scale

  • Developing and integrating advanced inference optimization techniques

  • Working closely with our research team to bring cutting-edge capabilities into production

  • Building developer tools and infrastructure to support rapid experimentation and deployment

We would love to meet you if you:

  • Philosophy: You are your own worst critic. You have a high bar for quality and don’t rest until the job is done right—no settling for 90%. We want someone who ships fast, with high agency, and who doesn't just voice problems but actively jumps in to fix them.

  • Experience: You have deep expertise in Python and PyTorch, with a strong foundation in low-level operating systems concepts including multi-threading, memory management, networking, storage, performance, and scale. You're experienced with modern inference systems like TGI, vLLM, TensorRT-LLM, and Optimum, and comfortable creating custom tooling for testing and optimization (a brief, illustrative vLLM sketch follows this list).

  • Approach: You combine technical expertise with practical problem-solving. You're methodical in debugging complex systems and can rapidly prototype and validate solutions.
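
For readers less familiar with the tooling named above, here is a minimal, illustrative sketch of offline batched generation with vLLM, including a rough throughput measurement. The model name, prompts, and sampling settings are placeholder assumptions for illustration only; nothing here describes Reducto's actual stack.

```python
# Minimal sketch, assuming vLLM is installed and a GPU is available.
# The model name, prompts, and sampling settings are placeholders.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")        # placeholder model
params = SamplingParams(temperature=0.0, max_tokens=128)   # deterministic decoding

prompts = [
    "Summarize the attached invoice in one sentence.",
    "List the column headers of the table on page 3.",
]

start = time.perf_counter()
outputs = llm.generate(prompts, params)                    # batched generation
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tokens/s across {len(prompts)} prompts")
for o in outputs:
    print(o.outputs[0].text)
```

Engines like TGI and TensorRT-LLM expose similar entry points; the production work described in this posting goes well beyond a snippet like this.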

Bonus points if you:

  • Have experience with low-level systems programming (CUDA, Triton) and compiler optimization

  • Are passionate about open-source contributions and staying current with ML infrastructure developments

  • Bring practical experience with high-performance computing and distributed systems

  • Have worked in early-stage environments where you helped shape technical direction

  • Are energized by solving complex technical challenges in a collaborative environment

This is an in-person role at our office in SF. We're an early-stage company, which means the role requires working hard and moving quickly. Please only apply if that excites you.

About Reducto

Nearly 80% of enterprise data is in unstructured formats like PDFs

PDFs are the status quo for enterprise knowledge in nearly every industry. Insurance claims, financial statements, invoices, and health records are all stored in a structure that’s simply impractical for use in digital workflows. This isn’t an inconvenience—it’s a critical bottleneck that leads to dozens of wasted hours every week.

Traditional approaches fail to reliably extract information from complex PDFs

OCR and even more sophisticated ML approaches work for simple text documents but are unreliable for anything more complex. Text from different columns is jumbled together, figures are ignored, and tables are a nightmare to get right. Overcoming this usually requires a large engineering effort dedicated to building specialized pipelines for every document type you work with.

Reducto breaks document layouts into subsections and then contextually parses each depending on the type of content. This is made possible by a combination of vision models, LLMs, and a suite of heuristics we built over time (a rough, illustrative sketch of such a pipeline follows the list below). Put simply, we can help you:

  • Accurately extract text and tables even with nonstandard layouts

  • Automatically convert graphs to tabular data and summarize images in documents

  • Extract important fields from complex forms with simple, natural language instructions

  • Build powerful retrieval pipelines using Reducto’s document metadata

  • Intelligently chunk information using the document’s layout data
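
To make the description above concrete, here is a rough, hedged sketch of what a layout-then-parse pipeline can look like in Python. Every name in it (the Region type, the parser helpers, the region kinds) is a hypothetical illustration, not Reducto's implementation or API.

```python
# Illustrative sketch only; all names and region kinds are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Region:
    kind: str                                    # e.g. "text", "table", "figure"
    page: int
    bbox: tuple[float, float, float, float]      # layout coordinates on the page
    content: bytes                               # raw crop of the region

def parse_text(region: Region) -> dict:
    # A real system would run OCR or a text extractor here.
    return {"type": "text", "text": region.content.decode(errors="ignore")}

def parse_table(region: Region) -> dict:
    # A real system might call a vision model to recover rows and columns.
    return {"type": "table", "rows": []}

def parse_figure(region: Region) -> dict:
    # A real system might convert a chart to tabular data or summarize it.
    return {"type": "figure", "summary": ""}

PARSERS: dict[str, Callable[[Region], dict]] = {
    "text": parse_text,
    "table": parse_table,
    "figure": parse_figure,
}

def parse_document(regions: list[Region]) -> list[dict]:
    """Route each detected layout region to a parser chosen by its content type."""
    chunks = []
    for region in regions:
        chunk = PARSERS.get(region.kind, parse_text)(region)
        chunk["page"] = region.page              # keep layout metadata for retrieval and chunking
        chunks.append(chunk)
    return chunks
```

The point of the sketch is the routing step: once a layout detector has labeled each region, each content type can be handled by the model or heuristic best suited to it, and the resulting chunks keep enough layout metadata to drive retrieval and chunking downstream.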


Reducto is an Equal Opportunity Employer committed to diversity and inclusion in the workplace. All qualified applicants will receive consideration for employment without regard to sex, race, color, age, national origin, religion, physical and mental disability, genetic information, marital status, sexual orientation, gender identity/assignment, citizenship, pregnancy or maternity, protected veteran status, or any other status prohibited by applicable national, federal, state or local law.

Average salary estimate

$125,000 / year (est.)
min: $100,000
max: $150,000


What You Should Know About LLM/ML Engineer (Inference), Reducto

Are you ready to take your career to the next level as an LLM/ML Engineer (Inference) at Reducto in San Francisco? Reducto turns the complex documents where most enterprise data lives, like PDFs and spreadsheets, into inputs AI systems can actually work with. In this role you will develop inference systems that serve state-of-the-art AI models, and your main challenge will be optimizing the model serving infrastructure for high throughput and low latency. You will also collaborate closely with the research team to bring cutting-edge capabilities into production. If you have strong Python and PyTorch skills and thrive in a fast-paced, early-stage environment, you will fit right in: your high standards for quality and proactive approach will help maintain momentum as the company grows rapidly. Join a team that values creativity, collaboration, and tackling complex problems head-on, and help change how enterprises process their data.

Frequently Asked Questions (FAQs) for LLM/ML Engineer (Inference) Role at Reducto
What are the primary responsibilities of an LLM/ML Engineer (Inference) at Reducto?

As an LLM/ML Engineer (Inference) at Reducto, your core responsibilities will include architecting and implementing robust scalable inference systems, optimizing model-serving infrastructure, and developing advanced inference optimization techniques. You'll also work hand-in-hand with our research team to ensure that the latest capabilities find their way into production, all while building essential developer tools for rapid experimentation and deployment.

What qualifications are needed for the LLM/ML Engineer (Inference) position at Reducto?

To be considered for the LLM/ML Engineer (Inference) position at Reducto, you should have extensive experience in Python and PyTorch, along with a strong foundation in concepts like multi-threading and memory management. Additionally, familiarity with modern inference systems and the ability to create custom tooling for optimization are key requirements. A passion for solving complex technical challenges and a proactive mindset will set you apart.

What makes Reducto an exciting place to work for LLM/ML Engineers (Inference)?

Reducto offers a dynamic and fast-paced work environment where LLM/ML Engineers (Inference) can make a significant impact. With our rapidly growing customer base and innovative solutions for processing complex documents, you’ll find ample opportunities to solve real-world challenges. Being part of an early-stage company, you have the chance to shape technical direction while collaborating with passionate team members committed to excellence and innovation.

What kind of projects can I expect as an LLM/ML Engineer (Inference) at Reducto?

As an LLM/ML Engineer (Inference) at Reducto, you can expect to work on projects that involve architecting inference systems that process millions of complex documents monthly. You'll dive into optimizing our infrastructure for better performance, creating cutting-edge tools, and participating in the experimentation processes that drive our growth. Your contributions will lead to enhancing AI models so they can reliably extract meaningful information from complex document formats.

What personal qualities are important for success as an LLM/ML Engineer (Inference) at Reducto?

Success in the LLM/ML Engineer (Inference) role at Reducto requires a blend of high standards for quality, a proactive approach to problem-solving, and a willingness to collaborate closely with fellow engineers and researchers. If you are driven, self-motivated, and have an insatiable desire to innovate and improve existing processes, you'll thrive in our fast-paced environment.

Common Interview Questions for LLM/ML Engineer (Inference)
Can you describe your experience with Python and PyTorch in relation to inference systems?

In answering this question, highlight specific projects where you've used Python and PyTorch. Provide examples that demonstrate your ability to build or optimize inference systems, focusing on performance improvements and innovative solutions you've implemented.

How do you approach optimizing model serving infrastructure?

Discuss your methodology for optimizing infrastructure, including techniques you've used in the past, such as load testing, performance profiling, and identifying bottlenecks. Mention any tools or frameworks you're familiar with that help in this process.

What challenges have you faced while implementing inference systems, and how did you overcome them?

Share a specific challenge, detailing the problem, your analysis, and the solution you implemented. This showcases your analytical skills and tenacity, which are crucial in overcoming hurdles in ML projects.

Can you explain how you debug complex systems?

Describe your debugging process, emphasizing a structured approach. Mention the tools you use and how you gather insights from logs or metrics, and reinforce your answer with a real-world example.

What advanced optimization techniques have you implemented in inference systems?

Provide details about specific techniques, such as using TensorRT-LLM or vLLM. Highlight results from your applications, like reduced latency or improved throughput, to demonstrate your knowledge.

How do you stay up to date with ML infrastructure developments?

Explain your methods for staying informed, whether through research papers, conferences, community forums, or contributing to open-source projects. Your commitment to continuous learning is a strong point.

Can you discuss a project where you improved inference performance?

Detail a specific project, including the starting performance metrics, the changes you made, and the impact on the overall system's performance. Quantifying your results can make your answer stand out.

What role does collaboration play in your projects?

Emphasize the importance of teamwork in building ML systems. Provide examples of how your collaboration with researchers or developers led to successful project outcomes, and discuss how you navigate differing opinions.

What motivates you to work in the ML field, particularly in inference systems?

Share your passion for machine learning and inference systems, focusing on the challenges and opportunities they present. Discuss any specific events, projects, or advancements that have inspired you.

What do you believe is the future of inference systems in enterprise applications?

Share your insights on where you see inference systems heading, noting trends like increased automation and integration in enterprise workflows. Discuss how you believe companies like Reducto can lead this transition.

EMPLOYMENT TYPE
Full-time, on-site
DATE POSTED
January 10, 2025
