The vast majority of enterprise data lives in files like PDFs and spreadsheets, everything from financial statements to medical records. Reducto helps AI teams turn those complex documents into LLM-ready inputs with exceptional accuracy, so they can build more reliable products while saving engineering time.
In less than a year we've scaled to 7 figures in ARR, serving customers from ambitious startups to Fortune 10 enterprises. We're now processing tens of millions of pages monthly.
Architecting and implementing robust, scalable inference systems for serving state-of-the-art AI models
Optimizing model serving infrastructure for high throughput and low latency at scale (see the sketch after this list)
Developing and integrating advanced inference optimization techniques
Working closely with our research team to bring cutting-edge capabilities into production
Building developer tools and infrastructure to support rapid experimentation and deployment
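To give a flavor of the throughput and latency work above, here is a minimal sketch of an offline throughput measurement using vLLM, one of the open-source inference engines this role touches. The model name and prompts are placeholders for illustration, not a description of Reducto's production stack.

```python
# Minimal offline throughput check with vLLM. The model and prompts are
# placeholders; this is an illustrative sketch, not Reducto's serving setup.
import time

from vllm import LLM, SamplingParams

prompts = ["Summarize the key figures in this filing."] * 64   # toy batch
sampling = SamplingParams(temperature=0.0, max_tokens=128)

llm = LLM(model="facebook/opt-125m")   # placeholder model for illustration

start = time.perf_counter()
outputs = llm.generate(prompts, sampling)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{len(prompts)} requests in {elapsed:.1f}s "
      f"({generated / elapsed:.0f} generated tokens/s)")
```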
Philosophy: You are your own worst critic. You have a high bar for quality and don’t rest until the job is done right—no settling for 90%. We want someone who ships fast, with high agency, and who doesn't just voice problems but actively jumps in to fix them.
Experience: You have deep expertise in Python and PyTorch, with a strong foundation in low-level operating systems concepts including multi-threading, memory management, networking, storage, performance, and scale. You're experienced with modern inference systems like TGI, vLLM, TensorRT-LLM, and Optimum, and comfortable creating custom tooling for testing and optimization (a small example of that kind of tooling follows below).
Approach: You combine technical expertise with practical problem-solving. You're methodical in debugging complex systems and can rapidly prototype and validate solutions.
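To make "custom tooling for testing and optimization" concrete, below is a small sketch of the kind of latency probe that work might involve. It assumes an OpenAI-compatible server is already running locally (for example, one started with `vllm serve`); the URL, model name, and prompt are assumptions made for illustration.

```python
# Tiny latency probe against an OpenAI-compatible completions endpoint.
# Assumes a server (e.g. started with `vllm serve facebook/opt-125m`) is
# already listening at BASE_URL; URL, model, and prompt are illustrative.
import statistics
import time

import requests

BASE_URL = "http://localhost:8000/v1"   # assumption: local inference server
MODEL = "facebook/opt-125m"             # assumption: model loaded by that server

def time_one_request(prompt: str) -> float:
    """Send one completion request and return wall-clock latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(
        f"{BASE_URL}/completions",
        json={"model": MODEL, "prompt": prompt, "max_tokens": 64},
        timeout=60,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

latencies = sorted(time_one_request("Extract the totals from this invoice:")
                   for _ in range(20))
print(f"p50 {statistics.median(latencies) * 1000:.0f} ms, "
      f"p95 {latencies[int(0.95 * (len(latencies) - 1))] * 1000:.0f} ms")
```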
Have experience with low-level systems programming (CUDA, Triton) and compiler optimization
Are passionate about open-source contributions and staying current with ML infrastructure developments
Bring practical experience with high-performance computing and distributed systems
Have worked in early-stage environments where you helped shape technical direction
Are energized by solving complex technical challenges in a collaborative environment
This is an in-person role at our office in SF. We're an early-stage company, which means the role requires working hard and moving quickly. Please only apply if that excites you.
Nearly 80% of enterprise data is in unstructured formats like PDFs
PDFs are the status quo for enterprise knowledge in nearly every industry. Insurance claims, financial statements, invoices, and health records are all stored in a format that's impractical for digital workflows. This isn't just an inconvenience; it's a critical bottleneck that wastes dozens of hours every week.
Traditional approaches fail at reliably extracting information from complex PDFs
OCR and even more sophisticated ML approaches work for simple text documents but are unreliable for anything more complex. Text from different columns gets jumbled together, figures are ignored, and tables are a nightmare to get right. Overcoming this usually requires a large engineering effort dedicated to building specialized pipelines for every document type you work with.
Reducto breaks document layouts into subsections and then parses each one contextually based on its content type. This is made possible by a combination of vision models, LLMs, and a suite of heuristics we've built over time. Put simply, we can help you:
Accurately extract text and tables even with nonstandard layouts
Automatically convert graphs to tabular data and summarize images in documents
Extract important fields from complex forms with simple, natural language instructions
Build powerful retrieval pipelines using Reducto’s document metadata
Intelligently chunk information using the document’s layout data
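To illustrate the last point, here is a hedged sketch of what chunking on layout data can look like. The block schema below (`type`, `section`, `text`) is invented for this example and is not Reducto's actual API or output format; the idea is simply that chunk boundaries follow document structure rather than raw character counts, so tables stay intact and chunks don't straddle sections.

```python
# Hypothetical illustration of layout-aware chunking: group parsed blocks into
# chunks that never cross a section boundary and never split a block (so a
# table or figure block stays whole). The block schema is invented for this
# sketch and is NOT Reducto's actual output format.
from typing import Iterable

def chunk_by_layout(blocks: Iterable[dict], max_chars: int = 2000) -> list[str]:
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    current_section = None

    for block in blocks:
        text = block["text"]
        new_section = block.get("section") != current_section
        too_big = current_len + len(text) > max_chars
        # Start a new chunk at section boundaries or when the size budget is
        # hit; each block is appended as an atomic unit, never split mid-element.
        if current and (new_section or too_big):
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(text)
        current_len += len(text)
        current_section = block.get("section")

    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Example with invented blocks:
blocks = [
    {"type": "heading", "section": "Revenue", "text": "Revenue"},
    {"type": "paragraph", "section": "Revenue", "text": "Revenue grew 40% year over year."},
    {"type": "table", "section": "Revenue", "text": "| Quarter | Revenue |\n| Q1 | $10M |"},
    {"type": "heading", "section": "Expenses", "text": "Expenses"},
]
print(chunk_by_layout(blocks))
```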
Reducto is an Equal Opportunity Employer committed to diversity and inclusion in the workplace. All qualified applicants will receive consideration for employment without regard to sex, race, color, age, national origin, religion, physical and mental disability, genetic information, marital status, sexual orientation, gender identity/assignment, citizenship, pregnancy or maternity, protected veteran status, or any other status prohibited by applicable national, federal, state or local law.
Reducto is hiring an LLM/ML Engineer (Inference) in San Francisco. The vast majority of enterprise data lives in complex documents like PDFs and spreadsheets, and Reducto turns those documents into inputs AI can actually work with. In this role you'll architect and optimize the inference systems that serve our state-of-the-art models, with a focus on high throughput and low latency at scale, and work closely with our research team to bring new capabilities into production. If you have deep experience with Python and PyTorch, hold a high bar for quality, and thrive in a fast-paced, early-stage environment, we'd love to hear from you.