
AI & Machine Learning Site Reliability Engineer

Oomnitza offers the industry’s most versatile Enterprise Technology Management platform that orchestrates and automates key business processes for IT. Our SaaS solution, with agentless integrations, best practices and low-code workflows, enables enterprises to leverage their existing infrastructure systems and automate processes such as offboarding, onboarding, audit readiness, refresh forecasting and more, thereby reducing reliance on error-prone manual tasks and tickets. We help some of the most well-known and innovative companies to improve efficiency, expedite audits, mitigate cyber risk and eliminate redundant IT spend. 


Team Oomnitza are seeking an experienced AI & ML Site Reliability Engineer who is passionate about AI, machine learning, and data science to support our innovations in AI and Data product management. In this role, you will be responsible for architecting and maintaining infrastructure that supports machine learning (ML), artificial intelligence (AI), and data-driven solutions. You will help stand up the foundational systems that enable large-scale AI deployment, including developing and managing Oomnitza’s big data analytics platform, developing AI architecture, implementing vector databases, building knowledge graphs, and optimizing systems for ML model deployment and inference. You will collaborate closely with data scientists, infrastructure engineers, product management teams, and UX designers to streamline workflows, ensure scalability, and manage the complete lifecycle of AI systems from development to production, so that our customers realize meaningful business value.


Responsibilities
  • Big Data Analytics Platform: Build and maintain Oomnitza’s big data analytics platform that centralizes data from multiple customer instances and serves analytics and AI solutions.
  • AI/ML Architecture & Infrastructure Development: Design and build scalable, secure, and efficient AI infrastructure to support training and deploying machine learning models and AI software solutions.
  • Vector Databases & Knowledge Graphs: Implement and manage vector databases for storing high-dimensional data and knowledge graphs to integrate structured and unstructured data.
  • Retrieval Augmented Generation (RAG) & GraphRAG: Develop and integrate retrieval-augmented generation systems for more accurate, scalable, and context-aware models, including GraphRAG for advanced reasoning.
  • LLM Fine-Tuning, Transfer Learning & Optimization: Work with data scientists to train, optimize, and fine-tune large language models (LLMs) for specific business applications and ensure seamless integration with existing systems.
  • ML Model Deployment & Orchestration: Deploy, manage, and monitor ML models in production, ensuring system reliability, scalability, and performance.
  • CI/CD for Machine Learning Pipelines: Implement continuous integration and continuous deployment (CI/CD) processes tailored for machine learning, ensuring reproducibility and automation.
  • Agent Development & Automation: Work with data scientists and the AI product management team to develop and manage AI agents for task automation, process optimization, and adaptive learning systems.
  • Model Monitoring & Governance: Ensure model performance monitoring, retraining, and governance protocols are in place for reliable and ethical AI usage.
  • Collaboration & Team Support: Work closely with data scientists, ML engineers, and cross-functional teams to support development, testing, and deployment needs.


Qualifications
  • Education: Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field 
  • Experience: 5+ years of experience in site reliability engineering, DevOps, MLOps, or a similar role. Experience with cloud platforms such as AWS, GCP, or Azure, including AI/ML services (e.g., SageMaker, Google Colab, Vertex AI). Proficient in deploying machine learning models (such as regressions, decision trees, neural networks, and recommendation systems) into production and managing the model lifecycle.
  • Technical Skills: Experience with data processing tools such as Apache Spark, Hadoop, or Airflow for large-scale data processing. Experience with AI/ML tools and frameworks (e.g., TensorFlow, PyTorch, LangChain, Hugging Face). Strong understanding of vector databases (e.g., Pinecone, Milvus, Chroma) and knowledge graph tools (e.g., Neo4j, RDF). Experience with RAG (Retrieval-Augmented Generation) techniques and GraphRAG systems. Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes). Proficiency in programming languages such as Python and Bash, and experience with ML tools and libraries. Experience implementing CI/CD for ML pipelines and working with ML version control systems (e.g., DVC, MLflow). Experience in on-call incident response in high-uptime environments.
  • Behavioural Skills: Intellectual curiosity with a hunger to know how things work and question established ideas, concepts and frameworks
  • Spirit of service, with a “how can I serve” attitude centered on delivering value to the greater team, the overall company, and our broader community of customers
  • Ability to embrace ambiguity and apply structured thinking and problem-solving skills
  • Entrepreneurial spirit with an enthusiasm to take on new challenges
  • Excellent communication and collaboration skills


Additional (Preferred) Qualifications


What We Can Offer You
  • Healthcare for dependents and spouse 
  • A progressive, healthy work culture with excellent opportunities for professional and personal development.  
  • Top performers will have an opportunity to help shape the team, working directly with the founders to drive initiatives and create a structure that scales.
  • A once-in-a-lifetime career opportunity to get onboard a fast-growing business that is venture-backed by C5 Capital, Shasta Ventures, Riverside Acceleration Capital, and Hummer Winblad


Our Benefits Package
  • Dental & Vision Insurance 
  • Employee equity plan
  • Health Insurance for your spouse and dependents 
  • Pension, Life insurance and Income protection
  • Remote working & flexible work schedules
  • Working from home equipment allowance
  • Choice of preferred equipment, Mac or PC.
  • Regular, fun social events and workshops.




Oomnitza recruits, employs, trains, compensates and promotes regardless of race, religion, color, national origin, sex, disability, age, veteran status, and other protected status as required by applicable law.

Oomnitza Glassdoor Company Review: 3.8 out of 5
Oomnitza DE&I Review: 3.9 out of 5

Average salary estimate

$135,000 / year (est.)
Min: $120,000
Max: $150,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About AI & Machine Learning Site Reliability Engineer, Oomnitza

Join Team Oomnitza as an AI & Machine Learning Site Reliability Engineer, where you will be at the forefront of revolutionizing enterprise technology management. In this remote role based in Galway, you will play a pivotal part in our innovative approach to AI and data product management. Your main mission will be to architect and maintain scalable infrastructures that support cutting-edge machine learning (ML) and artificial intelligence (AI) applications. You'll have the opportunity to develop Oomnitza’s big data analytics platform, enabling data from multiple customers to serve intelligent analytics and machine learning solutions. Collaborating closely with data scientists, infrastructure engineers, and product managers, you will help streamline workflows, ensure system reliability, and guide ML models from development to production. We're looking for someone with a proven track record in site reliability engineering and a penchant for the latest in AI and ML technologies. Your expertise in tools and frameworks such as TensorFlow and Docker will be invaluable as you implement CI/CD processes tailored for machine learning. If you're driven, curious, and ready to take on exciting challenges, Oomnitza offers a supportive culture with excellent benefits and opportunities for personal growth and professional development. Come and contribute to groundbreaking advancements that will redefine productivity for our clients and the industry as a whole!

Frequently Asked Questions (FAQs) for AI & Machine Learning Site Reliability Engineer Role at Oomnitza
What are the primary responsibilities of the AI & Machine Learning Site Reliability Engineer at Oomnitza?

As an AI & Machine Learning Site Reliability Engineer at Oomnitza, your primary responsibilities will include building and maintaining our big data analytics platform, designing scalable AI infrastructure, implementing and managing vector databases, and optimizing ML model deployment and orchestration processes. You will collaborate with various teams to enhance system performance and ensure seamless integration of AI solutions into our existing systems.

What qualifications are necessary for the AI & Machine Learning Site Reliability Engineer position at Oomnitza?

To qualify for the AI & Machine Learning Site Reliability Engineer role at Oomnitza, candidates should possess a Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field, along with a minimum of 5 years of relevant experience in site reliability engineering, DevOps, or MLOps. Proficiency in cloud platforms like AWS or Azure and familiarity with machine learning tools and frameworks are also essential.

What technical skills are expected from an AI & Machine Learning Site Reliability Engineer at Oomnitza?

Candidates for the AI & Machine Learning Site Reliability Engineer position at Oomnitza should demonstrate expertise in data processing tools like Apache Spark and experience with AI/ML tools such as TensorFlow and PyTorch. Familiarity with vector databases and knowledge graph implementations, as well as programming proficiency in Python and Bash, are crucial for success in this role.

How does Oomnitza support the professional growth of an AI & Machine Learning Site Reliability Engineer?

Oomnitza fosters a progressive work culture that emphasizes professional and personal development. As an AI & Machine Learning Site Reliability Engineer, top performance can earn you the opportunity to work directly with the founders, drive initiatives, and help shape the team. Regular workshops and opportunities for collaboration will further enhance your career prospects in this rapidly evolving field.

What is the company culture like for the AI & Machine Learning Site Reliability Engineer at Oomnitza?

The company culture at Oomnitza is characterized by a spirit of service, intellectual curiosity, and an entrepreneurial mindset. As an AI & Machine Learning Site Reliability Engineer, you will be encouraged to embrace ambiguity, question established norms, and contribute positively to the overall company mission. The remote working model promotes flexibility while social events foster a sense of community among team members.

Common Interview Questions for AI & Machine Learning Site Reliability Engineer
Can you explain your experience with machine learning models and their deployment?

In responding to this question, detail specific projects where you have designed, trained, and deployed machine learning models in production environments. Highlight the methodologies used, such as regression or neural networks, as well as technologies like TensorFlow or PyTorch that you utilized to achieve deployment success. Emphasize challenges faced and how you overcame them to ensure model reliability.

How do you approach building scalable AI infrastructure?

When tackling this question, communicate your understanding of scalable systems, focusing on your experience in designing infrastructures that can grow with data needs. Discuss how factors such as redundancy, load balancing, and resource optimization play a role in maintaining high-performance in AI applications while also ensuring cost-efficiency.

What tools do you use for continuous integration and deployment in machine learning projects?

Describe specific tools you are familiar with, such as Jenkins for build automation and Git for version control, as part of CI/CD practices tailored to ML workflows. Explain how you have implemented these tools to ensure automated testing and deployment, emphasizing the importance of reproducibility and automation in maintaining the integrity of machine learning processes.

Can you describe your experience with data processing frameworks?

Focus on specific frameworks you have utilized, such as Apache Spark, and describe how you have applied these tools to handle large datasets. Illustrate your ability to devise data processing pipelines that cater to the needs of machine learning projects, and mention any optimizations or efficiencies gained through your efforts.

What techniques do you employ for monitoring and maintaining the performance of AI models?

Discuss your strategies for model performance monitoring, such as implementing KPIs and using A/B testing. Explain the importance of retraining models and ensuring they reflect current data accurately, along with any tools or dashboards you use to facilitate performance tracking.

How do you ensure effective collaboration with cross-functional teams?

Outline your communication style and collaborative practices when working with data scientists, software engineers, and product managers. Highlight specific examples where effective collaboration led to successful project outcomes, ensuring you convey the importance of mutual understanding and collaboration in achieving company goals.

What do you understand about vector databases and their application in AI?

Detail your knowledge of vector databases' functionalities and the advantages they bring to AI applications, particularly in terms of storing and querying high-dimensional data. Discuss any hands-on experience you have had implementing or managing such databases within ML projects.

How do you stay updated with the latest trends in AI and machine learning?

Demonstrate your commitment to lifelong learning by discussing specific journals, conferences, workshops, or online courses you follow. Emphasize how your self-driven research and continuous education efforts keep your skills relevant in this fast-paced field.

Can you describe a challenging situation related to AI/ML that you faced in a previous role?

When answering this question, narrate a compelling story about a specific challenge regarding AI/ML deployments, detailing the context, your thought process, actions taken, and the successful outcome. Highlight your problem-solving skills and resilience in challenging situations.

What is your experience with ethical considerations and governance in AI?

Share your understanding of ethical AI practices and governance frameworks you've implemented in previous projects. Illustrate how you ensure AI systems' transparency, accountability, and fairness, indicating your awareness of these crucial considerations in modern AI work.


Oomnitza is the asset management solution built for the connected world. Oomnitza automates and manages the entire lifecycle of machines improving security, reducing manual tasks and providing higher quality data and reporting.

Employment type: Full-time, remote
Date posted: January 3, 2025
