Safety Case Specialist (Capability Evaluations)

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Anthropic's Responsible Scaling Policy

Last summer we published our first Responsible Scaling Policy (RSP), which focuses on addressing catastrophic safety failures and misuse. In adopting such a policy, our primary goal has been to help turn high-level safety concepts into practical policies for fast-moving technical organizations and demonstrate the viability of these measures as possible standards.

Our Responsible Scaling Policy has been a powerful rallying point, with many teams' work over the last six months connecting directly back to major RSP workstreams. The progress we have made has required significant work from teams across Anthropic, and there is much more work to be done. Our new Responsible Scaling Team will:

Help leadership align on a practical approach to scaling responsibly that will raise the safety waterline in industry, inform regulation, and mitigate catastrophic risks from models
Rally teams internally to operationalize and implement this technical roadmap and set of high-level commitments, making object level decisions as needed
Iterate internally on different approaches to safety challenges, feeding these learnings back into the high-level policy, and sharing our learnings with industry and policymakers

As we continue to iterate on and improve the original policy, we are actively exploring ways to incorporate practices from existing risk management and operational safety domains. While none of these domains alone will be perfectly analogous, we expect to find valuable insights from nuclear security, biosecurity, systems safety, autonomous vehicles, aerospace, and cybersecurity. We intend to build an interdisciplinary team to help us integrate the most relevant and valuable practices from each.

Note: For this role, we are looking for candidates who can start within 3 months. We will consider all candidates who can meet the organization's hybrid policy, provided you have significant (60%+) overlap with Pacific Time. You can also submit a more general expression of interest for future RSP roles here.

About the Role

As a Safety Case Specialist focusing on capability evaluations, you'll work with our evaluations teams to develop evidence-based cases demonstrating that our models lack capabilities beyond those for which our current safety mitigations are suitable. You'll analyze our evaluation processes, assess their strengths and weaknesses, and develop clear arguments for their efficacy. Your work will involve creating templates and best practices for safety cases, directly informing decisions about model development and deployment. This role requires pragmatic problem-solving, the ability to synthesize diverse viewpoints, and strong analytical thinking skills. Your work will be crucial in ensuring our AI deployment practices balance innovation and safety effectively.

Responsibilities:

Develop and write robust, evidence-based safety cases demonstrating that our models lack capabilities beyond those for which our current safety mitigations are suitable
Create templates and best practices for capability evaluation safety cases, ensuring consistency and rigor across all assessments
Work closely with evaluation teams to analyze and understand our evaluation processes, methodologies, and results
Evaluate the strengths and weaknesses of our capability evaluation approaches, identifying potential gaps or areas for improvement
Synthesize complex technical information about model capabilities into clear, logical arguments accessible to both technical and non-technical stakeholders
Contribute to the design and implementation of new evaluation methodologies to address emerging capabilities or risks
Work with cross-functional teams to ensure capability evaluations align with the Responsible Scaling Policy (RSP)
Prepare and present capability evaluation safety case findings to Anthropic's board and other key decision-makers
Stay informed about the latest developments in AI capabilities and evaluation techniques, incorporating new insights into our safety case methodologies

You may be a good fit if you have:

Ability to break down complex problems into manageable components, spotting conceptual ambiguities or contradictions and surfacing underlying assumptions and biases
Ability to envision long-term, high-impact outcomes and work backwards to identify critical near-term actions that shape those outcomes
Ability to understand and articulate the concerns, motivations, and constraints of diverse stakeholders and synthesize disparate ideas into solutions that address stakeholder constraints
Ability to balance idealism with practicality in decision-making; capacity to make sound decisions under time pressure or with incomplete information; talent for assessing the real-world feasibility of proposed solutions
History of quickly mastering complex technical domains, even those outside your primary area of expertise
History of managing multiple complex workstreams simultaneously without losing track of details; excellent time management skills; good prioritization

Strong candidates may also have:

Knowledge of AI safety and governance issues, including an understanding of potential risks associated with advanced AI systems and current approaches to mitigating these risks.
Experience in risk management, systems safety, or developing safety cases or other safety best practices in complex technical environments.

The expected salary range for this position is:

Annual Salary:

$320,000 to $405,000 USD

Logistics

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Compensation and Benefits*

Anthropic’s compensation package consists of three elements: salary, equity, and benefits. We are committed to pay fairness and aim for these three elements collectively to be highly competitive with market rates.

Equity - For eligible roles, equity will be a major component of the total compensation. We aim to offer higher-than-average equity compensation for a company of our size, and communicate equity amounts at the time of offer issuance.

US Benefits - The following benefits are for our US-based employees:

Optional equity donation matching.
Comprehensive health, dental, and vision insurance for you and all your dependents.
401(k) plan with 4% matching.
22 weeks of paid parental leave.
Unlimited PTO – most staff take between 4 and 6 weeks each year, sometimes more!
Stipends for education, home office improvements, commuting, and wellness.
Fertility benefits via Carrot.
Daily lunches and snacks in our office.
Relocation support for those moving to the Bay Area.

UK Benefits - The following benefits are for our UK-based employees:

Optional equity donation matching.
Private health, dental, and vision insurance for you and your dependents.
Pension contribution (matching 4% of your salary).
21 weeks of paid parental leave.
Unlimited PTO – most staff take between 4 and 6 weeks each year, sometimes more!
Health cash plan.
Life insurance and income protection.
Daily lunches and snacks in our office.

* This compensation and benefits information is based on Anthropic’s good faith estimate for this position as of the date of publication and may be modified in the future. Employees based outside of the UK or US will receive a different benefits package. The level of pay within the range will depend on a variety of job-related factors, including where you place on our internal performance ladders, which is based on factors including past work experience, relevant education, and performance on our interviews or in a work trial.

How we're different

We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.

The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Come work with us!

Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.

By Anthropic

Anthropic is an AI safety and research company, organized as a public benefit corporation, that aims to develop dependable, interpretable, and controllable AI systems. The company was founded in 2021 by former members of OpenAI.
