AI Safety Researcher
Build the ethics and safety infrastructure that makes Primordia trustworthy enough to deploy in high-stakes operational environments.
About the role
Astraea is our constitutional ethics gate: the subsystem that evaluates every output against a codified set of principles before it reaches the operator. This isn't a content filter or a fine-tuned refusal mechanism. It's a governance architecture designed to hold up under adversarial pressure in real operational environments.
We treat AI safety as a core engineering constraint, not a PR exercise. If Primordia is going to be deployed in logistics, defense, and critical infrastructure, it has to be provably safe to the standards those environments require. That's what you'll build.
What you'll do
- Advance the design and implementation of Astraea's constitutional ethics architecture
- Research formal methods for specifying and verifying AI behavioral constraints
- Develop adversarial testing frameworks to probe system behavior in edge cases
- Collaborate with the Aletheia team on the relationship between truth verification and ethical constraints
- Maintain and evolve the constitutional principles framework as deployment contexts expand
- Engage with the broader AI safety community and represent Primordia's approach publicly
What we need
- Strong research background in AI safety, alignment, or AI governance
- Familiarity with interpretability methods and mechanistic understanding of model behavior
- Experience with formal specification of behavioral constraints
- Ability to combine theoretical research with practical system implementation
- Track record of published work or equivalent research contributions
Nice to have
- Background in ethics or moral philosophy
- Experience with red-teaming or adversarial ML
- Familiarity with regulatory and compliance frameworks for AI in high-stakes industries
- Prior work at an AI safety organization (e.g., ARC, Redwood, or the Anthropic safety team)
Primordia will only be as trustworthy as the people building its safety layer. If this is the work you want to do, reach out directly.