Reletter

AI Safety Papers

Xerxes Dotiwalla

Digests of new AI safety papers from arXiv, roughly weekly. Full list here: https://tinyurl.com/ai-safety-papers

Platform: Substack
Pricing: Only free issues
Publishes: Weekly
Issues: 67
Founded: 2 years ago
Last Issue: 4 days ago
Active

Read this newsletter: onexerxes.substack.com

Latest Issues

automated alignment is harder than you think, teaching Claude why, mapping honesty with deception probes, ...

Also: training on documents about monitoring leads to CoT obfuscation, assume-guarantee architecture for safe agent deployments

Automated alignment is harder than you think

“A leading proposal for aligning artificial superintelligence (AS...

4 days ago

natural language autoencoders, policy recommendations for RSI, ...

Also: conformity generates misalignment in agent societies, positive alignment, removing sandbagging by training with weak supervision

Natural Language Autoencoders: Turning Claude’s thoughts into text

“Natural language autoencoders (NLAs...

11 days ago

misalignment still present with inoculation prompting, introspection adapters, failing safer with reward hacking

Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers

“Can you prevent emergent misalignment with inoculation prompting, or by diluting bad data with good?

Prior work suggests you can. We...

25 days ago

writing AI constitutions, commentary on Mythos and superintelligence, current AIs seem misaligned, ...

Also: RLVR can lead to reward hacking, taxonomy of LLM deception, automated alignment researchers, ten ways to think about gradual disempowerment

Mythos is just the beginning

“If you were waiting for a sign that superintelligence is comin...

a month ago

models transmit behavioral traits through hidden signals in data, emergent deception and trust in multi-agent simulation, ...

Also: detecting violations across many agent traces, detecting real-world scheming in the wild with open-source intelligence, mechanisms of introspection awareness

Language models transmit behavioural traits through hidden signals in data...

a month ago



Authors

The writers behind this newsletter.

  • Xerxes Dotiwalla

    AGI Alignment @ Google DeepMind https://x.com/onexerxes

Frequently Asked Questions

How can I access the email archive for AI Safety Papers?

You can find recent issues that have been published by AI Safety Papers on Reletter by scrolling up to where it says Latest Issues. Tap on the link for any of the most recent emails or hit More Issues to see older ones.

How many subscribers does AI Safety Papers have?

To see how many people subscribe to AI Safety Papers, simply upgrade your Reletter account. We provide readership numbers and lots of other stats for this newsletter so you can decide if it's worth reaching out to.

How can I advertise in AI Safety Papers?

Newsletter advertising can be extremely effective when it's done right. Before you pitch AI Safety Papers as a potential sponsor or partner, make sure that you've done your research and checked its newsletter stats with Reletter.

Then, personalize one of our winning pitching templates and send it to the right person using the contact info provided.

How much does it cost to sponsor a publication like AI Safety Papers?

Newsletter ad rates (or CPM) vary depending on many factors, including industry, number of subscribers, open rate, ad placement and more.

To find out how much an ad will cost, contact AI Safety Papers using the contact information provided and ask for a copy of their media kit.

How can I find newsletters related to AI Safety Papers?

Scroll up to where it says Related Newsletters to see other publications like AI Safety Papers. You can also search our email newsletter directory to discover other newsletters that cover the topics you're interested in.

How do I contact AI Safety Papers?

Reletter provides this newsletter's website URL above, where you will often find their contact information. We also provide links to associated social media accounts and pitching templates so you can reach out fast.