Reletter

AI Safety Papers

Xerxes Dotiwalla

Digests of new AI safety papers from arxiv, ~weekly. Full list here: https://tinyurl.com/ai-safety-papers

Platform: Substack
Pricing: Only free issues
Publishes: Weekly
Issues: 70
Founded: 2 years ago
Last Issue: 5 days ago
Active

Read this Newsletter

onexerxes.substack.com

Latest Issues

policy on the AI exponential, misaligned AI as a new insider risk, ...

Also: sycophancy toward researchers drives performative misalignment, emergent misalignment can be induced by sycophancy and reversed via alignment gating

Policy on the AI Exponential

“AI is progressing extremely fast—much faster than the...

5 days ago

evaluating spontaneous strategic deception via plan-action divergence, hacking-aware reward head vector editing for robust reward models, ...

Also: RL recruits a functional welfare axis, economics of extinction, AI deterrence by betrayal, autoregressive consistency hurts alignment, analyzing reward hacking in rubric-based RL, LLMs hack rewards and society,

SPADE-Bench: Evaluati...

11 days ago

reckoning with the future, model spec midtraining for alignment generalization, eval awareness undermines alignment evals, ...

Also: steps to an international superintelligence agreement, retrying vs resampling in AI control, BenchBench

BenchBench

Yo dawg, I heard you liked benchmarks. So Rohit made a benchmark benchmark to benchmark how well AI makes benchmarks...

19 days ago

automated alignment is harder than you think, teaching Claude why, mapping honesty with deception probes, ...

Also: training on documents about monitoring leads to CoT obfuscation, assume-guarantee architecture for safe agent deployments

Automated alignment is harder than you think

“A leading proposal for aligning artificial superintelligence (AS...

a month ago

natural language autoencoders, policy recommendations for RSI, ...

Also: conformity generates misalignment in agent societies, positive alignment, removing sandbagging by training with weak supervision

Natural Language Autoencoders: Turning Claude’s thoughts into text

“Natural language autoencoders (NLAs...

a month ago

Key Facts

Contact Information
Newsletter Author
Number of Subscribers
Find out how many people subscribe to this newsletter.

Audience Metrics

Subscribers, engagement, traffic and sponsorship for AI Safety Papers.

Subscribers
Engagement: 66
Monthly Web Visits
Accepts Sponsors
Estimated Cost per Ad

Authors

The writers behind this newsletter.

  • Xerxes Dotiwalla

    AGI Alignment @ Google DeepMind https://x.com/onexerxes

Frequently Asked Questions

    How can I access the email archive for AI Safety Papers?

    You can find recent issues that have been published by AI Safety Papers on Reletter by scrolling up to where it says Latest Issues. Tap on the link for any of the most recent emails or hit More Issues to see older ones.

    How many subscribers does AI Safety Papers have?

    To see how many people subscribe to AI Safety Papers, simply upgrade your Reletter account. We provide readership numbers and lots of other stats for this newsletter so you can decide if it's worth reaching out to.

    How can I advertise in AI Safety Papers?

    Newsletter advertising can be extremely effective when it's done right. Before you pitch AI Safety Papers as a potential sponsor or partner, make sure that you've done your research and checked its newsletter stats with Reletter.

    Then, personalize one of our winning pitching templates and send it to the right person using the contact info provided.

    How much does it cost to sponsor a publication like AI Safety Papers?

    Newsletter ad rates (or CPM) vary depending on many factors, including industry, number of subscribers, open rate, ad placement and more.

    To find out how much an ad will cost, contact AI Safety Papers using the contact information provided and ask for a copy of their media kit.

    How can I find newsletters related to AI Safety Papers?

    Scroll up to where it says Related Newsletters to see other publications like AI Safety Papers. You can also search our email newsletter directory to discover other newsletters that cover the topics you're interested in.

    How do I contact AI Safety Papers?

    Reletter provides this newsletter's website URL above, where you will often find their contact information. We also provide links to associated social media accounts and pitching templates so you can reach out fast.