Do-Not-Train Signals

What data should an ML model developer be able to train on? This post is a proposal for addressing that question. In my view, powerful ML systems will radically change the world, and

Is Agency Identifiable?

Identifiability in IRL: One of my favorite papers is this one, titled "Occam's razor is insufficient to infer the preferences of irrational agents". It relates to an area of

Distributional Trust

I've written previously about factors impacting cooperation, especially in the presence of large disagreements. One such factor that has been on my mind lately is trust. There are

Social Firewalls

I'm interested in how people can cooperate despite large disagreements. Part of this is because I believe such cooperation may be necessary for tackling issues in AI safety (e.g. some

A baseline for regulation of ML models

Machine learning researchers and engineers love baselines. Baselines serve as an important starting point for improvement, and the ability to check new ideas against baselines helps measure and incentivize progress. For hard