Do-Not-Train Signals

What data should an ML model developer be able to train on? This post is a proposal for addressing that question. In my view, powerful ML systems will radically change the world, and

Is Agency Identifiable?

Identifiability in IRL: One of my favorite papers is this one, titled "Occam's razor is insufficient to infer the preferences of irrational agents". It relates to an area of

Distributional Trust

I've written previously about factors impacting cooperation, especially in the presence of large disagreements. One such factor that has been on my mind lately is trust. There are

Social Firewalls

I'm interested in how people can cooperate despite large disagreements. Part of this is because I believe such cooperation may be necessary for tackling issues in AI safety (e.g. some