Do-Not-Train Signals
What data should an ML model developer be able to train on? This post is a proposal for addressing that question. In my view, powerful ML systems will radically change the world, and
Is Agency Identifiable?
Identifiability in IRL
One of my favorite papers is this one, titled "Occam's razor is insufficient to infer the preferences of irrational agents". It relates to an area of
Distributional Trust
I've written previously about factors impacting cooperation, especially in the presence of large disagreements. One factor related to cooperation that has been on my mind lately is trust. There are
Social Firewalls
I'm interested in how people can cooperate despite large disagreements. Part of this is because I believe such cooperation may be necessary for tackling issues in AI safety (e.g. some
A baseline for regulation of ML models
Machine learning researchers and engineers love baselines. Baselines serve as an important starting point to make improvements, and the ability to check new ideas against baselines helps measure and incentivize progress. For hard