Do-Not-Train Signals
What data should an ML model developer be able to train on? This post is a proposal for addressing that question. In my view, powerful ML systems will radically change the world, and it's very hard to predict what policies will help with this. So instead of trying to offer a comprehensive solution that addresses every possible consequence of these systems, I offer something more incremental. I don't have high hopes that anyone cares what I think, but perhaps, if the proposal is as broadly applicable as I hope, it has a chance of getting implemented. Anyway, a droid can dream.
The proposal
I suggest that there should be a standard governing what data can be used to train machine learning models. I don't know whether this should be opt-in or opt-out (discussed below), but either way, people should be able to give some type of signal that certain data/information can't be used to train ML models. This could be something simple, for example the text "(DNT)" or some other symbol/encoding for "do not train" included on a page, inspired by the idea of copyright symbols. Or it could be something more explicit (e.g. "I hereby opt out of my data being used for training <start> etc. <end>", with "<start>" and "<end>" bracketing the text in question). It could be some text placed in the header or footer of a website, or some type of indication in a website's robots.txt. Putting the details aside for a second, the core of the proposal is that if a certain piece of data is marked in this way, then it cannot be used for training any ML model. Essentially, people would provide some signal/demarcation that they want to limit the use of certain content for training.
I don't know exactly what the details should look like, but I feel like this is a core capability that we as a society need to be able to implement: there are some things that model developers shouldn't be allowed to train on, and we need a signal that tells them not to. There are many details here: who defines the limitations, how it is enforced, and many other important considerations. I may not have all the answers to these questions, but I feel like they need to be answered.
Reasons
Below I present reasons why such a scheme might be useful. I don't go into detail on each of these, but hopefully each one demonstrates why people coming from a wide variety of worldviews might favor such a proposal. My goal is to make the case that this proposal addresses a wide variety of concerns related to ML models. I doubt this policy can fully address any one issue, but I hope it can be a positive step on many important issues. I give a quick survey of those issues that occur to me below.
Ownership
There are several instances of ongoing litigation against developers of large language models over the use of allegedly copyrighted content for training[1]. I'm no expert on copyright, and I think reasonable people can disagree on the moral and practical issues surrounding intellectual property. But it seems to me that permission to use certain content could be very important, and it would be useful to have clear standards around such permission. LLM developers used a lot of text written well before anyone knew anything about language models, and much of the legal argument over whether they could or couldn't use that data turns on technical legal issues. Regardless of how those cases come out, I think it would be useful to have some clarity on how potentially copyrighted data can/should be used going forward. This proposal gives people who create "content"/text/images/whatever some control over how that content is used, through a framework that is clear and reasonable for ML developers to deal with on a technical level.
Privacy
Some people value privacy and control over their data for reasons beyond wanting to be able to profit from their data/content in the open market. Others may worry about how large amounts of data about them collected in one place could compromise their online or physical security. Some may view privacy as an intrinsic good, and thus strive to maintain autonomy over their own information. Whatever the motivation, there is a cluster of reasons people may not want others to collect or use their data that I will simply summarize as privacy-based. This proposal would offer a tool to vindicate those interests.
Evaluations
Sound evaluation of ML model performance (especially out-of-distribution performance) is a critical technical and policy challenge. Separating what information was or was not used in training vs. evaluation is critical to getting this right. Anthropic has noted that the possible leakage of evaluations into training data is an obstacle for model evaluations. Inspired by the idea of canary strings, I think the proposal I give here can help ensure that we avoid the scenario of evaluating a model on data it has been trained on. As Anthropic describes the problem:
This is comparable to students seeing the questions before the test—it’s cheating.
I think we should be able to guarantee that ML model developers aren't cheating. To my mind, it's unacceptable for developers to let these "cheating" results count towards their reported accuracy or towards assessments of the impact of these models.
A basic principle of machine learning is that you can't evaluate a model on the data it was trained on. Models "trained on the internet" and evaluated on data that is on the internet raise questions about whether evaluation data was available at training time. In my view, a bedrock requirement is that evaluation data be systematically excluded from training, and the proposal I give here would (hopefully) provide a way to ensure this. I think this is a fundamental issue that has taken on some surprising and interesting nuance with the recent advances in ML. I hope to write about this more, but my current view is that we need to lean very hard into the core principle that you don't train on the test set. The burden should be on model developers to abide by this principle, and this proposal is a reasonable way to help with that.
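Building on the canary-string idea, here is a minimal sketch of one piece of this: dropping any training document that contains a known evaluation canary. The canary values below are made up for the example; real evaluation sets would embed their own.

```python
# Known canary strings embedded in evaluation sets (the values below are made
# up for illustration). Any training document containing one gets dropped.
EVAL_CANARIES = {
    "BENCHMARK-CANARY-7f3a2c9e",
    "BENCHMARK-CANARY-0d41b8aa",
}

def filter_eval_leakage(documents):
    """Yield only documents that contain no known evaluation canary string."""
    for doc in documents:
        if any(canary in doc for canary in EVAL_CANARIES):
            continue  # drop it: this text appears to belong to an evaluation set
        yield doc
```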
Practicalities
Implementation
I think many variants of this policy would be practical for an AI model developer to implement. Most would come down to interacting with or parsing a website in a not-too-complicated way (robots.txt/header/footer), or else detecting a pattern or template within a modality that the developer is already set up to process because their model(s) use that modality (text templates, image watermarking).
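As one example of how small the implementation burden could be, here is a sketch of honoring a hypothetical "Disallow-Train" directive in robots.txt. No such directive exists today; the field name and syntax are my own invention for illustration.

```python
def parse_disallow_train(robots_txt: str) -> list:
    """Return path prefixes whose content may not be used for training,
    based on a hypothetical "Disallow-Train" directive."""
    disallowed = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("disallow-train:"):
            path = line.split(":", 1)[1].strip()
            if path:
                disallowed.append(path)
    return disallowed

def may_train_on(url_path: str, robots_txt: str) -> bool:
    """True unless the path falls under a prefix that disallows training."""
    return not any(url_path.startswith(p) for p in parse_disallow_train(robots_txt))
```

For instance, a robots.txt containing "Disallow-Train: /blog/" would exclude everything under /blog/ from training while leaving ordinary crawling rules untouched.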
Enforceability
I think enforcement of this policy would require some type of legislative or regulatory requirement for it to be effective. Assuming such a requirement were in place, this would be a reasonably enforceable policy in my view. In order to comply, any AI developer would likely need code designed to exclude or include training data as required, and a review or audit of this code would give a lot of information about compliance. Upon investigation, a developer would likely need to attest that certain code or processes serve this purpose. Maliciously designing code that looked reasonable but was actually non-compliant would risk demonstrating intentional violation. My understanding is that willful violations of laws are often much more costly than plausibly accidental ones, so this could be a large disincentive to such behavior, and such behavior would be reasonably detectable through a combination of code review and document review. As a result, assuming some type of legal enforcement, I think the combination of costs and benefits would push model developers towards compliance.
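To give a sense of what auditable compliance code might look like, here is a sketch that logs every include/exclude decision made during data filtering. The record format is invented here, not an existing standard.

```python
import datetime
import hashlib
import json

# Sketch of an audit trail for filtering decisions, so a reviewer could later
# reconstruct what was excluded and why. The record format is invented here.
def log_decision(audit_file, doc_text: str, included: bool, reason: str) -> None:
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "doc_sha256": hashlib.sha256(doc_text.encode("utf-8")).hexdigest(),
        "included": included,
        "reason": reason,  # e.g. "no signal found" or "page-level do-not-train tag"
    }
    audit_file.write(json.dumps(record) + "\n")
```

Hashing each document rather than storing it keeps the log small while still letting a reviewer match decisions to specific items in the corpus.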
Parameters
I discuss parameters of this policy that could be adjusted to help the proposal achieve different outcomes.
Opt-in vs. Opt-out
This system could be either opt-in or opt-out. Opt-in means that if some data contains no indication of whether it can be used for training, it is presumed not to be available for training, while opt-out would presume that it is available. Opt-in would be consistent with a property-rights perspective that whoever generates a certain piece of content or information should naturally have some amount of control over it, and thus it is on the person who wishes to use content made by someone else to obtain permission. Opt-in would also place many of the costs of implementation on model trainers, who I think are likely to be relatively large entities with more resources than an individual person, so it may make practical sense to have those entities cover the costs of obtaining permission.
On the other hand, some people may worry that an opt-in approach would unduly burden model developers, especially smaller ones, and therefore lead to a concentration of power in the larger developers who can afford the costs of compliance. There may also be a concern that an opt-in approach would kill this nascent technology or otherwise stifle innovation. Although I confess I'm somewhat skeptical of these possibilities, I don't think they are crazy things to worry about. People with these concerns might adopt the framework that I have proposed but advocate for an opt-out approach.
Block/Allow lists
There could also be versions of this policy where people could label their data in a way that specifies which model developers may (allow-list) or may not (deny-list) use it for training. As an example, suppose that a certain piece of HTML is added to a website's footer to specify that the contents of the site can or cannot be used for training. Let's also imagine the existence of some type of repository of identifiers for different model developers. This code in a site's footer could specify a default of allow or deny (or the default could be chosen as part of the policy based on factors discussed above), along with lists of developer IDs that are allowed or prohibited from using the contents of the site for training.
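Here is a sketch of how a developer might interpret such a footer tag. The tag name, the attribute syntax, and the developer-ID format are all invented for illustration, as is the example footer itself.

```python
import re

# Hypothetical footer tag and developer IDs, invented for illustration.
FOOTER_EXAMPLE = '<meta name="train-policy" content="default=deny; allow=dev:openlab,dev:acme">'

def may_train(footer_html: str, developer_id: str) -> bool:
    """Decide whether a given developer may train on a site's content."""
    match = re.search(r'name="train-policy"\s+content="([^"]*)"', footer_html)
    if not match:
        return True  # no signal present; this corresponds to an opt-out default
    default, allow, deny = "allow", set(), set()
    for part in match.group(1).split(";"):
        key, _, value = part.strip().partition("=")
        if key == "default":
            default = value
        elif key == "allow":
            allow = set(value.split(","))
        elif key == "deny":
            deny = set(value.split(","))
    if developer_id in deny:
        return False
    if developer_id in allow:
        return True
    return default == "allow"
```

With the example footer above, "dev:acme" would be allowed to train on the site, while a developer on neither list would fall back to the stated default of deny.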
I can think of many applications of these types of lists. Perhaps someone wishing to license their data could deny all developers other than those who pay a fee. Someone who wants to support "open source" could deny-list developers they view as closed and allow-list those they view as sufficiently open. Someone who wants their data available to train custom models, but only within services they use, might find developers who offer such a service and allow-list the ones they choose to patronize. AI safety advocates concerned about differential progress by the safest developers could curate lists and undertake informational campaigns aimed at encouraging consumers to adopt those lists.
Conclusion
Part of my goal with this proposal is to present an approach with parameters that could be adjusted to address potential concerns or to align with different views on AI policy. Different perspectives might lead people to prefer different settings of these parameters, such as the possibilities discussed in the "opt-in vs opt-out" and "block/allow lists" sections above. I believe that having adjustable options that people can debate, while still staying within the general framework, can help with homing in on a policy that is acceptable to people with a broad range of views. Choosing to allow or disallow use of one's data would be a concrete signal expressing various positions and preferences. Observing these signals would be a useful indication of what the public thinks, and model developers would be incentivized to design training approaches that accommodate the preferences of data owners. Because the way people use these signals could change over time, I think this policy has a useful adaptive component. As models, AI policy, and the culture surrounding AI change, so too could the way people use these tools. Hopefully this would allow for continuous incremental improvements, even in the face of unexpected model capabilities.
In this way, the public would be able to express views on AI and to have a degree of input into AI policy. It's easy for commentators such as myself to speculate about people's views on AI, but implementation of this proposal would allow people to vote with their feet on the issue of how data should or shouldn't be used for model training. It would also allow developers or advocates who have strong views about what the public would prefer to put their policy where their mouth is. Instead of merely claiming that certain types of models would be preferred by the public, we could allow the public to decide. Hopefully this would mean that even people with diverging views on AI could agree on a common set of parameters, because their differences of opinion stem from differing predictions of how the public will react. Implementation of this policy would, in a sense, function as a bet on the accuracy of those beliefs, with the outcome decided by people adopting the types of models they prefer. This isn't a perfect mechanism for assessing public opinion; for example, one would expect big differences between an opt-in vs opt-out setup just due to the power of defaults. But I think it would still be a powerful tool for understanding how people actually feel about powerful ML models.