A baseline for regulation of ML models

[Image: "A baseline for regulation of ML models", generated with https://www.craiyon.com/]

Machine learning researchers and engineers love baselines. Baselines serve as an important starting point for making improvements, and the ability to check new ideas against them helps measure and incentivize progress. For hard tasks that are likely to require new and innovative ideas, a cool thing about baselines is that they don't necessarily have to be good enough to use in a practical scenario, because they can help drive innovation even if they are far from ever being useful in practice.

Taking inspiration from the idea of baselines in technical machine learning research, I want to suggest a different kind of baseline. In this post, I'm going to propose a baseline for public policy aimed at regulating machine learning models. To be clear, I'm not saying the policy I'm going to propose should actually be implemented. For one, I doubt it is specific enough. For two, I'm sure it has flaws that would need to be patched before it could work. But I hope that proposing it can generate increased understanding of what workable and effective regulation of ML models could look like. There are a lot of reasons why people may view regulation of ML systems as generally good or generally bad, but the more important question in my view is not about regulation in general, but rather whether there is some specific regulation that could be good. If the best ideas for regulation turn out to be good ideas, then implementing them would be worthwhile even if it takes a ton of bad ideas to figure out the good ones. Perhaps an imperfect but specific idea can be a step on the way to figuring out what good regulation would look like.

Why would we want to regulate ML models?

Before I present my baseline, it's probably worth saying why I think regulation of ML models is worth considering. First, what do I mean by "regulation"? I mean laws that restrict when and how machine learning models can be created (usually "trained") and used ("deployed"). I think regulation has particular potential for large models that require a lot of computing resources (and thus a lot of money). Training and deploying these models is most likely to go on at large companies, especially tech companies. A lot of these are in the United States, where I also happen to be, so I'm often thinking of "regulation" as federal laws or executive actions[1] by the US federal government. China also happens to have a lot of large and well-funded tech companies that are trying to train large and powerful ML models. I think the role of China is extremely important for addressing these issues, but I don't know all that much about the political, corporate, or scientific situations in China, so it's harder for me to come up with reasonable ideas relating to that part of the equation. I'll comment on the issue of international cooperation a bit later, but it's definitely an area of uncertainty for me. So, when I say "regulation", I'm imagining laws or executive actions from the US federal government aimed at restricting how large (usually tech) companies go about training and deploying large (requiring lots of compute) machine learning models.

Why am I interested in the possibility of regulation? I'm concerned about the risk presented by ML models as they get progressively larger and therefore more powerful. A lot of my concern is that we don't really have a good grasp of how these models work, and that problem will be exacerbated as they get larger and more complex. Large companies with lots of resources seem to be rushing to realize the economic potential of these models, and I think this dynamic creates an incentive for companies that want to beat out the competition to build and deploy models as fast as possible, without fully assessing or mitigating risk. Regulation could encourage companies to pump the brakes on this process and therefore create space to address risks before models actually get trained and/or deployed. Once we have a better handle on how these models work and what is required for them to be safe, regulation could also help ensure that people follow those best practices.

The Baseline

Here is my proposed baseline:

Scope

These requirements would apply to language models[2] that use at least 1.0e+26[3] floating-point operations (FLOPs) during training.

Pre-training requirements

Prior to initiating training of the model, developers must seek review from an independent authority to obtain approval to begin training the model. This review must include an assessment of: the approach for training and evaluating the model; access controls for the training data, model artifacts[4], and any APIs; the intended use of the model; and any potential safety issues. The reviewing authority may establish requirements for changes to the proposed process (e.g., requiring certain monitoring) that must be completed prior to training or adhered to during training.

Training requirements

During training, the reviewing authority will have the ability to audit the training process for compliance with any requirements.

Post-training requirements

The model developers will prepare a report presenting metrics that establish the accuracy and safety of the model after training, and provide this report to the reviewing authority.

The developers will also establish and make available to the reviewing authority a model evaluation environment, which provides the reviewing authority access to the model to evaluate it for safety and accuracy. This will include the ability to run the model for inference on new/custom data developed by the reviewing authority, as well as the ability to inspect internal properties of the model (such as by inspecting parameters or gradients, or by performing other interpretability analysis).

Deployment requirements

Based on the post-training assessment, the reviewing authority may either approve or deny any proposed use or deployment of the model. The reviewing authority may allow use only under certain conditions or criteria, and may require implementation of requirements such as safety protocols or ongoing monitoring prior to deployment.

Independent review

Prior sections reference a "reviewing authority". This must be an institution or organization that is unaffiliated with and has no business or financial relationship with the model developers, and that has experience and technical expertise in the evaluation of large language models[5].

Costs

The model developers will pay the reasonable costs incurred as part of this process (such as the costs of facilitating the review, creating the evaluation environment, and providing compute used during the evaluation).

Compliance

Failure to comply will be punishable by fines to the model developers[6].

Considerations

I'm sure there are many problems with this proposal, but I hope that it can be a useful jumping-off point for thinking about how regulation of ML models might help ensure that those models are safe and beneficial. To assist with that, I'll present some of the thought process behind my proposal.

Scope limitations

I think starting out with a more limited scope may make it easier to come up with good regulation as well as make those regulations more politically viable (discussed more below). The proposal above would restrict the scope to only language models of a certain size. Language models are definitely not the only type of model I'm worried about, but they are a specific type of model that is undergoing rapid progress and that is already coming under scrutiny for various reasons. Criticism of these models in existing mainstream machine learning research makes them attractive as initial targets for regulation.

I also propose limiting the scope to models above a certain threshold of training compute. I don't think models that use large amounts of compute are the only models that have safety concerns, but I do think they are definitely more likely to be dangerous. I also think they are easier to target in some ways because a limited set of actors can realistically hope to build very expensive models, and the fact that they are going to be expensive regardless of regulation means that requiring developers to take certain costly precautions could be more politically acceptable.

As for the actual threshold itself, I picked it based on eyeballing table 3 from this paper, with the idea of aiming for models that use more compute than existing ones but not too much more. It's probably not the ideal threshold, but this is the type of thing I think having a baseline will help with. If some issues with the proposal could be solved by simply tweaking this value, I think that is a helpful thing to know.
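To make the threshold a bit more concrete, here is a minimal sketch (in Python) of how one might estimate whether a planned training run falls in scope, using the common rule of thumb that dense transformer training compute is roughly 6 × parameters × training tokens. The specific parameter and token counts below are illustrative assumptions on my part, not figures from the proposal or from any particular company.

```python
# Rough back-of-the-envelope check of whether a planned training run would
# fall under the proposed 1e26 FLOP threshold.
# Uses the common approximation: training FLOPs ~= 6 * parameters * tokens.
# (Illustrative only; how to actually account for compute is something the
# regulation itself would need to define.)

THRESHOLD_FLOPS = 1.0e26

def estimated_training_flops(num_parameters: float, num_training_tokens: float) -> float:
    """Approximate total training compute for a dense transformer."""
    return 6.0 * num_parameters * num_training_tokens

# Hypothetical example runs:
candidate_runs = {
    "GPT-3-scale run (~175B params, ~300B tokens)": estimated_training_flops(175e9, 300e9),
    "Hypothetical 1T-param model on 20T tokens": estimated_training_flops(1e12, 20e12),
}

for name, flops in candidate_runs.items():
    in_scope = flops >= THRESHOLD_FLOPS
    print(f"{name}: ~{flops:.2e} FLOPs -> {'in scope' if in_scope else 'out of scope'}")
```

Under this approximation, a GPT-3-scale run comes out around 3e23 FLOPs, a few hundred times below the threshold, which matches the intent of targeting models somewhat beyond the current frontier. The approximation also ignores details (sparse architectures, restarted runs, fine-tuning) that a real rule would have to pin down.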

Independent review requirement

In order to really get a good understanding of the safety concerns presented by a model, I think we need to separate training and evaluation. The developer of a model is going to have an incentive to make the model look good. I also think very complex models are likely to have flaws that are hard to uncover, so to really discover those flaws there needs to be an incentive in the opposite direction. Some entity needs to be encouraged to discover any flaws that exist, even if it takes a lot of effort and resources. I think the only way to achieve that is to have some type of independent evaluation.

In terms of what type of institution should serve this role, I'm not sure. It could be an executive agency, or an agency could certify non-profits and then non-profits could do the actual reviews, or maybe it should be a federal court. This is one of the details that I think would need some more expertise from people knowledgeable about regulatory issues.

Pre-approval requirement

I propose requiring developers to seek approval even prior to training models. From a safety perspective, the ideal outcome is that unsafe systems are never deployed in the first place, rather than deployed and then damage-controlled after harm occurs. Requiring evaluations prior to certain critical steps helps push in a proactive rather than a reactive direction. I think this will also help discourage developers from rushing to deploy models with the hope that they can then leverage status quo bias to get around safety issues. This requirement would help establish a norm against that.

Access to model internals

Simply having the ability to run a model on various inputs may not be enough to assess its safety. Examining internal aspects of a model or running analysis that requires deeper access may be required.

Facilitating this access might create some additional risks during evaluation (e.g. copies of a model leaking and becoming publicly available). I think the ideal situation is that model evaluation occurs within a developer's internal systems, which is what I have in mind when I refer to a "model evaluation environment".
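As a sketch of what "deeper access" could mean in practice, the snippet below shows the kind of inspection a reviewing authority might run inside a developer-hosted evaluation environment: running the model on its own prompts and looking at internal quantities like parameters and gradients. It assumes a generic Hugging Face-style causal language model; the model identifier and prompt are placeholders I made up, and a real evaluation would presumably involve far more extensive tooling than this.

```python
# Illustrative sketch of reviewer-side access inside a developer-hosted
# evaluation environment: run custom inputs and inspect model internals.
# Assumes a generic Hugging Face-style causal LM; names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "developer/model-under-review"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

# 1. Black-box access: run inference on reviewer-supplied prompts.
prompt = "Reviewer-supplied test prompt goes here."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# 2. White-box access: inspect parameters and gradients for a given input.
labels = inputs["input_ids"]
loss = model(**inputs, labels=labels).loss
loss.backward()  # populates .grad on the model's parameters

for name, param in list(model.named_parameters())[:5]:
    grad_norm = param.grad.norm().item() if param.grad is not None else 0.0
    print(f"{name}: shape={tuple(param.shape)}, grad_norm={grad_norm:.4f}")
```

Even this level of access goes well beyond what a public API typically exposes, which is part of why I think the evaluation environment needs to be an explicit requirement rather than something left to the developer's discretion.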

Political viability

Would such a regulation even be viable politically? I think it would certainly require a substantial amount of advocacy to get such a regulation through the political process. However, I do think there are certain factors that suggest it might be possible to build a constituency for regulation like this:

  • Existing salient issues: The regulation of tech companies ("big tech") for various reasons has become a reasonably salient issue in US politics, including worries about how these companies are using or might use ML systems. The concerns that currently have traction in US political discourse are different from the ones I'm focused on, but I think the fact that related issues are at least of interest to a broader constituency suggests that there might be space for discussions of safety issues related to ML systems.

  • Bi-partisan appeal: Regulation of "big tech" is an idea that has seen some interest from both major US political parties. My read on that situation is that with regard to "big tech" in general, the reasons for wanting regulation are largely opposed between the parties, which would make compromise difficult. On the other hand, regulation that is about "safety" specifically could resonate with people of both parties. The "safety" framing might avoid the appearance of alignment with one party or set of values, while still addressing shared concerns.

  • Future technology: The proposed regulation would be directly costly to tech companies, and indirectly costly to everyone else to the extent that beneficial innovations are delayed. In my view, it's easier politically to make the case that the indirect costs are outweighed by the benefits of increased safety when the technology being considered lies in the future. The same is true when the direct costs are being borne by institutions that are perceived as powerful and able to sustain the cost (related to the first bullet in this list). The downside is that the risks are likely to be viewed as highly speculative.

So, I think a major obstacle to political viability is that risks from ML systems aren't widely known or considered important, and are likely to be viewed as speculative. On the other hand, I don't think people are likely to have strong existing opposition to the idea for partisan reasons, and might actually be somewhat amenable to the idea of regulation as a result of existing sentiment towards large tech companies. I think this suggests that communicating the possible risks might be an avenue to generate political support for regulation like what I'm proposing.

International considerations

So, let's say by some miracle we overcome all the obstacles above. We have regulation like I have proposed in the United States, and it somehow works. What about other countries, especially China? I consider this the biggest challenge to regulation, and I don't really have any good ideas about how to address it. That said, I have two thoughts, one a reason why we might need to address it, and the second a reason why we might be able to.

A common worry within the AI safety community is the possibility of a race to achieve ever more powerful ML models. If various actors are rushing to beat each other to models with certain capabilities, they may end up cutting corners on safety. If the United States and China both pursue policy that emphasizes speed of technological progress in machine learning over safety, there is a risk that corners get cut, and a dangerous spiral emerges where each side justifies the race by pointing to the fact that the other seems to be racing. Avoiding a race requires that one side at some point be willing to take at least small steps to ease off the gas. The regulation I propose might not be a "small" step, but hopefully having some type of step in mind can help generate ideas about how a more cooperative dynamic could be achieved. Any potential regulation needs to consider the difficulties of international cooperation, but I would like to avoid different countries racing to beat each other in an ML arms race to the extent possible.

So, why might this be a reasonable possibility, despite the difficulties of international cooperation? Let's start with why the US and Chinese governments seem unlikely to be interested in regulations like the one I propose. I would guess it's mostly because they aren't convinced that there is really all that big of a safety issue[7]. If there was more widespread concern about AI safety, the US, China, and other countries that want to develop advanced ML systems might actually be more hesitant. I previously referenced the idea that our current understanding of ML systems is akin to alchemy. I don't think most government officials in any country think of ML systems in those terms. But if they did, I think that has the potential to change the dynamic and perhaps open the door for cooperation. If some powerful technology is as likely to blow up in your own face as help you against your adversaries, then it might be more tempting to try to come to an agreement to avoid using that technology. Convincing people that advanced ML systems pose more of a risk than is widely believed is still a tall order, but it's something that I think is possible and worth pursuing.


  1. I'm leaving out judicial actions for purposes of this post because I think they have unique challenges. I'm also not a lawyer or otherwise an expert in technology law. A judicial approach might have some interesting advantages, but I think what I'm proposing here would be better suited to a legislative or executive mechanism. ↩︎

  2. A specific type of machine learning model that has seen rapid progress. One of the most well known language models is GPT-3. ↩︎

  3. This is scientific notation, i.e., 10^26. ↩︎

  4. Such as files with the learned parameters of the model. ↩︎

  5. I'm not an expert in how to design regulatory institutions, so I don't know the ideal setup for this. In the spirit of creating something to iterate on, I'm just throwing something in here. ↩︎

  6. Again, I'm not an expert in how to design regulation, and how exactly to ensure compliance is a big open question for me. So, in the spirit of creating something to iterate on, I'm just throwing something in here as well. ↩︎

  7. Except perhaps from misuse by other actors, but that is pushing towards racing and against cooperation. ↩︎