Frequently Asked Questions
What are we licensing? The license covers the BigScience BLOOM models, any checkpoints released during the training, and any source code, scripts and/or documentation necessary to define, run, load, benchmark and evaluate the LLM (what we call "Complementary Material").
Is this an open source license? This is not an open source license according to the Open Source Initiative definition, because it has some restrictions on the use of the model. That said, it does not impose any restrictions on reuse, distribution, commercialization or adaptation, as long as the model is not applied to use cases that have been restricted.
Can a BigScience BLOOM model be combined with an open source project? Combining the model with an existing open source project would be considered creating a Derivative of the model as per the license. Thus, the RAIL license's provision governing the use-based restrictions will have to be an enforceable component of any subsequent licensing conditions - or any other legal agreement - when re-licensing the model or a derivative of the model. In effect, this means that you will not be able to re-license a BigScience BLOOM model strictly under existing open-source licenses, as defined by the Open Source Initiative. However, please consult a lawyer to ascertain the best option for licensing your work that is based on the BigScience BLOOM models.
Why should BigScience decide what is appropriate or not regarding the use of the model? As creators of the model, we believe we have some responsibility to think about how our work is used. We believe we should do as much as we can to prevent possible harms from our work, especially if there are possible use cases that are incompatible with, or inappropriate for, the model's performance as well as the Ethical Charter adopted by the BigScience community.
Do "Use-based restrictions" apply to all the licensed artifacts? No, use-based restrictions only apply to the use of the BigScience BLOOM models, including downstream use of the weights, fine-tuning and task adaptation, but not to the rest of the material, including the source code. This is because the source code already exists under open source terms, and restraining its use would be both inefficient and incongruent, especially since someone could easily circumvent the use-based restrictions by getting the source code from an alternate source.
What about subsequent versions of the Model, do use-based restrictions apply to them? Yes. The RAIL license has been designed to apply to the downstream licensing terms of any derivative versions of any of the BigScience BLOOM models offered and/or released by a downstream user. In other words, an analogy can be drawn with the so-called "copyleft" clause of the GPL-family licenses, meaning that the use of any Derivatives of the Model (as defined in the license) should be governed by the same use-based restrictions.
Can you give me an example? Imagine a company wants to use a BigScience BLOOM model to develop a version for a commercial chatbot. The company accesses the model, modifies it, and fine-tunes it to be the technical backbone of the chatbot app. Firstly, these actions will be governed by the RAIL license. Secondly, and worth noting, according to the terms defined in our RAIL License the resulting model is considered a Derivative of the Model. Thus, the use of the chatbot will be governed by the use-based restrictions defined in the RAIL license. Accordingly, when commercializing the new version of the Model by means of a commercial license (or any other type of legal agreement), that license will have to integrate these use-based restrictions.
Does the license cover every harmful use case? No. We recognize that the list of use-based restrictions does not conceivably represent “everything” one could possibly do with our work. We focus on use cases which could be feasible for the model at this time. This license is our first step in exploring how such RAIL licenses could be used to mitigate harm, and we hope that this first set of provisions can evolve into more comprehensive provisions over time with community engagement.
I have questions on whether my use of the BigScience BLOOM model fits into one of the use restrictions: Imagine you are using a BigScience BLOOM model in your research project and you plan to publish your results - e.g. a modified version of the LLM and its related scientific discussion in an academic paper or a blog. The use “as is” is not prohibited if there is no intended harm - e.g. use limited to publishing results in an academic research paper or sharing results in a blog. Thus, this should be fine, as long as the model is not used to enable applications that could violate the use restrictions. Further, note that the license requires that you, as the user of the LLM, include the use-based restrictions as provisions in any license (or similar legal agreement) that you adopt for hosting, sharing or releasing your work based on a BigScience BLOOM model or its checkpoints. Nonetheless, we are conscious that the concept of "harm" is not straightforward, even more so from a legal perspective. Consequently, we have drafted our use case restrictions informed by the opinion of technical experts, experimental and empirical results on AI fairness evaluation, and ongoing legislative proposals, such as the AI Act, and more precisely articles 5, 6 and Annex III. If you are confused or unsure, you can always reach out to the BigScience community (emails below) and we will be happy to help you out.
What if I want to use it for a use case that should no longer be restricted because I have fixed a problem or a limitation of the model? Please contact BigScience (emails below) to review the use case and the changes you have made. If approved, the BigScience BLOOM model in question will need to be relicensed to you separately to permit that use case.
Is it possible for the licensor to remotely restrict the use of the model? If so, what does it mean? The model by itself does not have any built-in mechanism that allows its use to be restricted. However, if the model is hosted via an API, access can be restricted remotely, because the API access key can be revoked.
We hope the BigScience RAIL License can help stimulate further ideas on how general-purpose LLMs might best be licensed to discourage misuse. This license is not perfect and we hope that the AI and open source communities will give us feedback to improve it. Sharing and collaborative development have been central to the rapid progress in the field of AI. This must continue, and we hope a RAIL license such as this one can help create a balance between equal access to science and responsible use.
License authorship: Carlos Muñoz Ferrandis, Danish Contractor, Huu Nguyen, David Lansky
License Acknowledgments: Somaieh Nikpoor, Aaron Gokaslan, Margaret Mitchell, Yacine Jernite, Imane Bello, Giada Pistilli, Suzana Ilić, Thomas Wolf, Stella Biderman, Victor Sanh, Matthias Gallé, Anna Rogers, Maraim Elbadri, Kenneth Heafield.
Blog Acknowledgments: Francesca Rossi, Margaret Mitchell, Yacine Jernite, Suzana Ilić, Daniel McDuff
License contact:
Carlos Muñoz Ferrandis (carlosmunozferrandis@gmail.com)
Danish Contractor (danishcontractor@outlook.com)