BigScience Ethical Charter

Preamble

Introduction

The development and applications of research in NLP are advancing rapidly, with direct real-world consequences. This progress brings potential societal benefits, but the related risks also increase considerably. Aware of these potential challenges, BigScience drafted an ethical charter formalizing its core values and how they are articulated.

Scope

The scope of this ethical charter is threefold:

To establish the core values of BigScience in order to allow its contributors to commit to them, both individually and collectively.
To serve as a pivot for drafting BigScience documents intended to frame specific issues ethically and legally.
To enable BigScience to promote its values within the research community through scientific publication, dissemination, and popularization.

People concerned

The members of BigScience hold the values stated in this ethical charter. As ethical guidelines, they apply to any activities and documents governing a specific aspect of the project.

Limitations of this ethical charter

Given the breadth of the scope of BigScience and its striving for progress in NLP research, we recognize that not all scientific research will have a positive impact on society. It is difficult to predict all the uses the scientific community will make of our artifacts. Therefore, we defer to our license and model card for further information.

Relevance over time

We interpret ethics as an ongoing process, not a time-fixed code with universal validity. For this reason, BigScience will review, update, and adapt the ethical charter when needed.

Legitimacy

The elaboration of this ethical charter results from a bottom-up collaboration that sought to collect the different thoughts and opinions of BigScience participants. Experts in applied ethics and law then carried out a final revision. We aim for consensus: any BigScience member who does not feel aligned with one or more of the values inscribed in this ethical charter has the right to object at appropriate times and places.

Ethical approach

We assume the basis of value pluralism within our community, and we cherish it. That is why the ethical notion of harmony (和) in Confucian moral theory seemed to be the appropriate approach for such an international and interdisciplinary scientific community as BigScience. “Harmony is by its very nature relational. It presupposes the coexistence of multiple parties; […] harmony is always contextual; epistemologically it calls for a holistic approach.[1]”

Ethical compliance

We distinguish two levels of ethical compliance operating within the charter: individual and collective. We are held accountable for ethical compliance both as individual BigScience contributors and as a collective research entity.

BigScience Values

We apply the distinction between intrinsic and extrinsic values in the structure of this ethical charter. The former refers to “what is valuable for its own sake, in itself […], as an end[2]”; the latter is characterized as “what is valuable as a means, or for something else’s sake[3]”. We distinguish between the two because extrinsic values can be adjusted more readily in service of the intrinsic ones: extrinsic values are substitutable. This structure will help the reader understand how the two types of values combine and allow the BigScience community to adapt this ethical charter over time.

Intrinsic Values

Inclusivity

We work to ensure welcomeness in the process and equal access to the BigScience artifacts without any form of discrimination (e.g., on the basis of religion, ethnicity, sexual orientation, gender, political orientation, age, or ability). We believe that “inclusivity” is not just non-discrimination, but also a sense of belonging.

Diversity

The BigScience community has over 900 researchers and communities from 50 countries covering over 20 languages. The collaborators bring together their expertise from various sources of knowledge, scientific fields, and institutional contexts (academia, industry, research institutions, etc.).

Reproducibility

The BigScience project was born with the clear intention of being a research initiative devoted to open science. BigScience aims at ensuring the reproduction of the research experiments and scientific conclusions developed under its aegis.

Openness

Openness has two dimensions, one focused on the process and the other on its result. BigScience aims to be an open science framework whereby NLP researchers and, more broadly, AI-related researchers from all over the world can contribute and join the initiative. With regard to the results of our research, such as the future Large Language Model, these are created by the research community for the research community, and will therefore be released on an open basis, taking into account the risks derived from the use of the model.

Responsibility

Each contributor has both an individual and a collective responsibility for their work within the BigScience project. This responsibility is both social and environmental. Regarding the former, BigScience intends to positively impact stakeholders through its artifacts. Concerning the latter, BigScience is committed to developing tools to monitor and lower its artifacts’ carbon footprint and energy consumption. Moreover, other tools, such as an open legal playbook guiding NLP researchers on the use of and respect for IP and privacy rights, also seek to promote responsibility across the scientific community.

Extrinsic Values

Accessibility

As a means to achieve openness. BigScience makes its best efforts to make its research and technological outputs easy to interpret and to explain to the wider public outside the scientific community, especially to communities that have participated in data sharing. This is currently put into practice through:

no-code tools for exploring the catalog, trained models, etc.
translating our calls for participation (in the data sourcing group)
journalism (articles published on the project)
a legal hackathon, linked to interdisciplinarity, as a step toward “non-technical” presentation

Transparency

As a means to achieve reproducibility. BigScience work is actively promoted at various conferences, webinars, academic research, and scientific popularization so others can see our work. We have set up a management framework to oversee the use of BigScience models, datasets, and tools, e.g. through working groups. All BigScience internal meetings and work progress are publicly shared within the Community, e.g. through public episodes. We are committed to building tools to interpret, monitor, explain, and make intelligible the artifacts developed by BigScience.

Interdisciplinarity

As a means to achieve inclusivity. We are constantly building bridges among computer science, linguistics, law, sociology, philosophy, and other relevant disciplines in order to adopt a holistic approach in developing BigScience artifacts.

Multilingualism

As a means to achieve diversity. By having a system that is multilingual from its conception, with the immediate goal of covering the 20 most spoken languages in the world and a broad reach to include up to hundreds based on collaborations with native speakers, we aim to reduce existing disparities in language and foster a more equitable distribution of the benefits of our artifacts.

___________________________

[1] Chenyang Li, “The Confucian Ideal of Harmony”, in Philosophy East and West, vol. 56, no. 4, 2006, p. 589.

[2] Chris Heathwood, “Monism and pluralism about value”, in The Oxford Handbook of Value Theory, Iwao Hirose and Jonas Olson (eds.), Oxford University Press, Oxford, 2015, p. 29.

[3] Ibid.