A one-year-long research workshop
on large multilingual models and datasets

Update: BigScience model training has launched! 🚀

You can follow its progress here and learn more by reading our blog post.

The acceleration in Artificial Intelligence will have a fundamental impact on society. A considerable part of this acceleration stems from training larger models on larger datasets.
The resources for this endeavour are mainly in the hands of large technology companies. This stranglehold on a transformative technology poses problems from research-advancement, environmental, ethical and societal perspectives.
The BigScience project takes inspiration from scientific creation schemes such as CERN and the LHC, in which open scientific collaborations facilitate the creation of large-scale artefacts that are useful for the entire research community.


Over one year, from May 2021 to May 2022, 900 researchers from 60 countries and more than 250 institutions are together creating a very large multilingual neural network language model and a very large multilingual text dataset on the 28-petaflops Jean Zay (IDRIS) supercomputer located near Paris, France.
During the workshop, the participants plan to investigate the dataset and the model from all angles: bias, social impact, capabilities, limitations, ethics, potential improvements, domain-specific performance, carbon impact, and the general AI/cognitive research landscape.
All the knowledge and information gathered during the workshop is openly accessible and can be explored on our Notion.

Coming events

BigScience is organizing the ACL 2022 Workshop "Challenges & Perspectives in Creating Large Language Models" in May 2022. This event will also serve as the closing session of this one-year-long initiative aimed at developing a multilingual large language model.

More information and the program can be found here.

Who is organizing BigScience

BigScience is neither a consortium nor an officially incorporated entity. It is an open collaboration bootstrapped by HuggingFace, GENCI and IDRIS, and organised as a research workshop. This research workshop gathers academic, industrial and independent researchers from many affiliations, whose research interests span many fields across AI, NLP, social sciences, law, ethics and public policy.

While there is no formal relationship between any of the institutions with which the workshop and working-group participants are affiliated, the BigScience initiative is thankful for the freedom to participate in the workshop that the academic and industrial institutions behind the participants have provided. In particular, we would like to acknowledge the support provided by:

