A one-year-long research workshop
on large multilingual models and datasets
The acceleration in Artificial Intelligence will have a fundamental impact on society. A considerable part of this acceleration stems from training larger models on larger datasets.
The resources for this endeavour are concentrated mainly in the hands of large technology companies. This stranglehold on a transformative technology poses problems from research advancement, environmental, ethical and societal perspectives.
The BigScience project takes inspiration from scientific creation schemes such as CERN and the LHC, in which open scientific collaborations facilitate the creation of large-scale artefacts that are useful for the entire research community.


Over one year, from May 2021 to May 2022, 600 researchers from 50 countries and more than 250 institutions are working together to create a very large multilingual neural network language model and a very large multilingual text dataset on the 28-petaflops Jean Zay (IDRIS) supercomputer located near Paris, France.
During the workshop, the participants plan to investigate the dataset and the model from all angles: bias, social impact, capabilities, limitations, ethics, potential improvements, domain-specific performance, carbon impact, and the general AI/cognitive research landscape.
All the knowledge and information gathered during the workshop is openly accessible and can be explored on our Notion.

Coming events

Every 2-3 months, a day of live talks, posters and discussions presenting the current work of the working groups is organised. The next planned event, BigScience Episode #3, will be co-located with NeurIPS as a Social Event.

More information on the live events page.

Who is organizing BigScience

BigScience is neither a consortium nor an officially incorporated entity. It is an open collaboration bootstrapped by HuggingFace, GENCI and IDRIS, and organised as a research workshop. This workshop gathers academic, industrial and independent researchers from many affiliations, whose research interests span many fields across AI, NLP, social sciences, law, ethics and public policy.

While there is no formal relationship between any of the entities with which the workshop and working-group participants are affiliated, the BigScience initiative is thankful for the freedom to participate in the workshop that the academic and industrial institutions behind the participants have been providing. In particular, we would like to acknowledge and thank the following for their support:

HuggingFace, CNRS, INRIA, NAVER LABS Research, Snorkel, RECITAL, LightOn, Salesforce Research,
University of Virginia, MILA, University of Sigillum, University of Turku, Yandex, University of Essex, University of Sheffield, University of Stanford,
Brown University, University of Maryland, Heriot Watt University, ENSIIE Engineering School, LISN Research Lab, IBM Research, University of Darmstadt, National University of Singapore,
Cornell University, Microsoft, Ontocord


Twitter: @BigScienceW
Website home: https://bigscience.huggingface.co
Join the newsletter
Participate in the workshop
email: bigscience-contact [at] googlegroups [dot] com


BigScience is an open science project composed of hundreds of researchers around the world. We are not structured under a centralized legal entity, and while we plan to create a legal entity in the near future for data governance and community purposes, our project currently consists simply of contributions from independent volunteers.

Our webpage serves as an informational platform where we display materials and links, which are owned, licensed or hosted by entities with whom we have no legal relationship. Therefore, by accessing or using the materials that we display on our webpage, or clicking on links to other websites, you consent to all of the terms and/or policies associated with these materials and other websites. If you do not agree with any of those, please do not access or use the materials or other websites.