A one-year-long research workshop
on large multilingual models and datasets

Update: Introducing The World's Largest Open Multilingual Language Model - BLOOM 🌸

You can find the model here and learn more by reading our blog post.

The acceleration in Artificial Intelligence will have a fundamental impact on society. A considerable part of this acceleration stems from training larger models on larger datasets.
The resources for this endeavour are found mainly in the hands of large technology companies. This stranglehold on a transformative technology poses problems from research advancement, environmental, ethical and societal perspectives.
The BigScience project takes inspiration from scientific creation schemes such as CERN and the LHC, in which open scientific collaborations facilitate the creation of large-scale artefacts that are useful for the entire research community.

Summary

Over one year, from May 2021 to May 2022, more than 1,000 researchers from 60 countries and more than 250 institutions are together creating a very large multilingual neural network language model and a very large multilingual text dataset on the 28-petaflops Jean Zay (IDRIS) supercomputer located near Paris, France.
During the workshop, the participants plan to investigate the dataset and the model from all angles: bias, social impact, capabilities, limitations, ethics, potential improvements, domain-specific performance, carbon impact, and the general AI/cognitive research landscape.
All the knowledge and information gathered during the workshop is openly accessible and can be explored on our Notion.

Coming events

BigScience is organizing the ACL 2022 Workshop "Challenges & Perspectives in Creating Large Language Models" in May 2022. This event will also serve as the closing session of this one-year-long initiative aimed at developing a multilingual large language model.

More information and the program can be found here.

Who is organizing BigScience

BigScience is neither a consortium nor an officially incorporated entity. It is an open collaboration bootstrapped by HuggingFace, GENCI and IDRIS, and organised as a research workshop. This workshop gathers academic, industrial and independent researchers from many affiliations, whose research interests span many fields across AI, NLP, social sciences, law, ethics and public policy.

While there is no formal relationship between the workshop and working groups and any of the entities the participants are affiliated with, the BigScience initiative is thankful for the freedom to participate in the workshop that the academic and industrial institutions behind all the participants have been providing. In particular, we would like to acknowledge and thank the support provided by:


HuggingFace, CNRS, INRIA, NAVERLABS RESEARCH, Snorkel, RECITAL, LightOn, Salesforce Research,
University of Virginia, MILA, University of Sigillum, University of Turku, University of Essex, University of Sheffield, University of Stanford,
Brown University, University of Maryland, Heriot Watt University, ENSIIE Engineering School, LISN Research Lab, IBM Research, University of Darmstadt, National University of Singapore,
Cornell University, Microsoft, Ontocord

Join/follow

Twitter: @BigScienceW
Website home: https://bigscience.huggingface.co
Join the newsletter
Participate in the workshop
email: bigscience-contact [at] googlegroups [dot] com


DISCLAIMER

BigScience is an open science project composed of hundreds of researchers around the world. We are not structured under a centralized legal entity, and while we plan to create a legal entity in the near future for data governance and community purposes, our project is currently carried out solely through the contributions of independent volunteers.

Our webpage serves as an informative platform where we display materials and links, which are owned, licensed or hosted by entities with whom we have no legal relationship. Therefore, by accessing or using the materials that we display on our webpage, or clicking on links to other websites, you consent to all of the terms and/or policies associated with these materials and other websites. If you do not agree with any of those, please do not access or use the materials or other websites.