🌸Introducing The World’s Largest Open Multilingual Language Model: BLOOM🌸

July 12, 2022 – We are releasing the 176B-parameter multilingual BLOOM model in full open access

Legal Playbook For Natural Language Processing Researchers

June 21, 2022 – Data gathering, governance, and disposition of an AI model as a public resource for multiple jurisdictions [PDF]

BigScience Ethical Charter

June 9, 2022 – Formalizing BigScience core values

Masader: Metadata annotations for more than 200 Arabic NLP datasets

June 9, 2022 – Collecting and annotating more than 200 Arabic NLP datasets

The BigScience RAIL License

May 20, 2022 – Developing a Responsible AI License ("RAIL") for the use of the BigScience LLM

The BigScience Project Launches Training of Its Multilingual Model

March 15, 2022 – Launching the training of BigScience's multilingual model

BigScience Model Training Launched

March 15, 2022 – Kicking off the BigScience Large Language Model training

Which hardware do you need to train a 176B-parameter model?

March 15, 2022 – Training a massive-scale language model

Building a TB Scale Multilingual Dataset for Language Modeling

March 15, 2022 – Developing a 350-billion-token (1.5 TB of text data) multilingual dataset

What Language Model to Train if You Have One Million GPU Hours?

March 14, 2022 – Deciding on the final model size, shape, and pretraining duration

The Tale of T0

December 20, 2021 – Using T0 for cooking recommendations and answering world-knowledge questions