The “BigScience” project started from discussions in early 2021 between HuggingFace (Thomas Wolf), GENCI (Stéphane Requena) and IDRIS (Pierre-François Lavallée), GENCI and IDRIS being the organizations behind the French supercomputer Jean Zay. Jean Zay is hosted at the national computing center of the CNRS (“Centre national de la recherche scientifique”, the French national research organization) and delivered a performance of more than 28 Pflop/s in 2020; it was very recently upgraded, as we’ll detail below.
There are several reasons that make collaborative training and sharing efforts like BigScience, as well as clusters like Jean Zay, a particularly interesting direction for studying large language models from an environmental point of view.
The current trend toward privately trained models means that similarly sized models are trained and kept private by the various big tech companies that can afford the compute to do so. The multiplication of very similar models (Google’s 137B language model, DeepMind’s 280B Gopher model, OpenAI’s 175B GPT-3, NVIDIA’s 500B model; see an updated list here) generates a duplication of energy spending with
little pragmatic logic. It would likely be more interesting to train a single model together and share it among the research community than to train a multiplicity of unshared models. In addition to these considerations, the Jean Zay supercomputer was selected in particular because of its very interesting design choices and location:
To be able to efficiently scale the model on the cluster’s GPUs, the NVIDIA and DeepSpeed teams were closely involved, in particular: