The “BigScience” project started from discussions in early 2021 between HuggingFace (Thomas Wolf), GENCI (Stéphane Requena) and IDRIS (Pierre-François Lavallée), GENCI and IDRIS being the organizations behind the French supercomputer Jean Zay. Jean Zay is a French national computing center operated for the CNRS (“Centre national de la recherche scientifique”, the French national research organization), with a performance of more than 28 Pflop/s in 2020; it was very recently upgraded, as we’ll detail below.
Several reasons make collaborative training and sharing efforts like BigScience, as well as clusters like Jean Zay, a particularly interesting direction for studying large language models from an environmental point of view.
The current trend toward privately trained models means that similarly sized models are trained and kept private within the various big tech companies that can afford the compute to do so. This multiplication of very similar models (Google’s 137B language model, DeepMind’s 280B Gopher model, OpenAI’s 175B GPT-3, NVIDIA’s 530B Megatron-Turing model, see an updated list here) generates a duplication of energy spending with little pragmatic logic. It would likely be more worthwhile to train a single model together and share it among the research community than to train a multiplicity of unshared models. Beyond these considerations, the Jean Zay supercomputer was selected in particular because of its very interesting design choices and location:
To efficiently scale the model on the cluster’s GPUs, the NVIDIA and DeepSpeed teams were closely involved, in particular: