Hugging Face just open-sourced two libraries 🎉

1. datatrove: process, filter and deduplicate text data at a very large scale. github.com/huggingface/da…
2. nanotron: easy distributed primitives in order to train models efficiently using 3D parallelism. github.com/huggingface/na…