MosaicML Unveils MPT-30B: A Game-Changer in Open-Source Foundation Models

Image credit: MosaicML
Overview
MosaicML has launched MPT-30B, a new open-source model in its Foundation Series that raises the bar for openly available language models. It improves on its predecessor, MPT-7B, which has seen more than 3 million downloads since its release.
MPT-30B was designed to surpass the quality of the original GPT-3. It is accompanied by two fine-tuned variants, MPT-30B-Instruct and MPT-30B-Chat, which specialize in single-turn instruction following and multi-turn conversation, respectively. Notable features include an 8k-token context window, support for even longer contexts via ALiBi, and efficient training and inference through FlashAttention.
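For readers who want to try the model, here is a minimal loading sketch. It assumes the mosaicml/mpt-30b checkpoint on Hugging Face exposes the same config keys (max_seq_len, attn_config) documented for earlier MPT releases; check the model card before relying on the exact names.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-30b"  # assumed Hugging Face repo id

# The MPT architecture ships as custom modeling code, so trust_remote_code is required.
config = AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384                   # ALiBi lets the model extrapolate past the 8k training window
config.attn_config["attn_impl"] = "triton"   # FlashAttention-style kernel (requires triton)

model = AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
```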
MPT-30B was also trained on NVIDIA H100 GPUs, making it one of the first language models to use that hardware. In addition, its size was chosen so the model can be deployed on a single GPU.
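In 16-bit precision the 30B weights alone occupy roughly 60 GB, so single-GPU deployment typically means an 80 GB-class accelerator or quantization. As a sketch of the latter (an assumption on my part, not an official MosaicML recipe), the checkpoint could be loaded in 8-bit through the bitsandbytes integration in transformers:

```python
from transformers import AutoModelForCausalLM

# Hypothetical single-GPU load: 8-bit quantization via bitsandbytes
# (requires the accelerate and bitsandbytes packages).
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-30b",
    device_map="auto",      # place weights on the available GPU
    load_in_8bit=True,      # roughly halves memory vs. 16-bit weights
    trust_remote_code=True,
)
```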
MosaicML's platform lets customers customize and deploy MPT-30B while keeping their data private and retaining ownership of the final model weights. The company also provides an optimized inference stack for custom MPT-30B models, with per-GPU-minute pricing.
Key Takeaways:
MosaicML introduces MPT-30B, a more advanced model in its Foundation Series that improves upon MPT-7B.
MPT-30B outperforms the original GPT-3, and its fine-tuned variants add capabilities for single-turn instruction following and multi-turn conversations.
Unique features include an 8k token context window, ALiBi support for longer contexts, and efficient performance with FlashAttention.
MPT-30B is among the first language models trained on NVIDIA H100 GPUs.
MosaicML's platform allows customization and deployment of MPT-30B while preserving data privacy and ownership of the final model weights.