RedPajama Project and Microsoft's Athena: Open-Source Foundation Models and Custom AI Chips for LLMs

Image credit: Together

The RedPajama project, a collaboration between Together, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute, aims to create fully open-source foundation models. Its first step is complete: a reproduction of the LLaMA training dataset comprising over 1.2 trillion tokens. RedPajama seeks to close the quality gap between open and closed models, removing limitations on research, customization, and use with sensitive data. Meanwhile, Microsoft has been developing a custom AI chip, code-named Athena, to meet its in-house needs for LLM training and machine learning inference.

The RedPajama project has three key components: pre-training data, base models, and instruction-tuning data and models. With the first component complete, the project has released a fully open 1.2-trillion-token dataset built by following the recipe in the LLaMA paper. The RedPajama-Data-1T dataset, available through Hugging Face, comprises seven data slices: CommonCrawl, C4, GitHub, arXiv, Books, Wikipedia, and StackExchange.
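To make the composition of the corpus concrete, here is a small sketch tallying the approximate per-slice token counts reported in the RedPajama announcement (treat the figures as rough, announcement-time numbers). The commented `load_dataset` call at the end shows one plausible way to stream a single slice via the Hugging Face `datasets` library; the config name `"arxiv"` is an assumption and the call is left unexecuted, since the full corpus is several terabytes.

```python
# Approximate per-slice token counts (in billions) for RedPajama-Data-1T,
# as reported in the RedPajama announcement. Rough figures, not exact.
SLICE_TOKENS_B = {
    "CommonCrawl": 878,
    "C4": 175,
    "GitHub": 59,
    "arXiv": 28,
    "Books": 26,
    "Wikipedia": 24,
    "StackExchange": 20,
}

# Summing the slices lands just over the advertised 1.2 trillion tokens.
total_b = sum(SLICE_TOKENS_B.values())
print(f"Total: ~{total_b / 1000:.2f} trillion tokens across "
      f"{len(SLICE_TOKENS_B)} slices")

# One plausible way to stream a single slice with the Hugging Face
# `datasets` library (not run here; config name is an assumption):
#
#   from datasets import load_dataset
#   ds = load_dataset("togethercomputer/RedPajama-Data-1T",
#                     "arxiv", streaming=True, split="train")
#   sample = next(iter(ds))
```

Streaming mode avoids downloading the whole corpus up front, which matters at this scale.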

In collaboration with the Meerkat project, the team is releasing a Meerkat dashboard and embeddings for exploring the GitHub subset of the corpus. Next steps for RedPajama include training a strong base model and using OpenChatKit's high-quality natural user instructions to release instruction-tuned versions of the models.

Microsoft's Athena, similar to Google's TPU and Amazon's Trainium and Inferentia processor architectures, focuses on LLM training and machine learning inference. As generative AI models outpace compute capabilities, custom AI accelerator strategies like Athena can help companies reduce training time and costs while delivering disruptive economies of scale. Athena also aims to address customers' inference needs with customized silicon.

Although not seen as a significant threat to Nvidia's dominance in AI/ML, Microsoft's Athena exemplifies a broader trend of hyperscalers developing their own silicon to compete with Nvidia and Intel in general-purpose cloud compute. The final phase of Moore's law is expected to be driven by heterogeneous acceleration, combining GPUs with application-specific custom chips, with implications for technology providers that have yet to engage in the rapidly evolving AI market.