The architecture and key features of Bloomberg's 50-billion-parameter financial language model, poised to transform the financial sector
Image credit: Bloomberg
Bloomberg has introduced BloombergGPT, a 50-billion-parameter language model trained on a mixed dataset of financial and general-purpose text. The goal is to achieve best-in-class results on financial benchmarks while remaining competitive on general NLP benchmarks. The financial domain has complexities and terminology that warrant a domain-specific system, and the authors demonstrate the effectiveness of their approach: the model outperforms existing models on in-domain financial tasks while maintaining strong performance on general-purpose benchmarks.
BloombergGPT is trained on a corpus that pairs "FinPile", a dataset of English financial documents (news, filings, press releases, web-scraped financial documents, and social media), with general-purpose public datasets. The resulting training corpus is roughly half domain-specific and half general-purpose text, totaling roughly 708 billion tokens: the financial data accounts for 51.27% of training tokens and the public datasets for 48.73%. FinPile is time-stamped, with dates ranging from 2007-03-01 to 2022-07-31, and has been de-duplicated to improve data quality.
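The paper notes that the corpus was de-duplicated but does not publish its pipeline; a minimal sketch of exact, hash-based deduplication (the function names and normalization are illustrative assumptions, not Bloomberg's actual method) might look like this:

```python
import hashlib

def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivial variants hash identically.
    return " ".join(text.lower().split())

def deduplicate(documents):
    """Keep only the first occurrence of each distinct (normalized) document."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = ["Fed raises rates.", "Fed  raises rates.", "Markets rally."]
print(len(deduplicate(docs)))  # → 2
```

Production pipelines typically go further, using near-duplicate detection (e.g. MinHash) rather than exact hashing, but the idea of keeping one canonical copy per document is the same.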
The model is a decoder-only causal language model based on BLOOM, with 70 transformer decoder layers and an additional layer normalization after the token embeddings. At 50B parameters, it is trained with a standard left-to-right causal language-modeling objective using the AdamW optimizer. Because the model's memory footprint exceeds the GPU memory available on a single cloud instance, training relies on stage 3 of ZeRO optimization, SageMaker Model Parallelism, MiCS, activation checkpointing, mixed-precision training, and fused kernels.
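As a refresher on the optimizer mentioned above: AdamW applies weight decay directly to the parameters, decoupled from the gradient-based update. A single scalar update step can be sketched in plain Python (the hyperparameter values below are illustrative defaults, not the paper's training configuration):

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter theta.

    Unlike Adam with L2 regularization, AdamW applies weight decay
    directly to the parameter, outside the adaptive update.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * theta)
    return theta, m, v

# Minimize f(theta) = theta**2 (gradient 2*theta) for a few hundred steps.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adamw_step(theta, 2 * theta, m, v, t, lr=0.01)
print(theta)
```

In practice one would use a framework implementation such as `torch.optim.AdamW`; the sketch only shows the update rule.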
During training, progress was monitored and interventions were made based on the validation loss: when it began to rise, the learning rate was lowered and dropout was added. Training ran to step 146,000 but was ended there due to a lack of further progress in validation loss; the checkpoint at step 139,200 (roughly 53 days of training, covering about 80% of the training data) was selected as the final model based on validation loss and downstream evaluations.
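The checkpoint-selection logic described above (stop once validation loss stalls, then keep the best earlier checkpoint) can be sketched generically; the patience threshold and loss values here are illustrative, not the paper's actual schedule:

```python
def select_checkpoint(val_losses, patience=3):
    """Pick the step with the lowest validation loss, and report where
    training would stop once the loss fails to improve for `patience`
    consecutive evaluations.

    val_losses: list of (step, validation_loss) pairs in training order.
    """
    best_step, best_loss = val_losses[0]
    since_improvement = 0
    stop_step = val_losses[-1][0]
    for step, loss in val_losses[1:]:
        if loss < best_loss:
            best_step, best_loss = step, loss
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                stop_step = step
                break
    return best_step, stop_step

history = [(100, 2.10), (200, 1.95), (300, 1.90), (400, 1.92),
           (500, 1.91), (600, 1.93)]
print(select_checkpoint(history))  # → (300, 600)
```

This mirrors the pattern reported in the paper: training continued past the best checkpoint (step 139,200) to step 146,000 before the lack of improvement ended the run.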
The model is evaluated on both finance-specific and general-purpose tasks, using few-shot classification and generation with standardized prompts and no prompt tuning or other techniques. On a held-out set containing examples from all sections of FinPile, BloombergGPT consistently achieves lower loss than the other models. It also outperforms them on financial tasks such as sentiment analysis, using publicly available financial benchmarks adapted to a few-shot setting.
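Few-shot evaluation of this kind works by prepending labeled demonstrations to the query and letting the model complete the final label. A generic sketch of assembling such a prompt for financial sentiment classification (the template and examples are illustrative assumptions, not Bloomberg's actual prompt format):

```python
def build_few_shot_prompt(examples, query, label_name="Sentiment"):
    """Concatenate labeled demonstrations followed by the unlabeled query,
    leaving the final label for the model to complete."""
    parts = []
    for text, label in examples:
        parts.append(f"Text: {text}\n{label_name}: {label}")
    parts.append(f"Text: {query}\n{label_name}:")
    return "\n\n".join(parts)

shots = [
    ("Shares surged after earnings beat expectations.", "positive"),
    ("The company warned of weaker guidance.", "negative"),
]
prompt = build_few_shot_prompt(shots, "Revenue was flat year over year.")
print(prompt)
```

Keeping the template fixed across models, as the paper does, ensures that differences in benchmark scores reflect the models rather than prompt engineering.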
The paper provides qualitative examples of the benefits of the model's domain specialization. BloombergGPT can generate Bloomberg Query Language (BQL) queries from natural-language requests, suggest news headlines, and answer financial questions, outperforming other models at tasks such as identifying a company's CEO. These examples illustrate the advantages of a domain-specific language model for financial applications.
The paper also surveys the history and development of language models, particularly autoregressive transformer models, and the emergence of domain-specific large language models. It covers the roles of training data, evaluation, model size, tokenization, and positional embeddings in language model performance, citing domain-specific models such as BioBERT, ClinicalBERT, and Galactica, and large raw-text corpora including the Colossal Clean Crawled Corpus and The Pile. It also discusses evaluation strategies, including automatic and task-specific evaluation, and the challenges of constructing high-quality training corpora.
The paper then turns to the ethical considerations and limitations of large language models as they apply to BloombergGPT. It describes the rigorous risk assessment and testing process used to ensure the accuracy and factuality of natural-language applications in finance, and addresses toxicity and bias, with the company taking extra care that generated content is not harmful. It also reviews the debate over releasing large language models, covering strategies from open sharing under licensing restrictions to selective access to no access; BloombergGPT itself is not released, owing to concerns about data-leakage attacks and the difficulty of providing privacy guarantees.
Conclusion
The paper presents BloombergGPT, a state-of-the-art LLM for financial NLP that achieves strong results on general LLM benchmarks and outperforms comparable models on financial tasks, thanks to a well-curated internal dataset, a distinctive tokenizer choice, and an up-to-date architecture. It offers insights into the development of domain-specific LLMs: training data, evaluation, model size, tokenization, and the practical challenges of model building. The published training logs should serve as a guide for those training their own LLMs. The authors plan to pursue several research directions, including task fine-tuning, exploring the effects of training on less biased language, and understanding how their tokenization strategy affects the resulting model. Lastly, the paper's discussion of the ethical considerations and limitations of large language models contributes to the ongoing conversation on the responsible use of AI and NLP technologies.