Image credit: Vicuna
Researchers from UC Berkeley, CMU, Stanford, and UC San Diego have introduced Vicuna-13B, an open-source chatbot whose output quality approaches that of OpenAI ChatGPT and Google Bard. Vicuna-13B was fine-tuned from LLaMA on user-shared conversations collected from ShareGPT, and training it cost approximately $300.
In a preliminary evaluation that used GPT-4 as a judge, Vicuna-13B achieved more than 90% of the quality of OpenAI ChatGPT and Google Bard, and it outperformed other models such as LLaMA and Stanford Alpaca in more than 90% of cases. The researchers have made the training and serving code, along with an online demo, available for non-commercial use.
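The two "90%" figures measure different things. The sketch below assumes GPT-4 gives each answer a score from 1 to 10 and shows one plausible way to compute both: a relative quality ratio (total candidate score divided by total baseline score) and a win rate (share of questions where the candidate scored strictly higher). The `Judgment` type, the function names, and the sample scores are hypothetical and are not data from the Vicuna evaluation.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """GPT-4 scores (1-10) for one question; this two-score format is an assumption."""
    candidate: float  # score for the model under test (e.g. Vicuna)
    reference: float  # score for the baseline (e.g. ChatGPT)


def relative_quality(judgments: list[Judgment]) -> float:
    """Total candidate score as a fraction of the total baseline score."""
    total_ref = sum(j.reference for j in judgments)
    if total_ref == 0:
        raise ValueError("baseline scores sum to zero")
    return sum(j.candidate for j in judgments) / total_ref


def win_rate(judgments: list[Judgment]) -> float:
    """Fraction of questions where the candidate scored strictly higher (ties count as non-wins)."""
    if not judgments:
        raise ValueError("no judgments")
    wins = sum(1 for j in judgments if j.candidate > j.reference)
    return wins / len(judgments)


# Hypothetical scores for illustration only.
sample = [Judgment(9, 10), Judgment(8, 8), Judgment(7, 9), Judgment(10, 9)]
print(f"relative quality: {relative_quality(sample):.1%}")  # → relative quality: 94.4%
print(f"win rate: {win_rate(sample):.0%}")                  # → win rate: 25%
```

The sample shows why the two metrics should not be conflated: a model can reach over 90% of a baseline's total score while winning only a minority of head-to-head questions.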
Vicuna-13B was trained on around 70K conversations collected from ShareGPT.com. Training used PyTorch FSDP (Fully Sharded Data Parallel) on eight A100 GPUs and took one day. The serving system handles multiple models through distributed workers, and GPU workers from both on-premise clusters and the cloud can be plugged into it.
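To illustrate what "multiple models with distributed workers" can mean in practice, here is a minimal sketch of a controller that keeps a registry of worker addresses per model and dispatches requests round-robin. The `Controller` class, its methods, and the worker addresses are hypothetical; this is not the Vicuna serving code, and a real system would also need health checks and load-aware routing.

```python
import itertools
from collections import defaultdict


class Controller:
    """Hypothetical controller that routes requests to registered model workers."""

    def __init__(self) -> None:
        # model name -> list of worker addresses serving that model
        self._workers: dict[str, list[str]] = defaultdict(list)
        # model name -> round-robin iterator over that model's workers
        self._cycles: dict[str, itertools.cycle] = {}

    def register(self, model: str, address: str) -> None:
        """Plug in a worker (on-premise or cloud) that serves `model`."""
        self._workers[model].append(address)
        # Rebuild the rotation so the new worker is included.
        self._cycles[model] = itertools.cycle(self._workers[model])

    def route(self, model: str) -> str:
        """Return the address of the next worker for `model`."""
        if model not in self._cycles:
            raise LookupError(f"no worker registered for {model!r}")
        return next(self._cycles[model])


ctrl = Controller()
ctrl.register("vicuna-13b", "http://onprem-gpu-1:8000")  # hypothetical on-premise worker
ctrl.register("vicuna-13b", "http://cloud-vm-7:8000")    # hypothetical cloud worker
ctrl.register("alpaca-13b", "http://onprem-gpu-2:8000")
print(ctrl.route("vicuna-13b"))  # → http://onprem-gpu-1:8000
print(ctrl.route("vicuna-13b"))  # → http://cloud-vm-7:8000
```

Keeping the registry separate from the workers is what lets new GPUs, wherever they run, join a model's pool without restarting the other workers.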
The researchers also proposed a GPT-4-based evaluation framework to automate the assessment of chatbot performance. They posed questions from eight categories to five chatbots (LLaMA, Alpaca, ChatGPT, Bard, and Vicuna) and asked GPT-4 to rate the quality of the answers. Vicuna outperformed state-of-the-art open-source models on more than 90% of the questions and was competitive with the proprietary models.
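A GPT-4 judge needs two pieces of glue code: a prompt that presents a question with two answers, and a parser that turns the judge's reply into numbers. The sketch below assumes a pairwise format in which the judge writes two 1-to-10 scores on its first line; the template wording, `build_judge_prompt`, and `parse_scores` are hypothetical, and the actual call to GPT-4 is deliberately not shown.

```python
import re

# Hypothetical pairwise-judging template; the wording and the
# "two scores on the first line" output format are assumptions.
JUDGE_TEMPLATE = """[Question]
{question}

[Assistant 1]
{answer_1}

[Assistant 2]
{answer_2}

Rate the helpfulness, relevance, accuracy, and level of detail of each answer
on a scale of 1 to 10. On the first line, output only the two scores separated
by a space, then explain your reasoning."""


def build_judge_prompt(question: str, answer_1: str, answer_2: str) -> str:
    """Fill the template for one question and a pair of chatbot answers."""
    return JUDGE_TEMPLATE.format(question=question, answer_1=answer_1, answer_2=answer_2)


def parse_scores(judge_reply: str) -> tuple[float, float]:
    """Extract the two scores from the first line of the judge's reply."""
    lines = judge_reply.strip().splitlines()
    if not lines:
        raise ValueError("empty judge reply")
    numbers = re.findall(r"\d+(?:\.\d+)?", lines[0])
    if len(numbers) != 2:
        raise ValueError(f"expected two scores, got {lines[0]!r}")
    return float(numbers[0]), float(numbers[1])


prompt = build_judge_prompt("What does FSDP shard?", "Answer A ...", "Answer B ...")
print(parse_scores("8 6\nAssistant 1 is more detailed."))  # → (8.0, 6.0)
```

The parser rejects a malformed first line instead of guessing, so a judge reply that ignores the format is surfaced as an error rather than silently scored.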
However, this evaluation framework is not yet rigorous or mature, because large language models, including the GPT-4 judge itself, can hallucinate. Building a comprehensive, standardized evaluation system for chatbots remains an open problem that requires further research.
Vicuna-13B has limitations: it is weak at tasks involving reasoning or mathematics, and it may not reliably ensure safety or mitigate toxicity and bias. To address safety concerns, the online demo uses the OpenAI moderation API to filter out inappropriate user inputs. The researchers hope Vicuna will serve as an open starting point for future research on these limitations.
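The input-filtering step amounts to a gate placed in front of the model. The sketch below shows that control flow with the moderation check and the model passed in as plain functions; `gated_reply` and both stubs are hypothetical, and the keyword stub is only a stand-in for a real moderation service such as the OpenAI moderation API, not an imitation of its behavior.

```python
from typing import Callable


def gated_reply(
    user_input: str,
    is_flagged: Callable[[str], bool],
    generate: Callable[[str], str],
    refusal: str = "Sorry, this request cannot be processed.",
) -> str:
    """Forward user_input to the model only if the moderation check passes."""
    if is_flagged(user_input):
        return refusal
    return generate(user_input)


# Stand-in moderation check (a keyword match, NOT a real moderation API).
def stub_is_flagged(text: str) -> bool:
    return "forbidden" in text.lower()


# Stand-in model that echoes its input.
def stub_generate(text: str) -> str:
    return f"echo: {text}"


print(gated_reply("Hello", stub_is_flagged, stub_generate))                # → echo: Hello
print(gated_reply("Something FORBIDDEN", stub_is_flagged, stub_generate))  # → Sorry, this request cannot be processed.
```

Injecting the check as a function keeps the gate independent of any particular moderation provider. Like the demo setup described above, this sketch screens only user inputs; model outputs pass through unchecked.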