Reflecting on a Year of AI Innovation at Bixal

April 16, 2024

By Jeff Fortune, Senior Director of Artificial Intelligence and Data Engineering

Introduction

2023 was a pivotal year for the development of artificial intelligence (AI) and large language models (LLMs). Throughout the year, we witnessed unprecedented breakthroughs and the integration of AI into various sectors, fundamentally changing how we work, learn, and interact with our devices. From significant advancements in natural language processing to ethical debates and emerging regulation, 2023 established a new standard for the future of technology. It has become increasingly evident that AI is now a fundamental aspect of the human experience.

One notable milestone was the meteoric rise of ChatGPT, a conversational application (first released in late 2022) that embodies the impact of these advancements, showcasing practical and powerful applications across different industries. These latest innovations not only build upon previous achievements but also collectively redefine the standards for language understanding and generation in AI.

What Bixal Has Learned and Developed

In early 2023, Bixal’s technology team embarked on the development of our first transformer model, matching the capabilities of the open-source GPT-2 foundation model and drawing inspiration from Google’s 2017 research paper “Attention Is All You Need.” To train this model, we used a dataset of public government documents combined with segments from Common Crawl. Our goal was to build a language model capable of generating answers to the questions put to it.
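
To make the architecture concrete, here is a minimal sketch of one GPT-2-style decoder block in PyTorch, the building block that the “Attention Is All You Need” design stacks to form a full model. The dimensions, names, and framework choice are illustrative assumptions, not Bixal’s actual implementation:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One GPT-2-style decoder block: masked self-attention plus a feed-forward layer."""

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: each token may only attend to itself and earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out              # residual connection around attention
        x = x + self.ff(self.ln2(x))  # residual connection around feed-forward
        return x

# A GPT-2-sized model stacks a dozen of these between an embedding layer and an output head.
tokens = torch.randn(1, 16, 768)      # (batch, sequence, embedding)
print(DecoderBlock()(tokens).shape)   # torch.Size([1, 16, 768])
```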

The release of Llama 2 by Meta and Mistral 7B by Mistral AI transformed our internal capabilities with LLMs. Llama 2 is a leader in responsible, open-source AI, with safety tuning built into its chat variants to keep conversations appropriate, and a large ecosystem of open-source tooling has grown up around it. Mistral 7B is a powerhouse capable of reasoning and following complex directions, thanks to its efficient architecture and long context window (32,000 tokens as of its v0.2 release), which makes it well suited to instruction-following and retrieval-augmented generation (RAG) tasks. These foundational models gave the open-source community a solid base to build upon. Leveraging their research and model weights, we fine-tuned the Mistral 7B foundation model to produce the next iterations of the Bixal LLM. This improved version demonstrated considerably better reasoning and was trained using Direct Preference Optimization (DPO), a technique that trains a model directly on pairs of responses ranked by human preference, without the separate reward model that traditional reinforcement learning from human feedback requires. This targeted training ensures that the model's outputs not only achieve higher quality but also adhere more closely to the desired outcomes.
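
To show what DPO training looks like in practice, here is a minimal sketch using Hugging Face’s TRL library (0.7-era API; the interface has shifted across versions). The model name, dataset path, and hyperparameters are placeholder assumptions, not Bixal’s actual configuration:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-v0.1"  # placeholder; in practice, a supervised fine-tuned checkpoint
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference copy
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral's tokenizer defines no pad token

# Hypothetical preference file: one JSON object per line with
# "prompt", "chosen" (preferred answer), and "rejected" (dispreferred answer).
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

trainer = DPOTrainer(
    model,
    ref_model,
    beta=0.1,  # how strongly outputs are pulled toward the preferred responses
    args=TrainingArguments(output_dir="dpo-out", per_device_train_batch_size=1,
                           remove_unused_columns=False),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The key design choice in DPO is that the ranked pairs themselves carry the preference signal, so no reward model is trained; the beta term keeps the tuned model from drifting too far from the frozen reference.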

We achieved a significant milestone by successfully launching the Bixal Bot V3 LLM on one of the smallest GPU-accelerated servers hosted on Amazon Web Services (AWS). We did this by quantizing the model's weights. Quantization is a machine learning technique that reduces the precision of a model's weights and activations to lower bit-widths, yielding smaller model sizes and faster inference with minimal impact on accuracy. It is particularly useful for deploying models on resource-constrained hardware, such as small GPU-accelerated servers, because it cuts both computation and memory footprint. Our custom-built C++ server let us handle the encoding and decoding of tokens directly, optimizing the inference and response process.
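
As a toy illustration of the idea, the sketch below applies a simple symmetric 8-bit scheme to a weight matrix in NumPy. Real deployments use more sophisticated schemes (such as llama.cpp's grouped k-quants), but the principle is the same:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)  # a stand-in weight matrix
q, scale = quantize_int8(w)
error = np.abs(w - dequantize(q, scale)).mean()
print(f"int8: {q.nbytes / 1e6:.1f} MB vs fp32: {w.nbytes / 1e6:.1f} MB, mean error {error:.5f}")
```

Storing int8 values uses a quarter of the memory of 32-bit floats, at the cost of a small rounding error that larger models typically tolerate well.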

The success of these endeavors also spurred our adoption of Llama.cpp, an open-source framework that has become widely used and serves as the foundation for several other projects. Llama.cpp efficiently runs models quantized with the GPT-Generated Model Language (GGML) strategy and its successor, the GPT-Generated Unified Format (GGUF), across multiple operating systems, including Windows, macOS (Intel and Apple silicon M1–M3), and Debian-based Linux. This framework has transformed our interaction with LLMs, empowering us to bring these models to our local desktop computers. While they may not match the capabilities of GPT-4-class models, they offer functionality comparable to GPT-3.5-turbo on various linguistic tasks and assistant-based activities such as brainstorming, email communication, feedback writing, and summarization. Additionally, when combined with our RAG strategy, they prove highly effective for domain-specific tasks.
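
For a sense of how lightweight this is, here is a minimal sketch using the llama-cpp-python bindings to run a GGUF-quantized model locally. The model file name is a placeholder for whichever quantized checkpoint you have on disk:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder path to any GGUF-quantized checkpoint (here, a 4-bit quantization).
llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",
    n_ctx=4096,       # context window in tokens
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize retrieval-augmented generation in two sentences."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```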

2024 Focus Areas

Fine-Tuning: Bixal's approach of fine-tuning its LLMs on public government datasets is crucial for creating specialized and efficient models. This customization enhances the models' ability to generate language specific to Bixal's needs, improving relevance and quality while strengthening our capability to meet the requirements of our federal partners. Fine-tuning also lets us optimize pretrained models like Llama 2 and Mistral 7B, saving time and resources while improving performance. We aim to extend our research on direct preference optimization and to create synthetic datasets that bridge gaps in training data and achieve results comparable to GPT-4.
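
One common way to keep this kind of fine-tuning affordable is parameter-efficient tuning such as LoRA, which trains small low-rank adapters instead of all seven billion weights. A minimal sketch with Hugging Face's PEFT library follows; the base model and adapter settings are illustrative assumptions, not Bixal's training recipe:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # placeholder base

# LoRA injects trainable low-rank matrices into chosen layers; the base weights stay frozen.
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor for adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections in Llama/Mistral-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```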

Retrieval-Augmented Generation: By enhancing Bixal's data practices and our access to high-quality, cleaned data, we can build a robust and secure knowledge retrieval mechanism. This approach has already shown transformative results in Bixal demos and in our work for USAID, enabling LLMs to provide more accurate, detailed, and contextually appropriate responses. Integrating RAG and maturing our data practices will let LLMs pull relevant information from a broader body of knowledge, ensuring up-to-date, well-informed answers and content generation.
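
At its core, RAG embeds a question, retrieves the most similar document chunks, and prepends them to the prompt so the model answers from sources rather than memory. Below is a minimal sketch using the sentence-transformers library; the corpus, model choice, and prompt format are illustrative assumptions:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Stand-in corpus; in practice these would be chunks of agency documents.
docs = [
    "USAID funds global health programs in dozens of countries.",
    "Bixal supports federal agencies with human-centered design.",
    "RAG grounds model answers in retrieved source passages.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since the vectors are normalized
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "What is RAG?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # `prompt` would then be sent to the LLM for a grounded answer
```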

Open-Source Internal Development Tooling: Shifting toward open-source applications and building tools directly will reduce data security risks and budgetary impact while accelerating innovation and putting AI tools in the hands of Bixal team members more quickly. This approach also enables user testing and valuable feedback to further improve and innovate on our LLMs.

Conclusion

From the creation of our first transformer model with GPT-2-level capabilities to the fine-tuning and development of our advanced LLMs, we have successfully leveraged published research, synthetic datasets, retrieval-augmented generation, and open-source development tooling, and explored the potential of integrating OpenAI application programming interfaces (APIs).

The advancements we have made in our language models have enabled us not only to provide accurate responses but also to bridge gaps in data availability and improve our knowledge retrieval mechanism. As we continue to innovate and refine our models, we are incredibly excited about the prospective advancements and the opportunities they present. With our belief in the power of AI to promote social good and equity, we look forward to pushing the boundaries of what's possible. By further enhancing our models' capabilities, we aspire to deliver even more robust offerings to our team at Bixal, thereby empowering our clients with cutting-edge solutions that drive impact and progress. Together, we can embrace the transformative potential of AI and collectively build a future that serves the greater good.