AutoGen: Enabling Next-Gen Large Language Model Applications

PLUS - Fine-Tune Smaller Transformer Models: Text Classification

DevThink.AI

Essential AI Content for Software Devs, Minus the Hype

Another packed edition of your favorite Generative AI newsletter is here! This week, we've curated some truly engaging content that I think you'll find invaluable. From a hands-on workshop exploring Amazon Bedrock to an introduction to the powerful Retrieval Augmented Generation (RAG) technique, there's something for every software developer looking to stay ahead of the curve. Don't miss our deep dive on fine-tuning smaller transformer models and the latest updates on open-source conversational AI systems. Let's dive in!

In this edition

📖 TUTORIALS & CASE STUDIES

Amazon Bedrock Workshop: Exploring AWS's Powerful Generative AI Service

Read time: 8 minutes

This hands-on Amazon Bedrock workshop introduces developers to leveraging foundation models through AWS's fully managed Bedrock service. The workshop covers key use cases like text generation, knowledge bases, image generation, and agent systems, demonstrating how Bedrock's APIs and SDKs can be used to build powerful AI-powered applications. Developers will also learn about integrations with open-source tools like LangChain and FAISS, making it a valuable resource for staying ahead in the competitive software development landscape.

Simple Wonders of RAG using Ollama, Langchain and ChromaDB

Read time: 12 minutes

This article provides a practical introduction to Retrieval Augmented Generation (RAG), a powerful technique for enhancing language models with external knowledge. The author demonstrates how to use RAG with the Ollama LLM, Langchain framework, and ChromaDB vector store to significantly improve the quality of responses to domain-specific questions. Readers will learn the key concepts behind RAG and see step-by-step examples of implementing it in their own applications.
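For readers new to the pattern, the retrieve-then-generate flow can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a word-count vector stands in for a real embedding model, an in-memory list stands in for ChromaDB, and the final prompt would be sent to whatever LLM you run (Ollama in the article). The names `DOCUMENTS`, `retrieve`, and `build_prompt` are hypothetical and do not come from the article.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; a real setup would keep embeddings in a vector store.
DOCUMENTS = [
    "Ollama runs large language models locally on your own machine.",
    "ChromaDB is a vector store that indexes document embeddings.",
    "LangChain chains prompts, models, and retrievers together.",
]


def bag_of_words(text):
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(documents, key=lambda d: cosine(query, bag_of_words(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("What is a vector store?", DOCUMENTS))
```

Swapping `bag_of_words` for a real embedding model and the list for a vector store changes the quality of retrieval, not the shape of the pipeline.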

Fine-Tune Smaller Transformer Models: Text Classification

Read time: 8 minutes

This article demonstrates how to build a small, efficient text classification model using a pre-trained ALBERT encoder. The author discusses the advantages of smaller models for specific use cases, such as cost-effectiveness and better performance on redundant tasks. The article also covers techniques for leveraging synthetic data generated by large language models to train the model, providing a valuable resource for software developers looking to incorporate generative AI into their applications.

Multi-Document Agentic RAG using Llama-Index and Mistral

Read time: 8 minutes

This article introduces a new framework called Self-Reflective Retrieval-Augmented Generation (SELF-RAG), which enhances an LLM's quality and factuality through retrieval and self-reflection. SELF-RAG adaptively retrieves passages on demand, then generates and reflects on both the retrieved passages and its own output using special tokens. This enables the LLM to tailor its behavior to diverse task requirements, outperforming state-of-the-art LLMs and retrieval-augmented models on open-domain QA, reasoning, and fact verification tasks.

🧰 TOOLS

RAG LLM Ops App for Easy Deployment and Testing

Read time: 5 minutes

This article introduces the talkd/dialog repository, which provides an API-focused solution to simplify the deployment and maintenance of Retrieval Augmented Generation (RAG) language models. The app aims to help developers leverage AI in their applications without requiring extensive server management knowledge. The project includes features like a PostgreSQL database, prompt customization, and integration with the Open-WebUI interface, making it a valuable tool for software engineers exploring generative AI capabilities.

Perplexica: An Open-Source AI-Powered Search Engine

Read time: 10 minutes

Perplexica is an open-source AI-powered search engine that leverages advanced techniques like similarity search and embeddings to provide more relevant and up-to-date results compared to traditional search engines. Key features include local LLM support, specialized focus modes, and a "Copilot Mode" that generates additional queries to enhance the search process. As an alternative to proprietary tools like Perplexity AI, Perplexica offers software developers a privacy-focused, customizable search solution to enhance their research and development workflows.

Training and Finetuning Embedding Models with Sentence Transformers v3

Read time: 11 minutes

This article provides a detailed overview of the latest updates to the Sentence Transformers library, which allows software developers to leverage powerful embedding models for a wide range of applications like semantic search and text similarity. It covers the key components for training and finetuning Sentence Transformer models, including datasets, loss functions, training arguments, evaluators, and the new trainer. With examples for multi-dataset training and leveraging common benchmarks like STSb and AllNLI, the article equips developers with the knowledge to optimize Sentence Transformer models for their specific needs.

Abacus AI Releases Smaug-Llama-3-70B-Instruct: A New Benchmark in Open-Source Conversational AI

Read time: 7 minutes

Abacus AI has introduced the Smaug-Llama-3-70B-Instruct model, a promising open-source conversational AI system that outperforms existing models like GPT-4 Turbo in maintaining context and delivering coherent responses over extended dialogues. The model leverages advanced techniques and new datasets to achieve superior performance, as demonstrated by its strong scores on benchmarks like MT-Bench and Arena Hard. This advancement represents a significant step forward in building reliable and sophisticated AI-driven communication tools.

AutoGen: Enabling Next-Gen Large Language Model Applications

Read time: 5 minutes

AutoGen is a framework from Microsoft that provides a multi-agent conversation system and enhanced LLM inference APIs. It offers a collection of working systems spanning diverse applications, allowing software developers to build LLM-powered workflows and leverage powerful language models like GPT-3 more easily. AutoGen aims to help developers stay competitive by providing the tools to integrate cutting-edge generative AI capabilities into their applications.
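To make the idea of a multi-agent conversation concrete, here is a toy sketch in plain Python. It deliberately does not use AutoGen's API: the `Agent` class, the `converse` function, and the `DONE` stop word are all hypothetical, and the lambda reply functions stand in for real LLM calls. It only shows the turn-taking loop that such frameworks manage for you.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Agent:
    """A named participant; reply_fn is a stand-in for an LLM call."""
    name: str
    reply_fn: Callable[[str], str]

    def reply(self, message: str) -> str:
        return self.reply_fn(message)


def converse(first: Agent, second: Agent, opening: str,
             max_turns: int = 4, stop_word: str = "DONE") -> List[Tuple[str, str]]:
    """Alternate replies between two agents until a stop word or the turn limit."""
    transcript = [(first.name, opening)]
    speaker, listener = second, first  # the second agent answers the opening message
    message = opening
    for _ in range(max_turns):
        message = speaker.reply(message)
        transcript.append((speaker.name, message))
        if stop_word in message:
            break
        speaker, listener = listener, speaker
    return transcript


# Hypothetical agents: one "writes code", the other "reviews" it.
coder = Agent("coder", lambda m: "def add(a, b): return a + b")
reviewer = Agent(
    "reviewer",
    lambda m: "DONE: approved" if "def add" in m else "Please write add().",
)

if __name__ == "__main__":
    for name, text in converse(reviewer, coder, "Please write add()."):
        print(f"{name}: {text}")
```

The turn limit matters in practice: without it, two agents that never emit the stop word would talk forever.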

Mistral-finetune: Efficient Fine-Tuning of Mistral's Generative AI Models

Read time: 8 minutes

Mistral-finetune is a lightweight codebase that enables memory-efficient and performant fine-tuning of Mistral's large language models using the LoRA (Low-Rank Adaptation) training technique. It provides a simple, guided entry point for fine-tuning Mistral models on instruction-following and function-calling datasets, with support for multi-GPU training and Weights & Biases integration. The article covers installation, dataset preparation, training configuration, and inference - key information for software developers looking to leverage state-of-the-art generative AI in their applications.
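As a reminder of what LoRA does under the hood, here is a minimal sketch of the standard formulation, where a frozen weight matrix W is augmented by a trainable low-rank product B·A. This is not code from mistral-finetune; the function names and the `alpha` scaling default are illustrative assumptions, and plain Python lists stand in for tensors.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute y = W x + (alpha / r) * B (A x).

    W is the frozen d x k weight, A is r x k and B is d x r (both trainable),
    and r is the adapter rank, taken here from the number of rows in A.
    """
    r = len(A)
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    W = [[1, 0], [0, 1]]  # frozen 2 x 2 weight
    A = [[1, 1]]          # rank-1 adapter, 1 x 2
    B = [[0], [0]]        # zero-initialized, so the adapter starts as a no-op
    print(lora_forward(W, A, B, [3, 4]))
```

Initializing B to zeros is the usual LoRA convention: the adapted model starts out identical to the base model and only drifts as B is trained.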

 

📰 NEWS & EDITORIALS

Llama 3-V: Matching GPT4-V with a 100x smaller model and 500 dollars

Read time: 11 minutes

This article introduces Llama 3-V, an open-source multimodal model built on top of Llama 3 that outperforms the current SOTA model LLaVA by 10-20% on various benchmarks. Llama 3-V was trained for under $500 and offers comparable multimodal capabilities to much larger closed-source models like GPT4-V, demonstrating the power of optimized training pipelines and model architectures.

Microsoft, Beihang Release MoRA: An Efficient LLM Fine-Tuning Technique

Read time: 7 minutes

Researchers have introduced MoRA, a new parameter-efficient fine-tuning (PEFT) technique that outperforms the popular LoRA method for fine-tuning LLMs. Unlike LoRA's low-rank matrices, MoRA uses a square matrix to better learn and memorize new knowledge, making it a valuable tool for enterprise LLM applications that require adding custom capabilities to base models.
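The difference is easiest to see as arithmetic. The sketch below assumes the sizing rule described in the MoRA paper, where the square matrix is chosen to match LoRA's trainable-parameter budget: for a d x k weight and LoRA rank r, the square side is the integer square root of (d + k) * r. The function names are hypothetical, and the example dimensions are just a common hidden size.

```python
import math


def lora_param_count(d, k, r):
    """Trainable parameters in LoRA: A is r x k and B is d x r."""
    return r * (d + k)


def mora_square_side(d, k, r):
    """Side of a square matrix with (at most) the same parameter budget."""
    return math.isqrt((d + k) * r)


def compare(d, k, r):
    """Compare parameter counts and the maximum rank of each update."""
    side = mora_square_side(d, k, r)
    return {
        "lora_params": lora_param_count(d, k, r),
        "mora_params": side * side,
        "lora_max_rank": r,
        "mora_max_rank": side,
    }


if __name__ == "__main__":
    print(compare(d=4096, k=4096, r=8))
```

With d = k = 4096 and r = 8, both methods train 65,536 parameters, but the LoRA update has rank at most 8 while the square matrix can reach rank 256, which is the intuition behind MoRA's higher capacity for memorizing new knowledge.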

Codestral: Hello, World!

Read time: 10 minutes

Codestral is a new 22B-parameter open-source generative AI model designed specifically for code generation tasks. It supports over 80 programming languages, outperforms existing models on benchmarks like HumanEval and RepoBench, and can be accessed through various integrations like VSCode, JetBrains, LlamaIndex, and LangChain. Codestral aims to help developers write, test, and understand code more efficiently, boosting their productivity and reducing coding errors.

GenAI Multi-Agent Systems: A Secret Weapon for Tech Teams

Read time: 9 minutes

This article explores how software developers and product teams are leveraging Generative AI (GenAI) multi-agent systems to enhance development and strategy. By combining AI agents focused on tasks like ideation, design, and testing, these systems can quickly generate tailored product concepts, prototypes, and user feedback - supercharging the innovation process. The article outlines key approaches to building multi-agent systems and provides guidance on setting up the data, prompts, and integration to unlock their full potential.

From Prompt Engineering to Agent Engineering

Read time: 10 minutes

This article introduces a practical framework for "agent engineering" - designing AI-powered systems that can autonomously perform complex tasks by combining large language models, retrieval-augmented generation, and specialized APIs. The framework guides developers through defining an agent's capabilities, actions, and proficiency requirements, then mapping those to appropriate technologies and techniques. It emphasizes the evolution from simple prompt engineering to building sophisticated multi-agent systems, helping software engineers leverage the latest advancements in generative AI to build more capable and intelligent applications.

What We Learned from a Year of Building with LLMs (Part I)

Read time: 10 minutes

This article provides practical lessons and best practices for software developers building applications with LLMs. It covers effective prompting techniques, retrieval-augmented generation, workflow optimization, and evaluation strategies. The authors, a diverse team of LLM practitioners, share their hands-on experiences to help developers leverage LLMs more effectively in their projects and stay competitive in the job market.

 

Thanks for reading, and we will see you next time!

Follow me on Twitter, and DM me links you would like included in a future newsletter.