Graph RAG - A Conceptual Introduction to Powerful AI Applications

PLUS - SWE-agent: A Tool to Automatically Fix GitHub Issues Using GPT-4

DevThink.AI

Essential AI Content for Software Devs, Minus the Hype

Another week, another batch of must-know insights for the modern software developer! This edition covers some truly compelling topics, including an introduction to the powerful Graph RAG framework, a guide to building an AI-powered CLI, and an exploration of Anthropic's latest model transparency efforts. Plus, we've got the scoop on tools like SWE-agent and pgvectorscale that can supercharge your dev workflows. Dive in and discover the cutting edge of generative AI!

In this edition

📖 TUTORIALS & CASE STUDIES

Graph RAG - A Conceptual Introduction to Powerful AI Applications

Read time: 10 minutes

This article introduces Graph RAG, an advanced Retrieval Augmented Generation (RAG) framework that enhances LLMs by connecting them to structured knowledge graphs. Unlike traditional text embeddings, Graph RAG enables LLMs to answer complex, aggregative questions by leveraging the hierarchical relationships within the knowledge graph. This powerful capability helps developers build unique AI applications that can truly "think around the corner" and provide more insightful responses to user queries.

Creating a RAG Chatbot with Langflow and Astra DB

Read time: 11 minutes

This article demonstrates how to create a Retrieval-Augmented Generation (RAG) chatbot using Langflow, a graphical interface for the Langchain framework. It guides readers through integrating LLMs with vector databases to provide context-rich responses. The chatbot is built using the Titanic dataset, showcasing Langflow's ability to easily customize and deploy LLM-powered applications.

Building an AI-Powered CLI with Golang and Google Gemini

Read time: 8 minutes

This article demonstrates how to build an AI-powered command-line interface (CLI) using Golang and Google's Gemini API. The author guides developers through setting up the Cobra CLI framework, integrating the Gemini API for generating text responses, and making the CLI prompts dynamic. This project shows how software developers can leverage generative AI to enhance their CLI tools and workflow.

Deploying LLMs Into Production Using TensorRT LLM

Read Time: 11 minutes

This article provides a comprehensive guide on accelerating the inference performance of LLMs using NVIDIA's TensorRT-LLM framework. It covers the benefits of TensorRT-LLM, such as paged attention and efficient key-value caching, and walks through the steps to compile and deploy a Mistral 7B Instruct model as a containerized API endpoint on Google Kubernetes Engine. The article includes detailed code examples and performance benchmarks, demonstrating how TensorRT-LLM can significantly boost the scalability of open-source LLMs in production environments.

Understanding the Best Practices and Ideas for LLM-Enabled RAG Systems

Read Time: 12 minutes

This article provides a comprehensive overview of best practices for building effective RAG systems, which leverage LLMs to search a knowledge base and generate informed responses. It covers key stages like query classification, chunking, indexing, retrieval, reranking, repacking, and fine-tuning, offering insights backed by research experiments. The author synthesizes these findings to propose two distinct RAG system recipes, helping software developers leverage this powerful AI framework.

🧰 TOOLS

Claude's API now supports CORS requests, enabling client-side applications

Read time: 8 minutes

This article discusses Anthropic's new feature that allows developers to call the Claude LLMs directly from a user's browser using a new HTTP header. This opens up new possibilities for client-side applications, like the author's Haiku app that generates haikus from webcam photos. While this approach has security risks, the author demonstrates how to implement a "bring your own API key" pattern to enable legitimate use cases for this feature.

SWE-agent: A Tool to Automatically Fix GitHub Issues Using GPT-4

Read Time: 7 minutes

SWE-agent is a novel tool that leverages GPT-4 and other large language models to automatically fix bugs and issues in real GitHub repositories. SWE-agent can resolve 12.47% of issues in the SWE-bench evaluation set, demonstrating state-of-the-art performance. The tool uses a custom "Agent-Computer Interface" to help language models browse, view, edit, and execute code efficiently. Software developers can use SWE-agent through a web interface or command line, and the project is open-source and maintained by researchers at Princeton University.

Supercharging Vector Search with pgvectorscale

Read Time: 8 minutes

pgvectorscale is a new vector search solution that builds on the open-source pgvector extension for PostgreSQL. It offers significant performance and cost advantages over competing cloud-hosted vector search services, achieving up to 28x lower latency and 16x higher throughput for AI-powered applications. Featuring a custom DiskANN-inspired index type and advanced compression techniques, pgvectorscale enables developers to leverage powerful vector search capabilities directly within their PostgreSQL databases.

Helicone: A Powerful Platform for Generative AI Developers

Read Time: 5 minutes

Helicone is a comprehensive platform that empowers software developers to leverage generative AI in their applications. Featuring sub-millisecond latency, 100% log coverage, and industry-leading query times, Helicone provides the essential tools for managing prompts, analyzing costs and performance, and ensuring high reliability. With enterprise-grade scalability and risk-free experimentation, Helicone helps developers get to production-quality faster, without the need for SDKs or complex integrations.

LitServe: A Lightning-Fast Serving Engine for AI Models

Read Time: 7 minutes

LitServe is a flexible, easy-to-use serving engine for AI models built on FastAPI. It offers features like batching, streaming, and GPU autoscaling to help software developers deploy and scale AI models efficiently, without the need to rebuild a server for each model. LitServe is at least 2x faster than plain FastAPI and supports a wide range of AI models, including LLMs, computer vision, and audio processing, making it an ideal tool for developers looking to leverage generative AI in their applications.

Salesforce/xLAM-7b-r: Powering Next-Gen AI Agents

Read Time: 8 minutes

Salesforce's xLAM-7b-r is an advanced Large Action Model (LAM) designed to enhance decision-making and translate user intentions into executable actions. This open-source model can autonomously plan and execute tasks, serving as the brains of AI agents that can automate workflows across various domains. The newsletter's software developer audience will be interested in xLAM's potential to power the next generation of intelligent applications.

 

📰 NEWS & EDITORIALS

Andrej Karpathy Praises Cursor Over GitHub Copilot

Read time: 5 minutes

Prominent AI researcher Andrej Karpathy has expressed his preference for the AI coding tool Cursor over GitHub Copilot, stating that it is a "net win" for his workflow. Karpathy noted that he now spends more time "writing English" ( prompting and editing) rather than traditional coding, enabled by Cursor's advanced language model capabilities. This shift in the programming process highlights the rapid evolution of generative AI tools that are transforming how software developers work. Read more on the Cursor AI tool vs. GitHub Copilot.

AI Coding: Human Engineers Are More Important Than Ever

Read time: 8 minutes

This article explores how AI-powered coding tools are transforming software development, making human engineers more crucial than ever. While AI can generate code quickly, developers remain responsible for ensuring security, optimizing performance, and providing exceptional user experiences. The article highlights how AI-assisted coding empowers developers to tackle more ambitious projects, but also emphasizes the need for critical thinking and domain expertise to guide the AI's output effectively.

Anthropic Publishes the 'System Prompts' that Make Claude Tick

Read Time: 6 minutes

Anthropic has published the system prompts for its latest AI models, Claude 3 Opus, Claude 3.5 Sonnet, and Claude 3 Haiku. These prompts outline the models' capabilities, personality traits, and restrictions - providing rare transparency into how generative AI assistants are primed and controlled. As software developers build applications using Retrieval Augmented Generation or agent-based systems powered by large language models, this insight into Anthropic's prompts can help them better understand the underlying AI behaviors.

With 10x Growth Since 2023, Llama is the Leading Engine of AI Innovation

Read Time: 8 minutes

The Llama LLM has seen explosive growth, with over 350 million downloads and 20 million downloads in the last month alone. Llama usage has more than doubled in the last 3 months, making it the leading open-source model for developers building AI applications. Major enterprises like AT&T, Goldman Sachs, and Shopify are leveraging Llama to drive innovation, showcasing its versatility and performance.

Synthetic Data Solves AI's Biggest Problem

Read time: 8 minutes

This article explores how synthetic data, AI-generated data that mirrors real-world statistics, can help software developers overcome the lack of affordable, high-quality data for building and testing AI models. Unlike fake data, synthetic data emulates complex relationships and patterns, enabling more accurate testing and development of dynamic, data-driven applications. The article also highlights how synthetic data is already being used by major companies and organizations to drive innovation while protecting customer privacy.

 

Thanks for reading, and we will see you next time

Follow me on LinkedIn or Threads