Welcome to the Gemini Era

PLUS - Llamafile: A New Way to Run Large Language Models Locally

DevThink.AI

Essential AI Content for Software Devs, Minus the Hype

In this edition: a big news week, leading with the introduction of Gemini, Google DeepMind's multimodal LLM. We also have some great links to Tools and Tutorials.

  • 📖 TUTORIALS & CASE STUDIES

  • 🧰 TOOLS

  • 📰 NEWS

📖 TUTORIALS & CASE STUDIES

Exploring LLM Visualization Tools
read time: 10 minutes

Dive into the inner workings of Large Language Models (LLMs) with this visual guide.

Llamafile: A New Way to Run Large Language Models Locally
read time: 8 minutes
Mozilla and Justine Tunney have released llamafile, a tool that lets developers run open LLMs (ChatGPT-style chat models) entirely on their own computers. A llamafile is a single multi-GB file containing both the model weights and the code needed to run the model. The tool uses Cosmopolitan Libc to compile a single binary that works across multiple operating systems and hardware architectures.
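
If you want to poke at a llamafile programmatically, here is a minimal sketch. It assumes you have downloaded a llamafile, made it executable, and started it in server mode on the default port 8080; the /completion endpoint and its fields come from the bundled llama.cpp server and may differ between versions, and the filename in the comment is just an example.

```python
# Minimal sketch: query a locally running llamafile.
# Assumes the llamafile has been started (e.g. ./mistral-7b.llamafile, a
# hypothetical filename) and that its bundled llama.cpp server is listening on
# http://localhost:8080 with a /completion endpoint; fields may vary by version.
import json
import urllib.request

payload = {
    "prompt": "Explain what a llamafile is in one sentence.",
    "n_predict": 128,       # maximum number of tokens to generate
    "temperature": 0.7,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

print(result.get("content", result))  # the generated text
```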

Deconstructing Retrieval Augmented Generation (RAG) for Large Language Models
read time: 15 minutes
This article provides a comprehensive overview of Retrieval Augmented Generation (RAG) for LLMs. It discusses the challenges, concepts, and future work related to RAG, including query transformations, routing, query construction, indexing, and post-processing. The article also highlights the potential of open-source models in improving RAG and the importance of benchmarks for evaluation.
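
As a companion to the article, here is a minimal, self-contained sketch of the core retrieve-then-augment loop. A toy word-overlap score stands in for a real embedding model and the final LLM call is left as a placeholder, so treat it as an illustration of the pattern rather than anything from the article itself.

```python
# Minimal RAG sketch: retrieve the most relevant documents for a query,
# then augment the prompt with them before calling an LLM.
# A toy bag-of-words overlap stands in for real embeddings, and the LLM
# call is left as a placeholder.
from collections import Counter

DOCUMENTS = [
    "Canopy is an open-source RAG framework built on the Pinecone vector database.",
    "Llamafile packages model weights and inference code into a single executable.",
    "Retrieval Augmented Generation grounds LLM answers in retrieved context.",
]

def score(query: str, doc: str) -> float:
    """Crude relevance score: word overlap between query and document."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the top-k documents ranked by the toy relevance score."""
    return sorted(DOCUMENTS, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Augment the user query with retrieved context (the 'AG' in RAG)."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What does Retrieval Augmented Generation do?")
print(prompt)  # in a real system, this prompt would be sent to an LLM
```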

Getting Started with Llama
read time: 5 minutes
This guide provides information and resources to help you set up Llama, including how to access the model, hosting options, and how-to and integration guides. It covers hosting options such as Amazon Web Services, Cloudflare, Google Cloud Platform, Microsoft Azure, and ONNX for Windows, plus how-to guides for fine-tuning, prompting, and validation. Additionally, it provides integration guides for Code Llama, compatible extensions, LangChain, and LlamaIndex. Check out the full guide here.
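
Since the guide covers prompting, here is a small sketch of the Llama 2 chat prompt template (system prompt plus one user turn). Exact token and whitespace handling can vary between inference stacks, so treat this as an illustration and check the guide for the canonical format.

```python
# Sketch of the Llama 2 chat prompt template (system prompt + single user turn).
# Whitespace/token handling can differ between inference backends, so treat
# this as an illustration of the format rather than a canonical encoder.
def llama2_chat_prompt(system: str, user: str) -> str:
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n\n"
        f"{user} [/INST]"
    )

prompt = llama2_chat_prompt(
    system="You are a concise assistant for software developers.",
    user="Summarize what Code Llama is in one sentence.",
)
print(prompt)
```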

🧰 TOOLS

Marker: A Fast and Accurate PDF to Markdown Converter
read time: 10 minutes
Marker is a deep learning-based tool that converts PDF, EPUB, and MOBI to markdown. It's 10x faster than Nougat, more accurate on most documents, and has a low hallucination risk. It supports a range of documents, multiple languages, and works on GPU, CPU, or MPS. Check out the Marker GitHub repository for more details.

Introducing Canopy: A New RAG Framework Powered by Pinecone
read time: 8 minutes

Pinecone has launched Canopy, an open-source Retrieval Augmented Generation (RAG) framework for building GenAI applications. Canopy simplifies the process of building a production-ready chat assistant, handling tasks like text data chunking, embedding, chat history management, and augmented generation. It uses the Pinecone vector database for storage and retrieval, and is free for up to 100K vectors. Canopy is designed to be easy to implement, reliable at scale, modular, and extensible.
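
For a feel of how a client talks to Canopy, here is a hedged sketch. It assumes you have started the Canopy server locally (for example via the canopy CLI) and that it exposes an OpenAI-style chat completions route on port 8000; the exact port and path depend on your version, so check the Canopy docs.

```python
# Minimal sketch: chat with a running Canopy server over HTTP.
# Assumes the Canopy server is serving on localhost:8000 and exposes an
# OpenAI-style /v1/chat/completions route, per the announcement; the exact
# port and path may differ in your version.
import json
import urllib.request

payload = {
    "messages": [
        {"role": "user", "content": "What documents mention vector databases?"}
    ],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

print(reply["choices"][0]["message"]["content"])
```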


📰 NEWS

DeepMind's Gemini: A Leap Forward in Multimodal AI
read time: 10 minutes

DeepMind introduces Gemini, a multimodal AI model capable of reasoning across text, images, video, audio, and code. Gemini outperforms previous state-of-the-art models and even human experts on Massive Multitask Language Understanding (MMLU) and other benchmarks. It also excels in tasks like Python code generation, image understanding, video captioning, and automatic speech translation.

Envisioning the Future of Large Language Models as Operating Systems
read time: 15 minutes
This article explores the concept of LLMs as operating systems (LLM OS), a concept introduced by Andrej Karpathy. It discusses the potential of LLMs to enhance user productivity and privacy by running locally on devices, and the possibility of LLMs interacting directly with the OS, enabling a range of applications from personalized search to complex automation.

The Future of AI Tools: Cognitive Composability and the Marketplace
read time: 10 minutes
OpenAI's recent function-calling enhancements allow developers to build tools that GPT models can invoke as functions. This paves the way for an AI tools marketplace where developers can build and describe tools for specific tasks, making them discoverable and invokable over the internet. This concept, termed 'Cognitive Composability', could revolutionize how we leverage AI.
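
For context, here is a minimal sketch of the function-calling mechanism the article builds on, using the openai Python SDK (v1+). The get_weather tool and the model name are illustrative placeholders; a real application would execute the returned tool call and feed the result back to the model.

```python
# Minimal sketch of OpenAI function calling: the model is told which tools
# exist and responds with a structured tool call instead of free text.
# Requires the openai Python package (v1+) and OPENAI_API_KEY in the
# environment; the get_weather tool and model name are placeholders.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "What's the weather in Boston right now?"}],
    tools=tools,
)

# If the model decided to call a tool, the structured arguments are here;
# a real app would run the tool and return its output in a follow-up message.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```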

Introducing PPLX Online LLMs: The Next Level in AI Information Retrieval
read time: 15 minutes
Perplexity introduces two new online LLMs, pplx-7b-online and pplx-70b-online, designed to provide accurate, up-to-date, and factual responses. These models leverage internet knowledge, surpassing traditional LLMs in freshness and accuracy. They are accessible via the pplx-api and Perplexity Labs, offering developers a powerful tool for real-time information retrieval.
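
Here is a hedged sketch of calling one of the new online models, assuming the pplx-api follows the OpenAI-style chat completions convention described in the announcement and that your API key is available in a PERPLEXITY_API_KEY environment variable; check Perplexity's docs for current model names and endpoints.

```python
# Minimal sketch: query a pplx online model through the pplx-api.
# Assumes an OpenAI-compatible endpoint at
# https://api.perplexity.ai/chat/completions (per the announcement) and that
# PERPLEXITY_API_KEY is set; model names and paths may change over time.
import json
import os
import urllib.request

payload = {
    "model": "pplx-7b-online",
    "messages": [
        {"role": "user", "content": "What happened in AI news this week?"}
    ],
}

req = urllib.request.Request(
    "https://api.perplexity.ai/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}",
    },
)

with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

print(reply["choices"][0]["message"]["content"])
```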

AI and Trust: The Implications and Need for Regulation
read time: 20 minutes
In this comprehensive article, the author explores the concept of trust in relation to AI, arguing that we often confuse interpersonal trust with social trust. The author warns of the potential for corporations to exploit this confusion, and emphasizes the need for government regulation to ensure trustworthy AI, including transparency laws, safety regulations, and the creation of public AI models.


Thanks for reading, and we will see you next time!

Follow me on Twitter and DM me links you would like included in future newsletters.