Sunday, November 05, 2023

Fine-tuning vs RAG for LLMs

Large language models (LLMs) have revolutionized the field of natural language processing (NLP), enabling state-of-the-art performance on a wide range of tasks, including text classification, translation, summarization, and generation. 

When it comes to use cases around leveraging LLMs for extracting insights from our own knowledge repositories, we have broadly two design approaches:

  • Fine-tuning an LLM
  • RAG (Retrieval Augmented Generation)



Many fields have their own specialised terminology. This vocabulary may be missing from the common pretraining data utilised by LLMs. 

Fine-tuning an LLM
Fine-tuning is the process of further training a pre-trained LLM on a smaller, domain-specific, labelled dataset.
To fine-tune an LLM, you'll need a dataset of labelled data, with each data point representing an input and output pair. The input might be a written passage, a query, or a code snippet. The output might be a label, a summary, a translation, or code.
Once you have a dataset, you can use a supervised learning method to fine-tune the LLM. By minimising a loss function, the algorithm will learn to map the input to the output.
Fine-tuning an LLM can be computationally costly, because all of the model's parameters are updated.
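As a rough illustration, here is a minimal sketch of supervised fine-tuning using the Hugging Face transformers Trainer API. The model name, dataset and hyper-parameters below are placeholders, not recommendations -- you would substitute your own domain-specific labelled data.

# A minimal supervised fine-tuning sketch (model, dataset and settings are illustrative).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"          # any pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Each data point is an (input, output) pair -- here a text and a label.
dataset = load_dataset("imdb")                  # replace with your domain-specific data

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

# The Trainer minimises a loss function (cross-entropy here) to learn the input-to-output mapping.
args = TrainingArguments(output_dir="./ft-model", num_train_epochs=1,
                         per_device_train_batch_size=8)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)))
trainer.train()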

A subset of the above approach is PEFT (parameter-efficient fine-tuning), and LoRA is the most popular PEFT technique today.
LoRA (Low-Rank Adaptation of Large Language Models) is a fine-tuning approach for LLMs that is far more compute- and memory-efficient than standard fine-tuning. Traditional fine-tuning entails altering all of an LLM's parameters, which can be computationally expensive and memory-intensive, particularly for big LLMs with billions of parameters. LoRA, on the other hand, freezes the original weights and only trains a few small low-rank matrices. Because of this, LoRA is far more efficient and memory-friendly than conventional fine-tuning.
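To give a flavour of how little code this involves, here is a minimal sketch using the Hugging Face peft library. The checkpoint name, target modules and rank below are illustrative assumptions -- the right values depend on the model you are adapting.

# A minimal LoRA sketch using the Hugging Face peft library (names/values are illustrative).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")   # any causal LM checkpoint

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection layer for GPT-2; varies per model
    task_type="CAUSAL_LM",
)

# Wrap the base model: the original weights are frozen, only the LoRA matrices are trainable.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()   # typically well under 1% of the full model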

An excellent article explaining the concepts of full fine-tuning and LoRA is here -- https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/

RAG (Retrieval Augmented Generation)
RAG is an effective strategy for improving the performance and relevance of LLMs by combining "prompt engineering" with "context retrieval" from external data sources.
Given below is a high-level process flow for RAG.
  1. All documents from a domain-specific knowledge source are converted into embeddings and stored in a special vector database. These vector embeddings are simply “N-dimensional vectors” of numbers.
  2. When the user types a query, the query itself is also converted into an embedding (a vector of numbers) using the same embedding model.
  3. Semantic search techniques are used to find the sentences/passages in the document embeddings that are contextually relevant to the given query. The most popular measure is “cosine similarity”, which uses the cosine of the angle between two vectors to find the sentences (vectors) that are ‘near’ or ‘close’ to the query (vector). Under the hood this is just dot products and other simple vector maths.
  4. All retrieved “semantically similar sentences/paragraphs” from multiple documents are finally sent to an LLM for ‘summarization’. The LLM paraphrases the disparate sentences into a coherent, readable answer. A minimal end-to-end sketch of this flow is given below.
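Here is a minimal end-to-end sketch of the above flow in Python, using the sentence-transformers library for embeddings and plain NumPy for cosine similarity. The model name, documents and prompt are placeholders, and a real system would use a proper vector database rather than an in-memory list.

# A minimal RAG sketch: embed documents, embed the query, rank by cosine similarity,
# then hand the top match to an LLM for summarisation. All names are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Premium support is available 24x7 for enterprise customers.",
    "Invoices are generated on the first working day of every month.",
]

# Step 1: convert documents into embeddings (N-dimensional vectors).
doc_vectors = embedder.encode(documents)

# Step 2: convert the user query into an embedding with the same model.
query = "When do I get my invoice?"
query_vector = embedder.encode([query])[0]

# Step 3: cosine similarity between the query vector and each document vector.
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [cosine_similarity(query_vector, d) for d in doc_vectors]
top_context = documents[int(np.argmax(scores))]

# Step 4: send the retrieved context plus the question to an LLM (the LLM call is not shown here).
prompt = f"Answer using only this context:\n{top_context}\n\nQuestion: {query}"
print(prompt)
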
RAG along with prompt engineering can be used to build powerful knowledge management platforms such as this - https://www.youtube.com/watch?v=lndJ108DlBs

The table below shows the advantages/disadvantages of both approaches. For most use cases, a proper utilization of prompt engineering and RAG would suffice.

Friday, November 03, 2023

Ruminating on Debezium CDC

Debezium is a distributed open-source platform for change data capture (CDC). It captures changes to database tables in real time and streams them to other applications. Debezium is built on top of Apache Kafka, which provides a dependable and scalable streaming data infrastructure.

Debezium operates by connecting to a database and monitoring its tables for changes. When a change is detected, Debezium publishes a Kafka event containing the details of the change. Other applications, such as data pipelines, microservices, and analytics systems, can then consume these events.
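As an illustration, the sketch below shows how a downstream Python application might consume Debezium change events using the kafka-python client. The topic name and broker address are hypothetical and depend on how the connector is configured.

# A minimal sketch of consuming Debezium change events with kafka-python.
# The topic name ("inventory.inventory.customers") and broker address are placeholders.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "inventory.inventory.customers",               # Debezium topic: <prefix>.<schema>.<table>
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v else None,
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    if event is None:                              # tombstone records mark deletions
        continue
    payload = event.get("payload", event)
    # Debezium events carry the row state before and after the change, plus the operation type.
    print("operation:", payload.get("op"), "after:", payload.get("after"))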



There are several benefits of utilising Debezium CDC, including:

  • Debezium feeds updates to database tables in near real time, allowing other applications to react to changes almost immediately.
  • Debezium is built on Apache Kafka, which provides a dependable and scalable streaming data platform.
  • Debezium can stream changes from a number of databases, including MySQL, PostgreSQL, Oracle, and Cassandra, using connectors.
  • Debezium is simple to install and operate. It has connectors for major databases and may be deployed on a number of platforms, including Kubernetes/Docker.
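To give a flavour of the setup, a connector is typically registered by POSTing a JSON configuration to the Kafka Connect REST API. The sketch below registers a MySQL connector from Python; host names, credentials and property values are placeholders, and the exact property names can vary between Debezium versions.

# A sketch of registering a Debezium MySQL connector via the Kafka Connect REST API.
# Host names, credentials and property values are placeholders; check the Debezium
# documentation for the exact properties required by your connector version.
import requests

connector_config = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "topic.prefix": "inventory",
        "table.include.list": "inventory.customers",
    },
}

response = requests.post("http://localhost:8083/connectors",
                         json=connector_config, timeout=10)
print(response.status_code, response.text)
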
Use cases for Debezium CDC:
  • Data pipelines and real-time analytics: Debezium can be used to create data pipelines that stream changes from databases to other data systems, such as data warehouses, data lakes, and analytics systems.  For example, you could use Debezium to stream changes from a MySQL database to Apache Spark Streaming. Apache Spark Streaming can then process the events and generate real-time analytics, such as dashboards and reports.
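For instance, the sketch below shows how a Spark Structured Streaming (PySpark) job could subscribe to a Debezium topic and dump the raw change events to the console. The broker address and topic name are assumptions, and the job needs the spark-sql-kafka package on its classpath.

# A minimal sketch of reading Debezium change events with Spark Structured Streaming.
# Broker address and topic name are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("debezium-analytics").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "inventory.inventory.orders")   # Debezium topic for one table
          .option("startingOffsets", "earliest")
          .load())

# The Kafka value column holds the Debezium JSON event; cast it to a string for inspection.
query = (events.select(col("value").cast("string").alias("change_event"))
         .writeStream
         .format("console")
         .start())

query.awaitTermination()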