Large language models (LLMs) have revolutionized the field of natural language processing (NLP), enabling state-of-the-art performance on a wide range of tasks, including text classification, translation, summarization, and generation.
When it comes to use cases around leveraging LLMs to extract insights from our own knowledge repositories, there are broadly two design approaches:
- Fine-tuning an LLM
- RAG (Retrieval-Augmented Generation), which works roughly as follows:
- All documents from a domain-specific knowledge source are converted into embeddings and stored in a vector database. These embeddings are simply N-dimensional vectors of numbers that capture the meaning of the text.
- When the user types a query, the query itself is converted into an embedding (a vector of numbers) using the same embedding model.
- Semantic search is then used to find the stored document embeddings that are contextually relevant to the query. The most widely used similarity measure is cosine similarity: it scores how “near” each document vector is to the query vector using the cosine of the angle between them, which boils down to dot products and vector norms (see the sketch after this list).
- All retrieved, semantically similar sentences/paragraphs from multiple documents are finally sent to an LLM together with the original query. The LLM paraphrases the disparate passages into a coherent, readable answer (a minimal end-to-end sketch also follows below).
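
To make the retrieval step concrete, here is a minimal sketch of cosine similarity in NumPy. The embeddings below are random placeholders standing in for real model outputs, and the 384-dimension size is just an assumption for illustration.

```python
import numpy as np

def cosine_similarity(query_vec, doc_matrix):
    """Cosine similarity between one query vector and each row of doc_matrix."""
    # Dot product of the query with every document vector.
    dots = doc_matrix @ query_vec
    # Product of the vector lengths (norms).
    norms = np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_vec)
    return dots / norms

# Placeholder embeddings: 5 document chunks and 1 query, each 384-dimensional.
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(5, 384))
query_embedding = rng.normal(size=384)

scores = cosine_similarity(query_embedding, doc_embeddings)
top_k = np.argsort(scores)[::-1][:3]   # indices of the 3 most similar chunks
print(top_k, scores[top_k])
```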
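
Putting the steps together, the following is a rough end-to-end sketch of the RAG flow, not a production implementation. The `embed_texts` and `generate_answer` functions are hypothetical stand-ins for whatever embedding model and LLM you actually use; only the retrieval logic is spelled out.

```python
import numpy as np

def embed_texts(texts):
    """Hypothetical stand-in for an embedding model.
    Returns one vector per input text; random here purely for illustration."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384))

def generate_answer(prompt):
    """Hypothetical stand-in for an LLM call."""
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"

# 1. Index the knowledge source: embed each chunk and keep the vectors.
chunks = [
    "Our refund policy allows returns within 30 days.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Premium plans include priority support and a dedicated manager.",
]
chunk_embeddings = embed_texts(chunks)

# 2. Embed the user's query with the same model.
query = "How long do customers have to return a product?"
query_embedding = embed_texts([query])[0]

# 3. Retrieve the most similar chunks by cosine similarity.
scores = (chunk_embeddings @ query_embedding) / (
    np.linalg.norm(chunk_embeddings, axis=1) * np.linalg.norm(query_embedding)
)
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:2]]

# 4. Ask the LLM to answer using only the retrieved context.
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n".join(top_chunks) + f"\n\nQuestion: {query}"
)
print(generate_answer(prompt))
```

In a real system, the chunk embeddings and the similarity search would live in a vector database rather than in-memory NumPy arrays.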