In our previous blog article we explored Large Language Models (LLMs) and their growing impact on business workflows. Today, we dive deeper into a specific and highly practical use case: Retrieval-Augmented Generation (RAG). RAG combines the language understanding of LLMs with targeted retrieval from your own data, enabling faster, more accurate, and context-aware responses.
Why RAG?
As we have already discussed, LLMs are powerful tools when working with text information. Nevertheless, they have several limitations in business contexts:
Static Knowledge
LLMs are “frozen” objects. By “frozen” we mean that a model cannot update its own training data, so its knowledge is static and reflects a cut-off date in the past. You can see this by asking any popular LLM about a recent event: with high probability it will tell you that it has no information about your query, or that its knowledge only extends to a certain date (unless the LLM is equipped with web search capabilities). And retraining LLMs frequently, for example every week, is impractical, because it requires time, vast computational resources, and a significant budget.
Hallucinations
As we mentioned in our article on LLMs, today’s language models sometimes hallucinate, due to their stochastic nature and limited context length (less of a problem for modern state-of-the-art models, but still an issue for many small-to-medium models). An LLM is said to “hallucinate” when it invents fictional content, which leads to unreliable responses to the user’s prompts. So even if you manage to paste a huge amount of text into the input, at some point in the chat the model will start forgetting earlier information and hallucinating.
Limited access to private data
Last, but not least, is the fact that an LLM’s training data, or knowledge base, consists of information from open or otherwise available sources such as the Internet or specific datasets. That means if you query an LLM about something that is not publicly available (for example, an essay stored on your laptop), it won’t be able to answer from that hidden source.
RAG reduces these issues by supplementing the model’s text-processing power with relevant, dynamic context from your own data sources.
What is the difference between RAG & a simple LLM?
A typical process flow when using an LLM is the following:
Question -> Answer generation
While RAG integrates a couple of additional steps:
Question -> Related context retrieval -> Generation -> Answer with source
Here is a little practical example of RAG:
User Question
- A person asks AI something, like “What is our company’s parental leave policy?”
Retrieval
- The system searches the company’s knowledge base and pulls out the most relevant information (e.g., the HR policy document).
Generation
- The AI blends the retrieved information with its natural language ability to form a clear, contextual answer (e.g., “Our parental leave policy allows 16 weeks of paid leave for primary caregivers and 8 weeks for secondary caregivers.”).
Answer with Context
- The AI replies and shows the source document (a link to the HR policy).
This approach ensures accuracy, relevance, efficiency, and transparency.
Let’s go step by step through the RAG framework and discover how it works.
Everything starts with the Retrieval component. The first step is to prepare our data source. As mentioned earlier, we want to extract only the relevant information from it, so we divide the whole document into smaller pieces; in RAG terminology this is called Chunking. There are different chunking strategies, but for the sake of simplicity we will split our documentation into paragraphs. Chunking is followed by the so-called Embedding process. However, instead of embedding individual tokens as an LLM does internally, we embed each whole chunk and encode its semantic meaning. This step allows the framework to find related chunks based on the user’s query: when the user sends an input to the RAG system, that input is passed through the same embedding model. The idea behind the context search is quite simple: a question and a related answer should have similar embeddings, because similarly encoded text describes related information. With the needed context in hand, we can proceed to the Augmentation process.
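The chunking-and-retrieval flow above can be sketched in a few lines of Python. Note that the term-frequency “embedding” here is a deliberately simple stand-in for a real embedding model (such as a sentence transformer), and the sample document and query are invented for illustration:

```python
import math
from collections import Counter

# Toy embedding: a term-frequency vector. Real systems use a trained
# embedding model, but the retrieval logic stays the same.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Chunking: here, one chunk per paragraph.
document = (
    "Parental leave policy: 16 weeks of paid leave for primary caregivers.\n\n"
    "Travel policy: all business trips must be approved in advance."
)
chunks = document.split("\n\n")

# Index: embed every chunk once, up front.
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieval: embed the query with the same model and rank chunks by similarity.
def retrieve(query: str, top_k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```

Asking this retriever about parental leave returns the HR paragraph rather than the travel one, because the question and the relevant chunk share more vocabulary and therefore have more similar vectors.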
The Augmentation part is simple: you send the LLM a single prompt containing the user’s query and the retrieved content, and ask it to answer using that context.
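A minimal sketch of the augmentation step, assuming the retriever returns a list of text chunks; the exact instruction wording is our own illustrative choice, not a fixed standard:

```python
# Assemble the augmented prompt that is sent to the LLM. Numbering the
# chunks makes it easy for the model to cite its sources later.
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```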
In the last step, response Generation takes place, an ordinary LLM routine: the model takes the input prompt and produces a coherent natural-language response. In addition, it can cite the sources it used in the response.
Benefits of RAG
Implementing RAG brings multiple advantages for businesses:
- Accuracy: by providing only contextual information to the LLM, we reduce the chance of wrong answers and hallucinations.
- Relevance: responses are tailored to your specific content rather than general web knowledge.
- Efficiency: users save time by avoiding manual searches through documents.
- Trust: source citations make it easy to verify the answer’s correctness.
- Privacy: self-hosted RAG pipelines can secure sensitive internal data and prevent data leakage to third-party organisations.
Best approaches and technical considerations
Ensure High-Quality Data
Modern RAG systems work with different types of information: textual data stored as Microsoft Office documents, presentations, Excel tables, PDFs, or Markdown files, and in some cases even images. But regardless of the source type, one principle always holds: “Garbage in, garbage out.” Your organisation can implement a state-of-the-art RAG system, but with a bad context source it is almost impossible to achieve good performance. Therefore, before adopting any AI tools, it is important to take a close look at your data. Steps like preprocessing, enriching data fields, and merging information from multiple sources can significantly improve your RAG pipeline. It is also useful to enrich your data with metadata describing its properties: creation date, title, or any other descriptive features that characterise the content. The previous section covered only the semantic search process based on embeddings, but modern RAG systems commonly combine it with additional search mechanisms such as metadata querying or even full-text search. It is beneficial to incorporate multiple ways of finding relevant context.
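As a sketch, a metadata pre-filter can be combined with semantic ranking like this; the `department` and `created` fields, the sample chunks, and the word-overlap score (standing in for embedding similarity) are all illustrative assumptions:

```python
from datetime import date

# Each chunk carries metadata alongside its text. The fields below
# are examples; any descriptive attribute of your data works.
chunks = [
    {"text": "Parental leave: 16 weeks paid.", "department": "HR", "created": date(2024, 3, 1)},
    {"text": "Parental leave: 12 weeks paid.", "department": "HR", "created": date(2019, 6, 1)},
    {"text": "VPN setup guide for new laptops.", "department": "IT", "created": date(2024, 1, 10)},
]

def search(query: str, department: str, newer_than: date) -> list[dict]:
    # 1. Metadata pre-filter narrows the candidate set (e.g. drops
    #    outdated policy versions before ranking).
    candidates = [c for c in chunks if c["department"] == department and c["created"] >= newer_than]
    # 2. Semantic ranking: toy word-overlap score in place of embeddings.
    def score(c: dict) -> int:
        return len(set(query.lower().split()) & set(c["text"].lower().split()))
    return sorted(candidates, key=score, reverse=True)
```

Here the date filter alone removes the obsolete 2019 policy, something pure embedding similarity could not do, since both policy versions read almost identically.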
A good search with a mediocre model is better than a mediocre search with a good model
If you are considering open-source LLMs as an alternative to LLM API providers, it is better to invest your time and resources in improving the Retrieval part of RAG than the Generation part. Some small (<7B parameters) and medium (7B to 20B parameters) language models are capable of generating good-quality output from well-extracted context. But even the best model cannot compensate for the absence of relevant context.
Develop a response policy
Naturally, there will be situations when your RAG pipeline cannot find an answer to the user’s query, for reasons ranging from the simple absence of that information in your sources to problems within the retrieval pipeline. First, it is good to have a monitoring system that lets you identify such cases; second, you need to think about the resulting user experience. For example, you can implement user-friendly fallback messages like: “We couldn’t find an exact answer. Please contact our support team.”
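A response policy like this can be sketched as a simple score threshold; the `MIN_SCORE` value is an assumption that would need tuning against real queries from your own pipeline, and the message text is the fallback example above:

```python
FALLBACK = "We couldn't find an exact answer. Please contact our support team."
MIN_SCORE = 0.3  # assumed threshold; tune it against real queries

def answer(query: str, hits: list[tuple[str, float]]) -> str:
    """hits: (chunk, similarity) pairs from your retriever, best first."""
    if not hits or hits[0][1] < MIN_SCORE:
        # This branch is also the right place to log the miss, so a
        # monitoring system can surface gaps in your knowledge base.
        return FALLBACK
    best_chunk, _ = hits[0]
    return f"Based on our documentation: {best_chunk}"
```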
Use Cases
SAP Joule
SAP Joule offers Document Grounding, a RAG-based feature that connects the Joule assistant to enterprise data. With Document Grounding you can connect Microsoft SharePoint as a data source and use various text files such as docx, pdf, txt, and json, as well as extract plain text from png, jpg, or tiff files (note that tables and images inside documents are not yet supported). You can also configure SAP Build Work Zone as a data source with similar file formats, together with blog posts and knowledge base articles. The overall limit on uploaded sources is 2000 documents. Document Grounding allows companies to query internal documents with RAG-powered AI, ensuring fast, relevant, and accurate responses within the SAP ecosystem.
You can see how the SAP Joule Document Grounding feature works here: https://www.sap.com/assetdetail/2024/09/10718415-d97e-0010-bca6-c68f7e60039b.html
NotebookLM
Another prominent example of a RAG-powered application is NotebookLM. Developed by Google, NotebookLM is an AI research assistant that can be tailored to your sources of information. With this app, you can upload documents of interest and interact with Google’s LLMs. This tool provides direct references from the uploaded documents, enabling precision and increasing trust in LLM responses. NotebookLM also offers additional features such as mind map creation, report generation, audio podcast and video overview creation, and much more.
Conclusion
Large Language Models (LLMs) impress with their generative power but remain limited by static training data and the risk of inaccuracies. Retrieval-Augmented Generation (RAG) addresses these shortcomings by enriching LLMs with dynamic, up-to-date knowledge sources, creating more accurate, reliable, and explainable results. In practice, LLMs and RAG should not be seen as competitors but as complementary approaches: while LLMs excel in creativity and fluency, RAG ensures precision and relevance in knowledge-driven contexts.
At PIKON, our dedicated Data Science team helps you identify the right approach for your specific use case. Whether you want to explore RAG, optimize the use of LLMs, or design a tailored AI solution, we support you in turning these technologies into real business value.

AI Discovery Workshop
AI is transforming industries, but how do you know if it’s the right fit for your business?
The AI Discovery Workshop is designed to cut through the hype and focus on what truly matters for your organisation.

