Why would you need it?

Baseline RAG works by taking a private dataset, splitting it into chunks, and embedding each chunk. You store those embedded chunks in a vector database. At query time you perform a nearest-neighbor search based on the query embedding and use the retrieved chunks to augment the LLM's context window.
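The pipeline above can be sketched end to end. This is a minimal, runnable toy: the bag-of-words "embedding" and the in-memory list standing in for the vector DB are illustrative stand-ins, not a real embedding model or database.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words counts. A real system would call an
    # embedding model; this stand-in keeps the sketch runnable.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Chunk the private dataset and embed each chunk (the "vector DB").
chunks = [
    "GraphRAG builds a knowledge graph from documents",
    "Vector databases store chunk embeddings",
    "Nearest neighbor search finds similar chunks",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query, k=2):
    # 2. Nearest-neighbor search of the query against the stored chunks.
    q = embed(query)
    ranked = sorted(index, key=lambda ce: cosine(q, ce[1]), reverse=True)
    # 3. The top-k chunks are what gets pasted into the context window.
    return [c for c, _ in ranked[:k]]

print(retrieve("how does nearest neighbor search work"))
```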

  • a vector DB becomes limited as the knowledge base scales
    • additional topics
    • more text on the same topic
  • retrieving the top-N chunks may skip relevant data or broader themes

What is it?

A knowledge graph stores semantic relationships between entities. It is made of nodes and edges. Use cases:

  • search engines
  • recommendation systems

You extract entities from your raw text, determine whether there is a relationship between those entities, and estimate the strength of that relationship.
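A crude version of this extraction step can be sketched with sentence-level co-occurrence. The entity list here is assumed to be known in advance (in practice it would itself be extracted); the co-occurrence count serves as the relationship strength.

```python
from itertools import combinations
from collections import Counter

# Hypothetical, pre-extracted entity set (an assumption for this sketch).
ENTITIES = {"alice", "bob", "acme"}

sentences = [
    "Alice joined Acme in 2020",
    "Bob met Alice at Acme",
    "Bob lives in Berlin",
]

edges = Counter()
for s in sentences:
    found = {w for w in s.lower().split() if w in ENTITIES}
    # Count each pair of entities appearing in the same sentence;
    # the count is a crude proxy for relationship strength.
    for pair in combinations(sorted(found), 2):
        edges[pair] += 1

print(dict(edges))
```

Here "alice" and "acme" co-occur twice, so that edge ends up stronger than the others.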

LLMs can be leveraged today to capture deeper meaning / contextual information rather than simple co-occurrence-based relationships.
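One hedged way to picture the LLM-based variant: prompt the model to emit entities and typed, scored relations as JSON, then parse that into the graph. The prompt wording, response schema, and `call_llm` hook below are all assumptions for illustration, not any specific library's API; a stub stands in for the model so the sketch runs end to end.

```python
import json

# Assumed prompt and response schema, not a standard.
PROMPT = """Extract entities and relationships from the text below.
Return JSON: {{"entities": [...], "relations": [{{"source": ..., "target": ..., "type": ..., "strength": 0-10}}]}}
Text: {text}"""

def extract(text, call_llm):
    # `call_llm` is a hypothetical hook: prompt string in, raw text out.
    raw = call_llm(PROMPT.format(text=text))
    return json.loads(raw)

# Stubbed LLM response so the sketch is runnable without an API key.
def fake_llm(prompt):
    return ('{"entities": ["Alice", "Acme"], '
            '"relations": [{"source": "Alice", "target": "Acme", '
            '"type": "works_at", "strength": 8}]}')

graph = extract("Alice joined Acme in 2020", fake_llm)
print(graph["relations"][0]["type"])
```

Unlike the co-occurrence approach, the model can name the relation (`works_at`) and grade its strength from context rather than from raw counts.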

Use graph ML to do:

  • semantic aggregation
  • hierarchical agglomeration
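As a minimal sketch of aggregation over the entity graph: group nodes into communities, then treat each community as a unit that could be summarized at a higher level. Connected components stand in for a real community-detection algorithm here, and the edge list is made up for illustration.

```python
from collections import defaultdict

# Hypothetical entity graph (edges from the extraction step).
edges = [("alice", "acme"), ("bob", "acme"), ("carol", "dave")]

adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def communities(adj):
    # Connected components via iterative DFS; real pipelines would use a
    # community-detection algorithm, but the aggregation idea is the same.
    seen, out = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        out.append(sorted(comp))
    return out

# Each community could then be summarized (e.g. by an LLM), yielding a
# hierarchy of theme-level summaries on top of the raw chunks.
print(communities(adj))
```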

Additional reading