I had this crazy idea…
…to figure out a smarter way to search. Not just matching exact text. Sometimes I use ChatGPT (or some other LLM) to find information in large documents or websites: simply by uploading a document or entering the URL of the website I want to search through.
Sometimes I need to find information in my own databases or documents.
But I might search for the wrong words, or there might be typos in the original documents. Yeah, typos happen to all of us, right?
Encoding data: Symbols
So, instead of just storing the “raw” data in a database or document, I could store certain word-concepts. I mean, “forest” and “woods” are two different words with the same meaning. What if I could just generate one “symbol” for both of those words?
I could then encode a search query into a bunch of those symbols. That would give me results for all the synonyms.
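To make the idea concrete, here is a minimal sketch in Python. The symbol table is hand-made for illustration; in the real idea it would be generated by an LLM, and the words and symbol IDs here are made-up examples.

```python
# A tiny hand-made symbol table: synonyms map to the same symbol.
SYMBOLS = {
    "forest": "SYM_woodland",
    "woods": "SYM_woodland",
    "car": "SYM_vehicle",
    "automobile": "SYM_vehicle",
}

def encode(text: str) -> list[str]:
    """Turn raw text into symbols, falling back to the word itself."""
    return [SYMBOLS.get(word, word) for word in text.lower().split()]

# A query for "forest" now matches a document that says "woods",
# because both encode to SYM_woodland:
doc_symbols = set(encode("a walk in the woods"))
query_symbols = set(encode("forest walk"))
print(query_symbols & doc_symbols)
```

The overlap between the query's symbols and the document's symbols is what drives the match, even though the literal words differ.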
LLM and Data Caching
I could run a small local LLM to save on expensive API calls to an online platform like OpenAI.
I could use that LLM to build a dataset of the symbols for words in documents.
That dataset can then be used to encode (or decode) my query and the data I want to digest.
The reason for using a dataset is to build some sort of cache, instead of running the LLM all the time, which can be very power- (and CPU-) hungry.
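The caching part could look something like this sketch: check the dataset first, and only ask the LLM on a miss. The `ask_llm_for_symbol` function is a placeholder I made up; it stands in for a real call to a local model.

```python
import json
from pathlib import Path

CACHE_FILE = Path("symbol_cache.json")

def ask_llm_for_symbol(word: str) -> str:
    # Placeholder for a real call to a local LLM.
    # Here it just fabricates a symbol so the sketch is runnable.
    return f"SYM_{word}"

def load_cache() -> dict:
    """Load the symbol dataset from disk, or start fresh."""
    if CACHE_FILE.exists():
        return json.loads(CACHE_FILE.read_text())
    return {}

def symbol_for(word: str, cache: dict) -> str:
    """Return the cached symbol, asking the LLM only on a cache miss."""
    if word not in cache:
        cache[word] = ask_llm_for_symbol(word)
        CACHE_FILE.write_text(json.dumps(cache))  # persist the dataset
    return cache[word]

cache = load_cache()
print(symbol_for("forest", cache))  # first call hits the LLM; later calls hit the cache
```

Every word only costs one LLM call ever; after that it comes straight from the dataset on disk.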
Am I re-inventing the wheel here?
I am aware that this approach is not suitable for most cases. But it can be useful in cases where I store lots of documents, like agreements, contracts, user manuals, etc…