Best way to store large amounts of searchable text?

Hi all,

I’m starting on a project that will involve storing hundreds of megabytes of user-generated text, with metadata attached to each word. I’ll need to make sure this text is searchable with relatively quick results.

My first thought was to use SQLite, but this is a new area for me and I’m wondering if anyone might have any suggestions as how to best approach storing and searching this data.

Thanks in advance for any insight.

SQLite’s FTS5 System works very well for searching large amounts of data, and if you don’t mind reindexing, you might consider using an in-memory db for faster results.

This brings back memories of when I had to do this for a client in 4D back in the Mac OS 7 or 8 days. I broke up the words into a table and created a link to the text where they appeared. When a user searched for a phrase, I’d break up their query into words to locate the relevant records, then query those records for the matching phrase (if they needed an exact match).

Boy I could have used FTS back then. :slight_smile:

I tried running the following in NaviCat for SqlLite, seems only FTS4 works, see below

sqlite> create virtual table plsqlsearch using fts5(procname,procsource);
no such module: fts5
sqlite> create virtual table plsqlsearch using fts4(procname,procsource);
Query OK (0.11 sec)

sqlite>

Perhaps I have to create the table in XOJO first instead of NaviCat

Very possibly since Navicat may not have been built with a version of the SQLIte libraries that have FTS 5 built in

I just used my Tadpole SQLite manager and it does FTS5 (tested it with your above query)
So the XOJO Sqlite Engine will support it

and the version of SQLite it is using right now is 3.14.1 (Xojo 2019 is 3.25)

Correct, I ran the demo and it’s fine. I’m down a rabbit hole with FTS5. I’m writing a small search too for our thousands of stored procedures in ORACLE so we can search for code or snippets of code to help our newer people find things.

Curious… if you are searching ORACLE why not use ORACLE to do it?

Long story Dave. But here goes :wink:

We are a hosted site for our eHealth system (Cerner, out of Kansas City Missouri) and they lock down the database and we don’t have the ability to create conText indexes on any database tables to use Oracle’s Full Text Searching. But I can query the Oracle Meta Data Tables (All_source, all* tables etc). So I’m pulling that down and doing the search locally. And for selfish reasons I want to learn more about SqlLite. 35 years of Oracle is getting boring

That explains it… I used to work for AETNA, and they tried to do the same thing until we convinced them that it was necessary to do certain things. At which point they allowed us (certain IT depts) to have our own schema with READ ONLY access to the production data…of course we still had to pass independent SOX audits