Weird result from FTS search in SQLite

I’m using FTS in SQLite to search emails. The database uses ICU. One user reported that a search for Chinese characters gives an unexpected result, and he sent me an example database containing only 2 emails:

  • case 1: if the search term is 明天, then email 1 shows up, which is good
  • case 2: if the search term is 明, then only email 2 shows up, which is NOT good; I expect both emails to show up

Now comes the weird part: when I got the database, neither case 1 nor case 2 gave any result. After rebuilding the FTS index, both searches work fine.

The user also rebuilt the FTS index; now case 1 works fine for him, but case 2 still does not.

Any idea what could cause this problem and how to fix it?

@Christian_Schmitz : what is the latest Unicode version supported by plugin version 24.0?

I’m using FTS4 and a really old Unicode version, so I wanted to update. But I wasn’t able to find out which Unicode version is the latest built-in one. 14?

We load the Unicode libraries included with macOS or Linux; on Windows they come with the application or the OS.
That may not be the latest version.

Latest should be 15.1 or higher.


I should have been clearer. This is my code to create an FTS table:

Dim sql As String = "CREATE VIRTUAL TABLE bodyindex USING fts4(tokenize=unicode61,content='', messagebody);"

I have been using this for a very long time. Which version of Unicode can I use for the tokenizer? Is this still necessary? I wasn’t able to find much information on the SQLite website, and it’s been too long since I wrote this code.
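
One possible explanation for case 2, offered as a sketch rather than a diagnosis: as far as I know, the unicode61 tokenizer is built into SQLite, follows Unicode 6.1 rules, and does not go through ICU. It treats every letter-class character as part of a token, so a run of Chinese characters without spaces or punctuation is indexed as one token. A search for 明 would then match only where 明 stands alone, while the prefix query 明* should also match 明天. The snippet below uses Python's built-in sqlite3 module only to keep it self-contained; the SQL is the same from Xojo. The two sample bodies and the `search` helper are made up for illustration, and the SQLite build is assumed to have FTS4 enabled.

```python
import sqlite3

# In-memory database; assumes an SQLite build with FTS4 enabled.
conn = sqlite3.connect(":memory:")

# Same statement as in the post above: contentless FTS4 table, unicode61 tokenizer.
conn.execute(
    "CREATE VIRTUAL TABLE bodyindex "
    "USING fts4(tokenize=unicode61,content='', messagebody);"
)

# Hypothetical bodies standing in for the two emails.
# A contentless table needs an explicit docid on insert.
conn.executemany(
    "INSERT INTO bodyindex(docid, messagebody) VALUES (?, ?)",
    [(1, "明天 meeting"), (2, "明 sent the report")],
)


def search(term):
    """Return the docids whose body matches the FTS query, in docid order."""
    rows = conn.execute(
        "SELECT docid FROM bodyindex WHERE bodyindex MATCH ? ORDER BY docid",
        (term,),
    )
    return [docid for (docid,) in rows]


print(search("明天"))  # exact token 明天
print(search("明"))    # exact token 明 only
print(search("明*"))   # prefix query: any token starting with 明
```

With unicode61 I would expect the first search to return only email 1, the second only email 2, and the prefix search both. If the real index was built with a different tokenizer (for example the ICU one), the token boundaries may differ, so treat this as something to test against the sample database rather than a confirmed cause.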