DB Select fails with non-ASCII characters

Dale_Arends · February 23, 2020, 7:21pm

Using SQLIte, I’m having a problem getting a SELECT statement to work when the search term includes accented characters. I have a database of models (objects, not people) where one group of items has names like Kerts_2, Kerts_3, and Kerts_4. I know they are in the database correctly because when I use the following SQL my app returns the items properly.

SELECT Name FROM Models WHERE upper(Name) LIKE 'KER%'

But when I use the following SQL, I get no values returned.

SELECT Name FROM Models WHERE upper(Name) LIKE 'KERTS%'

Even if I remove ‘upper()’ from the SQL and replace the LIKE item

SELECT Name FROM Models WHERE Name LIKE 'Kerts%'

I get no returned values.

Just for a test, I pasted the query into the SQLite Database Browser and it failed also. Is SQLite unable to handle items like this? somehow I find that hard to believe so I assume I’m missing something. But what?

BTW, I am using 2019r1.1 on Windows 10 for this project but I go the same results in 2019r3.1.

Paul_Rodman · February 23, 2020, 7:51pm

I had a similar issue recently. Using prepared statements resolved it.

Jean-Yves_Pochez · February 23, 2020, 7:52pm

what is the encoding of the sqlite db ?

anon93744516 · February 23, 2020, 7:58pm

Using SQLite Manager and the sample chinook.db from here, it looks this is not a problem with SQLite itself.

I would take a closer look at what is actually stored vs. what you use for input to your LIKE operator.

I hope that helps!

Dale_Arends · February 23, 2020, 9:10pm

Hmmm. The SQLIte database is encoded UTF8. Using a hex editor, it looks like the database is storing the accented characters as a single byte. If I look at the SQL statement itself in hex it seems to have the accented characters as double-byte. Maybe I should convert the encoding of the SQL to ANSI before I send the query? That would work but is it the “proper” solution?

I think I’ll try setting it up as a Prepared statement and see what happens.

TimStreater · February 23, 2020, 10:31pm

Have you used the official SQLite command line tool to examine the database? You could also ask on the SQLite Users’ Mailing List (see at http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users). The SQLite developers are on that list.

Dale_Arends · February 24, 2020, 5:19pm

I really don’t believe that the problem is in SQLite. If it were, I would think the entire SQLite user universe would be reacting.

The one thing that does seem to work is converting the encoding of the SQL statement to ANSI before submitting the SELECT statement to the detabase.

sql = "SELECT Name FROM Models WHERE Name LIKE 'Kerts%'" sql = sql.ConvertEncoding(Encodings.WindowsANSI)

RomanV · February 24, 2020, 5:36pm

Could this be your issue?

“SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression ‘a’ LIKE ‘A’ is TRUE but ‘’ LIKE ‘’ is FALSE.”

Dale_Arends · February 24, 2020, 6:26pm

[quote=476663:@Roman Varas]Could this be your issue?

“SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression ‘a’ LIKE ‘A’ is TRUE but ‘æ’ LIKE ‘Æ’ is FALSE.”[/quote]
Not in this case. The database is definitely storing accented characters in a single byte (8 bits) so I needed to ensure that I am sending them as 8 bit characters in the SQL statement. Xojo string value had the accented i as HEX(C3 AD) where the database needed it as HEX(ED). Doing the ANSI conversion made the change.

As for characters like ‘æ’ and ‘Æ’, well, I guess I have some more testing to do.

Beatrix_Willius · February 24, 2020, 6:52pm

From the SQLite FAQ:

code Case-insensitive matching of Unicode characters does not work.

The default configuration of SQLite only supports case-insensitive comparisons of ASCII characters. The reason for this is that doing full Unicode case-insensitive comparisons and case conversions requires tables and logic that would nearly double the size of the SQLite library. The SQLite developers reason that any application that needs full Unicode case support probably already has the necessary tables and functions and so SQLite should not take up space to duplicate this ability.
Instead of providing full Unicode case support by default, SQLite provides the ability to link against external Unicode comparison and conversion routines. The application can overload the built-in NOCASE collating sequence (using sqlite3_create_collation()) and the built-in like(), upper(), and lower() functions (using sqlite3_create_function()). The SQLite source code includes an “ICU” extension that does these overloads. Or, developers can write their own overloads based on their own Unicode-aware comparison routines already contained within their project.[/code]

You have to load the ICU lib as an SQLite extension. The website from Thomas Tempelman has a prebuilt dylib. It’s not very hard to load that into SQLite.