Postgre Insert Error Due to String Encoding

We are getting db errors in Postgre during an insert. The error occurs when inserting the name “O’QUINCE EDWARDS”. The apostrophe after the O is not a normal one. The error is:

ERROR:  invalid byte sequence for encoding "UTF8": 0x92"

The data comes to us from a variety of sources. My guess is that most data we receive is generated on Windows systems. As a test, we set the encoding using this code:

strFileData= strFileData.DefineEncoding(Encodings.WindowsLatin1)

When we run the app, the error goes away. Prior to today, we have not set the encoding at all and we have not had any problems. If we change the code and explicitly set the string encoding to WindowsLatin1, then what could possibly happen with other data that we process that comes from other sources? We are trying not to fix one problem and cause another.

That means that the data you read was not in fact UTF-8. There’s not much you can do except find out what each data source is giving you. You CAN determine if a given string is UTF-8 or not using Encodings.UTF8.IsValidData, but there’s no iron-clad way of telling what encoding it is if not UTF-8.

You could try replacing that character with 0x27 (plain apostrophe) instead of a right single quote from whatever windows encoding it’s from. I would imagine that for name fields that would be a minimally invasive fix. I can’t think of a case where you’d actually need a right single quote in a name. Probably got that data from some editor like word that loves to use “smart” quotes.