FreeTDS and UTF-8 NFD "decomposed" form

Hi,

I have problems storing NFD “decomposed” UTF-8 strings in a SQL Server database using FreeTDS ODBC.
FreeTDS stores an empty string (not null) and does not raise an exception.

The problem comes from macOS filenames which are in UTF-8 NFD “decomposed” form.

For example, if I take a screenshot on my Mac, the (French) filename becomes “Capture d’écran 2015-09-08 à 19.37.10.png”.

In the file name returned by the FolderItem, the “é” and “à” are in fully decomposed form, so I get two Unicode code points for each letter:
e: 101 + combining acute accent: 769
a: 97 + combining grave accent: 768

If I type these letters myself, I get a single code point for each:
é: 233
à: 224
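
The difference between the two forms can be seen with a few lines of Xojo. This is only a minimal sketch, assuming the classic String API (Len, Mid, Asc) and UTF-8 strings; the variable names are made up for the illustration. It builds the “é” both ways and logs the code points of the decomposed one, which should correspond to the 101 and 769 listed above.

[code]
// Illustration only: variable names are hypothetical
Dim decomposed As String = "e" + Encodings.UTF8.Chr(769) // e followed by combining acute accent (U+0301)
Dim precomposed As String = Encodings.UTF8.Chr(233)      // single code point U+00E9

// Both strings display as the same letter, but their lengths differ
System.DebugLog("decomposed Len: " + Str(decomposed.Len))
System.DebugLog("precomposed Len: " + Str(precomposed.Len))

// Log every code point of the decomposed string
For i As Integer = 1 To decomposed.Len
  System.DebugLog(Str(Asc(decomposed.Mid(i, 1))))
Next
[/code]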

Is there a way to convert from the NFD “decomposed” form to the NFC “precomposed” form?
Does anyone have such an algorithm?

Using declares and NSString could solve the problem as long as I stay in the macOS environment, but when such files are copied to a Windows system they retain their UTF-8 NFD form. So I’d need a Windows solution as well.

All ideas are welcome!

Thanks,

Olivier

The MBS plugin has functions for this. I remember having used these but I can’t find the code right now.

Hi Beatrix,

Thanks. I’m using the MBS SQL plugin and already tried SQLStringMBS, but it still gives a UTF-8 NFD decomposed form.
I’ll check if there are any other string functions.

See this thread.
There is a Normalize function which solves such an issue.

No need for a plugin for such tasks.

Hi Thomas,

Thanks!
This solves the problem for macOS users.

But some of the application’s users work on Windows and receive files from macOS users.
The problem is less severe on Windows: the “decomposed” string is stored in the database, but not correctly.
The filename is stored as “Capture d’e´cran 2017-10-12 a` 20.24.17.png”.

I need to find an equivalent Normalize function for Windows.

Thanks anyway!

I found this in another thread, but the Windows part raises an OutOfBounds error here:

return ConvertEncoding(DefineEncoding(m.stringValue(0, newSize*2), Encodings.UTF16), Encodings.UTF8)

I’ll try to solve this (full function below).

In the meantime I have a solution for macOS thanks to @Thomas Eckert…

Thanks.

[code]Function Normalize(extends s as String, form as UInt32) As String

// Normalizes characters of a text string according to Unicode 4.0 TR#15

// can’t normalize a string with unknown encoding
if Encoding(s) is nil then
return s
end if

// can’t normalize a string not encoded in UTF8
if Encoding(s) <> Encodings.UTF8 then
return ConvertEncoding(s, Encodings.UTF8)
end if

#if targetMacOS

soft declare function CFStringCreateMutableCopy lib "Carbon.framework" ( _
alloc as Ptr, _
maxLength as UInt32, _
theString as CFStringRef) as CFStringRef

soft declare sub CFStringNormalize lib "Carbon.framework" (theString as CFStringRef, theForm as UInt32)

dim mutableStringRef as CFStringRef = CFStringCreateMutableCopy(nil, 0, s)

CFStringNormalize mutableStringRef, form
return mutableStringRef


'enum CFStringNormalizationForm {
'kCFStringNormalizationFormD = 0,
'kCFStringNormalizationFormKD = 1,
'kCFStringNormalizationFormC = 2,
'kCFStringNormalizationFormKC = 3
'}

#elseif targetWin32

// not available on Windows < Vista
if not system.isFunctionAvailable("NormalizeString", "Normaliz.dll") then
  return s
end if

'soft declare function GetLastError lib "Kernel32.dll" () as UInt32

soft declare function NormalizeString lib "Normaliz.dll" ( _
NormForm as Int32, _
lpSrcString as WString, _
cwSrcLength as integer, _
lpDstString as Ptr, _
cwDstLength as integer) as integer

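// First call: ask for the required buffer size, in UTF-16 characters
dim estimatedSize as integer = NormalizeString(form, s, len(s), nil, 0)

if estimatedSize > 0 then
  // The buffer holds UTF-16, so it needs 2 bytes per character
  dim m as new memoryBlock(estimatedSize * 2)
  // Both lengths are character counts, not byte counts
  dim newSize as integer = NormalizeString(form, s, len(s), m, estimatedSize)
  if newSize <= 0 then
    return ""
  end if
  return ConvertEncoding(DefineEncoding(m.stringValue(0, newSize*2), Encodings.UTF16), Encodings.UTF8)
else
  return ""
end if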

' if estimatedSize <= 0 then
' err = GetLastError
' end if


' typedef enum _NORM_FORM {
' NormalizationOther  = 0,
' NormalizationC      = 0x1,
' NormalizationD      = 0x2,
' NormalizationKC     = 0x5,
' NormalizationKD     = 0x6
' } NORM_FORM;

#else

dim exc as new UnsupportedOperationException
exc.message = "Normalize is not supported on this platform"
raise exc

#endif

End Function
[/code]
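
Note that the numeric value for form C is not the same on both platforms: going by the enum comments in the function, it is 2 for CFStringNormalize on macOS and 1 for NormalizeString on Windows. A small wrapper can hide that difference. This is only a sketch: ToNFC is a hypothetical helper name, and it assumes the Normalize extension method above lives in a module of the same project.

[code]
Function ToNFC(s As String) As String
  // Hypothetical helper: picks the platform-specific constant for form C
  Dim formC As UInt32
  
  #if TargetMacOS
    formC = 2 // kCFStringNormalizationFormC
  #elseif TargetWin32
    formC = 1 // NormalizationC
  #endif
  
  // On other platforms Normalize raises UnsupportedOperationException
  Return s.Normalize(formC)
End Function
[/code]

It could then be applied to a file name before it is stored, for example ToNFC(f.Name), where f is a FolderItem.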

With the MBS plugins, you can look here:
http://monkeybreadsoftware.net/faq-howtonormalizestringonmac.shtml

Or see the cross-platform functions here:
http://monkeybreadsoftware.net/string-stringcompose-method.shtml

Hi Christian,

Indeed!
The cross-platform version is what I need, and it works perfectly.

Thanks,

Olivier