Bug in String.ReplaceAll?

Michel, I’m not French, I’m Romanian, but you are close: I work with French medical files. For reasons I don’t understand, these people, who have a medical app to manage their data, love to put the patient’s whole history in the file name. I honestly have no idea how they even manage to do that.

Second, as mentioned in the previous posts, they bring data from Windows, use it on Macs, and store it on Linux. Examples of file names are “GENOU GAUCHE - ARTHROGRAPHIE DU GENOU DROIT - ARTHROSCANNER DU GENOU DROIT 27.06.2016 FIRST NAME LAST NAME.pdf” or “FIRSTNAME^LASTNAME^^ - RX SCANIOMETRIE DES\ M.I - RAD/ Scaniométrie-6.jpg”, and many others like these. When I tried to migrate the data I discovered these wonderful issues: most of the files failed because they could not be processed.

I started to dig and found about 300 cases with issues like this, and I have to dig through almost 200,000 files.

If you look at Rick’s last post, you will see that he tried to replicate the issue and got the same result as me: ReplaceAll not replacing / and ;

Honestly, I’m not sure this is understood here. What would be the purpose of ReplaceAll otherwise? If I want to handle individual matches and loop over them, I use Replace; the purpose of ReplaceAll is to replace every match in the given string. The “…” case I mentioned comes from places where a user renames a file and puts crazy characters in the name, for example “…pdf”, and ReplaceAll(“…”, “.”) is supposed to fix that.

Some cases work: fortunately “…” is replaced as expected. But it seems that “/” does not get replaced when it is inside a word. As Rick tested, if I have “Folder / File” it is sometimes replaced, but if I have “Folder/File” as he stated, or even worse “Folder/ file”, the “/” stays there, which it should not.

Rick is using characters that look like the standard / and ; but are not. You can copy/paste the code into Xojo to see their hex values. Because they are not the same characters, ReplaceAll searches for one that does not occur in the original string, so nothing is replaced.
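The same check can be done outside Xojo by dumping the bytes in a shell. This is only a minimal sketch: the helper name hex_bytes is made up for illustration, and DIVISION SLASH (U+2215) is an assumed example of a lookalike, not necessarily the character in Rick's test.

```shell
# Print the bytes of a string as lowercase hex, with no separators.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d '[:space:]'
}

plain='/'                            # U+002F SOLIDUS, the ordinary slash
lookalike=$(printf '\342\210\225')   # U+2215 DIVISION SLASH as UTF-8 octal escapes (assumed example)

hex_bytes "$plain"; echo             # prints 2f
hex_bytes "$lookalike"; echo         # prints e28895
```

If the two dumps differ, a ReplaceAll that searches for the one-byte “/” cannot match the three-byte lookalike, however similar they appear on screen.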

Sometimes strings look normal but contain “invisible features” at byte level. As already said many times, we need some of your failed cases shown both as “what we see” and as “how it really is” at byte level (hex codes).

If we can get some of the strings you fail to fix, we can try to understand what’s going on and suggest ways to fix it.

As they come from several sources, they could have mixed encodings. Some from some ANSI code pages, some UTF-8, some UTF-16…

Once I have them, I will make a small project, run it on the Linux machine that holds the files, and share what I get here.

Thanks

I think I might have found the issue. Not in all cases, but post-processing narrowed down a lot of the errors.

Apparently a parent folder is named, if you can believe it, “2022.11:15 rapport de sortie”, so the “:” somehow breaks the shell path.

Because both macOS and Linux have the same utility with the same result, I use it this way:

sh.Execute("file --mime-type " + f.ShellPath)

shResult = sh.Result

If shResult <> "" Then
  shResult = shResult.Trim
  
  fType = shResult.NthField(":", 2)
  
  Return fType
  
Else
  Return fType
  
End If

Is there a better way of doing this? As far as I know, Xojo does not have anything in the framework to detect the MIME type.
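One option that avoids splitting the output at all is the --brief (-b) flag of file, which leaves out the “filename: ” prefix. A minimal sketch, assuming the file utility on both machines supports --brief; the function name mime_of is made up for illustration:

```shell
# Print only the MIME type of one file, e.g. "image/jpeg".
# --brief suppresses the "filename: " prefix, so a ":" in the path no longer matters.
# "--" marks the end of options, so a name starting with "-" is not read as a flag.
mime_of() {
  file --brief --mime-type -- "$1"
}

# Usage (the path is quoted so the shell passes it as a single argument):
# mime_of './2022.11:15 rapport de sortie/IMG_2385.jpg'
```

From Xojo this would presumably mean adding --brief to the command string and trimming the result; that translation is an assumption and has not been tested here.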

Considering that I will need to rename the parent folder, I assume that MoveTo does not work here unless I also move all the files in the folder.

What would be the best way to do this? Is it better to use the shell or to stick to Xojo code?

Thanks.
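For the shell route, renaming a directory with mv changes only its name; the files inside stay where they are and come along. A minimal sketch, assuming “-” is an acceptable replacement for “:”; the function name rename_without_colons is made up for illustration:

```shell
# Rename one directory in place, replacing every ":" in its last path component with "-".
# The contents are not copied or moved individually; they stay inside the renamed directory.
rename_without_colons() {
  dir=${1%/}                              # drop a trailing slash, if any
  parent=$(dirname -- "$dir")
  name=$(basename -- "$dir")
  clean=$(printf '%s' "$name" | tr ':' '-')
  [ "$name" = "$clean" ] && return 0      # no colon in the name, nothing to do
  [ -e "$parent/$clean" ] && return 1     # refuse to overwrite an existing entry
  mv -- "$dir" "$parent/$clean"
}

# Usage:
# rename_without_colons 'patients/CXXX/import/2022.11:15 rapport de sortie'
```

The sketch handles one directory per call and does not recurse; any stored references to the old path would still need updating separately.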

Just to clarify what I have found so far, because I guess the shell result was creating confusion: instead of the MIME type I was getting these split results. On a closer look, it seems that the file name is in the end saved properly after ReplaceAll, but the “:” in the parent folder breaks the shell command and gives a nonsensical reply. A good example:

Full Path : patients/CXXX/import/2022.11:15 rapport de sortie/IMG_2385.jpg
FileName : IMG_2385.jpg
but the shell breaks it as: 15 rapport de sortie/IMG_2385.jpg, immediately after the “:”
So I guess that in order to use it that way I need to fix the parent folders as well.

I think you are doing something wrong that is tied not to the full path of a file but to incorrect use of the shell.

Here is an experiment with 2 hits and 1 fail:

admin@lubuntu-vm:~/Imagens$ pwd
/home/admin/Imagens

admin@lubuntu-vm:~/Imagens$ ls
'2022.11:15 rapport de sortie'

admin@lubuntu-vm:~/Imagens$ ls 2022.11\:15\ rapport\ de\ sortie/
IMG_2385.jpg

admin@lubuntu-vm:~/Imagens$ file --mime-type ./2022.11\:15\ rapport\ de\ sortie/IMG_2385.jpg
./2022.11:15 rapport de sortie/IMG_2385.jpg: image/jpeg

admin@lubuntu-vm:~/Imagens$ file --mime-type './2022.11:15 rapport de sortie/IMG_2385.jpg'
./2022.11:15 rapport de sortie/IMG_2385.jpg: image/jpeg

admin@lubuntu-vm:~/Imagens$ file --mime-type ./2022.11:15 rapport de sortie/IMG_2385.jpg
./2022.11:15:        cannot open `./2022.11:15' (No such file or directory)
rapport:             cannot open `rapport' (No such file or directory)
de:                  cannot open `de' (No such file or directory)
sortie/IMG_2385.jpg: cannot open `sortie/IMG_2385.jpg' (No such file or directory)

Escaped: OK. Quoted: OK. Raw: fails.

I guess your problem is not tied to file paths but to your understanding of how to use files and the shell.

Also, your post-processing cleanup code isn’t very safe; it needs to handle multiple colons and spaces in the path.

This is better:

sh.Execute("file --mime-type " + f.ShellPath)

shResult = sh.Result.Trim   // Drop the trailing line ending

fType = "" // Undefined

// The MIME type is the last ": "-separated field, even when the path
// itself contains colons and spaces, e.g.
// s:o: r: t:ie/IMG_2385.jpg: image/jpeg
If shResult.Length > 0 Then
  fType = shResult.NthField(": ", shResult.CountFields(": ")).Trim   // e.g. image/jpeg
End If

Return fType

Well, as far as I know, FolderItem.ShellPath is supposed to escape special characters and spaces automatically, so I did not expect to have to do that myself.
Keep in mind that my example does not show the full path; I removed it for the sake of data anonymity. I guess .NativePath is supposed to return the proper path format for the OS, but I discovered that this is not always the case, and so far it works well only on macOS.

I guess I’ll try quoting the file path and see what I get. Maybe that will help and I won’t need to normalise the files anymore, even though I have already done so for most of them.
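Quoting by hand can be sketched in a shell like this. It is an illustration only: the function name shell_quote is made up, and it assumes a POSIX-style shell where everything inside single quotes is taken literally.

```shell
# Wrap a string in single quotes so the shell treats it as one literal word.
# An embedded single quote is rewritten as '\'' (close quote, escaped quote, reopen quote).
# Limitation: trailing newlines in the input are dropped by the command substitution.
shell_quote() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

shell_quote "2022.11:15 rapport de sortie/IMG_2385.jpg"
# prints '2022.11:15 rapport de sortie/IMG_2385.jpg'
```

The quoted form is what the second, successful command in the experiment above used; spaces and the “:” need no further escaping inside single quotes.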

Thanks a lot for the example. I’ll adapt mine following your guidelines and see what I get.

Thanks again.

Then you would get the “:” escaped and your shell command would not break, just as mine didn’t. I don’t know why you had problems there.

admin@lubuntu-vm:~/Imagens$ file --mime-type ./2022.11\:15\ rapport\ de\ sortie/IMG_2385.jpg
./2022.11:15 rapport de sortie/IMG_2385.jpg: image/jpeg