Bug in String.ReplaceAll?

No - do not do this. Work within the text encoding scheme, or suffer a lifetime of misery in your code and support tickets. :slight_smile:

As I explained here in the forum just a week or so ago, “:” is NOT illegal on macOS. It depends on the API you use. If you use an API or function that takes POSIX paths (with “/” as the separator), you can use “:” in a file or directory name. On the other hand, if you deal with functions based on the old HFS naming scheme, as the Finder does (and as FolderItem.Name and FolderItem.Child do), you use a “/” instead, which is then legal.
A “:” in a POSIX name shows up as “/” in the HFS-style view, and vice versa.
So there are effectively only two illegal characters:
the zero byte (even that used to be legal, but it now causes problems), and either “:” or “/”, depending on the API. The two are interchangeable.

First of all, replacing “…” with “.” has to be done repeatedly, until no “…” is left, or some “…” will slip through.
And, as you already wrote in a previous post, I believe, you also need to check whether a name ends with a period and, if so, remove that too.
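A minimal Xojo sketch of both steps, assuming the sequence to collapse is a run of periods (the thread shows it as “…”) and that the target system, like Windows, does not accept a name ending in a period. The sample name is hypothetical:

```xojo
Var name As String = "My..File....txt." ' hypothetical input
' ReplaceAll makes one left-to-right pass and does not rescan its own output,
' so a run of four periods only shrinks to two. Repeat until none are left.
While name.IndexOf("..") >= 0
  name = name.ReplaceAll("..", ".")
Wend
' Remove trailing periods (Windows does not allow a name to end with one).
' Right(1) returns "" for an empty string, so this loop always terminates.
While name.Right(1) = "."
  name = name.Left(name.Length - 1)
Wend
' name is now "My.File.txt"
```

The first pass turns the input into "My.File..txt.", the second into "My.File.txt.", and the trailing-period loop then removes the final dot.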

It appears that you either don't retrieve the names correctly or don't rename the files correctly.

Basically, you'd get the name from FolderItem.Name, put it into a String, do your replacements on that string, and finally assign the cleaned string to the file's Name property to rename it, or create a new FolderItem with the Child() function, passing the cleaned string.
Also, you may want to separate the extension from the name first, by finding the last period with the InStr function, like this (off the top of my head):

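dim fname as String = "test.txt"
dim start as Integer ' position of the most recent "." found (0 = none yet)
dim extpos as Integer ' position of the last "." in fname (0 = no dot at all)
do
  start = fname.InStr(start+1, ".")
  ' Exit the loop when no further dot is found. (Break would only pause in the
  ' debugger, and in a built app the loop would restart from the first dot forever.)
  if start = 0 then exit do
  extpos = start
loop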

Now extpos should hold the position of the last period, so you can extract the extension with Mid(extpos) (which includes the period) and then shorten fname with Left(extpos-1). But don't do this if extpos = 0, because then there was no dot.
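Putting the "get the name, clean the string, assign it back" steps together, a rough Xojo sketch could look like this. The file location and the "_" replacement character are hypothetical; the forbidden list is the set of characters Windows rejects in file names, and control characters and reserved names such as CON are not handled here:

```xojo
' Hypothetical file to clean up; use your own FolderItem here.
Var f As FolderItem = SpecialFolder.Desktop.Child("Report*Q1?.txt")
' Characters Windows does not accept in a file name ("""" is a single double quote).
Var forbidden() As String = Array("<", ">", ":", """", "/", "\", "|", "?", "*")
Var cleaned As String = f.Name
For Each c As String In forbidden
  cleaned = cleaned.ReplaceAll(c, "_") ' "_" is an arbitrary replacement
Next
' Assigning to Name renames the file on disk, so only do it when the file
' exists and the name actually changed.
If f.Exists And cleaned.Compare(f.Name, ComparisonOptions.CaseSensitive) <> 0 Then
  f.Name = cleaned
End If
```

With the sample name, cleaned becomes "Report_Q1_.txt". The cleaned string could equally be passed to Child() on the parent folder, as described above, if you prefer to create a new FolderItem rather than rename in place.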

Which of these is a file name and which is a directory path? A file name can't contain both “:” and “/”, as neither macOS nor Linux nor Windows would allow this. What you're showing is clearly a path, not a file name. That confusion alone suggests that you don't understand the difference between a pure file name and a path. Get this right first.

Also, perhaps you should change the title of this topic, because it's really “How do I convert file names for Windows?”

What text encoding scheme, if a filename has been moved around between systems? The encoding can't be assumed, can it? And if it can be assumed, why isn't ReplaceAll working in all cases?

“Linux” will allow a colon “:” in a filename; only the slash and NUL are forbidden. However, as soon as a USB stick is plugged into a Linux desktop, we are probably talking about VFAT. Many systems have adopted a version of UTF-8. However, the basic 7-bit ASCII characters are all single bytes anyway, so if there are issues with ReplaceAll, they will probably be encoding-related. Linux file systems store filenames byte for byte, without any encoding attached, for the reasons we are discussing here. In IT, it's best to have only one source of truth wherever possible.

What does this mean?

There are minor flavours of UTF-8, such as Java's “modified UTF-8”, which is used for serialization and handles nulls differently, or the variants produced when a UTF-16 system transcodes to UTF-8. Oracle's database UTF8 had minor variations. Some systems require a byte order mark at the start of a file. Some implementations throw an exception when they hit an encoding error, while later implementations insert a “�” character instead. This is how potentially interesting filenames get created.

Maybe a way around all this for the OP would be to use Xojo's ConvertEncoding function to convert to ASCII, and to compare ReplaceAll on that string with ReplaceAll on the UTF-8 encoded string, as an anomaly detector. After all, the forbidden characters are all in the ASCII range, so any lookalikes would most likely be interoperable between file systems anyway, which is the point of standardising filenames?
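A simpler variant of such an anomaly detector (a swapped-in technique, not the ConvertEncoding comparison described above) is to count bytes above 127. In UTF-8, every byte of a non-ASCII character is at least 128, so any such byte flags a character that an ASCII-only ReplaceAll call will never touch. The sample name is hypothetical:

```xojo
Var name As String = "Résumé.txt" ' hypothetical name; the literal is UTF-8
Var mb As MemoryBlock = name ' copies the string's raw bytes
Var nonASCIIBytes As Integer
For i As Integer = 0 To mb.Size - 1
  If mb.Byte(i) > 127 Then nonASCIIBytes = nonASCIIBytes + 1
Next
' Each "é" is two bytes in UTF-8, so nonASCIIBytes is 4 for this sample.
If nonASCIIBytes > 0 Then
  System.DebugLog(name + " contains non-ASCII characters (" + nonASCIIBytes.ToString + " bytes)")
End If
```

Names that trip this check are the ones worth inspecting byte by byte; names that pass it contain only ASCII, where ReplaceAll with ASCII arguments behaves the same regardless of the string's encoding.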

If not, case sensitivity will be a challenge: Linux/Unix filenames are case-sensitive, while Mac and Windows filenames are only case-preserving. For example, on Linux Readme.txt <> README.TXT.
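In Xojo, this difference shows up directly in string comparison. A small sketch, assuming your cleanup code needs to detect names that would collide on a case-insensitive target:

```xojo
Var a As String = "Readme.txt"
Var b As String = "README.TXT"
' Xojo's = operator on strings ignores case, like the default
' macOS and Windows file systems.
Var sameIgnoringCase As Boolean = (a = b) ' True
' A case-sensitive comparison matches how Linux treats the two names.
Var sameExactly As Boolean = (a.Compare(b, ComparisonOptions.CaseSensitive) = 0) ' False
```

Two names for which sameIgnoringCase is True but sameExactly is False can coexist in one Linux folder but will clash when copied to a Mac or Windows volume.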

You may need to set the encoding of the string, since system-level strings are often not UTF-8, which is the encoding used within Xojo. The Windows shell uses CP-1252, and the Mac shell behaves like Linux.

https://documentation.xojo.com/api/text/encoding_text/defineencoding.html#defineencoding

Var source As String = MyMemoryBlock.StringValue(0, 8)
TextField1.Text = source.DefineEncoding(Encodings.UTF16)

As long as one only uses ASCII characters in the ReplaceAll parameters, the string being processed can be in any ASCII-compatible 8-bit encoding or in UTF-8, even with its Encoding set to nil. It doesn't matter to ReplaceAll: if the encoding is not set, it replaces bytes, and those bytes are always the same.

@Michel_Bujardet The same goes for CP-1252, because it also uses ASCII for the lower 128 characters (0 to 127), so replacing with ASCII characters works there too, even if the encoding is not set on the string.

You only need to make sure the string has the right encoding if you want to replace text with non-ASCII characters.

Ergo, encoding can’t be the issue here.

This is the case in French and other languages with accented characters.

Yes. But this thread is about someone replacing plain ASCII characters, so none of these encoding-related issues apply to this case, agreed?

Well, there are some considerations here.

The Windows console (shell) operates on the code page set in the system; it is not globally 1252. If I type chcp here, it reports 850 for my system, for example, and in Russia it will be something else. But INTERNALLY, at the file system level and in the newer APIs (the W functions instead of the A functions), everything is UTF-16, and Windows remaps the character sets (what we see) as it goes. Since Windows 10, Windows supports UTF-8, but it was introduced as a “beta” feature, so it will take some years until UTF-8, Windows and apps work together without issues.

macOS uses, stores and displays file and folder names in UTF-8.

Linux stores bytes but by default interprets them as UTF-8. You can check your locale with the locale command or by looking at the LANG environment variable (echo $LANG); almost 100% of the time it shows something like en_US.UTF-8.

Nope. There's a developer who doesn't fully understand what is going on, with a lot of filenames that he fails to collect and give us for byte-level inspection, so we know nothing about what is going on until we have samples to inspect.

Actually, there may be. First, because Aurelien, like me, is French, and some of the file names may contain extended ASCII. Second, UTF-8 discriminates better between characters, so it may not exhibit the random issue described.

I will leave it at that. Unless the OP can show the conditions under which ReplaceAll malfunctions, it is a pretty futile exercise to try to help without solid proof.

Sorry, but you are wrong. I explained that the encoding can be unset, and you can still replace ASCII characters in a string that contains French characters.

Test:

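dim s as String = "äbé" ' string literals in Xojo source are UTF-8
s = s.DefineEncoding(nil) ' drop the encoding: ReplaceAll now works on raw bytes
s = s.ReplaceAll ("b", "X") ' "b" is byte 62 (hex), which never occurs inside a UTF-8 multibyte sequence
s = s.DefineEncoding(Encodings.UTF8) ' reinterpret the bytes as UTF-8 again
break ' pause in the debugger to inspect s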

Now s is “äXé” as expected.

Hence, there is no issue with replacing ASCII characters in French text, even if the encoding is wrong. I explained that twice above, and yet everyone keeps questioning me instead of simply testing whether I'm right.

Frankly, at this point, I don't care. This long thread is a ridiculous waste of time. I have never experienced random issues like those the OP describes, so for all intents and purposes, it works for me. Period.

Var file As String = "Folder ̸  File;"

file = file.ReplaceAll("/", "̶ ")

file = file.ReplaceAll(";", "")

MessageBox file

Quit

We need to inspect the data

Clearly, there is no bug in ReplaceAll. The problem resides in the data - the file names - being processed.
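To inspect the data, a sketch like the following dumps each character of a name together with its code point and its UTF-8 bytes. The name literal is a placeholder: paste in (or read from FolderItem.Name) the real name that misbehaves, and it assumes the name is UTF-8. In the output, an ASCII “/” is 2F and “;” is 3B; anything else is a different character, however similar it looks, and a ReplaceAll searching for the ASCII character will not match it.

```xojo
' Placeholder: replace with the real name, e.g. from f.Name.
Var name As String = "Folder/File;"
name = name.DefineEncoding(Encodings.UTF8) ' assumption: the name is UTF-8
Var report As String
For Each ch As String In name.Characters
  ' Asc gives the (first) code point; EncodeHex shows every byte of the character.
  report = report + ch + "  U+" + Hex(ch.Asc) + "  bytes: " + EncodeHex(ch, True) + EndOfLine
Next
' Quick check: is there a real ASCII slash at all?
If name.IndexOf("/") >= 0 Then
  report = report + "Contains an ASCII slash (2F)."
Else
  report = report + "No ASCII slash (2F) found."
End If
MessageBox(report)
```

If the report shows bytes other than 2F where you see a slash, the file name contains a lookalike character, and that, not ReplaceAll, is why the replacement appears to be ignored.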

Hello Rick,

Can you please provide more details about the data you need to inspect? From your example, it seems that you managed to replicate one of my issues, while Tim says it works. Based on your code, the message box should show “Folder File”, but I still see “Folder/ File” there, so the first replace line is effectively ignored.

Thanks

And apparently both replacements are ignored in your case as well, so you managed to replicate it. @Tim_Hare, would you care to explain Rick's example, where ReplaceAll is supposed to replace “/” and “;”, but the MessageBox still shows both of them?

Thanks