An odd shell result / regex failure

If you want to take a look, the file is on Dropbox: tst.txt

What worked for me was getting rid of the double FF bytes that occur in that sample output. For example, after getting the shell results into the string stuffToParse:

stuffToParse = stuffToParse.ReplaceAllBytes(String.ChrByte(255) + String.ChrByte(255), "")
stuffToParse = stuffToParse.ConvertEncoding(Encodings.UTF8)

dim rg as new RegEx
dim myMatch as RegExMatch
rg.SearchPattern = "(a)"
myMatch = rg.Search(stuffToParse)

It’s not double FF, it’s FF followed by FD.

The byte sequence FFFD does not appear in the data file he posted on Dropbox. At least not when downloaded to my system.
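For anyone who wants to check this against the bytes they actually received, here is a rough sketch. The sample string is made up so the snippet stands on its own (swap in the real shell output), the variable names are mine, and I'm assuming IndexOfBytes returns -1 when the sequence is not found:

```xojo
// Hypothetical sample data: "a", FF FF, "b" (stand-in for the real shell output)
dim stuffToParse as String = String.ChrByte(97) + String.ChrByte(255) + String.ChrByte(255) + String.ChrByte(98)

// The two byte sequences under discussion
dim ffff as String = String.ChrByte(255) + String.ChrByte(255)
dim fffd as String = String.ChrByte(255) + String.ChrByte(253)

// Byte-wise search: a non-negative result means the sequence is present
dim posFFFF as Integer = stuffToParse.IndexOfBytes(ffff)
dim posFFFD as Integer = stuffToParse.IndexOfBytes(fffd)

break // inspect posFFFF and posFFFD in the debugger
```

Running the same two searches on the string before and after any ConvertEncoding call should show whether the conversion is what introduces the difference.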

It’s not double FF, it’s FF followed by FD.

From post number 3 of this thread:

And my reply to it, post number 5:

65533 came from BBEdit, which may be a result of pasting into BBEdit. Looking at the asc values of every single character in Xojo, it says those characters have an asc value of 1835008.

All of the txt file stuff came later.

I’m getting rid of the FFFF before converting the string to UTF-8. Following is the full source of the test I slapped together, but in summary, I read from the file using ASCII encoding, strip the FFFF’s, then convert to UTF-8 for RegEx.

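// Console app: the file name is passed as the first command-line argument
dim fileName as String = args(1)
dim f as FolderItem = SpecialFolder.Desktop.Child(fileName)

// Treat the file contents as ASCII when reading
dim tis as TextInputStream = TextInputStream.Open(f)
tis.Encoding = Encodings.ASCII

dim stuffToParse as String
stuffToParse = tis.ReadAll
tis.Close
break // inspect the raw string in the debugger

// Strip every FF FF byte pair, then convert to UTF-8 for RegEx
stuffToParse = stuffToParse.ReplaceAllBytes(String.ChrByte(255) + String.ChrByte(255), "")
stuffToParse = stuffToParse.ConvertEncoding(Encodings.UTF8)

dim rg as new RegEx
dim myMatch as RegExMatch
rg.SearchPattern = "(a)"
myMatch = rg.Search(stuffToParse)

break // inspect myMatch in the debugger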

OK, that makes more sense. 1835008 is 1C 00 00 in hex: 1C is ASCII FS (file separator), followed by two nulls. That's three valid ASCII characters.
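If you want to double-check that arithmetic in Xojo itself, here is a minimal sketch (the variable names are mine, not from the thread). It splits the value into its three low bytes using integer division and Mod:

```xojo
dim v as Integer = 1835008

dim b2 as Integer = (v \ 65536) Mod 256 // high byte: 28, i.e. &h1C (ASCII FS)
dim b1 as Integer = (v \ 256) Mod 256   // middle byte: 0
dim b0 as Integer = v Mod 256           // low byte: 0

dim same as Boolean = (v = &h1C0000)    // True, since 28 * 65536 = 1835008
```

By the same arithmetic, the 65533 that BBEdit reported is FFFD in hex (255 * 256 + 253).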