Hello all,
Is there a built-in feature that will remove unprintable chars from a string?
Thanks,
Tim
Hi Alberto
No, from within a string. That is, something is read, and that something has unprintable chars within it. I need to remove them.
Tim
Maybe using RegEx?
String.Trim removes white spaces from the ends of a string. If you want to remove from inside a string you need to do something like:
Put this in a global module and you can call it like this:
MyString = MyString.RemoveUnPrintables
Function RemoveUnPrintables(Extends MyString as String) As String
  Dim MyStringArray() as String
  MyStringArray = MyString.Split("")
  For i as integer = MyStringArray.Ubound-1 DownTo 0
    If MyStringArray(i).Asc < 32 or MyStringArray(i).Asc > 126 Then
      MyStringArray.Remove(i)
    End If
  Next
  MyString = MyStringArray.FromArray("")
  Return MyString
End Function
And yes, I am using “old school” language. I eschew terms like VAR and LastIndex in favor of Dim and Ubound.
Public Function RemoveUnPrintables(Extends MyString As String) As String
  Var MyStringArray() As String
  MyString = MyString.Trim // added to deal with preceding or trailing unprintables
  MyStringArray = MyString.Split("")
  For i As Integer = MyStringArray.LastIndex - 1 DownTo MyStringArray.FirstIndex
    If MyStringArray(i).Asc < 32 Or MyStringArray(i).Asc > 126 Then
      MyStringArray.Remove(i)
    End If
  Next
  MyString = String.FromArray(MyStringArray, "") // formatted correctly
  Return MyString
End Function
Changing a few details from Jon’s formulation.
MyString = MyString.Trim
to deal with the possibility of a trailing unprintable.
MyString = MyStringArray.FromArray("")
I wonder if these unprintable characters simply mean that the data read has not been correctly identified in terms of its UTF encoding.
Do you set the encoding, or just hope that it is ANSI?
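To make the encoding point concrete, here is a minimal sketch of reading a text file with the encoding stated explicitly. It assumes the data really is UTF-8, and the helper name ReadUtf8File is made up for illustration; swap in whatever encoding your source actually uses.

```xojo
// Hypothetical helper: read a whole text file and tell Xojo the bytes are UTF-8.
// Assumption: the file really is UTF-8. If it is not, pass the matching
// encoding instead, otherwise valid characters can show up as garbage.
Function ReadUtf8File(f As FolderItem) As String
  Dim t As TextInputStream = TextInputStream.Open(f)
  Dim s As String = t.ReadAll(Encodings.UTF8) // encoding is set as the data is read
  t.Close
  Return s
End Function
```

For data that arrives some other way (a socket, a serial port), DefineEncoding does the same job of labelling the bytes without changing them. If the "unprintable" characters disappear once the encoding is set correctly, there may be nothing to strip at all.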
What’s your definition of “unprintable”?
That would remove valid characters such as “é”.
Actually, new language would be:
Function RemoveUnPrintables(Extends MyString as String) As String
  Dim MyStringArray() as String
  MyStringArray = MyString.Split("")
  For i as integer = MyStringArray.LastIndex DownTo 0 // No -1 as LastIndex is exactly that.
    If MyStringArray(i).Asc < 32 or MyStringArray(i).Asc > 126 Then
      MyStringArray.RemoveAt(i) // At is used when i is an index into the array.
    End If
  Next
  MyString = String.FromArray(MyStringArray, "") // FromArray is called on String, with the array as its first argument
  Return MyString
End Function
Both will only work with UK/US ASCII-style text. Accented letters from Europe would likely go wrong. Other scripts would fail and show a blank string.
Ah you are correct. I was looking at straight ASCII. See my updated method below.
Actually, new language would be:
MyStringArray.RemoveAt(i) // At is used when i is an index into the array.
Yes RemoveAt is more “proper” these days!
And yes, as @Arnaud_N pointed out, my code would remove some valid characters. But the same concept applies. It’s probably better to search for unprintable Unicode characters than for ASCII ones. Looks like a more “proper” rendering would be:
Function RemoveUnPrintables(Extends MyString as String) As String
  Dim MyStringArray() as String
  MyStringArray = MyString.Split("")
  For i as integer = MyStringArray.LastIndex DownTo 0 // No -1 as LastIndex is exactly that.
    Dim cp as Integer = MyStringArray(i).Asc // code point of this character
    If cp < 32 or (cp > 126 And cp < 160) Then // C0 controls, DEL and C1 controls; space (32) is kept
      MyStringArray.RemoveAt(i) // At is used when i is an index into the array.
    End If
  Next
  MyString = String.FromArray(MyStringArray, "")
  Return MyString
End Function
That looks like it would properly handle all the UTF-8 characters. Now if the encoding of the string is in something else like UTF-16 or UTF-32 you would have to handle it appropriately.
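RegEx is likely the best way to handle this for all scripts. There are ways of representing properties of characters, such as printable, valid etc. For example, “\P{C}” matches any character outside the Unicode “Other” category (control, format, unassigned and similar code points), which covers visible characters and spaces; the lowercase “\p{C}” matches the characters you would want to remove.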
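A minimal sketch of that idea, assuming Xojo's RegEx engine (which is PCRE-based) accepts Unicode property classes such as \p{C}. The function name StripOtherCategory is made up for illustration.

```xojo
// Hypothetical helper: delete every character in Unicode category C
// ("Other": control, format, unassigned and similar code points).
// Assumption: the RegEx engine supports \p{...} property classes.
Function StripOtherCategory(source As String) As String
  Dim rx As New RegEx
  rx.SearchPattern = "\p{C}"           // lowercase p: match the characters to drop
  rx.ReplacementPattern = ""           // replace each match with nothing
  rx.Options.ReplaceAllMatches = True  // otherwise only the first match is replaced
  Return rx.Replace(source)
End Function
```

Note that tabs and line endings are control characters too, so this removes them as well. If those should survive, the pattern would need to exclude them explicitly.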
Yeah. Good point and RegEx is fast…
// No -1 as LastIndex is exactly that.
Well, if you precede everything with a MyString.Trim, then the last character has already been vetted and does not need to be checked again.
Two issues with that. If that is so, you can also stop at 1. However, the contents of the If do not replicate Trim.
+++1 for a regex method.
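Public Function RemoveUnprintableChars(input As String) As String
  // Define a regular expression that matches the characters to remove:
  // anything outside the POSIX "print" class, written here in PCRE bracket form.
  // Unless the engine uses Unicode-aware classes, only ASCII 32-126 counts as printable.
  Dim rgx As New RegEx
  rgx.SearchPattern = "[^[:print:]]"
  // Replaces all non-printable characters with nothing
  rgx.ReplacementPattern = ""
  rgx.Options.ReplaceAllMatches = True
  Return rgx.Replace(input)
End Function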
use
\p{Lower} A lower-case alphabetic character: [a-z]
\p{Upper} An upper-case alphabetic character: [A-Z]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
What is the 2019R1.1 equivalent to:
MyStringArray.FromArray("")
I am still working with API for certain web apps…
Tim
Join(arsMyStringArray, "")
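Putting that together, here is a sketch of the whole function using only the older global functions (Split, Asc, Join, Ubound, Remove), which should suit 2019R1.1. The name RemoveUnPrintablesClassic is made up so it does not clash with the versions above, and the code-point ranges are the same assumption as in the earlier posts: C0 controls, DEL and C1 controls count as unprintable.

```xojo
// Hypothetical old-API rendering; put it in a module so Extends works.
Function RemoveUnPrintablesClassic(Extends MyString As String) As String
  Dim chars() As String = Split(MyString, "") // empty delimiter splits into characters
  For i As Integer = chars.Ubound DownTo 0    // walk backwards so removals do not shift unvisited items
    Dim cp As Integer = Asc(chars(i))         // code point of this character
    If cp < 32 Or (cp > 126 And cp < 160) Then
      chars.Remove(i)
    End If
  Next
  Return Join(chars, "")                      // old-API equivalent of String.FromArray
End Function
```

Called the same way as before: MyString = MyString.RemoveUnPrintablesClassic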