Is there a built-in feature to remove unprintable chars?

Hello all,

Is there a built-in feature that will remove unprintable chars from a string?

Thanks,
Tim

Like this?

Hi Alberto

No, from a string, i.e. something that is read in. However, that something has unprintable chars within it, and I need to remove them.

Tim

Maybe using RegEx?


String.Trim only removes whitespace from the ends of a string. To remove characters from inside a string, you need something like this:

Put this in a global module and you can call it like this:

MyString = MyString.RemoveUnPrintables

Function RemoveUnPrintables(Extends MyString as String) As String

Dim MyStringArray() as String

MyStringArray = MyString.Split("") // an empty delimiter splits the string into single characters

For i as integer = MyStringArray.Ubound DownTo 0 // walk backwards so Remove does not shift unvisited elements
   If MyStringArray(i).Asc < 32 or MyStringArray(i).Asc > 126 Then
      MyStringArray.Remove(i)
   End If
Next

MyString = Join(MyStringArray, "")

Return MyString

End Function

And yes, I am using “old school” language. I eschew terms like VAR and LastIndex in favor of Dim and Ubound.

Public Function RemoveUnPrintables(Extends MyString As String) As String
  Var MyStringArray() As String
  
  MyString = MyString.Trim // strips leading/trailing whitespace only; the loop below checks all remaining characters
  
  MyStringArray = MyString.Split("")
  
  For i As Integer = MyStringArray.LastIndex DownTo MyStringArray.FirstIndex // LastIndex is already the last valid index, so no -1
    If MyStringArray(i).Asc < 32 Or MyStringArray(i).Asc > 126 Then
      MyStringArray.Remove(i)
    End If
  Next
  
  MyString = String.FromArray(MyStringArray, "") // formatted correctly
  
  Return MyString
  
End Function

Changing a few details from Jon’s formulation.

  1. Throwing in the line

MyString = MyString.Trim

to deal with the possibility of a trailing unprintable.

  2. The following formulation did not work for me; String.FromArray(MyStringArray, "") does:

MyString = MyStringArray.FromArray("")

  3. And I am using “new school” language. :slightly_smiling_face:

I wonder whether these unprintable characters are simply the result of the data being read without its text encoding (e.g. UTF-8) being correctly identified.
Do you set the encoding, or just hope that it is ANSI?
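A minimal sketch of what "setting the encoding" can look like, assuming the raw bytes arrive in a String with no encoding attached (e.g. from a file or socket). The helper name ToUTF8 is hypothetical, and which source encoding to pass (UTF-8, Windows ANSI, ...) is an assumption you must confirm for your own data:

```xojo
// Hypothetical helper: label raw bytes with their real encoding, then convert to UTF-8.
// Assumption: the caller knows the source encoding (e.g. Encodings.UTF8 or Encodings.WindowsANSI).
Function ToUTF8(rawData As String, sourceEncoding As TextEncoding) As String
  // DefineEncoding only labels the bytes; it does not change them.
  Var s As String = rawData.DefineEncoding(sourceEncoding)
  // ConvertEncoding re-encodes the text as UTF-8 (no change if it is already UTF-8).
  Return s.ConvertEncoding(Encodings.UTF8)
End Function
```

For example, `ToUTF8(data, Encodings.WindowsANSI)` turns the single byte &hE9 into a proper "é" instead of a stray byte that may look "unprintable".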


What’s your definition of “unprintable” ?


That would remove valid characters such as “é”.

Actually, new language would be:

Function RemoveUnPrintables(Extends MyString as String) As String

Dim MyStringArray() as String

MyStringArray = MyString.Split("")

For i as integer = MyStringArray.LastIndex DownTo 0 // No -1 as LastIndex is exactly that.
   If MyStringArray(i).Asc < 32 or MyStringArray(i).Asc > 126 Then
      MyStringArray.RemoveAt(i) // At is used when i is an index into the array.
   End If
Next

MyString = String.FromArray(MyStringArray, "")

Return MyString

End Function

Both will only work with plain US-ASCII text. Accented European letters (code points above 126) would be stripped, and text written entirely in another script would be removed completely, leaving an empty string.
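A quick way to see why the ASCII-only test (Asc < 32 Or Asc > 126) drops accented letters. This sketch assumes the source file, and therefore the string literal, is UTF-8, which is Xojo's default:

```xojo
// Assumption: the literal "café" is stored as UTF-8 with a precomposed "é" (U+00E9).
Var word As String = "café"
Var lastChar As String = word.Right(1)   // "é"
Var cp As Integer = lastChar.Asc          // 233, above the 126 cut-off
// The ASCII-only test is therefore True for "é", so the character
// is removed and "café" comes back as "caf".
```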


Ah, you are correct. I was looking at straight ASCII. See my updated method below. :smiley:

Yes RemoveAt is more “proper” these days! :smiley:
And yes, as @Arnaud_N pointed out, my code would remove some valid characters. But the same concept applies. It’s probably better to search for unprintable Unicode characters than ASCII ones. Looks like a more “proper” rendering would be:

Function RemoveUnPrintables(Extends MyString as String) As String

Dim MyStringArray() as String

MyStringArray = MyString.Split("")

For i as integer = MyStringArray.LastIndex DownTo 0 // No -1 as LastIndex is exactly that.

   Dim cp As Integer = MyStringArray(i).Asc // Unicode code point of this character

   // Remove C0 controls (0-31), DEL (127) and C1 controls (128-159).
   // Space (32) and everything from 160 up, including "é", is kept.
   If cp < 32 Or (cp >= 127 And cp < 160) Then
      MyStringArray.RemoveAt(i) // At is used when i is an index into the array.
   End If
Next

MyString = String.FromArray(MyStringArray, "")

Return MyString

End Function

That should handle UTF-8 text properly. If the string is in another encoding, such as UTF-16 or UTF-32, or has no encoding defined at all, you would have to handle that first, e.g. by defining its encoding and converting it to UTF-8.

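RegEx is likely the best way to handle this for all scripts. Patterns can match characters by their Unicode properties: for example, “\P{C}” matches any character that is not in the Unicode “Other” category (control, format, unassigned, private-use and surrogate code points), so “\p{C}” matches the characters you would strip.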

Yeah. Good point and RegEx is fast…

Well, if you precede everything with MyString.Trim, then the last character has already been vetted and does not need to be checked again.

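Two issues with that. First, if that were so, you could also stop at index 1, since Trim has vetted the first character as well. Second, the test inside the If does not replicate Trim: Trim only removes whitespace, while the If removes every character outside the range 32 to 126.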

+++1 for a regex method.

Public Function RemoveUnprintableChars(input As String) As String
  // Match every character in the Unicode "Other" category (\p{C}):
  // control, format, unassigned, private-use and surrogate code points
  Dim rgx As New RegEx
  rgx.SearchPattern = "\p{C}"
  
  // Replace each match with nothing, i.e. delete all non-printable characters
  rgx.ReplacementPattern = ""
  rgx.Options.ReplaceAllMatches = True
  
  Return rgx.Replace(input)
End Function
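Note that \p{C} also matches tab, CR and LF, because they are control characters (category Cc). If line breaks and tabs should survive, a negated class can exclude them. A sketch under that assumption; the function name is hypothetical and it has not been checked against every Xojo version:

```xojo
// Hypothetical variant: strip Unicode "Other" characters but keep tab, CR and LF.
Public Function RemoveUnprintableKeepLines(input As String) As String
  Var rgx As New RegEx
  // [^\P{C}\t\r\n] means: in category C, and not tab, CR or LF.
  // Xojo string literals have no backslash escapes, so PCRE sees \t, \r, \n as written.
  rgx.SearchPattern = "[^\P{C}\t\r\n]"
  rgx.ReplacementPattern = ""          // delete every match
  rgx.Options.ReplaceAllMatches = True
  Return rgx.Replace(input)
End Function
```

For example, "a" + Chr(0) + "b" + Chr(10) + "c" becomes "ab" + Chr(10) + "c": the NUL is removed, the line feed is kept.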

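Useful character classes. The \p{Print} style names are Java syntax; Xojo's PCRE-based RegEx does not accept them, so write them as POSIX bracket expressions instead (by default these match ASCII characters only):

[[:lower:]]   A lower-case alphabetic character: [a-z]
[[:upper:]]   An upper-case alphabetic character: [A-Z]
[[:alpha:]]   An alphabetic character: [[:lower:][:upper:]]
[[:digit:]]   A decimal digit: [0-9]
[[:alnum:]]   An alphanumeric character: [[:alpha:][:digit:]]
[[:graph:]]   A visible character: [[:alnum:][:punct:]]
[[:print:]]   A printable character: [[:graph:]] plus space (\x20)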

What is the 2019R1.1 equivalent to:

I am still working with API for certain web apps…
Tim

Join(arsMyStringArray, "")