Check if string is Cyrillic

Hi,
I want to check whether a string is Cyrillic, because the user is only allowed to enter ASCII strings. How do I do it?

Thank you very much, and sorry for my English.
Sincerely

Go over all the characters and check that Asc() returns a value <= 127 for each of them.

If speed is needed, IsASCIIStringMBS may be quicker.
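
For reference, the loop described above could look roughly like this. It is only a minimal sketch: the function name IsASCIIOnly is made up for illustration, and it assumes the string has a valid text encoding (for example UTF-8), so that Asc returns the code point of each character.

```xojo
Function IsASCIIOnly(s As String) As Boolean
  // Hypothetical helper, not part of Xojo or any plugin.
  // Returns False as soon as a character above code point 127 is found.
  Dim lastIndex As Integer = Len(s)
  For i As Integer = 1 To lastIndex
    If Asc(Mid(s, i, 1)) > 127 Then
      Return False
    End If
  Next
  // No non-ASCII character found (an empty string also ends up here)
  Return True
End Function
```

Note that this accepts control characters such as tabs and line endings, since they are also <= 127.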

Thank you so much @Christian Schmitz! :)

You can use a regular expression.

\A\p{Cyrillic}+\z

This will match a single word only. If you are looking for sentences that include white space and punctuation, let me know.

Oh, I see, you want to make sure there is no Cyrillic in the string.

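dim rx as new RegEx
// Accept only printable ASCII (space through tilde), from start to end
rx.SearchPattern = "\A[\x20-\x7E]+\z"

if rx.Search( testString ) is nil then
  // Invalid string: it is empty or contains a character outside printable ASCII
end if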

Thanks a lot @Kem Tekinay !

@Kem Tekinay

[quote]\A\p{Cyrillic}+\z
This will allow for single word. If you are looking for sentences that include white space and punctuation, let me know.[/quote]

Thank you, Kem: until now I was detecting languages by checking for specific characters in a sentence, but your regex works great.
I’d appreciate it if the pattern included white space and punctuation.
Since I don’t know whether a pattern that includes space and punctuation varies by language, these are the languages I need to detect: Latin, Greek, Hebrew, Syriac and Bengali.
Thanks again.

If you want to detect a sentence in just one language, this would do it:

\A[\p{Cyrillic}\PL]+\z

This pattern anchors at the start of the text (\A), then looks for a series of characters that are either Cyrillic or not a letter at all until the end of the text. You could substitute Latin, Greek, Hebrew, Syriac and Bengali for Cyrillic.
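
As a rough sketch of that substitution, here is the same check for Greek. It assumes testString holds the text to test, as in the earlier snippet.

```xojo
Dim rx As New RegEx
// Greek letters or any non-letter, from the start to the end of the text
rx.SearchPattern = "\A[\p{Greek}\PL]+\z"

If rx.Search(testString) Is Nil Then
  // testString is empty or contains a letter that is not Greek
Else
  // testString contains only Greek letters and non-letters
End If
```

Keep in mind that text made up only of digits, spaces or punctuation also matches, because \PL accepts anything that is not a letter.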

If you want to find any of these without mixing languages, you could use this pattern:

\A(?:(\PL+)|([\p{Cyrillic}\PL]+)|([\p{Latin}\PL]+)|([\p{Greek}\PL]+)|([\p{Hebrew}\PL]+)|([\p{Syriac}\PL]+)|([\p{Bengali}\PL]+))\z

If there is no match, it means the characters do not make up valid text in one of those languages. If there is a match, you can check the SubExpressionCount, which also counts the whole match as subexpression 0. If it is 2, it means that the text is all non-letters. If it is 3, it means it is Cyrillic. 4 means Latin, 5 Greek, 6 Hebrew, 7 Syriac and 8 Bengali.

If you don’t care which language it is, just make sure there is some match.
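
Put together, the check could be sketched like this. It simply follows the numbering described above and assumes testString holds the text to test; the variable name language and the label strings are only illustrative.

```xojo
Dim rx As New RegEx
rx.SearchPattern = "\A(?:(\PL+)|([\p{Cyrillic}\PL]+)|([\p{Latin}\PL]+)|([\p{Greek}\PL]+)|([\p{Hebrew}\PL]+)|([\p{Syriac}\PL]+)|([\p{Bengali}\PL]+))\z"

Dim match As RegExMatch = rx.Search(testString)
Dim language As String // stays empty when there is no match

If match <> Nil Then
  // The count depends on which group matched, per the explanation above
  Select Case match.SubExpressionCount
  Case 2
    language = "No letters"
  Case 3
    language = "Cyrillic"
  Case 4
    language = "Latin"
  Case 5
    language = "Greek"
  Case 6
    language = "Hebrew"
  Case 7
    language = "Syriac"
  Case 8
    language = "Bengali"
  End Select
End If
```

An empty language at the end means the text is empty, mixes scripts, or uses a script that is not in the pattern.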

Greatly appreciated. Thank you, Kem.