RegEx for Unicode uppercase letters

I need to tell whether a letter is uppercase or not in all languages. I thought I’d write a regular expression for this, but when I use a \p construction, an error is thrown saying that the PCRE library was compiled without Unicode support (this is mentioned on the regular-expressions.info PCRE page: http://www.regular-expressions.info/pcre.html).

So, two questions: 1) Is there any reason why it was compiled without Unicode support, and if not, should I file a feature request? 2) Is there a workaround with the current implementation of RegEx in Xojo?

[quote=19798:@Jonathan Ashwell]I need to tell whether a letter is uppercase or not in all languages. I thought I’d write a regular expression for this, but when I use a \p construction, an error is thrown saying that the PCRE library was compiled without Unicode support (this is mentioned on the regular-expressions.info PCRE page: http://www.regular-expressions.info/pcre.html).

So, two questions: 1) Is there any reason why it was compiled without Unicode support, and if not, should I file a feature request? 2) Is there a workaround with the current implementation of RegEx in Xojo?[/quote]

I don’t know if you have read this page: http://www.regular-expressions.info/realbasic.html

Text from the link above:
“REALbasic uses the UTF-8 version of PCRE. This means that if you want to process non-ASCII data that you’ve retrieved from a file or the network, you’ll need to use REALbasic’s TextConverter class to convert your strings into UTF-8 before passing them to the RegEx object. You’ll also need to use the TextConverter to convert the strings returned by the RegEx class from UTF-8 back into the encoding your application is working with.”
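For readers outside Xojo, the same "convert to UTF-8 before matching, convert back afterwards" step can be sketched in Python. This only illustrates the idea and is not Xojo's TextConverter API; the helper names `to_utf8`/`from_utf8` are hypothetical, and the Windows-1252 source encoding is an assumption.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    # Hypothetical helper: decode bytes using the encoding they were
    # actually written in (assumed Windows-1252 here), then re-encode as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


def from_utf8(data: bytes, target_encoding: str = "cp1252") -> bytes:
    # Hypothetical helper: convert UTF-8 results back into the
    # application's working encoding.
    return data.decode("utf-8").encode(target_encoding)


if __name__ == "__main__":
    raw = "Café".encode("cp1252")   # b'Caf\xe9'
    utf8 = to_utf8(raw)
    print(utf8)                      # b'Caf\xc3\xa9'
    print(from_utf8(utf8) == raw)    # True
```

The key point is that decoding must use the encoding the data really has; guessing wrong silently produces the wrong characters rather than an error.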

Thanks, I have, but that isn’t really relevant here. The characters I want to evaluate are already in UTF-8. In PCRE you can test whether a character is uppercase with a Unicode property such as \p{Lu}, but the PCRE build used by Xojo doesn’t support those properties (I linked to the documentation and quoted the error in my original post). The questions still stand.

I have read your link and compared it with the Xojo documentation. So far I haven’t found any documentation for \p in Xojo (http://documentation.xojo.com/index.php/RegEx), and Xojo’s RegEx supports UTF-8 by default. The PCRE page you linked says: “PCRE implements almost the entire Perl 5.8 regular expression syntax. Only the support for various Unicode properties with \p is incomplete…”

Unless Xojo’s documentation simply omits the \p options, it could be that Unicode support is included automatically in the Xojo RegEx class. What result do you get when you’re not using the \p options?

I am using \p. If you use it in Xojo RegEx, you get the error that PCRE was compiled without Unicode support. Do you understand what I’m trying to achieve? I want to use RegEx to tell whether a character I pass it is uppercase or not, and it should work with all Unicode characters, not just the ASCII subset. Do you know how to do this without using the \p RegEx commands?
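One way to get the same answer without \p is to look up the character's Unicode general category directly: \p{Lu} matches characters whose category is "Lu" (Letter, uppercase). The sketch below uses Python's standard `unicodedata` module to show the check; it is not Xojo code, the helper name is hypothetical, and treating only "Lu" (not titlecase "Lt") as uppercase is an assumption.

```python
import unicodedata


def is_unicode_upper(ch: str) -> bool:
    r"""Return True if ch is one character in general category Lu,
    i.e. the set of characters the PCRE property \p{Lu} matches."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # "Lu" = Letter, uppercase. Titlecase letters ("Lt") are excluded here.
    return unicodedata.category(ch) == "Lu"


if __name__ == "__main__":
    for ch in ["A", "a", "Ä", "Ж", "ж", "中", "1"]:
        print(ch, is_unicode_upper(ch))
```

Because the category comes from the Unicode Character Database, caseless characters such as Chinese ideographs (category "Lo") and digits ("Nd") are correctly reported as not uppercase.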

Sorry I misunderstood your question.

There might be a workaround: convert the character with Uppercase and compare the result with the original.
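The suggested comparison can be sketched in Python as follows, with `str.upper()` standing in for Xojo's Uppercase; the function name is hypothetical. Any character without case, such as a digit or a Chinese character, is unchanged by the conversion and therefore also passes this test.

```python
def looks_upper_by_comparison(ch: str) -> bool:
    # Naive test: a character counts as uppercase if converting it
    # to uppercase leaves it unchanged.
    return ch == ch.upper()


if __name__ == "__main__":
    for ch in ["A", "a", "1", "中"]:
        print(ch, looks_upper_by_comparison(ch))
    # A True
    # a False
    # 1 True
    # 中 True
```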

I don’t think that would work: Asian characters such as Chinese ideographs have no case, so Uppercase leaves them unchanged. If I understand what you’re getting at, that would mean I should consider them uppercase, and of course they’re not. It would be great if a Xojo engineer would comment on this. I’m going to go ahead and file a bug/feature report.
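A common refinement of the comparison, sketched here in Python as an assumption rather than anything Xojo provides, also requires that the character change when it is lowercased. That rules out caseless characters such as Chinese ideographs and digits. It is still not identical to \p{Lu}: for example, it reports the Roman numeral Ⅰ (U+2160, category Nl) as uppercase because that character has a lowercase mapping. The function name is hypothetical.

```python
def is_upper_by_case_mapping(ch: str) -> bool:
    # Uppercase if upper-casing leaves it unchanged AND lower-casing changes it.
    # The second condition excludes characters that have no case at all.
    return ch.upper() == ch and ch.lower() != ch


if __name__ == "__main__":
    for ch in ["A", "a", "1", "中", "\u0416", "\u2160"]:
        print(repr(ch), is_upper_by_case_mapping(ch))
```

If exact agreement with \p{Lu} matters, a Unicode general-category lookup is the more reliable test.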

Xojo’s PCRE library is also very old: according to the documentation it is version 7.7, which is about five years old.

The latest version is 8.33 (http://vcs.pcre.org/viewvc/code/tags/).

Wow. Time for an update! (I’d say it was already time around three years ago.) :stuck_out_tongue:

PCRE is a very different thing now: its Unicode support has been updated several times and many UTF-related patches have gone in.

http://vcs.pcre.org/viewvc/code/tags/pcre-8.33/ChangeLog?revision=1336&view=markup&pathrev=1336

RegExMBS has that support. At the moment it’s your only option, but it has other benefits too: it’s faster and has more options.

Feature request filed

<https://xojo.com/issue/28136>