I have a “poor man’s” syntax highlighter… it’s kind of “brute force”, so it is good only for a few thousand characters at most (which is fine for this project).
What it does is apply multiple RegEx statements to set colors,
and it does so in “priority” order.
As you can see, there is no “logic”: each RegEx might override the results from a previous one (especially #5).
But here is the problem… Comments are // for line and /* */ for block,
but they ONLY count if they are NOT inside single- or double-quoted strings.
Can the RegEx in #6 and #7 be expanded to check that condition?
This pattern shows a technique as applied to #6. In short, it first attempts to match a single-quoted or double-quoted string. If it can match that, it forces the engine to move past it with (*SKIP) and fail the match with (*FAIL). The engine resumes matching at that point, effectively skipping all such quoted strings until it finds the comment.
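For anyone who wants to try the idea outside Xojo, here is a rough Python sketch. Python's built-in `re` module does not support the `(*SKIP)(*FAIL)` verbs, so this swaps in a related trick rather than the technique described above: the quoted string is matched first in the alternation and thrown away, and the comment is captured in its own group. The function name `find_line_comments` and the sample text are made up for illustration.

```python
import re

# Branch 1 consumes a quoted string so the scan moves past it.
# Branch 2 matches a // comment running to the end of the line.
# Group 1 = opening quote character, group 2 = the comment (set only by branch 2).
LINE_COMMENT = re.compile(r"""('|")(?:(?!\1).)*\1|(//.*)$""", re.MULTILINE)

def find_line_comments(source):
    """Return (start, end) spans of // comments that lie outside quoted strings."""
    return [m.span(2) for m in LINE_COMMENT.finditer(source)
            if m.group(2) is not None]

sample = 'x = "http://example.com" // real comment\ny = \'a // b\'\n// whole line'
for start, end in find_line_comments(sample):
    print(repr(sample[start:end]))
```

Like the pattern it imitates, this sketch does not treat a backslash-escaped quote inside a string as part of the string, so it only fits languages without that kind of escaping.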
[quote=329874:@Kem Tekinay]This pattern shows a technique as applied to #6. In short, it first attempts to match a single-quoted or double-quoted string. If it can match that, it forces the engine to move past it with (*SKIP) and fail the match with (*FAIL). The engine resumes matching at that point, effectively skipping all such quoted strings until it finds the comment.
('|")((?!\g1).)*\g1(*SKIP)(*FAIL)|(/{2}(.*))$
[/quote]
so for block it would be
Yes, I think you are escaping more than you need. It generally doesn’t matter as long as the character you’re escaping doesn’t have a meaning to the regex engine, or the regex engine in question specifically disallows it. (PCRE as compiled into Xojo doesn’t care.) You only need to escape these characters (off the top of my head, so I may have missed something):
I am not sure if this is considered improper on this forum (because it is more a Regex question than an answer to Dave’s question), but I have a question about the Tekinay Regex string.
Why isn’t this (seemingly simpler) formulation satisfactory?
As a general rule, you should avoid the .* structure in favor of what you’re truly trying to match. In this case, for example, that pattern should work for a single-line quote (which may be appropriate here) but not for a multi-line one. You’d have to turn on the right regex switch so . could match end-of-line, or use some other technique like [\s\S]*.
To sum up, you’re right, that would probably work just fine since this is a fairly simple pattern, but as a practice, you should avoid .* and its implications. In this case, we want to match anything that’s not the opening quote character, so that’s how I wrote it.
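To make the difference concrete, here is a small Python sketch (Python's `re` rather than Xojo's PCRE, and the sample strings are invented) comparing .* with the alternatives mentioned above. The `re.DOTALL` flag is Python's name for the switch that lets . match end-of-line.

```python
import re

multi = 'say "one\ntwo" now'

# By default . stops at a newline, so a quote spanning two lines is not found.
dot_default = re.search(r'".*"', multi)
# With DOTALL, or with [\s\S] in place of the dot, the newline is matched too.
dot_all = re.search(r'".*"', multi, re.DOTALL).group()
any_char = re.search(r'"[\s\S]*"', multi).group()

two = '"a" and "b"'

# Greedy .* runs on to the LAST quote on the line.
greedy = re.search(r'".*"', two).group()
# Matching "anything that is not the quote character" stops at the first closing quote.
negated = re.search(r'"[^"]*"', two).group()
# Same idea, but working for either quote character via a backreference.
tempered = re.search(r"""('|")(?:(?!\1).)*\1""", two).group()

print(dot_default, repr(dot_all), repr(any_char))
print(repr(greedy), repr(negated), repr(tempered))
```

The negated forms say exactly what is wanted, which is why they stay correct when a second quoted string appears on the same line.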