BenBE's humble thoughts Thoughts the world doesn't need yet …

22.05.2010

Regular Expression Highlighting

Filed under: GeSHi, posted by BenBE @ 17:43:15

I just found a feature request asking nicely for regular expression highlighting support in GeSHi. And since I can’t turn down any request that by its nature leads to obscure code: here it is 😉

First off, let’s start with some basics: since only a few characters have a special meaning and can be used on their own, they were put into different groups. And since this post is about a language file, let’s do this by example 😉

I'm a goo?d regexp\.*

So, nothing special here. It’s just some basic symbol groups that give classes of characters. Oh, classes: that’s easy too. I first thought of doing them as hard-quoted strings, but then I’d have lost support for escape sequences. And with normal strings the ending won’t work. So character classes are simply marked by their symbols [ and ]:

foo|ba[rz]
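
To make the idea of symbol groups a bit more concrete, here is a minimal Python sketch of how such a grouping could work. This is not GeSHi’s language file; the group names and the `tokenize` helper are made up for illustration, and only the handful of symbols from the two examples above are covered.

```python
import re

# Hypothetical token groups, checked in this order.
TOKEN_SPEC = [
    ("ESCAPE", r"\\."),            # backslash plus the next character, e.g. \.
    ("BRACKET", r"[\[\]]"),        # character class delimiters [ and ]
    ("QUANTIFIER", r"[?*+]"),      # repetition symbols
    ("ALTERNATION", r"\|"),        # the | symbol
    ("ANY", r"\."),                # unescaped dot
    ("TEXT", r"[^\\\[\]?*+|.]+"),  # everything else, taken literally
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(regexp):
    """Return (group, text) pairs for a simple regular expression."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(regexp)]


if __name__ == "__main__":
    for group, text in tokenize("foo|ba[rz]"):
        print(f"{group:12} {text!r}")
```

For foo|ba[rz] this yields text, alternation, text, bracket, text, bracket. The brackets get their own group while the class content stays ordinary text, which is roughly the approach described above.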

So, that wasn’t hard either. But now comes some more complex work: getting all the different bracketed expressions right. As I basically have the same issue here that I already had with character classes, I don’t match the (), but the identifier within the brackets:

(?=42) (?!23)
(?<=5) (?<!1337)
(?>23)

So no surprises here either, as long as you know that it is the identifier within the brackets that gets marked.
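
As a rough illustration of matching the identifier instead of the parentheses, here is a small Python sketch. The `GROUP_ID` pattern and the `group_identifiers` helper are my own invention for this post, not what the language file does internally, and the sketch ignores escaped parentheses such as \(.

```python
import re

# Match the identifier that directly follows an opening parenthesis.
# The lookbehind keeps the "(" itself out of the match; longer
# alternatives come first so that "?<=" is not cut short to "?<".
GROUP_ID = re.compile(r"(?<=\()\?(?:<=|<!|=|!|>)")


def group_identifiers(regexp):
    """Return the lookaround/atomic-group identifiers found in regexp."""
    return GROUP_ID.findall(regexp)


if __name__ == "__main__":
    for sample in ("(?=42) (?!23)", "(?<=5) (?<!1337)", "(?>23)"):
        print(sample, "->", group_identifiers(sample))
```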

After this basic work, let’s move on to modifiers and recursion (their syntax looks identical):

(?i:saturday|sunday)
(?:(?i)saturday|sunday)
(?R)
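
In case the scoped modifier looks unfamiliar, here is a short sketch of what the first form does, using Python’s `re` module (which accepts `(?i:...)` since Python 3.6). The `WEEKEND` name and the trailing " morning" are only there for the demonstration. Python’s built-in `re` has no `(?R)`, so recursion is not shown.

```python
import re

# The (?i:...) modifier makes only the group case-insensitive;
# the text after the group is still matched case-sensitively.
WEEKEND = re.compile(r"(?i:saturday|sunday) morning")

if __name__ == "__main__":
    for text in ("SUNDAY morning", "Saturday morning", "SUNDAY MORNING"):
        print(text, "->", bool(WEEKEND.fullmatch(text)))
```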

As this already looks quite interesting, I’d suggest continuing with all those fancy backslash-style escapes:

\007 \x12 \u1234 \U12345678 \p{^Lu} \P{Lu} \cz

Here the color gives a hint about the kind of character you get from this expression, or at least about the way that character is specified.
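
A sketch of how escapes could be sorted by the way the character is specified, written in Python. The kind names and the `classify_escapes` helper are assumptions for illustration; the actual language file may split them differently. Note that this naive version would also call a back reference like \1 an octal escape and does not handle an escaped backslash.

```python
import re

# Hypothetical escape kinds, one named group per kind.
ESCAPE_KINDS = [
    ("octal", r"\\[0-7]{1,3}"),
    ("hex", r"\\x[0-9A-Fa-f]{1,2}"),
    ("unicode", r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"),
    ("property", r"\\[pP]\{\^?\w+\}"),
    ("control", r"\\c[A-Za-z]"),
]
ESCAPE_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in ESCAPE_KINDS)
)


def classify_escapes(regexp):
    """Return (kind, text) pairs for every escape recognised in regexp."""
    return [(m.lastgroup, m.group()) for m in ESCAPE_RE.finditer(regexp)]


if __name__ == "__main__":
    sample = r"\007 \x12 \u1234 \U12345678 \p{^Lu} \P{Lu} \cz"
    for kind, text in classify_escapes(sample):
        print(f"{kind:9} {text}")
```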

Not only characters are supported, though, but other escape sequences as well:

\A \Z \n

So up to this point there’s hopefully nothing you really miss, except for back references, which follow now:

(?P<TEST>foo) (?1) (?P>foo) (?P=foo)
\1 \g1 \g{1} \g{-1} \g{TEST} \k<TEST> \k'TEST' \k{TEST}
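
For readers who rarely use back references, a quick sketch of two of the forms above in Python, whose `re` module understands `(?P<name>...)`, `(?P=name)` and `\1`. The other spellings (`\g{1}`, `\k<TEST>`, `(?P>foo)` and so on) belong to other regex flavours and are not accepted in Python patterns, so they are left out. The variable names are made up for the example.

```python
import re

# Named back reference: (?P=word) must repeat whatever (?P<word>...) matched.
DOUBLED_NAMED = re.compile(r"\b(?P<word>\w+) (?P=word)\b")

# Numbered back reference: \1 refers to the first capturing group.
DOUBLED_NUMBERED = re.compile(r"\b(\w+) \1\b")

if __name__ == "__main__":
    text = "it was was fine"
    print(DOUBLED_NAMED.search(text).group("word"))
    print(DOUBLED_NUMBERED.search(text).group(1))
```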

And in case you are one of those programmers who need comments in their regexps: Here they are!

(?#Test comments)
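
A tiny sketch showing that such a comment has no effect on what is matched, here with Python’s `re` module, which supports the `(?#...)` form. The `YEAR_MONTH` pattern is just an example of mine.

```python
import re

# The (?#...) parts are ignored by the regex engine; they only document the pattern.
YEAR_MONTH = re.compile(r"(?#year)\d{4}-(?#month)\d{2}")

if __name__ == "__main__":
    print(bool(YEAR_MONTH.fullmatch("2010-05")))
```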

I hope I didn’t miss anything here. And to finish, some more examples combining all of the above:

(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9]{2}

\{(?:\d+,?|\d*,\d+)\}

\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b

^(?<client>\S+)\s+(?<auth>\S+\s+\S+)\s+\[(?<datetime>[^]]+)\]\s+"(?:GET|POST|HEAD) 
(?<file>[^ ?"]+)\??(?<parameters>[^ ?"]+)? HTTP/[0-9.]+"\s+404\s+
(?<size>[-0-9]+)\s+"(?<referrer>[^"]*)"\s+"(?<useragent>[^"]*)"$

(?(?=[^a-z]*[a-z])\d{2}-[a-z]{3}-\d{2}|\d{2}-\d{2}-\d{2})
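
If you want to see what some of these actually match, here is a small Python sketch running the first three examples. The constant names are mine. The log line pattern and the conditional are left out, because they use syntax (`(?<name>...)` and a lookahead as condition) that Python’s `re` module does not accept.

```python
import re

# Month-first dates with -, space, / or . as separator, years 1900 to 2099.
DATE = re.compile(
    r"(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9]{2}"
)

# Brace quantifiers such as {2}, {2,}, {,3} and {2,3}.
QUANTIFIER = re.compile(r"\{(?:\d+,?|\d*,\d+)\}")

# Dotted quads; note that this does not check the 0 to 255 range.
IP = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")

if __name__ == "__main__":
    for text in ("05/22/2010", "22.05.2010"):
        print(text, "->", bool(DATE.fullmatch(text)))
    for text in ("{2}", "{2,}", "{,3}", "{2,3}", "{}"):
        print(text, "->", bool(QUANTIFIER.fullmatch(text)))
    print(IP.findall("from 192.168.0.1 to 10.0.0.255"))
```

The date of this post written as 22.05.2010 does not match, since the pattern expects the month first.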

I’m still open for suggestions on the coloring, as that part was the most difficult of all.
