{"id":669,"date":"2010-05-22T17:43:15","date_gmt":"2010-05-22T15:43:15","guid":{"rendered":"http:\/\/blog.benny-baumann.de\/?p=669"},"modified":"2010-06-03T13:39:10","modified_gmt":"2010-06-03T11:39:10","slug":"regular-expression-highlighting","status":"publish","type":"post","link":"https:\/\/blog.benny-baumann.de\/?p=669","title":{"rendered":"Regular Expression Highlighting"},"content":{"rendered":"<p>I just found a <a href=\"https:\/\/sourceforge.net\/tracker\/index.php?func=detail&#038;aid=2995682&#038;group_id=114997&#038;atid=670234\">feature request asking nicely for regular expression highlighting support in GeSHi<\/a>. And since I can&#8217;t turn down any request that by its nature leads to obscure code: here it is \ud83d\ude09<!--more--><\/p>\n<p>First off, let&#8217;s start with some basics: since only a few characters have a special meaning and can be used on their own, they were put into different groups. And since this post is about a language file, let&#8217;s do this by example \ud83d\ude09<\/p>\n<pre lang=\"pcre\">I'm a goo?d regexp\\.*<\/pre>\n<p>So, nothing special here. It&#8217;s just some basic symbol groups that give classes of characters. Oh, classes: those are easy too. I first thought of handling them as hard-quoted strings, but then I&#8217;d have lost support for the escape sequences. And with normal strings the ending wouldn&#8217;t work. So character classes are simply marked by their symbols [ and ]:<\/p>\n<pre lang=\"pcre\">foo|ba[rz]<\/pre>\n<p>So, that wasn&#8217;t hard either. But now for some more complex work: getting all the different bracketed expressions right. 
As I basically have the same issue here that I already had with character classes, I don&#8217;t match the () themselves, but the identifier within the brackets:<\/p>\n<pre lang=\"pcre\" escaped=\"true\">(?=42) (?!23)\r\n(?&lt;=5) (?&lt;!1337)\r\n(?>23)<\/pre>\n<p>So no surprises here either, once you know that it is the part within the brackets that actually gets marked.<\/p>\n<p>After this basic work, let&#8217;s move on to modifiers and recursion (their syntax looks identical):<\/p>\n<pre lang=\"pcre\">(?i:saturday|sunday)\r\n(?:(?i)saturday|sunday)\r\n(?R)<\/pre>\n<p>As this already looks quite interesting, I&#8217;d suggest continuing with all those fancy backslash-style escapes:<\/p>\n<pre lang=\"pcre\">\\007 \\x12 \\u1234 \\U12345678 \\p{^Lu} \\P{Lu} \\cz<\/pre>\n<p>Here the color gives a hint about the kind of character you get from the expression, or at least about the way that character is specified.<\/p>\n<p>Not only characters are supported, but the other sequences as well:<\/p>\n<pre lang=\"pcre\">\\A \\Z \\n<\/pre>\n<p>So up to this point there&#8217;s hopefully nothing you really miss, except for the back references, which follow now:<\/p>\n<pre lang=\"pcre\" escaped=\"true\">(?P&lt;TEST&gt;foo) (?1) (?P>foo) (?P=foo)\r\n\\1 \\g1 \\g{1} \\g{-1} \\g{TEST} \\k&lt;TEST&gt; \\k'TEST' \\k{TEST}<\/pre>\n<p>And in case you are one of those programmers who need comments in their regexps: here they are!<\/p>\n<pre lang=\"pcre\">(?#Test comments)<\/pre>\n<p>I hope I didn&#8217;t miss anything here. And to finish, some more examples, combining all of the above:<\/p>\n<pre lang=\"pcre\" escaped=\"true\">(0[1-9]|1[012])[- \/.](0[1-9]|[12][0-9]|3[01])[- \/.](19|20)[0-9]{2}\r\n\r\n\\{(?:\\d+,?|\\d*,\\d+)\\}\r\n\r\n\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b\r\n\r\n^(?&lt;client>\\S+)\\s+(?&lt;auth>\\S+\\s+\\S+)\\s+\\[(?&lt;datetime>[^]]+)\\]\\s+\"(?:GET|POST|HEAD) \r\n(?&lt;file>[^ ?\"]+)\\??(?&lt;parameters>[^ ?\"]+)? 
HTTP\/[0-9.]+\"\\s+404\\s+\r\n(?&lt;size>[-0-9]+)\\s+\"(?&lt;referrer>[^\"]*)\"\\s+\"(?&lt;useragent>[^\"]*)\"$\r\n\r\n(?(?=[^a-z]*[a-z])\\d{2}-[a-z]{3}-\\d{2}|\\d{2}-\\d{2}-\\d{2})<\/pre>\n<p>I&#8217;m still open for suggestions on the coloring, as that part was the most difficult of all.<\/p>","protected":false},"excerpt":{"rendered":"<p>I just found a feature request asking nicely for regular expression highlighting support in GeSHi. And since I can&#8217;t turn down any request that by its nature leads to obscure code: here it is \ud83d\ude09<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[3],"tags":[98,345,126,215],"class_list":["post-669","post","type-post","status-publish","format-standard","hentry","category-geshi","tag-developement","tag-geshi","tag-language-files","tag-pcre"],"_links":{"self":[{"href":"https:\/\/blog.benny-baumann.de\/index.php?rest_route=\/wp\/v2\/posts\/669","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.benny-baumann.de\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.benny-baumann.de\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.benny-baumann.de\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.benny-baumann.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=669"}],"version-history":[{"count":4,"href":"https:\/
\/blog.benny-baumann.de\/index.php?rest_route=\/wp\/v2\/posts\/669\/revisions"}],"predecessor-version":[{"id":684,"href":"https:\/\/blog.benny-baumann.de\/index.php?rest_route=\/wp\/v2\/posts\/669\/revisions\/684"}],"wp:attachment":[{"href":"https:\/\/blog.benny-baumann.de\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=669"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.benny-baumann.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=669"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.benny-baumann.de\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=669"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}