{"id":194,"date":"2014-07-23T00:45:27","date_gmt":"2014-07-23T00:45:27","guid":{"rendered":"http:\/\/www.minotaurdesign.com\/blog\/?p=194"},"modified":"2014-07-23T00:46:25","modified_gmt":"2014-07-23T00:46:25","slug":"notes-codestock-2014%e2%80%b2s-regular-expressions-presentation","status":"publish","type":"post","link":"https:\/\/www.minotaurdesign.com\/blog\/2014\/07\/23\/notes-codestock-2014%e2%80%b2s-regular-expressions-presentation\/","title":{"rendered":"Notes from CodeStock 2014\u2032s \u201cRegular Expressions\u201d presentation"},"content":{"rendered":"<p>Brian Friesen&#8217;s talk on Regular Expressions was probably my favorite of the conference.  You gotta admire a guy who builds a regular expression engine to properly demo and train folks up on the &#8216;devil&#8217;s language&#8217; (as I and others I&#8217;ve known have called it)&#8230;  Folks that hate regex and those that live and breathe the stuff all got something from the session.  Good stuff.<\/p>\n<h3>Regular Expressions &#8211; now you have two problems<\/h3>\n<p>Brian Friesen<br \/>\n<a href=\"https:\/\/twitter.com\/@brianfriesen\" target=\"_blank\">@brianfriesen<\/a><\/p>\n<p>Works at Quicken Loans (side note, I used QL last year &#8211; hands down the best UX I&#8217;ve ever seen in a mortgage product\/service)<\/p>\n<p><a href=\"https:\/\/github.com\/QuickenLoans\/RegExpose\" target=\"_blank\">github.com\/QuickenLoans\/RegExpose<\/a><\/p>\n<p><a href=\"http:\/\/Regexper.com\" target=\"_blank\">Regexper.com<\/a><\/p>\n<p><a href=\"http:\/\/Regular-expressions.info\" target=\"_blank\">Regular-expressions.info<\/a><\/p>\n<p><strong>The Bible:<\/strong><br \/>\n<em>Mastering Regular Expressions<\/em> by Jeffrey Friedl<br \/>\n&#8211; you can read just the first 2\/5 of the book and have enough<br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;<\/p>\n<p><strong>1. 
RegEx are hierarchical<\/strong><\/p>\n<p>Root node = the whole expression<\/p>\n<p>ABC would tree out like this:<\/p>\n<p>ABC &#8211; root<br \/>\nA &#8211; character literal<br \/>\nB &#8211; character literal<br \/>\nC &#8211; character literal<\/p>\n<p><strong>2. RegEx do their thing sequentially<\/strong><\/p>\n<p>A RegEx will match if each of its child nodes matches in sequence<br \/>\nAfter a match, the RegEx engine will continue trying to find further matches until it has covered the entire string.<\/p>\n<p><strong>RegEx are by default case-sensitive<\/strong><\/p>\n<p><strong>Character classes are surrounded by square brackets<\/strong><br \/>\n&#8211; for a range, use a dash, e.g. [a-f]<br \/>\n&#8211; if you need a literal dash in the match, put it at the beginning of your set within square brackets<br \/>\n&#8211; you can also include specific characters or numbers to match against<br \/>\n&#8211; can include a-z and A-Z, or you can pass an additional param to ignore case<br \/>\n&#8211; a negated character class = add a &#8220;^&#8221; caret character at the start of the set, e.g. [^a-f0-9] matches any character that is NOT a-f or 0-9<\/p>\n<p><strong>Shorthand matches<\/strong><br \/>\n&#8211; \\d is the same as [0-9]<br \/>\n&#8211; \\D is the same as [^0-9]<br \/>\n&#8211; \\s matches whitespace chars<br \/>\n&#8211; \\w is the same as any word character, meaning [A-Za-z0-9_]<br \/>\n&#8211; . 
matches any character, depending on options (by default, any character except the newline character)<\/p>\n<p><strong>Alternation<\/strong><br \/>\n(it&#8217;s a pipe dream)<\/p>\n<p>| character means &#8220;or&#8221;<br \/>\n&#8211; lions|tigers|bears: &#8216;lions&#8217; would match, but the regex engine doesn&#8217;t know yet whether the overall match will succeed, so it saves state (a breadcrumb) and can come back to check the other choices.<br \/>\n&#8211; if a match hits on the last option in a set of choices, no state will be saved, no breadcrumbs etc.<\/p>\n<p><strong>Quantifiers (quantifiers are always AFTER)<\/strong><br \/>\n(Because sometimes, quantity trumps quality)<\/p>\n<p><strong>Greedy Quantifiers<\/strong> (greedy means the quantifier always wants more, i.e. it will keep going)<br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\n&#8211; ? = optional<br \/>\n&#8211; * = will match zero, will match many<br \/>\n&#8211; PO*P would match &#8220;PP&#8221; as well as &#8220;POOP&#8221;<br \/>\n&#8211; + = must match at least once to succeed<br \/>\n&#8211; NO+! would match &#8220;NO!&#8221; as well as &#8220;NOOOOOOOOOOO!&#8221;<br \/>\n&#8211; {} = match a specific number of things<br \/>\n&#8211; \\d{3} = this means match exactly 3 digits<br \/>\n&#8211; \\d{3,15} = this means match at least 3, but up to 15<br \/>\n&#8211; \\d{3,} = this means match at least 3 with no high end limit<\/p>\n<p>Ultimate greedy quantifier: .*<\/p>\n<p>Caution against using &#8220;.*&#8221; &#8211; it can grab too much, since the greedy &#8216;any character, as many as possible&#8217; matching can run past the more specific matches that come after the .*<\/p>\n<p><strong>Lazy Quantifiers <\/strong>&#8211; will only match as much as is needed, without going overboard<br \/>\nOnce a match is made it will pass control to the next match parameter until it&#8217;s needed again&#8230;<br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;-<br \/>\n.*? = Lazy<br \/>\n{3,5}? 
= also Lazy (in this case, once 3 digits are matched, the engine tries the next node in the regexp before taking any more)<\/p>\n<p>&#8211; ab.*?cd would match &#8220;abc12345cd&#8221;: before the lazy quantifier takes each additional character, the engine first tries the next node (the &#8216;c&#8217;), and only when that attempt fails does .*? consume one more character and try again<\/p>\n<p><strong>More alternation<\/strong><br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;-<br \/>\n(?:white|dog|brick) house<br \/>\n&#8211; matches &#8220;dog house&#8221;<\/p>\n<p><strong>Quantifiers + grouping<\/strong><br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\n(?:NaN)+<br \/>\n&#8211; matches &#8220;NaNNaNNaNNaNNaN&#8221;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Brian Friesen&#8217;s talk on Regular Expressions was probably my favorite of the conference. You gotta admire a guy who builds a regular expression engine to properly demo and train folks up on the &#8216;devil&#8217;s language&#8217; (as I and others I&#8217;ve known have called it)&#8230; Folks that hate regex and those that live and breathe the &hellip; <a href=\"https:\/\/www.minotaurdesign.com\/blog\/2014\/07\/23\/notes-codestock-2014%e2%80%b2s-regular-expressions-presentation\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Notes from CodeStock 2014\u2032s \u201cRegular Expressions\u201d 
presentation<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,17],"tags":[48,45,46,47,43,44],"_links":{"self":[{"href":"https:\/\/www.minotaurdesign.com\/blog\/wp-json\/wp\/v2\/posts\/194"}],"collection":[{"href":"https:\/\/www.minotaurdesign.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.minotaurdesign.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.minotaurdesign.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.minotaurdesign.com\/blog\/wp-json\/wp\/v2\/comments?post=194"}],"version-history":[{"count":2,"href":"https:\/\/www.minotaurdesign.com\/blog\/wp-json\/wp\/v2\/posts\/194\/revisions"}],"predecessor-version":[{"id":196,"href":"https:\/\/www.minotaurdesign.com\/blog\/wp-json\/wp\/v2\/posts\/194\/revisions\/196"}],"wp:attachment":[{"href":"https:\/\/www.minotaurdesign.com\/blog\/wp-json\/wp\/v2\/media?parent=194"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.minotaurdesign.com\/blog\/wp-json\/wp\/v2\/categories?post=194"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.minotaurdesign.com\/blog\/wp-json\/wp\/v2\/tags?post=194"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}