Note: This site is a bit older, personal views may have changed.

Regex vs Maintainability Graph


Actually, I drew that graph with my mouse, and I wasn't a child. I can't stop laughing, because the messiness and poor aesthetics of the graph represent the kind of ugliness I have seen in software sewn together with regexes, such as PHP Smarty, the MediaWiki parser, and a lot of Perl code.

A different downhill skiing graph can be seen below:

Regexes work great in real-time settings such as a Find dialog box or a temporary command-line hack. They are not meant for serious programming. In fact, regular expressions were the first programming language I learned and really mastered, in my text editor, before I was a programmer. Boy, were they powerful for a newbie. People who still love and live by regexes after years of programming should learn to write real parsers. See The-Art-Of-Parsing. I wrote an eBay parser in maintainable Modern Pascal that goes through HTML piece by piece, without using a single regex, and produces extensive, detailed data like this.

I've also written hundreds of other data miners, and I rarely need a regex if I spend the time writing a maintainable parser. In some settings regexes are useful for quick hacks. But if you want to write maintainable parsers (as my company does for building databases containing millions of mined records that need to be maintained each month), then learn how to parse using tapedeck style thinking instead of one-off `we'll fix this regex later` thinking.

What's interesting is that once you learn how to parse using tapedeck style thinking, you become so addicted to it that you fall in love with it and avoid having an affair with regexes whenever possible. You learn to type out procedures and while loops nearly as fast as you can pump out an ugly regex. Even if it takes more time, you end up being able to extend the parser much more easily, since you simply peek into the tape and stop/pause/fastfwd whenever you need to.
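Here is a rough sketch of what peeking into the tape and fast-forwarding can look like. It is written in Python rather than Pascal, and it is not the eBay parser mentioned earlier: the `Tape` class, its method names, and the `<span class="price">` markup are all made up for illustration.

```python
class Tape:
    """A cursor over a string: the 'tape' that the parser plays through."""

    def __init__(self, text):
        self.text = text
        self.pos = 0  # current position of the playback head

    def at_end(self):
        return self.pos >= len(self.text)

    def peek(self, n=1):
        """Look at the next n characters without moving the head."""
        return self.text[self.pos:self.pos + n]

    def fast_forward(self, marker):
        """Skip to just past the next marker; return False if there is none."""
        i = self.text.find(marker, self.pos)
        if i == -1:
            self.pos = len(self.text)
            return False
        self.pos = i + len(marker)
        return True

    def play_until(self, marker):
        """Return the text up to the next marker and stop just past it."""
        i = self.text.find(marker, self.pos)
        if i == -1:
            i = len(self.text)  # no marker left: play to the end of the tape
        chunk = self.text[self.pos:i]
        self.pos = min(i + len(marker), len(self.text))
        return chunk


def extract_prices(html):
    """Collect the text inside every <span class="price"> (hypothetical markup)."""
    tape = Tape(html)
    prices = []
    while tape.fast_forward('<span class="price">'):
        prices.append(tape.play_until('</span>').strip())
    return prices


page = '<li><span class="price"> $4.50 </span></li><li><span class="price">$12</span></li>'
print(extract_prices(page))  # ['$4.50', '$12']
```

Extending a parser like this usually means adding another `fast_forward` or `play_until` step inside the loop, rather than reworking one long pattern.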

Not that regexes can't be used in some cases. When I first wrote this wiki as a joke, to see if I could get it working in a day, I used a regex, and I was too lazy to improve the wiki afterwards. At the time I was converting some regex units from Delphi over to Free Pascal, so I needed a way to test the regex units. I used the wiki as my test, and it worked okay. I heavily commented what the messy regexes did, and the comments exceeded the amount of code that ran the wiki. This made me laugh too, since properly commented regexes usually end up taking as much screen space as procedures or functions. Most people don't comment their regexes though, and think that regexes save lots of space and time, and of course that there is no need to comment code since the code explains it all. Yeah, right.
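To make the screen-space point concrete, here is a small Python comparison. It is not the code that ran this wiki: the `[[Page|label]]` link syntax and both function names are assumptions chosen for illustration. The first version is a regex commented line by line using `re.VERBOSE`; the second is a plain loop doing the same job.

```python
import re

# A commented regex for a hypothetical [[Page|label]] wiki link.
WIKI_LINK = re.compile(r"""
    \[\[            # opening double bracket
    ([^\]|]+)       # group 1: page name, anything except ']' or '|'
    (?:             # optional label part
        \|          #   the pipe separator
        ([^\]]*)    #   group 2: label text, anything except ']'
    )?
    \]\]            # closing double bracket
""", re.VERBOSE)


def find_links_regex(text):
    """Return (page, label) pairs; the label defaults to the page name."""
    links = []
    for m in WIKI_LINK.finditer(text):
        page, label = m.group(1), m.group(2)
        links.append((page, page if label is None else label))
    return links


def find_links_loop(text):
    """Same result on well-formed links, written as a plain scanning loop."""
    links = []
    pos = 0
    while True:
        start = text.find("[[", pos)
        if start == -1:
            break
        end = text.find("]]", start + 2)
        if end == -1:
            break
        inner = text[start + 2:end]
        page, sep, label = inner.partition("|")
        links.append((page, label if sep else page))
        pos = end + 2
    return links


sample = "See [[Main Page]] and [[Directory|the directory]]."
print(find_links_regex(sample))
print(find_links_loop(sample))
```

Once the regex is commented, the two versions take up roughly the same number of lines. They agree on well-formed links like the sample, but can differ on odd input such as an empty `[[]]`.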

On the surface, regexes appear to be productive and powerful, just as white sugar tastes good on the tongue. If white sugar tastes good at first, and your body is telling you it is good, and we should listen to our bodies, should we eat lots of plain white sugar? Eat lots of plain sugar and wait a while, and you will feel like puking, even though that initial taste felt good on the tongue. Similarly, I'm afraid you will feel like puking if you inject too many regular expressions into your source code. The first one or two aren't that harmful, which is why they are more useful in real-time situations such as a Find dialog box, where only one is used at a time.

Use regexes in moderation: don't get drunk on them, and don't eat them as if they were candy. Remember, you are no longer a child, and you cannot have candy every day. Candy tastes good when you are a newbie in the world, but you soon learn that it isn't the long-term solution to a healthy body or healthy programs.
