I know a Regular Expression for “Really Bloody Complicated”

I recently discovered Regular Expressions. I get the feeling that these are something that every programmer is familiar with, but I had never known about them until I started mucking around with field validation for work’s website.

We never covered this when I was at Uni. I wonder if other programming courses do? They seem kind of mathsy, which I don’t like, but they also seem pretty useful, which makes me want to learn more about how to use them.

Anyway, today I want to make one to validate a code that customers will enter in our online registration page. There is no way to check if the code they enter is correct on the site (for several reasons) so I want to validate it as much as possible to lower the chance of error.

At the moment I am just using \w{4} to validate the field, which helps filter out the symbols and crap, but some invalid characters can still get through. The registration code they are entering only contains the characters “0123456789ABCDEFGHJKLMNPQRTUVWXY”. The letters I, O, S and Z are missing because A) I needed only 32 characters and B) those are the ones that might get confused with some of the numbers.
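To see the gap concretely, here is a minimal sketch in Python (not the ASP the site runs on) that compares the \w{4} check against the real 32-character alphabet. The helper names loose_ok and strict_ok are made up for illustration, and re.fullmatch stands in for the validator, on the assumption that it tests the whole field.

```python
import re

# The 32 valid characters quoted in the post (no I, O, S or Z).
ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUVWXY"

def loose_ok(code: str) -> bool:
    """The current check: any four 'word' characters."""
    return re.fullmatch(r"\w{4}", code) is not None

def strict_ok(code: str) -> bool:
    """Reference check without regex: four characters, all from ALPHABET."""
    return len(code) == 4 and all(ch in ALPHABET for ch in code)

# Compare the two checks on a good code, a symbol, the four
# excluded letters, and an underscore (which \w also accepts).
for sample in ["AB12", "A!12", "IOSZ", "A_12"]:
    print(f"{sample!r}: loose={loose_ok(sample)} strict={strict_ok(sample)}")
```

The symbol is rejected either way, but IOSZ and the underscore both get past \w{4} even though neither is a valid code.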

I guess the first thing to do will be to get the lowercase letters out of the mix. Actually, I just realised I can leave them in because I do a ToUpper on the entire string before I use it.

I wonder, is it easier to list the valid characters like [A-Za-z0-9] or to list the characters I don’t want to find? My original idea was to put in A-H, J-N etc., but I think the second method will probably work better. Just as soon as I figure out how you do that!
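For comparison, both ideas can be written as a single-character class. The sketch below is in Python rather than ASP, and the names allow_one, deny_one and candidates are only illustrative. It runs each class over the digits, the uppercase letters and a few symbols to see what each one lets through.

```python
import re
import string

# The 32 valid characters quoted in the post.
ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUVWXY"

allow_one = re.compile(r"[0-9A-HJ-NP-RT-Y]")  # list what is valid
deny_one = re.compile(r"[^IOSZ]")             # list what is not wanted

# Digits, uppercase letters, and a handful of symbols to try.
candidates = string.digits + string.ascii_uppercase + "!_- "

allowed = "".join(ch for ch in candidates if allow_one.fullmatch(ch))
not_denied = "".join(ch for ch in candidates if deny_one.fullmatch(ch))

print(allowed)     # what the "valid list" class accepts
print(not_denied)  # what the "not these" class accepts
```

The ranges A-H, J-N, P-R and T-Y cover exactly the 32 characters, while [^IOSZ] on its own also accepts the symbols, because "not I, O, S or Z" says nothing about being alphanumeric.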

Looks like a ^ at the start of a [] means “not”, so I’m trying to tell it I want 4 alphanumeric characters, but not an ‘I’. I mucked around with The Regulator for a while, but it seems a little screwy and I got some weird results out of it, so I decided to go to http://www.regexlib.com and use their online expression tester instead.

My first test is with the expression ^[\w][^I]{4}$ and it seems to work! Time to try it in an actual ASP application and see if it actually works there….

Ok, for some reason it likes 5 characters instead of 4, and I have no idea why. Perhaps it is counting from zero. At any rate, I changed the 4 to a 3, and all is good. Next step: have it look for the other illegal letters, not just the ‘I’. I have no idea how to put multiple characters in, so I’ll try just putting them all in there and see what happens.
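A likely explanation for the five characters, rather than counting from zero: [\w] matches one character on its own, and [^I]{4} then asks for four more, so the whole expression wants 1 + 4 = 5. A quick Python sketch shows the count (Python’s re module is standing in for the ASP validator here, on the assumption that both treat this pattern the same way; the names five and four are just labels):

```python
import re

five = re.compile(r"^[\w][^I]{4}$")  # the first test expression
four = re.compile(r"^[\w][^I]{3}$")  # after changing the 4 to a 3

# [\w] takes one character, the quantified class takes the rest.
for sample in ["ABCD", "ABCDE"]:
    print(sample, bool(five.match(sample)), bool(four.match(sample)))
```

So the {4} only applies to the [^I] group directly in front of it, not to the [\w] before that.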

So I try ^[\w][^IOZS]{3}$ and voila! It filters out all the characters I don’t want in there.

But wait, there is a problem….

If an I, O, S or Z is the first character, it gets through. I guess it is looking for the good characters first. The same problem occurs with the \w check: if there is a symbol in one of the last three slots, it also gets through. Guess I’m not so smart after all, hey? So what to do? I will have to leave this for now as it is lunch time, but I plan to have more regex adventures!
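What seems to be going on is that each bracket group checks its own positions: [\w] only looks at the first character and [^IOZS]{3} only looks at the last three, so neither rule covers all four. One possible way out, sketched in Python rather than ASP and not necessarily what ends up on the site, is a single class listing the valid characters, repeated four times. The name is_valid is hypothetical, and upper() mirrors the ToUpper mentioned earlier.

```python
import re

# Two separate groups: \w guards slot 1, [^IOZS] guards slots 2 to 4.
broken = re.compile(r"^[\w][^IOZS]{3}$")

# One class applied to every slot: 0-9 plus A-Z without I, O, S, Z.
fixed = re.compile(r"[0-9A-HJ-NP-RT-Y]{4}")

def is_valid(code: str) -> bool:
    """Uppercase first, then require exactly four valid characters."""
    return fixed.fullmatch(code.upper()) is not None

for sample in ["AB12", "ab12", "IAB1", "AB1!"]:
    print(sample, bool(broken.match(sample)), is_valid(sample))
```

Here IAB1 and AB1! both slip past the two-group expression but fail the single-class one, while ab12 passes once it has been uppercased.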
