Why Should More Programmers Know Regular Expressions?

Regular expressions are everywhere, from software development to mobile applications, and being familiar with them can be of great use if you work with programming or data analysis.

Credit: Background image by Lorenzo Cafaro from Pixabay.

Before we actually start, if you don’t know exactly what regular expressions are, here is a quick intro (from Wikipedia):

If you work with websites, apps, or any kind of software development on a daily basis, you may one day need to validate a form field or find a specific piece of text inside some content (to replace it with something else or just to use it in the logic of your application). And when that moment comes, there is a chance that you will face something like this:

^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$

The example above is a classic regex pattern for matching email addresses 😐.

I know, I know… this can seem very scary at first, but believe me, this is not rocket science. Actually, you can learn the basics pretty quickly.

So, let’s get to the reasons why I believe more developers (and even other tech professionals, like data analysts) should learn regex.

1. It’s a really powerful tool

Regex was created a long time ago to fill a need that we have in a lot of situations when dealing with data:

Identifying patterns

And it is good at it. It provides a lot of flexibility with a short syntax for writing patterns.

There are problems that you can solve with one line of regex that you couldn’t (and shouldn’t) solve with 10 or more lines of conditionals.
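To make that concrete, here is a small sketch in Python. The task (checking that a string has the shape YYYY-MM-DD) and both function names are my own illustration, not something from a real project. It only checks the shape of the text, not whether the date actually exists.

```python
import re


def looks_like_iso_date_manual(text: str) -> bool:
    """Check the shape YYYY-MM-DD using only conditionals."""
    if len(text) != 10:
        return False
    for index, char in enumerate(text):
        if index in (4, 7):
            # Positions 4 and 7 must hold the dashes.
            if char != "-":
                return False
        elif char not in "0123456789":
            # Every other position must be a digit.
            return False
    return True


def looks_like_iso_date_regex(text: str) -> bool:
    """Same check as one pattern: 4 digits, dash, 2 digits, dash, 2 digits."""
    return re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is not None
```

Both functions accept "2024-01-31" and reject "2024/01/31", but the regex version says what it expects in a single line.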

To be honest, I believe that there are problems that are impractical to solve without regex (or something similar), so when used correctly, it can expand your possibilities in different ways.

2. Learn it once, use it forever

One amazing thing about regex is that you can use it in most programming languages and platforms that you may work with, like Java, PHP, Ruby, Python, JavaScript, .NET, and C++, and even in other software, like code editors.

Some implementations may lack one feature or another, or have some peculiarities, but most of it is universal.

It means that it’s knowledge you can carry throughout your whole career, regardless of which programming language or stack you are working with now or in the future.

3. It’s easier than it looks

The main thing that stops people from learning regular expressions is that they seem very complicated and confusing when you look at some patterns for the first time. But once you start getting familiar with each symbol and how the symbols relate to each other, you realize how “simple” it actually is.

Take this as an example:

/\d{3}/

The \d token represents a digit, while the {3} quantifier says how many times the preceding token should repeat, 3 in this case.

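The forward slashes (/) at the beginning and end are not part of the pattern itself. They are the delimiters used to write a regex literal in languages like JavaScript, Perl, PHP, and Ruby. Other languages, such as Python and Java, take the pattern as a plain string instead.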

So, this pattern matches 3 consecutive digits, like “123”, “007”, “101”, etc.

You could use this to look for occurrences in a string, replace them with something else, validate a field, and much more.
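Here is a minimal sketch of those three uses in Python, using the standard re module. The sample text and variable names are made up for illustration. Note that in Python the pattern is written as a plain string, so the surrounding slashes are dropped.

```python
import re

# The same pattern as above, compiled once so it can be reused.
three_digits = re.compile(r"\d{3}")

text = "Agent 007 works in room 101."

# 1. Look for occurrences in a string.
found = three_digits.findall(text)  # ['007', '101']

# 2. Replace each occurrence with something else.
masked = three_digits.sub("***", text)  # 'Agent *** works in room ***.'

# 3. Validate a field: fullmatch only succeeds if the whole value
#    is exactly 3 digits, nothing more and nothing less.
is_valid = three_digits.fullmatch("123") is not None    # True
is_too_long = three_digits.fullmatch("1234") is not None  # False
```

The validation step uses fullmatch on purpose. A plain search would also accept "1234", because it contains 3 consecutive digits somewhere inside it.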

As you can see, regex is all about knowing the symbols and where to use each one. And being familiar with the most important symbols already gives you a lot of possibilities. This is just a simple example, and I don't expect you to learn regex in 2 minutes 😄, but I'm sure you can learn most of it in 1 or 2 days (with practice, of course). There are a lot of great resources out there to help you learn regex. Just remember, you don't learn how to ride a bike by just watching. Everything needs practice.

4. There are amazing tools to help you

Memorizing all the symbols and rules can be tough in the beginning, but that’s what tools are for.

Like the old man in The Legend of Zelda said:

It’s dangerous to go alone! Take this. 🗡️

And when it comes to creating or testing regex patterns, there are a lot of tools that can help you write, test, and learn during your journey.

Here are some great examples:

These tools come with great syntax highlighting, a quick explanation of each symbol in your pattern, pattern validation, and a lot of other features that make writing regex a lot easier. RegExr, for example, is almost always my regex editor of choice whenever I need to write a pattern. So take advantage of the tools available.

5. You can understand someone else's code

Fortunately, we live in a world with the internet, and if you need a regex to solve a very common problem, you could simply Google it and voilà, you've found a pattern that hopefully fits your needs perfectly.

However, this can be a dangerous approach if you don't know how to read, and properly evaluate, the code you're adding to your application. Not (usually) because of security issues, but because the pattern that you've found might have blind spots that the author didn't notice.

I'll use a very common example for this. If you remember, I mentioned a regex pattern for validating email addresses at the beginning of this article. If you Google “email validation regex”, you will notice that there are countless variations of patterns for this single task.

This is one of the examples that I've found:

/^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/

It had a good rating (3.8 out of 5) on the list I found in the regexr.com Community Patterns, and had probably been used by a lot of people by that point.

But there is a big problem with this pattern. Can you tell me what it is by just looking at it?

If you test it against some emails on domains with the most common extensions, like .com, .co, .dev, .io, etc., it will work as expected. But once you test extensions with 5 or more characters, like .studio, .digital, .design, or .website, it won't match, and the email is treated as invalid. This is due to the {2,4} quantifier at the end of the pattern, which only allows the last part of the domain to have between 2 and 4 characters.
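You can reproduce this blind spot with a quick check like the sketch below, written in Python. The sample addresses and the is_accepted helper are made up for illustration. One caveat: as far as I know, Python's re module rejects the original [\w-\.] class as a bad character range, so I wrote it as [\w.-], which allows the same characters (a nice reminder of those implementation peculiarities).

```python
import re

# The email pattern from above, with the first character class rewritten
# as [\w.-] (same allowed characters) so that Python accepts it.
EMAIL_PATTERN = re.compile(r"^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$")


def is_accepted(email: str) -> bool:
    """Return True if the pattern considers the address valid."""
    return EMAIL_PATTERN.match(email) is not None


# Hypothetical addresses: two short extensions and two longer ones.
samples = [
    "ana@example.com",
    "ana@example.io",
    "ana@example.studio",
    "ana@example.website",
]

for email in samples:
    print(email, "->", "accepted" if is_accepted(email) else "rejected")
```

The first two addresses are accepted and the last two are rejected, even though all four have the same valid shape. Relaxing the upper bound (for example {2,} instead of {2,4}) would remove this particular blind spot, although that alone doesn't make the pattern a complete email validator.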

Well, if you're lucky enough to notice this before implementing it into your production website, great! But if not, some users may be having problems with your application, or worse.

So, don't get me wrong, I'm not saying you should write your own email validation regex (I definitely wouldn't try to write one myself), but at least know how to read and review the code you're getting from somewhere else.

6. It can be a standout skill

Depending on the team that you are working with, and your position in it, knowing regex can be a valuable skill.

Not everyone knows regex, of course, so when you are in a situation where it is needed, you can help solve a problem that would take a lot more work with another approach.

Once you learn regex, you'll probably start noticing a lot of cases where it can help, cases you would have missed before you knew how to write your own patterns. The possibilities are huge.

Conclusion

Even though regex can be a little frightening at first glance, it's a powerful tool to have in your utility belt. It can save you a lot of time on manual work, help you automate tasks, improve the quality of your logic, and even save you some lines of code and a few headaches.

It's important to remember, though, that, as I've mentioned in this article, a regex can save you but also be your doom. So be responsible with the patterns you write, and try to anticipate and test as many cases as you can to make sure your solution is reliable and free of blind spots that could break or harm your application.

Well, that's all folks!

If you don't know regex yet or never tried it for any reason, maybe this article is the spark you were missing to give it a try, and discover the wonders of regular expressions. 😉

A web developer from Belo Horizonte, Brazil