Regular Expressions demystified

Regular expressions are slow, ugly, error-prone, incomprehensible… or are they? Find out by learning regexp basics.


Regular expressions seem to divide software developers. Some love them and use them without thinking twice; others frown upon any regexp they spot in someone else’s code. To yet others, regular expressions are all Greek. Who is right? Is the truth somewhere in the middle, as is so often the case when people take extreme standpoints on a topic?

I’d say the best way is to find out for yourself. For this, I made a short video about the basic building blocks of regular expressions. Here we go:

The video does not cover Go regexp methods (or only very briefly near the end), so let’s examine some useful methods from the regexp library here.

Importing the regexp package

package main

The regexp package is the only regexp-related package we need. regexp/syntax contains some low-level functions that usually are not used directly; the regexp package uses them internally.

import (
	"fmt"
	"regexp"
)

prettyMatches formats a slice of matches nicely, separating the matches with "|".

func prettyMatches(m []string) string {
	s := "["
	for i, e := range m {
		s += e
		if i < len(m)-1 {
			s += "|"
		}
	}
	s += "]"
	return s
}

prettySubmatches formats a slice of submatches nicely, printing one match with its submatches per line.

func prettySubmatches(m [][]string) string {
	s := "[\n"
	for _, e := range m {
		s += "    " + prettyMatches(e) + "\n"
	}
	s += "]"
	return s
}
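
As a side note, the same formatting can probably be written more compactly with strings.Join from the standard library. The sketch below is only an illustration; joinMatches is a hypothetical name that is not part of the code for this post.

```go
package main

import (
	"fmt"
	"strings"
)

// joinMatches is a hypothetical alternative to prettyMatches.
// It wraps the matches in brackets and separates them with "|".
func joinMatches(m []string) string {
	return "[" + strings.Join(m, "|") + "]"
}

func main() {
	fmt.Println(joinMatches([]string{"butter", "bitter", "batter"})) // [butter|bitter|batter]
}
```

The loop version above makes the separator logic explicit, while strings.Join avoids repeated string concatenation.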

Let’s define some regular expressions and text to search in. Note the backticks used for the regexp that contains a backslash. If we used double quotes, we would need to double the backslash to avoid an “unknown escape sequence” error.

var (
	exps = []string{"b.*tter", "b(i|u)tter", `batter (\w+)`}

	text = `Betty Botter bought some butter 
But she said the butter’s bitter 
If I put it in my batter, it will make my batter bitter 
But a bit of better butter will make my batter better 
So ‘twas better Betty Botter bought a bit of better butter`
)

Now try some of the various Find functions. Standard Println or Printf separate the elements of a slice by a space, but our text also contains spaces, so the results of the FindAll functions are passed through prettyMatches and prettySubmatches to keep the individual matches distinguishable.

func main() {
	for _, e := range exps {
		re := regexp.MustCompile(e)
		fmt.Println(e + ":")
		fmt.Println("1. FindString: ", re.FindString(text))
		fmt.Println("2. FindStringIndex: ", re.FindStringIndex(text))
		fmt.Println("3. FindStringSubmatch: ", re.FindStringSubmatch(text))
		fmt.Printf("4. FindAllString: %v\n", prettyMatches(re.FindAllString(text, -1)))
		fmt.Printf("5. FindAllStringIndex: %v\n", re.FindAllStringIndex(text, -1))
		fmt.Printf("6. FindAllStringSubmatch: %v\n\n", prettySubmatches(re.FindAllStringSubmatch(text, -1)))
	}
}
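
The tongue twister produces a lot of output, so it may help to look at the same six functions on a much shorter input first. This is a small sketch only; the names butterRe and sample are invented for this example. The second argument of the FindAll functions is the maximum number of matches to return, and -1 means "all of them".

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// butterRe and sample are hypothetical names used only in this sketch.
	butterRe = regexp.MustCompile(`b(i|u)tter`)
	sample   = "bitter butter"
)

func main() {
	// First match only.
	fmt.Printf("%q\n", butterRe.FindString(sample)) // "bitter"
	// Start and end byte offsets of the first match.
	fmt.Println(butterRe.FindStringIndex(sample)) // [0 6]
	// First match, followed by the text captured by each group.
	fmt.Printf("%q\n", butterRe.FindStringSubmatch(sample)) // ["bitter" "i"]

	// The FindAll variants return one entry per match.
	fmt.Printf("%q\n", butterRe.FindAllString(sample, -1))         // ["bitter" "butter"]
	fmt.Println(butterRe.FindAllStringIndex(sample, -1))           // [[0 6] [7 13]]
	fmt.Printf("%q\n", butterRe.FindAllStringSubmatch(sample, -1)) // [["bitter" "i"] ["butter" "u"]]
}
```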

Closing remarks

I hope you enjoyed the video. As always, the code is available on GitHub:

go get -d github.com/appliedgo/regexp
cd $GOPATH/src/github.com/appliedgo/regexp
go run regexp.go

Also available on the Go Playground.

Feel free to experiment with the expressions and see if the outcome is what you expected!
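
One thing to keep in mind when experimenting: regexp.MustCompile panics if the expression is not valid. For trying out your own patterns, regexp.Compile may be more convenient, as it returns an error instead. Here is a minimal sketch; tryPattern is a hypothetical helper that is not part of the code above.

```go
package main

import (
	"fmt"
	"regexp"
)

// tryPattern is a hypothetical helper. It compiles expr and returns the
// first match in text, or an error if expr is not a valid expression.
func tryPattern(expr, text string) (string, error) {
	re, err := regexp.Compile(expr)
	if err != nil {
		return "", err
	}
	return re.FindString(text), nil
}

func main() {
	if m, err := tryPattern("b.tter", "a bit of better butter"); err == nil {
		fmt.Println(m) // better
	}
	// The closing parenthesis is missing, so compiling fails.
	if _, err := tryPattern("b(i|u", "bitter"); err != nil {
		fmt.Println("invalid pattern:", err)
	}
}
```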

Go Regexp Syntax Reference

RegexPlanet

Happy coding!

Sorry, there is no comments function here at the moment.

I invite you to head over to Reddit and discuss this blog post, or anything else about applying Go to real-world problems, on the /r/appliedgo subreddit. Feel free to create a new thread over there if you do not find one for this post. (It would make no sense for me to create a new thread for every post unless there is something to talk about.)

I’d love to see your feedback!