Regular Expressions demystified
Regular Expressions are slow, ugly, error-prone, incomprehensible... Or are they? Find out by learning regexp basics.
Regular expressions seem to divide software developers. Some love them and use them without thinking twice; others frown upon any regexp they spot in someone else's code. To still others, regular expressions are all Greek. Who is right? Is the truth somewhere in the middle, as is so often the case when people take extreme standpoints on a topic?
I'd say the best way is to find out for yourself. For this, I made a short video about the basic building blocks of regular expressions. Here we go:
The video does not cover Go's regexp methods (or only very briefly near the end), so let's examine some useful methods from the regexp package here.
package main

// Imports: the regexp package is the only one we need. regexp/syntax contains
// some low-level functions that are usually not used directly; regexp uses
// them internally.
import (
	"fmt"
	"regexp"
)

// prettyMatches formats a list of matches as "[a|b|c]".
func prettyMatches(m []string) string {
	s := "["
	for i, e := range m {
		s += e
		if i < len(m)-1 {
			s += "|"
		}
	}
	s += "]"
	return s
}

// prettySubmatches formats a list of submatch lists, one list per line.
func prettySubmatches(m [][]string) string {
	s := "[\n"
	for _, e := range m {
		s += " " + prettyMatches(e) + "\n"
	}
	s += "]"
	return s
}

var (
	exps = []string{"b.*tter", "b(i|u)tter", `batter (\w+)`}
	text = `Betty Botter bought some butter
But she said the butter’s bitter
If I put it in my batter, it will make my batter bitter
But a bit of better butter will make my batter better
So ‘twas better Betty Botter bought a bit of better butter`
)

// main runs each expression against the text with six Find* methods.
func main() {
	for _, e := range exps {
		re := regexp.MustCompile(e)
		fmt.Println(e + ":")
		fmt.Println("1. FindString: ", re.FindString(text))
		fmt.Println("2. FindStringIndex: ", re.FindStringIndex(text))
		fmt.Println("3. FindStringSubmatch: ", re.FindStringSubmatch(text))
		fmt.Printf("4. FindAllString: %v\n", prettyMatches(re.FindAllString(text, -1)))
		fmt.Printf("5. FindAllStringIndex: %v\n", re.FindAllStringIndex(text, -1))
		fmt.Printf("6. FindAllStringSubmatch: %v\n\n", prettySubmatches(re.FindAllStringSubmatch(text, -1)))
	}
}

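The output of the program above is fairly long, so it can help to look at what the single-match methods return on a much smaller input. The following is only a minimal sketch: the `describe` helper and the sample string "bitter butter" are made up for illustration and are not part of the article's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// describe is a hypothetical helper (not from the article's code). It returns
// the leftmost match, its byte offsets, and the submatches for pattern in s.
func describe(pattern, s string) (string, []int, []string) {
	re := regexp.MustCompile(pattern)
	return re.FindString(s), re.FindStringIndex(s), re.FindStringSubmatch(s)
}

func main() {
	m, idx, sub := describe(`b(i|u)tter`, "bitter butter")
	fmt.Println(m)          // bitter (only the leftmost match)
	fmt.Println(idx)        // [0 6] (s[idx[0]:idx[1]] is the match)
	fmt.Printf("%q\n", sub) // ["bitter" "i"] (whole match, then one entry per group)

	// Without a match, FindString returns "" and the other two return nil.
	m, idx, sub = describe(`b(i|u)tter`, "batter")
	fmt.Println(m == "", idx == nil, sub == nil) // true true true
}
```

The `FindAll...` variants used in the program above return a slice of these same values, one element per match, and the `-1` argument means "no limit on the number of matches".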
Closing remarks
I hope you enjoyed the video. As always, the code is available on GitHub:
go get -d github.com/appliedgo/regexp
cd $GOPATH/src/github.com/appliedgo/regexp
go run regexp.go
Also available on the Go Playground.
Feel free to experiment with the expressions and see if the outcome is what you expected!
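As a possible starting point for such experiments, here is a small sketch that compares three variants of the first expression on one line of the tongue twister: the greedy original, a non-greedy version (`.*?`), and a case-insensitive version (the `(?i)` flag). The `findAll` helper is a hypothetical name introduced only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll is a hypothetical helper (not from the article's code) that
// returns every non-overlapping match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	line := "But a bit of better butter will make my batter better"

	// Greedy: .* grabs as much as it can, so one long match results.
	fmt.Printf("%q\n", findAll(`b.*tter`, line))
	// ["bit of better butter will make my batter better"]

	// Non-greedy: .*? stops at the first "tter" it can reach.
	fmt.Printf("%q\n", findAll(`b.*?tter`, line))
	// ["bit of better" "butter" "batter" "better"]

	// (?i) ignores case, so the capital B of "But" now starts a match.
	fmt.Printf("%q\n", findAll(`(?i)b.*?tter`, line))
	// ["But a bit of better" "butter" "batter" "better"]
}
```

Note that `.` does not match a newline by default, which is why the greedy expression in the main program never runs across several lines of the text.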
Links from the video
Happy coding!