16 Bash Regular Expression (RegEx) Examples Using grep, sed, and awk Commands

A regular expression, or regex, is a sequence of characters that defines a search pattern. It is mainly used to search a file for specific characters or words, to filter lines, and to manipulate text.

Regex is widely used in programming languages and in Linux Bash scripting. In this guide, I will cover how to use regex in Bash, the main regular expressions, and how to use various regex processing engines.

What is Regex

A regular expression in Linux is a pattern made up of literal characters and special metacharacters that matches specific text in a string or text file.

Let’s understand it with a real-life example. Suppose you want to write code that finds certain variables in another code file and then modifies them. A regular expression is the natural tool in such a situation.

Regex Versions

There are different versions (flavors) of regex, and it is important to note that not every regex-processing command supports every flavor.

  • Basic Regular Expression (BRE)
  • Extended Regular Expression (ERE)
  • Perl Compatible Regular Expression (PCRE)

BRE and ERE are further classified as:

  • POSIX BRE/ERE
  • GNU BRE/ERE

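POSIX BRE and POSIX ERE are subsets of GNU BRE and GNU ERE, which add extensions on top of the POSIX syntax. PCRE offers most of the features of GNU BRE/ERE and many more, although its syntax is not a strict superset of theirs.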
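
To see how the flavors differ in practice, here is a minimal sketch in which the same interval quantifier is written with escaped braces in BRE and with bare braces in ERE. The sample input and the variable names are assumptions made up for illustration.

```shell
# Hypothetical sample input: three short lines.
sample=$(printf 'Bil\nBill\nSkill\n')

# BRE (grep's default flavor): interval braces must be escaped.
bre_hits=$(printf '%s\n' "$sample" | grep 'l\{2\}')

# ERE (grep -E): the same interval is written without backslashes.
ere_hits=$(printf '%s\n' "$sample" | grep -E 'l{2}')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre_hits" "$ere_hits"
```

Both commands should select the same lines; only the spelling of the pattern changes.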

Syntax of Regex

Different command-line utilities are used with regular expressions, such as:

  • grep
  • sed
  • awk

These commands are often called regex engines: each one compiles the regular expression, applies it to its input, and gives the output.

Regex Syntax with grep

grep [options] 'pattern' [filename]

 

Regex Syntax with sed

sed [options] '/pattern/command' [filename]

 

Regex Syntax with awk

awk [options] '/pattern/' [filename]
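
As a rough sketch of the three syntaxes side by side, the commands below run the same pattern through each tool. The temporary file and its contents are placeholders, not the file used in the rest of this guide.

```shell
# Hypothetical sample file; the names are placeholders.
tmpfile=$(mktemp)
printf 'Ubuntu\nKubuntu\nDebian\nKaOS\n' > "$tmpfile"

# Each tool prints the lines that contain "untu".
grep_out=$(grep 'untu' "$tmpfile")
sed_out=$(sed -n '/untu/p' "$tmpfile")   # -n suppresses default output; p prints matching lines
awk_out=$(awk '/untu/' "$tmpfile")       # a bare pattern implies the action { print }

printf '%s\n' "$grep_out"
rm -f "$tmpfile"
```

All three variables should hold the same two lines, which shows that the tools differ in syntax rather than in what the pattern means.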

 

Components of Regex Syntax

There are six main components of regular expression syntax:

  • Characters
  • Metacharacters
  • Quantifiers
  • Character Classes
  • Grouping
  • Lookarounds

Characters: Literal characters, alone or in sequence, match exactly that text in the string or file. For example, the character H matches H in the file; similarly, the letters Linux match the exact word Linux. The -i flag of grep can be used to make the search case-insensitive.

Metacharacters: These are special characters with a specific function. For example, the dot metacharacter matches any single character, ^ matches the start of the line, and $ matches the end of the line. Another metacharacter, the pipe in a|b, matches either a or b.

Quantifiers: These specify how many times the preceding element must be repeated. For example, the question mark ? matches the preceding character zero or one time, while the asterisk * matches the preceding character zero or more times.

Character Classes: A set of characters enclosed in square brackets ([]) is a character class, which matches any one character from the set. For example, [0-9] matches any digit from 0 to 9, and [A-Z] matches any uppercase letter from A to Z.

Grouping: Round brackets () group part of a pattern so that it can be treated as a single unit. Grouping is especially useful for repeating sequences such as those in phone numbers or email addresses.

Lookarounds: These assert that a pattern is (positive) or is not (negative) preceded or followed by another pattern, without making that other pattern part of the match. For example, a(?=b) matches every a that is immediately followed by b. Lookarounds are a PCRE feature, so of the three tools only grep with the -P flag supports them.
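
The lookahead a(?=b) can be tried out with a small sketch like the one below. It assumes a grep built with PCRE support (the -P flag), which is not guaranteed on every system, so the sketch checks for it first. The input line and variable name are made up for illustration.

```shell
# Hypothetical input line.
line='ab ac ab'

if printf 'ab\n' | grep -qP 'a(?=b)' 2>/dev/null; then
  # -o prints only the matched text, one match per line:
  # each "a" that is immediately followed by "b".
  lookahead_count=$(printf '%s\n' "$line" | grep -oP 'a(?=b)' | wc -l | tr -d ' ')
else
  lookahead_count='unsupported'   # this grep has no PCRE support
fi

printf '%s\n' "$lookahead_count"
```

The b itself is never part of the match; the lookahead only checks that it is there.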

16 Examples of Using Regex in Linux

The following examples cover the usage of various metacharacters, quantifiers, groups, and some unique patterns.

Example 1: Using Dot (.) Expression

The dot (.) is the basic metacharacter of regular expressions and matches any single character. For example, the pattern U.untu matches U, then any one character, then untu:

Using grep:

grep 'U.untu' mytextfile.txt

 

Using sed:

sed -n '/U.untu/p' mytextfile.txt

 

Using awk:

awk '/U.untu/' mytextfile.txt

 

Example 2: Using Caret (^) Expression

The caret ^ metacharacter is used to find all the lines starting with the given pattern. For example:

Using grep:

grep '^K' mytextfile.txt

 

Using sed:

sed -n '/^K/p' mytextfile.txt

 

Using awk:

awk '/^K/' mytextfile.txt

 

The above commands print out all the entries in the file starting with K.

Example 3: Using Dollar ($) Expression

The dollar $ metacharacter is used to find all the lines ending with the given pattern. For example:

Using grep:

grep 'u$' mytextfile.txt

 

Using sed:

sed -n '/u$/p' mytextfile.txt

 

Using awk:

awk '/u$/' mytextfile.txt

 

The above commands print all the lines in the file ending with u.

Example 4: Using Asterisk (*) Expression

The asterisk * quantifier matches the preceding character zero or more times. It is equivalent to {0,}. For example, tu* matches t followed by any number of u characters, including none:

Using grep:

grep 'tu*' mytextfile.txt

 

Using sed:

sed -n '/tu*/p' mytextfile.txt

 

Using awk:

awk '/tu*/' mytextfile.txt

 

Example 5: Using Question Mark (?) Expression

The question mark (?) quantifier matches the preceding character zero or one time, which makes that character optional.

Using grep:

grep -E 'Ub?' mytextfile.txt

 

The -E in the above command is for extended regex.

Using sed:

sed -nE '/Ub?/p' mytextfile.txt

 

The -E in the above command is for extended regex.

Using awk:

awk '/Ub?/' mytextfile.txt

 

The difference between the asterisk and question mark quantifiers is as follows:

The asterisk matches any number of repetitions of the preceding character, including none, while the question mark matches the preceding character at most one time.
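
A small sketch makes the difference visible. It uses grep's -o option to print only the matched text; the input line and variable names are invented for this illustration.

```shell
# Hypothetical input line with zero, one, and two b's after U.
line='U Ub Ubb'

# ? allows at most one b, so "Ubb" only contributes "Ub".
optional_matches=$(printf '%s\n' "$line" | grep -oE 'Ub?')

# * allows any number of b's, so "Ubb" is matched in full.
star_matches=$(printf '%s\n' "$line" | grep -oE 'Ub*')

printf 'with ?:\n%s\nwith *:\n%s\n' "$optional_matches" "$star_matches"
```

With ?, the matches are U, Ub, and Ub; with *, they are U, Ub, and Ubb.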

Example 6: Using Backslash (\) Expression

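The backslash \ metacharacter removes the special meaning of the character that follows it. For example, the asterisk * is a special character, so to match a literal asterisk we write \*. Combined with certain letters, the backslash instead forms an escape sequence with a meaning of its own, such as \s for a whitespace character.

Let’s find all the Linux distributions in the file mytextfile.txt with a space in their names.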

Using grep:

grep '\s' mytextfile.txt

 

Using sed:

sed -n '/\s/p' mytextfile.txt

 

Using awk:

awk '/\s/' mytextfile.txt

 

Moreover, the backslash is used in various other escape sequences. Note that \s is a GNU extension; the portable POSIX equivalent is [[:space:]].
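
Here is a short sketch of both uses of escaping: matching a literal asterisk, and matching whitespace with the POSIX class [[:space:]]. The sample lines and variable names are assumptions for illustration only.

```shell
# Hypothetical sample input.
sample=$(printf '2*3\n223\nLinux Mint\n')

# Escaped: \* is a literal asterisk, so only the line "2*3" matches.
literal_star=$(printf '%s\n' "$sample" | grep '2\*3')

# Unescaped: 2* means "zero or more 2s", so any line containing a 3 matches.
quantifier_hits=$(printf '%s\n' "$sample" | grep '2*3')

# [[:space:]] matches a whitespace character in any POSIX-compliant tool.
space_hits=$(printf '%s\n' "$sample" | grep '[[:space:]]')

printf '%s\n---\n%s\n---\n%s\n' "$literal_star" "$quantifier_hits" "$space_hits"
```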

Example 7: Creating a Group of Patterns Using Round Braces ()

The round braces () are used to match the group of expressions inside them. Let’s understand it with an example:

Using grep:

grep -E '(untu)' mytextfile.txt

 

Using sed:

sed -nE '/(untu)/p' mytextfile.txt

 

Using awk:

awk '/(untu)/' mytextfile.txt

 

In the above commands, the engine returns the entries that match the group (untu). A group can also be followed by another pattern. For example, if we replace the (untu) group with (unt)u, the result will be the same.
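
Groups become more useful once a quantifier or a back-reference is applied to them. The sketch below shows one of each, on made-up input: a group repeated with {2}, and a sed substitution that swaps two captured words using \1 and \2.

```shell
# Hypothetical sample input.
sample=$(printf 'haha\nha\nhaah\n')

# The quantifier applies to the whole group: "ha" repeated twice in a row.
repeated=$(printf '%s\n' "$sample" | grep -E '(ha){2}')

# In sed, \1 and \2 refer back to the text captured by the first and second group.
swapped=$(printf 'Linux Mint\n' | sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2 \1/')

printf '%s\n%s\n' "$repeated" "$swapped"
```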

Example 8: Creating Ranged Patterns Using Curly Braces {} Expression

The curly braces {} are repetition quantifiers applied to the preceding element. You can use them in 3 different ways:

{x}: Appearing exactly x times

{x,y} or {min,max}: Appearing at least x times but not more than y

{x,}: Appearing x or more times

Using grep:

grep -E 'l{2}' mytextfile.txt

 

Using sed:

sed -nE '/l{2}/p' mytextfile.txt

 

Using awk:

awk '/l{2}/' mytextfile.txt

 

The above commands find entries that contain two consecutive l characters, such as Bill and Skill.
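
The three forms can be compared with a sketch like the following. It uses grep's -x option so that the pattern has to match the whole line, which makes the counts easy to check; the input and variable names are hypothetical.

```shell
# Hypothetical input: lines of one to four l characters.
sample=$(printf 'l\nll\nlll\nllll\n')

# -x requires the pattern to match the entire line.
exact=$(printf '%s\n' "$sample" | grep -xE 'l{2}')       # exactly 2
range=$(printf '%s\n' "$sample" | grep -xE 'l{2,3}')     # between 2 and 3
at_least=$(printf '%s\n' "$sample" | grep -xE 'l{3,}')   # 3 or more

printf '{2}:\n%s\n{2,3}:\n%s\n{3,}:\n%s\n' "$exact" "$range" "$at_least"
```

Without -x or anchors, l{2} would also match inside lll and llll, because a longer run still contains two consecutive l characters.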

Example 9: Using \d to Find Numerals

The \d expression matches a digit, so it can be used to find the lines that contain numerals. For example:

Using grep:

grep -P '\d' mytextfile.txt

 

The -P flag tells grep to use Perl-compatible regex (PCRE).

In sed and awk, \d cannot be used because they implement POSIX regex rather than PCRE. The equivalent of \d in sed and awk is therefore [0-9] or [[:digit:]].

Using sed:

sed -n '/[0-9]/p' mytextfile.txt

 

Or:

sed -n '/[[:digit:]]/p' mytextfile.txt

 

Using awk:

awk '/[0-9]/' mytextfile.txt

 

Or:

awk '/[[:digit:]]/' mytextfile.txt

 

Example 10: Using Logical OR Operator |

The | operator works the same way a logical OR works. For example, the pattern a|o finds all the lines containing either the character a or the character o. The commands below use k|a:

Using grep:

grep -E 'k|a' mytextfile.txt

 

Using sed:

sed -nE '/k|a/p' mytextfile.txt

 

Using awk:

awk '/k|a/' mytextfile.txt
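
One detail worth a quick sketch is how alternation interacts with anchors: | has the lowest precedence, so an anchor applies only to its own side unless the alternatives are grouped. The sample names below are made up for illustration.

```shell
# Hypothetical sample input.
sample=$(printf 'kali\nmanjaro\nubuntu\narch\n')

# Ungrouped: lines that start with k, OR lines that contain a anywhere.
loose=$(printf '%s\n' "$sample" | grep -E '^k|a')

# Grouped: lines that start with either k or a.
anchored=$(printf '%s\n' "$sample" | grep -E '^(k|a)')

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$loose" "$anchored"
```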

 

Example 11: Finding all the Words with a Specific Number of Characters

To find all the words with a specific number of characters, we can repeat the dot (.). Each dot matches one character, so several dots in a row match that many characters. Anchoring the pattern with ^ and $ makes it match the whole line. To find all the lines with exactly four characters, use:

Using grep:

grep '^....$' mytextfile.txt

 

Using sed:

sed -n '/^....$/p' mytextfile.txt

 

Using awk:

awk '/^....$/' mytextfile.txt

 

Example 12: Finding IP Address From a File

The IP address is one of the key items that you may want to extract from a file using regex. Since an IP address contains dots, and the dot is a metacharacter, finding it using regex is not a straightforward operation: we have to escape each dot (.) using a backslash. Let’s extract IP addresses from the /etc/hosts file.

Using grep:

grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /etc/hosts

 

Or:

grep -E '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' /etc/hosts

 

Using sed:

sed -nE '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/p' /etc/hosts

 

Using awk:

awk '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/' /etc/hosts
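
The pattern above checks only the shape of an address, so it also accepts octets above 255. The sketch below is one possible way to tighten that: it pulls out candidates with grep -o and then lets awk compare each octet numerically. The sample lines and variable names are hypothetical, and the range check is my addition rather than part of the original commands.

```shell
# Hypothetical hosts-style input, including one impossible address.
sample=$(printf '127.0.0.1 localhost\n999.1.1.1 bogus\n192.168.1.10 router\n')

ip_re='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# -o prints only the matched text instead of the whole line.
candidates=$(printf '%s\n' "$sample" | grep -oE "$ip_re")

# Split each candidate on dots and keep it only if every octet is at most 255.
valid=$(printf '%s\n' "$candidates" | awk -F. '$1<=255 && $2<=255 && $3<=255 && $4<=255')

printf 'candidates:\n%s\nvalid:\n%s\n' "$candidates" "$valid"
```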

 

Example 13: Finding the Decimal Numbers

A decimal number consists of a whole-number part and a fractional part that comes after the dot (.). To get the decimal numbers from a file using regex, see the commands given below:

Using grep:

grep -P '^\d*\.\d+$' mytextfile.txt

 

\d is equivalent to [[:digit:]] and [0-9]; since grep supports PCRE through the -P flag, we can use \d here.

Using sed:

sed -nE '/^[[:digit:]]*\.[[:digit:]]+$/p' mytextfile.txt

 

Using awk:

awk '/^[0-9]*\.[0-9]+$/' mytextfile.txt

 

You can replace [[:digit:]] with [0-9] in the above commands.

Example 14: Finding the Time in HH:MM 12-Hr Format

Regex can also be used to find times in a file. Let’s take a look at how to extract time from a file using grep, sed, and awk commands:

Using grep:

grep -E '^(0?[1-9]|1[0-2]):[0-5][0-9]$' mytextfile.txt

 

Using sed:

sed -nE '/^(0?[1-9]|1[0-2]):[0-5][0-9]$/p' mytextfile.txt

 

Using awk:

awk '/^(0?[1-9]|1[0-2]):[0-5][0-9]$/' mytextfile.txt
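
To get a feel for what this pattern accepts and rejects, here is a small sketch that runs it over a handful of made-up values, one per line. The input and variable name are assumptions for illustration.

```shell
# Hypothetical input: some valid and some invalid 12-hour times.
sample=$(printf '9:05\n09:05\n12:59\n13:00\n0:30\n7:60\n')

# Hours 1-12 with an optional leading zero, minutes 00-59.
times=$(printf '%s\n' "$sample" | grep -E '^(0?[1-9]|1[0-2]):[0-5][0-9]$')

printf '%s\n' "$times"
```

Only 9:05, 09:05, and 12:59 pass: 13:00 and 0:30 fail the hour part, and 7:60 fails the minute part.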

 

Example 15: Finding the Date in YYYY-MM-DD Format

To extract dates from a file, we will use the following regex:

Using grep:

grep -P '([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))' mytextfile.txt

 

Using sed:

sed -nE '/([12][[:digit:]]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][[:digit:]]|3[01]))/p' mytextfile.txt

 

Using awk:

awk '/([12][[:digit:]]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][[:digit:]]|3[01]))/' mytextfile.txt
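
The sketch below tries the date pattern on a few invented values. Keep in mind that the regex validates the format and the numeric ranges only; it does not know the calendar, so a string like 2023-02-31 would still match.

```shell
# Hypothetical input: valid dates, an impossible month, and a wrong separator.
sample=$(printf '2024-02-29\n2024-13-01\n1999-12-31\n2024-00-10\n2024:01:01\n')

date_re='[12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])'

dates=$(printf '%s\n' "$sample" | grep -E "$date_re")

printf '%s\n' "$dates"
```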

 

Example 16: Finding and Replacing Patterns

Substitution is another regex operation, used to find a pattern and replace it. For example:

Using sed:

Syntax:

sed -i 's/[pattern]/[replacement]/' [filename]

 

Execute the command:

sed -i 's/KaOS/Peepin/' mytextfile.txt

 

In the above command, -i is used to modify the original file in place. Without this flag, sed only prints the result in the terminal instead of modifying the file.

The substitution can be extended by adding g at the end of the expression: with g, every occurrence on a line is replaced, while without it only the first occurrence on each line is replaced.
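
The effect of the g flag can be seen in a sketch like this one. It runs sed without -i on a throwaway file, so the file itself is never changed; the file contents and variable names are placeholders.

```shell
# Hypothetical sample file with the pattern twice on the first line.
tmpfile=$(mktemp)
printf 'KaOS and KaOS\nUbuntu\n' > "$tmpfile"

# Without g: only the first occurrence on each line is replaced.
first_only=$(sed 's/KaOS/Peepin/' "$tmpfile")

# With g: every occurrence on each line is replaced.
all_hits=$(sed 's/KaOS/Peepin/g' "$tmpfile")

# No -i was given, so the file still has its original contents.
unchanged=$(cat "$tmpfile")

printf '%s\n---\n%s\n' "$first_only" "$all_hits"
rm -f "$tmpfile"
```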

Using awk:

Syntax:

awk '{sub(/[pattern]/, "[replacement]"); print}' [filename] > [new_filename]

 

Execute the command:

awk '{sub(/Peepin/, "KaOS"); print}' mytextfile.txt > newtextfile.txt

 

In the new file, Peepin has now gone back to KaOS.

To replace all the occurrences of the pattern, use gsub in place of sub.

Another important thing to consider: the awk command only displays the output in the terminal without modifying the original file. To save the result, use the redirection operator and provide a file name.
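
As a minimal sketch of sub versus gsub, the commands below apply both to the same made-up line. The second command also prints the return value of gsub, which is the number of replacements it made.

```shell
# Hypothetical input line with the pattern twice.
line='Peepin and Peepin'

# sub replaces only the first match in the record.
sub_out=$(printf '%s\n' "$line" | awk '{ sub(/Peepin/, "KaOS"); print }')

# gsub replaces every match and returns how many it replaced.
gsub_out=$(printf '%s\n' "$line" | awk '{ n = gsub(/Peepin/, "KaOS"); print n ": " $0 }')

printf '%s\n%s\n' "$sub_out" "$gsub_out"
```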

Regex Cheat Sheet

Regular expressions offer many metacharacters and quantifiers to create complex patterns to make the search even better. Let’s list commonly used regex rules.

Regex Character Classes

Quantifiers and Alternation

Anchors

Escape Characters

Groups and Lookarounds

Conclusion

A regex is a sequence of characters that forms a pattern for searching for information in a file and processing it. There are various regex engines in Linux, such as grep, sed, and awk. This guide covers basic to advanced regular expressions using all three Linux commands. It is important to note that not all Linux regex engines support all the flavors of regex. For example, sed and awk do not support PCRE, while GNU grep supports BRE, ERE, and PCRE.

Source: linuxhint.com