Wednesday, May 9, 2012

regular expression

a regular expression is a pattern that describes which text should match. i'm not good at explaining in words, so let's learn by example. i'll use grep as the command here. first make sure you have aliased grep to grep --color=auto, so that the matched text is highlighted in red.

suppose we want to capture the lines containing the phrase "lo" from the output of 'ip a'. let's first run 'ip a' on its own. note that lines beginning with # are the commands you should type, and the output of each command follows below it.
#ip a
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:1f:16:2a:6c:00 brd ff:ff:ff:ff:ff:ff
3: wmaster0: mtu 0 qdisc pfifo_fast state UNKNOWN qlen 1000
link/ieee802.11 0c:60:76:7b:54:e0 brd 00:00:00:00:00:00
4: wlan0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 0c:60:76:7b:54:e0 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.2/24 brd 192.168.1.255 scope global wlan0
inet6 fe80::e60:76ff:fe7b:54e0/64 scope link
valid_lft forever preferred_lft forever
5: vboxnet0: mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 0a:00:27:00:00:00 brd ff:ff:ff:ff:ff:ff

that's the complete output before we filter anything. now let's find the lines containing the phrase "lo".


#ip a|grep lo
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet 192.168.1.2/24 brd 192.168.1.255 scope global wlan0

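we got four lines here, but the list also includes lines where lo is only part of another word (loopback and global), not a standalone lo. to get only the lines with a standalone lo, we wrap it in the word-boundary anchors \< (start of a word) and \> (end of a word), which GNU grep supports:
#ip a|grep -E '\<lo\>'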
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
inet 127.0.0.1/8 scope host lo
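
another way to ask for a standalone lo is grep's -w option, which only accepts matches that form a whole word. here is a small sketch of that idea. it is a script rather than a terminal session, so here # starts a comment, and the sample variable is just a made-up stand-in for a few lines of 'ip a' output so the result does not depend on your machine.

```shell
# stand-in for a few lines of 'ip a' output (an assumption for this demo)
sample='1: lo: mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet 192.168.1.2/24 brd 192.168.1.255 scope global wlan0'

# plain substring match: count the lines that contain "lo" anywhere
plain=$(printf '%s\n' "$sample" | grep -c 'lo')

# -w keeps only the lines where "lo" stands alone as a word
whole=$(printf '%s\n' "$sample" | grep -w 'lo')

echo "lines with lo anywhere: $plain"
printf '%s\n' "$whole"
```

the first count is 4 (loopback and global also contain lo), while the -w version prints only the two lines with a standalone lo.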

now let's capture IP addresses. an IPv4 address is always four octets separated by dots (.), and each octet is a number from 0 to 255. a regular expression matches characters, not numeric values, so we must split that range into cases that can be described digit by digit. here we use three cases: 250-255, 200-249 and 0-199. [0-9] represents any single digit from 0 to 9, | represents a logical "or", and ? means the preceding item is optional (it may appear zero times or once). so the expression for one octet is '((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))'. remember, this is only one octet. to capture a complete IP address we have to write that expression four times. the dot separator is written as \. (an escaped dot), because an unescaped dot matches any character. the complete command looks like this
#ip a|grep -E '((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))'
inet 127.0.0.1/8 scope host lo
inet 192.168.1.2/24 brd 192.168.1.255 scope global wlan0
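
writing the octet four times gets long, so here is a shorter sketch of the same pattern. it keeps the octet in a shell variable and repeats the ".octet" part with {3}. the variable names (octet, ipv4, sample) are my own choice, and the sample text is a stand-in for real 'ip a' output. the -o option makes grep print only the matched text instead of the whole line.

```shell
# one octet: 250-255, 200-249, or 0-199
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
# an address is one octet followed by three ".octet" groups
ipv4="${octet}(\.${octet}){3}"

# stand-in for a few lines of 'ip a' output (an assumption for this demo)
sample='link/ether 0c:60:76:7b:54:e0 brd ff:ff:ff:ff:ff:ff
inet 127.0.0.1/8 scope host lo
inet 192.168.1.2/24 brd 192.168.1.255 scope global wlan0'

# -o prints each match on its own line
addresses=$(printf '%s\n' "$sample" | grep -oE "$ipv4")
printf '%s\n' "$addresses"
```

this prints 127.0.0.1, 192.168.1.2 and 192.168.1.255. keep in mind the pattern is not anchored, so inside a longer run of digits such as 999.1.1.1 it would still find 99.1.1.1.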

to capture an IP address together with its netmask, we again split the netmask into two cases. the netmask here is written as a prefix length, not in dotted IP address form. the prefix length ranges from 0 to 32 (30-32 or 0-29). the slash is escaped with "\" in the command below, although this is optional: the slash has no special meaning in grep patterns.
#ip a|grep -E '((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\/((3[0-2]|[0-2]?[0-9]))'
inet 127.0.0.1/8 scope host lo
inet 192.168.1.2/24 brd 192.168.1.255 scope global wlan0
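
as a rough sketch, the address and prefix pattern can also be built from small pieces and used with -o to pull out just the address/prefix text. the variable names and the sample text are assumptions for the demo. the loop at the end splits each result at the slash using the shell's IFS.

```shell
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
# prefix length: 30-32 or 0-29
prefix='(3[0-2]|[0-2]?[0-9])'
# the slash is left unescaped, it is an ordinary character for grep
cidr="${octet}(\.${octet}){3}/${prefix}"

# stand-in for a few lines of 'ip a' output (an assumption for this demo)
sample='inet 127.0.0.1/8 scope host lo
inet 192.168.1.2/24 brd 192.168.1.255 scope global wlan0'

cidrs=$(printf '%s\n' "$sample" | grep -oE "$cidr")

# split every match into address and prefix length
printf '%s\n' "$cidrs" | while IFS=/ read -r addr len; do
    echo "address=$addr prefix=$len"
done
```

the broadcast address 192.168.1.255 is not reported here because no slash follows it.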

to capture the whole global line, we must read it carefully from left to right. first we put a caret "^", which represents the start of the line. second, the line is indented by 4 spaces in the real output, so we indicate them with "\ {4}". then comes the word "inet", followed by a space "\ ", then an IP address with its prefix length (discussed earlier). then the word "brd" (broadcast), followed by the broadcast address, which follows the same rule as an IP address. then a space, the word "scope", a space, the word "global", a space, and at last the interface. the interface could be wlan0, wlan1, eth0, eth1, etc.
#ip a|grep -E '^\ {4}inet\ ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\/((3[0-2]|[0-2]?[0-9]))\ brd\ ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\ scope\ global\ ((wlan[0-9]|eth[0-9]))'
inet 192.168.1.2/24 brd 192.168.1.255 scope global wlan0
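
the long pattern is easier to read when it is assembled from named pieces. this is only a sketch under a few assumptions: the variable names are mine, the sample text stands in for real 'ip a' output with its 4-space indentation, the interface is assumed to be wlan or eth followed by digits, and the interface name is assumed to be the last thing on the line (so the pattern ends with $).

```shell
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
ipv4="${octet}(\.${octet}){3}"
prefix='(3[0-2]|[0-2]?[0-9])'
# assumed interface naming: wlan or eth plus one or more digits
iface='(wlan|eth)[0-9]+'

# ^ anchors the start of the line, \$ becomes $ and anchors the end
line="^ {4}inet ${ipv4}/${prefix} brd ${ipv4} scope global ${iface}\$"

# stand-in for indented 'ip a' output (an assumption for this demo)
sample='    inet 127.0.0.1/8 scope host lo
    inet 192.168.1.2/24 brd 192.168.1.255 scope global wlan0
    inet6 fe80::e60:76ff:fe7b:54e0/64 scope link'

match=$(printf '%s\n' "$sample" | grep -E "$line")
printf '%s\n' "$match"
```

only the wlan0 line survives, because the other two lines have no "brd" part and no "scope global".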

escaping with "\" is necessary when you want to literally match a character that otherwise has a special meaning. some of the characters below are special in regular expressions, and the others are special to the shell that hands the pattern to grep. newer GNU grep releases may print a "stray \" warning when you escape a character that did not need it
backslash \ escape character
slash / directory delimiter in paths; not special to grep, so escaping it is optional
space argument delimiter in the shell; not special inside a quoted pattern
dot . any single character
parentheses () grouping
brackets [] any single character listed inside the brackets
braces {} repetition count for the item before the open brace
plus + one or more of the item before the plus sign
asterisk * zero or more of the item before the asterisk
question mark ? zero or one of the item before the ? sign
comma , separates the minimum and maximum counts inside braces
pipe | logical or
quote ' shell quoting; keeps the pattern literal
double quote " shell quoting; the shell still expands $ and ` inside
back tick ` shell command substitution
caret ^ the start of the line
dollar $ the end of the line
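
to see what escaping actually changes, here is a small sketch with made-up sample text. it compares an unescaped dot, an escaped dot, and grep's -F option (which treats the whole pattern as plain text), and then tries the ? quantifier. -c makes grep print the number of matching lines.

```shell
# two lines: a real dotted address and a look-alike without dots
sample='192.168.1.2
192x168y1z2'

# unescaped dot matches any character, so both lines match
loose=$(printf '%s\n' "$sample" | grep -cE '192.168')
# escaped dot matches only a literal dot
strict=$(printf '%s\n' "$sample" | grep -cE '192\.168')
# -F disables regular expressions, so no escaping is needed
fixed=$(printf '%s\n' "$sample" | grep -cF '192.168')

# ? makes the preceding character optional: color and colour match, colouur does not
optional=$(printf 'color\ncolour\ncolouur\n' | grep -cE '^colou?r$')

echo "loose=$loose strict=$strict fixed=$fixed optional=$optional"
```

the script prints loose=2 strict=1 fixed=1 optional=2.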
