COVER STORY
AWK
Regular Wizardry
Text files formatted as tables are
easily searched and modified using
AWK. Admins in particular appreciate
AWK, which is typically installed on
any flavor of Unix.
BY NICO HOCHGESCHWENDER
Automatic text file manipulation is part of any admin’s daily grind. This important task may involve evaluating logfiles, creating or modifying configurations, or adding new accounts. For some tasks, the classic Unix programming language AWK offers the most efficient solution. AWK is a compact language with a syntax similar to C, which makes it easy to learn for anyone with C experience.
An AWK script parses the input file line by line, searching for patterns. When AWK finds a match, it performs the specified action. If the programmer does not specify a pattern, AWK simply performs the action for every line. As you’ll learn in this article, AWK is a very efficient tool for searching any kind of text file, including a previously prepared table stored in text format. A formatted text file, accompanied by a few simple AWK commands or scripts, can serve as a very simple and flexible data retrieval system without the complication or expense of an SQL database. This article describes how a system administrator can use AWK to obtain information about computers on a local network.
The free (i.e., released under the GPL) AWK version gawk [1] is a standard component of any Linux distribution. And as traditional Unix systems also include AWK, the tool is particularly useful for platform-independent scripting. If you have Solaris, HP-UX, and AIX servers as well as Linux computers, AWK could become an indispensable tool.
The examples in this article are based on the list of computers in Listing 1. The listing shows the contents of a text file that lists the computer name, IP address, operating system, software, and RAM for each computer. The idea is that, when a computer is reconfigured or added to the network, the system administrator updates the information in this file.
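The pattern and action model described above can be tried out on a small file. The following is only a sketch: it assumes a four-row excerpt of Listing 1 saved in the current directory under the name list, and it calls plain awk so that it runs wherever a POSIX AWK is installed (gawk should behave the same way here).

```shell
# Create a small sample file with a few rows in the format of Listing 1
# (name, IP address, operating system, software, RAM).
cat > list <<'EOF'
DagobertDuck 10.1.1.3 Debian Kylix 256
Goofy1 10.1.1.4 Solaris Mathematica 512
MickeyMouse 10.1.1.5 Debian Apache 512
Obelix 10.1.1.14 RedHat ICC 256
EOF

# Pattern plus action: the action runs only for lines that match /Debian/.
awk '/Debian/ {print "match:", $0}' list

# Action without a pattern: the action runs for every line.
awk '{print "line:", $0}' list
```

The first command prints the two Debian rows, and the second prints all four rows, each with its prefix.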
Approach
The basic syntax for a gawk single-line script is as follows:
gawk [options] 'AWK program' input_file
Larger AWK scripts should be stored in a file. In that case, the syntax is:
gawk [options] -f scriptfile inputfiles
The first thing we would like AWK to do now is give us a list of the computer names in our sample file (Listing 1):
gawk '{print $1}' list
The field $1 refers to the first column. If you need to know the IP address instead, simply replace $1 with $2. $0 corresponds to the full line, so gawk '{print $0}' list outputs the complete file on the screen.

Examples
The following search key will give us the whole range of information for a computer called Goofy1:
gawk '$1=="Goofy1" {print $0}' list
For each line, AWK checks whether column $1 is exactly equal to the string Goofy1. If so, it prints the whole line, {print $0}. Replacing the equal to operator (==) with the not equal to operator (!=) ensures that AWK will only run the command if there is no match (see Table 1). The command
gawk '$1 != "Goofy1" {print $0}' list
sends every line whose first column is not Goofy1 to standard output.
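To experiment with field selection and the == and != comparisons without typing in the whole of Listing 1, a three-row excerpt is enough. The sketch below assumes that excerpt is saved as list in the current directory and uses plain awk (an assumption made for portability; gawk should give the same results).

```shell
# Three rows taken from Listing 1.
cat > list <<'EOF'
DagobertDuck 10.1.1.3 Debian Kylix 256
Goofy1 10.1.1.4 Solaris Mathematica 512
MickeyMouse 10.1.1.5 Debian Apache 512
EOF

# Print the computer name and IP address (fields 1 and 2) of every line.
awk '{print $1, $2}' list

# Print the full line whose first field is exactly "Goofy1".
awk '$1 == "Goofy1" {print $0}' list

# Print only the names of all computers other than Goofy1.
awk '$1 != "Goofy1" {print $1}' list
```

Because the comparison is made against the whole field, a name such as Goofy12 would not be matched by the second command.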
December 2004
www.linux-magazine.com
AWK with a Text File Table
Listing 1: Computer list
DagobertDuck      10.1.1.3   Debian   Kylix        256
Goofy1            10.1.1.4   Solaris  Mathematica  512
MickeyMouse       10.1.1.5   Debian   Apache       512
LuckyLuke1        10.1.1.6   Debian   Samba        256
LuckyLuke2        10.1.1.7   Debian   Eclipse      256
LuckyLuke3        10.1.1.8   Suse     Mupad        256
LuckyLuke4        10.1.1.9   Debian   Mupad        128
LuckyLuke43       10.1.1.10  Debian   Mupad        128
LuckyMickeyMouse  10.1.1.1   Debian   Mupad        128
Asterix1          10.1.1.12  RedHat   NetBeans     128
Asterix2          10.1.1.13  Debian   NFS          256
Obelix            10.1.1.14  RedHat   ICC          256
Apple1            10.1.1.15  OSX      Photoshop    1024
Apple2            10.1.1.16  OS6      Photoshop    128
Apple3            10.1.1.17  OSX      Photoshop    512

AWK can search for ranges in addition to individual search patterns. The following syntax uses two regular expressions (see Table 2), surrounded by slash characters and separated by a comma; AWK compares them with the whole line.
gawk '/Goofy1/,/Asterix/ {print $0}' list
The output is the whole area from the line that matches the Goofy1 search string up to and including the first line that matches Asterix, which is the entry for Asterix1.

Logical Operators
Search keys can be extended and linked using Boolean operators (Table 1) such as the AND operator:
gawk '($3=="OSX") && ($4=="Photoshop") {print $1}' list
It is good practice to put logical operands in round brackets, but mismatched brackets are a common source of errors, especially if you need to deal with more complex expressions. AWK has a logical OR operator, ||, in addition to the logical AND.

Output
Thus far we have been happy to send output to the screen. But just like the shell, AWK is capable of redirecting data streams into files:
gawk '$1=="Obelix" {print $0 > "/home/linux/test"}' list
The preceding command sends the data stream to a file called test. If the file does not exist, AWK will automatically create it. To append to an existing file instead, redirect the output using >>. It is okay to use the shell’s own redirection function for this simple example, but the AWK variant allows you to redirect output to different files and view the output on the screen at the same time.
AWK also understands the printf() function, which is also used by C and the shell. Admins can use printf() for enhanced output formatting. Just like its counterpart in C, printf() does not wrap the output but expects the programmer to add \n for new lines.
gawk '{printf("%x\n",$5)}' list
In this example, we want printf to output an integer value in hex (%x) and then add a new line (\n). The argument passed to printf is the content of column five. Refer to [2] for more detail on this.
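The two output techniques, printf formatting and redirection to files, can be sketched together as follows. The file names debian.txt and other.txt are hypothetical and are created in the current directory, the input is a three-row excerpt of Listing 1 saved as list, and plain awk stands in for gawk (an assumption made for portability).

```shell
# Three rows taken from Listing 1.
cat > list <<'EOF'
DagobertDuck 10.1.1.3 Debian Kylix 256
Goofy1 10.1.1.4 Solaris Mathematica 512
Obelix 10.1.1.14 RedHat ICC 256
EOF

# printf: left-aligned name, RAM in decimal and in hex, explicit newline.
awk '{printf("%-14s %5d MB  0x%x\n", $1, $5, $5)}' list

# Redirect to two different files from within AWK and still
# report every processed name on the screen.
awk '$3 == "Debian" {print $1 > "debian.txt"}
     $3 != "Debian" {print $1 > "other.txt"}
     {print "seen:", $1}' list
```

Within a single AWK run, > truncates the target file once when it is first opened and then keeps writing to it, so other.txt ends up with both non-Debian names.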
Regular Expressions
Regular expressions are often useful if you need to manipulate or search text documents. Meta-characters give you the ability to create quite complex search keys. AWK supports regular expressions:
gawk '$1 ~ /[0-9]/ {print $0}' list
This script searches column one, $1, for entries that contain a digit between 0 and 9. To tell AWK to search only in $1, you need to explicitly bind the search key to the column using a tilde (~) character. The negation operator achieves exactly the opposite effect: !~ finds any lines in which the regular expression does not occur in the column.
To find any computers whose names end in Duck in the list, we need the following command:
gawk '$1 ~ /Duck$/ {print $0}' list
The dollar operator in /Duck$/ anchors the regular expression to the end of field $1. /^Lucky/ finds any entries that start with Lucky, such as LuckyLuke1 or LuckyMickeyMouse. Boolean operators provide a useful extension to this functionality:
gawk '$1 ~/(y|M)/ {print $0}' list
This command searches the first column for occurrences of y or M. Table 2 gives you an overview of meta-characters.

Table 2: Regular expressions
Expression  Explanation
.           Matches any single character
^           Matches the following regular expression at the start of the line
$           Matches the preceding regular expression at the end of the line
[ ]         Matches any one of the characters between the square brackets
[a-d1-7]    Character class with ranges: any letter between a and d, or any digit between 1 and 7
X?          Either no X or exactly one X
X*          Either no X or any number of Xs
X|Z         X or Z
XZ          X immediately followed by Z

String Functions
AWK has a wide range of functions for string replacement or substitution. In our sample file, only one computer has a Suse operating system. Let’s assume that the administrator who manages this network migrates this computer to Debian and now needs to update the computer list. Instead of using an editor, the admin could do the job more elegantly using an AWK string function.
gawk '{sub(/Suse/, "Debian", $3); print >> "/home/linux/test"}' list
The preceding command passes the search key, /Suse/, the replacement text "Debian", and the column $3, to the sub() (substitute) function. If the search key occurs in this column, the replacement text is substituted in. Every line, changed or not, is then appended to the file /home/linux/test. If you need more information on string functions, check out [3] and the manpage.
Besides string manipulation, AWK can also handle numerical operations. The last column in the file in Listing 1 contains numbers, which AWK can treat either as numbers or as strings. For example, you can type the following command if you need to know how much memory you have in your lab environment:
gawk '{sum+=$5; print $5, sum}' list
This mini-program adds up field five over all lines and stores the running total in a variable called sum. For each line, it outputs the value of the field and the current sum.

A New Start
The BEGIN and END constructs are useful for outputting headlines or messages. AWK runs any BEGIN commands before parsing the input file and any END commands after completing the last line.
gawk 'BEGIN {print "Search for MickeyMouse"}
$1=="MickeyMouse" {print $0}
END {print "-------"}' list

Going Bigger
More complex AWK scripts allow you to define your own functions, loops, and multi-dimensional arrays. The GNU variant can even handle TCP/IP communications [4].

INFO
[1] GNU AWK: http://www.gnu.org/software/gawk/
[2] Printf examples from the gawk manual: http://www.gnu.org/software/gawk/manual/html_node/Printf-Examples.html
[3] Helmut Herold, “AWK and SED”: Addison Wesley, 1991
[4] TCP/IP communication with gawk: http://www.gnu.org/software/gawk/manual/html_node/TCP_002fIP-Networking.html

Nico Hochgeschwender is studying Computer Science and majoring in mobile robotics. When he has time to spare, he enjoys mountaineering or cycle racing.

Table 1: AWK Operators
Operator  Explanation
$         Field operator
++ --     Postfix increment and decrement
++ --     Prefix increment and decrement
^         Power
!         Logical negation
+ -       Sign operations
* / %     Multiplication, division, modulo operation
+ -       Addition, subtraction
<         Less than
<=        Less than or equal to
==        Equal to
!=        Not equal to
>=        Greater than or equal to
>         Greater than
~ !~      Matches, does not match regular expression
&&        Logical AND
||        Logical OR
=         Assignment
+=        Addition and assignment
-=        Subtraction and assignment
*=        Multiplication and assignment
/=        Division and assignment
%=        Modulo operation and assignment
^=        Power and assignment
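Several of the operators in Table 1 can be combined in one short script. The following sketch is an illustration only: it assumes a three-row excerpt of Listing 1 saved as list in the current directory, uses plain awk rather than gawk for portability, and brings together the ~ match operator, the sub() function, the += operator, and the BEGIN and END constructs.

```shell
# Three rows taken from Listing 1.
cat > list <<'EOF'
Goofy1 10.1.1.4 Solaris Mathematica 512
LuckyLuke3 10.1.1.8 Suse Mupad 256
LuckyLuke4 10.1.1.9 Debian Mupad 128
EOF

# For every host whose name starts with "Lucky": replace Suse with
# Debian in column three, add the RAM value to a running total,
# and print the name together with the (possibly updated) system.
awk 'BEGIN {print "Lucky hosts after the migration:"}
     $1 ~ /^Lucky/ {sub(/Suse/, "Debian", $3); sum += $5; print $1, $3}
     END {print "RAM in these hosts:", sum, "MB"}' list
```

The substitution only changes the output of this run; the file list itself stays untouched unless the result is redirected into a new file.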