.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
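.PP
As a minimal sketch of how these options interact
(the sample input and the names
.B tag
and
.B n
are invented for illustration, and a POSIX shell is assumed):
.EX
```shell
# -F: makes the colon the field separator; -v sets tag before the program
# runs; the operand n=2 is assigned when awk reaches it, just before it
# opens "-" (the standard input).
out=$(awk -F: -v tag=user '{ print tag, $n }' n=2 - <<EOF
root:0
daemon:1
EOF
)
echo "$out"
```
.EE
Because
.B n
is assigned before the first record is read,
.B $n
selects the second field of every line.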
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
.PP
A pattern-action statement has the form:
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]\fR'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] \fR)
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
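.PP
A small sketch of multiple subscripts, assuming a POSIX shell
(the array names
.B count
and
.B part
are invented for illustration):
.EX
```shell
out=$(awk 'BEGIN {
    count["x", 1] = 5                  # the key is "x" SUBSEP 1
    if (("x", 1) in count)
        print "found"
    for (key in count) {
        split(key, part, SUBSEP)       # recover the two subscripts
        print part[1], part[2], count[key]
    }
}')
echo "$out"
```
.EE
Splitting the key on
.B SUBSEP
works only as long as no subscript itself contains that character.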
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > " file
or
.BI >> " file
is present or on a pipe if
.BI | " cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the
.I format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
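.PP
A hedged sketch of output to a pipe, assuming a POSIX shell with
.IR sort (1)
available (the sample input is invented):
.EX
```shell
out=$(awk '{ print $1 | "sort" }
END { close("sort"); print "done" }' <<EOF
pear
apple
EOF
)
echo "$out"
```
.EE
Every record goes to the same
.I sort
process, because the command string is identical each time.
Closing the pipe waits for that process to finish,
so the sorted lines appear before the final line.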
.PP
The mathematical functions
.BR atan2 ,
.BR cos ,
.BR exp ,
.BR log ,
.BR sin ,
and
.B sqrt
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
number of elements in an array for an array argument,
or length of
.B $0
if no argument.
.TP
.B rand
random number on [0,1).
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value.
.TP
\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
the
.IR n -character
substring of
.I s
that begins at position
.I m
counted from 1.
If no
.IR n ,
use the rest of the string.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
splits the string
.I s
into array elements
.IB a [1] \fR,
.IB a [2] \fR,
\&...,
.IB a [ n ] \fR,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
\fBsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
\fBgsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB)
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status. This will be \-1 upon error,
.IR cmd 's
exit status upon a normal exit,
256 +
.I sig
upon death-by-signal, where
.I sig
is the number of the murdering signal,
or 512 +
.I sig
if there was a core dump.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
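.PP
The string functions above can be sketched as follows, assuming a POSIX shell
(the variable names are invented for illustration):
.EX
```shell
out=$(awk 'BEGIN {
    s = "hello, world"
    print substr(s, 1, 5)              # 5 characters starting at position 1
    print index(s, "world")            # position of a fixed string
    if (match(s, /o+/))
        print RSTART, RLENGTH          # set as a side effect of match
    n = split("a:b:c", part, ":")
    print n, part[3]
    t = s
    n = gsub(/o/, "0", t)              # replaces in t, returns the count
    print n, t
    print toupper(substr(s, 8))        # no length given: rest of the string
}')
echo "$out"
```
.EE
The copy into
.B t
matters because
.B gsub
modifies its third argument in place.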
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < " file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
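.PP
A minimal sketch of reading command output, assuming a POSIX shell.
The combined form
.IB cmd " | getline " var
is standard in POSIX awk although it is not listed separately above;
the names
.B cmd
and
.B line
are illustrative:
.EX
```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # 1 per line read, 0 at end of output
        n++
    close(cmd)
    print n, line
}')
echo "$out"
```
.EE
Comparing the result with 0, rather than testing it for truth,
keeps the loop from running forever when
.B getline
returns \-1.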
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.B ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
They may appear multiple times in a program and execute
in the order they are read by
.IR awk .
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B ARGC
argument count, assignable.
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames.
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" ).
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.TP
.B FILENAME
the name of the current input file.
.TP
.B FNR
ordinal number of the current record in the current file.
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\fR.
.TP
.BR NF
number of fields in the current record.
.TP
.B NR
ordinal number of the current record.
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" ).
.TP
.B OFS
output field separator (default space).
.TP
.B ORS
output record separator (default newline).
.TP
.B RLENGTH
the length of a string matched by
.BR match .
.TP
.B RS
input record separator (default newline).
.TP
.B RSTART
the start position of a string matched by
.BR match .
.TP
.B SUBSEP
separates multiple subscripts (default 034).
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
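.PP
A sketch of these rules, assuming a POSIX shell
(the function
.B fill
and all variable names are hypothetical):
.EX
```shell
out=$(awk 'function fill(arr, n,    i) {   # i is an excess parameter, so local
    for (i = 1; i <= n; i++)
        arr[i] = i * i                 # arr is passed by reference
    n = 0                              # n is a copy; the caller is unaffected
}
BEGIN {
    i = 99; count = 3
    fill(sq, count)
    print sq[3], count, i
}')
echo "$out"
```
.EE
The extra white space before
.B i
in the parameter list is only a convention
for marking parameters that serve as local variables.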
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or spaces and tabs.
.PP
.EX
.nf
    { s += $1 }
END { print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.fi
.EE
.SH SEE ALSO
.IR grep (1),
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.IR "The AWK Programming Language" ,
Addison-Wesley, 1988. ISBN 0-201-07981-X.
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
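.PP
A small sketch of the two idioms, assuming a POSIX shell
(the variable names are invented for illustration):
.EX
```shell
out=$(awk 'BEGIN {
    x = "10"; y = "9"
    print (x < y)                      # both are strings: string comparison
    print (x + 0 < y + 0)              # adding 0 forces a numeric comparison
    a = 10; b = 9
    sa = a ""; sb = b ""               # concatenating "" forces strings
    print (sa < sb)
}')
echo "$out"
```
.EE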
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
.br
POSIX-standard interval expressions in regular expressions are not supported.
.br
Only eight-bit character sets are handled correctly.