.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.de TF
.IP "" "\w'\fB\\$1\ \ \fP'u"
.PD 0
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
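.PP
As an illustration, assuming a POSIX shell, a
.I var=value
operand takes effect only when
.I awk
reaches it among the arguments, so a
.B BEGIN
action does not yet see it:
.PP
.EX
.nf
echo x | awk 'BEGIN { print "[" n "]" } { print n }' n=5
.fi
.EE
.PP
This prints
.B []
and then
.BR 5 ;
with no file operands, the assignment is made just before standard input is read.
Writing
.B "\-v n=5"
instead would make the value visible in the
.B BEGIN
action as well.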
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.B FS
is null, the input line is split into one field per character.
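.PP
For example, assuming a POSIX shell and the default
.BR FS ,
the command
.PP
.EX
.nf
echo "  alpha   beta gamma" | awk '{ print NF, $2 }'
.fi
.EE
.PP
prints
.BR "3 beta" :
leading white space is ignored and a run of blanks or tabs acts as a single separator.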
.PP
A pattern-action statement has the form:
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
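.PP
Both defaults can be seen in this shell sketch (the input text is arbitrary):
the first program has a pattern but no action,
the second has an action but no pattern.
.PP
.EX
.nf
echo "apple 3" | awk '$2 > 2'
echo "apple 3" | awk '{ print $1 }'
.fi
.EE
.PP
They print
.B "apple 3"
and
.BR apple ,
respectively.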
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]\fR'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] \fR)
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
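.PP
A short illustration, assuming a POSIX shell (the array names are arbitrary),
of string subscripts, a loop over an array, and a multiple subscript tested with the
.B in
operator:
.PP
.EX
.nf
awk 'BEGIN {
    count["red"] = 2; count["blue"] = 3
    total = 0
    for (k in count) total += count[k]
    grid[1, 2] = "x"
    print total, ((1, 2) in grid), (("1" SUBSEP "2") in grid)
}'
.fi
.EE
.PP
It prints
.BR "5 1 1" ;
the last two tests agree because the subscript
.B "1, 2"
is stored as
.B 1
and
.B 2
joined by
.BR SUBSEP .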
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > " file
or
.BI >> " file
is present or on a pipe if
.BI | " cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the
.I format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
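.PP
The following shell sketch shows the output field separator at work in
.BR print ,
and a
.B printf
that, unlike
.BR print ,
does not append the output record separator, so a final
.B "print \(dq\(dq"
ends the line:
.PP
.EX
.nf
awk 'BEGIN {
    OFS = ":"
    print "a", "b"
    printf "%5.2f|%s|", 3.14159, "x"
    print ""
}'
.fi
.EE
.PP
The output is the two lines
.B a:b
and
.BR " 3.14|x|" .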
.PP
The mathematical functions
.BR atan2 ,
.BR cos ,
.BR exp ,
.BR log ,
.BR sin ,
and
.B sqrt
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
number of elements in an array for an array argument,
or length of
.B $0
if no argument.
.TP
.B rand
random number on [0,1).
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value.
.TP
\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
the
.IR n -character
substring of
.I s
that begins at position
.I m
counted from 1.
If no
.IR n ,
use the rest of the string.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
splits the string
.I s
into array elements
.IB a [1] \fR,
.IB a [2] \fR,
\&...,
.IB a [ n ] \fR,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
\fBsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)\fR
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
\fBgsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)\fR
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB)
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status. This will be \-1 upon error,
.IR cmd 's
exit status upon a normal exit,
256 +
.I sig
upon death-by-signal, where
.I sig
is the number of the murdering signal,
or 512 +
.I sig
if there was a core dump.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
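.PP
The shell sketch below (the strings and variable names are arbitrary)
exercises several of these functions.
It assumes an
.I awk
whose
.B system
returns the exit status as described above.
.PP
.EX
.nf
awk 'BEGIN {
    s = "hello, world"
    print length(s), substr(s, 1, 5), index(s, "world")
    n = split("a:b:c", part, ":")
    print n, part[2]
    if (match("foobar", /ob+/)) print RSTART, RLENGTH
    t = s; c = gsub(/o/, "0", t)
    print c, t, toupper("abc")
    print system("exit 3")
}'
.fi
.EE
.PP
Its output is the lines
.BR "12 hello 8" ,
.BR "3 b" ,
.BR "3 2" ,
.BR "2 hell0, w0rld ABC" ,
and
.BR 3 .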
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < " file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
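.PP
A shell sketch of the usual loop (the command text is arbitrary):
it reads each line of a command's output into a variable, then closes the pipe.
.PP
.EX
.nf
awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0) n++
    close(cmd)
    print n, line
}'
.fi
.EE
.PP
This prints
.BR "2 two" .
Testing for a result greater than 0, rather than the bare return value,
stops the loop on an error (\-1) as well as at end of file.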
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.B ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
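.PP
For instance, assuming a POSIX shell, a regular expression held in a string variable
works with both match operators:
.PP
.EX
.nf
awk 'BEGIN { re = "^a.c$"; print ("abc" ~ re), ("abd" ~ re), ("abd" !~ re) }'
.fi
.EE
.PP
prints
.BR "1 0 1" .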
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
They may appear multiple times in a program and execute
in the order they are read by
.IR awk .
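.PP
For example, in this shell sketch the two
.B BEGIN
actions run in program order before any input is read, and
.B END
runs after the last line:
.PP
.EX
.nf
( echo 1; echo 2 ) | awk 'BEGIN { print "first" } END { print NR } BEGIN { print "second" }'
.fi
.EE
.PP
The output is
.BR first ,
.BR second ,
and
.BR 2 .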
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B ARGC
argument count, assignable.
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames.
.TP
.B CONVFMT
conversion format used when converting numbers to strings
(default
.BR "%.6g" ).
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.TP
.B FILENAME
the name of the current input file.
.TP
.B FNR
ordinal number of the current record in the current file.
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\fR.
.TP
.B NF
number of fields in the current record.
.TP
.B NR
ordinal number of the current record.
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" ).
.TP
.B OFS
output field separator (default space).
.TP
.B ORS
output record separator (default newline).
.TP
.B RLENGTH
the length of a string matched by
.BR match .
.TP
.B RS
input record separator (default newline).
If empty, blank lines separate records.
If more than one character long,
.B RS
is treated as a regular expression, and records are
separated by text matching the expression.
.TP
.B RSTART
the start position of a string matched by
.BR match .
.TP
.B SUBSEP
separates multiple subscripts (default octal 034, a control character).
.PD
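.PP
A shell sketch of paragraph mode and of
.B ENVIRON
(the environment variable name GREETING is arbitrary):
.PP
.EX
.nf
( echo a; echo b; echo; echo c ) | awk 'BEGIN { RS = "" } END { print NR }'
GREETING=hi awk 'BEGIN { print ENVIRON["GREETING"] }'
.fi
.EE
.PP
The first command prints
.B 2
because the blank line separates two records; the second prints
.BR hi .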
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
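.PP
The following shell sketch (the function and variable names are arbitrary)
shows a scalar passed by value, an array passed by reference,
and a local variable declared as an excess parameter:
.PP
.EX
.nf
awk '
function bump(x) { x++; return x }
function squares(arr, n,    i) {    # i is local: an excess parameter
    for (i = 1; i <= n; i++) arr[i] = i * i
}
BEGIN {
    v = 1; r = bump(v)
    squares(sq, 3)
    print v, r, sq[3], (i == "")
}'
.fi
.EE
.PP
It prints
.BR "1 2 9 1" :
.B v
is unchanged,
.B sq
was filled in by the callee, and the global
.B i
was never touched.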
.SH ENVIRONMENT VARIABLES
If
.B POSIXLY_CORRECT
is set in the environment, then
.I awk
follows the POSIX rules for
.B sub
and
.B gsub
with respect to consecutive backslashes and ampersands.
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN	{ FS = ",[ \et]*|[ \et]+" }
	{ print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or spaces and tabs.
.PP
.EX
.nf
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.fi
.EE
.SH SEE ALSO
.IR grep (1),
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.IR "The AWK Programming Language" ,
Addison-Wesley, 1988. ISBN 0-201-07981-X.
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number, add 0 to it;
to force it to be treated as a string, concatenate
\&\f(CW""\fP to it.
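.PP
For example, in this shell sketch string constants compare as strings
until 0 is added, and numbers compare numerically until
.B \(dq\(dq
is concatenated:
.PP
.EX
.nf
awk 'BEGIN { x = "10"; y = "9"; print (x < y), (x + 0 < y + 0), (2 < 10), (2 "" < 10 "") }'
.fi
.EE
.PP
prints
.BR "1 0 1 0" .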
.PP
The scope rules for variables in functions are a botch;
the syntax is worse.
.PP
Only eight-bit character sets are handled correctly.
.SH UNUSUAL FLOATING-POINT VALUES
.I Awk
was designed before IEEE 754 arithmetic defined Not-A-Number (NaN)
and Infinity values, which are supported by all modern floating-point
hardware.
.PP
Because
.I awk
uses
.IR strtod (3)
and
.IR atof (3)
to convert string values to double-precision floating-point values,
and modern C libraries convert strings starting with
.B inf
and
.B nan
into infinity and NaN values respectively,
some inputs led to strange results, such as this:
.PP
.EX
.nf
echo nancy | awk '{ print $1 + 0 }'
.fi
.EE
.PP
printing
.B nan
instead of zero.
.PP
.I Awk
now follows GNU AWK, and prefilters string values before attempting
to convert them to numbers, as follows:
.TP
.I "Hexadecimal values"
Hexadecimal values (which
.IR strtod (3)
accepts since C99) convert to zero, as they did
prior to C99.
.TP
.I "NaN values"
The two strings
.B +nan
and
.B \-nan
(case independent) convert to NaN. No others do.
(NaNs can have signs.)
.TP
.I "Infinity values"
The two strings
.B +inf
and
.B \-inf
(case independent) convert to positive and negative infinity, respectively.
No others do.
632No others do.