Microblog post

#aferaprezydencka

GAWK(1) Utility Commands GAWK(1)

NAME
gawk - pattern scanning and processing language

SYNOPSIS
gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

DESCRIPTION
Gawk is the GNU Project's implementation of the AWK programming lan‐
guage. It conforms to the definition of the language in the POSIX
1003.1 Standard. This version in turn is based on the description in
The AWK Programming Language, by Aho, Kernighan, and Weinberger. Gawk
provides the additional features found in the current version of Brian
Kernighan's awk and a number of GNU-specific extensions.

The command line consists of options to gawk itself, the AWK program
text (if not supplied via the -f or --file options), and values to be
made available in the ARGC and ARGV pre-defined AWK variables.

When gawk is invoked with the --profile option, it starts gathering
profiling statistics from the execution of the program. Gawk runs more
slowly in this mode, and automatically produces an execution profile in
the file awkprof.out when done. See the --profile option, below.

Gawk also has an integrated debugger. An interactive debugging session
can be started by supplying the --debug option to the command line. In
this mode of execution, gawk loads the AWK source code and then prompts
for debugging commands. Gawk can only debug AWK program source pro‐
vided with the -f option. The debugger is documented in GAWK: Effec‐
tive AWK Programming.

OPTION FORMAT
Gawk options may be either traditional POSIX-style one-letter options,
or GNU-style long options. POSIX options start with a single “-”,
while long options start with “--”. Long options are provided for both
GNU-specific features and for POSIX-mandated features.

Gawk-specific options are typically used in long-option form. Argu‐
ments to long options are either joined with the option by an = sign,
with no intervening spaces, or they may be provided in the next command
line argument. Long options may be abbreviated, as long as the abbre‐
viation remains unique.

Additionally, every long option has a corresponding short option, so
that the option's functionality may be used from within #! executable
scripts.

OPTIONS
Gawk accepts the following options. Standard options are listed first,
followed by options for gawk extensions, listed alphabetically by short
option.

-f program-file
--file program-file
Read the AWK program source from the file program-file, instead
of from the first command line argument. Multiple -f (or
--file) options may be used.

-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS prede‐
fined variable).

-v var=val
--assign var=val
Assign the value val to the variable var, before execution of
the program begins. Such variable values are available to the
BEGIN rule of an AWK program.

-b
--characters-as-bytes
Treat all input data as single-byte characters. In other words,
don't pay any attention to the locale information when attempt‐
ing to process strings as multibyte characters. The --posix
option overrides this one.

-c
--traditional
Run in compatibility mode. In compatibility mode, gawk behaves
identically to Brian Kernighan's awk; none of the GNU-specific
extensions are recognized. See GNU EXTENSIONS, below, for more
information.

-C
--copyright
Print the short version of the GNU copyright information message
on the standard output and exit successfully.

-d[file]
--dump-variables[=file]
Print a sorted list of global variables, their types and final
values to file. If no file is provided, gawk uses a file named
awkvars.out in the current directory.
Having a list of all the global variables is a good way to look
for typographical errors in your programs. You would also use
this option if you have a large program with a lot of functions,
and you want to be sure that your functions don't inadvertently
use global variables that you meant to be local. (This is a
particularly easy mistake to make with simple variable names
like i, j, and so on.)

-D[file]
--debug[=file]
Enable debugging of AWK programs. By default, the debugger
reads commands interactively from the terminal. The optional
file argument specifies a file with a list of commands for the
debugger to execute non-interactively.

-e program-text
--source program-text
Use program-text as AWK program source code. This option allows
the easy intermixing of library functions (used via the -f and
--file options) with source code entered on the command line.
It is intended primarily for medium to large AWK programs used
in shell scripts.

-E file
--exec file
Similar to -f; however, this option is the last one processed.
This should be used with #! scripts, particularly for
CGI applications, to avoid passing in options or source code (!)
on the command line from a URL. This option disables command-
line variable assignments.

-g
--gen-pot
Scan and parse the AWK program, and generate a GNU .pot (Porta‐
ble Object Template) format file on standard output with entries
for all localizable strings in the program. The program itself
is not executed. See the GNU gettext distribution for more
information on .pot files.

-h
--help Print a relatively short summary of the available options on the
standard output. (Per the GNU Coding Standards, these options
cause an immediate, successful exit.)

-i include-file
--include include-file
Load an awk source library. This searches for the library using
the AWKPATH environment variable. If the initial search fails,
another attempt will be made after appending the .awk suffix.
The file will be loaded only once (i.e., duplicates are elimi‐
nated), and the code does not constitute the main program
source.

-l lib
--load lib
Load a shared library lib. This searches for the library using
the AWKLIBPATH environment variable. If the initial search
fails, another attempt will be made after appending the default
shared library suffix for the platform. The library initializa‐
tion routine is expected to be named dl_load().

-L [value]
--lint[=value]
Provide warnings about constructs that are dubious or non-porta‐
ble to other AWK implementations. With an optional argument of
fatal, lint warnings become fatal errors. This may be drastic,
but its use will certainly encourage the development of cleaner
AWK programs. With an optional argument of invalid, only warn‐
ings about things that are actually invalid are issued. (This is
not fully implemented yet.)

-M
--bignum
Force arbitrary precision arithmetic on numbers. This option has
no effect if gawk is not compiled to use the GNU MPFR and MP
libraries.

-n
--non-decimal-data
Recognize octal and hexadecimal values in input data. Use this
option with great caution!

-N
--use-lc-numeric
This forces gawk to use the locale's decimal point character
when parsing input data. Although the POSIX standard requires
this behavior, and gawk does so when --posix is in effect, the
default is to follow traditional behavior and use a period as
the decimal point, even in locales where the period is not the
decimal point character. This option overrides the default
behavior, without the full draconian strictness of the --posix
option.

-o[file]
--pretty-print[=file]
Output a pretty printed version of the program to file. If no
file is provided, gawk uses a file named awkprof.out in the cur‐
rent directory.

-O
--optimize
Enable optimizations upon the internal representation of the
program. Currently, this includes simple constant-folding, and
tail call elimination for recursive functions. The gawk main‐
tainer hopes to add additional optimizations over time.

-p[prof-file]
--profile[=prof-file]
Start a profiling session, and send the profiling data to prof-
file. The default is awkprof.out. The profile contains execu‐
tion counts of each statement in the program in the left margin
and function call counts for each user-defined function.

-P
--posix
This turns on compatibility mode, with the following additional
restrictions:

· \x escape sequences are not recognized.

· Only space and tab act as field separators when FS is set to a
single space; newline does not.

· You cannot continue lines after ? and :.

· The synonym func for the keyword function is not recognized.

· The operators ** and **= cannot be used in place of ^ and ^=.

-r
--re-interval
Enable the use of interval expressions in regular expression
matching (see Regular Expressions, below). Interval expressions
were not traditionally available in the AWK language. The POSIX
standard added them, to make awk and egrep consistent with each
other. They are enabled by default, but this option remains for
use with --traditional.

-S
--sandbox
Runs gawk in sandbox mode, disabling the system() function,
input redirection with getline, output redirection with print
and printf, and loading dynamic extensions. Command execution
(through pipelines) is also disabled. This effectively blocks a
script from accessing local resources (except for the files
specified on the command line).
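
A small sketch of the effect, assuming gawk is on the PATH. The command
passed to system() is arbitrary; the point is only that it never runs:

```shell
# system() is disabled in sandbox mode, so gawk stops with a fatal error
# instead of running the command; the "|| echo" branch reports the failure.
gawk --sandbox 'BEGIN { system("echo should-not-run") }' 2>/dev/null || echo "blocked by sandbox"
```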

-t
--lint-old
Provide warnings about constructs that are not portable to the
original version of UNIX awk.

-V
--version
Print version information for this particular copy of gawk on
the standard output. This is useful mainly for knowing if the
current copy of gawk on your system is up to date with respect
to whatever the Free Software Foundation is distributing. This
is also useful when reporting bugs. (Per the GNU Coding Stan‐
dards, these options cause an immediate, successful exit.)

-- Signal the end of options. This is useful to allow further argu‐
ments to the AWK program itself to start with a “-”. This pro‐
vides consistency with the argument parsing convention used by
most other POSIX programs.

In compatibility mode, any other options are flagged as invalid, but
are otherwise ignored. In normal operation, as long as program text
has been supplied, unknown options are passed on to the AWK program in
the ARGV array for processing. This is particularly useful for running
AWK programs via the “#!” executable interpreter mechanism.

For POSIX compatibility, the -W option may be used, followed by the
name of a long option.

AWK PROGRAM EXECUTION
An AWK program consists of a sequence of pattern-action statements and
optional function definitions.

@include "filename"
@load "filename"
pattern { action statements }
function name(parameter list) { statements }

Gawk first reads the program source from the program-file(s) if speci‐
fied, from arguments to --source, or from the first non-option argument
on the command line. The -f and --source options may be used multiple
times on the command line. Gawk reads the program text as if all the
program-files and command line source texts had been concatenated
together. This is useful for building libraries of AWK functions,
without having to include them in each new AWK program that uses them.
It also provides the ability to mix library functions with command line
programs.

In addition, lines beginning with @include may be used to include other
source files into your program, making library use even easier. This
is equivalent to using the -i option.

Lines beginning with @load may be used to load shared libraries into
your program. This is equivalent to using the -l option.

The environment variable AWKPATH specifies a search path to use when
finding source files named with the -f and -i options. If this vari‐
able does not exist, the default path is ".:/usr/local/share/awk".
(The actual directory may vary, depending upon how gawk was built and
installed.) If a file name given to the -f option contains a “/” char‐
acter, no path search is performed.

The environment variable AWKLIBPATH specifies a search path to use when
finding shared libraries named with the -l option. If this variable does
not exist, the default path is ".:/usr/local/lib/gawk". (The actual
directory may vary, depending upon how gawk was built and installed.)

Gawk executes AWK programs in the following order. First, all variable
assignments specified via the -v option are performed. Next, gawk com‐
piles the program into an internal form. Then, gawk executes the code
in the BEGIN rule(s) (if any), and then proceeds to read each file
named in the ARGV array (up to ARGV[ARGC-1]). If there are no files
named on the command line, gawk reads the standard input.

If a filename on the command line has the form var=val it is treated as
a variable assignment. The variable var will be assigned the value
val. (This happens after any BEGIN rule(s) have been run.) Command
line variable assignment is most useful for dynamically assigning val‐
ues to the variables AWK uses to control how input is broken into
fields and records. It is also useful for controlling state if multi‐
ple passes are needed over a single data file.

If the value of a particular element of ARGV is empty (""), gawk skips
over it.

For each input file, if a BEGINFILE rule exists, gawk executes the
associated code before processing the contents of the file. Similarly,
gawk executes the code associated with ENDFILE after processing the
file.

For each record in the input, gawk tests to see if it matches any pat‐
tern in the AWK program. For each pattern that the record matches,
gawk executes the associated action. The patterns are tested in the
order they occur in the program.

Finally, after all the input is exhausted, gawk executes the code in
the END rule(s) (if any).

Command Line Directories
According to POSIX, files named on the awk command line must be text
files. The behavior is “undefined” if they are not. Most versions
of awk treat a directory on the command line as a fatal error.

Starting with version 4.0 of gawk, a directory on the command line pro‐
duces a warning, but is otherwise skipped. If either of the --posix or
--traditional options is given, then gawk reverts to treating directo‐
ries on the command line as a fatal error.

VARIABLES, RECORDS AND FIELDS
AWK variables are dynamic; they come into existence when they are first
used. Their values are either floating-point numbers or strings, or
both, depending upon how they are used. AWK also has one-dimensional
arrays; arrays with multiple dimensions may be simulated. Gawk pro‐
vides true arrays of arrays; see Arrays, below. Several pre-defined
variables are set as a program runs; these are described as needed and
summarized below.

Records
Normally, records are separated by newline characters. You can control
how records are separated by assigning values to the built-in variable
RS. If RS is any single character, that character separates records.
Otherwise, RS is a regular expression. Text in the input that matches
this regular expression separates the record. However, in compatibil‐
ity mode, only the first character of its string value is used for sep‐
arating records. If RS is set to the null string, then records are
separated by blank lines. When RS is set to the null string, the new‐
line character always acts as a field separator, in addition to what‐
ever value FS may have.

Fields
As each input record is read, gawk splits the record into fields, using
the value of the FS variable as the field separator. If FS is a single
character, fields are separated by that character. If FS is the null
string, then each individual character becomes a separate field. Oth‐
erwise, FS is expected to be a full regular expression. In the special
case that FS is a single space, fields are separated by runs of spaces
and/or tabs and/or newlines. (But see the section POSIX COMPATIBILITY,
below.) NOTE: The value of IGNORECASE (see below) also affects how
fields are split when FS is a regular expression, and how records are
separated when RS is a regular expression.

If the FIELDWIDTHS variable is set to a space-separated list of numbers,
each field is expected to have fixed width, and gawk splits up
the record using the specified widths. The value of FS is ignored.
Assigning a new value to FS or FPAT overrides the use of FIELDWIDTHS.

Similarly, if the FPAT variable is set to a string representing a regu‐
lar expression, each field is made up of text that matches that regular
expression. In this case, the regular expression describes the fields
themselves, instead of the text that separates the fields. Assigning a
new value to FS or FIELDWIDTHS overrides the use of FPAT.

Each field in the input record may be referenced by its position: $1,
$2, and so on. $0 is the whole record. Fields need not be referenced
by constants:

n = 5
print $n

prints the fifth field in the input record.

The variable NF is set to the total number of fields in the input
record.

References to non-existent fields (i.e., fields after $NF) produce the
null-string. However, assigning to a non-existent field (e.g., $(NF+2)
= 5) increases the value of NF, creates any intervening fields with the
null string as their values, and causes the value of $0 to be recom‐
puted, with the fields being separated by the value of OFS. References
to negative numbered fields cause a fatal error. Decrementing NF
causes the values of fields past the new value to be lost, and the
value of $0 to be recomputed, with the fields being separated by the
value of OFS.

Assigning a value to an existing field causes the whole record to be
rebuilt when $0 is referenced. Similarly, assigning a value to $0
causes the record to be resplit, creating new values for the fields.

Built-in Variables
Gawk's built-in variables are:

ARGC The number of command line arguments (does not include
options to gawk, or the program source).

ARGIND The index in ARGV of the current file being processed.

ARGV Array of command line arguments. The array is indexed from
0 to ARGC - 1. Dynamically changing the contents of ARGV
can control the files used for data.

BINMODE On non-POSIX systems, specifies use of “binary” mode for
all file I/O. Numeric values of 1, 2, or 3, specify that
input files, output files, or all files, respectively,
should use binary I/O. String values of "r", or "w" spec‐
ify that input files, or output files, respectively, should
use binary I/O. String values of "rw" or "wr" specify that
all files should use binary I/O. Any other string value is
treated as "rw", but generates a warning message.

CONVFMT The conversion format for numbers, "%.6g", by default.

ENVIRON An array containing the values of the current environment.
The array is indexed by the environment variables, each
element being the value of that variable (e.g., ENVI‐
RON["HOME"] might be "/home/arnold"). Changing this array
does not affect the environment seen by programs which gawk
spawns via redirection or the system() function.

ERRNO If a system error occurs either doing a redirection for
getline, during a read for getline, or during a close(),
then ERRNO will contain a string describing the error. The
value is subject to translation in non-English locales.

FIELDWIDTHS A whitespace-separated list of field widths. When set,
gawk parses the input into fields of fixed width, instead
of using the value of the FS variable