PENDING - YouTube Series about AWK command
Overview
What is the AWK command?
The basic function of `awk` is to search files for lines (or other units of text) that contain certain patterns. When a line matches one of the patterns, `awk` **performs specified actions** on that line. `awk` continues to process input lines in this way until it reaches the end of the input files.
What does it specialise in?
Programs in `awk` are different from programs in most other languages, because `awk` programs are data driven (i.e., you describe the data you want to work with and then what to do when you find it). Most other languages are procedural; you have to describe, in great detail, every step the program should take. When working with procedural languages, it is usually much harder to clearly describe the data your program will process. For this reason, `awk` programs are often refreshingly easy to read and write.
When you run `awk`, you specify an `awk` program that tells `awk` what to do. The program consists of a series of rules (it may also contain function definitions, an advanced feature that we will ignore for now; see User-Defined Functions). Each rule specifies one pattern to search for and one action to perform upon finding the pattern.
Syntactically, a rule consists of a pattern followed by an action. The action is enclosed in braces to separate it from the pattern. Newlines usually separate rules. Therefore, an `awk` program looks like this:
pattern { action }
pattern { action }
...
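To make the rule layout concrete, here is a minimal sketch of a program with two rules. The sample input and the shell variable name `rules_out` are invented for illustration; a line that matches both patterns is processed by both rules, in order.

```shell
# Hypothetical input: three log-like lines fed through a pipe.
# Rule 1 fires on lines containing "error".
# Rule 2 fires on lines longer than 12 characters.
rules_out=$(printf 'disk ok\nerror: disk full\nnetwork ok, latency high\n' |
  awk '
    /error/         { print "match: " $0 }
    length($0) > 12 { print "long: " $0 }
  ')
echo "$rules_out"
```

The second input line satisfies both patterns, so it is printed twice, once by each rule.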
Official Examples
- How to Run `awk` Programs
- Data Files for the Examples
- Some Simple Examples
- An Example with Two Rules
- A More Complex Example
Terminologies
Data Source
Record
- A record is the basic unit of data that `awk` processes in a single operation.
- By default, each line in a file or input stream is a record. Records are separated by the value of the `RS` built-in variable, which defaults to `\n`.
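As a small sketch of how `RS` changes what counts as a record, the example below splits input on `;` instead of newlines. The input string and the variable name `rs_out` are assumptions made for the example.

```shell
# Hypothetical input: three records separated by ";" rather than newlines.
# NR holds the number of the record currently being processed.
rs_out=$(printf 'alpha;beta;gamma' |
  awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
echo "$rs_out"
```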
Field
- A field is a sub-unit of data within a record that `awk` can match a pattern against or run an action on.
- By default, each word separated by spaces or tabs is a field. Fields are split according to the `FS` built-in variable, which defaults to a single space (meaning any run of spaces or tabs); you can set it to something else, such as `,`, to handle `csv` input (refer to the "Built-in Variable" section).
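The default splitting behaviour can be checked with a short sketch. The input line is made up; it mixes leading spaces, repeated spaces, and a tab to show that runs of whitespace count as a single separator.

```shell
# Hypothetical input with irregular whitespace between three words.
# NF is the number of fields awk found in the current record.
ws_out=$(printf '  alpha   beta\tgamma\n' |
  awk '{ print NF ": " $1 "|" $2 "|" $3 }')
echo "$ws_out"
```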
Pattern/Action
Pattern
A pattern is the condition that `awk` uses to decide which records are selected for processing. The pattern can be:
- Regular expression: match records against a regex (e.g. `awk '/pattern/{action}' filename`)
- Relational expression: a logical condition based on arithmetic or string comparison (e.g. `awk '$1 > 100 {print $0}' filename`)
- Range pattern: select a range of records with two patterns separated by a comma (e.g. `awk '/start_pattern/,/end_pattern/' filename`)
- BEGIN and END blocks: run before the first record is read and after the last record has been processed (e.g. `awk 'BEGIN {action} {regular processing} END {action}' filename`)
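The pattern types listed above can be combined in one program. The following is a sketch only: the five input records and the variable name `pat_out` are hypothetical, and every record has a numeric second field so that `$2 > 100` is a numeric comparison.

```shell
# Hypothetical input: a key in column 1 and a number in column 2.
pat_out=$(printf 'a 50\nb 150\nc 20\nd 300\ne 70\n' |
  awk '
    BEGIN      { print "start" }              # runs before any input is read
    /^b/,/^d/  { print "range: " $1 }         # from the first "b" line to the next "d" line
    $2 > 100   { print "big: " $1 }           # relational expression on field 2
    END        { print NR " records read" }   # runs after the last record
  ')
echo "$pat_out"
```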
Action
An action is the block of code that `awk` runs on each matching record. The action can be:
Print / Calculation / Assignment
- Print output: write the processed result to stdout (e.g. `awk '/pattern/ {print $1, $3}' filename`)
- Variable assignment: set or change the value of a variable (e.g. `awk '{ total += $1 } END { print "Total:", total }' filename`)
- Arithmetic operations: perform calculations on fields (e.g. `awk '{ print $1 * $2 }' filename`)
Conditional / Loop / Array Operation
- Conditional statement: execute a code block based on a condition (e.g. `awk '{ if ($1 > 100) print $0; else print "condition not met" }' filename`)
- For loop: iterate over the fields (e.g. `awk '{ for (i = 1; i <= NF; i++) print $i }' filename`)
- Array operations: use associative arrays for storing and accessing data based on keys (e.g. `awk '{ count[$1]++ } END { for (val in count) print val, count[val] }' filename`)
Function call: use built-in or user-defined functions for complex tasks.
Built-in function example:
awk '{ print length($0) }' filename
User-defined function:
# Define a function to compute factorial and use it
# (n <= 1 also stops the recursion for 0 and negative input)
awk 'function factorial(n) { return (n <= 1) ? 1 : n * factorial(n-1) }
     { print factorial($1) }' filename
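Several of these action types usually appear together. The sketch below combines assignment, arithmetic, and an associative array to total a value per key; the department names, the numbers, and the variable name `agg_out` are invented for the example. The output is piped through `sort` because the iteration order of `for (k in array)` is unspecified.

```shell
# Hypothetical input: a department in column 1 and an amount in column 2.
agg_out=$(printf 'sales 100\nit 250\nsales 50\n' |
  awk '
    { total[$1] += $2; count[$1]++ }                  # accumulate per department
    END { for (k in total) print k, count[k], total[k] }
  ' | sort)
echo "$agg_out"
```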
Command Basics
Primitive Form
(for this section, we will use person-info as our data source)
In awk's simplest form, the command can perform an action alone (`awk '{action}' input_file`), or match a certain pattern and then perform the action only on the matching lines (`awk '/regex_pattern/{action}' input_file`):
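As a rough sketch of both forms, the example below writes a small stand-in data file and runs each command against it. The file name `people.txt` and its contents are hypothetical and are not the person-info file referred to in this section.

```shell
# Create a throwaway directory with a hypothetical data file.
tmpdir=$(mktemp -d)
printf 'Alice 30 London\nBob 25 Paris\nCarol 41 London\n' > "$tmpdir/people.txt"

# Action only: runs on every line.
all_lines=$(awk '{ print $0 }' "$tmpdir/people.txt")

# Pattern plus action: runs only on lines that contain "London".
london=$(awk '/London/ { print $1 }' "$tmpdir/people.txt")

echo "$all_lines"
echo "$london"
rm -r "$tmpdir"
```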
Action - Built-in Variable
(for this example, we will use employee.txt as our data source)
You might have noticed that the previous example used `{print $0}` as the action of the `awk` command to print the whole line when it matched. In this action, `$0` is a built-in variable. Similarly we have:
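A few commonly used built-ins can be seen side by side in a short sketch: `NR` (current record number), `NF` (number of fields in the record), `$1` (first field), and `$NF` (last field). The two input lines are made up and are not taken from employee.txt.

```shell
# Hypothetical input: the second line has one more field than the first.
vars_out=$(printf 'john sales 5000\nmary it 6200 remote\n' |
  awk '{ print NR, NF, $1, $NF }')
echo "$vars_out"
```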
You can change these built-in variables in any of the following ways (using `FS` as an example):
- Variable assignment with `-v`: `awk -v FS=',' '{ print $1, $2 }' filename`
- Command-line option (`-F` applies to `FS` only): `awk -F, '{ print $1, $2 }' filename`
- BEGIN block setting: `awk 'BEGIN { FS = "," }  { print $1, $2 }' filename`
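The sketch below runs all three forms on the same input to show that they split fields identically. The two comma-separated rows and the variable names are hypothetical.

```shell
# Hypothetical comma-separated input shared by all three commands.
csv_rows='alice,sales
bob,it'

via_v=$(printf '%s\n' "$csv_rows" | awk -v FS=',' '{ print $1, $2 }')
via_F=$(printf '%s\n' "$csv_rows" | awk -F, '{ print $1, $2 }')
via_begin=$(printf '%s\n' "$csv_rows" | awk 'BEGIN { FS = "," } { print $1, $2 }')

echo "$via_v"
```

Note that splitting on a plain comma does not handle quoted CSV fields that themselves contain commas.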
Action - Program from file
As the previous examples demonstrated, it is easy to run an `awk` command with a short pattern and action inline:
But when the pattern and action get long, you might want to keep them in a separate source file:
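A minimal sketch of the separate-file approach uses the `-f` option, which tells `awk` to read its program from a file. The file names `report.awk` and `data.csv`, and their contents, are assumptions made for this example.

```shell
tmpdir=$(mktemp -d)

# Hypothetical program file: sums the second comma-separated column.
cat > "$tmpdir/report.awk" <<'EOF'
BEGIN { FS = "," }
      { sum += $2 }
END   { print "total:", sum }
EOF

# Hypothetical data file.
printf 'a,10\nb,32\n' > "$tmpdir/data.csv"

# -f reads the awk program from the given file instead of the command line.
file_out=$(awk -f "$tmpdir/report.awk" "$tmpdir/data.csv")
echo "$file_out"
rm -r "$tmpdir"
```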
Pattern - Regex Patterns
(for this section, we will use mail-list as our data source)
You can check whether a certain pattern has appeared:
You can match against one column and print a different column:
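Matching on a single column uses the `~` operator with a field reference. The sketch below tests the third field and prints the first two; the contact rows are invented and are not the contents of mail-list.

```shell
# Hypothetical input: name, phone, email.
col_out=$(printf 'Amelia 555-5553 amelia@example.com\nBill 555-1675 bill@example.org\nCamilla 555-2912 camilla@example.com\n' |
  awk '$3 ~ /example\.com$/ { print $1, $2 }')   # test column 3, print columns 1 and 2
echo "$col_out"
```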
You can combine regular expressions with logical operators (`&&`, `||`, `!`), as you would in other programming languages:
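Each operator can be tried with a one-line program. This is a sketch on made-up contact rows (name, phone, email, and a one-letter flag), not on the mail-list file itself.

```shell
# Hypothetical input shared by the three commands below.
contacts='Amelia 555-5553 amelia@example.com F
Bill 555-1675 bill@example.org A
Camilla 555-2912 camilla@example.com R'

# AND: the line matches the regex and the flag is "F".
and_out=$(printf '%s\n' "$contacts" | awk '/example\.com/ && $4 == "F" { print $1 }')

# OR: the flag is either "F" or "A".
or_out=$(printf '%s\n' "$contacts" | awk '$4 == "F" || $4 == "A" { print $1 }')

# NOT: the line does not match the regex.
not_out=$(printf '%s\n' "$contacts" | awk '!/example\.com/ { print $1 }')

echo "$and_out"; echo "$or_out"; echo "$not_out"
```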
Command Input Options
input from stdin or a pipe
You can use the `awk` command to perform actions on your standard input:
You can also pipe the standard output of another command into `awk`:
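When no file name is given, `awk` reads standard input, so it works with both redirection and pipes. A minimal sketch, with made-up input and variable names:

```shell
# Hypothetical temporary file used for the redirection case.
tmpfile=$(mktemp)
printf 'one two\nthree four\n' > "$tmpfile"

# Standard input via redirection.
stdin_out=$(awk '{ print $2 }' < "$tmpfile")

# Standard output of another command via a pipe.
pipe_out=$(echo 'root:x:0:0' | awk -F: '{ print $1 }')

echo "$stdin_out"
echo "$pipe_out"
rm "$tmpfile"
```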
input from file
For instance, suppose you have the following files (employee-1.txt, employee-2.txt):
You can then filter their content with the pattern `sales` (checking whether it appears in any line):
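Filtering several files at once might look like the sketch below. The file names follow the text, but their contents here are invented for illustration. The built-in variable `FILENAME` holds the name of the file currently being read, which helps tell the matches apart.

```shell
tmpdir=$(mktemp -d)

# Hypothetical contents for the two files named in the text.
printf 'john sales 5000\nmary it 6200\n' > "$tmpdir/employee-1.txt"
printf 'peter sales 4800\nanna hr 5100\n' > "$tmpdir/employee-2.txt"

# Pass both files as arguments; awk reads them one after the other.
multi_out=$(cd "$tmpdir" &&
  awk '/sales/ { print FILENAME ": " $0 }' employee-1.txt employee-2.txt)

echo "$multi_out"
rm -r "$tmpdir"
```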