Skip to content

Latest commit

 

History

History
266 lines (248 loc) · 6.44 KB

tr7a-cheatsheets.asciidoc

File metadata and controls

266 lines (248 loc) · 6.44 KB

Cheatsheets

Terminal Commands

Table 1. Hadoop Filesystem Commands
action command

list files

hadoop fs -ls

list files' disk usage

hadoop fs -du

total HDFS usage/available

visit namenode console

copy local → HDFS

copy HDFS → local

copy HDFS → remote HDFS

make a directory

hadoop fs -mkdir ${DIR}

move/rename

hadoop fs -mv ${FILE}

dump file to console

hadoop fs -cat ${FILE} | cut -c 10000 | head -n 10000

remove a file

remove a directory tree

remove a file, skipping Trash

empty the trash NOW

health check of HDFS

report block usage of files

decommission nodes

list running jobs

kill a job

kill a task attempt

CPU usage by process

htop, or top if that’s not installed

Disk activity

Network activity

grep -e '[regexp]'

head, tail

wc

uniq -c

sort -n -k2

| action | command | Flags | Sort data | sort | reverse the sort: -r; sort numerically: -n; sort on a field: -t [delimiter] -k [index] | Sort large amount of data | sort --parallel=4 -S 500M | use four cores and a 500 megabyte sort buffer | Cut delimited field | cut -f 1,3-7 -d ',' | emit comma-separated fields one and three through seven | Cut range of characters | cut -c 1,3-7 | emit characters one and three through seven | Split on spaces | | ruby -ne 'puts $_.split(/\\s+/).join("\t")' | split on continuous runs of whitespace, re-emit as tab-separated | Distinct fields | | sort | uniq | only dupes: -d | Quickie histogram | | sort | uniq -c | TODO: check the rendering for backslash | Per-process usage | htop | Installed | Running system usage | dstat -drnycmf -t 5 | 5-second rolling system stats. You likely will have to install dstat yourself. If that’s not an option, use iostat -x 5 & sleep 3 ; ifstat 5 for an interleaved 5-second running average.

For example: `cat *

cut -c 1-4

sort

uniq -c` cuts the first 4-character

Not all commands available on all platforms; OSX users should use Homebrew, Windows users should use Cygwin.

=== Regular Expressions ===

.Regular Expression Cheatsheet [options="header"]

=======

character

meaning

TODO

.

any character

\w

any word character: a-z, A-Z, 0-9 or _ underscore. Use [:word:] to match extended alphanumeric characters (accented characters and so forth)

\s

any whitespace, whether space, tab (\t), newline (\n) or carriage return (\r).

\d

\x42 (or any number)

the character with that hexadecimal encoding.

\b

word boundary (zero-width)

^

start of line; use \\A for start of string (disregarding newlines). (zero-width)

$

end of line; use \\z for end of string (disregarding newlines). (zero-width)

[^a-mA-M]

match character in set

[a-mA-M]

reject characters in set

a|b|c

a or b or c

(…​)

group

(?:…​)

non-capturing group

(?<varname>…​)

named group

*, +

zero or more, one or more. greedy (captures the longest possible match)

*?, +?

non-greedy zero-or-more, non-greedy one-or-more

{n,m}

repeats n or more, but m or fewer times

=======

These [regexp_examples] are for practical extraction, not validation — they may let nitpicks through that oughtn’t (eg, a time zone of -0000 is illegal by the spec, but will pass the date regexp given below). As always, modify them in your actual code to be as brittle (restrictive) as reasonable.

.Example Regular Expressions [options="header"]

=======

intent

Regular Expression

Comment

Double-quoted string

`%r{"((?:\\.

[^\"])*)"}`

all backslash-escaped character, or non-quotes, up to first quote

Decimal number with sign

%r{(\.\d)}

optional sign; digits-dot-digits

Floating-point number

%r{([+\-]?\d+\.\d+(?:[eE][+\-]?\d+)?)}

optional sign; digits-dot-digits; optional exponent

ISO date

`%r{\b(\d\d\d\d)-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)([\+\-]\d\d:?\d\d

[\+\-]\d\d

Z)\b}`

groups give year, month, day, hour, minute, second and time zone respectively.

=======

Ascii table:

"\x00" \c "\x01" \c "\x02" \c "\x03" \c "\x04" \c "\x05" \c "\x06" \c "\a" \c "\b" \c "\t" \c \s "\n" \c \s "\v" \c \s "\f" \c \s "\r" \c \s "\x0E" \c "\x0F" \c "\x10" \c "\x11" \c "\x12" \c "\x13" \c "\x14" \c "\x15" \c "\x16" \c "\x17" \c "\x18" \c "\x19" \c "\x1A" \c "\e" \c "\x1C" \c "\x1D" \c "\x1E" \c "\x1F" \c " " \s "!" "\"" "#" "$" "%" "&" "'" "(" ")" "*" "+" "," "-" "." "/" "0" \w "1" \w "2" \w "3" \w "4" \w "5" \w "6" \w "7" \w "8" \w "9" \w ":" ";" "<" "=" ">" "?" "@" "A" \w "B" \w "C" \w "D" \w "E" \w "F" \w "G" \w "H" \w "I" \w "J" \w "K" \w "L" \w "M" \w "N" \w "O" \w "P" \w "Q" \w "R" \w "S" \w "T" \w "U" \w "V" \w "W" \w "X" \w "Y" \w "Z" \w "[" "\\" "]" "^" "_" \w "`" "a" \w "b" \w "c" \w "d" \w "e" \w "f" \w "g" \w "h" \w "i" \w "j" \w "k" \w "l" \w "m" \w "n" \w "o" \w "p" \w "q" \w "r" \w "s" \w "t" \w "u" \w "v" \w "w" \w "x" \w "y" \w "z" \w "{" "

" "}" "~" "\x7F" \c "\x80" \c

=== Pig Operators ===

.Pig Operator Cheatsheet [options="header"]

=======

action

operator

JOIN

FILTER