action | command |
---|---|
list files |
|
list files' disk usage |
|
total HDFS usage/available |
visit namenode console |
copy local → HDFS |
|
copy HDFS → local |
|
copy HDFS → remote HDFS |
|
make a directory |
|
move/rename |
|
dump file to console |
|
remove a file |
|
remove a directory tree |
|
remove a file, skipping Trash |
|
empty the trash NOW |
|
health check of HDFS |
|
report block usage of files |
|
decommission nodes |
|
list running jobs |
|
kill a job |
|
kill a task attempt |
|
CPU usage by process |
|
Disk activity |
|
Network activity |
|
|
|
|
|
|
|
|
|
|
|
| action | command | Flags
| Sort data | sort
| reverse the sort: -r
; sort numerically: -n
; sort on a field: -t [delimiter] -k [index]
| Sort large amount of data | sort --parallel=4 -S 500M
| use four cores and a 500 megabyte sort buffer
| Cut delimited field | cut -f 1,3-7 -d ','
| emit comma-separated fields one and three through seven
| Cut range of characters | cut -c 1,3-7
| emit characters one and three through seven
| Split on spaces | | ruby -ne 'puts $_.split(/\\s+/).join("\t")'
| split on continuous runs of whitespace, re-emit as tab-separated
| Distinct fields | | sort | uniq
| only dupes: -d
| Quickie histogram | | sort | uniq -c
| TODO: check the rendering for backslash
| Per-process usage | htop
| Installed
| Running system usage | dstat -drnycmf -t 5
| 5-second rolling system stats. You likely will have to install dstat yourself. If that’s not an option, use iostat -x 5 & sleep 3 ; ifstat 5
for an interleaved 5-second running average.
For example: `cat * |
cut -c 1-4 |
sort |
uniq -c` cuts the first 4-character Not all commands available on all platforms; OSX users should use Homebrew, Windows users should use Cygwin. === Regular Expressions === |
======= |
character |
meaning |
|
TODO |
|||
|
any character |
|
|
any word character: a-z, A-Z, 0-9 or |
|
any whitespace, whether space, tab ( |
|
|
the character with that hexadecimal encoding. |
|
|
word boundary (zero-width) |
|
start of line; use |
|
end of line; use |
|
match character in set |
|
reject characters in set |
|
a or b or c |
|
group |
|
non-capturing group |
|
named group |
|
||
zero or more, one or more. greedy (captures the longest possible match) |
|
non-greedy zero-or-more, non-greedy one-or-more |
|
repeats |
======= These [regexp_examples] are for practical extraction, not validation — they may let nitpicks through that oughtn’t (eg, a time zone of |
||
======= |
intent |
Regular Expression |
Comment |
Double-quoted string |
`%r{"((?:\\. |
[^\"])*)"}` |
all backslash-escaped character, or non-quotes, up to first quote |
Decimal number with sign |
|
optional sign; digits-dot-digits |
Floating-point number |
|
optional sign; digits-dot-digits; optional exponent |
ISO date |
`%r{\b(\d\d\d\d)-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)([\+\-]\d\d:?\d\d |
[\+\-]\d\d |
Z)\b}` |
groups give year, month, day, hour, minute, second and time zone respectively. |
======= Ascii table: "\x00" \c "\x01" \c "\x02" \c "\x03" \c "\x04" \c "\x05" \c "\x06" \c "\a" \c "\b" \c "\t" \c \s "\n" \c \s "\v" \c \s "\f" \c \s "\r" \c \s "\x0E" \c "\x0F" \c "\x10" \c "\x11" \c "\x12" \c "\x13" \c "\x14" \c "\x15" \c "\x16" \c "\x17" \c "\x18" \c "\x19" \c "\x1A" \c "\e" \c "\x1C" \c "\x1D" \c "\x1E" \c "\x1F" \c " " \s "!" "\"" "#" "$" "%" "&" "'" "(" ")" "*" "+" "," "-" "." "/" "0" \w "1" \w "2" \w "3" \w "4" \w "5" \w "6" \w "7" \w "8" \w "9" \w ":" ";" "<" "=" ">" "?" "@" "A" \w "B" \w "C" \w "D" \w "E" \w "F" \w "G" \w "H" \w "I" \w "J" \w "K" \w "L" \w "M" \w "N" \w "O" \w "P" \w "Q" \w "R" \w "S" \w "T" \w "U" \w "V" \w "W" \w "X" \w "Y" \w "Z" \w "[" "\\" "]" "^" "_" \w "`" "a" \w "b" \w "c" \w "d" \w "e" \w "f" \w "g" \w "h" \w "i" \w "j" \w "k" \w "l" \w "m" \w "n" \w "o" \w "p" \w "q" \w "r" \w "s" \w "t" \w "u" \w "v" \w "w" \w "x" \w "y" \w "z" \w "{" " |
" "}" "~" "\x7F" \c "\x80" \c === Pig Operators === |
======= |
action |
operator |
JOIN |
|||
FILTER |