Terminal Commands

.Hadoop Filesystem Commands
[options="header"]
|===
| action | command
| list files | `hadoop fs -ls`
| list files' disk usage | `hadoop fs -du`
| total HDFS usage/available |
| visit namenode console |
| copy local → HDFS |
| copy HDFS → local |
| copy HDFS → remote HDFS |
| make a directory | `hadoop fs -mkdir ${DIR}`
| move or rename a file | `hadoop fs -mv ${FILE} ${DEST}`
| dump file to console | `hadoop fs -cat ${FILE} \| cut -c 1-10000 \| head -n 10000`
| remove a file |
| remove a directory tree |
| remove a file, skipping Trash |
| empty the trash NOW |
| health check of HDFS |
| report block usage of files |
| decommission nodes |
| list running jobs |
| kill a job |
| kill a task attempt |
| CPU usage by process | `htop`, or `top` if that’s not installed
| Disk activity |
| Network activity |
| | `grep -e '[regexp]'`
| | `head`, `tail`
| | `uniq -c`
| | `sort -n -k2`
|===

[options="header"]
|===
| action | command | Flags
| Sort data | `sort` | reverse the sort: `-r`; sort numerically: `-n`; sort on a field: `-t [delimiter] -k [index]`
| Sort large amount of data | `sort --parallel=4 -S 500M` | use four cores and a 500 megabyte sort buffer
| Cut delimited field | `cut -f 1,3-7 -d ','` | emit comma-separated fields one and three through seven
| Cut range of characters | `cut -c 1,3-7` | emit characters one and three through seven
| Split on spaces | `\| ruby -ne 'puts $_.split(/\s+/).join("\t")'` | split on continuous runs of whitespace, re-emit as tab-separated
| Distinct fields | `\| sort \| uniq` | only dupes: `-d`
| Quickie histogram | `\| sort \| uniq -c` | TODO: check the rendering for backslash
| Per-process usage | `htop` | Installed
| Running system usage | `dstat -drnycmf -t 5` | 5-second rolling system stats. You likely will have to install dstat yourself. If that’s not an option, use `iostat -x 5 & sleep 3 ; ifstat 5` for an interleaved 5-second running average.
|===

For example: `cat * | cut -c 1-4 | sort | uniq -c` cuts the first four characters from each line and counts how many times each four-character prefix occurs.

Not all commands are available on all platforms; OSX users should use Homebrew, and Windows users should use Cygwin.
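The quickie-histogram pipeline can be tried without any input files. The following is a minimal sketch: the function names `sample_lines` and `prefix_histogram` and the four sample lines are hypothetical, made up purely for illustration.

```shell
# Hypothetical sample input: four short lines that start with a year.
sample_lines() {
  printf '%s\n' '2013-01-05 start' '2014-02-11 stop' '2013-07-19 start' '2013-11-30 stop'
}

# Keep characters 1-4 of each line, sort so identical prefixes are adjacent,
# then count each distinct prefix.
prefix_histogram() {
  cut -c 1-4 | sort | uniq -c
}

sample_lines | prefix_histogram
```

This should report a count of 3 for `2013` and 1 for `2014`; the amount of padding before each count differs between `uniq` implementations. The `sort` step matters because `uniq -c` only merges adjacent identical lines.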

=== Regular Expressions ===

.Regular Expression Cheatsheet
[options="header"]
|===
| pattern | meaning
| `.` | any character
| `\w` | any word character: a-z, A-Z, 0-9 or _ underscore. Use [:word:] to match extended alphanumeric characters (accented characters and so forth)
| `\s` | any whitespace, whether space, tab (\t), newline (\n) or carriage return (\r).
| `\x42` (or any number) | the character with that hexadecimal encoding.
| `\b` | word boundary (zero-width)
| `^` | start of line; use `\A` for start of string (disregarding newlines). (zero-width)
| `$` | end of line; use `\z` for end of string (disregarding newlines). (zero-width)
| `[...]` | match character in set
| `[^...]` | reject characters in set
| `a\|b\|c` | a or b or c
| `(?:...)` | non-capturing group
| `(?<name>...)` | named group
| `*`, `+` | zero or more, one or more. greedy (captures the longest possible match)
| `*?`, `+?` | non-greedy zero-or-more, non-greedy one-or-more
| `{n,m}` | repeats n or more, but m or fewer times
|===
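A few of these constructs can be tried from the terminal with `grep -oE`, which prints each match on its own line. This is a sketch under stated assumptions: the sample strings are hypothetical, and `-o` is a widely available extension (GNU and BSD grep) rather than something the cheat sheet itself mentions. POSIX extended regular expressions have no non-greedy `*?`, so the last command uses a negated set, a common substitute.

```shell
# Hypothetical sample inputs (not from the original document).
line='order 7 shipped 2013 items, id=abc_42'
tags='<a><b>'

# Character set with {n,m}: runs of two to four digits ("7" is too short to match).
printf '%s\n' "$line" | grep -oE '[0-9]{2,4}'

# Alternation: either literal word.
printf '%s\n' "$line" | grep -oE 'order|shipped'

# Greedy *: a single match that spans both tags.
printf '%s\n' "$tags" | grep -oE '<.*>'

# Negated set: stops at the first ">", so each tag matches separately.
printf '%s\n' "$tags" | grep -oE '<[^>]*>'
```

The greedy pattern returns `<a><b>` as one match, while the negated-set pattern returns `<a>` and `<b>` separately.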


These example regular expressions are for practical extraction, not validation: they may let through input that a strict validator would reject (e.g., a time zone of -0000 is illegal by the spec, but will pass the date regexp given below). As always, modify them in your actual code to be as brittle (restrictive) as is reasonable.

.Example Regular Expressions
[options="header"]
|===
| | Regular Expression |
| Double-quoted string | | any backslash-escaped character, or non-quote characters, up to the first unescaped quote
| Decimal number with sign | | optional sign; digits-dot-digits
| Floating-point number | | optional sign; digits-dot-digits; optional exponent
| ISO date | | groups give year, month, day, hour, minute, second and time zone respectively.
|===
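The four descriptions above can be sketched as POSIX extended regular expressions and tried with `grep -oE` and `sed -E`. These patterns are illustrative assumptions written to match the descriptions, not the table's original entries, and the variable names and sample strings are hypothetical.

```shell
# Double-quoted string: opening quote, then backslash-escaped characters
# or non-quote characters, up to the closing quote.
quoted_re='"([^"\\]|\\.)*"'
# Decimal number with optional sign: digits-dot-digits.
decimal_re='[+-]?[0-9]+\.[0-9]+'
# Floating-point number: the decimal form plus an optional exponent.
float_re='[+-]?[0-9]+\.[0-9]+([eE][+-]?[0-9]+)?'
# ISO date: seven groups for year, month, day, hour, minute, second, time zone.
iso_re='([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})(Z|[+-][0-9]{2}:?[0-9]{2})'

printf '%s\n' 'say "he said \"hi\"" now' | grep -oE "$quoted_re"
printf '%s\n' 'delta=-12.50 units'       | grep -oE "$decimal_re"
printf '%s\n' 'mass=+6.02e23 approx'     | grep -oE "$float_re"

# Print the seven captured groups separated by spaces.
printf '%s\n' '2013-04-01T12:34:56-0000' | sed -E "s/^${iso_re}\$/\1 \2 \3 \4 \5 \6 \7/"
```

The last command prints `2013 04 01 12 34 56 -0000`: the `-0000` time zone is accepted, which is the extraction-not-validation trade-off described above.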


ASCII table (each character is followed by the classes it matches, where `\c` marks control characters):

"\x00" \c "\x01" \c "\x02" \c "\x03" \c "\x04" \c "\x05" \c "\x06" \c "\a" \c "\b" \c "\t" \c \s "\n" \c \s "\v" \c \s "\f" \c \s "\r" \c \s "\x0E" \c "\x0F" \c
"\x10" \c "\x11" \c "\x12" \c "\x13" \c "\x14" \c "\x15" \c "\x16" \c "\x17" \c "\x18" \c "\x19" \c "\x1A" \c "\e" \c "\x1C" \c "\x1D" \c "\x1E" \c "\x1F" \c
" " \s "!" "\"" "#" "$" "%" "&" "'" "(" ")" "*" "+" "," "-" "." "/"
"0" \w "1" \w "2" \w "3" \w "4" \w "5" \w "6" \w "7" \w "8" \w "9" \w ":" ";" "<" "=" ">" "?"
"@" "A" \w "B" \w "C" \w "D" \w "E" \w "F" \w "G" \w "H" \w "I" \w "J" \w "K" \w "L" \w "M" \w "N" \w "O" \w
"P" \w "Q" \w "R" \w "S" \w "T" \w "U" \w "V" \w "W" \w "X" \w "Y" \w "Z" \w "[" "\\" "]" "^" "_" \w
"`" "a" \w "b" \w "c" \w "d" \w "e" \w "f" \w "g" \w "h" \w "i" \w "j" \w "k" \w "l" \w "m" \w "n" \w "o" \w
"p" \w "q" \w "r" \w "s" \w "t" \w "u" \w "v" \w "w" \w "x" \w "y" \w "z" \w "{" "|" "}" "~" "\x7F" \c
"\x80" \c

=== Pig Operators ===

.Pig Operator Cheatsheet [options="header"]


