Skip to content

Commit

Permalink
Fixed maven build.
Browse files Browse the repository at this point in the history
Fixed unit tests.
Update network fault methods to add an argument specifying the network interface.
  • Loading branch information
julienlau committed May 22, 2019
1 parent d0c5c48 commit 87442e6
Show file tree
Hide file tree
Showing 19 changed files with 387 additions and 342 deletions.
218 changes: 141 additions & 77 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,22 +3,34 @@ AnarchyApe

Fault injection tool for hadoop cluster from yahoo anarchyape

Pre-requisites
-----------

- Java JDK >= 1.7
- pdsh
- stress-ng for CPU and Memory hog

Compilation
-----------
[Java]
Required version: 1.7.0.21 or later
[Java with maven]
```
mvn package
mv target/anarchyape.jar ape.jar
```

[Java manual]
```
cd src/main/java
download log4j.ar and commons-cli.jar
# download log4j-1.4.12.jar and apache commons-cli-1.2.jar
java -cp . ape/Main.java
rm ape.jar
# Use either (preferred):
jar cfm ape.jar META-INF/MANIFEST.MF ape META-INF/services org
(old way)
# either :
javac -cp .:log4j-1.4.12.jar:commons-cli-1.2.jar ape/*.java
```

Expand All @@ -40,7 +52,7 @@ Running
```
java -jar ape.jar [commands]
log file: /var/log/ape.log
log file: anarchyape.log
(old way)
java -cp .:log4j-1.4.12.jar ape/Main
Expand All @@ -58,8 +70,11 @@ Install pdsh:
yum install pdsh
apt-get install pdsh
cp ape /usr/local/bin/ape
chmod go-rwx /usr/local/bin/ape
java -jar ape.jar -R node1,node2,node3 -S 100 5
creates a script to run on the remote hosts:
# creates a script to run on the remote hosts:
pdsh -Rssh -w node1,node2,node3 '/usr/local/bin/ape -L -S 100 5'
```

Expand Down Expand Up @@ -91,82 +106,131 @@ Available Commands
------------------
Here are some common failures in Hadoop environments:
```
• Data node is killed
• Application Master (AM) is killed
• Application Master is suspended
• Node Manager (NM) is killed
• Node Manager is suspended
• Data node is suspended
• Tasktracker is suspended
• Node panics and restarts
• Node hangs and does not restart
• Random thread within data node is killed
• Random thread within data node is suspended
• Random thread within tasktracker is killed
• Random thread within tasktracker is suspended
• Network becomes slow
• Network is dropping significant numbers of packets
• Network disconnect (simulate cable pull)
• One disk gets VERY slow
• CPU hog consumes x% of CPU cycles
• Mem hog consumes x% of memory
• Corrupt ext3 data block on disk
• Corrupt ext3 metadata block on disk
# Data node is killed
# Application Master (AM) is killed
# Application Master is suspended
# Node Manager (NM) is killed
# Node Manager is suspended
# Data node is suspended
# Tasktracker is suspended
# Node panics and restarts
# Node hangs and does not restart
# Random thread within data node is killed
# Random thread within data node is suspended
# Random thread within tasktracker is killed
# Random thread within tasktracker is suspended
# Network becomes slow
tc qdisc add dev eth0 root netem delay 100.0ms && sleep 30.0 && tc qdisc del dev eth0 root netem
# Network is dropping significant numbers of packets
# Network disconnect (simulate cable pull)
iptables -A INPUT -p tcp --dport 8080 -j DROP # block port
iptables -D INPUT -p tcp --dport 8080 -j DROP # unblock port
# One disk gets VERY slow
# CPU hog consumes x% of CPU cycles, for example by running stress-ng command remotely with :
stress-ng -c 1 --verify -t 1m -v
# Mem hog consumes x% of memory, for example by running stress-ng command remotely with :
stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t 1m -v
# Corrupt blocks of a given file on rw-mounted disk
```
Command line options:
```
usage: ape [options] ... <failure command>
options:
-c,--corrupt-file <file> <size> <offset> Corrupt the file given
the address as the first
argument, size as the 2nd
arg, and offset as the
3rd argument
-C,--corrupt-block <meta/ord> <size> <offset> Corrupt a random HDFS
block file with a size in
bytes as the 2nd arg and
offset in bytes as the
3rd argument
-d,--network-disconnect <time in seconds> Disconnect the network (eth0 only)
for a certain period of
time in seconds specified in the
argument, and then resumes
-e,--continue-node <NodeType> Continues a tasktracker
or a datanode at the
given hostname that has
already been suspended
-F,--forkbomb Hangs a host by executing
a fork bomb
-h,--help Displays this help menu
-k,--kill-node <nodetype> Kills a datanode,
tasktracker, jobtracker,
or namenode.
-L,--local Run commands locally
-P,--panic Forces a kernel panic and does not restart the system.
-p,--network-drop <percentage> <duration> Drops a specified
percentage of ALL inbound
network packets for a
duration specified in seconds.
-r,--remount Remounts all filesystems as read-only
-R,--remote <HostnameList> Run commands remotely
-s,--suspend-node <NodeType> Suspends a tasktracker or
a datanode at the given
hostname
-S,--network-slow <delay> <duration> Delay ALL network packet
delivery by a specified
amount of time (in
milliseconds) for a
period specified in
seconds
-t,--touch Touches a file called
/tmp/foo.tst
-u,--udp-flood <hostname> <port> <duration> Flood the target hostname with a DoS attack.
For proper effect, use the -R
flag and designate more
than one host.
-v,--verbose Turn on verbose mode
-V,--version Displays the version number
-u,--udp-flood <hostname> <port> <duration> Flood the target
hostname with a DoS attack. For proper effect, use the -R flag and
designate more than one host.
-C,--corrupt-block <meta/ord> <size> <offset> Corrupt a random HDFS
block file with a size in bytes as the 2nd arg and offset in bytes as the
3rd argument
-d,--network-disconnect <time> <nic> Disconnect the network
for a certain period of time specified in the argument on a given network
interface, and then resumes
-p,--network-drop <percentage> <duration> <nic> Drops a specified
percentage of all inbound network packets for a duration specified in
seconds.
-c,--corrupt-file <file> <size> <offset> Corrupt the file given
the address as the first argument, size as the 2nd arg, and offset as the
3rd argument
-e,--continue-node <NodeType> Continues a tasktracker
or a datanode at the given hostname that has already been suspended
-k,--kill-node <nodetype> Kills a datanode,
tasktracker, jobtracker, or namenode.
-s,--suspend-node <NodeType> Suspends a tasktracker
or a datanode at the given hostname
-r,--remount Remounts all
filesystems as read-only
-S,--network-slow <delay> <duration> <nic> Delay all network
packet delivery by a specified amount of time (in milliseconds) for a
period specified in seconds on a given network interface
-F,--forkbomb Hangs a host by
executing a fork bomb
-L,--local Run commands locally
-P,--panic Forces a kernel panic
and does not restart the system.
-R,--remote <HostnameList> Run commands remotely
-V,--version Displays the version
number
-h,--help Displays this help menu
-t,--touch Touches a file called
/tmp/foo.tst
-v,--verbose Turn on verbose mode
command:
-fb <lambda> -k <lambda> fork bomb
-kp <lambda> -k <lambda> kernel panic
-r <lambda> -k <lambda> remount root as read only
-kn <lambda> -k <lambda> kill a node process
-dos <lambda> -k <lambda> denial of service by launching 4 bombarding threads
-cb <lambda> -k <lambda> corrupt a random HDFS block
-cf <lambda> -k <lambda> corrupt a file at the given address
-nic <lambda> -k <lambda> interface
-p <lambda> -k <lambda> packet drop
```

Example of usage local mode
------------------
```
#java -jar ape.jar -h
#java -jar ape.jar -L -p,--network-drop <percentage> <duration> <nic>
java -jar ape.jar -L -p 80 30 eth0
#java -jar ape.jar -R -S <delay in millisec> <duration in sec> <nic>
#java -jar ape.jar -L -S 100 30 eth0
#java -jar ape.jar -L -d,--network-disconnect <time in seconds> <nic>
#java -jar ape.jar -L -d 30 eth0
#java -jar ape.jar -L --remount
#java -jar ape.jar -L --panic
#java -jar ape.jar -L --forkbomb
#java -jar ape.jar -L -c,--corrupt-file <file> <size> <offset>
#java -jar ape.jar -L --corrupt-file <file> <size> <offset>
#java -jar ape.jar -L -C,--corrupt-block <meta/ord> <size> <offset>
#java -jar ape.jar -R precise386 -S 100 30
#java -jar ape.jar -R cluser-ip-list.xml -S 100 5
#java -jar ape.jar -remote cluster-ip-list.xml -fb lambda -k lambda
#java -jar ape.jar -remote cluster-ip-list.xml -F lambda
#java -cp .:log4j.jar ape/Main -h
#java -cp .:log4j.jar ape/Main -R slaves
#java -cp .:log4j.jar ape/Main -V
```

Example of usage remote mode
------------------
```
#java -jar ape.jar -h
#java -jar ape.jar -R -S <delay in millisec> <duration in sec> <nic>
java -jar ape.jar -R precise386 -S 100 30 eth0
#java -jar ape.jar -R cluser-ip-list.xml -S 100 5 eth0
#java -jar ape.jar -remote cluster-ip-list.xml -fb lambda -k lambda
#java -jar ape.jar -remote cluster-ip-list.xml -F lambda
#java -jar ape.jar -L -v
#java -jar ape.jar -R -F
#java -jar ape.jar -R -S 1 1 eth0
#java -cp .:log4j.jar ape/Main -h
#java -cp .:log4j.jar ape/Main -R slaves
#java -cp .:log4j.jar ape/Main -remote slaves -v
#java -cp .:log4j.jar ape/Main -V
```
11 changes: 11 additions & 0 deletions bin/ape
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
#!/bin/bash

export apedir=/opt/anarchyape
export thejar=anarchyape.jar
if [[ ! -e ${apedir}/${thejar} ]]; then
echo "ERROR ! jar not found"
exit 99
fi

echo "java -jar ${apedir}/${thejar} $*"
java -jar ${apedir}/${thejar} $*
9 changes: 0 additions & 9 deletions management

This file was deleted.

15 changes: 0 additions & 15 deletions network

This file was deleted.

Loading

0 comments on commit 87442e6

Please sign in to comment.