-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
155 lines (109 loc) · 5.45 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
OMBUD
=====
Ombud (Swedish for proxy) is a simple network proxy which takes commands
from connecting clients. The commands have the form "ADDRESS:PORT\r\n",
when Ombud recieves a command it connects to the given address and port,
reads whatever is sent, caches it on disk, and relays it back to the
client.
BUILD AND REQUIREMENTS
----------------------
You need to run Linux 3.9 or later, this is because SO_REUSEPORT was
introduced in that release.
In order to build, first install build dependencies
sudo apt-get install build-essential libssl-dev
then build by executing
make
RUN
---
Execute Ombud with
make run
This will start Ombud on the default port 8090. The cache is created in
$PWD/cache-ombud.
NB! The cache is removed from source directory during invokations of
make clean.
It is possible to give port number and number of processes as
*positional* command line arguments
bin/ombud 8077 1 # port 8077, one (1) process
Quit by sending SIGINT, i.e. pressing Ctrl-C.
USAGE
-----
Connect with a network client to Ombud's port and send commands
$ nc localhost 8090
localhost:22
SSH-2.0-OpenSSH_7.1p2 Debian-2
It is possible to give both alpha numerical and numerical hosts and
ports.
DESIGN
------
The main function is in the event loop, which handles client connections
and executes the client commands (connecting to services) and
caches/sends back the response. The event handling is based on epoll(7),
which is Linux specific, and non-blocking sockets.
The event loops are forked into their own processes, one process per CPU
core, and the listen sockets are setup with SO_REUSEPORT. This will
enable all processes to listen on the same port, and the kernel handles
distribution of incoming requests. (It is stated that SO_REUSEPORT will
distribute incoming requests uniformly among the processes, which was
stated to not be the case with threaded acceptors. See Linux kernel
commit c617f39 for more information.)
NB! Using SO_REUSEPORT is a little bit experimentation. If it
doesn't work try running one process only.
When clients request data from address:port a cache lookup is performed.
On a cache hit the contents are sent to the client with sendfile(2),
which shuffles data from a file descriptor to a socket without leaving
kernel space. On a cache miss, Ombud connects to the given address and
port, reads the data, writes it to the cache, and finally relays it back
to the client.
The client connections are toggled between the states READ_CMD, which
reads commands from the client, and READ_REMOTE, which executes the
supplied client command (i.e. fetches data from the remote service or
the cache and relays back to the client).
ASSUMPTIONS
-----------
* Only IPv4, this application knows little to nothing about IPv6.
* Well formed client commands and connectable services are assumed,
malformed requests and unconnectable hosts are silently dropped.
* Not a lot of data will be sent from the services that Ombud connects
to, a static buffer size of 8 kB is deemed to be adequate for the use
case. (This was a simplification instead of having to create a chunked
buffer, which would have been used in a real world application.)
* Data downloaded from hosts is assumed to never change, hence the
content in the cache never expires.
* There is no portability, Linux (3.9+) only. (I wanted to experiment
with SO_REUSEPORT!)
POSSIBLE IMPROVEMENTS
---------------------
* Use libevent instead, for portability and also a nice framework for
event driven programming. Given time I would have tried libuv from
Joyent, it looks interesting.
* Evaluate if using thread pools for incoming connections, workers etc
yields higher performance. Possibly trying a combination of processes,
threads, and multiplexing.
* In order to make a more efficient cache, hostnames could be resolved
to ip addresses before they are used as part of the key.
* Linking against OpenSSL. If the code is going to be released under the
GPL, an OpenSSL linking exception is required in the license in order
to be specific about the user's rights and responsibilities. It would
of course be possible to use libgcrypt in this case instead,
especially since SHA1() is only used because it is simple.
* Make cache directory and listen port configurable from the command
line with getopt and possibly through a configuration file.
* Since this is a simple server it might be enough to store the cache in
memory. Alternatively, store it in a ramdisk.
* More robust error handling, e.g. if cache storage breaks between
succesful lookup and send.
* Of course the standard things: more documentation, configuration file,
command line options (with getopt_long or some such) instead of fixed
argument placing on the command line etc.
* Toggable DEBUG prints.
* Input data validation, little to none is done now. Dangerous!
* Possibly it clutters down the code casting between uint8_t and char
all the time... The interfaces are clean though, the clutter comes
from standard functions that use (char *).
* Tests. Unit tests with e.g. Google Test, functional tests etc. TDD is
really nice. This time around though everything was built in a quick
prototype fashion. (Maybe one could argue that the valgrind make
target is a basic test for some good and bad input data.)
* Analyze Valgrind report and fix, or suppress, errors. Valgrind helped
during development, it showed some errors that was fixed.
* Proper cleanup of sockets and allocated memory upon catching SIGINT.