Commit b223aea

Merge pull request #7 from certtools/iep-007
(IEP 007) Running bots as library
2 parents 4266c5a + 5bd65ab commit b223aea

2 files changed (+227, -4 lines)

007/README.md (new file, +222 lines)
# IEP007 - Running IntelMQ bots as Python Library

A working example call (Proof of Concept) is located here:
https://github.com/wagner-intevation/intelmq/blob/bot-library/intelmq/tests/lib/test_bot.py#L141

## Background
As of IntelMQ 3.1.0, IntelMQ bots can only be started on the command line with `intelmqctl`.
Most tools (including the IntelMQ API, and thus the IntelMQ Manager) use `intelmqctl start` to start bot instances.
`intelmqctl start` spawns a new child process and detaches it.

Only `intelmqctl run` can run a bot interactively in the foreground of the command line; it also provides some handy debugging features.

Starting IntelMQ bots from Python code requires a lot of effort (and code complexity). Additionally, the bot's parameters can only be provided by modifying the IntelMQ runtime configuration file.
Messages can only be fed to and retrieved from the bot by separately connecting to the pipeline and writing/reading properly serialized messages there.

Integrating IntelMQ bots into other (Python) tools is therefore difficult, if not impossible, with the current IntelMQ 3.1 version.

In a nutshell, calling a bot and processing a message should take a few lines of code at most.
The following complete example shows what this could look like.
The bot class is instantiated with a few parameters, and then a single message is processed.
```python
from intelmq.bots.experts.domain_suffix.expert import DomainSuffixExpertBot
from intelmq.lib.bot import BotLibSettings

domain_suffix = DomainSuffixExpertBot('domain-suffix',  # bot id
                                      # the {} | {} syntax is available in Python >= 3.9
                                      settings=BotLibSettings | {
                                          'field': 'fqdn',
                                          'suffix_file': '/usr/share/publicsuffix/public_suffix_list.dat'})
queues = domain_suffix.process_message({'source.fqdn': 'www.example.com'})
# Select the output queue (as defined in `destination_queues`), the first message, and the field 'source.domain_suffix':
# >>> queues['output'][0]['source.domain_suffix']
# 'com'
```

### Use cases

#### General
Any IntelMQ-related or third-party program could then use IntelMQ's most powerful components: its bots.

The full potential shows when chaining multiple bots and iterating over lots of data:

```python
# instantiate all bots first, for an example see above
domain_suffix = DomainSuffixExpertBot(...)
url2fqdn = Url2fqdnExpertBot(...)
http_status = HttpstatusExpertBot(...)
tuency = TuencyExpertBot(...)
lookyloo = LookylooExpertBot(...)

# a list of input messages
messages = [{...}]
output_messages = []

for message in messages:
    for bot in (domain_suffix,
                url2fqdn,
                http_status,
                tuency,
                lookyloo):
        # for simplicity, we assume that each bot always sends exactly one message to 'output'
        message = bot.process_message(message)['output'][0]
    # message now holds the accumulated data of all five bots
    output_messages.append(message)

# output_messages is now a list of the processed messages
```
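
The loop above assumes that every bot sends exactly one message to the `output` queue. A minimal sketch of a more general chaining helper follows; it assumes only that each bot exposes `process_message(message)` returning a dictionary of queue names to message lists, as proposed in this document. `run_chain` and `StubBot` are hypothetical names, and the toy stub bots stand in for real IntelMQ bots so the example runs without an IntelMQ installation.

```python
from typing import Callable, Dict, Iterable, List

Message = dict
QueueDict = Dict[str, List[Message]]


def run_chain(bots: Iterable, messages: Iterable[Message],
              queue: str = "output") -> List[Message]:
    """Feed messages through a chain of bots, following every message a bot sends.

    Each bot only needs a process_message(message) method returning
    {queue name: [messages]}, as proposed in this IEP. Messages sent to
    other queues (e.g. an error queue) are ignored in this sketch.
    """
    current = list(messages)
    for bot in bots:
        next_batch: List[Message] = []
        for message in current:
            # A bot may send zero, one or several messages to the selected queue.
            next_batch.extend(bot.process_message(message).get(queue, []))
        current = next_batch
    return current


class StubBot:
    """Hypothetical stand-in for an IntelMQ bot, so the sketch runs without IntelMQ."""

    def __init__(self, func: Callable[[Message], List[Message]]):
        self.func = func

    def process_message(self, message: Message) -> QueueDict:
        return {"output": self.func(message)}


# Hypothetical toy bots: drop messages without an FQDN, add a (naive) suffix field,
# and split every message into two.
drop_empty = StubBot(lambda m: [m] if m.get("source.fqdn") else [])
add_suffix = StubBot(lambda m: [{**m, "source.domain_suffix": m["source.fqdn"].rsplit(".", 1)[-1]}])
duplicate = StubBot(lambda m: [m, {**m, "extra.copy": True}])

result = run_chain([drop_empty, add_suffix, duplicate],
                   [{"source.fqdn": "www.example.com"}, {"source.fqdn": ""}])
print(len(result))  # prints 2: the empty message is dropped, the other one is split
```

Processing bot by bot (rather than message by message) keeps the helper short; a real integration would probably also want to inspect the error queue instead of ignoring it.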

#### IntelMQ Webinput Preview

The IntelMQ webinput could then show previews of the *processed* data to the operator, not just of the input data, adding much more value to the preview functionality.
Currently, the preview only gives the operator feedback on the parsing step. The further processing of the data by the bots is invisible to the operator.
This causes confusion and uncertainty for operators.

The webinput backend can call the bots and process the events without interfering with the running bot processes, pipelines and bot management.
The data flow:
```
Data provided by operator -> webinput backend parser -> IntelMQ bots as configured in the webinput configuration -> preview shown to operator
```
The implementation details for the webinput are not part of this proposal.

As a next step, the webinput could also show previews of notifications (e.g. emails). This is also not part of this proposal.
```
Data provided by operator -> webinput backend parser -> IntelMQ bots as configured in the webinput configuration -> notification tool (preview mode) -> notification preview shown to operator
```

## Requirements

### Messages and Pipeline
It should be possible to provide input messages as function parameters and to receive the output messages as return values.
In this case, messages should not be serialized or encoded; they should stay Message objects (derived from `dict` and behaving like dictionaries).

It should also be possible to let the bot use the configured pipeline (e.g. Redis) and behave like a normal bot.

### Exceptions and dumped messages
An exception in the bot's `process()` method should not be caught by intermediate layers, but should propagate to the caller.

Option: If there is a helper function that calls `process()` multiple times (for a batch of input messages), the exceptions are collected together with the (dumped) messages and made accessible to the caller.

### Parameters and Configuration
The global IntelMQ configuration should be effective.
The user may override configuration options by providing a configuration dictionary.

#### Pre-configured bots
It should be possible to run bots defined in IntelMQ's runtime configuration file. Additional parameters that override the configured ones can be provided.

#### Unconfigured bots
It should be possible to run bots that are not defined in IntelMQ's runtime configuration file. The bot configuration is provided as a function parameter.

### Signals
Normally, bots handle signals such as SIGTERM, SIGHUP and SIGINT specially.
In library mode, this would interfere with the signal handling of the calling code.
Thus, IntelMQ bots called as a library must not manipulate the signal handling or call `sys.exit`.
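
A minimal sketch of how a bot base class could honor this requirement, assuming a hypothetical `library_mode` flag (not an existing IntelMQ parameter): signal handlers are only installed when the bot runs standalone, and stopping in library mode raises an exception instead of calling `sys.exit`.

```python
import signal
import sys


class SignalSketchBot:
    """Hypothetical base class; `library_mode` is an assumption of this sketch."""

    def __init__(self, library_mode: bool = False):
        self.library_mode = library_mode
        if not library_mode:
            # Standalone: the bot owns the process and may install handlers.
            signal.signal(signal.SIGTERM, self._handle_sigterm)
        # Library mode: the caller's signal handlers are left untouched.

    def _handle_sigterm(self, signum, frame):
        self.stop()

    def stop(self):
        if self.library_mode:
            # Never end the caller's interpreter; let the caller decide what to do.
            raise RuntimeError("bot stopped")
        sys.exit(0)
```

Leaving signals alone has a second benefit: Python only allows `signal.signal` to be called from the main thread, so a library-mode bot that skips it can also be created from worker threads.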

### Logging

By default, IntelMQ logs to `/var/log/intelmq/` or `/opt/intelmq/var/log`, depending on the installation type and the actual configuration.
When IntelMQ bots are called as a library by external scripts, logging to files is in most cases unwanted and may cause permission errors.
In other cases, logging to files might be fine.
There should be an easy way to disable file logging while keeping the possibility to use the default behavior.
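
One way to meet this requirement with Python's standard `logging` module is sketched below, assuming a hypothetical `logging_path` argument where `None` disables file logging. In that case the bot adds no handlers and lets records propagate to whatever logging the calling program has configured. `build_bot_logger` and the logger name prefix are illustrative, not IntelMQ's actual implementation.

```python
import logging
import os
from typing import Optional


def build_bot_logger(bot_id: str, logging_path: Optional[str] = None,
                     level: int = logging.INFO) -> logging.Logger:
    """Create a per-bot logger that writes to a file only if a directory is given.

    logging_path=None is this sketch's (hypothetical) way to disable file logging.
    """
    logger = logging.getLogger(f"intelmq-sketch.{bot_id}")
    logger.setLevel(level)
    for handler in list(logger.handlers):  # avoid duplicate handlers on re-creation
        logger.removeHandler(handler)
        handler.close()
    if logging_path is not None:
        # Default-like behavior: one log file per bot in the given directory.
        handler = logging.FileHandler(os.path.join(logging_path, f"{bot_id}.log"))
        handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
        logger.propagate = False  # do not log the same record twice
    else:
        # Library mode: no files are touched, records go to the caller's handlers.
        logger.propagate = True
    return logger
```

Relying on propagation means the calling program keeps full control: it can route bot log records to its own handlers, or silence them by raising the level of the `intelmq-sketch` parent logger.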

### Dependency on configuration files

IntelMQ in library mode must not depend on existing IntelMQ configuration files or logging directories, but must still be able to behave as IntelMQ normally does.
It is up to the user to decide the behavior, e.g. whether the bot's log should be written to files.

IntelMQ normally loads the runtime configuration file `/etc/intelmq/runtime.yaml` or `/opt/intelmq/etc/runtime.yaml`.
In library mode, IntelMQ tries to load this file, but continues normally if it does not exist.

IntelMQ normally loads the harmonization configuration file `/etc/intelmq/harmonization.yaml` or `/opt/intelmq/etc/harmonization.yaml`.
In library mode, IntelMQ tries to load this file, and if it does not exist, loads the internal default harmonization configuration, which is shipped with the IntelMQ package.
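
The fallback behavior described above can be sketched as follows. To stay dependency-free, the sketch takes the parser as an argument (JSON by default); with PyYAML installed, `yaml.safe_load` could be passed for IntelMQ's YAML files. `PACKAGED_HARMONIZATION` is a hypothetical, heavily truncated stand-in for the default harmonization shipped with the package, and the function names are illustrative.

```python
import copy
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, Any]

# Hypothetical, heavily truncated stand-in for the harmonization shipped with the package.
PACKAGED_HARMONIZATION: Config = {"event": {"source.fqdn": {"type": "FQDN"}}}


def load_first_existing(paths: Iterable[Path],
                        parse: Callable[[str], Any]) -> Optional[Any]:
    """Parse the first file in `paths` that exists; return None if none exists."""
    for path in paths:
        if path.is_file():
            return parse(path.read_text(encoding="utf-8"))
    return None


def load_library_mode_config(runtime_paths: Iterable[Path],
                             harmonization_paths: Iterable[Path],
                             parse: Callable[[str], Any] = json.loads) -> Tuple[Config, Config]:
    # Runtime configuration is optional: continue with an empty configuration.
    runtime = load_first_existing(runtime_paths, parse) or {}
    # Harmonization is optional on disk: fall back to the packaged default.
    harmonization = load_first_existing(harmonization_paths, parse)
    if harmonization is None:
        # Copy so callers cannot accidentally modify the packaged default.
        harmonization = copy.deepcopy(PACKAGED_HARMONIZATION)
    return runtime, harmonization


# Usage with IntelMQ's YAML files (requires PyYAML):
# runtime, harmonization = load_library_mode_config(
#     [Path("/etc/intelmq/runtime.yaml"), Path("/opt/intelmq/etc/runtime.yaml")],
#     [Path("/etc/intelmq/harmonization.yaml"), Path("/opt/intelmq/etc/harmonization.yaml")],
#     parse=yaml.safe_load)
```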

## Rationale

### Compatibility
Since the beginning of IntelMQ, the bots' `process` methods have used the methods `self.receive_message`, `self.acknowledge_message` and `self.send_message`.
Breaking this paradigm and switching to method parameters and return values or generator yields would constitute an API change and thus require IntelMQ version 4.0.
Thus, we stick to the current behavior.

## Specification

Only the `intelmq.lib.bot.Bot` class needs to be changed.
No changes to the bots' code are required.
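
Because the bots keep using `receive_message`, `send_message` and `acknowledge_message`, the base class can drive an unchanged `process()` method by swapping the pipeline for in-memory lists. The following is a heavily simplified sketch of that idea, not the actual `intelmq.lib.bot.Bot` code: `LibrarySketchBot`, its attributes and `UppercaseExpertBot` are hypothetical, messages are plain dicts, and the default queue mapping mirrors the `destination_queues` used in the examples of this document.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class LibrarySketchBot:
    """Hypothetical, heavily simplified base class (not intelmq.lib.bot.Bot)."""

    def __init__(self, bot_id: str, destination_queues: Optional[Dict[str, str]] = None):
        self.bot_id = bot_id
        # Maps paths such as '_default' to queue names, like `destination_queues`.
        self.destination_queues = destination_queues or {"_default": "output", "_on_error": "error"}
        self._current: Optional[dict] = None
        self._sent: Dict[str, List[dict]] = defaultdict(list)

    # The three methods that the bots' process() methods already use:
    def receive_message(self) -> Optional[dict]:
        return self._current

    def send_message(self, *messages: dict, path: str = "_default") -> None:
        self._sent[self.destination_queues[path]].extend(messages)

    def acknowledge_message(self) -> None:
        self._current = None

    def process(self) -> None:
        raise NotImplementedError  # implemented by each bot

    def process_message(self, message: Optional[dict] = None) -> Dict[str, List[dict]]:
        # Work on a copy so the caller's dict is not modified.
        self._current = dict(message) if message is not None else None
        self._sent = defaultdict(list)
        self.process()  # exceptions are not caught and nothing is dumped
        # One entry per destination queue, even if nothing was sent there.
        return {queue: list(self._sent[queue]) for queue in self.destination_queues.values()}


class UppercaseExpertBot(LibrarySketchBot):
    """Hypothetical expert written in the usual receive/send/acknowledge style."""

    def process(self) -> None:
        event = self.receive_message()
        event["source.fqdn"] = event["source.fqdn"].upper()
        self.send_message(event)
        self.acknowledge_message()


bot = UppercaseExpertBot("uppercase-expert")
print(bot.process_message({"source.fqdn": "www.example.com"}))
# prints {'output': [{'source.fqdn': 'WWW.EXAMPLE.COM'}], 'error': []}
```

Only the base class knows about the in-memory lists; `UppercaseExpertBot.process` is written exactly as it would be for a pipeline-backed bot.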

### Bot constructor

The operator constructs the bot by instantiating the bot's class.
Global and bot configuration parameters are passed to the constructor in the `settings` parameter, in the same format as in the IntelMQ runtime configuration.

```python
from typing import Optional

class Bot:
    def __init__(self, bot_id: str,
                 *args,  # any other parameters left out for clarity
                 settings: Optional[dict] = None,
                 **kwargs):
        ...
```
After reading the runtime configuration file, the constructor applies all values of the `settings` parameter, overriding the values from the file.
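
The override order can be sketched as a simple layered merge. The sketch assumes the IntelMQ 3 layout of the runtime configuration, with a `global` section and one entry per bot ID holding a `parameters` section; `effective_parameters` is a hypothetical helper, not IntelMQ API. It covers both pre-configured bots (found in the runtime configuration) and unconfigured ones (only the global values and `settings` apply).

```python
from typing import Any, Dict, Optional

Config = Dict[str, Any]


def effective_parameters(bot_id: str, runtime: Config,
                         settings: Optional[Config] = None) -> Config:
    """Merge configuration layers; later layers override earlier ones.

    Assumes the IntelMQ 3 runtime configuration layout: a 'global' section and
    one entry per bot ID with a 'parameters' section.
    """
    # 1. Global configuration.
    params: Config = dict(runtime.get("global", {}))
    # 2. Parameters of a pre-configured bot (nothing for unconfigured bots).
    params.update(runtime.get(bot_id, {}).get("parameters", {}))
    # 3. The `settings` argument of the constructor is applied last.
    params.update(settings or {})
    return params


runtime = {"global": {"logging_level": "INFO"},
           "domain-suffix": {"parameters": {"field": "fqdn"}}}
print(effective_parameters("domain-suffix", runtime, {"logging_level": "DEBUG"}))
# prints {'logging_level': 'DEBUG', 'field': 'fqdn'}
```

The merge is shallow: a nested value such as `destination_queues` given in `settings` replaces the configured one as a whole rather than being merged key by key.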

### Method call

The `intelmq.lib.bot.Bot` class gets a new method `process_message`.
The definition:
```python
class Bot:
    def process_message(self, message: Optional[intelmq.lib.message.Message] = None):
        ...
```
* Collectors take *no* message as input.
* Parsers, experts and outputs take exactly one message as input.

The messages are neither serialized nor encoded in any form, but are objects of the `intelmq.lib.message.Message` class. If the message is a plain dict (with or without a `__type` item), it is automatically converted to the appropriate Message object (`Report` or `Event`, depending on the bot type).

The return value holds the messages sent by the bot.
No exceptions raised by the bot are caught; the caller should handle them according to their needs.
The bot does not dump any messages to files on errors, regardless of the bot's dumping configuration.

As bots can send messages to multiple queues, the return value is a dictionary with one item per destination queue.
Each item is a list holding the messages sent to that queue.
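
The input conversion rules described above could look roughly like this. `Message`, `Report` and `Event` here are minimal hypothetical stand-ins for the classes in `intelmq.lib.message`, and the mapping from bot type to input class (parsers consume reports; experts and outputs consume events) is an assumption of this sketch. A `__type` item in a plain dict is discarded, because the bot type decides the class.

```python
from typing import Dict, Optional, Type


# Minimal hypothetical stand-ins for the classes in intelmq.lib.message.
class Message(dict):
    pass


class Report(Message):
    pass


class Event(Message):
    pass


# Assumed mapping from bot type to the message class the bot consumes.
INPUT_CLASS: Dict[str, Type[Message]] = {"Parser": Report, "Expert": Event, "Output": Event}


def to_input_message(message: Optional[dict], bot_type: str) -> Optional[Message]:
    if bot_type == "Collector":
        if message is not None:
            raise ValueError("collectors take no input message")
        return None
    if message is None:
        raise ValueError(f"{bot_type} bots take exactly one input message")
    target = INPUT_CLASS[bot_type]
    if type(message) is target:
        return message  # already the right Message object
    # Plain dict (with or without '__type'): the bot type decides the class.
    return target({key: value for key, value in message.items() if key != "__type"})


report = to_input_message({"__type": "Report", "feed.name": "Example"}, "Parser")
print(type(report).__name__, dict(report))  # prints: Report {'feed.name': 'Example'}
```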

#### Option: Processing multiple messages at once
This is a more complex situation with regard to error handling.
Should one exception stop the processing?
Or should the processing continue, with the exceptions saved in a variable that is returned at the end together with the sent messages?
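
Both answers to these questions can be expressed in one hypothetical helper, sketched here with a `stop_on_error` switch. It is not part of this proposal's specification: `process_messages` and `DividingStubBot` are illustrative names, and the helper only relies on `process_message` returning a dictionary of queue names to message lists.

```python
from typing import Dict, Iterable, List, Tuple


def process_messages(bot, messages: Iterable[dict], stop_on_error: bool = True
                     ) -> Tuple[Dict[str, List[dict]], List[Tuple[dict, Exception]]]:
    """Hypothetical batch helper around bot.process_message (not part of IntelMQ).

    stop_on_error=True: the first exception stops the processing and propagates.
    stop_on_error=False: failing input messages are collected with their
    exceptions and returned together with the sent messages.
    """
    sent: Dict[str, List[dict]] = {}
    failed: List[Tuple[dict, Exception]] = []
    for message in messages:
        try:
            result = bot.process_message(message)
        except Exception as exc:
            if stop_on_error:
                raise
            failed.append((message, exc))
            continue
        for queue, queue_messages in result.items():
            sent.setdefault(queue, []).extend(queue_messages)
    return sent, failed


class DividingStubBot:
    """Hypothetical stub bot that fails on a value of 0."""

    def process_message(self, message: dict) -> Dict[str, List[dict]]:
        return {"output": [{"value": 10 // message["value"]}]}


sent, failed = process_messages(DividingStubBot(), [{"value": 2}, {"value": 0}, {"value": 5}],
                                stop_on_error=False)
print(sent)  # prints {'output': [{'value': 5}, {'value': 2}]}
print(failed[0][0], type(failed[0][1]).__name__)  # prints {'value': 0} ZeroDivisionError
```

With `stop_on_error=True`, the messages already processed before the failure are lost to the caller, because the exception propagates before anything is returned; this is one argument for the collecting variant.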

## Examples

### Domain Suffix Expert Example

```python
from intelmq.bots.experts.domain_suffix.expert import DomainSuffixExpertBot
from intelmq.lib.bot import BotLibSettings

domain_suffix = DomainSuffixExpertBot('domain-suffix',  # bot id
                                      settings=BotLibSettings | {
                                          'field': 'fqdn',
                                          'suffix_file': '/usr/share/publicsuffix/public_suffix_list.dat'})
queues = domain_suffix.process_message({'source.fqdn': 'www.example.com'})
# Select the output queue (as defined in `destination_queues`), the first message, and the field 'source.domain_suffix':
# >>> queues['output'][0]['source.domain_suffix']
# 'com'
```

### Accessing queues
```python
from intelmq.lib import utils
from intelmq.lib.bot import BotLibSettings
from intelmq.lib.message import MessageFactory
from intelmq.tests.lib import test_parser_bot

EXAMPLE_REPORT = {"feed.url": "http://www.example.com/",
                  "time.observation": "2015-08-11T13:03:40+00:00",
                  "raw": utils.base64_encode(test_parser_bot.RAW),  # RAW: sample data from IntelMQ's test suite
                  "__type": "Report",
                  "feed.name": "Example"}

bot = test_parser_bot.DummyParserBot('dummy-bot', settings=BotLibSettings |
                                     {'destination_queues': {'_default': 'output',
                                                             '_on_error': 'error'}})

sent_messages = bot.process_message(EXAMPLE_REPORT)
# sent_messages is now a dict with all queues; the queue names below are examples

# this is the output queue
assert sent_messages['output'][0] == MessageFactory.from_dict(test_parser_bot.EXAMPLE_EVENT)
# this is a dumped message; input_message (not defined in this excerpt) stands for the expected message
assert sent_messages['error'][0] == input_message
```

README.md (+5, -4 lines)
@@ -14,11 +14,12 @@ The IEPs should be discussion on the [intelmq-dev Mailinglist](https://lists.cer
 |004|[Internal Data Format: Meta Information and Data Exchange](004/)|Implementation waiting. Decided and formalized via a [JSON Schema](004/schema/schema.json)|3.x.0 or 4.0.0|
 |005|[Internal Data Format: Notification settings](005/)|Undiscussed|3.x.0 or 4.0.0|
 |006|[Internal Data Format: Msgpack as serializer](006/)|Undiscussed|3.x.0 or 4.0.0|
+|007|[Running IntelMQ as Python Library](007/)|Discussion in progress|3.2.0|
 
 ### Status legend
-* Implementation completed: IEP was approved by the community, the implementation is completed
-* Implementation in progress: IEP was approved by the community, the implementation is in progress
-* Implementation waiting: IEP was approved by the community, the implementation is waiting
-* Undecided: Community did not yet decide
 * Undiscussed: The IEP was not yet discussed and/or is not yet finished
+* Discussion in progress: The community has not decided yet
+* Implementation waiting: IEP was approved by the community, the implementation is waiting
+* Implementation in progress: IEP was approved by the community, the implementation is in progress
+* Implementation completed: IEP was approved by the community, the implementation is completed
 * Dismissed: IEP was not approved by the community
