|
| 1 | +# IEP007 - Running IntelMQ bots as Python Library |
| 2 | + |
| 3 | +A working example call (Proof of Concept) is located here: |
| 4 | +https://github.com/wagner-intevation/intelmq/blob/bot-library/intelmq/tests/lib/test_bot.py#L141 |
| 5 | + |
| 6 | +## Background |
| 7 | +As of IntelMQ 3.1.0, IntelMQ Bots can only be started on the command line with `intelmqctl`. |
| 8 | +Most tools (including the IntelMQ API, and thus the IntelMQ Manager) use `intelmqctl start` to start bot instances. |
| 9 | +`intelmqctl start` spawns a new child process and detaches it. |
| 10 | + |
| 11 | +Only `intelmqctl run` provides the ability to run bots interactively in the foreground of the command line and provides some neat features for debugging purposes. |
| 12 | + |
| 13 | +Starting IntelMQ bots from Python code requires considerable effort and adds code complexity. Additionally, the bot's parameters can only be provided by modifying the IntelMQ runtime configuration file. |
| 14 | +Messages can only be fed to and retrieved from the bot by connecting to the pipeline separately and writing/reading properly serialized messages there. |
| 15 | + |
| 16 | +Integrating IntelMQ Bots into other (Python) tools is therefore hard to impossible in the current IntelMQ 3.1 version. |
| 17 | + |
| 18 | +In a nutshell, calling a bot and processing a message should take, at most, a few lines. |
| 19 | +The following complete example shows what the procedure could look like. |
| 20 | +The bot class is instantiated, passing a few parameters. |
| 21 | +```python |
| 22 | +from intelmq.bots.experts.domain_suffix.expert import DomainSuffixExpertBot |
| 23 | +from intelmq.lib.bot import BotLibSettings |
| 24 | +domain_suffix = DomainSuffixExpertBot('domain-suffix',  # bot id |
| 25 | +    # the {} | {} syntax is available in Python >= 3.9 |
| 26 | +    settings=BotLibSettings | { |
| 27 | +        'field': 'fqdn', |
| 28 | +        'suffix_file': '/usr/share/publicsuffix/public_suffix_list.dat'}) |
| 29 | +queues = domain_suffix.process_message({'source.fqdn': 'www.example.com'}) |
| 30 | +# Select the output queue (as defined in `destination_queues`), first message, access the field 'source.domain_suffix': |
| 31 | +# >>> queues['output'][0]['source.domain_suffix'] |
| 32 | +# 'com' |
| 33 | +``` |
| 34 | + |
| 35 | +### Use cases |
| 36 | + |
| 37 | +#### General |
| 38 | +Any IntelMQ-related or third-party program may use IntelMQ's most potent components - IntelMQ's bots. |
| 39 | + |
| 40 | +The full potential shows when chaining multiple bots and iterating over lots of data: |
| 41 | + |
| 42 | +```python |
| 43 | +# instantiate all bots first, for an example see above |
| 44 | +domain_suffix = DomainSuffixExpertBot(...) |
| 45 | +url2fqdn = Url2fqdnExpertBot(...) |
| 46 | +http_status = HttpstatusExpertBot(...) |
| 47 | +tuency = TuencyExpertBot(...) |
| 48 | +lookyloo = LookylooExpertBot(...) |
| 49 | + |
| 50 | +# a list of input messages |
| 51 | +messages = [{...}] |
| 52 | + |
| 53 | +for index, message in enumerate(messages): |
| 54 | +    for bot in (domain_suffix, |
| 55 | +                url2fqdn, |
| 56 | +                http_status, |
| 57 | +                tuency, |
| 58 | +                lookyloo): |
| 59 | +        # for simplicity we assume that the bots always send one message |
| 60 | +        message = bot.process_message(message)['output'][0] |
| 61 | +    # message now has the accumulated data of five bots |
| 62 | +    messages[index] = message |
| 63 | +# messages is now a list of output messages |
| 64 | +``` |
| 65 | + |
| 66 | +#### IntelMQ Webinput Preview |
| 67 | + |
| 68 | +The IntelMQ webinput can show previews of the *processed* data to the operator, not just the input data, adding much more value to the preview functionality. |
| 69 | +Currently the preview gives the operator feedback on the parsing step. The further processing of the data by the bots is invisible to the operator. |
| 70 | +This causes confusion and uncertainty for the operators. |
| 71 | + |
| 72 | +The Webinput backend can call the bots and process the events without interfering with the running bot processes, pipelines and bot management. |
| 73 | +The data flow illustrated: |
| 74 | +``` |
| 75 | +Data provided by operator -> webinput backend parser -> IntelMQ bots as configured in the webinput configuration -> preview shown to operator |
| 76 | +``` |
| 77 | +The implementation details for the webinput are not part of this proposal document. |
| 78 | + |
| 79 | +In the next step, the webinput can also show previews of notifications (e.g. emails). This is also not part of this proposal document. |
| 80 | +``` |
| 81 | +Data provided by operator -> webinput backend parser -> IntelMQ bots as configured in the webinput configuration -> notification tool (preview mode) -> notification preview shown to operator |
| 82 | +``` |
| 83 | + |
| 84 | +## Requirements |
| 85 | + |
| 86 | +### Messages and Pipeline |
| 87 | +Providing input messages as function parameters and receiving output messages should be possible. |
| 88 | +In this case, messages should not be serialized or encoded; they should remain Message objects (derived from `dict` and behaving like dictionaries). |
| 89 | + |
| 90 | +It should also be possible to let the bot use the configured pipeline (e.g. redis) and behave like a normal bot. |
| 91 | + |
| 92 | +### Exceptions and dumped messages |
| 93 | +An exception in the bot's `process()` method should not be caught in intermediate layers; it should propagate to the caller. |
| 94 | + |
| 95 | +Option: If there is a helper function to call `process()` multiple times (for a batch of input messages), the exceptions are collected together with the (dumped) messages and made accessible to the caller. |
| 96 | + |
| 97 | +### Parameters and Configuration |
| 98 | +The global IntelMQ configuration should be effective. |
| 99 | +The user may override configuration options by providing a configuration dictionary. |
| 100 | + |
| 101 | +#### Pre-configured bots |
| 102 | +It should be possible to run bots defined in IntelMQ's runtime configuration file. Additional overriding parameters can be provided. |
| 103 | + |
| 104 | +#### Un-Configured bots |
| 105 | +It should be possible to run bots that are not defined in IntelMQ's runtime configuration file. The bot configuration is provided as a function parameter. |
| 106 | + |
| 107 | +### Signals |
| 108 | +Normally, bots install their own handlers for signals like SIGTERM, SIGHUP and SIGINT. |
| 109 | +In library mode, this would interfere with the signal handling of the calling code. |
| 110 | +Thus, IntelMQ bots called as a library must not manipulate the signal handling or call `sys.exit`. |
| 111 | + |
| 112 | +### Logging |
| 113 | + |
| 114 | +By default, IntelMQ logs to `/var/log/intelmq/` or `/opt/intelmq/var/log`, respectively - depending on the installation type and actual configuration. |
| 115 | +When IntelMQ bots are called as a library by external scripts, file logging is in most cases unwanted and can cause permission errors. |
| 116 | +On the other hand, the logging might as well be fine. |
| 117 | +There should be an easy way to disable the file-logging, while keeping the possibility to use the default behavior. |
| 118 | + |
| 119 | +### Dependency on configuration files |
| 120 | + |
| 121 | +IntelMQ in library-mode must not depend on existing IntelMQ configuration files or logging directories, but it must still be able to behave as IntelMQ normally does. |
| 122 | +It is up to the user to decide the behavior, e.g. if the log of the bot should be written to files. |
| 123 | + |
| 124 | +IntelMQ normally loads the runtime configuration file `/etc/intelmq/runtime.yaml` or `/opt/intelmq/etc/runtime.yaml`. |
| 125 | +In library-mode, IntelMQ tries to load the file, but continues normally if it does not exist. |
| 126 | + |
| 127 | +IntelMQ normally loads the harmonization configuration file `/etc/intelmq/harmonization.yaml` or `/opt/intelmq/etc/harmonization.yaml`. |
| 128 | +In library-mode, IntelMQ tries to load the file, and if it does not exist, loads the internal default harmonization configuration, which is part of the IntelMQ packages. |
| 129 | + |
| 130 | +## Rationales |
| 131 | + |
| 132 | +### Compatibility |
| 133 | +Since the beginning of IntelMQ, the bot's `process` methods use the methods `self.receive_message`, `self.acknowledge_message` and `self.send_message`. |
| 134 | +Breaking this paradigm and changing to method parameters and return values or generator yields would indicate an API change and thus lead to IntelMQ version 4.0. |
| 135 | +Thus, we stick to the current behavior. |
| 136 | + |
| 137 | +## Specification |
| 138 | + |
| 139 | +Only changes in the `intelmq.lib.bot.Bot` class are needed. |
| 140 | +No changes in the bots' code are required. |
| 141 | + |
| 142 | +### Bot constructor |
| 143 | + |
| 144 | +The operator constructs the bot by initializing the bot's class. |
| 145 | +Global and bot configuration parameters are provided as a parameter to the constructor, in the same format as the IntelMQ runtime configuration. |
| 146 | + |
| 147 | +```python |
| 148 | +class Bot: |
| 149 | +    def __init__(self, bot_id: str, |
| 150 | +                 *args,  # any other parameters left out for clarity |
| 151 | +                 settings: Optional[dict] = None, **kwargs): ... |
| 152 | +``` |
| 153 | +After reading the runtime configuration file, the constructor applies all values of the `settings` parameter. |
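
The precedence described here can be pictured as a plain dictionary merge. The following is a minimal sketch, not the IntelMQ implementation: the helper `resolve_parameters`, the assumed layout of the parsed runtime configuration (a `global` section plus one `parameters` section per bot id), and all sample values are assumptions made for illustration.

```python
from typing import Optional


def resolve_parameters(runtime: dict, bot_id: str,
                       settings: Optional[dict] = None) -> dict:
    """Merge parameters so that later sources override earlier ones.

    Hypothetical helper: global parameters first, then the bot's own
    parameters from the runtime configuration, then ``settings``.
    """
    merged = dict(runtime.get('global', {}))
    merged.update(runtime.get(bot_id, {}).get('parameters', {}))
    merged.update(settings or {})
    return merged


# Sample data standing in for a parsed runtime configuration file
runtime = {
    'global': {'logging_level': 'INFO', 'rate_limit': 0},
    'domain-suffix': {'parameters': {'field': 'fqdn', 'rate_limit': 5}},
}

# A pre-configured bot with one overriding parameter
params = resolve_parameters(runtime, 'domain-suffix',
                            settings={'logging_level': 'DEBUG'})
# An un-configured bot: only global values and the given settings apply
unconfigured = resolve_parameters(runtime, 'not-in-runtime',
                                  settings={'field': 'url'})
```

Under these assumptions, the same merge covers both the pre-configured and the un-configured case from the requirements: a bot id missing from the runtime configuration simply contributes no parameters of its own.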
| 154 | + |
| 155 | +### Method call |
| 156 | + |
| 157 | +The `intelmq.lib.bot.Bot` class gets a new method `process_message`. |
| 158 | +The definition: |
| 159 | +```python |
| 160 | +class Bot: |
| 161 | +    def process_message(self, message: Optional[intelmq.lib.message.Message] = None): ... |
| 162 | +``` |
| 163 | +For collectors: |
| 164 | + It takes *no* messages as input and returns a list of messages. |
| 165 | +For parsers, experts and outputs: |
| 166 | + It takes exactly one message as input and returns a list of messages. |
| 167 | +The messages are neither serialized nor encoded in any form, but are objects |
| 168 | +of the `intelmq.lib.message.Message` class. If the message is a `dict` |
| 169 | +(with or without a `__type` item), it is automatically converted to the appropriate |
| 170 | +Message object (`Report` or `Event`, depending on the bot type). |
| 171 | + |
| 172 | +The return value holds the messages sent by the bot. |
| 173 | +No exceptions of the bot are caught; the caller should handle them according to their needs. |
| 174 | +The bot does not dump any messages to files on errors, regardless of the bot's dumping configuration. |
| 175 | + |
| 176 | +As bots can send messages to multiple queues, the return value is a dictionary of all destination queues. |
| 177 | +The items are lists, holding the sent messages. |
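
To make the shape of the return value concrete, here is a minimal sketch that does not use IntelMQ at all. `UppercaseBot`, the field `extra.upper`, and the helper `iter_sent` are hypothetical and exist only to show a dictionary of queues whose values are lists of messages.

```python
from typing import Dict, Iterator, List, Tuple


class UppercaseBot:
    """Hypothetical stand-in for a bot, used only to show the return shape."""

    def process_message(self, message: dict) -> Dict[str, List[dict]]:
        # One event is "sent" to the single destination queue 'output'
        event = {**message, 'extra.upper': message['source.fqdn'].upper()}
        return {'output': [event]}


def iter_sent(queues: Dict[str, List[dict]]) -> Iterator[Tuple[str, dict]]:
    """Yield (queue name, message) pairs across all destination queues."""
    for name, messages in queues.items():
        for message in messages:
            yield name, message


queues = UppercaseBot().process_message({'source.fqdn': 'www.example.com'})
sent = list(iter_sent(queues))
```

A caller that only cares about one queue indexes the dictionary directly, for example `queues['output'][0]`; iterating over all items is useful when the bot routes messages to several queues.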
| 178 | + |
| 179 | +#### Option: Processing multiple messages at once |
| 180 | +This is a more complex situation with regard to error handling. |
| 181 | +Should one exception stop the processing? |
| 182 | +Should the processing continue and the exceptions be saved in a variable that is returned at the end with the sent messages? |
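
One possible answer to these questions, sketched under assumptions: processing continues after an error, and failed inputs are returned together with their exceptions. The helper `process_messages` and the stand-in `fake_process_message` are hypothetical and not part of this proposal's specification.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def process_messages(process_message: Callable[[dict], Dict[str, List[dict]]],
                     messages: Iterable[dict]):
    """Call ``process_message`` once per input and keep going on errors.

    Returns the merged queues and a list of (failed input, exception) pairs.
    """
    queues: Dict[str, List[dict]] = {}
    failures: List[Tuple[dict, Exception]] = []
    for message in messages:
        try:
            result = process_message(message)
        except Exception as exc:  # deliberately broad: report, do not abort
            failures.append((message, exc))
            continue
        for name, sent in result.items():
            queues.setdefault(name, []).extend(sent)
    return queues, failures


def fake_process_message(message: dict) -> Dict[str, List[dict]]:
    """Hypothetical stand-in for ``Bot.process_message``."""
    if 'source.fqdn' not in message:
        raise KeyError('source.fqdn')
    return {'output': [{**message, 'seen': True}]}


queues, failures = process_messages(
    fake_process_message,
    [{'source.fqdn': 'www.example.com'}, {'source.ip': '192.0.2.1'}])
```

The alternative design, stopping at the first exception, needs no helper at all: the caller loops over `process_message` and lets the exception propagate.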
| 183 | + |
| 184 | +## Examples |
| 185 | + |
| 186 | +### Domain Suffix Expert Example |
| 187 | + |
| 188 | +```python |
| 189 | +from intelmq.bots.experts.domain_suffix.expert import DomainSuffixExpertBot |
| 190 | +from intelmq.lib.bot import BotLibSettings |
| 191 | +domain_suffix = DomainSuffixExpertBot('domain-suffix',  # bot id |
| 192 | +    settings=BotLibSettings | { |
| 193 | +        'field': 'fqdn', |
| 194 | +        'suffix_file': '/usr/share/publicsuffix/public_suffix_list.dat'}) |
| 195 | +queues = domain_suffix.process_message({'source.fqdn': 'www.example.com'}) |
| 196 | +# Select the output queue (as defined in `destination_queues`), first message, access the field 'source.domain_suffix': |
| 197 | +# >>> queues['output'][0]['source.domain_suffix'] |
| 198 | +# 'com' |
| 199 | +``` |
| 200 | + |
| 201 | +### Accessing queues |
| 202 | +```python |
| 203 | +from intelmq.lib.bot import BotLibSettings |
| 204 | + |
| 205 | +EXAMPLE_REPORT = {"feed.url": "http://www.example.com/", |
| 206 | + "time.observation": "2015-08-11T13:03:40+00:00", |
| 207 | + "raw": utils.base64_encode(RAW), |
| 208 | + "__type": "Report", |
| 209 | + "feed.name": "Example"} |
| 210 | + |
| 211 | +bot = test_parser_bot.DummyParserBot('dummy-bot', settings=BotLibSettings | |
| 212 | + {'destination_queues': {'_default': 'output', |
| 213 | + '_on_error': 'error'}}) |
| 214 | + |
| 215 | +sent_messages = bot.process_message(EXAMPLE_REPORT) |
| 216 | +# sent_messages is now a dict with all queues. queue names below are examples |
| 217 | + |
| 218 | +# this is the output queue |
| 219 | +assert sent_messages['output'][0] == MessageFactory.from_dict(test_parser_bot.EXAMPLE_EVENT) |
| 220 | +# this is a dumped message |
| 221 | +assert sent_messages['error'][0] == input_message |
| 222 | +``` |