-
-
Notifications
You must be signed in to change notification settings - Fork 747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace logshipper.tail with our own tail implementation #4191
Conversation
56f9a2b
to
9d18f9f
Compare
Thanks for splitting this out in separate PR 👍 Did you happen to have time to look into the issue I mentioned in the other PR yet? |
You're referring to this comment, right? Right now I'm just playing around trying to avoid having to implement irrelevant changes (like this one). No idea why pylint wakes up and complains about files I never touched, so I'm trying to implement it in such a way that I work around that particular issue. I'm still figuring out how to test the file watch sensor. Once I add tests for that (even failing ones), I'll have a better idea of how to fix them, including the eventlet issue you pointed out. |
9dfbe39
to
f3f498f
Compare
7537de0
to
7ab6ed1
Compare
6c542fe
to
988dce0
Compare
Okay, I rewrote this to use my Python 2.7 fork of watchdog3. Watchdog uses different APIs from different OSes to monitor for file changes - inotify (Linux), FSEvents and kqueue (OS X), ReadDirectoryChangesW (Windows), and kqueue (BSD), with a slow but general polling backend that works everywhere. Events are processed in a separate, full Python thread, so there's no longer any more need for eventlet greenthreads. So, oddly enough, this change means that the |
Is the If it's not, then I really don't see much benefit of this approach over the existing / old one - we still need to maintain a custom fork (and that was one of the primary reasons to move away from it - not needed to depend on the forked library). I would even argue it's worse because now we need to maintain one fork + copied and modified logshipper tail code in StackStorm. |
Yes, watchdog3 is maintained - I've already had one PR merged in, another PR waiting, and I'm planning on submitting a third PR today from my But, as for why I picked watchdog to begin with: I realized that using eventlet and green threads would require client code to call I only re-implemented the tail API from logshipper to handle keeping track of all of the files being tailed. I figured it was good to separate concerns - The fact that watchdog (and now I'm fairly confident that Nuxeo will continue to support watchdog3 because they explicitly say so in their description on PyPI, and they have a Nuxeo Drive product that sounds like it's a competitor to Dropbox or Google Drive - file monitoring is super important for an app like that. So I can get my patches merged in I will simply have to change where watchdog3 is installed from. Another interesting project is aionotify, which is a "simple, asyncio-based inotify library", but it only supports Python 3.4+, and hasn't seen any updates since 2016. |
…logshipper and pyinotify dependencies
requirements.txt
Outdated
@@ -53,6 +51,8 @@ sseclient==0.0.19 | |||
stevedore==1.28.0 | |||
tooz==1.62.0 | |||
unittest2 | |||
# TODO: Move this to into the StackStorm organization | |||
git+https://github.com/blag/watchdog3.git#python2.7-support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is auto generating when you run "make requirements". I assume it was updated manually since it's missing "inotify" dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you will also need to add watchdog3
linux pack requirement to st2common/in-requirements.txt
otherwise it won't be included in final generated requirements.txt
.
Please also add a comment in there why we need to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed notation is also wrong, should be @git_revision#python_egg_name
, so git+https://github.com/blag/[email protected]#egg=watchdog3
The changes look good to me 👍 I still need to test them locally (plan to do it today) and I would prefer to merge this PR once your |
contrib/linux/requirements.txt
Outdated
pyinotify>=0.9.5,<=0.10 | ||
-e git+https://github.com/Kami/logshipper.git@stackstorm_patched#egg=logshipper | ||
git+https://github.com/blag/watchdog3.git#python2.7-support | ||
pathtools3>=0.2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we don't need to specify this here since it's already specified in watchdog3
setup.py file.
I tried to test it and immediately encountered an issue. Te process crashes if the file which is to be tailed doesn't exist yet. Currently with logshipper we also support scenarios where the file doesn't exist yet and when it gets re-created. So we should also support that with the new underlying backend. 2018-07-18 08:37:59,305 INFO [-] Running sensor in passive mode
2018-07-18 08:37:59,306 DEBUG [-] starting tails
2018-07-18 08:37:59,314 DEBUG [-] Found existing trigger: TriggerDB(description=None, id=5b2cc8de962d740e5d3bbf4e, name="d544c39d-52bd-4dd0-baa3-c00e45e66363", pack="linux", parameters={u'file_path': u'/tmp/st2_test'}, ref="linux.d544c39d-52bd-4dd0-baa3-c00e45e66363", ref_count=1, type="linux.file_watch.line", uid="trigger:linux:d544c39d-52bd-4dd0-baa3-c00e45e66363:851746484aeb7ece7f72c1e68c2d91a3") in db.
2018-07-18 08:37:59,314 DEBUG [-] Calling sensor "add_trigger" method (trigger.type=linux.file_watch.line)
2018-07-18 08:37:59,315 DEBUG [-] adding tail /tmp/st2_test
2018-07-18 08:37:59,316 INFO [-] Opening tail /tmp/st2_test
2018-07-18 08:37:59,317 ERROR [-] Traceback (most recent call last):
2018-07-18 08:37:59,317 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 458, in fire_timers
2018-07-18 08:37:59,320 ERROR [-] timer()
2018-07-18 08:37:59,321 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
2018-07-18 08:37:59,323 ERROR [-] cb(*args, **kw)
2018-07-18 08:37:59,324 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 218, in main
2018-07-18 08:37:59,325 ERROR [-] result = function(*args, **kwargs)
2018-07-18 08:37:59,326 ERROR [-] File "/data/stanley/st2common/st2common/services/triggerwatcher.py", line 142, in _load_triggers_from_db
2018-07-18 08:37:59,327 ERROR [-] self._handlers[publishers.CREATE_RK](trigger)
2018-07-18 08:37:59,328 ERROR [-] File "/data/stanley/st2reactor/st2reactor/container/sensor_wrapper.py", line 277, in _handle_create_trigger
2018-07-18 08:37:59,328 ERROR [-] self._sensor_instance.add_trigger(trigger=trigger)
2018-07-18 08:37:59,329 ERROR [-] File "/opt/stackstorm/packs/linux/sensors/file_watch_sensor.py", line 46, in add_trigger
2018-07-18 08:37:59,330 ERROR [-] self._tail.add_file(filepath=file_path, seek_to_end=seek_to_end)
2018-07-18 08:37:59,330 ERROR [-] File "/data/stanley/contrib/linux/sensors/tail.py", line 87, in add_file
2018-07-18 08:37:59,331 ERROR [-] self.tails[filepath] = self.open_tail(filepath, recursive, seek_to_end)
2018-07-18 08:37:59,332 ERROR [-] File "/data/stanley/contrib/linux/sensors/tail.py", line 184, in open_tail
2018-07-18 08:37:59,332 ERROR [-] fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
2018-07-18 08:37:59,333 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/eventlet/green/os.py", line 109, in open
2018-07-18 08:37:59,339 ERROR [-] fd = __original_open__(file, flags, mode)
2018-07-18 08:37:59,340 ERROR [-] OSError: [Errno 2] No such file or directory: '/tmp/st2_test'
2018-07-18 08:37:59,342 INFO [-] Stopping trigger watcher
2018-07-18 08:37:59,343 INFO [-] Invoking cleanup on sensor
2018-07-18 08:37:59,343 DEBUG [-] stopping tails
2018-07-18 08:37:59,344 DEBUG [-] remove tails And even when I create the file I get this error which makes it look like we are using wrong primitives for listening for events on a directory and not a file. 2018-07-18 08:39:39,922 ERROR [-] Traceback (most recent call last):
2018-07-18 08:39:39,923 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 114, in wait
2018-07-18 08:39:39,925 ERROR [-] listener.cb(fileno)
2018-07-18 08:39:39,926 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 218, in main
2018-07-18 08:39:39,930 ERROR [-] result = function(*args, **kwargs)
2018-07-18 08:39:39,932 ERROR [-] File "/data/stanley/st2common/st2common/services/triggerwatcher.py", line 142, in _load_triggers_from_db
2018-07-18 08:39:39,933 ERROR [-] self._handlers[publishers.CREATE_RK](trigger)
2018-07-18 08:39:39,935 ERROR [-] File "/data/stanley/st2reactor/st2reactor/container/sensor_wrapper.py", line 277, in _handle_create_trigger
2018-07-18 08:39:39,936 ERROR [-] self._sensor_instance.add_trigger(trigger=trigger)
2018-07-18 08:39:39,936 ERROR [-] File "/opt/stackstorm/packs/linux/sensors/file_watch_sensor.py", line 46, in add_trigger
2018-07-18 08:39:39,937 ERROR [-] self._tail.add_file(filepath=file_path, seek_to_end=seek_to_end)
2018-07-18 08:39:39,938 ERROR [-] File "/data/stanley/contrib/linux/sensors/tail.py", line 87, in add_file
2018-07-18 08:39:39,939 ERROR [-] self.tails[filepath] = self.open_tail(filepath, recursive, seek_to_end)
2018-07-18 08:39:39,939 ERROR [-] File "/data/stanley/contrib/linux/sensors/tail.py", line 185, in open_tail
2018-07-18 08:39:39,940 ERROR [-] watch = self.observer.schedule(self, path, recursive)
2018-07-18 08:39:39,940 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/watchdog/observers/api.py", line 296, in schedule
2018-07-18 08:39:39,941 ERROR [-] emitter.start()
2018-07-18 08:39:39,941 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/watchdog/utils/__init__.py", line 105, in start
2018-07-18 08:39:39,942 ERROR [-] self.on_thread_start()
2018-07-18 08:39:39,943 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/watchdog/observers/inotify.py", line 119, in on_thread_start
2018-07-18 08:39:39,944 ERROR [-] self._inotify = InotifyBuffer(path, self.watch.is_recursive)
2018-07-18 08:39:39,944 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/watchdog/observers/inotify_buffer.py", line 37, in __init__
2018-07-18 08:39:39,945 ERROR [-] self._inotify = Inotify(path, recursive)
2018-07-18 08:39:39,945 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/watchdog/observers/inotify_c.py", line 198, in __init__
2018-07-18 08:39:39,947 ERROR [-] self._add_dir_watch(path, recursive, event_mask)
2018-07-18 08:39:39,948 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/watchdog/observers/inotify_c.py", line 393, in _add_dir_watch
2018-07-18 08:39:39,948 ERROR [-] raise OSError("Path is not a directory")
2018-07-18 08:39:39,948 ERROR [-] OSError: Path is not a directory
2018-07-18 08:39:39,949 ERROR [-] Removing descriptor: 5 I would need to check logshipper, but we might need to actually listen for both events - file creation in directory to see when the file is created (for first issue mentioned above) so we can start tailing it and then add watcher on a file. We also need to be sure we correctly remove watchers, etc. so we won't be leaking them around. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the inline comments - it's not working yet and needs more work.
@@ -27,7 +27,7 @@ passlib==1.7.1 | |||
lockfile==0.12.2 | |||
python-gnupg==0.4.2 | |||
jsonpath-rw==1.4.0 | |||
pyinotify==0.9.6 | |||
git+https://github.com/blag/watchdog3.git#python2.7-support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please move it to StackStorm
org?
A lot of issues when we're creating forks under own github user account, instead of org.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my comment above - it looks like the upstream project is active so I would just wait for @blag's changes to me merged and then pull directly from a specific revision from the upstream repo (it will probably take a while until they publish new version to PyPi, but maybe @blag can poke them :)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for one way or another that will remove presence of personal github accounts
On a related note (addition to my comment here - #4191 (comment)), we should add some end to end (st2tests tests) for this sensor (at least two scenarios - file already exists, file doesn't exist yet and it's re created and appended to multiple times). As mentioned above, it's not working at all, yet none of our existing tests caught that. |
Actually, it was working when the tests ran - that's one of the tests we run (see My last PR for upstream watchdog3 adds support for following files as well as directories, and is already merged into I'm happy with waiting until that's merged into upstream, and then simply using upstream's version once they push it to PyPI. I'll push them to do that once that last PR is merged. |
@blag Thanks for the update. Two things:
|
I manually installed it using pip from the supposedly correct branch: pip install "git+https://github.com/blag/watchdog3.git@for-stackstorm#egg=watchdog3" Yet is still failing when the file doesn't exist: (virtualenv) vagrant@ubuntu-xenial:/data/stanley$ /data/stanley/virtualenv/bin/python2.7 /data/stanley/st2reactor/st2reactor/container/sensor_wrapper.py --pack=linux --file-path=/opt/stackstorm/packs/linux/sensors/file_watch_sensor.py --class-name=FileWatchSensor --trigger-type-refs=linux.file_watch.line --parent-args='["--debug"]'
2018-07-19 08:00:05,662 INFO [-] No config found for sensor "FileWatchSensor"
2018-07-19 08:00:05,663 INFO [-] Watcher started
2018-07-19 08:00:05,664 INFO [-] Running sensor initialization code
2018-07-19 08:00:05,665 INFO [-] Running sensor in passive mode
2018-07-19 08:00:05,665 DEBUG [-] starting tails
2018-07-19 08:00:05,674 DEBUG [-] Start from server, version: 0.9, properties: {'information': 'Licensed under the MPL. See http://www.rabbitmq.com/', 'product': 'RabbitMQ', 'copyright': 'Copyright (C) 2007-2015 Pivotal Software, Inc.', 'capabilities': {'exchange_exchange_bindings': True, 'connection.blocked': True, 'authentication_failure_close': True, 'basic.nack': True, 'per_consumer_qos': True, 'consumer_priorities': True, 'consumer_cancel_notify': True, 'publisher_confirms': True}, 'cluster_name': 'rabbit@ubuntu-xenial', 'platform': 'Erlang/OTP', 'version': '3.5.7'}, mechanisms: ['AMQPLAIN', 'PLAIN'], locales: [u'en_US']
2018-07-19 08:00:05,678 INFO [-] Connected to amqp://guest:**@127.0.0.1:5672//
2018-07-19 08:00:05,680 DEBUG [-] using channel_id: 1
2018-07-19 08:00:05,684 DEBUG [-] Channel open
2018-07-19 08:00:05,687 INFO [-] Stopping trigger watcher
2018-07-19 08:00:05,688 DEBUG [-] Closed channel #1
2018-07-19 08:00:05,690 DEBUG [-] Found existing trigger: TriggerDB(description=None, id=5b2cc8de962d740e5d3bbf4e, name="d544c39d-52bd-4dd0-baa3-c00e45e66363", pack="linux", parameters={u'file_path': u'/tmp/st2_test'}, ref="linux.d544c39d-52bd-4dd0-baa3-c00e45e66363", ref_count=1, type="linux.file_watch.line", uid="trigger:linux:d544c39d-52bd-4dd0-baa3-c00e45e66363:851746484aeb7ece7f72c1e68c2d91a3") in db.
2018-07-19 08:00:05,690 DEBUG [-] Calling sensor "add_trigger" method (trigger.type=linux.file_watch.line)
2018-07-19 08:00:05,691 DEBUG [-] adding tail /tmp/st2_test
2018-07-19 08:00:05,693 INFO [-] Opening tail /tmp/st2_test
2018-07-19 08:00:05,694 ERROR [-] Traceback (most recent call last):
2018-07-19 08:00:05,704 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 114, in wait
2018-07-19 08:00:05,706 ERROR [-] listener.cb(fileno)
2018-07-19 08:00:05,706 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 218, in main
2018-07-19 08:00:05,708 ERROR [-] result = function(*args, **kwargs)
2018-07-19 08:00:05,708 ERROR [-] File "/data/stanley/st2common/st2common/services/triggerwatcher.py", line 142, in _load_triggers_from_db
2018-07-19 08:00:05,709 ERROR [-] self._handlers[publishers.CREATE_RK](trigger)
2018-07-19 08:00:05,710 ERROR [-] File "/data/stanley/st2reactor/st2reactor/container/sensor_wrapper.py", line 277, in _handle_create_trigger
2018-07-19 08:00:05,711 ERROR [-] self._sensor_instance.add_trigger(trigger=trigger)
2018-07-19 08:00:05,712 ERROR [-] File "/opt/stackstorm/packs/linux/sensors/file_watch_sensor.py", line 43, in add_trigger
2018-07-19 08:00:05,713 ERROR [-] self._tail.add_file(filepath=file_path, seek_to_end=seek_to_end)
2018-07-19 08:00:05,713 ERROR [-] File "/data/stanley/contrib/linux/sensors/tail.py", line 87, in add_file
2018-07-19 08:00:05,714 ERROR [-] self.tails[filepath] = self.open_tail(filepath, recursive, seek_to_end)
2018-07-19 08:00:05,715 ERROR [-] File "/data/stanley/contrib/linux/sensors/tail.py", line 184, in open_tail
2018-07-19 08:00:05,716 ERROR [-] fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
2018-07-19 08:00:05,716 ERROR [-] File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/eventlet/green/os.py", line 109, in open
2018-07-19 08:00:05,717 ERROR [-] fd = __original_open__(file, flags, mode)
2018-07-19 08:00:05,718 ERROR [-] OSError: [Errno 2] No such file or directory: '/tmp/st2_test'
2018-07-19 08:00:05,719 ERROR [-] Removing descriptor: 5
2018-07-19 08:00:05,719 INFO [-] Invoking cleanup on sensor
2018-07-19 08:00:05,720 DEBUG [-] stopping tails
2018-07-19 08:00:05,721 DEBUG [-] remove tails And also if the file exists (sensor just immediately exits): (virtualenv) vagrant@ubuntu-xenial:/data/stanley$ /data/stanley/virtualenv/bin/python2.7 /data/stanley/st2reactor/st2reactor/container/sensor_wrapper.py --pack=linux --file-path=/opt/stackstorm/packs/linux/sensors/file_watch_sensor.py --class-name=FileWatchSensor --trigger-type-refs=linux.file_watch.line --parent-args='["--debug"]'
2018-07-19 08:01:21,288 INFO [-] No config found for sensor "FileWatchSensor"
2018-07-19 08:01:21,289 INFO [-] Watcher started
2018-07-19 08:01:21,289 INFO [-] Running sensor initialization code
2018-07-19 08:01:21,290 INFO [-] Running sensor in passive mode
2018-07-19 08:01:21,291 DEBUG [-] starting tails
2018-07-19 08:01:21,300 INFO [-] Stopping trigger watcher
2018-07-19 08:01:21,301 INFO [-] Invoking cleanup on sensor
2018-07-19 08:01:21,302 DEBUG [-] stopping tails
2018-07-19 08:01:21,302 DEBUG [-] remove tails |
And the tests itself seem to be testing just the tail implementation (which is fine) and not the sensor (which doesn't work right now). It just means we need additional tests for the sensor. And as mentioned above, we also need to correctly handle scenario where the file doesn't exist yet when the tail watcher is started. |
And remove logshipper and pyinotify dependencies
Split out from #4145.
This PR pulls in the
tail
implementation from the logshipper package, switches that over to using my fork of thewatchdog3
project instead of the (unmaintained)pyinotify
package, and re-implements theFileWatchSensor
to use the new tail implementation.Fixes #3934.
TODO:
Tests forFileWatchSensor