Replacing FUSE with a notification system #339

silverdaz · 2018-08-21T15:54:29Z

This PR replaces the FUSE program with a notification system. We also upgraded the authentication mechanism to support multiple user ids.

It has the following benefits:

- No mountpoint from /lega/<user> to /ega/inbox/<user>. Previously, if something went wrong with the mount, the uploaded file would not end up in the inbox.
- No need to use cron to clean up the mountpoints (since we chroot the user in its home directory, umount would fail).
- No FUSE library, no device mapping /dev/fuse, no FUSE python code. The file system calls no longer go through the library, so we gain performance.
- No Swarm limitation.

Regarding the authentication mechanism:

- No common user lega to impersonate all logged-in users. Instead, each user has its own id. Each site administration can configure in which range the user id lands (by shifting the user id). The fake CentralEGA is updated accordingly.
- NSS+PAM code uses SQLite 3.24 (the latest) as a cache.
- We added a more verbose description of the NSS+PAM configuration.
- Better logic for expiration of the cache entries.
- Logs are sent to syslog, and progress is sent to stderr (if compiled with the debug flag).
- No JQ+JV dependency for JSON parsing. Instead, we use lightweight embedded code.
- The auth code has been tested with Aspera, OpenSSH SFTP server and ProFTPd.
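
As a rough illustration of the uid shifting and cache expiration described above, here is a minimal Python sketch. The real NSS+PAM module is compiled code, so this only mirrors the logic; the table layout, the `UID_SHIFT` and `CACHE_TTL` values, and the function names are assumptions made for the example, not what the PR implements.

```python
import sqlite3
import time

UID_SHIFT = 10000  # hypothetical site-specific offset added to each CentralEGA user id
CACHE_TTL = 3600   # hypothetical lifetime of a cache entry, in seconds


def open_cache(path=":memory:"):
    """Create the cache table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        " username TEXT PRIMARY KEY,"
        " uid INTEGER NOT NULL,"
        " expires_at REAL NOT NULL)"
    )
    return conn


def store_user(conn, username, cega_uid, now=None):
    """Cache a user under the shifted local uid, with an expiration time."""
    now = time.time() if now is None else now
    local_uid = cega_uid + UID_SHIFT
    conn.execute(
        "INSERT OR REPLACE INTO users (username, uid, expires_at) VALUES (?, ?, ?)",
        (username, local_uid, now + CACHE_TTL),
    )
    conn.commit()
    return local_uid


def lookup_user(conn, username, now=None):
    """Return the cached uid, or None if the entry is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT uid, expires_at FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return None
    uid, expires_at = row
    if expires_at <= now:
        # Expired: drop the entry so the caller refreshes it from CentralEGA.
        conn.execute("DELETE FROM users WHERE username = ?", (username,))
        conn.commit()
        return None
    return uid
```

A lookup that returns `None` would be the point at which the module queries CentralEGA again and calls `store_user` with the fresh answer.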

Regarding the SFTP server and the notification system:

- We use OpenSSH version 7.7p1 (the latest).
- We lock out SSH connections and only allow SFTP connections (on port 9000).
- We added a hook that sends a notification to a listener when a file is (re)uploaded. If the connection to the listener is not established, it's business as usual and nothing else happens. If it is established, the name of the user and the path of the uploaded file are sent to the listener, which forwards the message to the local broker. Technically, a stream of (username, filepath) pairs is sent over a TCP connection. The listener is an async TCP server. We use out-of-band messages and did not use the TCP_NODELAY option.
- We bound the listener to the lo interface (no external access).
- One connection is established per SFTP connection.
- The solution chroots the user in its home directory, but this is configurable.
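
To make the listener side more concrete, here is a minimal asyncio sketch of a loopback-only TCP server that reads (username, filepath) pairs and hands them to a forwarding callback. The wire framing (one tab-separated pair per line), the JSON keys, and all function names are assumptions for illustration; the PR does not spell out the actual format here.

```python
import asyncio
import json

SEPARATOR = "\t"  # assumed framing: one "username<TAB>filepath" pair per line


def parse_line(line: bytes):
    """Split one received line into (username, filepath), or return None."""
    text = line.decode("utf-8").rstrip("\n")
    username, sep, filepath = text.partition(SEPARATOR)
    if not sep or not username or not filepath:
        return None
    return username, filepath


def make_handler(forward):
    """Build a connection handler that passes each parsed pair to `forward`."""
    async def handle(reader, writer):
        # One TCP connection per SFTP session; read pairs until the peer closes.
        while True:
            line = await reader.readline()
            if not line:
                break
            pair = parse_line(line)
            if pair is not None:
                username, filepath = pair
                forward(json.dumps({"user": username, "filepath": filepath}))
        writer.close()
        await writer.wait_closed()
    return handle


async def serve(forward, host="127.0.0.1", port=0):
    """Start the listener on the loopback interface only (no external access)."""
    return await asyncio.start_server(make_handler(forward), host, port)
```

Here `forward` stands in for whatever publishes the message to the local broker; passing `print` is enough to watch notifications arrive during a manual test.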

Moreover, this PR solves Issue #329 by making the postgres data persistent in its own volume. After a reboot, the database should pick up where it left off.

Finally, the lega.yml is slightly updated in order to allow scalability of the ingest and verify workers.

Note: we could send the message directly to the local broker instead of the listener, but using pika and conf.ini in the listener was so much simpler than using a C-based AMQPS client. After rudimentary tests, it seems performant enough.
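
For illustration, the forwarding step in the listener could look roughly like the sketch below. It only assumes an object exposing `basic_publish(exchange=..., routing_key=..., body=...)`, which is how a pika channel is used; the exchange name, routing key, JSON keys, and the function name are placeholders, not values taken from this PR.

```python
import json


def publish_upload(channel, username, filepath,
                   exchange="lega", routing_key="files.inbox"):
    """Publish one upload notification through an AMQP channel.

    `channel` is expected to behave like a pika channel, i.e. to offer
    basic_publish(exchange=..., routing_key=..., body=...).
    The exchange and routing key defaults are placeholders.
    """
    body = json.dumps({"user": username, "filepath": filepath})
    channel.basic_publish(exchange=exchange, routing_key=routing_key, body=body)
    return body
```

With pika, such a channel would typically come from `pika.BlockingConnection(pika.URLParameters(url)).channel()`, with the broker URL read from conf.ini.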

blankdots · 2018-08-22T05:35:04Z

docker/bootstrap/lega.sh

@@ -204,87 +150,48 @@ services:
      - POSTGRES_USER=${DB_USER}
      - POSTGRES_PASSWORD=${DB_PASSWORD}
      - POSTGRES_DB=lega
+      - PGDATA=/ega/data
    hostname: db
    container_name: db
    image: postgres:latest


We should pin postgres to 9.6; we even mention this in the docs: https://localega.readthedocs.io/en/latest/db.html?highlight=9.6
Also replace the other mentions of postgres:latest.


silverdaz · 2018-08-22T16:54:10Z

This PR also adds a stage (stages run in parallel according to Stefan) where we run the simple Makefile. That makefile has been adapted so that make is enough. No need to update any variable first. It creates a user, a random file, encrypts it, uploads it and triggers a fake submission. FILESIZE can be specified as an argument to the make command to change the size. The default is 10MB. I tried with 1GB too.

I ran some numbers:

| Inbox | Upload speed | Memory consumption |
|---|---|---|
| OpenSSH | 47MB/s | 21.3MB to 38MB |
| Apache Mina | 33MB/s | 344MB to 346MB |

I don't know how you get 1GB for Apache Mina, it seems to be stable roughly around 345MB.

The upload speed is not very good, mostly because we are using the default driver for the docker volumes (which are... on files). Normally, the speed should increase with better volume solutions. Let us know what you get in your respective deployments. OpenSSH seems roughly 40% faster than Apache Mina (47MB/s vs 33MB/s).

blankdots · 2018-08-23T05:31:40Z

@silverdaz it would be awesome if the history did not contain so many commits with "weird" (but funny) messages. The last 16 commits can and should be squashed.

> This PR also adds a stage (stages run in parallel according to Stefan) where we run the simple Makefile.

It is according to: https://docs.travis-ci.com/user/build-stages#what-are-build-stages

> I don't know how you get 1GB for Apache Mina, it seems to be stable roughly around 345MB.

Got the information from Openshift (also my instance has been up for 22 days and this grew over time):

I would like to monitor the inbox solutions over more than 10 days of uptime. I don't know the uptime behind the "Memory consumption" numbers above, and load tests should probably be implemented to stress those solutions. But again, this is not the subject of this issue, and scalability is not addressed at the moment, as we have no indication of any usage expectations.

> Moreover, this PR solves Issue #329 ..

We observed this behaviour in the Kubernetes deployment as well (with an attached volume, even though the data is still there, the ids get overwritten), and we would be more comfortable saying it is solved if we implemented some sort of UUID generation (some pitfalls of this are explained here: https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439).


blankdots

This addresses some of the discussions we had with using different FTP servers with the inbox, related to User story: #4 - using two different types of inboxes allows us to cover a wider range of servers.

I do not believe that #329 is addressed by this PR, as it should be investigated further.

Additionally, it bumps the versions of some Python dependencies and includes other fixes.


Frédéric Haziza and others added 19 commits August 20, 2018 14:01

Upgrading the use CEGA user ids and no more fuse

ffc1537

Moving deployments/docker to docker

5bb0cfe

async db was removed

50761d8

Awaiting the sleep coroutine

7d10766

Trying to fix the eureka tests

c6dcbf3

Cleaning

79fb52a

Fixing requirements in docker images

864ae73

Updating the docs

91d9885

Make ingest+verify scalable

29ef88f

Eureka interval renaming to match database naming

9d53b8a

Unnecessary switch removed

5f3f6e9

fix for unit tests to fit new wrapper

1de5d9e

Renaming inbox script into notifications

e4ac6b2

test notifications

7dfb128

Colorama not used anymore

7714258

new unit tests

52981ce

multi part message

0fb4145

update docs

d5aafcd

update travis folders

21f90ed

silverdaz added this to the Sprint 34 milestone Aug 21, 2018

silverdaz self-assigned this Aug 21, 2018

silverdaz requested a review from blankdots August 21, 2018 15:54

This was referenced Aug 21, 2018

Database IDs and Object Storage IDs may run out of sync #329

Open

Implement cronjob to palliate Swarm issues with devices. #312

Closed

Frédéric Haziza added 2 commits August 21, 2018 18:04

Travis update

004f24b

Travis update

a1e2a61

blankdots reviewed Aug 22, 2018


Frédéric Haziza added 3 commits August 22, 2018 12:20

Adding the --inbox switch back in, temporarily

93e2e9a

Trying to _not_ install the python dependencies on the travis host

47d6c22

Adding one more stage

05f0193

Frédéric Haziza added 2 commits August 22, 2018 12:49

Fair enough, I thought make up would create the cega network but it did not.

7223a6b

Locking postgres version to 9.6 and not "latest"

34e5064

Makefile update.

399751c

No need to update MAIN_REPO. Using dd for file creation. FILESIZE is a variable. Adding a target to check the MQ messages, for successful ingestion.

silverdaz force-pushed the feature/no-fuse branch from ef14a43 to 399751c Compare August 23, 2018 06:15

blankdots approved these changes Aug 23, 2018


blankdots merged commit d99640e into dev Aug 23, 2018

blankdots deleted the feature/no-fuse branch August 23, 2018 09:44

viklund pushed a commit that referenced this pull request Nov 22, 2018

Merge pull request #339 from NBISweden/feature/no-fuse

7bc93ac

Replacing FUSE with a notification system
