Improve the out-of-box experience for scientists #4964
Replies: 4 comments
Will new projects using BOINC, in the future, work only (and automatically) with Science United?
I totally agree there should be simple, easy ways to submit a single job and multiple jobs, but first of all Mary will need an easy way to set up a project server (locally or, if she chooses, remotely in the cloud). Mary wants decent documentation, preferably a high-quality book that guides her through all the steps, provides lots of examples and recipes for cooking up her server and project, and has chapters on creating her own tasks for different platforms. Mary is used to seeing how easy it is for her colleague to spawn new tasks on AWS or similar services, so she would like to do the same, only better and more easily.

I like Docker, but I'm wondering whether running Docker images inside VirtualBox is very efficient. Simply running a 'Hello World' takes a vast amount of resources on the client side: the client must download many megabytes that don't contribute to the task at hand, many CPU cycles are wasted on emulating and booting the ISO, and disk space is wasted. And VirtualBox is not suitable for GPU computing (which should also be made easy for Mary). Yes, it's fairly easy, but efficiency should matter too. Why not also look into running tasks under WSL, or simply spawning the Docker images directly on the client machine (if it's Linux-based)?
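To make the suggestion concrete, here is a rough Python sketch of how a client could pick the lightest runtime available on a host. The function `choose_runtime`, its inputs, and the returned labels are hypothetical and illustrate the idea only; they are not BOINC's actual logic.

```python
def choose_runtime(os_name: str, has_docker: bool, has_wsl: bool, needs_gpu: bool) -> str:
    """Pick a way to run a containerized task on a volunteer host.

    Hypothetical policy for illustration: prefer running the image
    directly, and fall back to VirtualBox only when nothing lighter exists.
    """
    if os_name == "linux" and has_docker:
        return "docker"       # run the image directly, no VM to boot
    if os_name == "windows" and has_wsl:
        return "wsl"          # Linux environment provided by Windows instead of VirtualBox
    if needs_gpu:
        return "unsupported"  # VirtualBox is a poor fit for GPU jobs
    return "virtualbox"       # current approach: Docker image inside a VM
```

The point of ordering the checks this way is that the heavy VM path becomes the last resort rather than the default.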
There was a time (about 8 years ago) when the Debian package that created BOINC project servers was not completely useless. I would very much love to see this revived. But I would also love someone else to address this :)
Converting this to Conversation since it's a big topic to discuss before creating any particular tasks to be implemented.
Suppose a scientist (let’s call her Mary) needs lots of high-throughput computing and can’t afford the usual sources. Let’s assume that Mary hears about volunteer computing and BOINC, and decides to investigate it. Mary will use BOINC only if this initial “out-of-box experience” (OOBE) is positive, i.e. she quickly tries out BOINC and is convinced that it works, that it’s useful to her, and that she wants to use it going forward. The ideal scenario is something like:
The current BOINC OOBE doesn’t achieve this. The main BOINC server documentation (https://boinc.berkeley.edu/trac/wiki/ProjectMain) is a sprawling mess. Marius’ Docker work (https://github.com/marius311/boinc-server-docker/blob/master/docs/cookbook.md) is a big step in the right direction, but more is needed to complete the above scenario.
BOINC competes with systems like HTCondor and AWS. We should study the OOBEs of these systems, borrow their good ideas, and make sure that we’re competitive. See, for example, https://www.youtube.com/channel/UCd1UBXmZIgB4p85t2tu-gLw
The goal
The following is a sketch of what I think the OOBE should be like. The target configuration involves:
Setting up the server host
This involves downloading a .gz file containing the BOINC server software and some VM and Docker images. You then run a script that asks one or two questions and then creates and runs a server (as Docker processes). It creates a read-me file saying:
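A minimal sketch of what that setup script could do, in Python. It assumes the downloaded archive unpacks to a directory containing a Docker Compose file for the server containers; the question wording, the variable names PROJECT_NAME and URL_BASE, and the functions `setup_plan` and `main` are all hypothetical.

```python
import os
import subprocess


def setup_plan(project_name: str, url_base: str) -> tuple[dict[str, str], list[str]]:
    """Turn the setup script's answers into environment settings and a command.

    PROJECT_NAME and URL_BASE are illustrative variable names; a real
    installer would use whatever its Docker Compose file expects.
    """
    env = {"PROJECT_NAME": project_name, "URL_BASE": url_base.rstrip("/")}
    cmd = ["docker", "compose", "up", "-d"]  # start all server containers in the background
    return env, cmd


def main() -> None:
    # The "one or two questions" asked during setup.
    name = input("Project name: ").strip()
    url = input("Public URL of this host (e.g. http://203.0.113.5): ").strip()
    env, cmd = setup_plan(name, url)
    # docker compose picks up the compose file in the current directory.
    subprocess.run(cmd, env={**os.environ, **env}, check=True)
```

Calling `main()` asks the questions and starts the containers; `setup_plan` is kept separate so the answers-to-configuration step can be checked without Docker.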
Admin functions (start/stop server, create accounts for job submitters) are done through a web interface. After the initial setup there should be no need to log in.
Setting up a job submission host
This involves installing a package that contains job submission scripts (see below) but not the BOINC server.
Running jobs
We should handle at least two cases:
In each case, let’s assume that all files for an app are stored in a directory.
To submit a job:
boinc_run --app app_dir_path
Run this in a directory containing input files. It makes a job with those input files, running the given app. The file “cmdline”, if present, contains command-line args.
To run multiple jobs, create a directory for each job, and put input files there. Then do
boinc_run_jobs --app app_dir_path dir1 dir2 ...
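Both commands come down to turning directories into job descriptions. Here is a rough Python sketch; the dictionary layout, the names `describe_job` and `describe_batch`, and the rule that hidden files and “cmdline” itself are not treated as input files are assumptions, not something the proposal specifies.

```python
import shlex
from pathlib import Path


def describe_job(job_dir: str) -> dict:
    """Describe one job from the files in job_dir.

    Assumptions: every regular, non-hidden file other than "cmdline" is an
    input file, and "cmdline" (if present) holds the command-line args.
    """
    path = Path(job_dir)
    if not path.is_dir():
        raise NotADirectoryError(job_dir)
    inputs = sorted(
        p.name for p in path.iterdir()
        if p.is_file() and p.name != "cmdline" and not p.name.startswith(".")
    )
    cmdline_file = path / "cmdline"
    # shlex.split honors shell-style quoting inside the cmdline file.
    args = shlex.split(cmdline_file.read_text()) if cmdline_file.is_file() else []
    return {"dir": str(path.resolve()), "input_files": inputs, "args": args}


def describe_batch(app_dir: str, job_dirs: list[str]) -> dict:
    """One batch for boinc_run_jobs; boinc_run is describe_batch(app_dir, ["."])."""
    return {
        "app": str(Path(app_dir).resolve()),
        "jobs": [describe_job(d) for d in job_dirs],
    }
```

Treating the single-job command as a one-directory batch keeps both commands on the same code path.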
To see the status of the job(s) started in the current directory:
boinc_status
If a job failed, this shows information such as its stderr output.
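A small Python sketch of how the status report could be rendered. The record layout (`name`, `state`, `stderr`) and the function `format_status` are hypothetical; a real tool would get the records from the server through the remote job submission mechanism.

```python
from collections import Counter


def format_status(jobs: list[dict]) -> str:
    """Render a boinc_status-style report.

    `jobs` is a hypothetical list of records such as
    {"name": "job1", "state": "failed", "stderr": "..."}.
    """
    if not jobs:
        return "no jobs started in this directory"
    # One summary line, e.g. "2 done, 1 failed".
    counts = Counter(job["state"] for job in jobs)
    lines = [", ".join(f"{n} {state}" for state, n in sorted(counts.items()))]
    for job in jobs:
        if job["state"] == "failed":
            # Show the last few lines of stderr so the user can see why it failed.
            tail = job.get("stderr", "").strip().splitlines()[-5:]
            lines.append(f"{job['name']} failed:")
            lines.extend("    " + line for line in tail)
    return "\n".join(lines)
```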
To abort jobs started in the current directory:
boinc_abort
To fetch the output files of completed jobs started in the current directory:
boinc_fetch
Note: fancier features can be added to this, but the basic features are ultra-simple: no XML editing, no estimating job sizes, etc.
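boinc_status, boinc_abort and boinc_fetch all act on “jobs started in the current directory”, so the submission tools have to remember which jobs came from where. One way to do that, sketched in Python under the assumption that the record lives in the per-user ~/.boinc directory, is a JSON table keyed by absolute directory path. The file name jobs.json and the functions `record_jobs` and `jobs_started_in` are hypothetical.

```python
import json
from pathlib import Path


def _load(state_dir: Path) -> dict[str, list[str]]:
    path = state_dir / "jobs.json"  # hypothetical file name
    return json.loads(path.read_text()) if path.exists() else {}


def record_jobs(state_dir: Path, submit_dir: str, job_names: list[str]) -> None:
    """Remember that job_names were started from submit_dir."""
    table = _load(state_dir)
    key = str(Path(submit_dir).resolve())  # absolute path, so "." and the full path match
    table.setdefault(key, []).extend(job_names)
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / "jobs.json").write_text(json.dumps(table, indent=2))


def jobs_started_in(state_dir: Path, submit_dir: str) -> list[str]:
    """Jobs that boinc_status, boinc_abort and boinc_fetch should act on."""
    return _load(state_dir).get(str(Path(submit_dir).resolve()), [])
```

In use, boinc_run would call `record_jobs(Path.home() / ".boinc", ".", names)` after submitting, and the other commands would start from `jobs_started_in(Path.home() / ".boinc", ".")`.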
Implementation
The implementation shouldn’t be that hard. It’s based on technology we already have: boinc-server-docker and boinc2docker, and the remote job and file management mechanisms.
The server host setup script creates a BOINC project running in Docker containers, equipped with the VBox-based universal app and some standard Docker containers (e.g. for Python apps).
On the submission host, each user has a directory ~/.boinc containing various configuration and status files. A file ~/.boinc/apps contains a list of applications that have been used. Each one is identified by a directory path. We keep track of the modification times of the directory and the files in it, and we maintain a Docker layer corresponding to the application.
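A minimal Python sketch of the change detection this implies. Storing ~/.boinc/apps as JSON of the form {app path: modification times} is an assumption, as are the names `app_fingerprint` and `app_changed`; rebuilding the Docker layer itself is left out.

```python
import json
from pathlib import Path


def app_fingerprint(app_dir: str) -> dict[str, float]:
    """Modification times of the app directory and each file in it.

    The directory's own mtime changes when files are added, removed or
    renamed; the per-file mtimes catch edits to existing files.
    """
    root = Path(app_dir).resolve()
    stamps = {".": root.stat().st_mtime}
    for p in sorted(root.iterdir()):
        if p.is_file():
            stamps[p.name] = p.stat().st_mtime
    return stamps


def app_changed(apps_file: Path, app_dir: str) -> bool:
    """Return True if app_dir is new or modified since it was last used.

    apps_file stands in for ~/.boinc/apps. A True result is where the
    tool would rebuild the app's Docker layer.
    """
    key = str(Path(app_dir).resolve())
    apps = json.loads(apps_file.read_text()) if apps_file.exists() else {}
    current = app_fingerprint(app_dir)
    if apps.get(key) == current:
        return False
    apps[key] = current
    apps_file.parent.mkdir(parents=True, exist_ok=True)
    apps_file.write_text(json.dumps(apps, indent=2))
    return True
```

Comparing modification times is cheap and needs no hashing, at the cost of treating a touched but unchanged file as a change.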
The boinc_run command (a Python script) does the following:
boinc_status etc. use the remote job submission mechanism.
Computing resources
The scientist starts by running the BOINC client on one or more of their own computers (possibly Windows or Mac) and attaching them to the project.
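Attaching is usually done through the BOINC Manager, but with several machines it can be scripted with the boinccmd tool that ships with the client. A small Python sketch that builds the command; it assumes boinccmd is on the PATH, the client is running, and, for other machines, that they accept GUI RPC connections from this host. The function names are hypothetical, and the URL and account key in the test are placeholders.

```python
import subprocess
from typing import Optional


def attach_command(project_url: str, account_key: str,
                   host: Optional[str] = None,
                   gui_rpc_password: Optional[str] = None) -> list[str]:
    """Build a boinccmd call that attaches a client to a project."""
    cmd = ["boinccmd"]
    if host:  # control a client on another machine via GUI RPC
        cmd += ["--host", host]
    if gui_rpc_password:
        cmd += ["--passwd", gui_rpc_password]
    cmd += ["--project_attach", project_url, account_key]
    return cmd


def attach(project_url: str, account_key: str, host: Optional[str] = None,
           gui_rpc_password: Optional[str] = None) -> None:
    subprocess.run(attach_command(project_url, account_key, host, gui_rpc_password),
                   check=True)
```

Looping `attach(...)` over a list of host names covers the “one or more of their own computers” case in one step.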
When things are working and they’re ready to scale up, they register with Science United, supplying their keywords. The vetting process may take a day or two. This will typically provide them with several hundred hosts.
Another possibility is to allow Science United users to register as “testers”, and to add a mechanism where projects can register as “test projects” on SU, with no vetting. Such projects would be allowed only to use VM apps with no network access (we’d need to add a mechanism for this). They’d get some number of hosts (50-100) for a few days.
Restructuring server documentation
Once we have this working, we need to reorganize the server docs in such a way that scientists are initially steered toward the OOBE described here, but can still access lower-level info.