Dynamic bindings for workers #31
Why isn't the closure in your example a good idea? I guess you are looking to bypass the overhead of creating closures/bindings by executing tasks within the same bindings, with those bindings being determined after the kernel is created? I don't think there would be much (if any) performance benefit, and it suffers from modularity problems, but you can try it. Note that any resurrected worker threads will have lost the new bindings.
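(A sketch of what "you can try" might have looked like; the original snippet was lost in extraction. `*my-special*` and the value are placeholders, and this assumes the kernel was created with a `:bindings` entry for `*my-special*` so that each worker has its own thread-local binding; otherwise the `setf` just changes the global value.)

```lisp
(defvar *my-special* nil)

;; push a value into each worker after the kernel exists by submitting
;; one setter task per worker; this relies on each idle worker picking
;; up exactly one of these tasks, which is not strictly guaranteed
(let* ((n (lparallel:kernel-worker-count))
       (ch (lparallel:make-channel)))
  (dotimes (i n)
    (lparallel:submit-task ch (lambda () (setf *my-special* 42))))
  (dotimes (i n)
    (lparallel:receive-result ch)))
```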
Thank you very much for the quick answer!
Mostly because of performance: imagine setting up the bindings every time a task runs.
Hmmm, not quite. Imagine multiple independent threads, e.g.

```lisp
(let ((*var* (random 5)))   ; some thread-specific data
  (pmapcar (lambda (x) (+ *var* x)) input1))
```

The above should preferably "DWIM": the workers should see the caller's binding of `*var*`. Specifying the specials to relay either at `make-kernel` time or per call would be fine with me.
There are at least three ways to transfer the values of current dynamic bindings to tasks. The easiest is to capture the values in a closure that re-establishes them around the task, for example:
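(The original snippet was lost in extraction; this is a sketch under that reading. The macro name `capture-bindings` is invented.)

```lisp
(defmacro capture-bindings ((&rest specials) fn-form)
  "Return a function that calls the function denoted by FN-FORM with
SPECIALS rebound to the values they have right now."
  (let ((vals (loop for s in specials collect (gensym (symbol-name s))))
        (fn (gensym "FN")))
    `(let (,@(mapcar #'list vals specials)
           (,fn ,fn-form))
       (lambda (&rest args)
         (progv ',specials (list ,@vals)
           (apply ,fn args))))))
```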
with the usage being
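(Hypothetical usage; `load-my-data`, `effective-time`, and `log-lines` are taken from the log-file example later in the thread.)

```lisp
(let ((*data-set* (load-my-data data)))
  (lparallel:pmapcar (capture-bindings (*data-set*) #'effective-time)
                     (log-lines *data-set*)))
```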
That is the most modular solution. In the unlikely scenario that the overhead of creating a dynamic binding is significant (does that really happen in practice?), you can go to the other extreme by using global variables, which are both maximally efficient and maximally unmodular.
There is also an intermediate solution between these two extremes. You can wrap the unmodular code inside a macro:
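(Again a sketch, since the original snippet was lost; `*task-data*` and `with-task-data` are invented names. The global is set for the duration of the body and restored afterwards, so workers that don't rebind it see the value.)

```lisp
(defvar *task-data* nil)

(defmacro with-task-data (value &body body)
  "Set the global *TASK-DATA* around BODY and restore the old value
afterwards.  Unmodular underneath, but call sites stay tidy."
  (let ((old (gensym "OLD")))
    `(let ((,old *task-data*))
       (unwind-protect
            (progn (setf *task-data* ,value) ,@body)
         (setf *task-data* ,old)))))
```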
with the usage being
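(Hypothetical usage; `process-line` and `lines` are invented placeholders.)

```lisp
(with-task-data (load-my-data data)
  ;; workers read the global *task-data* directly
  (lparallel:pmapcar (lambda (line) (process-line line *task-data*))
                     lines))
```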
Is there really a case for which all three of these solutions are inadequate? I don't mean to sound hard; the way to make your case is to show a real situation where using global variables (as above) gives a meaningful improvement over the more modular alternatives. A code walker that tweaks tasks containing specials is an interesting idea (even if you mentioned it in jest), but remember that the main thread may be expected to have different dynamic bindings than the worker threads.
Well, perhaps describing my use case helps a bit. I've got some data on disk; a request coming in via hunchentoot causes that data to be loaded and then processed. To reduce the answer latency I'd like to do some of the processing in parallel. Now, some of the tasks need to see the "whole data" (and some only "slices") to know how to process the given job [1]; in the single-threaded case this was communicated via special variables.

[1] Lots of logfiles. Sorting them, e.g., requires knowing the servers' time offsets at the given point in time, so that log lines can be associated across multiple servers. So there's a special that points to the whole dataset, and the predicates refer to it.
Rather than having specials in predicates etc., perhaps you can just make closures and avoid the whole issue? That is, instead of
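(A hypothetical stand-in for the lost snippet; `interesting-line-p`, `matches-filter-p`, and `*filter*` are invented names.)

```lisp
;; a predicate that reads a special
(defun interesting-line-p (line)
  (matches-filter-p *filter* line))
```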
you have
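(the closure version, same invented names)

```lisp
;; the filter is an explicit argument, captured lexically
(defun make-interesting-line-p (filter)
  (lambda (line)
    (matches-filter-p filter line)))
```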
Going back to the proposed change, did I understand it correctly? You want the bindings to be established around the worker loop, with the workers restarted whenever the bindings change?
I already considered whether I could switch to closures, but it's not that easy.

No, not around the whole worker loop.
No restarting required. When a new task is created, it would need to pass the specials' symbols and their values along with the task.
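(A minimal sketch of that idea, not lparallel's actual internals: capture the chosen specials' symbols and current values at submit time, and re-establish them around the task in the worker via `progv`. The function name is invented.)

```lisp
(defun wrap-task-with-specials (fn specials)
  "Return a task function that runs FN under the caller's current values
of SPECIALS (a list of symbols)."
  (let ((values (mapcar #'symbol-value specials)))
    (lambda (&rest args)
      (progv specials values
        (apply fn args)))))
```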
Please stand by, I'll post a POC pull request later; perhaps that shows better what I'm trying to achieve.
See phmarek@1989186 for a first draft (untested, just for discussion).
I thought the purpose here was to amortize the binding of dynamic variables: performing a binding once, and then executing many tasks under that binding. That's what the :bindings argument to make-kernel does. Maybe you misunderstood my suggestion. Could we put aside this comparison between closures and special variables for the moment?
Yeah, I guess that's what this discussion boils down to.
Not sure about that; I'm seeing specials more as per-thread global variables, and their implicit benefit is not having to pass them along the whole call chain. I'll give your suggestion a try.
Well, as I wrote above: if you want to maintain modularity and you don't want closures-without-specials or closures-with-specials, then what remains is the unmodular route of setting the specials inside the worker threads themselves.

It's safe because a single thread won't be executing two tasks at once. If it matters, you can save/restore the original values of the specials, leaving the worker thread the way it was. Of course a simple macro would fix the monotony of writing that out each time.
Well, excuse me if I bother you (and/or sound dumb); perhaps for a better explanation, here's an illustration.

```lisp
(defvar *data-set*)

(defclass data-set ()
  ((server-names :type list :reader server-names)
   (server-time-delta :type hash-table :reader server-time-delta)
   (log-lines :type list :reader log-lines)))

(defclass log-line ()
  ((timestamp :type fixnum :reader timestamp)
   (server :type string :reader server)
   (tags :reader tags)
   (content :reader content)))

(lparallel:make-kernel 5 :bindings '(*data-set*)
                       ; :submit-time-bindings '(*data-set*)
                       )

(defun effective-time (line)
  (- (timestamp line)
     (gethash (server line)
              (server-time-delta *data-set*))))

(hunchentoot:define-easy-handler (foo :uri "/foo") (data from to filter)
  (let ((*data-set* (load-my-data data))
        (ch (lparallel:make-channel)))
    (let ((result (lparallel:premove-if-not (compile-filter filter)
                                            (log-lines *data-set*))))
      (loop for line in (lparallel:psort result
                                         (lambda (a b)   ; sort by effective time
                                           (< (effective-time a)
                                              (effective-time b))))
            ;; ... HTML output ...
            ))))
```

How would you make that run with the approaches you suggested? I can see how the closure version would look, but it doesn't seem nice to me; neither does the signal handler version... If we want to cut out the per-task binding overhead: if there were a way to get the calling thread, and to ask for a given thread's special variable bindings, then we could have a simple function that gets called from the worker threads, once. I guess I'm running around in circles... I'll do some benchmarks and get back to you.
To the issue of performance, it's difficult to believe that dynamically binding a variable has any cost significance compared to what goes on in a web server or most any other application. To the issue of looking "nice", lisp lets you abstract as far as you like. For example,
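(The example itself was lost in extraction; here's a sketch of one way the abstraction could look. `with-data-set` is an invented name; `load-my-data` and `*data-set*` come from the illustration above. This only abstracts the binding in the calling thread and still assumes one of the transfer approaches discussed earlier for the workers.)

```lisp
(defmacro with-data-set ((data) &body body)
  "Load DATA and establish *DATA-SET* around BODY."
  `(let ((*data-set* (load-my-data ,data)))
     ,@body))

;; the handler body then reads:
;; (with-data-set (data)
;;   (lparallel:premove-if-not (compile-filter filter)
;;                             (log-lines *data-set*)))
```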
or you could write with a functional-language accent,
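(Again a sketch with invented names, showing the same abstraction as a higher-order function rather than a macro; the same caveat about worker-side transfer applies.)

```lisp
(defun call-with-data-set (data thunk)
  (let ((*data-set* (load-my-data data)))
    (funcall thunk)))

;; usage:
;; (call-with-data-set data
;;   (lambda ()
;;     (lparallel:premove-if-not (compile-filter filter)
;;                               (log-lines *data-set*))))
```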
or you could use a ...
Given all the options available, including the aforementioned technique using the condition system (which wouldn't require rebinding the psort predicate, not that that matters), I don't think adding overhead to task creation and execution would be justified. I want lparallel to remain lightweight in order to cover even the hard cases (which are hard because the functions to be parallelized are cost-wise comparable to task overhead). Also, garbage collectors are sworn enemies of parallelization, and increasing the size of tasks would only make the enemy stronger.
@phmarek After further considering special-variable-heavy cases like hunchentoot, I think you've convinced me that lparallel should provide such functionality. Please see b2e81a6. Your feedback would be valuable. Because it's implemented on top of the condition system, there's no additional overhead when the feature isn't used. Synopsis:
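(The synopsis code was stripped during extraction. Reconstructed from the examples further down, the user-visible pieces are a `with-dynamic-tasks` macro and a `dynamic-task` form; the exact semantics may have differed.)

```lisp
(defvar *foo*)
(defvar *bar*)

;; bindings established here are transferred to dynamic tasks created
;; within the body
(with-dynamic-tasks ((*foo* 3) (*bar* 100))
  (pmapcar (dynamic-task (x) (+ x *foo* *bar*))
           '(1 2 3)))
;; => (104 105 106), assuming the transferred bindings are visible
;;    inside the tasks
```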
The lambda counterpart is `dynamic-task`, used in the examples below.
Looks good so far... The only gripe I have is the following. With

```lisp
(with-dynamic-tasks ((*foo* 3) (*bar* 100))
  (pmapcar (dynamic-task (x)
             (declare (ignore x))
             sb-thread:*current-thread*)
           :parts 1
           '(1 2 3)))
```

I get the same thread object three times, so it's always the same thread that runs the dynamic tasks.

And there's something I don't understand yet... perhaps I'm doing something wrong, but the values from the inner `let` seem to be ignored:

```lisp
(with-dynamic-tasks ((*foo* 2) (*bar* 100))
  (pmapcar #'add-foo-bar
           (let ((*foo* 10) (*bar* 300))
             (pmapcar #'add-foo-bar
                      '(1)))))
```

gives `(205)`. Something like this can easily happen if some function hierarchy isn't aware of other functions using `with-dynamic-tasks`. Yes, I know that this risks a deadlock if too few threads are available... But yeah, from the API it looks fine.
The only way to avoid doing a binding inside each task is to break modularity, as in the global variable approach above.

Dynamic tasks need to communicate with their parent thread. The endpoints of that line of communication are, on one end, `with-dynamic-tasks`, and on the other, the dynamic tasks themselves.

There is a way to make this easier on the user, but it adds a little bit of task overhead: a special variable lookup. The trade-off is that the API is much cleaner -- in fact it's just a single special variable. See 9a0aac5. Usage is:
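(The usage snippet was lost; this is a hypothetical reconstruction. The variable name `*transferred-specials*` is invented and the actual name in 9a0aac5 may differ.)

```lisp
;; tell lparallel which specials to transfer from the submitting thread
;; into each task
(setf *transferred-specials* '(*data-set*))

(let ((*data-set* (load-my-data data)))
  ;; tasks now see the caller's *data-set*
  (lparallel:premove-if-not (compile-filter filter)
                            (log-lines *data-set*)))
```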
Unfortunately the special variable lookup is in the most profiled and most scrutinized code path of the entire library, so I'd have to run some tests before being comfortable with such a change.
I'm well aware that it needs to be done in each task - but the current approaches do it per invocation of the function passed in.

Please forget about the web server - that's just an initiator. So I still believe that transferring the bindings per task is the right approach; it would also reduce the impact of the per-invocation rebinding.

But going back to ...
I've pushed a tweak to the transfer-specials branch. Previously ...
Sorry about the delay - I'm a bit busy right now. I haven't forgotten, and this issue is still in an open tab ;) Thank you very much!
If you're still following, then the new feature will probably be more like the transfer-bindings branch. The API is:
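(The API snippet was lost; this is a hypothetical sketch of what a scoped transfer form might look like. The macro name `with-transferred-bindings` is invented and may not match the transfer-bindings branch.)

```lisp
(let ((*data-set* (load-my-data data)))
  (with-transferred-bindings (*data-set*)
    ;; tasks created in this scope see the caller's *data-set*
    (lparallel:premove-if-not (compile-filter filter)
                              (log-lines *data-set*))))
```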
The feature should be scoped like this, since an unscoped version adds overhead even to tasks not using the bindings. I think it would be unexpected if setting the value of a special variable caused a global slowdown. I also wondered if ...
I'm sorry about the long delay; I kept the browser tab open all the time, but there was too much going on. Please decide on one of the solutions (`with-dynamic-tasks` or the transfer-bindings approach), whichever you prefer.

Thank you very much for all your patience and time!
How about merging a solution to master, to get it into QL and "downstream"? Thanks!
I've got a need to provide some special variables to the threads.

These bindings are not simply global items, like `make-kernel`'s `:bindings` allows; they depend on the specific task that's being requested. The special variables are mostly the same every time, but their content depends on the caller.

While I could provide that via a closure, this doesn't sound like a good idea (mostly for performance reasons).

So I'd like to ask/discuss how to solve that. The `worker-loop` with the `exec-task/worker` function looks like a good place to do such bindings; one open question is whether to provide a list of specials to relay at `make-kernel` time, and/or whether the various functions (e.g. `premove-if-not`) should get an additional argument (`:relay-bindings`?).
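(A hypothetical illustration of the two shapes being asked about; neither a `:relay-specials` argument to `make-kernel` nor a `:relay-bindings` argument on the cognate functions exists as written here, the names just follow the text above.)

```lisp
;; a) declare once, at kernel creation time, which specials to relay
;;    from the submitting thread to the tasks
;; (lparallel:make-kernel 4 :relay-specials '(*data-set*))

;; b) relay the caller's current bindings per high-level call
;; (lparallel:premove-if-not (compile-filter filter)
;;                           (log-lines *data-set*)
;;                           :relay-bindings '(*data-set*))
```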