How notebooks will work in production #108
Hey Ryan!
All of this takes place in our AWS ECS cluster. core-service is currently deployed; ml-workers is not yet. Looking to deploy ml-workers either today or tomorrow. Let me know if you have any questions!
Thanks for the update Derek! I'm glad I asked ;) This is useful info. I'll take a look at ml-workers in more detail later to understand how it works and may get back to you with some questions. I'm loving the progress!
I've got a bit of code for creating templates from Jupyter notebook files, as well as filling in certain keywords. I'll add a pull request to https://github.com/cognoma/ml-workers once I've finished a couple of usage examples.
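To make the templating idea concrete, here is a minimal sketch of keyword filling on a notebook file. It is not the code from the pull request: the `{{ keyword }}` placeholder syntax, the function name `fill_notebook_template`, and the keyword names are all assumptions made for illustration. It relies only on the fact that an `.ipynb` file is JSON whose cells carry a `source` field (a string or a list of strings).

```python
import json


def fill_notebook_template(notebook: dict, values: dict) -> dict:
    """Return a copy of a notebook dict with {{ keyword }} placeholders filled in.

    The "{{ keyword }}" syntax is a hypothetical convention for this sketch.
    """
    filled = json.loads(json.dumps(notebook))  # deep copy via a JSON round-trip
    for cell in filled.get("cells", []):
        source = cell.get("source", "")
        # A cell's source may be a single string or a list of line strings.
        is_list = isinstance(source, list)
        text = "".join(source) if is_list else source
        for key, value in values.items():
            text = text.replace("{{ " + key + " }}", str(value))
        cell["source"] = text.splitlines(keepends=True) if is_list else text
    return filled


# Illustrative template; in practice it would come from json.load() on an .ipynb file.
template = {
    "cells": [
        {
            "cell_type": "code",
            "source": ["gene_id = {{ gene_id }}\n", "diseases = {{ diseases }}\n"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}
filled = fill_notebook_template(template, {"gene_id": 7157, "diseases": ["BRCA", "LUAD"]})
```

Unlike reading environment variables at runtime, this approach writes the values into the notebook itself, so the filled notebook records the query it was generated for.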
That sounds good @wisygig. I'm looking forward to seeing how that works. @dcgoss I haven't gotten a chance to look at
@rdvelazquez That's correct. At the moment the notebook just uses environment variables, so while you can print out those values at runtime, they aren't actually hardcoded into the notebook itself. Looking forward to your PR @wisygig :)
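For readers unfamiliar with this pattern, a notebook cell that takes its query from environment variables might look roughly like the sketch below. The variable names `CLASSIFIER_GENE_ID` and `CLASSIFIER_DISEASES` and the comma-separated format are hypothetical; the names ml-workers actually sets are not stated in this thread.

```python
import os


def read_query_parameters(environ=os.environ) -> dict:
    """Read the classifier query from environment variables.

    Variable names and the comma-separated disease format are assumptions
    made for this sketch, not the names used by ml-workers.
    """
    gene_id = int(environ["CLASSIFIER_GENE_ID"])
    raw_diseases = environ.get("CLASSIFIER_DISEASES", "")
    diseases = [d for d in raw_diseases.split(",") if d]
    return {"gene_id": gene_id, "diseases": diseases}


# Passing a plain dict stands in for the worker's environment here.
params = read_query_parameters({"CLASSIFIER_GENE_ID": "7157", "CLASSIFIER_DISEASES": "BRCA,LUAD"})
print(params)
```

The values exist only in the running process, which is why they can be printed at runtime but never appear in the saved notebook source.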
@dcgoss I've added the PR: cognoma/ml-workers#13
This looks pretty cool! I skimmed through the code and I'm going to try to get it to work on my computer with my own example. I'll let you know if/when I have issues. And yes, the example (and README) are very useful.
As cognoma.org is currently set up, there seems to be no option to select a specific classifier/notebook. Are we thinking this will be the way cognoma works in production (i.e. only one notebook to handle all queries)? @dhimmel you would know better than me, but I think having the option to have multiple notebooks that the user could choose from would be ideal. This could potentially be built in an open-ended way so that if, in the future, someone comes up with an interesting analysis, they could submit a pull request to have their notebook added to the list of options, and the cognoma.org interface would then allow anyone to do similar analysis with different gene/disease combinations using that notebook template. The cognoma.org interface seems like too nice of a tool to limit to one specific notebook.
This is also of interest for how we handle certain choices (for lack of a better word) in the classifier implementation (number of PCA components, l1_ratio, test/train split size, etc.). I know some of these things could just be changed by hand in the notebook by the user, but I wonder if that is the ideal solution, specifically if we want cognoma to be used by people with limited data science experience.
A similar and related topic is how, if at all, cognoma will handle queries that are not well suited to the analysis. For example, if a user selects a gene that has very few (or no) mutated samples for the selected diseases, will cognoma.org run the notebook anyway? Raise an error on the webpage? A warning? Should the notebook raise errors/warnings?
My recommendations would be:
@rdvelazquez core-service and ml-workers are built to accommodate different types of notebooks; however, we would need to make a few small changes. The process to implement a new notebook would be:
I agree this would be nice. Today was @dcgoss's last day of his summer internship... so I don't want to task him with a big backend change. Especially if we're not definitely going to need it. Perhaps the right approach would be to upgrade the backend to support multiple notebooks only once / if we have multiple production ready notebooks.
The frontend should do some checks... like enforcing a minimum number of positives and a minimum number of negatives. Ideally, we can prevent the notebook from erroring by catching the failure modes before query submission. Warnings in the notebook make sense if we detect something that appears problematic.
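As a rough illustration of such a pre-submission check, the sketch below counts positives and negatives in a list of 0/1 mutation statuses and reports which minimums are not met. The function name `check_query` and the default thresholds of 20 are placeholders chosen for the example, not values agreed on in this thread, and a real frontend check would likely live in the web client rather than in Python.

```python
from typing import List, Sequence


def check_query(
    statuses: Sequence[int],
    min_positives: int = 20,  # placeholder threshold, not a project decision
    min_negatives: int = 20,  # placeholder threshold, not a project decision
) -> List[str]:
    """Return a list of problems with a query; an empty list means it can be submitted.

    `statuses` holds one 0/1 mutation status per selected sample.
    """
    n_positives = sum(1 for status in statuses if status == 1)
    n_negatives = len(statuses) - n_positives
    problems = []
    if n_positives < min_positives:
        problems.append(
            f"Only {n_positives} positive samples; at least {min_positives} are required."
        )
    if n_negatives < min_negatives:
        problems.append(
            f"Only {n_negatives} negative samples; at least {min_negatives} are required."
        )
    return problems


# A gene mutated in 3 of 100 selected samples fails the positives check only.
problems = check_query([1] * 3 + [0] * 97)
for message in problems:
    print(message)
```

Returning messages instead of raising lets the caller decide whether to block submission or just show a warning.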
@dcgoss Congrats on finishing the internship... Productive summer! @dhimmel Makes sense to keep it to just one notebook for now to limit the needed changes. There may be a chicken-and-egg dilemma (no one's going to make new notebooks if cognoma.org won't support them; cognoma.org won't support new notebooks if no one makes them) but we can cross that bridge if/when we get to it. The frontend checks and warnings in the notebook also make sense. Thanks for the responses!
Closing this for now. @wisygig feel free to open this back up if you want to implement templating and want to revisit anything here.
@wisygig, @dcgoss and/or @dhimmel, what are your thoughts on how the Jupyter Notebook part of the application will work? I'm specifically interested in:
n_components) based on the query (Selecting the number of components returned by PCA #106) and we have also discussed letting the user select some parameters (l1_ratio) based on their preference (Selecting the number of components returned by PCA #106).
I know this may be getting ahead of ourselves, so feel free to defer this until later, but I thought I'd at least mention that these topics are starting to come up. This issue spans a few different repos but I thought the machine-learning repo might be the best place for it... I'll also tag #63 from cognoma/cognoma.