Unify lesson pipeline with Reticulate #202
base: main
- Add instructions for Bash and Python in addition to R
- Add `source("../bin/chunk-options.R")` to all files
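The second change above is the kind of rule a lesson check could enforce automatically. Below is a minimal sketch that assumes the R Markdown episodes live in `_episodes_rmd/` and that a plain substring match is good enough. `missing_chunk_options` and `check_episode_dir` are hypothetical helpers, not functions in the existing `bin/lesson_check.py`.

```python
from pathlib import Path

# The line this pull request adds to every R Markdown episode.
REQUIRED_LINE = 'source("../bin/chunk-options.R")'


def missing_chunk_options(episodes):
    """Given {file name: file text}, return the sorted names whose text lacks REQUIRED_LINE."""
    return sorted(name for name, text in episodes.items() if REQUIRED_LINE not in text)


def check_episode_dir(episode_dir="_episodes_rmd"):
    """Hypothetical wrapper: read every .Rmd file in episode_dir and report the ones missing the line."""
    episodes = {
        path.name: path.read_text(encoding="utf-8")
        for path in Path(episode_dir).glob("*.Rmd")
    }
    return missing_chunk_options(episodes)
```

Keeping the rule in a pure function that takes file contents, separate from the file reading, makes it easy to test without creating a lesson directory.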
Do I understand correctly that this would introduce an R dependency to maintaining and testing the build of Python lessons locally? I'm not necessarily against this, just trying to understand the implications of what you propose.
Per our discussion on Twitter, I really do like the idea of having a single `_episodes` folder per lesson. But I hadn't appreciated the point @jduckles is making about introducing an R dependency. We haven't had major issues on dc-python-ecology with contributors being confused about where to contribute (or, at least, they haven't raised those issues with us), so to me, having another package to keep track of might outweigh my interest in seeing this implemented.
What languages will this work for? I like the idea of automatically generating the output, but does it work for MATLAB? Git? Make? SQL? If not, then we'll still have different pipelines for some lessons. Can you snip the output? Some lessons have a cut-down version of the output, e.g.
I like the idea of writing executable lessons. It also opens the door to converting to R notebooks, and perhaps even Jupyter notebooks, from the R Markdown (I do something similar: writing in a markup language and converting to notebooks, including execution). A drawback is that there is more syntax to learn for those who want to contribute...
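The R Markdown to Jupyter conversion mentioned in this comment can be sketched briefly. This is a hypothetical, simplified converter, not an existing tool. It recognizes only Python chunks and leaves everything else (prose, R or Bash chunks, any YAML header) in Markdown cells. It also leaves outputs empty and emits the minimal nbformat 4 structure that Jupyter reads.

```python
import json
import re

FENCE = "`" * 3  # triple backtick built from one character so the listing has no literal fence

# A Python chunk: a fence followed by {python} or {python, options}, up to the closing fence.
PYTHON_CHUNK = re.compile(
    r"^`{3}\{python[^}]*\}\n(.*?)^`{3}[ \t]*$", re.DOTALL | re.MULTILINE
)


def rmd_to_notebook(rmd_text):
    """Split R Markdown text into the Markdown and code cells of a minimal nbformat 4 notebook."""
    cells = []
    position = 0

    def add_markdown(text):
        text = text.strip()
        if text:
            cells.append({"cell_type": "markdown", "metadata": {}, "source": text})

    for match in PYTHON_CHUNK.finditer(rmd_text):
        add_markdown(rmd_text[position:match.start()])  # prose before this chunk
        cells.append({
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],  # filled in only when the notebook is executed
            "source": match.group(1).rstrip("\n"),
        })
        position = match.end()
    add_markdown(rmd_text[position:])  # prose after the last chunk
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    example = f"Some text.\n\n{FENCE}{{python}}\nx = 1\nprint(x)\n{FENCE}\n"
    print(json.dumps(rmd_to_notebook(example), indent=1))
```

A real converter would also need to handle chunk options and other engines. The point here is only that a lesson written with executable chunks already has the cell boundaries a notebook needs.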
Yes.
Make the code output easy to maintain across different versions of libraries/packages, and make it easy for people to contribute from the web interface.
knitr can also execute code in SQL, Rcpp, Stan and JavaScript.
Automatically generating the output of the Git lesson is a bit challenging because every time that we run …

Usually you call Make from Bash, so since we can use Bash we can use Make.

knitr supports SQL, but I haven't tried it yet, partly because we were using the Firefox add-on and I didn't follow the discussion about its replacement very closely.

MATLAB is more interesting. I don't think that knitr supports it, but even if it did, we would need a MATLAB license to generate the output automatically.
The only lessons for which we couldn't automatically generate the output would be MATLAB and OpenRefine. But OpenRefine is not a text-based tool. And implementing state persistence between code chunks in R Markdown isn't easy, so adding support for MATLAB would take at least weeks of work. Before reticulate, if you wanted two Python code chunks to be connected in R Markdown, you had to set up some socket-like mechanism that I can no longer find information about.
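To make "state persistence between code chunks" concrete, here is a minimal Python sketch that assumes each chunk is a string of Python source. With one shared namespace (roughly what a single long-lived session provides), a later chunk can use names defined earlier. With a fresh namespace per chunk, it cannot. `run_chunks` is a hypothetical name, and the sketch only illustrates the behaviour. reticulate itself gets persistence by embedding one Python session in the R session.

```python
def run_chunks(chunks, persistent=True):
    """Execute code chunks in order and return the namespace used by the last one.

    persistent=True: all chunks share one namespace, so names defined in an
    earlier chunk are visible later (like one long-lived session).
    persistent=False: every chunk starts from an empty namespace (a new session per chunk).
    """
    namespace = {}
    for source in chunks:
        if not persistent:
            namespace = {}  # discard everything earlier chunks defined
        exec(source, namespace)
    return namespace


if __name__ == "__main__":
    chunks = ["weights = [12.0, 7.0, 2.0]", "mean = sum(weights) / len(weights)"]
    print(run_chunks(chunks)["mean"])  # 7.0
    try:
        run_chunks(chunks, persistent=False)
    except NameError as err:
        print("without persistence:", err)
```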
We could use something like what we use for exercises and solutions, i.e.
I'd like to comment as a biologist-coder-wannabe (from the dark side). Life scientists sometimes struggle to learn a coding language, and this has been exacerbated by the competition between R and Python (and Perl before that). This would clearly, and justifiably, make it easier for R coders to be maintainers. But it would be an additional challenge for life scientists who want to contribute. I can only speak for myself, but as @jduckles points out, this makes R, essentially, essential to contributing. The intent is good, and yes, maintaining duplicate episodes requires effort, but even within our local University Carpentries group there are coders who use either R or Python and aren't comfortable with both. For an old guy and non-coder like me, this affects any contribution or effort to maintain a lesson (i.e., it requires reading a reticulate manual and practicing reticulate before resuming contributions). I personally like the idea, but is it really, really easy? Could reticulate be developed as a lesson and made available to the community? It could be inserted into workshops as desired, and meanwhile it would allow motivated coder-wannabes to learn.
I think there is a misconception about what using reticulate would mean. Yes, it would make R an essential part of our pipeline to build lessons, but for all the non-R lessons you wouldn't need to know any more R than you currently need to know Ruby to contribute (because we use Jekyll to convert our Markdown files into HTML files). There will be no reticulate to learn. Reticulate is the name of the machinery that makes it possible to have Python chunks inside R Markdown documents. It would make it easier to contribute to our lessons: contributors will only need to write the code chunks and won't have to worry about generating the outputs (they will be generated automatically). As a consequence, it will also make our lessons better, because the code chunks will be less likely to include bugs and typos (if they did, generating the lessons would produce an error), and the code output chunks will always be up to date. This, combined with a continuous integration platform (e.g., Travis CI) for generating and deploying the lessons, will make for a much better experience for people interested in contributing to our lessons.
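The contract described here (contributors write only the code, the build fills in the output, and a buggy chunk stops the build) can be sketched in Python, assuming each chunk is a string of Python source. `render_chunks` and `ChunkError` are hypothetical names, and this is not how knitr or reticulate work internally. It only mirrors the behaviour: every chunk is run, its printed output is captured for the page, and a failing chunk raises instead of being published.

```python
import contextlib
import io


class ChunkError(RuntimeError):
    """Raised when a chunk fails, so the build stops instead of publishing broken code."""


def render_chunks(chunks):
    """Run chunks in order in one namespace; return (source, captured output) pairs."""
    namespace = {}
    rendered = []
    for number, source in enumerate(chunks, start=1):
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(source, namespace)
        except Exception as exc:
            raise ChunkError(f"chunk {number} failed: {exc!r}") from exc
        rendered.append((source, buffer.getvalue()))
    return rendered


if __name__ == "__main__":
    print(render_chunks(["total = 2 + 3", "print(total)"]))
    # [('total = 2 + 3', ''), ('print(total)', '5\n')]
    try:
        render_chunks(["print(totl)"])  # a typo in a lesson chunk
    except ChunkError as err:
        print(err)
```

Because the output is regenerated on every build, it cannot drift out of date when a library changes. A continuous integration job that runs the build would surface the `ChunkError` on the pull request.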
Thank you @fmichonneau for reaffirming that reticulate is a machinery (what I might call a tool) and not a new language. And I sincerely thank you for recently helping me learn the final steps of using ruby-jekyll-kramdown 'serving' to check my markdown. Using reticulate is a good idea, an obvious choice, and it was not my intention to appear unsupportive. Perspective is important, and a lack of understanding (ignorance) about what using reticulate means is symptomatic of the backgrounds of many biologists. My willingness to share this ignorance is intentional. My hope is that reticulate can be implemented with a protocol that is easy to follow to the very end.
I think what is missing from the above conversation and the screenshots is a "how-to" for users. From what I can gather, a contributor would need to do two things (on top of the
If you break it down like this, you can see that this tool is asking people to learn a small amount of R Markdown syntax, in the same way that we ask them to learn Markdown syntax. In my opinion, this is much less daunting than asking someone to learn how to code in R or Python. I do, however, still think it is a bit awkward to mix shell or Python with R Markdown, but I don't know of an alternative.
@raynamharris Thanks for the comment.
Yes, I should have included it. All you said is correct.
Fix CSS to use .language-X class
Having `_episodes` and `_episodes_rmd` is confusing for all lesson contributors. If you have contributed to one of the R lessons, you have probably edited the `_episodes` files instead of the `_episodes_rmd` files by mistake. And if you have contributed to one of the non-R lessons, you have probably asked yourself why `_episodes_rmd` exists. This pull request is the beginning of a proof of concept for unifying the lesson pipeline, now that R Markdown supports Python.

What is reticulate?
reticulate includes a Python engine for R Markdown. If you are using knitr version 1.18 or higher, the reticulate Python engine is enabled by default whenever reticulate is installed, and no further setup is required.
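A build script could check the knitr requirement above before rendering. The following is a minimal, hypothetical sketch that assumes the installed version arrives as a string. In practice that string could come from `Rscript -e 'cat(as.character(packageVersion("knitr")))'`. Versions are compared as tuples of integers, because comparing them as strings gets "1.9" versus "1.18" wrong.

```python
import re

# From the description above: knitr 1.18 or higher enables the reticulate Python engine by default.
REQUIRED_KNITR = (1, 18)


def parse_version(text):
    """Turn a version such as '1.18', '1.18.3' or '1.18-2' into a tuple of ints."""
    return tuple(int(part) for part in re.split(r"[.-]", text.strip()))


def knitr_enables_reticulate(version_text):
    """True if this knitr version enables the reticulate engine without extra setup."""
    # Tuples compare element by element, so (1, 9) < (1, 18), unlike the strings "1.9" > "1.18".
    return parse_version(version_text) >= REQUIRED_KNITR
```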
Why add another piece of software to the pipeline?
Because one of our core principles is facilitating reproducible research, and to achieve that we need to eat our own dog food.
Would it be possible to contribute using only GitHub?
Yes. The idea is to make it easy for people to contribute.
What is missing from the proof of concept?
- `bin/chunk-options.R` needs to work with reticulate to save figures in the correct place
- `bin/lesson_initialize.py` needs to be updated
- `bin/lesson_check.py` needs to be updated
- `bin/repo_check.py` needs to be updated

Screenshots
Bash
Python
R