Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child documents paths as R code cause validation error #74

Open
trhallam opened this issue Oct 3, 2023 · 9 comments
Open

child documents paths as R code cause validation error #74

trhallam opened this issue Oct 3, 2023 · 9 comments
Labels
{pegboard} validator R package {sandpaper} user interface package

Comments

@trhallam
Copy link

trhallam commented Oct 3, 2023

Hi,

I'm having some issues with a more recent version of sandpaper where the deploy step is failing in Github.

Local builds of the site build without issue.

The problem:

I use r eval statemets within Rmd documents to dynamically include content depending on a configuration for the lesson I want to publish. This was working previously but there is now an issue I think in the way the deploy function finds and copies content. There don't appear to be any logs or useful output in the Github actions build process because the R output is limited in the deploy step.

I have created a MWE that demonstrates the failing build process.

https://github.com/trhallam/workbench-mwe/actions/runs/6392622732/job/17350256259

@zkamvar
Copy link
Contributor

zkamvar commented Oct 3, 2023

Hi @trhallam,

This is coming from {pegboard} and it's failing during the step that attempts to validate the lesson while gathering child documents.

Situation

The following R Markdown context cannot be parsed by {pegboard} without evaluating any of the code.

```{r load_config, include=FALSE}
snippets <- paste('files/snippets/', 'TEST1', sep='')
```

Working include:

```{r, child="files/snippets/TEST1/test-include.Rmd", eval=TRUE}
```

Not-working include:

```{r, child=paste(snippets, '/test-include.Rmd', sep=''), eval=TRUE}
```

One of the things to know about pegboard is that it parses the structure of the document, but it does not evaluate any of the code, which is why the error reads:

Error in initialize(...) : 
  the file '/home/runner/work/workbench-mwe/workbench-mwe/paste(snippets, /test-include.Rmd, sep = )' does not exist
Calls: <Anonymous> ... load_children -> read_children -> <Anonymous> -> initialize

That being said, even if we did evaluate the code supplied to the child parameter, we would still find ourselves in the same situation because the snippets object is defined in a separate chunk, which {pegboard} does not know about because it does not evaluate code inside the chunks:

tmp <- tempfile()
writeLines("```{r snip, child=file.path(snippet, 'test.txt')}\n```", tmp)
ep <- pegboard::Episode$new(tmp)
# this is the document we are working with ----------
ep$show()
#> ```{r snip, child=file.path(snippet, "test.txt")}
#> ```

# grab the child document expression ----------------
print(child <- xml2::xml_attr(ep$code, "child"))
#> [1] "file.path(snippet, \"test.txt\")"

# if we parse and evaluate it, it will fail ---------
print(parse(text = child))
#> expression(file.path(snippet, "test.txt"))
eval(parse(text = child))
#> Error in eval(parse(text = child)): object 'snippet' not found

Created on 2023-10-03 with reprex v2.0.2

Potential solutions

IIRC, you are looking to find a way to address carpentries/sandpaper#81, which is important for the HPC group. I'm thinking there may be two ways of going about it, but knowing that we need to engineer the solution and open the pathways for contribution in {sandpaper} or {pegboard} before it's possible. I do want to set expectations right now and say that I am currently working part time on medical leave and may not be able to address this quickly, but I will do my best to present all the information I know.

Option One: parameterize and eval

One thing you could do at the moment, which is a bit wonky, but it will work in any situation is to duplicate snippets and use eval = snippet == <THING>. This way, {pegboard} will be able to easily track the child documents.

```{r setup, include = FALSE}
snippet <- "TEST1"
```

This will only load one snippet file depending on the value of snippet:

```{r child='files/TEST1/include.Rmd', eval = snippet == "TEST1"}
```
```{r child='files/TEST2/include.Rmd', eval = snippet == "TEST2"}
```
```{r child='files/TEST3/include.Rmd', eval = snippet == "TEST3"}
```

The drawback, of course is that the snippets are inherently tied to the parent documents and are a bit difficult to wrangle and add content

Option Two: Parameterized reports

This is much closer to the spirit of carpentries/sandpaper#81 where you could specify a file called params.yaml

snippet: "TEST1"

and then in your R Markdown, you would be able to use this pattern without the need for the setup chunk.

```{r, child=file.path(params$snippets, 'test-include.Rmd'), eval=TRUE}
```

The caveat to this approach is that it will take a bit of engineering inside of both {sandpaper} and {pegboard} to make the params available since each R Markdown document is considered to run independently:

tmp <- tempfile()
writeLines("```{r snip, child=file.path(params$snippet, 'test.txt')}\n```", tmp)
ep <- pegboard::Episode$new(tmp)
ep$show()
#> ```{r snip, child=file.path(params$snippet, "test.txt")}
#> ```
# If we assume that the params are available, the child document will evaluate correctly
params <- list(snippet = "files/TEST1")

# grab the child document expression ----------------
print(child <- xml2::xml_attr(ep$code, "child"))
#> [1] "file.path(snippet, \"test.txt\")"
eval(parse(text = child))
#> [1] "files/TEST1/test.txt"

Created on 2023-10-03 with reprex v2.0.2

I believe this is the correct way to go because it means that you can continue to use the same pattern you have been using (you can also replace the {{ site.param }} syntax with `r params$param`). The only thing I would be cautious about would be allowing the fields in params.yaml to evaluate as R code (which is happily avoided).

Setting Expectations

I want to reiterate, with several factors going on, adding this feature is going to take some time to implement. I will do my best to coach @carpentries/workbench-maintainers to tackle this, but please be patient.

@zkamvar zkamvar added {pegboard} validator R package {sandpaper} user interface package labels Oct 3, 2023
@zkamvar zkamvar changed the title CICD bug for dynamic Rmd includes child documents paths as R code cause validation error Oct 3, 2023
@zkamvar
Copy link
Contributor

zkamvar commented Oct 3, 2023

Local builds of the site build without issue.

FWIW, I believe the local build works without issue because you have an old version of {pegboard} on your machine (as I just released the new version yesterday). I can add a patch that adds a catch for the child document processing for validation so that you can get back to a working state. The only caveat (and I believe this was true before the change) is that changes to the child documents in your case will not rebuild the parent document.

@trhallam
Copy link
Author

trhallam commented Oct 3, 2023

Thanks for the quick clarification @zkamvar , no expectations here and I'm not an R expert, so which pathway is more appropriate is fine by me. Appreciate you looking into this.

Indeed the need for this comes from our versions of the HPC carpentry which has a lot of system specific configuration in it, where the lessons may change depending upon where or to who you are teaching. It would be appreciated if the quick patch could be added for now. I can keep in mind that a full site rebuild may be needed if things aren't refreshing appropriately during development.

@trhallam
Copy link
Author

trhallam commented Oct 3, 2023

Option Two: Parameterized reports

In terms of the parameterized reports, I have loaded a yaml file in the pre-amble of each lesson using.

```{r load_config, include=FALSE}
library(yaml)
config <- yaml.load_file("lesson_config.yaml")
snippets <- paste('files/snippets/', config$snippets, sep='')```

Link to YAML

https://github.com/EPCCed/2023-06-28-uoe-hpcintro/blob/main/episodes/lesson_config.yaml

@zkamvar
Copy link
Contributor

zkamvar commented Oct 3, 2023

Thank you for the full context. This will really help understand what the requirements are. I have time to work on carpentries/pegboard#140 and it should be released later today.

@zkamvar
Copy link
Contributor

zkamvar commented Oct 3, 2023

I just pushed the fix and it should be available in about an hour. I have confirmed that it works in our integration test: https://github.com/carpentries/workbench-integration-test/actions/runs/6398719182/job/17369438597

The error is an unrelated one that I will open a pull request to fix.

@froggleston
Copy link
Contributor

@trhallam Could you confirm this is fixed? We could then close this issue :)

@trhallam
Copy link
Author

Thanks for following up. The sites do build and deploy correctly to Gh Pages, but I still get a warning the "Deploy Site" steps.

Do I need to modify the approach?

https://github.com/EPCCed/2024-06-20-hpc-intro-shampton/actions/runs/9480863700/job/26122442483

@zkamvar
Copy link
Contributor

zkamvar commented Jun 14, 2024

I saw this in my notifications and felt that I should add a summary:

The underlying cause of this is not fixed. What was fixed in carpentries/pegboard#140 was the lesson failing to build if a child document could not be found (for the ecologists in the room this is the r-strategist mode).

This will continue to be a problem until allowing paramterized reports via a yaml file is implemented (see #74 (comment) for a breakdown).

The reason this approach is not working is because pegboard doesn't know anything about the code inside the document, including the setup chunk, so it doesn't know what snippets means.

Note that the solution to this this is NOT to run the code in the setup chunk in {pegboard}... that would cause so many lessons to slow down so much. I see the solution in the following ways (summarized from #74 (comment)):

  1. allow for a parameters yaml file for lesson authors to swap out different parameters of their lesson in both {pegboard} and {sandpaper} (this can be a new item in config.yaml called params: <path-to-config.yaml> so that authors can specify a custom file)
  2. read in the params via {pegboard} and have them available under params$
  3. instruct authors to specify the path to the snippets folder from the root of the lesson (e.g. snippets: files/snippets/EPCC_ARCHER2_slurm)
  4. have authors use child=file.path(params$snippets, "name-of-snippet") to specify the snippet in the chunk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
{pegboard} validator R package {sandpaper} user interface package
Projects
None yet
Development

No branches or pull requests

3 participants