Update case studies to use new language syntax #189

Open
15 of 42 tasks
avehtari opened this issue Sep 9, 2023 · 14 comments

@avehtari
Contributor

avehtari commented Sep 9, 2023

With Stan 2.33+, several old language syntax features produce errors. It would be good to update all the case studies to use the latest syntax. Many case studies live in external repos, and their authors have submitted only the rendered HTML and a short md part for the case study contents page. For these, only the HTML needs to be updated in users/documentation/case-studies/.
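
For anyone triaging which case studies are affected, a rough first pass is to search the Stan source for old-style array declarations such as `real y[N];`, which current Stan writes as `array[N] real y;`. The sketch below is a heuristic, not a parser: the function name `find_old_array_decls` and the regular expression are illustrative assumptions, it only looks at array declarations, and it can miss or misreport unusually formatted code.

```shell
# Hypothetical helper (not part of Stan): print lines that look like
# old-style array declarations, e.g. "real y[N];" or "vector[K] b[J];".
# New-style declarations start with "array[...]" and are not matched.
find_old_array_decls() {
  grep -nE '^[[:space:]]*(int|real|vector|row_vector|matrix|simplex|ordered|positive_ordered|unit_vector|cov_matrix|corr_matrix)(<[^>]*>)?(\[[^]]*\])?[[:space:]]+[A-Za-z_][A-Za-z0-9_]*\[[^]]*\][[:space:]]*(;|=)' "$@"
}

# Usage (file name is a placeholder):
#   find_old_array_decls my_file.stan
# grep exits with status 1 when nothing matches, i.e. no old-style
# array declarations were found by this heuristic.
```

A hit only means the file deserves a closer look; running the real compiler remains the authoritative check.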

It would be good to contact the original authors and ask them if they are willing to update their repos and submit new HTML. If the authors disagree or don't respond, we may consider updating just the syntax in the HTML.

To start the process, I'm listing all the case studies here so we can start tracking which have been fixed. I'm also tagging some authors who were easy to find via GitHub ID autocomplete: @mitzimorris, @WardBrian, @bob-carpenter, @charlesm93, @bbbales2, @imadmali


  • Bayesian Structural Equation Modeling using blavaan: Feng Ji, Xingyao Xiao, Aybolek Amanmyradova, Sophia Rabe-Hesketh
  • Multilevel regression modeling with CmdStanPy and plotnine: Mitzi Morris
  • HoloML in Stan: Low-photon Image Reconstruction: Brian Ward, Bob Carpenter, and David Barmherzig
  • Bayesian Latent Class Models and Handling of Label Switching: Feng Ji, Aybolek Amanmyradova, Sophia Rabe-Hesketh
  • Bayesian model of planetary motion: exploring ideas for a modeling workflow: Charles Margossian and Andrew Gelman
  • HMM Interface Example: Ben Bales
  • Spatial models for plant neighborhood dynamics in Stan: Cristina Barber, Andrii Zaiats, Cara Applestein and T.Trevor Caughlin
  • Predicting Engine Failure with Hierarchical Gaussian Process: Hyunji Moon, Jungin Choi
  • Upgrading to the new ODE interface: Ben Bales, Sebastian Weber
  • Bayesian Workflow for disease transmission modeling in Stan: Leo Grinsztajn, Elizaveta Semenova, Charles C. Margossian, and Julien Riou
  • Reduce Sum Example: parallelization of a single chain across multiple cores: Ben Bales
  • Stan Notebooks in the Cloud: Mitzi Morris
  • Model-based Inference for Causal Effects in Completely Randomized Experimen: JoonHo Lee, Avi Feller and Sophia Rabe-Hesketh
  • Tagging Basketball Events with HMM in Stan: Imad Ali
  • Model building and expansion for golf putting: Andrew Gelman
  • A Dyadic Item Response Theory Model: Stan Case Study: Nicholas Sim, Brian Gin, Anders Skrondal and Sophia Rabe-Hesketh (note: source link points to fork of example-models)
  • Multilevel Linear Models using Rstanarm: JoonHo Lee, Nicholas Sim, Feng Ji, and Sophia Rabe-Hesketh
  • Predator-Prey Population Dynamics: the Lotka-Volterra model in Stan: Bob Carpenter
  • Nearest neighbor Gaussian process (NNGP) models in Stan: Lu Zhang
  • Extreme value analysis and user defined probability functions in Stan: Aki Vehtari
  • Modelling Loss Curves in Insurance with RStan: Mick Cooney
  • Splines in Stan: Milad Kharratzadeh
  • Spatial Models in Stan: Intrinsic Auto-Regressive Models for Areal Data: Mitzi Morris
  • The QR Decomposition for Regression Models: Michael Betancourt
  • Robust RStan Workflow: Michael Betancourt
  • Robust PyStan Workflow: Michael Betancourt (also uses PyStan 2 which is no longer supported)
  • Typical Sets and the Curse of Dimensionality: Bob Carpenter
  • Diagnosing Biased Inference with Divergences: Michael Betancourt
  • Identifying Bayesian Mixture Models: Michael Betancourt
  • How the Shape of a Weakly Informative Prior Affects Inferences: Michael Betancourt
  • Exact Sparse CAR Models in Stan: Max Joseph
  • A Primer on Bayesian Multilevel Modeling using PyStan: Chris Fonnesbeck (also: rendered HTML was deleted?)
  • The Impact of Reparameterization on Point Estimates: Bob Carpenter
  • Hierarchical Two-Parameter Logistic Item Response Model: Daniel C. Furr
  • Rating Scale and Generalized Rating Scale Models with Latent Regression: Daniel C. Furr
  • Partial Credit and Generalized Partial Credit Models with Latent Regression: Daniel C. Furr
  • Rasch and Two-Parameter Logistic Item Response Models with Latent Regression: Daniel C. Furr
  • Two-Parameter Logistic Item Response Model: Daniel C. Furr, Seung Yeon Lee, Joon-Ho Lee, and Sophia Rabe-Hesketh
  • Cognitive Diagnosis Model: DINA model with independent attributes: Seung Yeon Lee
  • Pooling with Hierarchical Models for Repeated Binary Trials: Bob Carpenter
  • Multiple Species-Site Occupancy Model: Bob Carpenter
  • Soil Carbon Modeling with RStan: Bob Carpenter
@avehtari
Contributor Author

avehtari commented Sep 9, 2023

@mitzimorris
Member

in the interim, we could insert a paragraph at the top of the old case studies saying that the code is using the old syntax and instructing the reader to run the stanc canonicalizer on the code themselves.

exercises to the reader are less work than exercises to the author.

@hyunjimoon
Contributor

Just an idea, but wouldn't it be handy if ChatGPT could auto-translate old case studies from the old syntax to the new syntax (e.g. as from Python 2.7 to Python 3.10)? Python's 2to3 tool (https://docs.python.org/3/library/2to3.html) seems to have hand-coded this translation.

@mitzimorris
Member

mitzimorris commented Sep 9, 2023

we don't need chatGPT.

please get the latest release of Stan, and then do (something like this)

> /path/to/cmdstan/bin/stanc --print-canonical my_file.stan > new.tmp
> diff -y -W 180 my_file.stan new.tmp
> mv new.tmp my_file.stan

that diff command will show files side-by-side - it's an easy way to check that stanc did the right thing and only the right thing.
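
The three commands above could be wrapped into a small function so that the original file is only overwritten when stanc succeeds. This is a sketch under assumptions: the function name `canonicalize_stan_file` and the `STANC` variable are made up for illustration, the default path is the same placeholder as above, and the `--print-canonical` flag is simply the one quoted in this comment.

```shell
# Placeholder path from the comment above; point STANC at your own install.
STANC="${STANC:-/path/to/cmdstan/bin/stanc}"

# Hypothetical helper: canonicalize one Stan file, show a side-by-side
# diff for review, and replace the file only if stanc exited cleanly.
canonicalize_stan_file() {
  csf_src="$1"
  csf_tmp="$(mktemp)" || return 1
  if ! "$STANC" --print-canonical "$csf_src" > "$csf_tmp"; then
    rm -f "$csf_tmp"
    echo "stanc failed on $csf_src; file left unchanged" >&2
    return 1
  fi
  # diff exits non-zero when the files differ, which is expected here.
  diff -y -W 180 "$csf_src" "$csf_tmp" || true
  # Write through the existing file so its permissions are kept.
  cat "$csf_tmp" > "$csf_src"
  rm -f "$csf_tmp"
}

# Usage (file name is a placeholder):
#   canonicalize_stan_file my_file.stan
```

Since the file is replaced right after the diff is printed, it is safest to run this inside a git checkout so any unwanted change can be reverted.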

update: for some reason the above procedure is adding an extra newline to files. @WardBrian does the canonicalizer always add a newline proactively to its output in case the input was missing one?
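
One way to confirm that a trailing newline is the only difference between the original and the canonicalized file is sketched below. Both function names are hypothetical; the check relies only on standard shell behaviour (command substitution strips trailing newlines).

```shell
# True if the file is non-empty and its last byte is a newline.
# "tail -c 1" prints the last byte; command substitution strips a
# trailing newline, so an empty result means that byte was a newline.
ends_with_newline() {
  [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# True if two files are identical once trailing newlines are ignored.
same_apart_from_trailing_newlines() {
  [ "$(cat "$1")" = "$(cat "$2")" ]
}

# Usage (file names are the ones from the commands above):
#   ends_with_newline my_file.stan || echo "input has no final newline"
#   same_apart_from_trailing_newlines my_file.stan new.tmp && echo "only newlines differ"
```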

@jgabry
Member

jgabry commented Sep 9, 2023

> in the interim, we could insert a paragraph at the top of the old case studies saying that the code is using the old syntax and instructing the reader to run the stanc canonicalizer on the code themselves.

Yeah this sounds like a good idea until these are updated.

> exercises to the reader are less work than exercises to the author.

Exercises to the author require doing once and all readers benefit. Exercises to the reader require doing N_readers times. So the latter requires a lot more work overall, just less work for the author. Or am I misunderstanding what you meant?

> that diff command will show files side-by-side - it's an easy way to check that stanc did the right thing and only the right thing.

Nice!

@WardBrian
Member

I manually went through the ones which were unclear and figured out if they needed updating or not. That brings the total up to 11/42 being good to go - either because they used the new syntax, didn't use any of the old syntax, or (in a few cases) contained no actual stan code in the text of the case study.

It's also worth noting that any case study which stores its code in the example-models repo had its code automatically updated a while back. If any of those case studies use something like writeLines(readLines("model.stan")), then the only work that actually needs to be done is re-knitting. More than a few seem to store the code in a string or text block in the markdown, however.
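
To sort case studies into "just re-knit" and "needs manual edits", something like the following grep-based check might help. It is a rough heuristic with made-up function names: it assumes external code is pulled in with a `readLines(...)` call on a `.stan` file, as in the example above, and that inline code shows up as a Stan knitr chunk or as a line opening a Stan program block.

```shell
# Hypothetical helper: true if the Rmd reads a separate .stan file,
# e.g. writeLines(readLines("model.stan")).
reads_external_stan() {
  grep -qE 'readLines\([^)]*\.stan' "$1"
}

# Hypothetical helper: true if the Rmd appears to contain Stan code
# itself, either in a {stan ...} knitr chunk (a line starting with three
# backticks and "{stan") or on a line that opens a Stan program block.
embeds_stan_code() {
  grep -qE '^`{3}\{stan|^[[:space:]]*(functions|data|transformed data|parameters|transformed parameters|model|generated quantities)[[:space:]]*\{' "$1"
}

# Usage (file name is a placeholder):
#   if embeds_stan_code case_study.Rmd; then
#     echo "inline Stan code: edit the Rmd by hand"
#   elif reads_external_stan case_study.Rmd; then
#     echo "external Stan file: re-knitting may be enough"
#   fi
```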

@bob-carpenter
Contributor

@hyunjimoon : It's going from the old Stan syntax to the new Stan syntax. ChatGPT(4) is pretty good at Python, but it's very bad at Stan.

@bob-carpenter
Contributor

If we keep our User's Guide, Reference Manual, and Functions Reference up to date, I don't think breaking the old case studies should block any of our updates. Specifically, I'm OK putting a warning up and then fixing them as we can. Another alternative is moving the ones that aren't updated to a "deprecated case study" location and flagging them up front.

I can update the five of my case studies that weren't built with the new Stan syntax:

  • Predator-Prey Population Dynamics: the Lotka-Volterra model in Stan: Bob Carpenter
  • Pooling with Hierarchical Models for Repeated Binary Trials: Bob Carpenter
  • The Impact of Reparameterization on Point Estimates: Bob Carpenter
  • Multiple Species-Site Occupancy Model: Bob Carpenter
  • Soil Carbon Modeling with RStan: Bob Carpenter

@jgabry
Member

jgabry commented Sep 11, 2023

> If we keep our User's Guide, Reference Manual, and Functions Reference up to date, I don't think breaking the old case studies should block any of our updates. Specifically, I'm OK putting a warning up and then fixing them as we can. Another alternative is moving the ones that aren't updated to a "deprecated case study" location and flagging them up front.

I agree that we shouldn't hold up Stan releases just because they break case studies. A warning about it would be good. Right now the website says:

> The case studies on this page are intended to reflect best practices in Bayesian methodology and Stan programming

which is a bit unfortunate since best practices would include code that doesn't error.

What if we change the note at the top to say this?

> The case studies on this page are intended to reflect best practices in Bayesian methodology and Stan programming. We aim to keep them current with the latest version of the Stan language, but there may be times when case studies need updating to reflect the latest Stan features and syntax.

That could probably be worded better, but something along those lines?

@bob-carpenter
Contributor

That wording sounds good. Did we want to point people to the Stan code updater in stanc3?

@jgabry
Member

jgabry commented Sep 11, 2023

> Did we want to point people to the Stan code updater in stanc3?

The only reason I'd hesitate to do that is that, on Slack, @WardBrian mentioned that future versions (2.34 and beyond) won't be able to parse and fix the old code anymore. But maybe that's not a reason to avoid mentioning it. By the time we get to future breaking changes, it will be those changes that need fixing rather than the array syntax, so I guess the auto-formatter/canonicalizer will work just fine for whatever syntax needs changing at that point.

@jgabry
Member

jgabry commented Sep 12, 2023

I opened PR #191 to add the disclaimer at the top of the case studies page. I didn't mention the auto-formatter/canonicalizer but I can update it to mention it if we want that. (It is accessed differently in the different interfaces, so we'd have to decide whether to just mention it exists or actually demo how to use it in the different interfaces.)

@jgabry
Member

jgabry commented Sep 12, 2023

Is the process for updating the ones in example-models repo the following?

  • edit any Stan code in the Rmd file (separate Stan files are already up to date)
  • regenerate the HTML
  • submit the Rmd PR to the example-models repo
  • submit the HTML PR to the website repo

(I just did this for the HMM interface example case study, but I can update my PRs if this process isn't right)

@WardBrian
Member

Yep, sounds right to me. I have just updated the new ODE and golf case studies this way.

6 participants