Mutating objects affect reproducibility #1385

benjreinhart · 2024-05-15T21:40:03Z

Describe the bug

In the FAQ, marimo states:

In contrast, marimo guarantees that your code, outputs, and program state are consistent, eliminating hidden state and making your notebook reproducible.

I see that variables are tracked and that reassignment across cells is not allowed. However, because Python is not an immutable language, cells can still mutate objects in place, which seems impossible to prevent, so this guarantee doesn't seem achievable. Consider the following example:

Cell A:

foo = {"key": "value"}
print(foo["key"])

Cell B:

print(foo["key"])

Cell C:

foo["key"] = "value 2"
print(foo["key"])

If I run them in order, I get what I'd expect the first time around. The problem arises when I go back to Cell B and run it again: it now prints a different value than it did the first time (see screenshot).
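To make the sequence concrete, here is a minimal plain-Python sketch that reproduces the stale read. The function names `cell_a`, `cell_b`, `cell_c` are hypothetical, and a dict stands in for the notebook's global namespace; this is not how marimo actually executes cells.

```python
# Plain-Python model of the three cells above (illustration only, not marimo's runtime).
# Each "cell" reads/writes a shared namespace dict and returns what it would print.

def cell_a(ns: dict) -> str:
    ns["foo"] = {"key": "value"}
    return ns["foo"]["key"]


def cell_b(ns: dict) -> str:
    return ns["foo"]["key"]


def cell_c(ns: dict) -> str:
    ns["foo"]["key"] = "value 2"  # in-place mutation of the object Cell A defined
    return ns["foo"]["key"]


namespace: dict = {}
outputs = [cell_a(namespace), cell_b(namespace), cell_c(namespace)]
# Re-running B alone now observes C's mutation.
outputs.append(cell_b(namespace))
print(outputs)  # ['value', 'value', 'value 2', 'value 2']
```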

The bug part of this issue is that if I re-run Cell A and then Cell B, I still see "value 2" printed.

But the larger point of this issue is that, unless I'm missing something, it can't be the case that marimo notebooks truly guarantee consistency and reproducibility in all cases? Even if some types of modification were tracked, you could never catch them all: code can call into other libraries that mutate objects (pandas, TensorFlow, etc.), code can dynamically eval strings as code, and so on. I'd love to be wrong here, because it would be awesome to reach the same level of guarantees that, e.g., Livebook does, but I don't see how that's possible without using an immutable language.
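The `eval` point can be illustrated with a deliberately simplified model of static dependency analysis. The helper `referenced_names` is hypothetical, and marimo's real analyzer is more involved: a mutation hidden inside a string passed to `exec` never mentions `foo` as a name in the syntax tree, so an analysis that only inspects names in the tree cannot link it to Cell A.

```python
import ast


def referenced_names(source: str) -> set[str]:
    """Collect every bare variable name appearing in a cell's source.

    Hypothetical, simplified stand-in for static dependency analysis.
    """
    tree = ast.parse(source)
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


direct = 'foo["key"] = "value 2"'
# The same mutation, smuggled inside a string literal.
hidden = "exec(" + repr(direct) + ")"

print(referenced_names(direct))  # {'foo'}
print(referenced_names(hidden))  # {'exec'}: foo is invisible to the analysis
```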

Environment

https://marimo.app (the online wasm version)

Code to reproduce

No response

akshayka · 2024-05-15T21:46:00Z

Maintainer

But the larger point of this issue is that, unless I'm missing something, it can't be the case that marimo notebooks truly guarantee consistency and reproducibility in all cases?

That's correct, and a good point. We try our best without resorting to magic (i.e., no runtime tracing, which is a losing battle), and we are currently happy with being substantially more computationally reproducible than vanilla IPyKernel.

Our prescription to our users is: don't mutate variables across multiple cells. If you do, bets on "computational reproducibility" are off, unless you carefully manage the DAG yourself. You can find such language in our docs and tutorials.
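Here is a sketch of what following that prescription could look like for the example in this thread, again modeled in plain Python with hypothetical cell functions and a dict as the namespace. Instead of editing `foo` in place, Cell C builds a new dict under a new name (`foo_2`, also hypothetical). Re-running B therefore always prints "value".

```python
# Plain-Python model: Cell C derives a new object instead of mutating foo,
# so foo keeps the value Cell A gave it however the cells are re-run.

def cell_a(ns: dict) -> None:
    ns["foo"] = {"key": "value"}


def cell_b(ns: dict) -> str:
    return ns["foo"]["key"]


def cell_c(ns: dict) -> str:
    # Hypothetical new name; a shallow copy of foo with one key overridden.
    ns["foo_2"] = {**ns["foo"], "key": "value 2"}
    return ns["foo_2"]["key"]


namespace: dict = {}
cell_a(namespace)
results = [cell_b(namespace), cell_c(namespace), cell_b(namespace)]
print(results)  # ['value', 'value 2', 'value']
```

`{**foo, ...}` is only a shallow copy. If `foo` held nested mutable objects, those would still be shared, and `copy.deepcopy` would be needed to fully decouple them.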

In this way we are similar to Pluto.jl, which is our main inspiration. Pluto.jl takes reproducibility one step further than us by implementing its own package management system, which we have not yet done.

Happy to continue chatting. I will convert this to a Discussion because it's not a bug, but rather a design choice we're committed to.

EDIT: Rereading the FAQ -- I can make the claim more clear.

2 replies

akshayka May 15, 2024
Maintainer

By the way, I've been meaning to try Elixir/Livebook. I have heard great things. Thanks for the reminder!

benjreinhart May 15, 2024
Author

Elixir is the GOAT IMO, and Livebook rules.

https://www.youtube.com/watch?v=pas9WdWIBHs (may be of interest)

benjreinhart · 2024-05-15T21:53:11Z

Author

This all sounds good, and I hadn't seen the best practices 🙏🏻

The only thing I'll add is that I think there is still a bug: when I re-run the first cell (Cell A) and then Cell B (after running Cell C), shouldn't the printed value be reset back to "value"?

6 replies

benjreinhart May 15, 2024
Author

Run A->B->C, then run B (now see "value 2"), then run A (I'd expect it to be "value" again now), then run B (still "value 2").

akshayka May 15, 2024
Maintainer

Oh, I see. When A is run, both B and C are run, since they both reference foo. But B and C are not ordered with respect to each other (right now it's unspecified how ties are broken), and in your case it looks like C is running before B. I can see how this is confusing, though.
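A minimal plain-Python sketch of that explanation follows. The helper `run_with_descendants` and the cell functions are hypothetical, not marimo's scheduler. Once A runs, B and C are both ready. Either order is a valid topological order of the graph A → B, A → C, yet the order decides what B prints.

```python
from typing import Callable


# Hypothetical cells: each reads/writes a shared namespace and returns what it prints.
def cell_a(ns: dict) -> str:
    ns["foo"] = {"key": "value"}
    return ns["foo"]["key"]


def cell_b(ns: dict) -> str:
    return ns["foo"]["key"]


def cell_c(ns: dict) -> str:
    ns["foo"]["key"] = "value 2"
    return ns["foo"]["key"]


CELLS: dict[str, Callable[[dict], str]] = {"A": cell_a, "B": cell_b, "C": cell_c}


def run_with_descendants(ns: dict, root: str, descendant_order: list[str]) -> dict[str, str]:
    """Run `root`, then its descendants in the given (tie-broken) order."""
    printed = {root: CELLS[root](ns)}
    for name in descendant_order:
        printed[name] = CELLS[name](ns)
    return printed


# Both orders respect the edges A -> B and A -> C; B and C have no edge between them.
first = run_with_descendants({}, "A", ["B", "C"])
second = run_with_descendants({}, "A", ["C", "B"])
print(first)   # {'A': 'value', 'B': 'value', 'C': 'value 2'}
print(second)  # {'A': 'value', 'C': 'value 2', 'B': 'value 2'}
```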

benjreinhart May 15, 2024
Author

right now it's unspecified how ties are broken

Is there a reason not to run them in the order they are listed on the page? In this case at least, it would prevent the confusing behavior, though maybe there are other scenarios where it wouldn't help
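Running tied cells in page order would amount to a topological sort with a deterministic tie-breaker. Here is a sketch under that assumption. It uses Kahn's algorithm with a heap keyed by each cell's position on the page. The function `topo_order_by_page` is hypothetical and not something marimo provides.

```python
import heapq


def topo_order_by_page(edges: dict[str, list[str]], page_order: list[str]) -> list[str]:
    """Kahn's algorithm that breaks ties by each cell's position on the page.

    `edges` maps a cell to the cells that depend on it (hypothetical helper).
    """
    position = {cell: i for i, cell in enumerate(page_order)}
    indegree = {cell: 0 for cell in page_order}
    for children in edges.values():
        for child in children:
            indegree[child] += 1
    # Among cells whose parents have all run, always pick the one highest on the page.
    ready = [(position[c], c) for c in page_order if indegree[c] == 0]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        _, cell = heapq.heappop(ready)
        order.append(cell)
        for child in edges.get(cell, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, (position[child], child))
    if len(order) != len(page_order):
        raise ValueError("dependency graph has a cycle")
    return order


print(topo_order_by_page({"A": ["B", "C"]}, ["A", "B", "C"]))  # ['A', 'B', 'C']
```

With page order A, B, C this yields A, B, C, so B would print "value" in the scenario above. The cost is that page position becomes an implicit dependency, which a parallel executor has no single order to honor.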

dmadisetti May 16, 2024
Collaborator

Related: #1142

Even beyond cloning, though, Python can hook into C internals and create state that can't easily be touched, so I think there's always going to be some responsibility on the user.
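The following small example illustrates that limit. It assumes cloning is done with Python's `copy.deepcopy`, which is an assumption, since the thread does not say how values are cloned. Objects that wrap C-level state, such as a `threading.Lock`, cannot be deep-copied at all, so any cloning scheme would have to skip or special-case them.

```python
import copy
import threading

# Assumption: values are cloned with copy.deepcopy. An OS-level lock wraps
# C state that Python refuses to copy, so cloning this dict fails outright.
state = {"lock": threading.Lock(), "count": 0}

try:
    copy.deepcopy(state)
    copied = True
except TypeError as exc:
    copied = False
    print(f"deepcopy failed: {exc}")
```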

akshayka May 16, 2024
Maintainer

Is there a reason not to run them in the order they are listed on the page? In this case at least, it would prevent the confusing behavior, though maybe there are other scenarios where it wouldn't help

We could, although right now the kernel doesn't know the order of the cells on the page. That could be fixed, though.

We'd prefer that users don't rely on page order, since we may also add an (opt-in) parallelizing executor.

dmadisetti · 2024-06-08T17:06:49Z

Collaborator

@benjreinhart PR #1580 addresses this to a larger extent. You really have to go out of your way to mutate things, but of course, it is limited by Python itself.
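The general idea behind cloning can be sketched in plain Python: give each cell deep copies of the values it reads, so that an in-place edit like Cell C's stays local. This is a hypothetical model (`run_cell` and the cell functions are illustrative names), not a description of how the strict execution mode in #1580 is implemented.

```python
import copy
from typing import Callable


def run_cell(cell: Callable[[dict], dict], namespace: dict, reads: list[str]) -> dict:
    """Run a cell on deep copies of the names it reads (hypothetical runner)."""
    inputs = {name: copy.deepcopy(namespace[name]) for name in reads}
    defs = cell(inputs)  # the cell returns only the names it defines
    namespace.update(defs)
    return inputs


def cell_a(_: dict) -> dict:
    return {"foo": {"key": "value"}}


def cell_c(inputs: dict) -> dict:
    inputs["foo"]["key"] = "value 2"  # mutates the clone, not the original
    return {}


notebook_globals: dict = {}
run_cell(cell_a, notebook_globals, reads=[])
seen_by_c = run_cell(cell_c, notebook_globals, reads=["foo"])
print(notebook_globals["foo"]["key"])  # value
print(seen_by_c["foo"]["key"])         # value 2
```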

1 reply

dmadisetti Jun 17, 2024
Collaborator

@benjreinhart #1580 is now merged; you can enable it by adding

[experimental]
execution_type = "strict"

to your .marimo.toml config!
