Stop assuming that vectorial computation can be hidden from reusers that have to read or write formulas, and focus on making it understandable instead #673

Open
3 of 7 tasks
MattiSG opened this issue May 28, 2018 · 5 comments
Labels
kind:roadmap A group of issues, constituting a delivery roadmap

Comments

@MattiSG
Member

MattiSG commented May 28, 2018

Current assumptions

Traditionally, vectorial computations have been hidden from newcomers, in the hope (#651 (comment)) that users would not need to grasp the complexity up front, and that they could be confronted with it later.

How this is problematic

  1. All hackathons and IRL trainings I have observed have shown confusion among reusers and unease among trainers when vectorial computation concepts are introduced.
  2. Recent onboarding experiences with Italy and Aotearoa New Zealand show (Explicit vector computations with pluralized argument name #651 (comment), Add best start tax credit BetterRules/openfisca-aotearoa#15 (comment)) that reusers are confronted very soon with the need to read or write formulas, and that when they do, they also need to understand that something more is going on.
  3. The IRL training of @Verban by @MattiSG was delayed by 1h30m because the doc lacked references to NumPy helpers.
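
As an illustration of the "something more is going on" in point 2, here is a minimal sketch of what a reuser typically runs into. The variable name, threshold and amounts are hypothetical and not taken from any country package; `numpy.array` and `numpy.where` are the only library calls used.

```python
import numpy

# Hypothetical data: one salary per person, all persons computed at once.
salaries = numpy.array([800.0, 1500.0, 2300.0])


def benefit_scalar_style(salaries):
    # Written as if `salaries` were a single number.
    if salaries < 1000:
        return 200.0
    return 0.0


def benefit_vector_style(salaries):
    # One result per person, chosen element-wise.
    return numpy.where(salaries < 1000, 200.0, 0.0)


try:
    benefit_scalar_style(salaries)
    scalar_style_failed = False
except ValueError:
    # NumPy refuses to reduce an array of booleans to a single True/False.
    scalar_style_failed = True

benefits = benefit_vector_style(salaries)
```

The scalar-style function reads naturally but fails as soon as it receives more than one value, which is the moment where a reference to NumPy helpers in the doc becomes necessary.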

Experiments that show this is the right direction

  1. A recent rewrite (Document recurrent issue with scalar in formulas openfisca-doc#134) of the vectorial computation documentation pages moved them from a “limitations” vocabulary to a “standard practice” one. The usefulness of the result has been validated by user testing in a hackathon (Explicit vector computations with pluralized argument name #651 (comment)), in IRL training and through reviews (Document recurrent issue with scalar in formulas openfisca-doc#134 (comment)).
  2. Aotearoa has trialled (Explicit vector computations with pluralized argument name #651 (comment)) using a pluralised argument name for its variables. The response is positive, especially in IRL training (cc @Br3nda).

Concrete steps that would be taken

Estimated impact

  • No breaking change for existing country packages.
  • Suggestion to pluralise first parameter in existing country packages. This can be mostly automated.
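
A minimal sketch of what the suggested renaming looks like, under stated assumptions: `Population` is a hypothetical stand-in for the entity object a formula receives, and the variable name, period and rate are made up for illustration. It is not the OpenFisca API.

```python
import numpy


class Population:
    """Hypothetical stand-in for the entity object passed to a formula."""

    def __init__(self, values):
        self._values = {
            name: numpy.asarray(column, dtype=float)
            for name, column in values.items()
        }

    def __call__(self, variable_name, period):
        # Returns one value per member of the population, as an array.
        return self._values[variable_name]


# Singular name: reads as if a single person were being computed.
def income_tax_singular(person, period):
    return person("salary", period) * 0.25


# Pluralised name: signals that the result holds one value per person.
def income_tax_plural(persons, period):
    return persons("salary", period) * 0.25


persons = Population({"salary": [1000.0, 2000.0, 4000.0]})
taxes = income_tax_plural(persons, "2018-05")
```

Both functions compute exactly the same thing. Only the parameter name changes, which is why existing country packages would not break.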
@MattiSG MattiSG added the policy:rfc Request For Comments: chime in! label May 28, 2018
@MattiSG MattiSG self-assigned this May 28, 2018
@MattiSG
Member Author

MattiSG commented May 28, 2018

I'm wondering if this should include deprecating formula helpers and replacing them with NumPy snippets? I see only 3 there, and they are not documented (openfisca/openfisca-doc#4).
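
A sketch of what such a replacement could look like. `pick_by_threshold` is a hypothetical helper written for this example, not one of the three helpers mentioned above, whose names and signatures are not given here. `numpy.select` is a real NumPy function.

```python
import numpy


def pick_by_threshold(values, thresholds, choices):
    """Hypothetical helper: return choices[i] for the first threshold that
    `values` does not exceed, and the last choice otherwise."""
    conditions = [values <= threshold for threshold in thresholds]
    return numpy.select(conditions, choices[:-1], default=choices[-1])


ages = numpy.array([5, 17, 30, 70])

# With the helper: the reader has to look up what it does.
with_helper = pick_by_threshold(ages, [17, 64], [10, 20, 30])

# With a plain NumPy snippet: the logic is visible in the formula itself.
with_numpy = numpy.select([ages <= 17, ages <= 64], [10, 20], default=30)
```

The trade-off is that the NumPy version requires the reader to know `numpy.select`, but that knowledge is documented upstream and transfers outside of OpenFisca.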

@benjello
Member

benjello commented May 28, 2018

I am not so sure that pluralizing the first argument will help a lot when the vector nature of the argument is made clearer and emphasized at the beginning of the training.
Using singular for both the entity and the first parameter helps a lot. Otherwise, you end up asking yourself: should I use a plural here or a singular? Am I dealing with the entity or the vector? From my experience, once the vector thing is well understood, the question of plural vs singular changes in nature.

@bonjourmauko
Member

Hi @MattiSG, this is IMHO a good direction to move forward 😃. I've experienced this issue myself.

> Country Template: pluralise first argument name in all formulas, using the pluralised name of the entity.
> Doc: pluralise first argument name in all formulas, using the pluralised name of the entity.
> Doc: document the recommendation to use the pluralised name of the entity as the first formula parameter.

I've spent quite a lot of my performance improvement efforts trying to understand the nature of the arguments received. Given the lack of native type checking in Python (at least < 3.6), it can be a bit cumbersome.

I think naming should more or less reflect the duck-typing of the argument. Whether it is a list, tuple, set or a numpy.ndarray, I think the argument name should be pluralised, even if the argument is an empty collection or has just one element.

(Note: this goes beyond the current RFC, but arguments, if not optional, should respect duck-typing, e.g. None should not be passed where a list is expected).

> I am not so sure that pluralizing first argument will help a lot when the vector nature of the argument is made clearer and emphasized at the beginning of the training.

I see two other arguments for this:

  1. It is easier IMHO to foster contribution if the fact that we're dealing with a vector is self-evident. We can only do a limited amount of training, for the rest we rely on the doc, the code and the tests.

  2. It is way easier to refactor and to improve code when we know more or less the signature of functions and their return type. If I can see I'm dealing with vectors, I'll adapt my refactoring approach immediately.

> Core: always import numpy rather than import numpy as np to increase discoverability of that library.

Not sure about this one, as all the code snippets I've seen so far on the internet use:

import numpy as np

@Morendil
Contributor

Morendil commented Feb 6, 2019

Closing as stale. This might well still be one of the core issues in OF, and some ideas are starting to emerge for addressing it from a different perspective (e.g. creating more affordances for directing computation based on conditions, which would eliminate the NumPy-idiomatic "logic multiply" operator in favor of something more salient and documented, as well as afford a large performance boost if initial trials prove a reliable indication).
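
For readers unfamiliar with the "logic multiply" idiom mentioned above, here is a minimal sketch with hypothetical ages and amount. `numpy.where` is shown only as a more explicit point of comparison; it is not the new affordance hinted at in this comment.

```python
import numpy

ages = numpy.array([12, 18, 45])
amount = 100.0

# "Logic multiply": booleans are coerced to 0 and 1, so the condition
# silently acts as a mask on the amount.
by_multiplication = (ages >= 18) * amount

# Explicit conditional: states the "otherwise" value.
by_where = numpy.where(ages >= 18, amount, 0.0)
```

Both expressions give the same array, but the first one only makes sense to a reader who already knows that a comparison on an array yields booleans that behave as numbers.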

@bonjourmauko
Member

I'm reopening this issue as I'd like to arrive at a consensus on this. I'll probably split it into several other issues to have more targeted discussions.

@bonjourmauko bonjourmauko reopened this Jan 5, 2020
This was referenced Dec 6, 2022
@bonjourmauko bonjourmauko added kind:roadmap A group of issues, constituting a delivery roadmap and removed kind:theme A group of issues, directly tied to an OKR policy:rfc Request For Comments: chime in! labels Nov 21, 2024

5 participants