-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document recurrent issue with scalar in formulas #134
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @guillett for having supported users on Slack and then have taken the time to incorporate your that feedback as a PR! This is really useful 👏
Beyond the couple syntax & example suggestions, I have a more general observation on the way the section reads as a whole.
Right now the suggested plan for this section is:
- What are vectors in OpenFisca
- How to manage scalars
- Why you're tempted to return scalars
- Why it won't work
- A workaround if you really need to return a scalar and the proper way to do it below won't work in your context
- How to do it correctly in most cases
- Forbidden operations and alternatives
I think it would be easier to read with the following plan:
- What are vectors in OpenFisca
- What you're used to do and what you should do instead
- Control structures
- Operations
- Returning scalars
I'm not certain if the snippet is appropriate here or if its some kind of “advanced tip” that gives too much power to newcomers to work their way around using vectors properly 🤔
@@ -20,6 +20,33 @@ array([45, 42, 17]) | |||
|
|||
Basic operations such as `+` or `*` behave the same way on vectors than on numbers, you can thus use them in OpenFisca. However, some operations and structures must be adapted. | |||
|
|||
## Scalars | |||
|
|||
As a first-time developer, it is natural to write such a formula |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can read a bit condescending. What about “We are used to work with scalars rather than vectors. We thus have a tendency to write straightforward code that returns a scalar rather than a unidimensional vector:”
## Scalars | ||
|
||
As a first-time developer, it is natural to write such a formula | ||
```py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend adding a blank line before and after fenced code blocks, as they will render on their own paragraph in any case, for similarity with standard paragraphs that need an empty line to appear as such.
As a first-time developer, it is natural to write such a formula | ||
```py | ||
def formula(person): | ||
rebate = 610 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid this example, while being straightforward, can actually be confusing as it shows that it is okay to return a magic number in a formula. Since this is the doc, it mostly targets people who are unsure about what is right and what isn't in OpenFisca, and I believe it is very important to give consistent, best-practice examples.
What about replacing 610
with some parameter fetch, or find an example where it is seemingly legit to return a scalar value from a simple computation made on a variable fetch (that is, legit until you understand how the value should be different if there were several person
)? 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is it's probably never legit to permanently return a scalar.
People usually do it temporarily, when developing a set of formulas, for instance:
I'm trying to compute
some_benefit
that depends onsome_criteria
. Right now I'd like to focus on the amount, so I'm going to havesome_criteria
returnTrue
all the time. I'll focus later on the content ofsome_criteria
's formula. Oh no, what is this error about numpy arr-what ?
Maybe we should insist more on the fact that returning a scalar is usually just a step in a modelisation process, and not something that should remain in the code ?
return rebate | ||
``` | ||
Unfortunately, this will generate an error similar to | ||
> The formula ‘rates_rebate@2016’ should return a Numpy array; instead it returned ‘610.0’ of ’<type ‘float’>’. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a blank line above as the blockquote vs paragraph parsing can vary depending on the Markdown parser.
Unfortunately, this will generate an error similar to | ||
> The formula ‘rates_rebate@2016’ should return a Numpy array; instead it returned ‘610.0’ of ’<type ‘float’>’. | ||
|
||
Indeed, formulas receive an array of entity (*n* _people_ or *m* _households_ in `country_template`) and OpenFisca expects formulas to return an array of the same shape. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to the entity
documentation page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technical precision: what formulas receive is not exactly an array of entity. It's a more complex object that represents a list of entities: I cannot map
, filter
it or even iterate on it.
I don't know how technical we want to be here, but claiming it's an array may be misleading.
Unfortunately, this will generate an error similar to | ||
> The formula ‘rates_rebate@2016’ should return a Numpy array; instead it returned ‘610.0’ of ’<type ‘float’>’. | ||
|
||
Indeed, formulas receive an array of entity (*n* _people_ or *m* _households_ in `country_template`) and OpenFisca expects formulas to return an array of the same shape. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about mentioning the country_template
here. It might be overwhelming as we can't be certain about the reader's knowledge of the template, and this phrasing implicitly directs them towards understanding it.
|
||
Indeed, formulas receive an array of entity (*n* _people_ or *m* _households_ in `country_template`) and OpenFisca expects formulas to return an array of the same shape. | ||
|
||
If you really want to return the same value for every entity, here is a snippet (that should work with any number of entity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a trailing colon for proper punctuation 🙂
|
||
Indeed, formulas receive an array of entity (*n* _people_ or *m* _households_ in `country_template`) and OpenFisca expects formulas to return an array of the same shape. | ||
|
||
If you really want to return the same value for every entity, here is a snippet (that should work with any number of entity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find these parentheses confusing (“is this important or not?”). Either remove the parentheses themselves as it is an important indication of the value of the snippet, or remove the content in parentheses altogether as the reader should not even care about why it wouldn't work if it's in the doc 😉
return np.ones(person.count) * rebate | ||
``` | ||
|
||
However, generally formulas will refer to other variables and numpy will do the appropriate computation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice bit of context! I feel it would be more adequate to move that paragraph up in order to streamline the reading experience.
@guillett I understand the amount of requested changes might be too much for the effort you wanted to put in this improvement. If that's the case and you want to hand it over to me, I'm happy to make the changes and get your review 🙂 |
Great thanks Matti. Go ahead, I wanted to be proactive about the change and dumped a few things. Reorder as you wish and think is best for new comers.
Le 19 avril 2018 18:59:50 GMT+02:00, Matti Schneider <[email protected]> a écrit :
…
@guillett I understand the amount of requested changes might be too
much for the effort you wanted to put in this improvement. If that's
the case and you want to hand it over to me, I'm happy to make the
changes and get your review 🙂
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#134 (comment)
--
Envoyé de mon appareil Android avec Courriel K-9 Mail. Veuillez excuser ma brièveté.
|
My 2 cents: I think the main point of this piece of documentation will be to have a link it in the error message pointing to the section. Dealing with scalar is not something people need to know in order to learn to code in OpenFisca. However, it is an error that beginners are likely to face, and we want to make sure that when this happens they are directed to clear explanations about what's going on. So 👍to burry this section further down. |
I have made too many changes to have an opinion on this PR.
Can someone confirm that would close #4? 🤔 |
Indeed, formulas receive an array of entity (*n* _people_ or *m* _households_ in `country_template`) and OpenFisca expects formulas to return an array of the same shape. | ||
### What happens if you don't return a vector | ||
|
||
As programmers, we are used to work with scalars rather than vectors. We thus have a tendency to write straightforward code that returns a scalar rather than a unidimensional vector, and to loop over it: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good; from my own experience where I initially missed that I was dealing with vectors was in my first few attempts to work with the country template and mimic the tests supplied in that - the example tests gave the impression everything was scalar as they were all individual scenarios. This makes sense from a testing perspective but from a learning path perspective it gave a false impression. Perhaps the country template should add to the current tests some tests that run more than one item? This could lead to better testing as I successfully wrote a test that would pass for one item but would fail for many.
I approve @MattiSG as it's fresh in my mind this would certainly have helped the learning curve, however I'd also suggest it might be worth adding a link to the NumPy docs for further reading, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already a great improvement as it is, thanks 🎉 !
See my 2 comments inline.
| `max` | `max_(x,y)` | | ||
| `round` | `round_(x,y)` | | ||
| `+` _(on strings)_ | `concat(x,y)` | | ||
If you really need to return the same value for every entity, the `np.ones` function allows you to create a vector of 1 of arbitrary length: `np.ones(persons.count) * some_parameter` will thus create a vector of the same length as the passed entities. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a quick node explaining that, while it can be useful during the development process, in a real model there is usually no reason to do that.
If a variable returns the same values for everyone, it should be a parameter rather than a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for that piece of information, I had almost added that note but thought there might be other cases in which it could be useful. If you also feel it should be a parameter, then I'll remove that note altogether. The doc should never encourage or even facilitate suboptimal implementations.
|
||
> The formula ‘tax_rebate@2018’ should return a Numpy array; instead it returned ‘1000.0’ of ’<type ‘float’>’. | ||
|
||
The rest of this page gives practical replacements for situations in which you get such errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other error that comes a lot when users treat vectors as if they were scalar (with if/else, or, and, max, min) is:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
.
Maybe it's worth mentioning it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is.
Once this is merged, we need to update the error message in Core to add a link to the added section. |
Merge case disjunction documentation into vectorial computing Stop referring to vectorial computing as a limitation
If a value should be the same for all entities, it should be a parameter
The gitbook-redirect and gitbook-bulk-redirect plugins don't work
We don't want to recommend relying on default values
That's apparently already the case: https://github.com/openfisca/openfisca-core/blob/master/openfisca_core/simulations.py#L301-L305 |
Many thanks @MattiSG ! Great job. |
No description provided.