Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document recurrent issue with scalar in formulas #134

Merged
merged 9 commits into from
May 17, 2018
Merged

Document recurrent issue with scalar in formulas #134

merged 9 commits into from
May 17, 2018

Conversation

guillett
Copy link
Member

No description provided.

Copy link
Member

@MattiSG MattiSG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @guillett for having supported users on Slack and then have taken the time to incorporate your that feedback as a PR! This is really useful 👏

Beyond the couple syntax & example suggestions, I have a more general observation on the way the section reads as a whole.

Right now the suggested plan for this section is:

  • What are vectors in OpenFisca
  • How to manage scalars
    • Why you're tempted to return scalars
    • Why it won't work
    • A workaround if you really need to return a scalar and the proper way to do it below won't work in your context
    • How to do it correctly in most cases
  • Forbidden operations and alternatives

I think it would be easier to read with the following plan:

  • What are vectors in OpenFisca
  • What you're used to do and what you should do instead
    • Control structures
    • Operations
    • Returning scalars

I'm not certain if the snippet is appropriate here or if its some kind of “advanced tip” that gives too much power to newcomers to work their way around using vectors properly 🤔

@@ -20,6 +20,33 @@ array([45, 42, 17])

Basic operations such as `+` or `*` behave the same way on vectors than on numbers, you can thus use them in OpenFisca. However, some operations and structures must be adapted.

## Scalars

As a first-time developer, it is natural to write such a formula
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can read a bit condescending. What about “We are used to work with scalars rather than vectors. We thus have a tendency to write straightforward code that returns a scalar rather than a unidimensional vector:”

## Scalars

As a first-time developer, it is natural to write such a formula
```py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend adding a blank line before and after fenced code blocks, as they will render on their own paragraph in any case, for similarity with standard paragraphs that need an empty line to appear as such.

As a first-time developer, it is natural to write such a formula
```py
def formula(person):
rebate = 610
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this example, while being straightforward, can actually be confusing as it shows that it is okay to return a magic number in a formula. Since this is the doc, it mostly targets people who are unsure about what is right and what isn't in OpenFisca, and I believe it is very important to give consistent, best-practice examples.

What about replacing 610 with some parameter fetch, or find an example where it is seemingly legit to return a scalar value from a simple computation made on a variable fetch (that is, legit until you understand how the value should be different if there were several person)? 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is it's probably never legit to permanently return a scalar.

People usually do it temporarily, when developing a set of formulas, for instance:

I'm trying to compute some_benefit that depends on some_criteria. Right now I'd like to focus on the amount, so I'm going to have some_criteria return True all the time. I'll focus later on the content of some_criteria's formula. Oh no, what is this error about numpy arr-what ?

Maybe we should insist more on the fact that returning a scalar is usually just a step in a modelisation process, and not something that should remain in the code ?

return rebate
```
Unfortunately, this will generate an error similar to
> The formula ‘rates_rebate@2016’ should return a Numpy array; instead it returned ‘610.0’ of ’<type ‘float’>’.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a blank line above as the blockquote vs paragraph parsing can vary depending on the Markdown parser.

Unfortunately, this will generate an error similar to
> The formula ‘rates_rebate@2016’ should return a Numpy array; instead it returned ‘610.0’ of ’<type ‘float’>’.

Indeed, formulas receive an array of entity (*n* _people_ or *m* _households_ in `country_template`) and OpenFisca expects formulas to return an array of the same shape.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link to the entity documentation page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technical precision: what formulas receive is not exactly an array of entity. It's a more complex object that represents a list of entities: I cannot map, filter it or even iterate on it.

I don't know how technical we want to be here, but claiming it's an array may be misleading.

Unfortunately, this will generate an error similar to
> The formula ‘rates_rebate@2016’ should return a Numpy array; instead it returned ‘610.0’ of ’<type ‘float’>’.

Indeed, formulas receive an array of entity (*n* _people_ or *m* _households_ in `country_template`) and OpenFisca expects formulas to return an array of the same shape.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about mentioning the country_template here. It might be overwhelming as we can't be certain about the reader's knowledge of the template, and this phrasing implicitly directs them towards understanding it.


Indeed, formulas receive an array of entity (*n* _people_ or *m* _households_ in `country_template`) and OpenFisca expects formulas to return an array of the same shape.

If you really want to return the same value for every entity, here is a snippet (that should work with any number of entity)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a trailing colon for proper punctuation 🙂


Indeed, formulas receive an array of entity (*n* _people_ or *m* _households_ in `country_template`) and OpenFisca expects formulas to return an array of the same shape.

If you really want to return the same value for every entity, here is a snippet (that should work with any number of entity)
Copy link
Member

@MattiSG MattiSG Apr 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find these parentheses confusing (“is this important or not?”). Either remove the parentheses themselves as it is an important indication of the value of the snippet, or remove the content in parentheses altogether as the reader should not even care about why it wouldn't work if it's in the doc 😉

return np.ones(person.count) * rebate
```

However, generally formulas will refer to other variables and numpy will do the appropriate computation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice bit of context! I feel it would be more adequate to move that paragraph up in order to streamline the reading experience.

@MattiSG
Copy link
Member

MattiSG commented Apr 19, 2018

@guillett I understand the amount of requested changes might be too much for the effort you wanted to put in this improvement. If that's the case and you want to hand it over to me, I'm happy to make the changes and get your review 🙂

@guillett
Copy link
Member Author

guillett commented Apr 19, 2018 via email

@fpagnoux
Copy link
Member

I'm not certain if the snippet is appropriate here

My 2 cents: I think the main point of this piece of documentation will be to have a link it in the error message pointing to the section.

Dealing with scalar is not something people need to know in order to learn to code in OpenFisca. However, it is an error that beginners are likely to face, and we want to make sure that when this happens they are directed to clear explanations about what's going on.

So 👍to burry this section further down.

@MattiSG MattiSG self-assigned this Apr 24, 2018
@MattiSG MattiSG requested a review from fpagnoux May 17, 2018 04:29
@MattiSG MattiSG dismissed their stale review May 17, 2018 04:30

I have made too many changes to have an opinion on this PR.

@MattiSG
Copy link
Member

MattiSG commented May 17, 2018

So this has now become a big change… What do you think of it @guillett @verbman?

@MattiSG
Copy link
Member

MattiSG commented May 17, 2018

Can someone confirm that would close #4? 🤔

Indeed, formulas receive an array of entity (*n* _people_ or *m* _households_ in `country_template`) and OpenFisca expects formulas to return an array of the same shape.
### What happens if you don't return a vector

As programmers, we are used to work with scalars rather than vectors. We thus have a tendency to write straightforward code that returns a scalar rather than a unidimensional vector, and to loop over it:
Copy link
Contributor

@verbman verbman May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good; from my own experience where I initially missed that I was dealing with vectors was in my first few attempts to work with the country template and mimic the tests supplied in that - the example tests gave the impression everything was scalar as they were all individual scenarios. This makes sense from a testing perspective but from a learning path perspective it gave a false impression. Perhaps the country template should add to the current tests some tests that run more than one item? This could lead to better testing as I successfully wrote a test that would pass for one item but would fail for many.

@verbman
Copy link
Contributor

verbman commented May 17, 2018

I approve @MattiSG as it's fresh in my mind this would certainly have helped the learning curve, however I'd also suggest it might be worth adding a link to the NumPy docs for further reading,

Copy link
Member

@fpagnoux fpagnoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already a great improvement as it is, thanks 🎉 !

See my 2 comments inline.

| `max` | `max_(x,y)` |
| `round` | `round_(x,y)` |
| `+` _(on strings)_ | `concat(x,y)` |
If you really need to return the same value for every entity, the `np.ones` function allows you to create a vector of 1 of arbitrary length: `np.ones(persons.count) * some_parameter` will thus create a vector of the same length as the passed entities.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a quick node explaining that, while it can be useful during the development process, in a real model there is usually no reason to do that.
If a variable returns the same values for everyone, it should be a parameter rather than a variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for that piece of information, I had almost added that note but thought there might be other cases in which it could be useful. If you also feel it should be a parameter, then I'll remove that note altogether. The doc should never encourage or even facilitate suboptimal implementations.


> The formula ‘tax_rebate@2018’ should return a Numpy array; instead it returned ‘1000.0’ of ’<type ‘float’>’.

The rest of this page gives practical replacements for situations in which you get such errors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other error that comes a lot when users treat vectors as if they were scalar (with if/else, or, and, max, min) is:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
Maybe it's worth mentioning it ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is.

@fpagnoux
Copy link
Member

Once this is merged, we need to update the error message in Core to add a link to the added section.

guillett and others added 5 commits May 18, 2018 09:31
Merge case disjunction documentation into vectorial computing
Stop referring to vectorial computing as a limitation
If a value should be the same for all entities, it should be a parameter
MattiSG added 4 commits May 18, 2018 11:27
The gitbook-redirect and gitbook-bulk-redirect plugins don't work
We don't want to recommend relying on default values
@MattiSG MattiSG merged commit 7e4da90 into master May 17, 2018
@MattiSG MattiSG deleted the scalars branch May 17, 2018 23:52
@MattiSG
Copy link
Member

MattiSG commented May 17, 2018

we need to update the error message in Core to add a link to the added section.

That's apparently already the case: https://github.com/openfisca/openfisca-core/blob/master/openfisca_core/simulations.py#L301-L305

@guillett
Copy link
Member Author

Many thanks @MattiSG ! Great job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants