Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function naming: pin_ctrl or valid_hetu ? #1

Closed
antagomir opened this issue Jun 26, 2020 · 20 comments
Closed

Function naming: pin_ctrl or valid_hetu ? #1

antagomir opened this issue Jun 26, 2020 · 20 comments
Labels
question Further information is requested

Comments

@antagomir
Copy link
Member

In earlier versions we had valid_hetu to check ID validity. This is now changed into pin_ctrl because the sweidnumbr package is using that name. Comments are welcome before we close this issue.

@antagomir antagomir added the question Further information is requested label Jun 26, 2020
@jukop
Copy link

jukop commented Jul 8, 2020

Personally I prefer tidyverse naming conventions: https://style.tidyverse.org/functions.html If we used that, the name could be check_validity. Of course it may be preferential to use the same names with sweidnumbr. I wonder if we can have good parts from both somehow without repeating the code..

@antagomir
Copy link
Member Author

Fully OK with me. Would be good to have a comment from sweidnumbr developer @MansMeg

Re: replicating code: I went through sweidnumbr again last week and one possible problem is that the ID systems are somewhat different in different countries. There is a certain risk that unifying the code base will bring unnecessary amount of maintenance overhead, although in general I agree that it is good idea to sync the pkgs where possible. In principle we could have a shared class structure and corresponding methods, which could be imported from sweidnumbr for instance. Then the Finnish hetu package would be a set of wrappers around that. Also need to think of who would implement and which parts.

@MansMeg
Copy link
Contributor

MansMeg commented Jul 9, 2020

Just let me know if you would need anything from sweidnumbr to facilitate it. I could export functions that you would find useful.

@MansMeg
Copy link
Contributor

MansMeg commented Jul 9, 2020

I can also chacge to tidyverse naming convensions. The sweidnumber is older than the tidyverse... =)

@jukop
Copy link

jukop commented Jul 9, 2020

Should we perhaps start by planning/listing the functions that we need and their arguments. So we would have a list of function names and their arguments and short explanation what the function does. We can use functionality of sweidnumbr as a basis for this. After that it will be easier to go thinking about classes and other more technical stuff. Also it is good to have a clear naming of functions and arguments in the beginning so that one needs not to change them from multiple files later.

@MansMeg
Copy link
Contributor

MansMeg commented Jul 9, 2020

Sounds good. I will probably implement that as wrappers initially (to minimize work). As Leo mentioned before, I think these packages will mainly be used internally in Sweden and Finland and I do not think many users would use both packages. The main benefit is probably to abide the tidyverse conventions and share functionality?

What functionality from sweidnumbr would be useful in Finland?

@jukop
Copy link

jukop commented Jul 9, 2020

Also I think that not necessarily function names need to be same with sweidnumbr. Most likely users of hetu package will be finnish and get the idea of function better if the name contains word 'hetu' rather than 'pin'. To me pin brings phone pin code to my mind. I still think it will be useful to have the same naming conventions with sweidnumbr package on function arguments. Different function names also make it less likely that someone gets confused from which package the function is. Using wrappers can be one approach on hetu package as well, I think.

@antagomir
Copy link
Member Author

Yes. The minimal set of functions and arguments would be something that is common between the two countries. In addition, there can be country-specific additional functions in each pkg.

The following shared functionality comes immediately to mind:

  • Validate the ID
  • Recognize ID type (FI/SE, person/corporation..)
  • Get sex
  • Get birth date
  • Get age

Not sure if we should launch a separate planning document, or use Github repo wiki function to systematically collect and refine the ideas(?).

@antagomir
Copy link
Member Author

.. and yes, not everything needs to be shared but if we use class structures then it might make sense to use a generic class structure that applies to both cases and can have shared methods (to retrieve age, sex etc).

@jukop
Copy link

jukop commented Jul 9, 2020

Perhaps we go with the generic names for the sake of sharing the class structure then and use package specific wrappers to implement good understandable names for the users of each package? Who knows if someone wants to make similar package for some third country later.

@antagomir
Copy link
Member Author

Yes. Let's just be careful with overheads, complexity easily increases with synchronization, too. Having further countries in the loop would be ideal, let's prepare for that.

@MansMeg
Copy link
Contributor

MansMeg commented Jul 9, 2020

I agree that "pin" may be strange in Finnish, but it is also strange in Swedish. I used the English name to enable non-Swedish speakers to understand the concepts (personal identity number and oin, organizational identity number). I think using English for the basic concepts is probably a reasonable approach also in Finnish?

@antagomir
Copy link
Member Author

One possibility is to decide on the shared standards, then have wrappers / aliases to implement country-specific helper functions.

@pitkant
Copy link
Member

pitkant commented Sep 3, 2020

Currently hetu and sweidnumbr share the following similarly named functions:

  • pin_age (Get age)
  • pin_ctrl (Validate the ID)
  • pin_sex (Get sex)
  • pin_to_date (Get birth date)
  • rpin (Generate random valid ID numbers)

I personally find pin to be alright. Curiously it seemed that in English Wikipedia the term "Personal Identity Number" is used only in Sweden's case, whereas other countries call it national identification number, national identity number etc. Finnish authorities seem to prefer the term "personal identity code" (PIC) or even the old-fashioned "social security number".

I am not so sure anymore whether sharing function names with sweidnumbr is wise, if the abbreviation "PIN" is neither used nor brings the correct connotations to mind (like phone pin code) when using it in the Finnish context. Using names like hetu_age, hetu_ctrl, hetu_sex and hetu_to_date (or hetu_birthdate) and rhetu might be clearer but would also remove the risk of using a function from wrong package when having both packages loaded simultenously.

@antagomir
Copy link
Member Author

antagomir commented Sep 3, 2020

I would be ok with either solution. It is potentially useful to have same names across different countries but since those functions (with same name) come from different packages there is indeed a risk of confusion. The "hetu" naming would be well justified and intuively more clear in the Finnish context.

One option is to have the "hetu" names as additional aliases. But this may add too much complexity.

I am not sure how much the unification of the names between the two packages is needed. At least it could facilitate later harmonization. On the other hand, if we now use country specific acronyms (pin for SE and hetu for FI), then some later unified version could introduce an entirely new acronym that would avoid the confusion.

If I would need to choose, I tend to prefer "hetu" names because the main userbase is in Finland anyway, and it is not clear how many use cases there will be for the two packages simultaneously.

Might be useful to hear some feedback from others. In particular @MansMeg @juuussi

@MansMeg
Copy link
Contributor

MansMeg commented Sep 3, 2020 via email

@antagomir
Copy link
Member Author

It is definitely useful to share common functionalities. @pitkant might be able to tell more about possible overlapping parts (if any) now after the latest developments. The functionality itself is similar (picking age, sex, etc) but the technical implementation (functions) are different because of the different structure of pins/hetus between the two countries. Hence, I am not sure how much can be shared but I agree that everything that can be naturally reused from sweidnumbr, should be reused.

In my understanding, the Luhn algorithm is not used in the Finnish context but corrections are welcome.

@pitkant
Copy link
Member

pitkant commented Sep 4, 2020

Sweidnumbr has all pin-related functions in the same pin.R file, whereas in hetu package different functions are in different files. In sweidnumbr many functions such as pin_sex and pin_to_date actually contain the code that is used to extract sex or birthdate out of the pin. In hetu package every pin goes through all the operations in hetu function and relevant information can be then extracted from the data frame, essentially meaning that hetu::pin_sex(pin) or hetu::pin_date(pin) are just fancier ways to write hetu(pin, extract = "sex") or hetu(pin, extract = "date").

Functions that are practically identical in both packages:

  • pin_age
  • pin_sex (although it is embedded in

Functions that are quite similar and could probably be made more similar, if wanted:

  • sweidnumbr::pin_coordn is quite similar to hetu-function check for temporary / artificial pins
  • sweidnumbr::pin_to_date uses long form YYYYMMDD to produce a date whereas hetu::pin_to_date (or pin_date) uses date produced by hetu-function by parsing together information from short form YYMMDD date and century marker (1800, 1900 or 2000). I see no need to combine these as pins in YYYYMMDD form are never used in Finland

Functions that share the same name but actually do slightly things:

  • sweidnumbr::pin_ctrl counts the control number using Luhn algorithm, whereas hetu::pin_ctrl checks if the PIN successfully passes hetu-function and produces a valid data frame. Of course the control number produced by Luhn algorithm somewhat checks that there are no input errors, but not always (if, for example, day and month switch places).

Functions that are exclusive to sweidnumbr:

  • all OIN related functions
  • pin_birthplace (no such information can be extracted in Finland)
  • luhn_algo -function (although we discussed with Leo whether the modulo 31 method used in hetu-pins should be made to its own function)

@antagomir
Copy link
Member Author

Great, thanks for the summary. Seem pretty ok like this to me at least.

@pitkant
Copy link
Member

pitkant commented Jun 15, 2021

In version 1.0.1 the package uses pin_ctrl, pin_sex etc. type function names and hetu_ctrl, hetu_sex as aliases. The latter type is also used in all examples and vignettes and therefore the former style could quite easily become deprecated in the future versions, but that'll be a discussion for a possible future release.

@pitkant pitkant closed this as completed Jun 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants