✨ birth rate monthly improvements #3693

lucasrodes · 2024-12-04T17:07:50Z

Improve metadata: names and descriptions.
Add values shifted by -9 months, to estimate the month of conception that leads to the peak birth rate in a given year.
Fix a merge operation that was losing data because it used an INNER join.
Make minor code readability improvements.
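
For reviewers, here is a minimal sketch of the two data changes, not the actual ETL code. It uses plain Python with toy rows so it runs without dependencies. The `join` and `shift_month` helpers, the sample values, and the idea that the fix amounts to an outer-style join are all assumptions made for illustration. In pandas, the same contrast is `merge(..., how="inner")` versus `merge(..., how="outer")`.

```python
# Toy rows keyed by (country, year); names and values are illustrative only.
peak = {
    ("France", 1860): {"month_max": 3},
    ("France", 1861): {"month_max": 2},
    ("Japan", 1898): {"month_max": 1},
}
peak_lead = {
    ("France", 1860): {"month_max_lead_9months": 7},
    ("Japan", 1898): {"month_max_lead_9months": 4},
    ("Japan", 1899): {"month_max_lead_9months": 5},
}


def join(left, right, how):
    """Join two key -> row mappings; columns missing on one side become None."""
    if how == "inner":
        keys = left.keys() & right.keys()  # only keys present in both tables
    elif how == "outer":
        keys = left.keys() | right.keys()  # keys present in either table
    else:
        raise ValueError(f"unsupported join type: {how}")
    columns = {c for table in (left, right) for row in table.values() for c in row}
    empty = dict.fromkeys(columns)  # every column defaults to None
    return {k: {**empty, **left.get(k, {}), **right.get(k, {})} for k in sorted(keys)}


def shift_month(year, month, offset):
    """Shift a (year, month) pair by `offset` months (negative means earlier)."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


inner = join(peak, peak_lead, "inner")
outer = join(peak, peak_lead, "outer")

# The inner join silently drops the rows that exist in only one table.
print(len(inner), len(outer))

# A row kept only by the outer join has no value for the unshifted column.
print(outer[("Japan", 1899)])

# A birth peak in March 2024 maps back to June 2023 under a 9-month shift.
print(shift_month(2024, 3, -9))
```

Rows that survive only the wider join carry empty values in the unshifted columns, which is the same shape as the "New values" rows reported by data-diff for this PR.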

/schedule

owidbot · 2024-12-04T17:10:08Z

Quick links (staging server):

Site Dev · Site Preview · Admin · Wizard · Docs

Login: ssh owid@staging-site-enhance-birth-rate-improvements

chart-diff: ✅

No charts for review.

data-diff: ❌ Found differences

= Dataset garden/hmd/2024-12-03/hmd_country
  = Table birth_rate_month_max
    ~ Dim country
+       + New values: 61 / 3998 (1.53%)
           year      country
           1924     Bulgaria
           1860       France
           1955       Greece
           1898        Japan
           1945 West Germany
    ~ Dim year
+       + New values: 61 / 3998 (1.53%)
               country  year
              Bulgaria  1924
                France  1860
                Greece  1955
                 Japan  1898
          West Germany  1945
    ~ Column birth_rate_per_day_max (changed metadata, new data, changed data)
-       - title: Peak daily birth rate
+       + title: Peak birth rate per day, on a monthly basis
-       - description_short: The highest average daily number of births, per 1,000 people, recorded in the given year.
        ?                                                                    ^^^^^
+       + description_short: The highest average daily number of births, per million people, recorded in the given year.
        ?                                                                    ^^^^^^^
-       - unit: births per 1,000 people
        ?                  ^^^^^
+       + unit: births per million people
        ?                  ^^^^^^^

+       + New values: 61 / 3998 (1.53%)
               country  year  birth_rate_per_day_max
              Bulgaria  1924                    <NA>
                France  1860                    <NA>
                Greece  1955                    <NA>
                 Japan  1898                    <NA>
          West Germany  1945                    <NA>
        ~ Changed values: 42 / 3998 (1.05%)
                    country  year  birth_rate_per_day_max -  birth_rate_per_day_max +
                   Bulgaria  2021                 24.578798                 24.764009
          England and Wales  2021                 30.864176                 30.850918
                New Zealand  2020                 32.752079                 32.464931
                   Portugal  2022                 24.763571                 24.690685
                     Sweden  2023                 28.571289                 28.543976
    ~ Column birth_rate_per_day_max_lead_9months (changed metadata, new data, changed data)
-       - {}
+       + title: Peak birth rate per day, on a monthly basis, in 9 months
+       + description_short: The highest average daily number of births, per million people, recorded in the given year.
+       + origins:
+       +   - producer: Human Mortality Database
+       +     title: Human Mortality Database
+       +     description: |-
+       +       The Human Mortality Database (HMD) contains original calculations of all-cause death rates and life tables for national populations (countries or areas), as well as the input data used in constructing those tables. The input data consist of death counts from vital statistics, plus census counts, birth counts, and population estimates from various sources.
+       + 
+       + 
+       +       # Scope and basic principles
+       + 
+       +       The database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included here are relatively wealthy and for the most part highly industrialized.
+       + 
+       +       The main goal of the Human Mortality Database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. As much as possible, the authors of the database have followed four guiding principles: comparability, flexibility, accessibility, reproducibility.
+       + 
+       + 
+       +       # Computing death rates and life tables
+       + 
+       +       Their process for computing mortality rates and life tables can be described in terms of six steps, corresponding to six data types that are available from the HMD. Here is an overview of the process:
+       + 
+       +       1. Births. Annual counts of live births by sex are collected for each population over the longest possible time period. These counts are used mainly for making population estimates at younger ages.
+       +       2. Deaths. Death counts are collected at the finest level of detail available. If raw data are aggregated, uniform methods are used to estimate death counts by completed age (i.e., age-last-birthday at time of death), calendar year of death, and calendar year of birth.
+       +       3. Population size. Annual estimates of population size on January 1st are either obtained from another source or are derived from census data plus birth and death counts.
+       +       4. Exposure-to-risk. Estimates of the population exposed to the risk of death during some age-time interval are based on annual (January 1st) population estimates, with a small correction that reflects the timing of deaths within the interval.
+       +       5. Death rates. Death rates are always a ratio of the death count for a given age-time interval divided by an estimate of the exposure-to-risk in the same interval.
+       +       6. Life tables. To build a life table, probabilities of death are computed from death rates. These probabilities are used to construct life tables, which include life expectancies and other useful indicators of mortality and longevity.
+       + 
+       + 
+       +       # Corrections to the data
+       + 
+       +       The data presented here have been corrected for gross errors (e.g., a processing error whereby 3,800 becomes 38,000 in a published statistical table would be obvious in most cases, and it would be corrected). However, the authors have not attempted to correct the data for systematic age misstatement (misreporting of age) or coverage errors (over- or under-enumeration of people or events).
+       + 
+       +       Some available studies assess the completeness of census coverage or death registration in the various countries, and more work is needed in this area. However, in developing the database thus far, the authors did not consider it feasible or desirable to attempt corrections of this sort, especially since it would be impossible to correct the data by a uniform method across all countries.
+       + 
+       + 
+       +       # Age misreporting
+       + 
+       +       Populations are included here if there is a well-founded belief that the coverage of their census and vital registration systems is relatively high, and thus, that fruitful analyses by both specialists and non-specialists should be possible with these data. Nevertheless, there is evidence of both age heaping (overreporting ages ending in "0" or "5") and age exaggeration in these data.
+       + 
+       +       In general, the degree of age heaping in these data varies by the time period and population considered, but it is usually no burden to scientific analysis. In most cases, it is sufficient to analyze data in five-year age groups in order to avoid the false impressions created by this particular form of age misstatement.
+       + 
+       +       Age exaggeration, on the other hand, is a more insidious problem. The authors' approach is guided by the conventional wisdom that age reporting in death registration systems is typically more reliable than in census counts or official population estimates. For this reason, the authors derive population estimates at older ages from the death counts themselves, employing extinct cohort methods. Such methods eliminate some, but certainly not all, of the biases in old-age mortality estimates due to age exaggeration.
+       + 
+       + 
+       +       # Uniform set of procedures
+       + 
+       +       A key goal of this project is to follow a uniform set of procedures for each population. This approach does not guarantee the cross-national comparability of the data. Rather, it ensures only that the authors have not introduced biases by the authors' own manipulations. The desire of the authors for uniformity had to face the challenge that raw data come in a variety of formats (for example, 1-year versus 5-year age groups). The authors' general approach to this problem is that the available raw data are used first to estimate two quantities: 1) the number of deaths by completed age, year of birth, and year of death; and 2) population estimates by single years of age on January 1 of each year. For each population, these calculations are performed separately by sex. From these two pieces of information, they compute death rates and life tables in a variety of age-time configurations.
+       + 
+       +       It is reasonable to ask whether a single procedure is the best method for treating the data from a variety of populations. Here, two points must be considered. First, the authors' uniform methodology is based on procedures that were developed separately, though following similar principles, for various countries and by different researchers. Earlier methods were synthesized by choosing what they considered the best among alternative procedures and by eliminating superficial inconsistencies. The second point is that a uniform procedure is possible only because the authors have not attempted to correct the data for reporting and coverage errors. Although some general principles could be followed, such problems would have to be addressed individually for each population.
+       + 
+       +       Although the authors adhere strictly to a uniform procedure, the data for each population also receive significant individualized attention. Each country or area is assigned to an individual researcher, who takes responsibility for assembling and checking the data for errors. In addition, the person assigned to each country/area checks the authors' data against other available sources. These procedures help to assure a high level of data quality, but assistance from database users in identifying problems is always appreciated!
+       +     citation_full: |-
+       +       HMD. Human Mortality Database. Max Planck Institute for Demographic Research (Germany), University of California, Berkeley (USA), and French Institute for Demographic Studies (France). Available at www.mortality.org.
+       + 
+       +       See also the methods protocol:
+       +       Wilmoth, J. R., Andreev, K., Jdanov, D., Glei, D. A., Riffe, T., Boe, C., Bubenheim, M., Philipov, D., Shkolnikov, V., Vachon, P., Winant, C., & Barbieri, M. (2021). Methods protocol for the human mortality database (v6). [Available online](https://www.mortality.org/File/GetDocument/Public/Docs/MethodsProtocolV6.pdf) (needs log in to mortality.org).
+       +     attribution_short: HMD
+       +     url_main: https://www.mortality.org/Data/ZippedDataFiles
+       +     date_accessed: '2024-11-27'
+       +     date_published: '2024-11-13'
+       +     license:
+       +       name: CC BY 4.0
+       +       url: https://www.mortality.org/Data/UserAgreement
+       +   - producer: Human Mortality Database
+       +     title: Human Mortality Database, by country
+       +     description: |-
+       +       The Human Mortality Database (HMD) contains original calculations of all-cause death rates and life tables for national populations (countries or areas), as well as the input data used in constructing those tables. The input data consist of death counts from vital statistics, plus census counts, birth counts, and population estimates from various sources.
+       + 
+       + 
+       +       # Scope and basic principles
+       + 
+       +       The database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included here are relatively wealthy and for the most part highly industrialized.
+       + 
+       +       The main goal of the Human Mortality Database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. As much as possible, the authors of the database have followed four guiding principles: comparability, flexibility, accessibility, reproducibility.
+       + 
+       + 
+       +       # Computing death rates and life tables
+       + 
+       +       Their process for computing mortality rates and life tables can be described in terms of six steps, corresponding to six data types that are available from the HMD. Here is an overview of the process:
+       + 
+       +       1. Births. Annual counts of live births by sex are collected for each population over the longest possible time period. These counts are used mainly for making population estimates at younger ages.
+       +       2. Deaths. Death counts are collected at the finest level of detail available. If raw data are aggregated, uniform methods are used to estimate death counts by completed age (i.e., age-last-birthday at time of death), calendar year of death, and calendar year of birth.
+       +       3. Population size. Annual estimates of population size on January 1st are either obtained from another source or are derived from census data plus birth and death counts.
+       +       4. Exposure-to-risk. Estimates of the population exposed to the risk of death during some age-time interval are based on annual (January 1st) population estimates, with a small correction that reflects the timing of deaths within the interval.
+       +       5. Death rates. Death rates are always a ratio of the death count for a given age-time interval divided by an estimate of the exposure-to-risk in the same interval.
+       +       6. Life tables. To build a life table, probabilities of death are computed from death rates. These probabilities are used to construct life tables, which include life expectancies and other useful indicators of mortality and longevity.
+       + 
+       + 
+       +       # Corrections to the data
+       + 
+       +       The data presented here have been corrected for gross errors (e.g., a processing error whereby 3,800 becomes 38,000 in a published statistical table would be obvious in most cases, and it would be corrected). However, the authors have not attempted to correct the data for systematic age misstatement (misreporting of age) or coverage errors (over- or under-enumeration of people or events).
+       + 
+       +       Some available studies assess the completeness of census coverage or death registration in the various countries, and more work is needed in this area. However, in developing the database thus far, the authors did not consider it feasible or desirable to attempt corrections of this sort, especially since it would be impossible to correct the data by a uniform method across all countries.
+       + 
+       + 
+       +       # Age misreporting
+       + 
+       +       Populations are included here if there is a well-founded belief that the coverage of their census and vital registration systems is relatively high, and thus, that fruitful analyses by both specialists and non-specialists should be possible with these data. Nevertheless, there is evidence of both age heaping (overreporting ages ending in "0" or "5") and age exaggeration in these data.
+       + 
+       +       In general, the degree of age heaping in these data varies by the time period and population considered, but it is usually no burden to scientific analysis. In most cases, it is sufficient to analyze data in five-year age groups in order to avoid the false impressions created by this particular form of age misstatement.
+       + 
+       +       Age exaggeration, on the other hand, is a more insidious problem. The authors' approach is guided by the conventional wisdom that age reporting in death registration systems is typically more reliable than in census counts or official population estimates. For this reason, the authors derive population estimates at older ages from the death counts themselves, employing extinct cohort methods. Such methods eliminate some, but certainly not all, of the biases in old-age mortality estimates due to age exaggeration.
+       + 
+       + 
+       +       # Uniform set of procedures
+       + 
+       +       A key goal of this project is to follow a uniform set of procedures for each population. This approach does not guarantee the cross-national comparability of the data. Rather, it ensures only that the authors have not introduced biases by the authors' own manipulations. The desire of the authors for uniformity had to face the challenge that raw data come in a variety of formats (for example, 1-year versus 5-year age groups). The authors' general approach to this problem is that the available raw data are used first to estimate two quantities: 1) the number of deaths by completed age, year of birth, and year of death; and 2) population estimates by single years of age on January 1 of each year. For each population, these calculations are performed separately by sex. From these two pieces of information, they compute death rates and life tables in a variety of age-time configurations.
+       + 
+       +       It is reasonable to ask whether a single procedure is the best method for treating the data from a variety of populations. Here, two points must be considered. First, the authors' uniform methodology is based on procedures that were developed separately, though following similar principles, for various countries and by different researchers. Earlier methods were synthesized by choosing what they considered the best among alternative procedures and by eliminating superficial inconsistencies. The second point is that a uniform procedure is possible only because the authors have not attempted to correct the data for reporting and coverage errors. Although some general principles could be followed, such problems would have to be addressed individually for each population.
+       + 
+       +       Although the authors adhere strictly to a uniform procedure, the data for each population also receive significant individualized attention. Each country or area is assigned to an individual researcher, who takes responsibility for assembling and checking the data for errors. In addition, the person assigned to each country/area checks the authors' data against other available sources. These procedures help to assure a high level of data quality, but assistance from database users in identifying problems is always appreciated!
+       +     description_snapshot: |-
+       +       HMD data by country. This contains the raw data, including their "input data", which HMD defines as:
+       + 
+       +       The Input Database houses the raw data that are the basis for all HMD calculations. Input data files for each population are accessible from the country page.
+       +     citation_full: |-
+       +       HMD. Human Mortality Database. Max Planck Institute for Demographic Research (Germany), University of California, Berkeley (USA), and French Institute for Demographic Studies (France). Available at www.mortality.org.
+       + 
+       +       See also the methods protocol:
+       +       Wilmoth, J. R., Andreev, K., Jdanov, D., Glei, D. A., Riffe, T., Boe, C., Bubenheim, M., Philipov, D., Shkolnikov, V., Vachon, P., Winant, C., & Barbieri, M. (2021). Methods protocol for the human mortality database (v6). [Available online](https://www.mortality.org/File/GetDocument/Public/Docs/MethodsProtocolV6.pdf) (needs log in to mortality.org).
+       +     attribution_short: HMD
+       +     url_main: https://www.mortality.org/Data/ZippedDataFiles
+       +     date_accessed: '2024-11-27'
+       +     date_published: '2024-11-13'
+       +     license:
+       +       name: CC BY 4.0
+       +       url: https://www.mortality.org/Data/UserAgreement
+       + unit: births per million people
+       + display:
+       +   name: Maximum birth rate, per day
+       + presentation:
+       +   topic_tags:
+       +     - Fertility Rate

+       + New values: 61 / 3998 (1.53%)
               country  year  birth_rate_per_day_max_lead_9months
              Bulgaria  1924                            97.067017
                France  1860                            82.915085
                Greece  1955                            54.558788
                 Japan  1898                            59.609818
          West Germany  1945                            41.749641
        ~ Changed values: 3937 / 3998 (98.47%)
           country  year  birth_rate_per_day_max_lead_9months -  birth_rate_per_day_max_lead_9months +
           Denmark  2022                                    NaN                              28.423363
           Estonia  2007                                    NaN                              35.612888
           Iceland  1937                                    NaN                              65.086006
          Portugal  1988                                    NaN                              34.387554
            Sweden  1963                                    NaN                               51.30164
    ~ Column month_max (changed metadata, new data)
-       - title: Month ordinal with the peak daily birth rate
        ?                           ----
+       + title: Month ordinal with peak daily birth rate

+       + New values: 61 / 3998 (1.53%)
               country  year  month_max
              Bulgaria  1924       <NA>
                France  1860       <NA>
                Greece  1955       <NA>
                 Japan  1898       <NA>
          West Germany  1945       <NA>
    ~ Column month_max_lead_9months (changed metadata, new data, changed data)
-       - {}
+       + title: Month ordinal with peak daily birth rate in 9 months
+       + description_short: Number corresponding to the month with the highest daily birth rate.
+       + origins:
+       +   - producer: Human Mortality Database
+       +     title: Human Mortality Database, by country
+       +     description: |-
+       +       The Human Mortality Database (HMD) contains original calculations of all-cause death rates and life tables for national populations (countries or areas), as well as the input data used in constructing those tables. The input data consist of death counts from vital statistics, plus census counts, birth counts, and population estimates from various sources.
+       + 
+       + 
+       +       # Scope and basic principles
+       + 
+       +       The database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included here are relatively wealthy and for the most part highly industrialized.
+       + 
+       +       The main goal of the Human Mortality Database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. As much as possible, the authors of the database have followed four guiding principles: comparability, flexibility, accessibility, reproducibility.
+       + 
+       + 
+       +       # Computing death rates and life tables
+       + 
+       +       Their process for computing mortality rates and life tables can be described in terms of six steps, corresponding to six data types that are available from the HMD. Here is an overview of the process:
+       + 
+       +       1. Births. Annual counts of live births by sex are collected for each population over the longest possible time period. These counts are used mainly for making population estimates at younger ages.
+       +       2. Deaths. Death counts are collected at the finest level of detail available. If raw data are aggregated, uniform methods are used to estimate death counts by completed age (i.e., age-last-birthday at time of death), calendar year of death, and calendar year of birth.
+       +       3. Population size. Annual estimates of population size on January 1st are either obtained from another source or are derived from census data plus birth and death counts.
+       +       4. Exposure-to-risk. Estimates of the population exposed to the risk of death during some age-time interval are based on annual (January 1st) population estimates, with a small correction that reflects the timing of deaths within the interval.
+       +       5. Death rates. Death rates are always a ratio of the death count for a given age-time interval divided by an estimate of the exposure-to-risk in the same interval.
+       +       6. Life tables. To build a life table, probabilities of death are computed from death rates. These probabilities are used to construct life tables, which include life expectancies and other useful indicators of mortality and longevity.
+       + 
+       + 
+       +       # Corrections to the data
+       + 
+       +       The data presented here have been corrected for gross errors (e.g., a processing error whereby 3,800 becomes 38,000 in a published statistical table would be obvious in most cases, and it would be corrected). However, the authors have not attempted to correct the data for systematic age misstatement (misreporting of age) or coverage errors (over- or under-enumeration of people or events).
+       + 
+       +       Some available studies assess the completeness of census coverage or death registration in the various countries, and more work is needed in this area. However, in developing the database thus far, the authors did not consider it feasible or desirable to attempt corrections of this sort, especially since it would be impossible to correct the data by a uniform method across all countries.
+       + 
+       + 
+       +       # Age misreporting
+       + 
+       +       Populations are included here if there is a well-founded belief that the coverage of their census and vital registration systems is relatively high, and thus, that fruitful analyses by both specialists and non-specialists should be possible with these data. Nevertheless, there is evidence of both age heaping (overreporting ages ending in "0" or "5") and age exaggeration in these data.
+       + 
+       +       In general, the degree of age heaping in these data varies by the time period and population considered, but it is usually no burden to scientific analysis. In most cases, it is sufficient to analyze data in five-year age groups in order to avoid the false impressions created by this particular form of age misstatement.
+       + 
+       +       Age exaggeration, on the other hand, is a more insidious problem. The authors' approach is guided by the conventional wisdom that age reporting in death registration systems is typically more reliable than in census counts or official population estimates. For this reason, the authors derive population estimates at older ages from the death counts themselves, employing extinct cohort methods. Such methods eliminate some, but certainly not all, of the biases in old-age mortality estimates due to age exaggeration.
+       + 
+       + 
+       +       # Uniform set of procedures
+       + 
+       +       A key goal of this project is to follow a uniform set of procedures for each population. This approach does not guarantee the cross-national comparability of the data. Rather, it ensures only that the authors have not introduced biases by the authors' own manipulations. The desire of the authors for uniformity had to face the challenge that raw data come in a variety of formats (for example, 1-year versus 5-year age groups). The authors' general approach to this problem is that the available raw data are used first to estimate two quantities: 1) the number of deaths by completed age, year of birth, and year of death; and 2) population estimates by single years of age on January 1 of each year. For each population, these calculations are performed separately by sex. From these two pieces of information, they compute death rates and life tables in a variety of age-time configurations.
+       + 
+       +       It is reasonable to ask whether a single procedure is the best method for treating the data from a variety of populations. Here, two points must be considered. First, the authors' uniform methodology is based on procedures that were developed separately, though following similar principles, for various countries and by different researchers. Earlier methods were synthesized by choosing what they considered the best among alternative procedures and by eliminating superficial inconsistencies. The second point is that a uniform procedure is possible only because the authors have not attempted to correct the data for reporting and coverage errors. Although some general principles could be followed, such problems would have to be addressed individually for each population.
+       + 
+       +       Although the authors adhere strictly to a uniform procedure, the data for each population also receive significant individualized attention. Each country or area is assigned to an individual researcher, who takes responsibility for assembling and checking the data for errors. In addition, the person assigned to each country/area checks the authors' data against other available sources. These procedures help to assure a high level of data quality, but assistance from database users in identifying problems is always appreciated!
+       +     description_snapshot: |-
+       +       HMD data by country. This contains the raw data, including their "input data", which HMD defines as:
+       + 
+       +       The Input Database houses the raw data that are the basis for all HMD calculations. Input data files for each population are accessible from the country page.
+       +     citation_full: |-
+       +       HMD. Human Mortality Database. Max Planck Institute for Demographic Research (Germany), University of California, Berkeley (USA), and French Institute for Demographic Studies (France). Available at www.mortality.org.
+       + 
+       +       See also the methods protocol:
+       +       Wilmoth, J. R., Andreev, K., Jdanov, D., Glei, D. A., Riffe, T., Boe, C., Bubenheim, M., Philipov, D., Shkolnikov, V., Vachon, P., Winant, C., & Barbieri, M. (2021). Methods protocol for the human mortality database (v6). [Available online](https://www.mortality.org/File/GetDocument/Public/Docs/MethodsProtocolV6.pdf) (needs log in to mortality.org).
+       +     attribution_short: HMD
+       +     url_main: https://www.mortality.org/Data/ZippedDataFiles
+       +     date_accessed: '2024-11-27'
+       +     date_published: '2024-11-13'
+       +     license:
+       +       name: CC BY 4.0
+       +       url: https://www.mortality.org/Data/UserAgreement
+       + unit: ''
+       + presentation:
+       +   topic_tags:
+       +     - Fertility Rate

+       + New values: 61 / 3998 (1.53%)
               country  year  month_max_lead_9months
              Bulgaria  1924                       5
                France  1860                       7
                Greece  1955                       4
                 Japan  1898                       4
          West Germany  1945                       9
        ~ Changed values: 3937 / 3998 (98.47%)
           country  year  month_max_lead_9months -  month_max_lead_9months +
           Denmark  2022                       NaN                        10
           Estonia  2007                       NaN                        10
           Iceland  1937                       NaN                         1
          Portugal  1988                       NaN                         8
            Sweden  1963                       NaN                         7
    ~ Column month_max_name (changed metadata, new data)
-       - title: Month name with the peak daily birth rate
        ?                        ----
+       + title: Month name with peak daily birth rate

+       + New values: 61 / 3998 (1.53%)
               country  year month_max_name
              Bulgaria  1924            NaN
                France  1860            NaN
                Greece  1955            NaN
                 Japan  1898            NaN
          West Germany  1945            NaN
    ~ Column month_max_name_lead_9months (changed metadata, new data, changed data)
-       - {}
+       + title: Month name with peak daily birth rate in 9 months
+       + description_short: Month with the highest daily birth rate.
+       + origins:
+       +   - producer: Human Mortality Database
+       +     title: Human Mortality Database, by country
+       +     description: |-
+       +       The Human Mortality Database (HMD) contains original calculations of all-cause death rates and life tables for national populations (countries or areas), as well as the input data used in constructing those tables. The input data consist of death counts from vital statistics, plus census counts, birth counts, and population estimates from various sources.
+       + 
+       + 
+       +       # Scope and basic principles
+       + 
+       +       The database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included here are relatively wealthy and for the most part highly industrialized.
+       + 
+       +       The main goal of the Human Mortality Database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. As much as possible, the authors of the database have followed four guiding principles: comparability, flexibility, accessibility, reproducibility.
+       + 
+       + 
+       +       # Computing death rates and life tables
+       + 
+       +       Their process for computing mortality rates and life tables can be described in terms of six steps, corresponding to six data types that are available from the HMD. Here is an overview of the process:
+       + 
+       +       1. Births. Annual counts of live births by sex are collected for each population over the longest possible time period. These counts are used mainly for making population estimates at younger ages.
+       +       2. Deaths. Death counts are collected at the finest level of detail available. If raw data are aggregated, uniform methods are used to estimate death counts by completed age (i.e., age-last-birthday at time of death), calendar year of death, and calendar year of birth.
+       +       3. Population size. Annual estimates of population size on January 1st are either obtained from another source or are derived from census data plus birth and death counts.
+       +       4. Exposure-to-risk. Estimates of the population exposed to the risk of death during some age-time interval are based on annual (January 1st) population estimates, with a small correction that reflects the timing of deaths within the interval.
+       +       5. Death rates. Death rates are always a ratio of the death count for a given age-time interval divided by an estimate of the exposure-to-risk in the same interval.
+       +       6. Life tables. To build a life table, probabilities of death are computed from death rates. These probabilities are used to construct life tables, which include life expectancies and other useful indicators of mortality and longevity.
+       + 
+       + 
+       +       # Corrections to the data
+       + 
+       +       The data presented here have been corrected for gross errors (e.g., a processing error whereby 3,800 becomes 38,000 in a published statistical table would be obvious in most cases, and it would be corrected). However, the authors have not attempted to correct the data for systematic age misstatement (misreporting of age) or coverage errors (over- or under-enumeration of people or events).
+       + 
+       +       Some available studies assess the completeness of census coverage or death registration in the various countries, and more work is needed in this area. However, in developing the database thus far, the authors did not consider it feasible or desirable to attempt corrections of this sort, especially since it would be impossible to correct the data by a uniform method across all countries.
+       + 
+       + 
+       +       # Age misreporting
+       + 
+       +       Populations are included here if there is a well-founded belief that the coverage of their census and vital registration systems is relatively high, and thus, that fruitful analyses by both specialists and non-specialists should be possible with these data. Nevertheless, there is evidence of both age heaping (overreporting ages ending in "0" or "5") and age exaggeration in these data.
+       + 
+       +       In general, the degree of age heaping in these data varies by the time period and population considered, but it is usually no burden to scientific analysis. In most cases, it is sufficient to analyze data in five-year age groups in order to avoid the false impressions created by this particular form of age misstatement.
+       + 
+       +       Age exaggeration, on the other hand, is a more insidious problem. The authors' approach is guided by the conventional wisdom that age reporting in death registration systems is typically more reliable than in census counts or official population estimates. For this reason, the authors derive population estimates at older ages from the death counts themselves, employing extinct cohort methods. Such methods eliminate some, but certainly not all, of the biases in old-age mortality estimates due to age exaggeration.
+       + 
+       + 
+       +       # Uniform set of procedures
+       + 
+       +       A key goal of this project is to follow a uniform set of procedures for each population. This approach does not guarantee the cross-national comparability of the data. Rather, it ensures only that the authors have not introduced biases by the authors' own manipulations. The desire of the authors for uniformity had to face the challenge that raw data come in a variety of formats (for example, 1-year versus 5-year age groups). The authors' general approach to this problem is that the available raw data are used first to estimate two quantities: 1) the number of deaths by completed age, year of birth, and year of death; and 2) population estimates by single years of age on January 1 of each year. For each population, these calculations are performed separately by sex. From these two pieces of information, they compute death rates and life tables in a variety of age-time configurations.
+       + 
+       +       It is reasonable to ask whether a single procedure is the best method for treating the data from a variety of populations. Here, two points must be considered. First, the authors' uniform methodology is based on procedures that were developed separately, though following similar principles, for various countries and by different researchers. Earlier methods were synthesized by choosing what they considered the best among alternative procedures and by eliminating superficial inconsistencies. The second point is that a uniform procedure is possible only because the authors have not attempted to correct the data for reporting and coverage errors. Although some general principles could be followed, such problems would have to be addressed individually for each population.
+       + 
+       +       Although the authors adhere strictly to a uniform procedure, the data for each population also receive significant individualized attention. Each country or area is assigned to an individual researcher, who takes responsibility for assembling and checking the data for errors. In addition, the person assigned to each country/area checks the authors' data against other available sources. These procedures help to assure a high level of data quality, but assistance from database users in identifying problems is always appreciated!
+       +     description_snapshot: |-
+       +       HMD data by country. This contains the raw data, including their "input data", which HMD defines as:
+       + 
+       +       The Input Database houses the raw data that are the basis for all HMD calculations. Input data files for each population are accessible from the country page.
+       +     citation_full: |-
+       +       HMD. Human Mortality Database. Max Planck Institute for Demographic Research (Germany), University of California, Berkeley (USA), and French Institute for Demographic Studies (France). Available at www.mortality.org.
+       + 
+       +       See also the methods protocol:
+       +       Wilmoth, J. R., Andreev, K., Jdanov, D., Glei, D. A., Riffe, T., Boe, C., Bubenheim, M., Philipov, D., Shkolnikov, V., Vachon, P., Winant, C., & Barbieri, M. (2021). Methods protocol for the human mortality database (v6). [Available online](https://www.mortality.org/File/GetDocument/Public/Docs/MethodsProtocolV6.pdf) (needs log in to mortality.org).
+       +     attribution_short: HMD
+       +     url_main: https://www.mortality.org/Data/ZippedDataFiles
+       +     date_accessed: '2024-11-27'
+       +     date_published: '2024-11-13'
+       +     license:
+       +       name: CC BY 4.0
+       +       url: https://www.mortality.org/Data/UserAgreement
+       + unit: ''
+       + presentation:
+       +   topic_tags:
+       +     - Fertility Rate

+       + New values: 61 / 3998 (1.53%)
               country  year month_max_name_lead_9months
              Bulgaria  1924                         May
                France  1860                        July
                Greece  1955                       April
                 Japan  1898                       April
          West Germany  1945                   September
        ~ Changed values: 3937 / 3998 (98.47%)
           country  year  month_max_name_lead_9months - month_max_name_lead_9months +
           Denmark  2022                            NaN                       October
           Estonia  2007                            NaN                       October
           Iceland  1937                            NaN                       January
          Portugal  1988                            NaN                        August
            Sweden  1963                            NaN                          July
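
The `*_lead_9months` columns above appear to place each peak birth month nine calendar months earlier, as a rough estimate of the month of conception. The ETL code itself is not shown in this diff, so the following is only a minimal sketch of that kind of shift. The helper name `shift_months` and the example peak (April 1861) are hypothetical and not taken from the PR.

```python
import calendar


def shift_months(year: int, month: int, offset: int) -> tuple[int, int]:
    """Shift a (year, month) pair by `offset` months; negative offsets go back in time."""
    # Work on a zero-based month index so that year boundaries are handled by integer division.
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


# Hypothetical example: a birth peak in April 1861 maps to conception around July 1860.
conception_year, conception_month = shift_months(1861, 4, -9)
conception_name = calendar.month_name[conception_month]
```

A shift like this can move a value into the previous calendar year, which is one plausible reason for rows that carry a `month_max_lead_9months` value while `month_max_name` is NaN.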
  = Table birth_rate_month
    ~ Column birth_rate (changed metadata, changed data)
-       - title: Birth rate (monthly) - << month >>
+       + title: Birth rate, in << month >>

        ~ Changed values: 517 / 47243 (1.09%)
              country  year     month  birth_rate -  birth_rate +
              Finland  2023     March      0.658883      0.658097
            Hong Kong  2020      July       0.45487      0.455102
              Hungary  2020 September      0.870945      0.873273
          New Zealand  2020  December      0.936682      0.925427
             Slovenia  2019     March      0.761227      0.760313
    ~ Column birth_rate_per_day (changed metadata, changed data)
-       - title: Daily birth rate (average in month) - << month >>
+       + title: Birth rate per day, in << month >>
-       - description_short: The average daily number of births, per 1,000 people, calculated for <<month>>.
        ?                                                            ^^^^^
+       + description_short: The average daily number of births, per million people, calculated for <<month>>.
        ?                                                            ^^^^^^^
-       - unit: births per 1,000 people
        ?                  ^^^^^
+       + unit: births per million people
        ?                  ^^^^^^^

        ~ Changed values: 517 / 47243 (1.09%)
              country  year     month  birth_rate_per_day -  birth_rate_per_day +
              Finland  2023     March             21.254305             21.228937
            Hong Kong  2020      July             14.673223             14.680702
              Hungary  2020 September             29.031488             29.109112
          New Zealand  2020  December             30.215536              29.85248
             Slovenia  2019     March             24.555696             24.526213
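
The sample rows above are consistent with a simple relationship between the two columns: the monthly rate per 1,000 people, divided by the number of days in the month and multiplied by 1,000, gives the daily rate per million people. This relationship is inferred from the sample values only, not from the ETL code, and the function name `per_day_per_million` is hypothetical.

```python
import calendar


def per_day_per_million(monthly_rate_per_1000: float, year: int, month: int) -> float:
    """Convert a monthly birth rate per 1,000 people into a daily rate per million people."""
    # monthrange returns (weekday of first day, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    # Per 1,000 -> per million is a factor of 1,000; per month -> per day divides by the days.
    return monthly_rate_per_1000 / days_in_month * 1000


# Finland, March 2023: 0.658097 births per 1,000 people in the month.
finland_march_2023 = per_day_per_million(0.658097, 2023, 3)
```

For the Finland row this gives roughly 21.229, in line with the new `birth_rate_per_day` value listed above.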
  = Table birth_rate
    ~ Dim country
+       + New values: 47792 / 47792 (100.00%)
                date       country
          2019-04-30       Germany
          1959-11-30        Greece
          1915-04-30         Italy
          2020-12-31        Norway
          1978-07-31 United States
-       - Removed values: 47243 / 47792 (98.85%)
                date     country
          1994-10-01     Croatia
          1961-11-01     Denmark
          2010-12-01     Estonia
          2001-04-01  Luxembourg
          2017-11-01 New Zealand
    ~ Dim date
+       + New values: 47792 / 47792 (100.00%)
                country       date
                Germany 2019-04-30
                 Greece 1959-11-30
                  Italy 1915-04-30
                 Norway 2020-12-31
          United States 1978-07-31
-       - Removed values: 47243 / 47792 (98.85%)
              country       date
              Croatia 1994-10-01
              Denmark 1961-11-01
              Estonia 2010-12-01
           Luxembourg 2001-04-01
          New Zealand 2017-11-01
    ~ Column birth_rate (changed metadata, new data, changed data)
-       - title: Birth rate (monthly)
        ?                  ^^       ^
+       + title: Birth rate, on a monthly basis
        ?                  ^^^^^^^       ^^^^^^

+       + New values: 47792 / 47792 (100.00%)
                country       date  birth_rate
                Germany 2019-04-30    0.754137
                 Greece 1959-11-30    1.266592
                  Italy 1915-04-30    2.590419
                 Norway 2020-12-31    0.677254
          United States 1978-07-31    1.327256
-       - Removed values: 47243 / 47792 (98.85%)
              country       date  birth_rate
              Croatia 1994-10-01    0.939606
              Denmark 1961-11-01    1.305436
              Estonia 2010-12-01     0.91209
           Luxembourg 2001-04-01    1.036519
          New Zealand 2017-11-01    1.028617
    ~ Column birth_rate_lead_9months (changed metadata, new data)
-       - {}
+       + title: Birth rate, on a monthly basis, by estimated month of conception
+       + description_short: The total number of births per 1,000 people in a given month.
+       + origins:
+       +   - producer: Human Mortality Database
+       +     title: Human Mortality Database
+       +     description: |-
+       +       The Human Mortality Database (HMD) contains original calculations of all-cause death rates and life tables for national populations (countries or areas), as well as the input data used in constructing those tables. The input data consist of death counts from vital statistics, plus census counts, birth counts, and population estimates from various sources.
+       + 
+       + 
+       +       # Scope and basic principles
+       + 
+       +       The database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included here are relatively wealthy and for the most part highly industrialized.
+       + 
+       +       The main goal of the Human Mortality Database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. As much as possible, the authors of the database have followed four guiding principles: comparability, flexibility, accessibility, reproducibility.
+       + 
+       + 
+       +       # Computing death rates and life tables
+       + 
+       +       Their process for computing mortality rates and life tables can be described in terms of six steps, corresponding to six data types that are available from the HMD. Here is an overview of the process:
+       + 
+       +       1. Births. Annual counts of live births by sex are collected for each population over the longest possible time period. These counts are used mainly for making population estimates at younger ages.
+       +       2. Deaths. Death counts are collected at the finest level of detail available. If raw data are aggregated, uniform methods are used to estimate death counts by completed age (i.e., age-last-birthday at time of death), calendar year of death, and calendar year of birth.
+       +       3. Population size. Annual estimates of population size on January 1st are either obtained from another source or are derived from census data plus birth and death counts.
+       +       4. Exposure-to-risk. Estimates of the population exposed to the risk of death during some age-time interval are based on annual (January 1st) population estimates, with a small correction that reflects the timing of deaths within the interval.
+       +       5. Death rates. Death rates are always a ratio of the death count for a given age-time interval divided by an estimate of the exposure-to-risk in the same interval.
+       +       6. Life tables. To build a life table, probabilities of death are computed from death rates. These probabilities are used to construct life tables, which include life expectancies and other useful indicators of mortality and longevity.
+       + 
+       + 
+       +       # Corrections to the data
+       + 
+       +       The data presented here have been corrected for gross errors (e.g., a processing error whereby 3,800 becomes 38,000 in a published statistical table would be obvious in most cases, and it would be corrected). However, the authors have not attempted to correct the data for systematic age misstatement (misreporting of age) or coverage errors (over- or under-enumeration of people or events).
+       + 
+       +       Some available studies assess the completeness of census coverage or death registration in the various countries, and more work is needed in this area. However, in developing the database thus far, the authors did not consider it feasible or desirable to attempt corrections of this sort, especially since it would be impossible to correct the data by a uniform method across all countries.
+       + 
+       + 
+       +       # Age misreporting
+       + 
+       +       Populations are included here if there is a well-founded belief that the coverage of their census and vital registration systems is relatively high, and thus, that fruitful analyses by both specialists and non-specialists should be possible with these data. Nevertheless, there is evidence of both age heaping (overreporting ages ending in "0" or "5") and age exaggeration in these data.
+       + 
+       +       In general, the degree of age heaping in these data varies by the time period and population considered, but it is usually no burden to scientific analysis. In most cases, it is sufficient to analyze data in five-year age groups in order to avoid the false impressions created by this particular form of age misstatement.
+       + 
+       +       Age exaggeration, on the other hand, is a more insidious problem. The authors' approach is guided by the conventional wisdom that age reporting in death registration systems is typically more reliable than in census counts or official population estimates. For this reason, the authors derive population estimates at older ages from the death counts themselves, employing extinct cohort methods. Such methods eliminate some, but certainly not all, of the biases in old-age mortality estimates due to age exaggeration.
+       + 
+       + 
+       +       # Uniform set of procedures
+       + 
+       +       A key goal of this project is to follow a uniform set of procedures for each population. This approach does not guarantee the cross-national comparability of the data. Rather, it ensures only that the authors have not introduced biases by the authors' own manipulations. The desire of the authors for uniformity had to face the challenge that raw data come in a variety of formats (for example, 1-year versus 5-year age groups). The authors' general approach to this problem is that the available raw data are used first to estimate two quantities: 1) the number of deaths by completed age, year of birth, and year of death; and 2) population estimates by single years of age on January 1 of each year. For each population, these calculations are performed separately by sex. From these two pieces of information, they compute death rates and life tables in a variety of age-time configurations.
+       + 
+       +       It is reasonable to ask whether a single procedure is the best method for treating the data from a variety of populations. Here, two points must be considered. First, the authors' uniform methodology is based on procedures that were developed separately, though following similar principles, for various countries and by different researchers. Earlier methods were synthesized by choosing what they considered the best among alternative procedures and by eliminating superficial inconsistencies. The second point is that a uniform procedure is possible only because the authors have not attempted to correct the data for reporting and coverage errors. Although some general principles could be followed, such problems would have to be addressed individually for each population.
+       + 
+       +       Although the authors adhere strictly to a uniform procedure, the data for each population also receive significant individualized attention. Each country or area is assigned to an individual researcher, who takes responsibility for assembling and checking the data for errors. In addition, the person assigned to each country/area checks the authors' data against other available sources. These procedures help to assure a high level of data quality, but assistance from database users in identifying problems is always appreciated!
+       +     citation_full: |-
+       +       HMD. Human Mortality Database. Max Planck Institute for Demographic Research (Germany), University of California, Berkeley (USA), and French Institute for Demographic Studies (France). Available at www.mortality.org.
+       + 
+       +       See also the methods protocol:
+       +       Wilmoth, J. R., Andreev, K., Jdanov, D., Glei, D. A., Riffe, T., Boe, C., Bubenheim, M., Philipov, D., Shkolnikov, V., Vachon, P., Winant, C., & Barbieri, M. (2021). Methods protocol for the human mortality database (v6). [Available online](https://www.mortality.org/File/GetDocument/Public/Docs/MethodsProtocolV6.pdf) (needs log in to mortality.org).
+       +     attribution_short: HMD
+       +     url_main: https://www.mortality.org/Data/ZippedDataFiles
+       +     date_accessed: '2024-11-27'
+       +     date_published: '2024-11-13'
+       +     license:
+       +       name: CC BY 4.0
+       +       url: https://www.mortality.org/Data/UserAgreement
+       +   - producer: Human Mortality Database
+       +     title: Human Mortality Database, by country
+       +     description: |-
+       +       The Human Mortality Database (HMD) contains original calculations of all-cause death rates and life tables for national populations (countries or areas), as well as the input data used in constructing those tables. The input data consist of death counts from vital statistics, plus census counts, birth counts, and population estimates from various sources.
+       + 
+       + 
+       +       # Scope and basic principles
+       + 
+       +       The database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included here are relatively wealthy and for the most part highly industrialized.
+       + 
+       +       The main goal of the Human Mortality Database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. As much as possible, the authors of the database have followed four guiding principles: comparability, flexibility, accessibility, reproducibility.
+       + 
+       + 
+       +       # Computing death rates and life tables
+       + 
+       +       Their process for computing mortality rates and life tables can be described in terms of six steps, corresponding to six data types that are available from the HMD. Here is an overview of the process:
+       + 
+       +       1. Births. Annual counts of live births by sex are collected for each population over the longest possible time period. These counts are used mainly for making population estimates at younger ages.
+       +       2. Deaths. Death counts are collected at the finest level of detail available. If raw data are aggregated, uniform methods are used to estimate death counts by completed age (i.e., age-last-birthday at time of death), calendar year of death, and calendar year of birth.
+       +       3. Population size. Annual estimates of population size on January 1st are either obtained from another source or are derived from census data plus birth and death counts.
+       +       4. Exposure-to-risk. Estimates of the population exposed to the risk of death during some age-time interval are based on annual (January 1st) population estimates, with a small correction that reflects the timing of deaths within the interval.
+       +       5. Death rates. Death rates are always a ratio of the death count for a given age-time interval divided by an estimate of the exposure-to-risk in the same interval.
+       +       6. Life tables. To build a life table, probabilities of death are computed from death rates. These probabilities are used to construct life tables, which include life expectancies and other useful indicators of mortality and longevity.
+       + 
+       + 
+       +       # Corrections to the data
+       + 
+       +       The data presented here have been corrected for gross errors (e.g., a processing error whereby 3,800 becomes 38,000 in a published statistical table would be obvious in most cases, and it would be corrected). However, the authors have not attempted to correct the data for systematic age misstatement (misreporting of age) or coverage errors (over- or under-enumeration of people or events).
+       + 
+       +       Some available studies assess the completeness of census coverage or death registration in the various countries, and more work is needed in this area. However, in developing the database thus far, the authors did not consider it feasible or desirable to attempt corrections of this sort, especially since it would be impossible to correct the data by a uniform method across all countries.
+       + 
+       + 
+       +       # Age misreporting
+       + 
+       +       Populations are included here if there is a well-founded belief that the coverage of their census and vital registration systems is relatively high, and thus, that fruitful analyses by both specialists and non-specialists should be possible with these data. Nevertheless, there is evidence of both age heaping (overreporting ages ending in "0" or "5") and age exaggeration in these data.
+       + 
+       +       In general, the degree of age heaping in these data varies by the time period and population considered, but it is usually no burden to scientific analysis. In most cases, it is sufficient to analyze data in five-year age groups in order to avoid the false impressions created by this particular form of age misstatement.
+       + 
+       +       Age exaggeration, on the other hand, is a more insidious problem. The authors' approach is guided by the conventional wisdom that age reporting in death registration systems is typically more reliable than in census counts or official population estimates. For this reason, the authors derive population estimates at older ages from the death counts themselves, employing extinct cohort methods. Such methods eliminate some, but certainly not all, of the biases in old-age mortality estimates due to age exaggeration.
+       + 
+       + 
+       +       # Uniform set of procedures
+       + 
+       +       A key goal of this project is to follow a uniform set of procedures for each population. This approach does not guarantee the cross-national comparability of the data. Rather, it ensures only that the authors have not introduced biases by the authors' own manipulations. The desire of the authors for uniformity had to face the challenge that raw data come in a variety of formats (for example, 1-year versus 5-year age groups). The authors' general approach to this problem is that the available raw data are used first to estimate two quantities: 1) the number of deaths by completed age, year of birth, and year of death; and 2) population estimates by single years of age on January 1 of each year. For each population, these calculations are performed separately by sex. From these two pieces of information, they compute death rates and life tables in a variety of age-time configurations.
+       + 
+       +       It is reasonable to ask whether a single procedure is the best method for treating the data from a variety of populations. Here, two points must be considered. First, the authors' uniform methodology is based on procedures that were developed separately, though following similar principles, for various countries and by different researchers. Earlier methods were synthesized by choosing what they considered the best among alternative procedures and by eliminating superficial inconsistencies. The second point is that a uniform procedure is possible only because the authors have not attempted to correct the data for reporting and coverage errors. Although some general principles could be followed, such problems would have to be addressed individually for each population.
+       + 
+       +       Although the authors adhere strictly to a uniform procedure, the data for each population also receive significant individualized attention. Each country or area is assigned to an individual researcher, who takes responsibility for assembling and checking the data for errors. In addition, the person assigned to each country/area checks the authors' data against other available sources. These procedures help to assure a high level of data quality, but assistance from database users in identifying problems is always appreciated!
+       +     description_snapshot: |-
+       +       HMD data by country. This contains the raw data, including their "input data", which HMD defines as:
+       + 
+       +       The Input Database houses the raw data that are the basis for all HMD calculations. Input data files for each population are accessible from the country page.
+       +     citation_full: |-
+       +       HMD. Human Mortality Database. Max Planck Institute for Demographic Research (Germany), University of California, Berkeley (USA), and French Institute for Demographic Studies (France). Available at www.mortality.org.
+       + 
+       +       See also the methods protocol:
+       +       Wilmoth, J. R., Andreev, K., Jdanov, D., Glei, D. A., Riffe, T., Boe, C., Bubenheim, M., Philipov, D., Shkolnikov, V., Vachon, P., Winant, C., & Barbieri, M. (2021). Methods protocol for the human mortality da

...diff too long, truncated...

Edited: 2024-12-04 19:42:21 UTC
Execution time: 18.58 seconds

github-actions · 2024-12-04T19:43:08Z

✅ Merge Schedule
Scheduled on next cron expression successfully merged

✨ birth rate monthly improvements

d7d625a

github-actions bot assigned lucasrodes Dec 4, 2024

lucasrodes added 3 commits December 4, 2024 20:05

add 9 month lead, improve metadata

b76217d

improve population estimates to get rates

ce0d82a

fix table merge

5827ad0
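
For context on the "fix table merge" commit: the PR description attributes the data loss to an INNER join. The PR's actual merge code is not shown here, so the following is only a minimal sketch of that failure mode, assuming pandas and two hypothetical tables (`births`, `population`) with made-up values. An inner join silently drops keys missing from either side, while a left join keeps them with missing values.

```python
import pandas as pd

# Hypothetical toy tables; names and numbers are illustrative, not from the PR.
births = pd.DataFrame(
    {
        "country": ["A", "A", "B"],
        "year": [2000, 2001, 2000],
        "births": [100, 110, 90],
    }
)
population = pd.DataFrame(
    {
        "country": ["A", "B"],
        "year": [2001, 2000],
        "population": [1_000, 2_000],
    }
)

keys = ["country", "year"]

# Inner join: the (A, 2000) row has no population match and is dropped.
inner = births.merge(population, on=keys, how="inner")

# Left join: every births row is kept; unmatched rows get a missing population.
left = births.merge(population, on=keys, how="left")

print(len(inner), len(left))
```

With a left (or outer) join, the unmatched country-year pairs stay in the table with missing values instead of disappearing.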

lucasrodes marked this pull request as ready for review December 4, 2024 19:42

github-actions bot merged commit 3dd8f23 into master Dec 4, 2024
10 checks passed

github-actions bot deleted the enhance-birth-rate-improvements branch December 4, 2024 20:00
