Cause-specific mortality time series analysis: a general method to detect and correct for abrupt data production changes

Rey, Grégoire; Aouba, Albertine; Pavillon, Gérard; Hoffmann, Rasmus; Plug, Iris; Westerling, Ragnar; Jougla, Eric; Mackenbach, Johan

doi:10.1186/1478-7954-9-52

Research
Open access
Published: 19 September 2011

Population Health Metrics volume 9, Article number: 52 (2011)

Abstract

Background

Monitoring the time course of mortality by cause is a key public health issue. However, several mortality data production changes may affect cause-specific time trends, thus altering the interpretation. This paper proposes a statistical method that detects abrupt changes ("jumps") and estimates correction factors that may be used for further analysis.

Methods

The method was applied to a subset of the AMIEHS (Avoidable Mortality in the European Union, toward better Indicators for the Effectiveness of Health Systems) project mortality database and considered for six European countries and 13 selected causes of deaths. For each country and cause of death, an automated jump detection method called Polydect was applied to the log mortality rate time series. The plausibility of a data production change associated with each detected jump was evaluated through literature search or feedback obtained from the national data producers.

For each plausible jump position, the statistical significance of the between-age and between-gender jump amplitude heterogeneity was evaluated by means of a generalized additive regression model, and correction factors were deduced from the results.

Results

Forty-nine jumps were detected by the Polydect method from 1970 to 2005. Most of the detected jumps were found to be plausible. The age- and gender-specific amplitudes of the jumps were estimated when they were statistically heterogeneous, and they showed greater by-age heterogeneity than by-gender heterogeneity.

Conclusion

The method presented in this paper was successfully applied to a large set of causes of death and countries. The method appears to be an alternative to bridge coding methods when the latter are not systematically implemented because they are time- and resource-consuming.

Background

The study of cause-specific mortality time series is one of the main sources of information for public health monitoring [1–3]. However, while demonstrative and striking use can be made of such trends when communicating with the general public, many concerns relating to the data production process have to be addressed. More specifically, it is necessary to evaluate, and, if necessary, correct artifacts due to data production changes that may bias the interpretation of time trends over a study period.

The production processes for mortality databases have been similar in many industrialized countries (particularly in Western Europe) since the end of World War II. When a death occurs, a medical certificate based on the international form recommended by the World Health Organization (WHO) [4] is filled in by a physician. The physician reports the causes of death that directly led or contributed to the death on the death certificate. The death certificate is then forwarded to a national (e.g., France) or regional (e.g., Germany) coding office, where it is coded using the International Classification of Diseases (ICD). The ICD has been regularly reviewed and improved since the end of the 19th century [5]. For each death, an underlying cause of death is selected in compliance with the ICD rules. The underlying cause is the most commonly used in statistical analyses.

Underlying cause coding is a complex process and thus implies potential between-coder coding differences. These differences may produce coding discrepancies over time and space. This is why, in addition to ICD revisions, coding may induce variations in the causes of death by period, region, or country. This situation has resulted in countries using increasingly automated coding systems (ACS).

Variations in mortality trends may also be related to changes in death certification (death certificates, certification habits, diagnoses, etc.). However, the changes are often diffuse and take place over long periods of time, making them harder to take into account. Moreover, the methods of analyzing actual historical variations in mortality by cause of death and taking into account data production process changes are essentially related to changes in the coding process. Of these methods, three main kinds can be distinguished: bridge coding, concordance table and cause recombination, and time series analysis-based methods.

Bridge coding

The bridge coding method is used when there is a major change in the coding process (ICD version change or switch from a manual to an automatic coding system). The method consists of coding a large set of death certificates twice, applying the rules prevailing before and after the change. The ratios of numbers of death calculated by cause category before and after the change, called "comparability ratios," generate information for trend analyses and characterization of "jumps" in mortality time series. However, analyses of long-period time series do not necessarily use comparability ratios [1, 6].
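As a rough illustration of how comparability ratios are formed from a dual-coded sample, the sketch below divides the number of deaths assigned to each cause under the new rules by the number assigned under the old rules. The direction of the ratio (new over old), the function name `comparability_ratios`, and the counts are assumptions made for this example, not values from any bridge coding study.

```python
def comparability_ratios(old_counts, new_counts):
    """Ratio of deaths per cause under the new coding rules to the old ones.

    Both arguments map a cause label to the number of deaths assigned to it
    when the same certificates are coded twice (hypothetical inputs).
    """
    ratios = {}
    for cause, n_old in old_counts.items():
        n_new = new_counts.get(cause, 0)
        # A cause with no deaths under the old rules has no defined ratio.
        ratios[cause] = n_new / n_old if n_old > 0 else float("nan")
    return ratios


# Made-up dual-coded sample, for illustration only.
old = {"heart failure": 400, "stroke": 1000}
new = {"heart failure": 300, "stroke": 1050}
cr = comparability_ratios(old, new)
```

A ratio below 1 means the new rules assign fewer deaths to the cause, which would appear as a downward jump in the time series at the year of the change.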

Bridge coding analyses have been carried out in the United States and England and Wales for each ICD change since the eighth version of the ICD (ICD-8) [7–12]. Bridge coding analysis was also used and generated detailed results for the change from ICD-9 to ICD-10 in some countries (Scotland, Sweden, Italy, Spain, France, and Canada) [13–18]. However, to the authors' knowledge, most European countries have not implemented bridge coding to assess ICD changes. Comparability ratios are heterogeneous between countries, most likely because of variations in intra-group composition of causes of death, reporting practices, and ICD coding interpretation. Therefore, it is unlikely that a comparability ratio for one country can be inferred from the results of other countries.

Concordance table and cause recombination

The concordance table and cause recombination approach consists of determining the most consistent cause categories, under medical consideration, for two successive ICD revisions. Analysis of mortality using the resulting categories is then theoretically influenced little by coding changes. This approach typically only works well when considering the coding of any particular cause reported on the death certificate. It is often not effective when considering changes in the rules for selecting the underlying cause of death, especially when such rule changes favor the selection of one cause over another. It is often impossible to recombine codes to fully account for these changes.

The method was used on French, Dutch, and Swedish data [19–21]. This approach, complex and time-consuming when it is applied to a single country, is even more difficult to use in the context of an international study [22].

Time series analysis

The time series analysis method consists of looking for sustained jumps, evaluating their statistical significance and amplitude, and possibly smoothing the time series by adjusting the data with correction factors. The method is easy to document, even when the volume of data considered is large (many countries, many causes of death, etc.). Furthermore, the method is necessary when the time of the change in the data production process is unknown [23]. To the authors' knowledge, the detection of jumps in mortality data has rarely been undertaken [22, 24], although the work of Janssen et al. [22] in particular has given rise to fruitful international public health studies [25–28]. However, the methods used in these studies did not take advantage of the recent development of automated jump detection methods for indexed data (indexed by time or other variables) [29–31]. The value of automated jump detection lies in its ability to avoid the subjectivity of visual detection or a priori selection of jump positions.

The aim of this paper is to propose a complement to a time series analysis method that was previously developed by Janssen et al. [22], allowing detection of sustained jumps attributable to changes in data production and development of correction factors by age and gender in order to enable subsequent epidemiological analyses. The method is then applied to a wide range of different mortality time series: 13 causes of death for each of six European countries participating in the AMIEHS (Avoidable Mortality in the European Union, toward better Indicators for the Effectiveness of Health Systems) project (http://amiehs.lshtm.ac.uk/).

Methods

General approach

The following step-by-step approach was adopted:

1. Given a list of selected causes of death, the ICD codes to be considered were determined by nosologists based on the correspondence table method, while maintaining the medical consistency of the list of codes for the various ICD revisions.

2. An automated jump detection method was applied to the mortality rate time series for each of the selected causes of death.

3. For documented jumps (e.g., ICD changes), the available comparability ratios were compared to the amplitude of the estimated jumps. For nondocumented jumps, general information feedback was requested from the national data producers.

4. For documented or plausible jump positions, the statistical significance of the between-age and between-gender jump amplitude heterogeneity was evaluated by means of a regression model, and correction factors were deduced from the results.

Mortality data

The mortality data were derived from the AMIEHS project dataset. Six countries were included in the analysis: Spain, France, the Netherlands, Germany, Sweden, and England and Wales (considered together). In order to simplify presentation, Estonia, which is participating in the AMIEHS project, has not been considered in this paper because Estonia used a specific coding system until 1994. The study period is 1970 to 2005. While causes of death are usually coded with four-digit ICD codes, the AMIEHS dataset only contains three-digit codes for practical reasons. Some precision in the characterization of causes has thus been lost.

Generally, deaths are coded using the same ICD revision in each calendar year. The dates of the ICD revisions used by each European country are presented in table 1.

Table 1 Dates of ICD change and automatic coding system (ACS) implementation for six of the AMIEHS European countries

Code allocation

For 13 causes of death selected in the AMIEHS project, the method of allocating the ICD-8, ICD-9, and ICD-10 codes was as follows:

• When the cause was included in the Eurostat 65 causes shortlist [32], the codes defined by Eurostat were retained.

• For other causes, two nosologists independently selected the optimal three-digit codes. Then, a final choice was made in order to minimize the coding-related jumps in cause of death-specific time series analysis. Table 2 shows the related codes.

Table 2 ICD-8, ICD-9 and ICD-10 codes for the 13 selected causes of death

The automatic jump detection method

The Polydect method [33] was applied to yearly log mortality rate time series for each country and cause of death.

Given that mortality analyses are often based on multiplicative assumptions, log-linear generalized models were used. Thus, the time series jump detection method was applied to the log mortality rates.

Let O_t be the number of deaths during year t, p_t be the number of person-years, and $L_{t} = log (\frac{O_{t}}{p_{t}})$ be the log mortality rate time series. The occurrence of jumps in the log mortality rate time series may be expressed as follows:

\begin{gathered} log (E (O_{t})) = log (p_{t}) \\ + g (t) + \sum_{t' \in S} d_{t'} \cdot 1_{(t > t')}, \end{gathered}

in which g is a continuous function, S is the set of jump locations, and {d_t, t ∈ S} are the corresponding jump magnitudes.

In this model, g, S, and {d_t, t ∈ S} are all assumed unknown.
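To make the jump model concrete, the following sketch evaluates the expected log mortality rate for a given trend g and a set of jumps, using the same indicator convention (a jump located at t' affects years t > t'). The trend, the jump years, and the amplitudes are hypothetical values chosen only for illustration.

```python
import math


def log_rate(deaths, person_years):
    """Observed log mortality rate L_t = log(O_t / p_t)."""
    return math.log(deaths / person_years)


def expected_log_rate(t, g, jumps):
    """g(t) plus every jump amplitude d_{t'} whose location satisfies t > t'."""
    return g(t) + sum(d for t_prime, d in jumps.items() if t > t_prime)


# Hypothetical smooth trend and jumps (location -> amplitude on the log scale).
g = lambda t: -7.0 - 0.01 * (t - 1970)
jumps = {1978: 0.30, 1999: -0.20}

before = expected_log_rate(1978, g, jumps)  # jump at 1978 not yet active
after = expected_log_rate(1979, g, jumps)   # first year affected by the jump
```

Between 1978 and 1979 the expected log rate changes by the trend step plus the full jump amplitude, which is the discontinuity the detection method looks for.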

The method consists of three main steps:

1. A left and a right limit of E(L_t) were estimated for each point t using two local polynomial smoothers, denoted P_l(t) and P_r(t), fitted on [t - h, t) and (t, t + h], respectively, where h is the bandwidth, which is itself estimated in step 3. If t ∉ S and the jump locations are at least h apart, then, given that g is continuous, we expect E(P_l(t)) = E(P_r(t)) = g(t). Otherwise, if t ∈ S, we expect E(P_l(t)) = g(t) and E(P_r(t)) = g(t) + d_t.

The noise σ of the L_t process is estimated as:

\hat{σ} = \frac{\sum_{t} min ({(L_{t} - P_{l} (t))}^{2}, {(L_{t} - P_{r} (t))}^{2})}{T - 1}

The polynomial kernel of the smoothers could, a priori, be constant, linear, or quadratic, depending on the number of observations and the curvature level of the time series. Since, in the present case, the number of observations was not greater than 40, and the time series was expected to be quite stable, a linear kernel was selected.

2. Considering M(t) = P_r(t) - P_l(t), jump points were defined as points where the signal-to-noise ratio $\frac{|M (t)|}{\hat{σ}}$ was higher than a threshold C_α.

C_α was chosen such that, if t is not a jump point, $P (\frac{|M (t)|}{\hat{σ}} > C_{α}) \leq α$. The analytic calculation of C_α is given elsewhere [33]. In the following steps, α was set to $10^{-5}$, a low value, in order to avoid as many false-positive jumps as possible.

Then, $\hat{S} = \{t : \frac{|M (t)|}{\hat{σ}} > C_{α}\}$ and $\{{\hat{d}}_{t} = M (t), t \in \hat{S}\}$ were directly estimated. When several jumps were detected within a time range shorter than the bandwidth, only the jump that maximized |M(t)| was retained.

3. The bandwidth h was estimated by minimizing the Hausdorff distance [29], defined as:

\begin{gathered} d_{H} (S, \hat{S}; h) = \\ max \{sup_{t_{1} \in S} inf_{t_{2} \in \hat{S}} |t_{1} - t_{2}|, sup_{t_{1} \in \hat{S}} inf_{t_{2} \in S} |t_{1} - t_{2}|\}, \end{gathered}

in which $d_{H} (S, \hat{S}; h)$ was calculated through a bootstrap procedure, setting B, the number of batches used, equal to 1000. A full description of this method is given elsewhere [33].
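The three steps can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the Polydect implementation: the one-sided fits are unweighted straight lines, the noise term is computed exactly as printed in the text, the threshold is picked by hand for a toy series instead of being derived analytically as C_α [33], and the bootstrap search for the bandwidth h is not shown (only the usual Hausdorff distance between two finite sets that it would minimize). All function names are hypothetical.

```python
def line_fit_at(xs, ys, x0):
    """Least-squares straight line through (xs, ys), evaluated at x0."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def one_sided_limits(L, h):
    """Step 1: unweighted linear fits on [t - h, t) and (t, t + h].

    An estimate is None where its window holds fewer than two points.
    """
    T = len(L)
    P_l, P_r = [None] * T, [None] * T
    for t in range(T):
        left = list(range(max(0, t - h), t))
        right = list(range(t + 1, min(T, t + h + 1)))
        if len(left) >= 2:
            P_l[t] = line_fit_at(left, [L[k] for k in left], t)
        if len(right) >= 2:
            P_r[t] = line_fit_at(right, [L[k] for k in right], t)
    return P_l, P_r


def noise_estimate(L, P_l, P_r):
    """Noise term as printed: sum of the smaller squared residual / (T - 1)."""
    total = 0.0
    for y, pl, pr in zip(L, P_l, P_r):
        squares = [(y - p) ** 2 for p in (pl, pr) if p is not None]
        if squares:
            total += min(squares)
    return total / (len(L) - 1)


def detect_jumps(P_l, P_r, noise, threshold, h):
    """Step 2: flag |M(t)| / noise > threshold, then keep only the largest
    |M(t)| among flagged points that lie closer than h to each other."""
    M = {t: pr - pl for t, (pl, pr) in enumerate(zip(P_l, P_r))
         if pl is not None and pr is not None}
    flagged = [t for t in M if abs(M[t]) > threshold * noise]
    kept = []
    for t in sorted(flagged, key=lambda t: -abs(M[t])):
        if all(abs(t - s) >= h for s in kept):
            kept.append(t)
    return {t: M[t] for t in sorted(kept)}


def hausdorff(A, B):
    """Step 3 criterion: Hausdorff distance between two finite sets of years."""
    if not A or not B:
        return 0.0 if not A and not B else float("inf")
    d_ab = max(min(abs(a - b) for b in B) for a in A)
    d_ba = max(min(abs(a - b) for a in A) for b in B)
    return max(d_ab, d_ba)


# Toy series: linear trend, alternating +/-0.05 wiggle, step of +1.0 after index 9.
L = [0.02 * t + 0.05 * (-1) ** t + (1.0 if t > 9 else 0.0) for t in range(20)]
h = 4
P_l, P_r = one_sided_limits(L, h)
noise = noise_estimate(L, P_l, P_r)
jumps = detect_jumps(P_l, P_r, noise, threshold=75.0, h=h)  # hand-picked threshold
distance = hausdorff([1984, 1995], [1985, 1995, 2000])
```

On this toy series the routine keeps a single jump at the break, with an estimated amplitude close to 1.0. Because the point t itself is excluded from both windows, the two indices on either side of the break give the same difference, so either may be the one retained.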

For a given jump i, the multiplicative factor MF_i between before and after the jump period was calculated as:

M F_{i} = exp (d_{i}),

in which d_i is the amplitude of the jump.
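As a small worked example of this conversion, the sketch below turns a jump amplitude on the log scale into a multiplicative factor and a percentage change. The amplitude used is hypothetical.

```python
import math


def multiplicative_factor(d):
    """MF = exp(d), where d is the jump amplitude on the log-rate scale."""
    return math.exp(d)


# Hypothetical amplitude: a jump that multiplies the mortality rate by 0.75.
d = math.log(0.75)
mf = multiplicative_factor(d)
percent_change = 100.0 * (mf - 1.0)  # about -25% after the jump
```

An amplitude of zero corresponds to a factor of 1, that is, no change in the level of the series.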

Age- and gender-heterogeneity test

Age categories were defined by the tertiles of the cause-specific death counts.

Generally, when considering J different population groups (age and gender), a generalized additive model (GAM) with an overdispersed Poisson distribution is used [34, 35]. The model has the following form:

\begin{gathered} log (E (O_{t,j})) = log (p_{t,j}) \\ + g_{j} (t) + \sum_{t' \in S} d_{t',j} \cdot 1_{(t > t')}, \end{gathered}

in which j is one of the J groups, g_j are continuous functions fitted by a thin plate penalized regression spline, S is the set of jump locations, and {d_{t,j}, t ∈ S} are the corresponding jump magnitudes for group j.

S is assumed known, and the aim is to test, for each t ∈ S:

H0: d_{t, 1} = \dots = d_{t,J}

Backward variable selection was used to suppress, successively, age and gender from the model if their respective effects on the jump amplitude were not statistically significant at the 5% level, using Wald's test.
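The idea behind the homogeneity test can be illustrated outside the GAM. The sketch below is not the mgcv-based procedure used in the study: it assumes the group-specific jump amplitudes and their standard errors have already been estimated and are independent, in which case the Wald statistic for H0 reduces to a precision-weighted sum of squared deviations with J - 1 degrees of freedom. In a fitted model, the estimated covariance matrix of the coefficients would be used instead. The function names and the numbers are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution, via the power series for the
    regularized lower incomplete gamma function (adequate for moderate x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 1000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def wald_homogeneity(d, se):
    """Wald statistic and p-value for H0: all group amplitudes are equal,
    assuming independent estimates d[j] with standard errors se[j]."""
    w = [1.0 / s ** 2 for s in se]
    d_bar = sum(wi * di for wi, di in zip(w, d)) / sum(w)
    q = sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, d))
    return q, chi2_sf(q, len(d) - 1)


# Hypothetical jump amplitudes for three age groups at one jump position.
q, p_value = wald_homogeneity([0.2, 0.2, 0.8], [0.1, 0.1, 0.1])
heterogeneous = p_value < 0.05  # 5% level, as in the text
```

In a backward selection, a grouping factor whose test is not significant at the chosen level would be dropped and the model refitted with a common jump amplitude across its levels.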

The MGCV (Multiple smoothing parameter estimation by Generalized Cross Validation) R package was used for this purpose [36].

Correction factors

Correction factors were calculated for all confirmed jumps.

The correction factors were calculated for use in subsequent analyses, not discussed in this article, with a log-linear model of general form:

\begin{gathered} log (E (O_{t})) = \\ log (p_{t}) + c_{t} + f (X_{t}), \end{gathered}

in which t is the year between T₁ and T₂ (respectively equal to 1970 and 2005 in this study); c_t is the correction factor and f(X_t) could be any function of independent variables to be estimated.

The correction factors were set so that the last values of the corrected mortality rates were equal to the exact mortality rates, i.e., $c_{T_{2}} = 0$ . This choice was based on the supposed superior quality and between-country comparability of the most recent year's data.

The foregoing results in the following definition of the correction factors c_t:

For t ∈ [T₁, T₂],

\begin{gathered} c_{t} = \sum_{t' \in S, t' < t} d_{t'} - \sum_{t' \in S} d_{t'} \\ = - \sum_{t' \in S, t' \geq t} d_{t'} . \end{gathered}

The estimate of c_t was then directly obtained from the estimates of S and d_t detailed earlier.

A corrected version of the log mortality rate was then obtained as:

L_{t}^{cor} = L_{t} - c_{t}
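The definition of c_t and the corrected series can be checked numerically. The sketch below builds a hypothetical log rate series with a flat trend and two jumps, then applies the correction so that every year is expressed at the level of the most recent period. The jump years, amplitudes, and function names are illustrative assumptions.

```python
def correction_factors(years, jumps):
    """c_t = -(sum of d_{t'} over jump locations t' >= t)."""
    return {t: -sum(d for t_prime, d in jumps.items() if t_prime >= t)
            for t in years}


def corrected_log_rates(log_rates, c):
    """L_t^cor = L_t - c_t."""
    return {t: log_rates[t] - c[t] for t in log_rates}


years = range(1970, 2006)
jumps = {1979: 0.30, 1999: -0.20}  # hypothetical location -> amplitude

# Flat trend at -7.0 plus the jumps, with a jump at t' affecting years t > t'.
log_rates = {t: -7.0 + sum(d for tp, d in jumps.items() if t > tp)
             for t in years}

c = correction_factors(years, jumps)
corrected = corrected_log_rates(log_rates, c)
```

After correction the toy series is flat at the level of the last year, and the correction factor for the last year is zero, which matches the constraint that the most recent rates are left unchanged.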

Results

Jump detection

By applying the jump detection method to all of the time series, a set of jumps was obtained (table 3). Most of the jumps detected were concomitant with a known coding change (ICD updates or change from a manual to an automatic coding system). Some of the jumps (e.g., for heart failure and rheumatic heart disease) were of great amplitude and almost systematically observed in each country. For the former East Germany, most of the changes were concomitant with the reunification of Germany.

Table 3 Jumps in the log mortality rate time series for the 13 selected causes of death from 1970 to 2006 identified by the Polydect method

The answers from data producers, contacted to determine whether each jump was related to a data production issue, were consistent between countries. Excluding "No answer" responses, most of the detected jumps (42 out of 44) were confirmed to be related to a coding change. The three-digit coding constraint was given as an explanation for some jumps (rheumatic heart disease in France, ischemic heart disease in Spain, etc.), especially when countries chose specific codes (as in Spain for malignant colorectal neoplasm). The 1990 and 1991 jumps in East Germany were related to a complete change of coding staff. However, most of the coding changes are not documented by a literature reference.

Given the large proportion of confirmed jumps, we decided to exclude from subsequent treatment the jumps for which we received a negative answer from data producers.

It was possible to compare a few of the multiplicative factors with the comparability ratios generated by bridge coding studies corresponding to ICD-9 to ICD-10 changes (table 4). In particular, the large multiplicative factors (e.g., for rheumatic heart disease) had no related comparability ratios. Some coding changes were not detected by the jump detection method (Hodgkin's disease in England and Wales and Sweden and renal failure in England and Wales). However, none of the detected jumps were found unrelated to a coding change.

Table 4 Comparability ratios (CR) from bridge coding and multiplicative factors (MF) estimated by the jump detection method for the ICD-9 to ICD-10 coding change

Corrected mortality rate time series

Considering some of the most clear-cut time series, the profile of the corrected time series is quite different from that of the uncorrected series (Figure 1). It is noteworthy that the corrected curves do not flatten the general trends at the jump positions, which would have been the case if constant rather than linear kernel smoothing had been chosen. Rather, they prolong the trends, even if the jump is in the opposite direction of the general trend.

Concerning hypertension in the Netherlands, we observe a trend shift between the periods before and after the corrected jumps. Such a trend shift is not taken into account by the current method.

Estimates of jump amplitudes by age and gender

With regard to the jump amplitude heterogeneity test by age and gender, only 19 out of 47 jumps were not statistically significantly heterogeneous (table 5). Five of the jumps were heterogeneous by gender, 15 by age, and eight by age and gender simultaneously. While the jump amplitudes are of the same order by gender, even when statistically heterogeneous, they are of different orders when considered by age group. This was particularly marked for rheumatic heart disease and heart failure.

Table 5 Multiplicative factor by age and gender, if statistically heterogeneous, for each detected jump

Discussion

The originality of the methodology reported herein mainly resides in its ability to detect jumps automatically using the Polydect method, without a priori or visual investigation for jump positions. In addition, application of the method to a large dataset is less time-consuming and less human-dependent than any other known method.

Some methodological choices were made, such as the choice of a linear kernel smoother and the choice of the probability α of detecting false jumps. Considering a constant kernel smoother or different values of α slightly affected the final set of detected jumps, and only for time series in which the jump amplitudes were of an order comparable to that of the overall noise of the time series. Choosing a low value of α ensured better accuracy in the estimation of jump amplitudes, which is more statistically stable when the jump is of much larger amplitude than the overall noise of the time series. According to the visual inspection of time series graphs and comparable bridge coding results, the jump amplitude estimates were reliable enough to be used in subsequent analyses.

The codes used in this study to characterize the conditions were not chosen to be used in all contexts. Indeed, they were allocated with the constraint of being comparable between three versions of the ICD and based on three-digit codes. Taking each ICD individually would certainly have led us to select other codes.

The method is designed to detect sustained jumps. Therefore, it is not sensitive to the occurrence of one-year outliers in time series data and it does not necessitate considering them separately, unlike other methods [22].

However, the proposed method is not able to detect and correct for nonabrupt data production changes. For example, if a new death certificate form, impacting certification practice and final coding, slowly spread through the population (as was the case in France between 1997 and 1999), the impact on yearly death counts would occur over several years. But, to the authors' knowledge, no general method is able to correct time-spread data production changes.

When comparable, the multiplicative factors obtained from bridge coding studies and time series methods were similar [11, 15–17, 37, 38].

The purpose of this article is not to challenge bridge coding studies. However, bridge coding studies are not implemented in all countries, and it would be very difficult and costly to do so retrospectively for every data production change. The time series analysis methods proposed herein provide a reliable way of correcting data production changes affecting death count time trends.

Given the indirect manner in which data production changes are identified, the method necessitates feedback from data producers in order to confirm and explain the plausibility of the changes. Without that additional information, the automatic method would blindly correct any detected jumps, some of which may be related to real abrupt and sustained variations in the mortality risk. However, it is not always straightforward for a data producer to obtain a broad overview of past coding process methods in the producer's country. The reasons for the occurrence of some of the oldest jumps may have been lost. Therefore, the decision to take into account or not any detected jump that is not confirmed by the data producer will depend on the degree of confidence that the jump is not attributable to a production change.

Some jumps are of great amplitude (e.g., rheumatic heart disease). This may be observed when the cause considered is highly likely to be the result of other causes [10, 23]. In that case, the death count time trend is very sensitive to changes in coding rules (e.g., ICD-10 rule 3). However, the absence of high-amplitude jumps is not sufficient to ensure the interpretability of time trends. Time trends for some conditions like hypertension, heart failure, and renal failure have to be interpreted cautiously. Indeed, the approach chosen was to only consider the underlying cause of death, and these specific causes may be selected as underlying, due to lack of additional information about the real underlying cause on the death certificate. In these cases, mortality time trends could be influenced by other conditions or slowly diffused certification changes. A multiple cause approach considering each cause mentioned on the death certificate could bring very different results.

Large jumps may also be observed when a country uses very specific codes. In this study, for practical reasons, it was decided to use the same codes for all the countries. However, the same general method could have been applied to specific codes for each country.

In any event, time trends for causes with large amplitude jumps, even after correction, are to be interpreted with caution.

For some causes, jump amplitude was markedly heterogeneous by age. This result has already been observed in bridge coding studies [11, 16]. This result could be attributed to three factors: first, for some causes, subcause structure is different by age, and each subcause is differentially impacted by production change; second, older age mortality is more frequently associated with multiple pathologies, and the selection of one of these as the underlying cause may change with coding rules; third, in certain cases, the same death certificate may be interpreted differently depending on the age of the deceased, and this difference may also depend on the coding rules used.

Conclusions

The method presented in this paper was successfully applied to a large set of causes of death and countries. The set of causes considered is heterogeneous in terms of frequency of occurrence (e.g., more than a 100-fold difference between the frequencies of cerebrovascular disease and malignant neoplasm of the testes) and sensitivity to coding change (no sensitivity for congenital heart disease and high sensitivity for heart failure).

In the future, it would be of interest to investigate the extent to which such a time series approach could be used in a spatial approach to some specific causes. The hypothesis would then be that a large and clear-cut discontinuous change in cause-specific death count, coinciding with a country's border, is attributable to data production discrepancies rather than to real underlying mortality risk variations.

References

Griffiths C, Brock A: Mortality trends in England and Wales. Health Statistics Quarterly 2003, (18):5-24.
Montserrat A, Devis T, Westlake S, Sumun E, Hervé A, Brückner G, et al.: Health statistics - Key data on health 2002 - Data 1970-2001. Luxembourg: Office for Official Publications of the European Communities; 2002.
Google Scholar
DREES: L'état de santé de la France en 2008 - indicateurs. Paris; 2008.
Google Scholar
International Classification of Diseases and Related Health Problems, Tenth Revision Geneva; 1992.
History of the development of the ICD: International Statistical Classification of Diseases and Related Health Problems. Geneva: WHO; 2004:145-158.
Google Scholar
Allender S, Scarborough P, O'Flaherty M, Capewell S: Patterns of coronary heart disease mortality over the 20th century in England and Wales: Possible plateaus in the rate of decline. BMC Public Health 2008, 8: 148. 10.1186/1471-2458-8-148
Article PubMed PubMed Central Google Scholar
Anderson RN, Rosenberg HM: Disease classification: measuring the effect of the Tenth Revision of the International Classification of Diseases on cause-of-death data in the United States. Stat Med 2003,22(9):1551-70. 10.1002/sim.1511
Article PubMed Google Scholar
Klebba A, Scott J: Estimates of selected comparability ratios based on dual coding of 1976 death certificates by the eight and ninth revisions of the international classification of diseases. Monthly Vital Statistics Report 1980.,28(11 supp):
Brock A, Griffiths C, Rooney C: The effect of the introduction of ICD-10 on cancer mortality trends in England and Wales. Health Statistics Quarterly 2004., 23:
Google Scholar
Brock A, Griffiths C, Rooney C: The impact of introducing ICD-10 on analysis of respiratory mortality trends in England and Wales. Health Statistics Quarterly 2006., 29:
Google Scholar
Griffiths C, Brock A, Rooney C: The impact of introducing ICD-10 on trends in mortality from circulatory diseases in England and Wales. Health Statistics Quarterly 2004, 22: 14-20.
PubMed Google Scholar
Griffiths C, Rooney C: Results of the England and Wales ICD-10 comparability study: the effect on main injury and external causes. In Meeting of WHO collaborating centres for the family of international classifications. Cologne, Germany; 2003.
Google Scholar
Pace M, Bruzzone S, Frova L: Bridge coding study in Italy following ICD-9 to ICD-10 transition: evidences and international comparisons. In EAPS working group on Health, morbidity and mortality. Workshop on Individual, area and group variation in morbidity and mortality. Roma; 2007.
Google Scholar
The introduction of ICD10 for cause of death coding in scotland General Register Office for Scotland; 2001.
Johansson L: Mortality bridge coding ICD-9/ICD-10: preliminary results from a Statistics Sweden study. In Meeting of heads of WHO collaborating centres for the classification of diseases. Rio de Janeiro, Brazil; 2000.
Google Scholar
Cano-Serral G, Perez G, Borrell C, Group C: Comparability between ICD-9 and ICD-10 for the leading causes of deaths in Spain. Rev epidemiol Sante Publique 2006, 54: 355-365. 10.1016/S0398-7620(06)76730-X
Article CAS PubMed Google Scholar
Pavillon G, Boileau J, Renaud G, Lefèvre H, Jougla E: Bridge coding ICD9-ICD10 and effects on French mortality data. In WHO family of International Classifications network meeting. Reykjavik, Iceland; 2004.
Google Scholar
Comparability of ICD-10 and ICD-9 for mortality statistics in Canada Statistics Canada; 2005.
Vallin J, Mesle F: Causes of death in France from 1925 to 1978. Reconstitution of coherent statistical series. Cah Sociol Demogr Med 1987,27(4):297-319.
CAS PubMed Google Scholar
Wolleswinkel-van Den Bosch JH, Van Poppel FW, Mackenbach JP: Reclassifying causes of death to study the epidemiological transition in the Netherlands,1875-1992. Eur J Popul 1996,12(4):327-61. 10.1007/BF01796912
Article CAS PubMed Google Scholar
Klassificering av dödsorsaker i svensk statistik [Classification of causes of death in Swedish statistics] Statistics Sweden; 1990.
Janssen F, Kunst AE: ICD coding changes and discontinuities in trends in cause-specific mortality in six European countries, 1950-99. Bull World Health Organ 2004,82(12):904-13.
PubMed Google Scholar
Jansson B, Johansson LA, Rosen M, Svanstrom L: National adaptations of the ICD rules for classification--a problem in the evaluation of cause-of-death trends. J Clin Epidemiol 1997,50(4):367-75. 10.1016/S0895-4356(96)00426-X
Article CAS PubMed Google Scholar
Pearson-Nelson BJ, Raffalovich LE, Bjarnason T: The effects of changes in the World Health Organization's International Classification of Diseases on suicide rates in 71 countries, 1950-1999. Suicide Life Threat Behav 2004,34(3):328-36. 10.1521/suli.34.3.328.42774
Article PubMed Google Scholar
Janssen F, Kunst AE: Cohort patterns in mortality trends among the elderly in seven European countries, 1950-99. Int J Epidemiol 2005,34(5):1149-59. 10.1093/ije/dyi123
Amiri M, Kunst AE, Janssen F, Mackenbach JP: Cohort-specific trends in stroke mortality in seven European countries were related to infant mortality rates. J Clin Epidemiol 2006,59(12):1295-302. 10.1016/j.jclinepi.2006.03.007
Janssen F, Peeters A, Mackenbach JP, Kunst AE: Relation between trends in late middle age mortality and trends in old age mortality--is there evidence for mortality selection? J Epidemiol Community Health 2005,59(9):775-81. 10.1136/jech.2004.028407
Janssen F, Mackenbach JP, Kunst AE: Trends in old-age mortality in seven European countries, 1950-1999. J Clin Epidemiol 2004,57(2):203-16. 10.1016/j.jclinepi.2003.07.005
Joo JH, Qiu P: Jump detection in a regression curve and its derivative. 2009.
Qiu P, Yandell B: A local polynomial jump detection algorithm in nonparametric regression. Technometrics 1998, 40: 141-152. 10.2307/1270648
Wu WB, Zhao Z: Inference of time trends in time series. J R Stat Soc B 2007,69(3):391-410. 10.1111/j.1467-9868.2007.00594.x
Jougla E, Pavillon G, Rossollin F, De Smedt M, Bonte J: Improvement of the quality and comparability of causes-of-death statistics inside the European Community. EUROSTAT Task Force on "causes of death statistics". Rev Epidemiol Sante Publique 1998,46(6):447-56.
Zhang B, Su Z, Qiu P: On jump detection in regression curves using local polynomial kernel estimation. Pak J Stat 2009.
Wood SN: Fast stable direct fitting and smoothness selection for generalized additive models. J R Stat Soc B 2008,70(3):495-518. 10.1111/j.1467-9868.2007.00646.x
Hastie T, Tibshirani R: Generalized additive models for medical research. Stat Methods Med Res 1995,4(3):187-96. 10.1177/096228029500400302
Wood SN: Generalized Additive Models: An Introduction with R. Boca Raton, FL: Chapman & Hall/CRC; 2006.
OPCS: Mortality statistics: cause 1984. Series DH2 1985, 11.
OPCS: Mortality statistics: cause 1993 (revised) and 1994. Series DH2 1995, 20.


Acknowledgements

We would like to express our gratitude to Cleo Rooney and Vanessa Fearn with the Office for National Statistics, England and Wales, Lars Age Johansson with the Swedish National Board of Health and Welfare, Torsten Schelhase with the German Federal Statistical Office, Maria Del Rosario Gonzales Garcia with the Spanish National Statistics Institute, Jan Kardaun with the Dutch National Bureau of Statistics, and Gérard Pavillon and Jean Boileau with the French National Institute of Health and Medical Research, for their expertise on their respective databases.

We are also very grateful to Anton Kunst for his role in initiating the study and to Andy Mullarky for his skilful assistance in preparing the English version of this manuscript.

The study was financially supported by the European Commission (Public Health Program, Agreement number: 2007106).

Author information

Authors and Affiliations

INSERM, CépiDc, Le Kremlin-Bicêtre, France
Grégoire Rey, Albertine Aouba, Gérard Pavillon & Eric Jougla
Erasmus MC, Department of Public Health, Rotterdam, The Netherlands
Rasmus Hoffmann, Iris Plug & Johan Mackenbach
Department of Public Health and Caring Sciences, Social Medicine, Uppsala University, Uppsala, Sweden
Ragnar Westerling

Authors

Grégoire Rey
Albertine Aouba
Gérard Pavillon
Rasmus Hoffmann
Iris Plug
Ragnar Westerling
Eric Jougla
Johan Mackenbach

Corresponding author

Correspondence to Grégoire Rey.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

GR participated in the design of the study, conducted the analysis, and drafted the manuscript. AA and GP conducted the code allocation and revised the manuscript. RH and IP provided the dataset and participated in the revision of the results and manuscript. RW revised the manuscript. EJ and JM participated in the design of the study and revised the manuscript. All authors read and approved the final manuscript.


Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Rey, G., Aouba, A., Pavillon, G. et al. Cause-specific mortality time series analysis: a general method to detect and correct for abrupt data production changes. Popul Health Metrics 9, 52 (2011). https://doi.org/10.1186/1478-7954-9-52


Received: 31 January 2011
Accepted: 19 September 2011
Published: 19 September 2011
DOI: https://doi.org/10.1186/1478-7954-9-52
