A semi-Markov model for stroke with piecewise-constant hazards in the presence of left, right and interval censoring

Research Article
Received 26 October 2011, Accepted 27 June 2012
Published online in Wiley Online Library. DOI: 10.1002/sim.5534

Venediktos Kapetanakis (a,*,†), Fiona E. Matthews (b) and Ardo van den Hout (c)

a National Heart Forum, London, U.K.
b MRC Biostatistics Unit, Institute of Public Health, Cambridge, U.K.
c Department of Statistical Science, University College London, London, U.K.
* Correspondence to: Venediktos Kapetanakis, National Heart Forum, Victoria House, 7th Floor, Southampton Row, London, WC1B 4AD, U.K.
† E-mail:
Copyright © 2012 John Wiley & Sons, Ltd. Statist. Med. 2012

This paper presents a parametric method of fitting semi-Markov models with piecewise-constant hazards in the presence of left, right and interval censoring. We investigate transition intensities in a three-state illness–death model with no recovery. We relax the Markov assumption by adjusting the intensity for the transition from state 2 (illness) to state 3 (death) for the time spent in state 2 through a time-varying covariate. This requires the exact time of the transition from state 1 (healthy) to state 2, which is unknown when the data are subject to left or interval censoring. In the estimation of the likelihood, we take interval censoring into account by integrating out all possible times for the transition from state 1 to state 2. For left censoring, we use an Expectation–Maximisation-inspired algorithm. A simulation study assesses the performance of the method. The proposed combination of statistical procedures provides great flexibility. We illustrate the method in an application using data on stroke onset for the older population from the UK Medical Research Council Cognitive Function and Ageing Study. Copyright © 2012 John Wiley & Sons, Ltd.

Keywords: censored data; semi-Markov model; multi-state modelling; piecewise-constant hazards; EM algorithm; stroke

1. Introduction

Stroke is the rapidly developing loss of brain function due to a disorder in the blood supply to the brain. It can cause serious complications that may lead to death. Stroke is the third largest cause of death in the UK and the USA [1,2]. Non-fatal stroke may cause serious complications, including permanent neurological damage and adult disability.

Multi-state modelling is a method of analysing longitudinal data when the observed outcome is a categorical variable. In medical research, multi-state models are often used to model the development or progression of a disease, where the different levels of the disease can be seen as the states of the model. This approach enables the investigation of ageing in the older population by jointly modelling the rates at which healthy individuals have a non-fatal stroke or die, and the rate of dying after a non-fatal stroke. Multi-state models have been used in a wide range of applications, including AIDS [3], liver cirrhosis [4], cognitive impairment [5], coronary heart disease [6], stroke [7] and various types of cancer [8,9]. Putter et al. [10] have published a concise introduction to multi-state modelling, and Norris [11] has discussed the theory of stochastic processes and Markov chains.

Fitting multi-state models involves various assumptions. A common hypothesis is that the data satisfy the first-order time-homogeneous Markov property. Under this assumption, the transition to the next state depends only on the current state, so any previous history of the process can be ignored. Although this assumption simplifies statistical modelling, it may often be inappropriate and lead to incorrect conclusions.
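To make the Markov assumption concrete for the three-state model used here (healthy, history of stroke, death; no recovery), the Python sketch below computes the transition probability matrix P(t) when the intensities q12, q13 and q23 are constant. It uses the standard closed-form solution of the Kolmogorov forward equations for this model. When intensities change between age bands (piecewise-constant intensities), the Markov property allows the matrices for consecutive bands to be multiplied. The function names and the intensity values in the usage lines are hypothetical and are not estimates from the MRC CFAS data.

```python
import math

Matrix = list[list[float]]

def illness_death_p(q12: float, q13: float, q23: float, t: float) -> Matrix:
    """Transition probabilities P(t) of the three-state illness-death model with no
    recovery and time-constant intensities; rows/columns are states 1, 2, 3."""
    a = q12 + q13                       # total rate of leaving state 1
    p11 = math.exp(-a * t)              # still healthy after time t
    p22 = math.exp(-q23 * t)            # still alive with a history of stroke
    if math.isclose(a, q23):
        p12 = q12 * t * math.exp(-a * t)        # limit of the formula below as a -> q23
    else:
        p12 = q12 * (p22 - p11) / (a - q23)     # had a stroke and still alive at time t
    return [
        [p11, p12, 1.0 - p11 - p12],
        [0.0, p22, 1.0 - p22],
        [0.0, 0.0, 1.0],
    ]

def matmul(x: Matrix, y: Matrix) -> Matrix:
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def piecewise_p(bands: list[tuple[float, float, float, float]]) -> Matrix:
    """P over consecutive bands (q12, q13, q23, length) with constant intensities in
    each band; under the Markov property the band matrices simply multiply."""
    p = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for q12, q13, q23, length in bands:
        p = matmul(p, illness_death_p(q12, q13, q23, length))
    return p

# Hypothetical yearly intensities for two consecutive 1-year age bands
P = piecewise_p([(0.02, 0.05, 0.15, 1.0), (0.03, 0.06, 0.18, 1.0)])
print([round(v, 4) for v in P[0]])
```

Each matrix depends only on the current state and the elapsed time. This sketch therefore cannot express an effect of time already spent in state 2, which is the restriction that a semi-Markov model relaxes.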
A number of extensions to the theory have been proposed, including the incorporation of history into the underlying stochastic process. Weiss and Zelen [12] first proposed a semi-Markov model for clinical trials. In semi-Markov models, the transition to the next state depends not only on the current state but also on the time spent in the current state. This requires the exact time of the transition into the current state, which in many applications is unknown. In 1999, Commenges introduced the terminology of a partial Markov model [13]. In partial Markov models, the transition to the next state depends not only on the current state but also on a multivariate explanatory process that can be predicted at the current state. This enables the inclusion of explanatory covariates in multi-state modelling. Faddy [14] originally applied a model with piecewise-constant transition intensities, which enables intensities to depend on time-varying covariates. Van den Hout and Matthews [15] have also discussed a piecewise-constant approach for estimating the effects of explanatory variables in multi-state modelling.

Longitudinal studies, as opposed to cross-sectional studies, involve repeated observations on the same individuals over time. In such studies, researchers often recruit individuals over a range of ages, at which some participants may already have developed, and progressed through, the different study endpoints. Longitudinal data are usually collected by monitoring individuals at prespecified times over the period of an observational study, so the values of monitored variables are known only at a discrete set of times. The case where the exact value of a variable is unknown and only partial information is available is referred to as censoring [16]. There are three types of censoring, namely left, right and interval censoring. In left and right censoring, the value of a variable is known to lie below and above a certain value, respectively.
In interval censoring, the value of a variable is known to lie within an interval with known limits. Methods for handling right-censored data have been discussed in a number of statistical textbooks [17,18] and are widely implemented in medical research. However, methods for adjusting for left censoring are less frequently employed in longitudinal studies [19]. Ignoring left censoring when estimating the underlying stochastic process that explains the observed data may cause substantial bias [19]. Cain et al. have shown that including individuals whose data are subject to left censoring (by collecting all necessary information at the time of recruitment), rather than excluding them from the analysis, reduces bias significantly [19]. A notion similar to left censoring is that of left truncation, but the two must be distinguished. A left-truncated distribution is one formed from another distribution by cutting off and ignoring the part lying to the left of a fixed variable value [20]. A left-truncated sample is likewise obtained by ignoring all values smaller than a fixed value [20]. Left truncation may occur in longitudinal studies when individuals who have already developed, and progressed through, the different study endpoints before the beginning of the study are not included in the study. One reason for an individual not being included is death before the start of the study. In 1986, Kay [21] introduced a method that dealt with right censoring and also handled the case where the time of death is known precisely. Foucher et al. have investigated ways to fit multi-state models in the presence of left, right and interval censoring by using a generalised Weibull distribution for the waiting times of the underlying process [22]. Interval censoring has often been dealt with by integration [6].
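The likelihood contributions implied by the three censoring types, and the different role of left truncation, can be illustrated for a single event time with a piecewise-constant hazard. In the Python sketch below, the cut points, rates and function names are hypothetical, and times are assumed to be non-negative. An exactly observed time t contributes h(t)S(t), a right-censored time c contributes S(c), a left-censored time c contributes 1 - S(c), and an interval-censored time in (l, r] contributes S(l) - S(r). Left truncation at entry time e does not add information about the event. Instead, it conditions an exact or right-censored contribution on survival to e, which means dividing that contribution by S(e).

```python
import math
from bisect import bisect_right

def cum_hazard(t, cuts, rates):
    """Cumulative hazard H(t): sum of rate x time spent in each band up to t.
    cuts[k] is the start of band k (cuts[0] == 0); the last band is open-ended."""
    total = 0.0
    for k, rate in enumerate(rates):
        start = cuts[k]
        end = cuts[k + 1] if k + 1 < len(cuts) else math.inf
        if t <= start:
            break
        total += rate * (min(t, end) - start)
    return total

def surv(t, cuts, rates):
    return math.exp(-cum_hazard(t, cuts, rates))

def hazard(t, cuts, rates):
    return rates[bisect_right(cuts, t) - 1]

def contribution(kind, cuts, rates, lo, hi=None):
    """Likelihood contribution of one event time under a given observation scheme."""
    if kind == "exact":      # event observed at time lo
        return hazard(lo, cuts, rates) * surv(lo, cuts, rates)
    if kind == "right":      # event known to occur after lo
        return surv(lo, cuts, rates)
    if kind == "left":       # event known to occur before lo
        return 1.0 - surv(lo, cuts, rates)
    if kind == "interval":   # event known to occur in (lo, hi]
        return surv(lo, cuts, rates) - surv(hi, cuts, rates)
    raise ValueError(f"unknown observation type: {kind}")

def truncated(lik, entry, cuts, rates):
    """Left truncation: condition an exact or right-censored contribution on survival to entry."""
    return lik / surv(entry, cuts, rates)

cuts, rates = [0.0, 5.0, 10.0], [0.01, 0.02, 0.04]   # hypothetical bands and rates
for kind, lo, hi in [("exact", 7.0, None), ("right", 7.0, None),
                     ("left", 7.0, None), ("interval", 3.0, 7.0)]:
    print(kind, contribution(kind, cuts, rates, lo, hi))
```

With a single band, this reduces to the exponential model. In a multi-state setting, the same building blocks apply to each transition intensity separately.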
In 1993, Lindsey and Ryan [8] presented another approach for adjusting for interval censoring, based on the Expectation–Maximisation (EM) algorithm.

This paper presents a method to incorporate history into the underlying process in the presence of left truncation and left, right and interval censoring. The proposed model combines properties of semi-Markov models and partial Markov models. We handle interval censoring by integration and adjust for left censoring by using an EM-inspired algorithm [23]. We bypass left truncation by analysing data only over the period of follow-up, although the adjustment for left censoring requires assumptions about the process before baseline. We illustrate the method in an application using data from the UK Medical Research Council Cognitive Function and Ageing Study (MRC CFAS). The objective was to investigate ageing in the older population by modelling the transition intensities in a three-state model that comprises the states ‘healthy’ (state 1), ‘history of stroke’ (state 2) and ‘death’ (state 3), and to investigate how the time since an individual's stroke affects the rate of dying. Statistical inference about ageing is feasible only for the older population because the study includes individuals in their 65th year and above. Survival after stroke has been discussed in several articles [1,24]. These articles help in understanding the mechanisms and difficulties present in the data set used in the application, and in validating the results of the proposed method.

Section 2 presents the available MRC CFAS data. Section 3 presents the statistical model and the method to include time-varying explanatory covariates in the presence of right and interval censoring. We discuss handling left censoring in Section 4.
A simulation study in Section 5 shows how assumptions about the process before baseline affect the performance of the method. Section 6 illustrates the method on the MRC CFAS data and investigates model fit graphically. Finally, Section 7 presents the discussion.

2. Data

The MRC CFAS is a large-scale, multi-centre longitudinal study conducted in the UK [25]. The study was launched in the late 1980s to explore dementia and cognitive decline in a representative sample of 13 004 people in the older population. The data have also been used to investigate other disorders, such as depression [26] and physical disability [27], and to look at healthy active life expectancy [15]. To date, over 46 000 interviews with participants have been completed. More information on the design of this study is available online. We model the data with a three-state model that comprises the states ‘healthy’ (state 1), ‘history of stroke’ (state 2) and ‘death’ (state 3). Figure 1 illustrates the multi-state model. Of interest is how the time since an individual's stroke affects the rate of dying.

We analyse a subset of the MRC CFAS data, namely data from the Newcastle centre only; throughout this work, we refer to this data set as the MRC CFAS. This subset includes data on 2316 individuals in their 65th year and above, interviewed between 1991 and 2003. These individuals had up to nine interviews, at which they were asked whether they had had a stroke since they were last seen, and their age at each interview was recorded. Exact dates of death are available even after the end of follow-up. At baseline, history of stroke up to that time was investigated, and individual data on age (A), gender (G; 0 for women and 1 for men), years of education (E; 0 for less than 10 years and 1 for 10 years or more) and smoking status at age 60 years (S; 0 for non-smoker or ex-smoker and 1 for current smoker) were collected.
Defining smoking in this way reduces the bias that arises when people give up smoking because of ill health. Smoking habits rarely change after age 60 years. According to the annual report on smoking-related behaviour and attitudes in 2005 [28], smokers over the age of 65 years are the least likely to want to stop smoking, and those who want to give up are more likely to have quit before the age of 65 years.

Both the number of interviews and the time between interviews varied among individuals. Figure 2(a) and (b) show the number of interviews per individual and the distribution of the length of follow-up intervals, respectively. The median length of follow-up intervals was 2 years, and the median number of interviews was 2. Figure 2(c) illustrates the distribution of the time between the last interview and the time of either death or right censoring. Table I shows the frequencies of pairs of consecutive states in the data. For each pair of states i and j, summed over all individuals, the frequency is the number of times an individual had an observation in state i followed by an observation in state j. Owing to the definitions of the states, there were no transitions from state 2 to state 1.

In the MRC CFAS longitudinal study, several patterns of follow-up can be observed for each individual. For example, an individual who is healthy at the beginning of the study can take one of two paths. He or she can have a stroke in the following years and then either die or still be alive when the study ends. Alternatively, he or she can not have a stroke and either die before the end of the study or be right censored. Likewise, an individual who is reported at the beginning of the study to have had a stroke may remain alive or die before the end of the study. We depict these patterns graphically in Figure 3 and label them A–F.
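One way to work with these follow-up histories in code is to write each individual's record as a string. The string lists the states observed at successive interviews ('1' or '2'), followed by the final outcome: '3' for death, whose time is known exactly, or 'C' for right censoring. Under that hypothetical encoding, the Python sketch below tallies consecutive pairs in the way Table I does and assigns each history to one of the patterns A–F of Figure 3. The function names and example histories are invented for illustration.

```python
from collections import Counter

FROM_STATES = ("1", "2")
TO_STATES = ("1", "2", "3", "C")

def check(history: str) -> None:
    """Validate a history such as '1122C': interview states 1/2, then '3' or 'C'."""
    *visits, last = history
    if not visits or last not in ("3", "C") or any(s not in "12" for s in visits):
        raise ValueError(f"malformed history: {history!r}")
    if "21" in history:
        raise ValueError(f"recovery from stroke is impossible: {history!r}")

def pair_frequencies(histories):
    """Count each observation in state i followed by an observation in state j (cf. Table I)."""
    counts = Counter()
    for h in histories:
        check(h)
        counts.update(zip(h, h[1:]))
    return {i: {j: counts[(i, j)] for j in TO_STATES} for i in FROM_STATES}

def pattern(history: str) -> str:
    """Assign a history to patterns A-F of Figure 3."""
    check(history)
    died = history[-1] == "3"
    if history[0] == "2":          # stroke before baseline
        return "E" if died else "F"
    if "2" in history:             # stroke observed during follow-up
        return "A" if died else "B"
    return "C" if died else "D"    # stroke status unknown after the last interview

histories = ["11223", "1122C", "113", "11C", "223", "22C"]  # invented examples
print(pair_frequencies(histories))
print([pattern(h) for h in histories])  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Each history ends in exactly one pair into '3' or 'C', so those two columns sum to the number of individuals. In Table I, 1551 + 765 = 2316, which matches the size of the Newcastle subset.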
In patterns A, B, E and F, a transition from state 1 to state 2 is known to have happened. For patterns C and D, however, censoring makes it impossible to know whether such a transition has taken place, so two scenarios are possible. An individual may have moved to state 2 and never been recorded in this state owing to censoring. Alternatively, he or she may have remained in state 1 until death or right censoring.

Figure 1. Three-state model for data from the Medical Research Council Cognitive Function and Ageing Study. [Diagram: State 1: Healthy; State 2: History of stroke; State 3: Death; transition intensities q12, q13 and q23.]

Figure 2. Descriptive statistics of (a) the number of interviews per individual, (b) the time between interviews and (c) the time between the last interview and either death or censoring. [Histograms of frequency against (a) number of interviews per individual, (b) length of follow-up intervals (years) and (c) time between the last interview and death or right censoring (years).]

Table I. Frequencies of pairs of consecutive states, corresponding to the number of times an individual had an observation in state i followed by an observation in state j, as observed in the MRC CFAS data.

                     To (state j)
From (state i)       Healthy   History of stroke   Death   Censored   Total
Healthy              2964      113                 1328    710        5115
History of stroke    0         303                 223     55         581
Total                2964      416                 1551    765        5696

In the data, 2151 individuals were observed in state 1 at baseline, whereas 165 had had a stroke before the beginning of the study. Individuals who had had a stroke before the start of the study were asked at their first interview to report the time of their first stroke.
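For patterns C and D, the two scenarios can be combined in a single likelihood contribution. This is done by adding the probability of the no-stroke scenario to an integral over the unknown age W of an unobserved stroke, taken between the last observation in state 1 and death or censoring. This follows the idea of integrating out the transition time used in this paper for interval censoring. The Python sketch below makes simplifying assumptions that are not the paper's model. First, q12 and q13 are constant. Second, the semi-Markov death rate after stroke is a hypothetical log-linear function of time since stroke, q23(d) = exp(b0 + b1 d), and b1 = 0 gives back the Markov case. Third, the integral is approximated with the trapezoidal rule. The names q23, cum_q23, trapezoid and pattern_cd_likelihood are illustrative; a1n and an correspond to A1N and AN in the notation of Figure 3.

```python
import math

def q23(d, b0, b1):
    """Hypothetical death rate after stroke, log-linear in time since stroke d."""
    return math.exp(b0 + b1 * d)

def cum_q23(d, b0, b1):
    """Integral of q23 from 0 to d (closed form for the log-linear rate)."""
    if b1 == 0.0:
        return math.exp(b0) * d
    return math.exp(b0) * math.expm1(b1 * d) / b1

def trapezoid(f, lo, hi, n=2000):
    """Composite trapezoidal rule with n sub-intervals."""
    h = (hi - lo) / n
    inner = sum(f(lo + k * h) for k in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))

def pattern_cd_likelihood(a1n, an, died, q12, q13, b0, b1):
    """Contribution of a history last seen healthy at age a1n and then dead at
    age an (pattern C, died=True) or right censored at an (pattern D)."""
    a = q12 + q13
    # Scenario 1: no stroke on (a1n, an); death directly from state 1 contributes q13.
    no_stroke = math.exp(-a * (an - a1n)) * (q13 if died else 1.0)

    # Scenario 2: unobserved stroke at age w in (a1n, an), integrated out.
    def integrand(w):
        d = an - w                                 # time spent in state 2 by age an
        dens = math.exp(-a * (w - a1n)) * q12      # stay healthy until w, then stroke
        dens *= math.exp(-cum_q23(d, b0, b1))      # survive in state 2 until an
        return dens * (q23(d, b0, b1) if died else 1.0)

    return no_stroke + trapezoid(integrand, a1n, an)

# Hypothetical values: last seen healthy at 75, died at 80
print(pattern_cd_likelihood(75.0, 80.0, True, q12=0.02, q13=0.05, b0=math.log(0.15), b1=-0.2))
```

Setting b1 = 0 recovers the Markov illness–death probabilities, which have a closed form and so give a simple check on the numerical integration. With b1 different from 0, the rate of dying depends on the time since the unobserved stroke. This is why the transition age must be integrated out rather than ignored.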
Self-reported data are often subject to measurement error due to digit preference, that is, the tendency to round outcomes to pleasing digits [29], and should be treated with caution. Moreover, in longitudinal studies, information about the measured endpoints before baseline is rarely available. For this reason, the method developed in this paper does not need this information, and we have not used the self-reported data on the time of first stroke. Because the proposed method handles all types of censoring without such information, it is applicable to most longitudinal studies.

The median age of individuals at baseline was 74 years. This was imposed by the study design, under which individuals beyond their 75th year were over-sampled to achieve numbers equal to those of individuals aged 65–74 years at baseline. By design, every individual was followed up approximately every 2 years. The time of death is known exactly. Because the exact time of transition from state 1 to state 2 is unknown, the data are subject to left, right and interval censoring. Transitions from state 1 to state 2 may have occurred without being observed before death or right censoring at the end of follow-up. Transitions from state 1 to state 2 that take place before the beginning of the study are left censored if the individuals are enrolled in the study, or left truncated if they are not. One reason for an individual not to be enrolled is death before baseline.

To include the individuals who were observed in state 2 at the beginning of the study, the exact age of onset of state 2 must be estimated. This estimation involves assumptions about an age at which these individuals were still healthy. Using data published in the ‘Key health statistics from general practice’ reports of the Office for National Statistics [30], we estimated that 90% of individuals who have a stroke before the age of 76 years (the median age at baseline for individuals who had a stroke before the beginning of the study) have the stroke after the age of 40 years. For these individuals, the probabilities of having the stroke within the age spans 35–44 and 45–55 years are 5.06% and 12.6%, respectively. In estimating these figures, we ignored possible cohort effects owing to unavailability of data; nevertheless, we expect the true estimates to be of similar magnitude. Therefore, when modelling stroke for individuals who were observed in state 2 at baseline, we make the realistic assumption that they were healthy at the age of 40 years.

Figure 3. Data patterns with regard to possible transitions between the three states. C denotes censoring. A0 is age at which all individuals are assumed to be healthy, Ab is age at baseline, A1N is age at the last time an individual is observed in state 1, A20 is age at the first time an individual is observed in state 2, AN is age at the end of the follow-up, and W is age at the time of transition from state 1 to state 2. [Panels for Patterns A–F, each showing the sequence of states along an age (A) axis starting at birth: A, 1 1 2 2 3; B, 1 1 2 2 C; C, 1 1 2 3; D, 1 1 2 C; E, 1 2 2 3; F, 1 2 2 C.]

To overcome the difficulties imposed by the study design and the presence of censoring, we used age, A, as the time scale. Age is also the natural time scale for processes in the older population. We introduce the following notation:

A0: Age before the beginning of the study at which all individuals are assumed to have been healthy.
Ab: Age at baseline.
A1N: Age at the last time an individual is observed in state 1.