Chapter 9
Multilevel event history models

9.1 Event history models

This class of models, also known as survival time models or event duration models, has as the response variable the length of time between events. Such events may be, for example, birth and death, or the beginning and end of a period of employment, with the corresponding times being length of life or duration of employment. There is a considerable theoretical and applied literature, especially in the field of biostatistics, and a useful summary is given by Clayton (1988). We consider two basic approaches to the modelling of duration data. The first is based upon proportional hazard models. The second is based upon direct modelling of the log duration, often known as accelerated life models. In both cases we may wish to include explanatory variables.

The multilevel structure of such models arises in two general ways.
The first is where we have repeated durations within individuals, analogous to our repeated measures models of chapter 5. Thus, individuals may have repeated spells of various kinds of employment, of which unemployment is one. In this case we have a 2-level model with individuals at level 2, often referred to as a renewal process. We can include explanatory dummy variables to distinguish these different kinds of employment or states. The second kind of model is where we have a single duration for each individual, but the individuals are grouped into level 2 units. In the case of employment duration the level 2 units would be firms or employers. If we had repeated measures on individuals within firms then this would give rise to a 3-level structure.

9.2 Censoring

A characteristic of duration data is that for some observations we may not know the exact duration but only that it lies within a certain interval (interval censored data), is less than a known value (left censored data), or is greater than a known value (right censored data). For example, if we know at the time of a study that someone entered her present employment before a certain date, then the information available is only that the duration is longer than a known value. Such data are known as right censored. In another case we may know that someone entered and then left employment between two measurement occasions, in which case we know only that the duration lies in a known interval. The models described in
this chapter have procedures for dealing with censoring. In the case of the parametric models, where there are relatively large proportions of censored data, the assumed form of the distribution of duration lengths is important, whereas in the partially parametric models the distributional form is ignored. It is assumed that the censoring mechanism is non-informative, that is, independent of the duration lengths.

In some cases, we may have data which are censored but where we have no duration information at all. For example, if we are studying the duration of first marriage and we end the study when individuals reach the age of 30, all those marrying for the first time after this age will be excluded. To avoid bias we must therefore ensure that age of marriage is an explanatory variable in the model and report results conditional on age of marriage.

There is a variety of models for duration
times. In this chapter we show how some of the more frequently used models can be extended to handle multilevel data structures. We consider first hazard based models.

9.3 Hazard based models in continuous time

The underlying notions are those of survivor and hazard functions. Consider the (single level) case where we have measures of length of employment on workers in a firm. We define the proportion of the workforce employed for periods greater than $t$ as the survivor function and denote it by

$$S(t) = 1 - F(t) = 1 - \int_0^t f(u)\,du$$

where $f(u)$ is the density function of length of employment. The hazard function is defined as

$$h(t) = \frac{f(t)}{S(t)}$$

and represents the instantaneous risk, in effect the (conditional) probability of someone who is employed at time $t$ ending employment in the next (small) unit interval of time. The simplest model is one which specifies an exponential distribution for the duration time, which gives $h(t) = \lambda$, so that the hazard rate is constant and $S(t) = e^{-\lambda t}$. In general, however, the hazard rate will change over time and a number of alternative forms have been studied (see for example, Cox and Oakes, 1984). A common one is based on the assumption of a Weibull distribution, namely [...] or the associated extreme value distribution formed by replacing [...] by [...]. Another approach to incorporating time-varying hazards is to divide the time scale into a number of discrete intervals within which the hazard rate is assumed constant, that is, we assume a piecewise exponential distribution. This may be useful where there are natural units of time, for example based on menstrual cycles in the analysis of fertility, and this can be extended by classifying units by other factors where time varies over categories. We discuss such discrete time models in a later section.

The most widely used models, to which we shall devote our discussion, are those known as
proportional hazards models, and the most common definition is

$$h(t; X) = \lambda(t)\, e^{X\beta}$$

The term $X\beta$ denotes a linear function of explanatory variables which we shall model explicitly in section 9.5. It is assumed that $\lambda(t)$, the baseline hazard function, depends only on time and that all other variation between units is incorporated into the linear predictor $X\beta$. The components of $X$ may also depend upon time, and in the multilevel case some of the coefficients will also be random variables.

9.4 Parametric proportional hazard models

For the case where we have known duration times and right censored data, define the cumulative baseline hazard function $\Lambda(t) = \int_0^t \lambda(u)\,du$ and a variable $w$ with mean $\mu$, taking the value one for uncensored and zero for censored data. It can be shown (McCullagh and Nelder, 1987) that the maximum likelihood estimates required are those obtained from a maximum likelihood analysis for this model where $w$ is treated as a Poisson variable. This computational device leads to the loglinear Poisson model for the $i$-th observation

$$\log(\mu_i) = \log \Lambda(t_i) + X_i\beta \qquad (9.1)$$

where the term $\log \Lambda(t_i)$ is treated as an offset, that is, a known term added to the linear predictor.

The simplest case is the exponential distribution, for which we have $\Lambda(t) = \lambda t$. Equation (9.1) therefore has an offset $\log t_i$ and the term $\log \lambda$ is incorporated into $X\beta$. We can model the response Poisson count using the procedures of chapter 6, with coefficients in the linear predictor chosen to be random at levels 2 or above. This approach can be used with other distributions. For the Weibull distribution, of which the exponential is a special case, the proportional hazards model is equivalent to the log duration model with an extreme value distribution and we shall discuss its estimation in a later section.

9.5 The semiparametric Cox model

The most commonly used proportional hazard models are known as semiparametric proportional hazard models and we now look at the multilevel version of the most common of these in more detail.

Consider the 2-level proportional hazard model for the $jk$-th level 1 unit

$$h(t_{jk}; X_{jk}) = \lambda(t_{jk}) \exp(X_{jk}\beta_k) \qquad (9.2)$$

where $X_{jk}$ is the row vector of explanatory variables for the level 1 unit and some or all of the $\beta_k$ are random at level 2. We adopt the subscripts $j,k$ for levels one and two for reasons which will be apparent below.

We suppose that the times at which a level 1 unit comes to the end of its duration period or fails are ordered and at each of these we consider the total risk set. At failure time $t_{jk}$ the risk set consists of all the level 1 units which have not been censored and for which a failure has not occurred immediately preceding time $t_{jk}$. Then the ratio of the hazard for the unit which experiences a failure and the sum of the hazards of the remaining risk set units is

$$\frac{\exp(X_{jk}\beta_k)}{\sum_{j',k'} \exp(X_{j'k'}\beta_{k'})}$$

which, with the sum taken over the risk set, is simply the probability that the failed unit is the one denoted by $j,k$ (Cox, 1972). It is assumed that, conditional on the $\beta_k$, these probabilities are independent. Several procedures are available for estimating the parameters of this model (see for example Clayton, 1991, 1992). For our purposes it is convenient to adopt the following, which involves fitting a Poisson or equivalent multinomial model of the kind discussed in chapter 7.

At each failure time $l$ we define a response variate for each member of the risk set

$$y_{ijk(l)} = \begin{cases} 1 & \text{if } i \text{ is the observed failure} \\ 0 & \text{if not} \end{cases}$$

where $i$ indexes the members of the risk set, and $j,k$ level 1 and level 2 units. If we think of the basic 2-level model as one of
employees within firms then we now have a 3-level model where each level 2 unit is a particular employee, containing $n_{jk}$ level 1 units, where $n_{jk}$ is the number of risk sets to which the employee belongs. Level 3 is the firm. The explanatory variables can be defined at any level. In particular they can vary across failure times, allowing so-called time-varying covariates. Overall proportionality, conditional on the random effects, can be obtained by ordering the failure times across the whole sample. In this case the marginal relationship between the hazard and the covariates generally is not proportional. Alternatively, we can consider the failure times ordered only within firms, so that the model yields proportional hazards within firms. In this case we can structure the data as consisting of firms at level 3, failure times at level 2 and employees within risk sets at level 1. In both cases,
because we make the assumption of independence across failure times within firms, the Poisson variation is at level 1 and there is no variation at level 2. In other words we can collapse the model to two levels, within firms and between firms. A simple variance components model for the expected Poisson count is written as

$$\pi_{jk(l)} = \exp(\alpha_l + X_{jk}\beta_k) \qquad (9.3)$$

where there is a blocking factor $\alpha_l$ for each failure time. In fact we do not generally need to fit all these nuisance parameters: instead we can obtain efficient estimates of the model parameters by modelling $\alpha_l$ as a smooth function
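
The risk-set construction of section 9.5 can be illustrated with a minimal Python sketch. It is not the author's software. The names `Spell`, `RiskRow` and `expand_risk_sets` and the toy data are hypothetical, failure times are ordered across the whole sample (the first of the two orderings discussed), tied failure times are assumed not to occur, and the response is assumed to be 1 for the unit that fails at a given time and 0 for every other member of that risk set.

```python
from typing import List, NamedTuple


class Spell(NamedTuple):
    firm: int      # grouping unit k (hypothetical label)
    worker: int    # individual j within the firm
    time: float    # observed duration
    failed: bool   # True if the duration ended, False if right censored


class RiskRow(NamedTuple):
    failure_index: int  # position l of the failure time in the ordered list
    firm: int
    worker: int
    y: int              # 1 for the unit failing at this time, otherwise 0


def expand_risk_sets(spells: List[Spell]) -> List[RiskRow]:
    """Return one row per (failure time, unit still at risk).

    Assumes distinct failure times; ties would need separate handling.
    """
    failure_times = sorted(s.time for s in spells if s.failed)
    rows: List[RiskRow] = []
    for l, t in enumerate(failure_times):
        for s in spells:
            # A unit is at risk at t if it has neither failed nor been
            # censored before t.
            if s.time >= t:
                y = 1 if (s.failed and s.time == t) else 0
                rows.append(RiskRow(l, s.firm, s.worker, y))
    return rows


# Toy data: four workers in two firms, one right censored spell.
spells = [
    Spell(firm=1, worker=1, time=2.0, failed=True),
    Spell(firm=1, worker=2, time=5.0, failed=False),
    Spell(firm=2, worker=3, time=3.0, failed=True),
    Spell(firm=2, worker=4, time=7.0, failed=True),
]
rows = expand_risk_sets(spells)

# Number of risk sets each worker belongs to.
risk_set_counts = {}
for r in rows:
    risk_set_counts[r.worker] = risk_set_counts.get(r.worker, 0) + 1
```

Each row of `rows` would then serve as a lowest-level record in a Poisson model with one blocking term per `failure_index`. Restricting the ordering to failure times within firms would mean calling `expand_risk_sets` separately on each firm's spells.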