Functional Principal Component Analysis and Inverse Probability Weighting


Chapter 2

Literature Review

The integrated review of the literature for this study focused on two statistical methods: functional principal component analysis and the inverse probability weighting method for handling missing data.

2.1 Functional Principal Component Analysis

Functional principal component analysis (FPCA) is an important dimensionality reduction method for functional data and an extension of multivariate principal component analysis. The idea of FPCA is similar to that of multivariate principal component analysis; however, in FPCA the weight vector and the data vector are functions. The FPCA method rotates the function values by taking a linear combination of them and then finding the direction of maximal variability in the function values. It therefore allows the variability structure in the variance-covariance function, which can be hard to interpret, to be visualized.

FPCA starts from two independently developed theories on the optimal series expansion of a continuous stochastic process by Karhunen (1947) and Loève (1945); eigenanalysis of a symmetric matrix was then extended to integral operators with symmetric kernels, which is the foundation of FPCA theory (Ramsay & Silverman, 2005). Since being introduced by Rao (1958) in a study comparing growth curves, it has attracted many researchers' attention. Dauxois et al. (1982) studied the asymptotic properties of empirical eigenfunctions and eigenvalues when the sample curves were fully observable. Castro et al. (1986) investigated some fundamentals of FPCA, such as relating FPCA to the Karhunen-Loève theorem and the best m-dimensional functional linear model. Benko et al. (2009) extended the study of Dauxois et al. (1982) to a more practical setting where the sample curves themselves have to be reconstructed from noisy observations at finite design points. Hall and Hosseini-Nasab (2006, 2009) studied some statistical properties, particularly the higher-order estimation errors of empirical eigenfunctions and eigenvalues.

Next, the definitions and notation of FPCA are discussed briefly. Assume that the functions \(X_1(t), \ldots, X_n(t)\) for a random sample are square integrable on a bounded and closed time interval \(T\), and that the covariance function is square integrable over \(T \times T\). The mean and covariance functions are defined as \(\mu(t) = E[X(t)]\) and \(G(s,t) = \mathrm{Cov}(X(s), X(t))\). There exists a decomposition of \(X_i(t)\),

\[
X_i(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_{ik}\,\phi_k(t), \qquad (2.1)
\]

where the \(\phi_k\) are continuous, pairwise orthogonal functions on \(T\) with \(\int_T \phi_k(t)^2\,dt = 1\), and the \(\xi_{ik}\) are component scores, which are uncorrelated random variables with mean 0 and variance \(\lambda_k\). Furthermore, \(\lambda_1 \ge \lambda_2 \ge \cdots \ge 0\). This decomposition provides an approximation of \(X_i(t)\) by the linear combination of the first \(K\) components. The objective of functional principal component analysis is to obtain estimates of the component functions and component scores from the sample.
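As an illustration of how the decomposition in (2.1) is estimated in practice from densely and regularly observed curves, the following minimal sketch (in Python with NumPy; the simulated curves, grid, and number of components are purely illustrative assumptions, not data or choices from this study) performs empirical FPCA by an eigendecomposition of the discretized sample covariance function.

```python
import numpy as np

# Simulate n curves observed on a common dense grid of m time points.
rng = np.random.default_rng(0)
n, m = 200, 101
t = np.linspace(0.0, 1.0, m)
# Each curve: mean function plus two random harmonics plus measurement noise.
X = (np.sin(2 * np.pi * t)                                              # mean function mu(t)
     + rng.normal(0, 1.0, (n, 1)) * np.sqrt(2) * np.cos(2 * np.pi * t)
     + rng.normal(0, 0.5, (n, 1)) * np.sqrt(2) * np.sin(4 * np.pi * t)
     + rng.normal(0, 0.1, (n, m)))                                      # noise

# Empirical FPCA: eigendecomposition of the sample covariance on the grid.
mu_hat = X.mean(axis=0)
Xc = X - mu_hat
dt = t[1] - t[0]
G_hat = (Xc.T @ Xc) / n                      # discretized covariance G(s, t)
evals, evecs = np.linalg.eigh(G_hat)
order = np.argsort(evals)[::-1]              # largest eigenvalues first
lam_hat = evals[order] * dt                  # eigenvalues of the covariance operator
phi_hat = evecs[:, order] / np.sqrt(dt)      # eigenfunctions, normalized so int phi^2 dt = 1

# Component scores xi_ik by numerical integration of Xc against each eigenfunction.
K = 2
scores = Xc @ phi_hat[:, :K] * dt

print("first two estimated eigenvalues:", lam_hat[:2])
print("fraction of variance explained:", lam_hat[:K].sum() / lam_hat[lam_hat > 0].sum())
```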

To overcome the excessive variability of empirical eigenfunctions, Rice and Silverman (1991) proposed smoothed estimators of eigenfunctions by penalizing the sample variance for the roughness of the empirical eigenfunctions. The consistency of these estimators was established by Pezzulli and Silverman (1993). Later, Silverman (1996) proposed an alternative way to obtain smoothed estimators of eigenfunctions by modifying the norm structure, and established consistency under some regularity conditions. Qi and Zhao (2011) established the asymptotic normality of the estimators of Silverman (1996). A kernel-based method for smoothing eigenfunctions was proposed by Boente and Fraiman (2000).
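One common way to formalize this kind of smoothing (shown here only as an illustrative sketch, not as the exact criterion used in any of the cited papers) is to define the leading smoothed eigenfunction as

\[
\hat{\phi}_1 = \arg\max_{\phi}\;
\frac{\displaystyle\int_T\!\!\int_T \phi(s)\,\hat{G}(s,t)\,\phi(t)\,ds\,dt}
     {\displaystyle\int_T \phi(t)^2\,dt \;+\; \alpha \int_T \{\phi''(t)\}^2\,dt},
\]

where \(\hat{G}\) is the sample covariance function and the smoothing parameter \(\alpha \ge 0\) trades off explained variance against roughness of \(\hat{\phi}_1\); subsequent components are obtained under suitably modified orthogonality constraints.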

Classical FPCA assumes that sample curves are fully observable and that regularly spaced sample points are available. Ramsay and Silverman (2005) introduced several functional principal component analysis methods. However, these methods require a sufficient number of observations for each random function. When random curves are only sparsely observed, those methods cannot be applied directly. The extension of FPCA to sparse longitudinal data was studied by James et al. (2000) and Yao et al. (2005). James et al. (2000) proposed a computationally stable expectation-maximization algorithm under the strong assumption that the random function is a Gaussian process. The approach of Yao et al. (2005) focuses on the estimation of high-dimensional variance and covariance matrices, as well as their inverses, which are known to be computationally unstable. Peng and Paul (2009) implemented the model of James et al. (2000) using an improved fitting procedure.

2.2 Inverse Probability Weighting Method

Many medical or biological data sets possess various types of high-dimensional covariates, such as binary, count, and continuous variables. Furthermore, the data can include missing values in both covariates and outcome. Likelihood-based maximum likelihood (ML) and multiple imputation (MI) methods are therefore not appropriate for data with these types of missing variables, since ML and MI methods require modeling the joint distribution of the missing variables, and correct specification of a model for that joint distribution may be impossible. Even when joint normality is assumed, information on the joint distribution of the missing variables is very limited. On the other hand, the inverse probability weighting (IPW) method with missing outcome and covariates does not require specifying a full parametric model for the joint missingness process, since the probability weights can be modeled by univariate normal or logistic distributions (citations needed).

Meanwhile, the IPW method under MAR is criticized mainly on the following three grounds. First, the IPW method is generally less efficient than the MI method with a correct specification for the missing variables, since the IPW method uses only complete-case data while the MI method uses information in the covariates and outcome even from incomplete cases (Clayton et al., 1999). Second, IPW estimates can be very unstable if estimated probability weights are close to zero for certain subpopulation units (Little and Rubin, 2002). Third, IPW estimates can be sensitive to the model specification for the probability of selection or missingness.

The IPW method is strongly related to survey sampling techniques. An understanding of survey sampling techniques provides insight into the IPW method.

Inverse probability weighting (IPW) is valid under the MAR assumption (Robins et al., 1995); however, it requires specification of a dropout model in terms of the observed outcomes and/or covariates. IPW is more commonly used in marginal models for discrete outcomes than for continuous outcomes. The primary idea behind IPW is that, if individual \(i\) has probability \(\pi_{it}\) of being observed at time \(t\), then this individual should be given weight \(w_{it} = 1/\pi_{it}\), so as to minimize the bias caused by dropouts in the analysis. The weight \(w_{it}\) for the \(i\)-th individual at time \(t\) is assigned as the inverse of the cumulative product of fitted probabilities, \(\hat{\pi}_{it} = \prod_{s \le t} \hat{\lambda}_{is}(\hat{\boldsymbol{\alpha}})\), where \(\boldsymbol{\alpha}\) is a \((q \times 1)\) vector of unknown parameters. To illustrate what these weights are, we follow the example provided by Carpenter et al. (2006).
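A minimal sketch of this weight construction is given below (in Python with NumPy; the visit-level probabilities are assumed to have already been fitted, for example from a logistic dropout model, and the numbers are purely illustrative).

```python
import numpy as np

# Fitted probabilities lambda_is that individual i, still under observation at
# visit s-1, remains under observation at visit s (one row per individual, one
# column per post-baseline visit). These would come from a fitted dropout
# model; the values below are illustrative only.
lam = np.array([
    [0.95, 0.90, 0.85],   # individual 1
    [0.80, 0.75, 0.70],   # individual 2
])

# Cumulative probability of still being observed at each visit,
# pi_it = prod_{s <= t} lambda_is, and the corresponding IPW weights.
pi = np.cumprod(lam, axis=1)
w = 1.0 / pi

print(pi)   # e.g. individual 1: [0.95, 0.855, 0.72675]
print(w)    # weights grow as the probability of being observed shrinks
```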

Suppose that we have the following data; then the mean response is 3.

Group      A        B        C
Response   2 2 2    3 3 3    4 4 4

However, if we have missing values as shown below, then the mean of the observed responses is 19/6, which is biased.

Group      A        B        C
Response   2 ? ?    3 3 3    ? 4 4

In order to correct this bias, we calculate the probabilities of being observed in each group, corresponding to 1/3 in group A, 1 in group B, and 2/3 in group C. We then calculate a weighted average in which each observation is weighted by 1/[probability of an observed response]. In this case the weighted average is given by

\[
\frac{3 \times 2 \;+\; 1 \times (3 + 3 + 3) \;+\; \tfrac{3}{2} \times (4 + 4)}
     {3 + 1 + 1 + 1 + \tfrac{3}{2} + \tfrac{3}{2}}
= \frac{27}{9} = 3,
\]

which now corrects the bias. The conclusion to be drawn from this simple example is that IPW has eliminated the bias by reconstructing the full population through up-weighting the data from individuals who have little chance of being observed. In general, it may give biased but consistent parameter estimates (Carpenter et al., 2006). To discuss the idea of IPW in longitudinal data, the IPW approach is described below, thereby illustrating how IPW can be incorporated into the generalized estimating equations (GEE) of Liang and Zeger (1986), following the article by Robins et al. (1995).
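The arithmetic in this toy example can be verified directly; the short Python snippet below (group labels, values, and observation probabilities taken from the tables above) recomputes the naive complete-case mean and the inverse-probability-weighted mean.

```python
import numpy as np

# Observed responses from the toy example and the group each belongs to.
groups = np.array(["A", "B", "B", "B", "C", "C"])
y      = np.array([ 2.0, 3.0, 3.0, 3.0, 4.0, 4.0])

# Probability of a response being observed in each group
# (1 of 3 observed in A, 3 of 3 in B, 2 of 3 in C).
p_obs = {"A": 1/3, "B": 1.0, "C": 2/3}
w = np.array([1.0 / p_obs[g] for g in groups])

print("naive complete-case mean:", y.mean())                            # 19/6, biased
print("inverse-probability-weighted mean:", np.average(y, weights=w))   # 3.0
```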


In the IPW method, complete cases are weighted by the inverse of their probabilities of being observed in order to adjust for dropouts. IPW was first described by Robins et al. (1995), who noted that it deals with incomplete longitudinal data arising from a MAR mechanism. The roots of this approach in survey analysis were presented by Horvitz and Thompson (1952). IPW has been recognized as an attractive approach because it does not require complete specification of the joint distribution of the longitudinal responses, but rather is based only on specification of the first two moments (Grace and Wenquing, 2009). Several methodological research papers in the literature (Robins et al., 1995; Robins and Rotnitzky, 1995; Scharfstein et al., 1999) have proposed improved IPW estimators that are theoretically more efficient; these are estimators for which MAR may be assumed. The IPW method is discussed in more detail in Fitzmaurice et al. (1995), Yi and Cook (2002a, 2002b), Carpenter et al. (2006), and Seaman and White (2011).

The IPW method can be used to obtain consistent estimates (Robins, Rotnitzky, & Zhao, 1995). The IPW method was first proposed by Horvitz and Thompson (Cochran, 1977) in the sample survey literature, where the weights are known and based on the survey design. In incomplete data analysis, the general idea behind the IPW method is to base estimation on the observed responses but to weight them to account for the probability of dropping out. Under MAR, the weights can be estimated as a function of the observed measurements and also as a function of the covariates and any additional variables that could help predict the unobserved measurements. The use of IPW in incomplete data analysis has increased (Robins, Rotnitzky, & Zhao, 1995; Schafer, 1999; Carpenter, Kenward, & Vansteelandt, 2006; Molenberghs and Kenward, 2007; Fitzmaurice et al., 2009).

Another approach for addressing biases that may arise from data which are MAR involves weighting the observations from individuals who have provided complete information, so that the resulting weighted complete-case analysis furnishes estimates compatible with the complete sample (Whittemore and Halpern, 1997). Missing data are not imputed when IPW is used; rather, the complete cases are reweighted to reflect the fact that they potentially also represent several unobserved cases. Available information on incomplete cases can be exploited in IPW to model the probability that an individual will be completely observed; the weight for each individual with complete data is the inverse of this probability. Thus the IPW approach only requires a model for the probability of missingness, which can be fitted and implemented in many statistical packages (Hogan et al., 2004; van der Wal and Geskus, 2011).
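A minimal sketch of this weighted complete-case workflow is shown below (in Python with NumPy and scikit-learn; the simulated data, the choice of a logistic model for completeness, and the target quantity, a simple mean of the outcome, are all illustrative assumptions rather than anything prescribed by the cited work).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000

# Fully observed covariate x; outcome y is MAR given x
# (missingness depends on x only, not on y itself).
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
p_complete = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))   # true completeness model
r = rng.uniform(size=n) < p_complete                   # r = True: complete case

# Step 1: model the probability of being a complete case given x.
fit = LogisticRegression().fit(x.reshape(-1, 1), r.astype(int))
pi_hat = fit.predict_proba(x.reshape(-1, 1))[:, 1]

# Step 2: weighted complete-case estimate of the mean of y.
w = 1.0 / pi_hat[r]
mu_cc  = y[r].mean()                     # naive complete-case mean (biased)
mu_ipw = np.average(y[r], weights=w)     # IPW mean (approximately unbiased)

print("true mean ~ 1.0, complete-case:", round(mu_cc, 3), "IPW:", round(mu_ipw, 3))
```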

IPW, as described above, eliminates the potentially important biases of standard complete-case analyses of MAR data, but it does not optimally exploit individuals with missing response data. Augmented inverse probability weighted approaches (Robins et al., 1994; Tsiatis, 2006) are an extension of IPW which allow greater use of the information from individuals with incomplete data and, as a result, do not suffer from as much loss of power as IPW may. Augmented inverse probability weighting (AIPW) requires the specification of a second model, but consistent estimators may be obtained if either of the two models is correctly specified (Robins et al., 1994; Carpenter et al., 2006; Tsiatis, 2006).
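To make the role of the second model concrete, a standard sketch of the AIPW estimator for the simple case of estimating a mean outcome \(\mu = E(Y)\), with fully observed covariates \(X\) and an MAR response indicator \(R\) (the notation here is illustrative and not taken from the articles cited above), is

\[
\hat{\mu}_{\mathrm{AIPW}}
= \frac{1}{n}\sum_{i=1}^{n}\left[
\frac{R_i Y_i}{\hat{\pi}(X_i)}
- \frac{R_i - \hat{\pi}(X_i)}{\hat{\pi}(X_i)}\, \hat{m}(X_i)
\right],
\]

where \(\hat{\pi}(X)\) is the fitted probability of being observed and \(\hat{m}(X)\) is a fitted outcome regression for \(E(Y \mid X)\). The estimator is consistent if either \(\hat{\pi}\) or \(\hat{m}\) is correctly specified, which is the double-robustness property referred to above.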

The IPW approach restricts attention to individuals with complete responses and achieves consistent estimators by weighting their contributions by the inverse of the probability of an individual being complete.

As introduced by Flanders and Greenland (1991) and Zhao and Lipsitz (1992), weighted methods are based on the observed values. In this way, after all the missing values have been discarded from the analysis, the remaining observed values are weighted according to how well their distribution approximates that of the full sample or population. The methods employ the weights in order to correct either the standard errors associated with the parameters or the population variability. To derive suitable weights, the predicted probability of each response is estimated from the data on the variable with missing values. Generally speaking, weighting methods are a good option under certain circumstances, for example, when the missing data pattern is monotone or under univariate analysis.

In the context of survey data, Rubin (1987) discusses several methods for applying and estimating weights. Under a suitable joint model for the outcome and covariates, these weighting methods are, in many cases, expected to produce results similar to those of multiple imputation (Schafer and Graham, 2002).

Survey sampling techniques have strongly influenced missing data methodology. It is important to understand survey sampling methodology to gain insight into weighted estimating equations. The landmark paper by Horvitz and Thompson (1952) set the stage for selective designs and missing data methodology. Horvitz et al. developed a general technique for estimating any statistic when a random sample with unequal probabilities within subclasses of a finite population is selected. The Horvitz-Thompson estimator, defined as a weighted mean, was originally intended to address biased sampling. The statistic was restricted to descriptive statistics, such as the mean and variance.
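For reference, the Horvitz-Thompson estimator of a finite-population total (written here in standard modern notation, which is an assumption rather than the notation of the 1952 paper) is

\[
\hat{T}_{\mathrm{HT}} = \sum_{i \in s} \frac{y_i}{\pi_i},
\]

where \(s\) is the selected sample, \(y_i\) is the value for unit \(i\), and \(\pi_i\) is the known inclusion probability of unit \(i\) under the design; dividing by the population size \(N\) (or by \(\sum_{i \in s} 1/\pi_i\)) gives the corresponding weighted mean mentioned above.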

Horvitz et al. developed estimators for two cases: an unbiased linear estimator and an unbiased estimator of the sampling variance were developed for one- and two-stage sampling techniques, where selection probabilities are defined a priori and used to select a subsample from a finite population. Studies that rely on these concepts were intended to increase precision in the presence of information loss. Although Horvitz et al. were aware that this method reduces the variance, they did not address which estimator would give a minimum, or "optimal", variance. As a result, various extensions were proposed over the following 50 years.

Prior to 1974, these weighting techniques were restricted to descriptive statistics. Kish and Frankel (1974) made a major contribution by extending the Horvitz-Thompson estimator to complex statistics and designs, such as confidence intervals and inference for regression models. Kish et al. also felt that "traditional" survey sampling methods, such as the Horvitz-Thompson estimator, could be applied outside the realm of survey sampling design. Survey samples tend to have large sample sizes, so the asymptotic results often hold. Sample size issues and asymptotics may pose a problem for other designs and types of missing data problems with limited sample sizes, and need to be addressed there.

Manski and Lerman (1977) developed a weighted estimating equation for complete data that used the Horvitz-Thompson approach. Manski et al. clearly defined a general framework and statistical model (an estimating approach) for regression models under choice-based sampling. This approach can incur a loss of efficiency since it uses only complete data, but it attempts to regain efficiency by assigning larger weights in the complete-data pseudo-likelihood to account for incomplete cases.

In an effort to increase the efficiency of the estimates of regression coefficients, Robins, Rotnitzky, and Zhao (1994) developed a weighted estimating approach based on semiparametric methods and influence functions. Robins et al. were the first to develop a semiparametrically efficient estimator for regression models with incomplete covariates. These inverse probability weighted estimating equation (IPWE) methods were shown to have desirable properties and to be flexible enough to handle MAR data under any type of regression problem and missingness by design or happenstance. This class of estimators is referred to as IPWE and has prompted many other researchers to pursue extensions of Robins' IPWE method.
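As a sketch of the basic (non-augmented) form of such a weighted estimating equation for a regression parameter \(\beta\), with complete-case indicator \(R_i\) and estimated observation probability \(\hat{\pi}_i\) (the notation is illustrative; the estimators of Robins et al., 1994, additionally include an augmentation term built from influence functions), one solves

\[
\sum_{i=1}^{n} \frac{R_i}{\hat{\pi}_i}\, U_i(\beta) = 0,
\]

where \(U_i(\beta)\) is the usual complete-data estimating function (for example, a GEE score contribution) for the \(i\)-th individual. Solving this weighted equation over the complete cases yields consistent estimates of \(\beta\) when the model for \(\pi_i\) is correctly specified.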

A third approach is based on the complete cases, but now weights them by the inverse of the probability that a case is observed, as introduced by Flanders and Greenland (1991) and Zhao and Lipsitz (1992). In this way, cases with a low probability of being observed gain more influence in the analysis and thus represent the likely missing values in their neighborhood. One can view this approach as an implicit imputation of the missing values.

If this probability is unknown, which is generally the case, it can be estimated, for instance, using a non- or semiparametric technique, e.g., kernel-based density estimation, splines, or classification trees.
