Functional Principal Component Analysis and Inverse Probability Weighting

Published: September 15, 2017

The integrated review of literature for this study focused on two statistical methods: functional principal component analysis and the inverse probability weighting method for handling missing data.

Functional Principal Component Analysis

Functional principal component analysis (FPCA) is an important dimension reduction method for functional data and an extension of multivariate principal component analysis. The idea of FPCA is similar to that of multivariate principal component analysis; however, in FPCA the weight vector and the data vector are functions. The FPCA method rotates the function values by taking a linear combination of them and finding the direction of maximal variability in the function values. It therefore allows the variability structure in the variance-covariance function, which can be difficult to interpret directly, to be visualized.

FPCA rests on two independently developed theories on the optimal series expansion of a continuous stochastic process, by Karhunen (1947) and Loève (1945); eigenanalysis was then extended from symmetric matrices to integral operators with symmetric kernels, which is the foundation of FPCA theory (Ramsay & Silverman, 2005). Since Rao (1958) introduced the method in a study comparing growth curves, it has attracted the attention of many researchers. Dauxois et al. (1982) studied the asymptotic properties of the empirical eigenfunctions and eigenvalues when the sample curves were fully observable. Castro et al. (1986) investigated some fundamentals of FPCA, such as the connection between FPCA and the Karhunen-Loève theorem and the best finite-dimensional functional linear model. Benko et al. (2009) extended the study by Dauxois et al. (1982) to a more practical setting in which the sample curves themselves must be reconstructed from noisy observations at finitely many design points. Hall and Hosseini-Nasab (2006, 2009) studied some statistical properties, particularly the high-order estimation errors of the empirical eigenfunctions and eigenvalues.

Next, definitions and notation for FPCA are discussed briefly. Assume that each random sample function X is square-integrable on a bounded and closed time interval T, so that the mean function μ(t) = E[X(t)] and the covariance function G(s, t) = Cov(X(s), X(t)) are well defined, with G square-integrable over T × T. There exists a decomposition X(t) = μ(t) + Σ_k ξ_k φ_k(t), where the φ_k are continuous, pairwise orthogonal functions on T with unit norm, and the ξ_k are component scores: uncorrelated random variables with mean 0 and variance λ_k, with λ_1 ≥ λ_2 ≥ ... Truncating this decomposition provides an approximation of X by a linear combination of the first K components. The aim of functional principal component analysis is to obtain estimates of the component functions and component scores from the sample.
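As a concrete illustration, the empirical version of this decomposition for curves observed on a dense, regular grid can be sketched as follows. The simulated curves, component functions, and all variable names are hypothetical, and the integral operator is approximated by a simple quadrature rule on the grid; this is a sketch, not any particular author's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n curves observed on a dense common grid over T = [0, 1].
n, m = 200, 51
t = np.linspace(0.0, 1.0, m)
xi1 = rng.normal(0.0, 2.0, n)              # scores for component 1, variance 4
xi2 = rng.normal(0.0, 1.0, n)              # scores for component 2, variance 1
mu = np.sin(2 * np.pi * t)                 # mean function
phi1 = np.sqrt(2) * np.cos(2 * np.pi * t)  # orthonormal component functions
phi2 = np.sqrt(2) * np.sin(4 * np.pi * t)
X = mu + np.outer(xi1, phi1) + np.outer(xi2, phi2)

# Empirical FPCA: eigenanalysis of the discretized sample covariance,
# approximating the integral operator by a quadrature rule on the grid.
Xc = X - X.mean(axis=0)                    # centre at the sample mean function
G = Xc.T @ Xc / (n - 1)                    # sample covariance on the grid
w = t[1] - t[0]                            # quadrature weight (uniform grid)
evals, evecs = np.linalg.eigh(G * w)
evals = evals[::-1]                        # sort by decreasing eigenvalue
evecs = evecs[:, ::-1]
eigfuns = evecs / np.sqrt(w)               # rescale so each has unit L2 norm
scores = Xc @ eigfuns * w                  # estimated component scores

# The two leading eigenvalues estimate the score variances (about 4 and 1).
print(evals[:2])
```

Because the simulated covariance has rank two, all eigenvalues beyond the second are numerically zero, which mirrors the truncation at the first K components described above.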

To overcome excessive variability of the empirical eigenfunctions, Rice and Silverman (1991) proposed smoothed estimators of the eigenfunctions by penalizing the sample variance for the roughness of the empirical eigenfunctions. The consistency of these estimators was established by Pezzulli and Silverman (1993). Subsequently, Silverman (1996) proposed an alternative way to obtain smoothed estimators of the eigenfunctions by modifying the norm structure, and established their consistency under some regularity conditions. Qi and Zhao (2011) established the asymptotic normality of the estimators of Silverman (1996). A kernel-based method for smoothing the eigenfunctions was proposed by Boente and Fraiman (2000).

Classical FPCA assumes that the sample curves are fully observable and that regularly spaced observation points are available. Ramsay and Silverman (2005) introduced several functional principal component analysis methods. However, these methods require a sufficient number of observations for each random function; when the random curves are only sparsely observed, they cannot be applied directly. The extension of FPCA to sparse longitudinal data was studied by James et al. (2000) and Yao et al. (2005). James et al. (2000) proposed a computationally stable expectation-maximization algorithm under the strong assumption that the random function is a Gaussian process. The approach of Yao et al. (2005) relies on the estimation of high-dimensional variance and covariance matrices, as well as their inverses, which are known to be computationally unstable. Peng and Paul (2009) implemented the model of James et al. (2000) using an improved fitting procedure.

Inverse Probability Weighting Method

Many medical or biological datasets include various types of high-dimensional covariates, such as binary, count, and continuous variables. Furthermore, the data can contain missing values in both covariates and outcomes. Likelihood-based maximum likelihood (ML) and multiple imputation (MI) methods are therefore not appropriate for data with these kinds of missing variables, since ML and MI require modeling the joint distribution of the missing variables, and correct specification of a model for that joint distribution may be impossible. Even if joint normality is assumed, information about the joint distribution of the missing variables is very limited. On the other hand, the inverse probability weighting (IPW) method with missing outcomes and covariates does not require specifying a full parametric model for the joint missingness process, since the probability weights can be modeled by univariate normal or logistic distributions (citations needed).

Meanwhile, the IPW method under MAR is criticized chiefly on the following three grounds. First, the IPW method is generally less efficient than the MI method with a correct specification for the missing variables, since IPW uses only complete-case information while MI uses information in the covariates and outcomes even from incomplete samples (Clayton et al., 1999). Second, IPW estimates can be very unstable if the estimated probability weights are close to zero for certain subpopulation units (Little and Rubin, 2002). Third, IPW estimates can be sensitive to the model specification for the probability of selection or missingness.

The IPW method is closely related to survey sampling techniques, and an understanding of survey sampling provides insight into it.

Inverse probability weighting (IPW) is valid under the MAR assumption (Robins et al., 1995); however, it requires specification of a dropout model in terms of the observed outcomes and/or covariates. IPW is more commonly used in marginal models for discrete outcomes than for continuous outcomes. The primary idea behind IPW is that, if individual i has probability π_ij of being observed at time t_j, then this individual should be given weight w_ij = 1/π_ij, so as to minimize the bias caused by dropouts in the analysis. The weight for the i-th individual at time t_j is assigned as the inverse of the cumulative product of fitted probabilities, w_ij = 1 / Π_{k ≤ j} λ_ik(α), where α is a (q × 1) vector of unknown parameters and the λ_ik are the fitted conditional probabilities of remaining under observation at visit k. To illustrate what these weights are, we follow the example provided by Carpenter et al. (2006).
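A minimal sketch of these cumulative-product weights, assuming the per-visit conditional observation probabilities have already been fitted (the subject names and probability values below are hypothetical stand-ins for fitted values from, e.g., visit-wise logistic regressions):

```python
# Hypothetical fitted probabilities: each entry is the fitted probability that
# the individual is still observed at visit k, given observation at visit k-1.
fitted = {
    "subject_1": [0.95, 0.90, 0.80],
    "subject_2": [0.90, 0.70, 0.60],
}

def ipw_weights(per_visit_probs):
    """Weight at each visit: the inverse of the cumulative product of the
    fitted conditional observation probabilities up to that visit."""
    weights, cum = [], 1.0
    for p in per_visit_probs:
        cum *= p
        weights.append(1.0 / cum)
    return weights

for subject, probs in fitted.items():
    print(subject, [round(wt, 3) for wt in ipw_weights(probs)])
```

Note that the weights grow monotonically across visits, reflecting that individuals still observed at later visits must also stand in for those who have dropped out.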

Suppose that we have the following data; the mean response is 3.

Group     A  A  A  B  B  B  C  C  C
Response  2  2  2  3  3  3  4  4  4

However, if we have missing values as shown below, then the observed mean response is 19/6, which is biased.

Group     A  A  A  B  B  B  C  C  C
Response  2  .  .  3  3  3  4  4  .

In order to correct this bias, we calculate the probabilities of being observed in each group, corresponding to 1/3 in group A, 1 in group B, and 2/3 in group C. We then compute a weighted average in which each observation is weighted by 1/[probability of an observed response]. In this case the weighted average is (3·2 + 1·(3 + 3 + 3) + 1.5·(4 + 4)) / 9 = 27/9 = 3, which corrects the bias. The conclusion to be drawn from this simple example is that IPW eliminates the bias by reconstructing the full population, up-weighting the data from individuals who have a small chance of being observed. In general, it yields consistent, though possibly inefficient, parameter estimates (Carpenter et al., 2006). To extend the idea of IPW to longitudinal data, the IPW approach is described below, illustrating how IPW can be incorporated into the generalized estimating equations (GEE) of Liang and Zeger (1986), following the article by Robins et al. (1995).
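The group example above (observation probabilities 1/3 in A, 1 in B, 2/3 in C) can be reproduced directly; this is only a sketch of the arithmetic, with the observed records written out as pairs:

```python
# Observed data after dropout: one response in group A, all three in group B,
# and two in group C (responses are 2, 3, 4 in groups A, B, C respectively).
observed = [("A", 2), ("B", 3), ("B", 3), ("B", 3), ("C", 4), ("C", 4)]

# Probability of a response being observed in each group.
p_obs = {"A": 1 / 3, "B": 1.0, "C": 2 / 3}

# Unweighted complete-case mean: 19/6, biased upward from the true mean of 3.
naive_mean = sum(y for _, y in observed) / len(observed)

# IPW mean: weight each observation by 1 / P(observed in its group).
weights = [1.0 / p_obs[g] for g, _ in observed]
ipw_mean = sum(w * y for w, (_, y) in zip(weights, observed)) / sum(weights)

print(naive_mean, ipw_mean)  # the weighted mean recovers the true mean of 3
```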


In the IPW method, complete cases are weighted by the inverse of their probability of being observed in order to adjust for dropouts. IPW was first described by Robins et al. (1995), who noted that it handles incomplete longitudinal data arising from an MAR mechanism. The roots of this approach in survey analysis were presented by Horvitz and Thompson (1952). IPW has been recognized as an attractive approach because it does not require complete specification of the joint distribution of the longitudinal responses, but is based only on the specification of the first two moments (Grace and Wenquing, 2009). Several methodological research papers in the literature (Robins et al., 1995; Robins and Rotnitzky, 1995; Scharfstein et al., 1999) have proposed improved IPW estimators that are theoretically more efficient when MAR may be assumed. The IPW method is discussed in more detail in Fitzmaurice et al. (1995), Yi and Cook (2002a, 2002b), Carpenter et al. (2006), and Seaman and White (2011).

The IPW method can be used to obtain consistent estimates (Robins, Rotnitzky, & Zhao, 1995). It was first proposed by Horvitz and Thompson (Cochran, 1977) in the sample survey literature, where the weights are known and based on the survey design. In incomplete data analysis, the general idea behind the IPW method is to base estimation on the observed responses but to weight them to account for the probability of dropping out. Under MAR, the weights can be estimated as a function of the observed measurements and also as a function of the covariates and any additional variables that could help predict the unobserved measurements. The use of IPW in incomplete data analysis has increased (Robins, Rotnitzky, & Zhao, 1995; Schafer, 1999; Carpenter, Kenward, & Vansteelandt, 2006; Molenberghs and Kenward, 2007; Fitzmaurice et al., 2009).

Another approach for addressing biases that may arise from data that are MAR involves weighting observations from individuals who have provided complete data, so that the resulting weighted complete-case analysis furnishes estimates compatible with the complete sample (Whittemore and Halpern, 1997). Missing data are not imputed when IPW is used; instead, the complete cases are reweighted to reflect the fact that they also potentially represent several unobserved cases. Available information on incomplete cases can be exploited in IPW to model the probability that an individual will be completely observed; the weight for each individual with complete data is the inverse of this probability. Thus the IPW approach requires only a model for the probability of missingness, which can be fitted and implemented in many statistical packages (Hogan et al., 2004; van der Wal and Geskus, 2011).

IPW, as described above, eliminates the potentially substantial biases of standard complete-case analyses under MAR, but it does not optimally exploit individuals with missing response information. Augmented inverse probability weighted approaches (Robins et al., 1994; Tsiatis, 2006) are an extension of IPW that allow greater use of the information from individuals with incomplete data and, as a result, do not suffer as much loss of power as IPW may. Augmented inverse probability weighting (AIPW) requires the specification of a second model, but consistent estimators are obtained if either of the models is correctly specified (Robins et al., 1994; Carpenter et al., 2006; Tsiatis, 2006).

The IPW approach restricts attention to individuals with complete responses and achieves consistent estimators by weighting their contributions by the inverse of the probability that an individual is complete.

As introduced by Flanders and Greenland (1991) and Zhao and Lipsitz (1992), weighting methods are based on the observed values. After discarding all the missing values from the analysis, the remaining observed values are weighted according to how closely their distribution approximates that of the full sample or population. The methods employ the weights to correct either the standard errors associated with the parameters or the population variability. To derive suitable weights, the predicted probability of each response is estimated from the data on the variables with missing values. Generally speaking, weighting methods are a good option under certain circumstances, for example when the missing data pattern is monotone or the analysis is univariate.

In the context of survey data, Rubin (1987) discusses several methods for applying and estimating weights. Under a suitable joint model for the outcome and covariates, these weighting methods are in many cases expected to produce results similar to those of multiple imputation (Schafer and Graham, 2002).

Survey sampling techniques have strongly influenced missing data methodology, and it is important to understand survey sampling methodology to gain insight into weighted estimating equations. The landmark paper by Horvitz and Thompson (1952) set the stage for selective designs and missing data methodology. Horvitz and Thompson developed a general technique for improving any statistic when a random sample is selected with unequal probabilities within subclasses of a finite population. The Horvitz-Thompson estimator, defined as a weighted average, was originally intended to address biased sampling, and the statistic was restricted to descriptive quantities such as the mean and variance.

Horvitz and Thompson developed estimators for two cases: an unbiased linear estimator and an unbiased estimator of the sampling variance for one- and two-stage sampling designs, where the selection probabilities are defined a priori and used to select a subsample from a finite population. Studies that rely on these concepts were intended to increase precision in the presence of information loss. Although Horvitz and Thompson were aware that this method reduces the variance, they did not address which estimator would yield a minimum, or "optimal," variance. As a result, various extensions were proposed over the following 50 years.
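The Horvitz-Thompson estimator of a population total can be sketched on a hypothetical unequal-probability design; all population values, inclusion probabilities, and the realized sample below are illustrative, and unbiasedness holds over repeated draws from the design rather than for any single realization:

```python
def horvitz_thompson_total(sample):
    """Design-unbiased estimator of a finite-population total: the sum of
    y_i / pi_i over sampled units, where pi_i is the known inclusion
    probability of unit i (Horvitz and Thompson, 1952)."""
    return sum(y / p for y, p in sample)

# Hypothetical design: the population total is 10 + 20 + 30 + 40 + 50 = 150;
# units are drawn with unequal inclusion probabilities fixed a priori.
# One realized sample: the units with y = 10, 30, and 50 were selected.
sample = [(10, 0.5), (30, 0.2), (50, 0.2)]
estimate = horvitz_thompson_total(sample)
print(estimate)  # about 420 for this realization; unbiased over the design
```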

Prior to 1974 these weighting techniques were restricted to descriptive statistics. Kish and Frankel (1974) made a major contribution by extending the Horvitz-Thompson estimator to complex statistics and designs, such as confidence intervals and inference for regression models. Kish and Frankel also held that "traditional" survey sampling methods, such as the Horvitz-Thompson estimator, could be implemented outside the realm of survey sampling design. Survey samples tend to have large sample sizes, so the asymptotic results often hold; with other designs and other types of missing data problems, limited sample sizes and asymptotics may pose problems that need to be addressed.

Manski and Lerman (1977) developed a weighted estimating equation for complete data using the Horvitz-Thompson approach. Manski and Lerman clearly defined a general framework and statistical model (an estimating approach) for regression models under choice-based sampling. This approach can lose efficiency since it uses only complete data, but it attempts to regain efficiency by assigning larger weights in the complete-case pseudo-likelihood to account for incomplete cases.

In an effort to increase the efficiency of estimates of regression coefficients, Robins, Rotnitzky, and Zhao (1994) developed a weighted estimating approach based on semiparametric methods and influence functions. Robins et al. were the first to develop a semiparametrically efficient estimator for regression models with incomplete covariates. These inverse probability weighted estimating equation (IPWE) methods were shown to have desirable properties and to be flexible enough to handle MAR data in any type of regression problem with missingness by design or happenstance. This class of estimators is referred to as IPWE and has prompted many other researchers to pursue extensions of Robins' IPWE method.

A third approach is based on the complete cases, but weights them with the inverse of the probability that a case is observed, as introduced by Flanders and Greenland (1991) and Zhao and Lipsitz (1992). In this way, cases with a low probability of being observed gain more influence in the analysis and thus represent the likely missing values in their neighborhood. One can view this approach as an implicit imputation of the missing values.

If this probability is unknown, which is generally the case, it can be estimated, for instance, using a non- or semiparametric technique, e.g. kernel-based density estimation, splines, or classification trees.
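One such estimate can be sketched with a Nadaraya-Watson (kernel) smoother of the observation indicator on a covariate; the data, bandwidth, and function names below are hypothetical, and in practice the bandwidth would be chosen by cross-validation rather than fixed:

```python
import math

# Toy data: covariate x and indicator r (1 = the response was observed).
xs = [0.1, 0.3, 0.5, 0.8, 1.1, 1.4, 1.7, 2.0, 2.3, 2.6]
rs = [1,   1,   1,   1,   0,   1,   0,   1,   0,   0]

def prob_observed(x0, xs, rs, bandwidth=0.5):
    """Nadaraya-Watson estimate of P(R = 1 | X = x0): a Gaussian-kernel
    weighted average of the observation indicators."""
    k = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(ki * ri for ki, ri in zip(k, rs)) / sum(k)

# The estimated observation probability declines as x grows, so complete
# cases with large x would receive large IPW weights 1 / prob_observed(x).
print(prob_observed(0.3, xs, rs), prob_observed(2.3, xs, rs))
```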