A novel Goodness of fit test for Multilevel Proportional Odds model
Chapter 5
5. Application to a Real Life Data Set
The purpose of this chapter is to show how the proposed goodness of fit test works with a real life data set. The application gives a good understanding of the novel test and can also be used to identify its properties. Since the test is developed for multilevel data, a data set with a multilevel structure was selected for this validation process. The arrangement of the data set according to the nature of the test and the application of the novel goodness of fit test are discussed in this chapter under the relevant sub topics.
5.1 Description of the data set
The data set selected for the study is relatively large, with 31,022 individuals in 2280 schools. It comprises the GCSE examination results of students among 131 education authorities.
The following table describes the variables used in this example.
Table 5.1: Description of the variables
Variable | Description of the variable | Type of the variable |
ID_LEA | ID for local education authorities. | ID for the second level |
ID_I | ID for individuals. | ID for the first level |
AGCSE | Average GCSE score of the individual, centered at the mean. | Continuous variable |
Gender | Gender of the individuals. Male=0 and Female=1 | Categorical variable (Male as base category) |
AGE_MONTHS | Age in months, centered at 222 months or 18.5 years. | Continuous variable |
According to the above table the data set has a multilevel structure with two levels: the first level is the ID for individuals and the second level is the ID for Local Education Authorities.
The interest in this study was in an ordinal categorical response, and hence the continuous variable GCSE is categorized into three categories based on percentiles. The categorization is explained under the next sub topic, Data Preparation. In addition, the variables GENDER and AGE_MONTHS were selected as the explanatory variables.
5.2 Data Preparation
In order to obtain ordinal categorical data, the variable GCSE was coded according to percentile values. Three ordered categories of the response were considered to fit the multilevel proportional odds model. Those categories are constructed in the following manner.
According to the above, the response variable is categorized into three categories based on the percentile values, and hence the variable is categorized without any bias and the accuracy of the categorization is improved.
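As an illustration only, percentile-based coding of a continuous score into three ordered categories might be sketched as below. The cut-off percentiles (tertiles), the function name `categorize_by_percentiles` and the example scores are assumptions made for this sketch; they are not the percentile values used in the study.

```python
from bisect import bisect_left
from statistics import quantiles

def categorize_by_percentiles(scores, n_categories=3):
    """Code continuous scores into ordered categories 1..n_categories.

    Assumption: cut points are equally spaced percentiles (tertiles when
    n_categories is 3). The study's actual cut-offs may differ.
    """
    cuts = quantiles(scores, n=n_categories, method="inclusive")
    # bisect_left counts the cut points strictly below each score
    return [bisect_left(cuts, s) + 1 for s in scores]

# Hypothetical mean-centered scores, for illustration only
example_scores = [-12.0, -5.5, -1.0, 0.0, 2.5, 4.0, 7.5, 9.0, 15.0]
example_categories = categorize_by_percentiles(example_scores)
```

Because the cut points come from the empirical percentiles of the scores themselves, each category receives roughly the same share of observations.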
It is also important to note that this data set consists of 131 clusters and that the number of observations varies from cluster to cluster. However, the cluster sizes are large enough to apply the novel test, and all 131 clusters are used to fit the model. The clusters and their sizes are tabulated as follows.
Table 5.2: Description of clusters with respect to their cluster sizes
Cluster Identity | Cluster size | Cluster Identity | Cluster size | Cluster Identity | Cluster size | Cluster Identity | Cluster size |
1 | 41 | 7 | 75 | 13 | 112 | 19 | 220 |
2 | 144 | 8 | 10 | 14 | 33 | 20 | 181 |
3 | 50 | 9 | 96 | 15 | 357 | 21 | 182 |
4 | 22 | 10 | 91 | 16 | 77 | 22 | 67 |
5 | 128 | 11 | 35 | 17 | 91 | 23 | 246 |
6 | 35 | 12 | 78 | 18 | 224 | ( Continued ) | |
Cluster Identity | Cluster size | Cluster Identity | Cluster size | Cluster Identity | Cluster size | Cluster Identity | Cluster size |
24 | 113 | 54 | 225 | 84 | 314 | 114 | 395 |
25 | 155 | 55 | 171 | 85 | 64 | 115 | 510 |
26 | 95 | 56 | 65 | 86 | 519 | 116 | 931 |
27 | 173 | 57 | 149 | 87 | 65 | 117 | 74 |
28 | 111 | 58 | 153 | 88 | 423 | 118 | 969 |
29 | 51 | 59 | 289 | 89 | 102 | 119 | 916 |
30 | 233 | 60 | 328 | 90 | 190 | 120 | 376 |
31 | 151 | 61 | 104 | 91 | 127 | 121 | 408 |
32 | 155 | 62 | 333 | 92 | 109 | 122 | 329 |
33 | 121 | 63 | 470 | 93 | 217 | 123 | 251 |
34 | 703 | 64 | 232 | 94 | 85 | 124 | 661 |
35 | 197 | 65 | 69 | 95 | 185 | 125 | 520 |
36 | 201 | 66 | 274 | 96 | 150 | 126 | 265 |
37 | 57 | 67 | 131 | 97 | 670 | 127 | 321 |
38 | 201 | 68 | 55 | 98 | 53 | 128 | 432 |
39 | 138 | 69 | 119 | 99 | 96 | 129 | 762 |
40 | 149 | 70 | 200 | 100 | 432 | 130 | 391 |
41 | 18 | 71 | 235 | 101 | 203 | 131 | 442 |
42 | 232 | 72 | 103 | 102 | 75 | ||
43 | 147 | 73 | 94 | 103 | 466 | ||
44 | 313 | 74 | 28 | 104 | 68 | ||
45 | 323 | 75 | 67 | 105 | 278 | ||
46 | 256 | 76 | 119 | 106 | 64 | ||
47 | 121 | 77 | 115 | 107 | 704 | ||
48 | 317 | 78 | 95 | 108 | 469 | ||
49 | 160 | 79 | 220 | 109 | 802 | ||
50 | 53 | 80 | 71 | 110 | 241 | ||
51 | 49 | 81 | 108 | 111 | 364 | ||
52 | 207 | 82 | 490 | 112 | 592 | ||
53 | 92 | 83 | 152 | 113 | 791 | ||
Table 5.2 above lists the clusters and the size of each cluster. Altogether there are 31,022 observations among 131 clusters.
5.3 Model building
To fit the model to the selected data set, MLwiN version 2.19 was used, since it can incorporate the multilevel nature of the data and fit the multilevel proportional odds model. The theory behind the model and the model building process were discussed in the Theory and Methodology chapter. This section discusses the application of those methods. After initializing the model a parameter estimation method has to be specified, and accordingly the first order PQL (penalized quasi-likelihood) method was used in order to overcome the convergence problems that occurred in estimating the parameters.
5.3.1 Variable selection
When fitting the model, the variables which have a significant impact on the response variable should be identified. For this the forward selection method was used with a 5% level of significance. To choose the important variables for the model the Wald test statistic was used rather than the likelihood ratio test statistic. The reason is that for discrete response multilevel models the likelihood ratio test is not available in MLwiN. Therefore the Wald test statistic was calculated to test the significance of the individual coefficients in the model, and those values were compared with the chi-square distribution with one degree of freedom. If the Wald statistic for a covariate is significant, that covariate should be included in the model.
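A minimal sketch of the single-coefficient Wald test, assuming the usual form of the statistic (the squared ratio of the estimate to its standard error, referred to a chi-square distribution with one degree of freedom). The function name `wald_test_1df` is made up for this sketch; the input values are the AGE_MONTHS estimate and standard deviation reported in Table 5.3.

```python
import math

def wald_test_1df(estimate, std_error):
    """Return (Wald statistic, p-value) for H0: coefficient = 0."""
    wald = (estimate / std_error) ** 2
    # For 1 degree of freedom, P(chi-square > w) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# AGE_MONTHS estimate and standard deviation as reported in Table 5.3
wald_age, p_age = wald_test_1df(-0.029, 0.003)
age_is_significant = p_age < 0.05
```

The closed form for the tail probability only holds for one degree of freedom, which is why it is usable here without a statistics library.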
Following the above procedure, the best model was selected by starting from the smallest model, that is the model with only the constant term, and adding variables one at a time. As the first step, the variable to be added to the model has to be chosen. For this, initially each of the variables was added to the model with the constant term, and among them the most significant variable was selected based on the p-values.
The variable with the lowest p-value was then selected as the first variable to be added to the model. The outputs obtained are tabulated in the next sub topic.
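The forward selection loop described above can be sketched as follows. This is only an outline of the control flow: in the study every candidate model is refitted in MLwiN, whereas here the hypothetical callback `p_value_for` simply looks up the p-values reported in Tables 5.3 and 5.4.

```python
def forward_selection(candidates, p_value_for, alpha=0.05):
    """Greedy forward selection driven by Wald p-values.

    p_value_for(selected, candidate) is a hypothetical callback returning
    the p-value of `candidate` when it is added to a model that already
    contains the variables in the tuple `selected`.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        p_values = {v: p_value_for(tuple(selected), v) for v in remaining}
        best = min(p_values, key=p_values.get)
        if p_values[best] >= alpha:
            break  # no remaining variable is significant
        selected.append(best)
        remaining.remove(best)
    return selected

# p-values as reported in Tables 5.3 and 5.4, used only as a lookup table
reported = {
    ((), "GENDER"): 7.5118e-192,
    ((), "AGE_MONTHS"): 4.1786e-22,
    (("GENDER",), "AGE_MONTHS"): 2.2077e-21,
}
selection_order = forward_selection(
    ["GENDER", "AGE_MONTHS"], lambda sel, var: reported[(sel, var)]
)
```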
5.3.2 Parameter estimates under the forward selection procedure for the multilevel proportional odds model
As discussed under sub topic 5.3.1, the forward selection method is carried out in the following manner and the resulting outputs are reported accordingly.
Stage 1
Each variable was added separately and the Wald statistic was calculated for each variable. The p-value obtained for the test statistic was then compared with the 5% (0.05) significance level to check the significance of the variables.
Table 5.3: Outputs obtained under stage 1
Variable | Categories | Estimate (s.d.) | Wald test statistic | P-value |
Gender | Female (Male as reference) | -0.650 (0.022) | 872.934 | 7.5118e-192* |
AGE_MONTHS | – | -0.029 (0.003) | 93.444 | 4.1786e-022* |
*significant at 5%
According to the values obtained under stage 1, both variables are significant since their p-values are less than 0.05. Of the two variables, GENDER is the more highly significant because it has the smaller p-value. Therefore GENDER was selected as the first variable to be included in the constant-only model. Since the response of interest has three categories, the proportional odds model gives two logits, taking the third category as the base level. The fitted random intercept multilevel proportional odds model after adding GENDER to the constant-only model can be written as follows.
Here the modelled quantity is the cumulative probability for each category, and the two subscripts stand for the observation index and the cluster ID respectively.
Stage 2
As the next step the AGE_MONTHS variable was added to the model fitted under stage 1, and its significance was checked in the presence of the explanatory variable that already exists in the model. The resulting values under this stage are tabulated as follows.
Table 5.4: Outputs obtained under stage 2
Variable | Categories | Estimate (s.d.) | Wald test statistic | P-value |
AGE_MONTHS | – | -0.029 (0.003) | 90.150 | 2.2077e-021* |
*significant at 5%
After adding AGE_MONTHS to the model that contains the GENDER variable, the covariate AGE_MONTHS is significant. Therefore the new model obtained under stage 2 can be formulated as follows.
0.648 ( 0.022 ) Gender
0.648 ( 0.022 ) Gender
In this model the quantity on the left-hand side is the cumulative probability for each category, and the two subscripts stand for the observation index and the cluster ID respectively.
Final main effects model
The final main effects model consists of both covariates selected at the second stage. Therefore the multilevel proportional odds model fitted by using the forward selection procedure is,
0.648 ( 0.022 ) Gender
0.648 ( 0.022 ) Gender
[ 5.1 ]
Here the modelled quantity is the cumulative probability for the average GCSE score category of the student (observation unit) in the local education authority (cluster). Also, according to the principle of parsimony, the main effects model is preferred over the interaction model because it is simpler and easier to comprehend.
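To illustrate how a three-category proportional odds model turns its two cumulative logits into category probabilities, a rough sketch follows. The intercepts, the zero random effect and the example student are hypothetical. The coefficient magnitudes are those reported for the fitted model, and the negative sign on the Gender coefficient is an assumption carried over from Table 5.3.

```python
import math

def category_probabilities(intercepts, linear_predictor):
    """Category probabilities implied by a set of cumulative logits.

    intercepts: increasing thresholds, one per cumulative logit
    linear_predictor: covariate effects plus random intercept, shared by
    all logits (this sharing is the proportional odds assumption)
    """
    cumulative = [
        1.0 / (1.0 + math.exp(-(a + linear_predictor))) for a in intercepts
    ]
    cumulative.append(1.0)  # the last category closes the distribution
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs

# Hypothetical intercepts and random intercept, for illustration only
alpha = [-0.5, 1.0]
u_j = 0.0
# A female student (Gender = 1) whose centered age is 6 months
eta = -0.648 * 1 + -0.029 * 6 + u_j
example_probs = category_probabilities(alpha, eta)
```

With two increasing intercepts the function returns three probabilities that sum to one, one for each ordered category of the response.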
5.4 Application of the novel goodness of fit test
This sub topic discusses the application of the proposed goodness of fit test for the multilevel proportional odds model. For that purpose the main effects model fitted to the selected data set under section 5.3 was used.
To apply the developed method, first the predicted probabilities of the fitted model were calculated and then the predicted mean scores were calculated. The calculation of the predicted mean scores is described in the Theory and Methodology chapter. Based on the predicted mean scores, the observations of each cluster are then partitioned. The number of partitions used here is 10, and it is not a fixed value. But according to Hosmer and Lemeshow (1980), 10 is the most commonly used value for the number of groups. The 10 groups are such that the first group contains the smallest values of the predicted mean scores and the tenth group contains the largest values of the predicted mean scores.
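A sketch of the partitioning step, under two stated assumptions. First, the predicted mean score is taken to be the sum of the category labels 1..K weighted by the predicted probabilities; the exact definition is given in the Theory and Methodology chapter and may differ. Second, the groups are formed by a simple rank rule within each cluster, which is not necessarily the rule used in the study. All function names are illustrative.

```python
from collections import defaultdict

def mean_score(probs):
    # Assumed definition: category labels 1..K weighted by their probabilities
    return sum(k * p for k, p in enumerate(probs, start=1))

def partition_within_clusters(cluster_ids, scores, n_groups=10):
    """Assign each observation a group 1..n_groups by the rank of its
    predicted mean score within its own cluster (smallest scores first)."""
    by_cluster = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        by_cluster[cid].append(idx)
    groups = [0] * len(scores)
    for members in by_cluster.values():
        ordered = sorted(members, key=lambda i: scores[i])
        n = len(ordered)
        for rank, i in enumerate(ordered):
            groups[i] = rank * n_groups // n + 1
    return groups

# Two hypothetical clusters of sizes 20 and 10
demo_clusters = [1] * 20 + [2] * 10
demo_scores = list(range(20)) + list(range(10, 0, -1))
demo_groups = partition_within_clusters(demo_clusters, demo_scores)
```

Ranking inside each cluster, rather than over the pooled data, keeps every group represented in every sufficiently large cluster.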
Once the data are partitioned, the goodness of fit test is conducted by creating indicator variables for the groups within each cluster.
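$$I_{gij} = \begin{cases} 1 & \text{if } \hat{s}_{ij} \text{ is in group } g \\ 0 & \text{otherwise} \end{cases}$$
Where $\hat{s}_{ij}$ is the predicted mean score for the $i$-th student in the $j$-th cluster and $g$ indexes the groups.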
Then, to assess the adequacy of the fitted model in equation 5.1, an alternative model is constructed by adding the indicator variables. Of the 10 indicator variables the model contains 9, since the first indicator variable was selected as the base category.
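As a small sketch of this step, the 9 indicator variables could be built from group labels 1 to 10 as below, with the first group dropped as the base category. The function name `indicator_columns` and the example labels are hypothetical.

```python
def indicator_columns(groups, n_groups=10, base_group=1):
    """Return one 0/1 column per non-base group, keyed by group number."""
    kept = [g for g in range(1, n_groups + 1) if g != base_group]
    return {g: [1 if obs == g else 0 for obs in groups] for g in kept}

# Hypothetical group labels for four observations
demo_columns = indicator_columns([1, 2, 10, 2])
```

An observation in the base group has zeros in all 9 columns, so the coefficients of the indicator variables measure departures from the base group.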
In the data set of the selected example, the number of observations within most clusters is not divisible by 10, which is a problem when creating the indicator variables. To overcome this problem the method discussed by Abeysekara and Sooriyarachchi (2008) was used, in which the indicator variables are defined as follows.
Here and is the index variable.
Since the objective is to check the adequacy of the fitted model, the null and alternative hypotheses can be stated as,
H_{0}: Fitted multilevel proportional odds model is adequate
H_{1}: Fitted multilevel proportional odds model is not adequate
According to the above hypotheses, to assess the adequacy of the fitted main effects model 5.1, the alternative model 5.2 is constructed by adding indicator variables as follows.
[ 5.2 ]
Here each added term is the indicator variable for a given group, for a given observation in a given cluster.
If the fitted model 5.1 is adequate, then H_{0}: Fitted multilevel proportional odds model is adequate for the data is not rejected, which implies that the coefficients of all indicator variables are equal to zero.
The hypothesis discussed above is tested by using the MLwiN software. Accordingly, the joint Wald test statistic of the alternative model 5.2 was calculated in order to test the following hypotheses.
H_{0}: All the coefficients of the indicator variables are equal to zero.
H_{1}: At least one coefficient of an indicator variable is not equal to zero.
The output of the resulting joint Wald test statistic is tabulated as follows.
Table 5.5: Output of joint Wald test statistic
Joint Wald test statistic | Degrees of freedom | Chi-square value (tabulated at 5%) | p-value |
11.365 | 9 | 16.9 | 0.25152 |
According to the output values in Table 5.5 above, the p-value of the joint Wald test statistic is greater than 0.05, so the null hypothesis is not rejected at the 5% level of significance. The same conclusion follows from the value of the Wald test statistic itself, because it is less than the chi-square value at the 5% level of significance with 9 degrees of freedom (11.365 < 16.9). This result indicates that the indicator variables in the alternative model do not contribute significantly, which implies that the fitted main effects model 5.1 fits the data well.
Finally, all of the above suggests that, according to the novel goodness of fit test, the fitted multilevel proportional odds model is adequate for the selected example data set.
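The figures in Table 5.5 can be checked with a short sketch that evaluates the upper tail of the chi-square distribution using only the Python standard library. The helper `chi2_sf` is written for this illustration and handles integer degrees of freedom only; it is not part of MLwiN.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1  # exact tail for df = 1
    else:
        tail, k = math.exp(-half), 2  # exact tail for df = 2
    # Recurrence: Q(k + 2) = Q(k) + half**(k/2) * exp(-half) / Gamma(k/2 + 1)
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

# Joint Wald statistic and degrees of freedom as reported in Table 5.5
joint_p_value = chi2_sf(11.365, 9)
reject_null = joint_p_value < 0.05
```

Comparing the p-value with 0.05 and comparing the statistic with the tabulated critical value are two views of the same decision rule, so they always agree.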
5.5 Summary
This chapter discussed the application of the proposed goodness of fit test to the multilevel proportional odds model. A multilevel proportional odds model was fitted to a real life data set and the novel test was then applied in order to assess the adequacy of the fitted model. The results obtained through this application suggest that the novel goodness of fit test also works with real life data.
The next chapter presents the general discussion of all the findings and the conclusions obtained throughout this research study. Finally, it presents the limitations and suggestions for further studies.