09 Test – Flashcard
question
An examiner administers and scores the same test numerous times without deviating from the procedure in order to reduce the possibility of measurement error. This exemplifies what?
answer
Standardization
question
The scores of a representative population sample on a test, against which an examiner compares an individual's scores, are referred to as ________; while they allow for comparisons of a person's performance on different tests, they do not provide the ultimate standard of performance.
answer
Norms
question
A psychological test that is regarded as ________ is administered, scored, and interpreted independent of the subjective judgment of the examiner.
answer
Objective
question
The SAT and GRE are examples of ________ tests, as they provide information about a person's best possible performance, while the MMPI-2 and PAI are ________ tests, providing information about a person's usual experience.
answer
Maximum performance; typical performance
question
________ tests assess the difficulty level an examinee can attain (e.g., Information from WAIS), ________ tests assess the person's response rate (e.g., Digit Symbol from WAIS), and ________ tests help determine whether an individual can attain a certain level of acceptable performance (e.g., test of reading skills).
answer
Power; speed; mastery
question
A ________ occurs when an instrument cannot take on a value higher than some limit due to the measure not including enough difficult items, resulting in all high-achieving examinees getting similar scores (test is too easy); conversely, a ________ occurs when an instrument cannot take on a lower value and thus all low-achieving examinees get similar scores (test is too hard).
answer
Ceiling effect; floor effect
question
In contrast to normative measures, these types of measures use individuals themselves as their own frame of reference, comparing 2 or more desirable options and choosing the one that is most preferred.
answer
Ipsative measures
question
________ is the consistency of a test, or the degree to which a test provides the same results under the same conditions; ________ refers to the degree that a test measures what it claims to be measuring.
answer
Reliability; validity
question
A perfectly reliable test would yield every examinee's ________ every time it was administered, as this would indicate the examinee's actual ability on whatever the test is measuring; however, a test is never perfectly reliable due to ________, which is random and can be caused by environmental noise, the examinee's mood on testing day, and any number of other factors.
answer
True score; measurement error
question
The most commonly used methods of estimating reliability of a test use a correlation coefficient, referred to as the ________, ranging in value from 0.0 to +1.0, where coefficients closer to 0.0 indicate less reliability and values closer to +1.0 indicate increasing reliability; the coefficient is not squared to determine the proportion of variability, unlike other correlation coefficients, rather it is interpreted directly.
answer
Reliability coefficient
question
A researcher administers the same instrument to the same group of college students on 2 separate occasions; following the second administration, the researcher correlates the scores from the first and second administrations. What type of reliability is the researcher attempting to obtain?
answer
Test-retest reliability (or "coefficient of stability")
question
True or False: It is not recommended to use the test-retest coefficient when attempting to obtain reliability for a test that measures attributes that are unstable (e.g., mood)?
answer
True- low coefficients, in such cases, would likely be more a reflection of the attribute's unreliability rather than the test's unreliability
question
A researcher administers one form of a test on one day, then administers an equivalent form to the same group of people at a later date/time. What type of reliability is being sought in this example?
answer
Alternate forms reliability (or "coefficient of equivalence;" parallel-forms reliability)
question
When correlations are obtained among individual test items, ________ reliability is being assessed; the 3 methods for obtaining this reliability include ________ (involves dividing test into 2 parts then correlating responses from the 2 parts), ________ (used when test items are dichotomously scored- e.g., "true/false"), and ________ (used for tests with multiple-scored items- e.g., "never/rarely/sometimes/always").
answer
Internal consistency (or "coefficient of internal consistency"); split-half; Kuder-Richardson Formula 20; Cronbach's coefficient alpha
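For review, a minimal Python sketch of Cronbach's coefficient alpha, assuming the standard formula alpha = (k / (k − 1)) × (1 − sum of item variances / variance of total scores). The function name and the sample `responses` data are illustrative and are not part of the flashcard set. With items scored 0/1 and population variances, the same expression reduces to Kuder-Richardson Formula 20.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list per examinee, one score per item (hypothetical layout)."""
    k = len(item_scores[0])  # number of items
    # Variance of each item across examinees
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of examinees' total scores
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Made-up true/false data: 4 examinees, 3 items
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(responses)  # about 0.75 for this toy data
```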
question
Because splitting a test in half shortens it, the split-half method usually lowers the reliability coefficient artificially; the ________ can be used to correct for the effects of shortening the measure.
answer
Spearman-Brown formula
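A short sketch of the Spearman-Brown correction, assuming the usual form r_new = n × r / (1 + (n − 1) × r), where n is the factor by which the test is lengthened (n = 2 when stepping a split-half coefficient up to full length). The function and parameter names are illustrative.

```python
def spearman_brown(r, n=2):
    """Estimate reliability after lengthening a test by a factor of n."""
    return (n * r) / (1 + (n - 1) * r)

# A split-half correlation of .60 corrects to about .75 for the full-length test
full_length = spearman_brown(0.60)
```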
question
What type of tests are measures of internal consistency not good at assessing reliability for?
answer
Speed tests, as the correlation would be spuriously inflated
question
Instruments that rely on rater judgments should have high ________ reliability, which is increased when scoring categories are ________ and ________.
answer
Inter-rater (interscorer); mutually exclusive (a particular behavior belongs to a single category); exhaustive (categories cover all possible responses/behaviors)
question
The ________ estimates the amount of error to be expected in an individual test score and is used to determine a range, referred to as a/an ________, within which an examinee's true score will likely fall.
answer
Standard Error of Measurement; confidence interval
question
What is the formula for the standard error of the measurement?
answer
SEmeas = SDx × √(1 − rxx), where SDx is the standard deviation of test scores and rxx is the reliability coefficient
question
What is the probability that a person's true score lies within a range of plus or minus 1 standard error of measurement (SEM) of their obtained score? How about plus or minus 1.96 (2) SEM? And finally, plus or minus 2.58 (2.5) SEM?
answer
68% of the time; 95% of the time; 99% of the time
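The confidence interval idea can be sketched in Python, assuming the usual formula SEM = SD × sqrt(1 − rxx) and the multipliers 1, 1.96, and 2.58 from the card. The function names and the example numbers (SD = 15, rxx = .91, obtained score = 100) are made up for illustration.

```python
import math

def standard_error_of_measurement(sd, rxx):
    """SEM from the test's standard deviation and reliability coefficient."""
    return sd * math.sqrt(1 - rxx)

def confidence_interval(obtained, sd, rxx, z=1.96):
    """Range around an obtained score; z = 1, 1.96, 2.58 for 68%, 95%, 99%."""
    sem = standard_error_of_measurement(sd, rxx)
    return (obtained - z * sem, obtained + z * sem)

sem = standard_error_of_measurement(15, 0.91)  # about 4.5
low, high = confidence_interval(100, 15, 0.91)  # roughly 91.2 to 108.8
```

Note that a reliability of 1.0 gives an SEM of 0, and a larger standard deviation gives a larger SEM.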
question
True or False: Hypothetically, a test with a reliability coefficient of +1.0 would have a standard error of measurement of 0.0?
answer
True- a test with perfect reliability will have no error
question
The standard error of measurement is ________ related to the reliability coefficient (rxx) and ________ related to the standard deviation of test scores (SDx).
answer
Inversely; positively
question
What reliability coefficient, when practical, is the best to use?
answer
Alternate-forms
question
Classical test theory states that an observed score reflects ________ plus ________.
answer
True score variance; random error variance
question
Methods of recording behaviors include ________ recording (elapsed time that behavior occurs is recorded), ________ recording (number of times behavior occurs is recorded), ________ recording (rater notes whether subject engages in behavior during given time period), and ________ recording (all behavior during an observation session is recorded).
answer
Duration; frequency; interval; continuous
question
Simply put, ________ refers to the degree a test measures what it purports to measure.
answer
Validity
question
A depression scale that only assesses the affective aspects of depression but fails to account for the behavioral aspects would be lacking what type of validity?
answer
Content validity, which refers to the extent to which test items represent all facets of the content area being measured (e.g., EPPP)
question
True or False: Content validity assessment requires a degree of agreement between experts in the subject matter, thus it includes an element of subjectivity?
answer
True- in addition, tests should correlate highly with other tests that measure the same content domain
question
In contrast to content validity, ________ occurs when a test appears valid to examinees, administrators, and other untrained observers; it is not technically a type of test validity.
answer
Face validity
question
A personality test that effectively predicts the future behavior of an examinee has what type of validity?
answer
Criterion-related validity, which is obtained by correlating scores on a predictor test to some external criterion (e.g., academic achievement, job performance)
question
Criterion-related validity is assessed using a/an ________ to determine the relationship between the predictor and the criterion; for interpretation this value can be squared, producing the "________," which indicates the proportion of variability in the criterion that is explained by variability in the predictor.
answer
Correlation coefficient; coefficient of determination
question
The process of ________ validation involves the predictor and the criterion being collected at the same time, providing information regarding a test's usefulness for predicting a given current behavior; ________ validation involves a waiting period between collection of predictor scores and criterion data, providing information regarding a test's usefulness for predicting future behavior.
answer
Concurrent; predictive
question
When interpreting a person's predicted score on a given criterion measure, the ________ will determine within what range of scores their actual score will likely fall.
answer
Standard Error of Estimate
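A minimal sketch of the standard error of estimate. The card does not give the formula; this assumes the commonly taught form SEest = SDy × sqrt(1 − rxy²), where SDy is the standard deviation of criterion scores and rxy is the validity coefficient. Names and numbers are illustrative.

```python
import math

def standard_error_of_estimate(sd_criterion, validity):
    """Expected error around a predicted criterion score (assumed formula)."""
    return sd_criterion * math.sqrt(1 - validity ** 2)

# With SDy = 10 and a validity coefficient of .60, the result is about 8
se_est = standard_error_of_estimate(10, 0.60)
```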
question
The standard error of measurement constructs a confidence interval around an examinee's ________ score (using a reliability coefficient), while the standard error of estimate does the same for an examinee's ________ score (using a validity coefficient).
answer
Obtained; predicted
question
Interviewees are given an aptitude test (predictor) to predict work success (criterion), with hiring contingent on achieving a certain minimum score, called a/an ________ score. The manager then rates performance on work tasks, an indication of success, and only those who score above a certain ________ are deemed successful.
answer
Predictor cutoff; criterion cutoff
question
Scoring above both the predictor and criterion cutoff points produces ________; scoring above the predictor cutoff point but below the criterion cutoff point produces ________; scoring below the predictor cutoff point but above the criterion cutoff point produces ________; and scoring below both the predictor and criterion cutoff points produces ________.
answer
True positives (valid acceptances); false positives (false acceptances); false negatives (invalid rejections); true negatives (valid rejections)
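The four outcomes can be sketched as a small Python function. Treating a score exactly at a cutoff as passing is an assumption made here for illustration; the card only says "above" and "below." The function and argument names are hypothetical.

```python
def classify(predictor, criterion, predictor_cutoff, criterion_cutoff):
    """Label one examinee by predictor decision and criterion outcome."""
    selected = predictor >= predictor_cutoff    # assumption: at-cutoff counts as passing
    successful = criterion >= criterion_cutoff  # same assumption for the criterion
    if selected and successful:
        return "true positive"
    if selected and not successful:
        return "false positive"
    if not selected and successful:
        return "false negative"
    return "true negative"

# Hired on the aptitude test (75 vs. cutoff 70) but rated below the success cutoff
outcome = classify(75, 2, predictor_cutoff=70, criterion_cutoff=3)
```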
question
Some factors contributing to a low validity coefficient include the validation group being ________ or the predictor and/or criterion being ________.
answer
Homogeneous; unreliable
question
When a test has a different validity coefficient for one group compared to another, the variables affecting validity are called ________ variables; when this is the case, the test is said to have ________.
answer
Moderator; differential validity
question
This is the process whereby an already validated test is re-validated with a different sample of people than the original validation sample.
answer
Cross-validation
question
What term is used to describe the reduction that occurs in a criterion-related validity coefficient after cross-validation?
answer
Shrinkage
question
The greatest shrinkage occurs when the original validation sample is ________, the original item pool is ________, the number of items retained is ________ relative to the items in the item pool, and/or items are not chosen based on ________ or ________.
answer
Small; large; small; previously formulated hypothesis; experience with the criterion
question
________ is one way a predictor might end up looking more valid than it actually is, which occurs when predictor scores themselves influence any person's criterion status (e.g., manager is aware that factory worker did well on predictor, this knowledge positively influences manager's ratings on criterion performance).
answer
Criterion contamination
question
How is criterion contamination prevented?
answer
Criterion raters should have no prior knowledge of examinees' predictor scores
question
Theorized psychological variables (e.g., personality, intelligence) that are abstract and not directly observable are referred to as ________, hence ________ provides an indication of the degree to which an instrument measures or correlates with such variables.
answer
Constructs; construct validity
question
A newly developed test of personality has a high correlation with the MMPI-2 and a low correlation with the Wechsler Memory Scale, indicating the test has both ________ validity and ________ validity, respectively.
answer
Convergent; discriminant/divergent - both are forms of construct validity
question
True or False: The only time a low correlation coefficient provides evidence of high validity is when discriminant validity is indicated due to there being a low correlation between 2 tests that measure different constructs?
answer
True- in all other cases, high validity is indicated by a high correlation coefficient
question
What complex procedure for assessing convergent and discriminant validity requires the assessment of 2 or more traits (e.g., personality, depression) by 2 or more methods (e.g., self-report, peer rating)?
answer
Multitrait-multimethod matrix
question
When using the multitrait-multimethod matrix, ________ validity is indicated when tests that measure the same traits are highly correlated, even when different methods of measurement are used; conversely, ________ validity is indicated when tests that measure different constructs are minimally correlated, even when the same method of measurement is used.
answer
Convergent; discriminant
question
The ________ coefficient is a reliability coefficient, as it indicates the correlation between a measure and itself; correlations between two measures that measure the same trait using different methods are called ________ coefficients; correlations between two measures that measure different traits using the same method are called ________ coefficients; and correlations between 2 measures that measure different traits using different methods are called ________ coefficients.
answer
Monotrait-monomethod; monotrait-heteromethod; heterotrait-monomethod; heterotrait-heteromethod
question
When assessing validity using the multitrait-multimethod matrix, convergent validity is indicated when there is a high ________ correlation, while discriminant validity is indicated by a low ________ correlation and further confirmed by a ________ heterotrait-heteromethod correlation.
answer
Monotrait-heteromethod; heterotrait-monomethod; low
question
________, often used to assess the construct validity of a test or tests, involves reducing a larger set of variables into fewer classified sets of variables based on the construct that is primarily "picked-up" by each measure; each variable is correlated with every other variable, creating a ________.
answer
Factor analysis; factor matrix
question
The main purpose of factor analysis is to reveal how many underlying constructs, also called ________ because the analysis does not measure them directly, account for scores on a larger number of tests, and to what degree they do so.
answer
Latent variables
question
In a hypothetical factor analysis, the factor matrix indicates a correlation coefficient of .68 between the depression subscale of the MMPI-2 and Factor II. What term is used to describe the correlation between the depression subscale and Factor II?
answer
Factor loading, which refers to the correlation between a given test and a given factor (e.g., the depression subscale loads .68 on Factor II); it can be squared to determine the proportion of variability in the test that is explained by the factor
question
________ determines the proportion of variance of a test that is attributable to the factors; it is the sum of squared factor loadings.
answer
Communality (h-squared); the sum-of-squared-loadings calculation applies when factors are orthogonal, not when oblique rotation is used
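As a quick illustration of summing squared factor loadings, here is a Python sketch that assumes orthogonal (uncorrelated) factors. The loadings of .60 and .50 are made-up values.

```python
def communality(loadings):
    """Sum of squared loadings for one test, assuming orthogonal factors."""
    return sum(loading ** 2 for loading in loadings)

# A test loading .60 on Factor I and .50 on Factor II
h2 = communality([0.60, 0.50])  # about .61 of the test's variance is common variance
unique = 1 - h2                 # remainder: specificity plus error
```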
question
The amount of variability in a test that can be explained by whatever traits are represented by the factors is referred to as ________, while variance that is specific to the test and not explained by the factors is referred to as ________.
answer
Common variance (represents communality); unique variance (represents specificity)
question
In a factor analysis, these values indicate the amount of variance in all the tests accounted for by the factor; they are analyzed to determine whether or not the factor is accounting for a significant amount of variability in the tests.
answer
Eigenvalues (or explained variance)
question
If a factor analysis is performed on 8 tests, what is the largest the sum of the eigenvalues can be?
answer
Since the sum of the eigenvalues can be no larger than the number of tests included in the factor analysis, the answer is 8
question
A procedure that facilitates factor matrix interpretation is ________, which involves re-dividing the test's communalities so that a clearer pattern of loadings emerges.
answer
Rotation
question
Two general rotation strategies include ________ for factors that are uncorrelated (independent of each other) and ________ for correlated factors; the decision as to which one is used is based on the researcher's theoretical assumptions.
answer
Orthogonal; oblique
question
When construct validity is being assessed using factor analysis, a high correlation between a test and a factor the test is expected to correlate highly with is referred to as what?
answer
Factorial validity
question
While factor analysis assumes variance in a variable is composed of ________, ________, and ________, principal components analysis assumes variance is composed of ________ and ________.
answer
Communality; specificity; error; explained variance; error variance
question
"Factor" is to factor analysis as ________ or ________ is to principal components analysis.
answer
Principal component; eigenvector
question
What method might a researcher who is interested in developing a taxonomy (classification system) of different personality characteristics use?
answer
Cluster analysis
question
In ________, only interval and ratio data can be used and researchers typically have an a priori hypothesis about what traits a set of variables measures; by contrast, ________ can be performed using any type of data (interval, ratio, nominal, ordinal) and is not designed for studies where the researcher has an a priori hypothesis.
answer
Factor analysis; cluster analysis
question
True or False: A reliable test is not always a valid test, though a valid test must be a reliable test?
answer
True- reliability is a necessary but not sufficient condition for validity
question
The ________ coefficient is less than or equal to the square root of the ________ coefficient; it cannot be any higher, thus the latter sets a ________ on the former.
answer
Validity; reliability; ceiling (or upper-limit)
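The ceiling described on this card can be checked with a one-line Python sketch; the function name and the reliability value of .81 are illustrative.

```python
import math

def max_validity(reliability):
    """Upper limit on the validity coefficient: the square root of reliability."""
    return math.sqrt(reliability)

ceiling = max_validity(0.81)  # about .90
```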
question
A researcher discovers a test has low reliability; however, she is interested in what the validity coefficient of the predictor would be if both the predictor and the criterion were perfectly reliable. What formula would she use?
answer
Correction for attenuation
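A sketch of the correction for attenuation, assuming the standard form: corrected r = rxy / sqrt(rxx × ryy), where rxx and ryy are the reliabilities of the predictor and the criterion. The card names the formula without writing it out, so the expression and the example values are supplied here as an illustration.

```python
import math

def correct_for_attenuation(rxy, rxx, ryy):
    """Estimated validity if predictor and criterion were perfectly reliable."""
    return rxy / math.sqrt(rxx * ryy)

# Observed validity .30, predictor reliability .64, criterion reliability .81
corrected = correct_for_attenuation(0.30, 0.64, 0.81)  # about .42
```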
question
What is the correlation between the factors in a factor analysis where an orthogonal rotation is used?
answer
By definition, the correlation would be 0.0
question
What is used to determine which test items will be retained for the final version of a test and to ensure that a test is both reliable and valid from the start?
answer
Item analysis
question
The ________ the p-value, the ________ the item.
answer
Higher; less difficult OR lower; more difficult
question
The percentage of examinees that answer an item correctly is referred to as a/an ________, which is abbreviated ________; most test developers prefer items with a ________ value at or around ________.
answer
Item difficulty index; p; p; .50
question
The rule-of-thumb for item difficulty on a test is that the optimal difficulty level of test items should be approximately halfway between 1.0 (i.e., everyone is correct) and the level of success expected by chance alone. That known, what is the optimal item difficulty level of a multiple choice test with 4 options (e.g., EPPP)?
answer
p = .625; the chance of guessing correctly among 4 options is .25, and the point halfway between .25 and 1.0 is .625 (i.e., 62.5% of examinees answer the item correctly)
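The rule of thumb in the question can be worked through in Python: chance success on an item with k options is 1/k, and the optimal difficulty is the midpoint between that value and 1.0. The function name is illustrative.

```python
def optimal_difficulty(num_options):
    """Midpoint between chance-level success (1 / options) and 1.0."""
    chance = 1 / num_options
    return (1.0 + chance) / 2

four_option = optimal_difficulty(4)  # 0.625
true_false = optimal_difficulty(2)   # 0.75
```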
question
According to Anastasi, the p-level expresses item difficulty in terms of an ________ scale, as conclusions cannot be made about the differences in difficulty between items, only that certain items are easier/harder than others.
answer
Ordinal (difficulty levels are rankings, according to Anastasi)
question
The degree to which a test item differentiates among test-takers in terms of the behavior the test is designed to measure is called ________ and can be assessed by calculating a/an ________, which is abbreviated as "________."
answer
Item discrimination; item discrimination index; D
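One common way to compute D, which the card does not spell out, is to subtract the proportion of a low-scoring group that answers the item correctly from the proportion of a high-scoring group that does. The sketch below assumes that definition; the group data are made up.

```python
def discrimination_index(upper_correct, lower_correct):
    """D = proportion correct in upper group minus proportion in lower group.

    Each argument is a list of 0/1 item responses for one group (assumed layout).
    """
    p_upper = sum(upper_correct) / len(upper_correct)
    p_lower = sum(lower_correct) / len(lower_correct)
    return p_upper - p_lower

# 4 of 5 high scorers and 1 of 5 low scorers answer the item correctly
d = discrimination_index([1, 1, 1, 1, 0], [1, 0, 0, 0, 0])  # about .60
```

Under this definition D ranges from −1.0 to +1.0, and an item that everyone passes or everyone fails has D = 0.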
question
An item on a measure of anxiety would have good ________ if low-anxiety examinees consistently answered it differently than high-anxiety examinees.
answer
Discriminability (item discrimination)
question
An item's ________ level places a ceiling on its ________ index; higher levels of discriminability are associated with ________ levels of difficulty.
answer
Difficulty; discrimination; moderate
question
True or False: The reliability of a test will decrease as the mean discrimination index (D) increases?
answer
False- there is a direct correlation between test reliability and mean D
question
A graphical depiction of both item difficulty and item discrimination is called a/an ________; analysis based on ________ is derived from these.
answer
Item characteristic curve (ICC); item response theory
question
An item characteristic curve identifies 3 ________, including item difficulty, item discrimination, and ________.
answer
Parameters; the probability that a question can be answered correctly by guessing (indicated on the chart by the point at which the curve crosses the y-axis)
question
Item response theory assumes (1) performance on an item is related to the estimated amount of a/an ________ being measured by the item, and (2) ________ (an item should have the same characteristics regardless of the sample of people taking the test).
answer
Latent trait; invariance of item parameters
question
The computerized selection of test items for individual examinees is referred to as what?
answer
Computer adaptive assessment
question
What item difficulty level is associated with the maximum level of differentiation among examinees?
answer
.50, indicating half answered correctly and half answered incorrectly
question
What factor most affects an item's difficulty level?
answer
Characteristics of examinees
question
What type of interpretation indicates where the examinee stands in relation to others who have taken the same test?
answer
Norm-referenced interpretation
question
Providing a general indication as to the progression a person has made along the normal developmental path, ________ norms include ________ and ________.
answer
Developmental; mental age; grade equivalent scores
question
What is the calculation for ratio IQ?
answer
(mental age/chronological age) x 100
question
A 20-year-old performs as well on a test as the average 10-year-old. His mental age is ________ and his ratio IQ is ________.
answer
10 years; 50
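The ratio IQ calculation from the preceding cards, written as a small Python sketch (the function name is illustrative):

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ = (mental age / chronological age) x 100."""
    return (mental_age / chronological_age) * 100

iq = ratio_iq(10, 20)  # 50.0
```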
question
Indicating the grade level a person's performance is equivalent to, ________ are typically used in the interpretation of educational achievement tests.
answer
Grade equivalent scores (e.g., Wide Range Achievement Test, 4th Ed [WRAT-4])
question
True or False: When using developmental norms, scores obtained by people of different age groups are not comparable?
answer
True- this is due to the fact that standard deviation is not accounted for
question
Including percentile ranks and standard scores, ________ norms compare examinee scores to those of the most nearly comparable standardization sample.
answer
Within-group
question
Z-scores, T-scores, stanine scores, and deviation IQ scores are all examples of ________, which express a raw score's distance from the mean in standard deviation units.
answer
Standard scores
question
Identify the mean (M) and standard deviation (sd) of: z-scores, t-scores, stanine scores, and deviation IQ scores.
answer
Z-score (M = 0, sd = 1), t-score (M = 50, sd = 10), stanine (M = 5, sd = about 2), deviation IQ (M = 100, sd = 15)
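Using the means and standard deviations listed on this card, a raw score can be converted to each type of standard score. This is a sketch: the stanine conversion here simply rounds 5 + 2z and clamps the result to 1 through 9, which is an approximation rather than the percentage-band definition of stanines. Names and example values are illustrative.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean in standard deviation units."""
    return (raw - mean) / sd

def t_score(z):
    return 50 + 10 * z

def deviation_iq(z):
    return 100 + 15 * z

def stanine(z):
    # Approximation: round 5 + 2z, then keep the result within 1..9
    return max(1, min(9, round(5 + 2 * z)))

z = z_score(65, mean=50, sd=15)  # 1.0, one standard deviation above the mean
scores = (t_score(z), deviation_iq(z), stanine(z))  # (60.0, 115.0, 7)
```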