HEALTH EDUCATION RESEARCH Theory & Practice

Vol. 21 (Supplement 1), 2006, Pages i19–i32. Advance Access publication 31 July 2006

Improving measurement in health education and health behavior research using item response modeling: comparison with the classical test theory approach
Mark Wilson*, Diane D. Allen and Jun Corser Li
Abstract
This paper compares the approach and resultant outcomes of item response models (IRMs) and classical test theory (CTT). First, it reviews basic ideas of CTT, and compares them to the ideas about using IRMs introduced in an earlier paper. It then applies a comparison scheme based on the AERA/APA/NCME ‘Standards for Educational and Psychological Tests’ to compare the two approaches under three general headings: (i) choosing a model; (ii) evidence for reliability—incorporating reliability coefficients and measurement error—and (iii) evidence for validity—including evidence based on instrument content, response processes, internal structure, other variables and consequences. An example analysis of a self-efficacy (SE) scale for exercise is used to illustrate these comparisons. The investigation found that there were (i) aspects of the techniques and outcomes that were similar between the two approaches, (ii) aspects where the item response modeling approach contributes to instrument construction and evaluation beyond the classical approach and (iii) aspects of the analysis where the measurement models had little to do with the analysis or outcomes. There were no aspects where the classical approach contributed to instrument construction or evaluation beyond what could be done with the IRM approach. Finally, properties of the SE scale are summarized and recommendations made.

Introduction
Item response models (IRMs) underlie many of the advances in contemporary measurement in the behavioral sciences, including assessment of the information provided by a particular item, criterion-referenced assessment, computerized adaptive testing and item banking. Making full use of such advances requires knowledge of IRMs that few yet possess. Comparison with the better-known procedures associated with classical test theory (CTT) should help to situate the new concepts of IRM and clarify how each approach might contribute to measurement. In this paper, an analysis of a self-efficacy (SE) scale for exercise is conducted, comparing IRM and CTT to illustrate the differences and similarities between these approaches. First, some basic ideas of the CTT approach are reviewed, to compare with the IRM ideas introduced in another paper in this volume [1]. A comparison scheme is then described, based on the AERA/APA/NCME 'Standards for Educational and Psychological Tests' [2], and used to compare the two approaches under three general headings: (i) choosing a model, (ii) evidence for reliability and (iii) evidence for validity. The paper comments on the characteristics of the validity evidence that seem to be presented for instruments like the SE scale. Finally, the paper ends with a summary of the recommendations one might make for the SE scale following the item response modeling approach.

Graduate School of Education, University of California, Berkeley, CA 94720, USA *Correspondence to: M. Wilson. E-mail: markw@berkeley.edu

© 2006 The Author(s).

doi:10.1093/her/cyl053

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Basic differences between IRM and CTT
Both IRM and CTT are used to (i) help develop instruments and (ii) check on their reliability and validity. The CTT approach is by far the most widely known measurement approach and, in many areas, the most widely used for instrument development and quality control. In the classical approach, a particular instrument establishes its respondents' 'amount' of a particular characteristic (i.e. ability or attitude in educational tests or psychological instruments) based on their raw score across all the items on the instrument. Instrument developers using the classical approach assume that the observed score (X) is composed of the true score (T), representing the true assessment of this characteristic on this particular instrument, plus an overall error (E) [3]: X = T + E. In theory, the error or noise represents the variability if each respondent took the instrument many times without remembering previous trials or changing in the characteristic measured [4, 5]. The X and T are both indicators of the interaction between the instrument and the respondents' characteristic; neither presumes to indicate an amount of the characteristic directly. The difficulty of the instrument depends on the amount of the characteristic of the respondents who take it, while the amount of the characteristic of the respondents is assessed by their performance on the instrument [6]. This inherent confounding between instrument and sample makes comparisons between different instruments related to the same characteristic, and between groups of respondents having different amounts of that characteristic, a challenge, although, if we stick to just one instrument form, the situation is not so complex. In addition, treating the raw scores as if they were linear measures with the same standard error throughout their range biases statistical methods based on that assumption [7].

In contrast, IRM uses item responses to create a linear (logit) scale that represents 'less' to 'more' of a characteristic or latent variable, such as a particular ability, trait or attitude. Because of this linearity, the relationship between respondent location and item location on the scale of that latent variable can be compared directly [7]. [We assume that the reader has already read an introduction to item response modeling, such as in Wilson et al. [1, 8] (this volume), or is otherwise familiar with it.] The location of an item is modeled to be independent of the locations of the respondents, in the sense that any respondent in the group, at any location, has an estimated probability of endorsing that item. Item responses on instruments are used to estimate item locations and standard errors and respondent locations and standard errors. The advantage of locating items on the same scale as respondents is not cost free: IRM requires that items perform in ways that conform to certain assumptions (i.e. they must have reasonably good 'fit'). The resulting scale can be interpreted as indicating the probability that a particular respondent, with a particular estimated location, will endorse a given item.

With these conceptual differences in mind, a more direct comparison of the standard methods used in the classical approach and item response modeling becomes possible. In a classical analysis, the investigator uses raw scores to compute statistics such as means, variances, reliability coefficients, item discrimination measures, point-biserial statistics for item categories, total scores and errors of measurement for the instrument as a whole. In an item response analysis, the investigator uses raw scores to estimate item and respondent locations plus all standard errors. These parameters can be used to calculate the equivalent of the statistics from the classical analysis, as well as additional ones such as the variation in standard errors and, thus, the information available across the range of the latent variable. The focus on items in the item response analysis also allows a more extensive assessment of the functioning of response choices within items—the item categories—and of the coverage of the content the instrument is supposed to measure.
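To make the classical decomposition concrete, the following is a minimal simulation sketch in Python; all numbers are hypothetical and chosen for illustration only, not taken from the SE scale data. Independent true scores and errors are added to form observed scores, and reliability emerges as the ratio of true-score variance to observed-score variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                        # hypothetical respondents
T = rng.normal(70, 15, n)         # true scores on an arbitrary illustrative scale
E = rng.normal(0, 7.5, n)         # measurement error, independent of T
X = T + E                         # observed score: X = T + E

# Under CTT, var(X) = var(T) + var(E), so the two prints below nearly agree,
# and reliability is var(T) / var(X) (about 0.8 for these hypothetical values).
print(X.var(), T.var() + E.var())
print(T.var() / X.var())
```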


The data and instrument
A specific instrument and data set will be used to illustrate the similarities and differences between the two approaches. The data were provided by the Behavior Change Consortium (BCC) [9], which collected data from 15 different projects explicitly studying major theoretical approaches to behavior change and the interventions related to them. The different projects investigated mediators (or mechanisms) of behavioral change interventions directed at tobacco use, sedentary lifestyle and poor diet. A common hypothesized mediator for change in many of the studies was self-efficacy, defined as 'a specific belief in one's ability to perform a particular behavior' [10, p. 396]. Our analyses included only baseline data regarding self-efficacy; no post-intervention data from the BCC were included for this paper. One instrument developed to measure self-efficacy for exercise, the SE scale, consists of 14 items that express the certainty the respondent has that he or she could exercise under various adverse conditions (see Table I). Items reflect 'potentially conflictual situations' based on 'information gained from previous research with similar populations in which relapse situations had been identified' [10, p. 401].

Table I. SE scale for exercise

Item    Items: 'I could exercise ...'
1       ... when tired
2       ... when feeling anxious
3       ... when feeling depressed
4       ... during bad weather
5       ... during or following a personal crisis
6       ... when slightly sore from the last time I exercised
7       ... when on vacation
8       ... when there are competing interests (like my favorite TV show)
9       ... when I have a lot of work to do
10      ... when I haven't reached my exercise goals
11      ... when I don't receive support from family or friends
12      ... following complete recovery from an illness which has caused me to stop exercising for a week or longer
13      ... when I have no one to exercise with
14      ... when my schedule is hectic

Respondents rate each item from 0 to 100% in 10% increments (resulting in 11 response categories), from 0%, indicating 'I cannot do it at all', to 100%, indicating 'certain that I can do it'. The instrument authors used summary scores that averaged the responses across items if at least 13 items were completed [10]. Two of the projects used the same SE scale for exercise [10] to assess the amount of self-efficacy subjects said they had, and related that to changes in the dependent variables surrounding exercise program persistence. Investigators for both projects postulated self-efficacy as a hypothesized mediating variable, speculating that people who felt confident that they could maintain an exercise program through various adverse conditions would be more likely to reap the health benefits of increased physical activity [11]. The original article describing the SE scale [10] reported a typical summary of the information from a CTT analysis: the total score at baseline averaged 74.3% confident, with a standard deviation of 16.72. The internal consistency of the scale was 0.90, and the test–retest correlation was 0.67 (n = 62, p < 0.001), although the time period between tests was not explicitly stated. Baseline SE scale scores were positively correlated with adherence to an exercise program in both the first and last 6-month periods of the 1-year assessment (r = 0.42 and 0.44, respectively), and added significantly to the model of adherence in a multiple regression analysis in which self-motivation was insignificant [10]. Note that the wording of certain items (Items 6 and 13) used 'exercise' in the text for data gathering in one project (this is what is shown in Table I), as opposed to the term 'physically active', which was used in the gathering of data in the second project. For the purposes of this paper, we assumed that this word change did not affect the results. The classical and item response analyses for this paper were performed using the software package ConQuest [12]. Most of the item response analyses were described in another paper in this volume [1]; any extensions are described when they are discussed below.

The classical analyses are quite standard, and are no different from those outlined in a standard text such as that of Cronbach [4]. Details necessary for interpreting the results are discussed below.
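As an illustration of the kind of classical summary described above, here is a minimal sketch of how the raw-score mean, standard deviation and Cronbach's alpha can be computed from a respondents-by-items matrix. The `responses` array below is randomly generated stand-in data, not the BCC data, so its numbers will not match those reported in the text.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix."""
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1).sum()
    total_var = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Illustrative stand-in data: 504 respondents x 14 items, categories 0, 10, ..., 100.
# Independent random responses give alpha near 0; the real SE scale data gave 0.91.
rng = np.random.default_rng(1)
responses = rng.choice(np.arange(0, 110, 10), size=(504, 14))

summary = responses.mean(axis=1)             # average-across-items score, as in [10]
print(summary.mean(), summary.std(ddof=1))   # cf. 74.3 and 16.72 for the real baseline data
print(cronbach_alpha(responses))
```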

Results: comparing analyses
The scheme for comparing the two approaches utilizes psychometric concepts of model fit and common aspects of reliability and validity to perform a parallel assessment of the SE scale. The particular aspects of reliability and validity incorporated are based on the terminology and scheme proposed in the latest edition of the Standards for Educational and Psychological Tests [2]. The scheme has been described in detail by Wilson [13]:

(i) choosing a model [13, Chapter 6];
(ii) evidence for reliability and measurement error [13, Chapter 7] and
(iii) evidence for validity [13, Chapter 8], including (a) evidence based on instrument content, (b) evidence based on response processes, (c) evidence based on internal structure, (d) evidence based on other variables and (e) evidence based on the consequences of using a particular instrument.

Commonly used alternative terms for (a) through (e) include content validity, evidence based on cognitive interviews or think-alouds, construct validity, external or criterion validity and consequential validity. The results of the comparisons are summarized in Table II.

Choosing a model
The classical true score model, X = T + E, is not one that can be formally rejected, since it is one equation with two unknowns. Hence, there is no step of 'model choice' in the classical approach.

Table II. Standards framework for comparing the two approaches

Choosing a model
  CTT approach: The same model is always 'chosen': X = T + E.
  Item response modeling approach: 'Fit' of persons and items to a specific model can be calculated and evaluated—may be informative. Alternative models can be compared, to explore measurement implications.

Evidence for reliability
  Reliability coefficients
    CTT approach: Cronbach's α = 0.91.
    Item response modeling approach: MML reliability = 0.92.
  Standard error of measurement
    CTT approach: Constant value = 7.66.
    Item response modeling approach: Varies by raw score (see Fig. 2). Will be important when the shape of the curve has measurement consequences (see text for examples).

Evidence for validity
  Based on instrument content
    CTT approach: No contribution.
    Item response modeling approach: Modeling item endorsability may influence the choice of items and response categories to span the range of levels needed to cover the content.
  Based on response processes
    CTT approach: Not relevant.
    Item response modeling approach: Not relevant (yet).
  Based on internal structure
    CTT approach: No equivalent to the Wright map. Item discrimination index and point-biserial correlation can be used for item analysis. DIF not available in CTT (although it could be addressed using other methods).
    Item response modeling approach: Wright maps provide a basic tool. Mean for respondents in each category can be used for item analysis. DIF can be addressed directly as an item parameter.
  Based on other variables
    Same under both approaches.
  Based on the consequences of using an instrument
    Same under both approaches.


To illustrate the steps one would take to choose a model in the case of the IRM approach, one can first examine the section of Wilson et al. [1] labeled 'fit statistics', where the fit of both items and persons to the partial credit model was examined for the SE scale. The details will not be repeated here, but, in summary, note that (i) all the items appeared to be fitting reasonably well and (ii) the information about person fit can be used to flag individuals for whom an estimated location is perhaps not sufficient to convey an adequate summary of their full set of responses. Note that, in the classical context, no results address how well the items are represented, at least partly because formally there are no items in the CTT model itself. Of course, under the rubric of 'item analysis', several characteristics of items are indeed examined, and one could argue that these constitute a sort of 'model fit' (more on this later). Finding evidence of misfitting items can help the instrument designer understand the essential nature of the set of items (e.g. lack of unidimensionality) that are defining a latent variable. Misfitting respondents can alert the measurer to alternative points of view or response styles that were not considered in the design of the original instrument.

A different type of fit issue can be addressed by asking whether an alternative model would have done a better job. For example, the partial credit model (used above) allows each item to have a different pattern of relative differences in endorsability when making the transition from one category to the next (the 'item step parameters' as defined in Wilson et al. [1]). A more parsimonious model, called the 'rating scale model' [14], allows each item to differ in its overall endorsability, but constrains the relative endorsability of the steps to be the same across all the items. Some researchers find this a reasonable alternative model to consider when, as for the SE scale, the labels of the response alternatives are identical across items [14]. If the different item stems do not interact with the way the respondents interpret the percentages, one might expect the parameters that govern the relative endorsability of, say, 80 versus 90% to be approximately equal across items.

Two ways to examine the difference in fit of the two models are (i) to compare them using a likelihood ratio test (possible because the rating scale model is a constrained version of the partial credit model) and/or (ii) to compare them using the same fit indices as were used in the previous paragraph. The likelihood ratio statistic can be calculated as the difference between the deviances (i.e. −2 times the log-likelihood) of the two models (provided by the ConQuest computer program). In this case, it turns out to be 336.23 (df = 117), and when this is compared with the critical value of a χ² distribution at α = 0.0001 (which is 182.61), we can see that the difference is indeed highly statistically significant. The same conclusion is drawn upon noting that the fit statistics for the step parameters, which were found to be innocuous for the partial credit model, are all above the usual maximum for the rating scale model. The effect size can be gauged by looking at the pattern of thresholds in a graph showing the locations of respondents and item thresholds: the Wright map shown in Fig. 1 (introduced in Wilson et al. [1]).
Note that there are considerable differences in the relative location of the thresholds across items—e.g. between Thresholds 9 and 10 in the columns for Items 1 and 13. These differences could imply quite considerable interpretational differences between scales resulting from the two models. Hence, we can conclude that the use of a uniform set of options (‘10%’, ‘20%’, etc.) for the SE scale has not resulted in a uniform pattern of responses from the respondents.
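The likelihood ratio comparison described above reduces to a few lines once the two deviances are in hand. A minimal sketch using the values quoted in the text (SciPy assumed for the χ² distribution):

```python
from scipy.stats import chi2

# Deviance difference between the rating scale model (constrained)
# and the partial credit model, as reported in the text.
lr_statistic = 336.23
df = 117

print(chi2.ppf(1 - 0.0001, df))    # critical value, about 182.6
print(chi2.sf(lr_statistic, df))   # p-value: effectively zero, so the constraint is rejected
```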

Evidence for reliability
The consistency with which respondents are measured is commonly reported in two ways: (i) using a reliability coefficient, which attempts to give an overall indication of how consistent the respondents' responses are, and (ii) the standard error of measurement, which attempts to indicate the amount of variation one might expect, given a certain pattern of responses across the items. Under the classical approach, the internal consistency reliability coefficient most commonly used for polytomous data is Cronbach's α [4]. In this case, it was calculated to be 0.91.


Fig. 1. Wright map of item thresholds for SE scale analyzed polytomously (each ‘X’ represents 3.7 cases).


For the item response approach, an equivalent coefficient can be calculated as a by-product of the marginal maximum likelihood (MML) estimation algorithm, and turns out in this case to be a very similar 0.92. This reliability value can be used to predict the effect on reliability of reducing or increasing the number of items, using the Spearman–Brown formula [4]. For the SE scale, starting from r = 0.92, the reliability would be predicted to drop to 0.91 by deleting one item, to 0.89 by using only 10 items and down to 0.85 using just half the items.

The classical standard error of measurement is calculated as a function of the reliability coefficient and the standard deviation of the raw scores. It is, by assumption, a constant, not varying for different scores. In this case, it turns out to be 7.66 (in raw score units). In the item response modeling approach, the relationship between the estimated location and the standard error of measurement is not constant, but varies with the location of the respondent (thus it is also called the 'conditional' standard error of measurement). This relationship results from the proximity of the respondent's location to the item parameters (usually, more item parameters are estimated toward the middle, hence the relationship is usually 'U'-shaped). The specific relationship for the SE scale data is shown in Fig. 2. Because of the non-linear relationship between the raw score metric and the logit metric, it is not straightforward to compare the values of these two standard errors of measurement. Of greater import is the shape of the relationship between the standard error of measurement and location, which can be informative in different measurement contexts. For example, if the tails lift too high, that might indicate that measurements at the extremes are to be treated with caution. Or, if one is using a cut score, the relationship could be used to examine whether the particular set of items used was an optimal set (i.e. by examining how close the minimum point is to the cut). Of course, if the non-linearity of the relationship is important, as it is in these two instances, then the linearity assumption of the classical approach is a drawback.

Fig. 2. The standard error of measurement for the SE scale (each dot represents a different score).
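Both classical quantities used in this section are simple closed-form formulas. Below is a minimal sketch of the Spearman–Brown projection and the constant classical standard error of measurement; the function names are ours, and the printed values reproduce the 0.91/0.89/0.85 projections quoted above.

```python
import math

def spearman_brown(rel, old_len, new_len):
    """Predicted reliability after changing the number of items."""
    n = new_len / old_len
    return n * rel / (1 + (n - 1) * rel)

def classical_sem(sd_raw, rel):
    """Classical (constant) standard error of measurement, in raw score units."""
    return sd_raw * math.sqrt(1 - rel)

for k in (13, 10, 7):
    print(k, round(spearman_brown(0.92, 14, k), 2))   # 0.91, 0.89, 0.85, as in the text
```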

Evidence for validity
Note that the structure of the discussion about validity below is based on the 1999 Standards for Educational and Psychological Testing [2]. These differ quite markedly from the structures presented in earlier editions of the ‘Standards’. Those more familiar with the older categories may wish to update their knowledge before reading further.

Evidence based on instrument content
It is quite possible to develop an instrument's content in the same way regardless of the potential application of either a classical or an item response modeling approach. The intent is to formulate items that 'cover' all areas of the content of interest, and, in the CTT approach, this is frequently done by topic area or other subdivision. However, to do so is to ignore one of the major advantages of item response modeling. The focus of item response modeling on 'the item' gives a direct connection between the meaning of the item content and the location of the item on the latent variable. The Wright map illustrates the connection between the latent variable and the item and respondent locations, enabling a very rich repertoire of interpretations of the relative locations of respondents and items (see Wilson et al. [1] for more on this). The SE scale is a negative example of this criterion—the scale is dominated by the response categories rather than the items (as is clearly shown in Fig. 1).

The categories, '0%', '10%', etc., like many merely numbered categories, are not conducive to meaningful interpretation. Effectively, the respondent is left to intuit what relative differences in '10% of self-efficacy' might be and respond accordingly. That thinness of interpretation limits the possibilities for the user to interpret the results, and also limits the possibilities for someone gathering internal validity evidence (made evident in a later section). In contrast, there is a strong tradition of instruments being designed with the intent of building in strong content, and matching that content with features of a measurement model. This dates back to the work of Guttman [15], and has reached prominence in the work of Wright et al. (e.g. [16, 17]). A recent account that builds on this work [13] shows multiple examples of such content structures (called 'construct maps'), in which the relative behaviors of a generic respondent having various amounts of the construct are arranged in order on one side, and potential items are arranged in order of endorsability on the other.

Evidence for validity based on response processes
Evidence based on the response process consists of studies of how respondents react to the items and the instrument as a whole. These might consist of ‘think alouds’, ‘exit interviews’ or ‘cognitive interviews’ with samples of respondents. To date, there have been no studies that explicitly relate this type of evidence with the features of measurement models, and such evidence was not available with our secondary analysis of the baseline SE scale data, so this aspect of the framework is not relevant to our comparison at this time. It is possible that this connection may be made in the future, however.

Evidence based on internal structure
One major criterion for internal or construct validity is an a priori, theoretically based hypothesis about the order of item endorsability, the ease with which respondents rate items strongly. A very useful tool for investigating this is the Wright map (discussed in Wilson et al. [1]; also see Wilson [13], Chapter 6). As mentioned above, in the case of the SE scale, no such expectations were developed; hence, this source of validity evidence is not available. Even if there were, the Wright map shown in Fig. 1 demonstrates that there is no discernible empirical order to the items of the SE scale. Instead, the feature of the SE scale items that maps out the SE scale variable is the transitions between the categories. Unfortunately, the labels chosen for these categories, 10%, 20%, etc., are not interpretable. The one thing one might expect is that the categories would line up across the page, but that is clearly not the case here, other than for the extreme categories. For examples of cases where the Wright map has been used successfully as a support for internal validity, see Wilson [13].

The Wright map is useful for a number of other purposes as well. For example, one generic threat to the usefulness of an instrument is a mismatch between the locations of the items and the respondents on this map. Examination of the Wright map for the SE scale shows that the instrument is free from certain types of problems that sometimes occur in instrument development: (i) there are no significant gaps in the locations of the item thresholds and (ii) the range of the item parameter locations matches quite well the spread of the respondent locations. For illustrations of what these problems look like on a Wright map, and what they mean for the instrument, see Wilson [13].

Other evidence that the items are operating as intended is available, in somewhat different forms, from both the classical and item response modeling approaches. An important piece of evidence that an item is functioning as expected is that the increasing response levels of the item operate consistently with the instrument as a whole. The CTT indicator of this at the item level is the item discrimination index (Table III). The CTT indicator at the category level is the point-biserial correlation between the respondent's choice of that category and the raw score. As the categories increase from lowest to highest, if they are working well, one would expect these correlations to increase from negative to positive. Where they do not, that is evidence that the successive categories are not working as they should.

Table III. Item point-biserial correlations and item discrimination (disc.) for the SE scale

                       Point-biserial correlation for each category
Item   Disc.     0      10      20      30      40      50      60     70     80     90     100
1      0.59    -0.31  -0.26   -0.17   -0.17   -0.06    0.04    0.14   0.21   0.29   0.21*  0.06*
2      0.75    -0.30  -0.36*  -0.17   -0.17   -0.22*  -0.08    0.03   0.17   0.25   0.30   0.36
3      0.75    -0.32  -0.38*  -0.16   -0.17*  -0.15    0.00    0.07   0.24   0.27   0.26*  0.33
4      0.67    -0.30  -0.32*  -0.19   -0.10   -0.07   -0.10*   0.00   0.13   0.23   0.28   0.30
5      0.67    -0.29  -0.28   -0.19   -0.16   -0.13    0.01    0.09   0.22   0.26   0.24*  0.26
6      0.71    -0.27  -0.29*  -0.22   -0.26*  -0.15   -0.12   -0.01   0.12   0.24   0.32   0.30*
7      0.55    -0.16  -0.27*  -0.16   -0.18*  -0.17   -0.01    0.13   0.05*  0.20   0.21   0.24
8      0.73    -0.34  -0.29   -0.21   -0.18   -0.19*  -0.05    0.01   0.13   0.26   0.32   0.33
9      0.65    -0.35  -0.27   -0.17   -0.12   -0.08    0.09    0.15   0.17   0.23   0.24   0.24
10     0.71    -0.30  -0.33*  -0.13   -0.24*  -0.17   -0.14   -0.05   0.16   0.22   0.29   0.34
11     0.71    -0.31  -0.33*  -0.17   -0.20*  -0.16   -0.15   -0.03   0.08   0.27   0.26*  0.35
12     0.68    -0.29  -0.33*  -0.18   -0.21*  -0.12   -0.08    0.06   0.15   0.29   0.22*  0.26
13     0.67    -0.29  -0.28   -0.22   -0.25*  -0.16   -0.11   -0.07   0.09   0.23   0.26   0.35
14     0.70    -0.33  -0.31   -0.19   -0.19   -0.07    0.07    0.16   0.19   0.32   0.23*  0.20*

Values that appear to be out of increasing order from left to right for any item (i.e. less than the value to their left) are marked with an asterisk.

Table III shows the point-biserial correlations for the SE scale data. Although the discrimination indexes seem to indicate that the items are all acting quite well, there are numerous cases where the expected order is not observed (cases where the right-hand values are less than the left-hand values are marked in the table). In particular, there appears to be a fairly consistent problem between the first and second categories. However, this seems inconsistent with the information provided by the item discrimination index, and may be due to problems in interpreting correlation coefficients in small or restricted samples. The analogous information for the item response modeling approach is shown in Table IV: these are the means of the locations of the respondents in each category. As can be seen in Table IV, there are far fewer instances where the order is other than expected, and given the small number of cases in some of the categories (especially at the extremes), these instances can be largely ignored. The mean is an inherently simpler index than the correlation coefficient, and may be giving a clearer picture in this case [13].
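Both category-level diagnostics compared above are straightforward to compute. A minimal sketch, assuming a matrix `X` of category indices (0–10, one row per respondent, one column per item) and, for the IRM side, a vector `theta` of estimated respondent locations (e.g. taken from ConQuest output); both names are ours.

```python
import numpy as np

def category_point_biserials(X):
    """CTT indicator: correlation of category membership with the total raw score."""
    total = X.sum(axis=1)
    n_items, n_cats = X.shape[1], int(X.max()) + 1
    pb = np.full((n_items, n_cats), np.nan)
    for i in range(n_items):
        for c in range(n_cats):
            chose = (X[:, i] == c).astype(float)
            if 0 < chose.sum() < len(chose):   # category used, but not by everyone
                pb[i, c] = np.corrcoef(chose, total)[0, 1]
    return pb

def category_mean_locations(X, theta):
    """IRM indicator: mean estimated location of respondents choosing each category."""
    n_items, n_cats = X.shape[1], int(X.max()) + 1
    means = np.full((n_items, n_cats), np.nan)
    for i in range(n_items):
        for c in range(n_cats):
            mask = X[:, i] == c
            if mask.any():
                means[i, c] = theta[mask].mean()
    return means
```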

One of the fundamental assumptions of IRMs is that the item response function (IRF) (discussed in Wilson et al. [1]) is invariant throughout the population being measured. If the IRF differs according to which subgroup a respondent is in, that is referred to as 'differential item functioning' (DIF). The most common way to think about this is that an item would be 'harder' for similar people from one group than from another (i.e. harder to endorse at a higher level, etc.). DIF does not require that the subgroups differ in their mean scale locations; when subgroups have different mean locations on the latent variable, this is commonly referred to as 'differential impact'. There are several ways to investigate DIF under an item response modeling approach. One approach that is particularly straightforward is to add an item-by-group interaction parameter, $\gamma_{ig}$, to the underlying relationship. Thus, the IRF without DIF is given as in Equation 1,

$$P(X_i = 1 \mid \theta, \delta_i) = \frac{e^{\theta - \delta_i}}{1 + e^{\theta - \delta_i}}, \qquad (1)$$

where $\theta$ is the person location and $\delta_i$ is the item endorsability parameter.

Table IV. Means of SE scale locations for respondents selecting each category

                  Mean respondent location for each category
Item     0      10      20      30      40      50      60     70     80     90    100
1      -1.03  -0.40   -0.27   -0.20   -0.09    0.03    0.20   0.28   0.38   0.62   0.74
2      -1.75  -0.44   -0.40   -0.29   -0.29   -0.10    0.00   0.16   0.24   0.48   1.37
3      -1.14  -0.49   -0.31   -0.22   -0.19   -0.02    0.07   0.25   0.30   0.49   1.59
4      -1.18  -0.39   -0.39   -0.17   -0.10   -0.12*  -0.02   0.17   0.23   0.43   0.90
5      -1.07  -0.40   -0.29   -0.22   -0.14    0.00    0.09   0.25   0.37   0.50   1.72
6      -2.53  -0.42   -0.57*  -0.45   -0.22   -0.14   -0.04   0.11   0.20   0.50   1.22
7      -1.52  -0.39   -0.38   -0.23   -0.24*  -0.01    0.16   0.04*  0.27   0.29   0.92
8      -1.77  -0.37   -0.41*  -0.27   -0.25   -0.06   -0.02   0.12   0.25   0.49   1.17
9      -1.11  -0.35   -0.22   -0.17   -0.11    0.07    0.17   0.24   0.43   0.52   2.24
10     -3.53  -0.44   -0.36   -0.40*  -0.25   -0.15   -0.08   0.15   0.18   0.38   1.33
11     -2.34  -0.38   -0.49*  -0.36   -0.23   -0.17   -0.07   0.06   0.25   0.31   0.92
12     -2.09  -0.45   -0.35   -0.30   -0.16   -0.10    0.07   0.17   0.29   0.32   1.04
13     -3.01  -0.30   -0.56*  -0.37   -0.28   -0.16   -0.12   0.09   0.21   0.31   0.73
14     -1.32  -0.41   -0.24   -0.25*  -0.11    0.03    0.17   0.28   0.52   0.63   2.46

Values that appear to be out of increasing order from left to right are marked with an asterisk.

The relationship incorporating the DIF effect is then expressed in Equation 2:

$$P(X_i = 1 \mid \theta, \delta_i, \gamma_{ig}) = \frac{e^{\theta - \delta_i + \gamma_{ig}}}{1 + e^{\theta - \delta_i + \gamma_{ig}}}. \qquad (2)$$
Table V. Estimates and standard errors for DIF parameters indicating gender–item interaction from IRM analysis of the SE scale (n = 504, female = 394)

Item   Estimate   Standard error
1       0.003      0.022
2      -0.007      0.022
3       0.002      0.022
4      -0.001      0.022
5       0.018      0.022
6       0.026      0.023
7      -0.039      0.022
8      -0.024      0.022
9       0.043      0.022
10     -0.037      0.023
11      0.001      0.023
12      0.002      0.022
13      0.006      0.023
14      0.006      0.022

No estimate is statistically significant.

Gender was chosen to illustrate the IRM analysis of DIF because it is binary, and thus simpler than age or race, for example. Estimates of the item-by-gender interactions provided by the ConQuest software [12] for the SE scale, along with their standard errors, are shown in Table V. Calculation of the approximate 95% confidence intervals for these interaction effects using the usual formula (estimate plus or minus 1.96 times the standard error) shows that all the confidence intervals contain zero, and an omnibus test of parameter equality gives a χ² statistic of 13.021 on 14 df (p = 0.525). Thus, the SE scale did not display any statistically significant DIF with respect to gender in this sample. Examples where DIF has been found to be important, and the implications for the instrument, are discussed in Wilson [13]. The discussion of how an issue like DIF can be incorporated directly into the IRM is intended as an illustration of one of the general strengths of this approach: IRMs can be expanded to investigate theoretical and measurement complexities.
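The confidence-interval check and the omnibus test just reported follow directly from Table V. A minimal sketch using the tabled estimates and standard errors (SciPy assumed):

```python
import numpy as np
from scipy.stats import chi2

estimates = np.array([0.003, -0.007, 0.002, -0.001, 0.018, 0.026, -0.039,
                      -0.024, 0.043, -0.037, 0.001, 0.002, 0.006, 0.006])
std_errors = np.array([0.022, 0.022, 0.022, 0.022, 0.022, 0.023, 0.022,
                       0.022, 0.022, 0.023, 0.023, 0.022, 0.023, 0.022])

lower = estimates - 1.96 * std_errors
upper = estimates + 1.96 * std_errors
print(np.any((lower > 0) | (upper < 0)))   # False: every 95% CI contains zero

print(chi2.sf(13.021, df=14))              # omnibus test, p = 0.525 as in the text
```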


In contrast, investigation of DIF is not available through any of the standard statistics of the classical approach. While one could use an additional technique, such as logistic regression on the raw scores, to look for evidence of DIF, it would not link directly to the CTT approach (as it does not utilize the standard error of measurement in the analysis).

Other DIF possibilities arising out of the origins of the CTT approach include structural equation modeling (SEM) and factor analysis to see if responses differ by subgroup.

Evidence based on other variables
Commonly called 'external validity', this form of evidence is examined in a similar way in both approaches: by comparing respondents' measures on the instrument of interest with their behavior or responses to other instruments. There is quite a long list of validity studies available for different versions of the SE scale; many of these have provided evidence for the relationship of the SE scale with physical activity behavior [18] or assessed its ability to predict change in physical activity behavior [19–27]. In the item response modeling approach, this would be carried out in a very similar manner, differing only in that it would use respondent location instead of total score for comparison.

Evidence based on consequences of using an instrument
The final form of validity evidence relates to consequences. In some senses, this is the most important type of evidence. For example, if the purpose of an instrument were to differentiate accurately between groups of people who would respond well to an intervention, but its inaccuracy in differentiating led to the false exclusion or inclusion of large numbers of people, then the instrument could hardly be said to be successful, no matter what the other forms of evidence say. However, as with the previous form of validity evidence, there are no major differences in how it would be investigated under the two approaches.

Discussion
This paper has addressed several important points in the comparison of the CTT and IRM approaches to measurement. There were a number of ways in which the CTT and IRM approaches were consistent in the sorts of issues that they addressed and the results they obtained. For example, the CTT concepts of reliability and standard error of measurement have equivalents in the IRM approach. The reliability for the SE scale was found to be similar under both approaches. The standard error of measurement was expressed in different metrics under the two approaches, but the reliability indicates that they were effectively not very different. Another example of method concordance is evidence concerning external variables: both approaches use correlations between criterion variables and either the raw score or the respondent locations. There are ways in which the item response modeling approach can be extended in order to better estimate the underlying relationship, using multilevel extensions of the IRMs [28], but, at a conceptual level, the operations are basically the same.

There were some interesting and important differences between the two approaches, with IRM generally extending the CTT approach. For example, the concept of model fit is not available with the classical X = T + E. Although this simplifies the application of the classical approach, it is also an important limitation, depriving the analyst of the creative power of model building. A second important feature of the IRM approach is the common scale for respondents and items, embodied in the Wright map. This gives an immediacy of interpretation that allows non-technical intuitions to constructively inform instrument development and interpretation. For example, if the IRM approach had been used in the development of the SE scale, an interpretational perspective might have been developed that would make the process of interpreting results from the SE scale a more intuitive and useful exercise. By designing instruments according to the ideas of criterion referencing as pioneered by Wright [7], the unique properties of the Rasch model can be used to help build rich interpretations into measurement. For example, response categories that lend themselves to greater interpretation could help researchers understand which aspects of self-efficacy are easy to endorse, and which can be endorsed only by those who have a great deal of confidence in their ability to overcome obstacles to exercising.

Looking in more detail into the standard error of measurement, the fact that the error varies across the construct, as revealed in the IRM approach, indicates that there can be important differences in how the two approaches deal with reliability. For example, those who have very high or very low self-efficacy have higher standard errors of measurement, so any interpretation of an association between these scores and behavioral change should be somewhat more suspect.

There are important aspects of validity evidence that do not differ between the approaches. For example, evidence based on external variables, evidence based on response processes and evidence based on consequences are all areas where the specific measurement approach adopted is not markedly important to the use of such evidence. Finally, the evidence that is usually gathered in studies of instrument validity in the area of health outcomes research does not address all aspects of validity proposed in the accepted Standards [2]. This is particularly noticeable for evidence based on response processes and evidence based on consequences. Neither the original reports on the instrument nor the BCC data include these types of evidence (although, regarding the latter, such evidence would not contribute much to a comparison of the two approaches).

Perhaps the most important difference between the CTT and IRM approaches concerns DIF. DIF detection is essentially absent from the classical approach, at least partly because items themselves are not a formal part of CTT. A measurer may use SEM or factor analysis as ways to detect DIF, but direct assessment of a DIF parameter is readily available within the IRM approach. Thus, the item response modeling approach can do all that the classical approach can do when it comes to assessing items and instruments, and it can do a great deal 'more'. There could be ways to extend the classical approach to achieve many, perhaps even most, of the possibilities of item response modeling. In fact, many of the techniques that are the bread and butter of large-scale analysis using IRMs (such as, say, different equating techniques) are available in one form or another in extensions of the standard classical approach. One example is generalizability theory, which can offer many useful insights—unfortunately, this and many other topics, such as computerized adaptive testing, are beyond the scope of an introductory paper. But the problem is that each of these extensions involves a unique set of extra assumptions that pertain only to that extension. The advantage of the item response modeling approach is its ability to be extended to incorporate new features of the measurement situation into the model, features such as extra 'facets' of measurement (like raters), extra dimensions (e.g. behavior as well as attitude) or higher-level units of observation (e.g. families or medical practices). Some possibilities for these extensions are given in successor papers to this one. A more comprehensive account of such extensions is given in De Boeck and Wilson [29].

Final note
So, what has been learned by analyzing the SE scale data using an IRM approach that would not have been gained using a CTT approach? If a measurer wishes to order item presentation from those that are easiest to endorse to those that are hardest, the results of an IRM analysis of empirical data can establish such an order for future studies; future researchers must determine whether this is valuable. Possibly the most important insight has been gained not from deep technical analysis, but from a diagram. The Wright map (Fig. 1) showed that there is ample scope for incorporating meaningful category levels into the SE scale, but the current method of labeling the categories as percentages allows for little in the way of interpretation. Perhaps, if the IRM approach had been available to the instrument's authors, a richer basis for interpretation would have been built in from the beginning, possibly with fewer categories per item but each associated with a meaningful level of self-efficacy. The Wright map also showed that the categories were well located with respect to the respondent locations, and that there were no important gaps in the coverage of the latent variable—both positive features of the SE scale.

The more detailed information about the standard error of measurement indicated that measurement at the two extremes was much less accurate than in the middle. (From Fig. 2, one can see that the measurement error at the extremes ranged from two to three times the minimum, excluding the two most extreme data points.) The analyses indicated that, with the current number of categories, several items could be dropped to shorten the measure with very little change in overall reliability, but the paucity of information already noted at the extremes of the range indicates the danger of assuming that accuracy throughout the range would likewise be unaffected. Examination of the Wright map may give clues as to which items provide the most redundant information across the span of the content, should a discerning researcher propose to delete any items. The DIF analysis gave some comfort, as it showed that the SE scale was acting in a fairly consistent way, at least with respect to males and females. The CTT analysis, in terms of the point-biserial correlations, would likely lead to some concern about the respondent interpretations of the categories, but this was shown to be somewhat less problematical in the IRM analysis. Nevertheless, the IRM analysis indicates possible problems with respondent interpretations, particularly of the 20 and 30% response categories, so future investigations of the SE scale might start by considering alternative ways to label the categories of responses.

Acknowledgements

We would like to thank Louise Mâsse, formerly of the National Cancer Institute (NCI) and now at the University of British Columbia, for organizing the project on which this work is based and for providing crucial guidance throughout the writing of the paper. Support for this project was provided by the NCI (Contract No. 263-MQ-31958). We thank the Behavior Change Consortium for providing the data that were used in the analyses. Any errors or omissions are, of course, solely the responsibility of the authors.

Conflict of interest statement
None declared.

References
1. Wilson M, Allen DD, Li JC. Improving measurement in health education and health behavior research using item response modeling: introducing item response modeling. Health Educ Res 2006; 21(Suppl 1): i4–i18.
2. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association, 1999.
3. Traub RE. Classical test theory in historical perspective. Educ Meas Issues Pract 1997; 16: 8–14.
4. Cronbach LJ. Essentials of Psychological Testing, 5th edn. New York: Harper & Row, 1990.
5. Spearman C. The proof and measurement of association between two things. Am J Psychol 1904; 15: 72–101.
6. Hambleton RK, Swaminathan H, Rogers HJ. Fundamentals of Item Response Theory. Newbury Park, CA: Sage Publications, 1991.
7. Wright B. A history of social science measurement. Educ Meas Issues Pract 1997; 16: 33–45, 52.
8. Allen DD, Wilson M. Introducing multidimensional item response modeling in health behavior and health education research. Health Educ Res 2006; 21(Suppl 1): i73–i84.
9. Ory MG, Jordan PJ, Bazzarre T. The Behavior Change Consortium: setting the stage for a new century of health behavior-change research. Health Educ Res 2002; 17: 500–11.
10. Garcia AW, King AC. Predicting long-term adherence to aerobic exercise: a comparison of two models. J Sport Exerc Psychol 1991; 13: 394–410.
11. King AC, Friedman R, Marcus BH et al. Harnessing motivational forces in the promotion of physical activity: the Community Health Advice by Telephone (CHAT) project. Health Educ Res 2002; 17: 627–36.
12. Wu ML, Adams RJ, Wilson MR. ACER ConQuest: Generalised Item Response Modelling Software [computer program]. Hawthorn, Australia: ACER (Australian Council for Educational Research) Press, 1998.
13. Wilson M. Constructing Measures: An Item Response Modeling Approach. Mahwah, NJ: Erlbaum, 2005.
14. Andrich D. A rating formulation for ordered response categories. Psychometrika 1978; 43: 561–73.
15. Guttman L. A basis for scaling qualitative data. Am Sociol Rev 1944; 9: 139–50.
16. Wright BD, Stone M. Best Test Design. Chicago, IL: MESA Press, 1979.
17. Wright BD, Masters GN. Rating Scale Analysis. Chicago, IL: MESA Press, 1982.
18. Trost SG, Owen N, Bauman AE et al. Correlates of adults' participation in physical activity: review and update. Med Sci Sports Exerc 2002; 34: 1996–2001.



19. Poag-DuCharme KA, Brawley LR. Self-efficacy theory: use in the prediction of exercise behavior in the community setting. J Appl Sport Psychol 1993; 5: 178–94.
20. Sallis JF, Hovell MF, Hofstetter CR. Predictors of adoption and maintenance of vigorous physical activity in men and women. Prev Med 1992; 21: 237–51.
21. Sallis JF, Hovell MF, Hofstetter CR et al. Explanation of vigorous physical activity during two years using social learning variables. Soc Sci Med 1992; 34: 25–32.
22. McAuley E. Self-efficacy and the maintenance of exercise participation in older adults. J Behav Med 1993; 16: 103–13.
23. Marcus BH, Eaton CA, Rossi JS et al. Self-efficacy, decision-making, and stages of change: an integrative model of physical exercise. J Appl Soc Psychol 1994; 24: 489–508.
24. Muto T, Saito T, Sakurai H. Factors associated with male workers' participation in regular physical activity. Ind Health 1996; 34: 307–21.
25. McAuley E, Jacobson L. Self-efficacy and exercise participation in sedentary adult females. Am J Health Promot 1991; 5: 185–91.
26. Miller YD, Trost SG, Brown WJ. Mediators of physical activity behavior change among women with young children. Am J Prev Med 2002; 23: 98–103.
27. Wilbur J, Miller AM, Chandler P et al. Determinants of physical activity and adherence to a 24-week home-based walking program in African American and Caucasian women. Res Nurs Health 2003; 26: 213–24.
28. Adams RJ, Wilson M, Wu M. Multilevel item response models: an approach to errors in variables regression. J Educ Behav Stat 1997; 22: 47–76.
29. De Boeck P, Wilson M (eds). Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach. New York: Springer-Verlag, 2004.

Received on September 7, 2005; accepted on May 12, 2006.



Similar Documents

Free Essay

Miss Brill

...In "Miss Brill," Katherine Mansfield portrays a lonely and sensitive woman who finds Sundays very enjoyable and comforting. She tends to go out to the park on those particular days and observe all of the people out there. She’s very interested in the lives of others and enjoys being part of their lives for only moments long just by eavesdropping on their conversations or arguments. This could be due to the possibility of her life being dull and lacking excitement. She tends to temporarily escape her realities by drifting off and joining the realities of other individuals. In order for us to really understand Miss Brill we need to look her closely as a character. Miss Brill is portrayed as an elderly woman whom is happy and satisfied with her life. On Sundays she enjoys taking walks in the park where she watches and observes other people and momentarily takes a step and participates in their lives. Of the title the character, Miss Brill, Mansfield tell us, “Only two people shared her “special” seat a fine old man in a velvet coat, his hands clasped over a huge carved walking- stick, and a big old woman, sitting upright, with a roll of knitting on her embroidered apron.” (72). She refers to a special seat in the park where she always sits to observe every detail, every move that people does, pretending that is part of the play. When Miss Brill was in the park she said she felt as if she and everyone else were all part of a “play”. She also likes to listen in on the conversations...

Words: 722 - Pages: 3

Free Essay

Miss America

...History: The Miss America Competition began in 1921 as part of an elaborate public festival staged by Atlantic City businessman to extend the summer tourist season. In succeeding years, the Miss America competition evolved into an American tradition with contestants from each of the states competing every September for the coveted title of Miss America. Early on, the talent competition was made part of the competition in addition to the original swimsuit. In 1945, the Organization began supporting women’s education by offering its first scholarship. Today, the Miss America Organization is one of the nation’s leading achievement programs and the world’s largest provider of scholarship assistance for young women. Each year, the Miss America Organization makes available more than $45 million in cash and tuition scholarship assistance. In 1989, the Miss America Organization founded the platform concept, which requires each contestant to choose an issue about which she cares deeply and that is of relevance to our country. Once chosen, Miss America and the state titleholders use their stature to address community service organizations, business and civic leaders, the media and others about their platform issues. Since 1989, Miss America titleholders have appeared at thousands of public speaking engagements and charitable events to generate awareness for a variety of causes, including homelessness, HIV/AIDS prevention, domestic violence, diabetes awareness, character education, literacy...

Words: 1255 - Pages: 6

Premium Essay

Miss Usa

...The American Dream Studs Terkel’s “Miss USA” interview of a young Emma Knight portrays the reality of the “American Dream”. Through Emma Knight, Terkel describes the life of a beauty queen using irony and pessimism. With Emma Knight’s negative self image, she projects herself as being unsuitable for the beauty queen pageant as she states, “NO, uh-uh, never, never, never. I’ll lose, how humiliating.” However, she enters and ironically goes on to win the Miss USA pageant. Terkel continues to express the irony of Knight by including her thoughts after the second night saying, “I thought: This will soon be over, get on a plane tomorrow, and no one will be the wiser. Except that my name got called as one of the fifteen.” Still showing the lack of confidence the young contestant displays her ability to fit in or belong in the world of pageantry. Terkel also utilizes a pessimistic tone in addition to the irony expressed throughout the interview of Emma Knight. In the interview Knight says “If I could put that banner and crown on that lamp, I swear to God ten men would come in and ask it for a date.” Therefore, implying that only the crown and banner makes a woman appealing. Another depiction of pessimism illustrated is her statement in the beginning of the interview saying, “It’s mostly what’s known as t and a, tits and ass. No talent.” implying that the pageants are mostly for demoralizing the women in it. Emma Knight’s tone throughout the story of the American Dream...

Words: 319 - Pages: 2

Premium Essay

Miss Havisham In Great Expectations

...Charles Dickens portrayed the character Miss Havisham as having post traumatic stress disorder.PTSD, which is experiencing or witnessing a life-threatening event, like a horrible event that had happened in your life which may lead to (U.S. Department of Veterans Affairs).The symptoms of PTSD which is depression which Miss Havisham shows a lot in book.. For example; “She had not quite finished dressing, for she had but one shoe on. The other was on the table near her hand, her veil was but half arranged” (Dickens 44). The symptoms of depression that Miss Havisham shows in the book, because of her past, which shows how it's affecting her day to day life. Miss Havisham always shows distrust and negative feelings towards people especially men...

Words: 1957 - Pages: 8

Premium Essay

Little Miss Sunshine

...The movie Little Miss Sunshine is a fantastic movie to watch for teens and adults. In Little Miss Sunshine, the directors (Jonathan Dayton and Valerie Faris) on the film have done an extraordinary job in producing the movie. The elements that were included throughout the film are soundtrack and dialogue. LMS displays lots of important qualities to the movie which made watching this movie enjoyable to watch. Overall, it seems to have the elements directly connected to the movie. The beginning of the movie, shows a girl named Olive (Abigail Breslin), who is part of the Hoover family, finding out that she had successfully been nominated for the Little Miss Sunshine competition. She tells her parents about how she should go to the competition...

Words: 478 - Pages: 2

Free Essay

Driving Miss Daisy

...11/28/2011 Driving Miss Daisy At the 62nd Academy awards Driving Miss Daisy received a total of four awards out of nine nominations. Driving Miss Daisy also won three Golden Globe Awards, and went on to win Best Adapted Screenplay at the 1989 Writers Guild of America. Jessica Tandy who played Daisy Werthan (Miss Daisy) and Morgan Freeman who played Hoke Colburn (Miss Daisy’s chauffeur) won the Silver Bear for the Best Joint Performance at the 40th Berlin International Film Festival. Driving Miss Daisy was also the last Best Picture winner to date to receive a Pg rating and is the only film based on an off Broadway Production ever to win an Academy Award for Best Picture. Actress Jessica Tandy,81 , became both the oldest winner and the oldest nominee in history of the Best Actress category. This film gives some great examples of patience,kindness ,dedication, racism , prejudice and dignity in a very difficult time and situation. Driving Miss Daisy is a comedy-drama film that came from Alfred Urhy’s play Driving Miss Daisy. Opening weekend (17 December 1989) Driving Miss Daisy brought in $73.745 the movie grossed $145,793,296. Some of the filming locations were Atlanta, Georgia,Decatur ,Georgia and Douglasville ,Georgia. Overcoming racial prejudice is an important theme in the movie along with growing older, and the importance of friendship. You are also Reminded of the situation in the south, During the time of the civil rights movement. The years 1948-1973...

Words: 722 - Pages: 3

Premium Essay

Miss America By Elizabeth Fechtel Thesis

...Elizabeth Fechtel is no rookie when it comes to pageants. The former Miss America’s Outstanding Teen 2012 is now this year’s Miss UF. The 19-year-old telecommunication sophomore was one of 18 contestants at this year’s pageant and said she saw it as an opportunity to do what she loves. But when asked whether she thought she was going to win, Fechtel’s immediate answer was no. “Because I’d done pageants before, some of my friends thought, ‘oh, easy breezy,’” she said. “But I knew how difficult it was walking on stage in a gown.” Miss UF is a preliminary pageant to Miss Florida, which is preliminary to Miss America. “There are so many pageants, but there is only one Miss America,” she said. As Miss UF, Fechtel will uphold the four pillars of the Miss America...

Words: 403 - Pages: 2

Premium Essay

Little Miss Sunshine

...THA 2301 001 Assignment 1 The Explicit Meaning of Little Miss Sunshine In the movie Little Miss Sunshine, a family embarks on a journey from Albuquerque, New Mexico, to Redondo Beach, California, in order to help the main character, a 9-year-old girl named Olive, pursue her dream of winning a pageant. Richard and Sheryl, Olive’s parents, decide that it is necessary to take the entire household, which consists of Dwayne, Olive’s teenage half-brother, who has taken a vow of silence until he is accepted into the Air Force; Edwin (Grandpa), Richard’s heroin-addicted father; and Frank, Sheryl’s gay brother, who comes to live with them after a suicide attempt. The family climbs into an old Volkswagen bus to make their way to the pageant. At the beginning of the road trip the clutch goes out on the bus, and because of time constraints they do not have time to have it repaired. Thus, they decide to push-start the bus for the remainder of the trip. Later on, the horn becomes stuck and the passengers have to deal with incessant honking for the rest of the journey. Throughout the trip, several devastating things happen: Richard receives news that his business venture has failed, Frank has an encounter with the student who broke his heart, Grandpa dies of a heroin overdose, and Dwayne discovers that he is color-blind. Despite these unhappy situations, the family soldiers on, desperately trying to give Olive her opportunity at happiness. The...

Words: 375 - Pages: 2

Free Essay

Little Miss Sunshine

...Morgan Cross Final Project Spivey April 28, 2014 Little Miss Sunshine Movies are very beneficial in understanding sociology. Films mirror society and reflect its social and family movements over a lifetime. Little Miss Sunshine, released in 2006 and written by Michael Arndt, is a startling and revealing comedy about a bizarre family in New Mexico. The movie depicts deviance in assorted ways, from drug abuse to suicide to sexuality, along with examples of social interaction. Social interaction is how we act toward and react to other people around us. Deviance consists of traits or behaviors that violate society’s expected rules or norms. Olive, the little girl in the Hoover family, has been nominated to compete in the Little Miss Sunshine pageant in California. If she wants to participate in the pageant, the whole family must travel together to California. The experiences and life lessons they have along the way are out of the ordinary and shocking. The viewer sees the grandfather locking himself in the bathroom doing drugs. Drug use is deviant because it is illegal, and the viewer might judge the grandfather harshly because in real life people who do drugs are shunned; this is an example of social construction. On the way to California, the family stops at a hotel for the night, where the grandfather dies in his sleep after taking the drugs. The family retrieves his body from the hospital morgue and takes it with them so they can get to the pageant in time. Common sense says this is a criminal act because...

Words: 1388 - Pages: 6

Premium Essay

How Does Dickens Present Miss Havisham

...Estella is the adopted daughter of Miss Havisham. From meeting Pip to marrying Drummle, she carries a very cold attitude toward men, one that remains with her from Havisham’s teachings. Estella acts like a cold and heartless woman; she remains true to her upbringing and to the reality that she is heartless and incapable of love. This hurts Pip even more, as he cannot stop loving her, but she does not love him back. As she grows from a child into a woman she toys with many suitors along the way, but never as detrimentally as she does with Pip. She claims that she treats Pip the best of all her suitors: "Do you want me then," said Estella, turning with a fixed and serious, if not angry, look, "to deceive and entrap you?" (Dickens 312). Truthfully, she acts on Havisham’s ideas of revenge, but she does nothing to stop this and carries these actions through with no emotion....

Words: 929 - Pages: 4

Premium Essay

Little Miss Sunshine Caregiver Identity

...Parenting Movie Analysis The movie “Little Miss Sunshine” is about a 7-year-old girl named Olive Hoover whose dream is to be entered into a pageant called Little Miss Sunshine. The movie includes an extended family, with an uncle and a grandfather. When Olive discovers that she has been entered, her family faces many difficulties. Though they do want her to achieve her dream, they are so burdened with their own quirks and problems that they can barely make it through a day without some disaster occurring. This movie relates to caregiver identity theory, the theory of the “multidimensional roles caregivers play when they are both a loved one of the patient and the caregivers”. This relates to...

Words: 344 - Pages: 2

Premium Essay

Little Miss Sunshine Hoover Family

...The movie Little Miss Sunshine premiered in 2006 and is arguably the most successful indie movie of all time. The movie features an array of characters, all with their own internal issues, and the dysfunctionality of this family is evident very early in both the script and the movie. While the movie is filled with many negative events, in the end the family is brought together, and it brought a tear to my eye, as this past week was in fact the first time I had ever seen this movie. Little Miss Sunshine qualifies as an ensemble film, as all six characters within the Hoover family have their own role within the film and each character’s story is critical to the story line throughout. These six characters work together...

Words: 1727 - Pages: 7

Free Essay

Compare Little Miss Sunshine and Juno

...Little Miss Sunshine, directed by Jonathan Dayton and Valerie Faris, is a family drama about a young girl going after her dream. Along the way, family members go through conflicts that change them and help them grow and mature as characters. Jason Reitman, the director of Juno, also takes up this issue: his main character goes through a series of conflicts that ‘force’ her to mature. Both films show the representation of family and youth and the theme of maturing through the use of language and cinematic conventions. Both films show protagonists affected by the issue of having to grow up early and by family support. Throughout a person’s life, they will go through changes that help them mature and grow. Young Olive in Little Miss Sunshine realises that her dream of being a beauty pageant winner is out of reach, but soon sees that winning doesn’t matter and overcomes her loss. Similarly, Juno is faced with an unplanned pregnancy that she is all but forced to deal with. She decides to give the baby up for adoption, just as Olive gives up her dream. Each film uses a variety of cinematic conventions to bring forward these issues. For example, in Little Miss Sunshine several scenes use camera angles such as a close-up of Olive with her family blurred out in the background, symbolising that she feels alone and separated yet is determined for them to be an ideal ‘happy’ family. This helps position the viewers...

Words: 976 - Pages: 4

Free Essay

Mr Ahmed

...in support of the explanation which I have just offered to you?" I saw Miss Halcombe change colour, and look a little uneasy. Sir Percival's suggestion, politely as it was expressed, appeared to her, as it appeared to me, to point very delicately at the hesitation which her manner had betrayed a moment or two since. "I hope, Sir Percival, you don't do me the injustice to suppose that I distrust you," she said quickly. "Certainly not, Miss Halcombe. I make my proposal purely as an act of attention to YOU. Will you excuse my obstinacy if I still venture to press it?" He walked to the writing-table as he spoke, drew a chair to it, and opened the paper case. "Let me beg you to write the note," he said, "as a favour to ME. It need not occupy you more than a few minutes. You have only to ask Mrs. Catherick two questions. First, if her daughter was placed in the Asylum with her knowledge and approval. Secondly, if the share I took in the matter was such as to merit the expression of her gratitude towards myself? Mr. Gilmore's mind is at ease on this unpleasant subject, and your mind is at ease—pray set my mind at ease also by writing the note." "You oblige me to grant your request, Sir Percival, when I would much rather refuse it." With those words Miss Halcombe rose from her place and went to the writing-table. Sir Percival thanked her, handed her a pen, and then walked away towards the fireplace. Miss Fairlie's little Italian greyhound was lying on the rug. He held out his...

Words: 572 - Pages: 3

Premium Essay

Missed Appt

...time, they may have avoided the ambush or avoided the VBIED that hit them in the bottleneck. It sounds extreme, but time management plays a critical role in the Army. When you make an appointment, that slot has been reserved for you. That means if you have been given the last slot, someone else is going to have to wait for another one to open up. This could be one day or one month. And because you missed it, someone else is still going to have to wait when they could have had that slot and been seen. If you are going to miss the appointment or cannot make it due to mission, you are allowed to cancel the appointment within twenty-four hours. The Army allows us to make appointments for whatever we need, be it a medical appointment, household goods, CIF, smoking cessation or anything else; these resources are available to us. But when Soldiers start missing appointments, these systems become inefficient. What a lot of Soldiers do not realize is that when they miss an appointment it does not just affect them; it affects the entire chain of command, from the Squad Leader all the way to the First Sgt. When a Soldier misses an appointment, the Squad Leader must answer for the Soldier, the Squad Leader must answer to the Platoon Sgt., the Platoon Sgt. must answer to the First Sgt., and the First Sgt. must answer to the...

Words: 354 - Pages: 2