
Topic 9: Item Analysis
LEARNING OUTCOMES

By the end of the topic, you should be able to:


1. Describe what item analysis is and the steps in item analysis;
2. Calculate the difficulty index and discrimination index;
3. Apply item analysis to essay-type questions;
4. Discuss the relationship between the difficulty index and discrimination index of an item;
5. Conduct distractor analysis; and
6. Explain the role of an item bank in the development of tests.

INTRODUCTION
When you develop a test, it is important to identify the strengths and weaknesses of each item. To determine how well the items in a test perform, statistical procedures are needed.

In this topic, we will discuss item analysis, which involves three procedures: item difficulty, item discrimination and distractor analysis. These help the test developer decide whether each item in a test should be accepted, modified or rejected. The procedures are straightforward and easy to use, but the educator needs to understand the logic underlying the analyses in order to use them properly and effectively.


9.1 WHAT IS ITEM ANALYSIS?


After administering and marking a test, most teachers discuss the answers with their students. Discussion usually focuses on the right answers and the common errors made by students. Some teachers may focus on the questions most students performed poorly on and the questions they did very well on.

However, much more information is available about a test, and it is often ignored by teachers. This information only becomes available if item analysis is done. What is item analysis?

Item analysis is a process which examines the responses to individual test items or questions in order to assess the quality of those items and the test as a whole.

Item analysis is especially valuable in improving items or questions that will be used again in later tests, but it can also be used to eliminate ambiguous or misleading items after a single test administration.

Specifically, in classical test theory (CTT), the statistics produced from analysing test results include the difficulty index and the discrimination index. Analysing the effectiveness of distractors is also part of the process (discussed in detail later in this topic).

The quality of a test is determined by the quality of each item or question in the
test. The teacher who constructs a test can only roughly estimate the quality of a
test. This estimate is based on the fact that the teacher has followed all the rules
and conditions of test construction.

However, it is possible that this estimation may not be accurate and certain
important aspects have been ignored. Hence, it is suggested that to obtain a more
comprehensive understanding of the test, item analysis should be conducted on
the responses of students. Item analysis is done to obtain information about
individual items or questions in a test and how the test can be further improved.
It also facilitates the development of an item or question bank which can be used
in the construction of a test.


9.2 STEPS IN ITEM ANALYSIS


Both CTT and "modern" test theory such as item response theory (IRT) provide useful statistics to help us analyse test data. For most item analyses, CTT is sufficient to provide the information we need, and it will be used in this module. Let us take the example of a teacher who has administered a 30-item multiple-choice objective test in geography to 45 students in a secondary school classroom.

Step 1
Upon receiving the answer sheets, the first step is to mark each of them.

Step 2
Arrange the 45 answer sheets from the highest score obtained to the lowest score
obtained. The paper with the highest score is on top and the paper with the lowest
score is at the bottom.

Step 3
Multiply 45 (the number of answer sheets) by 0.27 (27 per cent), which gives 12.15; round this to 12. The value of 27 per cent is not rigid; any percentage from 27 to 35 per cent may be used. If the class is very small, however, the 27 per cent rule can be set aside: instead of taking a 27 per cent sample, simply divide the answer sheets into two halves.

Step 4
From the pile of 45 answer sheets arranged by score (highest on top, lowest at the bottom), take 12 answer sheets from the top of the pile and 12 from the bottom. Call these two piles the "high marks" group and the "low marks" group respectively. Set aside the middle group of papers (21 papers). Although these could be included in the analysis, using only the high and low groups simplifies the procedure.

Step 5
Refer to Question 1 (refer to Figure 9.1), then:

(a) Count the number of students from the "high marks" group who selected each of the options (A, B, C or D); and

(b) Count the number of students from the "low marks" group who selected each of the options (A, B, C or D).


Figure 9.1: Item analysis for one item or question

From the analysis, 11 students from the "high marks" group and two students from the "low marks" group selected "B", which is the correct answer. This means that 13 of the 24 students selected the correct answer. Also note that each of the distractors (A, C and D) was selected by at least one student. However, the information provided in Figure 9.1 is insufficient on its own, and further analysis has to be conducted.
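For teachers who keep scores in a spreadsheet or script, these five steps are easy to automate. The following is a minimal Python sketch (not part of the module; the function and variable names are illustrative) that ranks the students, takes the top and bottom 27 per cent, and tallies the options chosen on a single item:

```python
# Minimal sketch of Steps 1-5 (illustrative names, not from the module).
from collections import Counter

def high_low_option_counts(scores, choices, fraction=0.27):
    """Tally each option for the high- and low-scoring groups.

    scores  : total test score of each student
    choices : the option ("A".."D") each student chose on ONE item
    """
    n = round(len(scores) * fraction)             # e.g. 45 x 0.27 = 12.15 -> 12
    ranked = sorted(zip(scores, choices), key=lambda pair: pair[0], reverse=True)
    high = Counter(opt for _, opt in ranked[:n])  # top of the pile
    low = Counter(opt for _, opt in ranked[-n:])  # bottom of the pile
    return high, low

# Hypothetical usage for one item:
# high, low = high_low_option_counts(test_scores, item1_choices)
# print(high["B"], low["B"])  # e.g. 11 and 2, as in Figure 9.1
```

For a very small class, pass fraction=0.5 to reproduce the halving rule mentioned in Step 3.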

SELF-CHECK 9.1

1. Define item analysis.

2. Describe the five steps of item analysis.


9.3 DIFFICULTY INDEX


Using the information provided in Figure 9.1, you can compute the difficulty index, a quantitative indicator of the difficulty level of an individual item or question. It can be calculated using the following formula:

Difficulty index (p) = Number of students with the correct answer (R) / Total number of students who attempted the question (T)

p = R/T = 13/24 ≈ 0.54

What does a difficulty index (p) of 0.54 mean? The difficulty index is a coefficient showing the proportion of students who got the correct answer out of the total number of students who attempted the question. In other words, 54 per cent of students selected the right answer. Although our computation is based on the high- and low-scoring groups only, it provides a close approximation of the estimate that would be obtained with the total group. Thus, it is proper to say that the index of difficulty for this item is 54 per cent (for this particular group). Note that since "difficulty" refers to the percentage getting the item right, the smaller the percentage, the more difficult the item. The interpretation of the difficulty index is shown in Figure 9.2.

Figure 9.2: Interpretation of the difficulty index (p)

If a teacher believes that the achievement of 0.54 on the item is too low, he or she can change the way the objective represented by the item is taught. Another interpretation might be that the item was too difficult, confusing or invalid, in which case the teacher can replace or modify the item, perhaps using information from the item's discrimination index or distractor analysis.

Under CTT, the item difficulty measure is simply the proportion correct for an item. For an item with a maximum score of two, there is a slight modification to the computation of the proportion correct.

Copyright © Open University Malaysia (OUM)


200  TOPIC 9 ITEM ANALYSIS

Such an item has partial-credit scoring of 0, 1 or 2. If the total number of students attempting the item is 100, and 23 students scored 0, 60 students scored 1 and 17 students scored 2, then 23 per cent of the students scored 0, 60 per cent scored 1 and 17 per cent scored 2 on this item. The average score for the item is 0 × 0.23 + 1 × 0.60 + 2 × 0.17 = 0.94.

Thus, the observed average score of this item is 0.94 out of a maximum of 2, so the average proportion correct is 0.94/2 = 0.47, or 47 per cent.
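This calculation is easy to script. A minimal sketch (ours, not the module's) that reproduces the worked example:

```python
# Difficulty of a partial-credit item as average score / maximum score.
def partial_credit_difficulty(score_counts, max_score):
    """score_counts maps each possible score to the number of students earning it."""
    n = sum(score_counts.values())
    average = sum(score * count for score, count in score_counts.items()) / n
    return average / max_score

# Worked example from the text: 23 students scored 0, 60 scored 1, 17 scored 2.
p = partial_credit_difficulty({0: 23, 1: 60, 2: 17}, max_score=2)
print(p)  # 0.47
```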

ACTIVITY 9.1

A teacher gave a 20-item Science test to a group of 35 students. The correct answer for Question #20 is "C" and the results are as follows:

Options                    A  B  C  D  Blank
High marks group (n = 12)  0  2  8  2  0
Low marks group (n = 12)   2  4  3  2  1

(a) Calculate the difficulty index (p) for Question #20.

(b) Is Question #20 an easy or difficult question?

(c) Do you think you need to improve Question #20? Why?

Post your answers on the myINSPIRE online forum.

9.4 DISCRIMINATION INDEX


The discrimination index is a basic measure showing the extent to which a question discriminates or differentiates between students in the "high marks" group and the "low marks" group. The index can be interpreted as an indication of the extent to which overall knowledge of the content area, or mastery of the skills, is related to the response on an item. What is most crucial for a test item is that whether a student answers it correctly is due to his or her level of knowledge or ability, and not to something else such as chance or test bias.


In our example in subtopic 9.2, 11 students in the high group and two students in the low group selected the correct answer. This indicates positive discrimination, since the item differentiates between students in the same way that the total test score does. That is, students with high scores on the test (high group) got the item right more frequently than students with low scores on the test (low group). Although analysis by inspection may be all that is necessary for most purposes, an index of discrimination can easily be computed using the following formula:

Discrimination index (D) = (RH − RL) / (T/2)

where RH = number of students in the "high marks" group with the correct answer
      RL = number of students in the "low marks" group with the correct answer
      T  = total number of students in the two groups combined

Example 9.1:
A test was given to a group of 43 students. Ten of the 13 students in the "high marks" group answered correctly, compared with five of the 13 students in the "low marks" group. The discrimination index is computed as follows:

D = (RH − RL) / (T/2) = (10 − 5) / (26/2) = 5/13 ≈ 0.38

What does a discrimination index of 0.38 mean? The discrimination index is a coefficient showing the extent to which the question discriminates or differentiates between "high marks" students and "low marks" students. Blood and Budd (1972) provide guidelines on the meaning of the discrimination index (refer to Figure 9.3).


Figure 9.3: Interpretation of the discrimination index
Source: Blood and Budd (1972)

A question with a high discrimination index is able to differentiate between students who know the answer and those who do not. When we say that a question has a low discrimination index, it is not able to differentiate between the two. A low discrimination index may mean that many "low marks" students got the correct answer because the question was too simple. It could also indicate that students from both the "high marks" and "low marks" groups got the answer wrong because the question was too difficult.

The formula for the discrimination index is such that if more students in the "high marks" group than in the "low marks" group chose the correct answer, the index will be positive. At a minimum, one would hope for a positive value, as that would indicate that it is knowledge of the content that produced the correct answer. The greater the positive value (the closer it is to 1.0), the stronger the relationship between overall test performance and performance on that item. If the discrimination index is negative, then for some reason students who scored low on the test were more likely to get the answer correct. This is a strange situation which suggests poor validity for the item.
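Both indices can be computed together from the group counts. Here is a hedged Python sketch (the names are ours) that reproduces Example 9.1:

```python
def difficulty_and_discrimination(r_high, r_low, n_high, n_low):
    """Classical upper-lower indices for one multiple-choice item.

    r_high / r_low : number correct in the high- and low-marks groups
    n_high / n_low : size of each group (usually equal)
    """
    total = n_high + n_low
    p = (r_high + r_low) / total        # difficulty index
    d = (r_high - r_low) / (total / 2)  # discrimination index
    return p, d

# Example 9.1: 10 of 13 high-group and 5 of 13 low-group students answered correctly.
p, d = difficulty_and_discrimination(10, 5, 13, 13)
print(f"{p:.2f} {d:.2f}")  # 0.58 0.38
```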


9.5 APPLICATION OF ITEM ANALYSIS ON ESSAY-TYPE QUESTIONS

The previous subtopics explained the use of item analysis on multiple-choice questions. Item analysis can also be applied to essay-type questions. This subtopic illustrates how this can be done. For ease of understanding, the illustration uses a short-answer essay question as an example.

Let us assume that a group of 20 students has responded to a short-answer essay question, with scores ranging from a minimum of 0 to a maximum of 4. Table 9.1 provides the scores obtained by the students.

Table 9.1: Scores Obtained by Students for a Short-answer Essay Question

Item Score     No. of Students Earning Each Score   Total Scores Earned
4              5                                    20
3              6                                    18
2              5                                    10
1              3                                    3
0              1                                    0
Total                                               51
Average Score                                       51/20 = 2.55

The difficulty index (p) of the item can be computed using the following formula:

p = Average score / Possible range of scores

Using the information from Table 9.1, the difficulty index of the short-answer essay question can easily be computed. The average score obtained by the group of students is 2.55, while the possible range of scores for the item is (4 − 0) = 4. Thus,

p = 2.55 / 4 ≈ 0.64


The difficulty index (p) of 0.64 means that, on average, students received 64 per cent of the maximum possible score of the item. The difficulty index is interpreted in the same way as that of the multiple-choice question discussed in subtopic 9.3: the item is of a moderate level of difficulty (refer to Figure 9.2).

Note that in computing the difficulty index in this example, the scores of the whole group are used to obtain the average score. However, for a large group of students, it is possible to estimate the difficulty index of an item from only a sample of students comprising the high marks and low marks groups, as in the case of a multiple-choice question.

To compute the discrimination index (D) of an essay-type question, the following formula is suggested by Nitko (2004):

D = (Upper group's average score − Lower group's average score) / Possible range of scores

Using the information from Table 9.1, but presenting it in the format of Table 9.2, we can compute the discrimination index of the short-answer essay question.

Table 9.2: Distribution of Scores Obtained by Students

Score                      0  1  2  3  4  Total Score  Average Score
High marks group (n = 10)  0  0  1  4  5  34           3.4
Low marks group (n = 10)   1  3  4  2  0  17           1.7

Note: n refers to the number of students.

The average score obtained by the upper group of students is 3.4 while that of the
lower group is 1.7. Using the formula as suggested by Nitko (2004), we can
compute the discrimination index of the short-answer essay question as follows:

D = (3.4 − 1.7) / 4 ≈ 0.43


The discrimination index (D) of 0.43 indicates that the short-answer question does discriminate between the upper and lower groups of students, and at a high level (refer to Figure 9.3). As with the discrimination index of a multiple-choice question, for a large group of students a sample comprising the top 27 per cent and the bottom 27 per cent may be used to provide a good estimate.
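The following sketch (our illustration of the Nitko (2004) formulas, not code from that source) applies both formulas to the Table 9.2 data:

```python
# Difficulty and discrimination for a partial-credit (essay) item.
def essay_item_indices(high_scores, low_scores, max_score, min_score=0):
    score_range = max_score - min_score
    avg_high = sum(high_scores) / len(high_scores)
    avg_low = sum(low_scores) / len(low_scores)
    avg_all = (sum(high_scores) + sum(low_scores)) / (len(high_scores) + len(low_scores))
    p = avg_all / score_range               # difficulty index
    d = (avg_high - avg_low) / score_range  # discrimination index (Nitko, 2004)
    return p, d

# Table 9.2: the high group's ten scores and the low group's ten scores.
high = [2] + [3] * 4 + [4] * 5           # average 3.4
low = [0] + [1] * 3 + [2] * 4 + [3] * 2  # average 1.7
p, d = essay_item_indices(high, low, max_score=4)
print(f"{p:.2f} {d:.2f}")  # 0.64 0.43
```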

The following are two possible reasons for poorly discriminating items:

(a) The item tests something different from the majority of items in the test; or

(b) The item is poorly written and confuses the students.

Thus, when examining a low-discriminating item, it is advisable to check whether:

(a) The wording and format of the item are problematic; and

(b) The item is testing something other than what the test is intended to measure.


ACTIVITY 9.2

1. The following is the performance of students in the high marks and the low marks groups on a short-answer essay question.

   Score                      0  1  2  3  4
   High marks group (n = 10)  2  2  3  1  2
   Low marks group (n = 10)   3  2  2  3  0

(a) Calculate the difficulty index.

(b) Calculate the discrimination index.

Discuss the findings on the myINSPIRE online forum.

2. A teacher gave a 35-item Economics test to 42 students. For Question 16, 8 out of the 11 students in the high marks group got the correct answer, compared with 4 out of 11 from the low marks group.

(a) Calculate the discrimination index for Question 16.

(b) Does Question 16 have a high or low discrimination index?

Post your answers on the myINSPIRE online forum.

9.6 RELATIONSHIP BETWEEN DIFFICULTY INDEX AND DISCRIMINATION INDEX

Theoretically, the more extreme an item's difficulty (whether the question is very difficult or very easy), the lower its discrimination index will be. Stanley and Hopkins (1972) provided a theoretical model to explain the relationship between the difficulty index and discrimination index of a particular question or item (refer to Figure 9.4).


Figure 9.4: Theoretical relationship between difficulty index and discrimination index
Source: Stanley and Hopkins (1972)

According to the model, a difficulty index of 0.2 corresponds to a discrimination index of about 0.3 for a particular item (which may be described as an item of "moderate discrimination"). Note that as the difficulty index increases from 0.1 to 0.5, the discrimination index increases even more. When the difficulty index reaches 0.5 (described as an item of "moderate difficulty"), the discrimination index can reach +1.00 (very high discrimination). Interestingly, a difficulty index of more than 0.5 leads to a decrease in the discrimination index.

For example, a difficulty index of 0.9 results in a discrimination index of about 0.2, described as low to moderate discrimination. What does this mean? Recall that a high p means an easy item: the easier a question, the harder it is for that question or item to discriminate between those students who know the answer and those who do not.


Similarly, when the difficulty index is about 0.1, the discrimination index drops to about 0.2. What does this mean? A low p means a difficult item: the more difficult a question, the harder it is for that question or item to discriminate between those students who know the answer and those who do not.
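The inverted-U shape in Figure 9.4 can be illustrated with a simple ceiling. Assuming equal-sized high and low groups (our assumption; this gives an upper bound, not the exact Stanley and Hopkins curve), the discrimination index can never exceed 2 × min(p, 1 − p), because at best every correct answer sits in the high group:

```python
# Upper bound on the discrimination index at a given difficulty p
# (a sketch under the equal-groups assumption, not the Figure 9.4 curve itself).
def max_discrimination(p):
    return 2 * min(p, 1 - p)

for p in (0.1, 0.2, 0.5, 0.9):
    print(f"p = {p:.1f}: D can be at most {max_discrimination(p):.1f}")
# p = 0.1 -> 0.2, p = 0.2 -> 0.4, p = 0.5 -> 1.0, p = 0.9 -> 0.2 (the inverted U)
```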

ACTIVITY 9.3

1. What can you conclude about the relationship between the difficulty
index of an item and its discrimination index?

2. Do you take these factors into consideration when giving an objective test to students in your school? Justify.

Share your answers with your coursemates in the myINSPIRE online forum.

9.7 DISTRACTOR ANALYSIS


In addition to examining the performance of an entire test item, teachers are also interested in the performance of individual distractors (incorrect answer options) on multiple-choice items. By calculating the proportion of students who chose each answer option, teachers can identify which distractors are "working" and appear attractive to students who do not know the correct answer, and which distractors are simply taking up space and not being chosen by many students. To reduce blind guessing, which produces correct answers purely by chance (and hurts the validity of a test item), teachers want as many plausible distractors as is feasible. Analysis of response options allows teachers to fine-tune and improve items they may wish to use again with future classes. Let us examine performance on an item or question (refer to Figure 9.5).

Figure 9.5: Effectiveness of distractors


Generally, a good distractor is one that attracts more "low marks" students than "high marks" students towards selecting that particular response. What determines the effectiveness of distractors? Figure 9.5 shows how 24 students selected options A, B, C and D for a particular question. Option B is a less effective distractor because many "high marks" students (n = 5) selected it. Option D is a relatively good distractor because two students from the "high marks" group and five students from the "low marks" group selected it. The analysis of response options shows that those who missed the item were about equally likely to choose option B and option D. No students chose option C, meaning it does not act as a distractor at all. Students were not choosing among four answer options on this item; they were really choosing among only three, as they were not even considering option C. This makes guessing correctly more likely, which hurts the validity of the item. The discrimination index can be improved by modifying and improving options B and C.
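A small sketch (the data below are hypothetical, not the actual Figure 9.5 counts) shows how such a distractor table can be produced for any item:

```python
# Distractor analysis: how the high and low groups split over the options.
def distractor_table(high_counts, low_counts, key):
    for option in sorted(set(high_counts) | set(low_counts)):
        role = "correct" if option == key else "distractor"
        h = high_counts.get(option, 0)
        l = low_counts.get(option, 0)
        note = "  <- not working" if role == "distractor" and h + l == 0 else ""
        print(f"{option} ({role}): high = {h}, low = {l}{note}")

# Hypothetical counts for a 24-student analysis with correct answer "A":
distractor_table({"A": 5, "B": 5, "C": 0, "D": 2},
                 {"A": 2, "B": 5, "C": 0, "D": 5}, key="A")
```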

ACTIVITY 9.4

Which British Resident was killed by Maharajalela in Pasir Salak?

A. Hugh Low    B. Birch    C. Brooke    D. Gurney

Options              A  B  C  D  No Response
High marks (n = 15)  4  7  0  4  0
Low marks (n = 15)   6  3  2  4  0

The answer is B.

Analyse the effectiveness of the distractors. Discuss your answer with your coursemates on the myINSPIRE online forum.


9.8 PRACTICAL APPROACH IN ITEM ANALYSIS


Some teachers may find the techniques discussed earlier time consuming, and this cannot be denied, especially for a long test. Imagine that you have administered a 40-item test to a class of 30 students: analysing the effectiveness of every item would take a lot of time, and this may discourage teachers from doing it. There is, however, a more practical approach that takes less time. Here is how it works:

Step 1
Arrange the 30 answer sheets from the highest score obtained to the lowest score
obtained.

Step 2
Select the answer sheet with the middle score. Group all answer sheets above this score as the "high marks" group (mark an "H" on these sheets) and all answer sheets below this score as the "low marks" group (mark an "L" on them).

Step 3
Divide the class into two groups (high and low) and distribute the "H" answer sheets to the high group and the "L" answer sheets to the low group. Assign one student in each group to be the counter.

Step 4
The teacher then asks the class, "The answer for Question #1 is C. Those who got it correct, raise your hand."
Counter from the "H" group: "Fourteen for group H."
Counter from the "L" group: "Eight for group L."

Step 5
The teacher records the responses on the whiteboard as follows:

               High  Low  Total of Correct Answers
Question #1    14    8    22
Question #2    12    6    18
Question #3    16    7    23
...
Question #n


Step 6
Calculate the difficulty index for Question #1 as follows:

Difficulty index = (RH + RL) / T = (14 + 8) / 30 ≈ 0.73

Step 7
Compute the discrimination index for Question #1 as follows:

Discrimination index = (RH − RL) / (T/2) = (14 − 8) / 15 = 0.40

Note that earlier, we took 27 per cent of the answer sheets for the "high marks" group and 27 per cent for the "low marks" group. In this approach, however, we divided all the answer sheets into two groups, with no middle group. The important thing is to use a large enough fraction of the group to provide useful information. Selecting the top and bottom 27 per cent is recommended for a more refined analysis; this method may be less accurate, but it is a "quick and dirty" alternative.
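Once the whiteboard counts are collected, the whole quick method can be scripted in a few lines. A minimal sketch (the names are ours) that reproduces Steps 6 and 7 for every question:

```python
# Quick item analysis from the half-class "H" and "L" counts of Step 5.
def quick_item_analysis(correct_counts, class_size):
    """correct_counts: one (r_high, r_low) pair per question."""
    for q, (r_high, r_low) in enumerate(correct_counts, start=1):
        p = (r_high + r_low) / class_size        # difficulty index
        d = (r_high - r_low) / (class_size / 2)  # discrimination index
        print(f"Question #{q}: p = {p:.2f}, D = {d:.2f}")

quick_item_analysis([(14, 8), (12, 6), (16, 7)], class_size=30)
# Question #1: p = 0.73, D = 0.40
# Question #2: p = 0.60, D = 0.40
# Question #3: p = 0.77, D = 0.60
```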

ACTIVITY 9.5

Compare the difficulty index and discrimination index obtained using this rough method with the theoretical model by Stanley and Hopkins (1972) in Figure 9.4. Are the indexes very far off?

Share your answer with your coursemates in the myINSPIRE online forum.


9.9 USEFULNESS OF ITEM ANALYSIS TO TEACHERS

After each test or assessment, it is advisable to carry out item analysis of the test items because the information from the analysis is useful to teachers. Among the benefits of the analysis are the following:

(a) From the discussion in the earlier subtopics, it is obvious that the results of item analysis can answer the following questions:

(i) Did the item function as intended?

(ii) Were the items of appropriate difficulty?

(iii) Were the items free from irrelevant clues and other defects?

(iv) Was each of the distractors effective (in multiple-choice questions)?

The answers to these questions can be used to select or revise test items for future use. This improves the quality of the test items and of future test papers. It also saves teachers' time in preparing test items, because good items can be stored in an item bank.

(b) Item analysis data can provide a basis for efficient class discussion of the test results. Knowing how effectively each test item measures the intended learning outcome and how students performed on each item, teachers can have a more fruitful discussion with students, with feedback based on item analysis that is more objective and informative.

For example, teachers can highlight the misinformation or misunderstanding reflected in the choice of particular distractors on multiple-choice questions, or frequently repeated errors on essay-type questions, thereby enhancing the instructional value of assessment. If, during the discussion, the item analysis reveals technical defects in the items or the marking scheme, students' marks can also be rectified to ensure a fairer test.

(c) Item analysis data can be used for remedial work. The analysis reveals the specific areas in which students are weak, and teachers can use the information to focus remedial work directly on those areas. For example, distractor analysis may show that a specific distractor has low discrimination, with a great number of students from both the high marks and low marks groups choosing the option. This could suggest some misunderstanding of a particular concept, and remedial lessons can be planned to address the problem.


(d) Item analysis data can reveal weaknesses in teaching and provide useful information to improve teaching. For example, an item may have a low difficulty index despite being properly constructed, suggesting that most students failed to answer it satisfactorily. This might indicate that the students have not mastered the particular syllabus content being assessed, which could be due to weaknesses in instruction and thus call for more effective teaching strategies. Furthermore, if the item is repeatedly difficult for students, there might be a need to revise the curriculum.

(e) Item analysis procedures provide a basis for teachers to improve their skills in test construction. As teachers analyse students' responses to items, they become aware of the defects of the items and what causes them. When revising the items, they gain experience in rewording statements so that they are clear, rewriting distractors so that they are more plausible, and modifying items so that they are at a more appropriate level of difficulty. As a consequence, teachers improve their test construction skills.

9.10 CAUTION IN INTERPRETING ITEM ANALYSIS RESULTS

Despite the usefulness of item analysis, the results of such an analysis are limited in many ways and must be interpreted cautiously. The following are some of the major precautions to observe:

(a) Item discriminating power does not indicate item validity. A high
discrimination index merely indicates that students from the high marks
group perform relatively better than the students from the low marks group.
The division of the high and low marks groups is based on the total test score
obtained by each student, which is an internal criterion. By using the internal
criterion of total test score, item analysis offers evidence concerning the
internal consistency of the test rather than its validity. The validity of a test
needs to be judged by an external criterion, that is, to what extent the test
assesses the learning outcomes intended.

(b) The discrimination index is not always an indicator of item quality. For
example, a low index of discriminating power does not necessarily indicate
a defective item. If an item does not discriminate but it has been found to be
free from ambiguity and other technical defects, the item should be retained,
especially in a criterion-referenced test. In such a test, a non-discriminating
item may suggest that all students have achieved the criterion set by the
teacher. As such, the item does not discriminate between the good and poor students. Another possible reason why low discrimination occurs for an item is that the item may be very easy or very difficult. Sometimes, however, it is necessary or desirable to retain such an item in order to measure a representative sample of learning outcomes and course content. Moreover, an achievement test is usually designed to measure several different types of learning outcomes (knowledge, comprehension, application and so on). In such a case, some learning outcomes will be assessed by fewer test items, and these items will have low discrimination because they have less representation in the total test score. Removing these items from the test is not advisable, as it would affect the validity of the test.

(c) Traditional item analysis data is tentative. It is not fixed, but is influenced by the type and number of students being tested and the instructional procedures employed. The data will thus change with every administration of the same test items. So, if repeated use of items is intended, item analysis should be carried out for each administration of each item. The tentative nature of item analysis should therefore be taken seriously and the results interpreted cautiously.

9.11 ITEM BANK


What is an item bank?

An item bank is a large collection of easily accessible questions or items that have been administered over a period of time.

For achievement tests which assess performance in a body of knowledge such as Geography, History, Chemistry or Mathematics, the questions that can be asked are rather limited. Hence, it is not surprising that previous questions are "recycled" with some minor changes and administered to a different group of students. Making good test items is not a simple task and can be time consuming for teachers. Hence, an item or question bank can be of great assistance to them.

An item bank consists of questions that have been analysed and stored because
they are good items. Each stored item will have information on its difficulty index
and discrimination index. Each item is stored according to what it measures,
especially in relation to the topics of the curriculum. These items will be stored in
the form of a table of specifications indicating the content being measured as well
as the cognitive levels measured. For example, you will be able to draw from the item bank items measuring the application of concepts for the topic of "electricity". You will also be able to draw items from the bank with different difficulty levels. Perhaps you want to arrange easier questions at the beginning of the test so as to build students' confidence, and then gradually introduce questions of increasing difficulty.

With computerised databases, item banks are easy to access. Teachers have at their disposal hundreds of items from which they can draw when developing classroom tests. This certainly helps with the tedious and time-consuming task of constructing items or questions from scratch.
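As an illustration only (the module does not prescribe any particular schema; the field names below are hypothetical), a record in a computerised item bank might carry the two indices together with the table-of-specifications labels:

```python
# Hypothetical item bank record and retrieval helper (illustrative schema).
from dataclasses import dataclass

@dataclass
class BankedItem:
    item_id: str
    topic: str              # e.g. "electricity"
    cognitive_level: str    # e.g. "application", per the table of specifications
    difficulty: float       # p from past administrations
    discrimination: float   # D from past administrations
    text: str

def easiest_first(bank, topic):
    """Draw a topic's items easiest first (highest p), as suggested in the text."""
    return sorted((item for item in bank if item.topic == topic),
                  key=lambda item: item.difficulty, reverse=True)
```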

Unfortunately, not many educational institutions are equipped with such an item
bank. The more common practice is for teachers to select items or questions from
commercially prepared workbooks, past examination papers and sample items
from textbooks. These sources do not have information about the difficulty index
and discrimination index of items, nor information about the cognitive levels of
questions or what they aim to measure. Teachers will have to figure out for
themselves the characteristics of the items based on their experience in teaching
the content.

However, there are certain issues to consider in setting up a question bank. One major concern is how to place test items collected over time on a common scale. The scale should indicate the difficulty of the items, with one scale per subject. Retrieval of items from the bank is easy when all items are placed on the same scale.

The person in charge must also make every effort to add only quality items to the item pool. Developing and maintaining a good item bank requires a great deal of preparation, planning, expertise and organisation. Though the item response theory (IRT) approach is not a panacea for item banking problems, it can solve many of these issues (IRT is explained further in the next subtopic).


9.12 PSYCHOMETRIC SOFTWARE


Software designed for general statistical analysis, such as SPSS, can often be used for certain types of psychometric analysis. However, there are also many software packages on the market designed specifically to analyse test data.

Classical test theory (CTT) is an approach to psychometric analysis that has weaker assumptions than item response theory and is more applicable to smaller sample sizes. Under CTT, a student's raw test score is the sum of the scores received on the items in the test. For example, Iteman is a commercial software program for classical analysis, while TAP is a free one.

Item response theory (IRT) is a psychometric approach which assumes that the probability of a certain response is a direct function of an underlying trait or traits. Under IRT, the concern is whether the student answered each individual item correctly, rather than the raw test score; the basic concepts of IRT are about the individual items of a test rather than about total test scores. Student trait or ability and item characteristics are referenced to the same scale. For example, ConQuest is a computer program for item response and latent regression models, and TAM is an R package for item response models.
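To give a flavour of this idea, here is the one-parameter (Rasch) model, a standard IRT formula (shown as a generic sketch, not code from any of the packages named above). The probability of a correct response depends only on the gap between the student's ability (theta) and the item's difficulty (b), both expressed on the same scale:

```python
import math

def rasch_p_correct(theta, b):
    """Rasch model: P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print(rasch_p_correct(0.0, 0.0))  # 0.5   - ability equal to item difficulty
print(rasch_p_correct(1.0, 0.0))  # ~0.73 - abler student, same item
print(rasch_p_correct(0.0, 1.0))  # ~0.27 - same student, harder item
```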

ACTIVITY 9.6

In the myINSPIRE forum, discuss:

(a) To what extent do Malaysian schools have item banks?

(b) Do you think teachers should have access to computerised item banks? Justify.

• Item analysis is a process which examines the responses to individual test items or questions in order to assess the quality of those items and the test as a whole.

• Item analysis is conducted to obtain information about individual items or questions in a test and how the test can be improved.

• The difficulty index is a quantitative indicator of the difficulty level of an individual item or question.

• The discrimination index is a basic measure which shows the extent to which a question discriminates or differentiates between students in the "high marks" group and the "low marks" group.

• Theoretically, the more difficult or the easier a question (or item) is, the lower its discrimination index will be.

• By calculating the proportion of students who chose each answer option, teachers can identify which distractors are "working" and appear attractive to students who do not know the correct answer, and which distractors are simply taking up space and not being chosen by many students.

• Generally, a good distractor attracts more "low marks" students than "high marks" students towards selecting that particular response.

• An item bank is a collection of questions or items that have been administered over a period of time.

• There are many psychometric software programs to help expedite the tedious calculation process.

Computerised data bank Good distractor


Difficult question High marks group
Difficulty index Item analysis
Discrimination index Item bank
Distractor analysis Low marks group
Easy question


Blood, D. F., & Budd, W. C. (1972). Educational measurement and evaluation. New York, NY: Harper & Row.

Nitko, A. J. (2004). Educational assessment of students. Englewood Cliffs, NJ: Prentice Hall.

Stanley, G., & Hopkins, D. (1972). Introduction to educational measurement and testing. Boston, MA: Macmillan.
