Excellent.
Now I will go chunk by chunk, as you requested, quoting the full text in italics
and explaining clearly after each.
Chunk 1
Percentage passing. For most testing purposes, the difficulty of an item is defined in terms of
the percentage of persons who answer it correctly. The easier the item, the larger will this
percentage be. A word that is correctly defined by 70 percent of the standardization sample
(p = .70) is regarded as easier than one that is correctly defined by only 15 percent (p = .15).
Explanation:
Item difficulty is measured by the proportion of test takers who answer correctly.
This proportion is called the p-value of the item.
A higher p-value means an easier item.
o For example:
p = 0.70 → 70% answered correctly → item is fairly easy.
p = 0.15 → 15% answered correctly → item is very hard.
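As a rough illustration of how a p-value is obtained, here is a minimal Python sketch. The response matrix and the function name item_difficulty are made up for this example; they are not from the passage.

```python
# Hypothetical 0/1 response matrix: rows are examinees, columns are items.
# 1 = answered correctly, 0 = answered incorrectly.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]


def item_difficulty(matrix):
    """Return the proportion of examinees passing each item (its p-value)."""
    n_examinees = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_examinees for j in range(n_items)]


p_values = item_difficulty(responses)
print(p_values)  # [0.8, 0.6, 0.2]
```

Here the first item (p = 0.80) is the easiest and the third (p = 0.20) is the hardest.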
Chunk 2
It is customary to arrange items in order of difficulty, so that examinees begin with relatively
easy items and proceed to items of increasing difficulty. This arrangement gives the
individual confidence in approaching the test and also reduces the likelihood of his wasting
much time on items beyond his ability to the neglect of easier items he can correctly
complete.
Explanation:
Many tests arrange questions from easy to hard.
This:
o Helps test-takers gain confidence.
o Prevents wasting time on very hard items early on.
o Allows them to answer items they can likely get right first.
Chunk 3
In the process of test construction, a major reason for measuring item difficulty is to choose
items of suitable difficulty level. Most standardized ability tests are designed to assess as
accurately as possible each individual's level of attainment in the particular ability.
Explanation:
Test developers measure item difficulty to select the right mix of items.
The goal is to measure ability accurately.
Items that are too easy or too hard do little to distinguish between people of different ability.
Chunk 4
For this purpose, if no one passes an item, it is excess baggage in the test. The same is true of
items that everyone passes. Neither of these types of items provides any information about
individual differences. Since such items do not affect the variability of test scores, they
contribute nothing to the reliability or validity of the test.
Explanation:
Items that everyone fails (p = 0) or everyone passes (p = 1.0) are useless.
They don’t show any differences between test takers.
They do not improve reliability or validity because they don’t contribute to score
variability.
Chunk 5
The closer the difficulty of an item approaches 1.00 or 0, the less differential information
about examinees it contributes. Conversely, the closer the difficulty level approaches .50, the
more differentiations the item can make.
Explanation:
An item differentiates best between examinees when p = 0.50.
At p = 0.50, about half pass and half fail.
This allows maximum separation between high and low ability test takers.
Chunk 6
Suppose out of 100 persons, 50 pass an item and 50 fail it (p = .50). This item enables us to
differentiate between each of those who passed it and each of those who failed it. We thus
have 50 X 50 or 2500 paired comparisons, or bits of differential information.
Explanation:
When p = 0.50, the number of pass-fail comparisons is at its maximum.
Each of the 50 who passed can be compared to each of the 50 who failed:
o 50 × 50 = 2500 comparisons.
More comparisons = more information about differences.
Chunk 7
An item passed by 70 percent of the persons provides 70 X 30 or 2100 bits of information;
one passed by 90 percent provides 90 X 10 or 900; one passed by 100 percent provides 100
X 0 or 0.
Explanation:
Formula: with 100 examinees, (number passing) × (number failing) = 100p × 100(1 – p) = 10,000 × p × (1 – p) pass-fail comparisons.
Example calculations:
o p = 0.70 → 70 × 30 = 2100
o p = 0.90 → 90 × 10 = 900
o p = 1.00 → 100 × 0 = 0 (no information)
Maximum information occurs near p = 0.50.
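The arithmetic in this chunk can be reproduced with a short Python sketch. The function name paired_comparisons is just a label chosen for this example; the group size of 100 comes from the passage.

```python
def paired_comparisons(n_pass, n_total):
    """Number of pass-fail pairs: each passer compared with each failer."""
    n_fail = n_total - n_pass
    return n_pass * n_fail


# The passage's examples, for a group of 100 examinees.
for n_pass in (50, 70, 90, 100):
    print(n_pass, paired_comparisons(n_pass, 100))
# 50 2500
# 70 2100
# 90 900
# 100 0
```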
Chunk 8
The same relationships would hold for harder items, passed by fewer than 50 percent.
Explanation:
Whether p = 0.40 or p = 0.60, both provide the same amount of information because:
o p × (1 – p) is symmetrical around 0.50.
o e.g. 40 × 60 = 2400
o e.g. 60 × 40 = 2400
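The symmetry can also be seen algebraically. This is a short derivation that uses only the p × (1 – p) expression from the explanation above:

```latex
\[
p(1-p) \;=\; \tfrac{1}{4} - \left(p - \tfrac{1}{2}\right)^{2}
\]
```

Swapping p for 1 – p leaves the squared term unchanged, so p = 0.40 and p = 0.60 give the same value. The expression is largest when the squared term is zero, that is, at p = 0.50, where it equals 1/4 (2500 out of 10,000 for 100 examinees).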
Chunk 9
For maximum differentiation, then, it would seem that one should choose all items at the .50
difficulty level. The decision is complicated, however, by the fact that items within a test tend
to be intercorrelated.
Explanation:
In theory, all items at p = 0.50 give maximum differentiation.
But in practice, items are often correlated with each other (test homogeneity), so this
complicates matters.
Chunk 10
The more homogeneous the test, the higher will these intercorrelations be. In an extreme
case, if all items were perfectly intercorrelated and all were of .50 difficulty level, the same
50 persons out of 100 would pass each item. Consequently, half of the examinees would
obtain perfect scores and the other half zero scores.
Explanation:
If all items measure exactly the same thing (perfect correlation), people will score the
same on all items.
Half will get full marks, half will get none.
This leaves only two distinct scores (all or nothing), so the whole test separates examinees no better than a single item would, even though p = 0.50 for each item.
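This extreme case can be checked with a small Python sketch. The test length of 10 items is an arbitrary choice for illustration; the 100 examinees and p = 0.50 come from the passage.

```python
n_examinees = 100
n_items = 10  # arbitrary test length, assumed for this example

# Perfect intercorrelation at p = .50: the same 50 people pass every item.
passes_item = [1] * 50 + [0] * 50
responses = [[passes_item[i]] * n_items for i in range(n_examinees)]

total_scores = [sum(row) for row in responses]
distinct_scores = sorted(set(total_scores))

print(distinct_scores)  # [0, 10]
```

However many items are added, the test still sorts people into just two groups.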
Chunk 11 (Final chunk)
Because of item intercorrelations, it is best to select items with a moderate spread of
difficulty level, but whose average difficulty is .50.
Explanation:
The best practice is to have a mix of item difficulties (some easy, some hard), but
overall, the average p-value should be around 0.50.
This keeps differentiation high and spreads the score distinctions across the ability range, instead of having every item split the group at the same point.
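To see why a spread of difficulties helps, here is a minimal Python sketch. It assumes five hypothetical items with p = .30 to .70, and it makes the simplifying assumption that every item is perfectly consistent with one ability ranking. Neither the item set nor the ranking is given in the passage.

```python
# Hypothetical setup: 100 examinees ranked by ability (rank 0 = most able).
n_examinees = 100
# Number of examinees passing each item: p = .30, .40, .50, .60, .70.
pass_counts = [30, 40, 50, 60, 70]

# Simplifying assumption: an item passed by k people is passed by the
# k most able examinees.
responses = [
    [1 if rank < k else 0 for k in pass_counts]
    for rank in range(n_examinees)
]

average_p = sum(pass_counts) / (len(pass_counts) * n_examinees)
total_scores = [sum(row) for row in responses]
distinct_scores = sorted(set(total_scores))

print(average_p)        # 0.5
print(distinct_scores)  # [0, 1, 2, 3, 4, 5]
```

With the average difficulty still at 0.50, the test now yields six score levels instead of two.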
✅ One-sentence summary of the entire passage:
Item difficulty is based on % passing; p = 0.50 gives maximum differentiation, but a mix of
difficulties averaging about 0.50 creates a better, more reliable test.
If you want, I can also give you a small diagram or MCQ practice based on this for your
exam prep. Shall I?
Excellent — now let’s do a full proper explanation of the whole passage you’ve read.
I will keep it very clear, complete, and exam-friendly.
🔷 What is the main idea?
The whole passage explains:
What item difficulty means in test construction.
Why choosing the right item difficulty is important.
How item difficulty affects how well a test can differentiate between people’s
abilities.
🔷 What is item difficulty?
Item difficulty is simply:
The percentage of people who answer a test item correctly.
We express this as a proportion p:
o If 70% answer correctly → p = 0.70 → item is easy.
o If 15% answer correctly → p = 0.15 → item is hard.
🔷 Why do we care about item difficulty?
The goal of a test is not just to ask questions, but to:
o Measure differences between people’s abilities.
To do that, you need items that some people get right, and some get wrong.
Items that everyone gets right (p = 1.0) or everyone gets wrong (p = 0) are useless for
measuring differences — they do not add any information about individual
differences.
🔷 When does an item give maximum information?
When p = 0.50 (i.e., 50% pass, 50% fail).
Why?
o Because then you can make the maximum number of comparisons between
who passed and who failed.
o In a group of 100 people:
50 passed × 50 failed = 2500 possible comparisons.
o This gives the test the most power to distinguish high and low ability people.
If p = 0.70:
o 70 passed × 30 failed = 2100 comparisons → less than 2500.
If p = 0.90:
o 90 × 10 = 900 → much less information.
If p = 1.0:
o 100 × 0 = 0 → no information.
This rule holds for both easy and hard items (p above or below 0.50).
🔷 Should all test items be p = 0.50?
In theory: yes → maximum differentiation.
In practice: no → because of item intercorrelation.
🔷 What is item intercorrelation?
Items often measure similar skills (they are related).
If all items are very similar, the same people will pass or fail all of them.
For example, if 50 people pass the first item (p = 0.50), and all items are perfectly
related, those same 50 will pass every item.
→ Half the test-takers will score 100%, half will score 0%.
→ Only two distinct scores, so the test tells us no more than a single item would → bad test.
🔷 So what is the solution?
Use items of mixed difficulty levels (some easy, some moderate, some hard).
But keep the average item difficulty near p = 0.50.
This allows:
o Enough variation in item difficulty.
o Good differentiation between test-takers.
o Better reliability and validity.
🔷 Main takeaway (very exam-friendly):
Item difficulty is the proportion answering correctly (p).
p = 0.50 gives maximum differentiation.
But to avoid problems of intercorrelation, tests should have a spread of difficulties averaging
near 0.50.
If you want, I can also give you a small diagram, formula summary, or a few exam-style
MCQs to help you master this fully. Shall I?
Excellent — you’re doing very well because this is a slightly advanced concept of item
difficulty and scaling.
Now I will explain the whole thing properly, in one flow, keeping it clear and practical for
you:
🔷 What is being said overall?
The whole passage explains:
1️⃣ First:
The usual item difficulty (percentage passing) is only ordinal.
2️⃣ Second:
We can convert this into an interval scale using normal distribution, to get more precise
measurements.
3️⃣ Third:
The Educational Testing Service uses a special Δ (delta) scale to simplify this
conversion.
🔷 Let’s go step by step:
1️⃣ Item difficulty as percentage passing is an ordinal scale
Item difficulty is usually given as percentage passing (p).
Example:
o Item 1 → 30% pass (p = 0.30)
o Item 2 → 20% pass (p = 0.20)
o Item 3 → 10% pass (p = 0.10)
From this, we can say:
o Item 1 is easier than Item 2
o Item 2 is easier than Item 3
✅ This tells us only the order of difficulty.
❌ But it does not tell us how much more difficult one item is compared to another.
2️⃣ Ordinal scale problem: unequal intervals
The difference between 30% and 20% (10% difference) may not mean the same thing
as the difference between 20% and 10% (also 10%).
This is because:
o The percentages do not have equal units across the scale.
o Just like percentiles (earlier chapter), percentage passing is compressed at the
extremes.
3️⃣ How to get an interval scale?
To solve this, we assume that:
o The trait being measured (like ability) follows a normal distribution.
Then:
o We convert the percentage passing (p) into standard deviation units (σ-units or
z-scores).
Examples:
If 84% pass → means the item lies 1σ below the mean (because 50% + 34% = 84%).
If 16% pass → means the item lies 1σ above the mean (since 50% – 34% = 16%).
If 50% pass → item lies exactly at the mean → 0σ.
✅ Now, this σ-unit scale is interval — equal units everywhere.
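The conversion can be sketched in Python with the standard library's NormalDist. The function name difficulty_in_sigma is made up for this example, and the sketch rests on the same assumption as the text, that the trait is normally distributed.

```python
from statistics import NormalDist


def difficulty_in_sigma(p):
    """Locate an item on the σ scale from its proportion passing p.

    Assumes a normally distributed trait. Easy items (high p) come out
    below the mean (negative), hard items above it (positive).
    Only valid for 0 < p < 1; inv_cdf raises an error at 0 or 1.
    """
    return NormalDist().inv_cdf(1 - p)


for p in (0.84, 0.50, 0.16):
    print(p, round(difficulty_in_sigma(p), 2))

# The same 10-point gap in percentage passing is not the same gap in σ units.
gap_30_to_20 = difficulty_in_sigma(0.20) - difficulty_in_sigma(0.30)
gap_20_to_10 = difficulty_in_sigma(0.10) - difficulty_in_sigma(0.20)
print(round(gap_30_to_20, 2), round(gap_20_to_10, 2))
```

The first loop gives values close to –1, 0, and +1, matching the three examples. The last line shows that the step from 20% to 10% passing is larger in σ units than the step from 30% to 20%, which is the unequal-interval problem described earlier.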