Journal of Vision (2010) 10(10):5, 1–13 http://www.journalofvision.org/content/10/10/5 1
The precision of binocular and monocular depth judgments in natural settings
Suzanne P. McKee, Smith-Kettlewell Eye Research Institute, San Francisco, CA, USA
Douglas G. Taylor, Smith-Kettlewell Eye Research Institute, San Francisco, CA, USA
We measured binocular and monocular depth thresholds for objects presented in a real environment. Observers judged the
depth separating a pair of metal rods presented either in relative isolation, or surrounded by other objects, including a
textured surface. In the isolated setting, binocular thresholds were greatly superior to the monocular thresholds by as much
as a factor of 18. The presence of adjacent objects and textures improved the monocular thresholds somewhat, but the
superiority of binocular viewing remained substantial (roughly a factor of 10). To determine whether motion parallax would
improve monocular sensitivity for the textured setting, we asked observers to move their heads laterally, so that the viewing
eye was displaced by 8–10 cm; this motion produced little improvement in the monocular thresholds. We also compared
disparity thresholds measured with the real rods to thresholds measured with virtual images in a standard mirror
stereoscope. Surprisingly, for the two naive observers, the stereoscope thresholds were far worse than the thresholds for
the real rods, a finding indicating that stereoscope measurements for unpracticed observers should be treated with
caution. With practice, the stereoscope thresholds for one observer improved to almost the precision of the thresholds for
the real rods.
Keywords: binocular vision, stereopsis, monocular depth cues
Citation: McKee, S. P., & Taylor, D. G. (2010). The precision of binocular and monocular depth judgments in natural settings.
Journal of Vision, 10(10):5, 1–13, http://www.journalofvision.org/content/10/10/5, doi:10.1167/10.10.5.

doi: 10.1167/10.10.5. Received February 16, 2010; published August 6, 2010. ISSN 1534-7362 © ARVO

Introduction

Half a century ago, Gibson (1950) drew attention to the rich array of monocular depth information in the natural world. He felt that stereopsis, as a cue to depth, was overrated, noting that the apparent depth of a natural scene changes little when one closes one eye. The depth portrayed in two-dimensional media such as movies and computer graphics is compelling, providing further evidence of the strength of monocular depth cues. What exactly does stereopsis add to our perception of depth in the natural world?

In the last two decades, several studies have used real objects presented in natural surroundings to examine the human ability to judge three-dimensional shapes. Almost all have found that binocular shape estimates are more nearly veridical than monocular estimates (Allison, Gillam, & Vecellio, 2009; Buckley & Frisby, 1993; Durgin, Proffitt, Olson, & Reinke, 1995; Frisby, Buckley, & Duke, 1996; Loomis, Philbeck, & Zahorik, 2002). Three-dimensional shape judgments require an estimate of the object’s extent along the z-axis, that is, the depth interval. It is likely that binocular judgments are better than monocular judgments because stereopsis provides less erroneous information about depth intervals.

As is well known, error comes in two varieties: systematic errors (bias, assessed by accuracy measurements, such as the point of subjective equality, PSE) and random errors (reliability, assessed by precision measurements, such as thresholds). Most of the studies comparing monocular and binocular judgments have focused on accuracy (how close the shape judgment was to the actual physical shape) rather than on precision. In principle, humans should be able to compensate for systematic depth errors, particularly in performing well-practiced movements in familiar environments. A major league center fielder must be able to throw a ball accurately to second base from anywhere in the outfield, no matter what his perceived distance. Experimental evidence for these compensatory effects comes from a study by Loomis, Da Silva, Fujita, and Fukusima (1996). They asked observers to match a depth interval (z-axis) to a lateral extent (x-axis) and found that the depth intervals were generally underestimated. However, when they asked their observers to walk blindfolded across the same interval, their motor performance showed no evident bias.

Why does precision matter, if we already know that monocular depth estimates are inaccurate? It is difficult to correct for random depth errors, since these errors generally arise from inherent physiological noise, whereas we can and do correct for systematic errors. Precision is a
measure of uncertainty, which affects how rapidly we can carry out actions. If our estimate of the distance separating two objects is 50 cm ± 1 cm, we can move rapidly between them without damaging our bodies. However, if our estimate is 50 cm ± 10 cm, then we have to move slowly and update our information continuously. Precision (reliability) is also thought to determine the weights attached to various cues to depth. In contemporary Bayesian models of cue combination, cues from different modules (disparity, texture, motion parallax, etc.) are separately weighted according to their reliability (Landy, Maloney, Johnston, & Young, 1995), and then combined optimally. Many studies have shown that human observers combine cues in a way that is consistent with this optimal model (Ernst & Banks, 2002; Hillis, Watt, Landy, & Banks, 2004; Knill & Saunders, 2003; Svarverud, Gilson, & Glennerster, 2010).

Only a few studies have compared monocular and binocular precision for judging depth in real settings. Frisby et al. (1996) measured Weber fractions for judging the length of real twigs presented in random orientations. In some of their experiments, the monocular and binocular Weber fractions were similar, while in other experiments the monocular Weber fractions were about twice the binocular ones. Because of the random orientations of the twigs, the judgments were based on x-, y-, and z-axis components of length, rather than on depth intervals per se. More recently, Allison et al. (2009) compared binocular and monocular judgments of the depth interval separating a metal rod from an adjacent panel. Binocular precision, estimated from the dispersion statistics, was as much as a factor of 40 better than monocular precision.

Although Allison et al. (2009) and Frisby et al. (1996) presented objects in real settings, the immediate surroundings of their test objects were fairly austere, e.g., a covered empty space, which minimized the monocular cues to depth. In particular, Allison et al. (2009) went to some trouble to minimize monocular cues: their observers viewed the test stimuli through an aperture that obscured the immediate surroundings, as well as the ends of the test rod and reference panel, and their lighting was uniform to obscure shadows. In a full-cue, unrestricted setting, monocular cues might provide more precise depth information than this study suggests.

In the present study, we will measure depth interval thresholds for a pair of real rods, viewed monocularly or binocularly, in two different well-lit indoor settings. First, we will present the rods in a relatively austere setting that nonetheless contains many monocular cues (shape from shading, shadows, changes in lateral separation and angular subtense, etc.). Then, we will enrich the setting with facsimiles of the usual clutter that surrounds objects in most indoor scenes, including an adjacent textured surface and occluded items, additions that incorporate potent monocular depth cues. Ultimately, we will also introduce motion parallax, so that the whole array of normal monocular cues is available for judging depth intervals. Do all these cues, consistently presented, lead to monocular depth judgments comparable in precision to binocular judgments of depth?

To answer this question, we will make systematic measurements of depth interval thresholds as a function of the z-axis distance between the test rods. Our results show that, even in a highly enriched natural environment, binocular depth estimates are far more precise than monocular estimates over a substantial range of depth intervals.

Methods

Procedure and equipment

We compared binocular and monocular depth sensitivities for objects presented on a well-illuminated laboratory table. Observers judged the relative depth separating two metal rods, a depth interval judgment. One of the rods remained in a fixed position, while the other test rod was presented in one of four positions chosen at random from trial to trial. The base size of the depth interval was varied parametrically from 0 to 8 cm in separate blocks of trials. When the depth interval was zero, the observer judged whether the test rod was in front of or behind the other fixed rod, i.e., a standard stereoacuity task. When the depth interval separating the two rods was non-zero, e.g., 4 cm, the observer judged the relative size of the incremental changes in the interval, e.g., whether the depth separating the rods was smaller or larger than 4 cm.

In signal detection terminology, our procedure was a “Yes–No” task. Observers categorized the test position with one of two labels: “front” or “back” for the zero-pedestal condition; “large” or “small” for non-zero pedestals. They were given 5 practice trials at the beginning of each block to establish the test range and category boundary. Our previous work has shown that this number of practice trials is sufficient for observers to estimate the mean and range of a test set consisting of four test intervals (Morgan, Watamaniuk, & McKee, 2000). Feedback was given in the form of a beep if the observer judged an interval incorrectly. Observers were also given some practice with the task during preliminary blocks (50–100 trials) used to establish the appropriate threshold range.

We plotted the percentage of trials on which the observer labeled the position of the test rod “front” for the zero-interval measurement or “large” for the non-zero intervals, and fitted a psychometric function to the data using probit analysis. We estimated thresholds from blocks of fifty trials; the threshold criterion was d′ = 0.67. Each plotted data point is based on a minimum of 4 blocks of 50 trials each (200 trials total). The error bar on each point shows the standard error of the mean of the thresholds estimated from each of the 50-trial blocks.

We mounted the test rod on a micro-stage (National Aperture Model MM-4M-EX140). Its physical position could be varied in increments of less than a millimeter with a reliability of 0.001 mm, permitting measurements of fine stereoacuity. Approximately 6.5 cm of the test rod was visible above the black baseboard that concealed the micro-stage. The fixed reference rod was identical in width (0.4 cm) to the test rod and was mounted below it so that, when viewed from the observer’s perspective, it was 1.8 cm shorter than the test rod (see Figure 2). The rods were separated laterally by 2.5 cm (1.3 deg). For most experiments, the observers viewed the two rods through computer-controlled shutter goggles that covered the eyes during the intertrial interval when the micro-stage was shifting position. For the monocular measurements, only the right eye’s shutter was opened during a trial. Black felt drapes, located 0.7 m behind the metal rods, were the background for all objects on the table. Overhead fluorescent lights provided strong continuous illumination of the room, adjacent laboratory furniture, and the test objects (see Figure 1). The lighting fixture immediately over the rods contained only one bulb, unlike the other three fixtures, so it produced gentle shadows. The viewing distance for all measurements with the metal test rods was 1.12 m.

Figure 1. Experimental setting. The micro-stage lies between the two tilted panels in this picture of the enriched setting. Photograph taken without flash using an SLR camera mounted on a tripod, so that lighting and shadows would be visible.

Austere setting

We used two different environments for our measurements of depth sensitivity in natural settings. In the “Austere setting,” only the rods, the black baseboard, and the black felt curtains were centrally visible through the shutter goggles (see Figure 2), although many of the other objects in the room were visible in the periphery, since we made no attempt to conceal them. In the first experiment using this setting, the shutters were open for 1 s. In all subsequent experiments, the shutters were open until the observer made a judgment. We compared thresholds for a 1-s duration with thresholds for an unlimited duration and found that they were not significantly different.

Figure 2. Austere setting for metal test rods from approximate viewpoint of observer. The variable test rod is on the left. Long-duration photograph taken without flash to show lighting and shadows.

Enriched setting

For the “enriched setting,” we added objects surrounding the metal rods. Wrapping paper was attached to flat panels mounted adjacent to the test rods, and the panels were tilted slightly so that the observer could easily see the regular texture beside the rods. Grocery items were placed 0.4–0.6 m behind the rods to provide occlusion and relative depth cues (see Figures 1 and 3).

We asked observers to judge whether shadows and shape from shading were apparent when viewing the enriched setting from the position of the shutters. They all noted that the rods looked unevenly lit and rounded, with reflected coloring from adjacent surfaces (the black baseboard or the textured panels). They also noted highlights on the glass jars and faint shadows from the white rods onto the textured surfaces, as well as shadows on the other grocery objects. When looking through the shutters, they said they were able to see much of the rear of the room, including some furniture and cabinet doors, as well as the computer mouse on the table in front of them.
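
The stimulus geometry described above can be checked with small-angle formulas. The sketch below is illustrative and not taken from the paper. It assumes that a frontoparallel extent s at distance D subtends 2·atan(s/2D). It also assumes that two points at distances D and D + Δz have a relative disparity of approximately I·Δz/(D(D + Δz)) radians for interpupillary distance I. The default I = 6 cm is the average interpupillary distance of the observers in this study. The function names are hypothetical.

```python
import math

ARCMIN_PER_RAD = 180 * 60 / math.pi
ARCSEC_PER_RAD = ARCMIN_PER_RAD * 60


def visual_angle_deg(extent_cm, distance_cm):
    """Full visual angle (deg) of a frontoparallel extent centered on the line of sight."""
    return math.degrees(2 * math.atan(extent_cm / (2 * distance_cm)))


def disparity_rad(depth_cm, distance_cm, ipd_cm=6.0):
    """Approximate relative disparity (rad) between points at distance_cm
    and distance_cm + depth_cm: I/D - I/(D + dz) = I*dz / (D*(D + dz))."""
    return ipd_cm * depth_cm / (distance_cm * (distance_cm + depth_cm))


def depth_for_disparity_cm(disparity, distance_cm, ipd_cm=6.0):
    """Algebraic inverse of disparity_rad(): depth interval (cm) giving `disparity` (rad)."""
    return disparity * distance_cm ** 2 / (ipd_cm - disparity * distance_cm)


if __name__ == "__main__":
    D = 112.0  # viewing distance to the metal rods, cm
    print(f"rod separation: {visual_angle_deg(2.5, D):.2f} deg")
    print(f"rod width:      {visual_angle_deg(0.4, D) * 60:.1f} arcmin")
    print(f"8-cm interval:  {disparity_rad(8.0, D) * ARCMIN_PER_RAD:.1f} arcmin")
    six_arcsec = 6 / ARCSEC_PER_RAD
    print(f"6 arcsec:       {depth_for_disparity_cm(six_arcsec, D) * 10:.2f} mm of depth")
```

Under these assumptions the 2.5-cm separation comes out at about 1.3 deg. The 8-cm maximum interval corresponds to roughly 12 arcmin of disparity, matching the 0–12 arcmin range given for Figure 4. A 6-arcsec stereoacuity threshold corresponds to well under a millimeter of depth, consistent with the need for the micro-stage’s fine positioning.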

Figure 3. Enriched setting from observer’s viewpoint. Variable test rod on left. Long-duration photograph taken without flash. The varying photographic durations alter colors somewhat from picture to picture.

In a separate control experiment on monocular motion parallax, we removed the goggles and mounted a large shutter in front of the rods themselves, which concealed the micro-stage movements during the intertrial interval. For this experiment only, the observer wore a patch over the left eye. Without the goggles in place, observers could move their heads over a larger distance, improving motion parallax information. We instructed each observer to move her head laterally from one side of the head holder to the other (18 cm); our measurements showed that this lateral head movement translated the center of the right eye 8–10 cm.

Stereoscope measurements

We also measured disparity thresholds in a mirror stereoscope, composed of two pairs of mirrors arranged so that each eye could see only the image on one computer screen. Observers viewed the screens from a head holder that minimized head movements. Before any set of threshold measurements was made, the observer adjusted the mirrors closest to her eyes so that the Nonius lines were aligned with minimum effort. The viewing distance was 1.22 m for the stereoscope measurements.

The stereoscopic display consisted of two dark lines that matched the dimensions (height, width, and separation in arcmin) of the two metal rods. The luminance of the lines was ≈1 cd/m² and that of the background was 30 cd/m². Stimulus duration was unlimited for all measurements in the stereoscope; trials were terminated when the observer gave a response.

For the training sets in the stereoscope, we used a sparse random dot pattern composed of bright (50 cd/m²) points; density was 189 points per square degree. The pattern consisted of a central circular test region (1.3 deg in diameter), surrounded by a square annulus, 2.3 deg on a side. Observers judged the depth interval that separated the central disk from the square annulus, using the same procedure described above for the metal rods. Observers were given 3 days of practice on the random dots; each data point shown in the training graph is based on 100–200 trials.

The displays were programmed on a Macintosh computer and presented on two Sony Trinitron monitors, Model 110GS. We presented the two half-images in the central 3 deg of the monitors, where screen curvature was minimal. We ran the two monitors at 75 Hz, at a resolution of 1024 × 768. Each pixel subtended 0.71 arcmin at the 1.22-m viewing distance. We used dithering to produce sub-pixel shifts in disparity. A Pritchard photometer was used to measure luminance as a function of monitor gray levels, and these values in turn were used in the dithering calculations.

Observers

Both authors were observers for this study. The other four observers were naive volunteers who had not previously participated in any psychophysical study. All observers had normal or corrected-to-normal visual acuity.

Disparity thresholds for real objects, when expressed in arcmin, depend on the observer’s interpupillary distance. In Figures 4 and 6, we have expressed thresholds in centimeters but have added second axes to show disparity thresholds in arcmin. We did not correct these second axes for interpupillary distances because the proliferation of scales would have been confusing. For these figures, we assumed an interpupillary distance of 6 cm (the average interpupillary distance of our observers) to convert centimeters into arcmin. In Figures 8 and 9, we corrected disparity thresholds for the interpupillary distance of each observer. Interpupillary distances of our observers in centimeters were: S1 = 7.15; S2 = 5.75; S3 = 6; S4 = 5.8; S5 = 5.75; S6 = 6.

Results

Austere setting

The binocular and monocular thresholds for the austere setting are shown in Figure 4 for depth intervals ranging from 0 to 8 cm, or equivalently for 0–12 arcmin of disparity. The monocular thresholds are all far higher than the binocular thresholds. At the smallest intervals, the monocular thresholds are more than a log unit worse than
the binocular thresholds, confirming the results of Allison et al. (2009).

Figure 4. Binocular and monocular depth interval thresholds for four subjects for the metal rods shown in Figure 2. Left and lower axes in centimeters, right and upper axes in arcmin of disparity. Viewing distance = 1.12 m; duration = 1 s. Straight lines were fitted by eye.

The binocular thresholds rise in proportion to the magnitude of the depth interval. The average Weber fraction for the four observers was 0.056, which is consistent with similar measurements made using stereoscope displays presented for 1 s (McKee, Levi, & Bowne, 1990). Durgin et al. (1995) estimated Weber fractions for shape disparities greater than 3 arcmin as roughly 5–15%, again consistent with our measurements.

In contrast, the monocular thresholds show a very shallow dependence on depth interval over our test range of 0 to 8 cm. The average threshold for the zero interval, i.e., when the observers judged which rod was in front of the other, equaled 1.7 cm or 2.9 arcmin. In Figure 5, we have drawn a schematic of the two metal rods corresponding to this monocular threshold. Since z-axis information is not accessible monocularly, observers must rely on one of two sources of x-axis information: (1) the projected lateral separation between the rods as the depth between them is changed (left side of figure), or (2) the difference in angular subtense of the variable rod with depth (right side of figure). Recall that the rods have the same physical width, so the difference in angular subtense is useful in this context, although it would not generally be useful for judging the depth of random objects. The observers could have used motion parallax, but the viewing aperture of the goggles was 2.5 cm and the stimulus duration was 1 s, making the motion cues weak. In a subsequent experiment, we examined the effect of motion parallax under more favorable conditions.

A change of 1.7 cm (2.9 arcmin) along the z-axis produces a change in the projected x-axis separation of 0.02 deg (1.2 arcmin), or about 1.6% of the initial separation between the rods. The change in the angular subtense of the variable test rod produced by a 1.7-cm depth increment is roughly 0.2 arcmin, or about 1.6% of the angular width of the rods. These estimated changes in lateral separation or angular subtense are close to the best thresholds for separation or width in the literature, which
range from 1% to 3% (Burbeck, 1987; McKee, Welch, Taylor, & Bowne, 1990; Yap, Levi, & Klein, 1987). The monocular depth threshold is limited by the same source of internal noise that limits estimates of lateral extent in other contexts. In short, the monocular threshold for zero interval is as good as it can be, based on the available spatial information.

Figure 5. Schematic of the monocular information available at threshold for a depth interval of zero.

Assume that the observer relies only on the projected x-axis separation between the rods (left side of Figure 5) and that the detectable change in that projected separation is always equal to the Weber fraction for width, namely ≈1.5%. From simple geometry, the z-axis change required to produce this change is also roughly 1.5% of the viewing distance to the rods. Since this distance does not change much at the largest depth interval (112-cm viewing distance + 8-cm interval = 120 cm), the monocular thresholds should be roughly constant, which they are. Actually, the thresholds for the largest depth interval (8 cm) are, on average, a little less precise than the zero-interval thresholds; that is, they increase somewhat more than this geometry predicts. Nevertheless, the Weber fraction (≈2.2%) for the larger interval still falls well within traditional estimates of sensitivity for lateral separation judgments.

Enriched setting

The monocular information in natural scenes is generally far richer than that in the austere setting shown in Figure 2. In particular, objects often sit on textured surfaces and are surrounded by clutter. To simulate a more typical environment, we added patterned paper mounted on wooden surfaces positioned adjacent to the metal rods (see Figures 1 and 3); the papered surface was as close (0.75 deg) as possible while still permitting free movement of the micro-stage. Since the test rod was near the edge of the paper, we thought that the observers might be able to read off the position of the test rod monocularly in “pattern” units. We also added grocery objects behind the two rods to provide familiar size and occlusion cues; as the test rod changed position in depth, different parts of the words on the package would be occluded. Observers were given unlimited time to make these judgments; trials were terminated when they pressed one of two mouse buttons.

We have plotted the binocular and monocular thresholds for the enriched setting in Figure 6. The monocular thresholds (red circles) and the binocular thresholds (open blue squares) may look remarkably similar to those in Figure 4, but, in fact, the difference between them has shrunk. This result is particularly evident if you compare the two bottom graphs of Figure 4 with the two top graphs of Figure 6 (same subjects). For these two observers, the monocular thresholds in the enriched setting are approximately half those for the austere setting. Thus, the enriched environment has improved the monocular thresholds substantially, but they are still about 10 times worse than the comparable binocular thresholds.

Given the additional reference targets in the enriched setting, we anticipated that the binocular thresholds would also improve. We have plotted binocular measurements for both experimental settings in Figure 6 (the open and closed blue squares). For three of the four observers, the enriched setting had no systematic effect on binocular thresholds. However, observer S6 clearly took advantage of the additional information. Her thresholds show almost no dependence on the depth interval between the metal rods, because the marks on the red patterned paper and the white plastic rods holding the adjacent surfaces are much better disparity references than the reference rod itself. She has chosen an optimal strategy for estimating the depth interval from disparity. Her performance highlights an important fact about natural scenes: the rich array of information benefits both monocular and binocular judgments of depth.

Motion parallax

The 2.5-cm aperture of the electronic shutter goggles made it difficult for observers to use monocular motion parallax to judge the size of the depth interval separating the metal rods. To remedy this situation, we removed the goggles and the chin rest from the head holder and mounted a large shutter immediately in front of the rods. This shutter concealed the movements of the variable test rod between trials; otherwise, it remained below the black baseboard during a block of trials. While viewing the rods in the enriched setting, observers were asked to move their heads repeatedly from one side of the head holder to the other, a distance of 18 cm, before making a judgment. This head movement translated the viewing eye over a distance of 8–10 cm, somewhat larger than the interpupillary separation.
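
The two geometric arguments in this section can be sketched numerically. The first is the austere-setting prediction that a lateral Weber fraction of about 1.5% implies a z-axis threshold of about 1.5% of the viewing distance. The second concerns the head movement used to generate motion parallax. This is an illustrative small-angle approximation, not the authors’ calculation. It treats the projected separation s/D as the only static monocular cue. It models the total relative motion parallax over a lateral eye translation T with the same expression used for binocular disparity, with T in place of the interpupillary distance I. The 9-cm translation (within the measured 8–10 cm) and the 6-cm interpupillary distance are assumed values, and the function names are hypothetical.

```python
import math

ARCMIN_PER_RAD = 180 * 60 / math.pi


def predicted_monocular_threshold_cm(weber_fraction, distance_cm):
    """z-axis change whose effect on projected separation (~ s / D) equals the lateral Weber fraction.

    Small-angle approximation: a depth change dz alters s / D by a relative
    amount of about dz / D, so the matching dz is about W * D.
    """
    return weber_fraction * distance_cm


def relative_angle_rad(baseline_cm, depth_cm, distance_cm):
    """Change in the relative direction of two points at distance_cm and
    distance_cm + depth_cm when the viewpoint shifts laterally by baseline_cm.

    With baseline = interpupillary distance this approximates binocular disparity;
    with baseline = head translation it approximates total relative motion parallax.
    """
    return baseline_cm * depth_cm / (distance_cm * (distance_cm + depth_cm))


if __name__ == "__main__":
    D = 112.0  # viewing distance to the fixed rod, cm
    for interval in (0.0, 8.0):
        dz = predicted_monocular_threshold_cm(0.015, D + interval)
        print(f"{interval:.0f}-cm interval: predicted monocular threshold ~ {dz:.2f} cm")
    # Relative angular signal for a 1-cm depth step: disparity vs. a 9-cm head sweep.
    for label, baseline in (("disparity, I = 6 cm", 6.0), ("parallax, T = 9 cm", 9.0)):
        angle = relative_angle_rad(baseline, 1.0, D) * ARCMIN_PER_RAD
        print(f"{label}: {angle:.2f} arcmin")
```

In this approximation the predicted monocular threshold grows only from about 1.7 cm to 1.8 cm across the 0–8 cm range. The relative shift produced by a full head sweep is T/I times the binocular disparity for the same interval, so it is about 1.5 times larger for T = 9 cm and I = 6 cm.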
Figure 6. Monocular depth interval thresholds (open circles) for four subjects for the enriched setting (shown in Figure 3); binocular depth
interval thresholds for enriched setting (open boxes) and austere setting (filled boxes). Left and lower axes in centimeters, right and upper
axes in arcmin of disparity. Viewing distance = 1.12 m. Straight lines were fitted by eye.

As shown in Figure 7, motion parallax improved monocular thresholds significantly for only one (S5) of our three observers. Even for this observer, the monocular thresholds with motion parallax were significantly worse than the binocular thresholds. Several previous studies have found that monocular judgments of depth or shape based on motion parallax are not nearly as accurate or precise as binocular judgments, in agreement with our findings (Durgin et al., 1995; Frisby et al., 1996; LeClair & Durgin, 2008; Wheeler, 1982).

Stereoscope measurements

The overwhelming message from the data taken in our natural settings is simple: binocular judgments are far more precise than monocular judgments, no matter what cues are available. It appears that, if all monocular cues were removed, observers could base their judgments of depth interval, and implicitly three-dimensional shape, solely on disparity information. To put this idea to the test, we measured thresholds for computer-generated targets presented in a mirror stereoscope.

The test and reference “bars” in the stereoscope were two dark lines that matched the vertical and horizontal dimensions of the metal rods used in the austere setting. In Figure 8, we have plotted thresholds for the real and stereoscope displays. Surprisingly, three of the four observers were far less sensitive to disparity increments measured in the stereoscope than in the real setting. Indeed, one observer could not see any depth difference between the dark lines, despite further experimental
manipulations of step size, lateral separation, and the addition of perspective changes, i.e., changes in length and width consistent with bar depth. Only the senior author (S5) has identical thresholds for the two settings. She has had perhaps half a million trials with various stereoscopic displays, so her performance may simply reflect extensive practice. The second author (S3) has also had practice with stereoscopes but has always experienced difficulties with large depth intervals; for him, the interval sometimes flattens midway through a block of trials. The two naive observers had had no previous experience with stereoscope displays.

Figure 7. Monocular thresholds in centimeters for viewing enriched setting through a 2.5-cm aperture (red solid bars) or viewing enriched setting while moving head 18 cm laterally (hatched red bar); binocular thresholds (blue bars) for viewing enriched setting through 2.5-cm aperture. Viewing distance = 1.12 m; unlimited viewing time.

It is well known that stereoacuity improves with practice (Fendick & Westheimer, 1983; Gantz, Patel, Chung, & Harwerth, 2007; O’Toole & Kersten, 1992; Sowden, Davies, Rose, & Kaye, 1996). In fact, stereoacuity takes many more trials to reach asymptotic values than other spatial judgments (Kumar & Glaser, 1993). The puzzle here is that our naive observers needed almost no practice (at most 50 trials) for the stereoacuities measured with the real metal rods; their thresholds ranged from 6 to 14 arcsec in the austere setting, and 4–8 arcsec in the enriched setting. The studies cited above, showing that hundreds of trials were needed to produce fine stereoacuity, were made with stereoscopic displays rather than with real objects. We suggest that in the studies showing significant improvement with practice, observers were not learning to detect disparity per se but rather learning to detect disparity in a stereoscope.

What is it about a stereoscope that interferes with disparity sensitivity? Watt, Akeley, Ernst, and Banks (2005) found that the absence of correct focus cues affects perceived depth for surfaces presented in a stereoscope. The focus cues in our stereoscope would indicate that both test and reference bars were in the same plane, in conflict with the disparity information. However, our viewing distance was 1.2 m and the maximum simulated depth interval was 8 cm, a dioptric difference of only about 0.05 D, which should be well within the human depth of field of 0.33 diopters (Charman & Whitefoot, 1977).

There are other possible sources of cue conflict. Motion parallax from small head movements would indicate that both test and reference bars were in the same plane, contrary to the bar disparity. In addition, the two bars
define an implicit slanted plane, and some observers have great difficulty detecting rotation about the vertical axis for simulated surfaces (Gulick & Lawson, 1976; Mitchison & McKee, 1990). Based on their calculations, Backus, Banks, Van Ee, and Crowell (1999) and Gårding, Porrill, Mayhew, and Frisby (1995) showed that eye position (vergence and version) could introduce horizontal disparities into frontoparallel surfaces. This ambiguity about what is causing the slant disparity, i.e., eye position or stimulus slant, may make it impossible to see depth between the simulated bars. Note, however, that the information supporting the percept of a slanted surface is very weak; the frontoparallel black bars were 12 arcmin wide and separated by a bright gray expanse of 1.3 deg. A single pair of lines or points is not sufficient to induce ambiguity for most observers (McKee, 1983; Fahle & Westheimer, 1988); stereoacuity thresholds for two lines are typically less than 10 arcsec.

Figure 8. Disparity thresholds for depth interval judgments made with two real metal rods (blue squares) and for computer-generated bars presented in a mirror stereoscope (green triangles).

Another issue is that real objects, like our metal rods, have “solidity”; their front and back surfaces have different disparities. The stereoscope “bars” have no solidity but appear as infinitely thin films, floating in space. Moreover, one of these ghostly bars is apparently floating in front of the monitor screens. Naive observers might find that the stereoscope bars are contrary to their expectations about objects in depth. Not all natural objects have solidity: specks of dust hovering in midair do not produce detectable differences in disparity between front and back surfaces.

We thought that if we modified the stereoscope display to minimize all these potential conflicts, our observers would be able to respond more easily to the test disparities. To simulate “dust clouds,” we used sparse random dot displays, reasoning that small bright points should produce fewer cue conflicts than the virtual bars, because random dots minimize texture and/or perspective
cues (Zabulis & Backus, 2004). The random dot display consisted of a circular central region that changed incrementally in depth from trial to trial, surrounded by a frontoparallel annular region that served as a reference surface. This arrangement made it impossible to interpret the display as a slanted surface. To teach our observers to detect fine disparities in the stereoscope, we trained them with these random dot displays.

Figure 9. (A) The left graphs show three practice sessions for disparity increment thresholds measured with random dot displays presented in the stereoscope. The blue boxes on the ordinate show the stereoacuity thresholds for the metal rods in the austere setting. (B) The right graph shows transfer of training from the random dot displays to the “bar” targets presented in the stereoscope for one observer; the other observer was unable to judge depth between the bars even after training with random dot displays.

Learning to discriminate disparity in the stereoscope

Each naive observer was given 3 days of practice (1500–2000 trials) with the random dot displays. They made incremental judgments for depth intervals covering the same range as the bar targets. As shown by the graphs on the left of Figure 9, they were able to respond to the incremental changes in the central test region. Observer S4, who had previously been unable to see any depth difference between the virtual bar targets, had less difficulty seeing depth differences in these sparse random dot displays, although her thresholds for large intervals were initially quite poor (purple squares in Figure 9A).

Both observers showed improved sensitivity. For comparison, the blue boxes on the ordinate show their stereoacuity thresholds for the real metal rods in the austere setting. For observer S6, the stereoacuity threshold for the random dot display is, in fact, slightly better than her threshold for the metal rods. Stereoacuity for the other observer (S4) was still about a factor of two worse than her rod threshold.

Would training on the random dots transfer to the virtual bars? We repeated our increment threshold measurements with the bright bars in the stereoscope. Once again, observer S4 was unable to see any depth difference between the bars. Apparently, despite her capacity to respond to the disparities of the random dots, the various cue conflicts continued to interfere with her ability to respond to the disparity of the bars. On the other hand, observer S6 showed nearly perfect transfer of
training; her thresholds for the virtual bars after training are close to her thresholds for the real rods (see open triangles and filled squares in Figure 9B).

Discussion

In natural settings, monocular information about depth is very imprecise. Our results show that, even for objects in rich local surroundings, monocular depth thresholds are as much as a log unit higher than binocular depth thresholds. We argue that, for static viewing, this imprecision follows from the viewing geometry: monocular information about relative depth along the z-axis depends on the projected distance separating features along the x-axis. To produce a detectable change in depth monocularly, the associated change in the x-axis projection has to reach threshold levels (see Figure 5). Thresholds for lateral separation are 1–2%. Because the projected (angular) separation shrinks in inverse proportion to viewing distance, producing a 1–2% increment in the x-axis projection requires a change of roughly 1–2% in the viewing distance to the test object. Our monocular depth thresholds in the austere setting correspond to the 1–2% change in the z-axis distance needed to produce the 1–2% change in the x-axis projection.

In our enriched setting, the textured paper provided many marks that served as additional reference points. Thresholds in the enriched setting were therefore somewhat lower than thresholds for the austere setting. Would monocular thresholds be even lower if the rods were superimposed directly on the textured surface, or better yet, on a ruler with demarcations specifying numbered units? If an observer were estimating the position of a test rod placed on a ruler lying along the z-axis, determining where the rod fell on the ruler (e.g., where exactly the rod was sitting between the 5- and 6-cm marks) would still be imprecise for the same geometrical reasons described above. Of course, the optimum strategy for the monocular observer is simple: walk to one side of the display, so that the z-axis is directly converted into an x-axis. Reading the position from a ruler is then limited by the exquisite human sensitivity for lateral separation. In fact, in any natural setting, the optimum strategy for utilizing monocular cues is to view the depth relationships off the line of sight, so that a z-axis judgment is converted into an x-axis judgment. This strategy obviously will not work if the objects are very far away; it also takes time. Fine stereopsis provides a rapid, precise assessment of depth differences along the line of sight without any need to change position.

In the Introduction section, we noted that thresholds are usually limited by internal sources of noise. From the poor monocular thresholds, one might guess that all monocular processing is inherently noisier than binocular processing. This conclusion is incorrect. The Weber fraction for width is about 1–2%, much better than the Weber fraction for disparity, which is 5–6%. If the monocular noise is so low, then why are the monocular thresholds so bad? Keep in mind that we are not measuring monocular thresholds for incremental changes along the x-axis, i.e., width. Instead, we are measuring the ability to discriminate changes along the z-axis from the information in the monocular image. Changes along the z-axis necessarily produce angular changes in x-axis dimensions (in the projected lateral separation between the rods or in the angular subtense of the rods), but as noted above, these changes are remarkably small. In short, monocular thresholds for real objects are limited largely by the lack of physical information rather than by internal noise.

Depth in the stereoscope

In a prescient comment, Buckley and Frisby (1993) warned against drawing conclusions about depth cue combination from computer-generated displays presented in a stereoscope (see also Frisby, Buckley, & Horsman, 1995). In the current study, we were interested only in disparity sensitivity for a pair of computer-generated lines, not in perceived surface slant or cue combination. Nevertheless, three of the four observers showed higher thresholds for the stereoscope bars than for the real metal rods. This finding is surprising only in the context of our main results. For the real rods, the stereoacuity thresholds for all three observers were less than 10 arcsec, without significant practice. Apparently, the absence of monocular cues consistent with the disparity of the virtual bars interferes with fine stereoacuity. Yet our main results also show that the monocular cues provide unreliable information about depth. An ideal observer would ignore the monocular information in favor of the far more precise disparity information.

Real observers, however, are affected by cue conflict. Girshick and Banks (2009) found that disparity thresholds increased when there was a large conflict between the depth specified by texture and the depth specified by disparity. This explanation works for our larger depth intervals, because the standing disparity (12 arcmin) that defines the interval and the monocular cues (depth = zero) are in significant conflict. One would therefore predict an increasing discrepancy between the thresholds for the real rods and those for the stereoscope as the standing disparity increases, thereby increasing the conflict between the disparity and the monocular cues. This pattern is indeed observed for the second author (observer S3 in Figure 8). It is hard to see, however, how this explanation works for stereoacuity. When the standing disparity is zero and the threshold increments are small, the conflict is trivial. Nevertheless, our two naive observers had poor stereoacuity for the virtual bars and, initially, for stereoacuity measured with the random dot display.
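The ideal-observer argument in this section can be made concrete with the standard reliability-weighted (inverse-variance) cue-combination rule described by Ernst and Banks (2002) and Hillis, Watt, Landy, and Banks (2004). The Python sketch below is an illustration under stated assumptions, not the authors' analysis. It derives a monocular depth threshold from the projection geometry in the Discussion (a fractional change in distance produces about the same fractional change in projected separation). It also converts stereoacuity to depth with the small-angle disparity approximation and then weights each cue by its inverse variance. The viewing distance (1 m), rod separation (5 cm), and interocular distance (6.5 cm) are hypothetical values. Thresholds are treated as proportional to noise standard deviations, and all function names are invented for this example. Only the 1–2% Weber fraction, the sub-10-arcsec stereoacuity, and the 12-arcmin standing disparity come from the text.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi  # about 206265 arcsec per radian


def projected_angle_rad(separation_m, distance_m):
    """Angle subtended at the eye by two points separated laterally (x-axis)."""
    return 2.0 * math.atan(separation_m / (2.0 * distance_m))


def monocular_depth_threshold_m(distance_m, weber_fraction):
    """Depth step (z-axis) that changes the projected separation by the Weber fraction.

    The projection scales roughly as 1/z, so a fractional change k in the
    x-axis projection requires a fractional change of about k in distance.
    """
    return weber_fraction * distance_m


def depth_from_disparity_m(distance_m, disparity_arcsec, iod_m):
    """Depth interval implied by a relative disparity (small-angle approximation).

    Disparity eta is approximately iod * dz / z**2, so dz = eta * z**2 / iod.
    """
    eta_rad = disparity_arcsec / ARCSEC_PER_RAD
    return eta_rad * distance_m ** 2 / iod_m


def reliability_weights(sds):
    """Weights proportional to 1/sd**2, normalized to sum to 1."""
    reliabilities = [1.0 / sd ** 2 for sd in sds]
    total = sum(reliabilities)
    return [r / total for r in reliabilities]


def combine(estimates, sds):
    """Reliability-weighted average of the cue estimates and its predicted SD."""
    weights = reliability_weights(sds)
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_sd = math.sqrt(1.0 / sum(1.0 / sd ** 2 for sd in sds))
    return combined, combined_sd, weights


if __name__ == "__main__":
    # Hypothetical viewing parameters, chosen only for illustration.
    z = 1.0      # viewing distance (m), assumed
    sep = 0.05   # lateral rod separation (m), assumed
    k = 0.015    # Weber fraction for lateral separation (1-2% in the text)
    eta = 10.0   # stereoacuity (arcsec); under 10 arcsec for the real rods
    iod = 0.065  # interocular distance (m), assumed

    # Moving the target 1.5% farther shrinks its projection by about 1.5%.
    ratio = projected_angle_rad(sep, z) / projected_angle_rad(sep, z * (1 + k))
    print(f"fractional change in projection: {ratio - 1:.4f}")

    mono = monocular_depth_threshold_m(z, k)
    bino = depth_from_disparity_m(z, eta, iod)
    print(f"monocular: {mono * 1000:.2f} mm, binocular: {bino * 1000:.2f} mm")

    # Stereoscope-style conflict: a 12-arcmin standing disparity signals a
    # depth interval, while the monocular cues signal zero depth.
    interval = depth_from_disparity_m(z, 12 * 60, iod)
    combined, combined_sd, weights = combine([interval, 0.0], [bino, mono])
    print(f"weights (disparity, monocular): {weights[0]:.4f}, {weights[1]:.4f}")
    print(f"combined estimate: {combined * 1000:.2f} mm of {interval * 1000:.2f} mm")
```

With these assumed values, the monocular cue receives well under 1% of the weight. The combined estimate therefore stays close to the disparity-specified interval, in line with the statement that an ideal observer would effectively ignore the monocular information here.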
From our extensive experience measuring the properties of stereopsis in a stereoscope, we can assert that there are few observers like S4, who are unable to see any depth conveyed by virtual bar targets. Most naive observers do need practice in the stereoscope to produce fine stereoacuity thresholds. Our results here suggest that, for real objects, disparity judgments require no more practice than judgments about other dimensions.

Practical consequences

Three to five percent of the population has no stereopsis because of strabismus during early development. The greatest concern of most pediatric ophthalmologists is the loss of visual acuity in the deviating eye (amblyopia). Our results show that the loss of stereopsis is also an important concern, even for those strabismics who do not suffer from amblyopia. It greatly increases their uncertainty about the location of features along the line of sight, and based on our calculations, it seems unlikely that monocular depth information can compensate for its absence. The optimum solution for these individuals is to move around the objects, converting the z-axis information into x-axis information. However, these movements take time and are only useful for objects that are fairly close. Of course, there are surgeons and professional athletes who manage superbly without stereopsis. It would be interesting to know what information these extraordinary individuals use to compensate for the loss of the disparity information. For less gifted individuals, the loss of stereopsis certainly hampers visual processing of object location and shape. Happily, extensive training and some types of treatment for amblyopia not only improve the acuity of the amblyopic eye but also lead to the recovery of stereopsis (Levi & Li, 2009; Li, Provost, & Levi, 2007).

Acknowledgments

This research was supported by National Eye Institute Grants R01-EY018875 and R01-EY06644 and by The Smith-Kettlewell Eye Research Institute. We thank Laurie Wilcox, James Elder, Preeti Verghese, Andrew Glennerster, Justin Ales, and Christopher McKee for valuable discussions about these results.

Commercial relationships: none.
Corresponding author: Suzanne P. McKee.
Email: [email protected].
Address: 2318 Fillmore St., San Francisco, CA 94115, USA.

References

Allison, R. S., Gillam, B. J., & Vecellio, E. (2009). Binocular depth discrimination and estimation beyond interaction space. Journal of Vision, 9(1):10, 1–14, http://www.journalofvision.org/content/9/1/10, doi:10.1167/9.1.10.
Backus, B. T., Banks, M. S., Van Ee, R., & Crowell, J. A. (1999). Horizontal and vertical disparity, eye position, and stereoscopic slant perception. Vision Research, 39, 1143–1170.
Buckley, D., & Frisby, J. P. (1993). Interaction of stereo, texture, and outline cues in the shape perception of three-dimensional ridges. Vision Research, 33, 919–933.
Burbeck, C. A. (1987). Position and spatial frequency in large-scale localization judgments. Vision Research, 27, 417–428.
Charman, W. N., & Whitefoot, H. (1977). Pupil diameter and the depth-of-field of the human eye as measured by laser speckle. Optica Acta, 24, 1211–1216.
Durgin, F. H., Proffitt, D. R., Olson, T. J., & Reinke, K. S. (1995). Comparing depth from motion with depth from binocular disparity. Journal of Experimental Psychology: Human Perception and Performance, 21, 679–699.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
Fahle, M., & Westheimer, G. (1988). Local and global factors in disparity detection of rows of points. Vision Research, 28, 171–178.
Fendick, M., & Westheimer, G. (1983). Effects of practice and the separation of test targets on foveal and peripheral stereoacuity. Vision Research, 23, 145–150.
Frisby, J. P., Buckley, D., & Duke, P. A. (1996). Evidence for good recovery of lengths of real objects seen with natural stereo viewing. Perception, 25, 129–154.
Frisby, J. P., Buckley, D., & Horsman, J. M. (1995). Integration of stereo, texture and outline cues during pinhole viewing of real ridge-shaped objects and stereograms of ridges. Perception, 24, 181–198.
Gantz, L., Patel, S. S., Chung, S. T. L., & Harwerth, R. S. (2007). Mechanisms of perceptual learning of depth discrimination in random dot stereograms. Vision Research, 47, 2170–2178.
Gårding, J., Porrill, J., Mayhew, J. E., & Frisby, J. P. (1995). Stereopsis, vertical disparity and relief transformations. Vision Research, 35, 703–722.
Gibson, J. J. (1950). The perception of the visual world (p. 108). Cambridge, MA: The Riverside Press.
Girshick, A. R., & Banks, M. S. (2009). Probabilistic combination of slant information: Weighted averaging and robustness as optimal percepts. Journal of Vision, 9(9):8, 1–20, http://www.journalofvision.org/content/9/9/8, doi:10.1167/9.9.8.
Gulick, W. L., & Lawson, R. B. (1976). Human stereopsis. New York: Oxford University Press.
Hillis, J. M., Watt, S. J., Landy, M. S., & Banks, M. S. (2004). Slant from texture and disparity cues: Optimal cue combination. Journal of Vision, 4(12):1, 967–992, http://www.journalofvision.org/content/4/12/1, doi:10.1167/4.12.1.
Knill, D. C., & Saunders, J. A. (2003). Do humans optimally integrate stereo and texture information for judgments of surface slant? Vision Research, 43, 2539–2558.
Kumar, T., & Glaser, D. A. (1993). Initial performance, learning and observer variability for hyperacuity tasks. Vision Research, 33, 2287–2300.
Landy, M. S., Maloney, L. T., Johnston, E. B., & Young, M. (1995). Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35, 389–412.
LeClair, A., & Durgin, F. H. (2008). Depth interval perception: Comparing binocular stereopsis with motion parallax in “action space” [Abstract]. Journal of Vision, 8(6):857, 857a, http://www.journalofvision.org/content/8/6/857, doi:10.1167/8.6.857.
Levi, D. M., & Li, R. W. (2009). Perceptual learning as a potential treatment for amblyopia: A mini-review. Vision Research, 49, 2535–2549.
Li, R. W., Provost, A., & Levi, D. M. (2007). Extended perceptual learning results in substantial recovery of positional acuity and visual acuity in juvenile amblyopia. Investigative Ophthalmology and Visual Science, 48, 5046–5051.
Loomis, J. M., Da Silva, J. A., Fujita, N., & Fukusima, S. S. (1996). Visual space perception and visually directed action. Journal of Experimental Psychology: Human Perception and Performance, 18, 906–921.
Loomis, J. M., Philbeck, J. W., & Zahorik, P. (2002). Dissociation between location and shape in visual space. Journal of Experimental Psychology: Human Perception and Performance, 28, 1202–1212.
McKee, S. P. (1983). The spatial requirements for fine stereoacuity. Vision Research, 23, 191–198.
McKee, S. P., Levi, D. M., & Bowne, S. F. (1990). The imprecision of stereopsis. Vision Research, 30, 1763–1779.
McKee, S. P., Welch, L., Taylor, D. G., & Bowne, S. F. (1990). Finding the common bond: Stereoacuity and the other hyperacuities. Vision Research, 30, 879–891.
Mitchison, G. J., & McKee, S. P. (1990). Mechanisms underlying the anisotropy of stereoscopic tilt perception. Vision Research, 30, 1781–1791.
Morgan, M. J., Watamaniuk, S. N., & McKee, S. P. (2000). The use of an implicit standard for measuring discrimination thresholds. Vision Research, 40, 2341–2349.
O’Toole, A., & Kersten, D. (1992). Learning to see random dot stereograms. Perception, 21, 227–243.
Sowden, P., Davies, I., Rose, D., & Kaye, M. (1996). Perceptual learning of stereoacuity. Perception, 25, 1043–1052.
Svarverud, E., Gilson, S. J., & Glennerster, A. (2010). Cue combination for 3D location judgements. Journal of Vision, 10(1):5, 1–13, http://www.journalofvision.org/content/10/1/5, doi:10.1167/10.1.5.
Watt, S. J., Akeley, K., Ernst, M. O., & Banks, M. S. (2005). Focus cues affect perceived depth. Journal of Vision, 5(10):7, 834–862, http://www.journalofvision.org/content/5/10/7, doi:10.1167/5.10.7.
Wheeler, D. G. (1982). The role of motion parallax in the perception of distance, depth and size. (Dissertation, Rutgers-Newark, The State University of New Jersey)
Yap, Y. L., Levi, D. M., & Klein, S. A. (1987). Peripheral hyperacuity: 3-Dot bisection scales to a single factor from 0 to 10 deg. Journal of the Optical Society of America A, 4, 1554–1561.
Zabulis, X., & Backus, B. T. (2004). Starry nights: A texture devoid of depth cues. Journal of the Optical Society of America A, 21, 2049–2060.