An algorithm to estimate building heights from Google street-view imagery
using single view metrology across a representational state transfer system
Conference Paper in Proceedings of SPIE - The International Society for Optical Engineering · April 2016
DOI: 10.1117/12.2224312
2 authors:
Elkin David Diaz plata Henry Arguello
Industrial University of Santander Industrial University of Santander
All content following this page was uploaded by Elkin David Diaz plata on 16 August 2019.
An algorithm to estimate building heights from Google
street-view imagery using single view metrology across a
representational state transfer system
Elkin Dı́az and Henry Arguello
Department of Computer Science, Universidad Industrial de Santander, Bucaramanga,
Colombia, 680002.
ABSTRACT
Urban ecosystem studies require monitoring, control and planning to analyze building density, urban density,
urban planning, atmospheric modeling and land use. In urban planning, there are many methods for building
height estimation using optical remote sensing images. These methods, however, depend heavily on sun
illumination and cloud-free weather. In contrast, high resolution synthetic aperture radar provides images
independent of time of day and weather conditions, although these images rely on special hardware and expensive
acquisition. Most of the biggest cities around the world have been photographed by Google Street View under
different conditions. Thus, thousands of images from the principal streets of a city can be accessed online. The
availability of this and similar rich city imagery, such as StreetSide from Microsoft, represents a huge opportunity
in computer vision because these images can be used as input in many applications such as 3D modeling,
segmentation, recognition and stereo correspondence. This paper proposes a novel algorithm to estimate building
heights using public Google Street-View imagery. The objective of this work is to obtain thousands of
geo-referenced images from Google Street-View using a representational state transfer system, and to estimate the
average building height in each image using single view metrology. Furthermore, the resulting measurements and
image metadata are used to derive a layer of heights in a Google map available online. The experimental results
show that the proposed algorithm can estimate an accurate average building height map from thousands of Google
Street-View images of any city.
Keywords: Single view metrology, Google Street-View, building height estimation, representational state
transfer, urban planning.
1. INTRODUCTION
Building detection and analysis is a fundamental task in urban monitoring, control and planning [1]. There are
many methods for building height extraction using optical remote sensing images [2–5]. These methods, however,
depend heavily on sun illumination and cloud-free weather. In contrast, synthetic aperture radar (SAR) provides
images independent of time of day and weather conditions. High resolution (HR) SAR images are used to
identify anthropogenic structures such as individual buildings, and thus accurate building heights can be
estimated [6]. Unfortunately, these techniques are expensive: HR-SAR data acquisition is costly and relies on
special hardware.
The availability of Google Street-View and similar rich imagery, such as StreetSide [7] from Microsoft, represents
a huge opportunity in computer vision, since these images can be used for several applications such as building
height estimation. These systems provide images from most of the biggest cities around the world. In particular,
Google Street-View imagery has been acquired under different conditions, including different vehicles and weather.
Thus, thousands of images providing distinct views of the streets of a city are accessible online [8].
Further author information:
E-mail: elkin.diaz1@[Link]
E-mail: henarfu@[Link]
Dimensional Optical Metrology and Inspection for Practical Applications V,
edited by Kevin G. Harding, Song Zhang, Proc. of SPIE Vol. 9868, 98680A
© 2016 SPIE · CCC code: 0277-786X/16/$18 · doi: 10.1117/12.2224312
One of the challenges of Google Street View imagery is that the data is uncalibrated, with widely variable
and uncontrolled illumination, resolution, and quality. Developing computer vision algorithms that work
correctly with such imagery has been a major challenge for the research community.
Single View Metrology [9] describes methods for measuring aspects of the affine 3D geometry of a scene from
a single perspective image. These methods are independent of the internal parameters of the camera, such as
focal length, aspect ratio, principal point, and skew. Since the camera parameters are unknown in the Google
Street-View images, single view metrology is well suited to these images.
This paper proposes a novel algorithm to estimate building heights using public Google Street-View imagery
and Single View Metrology. Figure 1 shows a general architecture scheme in which a representational state
transfer system (RESTS) retrieves the Google Street-View imagery, then the proposed algorithm processes blocks of
the city area, and creates a layer of heights that can be accessed online.
[Figure 1 diagram: the Google Static Maps API, the Google Maps Road API and the Google Street View Image API
feed the heights estimation algorithm running on blocks, which produces a layer of heights in a Google map.]
Figure 1: General architecture scheme in which a set of Google APIs is used to collect a set of Google Street-View
images. Then, the proposed algorithm estimates the building heights and finally a layer map of heights is generated.
2. ALGORITHM FOR BUILDING HEIGHT ESTIMATION
The goal of this work is to address the issues involved in estimating the height of buildings from Google Street-
View images using Single View Metrology. The method proposed uses a representational state transfer system
(RESTS) to collect a set of Google Street-View images. These images are then used to estimate the average
height of the area.
2.1 Representational state transfer system (RESTS)
Google APIs are a set of application programming interfaces developed by Google [10], which allow
communication with Google services through RESTS [11]. In this work, the Google Static Maps API (SMA) and the
Google Street View Image API (SIA) are used to determine a city area and collect the corresponding Google
Street-View images. All of these APIs expose request methods based on URL parameters sent through a
standard HTTP request. In particular, SIA takes a geographic coordinate and its attributes, namely location
(latitude and longitude), heading and pitch, and returns a Google Street View image. The heading parameter
indicates the compass heading of the camera, with accepted values from 0 to 360: 0 and 360 indicate North, 90
indicates East, and 180 indicates South. The pitch parameter specifies the up or down angle of the camera
relative to the Street View vehicle.
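The SIA request described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `sia_request_url`, the placeholder key and the fixed 640x640 size are assumptions made here, and the block only builds the request URL without sending it.

```python
from urllib.parse import urlencode

# Public endpoint of the Google Street View Image (Static) API.
SIA_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def sia_request_url(lat, lon, heading, pitch=0, size=640, api_key="YOUR_API_KEY"):
    """Build the HTTP GET URL for one Street View image (hypothetical helper)."""
    if not 0 <= heading <= 360:
        raise ValueError("heading must lie between 0 and 360 degrees")
    params = {
        "size": f"{size}x{size}",      # image size in pixels
        "location": f"{lat},{lon}",    # latitude,longitude
        "heading": heading,            # 0/360 = North, 90 = East, 180 = South
        "pitch": pitch,                # up/down angle of the camera
        "key": api_key,                # placeholder, replace with a real key
    }
    return f"{SIA_ENDPOINT}?{urlencode(params)}"


# Example: a view facing East from a point in central Madrid.
print(sia_request_url(40.4168, -3.7038, heading=90))
```

Because every attribute travels as a URL parameter, changing the pitch or heading only requires rebuilding the URL and repeating the request.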
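Using SMA to obtain the map image, and knowing the geographic coordinates of its four corners, the affine spatial
referencing matrix can be constructed as
$$\begin{bmatrix} \mathrm{lon} & \mathrm{lat} \end{bmatrix} = \begin{bmatrix} \mathrm{row} & \mathrm{col} & 1 \end{bmatrix} R, \qquad (1)$$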
where R is a 3 × 2 affine referencing matrix (the size required for the product in Eq. (1) to yield a longitude and
latitude pair) that transforms the row and column subscripts of an image or regular data grid to 2D map coordinates
or to geographic coordinates (longitude and geodetic latitude). With R, pixel subscripts (row, column) can be
converted to and from map coordinates (longitude, latitude).
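The mapping of Eq. (1) can be illustrated with a short Python sketch. It rests on assumptions that are not stated in the paper: the map image is north-up and axis-aligned, subscripts are zero-based, the corner coordinates refer to the centres of the corner pixels, and latitude is treated as linear in the row index (reasonable only over a small area). The helper names are hypothetical.

```python
def make_refmat(lon_west, lat_north, lon_east, lat_south, n_rows, n_cols):
    """Build a 3x2 referencing matrix R so that [lon lat] = [row col 1] R."""
    dlon = (lon_east - lon_west) / (n_cols - 1)    # degrees per column
    dlat = (lat_south - lat_north) / (n_rows - 1)  # degrees per row (negative)
    return [[0.0, dlat],
            [dlon, 0.0],
            [lon_west, lat_north]]


def pixel_to_lonlat(row, col, R):
    """Apply Eq. (1): multiply the row vector [row col 1] by R."""
    v = (row, col, 1.0)
    lon = sum(v[k] * R[k][0] for k in range(3))
    lat = sum(v[k] * R[k][1] for k in range(3))
    return lon, lat


def lonlat_to_pixel(lon, lat, R):
    """Inverse mapping, valid for the axis-aligned R built above."""
    col = (lon - R[2][0]) / R[1][0]
    row = (lat - R[2][1]) / R[0][1]
    return row, col


R = make_refmat(-3.71, 40.42, -3.70, 40.41, 640, 640)
print(pixel_to_lonlat(0, 0, R))      # north-west corner of the map image
print(pixel_to_lonlat(639, 639, R))  # south-east corner of the map image
```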
On the other hand, the area of a city can be represented as a set of blocks given by
$$\mathrm{City} = \{B_1, B_2, \ldots, B_i, \ldots, B_m\}, \quad \text{where } B_i = \{I_{i1}, I_{i2}, \ldots, I_{ij}, \ldots, I_{in_i}\}, \qquad (2)$$
with $I_{ij}$ being the j-th Google Street-View image of the i-th block, and $n_i$ the number of images in that
block. The image set of each block $B_i$ is collected using SIA. For each image, the average height is estimated
using the single view metrology method presented in the next section.
2.2 Building height estimation based on single view metrology
The single view metrology method [9] assumes that the vanishing line of a reference plane (in this case, the ground
in the scene) and the vanishing point of the vertical direction can be determined from the image. The approach
consists of measuring the distance between the top of the building and the ground. The method employs a
central projection camera model. In this model, the image plane is located at some distance from a point called
the camera center. A point in space is mapped onto the image plane by moving it along the straight line that joins
it to the camera center until it meets the image plane [12]. The basic geometry of the vanishing line and the
vanishing point of the plane is illustrated in Fig. 2.
Figure 2: Basic geometry: The plane’s vanishing line l is the intersection of the image plane with a plane parallel
to the reference plane and passing through the camera centre. The vanishing point v is the intersection of the
image plane with a line parallel to the reference direction through the camera centre [9].
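The paper does not state how the vanishing point and vanishing line are computed, so the following Python sketch only illustrates one standard projective-geometry construction under that caveat: in homogeneous coordinates, the line through two image points is their cross product, and the intersection of two image lines is again a cross product. Intersecting the images of two parallel scene edges gives a vanishing point, and joining two vanishing points of ground-plane directions gives the vanishing line. All function names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(l1, l2):
    """Homogeneous intersection point of two image lines."""
    return cross(l1, l2)


def vanishing_line(v1, v2):
    """Line joining two homogeneous vanishing points of the ground plane."""
    return cross(v1, v2)


# Two image segments that are parallel in the scene meet at a vanishing point.
edge_a = line_through((0.0, 0.0), (2.0, 2.0))
edge_b = line_through((0.0, 4.0), (2.0, 3.0))
vp = intersection(edge_a, edge_b)
print(vp[0] / vp[2], vp[1] / vp[2])  # inhomogeneous image coordinates
```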
The height of a building can be estimated based on vanishing points and the camera height [9]. The distance
to be measured (in the reference direction) is assumed to be between two parallel planes, specified by the image
points x and x′ as shown in Fig. 3. Upper case letters (X) indicate quantities in space and lower case letters
(x) indicate image quantities. To start, an affine coordinate system XYZ is defined in space. Let the origin
of the coordinate frame lie on the ground, with the X and Y axes spanning the plane. The Z axis is the reference
direction, which can thus be any direction not parallel to the ground plane. The image coordinate system is the
usual xy affine image frame, and a point X in space is projected to the image point x via a 3 × 4 projection
matrix P as
$$\mathbf{x} = P\mathbf{X} = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 & \mathbf{p}_4 \end{bmatrix} \mathbf{X}, \qquad (3)$$
Figure 3: Distance between two planes relative to the distance of the camera centre from one of the two planes:
(a) in the real world; (b) in the image. The point x on the plane π corresponds to the point x′ on the plane π′ [9].
where $\mathbf{x}$ and $\mathbf{X}$ are homogeneous vectors of the form $\mathbf{x} \equiv (x, y, w)^{\top}$,
$\mathbf{X} \equiv (X, Y, Z, W)^{\top}$, and “$\equiv$” means equality up to scale.
If the vanishing points are denoted $\mathbf{v}_X$, $\mathbf{v}_Y$ and $\mathbf{v}$ for the X, Y and Z directions
respectively, then the first three columns of P are the vanishing points $\mathbf{v}_X = \mathbf{p}_1$,
$\mathbf{v}_Y = \mathbf{p}_2$ and $\mathbf{v} = \mathbf{p}_3$, and the final column of P is the projection of the
origin of the world coordinate system, $\mathbf{o} = \mathbf{p}_4$. Since the choice of coordinate frame has the
X and Y axes in the reference plane, $\mathbf{p}_1 = \mathbf{v}_X$ and $\mathbf{p}_2 = \mathbf{v}_Y$ are two
distinct points on the vanishing line. The vanishing line is denoted by $\mathbf{l}$, and to emphasize that the
vanishing points $\mathbf{v}_X$ and $\mathbf{v}_Y$ lie on it, they are denoted $\mathbf{l}_1^{\perp}$ and
$\mathbf{l}_2^{\perp}$, with $\mathbf{l}_i^{\perp} \cdot \mathbf{l} = 0$. The final column, the origin of the
coordinate system, must not lie on the vanishing line, otherwise all three columns would be points on the vanishing
line, and thus they would not be linearly independent. Hence, $\mathbf{p}_4$ is set to be
$\mathbf{p}_4 = \mathbf{l}/\lVert\mathbf{l}\rVert = \bar{\mathbf{l}}$. Therefore, the final parameterization of
the projection matrix P can be written as
$$P = \begin{bmatrix} \mathbf{l}_1^{\perp} & \mathbf{l}_2^{\perp} & \alpha\mathbf{v} & \bar{\mathbf{l}} \end{bmatrix}, \qquad (4)$$
where α is a scale factor. Now, suppose the camera center (see Fig. 2) is
$\mathbf{C} = (X_c, Y_c, Z_c, W_c)^{\top}$. Given that $P\mathbf{C} = \mathbf{0}$, we can write
$$P\mathbf{C} = \mathbf{p}_1 X_c + \mathbf{p}_2 Y_c + \mathbf{p}_3 Z_c + \mathbf{p}_4 W_c = \mathbf{0}. \qquad (5)$$
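Using Cramer's rule, the solution to the set of equations in Eq. (5), with α unknown, is given by
$$\begin{aligned}
X_c &= -\det\begin{bmatrix} \mathbf{p}_2 & \mathbf{v} & \mathbf{p}_4 \end{bmatrix}, \\
Y_c &= \det\begin{bmatrix} \mathbf{p}_1 & \mathbf{v} & \mathbf{p}_4 \end{bmatrix}, \\
\alpha Z_c &= -\det\begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_4 \end{bmatrix}, \\
W_c &= \det\begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{v} \end{bmatrix}.
\end{aligned} \qquad (6)$$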
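As a numerical check of the Cramer's rule solution, the following Python sketch computes the homogeneous camera centre from the four columns of a generic 3 × 4 projection matrix and verifies that PC = 0, as required by Eq. (5). The column values are arbitrary illustrative numbers, not data from the paper, and the function names are hypothetical. With the third column written as p3 = αv, dividing every component by α gives the form of Eq. (6).

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def camera_centre(p1, p2, p3, p4):
    """Homogeneous null vector C of P = [p1 p2 p3 p4] via Cramer's rule."""
    return (-det3(p2, p3, p4),
            det3(p1, p3, p4),
            -det3(p1, p2, p4),
            det3(p1, p2, p3))


def project(columns, X):
    """Compute P X, where P is given by its four columns."""
    return tuple(sum(columns[j][i] * X[j] for j in range(4)) for i in range(3))


# Arbitrary columns, used only to exercise the formulas.
cols = [(2.0, 0.0, 1.0), (0.0, 3.0, 1.0), (1.0, 1.0, 4.0), (5.0, -2.0, 3.0)]
C = camera_centre(*cols)
print(C, project(cols, C))  # the projection of C is the zero vector
```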
Using $Z_c = 2.5$ m, which is the approximate height of the camera mounted on the Google cars according to
Google Earth, it is possible to find α. For a given building standing on the ground, the height Z can then be
found by
$$Z = -\frac{\lVert \mathbf{x} \times \mathbf{x}' \rVert}{(\mathbf{p}_4 \cdot \mathbf{x})\,\lVert \mathbf{p}_3 \times \mathbf{x}' \rVert}, \qquad (7)$$
where $\lVert \mathbf{a} \rVert$ is the L2 norm of the vector $\mathbf{a}$ and the operator × is the cross product.
The proof of Eq. (7) can be found in [9].
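A minimal Python sketch of the height computation is given below. It is not the authors' MATLAB code. It assumes that the 2.5 m figure is the metric camera height (Zc/Wc in homogeneous terms), so that Eq. (6) gives α · 2.5 = −det[p1 p2 p4] / det[p1 p2 v], and it substitutes this α into Eq. (7). Absolute values are used because homogeneous vectors carry an arbitrary sign, and the normalization of the vanishing line cancels in the ratio. The function and argument names are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    """Euclidean (L2) norm of a 3-vector."""
    return math.sqrt(dot(a, a))


def building_height(base, top, v_x, v_y, v, camera_height=2.5):
    """Height of a vertical segment from its homogeneous image end points.

    base, top : image points x (on the ground) and x' (top of the building)
    v_x, v_y  : two vanishing points of ground-plane directions
    v         : vanishing point of the vertical (reference) direction
    """
    l = cross(v_x, v_y)  # vanishing line of the ground plane (any scale)
    ratio = (abs(dot(l, v)) * norm(cross(base, top))
             / (abs(dot(l, base)) * norm(cross(v, top))))
    return camera_height * ratio


# Synthetic check: pinhole camera 2.5 m above the ground, looking horizontally.
f, h_cam = 500.0, 2.5
base = (f * 3.0, f * (0.0 - h_cam), 20.0)   # foot of a building 20 m ahead
top = (f * 3.0, f * (12.0 - h_cam), 20.0)   # top of the same 12 m building
print(building_height(base, top, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)))
```

In this synthetic configuration the vertical vanishing point lies at infinity, which corresponds to a camera pitch of 0.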
Figure 4: Three possible cases in the system when the parameter pitch is 0. Google Street View Image API.
December 21, 2015: (a) first case, the sky can be seen; (b) second case, the sky cannot be seen; (c) both cases,
the sky can be seen in only part of the image.
Algorithm 1 summarizes the implementation of the proposed solution to estimate building height from Google
Street-View using single view metrology. Assuming that a Google Street-View image with the correct heading has
already been obtained, so that the building is present, the sky must be visible in the image in order to estimate
the height. However, this is not always the case. More specifically, three cases are possible: the sky can
be seen, the sky cannot be seen, or both cases occur in the same image, as illustrated in Fig. 4.
To address all the cases, the algorithm divides the image into 8 partitions of 80 pixels each, given that the size
of the image is always 640x640 pixels, and checks in every partition whether the sky can be detected, using the
gradient-based method of [13]. When the sky is not detected in three consecutive partitions, it is assumed that
they belong to a building, and the pitch parameter is modified until the sky can be detected. Once the sky is
detected, the building height can be estimated. Ground detection is performed using the Hough transform [14].
Finally, the average height is estimated using Eq. (7).
Algorithm 1 Estimating average buildings height from a Google Street-View image
Input: Google Street-View image (StreetImage), with correct heading and geographic coordinate (GeoCoordinate).
Output: Average height of the buildings in the picture (MeanHeight).
Detect ground region.
W = length(StreetImage)
pitch = 0
for i = 1 to W/80 do
    while the sky cannot be detected in partition i do
        pitch = pitch + 5
        StreetImage = SIA(GeoCoordinate, heading, pitch)
    end while
    Z(i) = −‖x × x′‖ / ((p4 · x) ‖p3 × x′‖)    (Eq. 7)
end for
MeanHeight = Mean(Z)
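The control flow of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the three callables `fetch_image`, `sky_detected` and `strip_height` are hypothetical stand-ins for the SIA request, the sky detector of [13] and the height computation of Eq. (7); partitions are taken to be vertical strips of columns; and the `max_pitch` guard is added here only to guarantee termination.

```python
from statistics import mean


def average_height(fetch_image, sky_detected, strip_height,
                   width=640, strip=80, pitch_step=5, max_pitch=90):
    """Average building height over the partitions of one Street View image.

    fetch_image(pitch)               -> image requested at that pitch
    sky_detected(image, cols)        -> True if sky is visible in the strip
    strip_height(image, cols, pitch) -> height estimate for the strip
    """
    pitch = 0
    image = fetch_image(pitch)
    heights = []
    for i in range(width // strip):
        cols = (i * strip, (i + 1) * strip)  # column range of partition i
        # Raise the camera until the sky appears above the building.
        while not sky_detected(image, cols) and pitch + pitch_step <= max_pitch:
            pitch += pitch_step
            image = fetch_image(pitch)
        heights.append(strip_height(image, cols, pitch))
    return mean(heights)


# Toy run with stub callables: the "image" is just the pitch it was taken at,
# and the sky becomes visible once the pitch reaches 10 degrees.
requested = []
result = average_height(
    fetch_image=lambda pitch: requested.append(pitch) or pitch,
    sky_detected=lambda image, cols: image >= 10,
    strip_height=lambda image, cols, pitch: float(cols[0] // 80 + 1),
)
print(result, requested)
```

As in the listing, the pitch is not reset between partitions, so later partitions reuse the last image that was requested.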
3. EXPERIMENTAL RESULTS
Experiments were conducted to test the performance of the proposed algorithm. In particular, approximately
40,000 Google Street-View images of an area of Madrid, Spain, were chosen. The dataset contains a wide variety
of outdoor scenes and buildings. Images were collected using Google APIs. The intrinsic and extrinsic camera
parameters are not provided; instead, the camera height was used to find α, as described in Section 2.2. The
matrix P is computed once for all Street View images with pitch=0, where the perspective error is low, and once
for pitch=25, where the perspective error is high.
Figure 5: Sky and ground detection results. Source: Madrid. Google Street View Image API. January 21, 2016:
(a) The sky and ground detection with a correct position; (b) The ground detection moved 3 meters up from
the correct one.
Figure 6: Estimation algorithm results. Source: New York and Manchester. Google Street View Image API.
January 21, 2016: (a) 590 Madison Avenue, New York: estimated height 200.8 m, actual height 184 m;
(b) Humphries Court, Manchester: estimated height 36.8 m, actual height 38.4 m.
The algorithm was implemented in MATLAB running on a 2.6-GHz Intel processor. Since the algorithm
uses RESTS, its running time depends on the available internet speed. However, this process can
also be performed using batch processing.
Figure 5 shows the results of the algorithms for ground and sky detection. The main error sources are also
illustrated, since if the sky or the ground are not correctly detected the measured height will be affected.
In order to quantitatively analyze the results, the estimation algorithm was applied to buildings of known height.
The results are shown in Fig. 6, where the relationship between height error and perspective error is noticeable.
A random sample of 100 images was chosen in order to validate the algorithm. The reference height for the random
sample was estimated manually based on the storey height. The results are shown in Fig. 7, where the estimated
and manual heights can be compared; the average error is 2.48 m. The height layer is presented in
Fig. 8, where heights are estimated over the entire study area of Madrid.
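The error measures used in this section can be made concrete with a short Python sketch. It uses only the two buildings of known height reported in Fig. 6; the function names are hypothetical, and the 2.48 m figure for the Madrid sample cannot be reproduced here because the per-building values are not listed.

```python
def height_errors(estimated, actual):
    """Absolute error (m) and relative error (%) of one height estimate."""
    abs_err = abs(estimated - actual)
    return abs_err, 100.0 * abs_err / actual


def mean_absolute_error(pairs):
    """Mean absolute error over (estimated, actual) pairs, in metres."""
    return sum(abs(e - a) for e, a in pairs) / len(pairs)


# (estimated, actual) heights in metres, taken from Fig. 6.
fig6 = {
    "590 Madison Avenue, New York": (200.8, 184.0),
    "Humphries Court, Manchester": (36.8, 38.4),
}
for name, (est, act) in fig6.items():
    abs_err, rel_err = height_errors(est, act)
    print(f"{name}: {abs_err:.1f} m ({rel_err:.1f} %)")
```

For these two buildings the absolute errors are 16.8 m (about 9.1 %) and 1.6 m (about 4.2 %), which is consistent with the larger error observed for the taller building, where the perspective error is higher.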
[Plot: “Estimation Algorithm vs. Estimated manually”; x-axis: Building; series: Estimation Algorithm and
Estimated manually.]
Figure 7: Results for Madrid, the mean error was 2.48 m.
Figure 8: Layer of heights created over Google Maps. Madrid.
4. SUMMARY AND CONCLUSIONS
An algorithm to estimate building height from Google Street-View images has been presented. The proposed
algorithm uses the single view metrology method for the estimation and Google APIs to obtain the images. A
layer of the average heights for each block is also generated. The accuracy of the proposed algorithm is verified
with several experiments and an average error of 2.48 m was obtained.
ACKNOWLEDGMENTS
The authors gratefully acknowledge the Vicerrectoría de Investigación y Extensión of Universidad Industrial de
Santander for supporting this work registered under the project titled “Extracción y separación de información
espectral en imágenes obtenidas de forma remota usando muestreo compresivo y aperturas codificadas de color”
with VIE code 1804.
REFERENCES
[1] Rashed, T. and Jürgens, C., “Remote Sensing of Urban and Suburban Areas,” Remote Sensing and Digital
Image Processing 10(42), 181–192 (2010).
[2] Elbakary, M. I. and Iftekharuddin, K. M., “Shadow Detection of Man-Made Buildings in High-Resolution
Panchromatic Satellite Images,” IEEE Transactions on Geoscience and Remote Sensing 52(9) (2013).
[3] Izadi, M. and Saeedi, P., “Three-Dimensional Polygonal Building Model Estimation From Single Satellite
Images,” IEEE Transactions on Geoscience and Remote Sensing 50(6), 2254–2272 (2012).
[4] Ok, A. O., Senaras, C., and Yuksel, B., “Automated detection of arbitrarily shaped buildings in complex
environments from monocular VHR optical satellite imagery,” IEEE Transactions on Geoscience and Remote
Sensing 51(3), 1701–1717 (2013).
[5] Stankov, K. and He, D.-c., “Building Detection in Very High Spatial Resolution Multispectral Images Using
the Hit-or-Miss Transform,” IEEE Geoscience and Remote Sensing Letters 10(1), 86–90 (2013).
[6] Wang, Z., Jiang, L., Lei, L., and Yu, W., “Building Height Estimation from High Resolution SAR Imagery
via Model-Based Geometrical Structure Prediction,” Progress In Electromagnetics Research , 11–24.
[7] Microsoft Corporation, “Streetside,” [Link] (2009).
[8] Anguelov, D., Dulong, C., Filip, D., Frueh, C., Lafon, S., Lyon, R., Ogale, A., Vincent, L., and Weaver, J.,
“Google street view: Capturing the world at street level,” IEEE Computer Society (6), 32–38 (2010).
[9] Criminisi, A., Reid, I., and Zisserman, A., “Single view metrology,” International Journal of Computer
Vision 40(2), 123–148 (2000).
[10] Google Inc., “Google APIs Explorer,” [Link] (2009).
[11] Fielding, R. T., Architectural styles and the design of network-based software architectures, PhD thesis,
University of California, Irvine (2000).
[12] Poling, B., “A Tutorial On Camera Models,” 1–10.
[13] Shen, Y. and Wang, Q., “Sky Region Detection in a Single Image for Autonomous Ground Robot
Navigation,” International Journal of Advanced Robotic Systems, 1–13.
[14] Duda, R. O. and Hart, P. E., “Use of the hough transformation to detect lines and curves in pictures,”
Commun. ACM 15, 11–15 (Jan. 1972).