Literature Review Research

Copyright © All Rights Reserved

2. Literature Review

The application of remote sensing technology has advanced significantly in recent years, especially in object segmentation and height extraction. The combination of high-resolution aerial imagery and LiDAR (Light Detection and Ranging) has provided a powerful means of gathering three-dimensional data. The literature on this topic spans a range of approaches, from conventional geospatial methods to recent deep learning models. This section reviews important works in the field and places the current study within that broader body of research.

2.1 Traditional Methods for Height Extraction

Conventional geospatial approaches to extracting object heights from remote sensing data have relied on Digital Elevation Models (DEMs) and Digital Surface Models (DSMs). According to Weinstein et al. (2020), DSMs include the heights of objects such as buildings, trees, and other structures, whereas DEMs depict the bare-earth surface. Subtracting the DEM from the DSM gives the height of objects above the ground. The resulting raster is often called a normalized DSM (nDSM). Although this method is simple, it frequently requires substantial manual data preprocessing, especially for complex landscapes.
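The DSM minus DEM subtraction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the workflow of any cited study. It assumes both rasters are already co-registered on the same grid in metres. It also assumes a nodata value of -9999 and treats negative differences as noise. Both assumptions are for illustration only.

```python
import numpy as np

def normalized_dsm(dsm: np.ndarray, dem: np.ndarray, nodata: float = -9999.0) -> np.ndarray:
    """Return per-pixel height above ground (DSM - DEM).

    Assumes both rasters share the same grid and units (metres).
    Cells where either input equals `nodata` are set to NaN.
    """
    if dsm.shape != dem.shape:
        raise ValueError("DSM and DEM must be on the same grid")
    ndsm = dsm.astype(float) - dem.astype(float)
    invalid = (dsm == nodata) | (dem == nodata)
    ndsm[invalid] = np.nan
    # Negative heights are treated as noise (e.g. interpolation error) and clamped to 0;
    # NaN cells are left untouched because NaN < 0 evaluates to False.
    return np.where(ndsm < 0, 0.0, ndsm)

# Toy 2 x 2 example with hypothetical elevations in metres.
dsm = np.array([[105.0, 112.5], [101.0, -9999.0]])
dem = np.array([[100.0, 100.5], [101.2, 100.0]])
ndsm = normalized_dsm(dsm, dem)
# ndsm -> 5.0, 12.0, 0.0 (a -0.2 m artefact clamped), NaN (nodata cell)
```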

One of the main drawbacks of these conventional techniques is that they cannot automatically segment individual objects within the data. Every object, such as a tree or building, must be identified manually. This adds effort and increases the risk of human error. These techniques also struggle to separate overlapping objects, such as adjacent buildings in an urban environment or trees in a dense forest (Wang et al., 2017). Machine learning and deep learning have emerged as solutions to these problems because they automate the segmentation and classification of objects.

2.2 Machine Learning Approaches

Applying machine learning (ML) and deep learning (DL) techniques to LiDAR and aerial photography data has transformed remote sensing. According to Qi et al. (2017), Convolutional Neural Networks (CNNs) were among the first DL models to be used extensively in remote sensing, especially for object recognition and classification. CNNs recognize objects with high accuracy because they are well suited to grid-like data structures such as aerial images. LiDAR point clouds, however, are unordered and irregularly spaced, so CNNs cannot be applied to them directly. The points must first be converted to a regular grid, such as a raster or voxel grid. This conversion discards fine geometric detail and reduces efficacy.
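The grid conversion a CNN would need can be illustrated with a simple rasterization. The sketch below bins points into cells and keeps the highest return in each cell, which roughly approximates a DSM. The cell size, the max-z rule, and anchoring the grid at the minimum point coordinate are illustrative assumptions. Production tools also georeference the grid and interpolate empty cells, which this sketch simply leaves as NaN.

```python
import numpy as np

def rasterize_max_z(points: np.ndarray, cell_size: float) -> np.ndarray:
    """Bin an (N, 3) array of x, y, z points into a 2-D grid holding the maximum z per cell.

    The grid origin is the minimum x, y of the input; row 0 is the smallest y.
    Cells that receive no points are NaN. All points in a cell collapse to one
    value, which is the loss of detail that gridding introduces.
    """
    xy_min = points[:, :2].min(axis=0)
    cols = np.floor((points[:, 0] - xy_min[0]) / cell_size).astype(int)
    rows = np.floor((points[:, 1] - xy_min[1]) / cell_size).astype(int)
    grid = np.full((rows.max() + 1, cols.max() + 1), -np.inf)
    # np.maximum.at is unbuffered, so several points landing in the same cell are all compared.
    np.maximum.at(grid, (rows, cols), points[:, 2])
    grid[np.isneginf(grid)] = np.nan
    return grid

pts = np.array([
    [0.2, 0.1, 3.0],
    [0.7, 0.4, 8.5],   # same cell as the first point: the higher z is kept
    [1.5, 0.3, 2.0],
    [0.4, 1.8, 12.0],
])
grid = rasterize_max_z(pts, cell_size=1.0)
# grid -> [[8.5, 2.0], [12.0, NaN]]
```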

PointNet and its successor, PointNet++, were developed to overcome this limitation. PointNet++ (Qi et al., 2017) is a deep learning model designed specifically to process raw point clouds. It groups points into local neighborhoods at multiple scales and learns features from their spatial relationships. This allows it to segment and classify individual points, which makes it useful for applications such as height extraction and object detection from LiDAR data. Although PointNet++ has performed strongly in many contexts, it still requires substantial labeled training data and computational resources. These requirements may be prohibitive for practitioners or smaller research teams without access to such resources.
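The local-neighborhood grouping mentioned above can be sketched with its two sampling steps, farthest point sampling and a radius ("ball") query. This is a simplified NumPy illustration, not the reference implementation. The original model caps each group at a fixed number of neighbors and feeds the centroid-relative coordinates of each group to a small shared network. This sketch omits both and only returns the index groups.

```python
import numpy as np

def farthest_point_sampling(xyz: np.ndarray, n_centroids: int, start: int = 0) -> np.ndarray:
    """Pick n_centroids point indices that are spread out across the cloud."""
    chosen = np.empty(n_centroids, dtype=int)
    chosen[0] = start
    # Distance from every point to its nearest already-chosen centroid.
    dist = np.linalg.norm(xyz - xyz[start], axis=1)
    for i in range(1, n_centroids):
        chosen[i] = int(np.argmax(dist))  # the point farthest from all current centroids
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[chosen[i]], axis=1))
    return chosen

def ball_query(xyz: np.ndarray, centroid_idx: np.ndarray, radius: float) -> list:
    """For each centroid, return the indices of all points within `radius` (its local region)."""
    groups = []
    for c in centroid_idx:
        d = np.linalg.norm(xyz - xyz[c], axis=1)
        groups.append(np.flatnonzero(d <= radius))
    return groups

# Toy cloud: two tight pairs and one isolated point.
xyz = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                [5.0, 0.0, 0.0], [5.1, 0.0, 0.0],
                [0.0, 5.0, 0.0]])
centroids = farthest_point_sampling(xyz, 3)    # -> [0, 3, 4]
groups = ball_query(xyz, centroids, radius=0.5)  # -> [[0, 1], [2, 3], [4]]
```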

In recent years, transformer models, which were first created for Natural Language Processing (NLP), have been adapted for computer vision (Dosovitskiy et al., 2020). Self-attention relates all input tokens to one another in a single parallelizable operation instead of processing them sequentially. This makes the transformer architecture well suited to large-scale tasks such as multi-modal data fusion. Vision Transformers (ViTs) split an image into fixed-size patches and treat each patch as a token. Their advent has facilitated the combination of LiDAR and aerial photography data and further broadened the possibilities of deep learning in remote sensing (Dosovitskiy et al., 2020). However, because of their complexity and high computational requirements, these models are typically applied only by well-funded research organizations.
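The patch-and-attend idea can be shown with random, untrained weights. The sketch cuts an image into patch tokens, linearly embeds them, and computes single-head self-attention. The attention score matrix covers every pair of tokens in one matrix product. The 4-band input (RGB plus a height band), the 8-pixel patches, and the 64-dimensional width are hypothetical choices. Positional embeddings, multiple heads, MLP blocks, and training, all part of a real ViT, are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights: the shapes matter here, not the values

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches of shape (N, patch*patch*C)."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (patch_rows, patch_cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)

def linear(x: np.ndarray, d_out: int) -> np.ndarray:
    """Random linear projection standing in for a learned weight matrix."""
    w = rng.standard_normal((x.shape[1], d_out)) / np.sqrt(x.shape[1])
    return x @ w

def self_attention(tokens: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over (N, D) tokens."""
    d = tokens.shape[1]
    q, k, v = linear(tokens, d), linear(tokens, d), linear(tokens, d)
    scores = q @ k.T / np.sqrt(d)                # (N, N): all token pairs in one matrix product
    scores -= scores.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

image = rng.random((32, 32, 4))    # hypothetical RGB + height band, 32 x 32 pixels
tokens = patchify(image, patch=8)  # 16 patches of 8*8*4 = 256 values each
embedded = linear(tokens, 64)      # patch embedding to a 64-dimensional model width
out = self_attention(embedded)
print(tokens.shape, out.shape)     # (16, 256) (16, 64)
```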

2.3 Segmentation Models

Accurately segmenting objects from LiDAR point clouds and aerial data is a major challenge in height extraction. Promptable segmentation models such as the Segment Anything Model (SAM) offer one response to this challenge (Kirillov et al., 2023). SAM is a foundation model that can detect and segment objects in an image with very little human input, typically a few prompt points or a bounding box. Because it adapts to a variety of data types, it has been used successfully in a number of domains, including medical imaging and remote sensing.
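Once a segmentation model has produced object masks, heights can be read from an nDSM on the same grid. The sketch below assumes the masks arrive as a stack of boolean arrays. It does not rely on any particular model's API. Using the 95th percentile rather than the maximum is an illustrative choice that damps isolated spikes in the height raster.

```python
import numpy as np

def mask_heights(ndsm: np.ndarray, masks: np.ndarray, percentile: float = 95.0) -> list:
    """Summarise object height for each boolean mask in a (K, H, W) stack.

    Assumes the masks and the nDSM share the same grid. NaN height cells are ignored.
    """
    results = []
    for k, mask in enumerate(masks):
        values = ndsm[mask & ~np.isnan(ndsm)]
        if values.size == 0:  # empty mask or mask over nodata only
            results.append({"id": k, "pixels": 0, "height": float("nan")})
            continue
        results.append({
            "id": k,
            "pixels": int(values.size),
            "height": float(np.percentile(values, percentile)),
        })
    return results

ndsm = np.array([[0.0, 0.0, 0.0, 0.0],
                 [0.0, 6.0, 8.0, 0.0],
                 [0.0, 7.0, 9.0, 0.0],
                 [0.0, 0.0, 0.0, 0.0]])
masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, 1:3, 1:3] = True  # one hypothetical object; mask 1 is deliberately empty
stats = mask_heights(ndsm, masks)
# stats[0]["height"] is the 95th percentile of [6, 8, 7, 9], about 8.85
```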

Although SAM and related models, such as DeepForest (Weinstein et al., 2020), have shown encouraging results in object detection from remote sensing data, they are not without drawbacks. One of SAM's main disadvantages is its reliance on labeled data for fine-tuning when it is adapted to specialized domains. Furthermore, SAM's effectiveness generally deteriorates on larger or more complex objects, such as dense forest canopies or tall buildings, especially with lower-resolution data (Kirillov et al., 2023). Similarly, DeepForest, a tool designed specifically for detecting tree canopies, has identified individual tree crowns with excellent accuracy but has difficulty separating overlapping canopies in dense stands (Weinstein et al., 2020).

2.4 Multi-Modal Data Fusion and Large Language Models


A developing trend in remote sensing is the integration of multi-modal data, which combines information from several sensors (e.g., LiDAR, RGB, infrared). This approach exploits the strengths of each data source to improve the accuracy of object recognition and classification. For instance, RGB imagery captures color and texture, which enables more sophisticated object recognition, whereas LiDAR data provides precise height information (Yang et al., 2024).
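The simplest fusion strategy is "early fusion," which stacks co-registered layers into one multi-band input before any model sees them. Other strategies, such as fusing the outputs of separate per-sensor models, also exist. The sketch below shows early fusion only. The 50 m height ceiling used for scaling and the choice to fill missing heights with 0 are assumptions for illustration.

```python
import numpy as np

def early_fusion(rgb: np.ndarray, ndsm: np.ndarray, max_height: float = 50.0) -> np.ndarray:
    """Stack an (H, W, 3) 8-bit RGB image and an (H, W) height raster into an (H, W, 4) array.

    Both inputs must already be co-registered on the same grid. RGB is scaled from
    0-255 to 0-1 and height is divided by `max_height` (an assumed ceiling) and
    clipped to 0-1 so the bands have comparable ranges. Missing heights become 0.
    """
    if rgb.shape[:2] != ndsm.shape:
        raise ValueError("RGB and height rasters must share the same grid")
    rgb_scaled = rgb.astype(float) / 255.0
    height_scaled = np.clip(np.nan_to_num(ndsm, nan=0.0) / max_height, 0.0, 1.0)
    return np.concatenate([rgb_scaled, height_scaled[..., np.newaxis]], axis=-1)

rgb = np.full((2, 2, 3), 255, dtype=np.uint8)
ndsm = np.array([[0.0, 25.0], [np.nan, 80.0]])
fused = early_fusion(rgb, ndsm)
# fused[..., 3] -> [[0.0, 0.5], [0.0, 1.0]]  (80 m exceeds the ceiling and is clipped)
```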

Recent developments in large language models (LLMs) have also affected remote sensing. Multi-modal LLMs such as SkyEyeGPT and EarthGPT have demonstrated the ability to process and interpret remote sensing data, including aerial images and LiDAR (Yang et al., 2024). Because these models accept both textual and visual inputs, they are highly adaptable to tasks that require multi-step analysis or complex reasoning. The integration of LLMs with remote sensing data is still experimental, but it offers promising prospects for the future of object detection and height extraction.

2.5 Gaps in the Literature

Despite rapid developments in deep learning and transformer-based models, the literature still lacks accessible, computationally efficient approaches for researchers without substantial funding or machine learning expertise. Current models are powerful, but smaller institutions and individual researchers often lack the computational infrastructure and large labeled datasets needed to train and fine-tune them.

The method proposed in this paper addresses this gap. It uses commonly available geospatial tools to extract object heights from aerial imagery and LiDAR data through a procedural workflow. The approach requires less processing power and no specialist machine learning knowledge, so it is intended to be more accessible than deep learning-based models. By providing a practical, replicable methodology, this work aims to bridge the gap between conventional techniques and recent developments in GeoAI.
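This section does not specify the steps of the proposed workflow. The following is therefore only a hypothetical illustration of what a procedural, non-learned height extraction pipeline can look like. It thresholds an nDSM, groups the remaining cells into 4-connected regions, drops tiny regions, and reports per-object heights. The 2 m height threshold and 4-pixel minimum area are illustrative values, not parameters from the paper. The labeling step is written out with a breadth-first search to keep the sketch dependency-free. `scipy.ndimage.label`, whose default connectivity is also 4-connected in 2-D, is the usual library route for that step.

```python
from collections import deque

import numpy as np

def label_regions(mask: np.ndarray) -> tuple:
    """4-connected component labelling of a 2-D boolean mask; returns (labels, count)."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and labels[r, c] == 0:
                n += 1  # start a new region and flood-fill it
                labels[r, c] = n
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = n
                            queue.append((ny, nx))
    return labels, n

def extract_objects(ndsm: np.ndarray, min_height: float = 2.0, min_pixels: int = 4) -> list:
    """Hypothetical pipeline: threshold, label connected regions, filter, measure height."""
    heights = np.nan_to_num(ndsm, nan=0.0)
    labels, n = label_regions(heights >= min_height)  # 1-2. above-ground cells -> regions
    objects = []
    for obj_id in range(1, n + 1):
        region = labels == obj_id
        pixels = int(region.sum())
        if pixels < min_pixels:                        # 3. drop specks (noise, small shrubs)
            continue
        objects.append({                               # 4. per-object height summary
            "id": obj_id,
            "pixels": pixels,
            "max_height": float(heights[region].max()),
            "mean_height": float(heights[region].mean()),
        })
    return objects

ndsm = np.array([[0, 0, 0, 0, 0, 0],
                 [0, 5, 6, 0, 0, 0],
                 [0, 5, 7, 0, 3, 0],
                 [0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 9, 9, 0],
                 [0, 0, 0, 9, 9, 0]], dtype=float)
objects = extract_objects(ndsm)
# Two objects survive (max heights 7 m and 9 m); the single 3 m cell is dropped.
```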


Notes:
Traditional Methods: To set the historical background for developments in remote sensing, the review begins by outlining earlier, manual techniques for extracting height data.
Machine Learning: This section explains in more detail the strengths and limitations of the machine learning and deep learning approaches used in height extraction.
Segmentation Models: SAM and DeepForest are examined closely, and their drawbacks are identified to establish the need for a methodical approach.
Multi-Modal Data and LLMs: Covering multi-modal data fusion alongside LLMs introduces cutting-edge research trends and shows the direction the discipline is taking.
Gaps in Literature: The review concludes by explicitly describing the gap your study fills, which makes clear why your procedural method is a useful contribution.
