ABSTRACT
With the growing emphasis on touchless interfaces and smart automation,
gesture recognition has become a key technology in human-computer interaction.
This project presents the design and implementation of a gesture-controlled LED
lighting system using MediaPipe, a machine learning framework developed by
Google for real-time pose and hand tracking. The goal is to control the ON/OFF
states of an LED using simple hand gestures, eliminating the need for physical
contact with switches.
The system uses a webcam to capture real-time video input, which is
processed through MediaPipe’s hand-tracking solution to detect hand landmarks.
Specific hand gestures—such as an open palm or a closed fist—are identified
based on landmark positions and classified into control commands. These
commands are then sent to an Arduino microcontroller via serial communication
using Python, which subsequently toggles the state of the connected LED.
This project demonstrates the seamless integration of computer vision and
embedded systems to develop a contactless, user-friendly interface. It has
potential applications in smart homes, assistive technologies, and environments
where physical contact should be minimized, such as hospitals or industrial
settings. The proposed solution is low-cost, uses minimal hardware, and
leverages open-source tools, making it accessible for further development and
deployment.
TABLE OF CONTENTS
CHAPTER TITLE PAGE NO.
ABSTRACT IV
LIST OF FIGURES VII
LIST OF ABBREVIATIONS VIII
1 INTRODUCTION 1
1.1 Project Overview 1
1.2 Problem Statement 2
1.3 Objectives 2
1.4 Scope 3
2 LITERATURE SURVEY 4
2.1 Gesture Recognition Basics 4
2.2 LED Control Methods 5
2.3 MediaPipe Overview 5
3 SYSTEM DESIGN 6
3.1 Architecture 6
3.2 Block Diagram 6
3.3 Workflow 7
4 COMPONENTS AND TOOLS 8
4.1 Hardware (Arduino, LED, Webcam) 8
4.2 Software (Python, MediaPipe, Arduino IDE) 8
5 IMPLEMENTATION 11
5.1 Hand Tracking 11
5.2 Gesture Detection 12
5.3 LED Control via Arduino 12
6 RESULTS AND APPLICATIONS 13
6.1 Output and Testing 13
6.2 Use Cases 13
6.3 Smart Home Automation 14
6.4 Contactless Control in Healthcare 15
6.5 Assistive Technology 15
6.6 Educational Use 15
6.7 Interactive Systems 16
6.8 Industrial Automation and Robotics 16
7 OUTPUTS AND SCREENSHOTS 17
8 CONCLUSION AND FUTURE WORK 18
8.1 Summary 18
8.2 Limitations 18
8.3 Future improvements 20
References 23
LIST OF ABBREVIATIONS
ESP32 - ESPRESSIF SYSTEMS PROCESSOR 32-BIT
PYSERIAL - PYTHON SERIAL COMMUNICATION LIBRARY
OPENCV - OPEN SOURCE COMPUTER VISION LIBRARY
GUI - GRAPHICAL USER INTERFACE
UART - UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER
LIST OF FIGURES
FIGURE NO. TITLE PAGE
3.1 Block Diagram 6
3.2 Workflow 7
4.1 ESP 32 9
4.2 LED lights 9
4.3 Jumper wires 10
4.4 1k Resistor 10
4.5 B Type USB cable 10
5.1 Hand Tracking 11
6.1 Use Cases 14
7.1 Mediapipe Output 17
CHAPTER 1
INTRODUCTION
With the rapid advancement of smart technologies, the interaction between
humans and machines is evolving toward more natural and intuitive methods.
Among these, gesture recognition is gaining popularity as a contactless and
efficient way to control digital and physical devices. This project aims to develop
a simple yet effective system to control an LED light using hand gestures detected
by a webcam. Leveraging Google’s MediaPipe framework for real-time hand
tracking, the system translates specific gestures into commands that control the
LED via an Arduino microcontroller. This approach not only improves
convenience and accessibility but also offers potential applications in smart
homes, assistive technologies, and touchless control systems.
1.1 PROJECT OVERVIEW
This project focuses on controlling an LED light using hand gestures
captured through a webcam. By leveraging the MediaPipe framework developed
by Google, the system can recognize hand gestures in real time. The recognized
gestures are then translated into commands that are sent to an Arduino
microcontroller, which switches the LED ON or OFF. The goal is to create a
simple, low-cost, and contactless control system suitable for smart home
applications or assistive environments.
1.2 PROBLEM STATEMENT
Traditional methods of controlling electrical devices often require
physical interaction, such as pressing switches or using remotes. These
methods may not be suitable in situations where:
• Touch-free operation is preferred (e.g., hospitals or cleanrooms),
• Users have physical limitations or disabilities,
• Convenience and modern user experiences are desired.
There is a need for an affordable, contactless, and user-friendly alternative for
controlling basic electronic devices.
1.3 OBJECTIVES
The main objectives of this project are:
• To implement real-time hand gesture detection using the MediaPipe
framework.
• To classify specific hand gestures for controlling an LED light.
• To establish serial communication between the computer and an ESP 32
board.
• To create a low-cost, touchless system for controlling electronic devices.
1.4 SCOPE
This project is designed as a prototype for demonstrating gesture-based control.
It focuses on:
• Detecting simple hand gestures (e.g., open palm = ON, closed fist
= OFF).
• Controlling a single LED through an ESP32.
• Operating in real time using only a webcam and basic components.
While the current system is limited to controlling one LED, the concept can
be extended to multiple devices and more complex gestures in future
developments.
CHAPTER 2
LITERATURE SURVEY
Traditional LED control methods involve physical switches, remotes, or
voice commands. These require contact, extra hardware, or may not be suitable
in noisy environments.
Recent research has explored gesture-based control using computer vision. Early
methods used OpenCV for basic motion detection but lacked precision.
MediaPipe, developed by Google, provides accurate, real-time hand tracking
using only a webcam.
Compared to traditional systems, MediaPipe-based LED control is:
• Contactless and hygienic
• Low-cost (no sensors needed)
• Easy to implement and flexible
2.1 GESTURE RECOGNITION BASICS
Gesture recognition is a technology that interprets human movements,
especially hand and finger gestures, into commands that a computer or
electronic device can understand. It is a form of nonverbal communication that
enables natural interaction between humans and machines. There are two main
types of gesture recognition:
• Static gestures (e.g., a still hand posture), and
• Dynamic gestures (e.g., moving a hand in a specific pattern).
Recent advancements in computer vision and machine learning have allowed
gesture recognition to move from complex sensor-based systems to simple
camera-based systems using real-time image processing.
2.2 LED CONTROL METHODS
Controlling LED lights has traditionally involved physical switches, remote
controls, or mobile apps. Some of the common modern control methods include:
• Touch-based interfaces (e.g., capacitive switches)
• Voice control systems (e.g., using Alexa or Google Assistant)
• Sensor-based systems (e.g., motion or proximity sensors)
• Wireless control via Bluetooth or Wi-Fi
Although effective, these systems often require physical contact, expensive
hardware, or third-party integrations. Gesture-based control offers a hands-free
and intuitive alternative.
2.3 MEDIAPIPE OVERVIEW
MediaPipe, developed by Google, is an open-source framework for building
multimodal machine learning pipelines. One of its core solutions is MediaPipe
Hands, which provides:
• Real-time hand tracking
• Detection of 21 landmarks per hand
• High accuracy using only a webcam
MediaPipe processes video frames efficiently using lightweight models and does
not require any external sensors. Its Python compatibility makes it ideal for rapid
development of gesture-based applications.
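
To illustrate how MediaPipe Hands is typically used from Python, the short
sketch below opens the default webcam, processes one frame, and prints the 21
detected landmarks. The single-hand limit and the confidence thresholds are
illustrative assumptions, not values fixed by this project.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

# One hand is enough for ON/OFF control; the thresholds are illustrative.
hands = mp_hands.Hands(max_num_hands=1,
                       min_detection_confidence=0.7,
                       min_tracking_confidence=0.5)

cap = cv2.VideoCapture(0)                      # default webcam
ok, frame = cap.read()
if ok:
    # MediaPipe expects RGB input, while OpenCV delivers BGR frames.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        landmarks = results.multi_hand_landmarks[0].landmark
        print("Landmarks detected:", len(landmarks))    # 21 points per hand
        for idx, lm in enumerate(landmarks):
            print(idx, round(lm.x, 3), round(lm.y, 3))  # normalized coordinates

cap.release()
hands.close()
```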
CHAPTER 3
SYSTEM DESIGN
The system is built to detect hand gestures using a webcam, interpret the
gestures through Python using MediaPipe, and control an LED via Arduino. The
design ensures real-time gesture recognition and an immediate hardware response,
creating a simple, touch-free user interface.
3.1 ARCHITECTURE
The architecture consists of three main components:
• Input Unit: A webcam that captures real-time video of hand movements.
• Processing Unit: A Python script using the MediaPipe framework to detect
hand landmarks and recognize gestures.
• Control Unit: An ESP 32 that receives commands from Python and controls
an LED accordingly.
3.2 BLOCK DIAGRAM
HAND GESTURE (OPEN/FIST) → WEBCAM INPUT (LIVE VIDEO FEED) → MEDIAPIPE IN
PYTHON → ESP32 → LED
FIG 3.1 BLOCK DIAGRAM
3.3 WORKFLOW
1. Gesture Input: The user shows a hand gesture (e.g., open palm or closed
fist) in front of the webcam.
2. Video Capture: The webcam captures the real-time video stream and feeds
it into the Python program.
3. Gesture Detection: MediaPipe identifies hand landmarks and the script
determines if the gesture matches predefined patterns.
4. Command Transmission: Depending on the recognized gesture, the script
sends a command (e.g., "1" or "0") to the ESP32 via serial
communication.
5. LED Control: The ESP32 receives the signal and turns the LED ON or OFF
based on the command (a minimal end-to-end sketch of this loop is given
after Fig. 3.2 below).
FIG 3.2 WORKFLOW
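
The steps above map directly onto a single processing loop. The sketch below is
a minimal end-to-end version of that loop; the serial port name ("COM3"), the
9600 baud rate, and the classify_gesture() helper are assumptions made for
illustration and are not taken from the project code.

```python
import cv2
import mediapipe as mp
import serial

def classify_gesture(landmarks):
    """Return '1' for an open palm, '0' for a fist, None if ambiguous (illustrative rule)."""
    tips, pips = (8, 12, 16, 20), (6, 10, 14, 18)
    extended = sum(landmarks[t].y < landmarks[p].y for t, p in zip(tips, pips))
    if extended >= 3:
        return "1"                        # open palm   -> LED ON
    if extended == 0:
        return "0"                        # closed fist -> LED OFF
    return None

ser = serial.Serial("COM3", 9600, timeout=1)   # assumed port and baud rate
cap = cv2.VideoCapture(0)
hands = mp.solutions.hands.Hands(max_num_hands=1)

last_cmd = None
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        cmd = classify_gesture(results.multi_hand_landmarks[0].landmark)
        if cmd is not None and cmd != last_cmd:    # send only on a state change
            ser.write(cmd.encode())
            last_cmd = cmd
    cv2.imshow("Gesture control", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):          # press 'q' to quit
        break

cap.release()
ser.close()
```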
CHAPTER 4
COMPONENTS AND TOOLS
4.1 HARDWARE COMPONENTS
• ESP 32 – Microcontroller that receives commands from Python and
controls the LED.
• LED – Indicates ON/OFF status based on the gesture.
• 220Ω Resistor – Protects the LED by limiting current.
• Webcam – Captures hand gestures for detection.
• USB Cable – Connects the ESP32 to the computer for power and data.
4.2 SOFTWARE TOOLS
• Python – Used for writing gesture detection and communication code.
• MediaPipe – Framework for real-time hand gesture detection.
• OpenCV – Handles image processing from the webcam.
• PySerial – Sends commands from Python to the ESP32 via serial.
• Arduino IDE – Used to program and upload code to the ESP32.
FIG 4.1 ESP 32
FIG 4.2 LED LIGHTS
FIG 4.3 JUMPER WIRES FIG 4.4 1K RESISTOR
FIG 4.5 B TYPE USB CABLE
CHAPTER 5
IMPLEMENTATION
This chapter explains the practical steps taken to build and integrate the system
that uses hand gestures to control an LED. The implementation involves software
setup, gesture logic, and hardware communication.
5.1 HAND TRACKING
Hand tracking is achieved using the MediaPipe Hands solution by Google. The
webcam captures the live video stream, and MediaPipe detects and tracks 21 hand
landmarks in real time. These landmarks include key points such as fingertips and
joints, which are used to interpret gestures.
• MediaPipe processes frames using a pre-trained model.
• It identifies whether a hand is open, closed, or in a specific gesture.
FIG 5.1 HAND TRACKING
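
The sketch below shows how the detected landmarks can be overlaid on each
frame, producing the kind of annotated view shown in Fig. 5.1. It uses
MediaPipe's bundled drawing utilities; the window name and the Esc-to-quit
choice are arbitrary.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                # Draw the 21 landmarks and the connections between them.
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("Hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:           # Esc key closes the window
            break
cap.release()
cv2.destroyAllWindows()
```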
5.2 GESTURE DETECTION
Python is used to analyze the hand landmarks and identify specific gestures. For
this project, the gestures are:
• Open Hand → LED ON
• Closed Fist → LED OFF
The gesture is detected by checking the relative positions of the fingers:
• If the fingers are extended (tips above the knuckles), it is an open hand.
• If all fingers are folded (tips near the palm), it is a closed fist.
Once the gesture is identified, a corresponding signal is prepared to be sent to the
Arduino.
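
The finger-position rule described above can be written as a small helper
function, as sketched below. In MediaPipe's landmark numbering, indices 8, 12,
16, and 20 are the fingertips of the index, middle, ring, and little fingers,
and 6, 10, 14, and 18 are the corresponding middle (PIP) joints; image
y-coordinates grow downward, so a smaller y means higher in the frame. The
function name and the decision to ignore the thumb are illustrative
assumptions.

```python
# Fingertip and middle-joint (PIP) landmark indices for the four fingers.
FINGER_TIPS = (8, 12, 16, 20)
FINGER_PIPS = (6, 10, 14, 18)

def detect_open_or_fist(landmarks):
    """Return 'OPEN', 'FIST', or None from one hand's 21 MediaPipe landmarks.

    Hypothetical helper: a finger counts as extended when its tip lies above
    (has a smaller y than) its middle joint. The thumb is ignored to keep the
    rule simple, since only open palm vs. closed fist is needed here.
    """
    extended = sum(
        landmarks[tip].y < landmarks[pip].y
        for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
    )
    if extended == 4:
        return "OPEN"      # all four fingers up -> LED ON
    if extended == 0:
        return "FIST"      # all fingers folded  -> LED OFF
    return None            # ambiguous pose: take no action
```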
5.3 LED CONTROL VIA ARDUINO
The Python script sends a signal (e.g., "1" or "0") via serial communication
using the pyserial library.
• "1" → Arduino turns the LED ON
• "0" → Arduino turns the LED OFF
The Arduino continuously listens on the serial port and changes the LED state
based on the received command. This provides real-time response to the user’s
hand gesture.
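
On the computer side, the command byte can be sent with the pyserial library as
sketched below; the port name and baud rate are assumptions and must match
whatever the ESP32 firmware (written in the Arduino IDE) expects when it reads
the serial port and drives the LED pin.

```python
import serial

# Assumed port name and baud rate; adjust to match the ESP32 sketch
# (e.g. "/dev/ttyUSB0" on Linux instead of "COM3" on Windows).
ser = serial.Serial("COM3", 9600, timeout=1)

def set_led(on: bool) -> None:
    """Send '1' to turn the LED on, '0' to turn it off."""
    ser.write(b"1" if on else b"0")

set_led(True)     # open palm detected   -> LED ON
set_led(False)    # closed fist detected -> LED OFF
ser.close()
```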
CHAPTER 6
RESULTS AND APPLICATIONS
This chapter presents the output of the implemented system and highlights real-
world scenarios where gesture-based control can be useful.
6.1 OUTPUT AND TESTING
After successful integration of MediaPipe, Python, and Arduino, the system was
tested under different conditions. The following results were observed:
• Gesture Detection: Open hand and closed fist were accurately recognized
in real time.
• Response Time: LED responded with minimal delay (less than 1 second)
after gesture recognition.
• Reliability: The system consistently worked under normal lighting and
with a clear background.
• Serial Communication: Commands sent via Python were correctly
interpreted by the Arduino.
6.2 USE CASES
This system can be applied in various areas:
• Smart Home Automation: Control lights and appliances without touching
switches.
• Touchless Interfaces: Useful in hospitals and clean rooms where hygiene
is critical.
• Accessibility: Helps individuals with physical disabilities control devices
easily.
• Educational Projects: Ideal for students to learn about AI, computer vision,
and embedded systems.
FIG 6.1 USE CASES
6.3 SMART HOME AUTOMATION
In the context of smart homes, hand gesture control offers an innovative
way to operate lighting systems without relying on traditional switches or voice
commands. Users can simply wave their hand or form specific gestures to control
lights, fans, or appliances. This can be particularly useful in situations where users
have wet or dirty hands (e.g., in the kitchen), or in modern home designs that
prioritize minimalism and aesthetics by reducing the number of physical control
interfaces.
6.4 CONTACTLESS CONTROL IN HEALTHCARE
Hospitals and clinics demand strict hygiene and sterility. Traditional
switches or control panels can become points of contamination. By implementing
gesture-controlled systems, healthcare professionals can control lighting, call for
assistance, or activate indicators without direct contact. This reduces the spread
of pathogens, improves workflow efficiency in sterile rooms (like operating
theaters or ICUs), and ensures a safer working environment.
6.5 ASSISTIVE TECHNOLOGY
This system has immense potential in assistive technology. People with
physical disabilities who struggle with pressing buttons or reaching switches can
benefit from gesture-controlled lighting systems that require only minimal hand
movement. This fosters independence, improves accessibility, and enhances the
quality of life for individuals with motor impairments or elderly individuals who
may find traditional controls difficult to use.
6.6 EDUCATIONAL USE
Hand gesture recognition for controlling LEDs provides a compelling, real-
world example for educational purposes. It helps students and researchers explore
key concepts in AI, computer vision, embedded systems, and human-computer
interaction. This project can serve as a foundational model for learning Python
programming, sensor integration, and communication between software and
hardware components (e.g., Python-to-Arduino via serial communication). Such
projects are also ideal for engineering exhibitions, hackathons, and technical
competitions.
6.7 INTERACTIVE SYSTEMS
Museums, science centers, and public installations can use gesture-
controlled lighting to create interactive and immersive environments. For
instance, visitors waving their hands could trigger light patterns, animations, or
sounds, creating an engaging user experience. This approach adds novelty and
encourages user participation, especially in areas aimed at children, education, or
entertainment.
6.8 INDUSTRIAL AUTOMATION AND ROBOTICS
In industrial settings, gesture control can be used to trigger signals, control
robotic arms, or interact with machinery without touching surfaces—important in
dusty, hazardous, or high-precision environments. In robotics, gestures can
command specific robot behaviors or activate/deactivate functions, with LED
indicators providing visual feedback on system status. This can enhance safety
and improve communication in environments where verbal commands may be
impractical.
CHAPTER 7
OUTPUTS AND SCREENSHOTS
FIG 7.1 MEDIAPIPE OUTPUT
CHAPTER 8
CONCLUSION AND FUTURE WORK
8.1 SUMMARY
This project successfully demonstrates a contactless method of controlling
an LED using hand gestures. By integrating MediaPipe for real-time hand
tracking, Python for gesture recognition, and Arduino for hardware control, a
low-cost and efficient system was developed. The system performed accurately
under typical conditions, providing a simple and intuitive user experience.
8.2 LIMITATIONS
While the project met its primary goals, a few limitations were observed:
• Gesture detection accuracy can drop in low lighting or cluttered
backgrounds.
• Only basic gestures (open and closed hand) are supported.
• The system requires a PC and webcam, limiting portability.
• It can only control a single output (one LED) in the current version.
The hand gesture-controlled LED system, while innovative and functional,
faces several limitations that impact its performance, especially in uncontrolled
environments. One major challenge is its sensitivity to lighting conditions. The
accuracy of gesture detection relies heavily on consistent and adequate lighting.
In low-light or overly bright environments, the computer vision model struggles
to detect hand landmarks correctly. Shadows, reflections, and glare can also
interfere with proper recognition. Additionally, cluttered or dynamic
backgrounds introduce visual noise that may confuse the model, leading to
frequent detection errors or failure to recognize gestures altogether.
Another significant limitation is the dependency on camera positioning and
the orientation of the user's hand. For the system to work effectively, the hand
must remain within the field of view of the webcam and at an optimal distance
and angle. If the hand moves too far, too close, or is tilted, the model may fail to
track it properly. This restricts natural user movement and can make the
interaction feel rigid or unnatural. The lack of flexibility in hand placement can
be frustrating during extended use or in practical situations where free movement
is expected.
The system also encounters issues related to gesture misclassification.
Gestures that appear visually similar, such as showing two fingers versus three,
can be easily confused without more advanced recognition techniques or filtering
logic. This leads to unintended actions and reduces the overall reliability of the
system. Because MediaPipe primarily recognizes static hand poses, it cannot
accurately interpret dynamic or transitional gestures unless extended with more
complex temporal models. As a result, the range of usable gestures is limited, and
the user interface becomes less expressive.
In terms of technical performance, the system requires a fair amount of
processing power. MediaPipe and OpenCV, when run in real time with
continuous webcam input, can be demanding on system resources. On older or
lower-end computers, this may lead to lag, reduced frame rates, or even system
crashes. This limits the accessibility of the solution to users with higher-
performance hardware and makes it less practical for integration with low-power
embedded devices unless significant optimization is applied.
Furthermore, the current system does not provide built-in feedback
mechanisms. There is no visual or audio confirmation when a gesture is detected
or an action is performed, which can leave users uncertain about whether their
input was registered correctly. Without feedback, users may repeat gestures
unnecessarily or assume the system is unresponsive. This lack of feedback
negatively affects user experience and system intuitiveness, especially for new
users unfamiliar with the gesture set.
Another key limitation is the system’s support for only a single user at a time.
It is designed to detect and respond to one hand, meaning it cannot distinguish
between multiple users or hands in the frame. In environments like classrooms,
exhibitions, or shared workspaces, this restricts its scalability and limits its
usefulness. The system also has a fixed gesture vocabulary, which reduces
flexibility and personalization. Adding new gestures requires manual coding or
retraining, which adds complexity.
Finally, when integrated with external hardware such as an Arduino for
physical LED control, the system depends heavily on stable serial
communication. Any interruption or delay in data transfer between the Python
script and the microcontroller can cause synchronization issues, resulting in
unresponsive or delayed LED behavior. Managing wiring, ports, and device
connections also introduces a layer of complexity that must be handled carefully
to maintain consistent performance.
8.3 FUTURE IMPROVEMENTS
Future versions of this project can be enhanced in several ways:
• Add more complex gesture recognition for multiple device control.
• Improve the system to work under varied lighting and environments.
• Integrate with wireless modules (e.g., Bluetooth, Wi-Fi) for mobile or IoT-
based control.
• Develop a user interface (GUI) to customize gestures and functions.
• Optimize performance for use on embedded platforms like Raspberry Pi.
To enhance the capabilities and usability of the hand gesture-controlled
LED system, several future improvements can be pursued. One major area of
advancement is in gesture classification. While the current implementation relies
on basic hand landmark detection and rule-based interpretation, integrating more
sophisticated machine learning algorithms such as Convolutional Neural
Networks (CNNs) or Support Vector Machines (SVMs) can greatly improve
accuracy. These models can be trained to recognize a wider variety of custom
gestures, account for subtle differences in hand shape or orientation, and reduce
misclassification, especially in noisy or inconsistent environments. This would
also allow the system to adapt to different users and use cases through
personalized gesture libraries.
Another promising enhancement is the introduction of multimodal control.
By combining hand gestures with additional input methods such as voice
commands, eye tracking, or facial expressions, the system can become more
intuitive, accessible, and versatile. This would be especially beneficial for users
with limited mobility or in situations where one input mode may be unreliable
due to environmental conditions. A multimodal interface also enables redundancy
and better contextual awareness, allowing the system to respond more
intelligently to complex scenarios.
To improve user interaction and system transparency, implementing
feedback mechanisms is essential. Currently, users receive no direct indication of
whether a gesture has been recognized or acted upon. By adding on-screen
indicators, audio cues such as beeps or spoken messages, or even haptic feedback
through vibration modules, the system can confirm successful gesture recognition
in real time. This not only improves usability but also enhances user confidence
and reduces errors from repeated inputs or misinterpretation.
Expanding the system's interface beyond the PC setup is another valuable
direction. Developing a dedicated mobile application would allow users to
monitor, configure, and override the system remotely via Bluetooth or Wi-Fi. The
app could display real-time gesture detection output, provide system status
updates, or allow users to toggle devices manually if gesture input is not feasible.
This mobile integration would make the system more adaptable to smart home
environments and increase its overall practicality.
Gesture logging and analytics could also be introduced to track user
interactions over time. By storing data on recognized gestures, usage patterns,
and system responses, developers can analyze performance, detect trends, and
refine gesture detection models. This logging capability is useful for both
debugging during development and for building training datasets for future
machine learning improvements. It can also support user personalization,
allowing the system to adapt to individual preferences and frequently used
gestures.
The ability to detect multiple users or hands simultaneously would
significantly increase the system’s utility in shared environments. Upgrading the
system to support dual-hand tracking or recognize different users in the frame
enables collaborative interaction and parallel control. For example, one user
could turn on the light while another adjusts the brightness, all through separate
gestures. This would be especially useful in educational, exhibition, or household
settings where multiple people interact with the system.
Integrating the system with existing smart home platforms represents
another key opportunity. By interfacing with platforms like Google Home,
Amazon Alexa, or open-source frameworks such as Home Assistant, the gesture-
controlled interface can be connected to a broader ecosystem of smart devices.
This enables the user to control not just LEDs but also smart plugs, appliances,
speakers, or climate control systems, turning the project into a full-featured home
automation solution.
Finally, for greater portability and independence from high-performance
computers, deploying the solution on edge AI devices is a valuable step forward.
Platforms like the Raspberry Pi 4 or NVIDIA Jetson Nano are capable of running
optimized MediaPipe models locally without requiring a full desktop setup. This
would allow the system to function offline, consume less power, and be
embedded into compact enclosures for real-world applications in smart homes,
healthcare environments, or industrial settings.
REFERENCES
1. D. Clark and R. Kumar, "Real-time gesture recognition using OpenCV for
brightness control," Proc. Int. Conf. Comput. Vis. Signal Process., vol. 8, pp. 215-
220, 2022.
2. A. Sharma, P. Singh, and T. Mehta, "Volume control using hand gestures
and depth-sensing cameras," IEEE Trans. Multimedia, vol. 14, no. 3, pp. 789-
798, 2023.
3. J. Kim, S. Lee, and Y. Choi, "Hand tracking and motion detection for
multimedia control using OpenCV," IEEE Access, vol. 10, pp. 35234-35245,
2024.
4. M. Kaur and S. Singh, "Improving gesture classification using CNN
models with OpenCV," J. Adv. Comput. Sci. Technol., vol. 9, no. 2, pp. 45-52,
2023.
5. P. Patel and H. Joshi, "Gesture-based brightness and volume control using
OpenCV and TensorFlow," Proc. IEEE Conf. Artif. Intell. Appl., pp. 162-169,
2023.
6. R. Ahmed and L. Smith, "Impact of lighting conditions on hand gesture
detection for brightness control," IEEE Sens. J., vol. 23, no. 1, pp. 45-55, 2024.
7. L. Zhang and X. Li, "A robust hand gesture interface for smart home
control," IEEE Trans. Consum. Electron., vol. 70, no. 2, pp. 284-292, 2023.
8. G. Bose, A. Roy, and S. Banerjee, "Evaluating gesture recognition
algorithms for media control applications," IEEE Trans. Ind. Electron., vol. 68,
no. 4, pp. 1575-1585, 2023.
9. K. Wang, Y. Liu, and H. Wu, "Gesture recognition using OpenCV in
resource-limited environments," IEEE Trans. Cybern., vol. 54, no. 8, pp. 1231-
1242, 2023.
10. N. Mishra, P. Sinha, and R. Ghosh, "Adaptive gesture recognition for
brightness control using OpenCV," IEEE Trans. Neural Netw. Learn. Syst., vol.
35, no. 1, pp. 344-353, 2024.
11. M. Khan and P. Roy, "A hybrid system for gesture-based control
combining OpenCV and infrared sensors," IEEE Sens. Lett., vol. 8, no. 3, pp.
310-315, 2023.
12. V. Gupta and A. Das, "Dynamic hand gesture tracking using OpenCV and
deep learning," IEEE Trans. Human-Mach. Syst., vol. 54, no. 6, pp. 980-987,
2024.