UNIT 4 : Computer Vision

NOTES CBSE AI X

Computer Vision

1. Introduction to Computer Vision

  • Definition: Computer Vision (CV) is a field within Artificial Intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs. It allows machines to process and analyze visual data to simulate human sight. By using algorithms and machine learning models, CV applications can detect objects, recognize patterns, and make decisions based on the visual input provided.
  • Example – Emoji Scavenger Hunt: Imagine playing a game where a machine shows you an emoji and asks you to find a real-life object that matches it. In the “Emoji Scavenger Hunt” game, the computer uses its “vision” to detect the objects you show in front of your camera and check if they match the emoji. This simulates how CV enables machines to identify objects from real-world environments using camera input.
  • How It Works: Computer Vision uses advanced algorithms to interpret visual data. It breaks down images into pixels, processes them using machine learning techniques, and identifies patterns, shapes, or objects by comparing them with its dataset.

2. Applications of Computer Vision

Over the years, CV has evolved to become a crucial part of various industries, with applications that have transformed sectors ranging from retail to healthcare. Here are some real-world applications of CV:

a. Facial Recognition

  • Definition: Facial recognition systems identify or verify a person’s identity using their facial features.
  • Applications:
    • Smart Homes & Cities: CV plays a critical role in enhancing security. In smart homes, facial recognition technology can be used to control access, allowing only registered individuals inside. Similarly, smart city cameras can recognize and track people in public spaces for security purposes.
    • Attendance Systems: Schools and workplaces use facial recognition for automated attendance marking.
  • Example: Schools can track student attendance automatically by scanning students’ faces upon entry.

b. Face Filters in Social Media

  • Definition: Face filters are used to apply augmented reality (AR) effects to users’ faces in apps like Instagram and Snapchat.
  • How It Works: Computer vision algorithms detect and map facial features in real-time. Using this data, the system overlays digital filters that enhance or alter the appearance of the face.
  • Example: When you apply a dog filter on Snapchat, CV algorithms track the eyes, mouth, and nose, allowing the filter to adjust dynamically as you move.

c. Google’s Search by Image

  • Definition: Google’s “Search by Image” feature uses computer vision to allow users to upload an image instead of typing keywords, and Google returns relevant search results based on that image.
  • How It Works: The CV system analyzes features like colors, shapes, and patterns of the uploaded image, compares them to images in its database, and displays matching results.
  • Example: If you upload a picture of a landmark, Google will identify it and provide detailed information about the place, including its history and location.

d. Computer Vision in Retail

  • Customer Behavior Tracking: Retailers use CV to track customers’ movements within stores. Cameras and CV algorithms analyze how people navigate through aisles, which helps in optimizing store layouts.
  • Inventory Management: Cameras monitor stock levels on shelves, and CV algorithms provide real-time analysis of which products need restocking.
  • Example: Amazon Go stores use computer vision to create a cashier-less shopping experience. Shoppers can pick items off the shelf, and CV systems automatically detect what they’ve selected, charge their account, and let them walk out without checking out manually.

e. Self-Driving Cars

  • Definition: Autonomous vehicles rely heavily on computer vision to interpret the surrounding environment, helping the car navigate safely without human intervention.
  • Key Tasks: CV enables self-driving cars to detect objects like other cars, pedestrians, road signs, and obstacles. It also assists in lane detection and route navigation.
  • Example: Tesla’s Autopilot uses computer vision to detect nearby vehicles, keep the car centered in its lane, adjust speed, and respond to surrounding traffic conditions.

f. Medical Imaging

  • Definition: CV is revolutionizing healthcare by aiding in the analysis of medical images such as X-rays, MRIs, and CT scans.
  • How It Works: The technology helps to identify abnormalities and diseases by converting 2D scans into detailed 3D models, offering better insights for diagnosis.
  • Example: AI-powered systems can detect tumors or fractures from medical images faster and sometimes more accurately than human radiologists, providing early diagnosis and better treatment outcomes.

g. Google Translate App (Augmented Reality)

  • Definition: By using CV combined with augmented reality (AR), Google Translate allows users to point their phone cameras at foreign text and receive a real-time translation overlay.
  • How It Works: Optical character recognition (OCR) detects the foreign words, a translation engine converts them, and AR overlays the translated text in the user’s preferred language on the camera view.
  • Example: If you’re traveling abroad and come across a sign in a language you don’t understand, pointing your camera at it will display the translated text almost instantly on your screen.

3. Core Tasks in Computer Vision

  • Classification: Assigning a label to an image (e.g., categorizing images as “cat” or “dog”).
  • Classification + Localization: Identifying the object and its position within the image.
  • Object Detection: Detecting multiple objects in a single image along with their positions.
    • Example: In self-driving cars, object detection is used to identify pedestrians, traffic signals, or other vehicles.
  • Instance Segmentation: Segmenting an image to identify multiple instances of objects and assigning labels to individual pixels.
    • Example: In medical imaging, instance segmentation can identify and label different organs in a scan.

4. Basics of Images in Computer Vision

  • Pixels: The smallest unit of an image. Each image is made up of thousands or millions of these pixels.
  • Resolution: The number of pixels in an image determines its resolution, which affects clarity. For example, a 5-megapixel camera produces images with 5 million pixels.
  • Pixel Value: Each pixel has a brightness or color value ranging from 0 to 255. In grayscale images, 0 represents black, and 255 represents white.
  • RGB Images: Color images are made by combining three color channels (Red, Green, and Blue). Every color image pixel has a set of three values, each corresponding to the intensity of these colors.
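The points above can be illustrated with a small NumPy sketch, which stores images as arrays of pixel values (a minimal, hand-made example, not a real photograph):

```python
import numpy as np

# A tiny 2x2 grayscale image: each pixel is one value from 0 (black) to 255 (white)
gray = np.array([[0, 255],
                 [128, 64]], dtype=np.uint8)

# A tiny 2x2 RGB image: each pixel is a set of three values (Red, Green, Blue)
rgb = np.array([[[255, 0, 0], [0, 255, 0]],      # a red pixel, a green pixel
                [[0, 0, 255], [255, 255, 255]]], # a blue pixel, a white pixel
               dtype=np.uint8)

print(gray.shape)  # (2, 2)    -> height x width
print(rgb.shape)   # (2, 2, 3) -> height x width x 3 color channels
print(rgb[0, 0])   # the red pixel's three channel values
```

A real 5-megapixel photograph works the same way, just with about 5 million such pixels instead of 4.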

5. Image Features

  • Definition: Features are essential visual elements in an image that help in recognizing or categorizing objects.
  • Key Features:
    • Edges: Boundaries between different regions in an image.
    • Corners: Points where two edges meet.
    • Blobs: Regions that differ in properties such as color or intensity from surrounding areas.
  • Example: In facial recognition, detecting key features like eyes, nose, and mouth edges is crucial for identification.

6. Convolutional Neural Networks (CNN)

CNNs are a specialized class of deep neural networks designed to process and analyze visual data. They are highly effective in tasks such as image classification, object detection, and image segmentation.

  • Structure of a CNN:
    1. Convolution Layer: The first layer in a CNN where filters (kernels) scan the input image to extract features like edges, colors, and textures.
      • Example: If you input a picture of a cat, the convolution layer extracts features like the shape of the cat’s eyes, ears, and fur pattern.
    2. ReLU Layer (Rectified Linear Unit): This activation function replaces negative values in the feature maps with zero, introducing non-linearity into the network.
    3. Pooling Layer: Reduces the dimensionality of the feature maps by selecting the most important information (e.g., through Max Pooling).
      • Example: Max Pooling selects the brightest or most prominent feature in a given region, allowing the network to focus on key details.
    4. Fully Connected Layer: Flattens the feature maps into a single vector and uses it for classification, assigning a label to the input image.
      • Example: After feature extraction, the fully connected layer identifies whether the input image is a cat or a dog based on the probability distribution across labels.
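The ReLU and Max Pooling steps above can be sketched in NumPy (a minimal illustration on a hypothetical hand-made 4x4 feature map, not a trained network):

```python
import numpy as np

# A hypothetical 4x4 feature map, as might come out of a convolution layer
feature_map = np.array([[ 1, -2,  3,  0],
                        [-1,  5, -4,  2],
                        [ 0,  1,  2, -3],
                        [ 4, -1,  0,  6]])

# ReLU: replace every negative value with zero, introducing non-linearity
relu = np.maximum(feature_map, 0)

# 2x2 Max Pooling with stride 2: keep only the largest value in each 2x2 block,
# halving each dimension while retaining the most prominent features
pooled = relu.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(relu)
print(pooled)  # [[5 3]
               #  [4 6]]
```

Note how pooling shrinks the 4x4 map to 2x2 while keeping the strongest response from each region.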
  • Convolution Operation: Convolution is the core operation in CNNs. A small matrix (kernel) slides across the image; at each position, the kernel’s values are multiplied with the underlying pixel values and summed, detecting features such as edges. The convolution output is a feature map that highlights specific patterns in the image.
  • Example: Applying an edge-detection filter on an image will highlight the boundaries of objects, such as outlining a building’s edges in a photograph.
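The convolution operation and the edge-detection example above can be sketched in plain NumPy (a minimal sketch using a hypothetical 5x5 image and a simple vertical-edge kernel, not a production implementation):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image ('valid' mode) to build a feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the kernel with the image patch under it and sum the products
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 image: dark on the left (0), bright on the right (255)
image = np.array([[0, 0, 255, 255, 255]] * 5, dtype=float)

# A simple vertical edge-detection kernel
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)  # large values mark the vertical dark-to-bright boundary
```

The resulting feature map has large values exactly where the dark and bright regions meet, which is how a convolution layer "highlights" edges in an image.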
