Computer Vision
MCQs:
- What does Computer Vision enable machines to do?
a) Read text
b) Hear sounds
c) See through images and process visual data
d) Speak
Answer: c) See through images and process visual data
- Which application of Computer Vision involves recognizing human faces for security purposes?
a) Face Filters
b) Google Translate App
c) Facial Recognition
d) Medical Imaging
Answer: c) Facial Recognition
- Which Computer Vision task do self-driving cars rely on to identify pedestrians, vehicles, and traffic signs?
a) Image Classification
b) Object Detection
c) Convolution
d) Medical Imaging
Answer: b) Object Detection
- In a grayscale image, the darkest pixel has a value of:
a) 255
b) 128
c) 0
d) 1
Answer: c) 0
- What are the three primary colors used in an RGB image?
a) Red, Yellow, Blue
b) Red, Green, Blue
c) Red, Green, Black
d) Yellow, Blue, White
Answer: b) Red, Green, Blue
- Which of the following applications uses Computer Vision to help users search for information using an image?
a) Face Filters
b) Google’s Search by Image
c) Google Translate App
d) Inventory Management
Answer: b) Google’s Search by Image
- What type of image does not have color but consists of different shades of gray?
a) RGB Image
b) Grayscale Image
c) Filtered Image
d) Segmented Image
Answer: b) Grayscale Image
- The term “pixel” refers to:
a) A picture element
b) A processing unit
c) A color model
d) An algorithm
Answer: a) A picture element
- What does a convolution operation do in image processing?
a) Brightens the image
b) Combines multiple images
c) Multiplies image regions element-wise by a kernel and sums the results
d) Increases the resolution of the image
Answer: c) Multiplies image regions element-wise by a kernel and sums the results
- What is the range of pixel values in an 8-bit grayscale image?
a) 0-100
b) 0-500
c) 0-255
d) 0-1024
Answer: c) 0-255
- In computer vision, which task involves assigning a label to an image from a fixed set of categories?
a) Object Detection
b) Instance Segmentation
c) Image Classification
d) Feature Extraction
Answer: c) Image Classification
- Which layer of the Convolutional Neural Network (CNN) is responsible for extracting features such as edges?
a) Fully Connected Layer
b) Convolution Layer
c) Pooling Layer
d) ReLU Layer
Answer: b) Convolution Layer
- What does Max Pooling do in a CNN?
a) Extracts high-level features
b) Returns the maximum value from the image region
c) Classifies the image
d) Reduces noise in the image
Answer: b) Returns the maximum value from the image region
- What is the purpose of the Fully Connected Layer in a CNN?
a) Reduce image size
b) Extract features
c) Perform image segmentation
d) Classify the image into labels
Answer: d) Classify the image into labels
- Which feature in images is considered the easiest to detect in image processing?
a) Edges
b) Corners
c) Flat surfaces
d) Textures
Answer: b) Corners
- What type of pooling reduces the spatial size of the convolved feature by keeping only the largest value in each region?
a) Average Pooling
b) Max Pooling
c) Zero Pooling
d) Dynamic Pooling
Answer: b) Max Pooling
- How are RGB images stored?
a) As a single grayscale value
b) In three different channels
c) As a string of text
d) Using a single bit
Answer: b) In three different channels
- Which computer vision task involves finding instances of real-world objects in images or videos?
a) Instance Segmentation
b) Object Detection
c) Feature Extraction
d) Classification
Answer: b) Object Detection
- In computer systems, pixel data is stored in:
a) Binary format
b) Text format
c) Audio format
d) XML format
Answer: a) Binary format
- What is the result when R=G=B=0 in an RGB image?
a) Black
b) White
c) Gray
d) Red
Answer: a) Black
- Which application of Computer Vision converts 2D scans into 3D models for medical professionals?
a) Self-driving cars
b) Medical Imaging
c) Google Translate App
d) Face Filters
Answer: b) Medical Imaging
- In which decade was the concept of Computer Vision introduced?
a) 1960s
b) 1970s
c) 1980s
d) 1990s
Answer: b) 1970s
- Which component of a CNN helps introduce non-linearity into the feature map?
a) Pooling Layer
b) Convolution Layer
c) Fully Connected Layer
d) ReLU Layer
Answer: d) ReLU Layer
- What is an 8-bit image’s maximum pixel value?
a) 100
b) 128
c) 200
d) 255
Answer: d) 255
- What feature does Google Translate use to provide real-time text translation through the camera?
a) Optical Character Recognition
b) Face Detection
c) Image Filters
d) Object Detection
Answer: a) Optical Character Recognition
- Which image property determines how much detail the image has?
a) Color depth
b) Pixel value
c) Resolution
d) Filter
Answer: c) Resolution
- What is a key challenge in object detection?
a) Identifying edges
b) Locating the objects accurately
c) Applying filters
d) Storing the image
Answer: b) Locating the objects accurately
- What does the kernel in convolution help with in image processing?
a) Identifying colors
b) Resizing the image
c) Enhancing certain features
d) Increasing brightness
Answer: c) Enhancing certain features
- Which of the following is NOT a layer in Convolutional Neural Networks (CNN)?
a) Fully Connected Layer
b) Pooling Layer
c) ReLU Layer
d) Median Filter Layer
Answer: d) Median Filter Layer
- What is the output color when R=255, G=0, and B=0 in the RGB model?
a) Blue
b) Green
c) Red
d) Yellow
Answer: c) Red
- Which task assigns a label to every pixel belonging to each detected object in an image?
a) Localization
b) Classification
c) Feature extraction
d) Instance segmentation
Answer: d) Instance segmentation
- In convolution, which operation occurs after multiplying the image and the kernel?
a) Pooling
b) Summation
c) Filtering
d) Transformation
Answer: b) Summation
- The Google Translate App uses Computer Vision for which task?
a) Facial Recognition
b) Image Classification
c) Object Detection
d) Text Translation
Answer: d) Text Translation
- The more pixels an image has, the ______ it is.
a) Smaller
b) Larger
c) Blurred
d) Brighter
Answer: b) Larger
- What is the smallest unit of a digital image?
a) Frame
b) Pixel
c) Color depth
d) Bit
Answer: b) Pixel
- In the RGB color model, what happens when all channels (R, G, B) are set to 255?
a) Black color is produced
b) Gray color is produced
c) Red color is produced
d) White color is produced
Answer: d) White color is produced
- What type of layer in a CNN is responsible for reducing the size of the image?
a) Pooling Layer
b) Convolution Layer
c) Fully Connected Layer
d) Feature Layer
Answer: a) Pooling Layer
- What is the main purpose of image classification in Computer Vision?
a) To extract edges from images
b) To reduce the size of the image
c) To assign an image a label from a set of categories
d) To increase the resolution
Answer: c) To assign an image a label from a set of categories
- Which term refers to identifying where an object is located in an image?
a) Object Detection
b) Localization
c) Image Segmentation
d) Feature extraction
Answer: b) Localization
- What are grayscale images commonly used for?
a) Detecting faces
b) Simplifying image processing
c) Increasing color intensity
d) Reducing image noise
Answer: b) Simplifying image processing
- What technique involves finding the exact pixel location of features in an image?
a) Image Compression
b) Feature Detection
c) Image Denoising
d) Color Segmentation
Answer: b) Feature Detection
- In CNNs, what is the benefit of applying multiple convolution layers?
a) Increase image size
b) Extract high-level features
c) Remove color from the image
d) Reduce brightness
Answer: b) Extract high-level features
- What is a fundamental use of Computer Vision in retail?
a) Image Classification
b) Tracking customer movements
c) Face recognition
d) Object Segmentation
Answer: b) Tracking customer movements
- Which layer of a CNN replaces negative values in the feature map with zero?
a) Pooling Layer
b) ReLU Layer
c) Convolution Layer
d) Fully Connected Layer
Answer: b) ReLU Layer
- What is the RGB color model primarily used for?
a) Encoding textures
b) Image classification
c) Color representation in images
d) Denoising images
Answer: c) Color representation in images
- Which feature in an image helps identify an edge during image processing?
a) Brightness
b) Contrast
c) Gradient
d) Pixel density
Answer: c) Gradient
- What does object detection in self-driving cars help with?
a) Recognizing facial expressions
b) Navigating through environments
c) Classifying textures
d) Translating text
Answer: b) Navigating through environments
- A kernel is typically used in which operation of image processing?
a) Image Segmentation
b) Feature Detection
c) Convolution
d) Denoising
Answer: c) Convolution
- How does Google’s Search by Image feature operate?
a) Text comparison
b) Feature matching with a database
c) Audio analysis
d) Object segmentation
Answer: b) Feature matching with a database
- What is the goal of instance segmentation in Computer Vision?
a) To count the number of objects
b) To assign each object a category and label its pixels
c) To increase image resolution
d) To detect edges in images
Answer: b) To assign each object a category and label its pixels
Assertion-Reasoning based:
- Assertion: Computer Vision enables machines to process and analyze visual data using algorithms.
Reason: Computer Vision allows machines to mimic human intelligence by interpreting visual information in real time.
Options:
a) Both Assertion and Reason are true, and Reason is the correct explanation of Assertion.
b) Both Assertion and Reason are true, but Reason is not the correct explanation of Assertion.
c) Assertion is true, but Reason is false.
d) Assertion is false, but Reason is true.
Answer: a) Both Assertion and Reason are true, and Reason is the correct explanation of Assertion.
- Assertion: Grayscale images are composed of three channels: red, green, and blue.
Reason: Grayscale images use pixel values ranging from 0 (black) to 255 (white) to represent different shades of gray.
Options:
a) Both Assertion and Reason are true, and Reason is the correct explanation of Assertion.
b) Both Assertion and Reason are true, but Reason is not the correct explanation of Assertion.
c) Assertion is true, but Reason is false.
d) Assertion is false, but Reason is true.
Answer: d) Assertion is false, but Reason is true.
- Assertion: Object detection involves classifying objects into a fixed set of categories.
Reason: Object detection is focused on identifying and locating multiple objects in an image.
Options:
a) Both Assertion and Reason are true, and Reason is the correct explanation of Assertion.
b) Both Assertion and Reason are true, but Reason is not the correct explanation of Assertion.
c) Assertion is true, but Reason is false.
d) Assertion is false, but Reason is true.
Answer: b) Both Assertion and Reason are true, but Reason is not the correct explanation of Assertion.
- Assertion: The convolution operation in image processing involves the multiplication of the image by a kernel.
Reason: Convolution reduces the spatial size of an image while retaining its important features.
Options:
a) Both Assertion and Reason are true, and Reason is the correct explanation of Assertion.
b) Both Assertion and Reason are true, but Reason is not the correct explanation of Assertion.
c) Assertion is true, but Reason is false.
d) Assertion is false, but Reason is true.
Answer: c) Assertion is true, but Reason is false.
- Assertion: Max Pooling in Convolutional Neural Networks (CNN) returns the minimum value from the portion of the image covered by the kernel.
Reason: Max Pooling reduces the spatial size of the image while retaining the most important features.
Options:
a) Both Assertion and Reason are true, and Reason is the correct explanation of Assertion.
b) Both Assertion and Reason are true, but Reason is not the correct explanation of Assertion.
c) Assertion is true, but Reason is false.
d) Assertion is false, but Reason is true.
Answer: d) Assertion is false, but Reason is true.
Question-Answer:
- Question: What is the role of Computer Vision in self-driving cars?
Answer: Computer Vision plays a crucial role in self-driving cars by enabling the vehicle to perceive its surroundings. It allows the car to detect and identify objects, such as pedestrians, traffic signs, and other vehicles. It also assists in planning navigational routes and monitoring the environment, both of which are essential for the safe operation of autonomous vehicles.
- Question: Explain the concept of object detection in Computer Vision.
Answer: Object detection in Computer Vision is the process of identifying instances of real-world objects, such as faces, cars, and buildings, in images or videos. It involves not only detecting the presence of an object but also identifying its location within the image. This task is used in various applications, including facial recognition, video surveillance, and automated vehicle systems.
- Question: What is a grayscale image and how is it different from an RGB image?
Answer: A grayscale image consists of varying shades of gray, without any color. Each pixel in a grayscale image has a value between 0 (black) and 255 (white). An RGB image, on the other hand, consists of three color channels (Red, Green, and Blue), where each pixel is a combination of values from these three channels, providing full-color representation.
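As a rough illustration of the difference, the sketch below stores a tiny 2×2 image both ways using plain Python lists. The helper name `to_grayscale` and the simple channel average are assumptions made for this example; other conversion formulas exist.

```python
# A 2x2 RGB image: each pixel is an (R, G, B) triple with values 0-255.
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def to_grayscale(image):
    """Convert an RGB image to grayscale by averaging the three channels.

    Plain averaging is an illustrative assumption, not the only possible
    conversion.
    """
    return [[sum(pixel) // 3 for pixel in row] for row in image]


gray_image = to_grayscale(rgb_image)
print(gray_image)  # one value per pixel instead of three
```

The RGB version needs three numbers per pixel, while the grayscale version needs only one.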
- Question: Define and explain the concept of image resolution.
Answer: Image resolution refers to the number of pixels in an image and is typically expressed as width × height. For example, an image resolution of 1280×1024 means the image is 1280 pixels wide and 1024 pixels high. Higher resolution means more pixels, which usually results in more detail and better image quality.
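The arithmetic behind the example above can be checked with a short sketch. The function name `pixel_count` is hypothetical and used only for illustration.

```python
def pixel_count(width, height):
    """Return the total number of pixels in a width x height image."""
    return width * height


total = pixel_count(1280, 1024)
print(total)  # 1310720 pixels, roughly 1.3 million
```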
- Question: How does the ReLU layer function in a Convolutional Neural Network (CNN)?
Answer: The ReLU (Rectified Linear Unit) layer introduces non-linearity to the CNN. It replaces all negative pixel values in the feature map with zero, while positive values remain unchanged. This helps the CNN model learn complex patterns in the data by increasing the model’s ability to capture non-linear relationships.
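A minimal sketch of this behaviour, assuming the feature map is a small list of lists and using a hypothetical helper named `relu`:

```python
def relu(feature_map):
    """Replace every negative value with zero; keep other values as they are."""
    return [[max(0, value) for value in row] for row in feature_map]


feature_map = [
    [-3, 5],
    [2, -1],
]
activated = relu(feature_map)
print(activated)  # [[0, 5], [2, 0]]
```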
- Question: What is the significance of the pooling layer in a CNN?
Answer: The pooling layer in a CNN is used to reduce the spatial size of the convolved features, which helps in decreasing the computational power required for processing the data. It also helps in retaining the important features of the image while making the model more robust to distortions and shifts in the input image.
- Question: Describe the application of facial recognition in smart homes and cities.
Answer: In smart homes and cities, facial recognition is used for security and access control. It can recognize authorized individuals, maintain visitor logs, and trigger actions such as opening doors or granting access to restricted areas. In schools, it can be used for automated attendance systems, where the system identifies students using facial recognition.
- Question: Explain how the Google Translate App uses Computer Vision.
Answer: The Google Translate App uses Computer Vision through its Optical Character Recognition (OCR) feature. The app can identify text in real time using the phone’s camera. It then translates the text into the user’s preferred language by analyzing the characters in the image and overlaying the translation using augmented reality.
- Question: What is the importance of pixel values in digital images?
Answer: Pixel values represent the color or brightness of each individual pixel in a digital image. In an 8-bit grayscale image, for example, pixel values range from 0 (black) to 255 (white). These values are essential for determining the visual properties of the image and are used by algorithms in tasks such as filtering, compression, and feature extraction.
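The 0 to 255 range follows from the number of bits per pixel: 8 bits can encode 2^8 = 256 distinct values. A small sketch, with the hypothetical helper `value_range`, shows the calculation:

```python
def value_range(bits):
    """Return (minimum, maximum) pixel value for a given bit depth."""
    return 0, 2 ** bits - 1


low, high = value_range(8)
print(low, high)  # 0 255
```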
- Question: What are the advantages of using computer vision in medical imaging?
Answer: Computer vision enhances medical imaging by enabling more accurate analysis of medical scans, such as CT and MRI scans. It can convert 2D scan images into interactive 3D models, providing medical professionals with detailed insights into a patient’s condition. This helps improve diagnosis, treatment planning, and surgery preparation.
- Question: What is a kernel in the context of convolution in image processing?
Answer: A kernel, also called a filter, is a small matrix that is used in the convolution process in image processing. It is slid over the image, where each pixel value is multiplied by the corresponding value in the kernel, and the results are summed to produce a new pixel value. This operation helps in extracting features such as edges and textures from an image.
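The multiply-and-sum step described above can be sketched in plain Python. This is a simplified illustration, not a library implementation: it uses no padding, moves one pixel at a time, and does not flip the kernel. The function name `convolve2d`, the sample image, and the kernel values are all assumptions chosen for the example.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image, multiply element-wise, and sum.

    No padding is applied, so the output is smaller than the input.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Sample image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
]

# Example kernel that responds to left-to-right changes in brightness.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

edges = convolve2d(image, kernel)
print(edges)  # [[0, 765, 765, 0]]
```

The output is zero where the window covers a uniform region and large where it straddles the boundary between dark and bright pixels, which is how a kernel highlights an edge.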
- Question: How does Google’s Search by Image feature use Computer Vision?
Answer: Google’s Search by Image feature allows users to upload an image, which is then processed using Computer Vision techniques. The system analyzes the image and compares it to a database of images to find similar content. It identifies key features of the image, such as shapes, colors, and patterns, and returns matching search results based on these features.
- Question: What is the difference between classification and object detection in Computer Vision?
Answer: Classification in Computer Vision assigns a single label to an entire image from a set of predefined categories. Object detection, on the other hand, not only classifies objects but also identifies their locations within the image. Object detection is typically more complex as it requires detecting multiple objects and their positions.
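One way to picture the difference is to compare the shape of the output of each task. The sketch below is purely illustrative: the `Detection` class, the labels, and the box coordinates are made-up values, not the output of any real model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object: a label plus where it sits in the image."""
    label: str
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


# Classification: one label for the whole image (hypothetical value).
classification_result: str = "street scene"

# Object detection: a label and a location for each object (hypothetical values).
detection_result: List[Detection] = [
    Detection("car", (40, 120, 200, 90)),
    Detection("pedestrian", (300, 100, 50, 140)),
]

for detection in detection_result:
    print(detection.label, detection.box)
```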
- Question: Why are corners considered good features in images for computer vision tasks?
Answer: Corners are considered good features because they are distinctive and easy to locate in images. A small window placed on a corner changes noticeably when it is shifted in any direction. By contrast, a window on a flat surface looks the same after any shift, and a window on an edge looks the same when it is shifted along the edge. This abrupt change in pixel values in more than one direction makes corners reliable for tasks like object detection and feature extraction.
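The shifting argument can be tested on toy images. The sketch below is a simplified shift-and-compare check, not a production corner detector: it moves a 3×3 window by one pixel in four directions and reports the smallest change it sees. The helper names, the 7×7 images, and the use of 0 and 1 as pixel values are all assumptions made for the example.

```python
def patch(image, top, left, size):
    """Cut a size x size window out of the image."""
    return [row[left:left + size] for row in image[top:top + size]]


def squared_difference(p, q):
    """Sum of squared differences between two equally sized patches."""
    return sum((a - b) ** 2
               for row_p, row_q in zip(p, q)
               for a, b in zip(row_p, row_q))


def min_shift_change(image, top=2, left=2, size=3):
    """Smallest change seen when the window moves up, down, left, or right."""
    base = patch(image, top, left, size)
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return min(
        squared_difference(base, patch(image, top + dy, left + dx, size))
        for dy, dx in shifts
    )


n = 7
flat = [[0] * n for _ in range(n)]
edge = [[1 if c >= 3 else 0 for c in range(n)] for _ in range(n)]
corner = [[1 if r >= 3 and c >= 3 else 0 for c in range(n)] for r in range(n)]

flat_score = min_shift_change(flat)
edge_score = min_shift_change(edge)
corner_score = min_shift_change(corner)
print(flat_score, edge_score, corner_score)
```

Only the corner image gives a non-zero score, because the flat and edge images each have at least one shift direction in which the window looks unchanged.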
- Question: How does max pooling help in a Convolutional Neural Network?
Answer: Max pooling in a Convolutional Neural Network reduces the spatial size of the convolved features by taking the maximum value from a set of pixels in a region covered by the kernel. This helps in reducing the dimensionality of the data, lowering the computational cost, and making the model more resistant to variations like shifts and distortions in the input image.
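A minimal sketch of max pooling, assuming a 2×2 window that moves two pixels at a time. The function name `max_pool` and the sample feature map are illustrative choices.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value from each size x size region."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
pooled = max_pool(feature_map)
print(pooled)  # [[6, 4], [8, 9]]
```

The 4×4 input shrinks to 2×2, so later layers process a quarter as many values while the strongest response in each region is kept.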