Data Sciences
MCQs
- What is the primary component that AI depends on?
a) Algorithms
b) Hardware
c) Data
d) Software
Answer: c) Data
- Which of the following is NOT a domain of AI mentioned in the document?
a) Data Sciences
b) Computer Vision
c) Natural Language Processing
d) Quantum Computing
Answer: d) Quantum Computing
- Data Science primarily works with which type of data?
a) Image data
b) Numeric and alphanumeric data
c) Textual data
d) Speech data
Answer: b) Numeric and alphanumeric data
- Which of the following is an example of an AI application in finance?
a) Airline route planning
b) Fraud and risk detection
c) Website recommendations
d) Personality prediction
Answer: b) Fraud and risk detection
- How do search engines use data science?
a) To predict flight delays
b) To suggest movies
c) To deliver search results
d) To classify images
Answer: c) To deliver search results
- What is the primary objective of data science in genomics?
a) To improve airline route planning
b) To enhance movie recommendations
c) To personalize treatment based on DNA
d) To predict stock prices
Answer: c) To personalize treatment based on DNA
- What is the primary focus of targeted advertising?
a) Fraud detection
b) Customer segmentation
c) Maximizing click-through rates
d) Improving search results
Answer: c) Maximizing click-through rates
- Which model is chosen for predicting food wastage in restaurants?
a) Classification
b) Clustering
c) Regression
d) Reinforcement
Answer: c) Regression
- What does the AI project goal in the restaurant scenario aim to predict?
a) Customer behavior
b) Food wastage
c) Food quantity to be prepared
d) Restaurant ratings
Answer: c) Food quantity to be prepared
- Which factor is NOT part of the system map for the food wastage problem?
a) Number of customers
b) Price of dish
c) Customer feedback
d) Quantity of unconsumed dish
Answer: c) Customer feedback
- Which of the following is an offline data collection method?
a) World organizations’ statistical websites
b) Surveys
c) Government portals
d) Kaggle
Answer: b) Surveys
- What format is used to store tabular data with comma-separated values?
a) JSON
b) XML
c) CSV
d) SQL
Answer: c) CSV
- What is the primary use of NumPy in Python?
a) Text processing
b) Web development
c) Arithmetic operations on arrays
d) Image recognition
Answer: c) Arithmetic operations on arrays
- Which Python package is primarily used for data manipulation and analysis?
a) NumPy
b) Matplotlib
c) Pandas
d) TensorFlow
Answer: c) Pandas
- Which data structure is used by Pandas to handle 2-dimensional data?
a) Series
b) DataFrame
c) Array
d) Tuple
Answer: b) DataFrame
- Which visualization library is mentioned for creating 2D plots in Python?
a) NumPy
b) Matplotlib
c) Pandas
d) Seaborn
Answer: b) Matplotlib
- What type of data is typically visualized using scatter plots?
a) Continuous data
b) Discontinuous data
c) Textual data
d) Image data
Answer: b) Discontinuous data
- What is the key feature of a histogram?
a) Represents frequency distribution
b) Shows textual data
c) Displays social networks
d) Analyzes time series
Answer: a) Represents frequency distribution
- What do box plots represent in data visualization?
a) Frequency of data
b) Quartiles and outliers
c) Discontinuous data
d) Categorical data
Answer: b) Quartiles and outliers
- What is the key concept of the K-Nearest Neighbour algorithm?
a) Random sampling
b) Decision trees
c) Nearest neighbors classification
d) Predicting stock prices
Answer: c) Nearest neighbors classification
- What is the primary application of K-Nearest Neighbour (KNN) in the document’s game activity?
a) Predicting stock market trends
b) Personality prediction
c) Flight delay prediction
d) DNA analysis
Answer: b) Personality prediction
- In the personality prediction game, which axis represents being task-focused?
a) Positive X-axis
b) Negative X-axis
c) Positive Y-axis
d) Negative Y-axis
Answer: b) Negative X-axis
- How many nearest neighbors are considered in KNN when K=1?
a) 1
b) 2
c) 3
d) 5
Answer: a) 1
- As the value of K increases in KNN, predictions become:
a) Less stable
b) More stable
c) Faster
d) Less accurate
Answer: b) More stable
- In the example of predicting fruit sweetness, when K=2, the prediction:
a) Is sweet
b) Is not sweet
c) Becomes uncertain
d) Depends on color
Answer: c) Becomes uncertain
- What is the disadvantage of having K=1 in KNN?
a) More computational resources
b) Unstable predictions
c) Slower prediction times
d) Too many errors
Answer: b) Unstable predictions
- What is an advantage of using Python in data science?
a) Complex syntax
b) Limited libraries
c) Predefined functions for statistics
d) Slow performance
Answer: c) Predefined functions for statistics
- What is Mean in statistics?
a) The sum of all values divided by the number of values
b) The middle value of a dataset
c) The most frequent value in a dataset
d) The range of values in a dataset
Answer: a) The sum of all values divided by the number of values
- What does the term ‘mode’ refer to in statistics?
a) The highest value
b) The most frequent value
c) The middle value
d) The sum of all values
Answer: b) The most frequent value
- Standard deviation is used to measure:
a) The mean
b) The range
c) The variability of data
d) The mode
Answer: c) The variability of data
- Which Python package is primarily used for data visualization?
a) NumPy
b) Pandas
c) Matplotlib
d) SciPy
Answer: c) Matplotlib
- What type of plot is used to represent continuous data frequency?
a) Scatter plot
b) Box plot
c) Histogram
d) Line plot
Answer: c) Histogram
- Outliers in a dataset are typically represented in box plots as:
a) Boxes
b) Whiskers
c) Circles or dots
d) Lines
Answer: c) Circles or dots
- Which type of data is commonly used in data science projects?
a) Textual data
b) Numeric and alpha-numeric data
c) Visual data
d) Audio data
Answer: b) Numeric and alpha-numeric data
- Data Science integrates methods from:
a) History and Literature
b) Statistics and Computer Science
c) Biology and Chemistry
d) Architecture and Engineering
Answer: b) Statistics and Computer Science
- What is a common source of online data for data science projects?
a) Personal interviews
b) Kaggle
c) Classroom surveys
d) Newspapers
Answer: b) Kaggle
- Which AI domain deals with image and visual data?
a) Data Sciences
b) Computer Vision
c) Natural Language Processing
d) Genetic Analysis
Answer: b) Computer Vision
- What is the significance of NaN in a dataset?
a) Represents text data
b) Represents an error
c) Represents missing or invalid data
d) Represents numerical data
Answer: c) Represents missing or invalid data
- What does CSV stand for?
a) Comma-Separated Values
b) Code-Specific Variables
c) Constant Search Value
d) Computer-Structured Variables
Answer: a) Comma-Separated Values
- Which tool is used to predict flight delays in airlines?
a) Classification model
b) K-Nearest Neighbours
c) Regression model
d) Neural networks
Answer: c) Regression model
- What is the primary benefit of using a box plot?
a) It shows continuous data
b) It provides frequency distribution
c) It displays data quartiles and outliers
d) It identifies trends over time
Answer: c) It displays data quartiles and outliers
- How are errors in data typically represented?
a) Through graphs
b) Through incorrect or invalid values
c) Through statistical models
d) Through outliers
Answer: b) Through incorrect or invalid values
- What is the role of Pandas in Python?
a) Image processing
b) Text analysis
c) Data manipulation and analysis
d) Speech recognition
Answer: c) Data manipulation and analysis
- Which data format is primarily used for tabular data?
a) JSON
b) CSV
c) XML
d) SQL
Answer: b) CSV
- What is a common use of regression models in data science?
a) Classification of text
b) Predicting numerical values
c) Recognizing speech
d) Processing images
Answer: b) Predicting numerical values
- What is the importance of cleaning data before analysis?
a) To format the data
b) To ensure accuracy
c) To speed up the analysis
d) To display data visually
Answer: b) To ensure accuracy
- Which of the following is a data visualization technique rather than a statistical measure or method?
a) Mean
b) Mode
c) Regression
d) Histogram
Answer: d) Histogram
- What is the key challenge in airline route planning?
a) Flight delays
b) Customer loyalty programs
c) Predicting customer satisfaction
d) Predicting profitability
Answer: a) Flight delays
- Which of the following is a characteristic of NumPy arrays?
a) Can hold multiple data types
b) Homogeneous data collection
c) Simple to initialize
d) Require more memory
Answer: b) Homogeneous data collection
- What is the purpose of website recommendation systems?
a) Predict user behavior
b) Improve user experience
c) Boost product sales
d) All of the above
Answer: d) All of the above
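The mean, mode and standard deviation covered in the MCQs above can be checked with Python's built-in `statistics` module. A minimal sketch, assuming a small, made-up list of daily plate counts (the data is hypothetical and only illustrates the definitions):

```python
import statistics

# Hypothetical daily plate counts (illustrative data only)
plates = [20, 25, 25, 25, 30, 35, 50]

mean_value = statistics.mean(plates)      # sum of values / number of values
median_value = statistics.median(plates)  # middle value after sorting
mode_value = statistics.mode(plates)      # most frequent value
spread = statistics.pstdev(plates)        # population standard deviation

print(mean_value, median_value, mode_value, round(spread, 2))  # 30 25 25 9.26
```

`pstdev` treats the list as the whole population; `statistics.stdev` gives the sample standard deviation, which divides by n - 1 instead of n.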
QUESTION-ANSWERS:
- Explain how Artificial Intelligence (AI) depends on data.
AI fundamentally relies on data to function and improve its intelligence. Data fed into AI systems allow them to identify patterns, make predictions, and learn. Different types of data—numeric, visual, or textual—are used in various AI domains like Data Science (numeric data), Computer Vision (visual data), and Natural Language Processing (NLP) (textual data). Without data, AI cannot operate effectively.
- What are the key fields that Data Science integrates, and how do they contribute to its functions?
Data Science integrates fields such as Mathematics, Statistics, Computer Science, and Information Science. Mathematics provides theoretical foundations, while statistics offers tools for analyzing data. Computer Science enables the development of algorithms to process large datasets, and Information Science focuses on managing and retrieving this data. Together, these fields allow data scientists to analyze and make predictions from complex datasets.
- Describe the role of Data Science in fraud and risk detection in finance.
Data Science plays a crucial role in fraud and risk detection by analyzing historical data, customer profiles, and expenditures. In finance, it helps companies identify potential risks, detect fraud, and prevent bad debts. By examining customer behavior and transaction patterns, data scientists create predictive models that reduce financial losses and optimize risk management strategies.
- How does Data Science contribute to advancements in genetics and genomics?
In genetics and genomics, Data Science enables personalized treatments by analyzing large-scale genomic data. It helps in understanding the relationship between DNA and health, allowing researchers to predict how individuals may react to certain drugs or be predisposed to diseases. This deeper insight into human DNA helps in developing advanced genetic risk prediction models for more tailored healthcare solutions.
- Explain the use of data science algorithms in internet search engines.
Search engines like Google, Bing, and Yahoo use data science algorithms to deliver accurate search results quickly. These algorithms analyze user queries, rank relevant results, and improve over time through machine learning. With massive amounts of data processed daily, such algorithms help filter, sort, and rank results based on relevance, thereby enhancing the search experience.
- What is targeted advertising, and how does Data Science enhance it?
Targeted advertising uses data science algorithms to personalize advertisements based on user behavior and preferences. Data collected from online activity, such as browsing history and past purchases, is analyzed to display relevant ads. This increases the effectiveness of digital marketing campaigns by ensuring that ads reach the right audience, leading to higher conversion rates compared to traditional advertising methods.
- What is the AI project cycle, and why is it important in Data Science?
The AI project cycle consists of several stages: problem scoping, data acquisition, data exploration, modeling, and evaluation. This cycle is essential for systematically addressing real-world problems using AI and Data Science. It helps ensure that the problem is well understood, the right data is collected, accurate models are built, and their predictions are properly evaluated for effective solutions.
- How can Data Science help reduce food wastage in restaurants?
Data Science can help predict the amount of food to be prepared by analyzing historical data such as the number of customers, dish preferences, and past food wastage. By using regression models, restaurants can estimate how much food is likely to be needed each day, minimizing wastage and reducing losses. Predictive algorithms help restaurants plan their supply chain and daily food production.
- Describe the significance of regression models in predicting food wastage.
Regression models, which are part of supervised learning, predict continuous numeric values by learning the relationship between input features and a target value from labelled historical data. In the restaurant scenario, regression models use historical data, such as customer footfall and dish consumption, to predict future demand. By training these models on previous data, restaurants can estimate the amount of food to prepare more reliably, thus reducing excess wastage.
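To make the regression idea concrete, here is a minimal sketch of ordinary least-squares linear regression in plain Python. It assumes a single input feature (customers per day) and made-up numbers; the source does not specify which regression model or features the restaurant uses, so a straight-line fit is an illustrative choice.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: customers per day and plates of one dish consumed
customers = [40, 50, 60, 70, 80]
plates_consumed = [22, 27, 32, 37, 42]

slope, intercept = fit_line(customers, plates_consumed)

expected_customers = 66  # e.g. tomorrow's forecast footfall (made up)
prediction = slope * expected_customers + intercept
print(f"Prepare about {prediction:.0f} plates")  # Prepare about 35 plates
```

On Python 3.10 and later, `statistics.linear_regression(x, y)` returns the same slope and intercept directly.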
- What is the difference between offline and online data collection methods?
Offline data collection involves gathering data through traditional means such as sensors, surveys, interviews, or observations, while online data collection draws on digital sources such as open government portals, reliable websites (e.g., Kaggle), and world organizations’ open statistical websites. Offline methods are useful for localized or first-hand data collection, whereas online methods give access to larger, ready-made datasets from reliable sources, facilitating more comprehensive analysis in Data Science.
- Explain the importance of data cleaning in Data Science.
Data cleaning is crucial because it ensures the accuracy and consistency of the data used in analysis. It involves identifying and correcting errors such as incorrect values, missing entries, and outliers. Without cleaning, faulty data can lead to incorrect conclusions and poor model performance. Clean data enhances the reliability of the models, ensuring accurate and meaningful insights.
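A common first cleaning step is spotting missing or invalid entries. The sketch below uses only the standard library and a hypothetical CSV snippet: it converts each numeric field to a number, marks anything that cannot be converted as NaN, and then drops incomplete rows.

```python
import csv
import io
import math

# Hypothetical CSV export with one missing value and one invalid value
raw = """day,customers,plates
Mon,40,22
Tue,,27
Wed,60,abc
Thu,70,37
"""


def to_number(text):
    """Convert a field to float; return NaN for empty or non-numeric text."""
    try:
        return float(text)
    except ValueError:
        return math.nan


rows = []
for record in csv.DictReader(io.StringIO(raw)):
    customers = to_number(record["customers"])
    plates = to_number(record["plates"])
    rows.append((record["day"], customers, plates))

# Drop any row that contains NaN (missing or invalid data)
clean = [r for r in rows if not any(math.isnan(v) for v in r[1:])]
print(clean)  # [('Mon', 40.0, 22.0), ('Thu', 70.0, 37.0)]
```

Dropping rows is only one option; filling gaps with a mean or median is another, and the right choice depends on the dataset. With Pandas, `pd.read_csv` marks empty fields as NaN and `DataFrame.dropna()` removes incomplete rows.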
- What are the advantages of using Python’s NumPy for data analysis?
NumPy is a powerful library for numerical operations in Python, widely used in Data Science for handling large datasets. It allows for efficient mathematical computations, such as matrix operations and arithmetic functions, on arrays. NumPy’s ability to process multi-dimensional data and its speed in handling large datasets make it a preferred choice for data manipulation and analysis in scientific computing.
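To illustrate the array arithmetic described above, this short sketch contrasts a NumPy array with a plain Python list. It assumes NumPy is installed, and the prices and table values are made up.

```python
import numpy as np

prices = [120, 80, 150]          # plain Python list (made-up prices)
prices_array = np.array(prices)  # NumPy array: all elements share one dtype

# Arithmetic on an array is applied to every element at once
doubled = prices_array * 2
# The same operator on a list repeats the list instead
repeated = prices * 2

# Arrays can also be multi-dimensional, e.g. two days (rows) by two measurements (columns)
table = np.array([[40, 22],
                  [50, 27]])
column_totals = table.sum(axis=0)  # add down each column

print(doubled)        # element-wise result
print(repeated)       # [120, 80, 150, 120, 80, 150]
print(table.shape)    # (2, 2)
print(column_totals)  # totals for each column
```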
- How do box plots represent data distribution, and what insights do they offer?
Box plots graphically display the distribution of a dataset by dividing it into quartiles. They show the minimum, first quartile, median, third quartile, and maximum, along with any outliers. Box plots are useful for identifying the spread and skewness of data, as well as detecting outliers. They offer a quick visual summary of data distribution and are commonly used in exploratory data analysis.
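The numbers behind a box plot can be computed directly. A minimal sketch, assuming made-up daily waste figures and the common 1.5 × IQR rule for flagging outliers. Quartile conventions differ slightly between tools; `method="inclusive"` is used here because it applies the same linear interpolation as NumPy's default `percentile`.

```python
import statistics

# Hypothetical daily food waste (in plates) over nine days
waste = [12, 15, 17, 18, 20, 21, 22, 24, 60]

# Quartiles with the "inclusive" (linear interpolation) convention
q1, median, q3 = statistics.quantiles(waste, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# 1.5 * IQR rule: points beyond these fences are treated as outliers
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in waste if x < low_fence or x > high_fence]
inside = [x for x in waste if low_fence <= x <= high_fence]

print("Q1, median, Q3:", q1, median, q3)            # 17.0 20.0 22.0
print("Whiskers reach:", min(inside), max(inside))  # 12 24
print("Outliers:", outliers)                        # [60]
```

Matplotlib's `boxplot` uses the same 1.5 × IQR rule by default (its `whis` parameter defaults to 1.5), so a plot of this data would draw 60 as a separate point.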
- What is the K-Nearest Neighbour (KNN) algorithm, and how does it work?
The KNN algorithm is a simple, supervised learning technique used for classification and regression tasks. For classification, it assigns a data point the majority label among its K nearest neighbors; for regression, it typically averages the neighbors’ values. For instance, if K=3, the algorithm looks at the 3 closest points to the unknown data point and assigns it the most common class among those neighbors. KNN is based on the principle that similar data points lie close to each other.
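The majority-vote idea described above can be sketched in a few lines of standard-library Python. The two features and all data points are hypothetical (loosely modelled on the fruit-sweetness example), and Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter

# Hypothetical training data: ((feature_1, feature_2), label).
# The features could be, say, ripeness and acidity scores; they are made up.
training = [
    ((8.0, 2.0), "sweet"),
    ((7.5, 3.0), "sweet"),
    ((9.0, 1.5), "sweet"),
    ((2.0, 8.0), "not sweet"),
    ((3.0, 7.0), "not sweet"),
    ((1.5, 9.0), "not sweet"),
]


def knn_predict(query, data, k):
    """Classify `query` by majority vote among its k nearest neighbors."""
    # Sort training points by Euclidean distance to the query point
    by_distance = sorted(data, key=lambda item: math.dist(query, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    # most_common(1) gives the label with the highest vote count;
    # on a tie it returns the label that was encountered first
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict((7.0, 3.5), training, k=3))  # sweet
print(knn_predict((2.5, 7.5), training, k=3))  # not sweet
```

Choosing an odd K avoids ties in two-class problems. With K=2 and one neighbor from each class the vote is split, which is why the K=2 fruit prediction is uncertain; this sketch would then simply return whichever label it counted first.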
- What are some common sources of open data for Data Science projects?
Common sources of open data for Data Science projects include government portals, open-source platforms like Kaggle, and world organizations’ statistical websites. These sources provide reliable datasets that can be used for various purposes, from academic research to business analytics. Access to open data ensures that data scientists have authentic and diverse datasets to work with for model training and analysis.