Evaluation
MCQs:
- What does the “Evaluation” stage in the AI project cycle involve?
a) Problem scoping
b) Testing AI models with a dataset
c) Gathering data
d) Exploring data
Answer: b) Testing AI models with a dataset
- Why is it not recommended to use training data for model evaluation?
a) It leads to faster results
b) It helps avoid overfitting
c) It improves accuracy
d) It reduces precision
Answer: b) It helps avoid overfitting
- What is the term used when the model predicts a fire and there is actually a fire?
a) False Positive
b) True Negative
c) True Positive
d) False Negative
Answer: c) True Positive
- What does the term “False Positive” refer to?
a) Model correctly predicts an event that occurs
b) Model predicts an event that does not occur
c) Model misses predicting an event
d) Model performs better than expected
Answer: b) Model predicts an event that does not occur
- Which of the following conditions is termed “True Negative”?
a) No fire occurs, and the model predicts no fire
b) Fire occurs, and the model predicts no fire
c) Fire occurs, and the model predicts a fire
d) No fire occurs, and the model predicts a fire
Answer: a) No fire occurs, and the model predicts no fire
- What does the confusion matrix help to understand?
a) Accuracy of data collection
b) Relationship between prediction and reality
c) Overfitting of the model
d) Selection of the correct model
Answer: b) Relationship between prediction and reality
- How is accuracy defined in model evaluation?
a) Percentage of correct predictions out of all observations
b) Ratio of True Positives to False Negatives
c) Number of False Positives divided by True Negatives
d) A comparison between prediction time and actual results
Answer: a) Percentage of correct predictions out of all observations
- In the forest fire example, how is accuracy calculated if, out of 100 cases, there are 98 True Negatives and 0 True Positives?
a) 50%
b) 98%
c) 100%
d) 0%
Answer: b) 98%
- Which metric is used when both True Positives and False Positives are considered?
a) Recall
b) Precision
c) Accuracy
d) F1 Score
Answer: b) Precision
- In the forest fire scenario, what happens when Precision is low?
a) More fires are correctly predicted
b) There are many false alarms
c) There are no missed predictions
d) There is a decrease in model performance
Answer: b) There are many false alarms
- What does Recall measure in model evaluation?
a) Fraction of positive cases that are correctly identified
b) Percentage of negative cases that are correctly rejected
c) Difference between Precision and Accuracy
d) Overall model speed and efficiency
Answer: a) Fraction of positive cases that are correctly identified
- What is the denominator when calculating Recall?
a) True Positives and False Negatives
b) True Negatives and False Positives
c) True Positives and True Negatives
d) False Negatives and False Positives
Answer: a) True Positives and False Negatives
- What happens when a model has high Precision but low Recall?
a) The model performs better in all scenarios
b) The model misses many actual positive cases (many False Negatives)
c) The model predicts everything correctly
d) The model overfits
Answer: b) The model misses many actual positive cases (many False Negatives)
- In which situation would a False Negative have a high cost?
a) Spam mail prediction
b) Viral outbreak detection
c) Treasure hunting
d) Forest fire prediction
Answer: b) Viral outbreak detection
- What is F1 Score a balance of?
a) Recall and Accuracy
b) Precision and Recall
c) True Positives and False Negatives
d) Precision and Accuracy
Answer: b) Precision and Recall
- Which value would indicate a perfect F1 Score?
a) 0
b) 0.5
c) 1
d) 0.75
Answer: c) 1
- What does a low F1 Score indicate about a model’s performance?
a) High Precision and Recall
b) Low Precision and Recall
c) High Accuracy and Precision
d) High Recall and low Precision
Answer: b) Low Precision and Recall
- Which metric is most important when False Negatives are costly?
a) Precision
b) Accuracy
c) Recall
d) F1 Score
Answer: c) Recall
- What happens when Recall is high, but Precision is low?
a) The model misses predicting true cases
b) The model predicts false positives frequently
c) The model has perfect accuracy
d) The model is overfitting
Answer: b) The model predicts false positives frequently
- In which scenario would a False Positive be more costly than a False Negative?
a) Viral outbreak detection
b) Spam mail detection
c) Predicting rain
d) Water shortage prediction
Answer: b) Spam mail detection
- What does overfitting mean in AI models?
a) Model performs well only on training data
b) Model performs poorly on all datasets
c) Model generalizes well across all data
d) Model is unable to make predictions
Answer: a) Model performs well only on training data
- Why is Precision important in AI model evaluation?
a) It measures how often false positives occur
b) It ensures the model has high recall
c) It shows how correct the model’s positive predictions are
d) It reflects the model’s accuracy in all cases
Answer: c) It shows how correct the model’s positive predictions are
- Which condition describes a False Negative?
a) Fire occurs, and the model predicts fire
b) No fire occurs, and the model predicts fire
c) Fire occurs, but the model predicts no fire
d) No fire occurs, and the model predicts no fire
Answer: c) Fire occurs, but the model predicts no fire
- Which scenario would require high Precision?
a) Predicting viral outbreaks
b) Identifying potential treasure in mining
c) Predicting water shortage in schools
d) Predicting floods
Answer: b) Identifying potential treasure in mining
- What is the primary purpose of model evaluation?
a) Improving training data quality
b) Testing the model with the same data repeatedly
c) Assessing how well the model will work in the future
d) Reducing overfitting
Answer: c) Assessing how well the model will work in the future
- What happens when both Precision and Recall are high?
a) The F1 Score is low
b) The F1 Score is high
c) The model has low accuracy
d) The model is overfitting
Answer: b) The F1 Score is high
- In the forest fire scenario, what does a True Positive represent?
a) The model predicts fire, and there is a fire
b) The model predicts no fire, but there is a fire
c) The model predicts fire, but there is no fire
d) The model predicts no fire, and there is no fire
Answer: a) The model predicts fire, and there is a fire
- What is the role of the confusion matrix in model evaluation?
a) It helps calculate Precision and Recall
b) It measures the accuracy of predictions
c) It shows the difference between overfitting and underfitting
d) It displays the True Positives, True Negatives, False Positives, and False Negatives
Answer: d) It displays the True Positives, True Negatives, False Positives, and False Negatives
- Which of the following metrics takes both False Positives and False Negatives into account?
a) Precision
b) Recall
c) Accuracy
d) F1 Score
Answer: d) F1 Score
- If an AI model predicts a flood when there is no flood, which condition is this?
a) True Positive
b) True Negative
c) False Positive
d) False Negative
Answer: c) False Positive
- Why is high accuracy not always equivalent to good model performance?
a) Accuracy can hide many False Negatives when positive cases are rare
b) Accuracy takes into account both False Positives and False Negatives
c) Accuracy does not consider True Positives
d) High accuracy leads to model overfitting
Answer: a) Accuracy can hide many False Negatives when positive cases are rare
- What does a low Recall score indicate?
a) The model is underfitting
b) The model predicts more False Positives than True Positives
c) The model misses many actual positive cases
d) The model performs well in predicting negative cases
Answer: c) The model misses many actual positive cases
- If a model has 100% Precision, what does this mean?
a) The model predicts False Positives frequently
b) The model always predicts False Negatives
c) Every positive prediction the model makes is correct (no False Positives)
d) The model predicts all cases correctly
Answer: c) Every positive prediction the model makes is correct (no False Positives)
- Which metric would be prioritized in a scenario where missing positive cases is critical?
a) Precision
b) Accuracy
c) Recall
d) F1 Score
Answer: c) Recall
- What can be concluded if an AI model has low Precision but high Recall?
a) The model predicts fewer False Negatives but more False Positives
b) The model performs poorly overall
c) The model predicts fewer False Positives but more False Negatives
d) The model predicts everything correctly
Answer: a) The model predicts fewer False Negatives but more False Positives
- What does a False Negative cost in the case of a viral outbreak model?
a) Missed predictions of spam emails
b) Missed cases of the virus spreading
c) Unnecessary firefighting efforts
d) Incorrect treasure discovery
Answer: b) Missed cases of the virus spreading
- Which of the following measures is used to balance Precision and Recall?
a) Accuracy
b) F1 Score
c) Recall
d) True Positive Rate
Answer: b) F1 Score
- What happens when the F1 Score is low?
a) The model has poor Precision or Recall
b) The model predicts everything correctly
c) The model has overfitted the training data
d) The model performs exceptionally well
Answer: a) The model has poor Precision or Recall
- Which scenario would benefit from a higher Precision than Recall?
a) Flood prediction
b) Viral outbreak detection
c) Identifying treasure locations in mining
d) Traffic prediction for school buses
Answer: c) Identifying treasure locations in mining
- What happens in the confusion matrix when there are more False Negatives than True Positives?
a) Precision decreases
b) Recall decreases
c) Accuracy increases
d) F1 Score increases
Answer: b) Recall decreases
- If a model always predicts “no rain” and there is a 2% chance of rain, what is its accuracy for 100 cases with 98 correct predictions?
a) 100%
b) 2%
c) 50%
d) 98%
Answer: d) 98%
- How does overfitting affect a model’s performance?
a) The model performs well on unseen data
b) The model only performs well on training data
c) The model predicts True Negatives correctly
d) The model increases Recall
Answer: b) The model only performs well on training data
- Which of the following scenarios would suffer the most from False Positives?
a) Detecting spam emails
b) Predicting water shortage in schools
c) Predicting a viral outbreak
d) Predicting traffic jams for buses
Answer: a) Detecting spam emails
- Which metric should be improved if the model misses many actual positive cases?
a) Precision
b) Recall
c) Accuracy
d) F1 Score
Answer: b) Recall
- In which situation would False Positives be acceptable, but False Negatives should be minimized?
a) Mining for treasure
b) Predicting traffic jams
c) Detecting viral outbreaks
d) Predicting spam emails
Answer: c) Detecting viral outbreaks
- What happens when the F1 Score of a model is close to 1?
a) The model has poor Precision
b) The model has poor Recall
c) The model has balanced Precision and Recall
d) The model has a high number of False Negatives
Answer: c) The model has balanced Precision and Recall
- In model evaluation, what does Precision prioritize?
a) Minimizing False Negatives
b) Correct positive predictions
c) Correct negative predictions
d) Improving the confusion matrix
Answer: b) Correct positive predictions
- Why would a model with high Precision and low Recall still not perform well?
a) It misses too many actual positive cases (False Negatives)
b) It has overfitted the data
c) It predicts too many False Positives
d) It increases Accuracy unnecessarily
Answer: a) It misses too many actual positive cases (False Negatives)
- What does a confusion matrix with high True Positives and low False Negatives indicate?
a) The model has high Recall
b) The model has high Accuracy but low Precision
c) The model has overfitted the data
d) The model has poor performance
Answer: a) The model has high Recall
- In which case would a False Negative result in the worst outcome?
a) Predicting a flood
b) Detecting spam emails
c) Identifying treasure in a mine
d) Predicting traffic for school buses
Answer: a) Predicting a flood
Assertion-Reasoning Questions
- Assertion (A): Precision is the percentage of true positive cases out of all positive predictions.
Reason (R): Precision only considers False Positives and not False Negatives.
Options:
a) Both A and R are true, and R is the correct explanation of A.
b) Both A and R are true, but R is not the correct explanation of A.
c) A is true, but R is false.
d) A is false, but R is true.
Correct Answer: b) Both A and R are true, but R is not the correct explanation of A.
- Assertion (A): Recall is the fraction of positive cases that are correctly identified by the model.
Reason (R): Recall only considers True Positives and False Negatives.
Options:
a) Both A and R are true, and R is the correct explanation of A.
b) Both A and R are true, but R is not the correct explanation of A.
c) A is true, but R is false.
d) A is false, but R is true.
Correct Answer: a) Both A and R are true, and R is the correct explanation of A.
- Assertion (A): Overfitting occurs when a model performs well on training data but poorly on new data.
Reason (R): Overfitting happens because the model memorizes the training data.
Options:
a) Both A and R are true, and R is the correct explanation of A.
b) Both A and R are true, but R is not the correct explanation of A.
c) A is true, but R is false.
d) A is false, but R is true.
Correct Answer: a) Both A and R are true, and R is the correct explanation of A.
- Assertion (A): F1 Score provides a balanced evaluation by combining Precision and Recall.
Reason (R): F1 Score ranges from 0 to 1, with 1 being the perfect score.
Options:
a) Both A and R are true, and R is the correct explanation of A.
b) Both A and R are true, but R is not the correct explanation of A.
c) A is true, but R is false.
d) A is false, but R is true.
Correct Answer: a) Both A and R are true, and R is the correct explanation of A.
- Assertion (A): Accuracy measures the percentage of correct predictions out of all observations.
Reason (R): High accuracy always indicates that a model has good performance.
Options:
a) Both A and R are true, and R is the correct explanation of A.
b) Both A and R are true, but R is not the correct explanation of A.
c) A is true, but R is false.
d) A is false, but R is true.
Correct Answer: c) A is true, but R is false.
15 Important 4-Marks Questions with Answers
Question: Why should the same data used to train a model not be used for evaluation?
Answer:
Using the same data for both training and evaluation can lead to overfitting, where the model memorizes the training data and performs well on it but poorly on new, unseen data. To evaluate a model’s ability to generalize to new data, it is necessary to use a separate test set that was not used during training. This ensures a more accurate assessment of the model’s performance.
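As a minimal sketch of how a separate test set is held out in practice (assuming scikit-learn is available; the tiny X and y below are illustrative placeholders, not data from this chapter):

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative feature matrix and labels (placeholders only).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [2, 2], [0, 2]]
y = [0, 0, 1, 1, 1, 1, 1, 0]

# Keep 25% of the data aside; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)

# Evaluating on the held-out test set gives a fairer estimate of future
# performance than re-using the training data, which the model may have memorized.
print("Train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))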
Question: Explain the concept of overfitting in AI models and why it is problematic.
Answer:
Overfitting occurs when a model learns the training data too well, including its noise and outliers, causing it to perform poorly on new, unseen data. It is problematic because the model fails to generalize, making incorrect predictions on data that was not part of the training set. Overfitting often results from excessive model complexity (for example, too many parameters) or from insufficient training data.
Question: What is the confusion matrix, and how does it help in evaluating an AI model?
Answer:
A confusion matrix is a tool used to summarize the performance of a classification model by comparing the predicted and actual outcomes. It contains four outcomes: True Positive, True Negative, False Positive, and False Negative. The confusion matrix helps in understanding where the model is making errors (false positives and false negatives) and gives insight into metrics like accuracy, precision, recall, and F1 score.
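As an illustration, the forest-fire counts used in the MCQs above (98 True Negatives and 0 True Positives out of 100 cases, which implies 2 False Negatives and 0 False Positives) can be laid out as a confusion matrix:

                      Predicted: Fire    Predicted: No Fire
Reality: Fire         TP = 0             FN = 2
Reality: No Fire      FP = 0             TN = 98

Accuracy here is (0 + 98) / 100 = 98%, yet the model never detects an actual fire; this is exactly the kind of error pattern the confusion matrix makes visible.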
Question: Differentiate between Precision and Recall.
Answer:
Precision measures the proportion of true positive predictions among all positive predictions (True Positives + False Positives), focusing on the accuracy of positive predictions. Recall measures the proportion of actual positive cases that were correctly predicted, focusing on how well the model identifies all actual positive cases (True Positives + False Negatives). Precision is important when the cost of false positives is high, while Recall is vital when the cost of missing positive cases is significant.
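In formula form, with an illustrative worked example (the counts below are made up for demonstration):

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

If TP = 40, FP = 10 and FN = 20, then Precision = 40 / 50 = 0.80 and Recall = 40 / 60 ≈ 0.67.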
Question: Discuss the importance of F1 Score in model evaluation.
Answer:
The F1 Score is important because it provides a balanced measure of both Precision and Recall, making it useful when you want to account for both false positives and false negatives. It is especially valuable when there is an uneven class distribution or when one type of error is more critical than another. The F1 score ranges from 0 to 1, where 1 represents perfect precision and recall balance.
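For reference, the formula, continuing the illustrative counts from the previous answer:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

With Precision = 0.80 and Recall ≈ 0.67, F1 = 2 × (0.80 × 0.67) / (0.80 + 0.67) ≈ 0.73.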
Question: Why is accuracy not always a reliable metric for model performance, particularly in imbalanced datasets?
Answer:
In imbalanced datasets, accuracy can be misleading because it may reflect the correct prediction of the majority class while ignoring the minority class. For example, in a dataset where 95% of cases belong to one class, a model predicting only the majority class can achieve high accuracy, but it would fail to predict the minority class correctly, leading to poor performance in real-world applications.
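A quick worked version of the 95% case described above: if 95 of 100 cases belong to the majority (negative) class and the model always predicts that class, then TN = 95, TP = 0, FP = 0 and FN = 5, so Accuracy = (0 + 95) / 100 = 95% while Recall for the minority class = 0 / (0 + 5) = 0%.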
Question: Explain the role of Recall in scenarios where False Negatives are costly.
Answer:
Recall becomes crucial in scenarios where missing a positive case (False Negative) can have severe consequences. For example, in medical diagnoses or detecting a viral outbreak, failing to identify a positive case can lead to life-threatening situations. High recall ensures that most of the actual positive cases are correctly identified, minimizing the occurrence of False Negatives.
Question: How does the F1 Score differ from Accuracy, and when should it be used?
Answer:
The F1 Score differs from Accuracy by focusing on the balance between Precision and Recall rather than just the overall correct predictions. It should be used in cases where there is an imbalance between classes or where both False Positives and False Negatives need to be minimized. Accuracy can be misleading in such cases, while the F1 Score provides a more meaningful evaluation.
Question: What are the four outcomes recorded in a confusion matrix, and what do they signify?
Answer:
The four outcomes in a confusion matrix are:
True Positive (TP): Correctly predicted positive cases.
True Negative (TN): Correctly predicted negative cases.
False Positive (FP): Negative cases incorrectly predicted as positive (Type I error).
False Negative (FN): Positive cases incorrectly predicted as negative (Type II error).
Together, these four outcomes help assess the model’s classification performance.
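The following sketch (plain Python, with illustrative label lists rather than real data) shows how the four outcomes are counted and how the metrics discussed in this chapter follow from them:

# Illustrative labels: 1 = positive case, 0 = negative case.
reality    = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
prediction = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]

# Count the four confusion-matrix outcomes.
tp = sum(1 for r, p in zip(reality, prediction) if r == 1 and p == 1)
tn = sum(1 for r, p in zip(reality, prediction) if r == 0 and p == 0)
fp = sum(1 for r, p in zip(reality, prediction) if r == 0 and p == 1)
fn = sum(1 for r, p in zip(reality, prediction) if r == 1 and p == 0)

# Derive the evaluation metrics from the four counts.
accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
print("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall, "F1:", f1)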
Question: Define Precision and discuss its significance in spam email detection.
Answer:
Precision is the ratio of true positive predictions to the total number of positive predictions (both true and false positives). In spam email detection, high Precision is important to reduce the number of non-spam emails incorrectly classified as spam (false positives), as these could contain important information. A model with high Precision ensures fewer false alarms.
Question: Explain why Recall is important in medical diagnostic models.
Answer:
Recall is vital in medical diagnostics because missing a positive diagnosis (False Negative) can have serious health consequences. High Recall ensures that most patients with the condition are correctly identified, minimizing the risk of leaving cases untreated. It reduces the likelihood of False Negatives, which is critical when early detection can save lives.
Question: What is the role of False Positives and False Negatives in evaluating AI models?
Answer:
False Positives (predicting an event that didn’t happen) and False Negatives (failing to predict an event that did happen) are crucial in evaluating a model’s performance. False Positives may lead to unnecessary actions (e.g., sending firefighters when there is no fire), while False Negatives may result in missing critical events (e.g., ignoring a real fire). Balancing these errors is essential for effective model evaluation.
Question: Why is Precision important in situations where False Positives are costly?
Answer:
In situations where False Positives result in significant costs or wasted resources (e.g., treasure hunting or unnecessary medical treatments), Precision is important to minimize the number of false alarms. A high Precision model ensures that when the model predicts a positive outcome, it is highly likely to be correct, reducing unnecessary actions.
Question: How does the F1 Score help in comparing AI models?
Answer:
The F1 Score provides a single metric that combines Precision and Recall, making it easier to compare models, especially when there is an imbalance between classes or when both types of errors (False Positives and False Negatives) need to be minimized. It gives a quick way to judge which model strikes the better balance between Precision and Recall.
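For example (illustrative numbers only): Model A with Precision 0.9 and Recall 0.3 gives F1 = 2 × (0.9 × 0.3) / (0.9 + 0.3) = 0.45, while Model B with Precision 0.7 and Recall 0.7 gives F1 = 0.7, so Model B is the more balanced choice even though its Precision is lower.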
Question: Explain the significance of Recall in predicting viral outbreaks.
Answer:
Recall is significant in predicting viral outbreaks because failing to identify an outbreak (False Negatives) could lead to widespread infection and delayed responses. A high Recall ensures that most outbreaks are detected, allowing for timely intervention and containment. Minimizing False Negatives in such cases is critical for public health safety.