Accuracy is calculated as total correct predictions divided by total predictions. While simple, accuracy cannot be relied upon for unbalanced datasets. For example, in a dataset with 99% healthy patients, a model predicting everyone as healthy would have 99% accuracy but would be completely useless for catching sick patients.
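A minimal sketch of the all-healthy model in Python. It assumes a hypothetical dataset of 99 healthy patients (label 0) and 1 sick patient (label 1). The helpers `accuracy` and `recall` are illustrative names, not library functions.

```python
# Hypothetical labels: 1 = sick, 0 = healthy (99 healthy, 1 sick)
y_true = [0] * 99 + [1]
# A "model" that predicts healthy for everyone
y_pred = [0] * len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model flagged as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0 -> the one sick patient is never caught
```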
Deep Dive
Confusion Matrix Explained in 8 Minutes | Precision, Recall, F1 Score
Understand the confusion matrix in the easiest way possible. In this video, we cover:
✔ True Positive (TP) ✔ True Negative (TN) ✔ False Positive (FP) ✔ False Negative (FN) ✔ Accuracy ✔ Precision ✔ Recall ✔ F1 Score ✔ Specificity ✔ ROC-AUC
Advanced metrics covered:
✔ Matthews Correlation Coefficient (MCC) ✔ Cohen’s Kappa ✔ AUC-ROC Curve ✔ Balanced Accuracy ✔ Specificity ✔ F-Beta Score ✔ Multi-Class Confusion Matrix
Perfect for:
• Machine Learning Beginners • AI/ML Students • Interview Preparation • GATE Preparation • College Exams • Research Papers Analysis
#MachineLearning #AI #ConfusionMatrix #DataScience #DeepLearning
Hello everyone. Today we are going to cover one of the most important evaluation tools in machine learning: the confusion matrix. So let's start.
First of all, why a confusion matrix?
Let's start with why. When we build a classifier, the first thing people look at is accuracy. But accuracy alone can be very dangerous. Let's see.
Imagine a dataset where 99% of the patients are healthy. A model that just says everyone is healthy gets 99% accuracy, but it's completely useless because it never catches a single sick patient. This is where the confusion matrix comes into the story. It breaks the predictions down into four categories, showing how many we got right and what types of error we made: false alarms versus missed detections. Depending on the problem, different errors have different costs. In medical diagnosis, missing a sick patient is far more dangerous than a false alarm. In spam filtering, it's the opposite: you don't want to block a real email. So let's now move to section two.
Moving to section two, let's see what the actual matrix looks like. It's a 2 × 2 table with the actual values on the rows and the predicted values on the columns.
You can swap these if you want, but here the actual values are on the rows and the predicted values are on the columns. The table is divided into four cells: true positive (TP), true negative (TN), false positive (FP) and false negative (FN). Let's go one by one. True positive: the model said positive and it actually was positive, so this is a correct detection.
Second, we have the true negative: the model said negative and it actually was negative, so it's a correct rejection. The third one is the false positive: the model said positive, but it was actually negative.
That's a Type I error. The fourth cell is the false negative.
The model said negative, but it was actually positive. That's a Type II error. Now let's take an example. Say we have 100 patients, 40 of whom are sick. The model gets 30 true positives, 10 false negatives, 5 false positives and 55 true negatives. So the model caught 30 of the 40 sick patients, missed 10, and raised 5 false alarms.
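A minimal sketch of how the four cells are counted from labels. The label lists are hypothetical (1 = sick, 0 = healthy) and were built only to reproduce the 100-patient example. `confusion_counts` is an illustrative helper, not a library function.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN for a binary classifier.

    Rows are actual values and columns are predicted values,
    matching the layout described above.
    """
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # correct detection
            else:
                fn += 1  # missed detection (Type II error)
        else:
            if predicted == positive:
                fp += 1  # false alarm (Type I error)
            else:
                tn += 1  # correct rejection
    return tp, fn, fp, tn

# Hypothetical labels reproducing the 100-patient example (1 = sick)
y_true = [1] * 40 + [0] * 60
y_pred = [1] * 30 + [0] * 10 + [1] * 5 + [0] * 55

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
# Row "actual positive" is [TP, FN]; row "actual negative" is [FP, TN]
print([[tp, fn], [fp, tn]])  # [[30, 10], [5, 55]]
```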
That's the story accuracy alone can't tell us, and that's why the confusion matrix is important.
Moving on to some basic metrics. From the confusion matrix, we derive several metrics that we use as a baseline. First, accuracy: total correct predictions over total predictions. It's simple, but it can only be relied on for balanced datasets. This question is asked often: when can you use accuracy? Accuracy should not be used on its own for imbalanced datasets. Next, we have the error rate, which is just one minus the accuracy: the fraction we got wrong.
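Plugging the 100-patient counts into both formulas gives a small worked check. The function names here are illustrative.

```python
# Counts from the 100-patient example
TP, FN, FP, TN = 30, 10, 5, 55

def accuracy(tp, fn, fp, tn):
    """Total correct predictions over total predictions."""
    return (tp + tn) / (tp + fn + fp + tn)

def error_rate(tp, fn, fp, tn):
    """Fraction of predictions that were wrong, i.e. 1 - accuracy."""
    return (fp + fn) / (tp + fn + fp + tn)

print(accuracy(TP, FN, FP, TN))    # 0.85
print(error_rate(TP, FN, FP, TN))  # 0.15
```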
The third one is precision: of everything the model called positive, how many were actually positive? The formula is TP / (TP + FP). Use it when false positives are expensive, for example in spam detection, where you don't want to wrongly block real emails.
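A small sketch of precision on the same counts. The denominator is TP + FP, the number of predicted positives. Returning 0.0 when nothing is predicted positive is an assumed convention, not a universal rule.

```python
TP, FP = 30, 5  # counts from the 100-patient example

def precision(tp, fp):
    """Of everything predicted positive, the fraction that was actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

print(round(precision(TP, FP), 3))  # 0.857
```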
Next comes the fourth one, recall, also called sensitivity or the true positive rate.
We will also see it in the ROC curve and AUC. Of all the actual positives, how many did the model catch? The formula is TP / (TP + FN). Use it when false negatives are expensive, as in cancer screening, where missing a sick person is dangerous. There's also an example here relating precision and recall: in our example, precision is high (30/35, about 0.86) but recall is lower (30/40 = 0.75), so we miss 25% of the sick patients, and that's a problem in a medical setting. Now let's move on to section four.
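The same kind of sketch for recall. Its denominator TP + FN is the number of actual positives. With the example counts it shows the 25% of sick patients that were missed.

```python
TP, FN = 30, 10  # counts from the 100-patient example

def recall(tp, fn):
    """Of all actual positives, the fraction the model caught (TPR / sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

r = recall(TP, FN)
print(r)      # 0.75
print(1 - r)  # 0.25 -> fraction of sick patients missed
```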
Section four covers the intermediate metrics. First is specificity, or the true negative rate. We'll also see this in the ROC curve, since 1 - specificity is the false positive rate on its x-axis. Of all the actual negatives, how many did we correctly identify? It's the complement of the FPR.
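A sketch computing specificity alongside recall on the example counts, since the two describe the negative and positive classes respectively. The helper names are illustrative.

```python
TP, FN, FP, TN = 30, 10, 5, 55  # counts from the 100-patient example

def specificity(tn, fp):
    """Of all actual negatives, the fraction correctly identified (TNR)."""
    return tn / (tn + fp) if (tn + fp) else 0.0

def recall(tp, fn):
    """Of all actual positives, the fraction correctly identified (TPR)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Report the pair: recall covers the positive class, specificity the negative class
print(f"recall={recall(TP, FN):.3f} specificity={specificity(TN, FP):.3f}")
# recall=0.750 specificity=0.917
```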
The key point is that recall measures the positive class and specificity measures the negative class, so always report them together. Next, the F1 score. This is the harmonic mean of precision and recall. Why the harmonic mean? Because it severely penalizes the score if either precision or recall is very low: if one of them is zero, F1 becomes zero. From our example, F1 = 60/75 = 0.8.
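A sketch of F1 written directly in counts. It relies on the standard simplification 2PR / (P + R) = 2TP / (2TP + FP + FN), which is where the 60/75 figure comes from. The second part uses made-up precision and recall values to show the harmonic-mean penalty.

```python
TP, FN, FP = 30, 10, 5  # counts from the 100-patient example

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in counts.

    2PR / (P + R) simplifies to 2TP / (2TP + FP + FN).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

print(f1_score(TP, FP, FN))  # 0.8 (= 60 / 75)

# Made-up values: precision is high but recall is tiny
p, r = 0.9, 0.01
harmonic = 2 * p * r / (p + r)
arithmetic = (p + r) / 2
# The harmonic mean stays close to the smaller value; the arithmetic mean does not
print(f"harmonic={harmonic:.4f} arithmetic={arithmetic:.3f}")
```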
That's a decently balanced score. We also have the F-beta score, which generalizes F1: when β is greater than 1 we emphasize recall, and when β is less than 1 we emphasize precision (β = 1 gives back F1). So it's basically a weighted F1 score.
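A sketch of the F-beta score using the standard formula (1 + β²)·P·R / (β²·P + R). The function name `f_beta` is illustrative. On the example counts precision (about 0.857) exceeds recall (0.75), so β < 1 scores higher than F1 and β > 1 scores lower.

```python
def f_beta(tp, fp, fn, beta):
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R).

    beta > 1 weights recall more, beta < 1 weights precision more,
    and beta = 1 gives back F1.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

TP, FN, FP = 30, 10, 5  # counts from the 100-patient example
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(TP, FP, FN, beta), 3))
# prints: 0.5 0.833, then 1.0 0.8, then 2.0 0.769
```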
Next we have the FPR, the false positive rate: the fraction of actual negatives that were wrongly called positive. This is on the x-axis of the ROC curve; we'll get into it shortly.
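A quick check on the example counts that places this model at a single point on the ROC plane and confirms that FPR is the complement of specificity.

```python
TP, FN, FP, TN = 30, 10, 5, 55  # counts from the 100-patient example

fpr = FP / (FP + TN)          # x-axis of the ROC curve
tpr = TP / (TP + FN)          # y-axis of the ROC curve (recall)
specificity = TN / (TN + FP)

print(f"ROC point: (FPR={fpr:.3f}, TPR={tpr:.3f})")  # ROC point: (FPR=0.083, TPR=0.750)
print(abs(fpr - (1 - specificity)) < 1e-12)          # True: FPR = 1 - specificity
```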
Now moving on to the advanced metrics. First is MCC, the Matthews correlation coefficient. It ranges from -1 to +1: +1 is perfect, 0 is no better than random, and -1 is completely wrong.
The key point is that, unlike the F1 score, MCC uses all four cells of the confusion matrix, making it more informative as a single metric.
Next is Cohen's kappa, which measures how much better the model is than chance: it corrects the observed accuracy by subtracting the accuracy expected by chance. A kappa above 0.8 indicates strong agreement.
Next is balanced accuracy, which simply averages sensitivity and specificity. This can come up as a question: the average of sensitivity and specificity is not accuracy, it's balanced accuracy.
Finally, we have NPV, the negative predictive value: of everything predicted negative, how many actually are negative? It's the precision counterpart for the negative class.
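A sketch of the four advanced metrics on the 100-patient counts, using their standard formulas. MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Kappa = (p_o - p_e) / (1 - p_e), where p_e is the chance agreement computed from the row and column totals. Balanced accuracy is the mean of sensitivity and specificity, and NPV = TN / (TN + FN). Returning 0.0 for MCC when a margin is empty is an assumed convention.

```python
import math

TP, FN, FP, TN = 30, 10, 5, 55  # counts from the 100-patient example
N = TP + FN + FP + TN

# MCC uses all four cells; ranges from -1 to +1
mcc_denom = math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
mcc = (TP * TN - FP * FN) / mcc_denom if mcc_denom else 0.0

# Cohen's kappa: observed accuracy corrected for chance agreement
p_observed = (TP + TN) / N
p_expected = ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / N**2
kappa = (p_observed - p_expected) / (1 - p_expected)

# Balanced accuracy: mean of sensitivity (recall) and specificity
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
balanced_accuracy = (sensitivity + specificity) / 2

# NPV: of everything predicted negative, the fraction actually negative
npv = TN / (TN + FN)

for name, value in [("MCC", mcc), ("kappa", kappa),
                    ("balanced accuracy", balanced_accuracy), ("NPV", npv)]:
    print(f"{name}: {value:.3f}")
# MCC: 0.685, kappa: 0.681, balanced accuracy: 0.833, NPV: 0.846
```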
In section six we have multiclass classification. When we have more than two classes, the matrix expands to K × K. The diagonal still represents the correct predictions, and everything off the diagonal is a misclassification. To average a metric across classes, we have three options. Macro averaging computes the metric for each class independently, then simply takes the average. Weighted averaging does the same, but weights each class by how many samples it has. Micro averaging pools all the TP, FP and FN counts together first and then computes the metric. In our three-class example with cat, dog and bird, the cat class has TP = 25, FP = 3 and FN = 5, giving a precision of about 0.893 and a recall of about 0.833.
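A sketch of the three averaging schemes for precision. Only the cat figures (TP = 25, FP = 3, FN = 5) come from the example. The dog and bird cells in `CM` are hypothetical values chosen to fill out a 3 × 3 matrix, so the macro, weighted and micro numbers are illustrative only.

```python
# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted).
# Only the cat figures (TP = 25, FP = 3, FN = 5) come from the example;
# the dog and bird entries are made up for illustration.
LABELS = ["cat", "dog", "bird"]
CM = [
    [25, 3, 2],   # actual cat
    [2, 20, 3],   # actual dog
    [1, 2, 17],   # actual bird
]

def per_class_counts(cm):
    """Return a list of (TP, FP, FN, support) for each class."""
    k = len(cm)
    out = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp  # column total minus diagonal
        fn = sum(cm[i]) - tp                       # row total minus diagonal
        out.append((tp, fp, fn, sum(cm[i])))
    return out

counts = per_class_counts(CM)
precisions = [tp / (tp + fp) for tp, fp, _, _ in counts]
recalls = [tp / (tp + fn) for tp, _, fn, _ in counts]
supports = [s for *_, s in counts]

# Macro: average the per-class scores with equal weight
macro_precision = sum(precisions) / len(precisions)
# Weighted: average the per-class scores weighted by support (row totals)
weighted_precision = sum(p * s for p, s in zip(precisions, supports)) / sum(supports)
# Micro: pool TP and FP across classes first, then compute once
total_tp = sum(tp for tp, *_ in counts)
total_fp = sum(fp for _, fp, *_ in counts)
micro_precision = total_tp / (total_tp + total_fp)

print(f"cat precision={precisions[0]:.3f} recall={recalls[0]:.3f}")  # cat precision=0.893 recall=0.833
print(f"macro={macro_precision:.3f} weighted={weighted_precision:.3f} micro={micro_precision:.3f}")
# macro=0.822 weighted=0.830 micro=0.827
```

In a single-label multiclass setting every misclassification is one class's FP and another's FN, so micro precision, micro recall and plain accuracy coincide.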
Moving on to section seven: the precision-recall trade-off. The decision threshold is what the model uses to decide between positive and negative.
By default, it is 0.5. If you raise the threshold, the model becomes more conservative and predicts fewer positives, so precision typically goes up and recall drops. If you lower the threshold, it becomes more aggressive and catches more positives, so recall goes up but precision drops. A practical example: for a COVID test, set a low threshold.
That catches as many cases as possible, at the cost of more false alarms. For a spam filter, set a higher threshold, so it only flags messages it is very sure about. So, to wrap up: the confusion matrix is the foundation of all classification evaluation. Now you know the four cells and the key formulas, and you understand how it all fits together. Thank you.
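To make the section seven trade-off concrete, here is a sketch that sweeps the decision threshold over hypothetical scores and labels, made up for illustration. Precision is not guaranteed to rise monotonically with the threshold on every dataset. This toy data was chosen so the pattern is easy to see.

```python
# Hypothetical predicted probabilities and true labels (1 = positive)
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.45, 0.40, 0.30, 0.20]
labels = [1,    1,    0,    1,    1,    0,    1,    0,    0,    0]

def precision_recall_at(threshold, scores, labels):
    """Predict positive when score >= threshold, then compute precision and recall."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for t in (0.3, 0.5, 0.9):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
# threshold=0.30 precision=0.56 recall=1.00
# threshold=0.50 precision=0.67 recall=0.80
# threshold=0.90 precision=1.00 recall=0.20
```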