The confusion matrix is an important topic in machine learning.
Cyber-attacks have become one of the biggest problems in the world. They cause serious financial damage to countries and individuals every day, and the increase in cyber-attacks brings a rise in cyber-crime along with it.
The key factors in the fight against crime and criminals are identifying the perpetrators of cyber-crime and understanding the methods of attack. Detecting and preventing cyber-attacks are difficult tasks.
There are three main objectives in our study. The first is to use actual cyber-crime data as input to predict a cyber-crime method and compare the accuracy results. The second is to measure whether cyber-crime perpetrators can be predicted based on the available data. The third objective is to understand the effect of victim profiles on cyber-attacks.
However, researchers have recently been addressing these problems by developing security models and making predictions with artificial intelligence and machine-learning methods. Many crime-prediction methods are available in the literature; they fall short, however, when it comes to predicting cyber-crimes and cyber-attack methods.
This problem can be tackled by identifying an attack and its perpetrator using actual data. The data include the type of crime, the gender of the perpetrator, the damage caused, and the method of attack, and can be obtained from the complaints that victims of cyber-attacks file with forensic units.
Here, we analyze cyber-crimes with machine-learning methods in two different models and predict the effect of the defined features on detecting the attack method and the perpetrator. We applied several machine-learning methods in our approach and found that their accuracy rates were close to one another.
Logistic Regression was the leading method for detecting attackers, with an accuracy of 65.42%.
In the second model, we predicted whether the perpetrators could be identified by comparing their characteristics.
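As a rough sketch of how such a classifier might be trained, here is a hedged example with scikit-learn. The data below is entirely synthetic; the study's real features (crime type, perpetrator gender, damage, attack method, victim profile) and its 65.42% result are only summarized above, not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the study's case records: each row is one
# complaint, each column an encoded feature (e.g. crime type, victim
# profile); the label is 1 if the perpetrator was identified.
X = rng.random((200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, 200) > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```

With real case data in place of the random features, `model.score` would give the accuracy figure compared across methods in the study.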
By using a simple table to show analytical results, the confusion matrix boils your model's outputs down into a more digestible view. In that sense, the confusion matrix is a data-visualization resource.
What is a confusion matrix? 🤔
It is typically used for binary classification: it works for any prediction task that makes a yes-or-no, or true-or-false, distinction.
The confusion matrix uses specific terminology to arrange results. There are true positives and true negatives, as well as false positives and false negatives. For a more complicated confusion matrix or one based on comparison classification, these values might be shown as being actual and predicted classes for two distinct objects.
Here, 0 means “Perpetrator Known” and 1 means “Perpetrator Unknown”.
The purpose of the confusion matrix is to show how…well, how confused the model is. To do so, we introduce two concepts: false positives and false negatives.
- If the model is to predict the positive (left) and the negative (right), then the false positive is predicting left when the actual direction is right.
- A false negative works the opposite way; the model predicts right, but the actual result is left.
Using a confusion matrix, these numbers can be shown on the chart as such:
In this confusion matrix, there are 19 total predictions made. 14 are correct and 5 are wrong.
The False Negative cell, number 3, means that the model predicted a negative, and the actual was a positive.
The False Positive cell, number 2, means that the model predicted a positive, but the actual was a negative.
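These counts can be reproduced in a few lines of plain Python. The false negative (3) and false positive (2) counts come from the matrix above; splitting the 14 correct predictions into 8 true positives and 6 true negatives is an assumption for illustration, since the matrix's correct cells are not broken out in the text:

```python
# Hypothetical label lists reproducing the counts above:
# 6 TN, 2 FP, 3 FN, 8 TP -> 19 predictions, 14 correct, 5 wrong.
y_true = [0] * 6 + [0] * 2 + [1] * 3 + [1] * 8
y_pred = [0] * 6 + [1] * 2 + [0] * 3 + [1] * 8

# Count each cell by comparing actual vs. predicted labels.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=8 TN=6 FP=2 FN=3
```

Libraries such as scikit-learn offer a `confusion_matrix` helper that does the same counting, but the logic really is just these four comparisons.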
On its own, a wrong prediction means little for the direction a person chooses at this point. But add stakes to the choice: if choosing right leads to a huge reward, while falsely choosing it means certain death, then a false negative (the model predicting right when the actual direction is left) becomes very costly. We would only want the model to make that choice if it were 100% certain it was the right one.
The confusion matrix gives you a lot of information, but sometimes you may prefer a more concise metric.
Precision: Precision measures how good our model is when the prediction is positive. It is the ratio of correct positive predictions to all positive predictions:
precision = (TP) / (TP+FP)
TP is the number of true positives, and FP is the number of false positives.
A trivial way to have perfect precision is to make one single positive prediction and ensure it is correct (precision = 1/1 = 100%). This would not be very useful since the classifier would ignore all but one positive instance.
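Using the counts from the example matrix above (FP = 2, and assuming TP = 8 of the 14 correct predictions, as the text does not break that cell out), precision works out as:

```python
tp, fp = 8, 2  # FP from the example matrix; TP is an assumed split
precision = tp / (tp + fp)
print(f"precision = {precision:.2f}")  # precision = 0.80
```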
Recall: Recall goes another route. Instead of looking at the number of false positives the model predicted, recall looks at the number of false negatives that were thrown into the prediction mix:
recall = (TP) / (TP+FN)
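With the same counts (FN = 3 from the example matrix, and the assumed TP = 8), recall is:

```python
tp, fn = 8, 3  # FN from the example matrix; TP is an assumed split
recall = tp / (tp + fn)
print(f"recall = {recall:.3f}")  # recall = 0.727
```

Note that recall penalizes the trivial "one correct positive prediction" strategy described above: ignoring real positives inflates FN and drives recall down, which is why precision and recall are usually reported together.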
Weighing the cost and benefits of choices gives meaning to the confusion matrix.
Our results reveal that the probability of a cyber-attack decreases as the victim's education and income levels increase. I believe cyber-crime units will use these models; they will also facilitate the detection of cyber-attacks and make the fight against them easier and more effective.