Predict Email
Data description
    ----------- Columns----------------
       Unnamed: 0                                         Email Text      Email Type
    0           0  re : 6 . 1100 , disc : uniformitarianism , re ...      Safe Email
    1           1  the other side of * galicismos * * galicismo *...      Safe Email
    2           2  re : equistar deal tickets are you still avail...      Safe Email
    3           3  \nHello I am your hot lil horny toy.\n I am...     Phishing Email
    4           4  software at incredibly low prices ( 86 % lower...  Phishing Email

    ----------- Before cleanup----------------
    RangeIndex: 18650 entries, 0 to 18649
    Data columns (total 3 columns):
     #   Column      Non-Null Count  Dtype
    ---  ------      --------------  -----
     0   Unnamed: 0  18650 non-null  int64
     1   Email Text  18634 non-null  object
     2   Email Type  18650 non-null  object

    ----------- After cleanup----------------
    Int64Index: 18634 entries, 0 to 18649
    Data columns (total 3 columns):
     #   Column      Non-Null Count  Dtype
    ---  ------      --------------  -----
     0   Unnamed: 0  18634 non-null  int64
     1   Email Text  18634 non-null  object
     2   Email Type  18634 non-null  object
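The before/after summaries differ by 16 rows (18650 versus 18634), which matches the number of missing `Email Text` values. The cleanup code itself is not shown here, so the following is only a minimal sketch of one way to get that effect, assuming pandas and a `dropna` on the text column. The four-row frame is made-up illustration data, not the real dataset.

```python
import pandas as pd

# Hypothetical toy frame with the same three columns as the dataset.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "Email Text": ["re : deal tickets", None, "software at low prices", "hello"],
    "Email Type": ["Safe Email", "Safe Email", "Phishing Email", "Phishing Email"],
})

rows_before = len(df)

# Assumption: rows without any email text are dropped rather than imputed.
cleaned = df.dropna(subset=["Email Text"])

rows_after = len(cleaned)
print(f"dropped {rows_before - rows_after} of {rows_before} rows")
print(list(cleaned.index))
```

Because `dropna` keeps the original row labels, the cleaned frame's index still ends at the old last label. That is consistent with the "After cleanup" summary listing 18634 entries but an index running from 0 to 18649.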
Model Performance
1. Random Forest
Accuracy score: 93.8468550592525
Classification Report:
                    precision    recall  f1-score   support
    Phishing Email       0.92      0.96      0.94      2198
        Safe Email       0.96      0.91      0.94      2190
          accuracy                           0.94      4388
         macro avg       0.94      0.94      0.94      4388
      weighted avg       0.94      0.94      0.94      4388
2. Bernoulli Naive Bayes
Accuracy score: 91.93254329990884
Classification Report:
                    precision    recall  f1-score   support
    Phishing Email       0.88      0.97      0.92      2198
        Safe Email       0.97      0.86      0.91      2190
          accuracy                           0.92      4388
         macro avg       0.92      0.92      0.92      4388
      weighted avg       0.92      0.92      0.92      4388
3. XGBoost (eXtreme Gradient Boosting)
Accuracy score: 93.34548769371011
Classification Report:
                    precision    recall  f1-score   support
    Phishing Email       0.92      0.95      0.93      2198
        Safe Email       0.95      0.92      0.93      2190
          accuracy                           0.93      4388
         macro avg       0.93      0.93      0.93      4388
      weighted avg       0.93      0.93      0.93      4388
4. Logistic Regression
Accuracy score: 97.08295350957155
Classification Report:
                    precision    recall  f1-score   support
    Phishing Email       0.96      0.98      0.97      2198
        Safe Email       0.98      0.96      0.97      2190
          accuracy                           0.97      4388
         macro avg       0.97      0.97      0.97      4388
      weighted avg       0.97      0.97      0.97      4388
5. Decision Tree Classifier
Accuracy score: 90.8158614402917
Classification Report:
                    precision    recall  f1-score   support
    Phishing Email       0.89      0.93      0.91      2198
        Safe Email       0.93      0.88      0.91      2190
          accuracy                           0.91      4388
         macro avg       0.91      0.91      0.91      4388
      weighted avg       0.91      0.91      0.91      4388
6. NeuralNetwork_lbfgs
Accuracy score: 97.19690063810393
Confusion matrix:
    [[2164   34]
     [  89 2101]]
Classification Report:
                    precision    recall  f1-score   support
    Phishing Email       0.96      0.98      0.97      2198
        Safe Email       0.98      0.96      0.97      2190
          accuracy                           0.97      4388
         macro avg       0.97      0.97      0.97      4388
      weighted avg       0.97      0.97      0.97      4388
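The accuracy and per-class figures for this model can be re-derived by hand from its confusion matrix. The sketch below uses only the four counts reported above. It assumes the matrix follows the usual scikit-learn layout (rows are true classes, columns are predicted classes, labels in sorted order). The helper name `summarize` is made up for illustration.

```python
# Counts copied from the reported confusion matrix.
labels = ["Phishing Email", "Safe Email"]
matrix = [
    [2164, 34],   # true Phishing Email: predicted phishing, predicted safe
    [89, 2101],   # true Safe Email:     predicted phishing, predicted safe
]


def summarize(matrix, labels):
    """Return overall accuracy and per-class precision, recall, F1, support."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        support = sum(matrix[i])                   # row sum: true members
        predicted = sum(row[i] for row in matrix)  # column sum: predictions
        precision = true_pos / predicted
        recall = true_pos / support
        f1 = 2 * precision * recall / (precision + recall)
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return correct / total, report


accuracy, report = summarize(matrix, labels)
print(f"Accuracy score: {accuracy * 100}")
for label, m in report.items():
    print(f"{label:>14}  {m['precision']:.2f}  {m['recall']:.2f}  "
          f"{m['f1']:.2f}  {m['support']}")
```

Read this way, the 34 off-diagonal entries in the first row are phishing emails classified as safe, and the 89 in the second row are safe emails flagged as phishing.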