This is the full simulation code package used to generate the results presented in the paper “Comparative Analysis of Feature Selection Techniques for Malicious Website Detection in SMOTE Balanced Data”.
DOI: https://doi.org/10.46470/03d8ffbd.993cf635
Article link: https://rs-ojict.pubpub.org/pub/ci8qmhls
====== Work Summary ======
Advances in network technology have led to an exponential rise in the number of internet users worldwide. This growth in internet usage has been accompanied by an increase in both the number of malicious websites and the number of reported cybercrimes. It has therefore become critical to devise an intelligent solution that can detect malicious websites and can be deployed in real-time systems. In our paper, we compare several feature selection techniques with the aim of building a predictive model that is both accurate and time-efficient. For every feature selection technique examined, at least 70% of the selected features are categorical. Since the end goal is real-time deployment, this matters: categorical features are far cheaper to process and store than text- or image-based features. The original data were class-imbalanced, which we addressed with the Synthetic Minority Oversampling Technique (SMOTE).

Our proposed model outperformed existing work in the literature across several evaluation metrics. The results indicate that embedded feature selection was the best technique in terms of model accuracy, while filter-based feature selection may be preferable when building a low-latency system, at some cost in accuracy.
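To make two of the ideas in the summary concrete (SMOTE balancing and filter-based feature selection), here is a minimal sketch in plain Python using only the standard library. It is not taken from this package: the function names `smote`, `balance_with_smote`, and `filter_select` are hypothetical. The SMOTE part follows the standard interpolation scheme and assumes a binary label. The filter score (absolute Pearson correlation with a 0/1 label) is an illustrative stand-in, not necessarily the criterion used in the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic new minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, minority[j]), j) for j in range(len(minority)) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def balance_with_smote(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class of a binary dataset to match the majority."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)  # assumes exactly two classes
    new = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)


def filter_select(X, y, top_k):
    """Filter-style selection: keep the top_k columns ranked by |Pearson r| with y.

    y must be numeric (e.g. 0/1 labels). Columns with zero variance score 0.
    """
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((t - mean_y) ** 2 for t in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_c = sum(col) / n
        cov = sum((c - mean_c) * (t - mean_y) for c, t in zip(col, y))
        var_c = sum((c - mean_c) ** 2 for c in col)
        r = cov / math.sqrt(var_c * var_y) if var_c > 0 and var_y > 0 else 0.0
        scores.append((abs(r), j))
    # Highest scores first; return the chosen column indices in ascending order.
    return sorted(j for _, j in sorted(scores, reverse=True)[:top_k])
```

A filter method like `filter_select` scores each feature independently of any model, which is why it is cheap. An embedded method instead derives feature importance while training the classifier itself. In practice, oversampling should be applied only to the training split, after the train/test split, so that synthetic points do not leak into evaluation. Maintained implementations of the same ideas exist, for example `SMOTE` in imbalanced-learn and `SelectKBest` / `SelectFromModel` in scikit-learn.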