MACHINE LEARNING BASED PHISHING WEBSITE DETECTION SYSTEM VIA CLOUD
Keywords:
Phishing website detection, cloud deployment, AWS EC2, Random Forest, XGBoost, machine learning, Web-Space-Kit, secure web application
Abstract
In this digital era, the risk of cyberattacks such as phishing has risen significantly. Phishing attacks
trick users into revealing sensitive information by disguising malicious websites as legitimate ones.
This project focuses on detecting phishing websites using a machine learning-based approach
hosted entirely in the cloud. The system is deployed on an AWS EC2 instance and integrated with
a custom domain through the Web-Space-Kit platform, providing a seamless and secure web
interface for real-time URL analysis. The dataset used in this study comprises 11,000 samples
with 33 features extracted from URLs, encompassing both structural and content-based attributes.
Five supervised machine learning algorithms were implemented and evaluated: Logistic
Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, and XGBoost.
Accuracy was used as the primary performance metric.
Experimental results showed that Logistic Regression achieved 93.42% accuracy, Decision Tree
achieved 92.15%, SVM reached 91.64%, Random Forest attained 97.82%, and XGBoost
achieved 96.99%. Random Forest was therefore the most accurate model, a result attributed to
its ability to handle complex feature interactions. The
system’s cloud-based deployment allows users to enter any URL via a secure HTTPS web portal,
instantly obtain a phishing or legitimate classification, view an explanation of the decision, and
monitor response times. This approach demonstrates how machine learning models, combined
with scalable cloud infrastructure, can effectively mitigate phishing risks and support a safer online
environment. Future enhancements could include integrating deep learning models, continuous
learning to detect new phishing patterns, and browser extension integration for real-time
protection.
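As an illustration of the structural URL attributes mentioned above, the following minimal Python sketch derives a handful of features from a single URL using only the standard library. The function name extract_url_features and the seven feature names are hypothetical choices made for this example; they are not the 33 features used in the study, and the content-based attributes are not covered.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few illustrative structural features for one URL.

    Assumption: these feature names are examples only and do not
    reproduce the 33-feature set described in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a commonly cited phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip,
        "has_at_symbol": "@" in url,
        "dot_count_in_host": host.count("."),
        "hyphen_count_in_host": host.count("-"),
        # Number of non-empty path segments, e.g. /a/b/c.php has depth 3.
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


if __name__ == "__main__":
    samples = (
        "https://www.example.com/login",
        "http://192.168.0.1/secure-update/index.php",
    )
    for sample in samples:
        print(sample, extract_url_features(sample))
```

A feature dictionary of this kind could be converted into a numeric vector and passed to any of the classifiers evaluated here; how the deployed system actually encodes its features is not specified in this abstract.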