Comparison of the Naïve Bayes and Logistic Regression Algorithms for Gender-Based Classification of Tokopedia Sports Products
Abstract
This study examines the application of the Naïve Bayes algorithm and logistic regression to data classification on the Tokopedia e-commerce platform using PySpark. Because the data are large and varied, PySpark is used to process and analyze them efficiently. Naïve Bayes, known for its simplicity and speed in classification, is applied to categorize products based on their descriptions and other features. Logistic regression is additionally applied to predict users' propensity to make purchases based on their activity history. The results show that both algorithms perform well on the classification tasks, with logistic regression slightly ahead at a prediction accuracy of 0.9967. The study offers insight into how machine learning methods can be implemented to improve the efficiency and effectiveness of data classification in the e-commerce industry.
Keywords: Tokopedia, PySpark, Naïve Bayes, Logistic Regression, Big Data