Clustering of YouTube Viewer Data Based on Preferences using Leiden Algorithm

Authors

  • Erlin Windia Ambarsari, Universitas Indraprasta PGRI, Jakarta
  • Aulia Paramita, Universitas Indraprasta PGRI, Jakarta
  • Desyanti, Sekolah Tinggi Teknologi Dumai, Riau

Keywords:

YouTube Viewer Engagement; Leiden Algorithm; Clustering; Viewer Behavior; Content Strategy

Abstract

This study analyzes YouTube viewer engagement patterns by applying the Leiden algorithm to cluster data on user interactions, such as likes, dislikes, and subscription behavior, in relation to video duration. The dataset comes from the YouTube content creator @ArmanVesona and comprises 237 instances with ten features: Shares, Comments Added, Dislikes, Likes, Subscribers Lost, Subscribers Gained, Views, Watch Time (hours), Impressions, and Click-Through Rate (%). The method begins with data cleaning to ensure completeness, followed by selection of relevant features and z-score normalization to equalize their contributions. A similarity graph is then constructed using cosine similarity, with instances as nodes and their pairwise similarities as edges. The Leiden algorithm is applied to this graph to optimize modularity and extract clusters, and the resulting cluster labels are merged back into the original dataset for analysis. Dimensionality reduction with PCA supports cluster visualization, while statistical summaries and distribution plots describe the characteristics of each cluster. The analysis reveals two distinct clusters: Cluster 0, characterized by lower engagement and a stable audience, and Cluster 1, which exhibits higher engagement but also higher subscriber churn. The findings indicate that the Leiden algorithm is effective at detecting well-connected communities, and they provide insights into viewer behavior that can support improved content strategies and targeted marketing.
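The abstract does not give implementation details, so the following is only a minimal sketch of the preprocessing and graph-construction steps it names: z-score normalization, a cosine-similarity graph, and the modularity score that the Leiden algorithm optimizes. The six toy rows, the three-column layout, the `threshold` parameter, and all function names are assumptions made for illustration; they are not taken from the study's data or code. The `modularity` function only scores a given partition. It does not perform the Leiden search itself, which in practice would come from a library implementation such as the `leidenalg` package.

```python
import math

def zscore_columns(rows):
    """Standardize each column to mean 0 and population standard deviation 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def similarity_graph(rows, threshold=0.0):
    """Return weighted undirected edges {(i, j): w} for pairs with cosine > threshold.

    The threshold is an assumed design choice; the abstract does not say
    how edges were selected.
    """
    edges = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            w = cosine(rows[i], rows[j])
            if w > threshold:
                edges[(i, j)] = w
    return edges

def modularity(edges, labels):
    """Modularity Q of a partition: sum over communities of L_c/m - (d_c/(2m))^2."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    strength, internal = {}, {}
    for (i, j), w in edges.items():
        strength[i] = strength.get(i, 0.0) + w
        strength[j] = strength.get(j, 0.0) + w
        if labels[i] == labels[j]:
            internal[labels[i]] = internal.get(labels[i], 0.0) + w
    community_strength = {}
    for node, s in strength.items():
        c = labels[node]
        community_strength[c] = community_strength.get(c, 0.0) + s
    return sum(
        internal.get(c, 0.0) / m - (d / (2 * m)) ** 2
        for c, d in community_strength.items()
    )

# Hypothetical toy data: columns stand in for Likes, Dislikes, Subscribers Gained.
rows = [
    [10, 1, 2], [12, 2, 1], [11, 1, 3],        # lower-engagement instances
    [90, 9, 20], [95, 8, 22], [88, 10, 19],    # higher-engagement instances
]
z = zscore_columns(rows)
edges = similarity_graph(z)

q_grouped = modularity(edges, [0, 0, 0, 1, 1, 1])  # partition matching the two groups
q_mixed = modularity(edges, [0, 1, 0, 1, 0, 1])    # partition that mixes the groups
print(sorted(edges), q_grouped, q_mixed)
```

On this toy input the graph splits into two triangles, and the partition that follows them scores higher than the mixed one. A modularity-optimizing method such as Leiden searches for exactly that kind of higher-scoring partition.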


Published

2024-06-29

Section

Articles