Unpacking Human Behavior Online: Insights from the YT-30M Dataset

Tuesday 25 February 2025


A massive dataset of YouTube comments has been released, offering a unique window into human behavior online. The dataset, called YT-30M, contains over 32 million comments posted by users in more than 50 languages. Each comment is associated with its corresponding video and channel, allowing researchers to analyze the relationship between content and user engagement.


One of the most striking features of the dataset is its sheer size and diversity. The comments span a wide range of categories, from music and entertainment to education and non-profit organizations. This makes it an ideal resource for studying how different types of content affect user behavior and sentiment.


The dataset also records the number of upvotes each comment received, which can serve as a measure of engagement with specific videos and channels. Comments in categories such as music, news, and people tend to receive more upvotes than those in other categories, which may indicate that users engage more readily with content they find entertaining or informative.
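For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of averaging upvotes per category. The sample records and the field names `category`, `upvotes`, and `text` are assumptions made for illustration; they are not the dataset's documented schema, and the numbers are invented rather than taken from YT-30M.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records. The field names are assumptions for
# illustration only, and the values are made up.
comments = [
    {"category": "Music", "upvotes": 12, "text": "Great song!"},
    {"category": "Music", "upvotes": 4, "text": "On repeat all day"},
    {"category": "Education", "upvotes": 1, "text": "Thanks, this helped."},
    {"category": "News", "upvotes": 7, "text": "Interesting report."},
]


def mean_upvotes_by_category(records):
    """Return a dict mapping each category to its mean upvote count."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["category"]].append(record["upvotes"])
    return {category: mean(values) for category, values in grouped.items()}


averages = mean_upvotes_by_category(comments)

# Print categories from highest to lowest average upvotes.
for category, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {avg:.1f}")
```

A mean is easily skewed by a handful of viral comments, so a median or a full distribution may give a fairer picture on real data.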


Sentiment analysis of the comments shows a consistent pattern: while many comments are neutral or positive, every category also contains a significant number of negative ones. This suggests that online discourse can be just as divisive and contentious as offline conversation.


Comment length also varies widely, from brief one-liners to lengthy discussions. Categories such as news and non-profit organizations tend to attract longer comments, which may indicate that users discuss these types of content in greater depth.
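A comparison of comment length across categories could be sketched in Python as follows. The field names `category` and `text` and the sample comments are assumptions for illustration, not the dataset's documented schema. Length is measured in characters because splitting on whitespace does not give meaningful word counts for every language in a multilingual collection.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample records. The field names and texts are made up
# for illustration only.
sample_comments = [
    {"category": "News", "text": "I think the report misses important context here."},
    {"category": "News", "text": "Agreed."},
    {"category": "Music", "text": "Love it"},
    {"category": "Music", "text": "Nice"},
]


def median_length_by_category(records):
    """Return a dict mapping each category to its median comment length in characters."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["category"]].append(len(record["text"]))
    return {category: median(lengths) for category, lengths in grouped.items()}


lengths = median_length_by_category(sample_comments)
for category, value in lengths.items():
    print(f"{category}: median length {value} characters")
```

The median is used here instead of the mean so that a few very long comments do not dominate the result for a category.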


Overall, the YT-30M dataset offers a unique opportunity for researchers to study human behavior online. By analyzing the comments and their associated metadata, scientists can gain insights into how people interact with each other and with content on YouTube. This could have significant implications for fields such as social network analysis, natural language processing, and online community development.


The release of this dataset is also an important step towards increasing transparency and accountability in online discourse. By making it possible to analyze and understand the opinions and behaviors of users online, researchers can help identify and address issues such as harassment and misinformation.


As researchers delve deeper into the YT-30M dataset, they may uncover new patterns and trends that challenge our understanding of human behavior online. But for now, this massive dataset offers a fascinating glimpse into the complex and ever-changing landscape of online communication.


Cite this article: “Unpacking Human Behavior Online: Insights from the YT-30M Dataset”, The Science Archive, 2025.


YouTube, Comments, Human Behavior, Online Discourse, Sentiment Analysis, Engagement, Metadata, Social Network Analysis, Natural Language Processing, Online Community Development


Reference: Hridoy Sankar Dutta, “YT-30M: A multi-lingual multi-category dataset of YouTube comments” (2024).

