Problem Statement
In today’s digital world, millions of social media posts are published every day, expressing a wide range of opinions and emotions. Understanding public sentiment can help businesses, governments, and individuals make informed decisions, but analyzing this volume of posts manually is impractical. This project develops a sentiment analysis system that uses machine learning to classify social media posts as positive, negative, or neutral.
Users/Stakeholders
Objectives
Features
AI Used
Dataset
Solution
This Python-based system reads labeled social media posts from a CSV file, removes English stop words with NLTK, converts each post into TF-IDF features, and classifies it with a multinomial Naive Bayes model from scikit-learn. The application can be extended to integrate with live social media feeds for real-time sentiment tracking.
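As a rough illustration of the real-time extension, the sketch below keeps a rolling window of the most recently predicted labels and reports the share of each sentiment. It assumes posts arrive in batches and that `predict` is any function mapping a list of texts to labels, for example the trained classifier from the script below combined with its fitted vectorizer and TF-IDF transform. The class name `RollingSentiment`, the window size, and the keyword-based `toy_predict` stand-in are hypothetical and exist only to make the snippet runnable; `toy_predict` is not the Naive Bayes model.

```python
from collections import Counter, deque
from typing import Callable, Dict, Iterable, List


class RollingSentiment:
    """Hypothetical helper: tracks sentiment shares over the last `window` posts."""

    def __init__(self, predict: Callable[[List[str]], Iterable[str]], window: int = 1000):
        self.predict = predict
        # A deque with maxlen discards the oldest label once the window is full
        self.labels: deque = deque(maxlen=window)

    def update(self, posts: List[str]) -> None:
        # Classify a new batch of posts and append their labels to the window
        self.labels.extend(self.predict(posts))

    def shares(self) -> Dict[str, float]:
        # Fraction of posts in the current window assigned to each label
        total = len(self.labels)
        if total == 0:
            return {}
        return {label: n / total for label, n in Counter(self.labels).items()}


def toy_predict(posts: List[str]) -> List[str]:
    # Stand-in for the trained model: a crude keyword rule, NOT the Naive Bayes classifier
    out = []
    for p in posts:
        words = set(p.lower().split())
        if words & {"love", "great"}:
            out.append("positive")
        elif words & {"hate", "awful"}:
            out.append("negative")
        else:
            out.append("neutral")
    return out


tracker = RollingSentiment(toy_predict, window=3)
tracker.update(["I love this phone", "Awful battery life"])
tracker.update(["Shipping took two days", "Great camera"])
print(tracker.shares())
```

With the trained model, `predict` could be `lambda posts: clf.predict(tfidf_transformer.transform(vectorizer.transform(posts)))`, ideally after applying the same stop-word filtering used during training.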
Python Code:
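import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
import nltk
from nltk.corpus import stopwords
# Load the labeled posts (replace with the path to your dataset)
df = pd.read_csv('social_media_posts.csv')
# Drop rows with a missing post or label so the text processing below doesn't fail on NaN
df = df.dropna(subset=['text', 'sentiment'])
# Download the NLTK stop-word list (only needed once per machine)
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
# Keep negations: dropping words like "not" or "no" can flip the sentiment of a post
stop_words -= {'not', 'no', 'nor'}
# Remove stop words from each post
df['text'] = df['text'].astype(str).apply(lambda x: ' '.join(word for word in x.split() if word.lower() not in stop_words))
X = df['text']
y = df['sentiment']  # Assumes labels are 'positive', 'negative', or 'neutral'
# Hold out 20% of the posts for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit word counts and TF-IDF weights on the training set only
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
# Train a multinomial Naive Bayes classifier on the TF-IDF features
clf = MultinomialNB().fit(X_train_tfidf, y_train)
# Reuse the fitted transforms on the test set (transform, not fit_transform, to avoid leakage)
X_test_counts = vectorizer.transform(X_test)
X_test_tfidf = tfidf_transformer.transform(X_test_counts)
y_pred = clf.predict(X_test_tfidf)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred))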
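The two-step `CountVectorizer` + `TfidfTransformer` setup above can also be written with scikit-learn's `TfidfVectorizer`, which the scikit-learn documentation describes as equivalent to `CountVectorizer` followed by `TfidfTransformer`. Wrapping it in a `Pipeline` means the same transforms are applied automatically at prediction time. The sketch below uses a handful of made-up posts (hypothetical, not from the project dataset) only to show that both routes produce the same features and that the fitted pipeline can classify new raw text.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy posts and labels, only to make the example self-contained
posts = [
    "love this new phone",
    "great service and friendly staff",
    "terrible update, app keeps crashing",
    "worst customer support ever",
    "store opens at nine tomorrow",
    "package arrived on tuesday",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Route 1: raw counts, then TF-IDF weighting (as in the script above)
counts = CountVectorizer().fit_transform(posts)
two_step = TfidfTransformer().fit_transform(counts)

# Route 2: a single TfidfVectorizer with default settings
one_step = TfidfVectorizer().fit_transform(posts)
print("Same features:", np.allclose(two_step.toarray(), one_step.toarray()))

# A pipeline bundles vectorizer and classifier, so raw text goes straight in
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(posts, labels)
print(model.predict(["the app keeps crashing", "love the friendly staff"]))
```

In the full script, `model.fit(X_train, y_train)` followed by `model.predict(X_test)` would replace the separate vectorizer, transformer, and classifier calls.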
Steps to run this code:
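1. Install the required libraries: pip install pandas scikit-learn nltk
2. Replace 'social_media_posts.csv' with the path to your dataset.
3. Ensure your dataset has a column for the post text (text) and another for the sentiment labels (sentiment).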
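If you do not yet have a dataset, a small CSV with the two expected columns can be generated for a quick smoke test. The rows below are made-up examples, and the helper name `write_sample_csv` is hypothetical; the default file name matches the placeholder path in the script, so running it overwrites any existing file with that name. A handful of rows only checks that the script runs end to end; it says nothing about real accuracy.

```python
import csv

# Hypothetical example rows; replace with real labeled posts for actual use
SAMPLE_ROWS = [
    ("I love the new update", "positive"),
    ("Great customer service today", "positive"),
    ("This is the best coffee in town", "positive"),
    ("The app keeps crashing", "negative"),
    ("Worst delivery experience ever", "negative"),
    ("Really disappointed with the battery", "negative"),
    ("The store opens at nine", "neutral"),
    ("Package arrived on Tuesday", "neutral"),
    ("New version released this morning", "neutral"),
]


def write_sample_csv(path: str = "social_media_posts.csv") -> None:
    # Write a header with the column names the script expects, then the sample rows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "sentiment"])
        writer.writerows(SAMPLE_ROWS)


if __name__ == "__main__":
    write_sample_csv()
```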