In the 21st century, as the use of technology and the internet has grown, the impact of fake news has become widespread. Fake news refers to fabricated or misleading stories, including hoaxes, that are presented as news and spread largely through social media. It is often intended to increase the financial profits of the outlets that publish it.
Many people think that spotting fake news is difficult or nearly impossible, and so do nothing about it. Yet Twitter, among social media platforms, has started flagging some posts as fake. It reportedly makes use of a machine learning model similar to the one we will train today.
The dataset we will use is a CSV file named 'news.csv'. It has three columns, namely Title, Text and Label, and 6,335 rows (examples).
To install the libraries needed for this project:
First, open a terminal (command line).
Run the following command:
pip3 install pandas scikit-learn
The installation should now be complete.
To verify the installation, or to view information about the installed packages, run the following command:
pip3 show pandas scikit-learn
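The installation can also be checked from Python itself. The snippet below is a minimal sketch: it only imports the two libraries and prints their versions. Note that scikit-learn is imported under the name `sklearn`.

```python
import pandas as pd
import sklearn

# An ImportError on either import above means the package is not installed
# in the Python environment that is running this script.
print("pandas", pd.__version__)
print("scikit-learn", sklearn.__version__)
```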
Training the model:
Make necessary imports
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
Read the 'news.csv' dataset into a Pandas DataFrame.
news = pd.read_csv('news.csv')
Store the labels in a separate pandas Series named 'labels'.
labels = news['Label']
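Before going further, it can help to look at the size of the data and the balance between the two classes. The sketch below uses a small hypothetical DataFrame, `news_demo`, as a stand-in for 'news.csv' (the rows are invented; only the column names follow this article). The same calls should work on the real `news` DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for news.csv, using the column names from this article.
news_demo = pd.DataFrame({
    "Title": ["Headline A", "Headline B", "Headline C", "Headline D"],
    "Text": ["body a", "body b", "body c", "body d"],
    "Label": ["FAKE", "REAL", "REAL", "FAKE"],
})

# (number of rows, number of columns)
print(news_demo.shape)

# How many rows carry each label; a heavy imbalance would make accuracy misleading.
label_counts = news_demo["Label"].value_counts()
print(label_counts)

# Rows with missing article text would break the vectorizer later on.
missing_text = news_demo["Text"].isna().sum()
print("rows with missing text:", missing_text)
```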
Now, we will split the dataset into training and test data. This lets us measure the accuracy of our model on examples it has not seen during training, which is how we check that it is not overfitting.
The 'random_state' argument sets the random seed, so the same split is produced on every run and the results are reproducible.
X_train, X_test, y_train, y_test = train_test_split(news['Text'], labels, random_state=0)
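The effect of 'random_state' can be seen on a small example. The sketch below uses invented placeholder documents (`demo_texts`, `demo_labels`) rather than the real dataset. It also shows that, with no `test_size` given, `train_test_split` holds out 25% of the samples for testing.

```python
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: 20 short "documents" with alternating labels.
demo_texts = [f"document {i}" for i in range(20)]
demo_labels = ["FAKE" if i % 2 == 0 else "REAL" for i in range(20)]

# Two calls with the same seed produce exactly the same split.
a_train, a_test, _, _ = train_test_split(demo_texts, demo_labels, random_state=0)
b_train, b_test, _, _ = train_test_split(demo_texts, demo_labels, random_state=0)
same_split = a_test == b_test
print("identical split:", same_split)

# The default test_size of 0.25 gives 15 training and 5 test samples here.
print(len(a_train), len(a_test))
```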
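Transform the training and test data using TfidfVectorizer. TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features.
TF stands for Term Frequency, which is the number of times a word appears in a document. IDF stands for Inverse Document Frequency, which measures how rare a word is across all the documents: a word that appears in many documents gets a low weight, while a word that appears in only a few gets a high one. In the code that follows, stop_words='english' drops common English words, and max_df=0.7 ignores terms that appear in more than 70% of the documents.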
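To make the IDF part concrete, the sketch below fits a TfidfVectorizer with default settings on three invented sentences and prints the learned IDF weights. With its defaults, scikit-learn computes a smoothed IDF, ln((1 + n) / (1 + df)) + 1, where n is the number of documents and df is the number of documents containing the term. The corpus and variable names are illustrative only.

```python
import math
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical three-document corpus.
docs = ["the cat sat", "the dog sat", "the bird flew"]

vec = TfidfVectorizer()  # default settings: no stop words, smoothed IDF
vec.fit(docs)

# Map each term to its learned IDF weight.
idf = {term: vec.idf_[index] for term, index in vec.vocabulary_.items()}

# "the" is in all 3 documents, "sat" in 2, "cat" in 1,
# so the weights come out as 1.0, about 1.288 and about 1.693.
for term in ("the", "sat", "cat"):
    print(term, round(idf[term], 3))

# The same value for "cat", computed by hand from the smoothed formula.
n_docs = len(docs)
manual_idf_cat = math.log((1 + n_docs) / (1 + 1)) + 1
print(round(manual_idf_cat, 3))
```

The rarer a term is across the documents, the larger its weight, which is why "cat" outranks "sat" and "the".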
vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
X_train_transformed = vectorizer.fit_transform(X_train)
X_test_transformed = vectorizer.transform(X_test)
Note that we fit the vectorizer on the training data only and then use it to transform both the training and test data. This prevents data leakage: no information from the test set influences the features the model is trained on.
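A small example shows what fitting on the training data only means in practice. In the sketch below (invented sentences, illustrative names), the vocabulary is learned from the training documents alone, so words that occur only in the test document are simply ignored when it is transformed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical documents, invented for illustration.
train_docs = ["markets rally on earnings", "senate passes budget bill"]
test_docs = ["aliens endorse budget bill"]

vec = TfidfVectorizer()
train_matrix = vec.fit_transform(train_docs)  # learns vocabulary and IDF from training data
test_matrix = vec.transform(test_docs)        # reuses them; nothing new is learned

# "aliens" never appeared in training, so it has no column.
print("aliens" in vec.vocabulary_)

# Both matrices share the same columns: one per training-vocabulary term.
print(train_matrix.shape, test_matrix.shape)

# Only "budget" and "bill" from the test document are known terms.
print(test_matrix.nnz)
```

Calling `fit_transform` on the test data instead would rebuild the vocabulary and IDF weights from the test set, which is exactly the leakage the note above warns against.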
Finally, we will fit a PassiveAggressiveClassifier model on our training data.
clf = PassiveAggressiveClassifier()
clf.fit(X_train_transformed, y_train)
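A passive-aggressive classifier is an online learner: it leaves its weights unchanged when an example is classified correctly with enough margin (passive) and corrects them when it is not (aggressive). As an alternative way of organising the same steps, the vectorizer and classifier can be chained with scikit-learn's `make_pipeline`, so that fitting and predicting each take one call. The sketch below is not the article's code; the training sentences are invented, and `max_iter` and `random_state` are assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, invented for illustration only.
train_texts = [
    "miracle pill cures every disease overnight",
    "celebrity secretly replaced by a clone",
    "city council approves new transit budget",
    "central bank holds interest rates steady",
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]

# The pipeline fits the vectorizer and the classifier together on training data,
# then applies the fitted vectorizer automatically at prediction time.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    PassiveAggressiveClassifier(max_iter=1000, random_state=0),
)
model.fit(train_texts, train_labels)

predicted = model.predict(["council approves budget for new clinic"])
print(predicted[0])
```

Because the pipeline only ever fits the vectorizer inside `fit`, it also makes it harder to leak test data into the features by accident.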
Predicting and Testing our model:
We will predict the labels for our test dataset using the predict method.
predictions = clf.predict(X_test_transformed)
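Now, we can test the accuracy of our model.
accuracy_score(y_test, predictions)
Our model gives approximately 93% accuracy. The exact figure can vary slightly from run to run, because the classifier shuffles the training data and no random seed is set for it.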
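Accuracy is simply the fraction of predictions that match the true labels, which is what `accuracy_score` from the imports computes. The sketch below uses five invented labels and predictions (`y_true_demo`, `y_pred_demo`) so the arithmetic is easy to follow.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels and predictions; 4 of the 5 predictions are correct.
y_true_demo = ["FAKE", "REAL", "REAL", "FAKE", "REAL"]
y_pred_demo = ["FAKE", "REAL", "FAKE", "FAKE", "REAL"]

score = accuracy_score(y_true_demo, y_pred_demo)  # 4 / 5 = 0.8
print(f"{score:.0%}")  # prints 80%
```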
We can also get more details about where our model went wrong using the Confusion Matrix.
confusion_matrix(y_test, predictions, labels=['FAKE','REAL'])
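Reading the result takes a little care. In scikit-learn's confusion matrix, rows are the true labels and columns are the predicted labels, both in the order given by `labels`. The sketch below uses six invented examples to show which cell is which.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predictions, invented for illustration.
y_true_demo = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]
y_pred_demo = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE"]

cm = confusion_matrix(y_true_demo, y_pred_demo, labels=["FAKE", "REAL"])
print(cm)

# Rows are true labels, columns are predicted labels.
fake_as_fake, fake_as_real = cm[0]  # true FAKE: 2 caught, 1 missed
real_as_fake, real_as_real = cm[1]  # true REAL: 1 wrongly flagged, 2 correct
print("fake articles missed:", fake_as_real)
print("real articles wrongly flagged:", real_as_fake)
```

The off-diagonal cells are the errors: the top-right cell counts fake articles the model let through, and the bottom-left cell counts real articles it flagged as fake.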