Fake News Detection Model using Python

Machine Learning


In the 21st century, as the use of technology and the internet has grown, the impact of fake news has become widespread. Fake news refers to news items that may be hoaxes and are generally spread through social media, often with the aim of increasing the financial profits of the outlets that publish them.

Many people think that distinguishing fake news is tough or near impossible and don't bother with the idea of doing something about it. But Twitter, out of all social media platforms, has now started flagging some posts as fake. They make use of a Machine Learning model similar to what we will train today.


The dataset we will use for this is a CSV file named 'news.csv'. It has three columns, namely Title, Text and Label, and 6,335 examples (rows).

Necessary Installations:

To install the libraries needed for this project:

  1. First open Terminal/Command Line.

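  2. Run the following command. Note that the library is published on PyPI as scikit-learn, although it is imported in Python as sklearn.

pip3 install pandas scikit-learn

  3. Wait for the installation to complete.

  4. To verify that the installation succeeded, or to view information about the installed packages, run the following command

pip3 show pandas scikit-learn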
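
As an additional check from inside Python, the short sketch below imports both libraries and prints their versions. The `versions` dictionary is only an illustrative name and is not part of the project code.

```python
# Import the two libraries under the names used in the rest of this article.
import pandas as pd
import sklearn  # installed from PyPI as "scikit-learn"

# Both packages expose their version as a plain string.
versions = {"pandas": pd.__version__, "scikit-learn": sklearn.__version__}
for name, version in versions.items():
    print(f"{name}: {version}")
```

If either import fails, the corresponding package was not installed into the Python environment you are running.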

Training the model:

  1. Make the necessary imports.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

  2. Read the 'news.csv' dataset into a pandas DataFrame.

news = pd.read_csv('news.csv')
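
Before going further it can help to look at what was loaded. The sketch below is only an illustration: it reads a hypothetical two-row stand-in for 'news.csv' from memory (the rows and the name `news_sample` are made up) and assumes the column names Title, Text and Label used in this article.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for 'news.csv'; the real file has 6,335 rows.
sample_csv = io.StringIO(
    "Title,Text,Label\n"
    "Headline A,Body of the first article,FAKE\n"
    "Headline B,Body of the second article,REAL\n"
)
news_sample = pd.read_csv(sample_csv)

print(news_sample.shape)                     # (rows, columns)
print(list(news_sample.columns))             # column names
print(news_sample['Label'].value_counts())   # number of examples per class
```

Running the same three print statements on the real `news` DataFrame shows whether the column names match the ones used in the following steps.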

  3. Get the labels into a separate pandas Series named 'labels'.

labels = news['Label']

  4. Now, we will separate the dataset into training and test data. Evaluating on data the model never saw during training lets us measure its accuracy and check that it is not overfitting.

The 'random_state' argument sets the seed used to shuffle the data, which makes the split reproducible across runs.

X_train, X_test, y_train, y_test = train_test_split(news['Text'], labels, random_state=0)
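
To see what 'random_state' buys us, here is a minimal sketch on eight made-up documents (`toy_texts` and `toy_labels` are illustrative names, not part of the dataset). It shows that two calls with the same seed give the same partition and that, with the default settings, a quarter of the rows is held out for testing.

```python
from sklearn.model_selection import train_test_split

# Eight made-up documents and labels, purely for illustration.
toy_texts = [f"document {i}" for i in range(8)]
toy_labels = ['FAKE', 'REAL'] * 4

# Same seed, same split: both calls return identical partitions.
first = train_test_split(toy_texts, toy_labels, random_state=0)
second = train_test_split(toy_texts, toy_labels, random_state=0)
print(first[0] == second[0])   # True

# With the default test_size, a quarter of the rows is held out.
toy_X_train, toy_X_test, toy_y_train, toy_y_test = first
print(len(toy_X_train), len(toy_X_test))   # 6 2
```

If the classes are imbalanced, passing `stratify=toy_labels` to `train_test_split` is one option for keeping the FAKE/REAL ratio the same in both parts.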

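  5. Transform the training and test data using TfidfVectorizer. TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features.

TF stands for Term Frequency, which is the number of times a word appears in a document. IDF stands for Inverse Document Frequency, which measures how rare a word is across all documents in the collection, so words that appear in many documents receive a lower weight. In the call below, stop_words='english' drops common English words and max_df=0.7 ignores terms that occur in more than 70% of the documents.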

vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
X_train_transformed = vectorizer.fit_transform(X_train)
X_test_transformed = vectorizer.transform(X_test)
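
To make the TF-IDF weighting concrete, the sketch below recomputes two values by hand on three made-up documents and compares them with what TfidfVectorizer produces. It assumes scikit-learn's default smoothed IDF, ln((1 + n) / (1 + df)) + 1, and turns off the default L2 normalisation with norm=None so the raw products are easy to check. The helper `manual_tfidf` is hypothetical and written only for this illustration.

```python
import math
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up documents, purely for illustration.
docs = ["aliens endorse candidate", "senate passes budget", "senate debates aliens"]

# norm=None keeps the raw tf * idf products so they are easy to check by hand.
toy_vectorizer = TfidfVectorizer(norm=None)
matrix = toy_vectorizer.fit_transform(docs)
vocab = toy_vectorizer.vocabulary_

def manual_tfidf(term, doc):
    tf = doc.split().count(term)                    # term frequency in this document
    df = sum(term in d.split() for d in docs)       # documents containing the term
    idf = math.log((1 + len(docs)) / (1 + df)) + 1  # scikit-learn's smoothed IDF
    return tf * idf

# "budget" occurs in one document, "senate" in two, so "budget" weighs more.
print(manual_tfidf("budget", docs[1]), matrix[1, vocab["budget"]])
print(manual_tfidf("senate", docs[1]), matrix[1, vocab["senate"]])
```

The vectorizer used in this article keeps the default normalisation, so its values are scaled versions of these products, but the ranking of rare versus common words within a document follows the same idea.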

Note that we fit the vectorizer on the training data only and then use it to transform both the training and test data. This prevents data leakage: the vocabulary and IDF weights are learned without ever seeing the test set.
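
The difference between fitting and transforming can be seen on a toy example. In the sketch below (all documents are made up), the vocabulary is learned from the training documents only, so words that appear only in the test document are simply ignored when it is transformed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up training and test documents, purely for illustration.
train_docs = ["senate passes budget", "aliens endorse candidate"]
test_docs = ["senate bans unicorns"]

toy_vectorizer = TfidfVectorizer()
train_matrix = toy_vectorizer.fit_transform(train_docs)  # learns vocabulary and IDF
test_matrix = toy_vectorizer.transform(test_docs)        # reuses them unchanged

# "unicorns" and "bans" were never seen during fitting, so they get no column.
print("unicorns" in toy_vectorizer.vocabulary_)        # False
print(train_matrix.shape[1] == test_matrix.shape[1])   # True: same feature columns
print(test_matrix.nnz)                                 # 1: only "senate" is represented
```

Calling fit_transform on the test data instead would build a different vocabulary, and the classifier's learned weights would no longer line up with the columns.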

  6. Finally, we will fit a PassiveAggressiveClassifier model on our transformed training data.

clf = PassiveAggressiveClassifier()
clf.fit(X_train_transformed, y_train)

Predicting and Testing our model:

  1. We will predict the labels for our test dataset using the predict method.

predictions = clf.predict(X_test_transformed)
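
The same predict call works for articles that are not in the dataset, as long as the text goes through the same fitted vectorizer first. One way to guarantee that, shown here as an optional variation rather than the approach used in this article, is to wrap both steps in a scikit-learn pipeline. The training sentences and the names `toy_model` and `new_headlines` are made up for illustration, and four examples are far too few for a meaningful model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Made-up training examples, purely for illustration.
toy_texts = [
    "aliens secretly control the senate",
    "miracle pill cures every disease overnight",
    "senate passes annual budget bill",
    "city council approves new bus route",
]
toy_labels = ['FAKE', 'FAKE', 'REAL', 'REAL']

# The pipeline vectorizes raw text and then classifies it in a single call.
toy_model = make_pipeline(
    TfidfVectorizer(stop_words='english'),
    PassiveAggressiveClassifier(random_state=0),
)
toy_model.fit(toy_texts, toy_labels)

# predict() takes raw strings; the fitted vectorizer is applied internally.
new_headlines = ["council approves budget for new bus route"]
toy_prediction = toy_model.predict(new_headlines)
print(toy_prediction)
```

With the separate `vectorizer` and `clf` objects from the steps above, the equivalent would be to call `vectorizer.transform` on the new text and pass the result to `clf.predict`.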

  2. Now, we can test the accuracy of our model.

accuracy_score(y_test, predictions)

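Our model gives approximately 93% accuracy. The exact figure can vary slightly from run to run, because PassiveAggressiveClassifier shuffles the training data and we did not set its random_state.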
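
Accuracy is simply the fraction of predictions that match the true labels. A tiny worked example with made-up labels:

```python
from sklearn.metrics import accuracy_score

# Made-up labels, purely for illustration.
true_labels = ['FAKE', 'REAL', 'REAL', 'FAKE']
predicted = ['FAKE', 'REAL', 'FAKE', 'FAKE']

# Three of the four predictions match, so the score is 3 / 4.
score = accuracy_score(true_labels, predicted)
print(score)  # 0.75
```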

  3. We can also get more details about where our model went wrong using the confusion matrix.

confusion_matrix(y_test, predictions, labels=['FAKE','REAL'])
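
To read the result: rows correspond to the true labels and columns to the predicted labels, both in the order given by the 'labels' argument. The sketch below uses five made-up labels so the counts can be checked by hand.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels, purely for illustration.
true_labels = ['FAKE', 'FAKE', 'REAL', 'REAL', 'REAL']
predicted = ['FAKE', 'REAL', 'REAL', 'REAL', 'FAKE']

matrix = confusion_matrix(true_labels, predicted, labels=['FAKE', 'REAL'])
print(matrix)
# [[1 1]
#  [1 2]]

# Row 0: articles that are truly FAKE; row 1: articles that are truly REAL.
fake_as_fake, fake_as_real = matrix[0]
real_as_fake, real_as_real = matrix[1]
print(fake_as_real)  # 1 fake article was missed (predicted REAL)
print(real_as_fake)  # 1 real article was wrongly flagged as FAKE
```

The diagonal holds the correct predictions, so the accuracy is the sum of the diagonal divided by the total number of examples.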
