
Predicting Crypto Price Shifts using Sentiment & Persuasion Analysis

  • Writer: Prasoon Prasoon
  • Oct 6, 2024
  • 2 min read

Updated: 2 days ago

A machine learning system designed to predict directional Bitcoin price movements by analysing sentiment and persuasive tone in real-time news and Reddit headlines. By combining FinBERT and GPT-4o feature extraction with rolling-window NLP engineering, the model outperformed published baselines by 12% in ROC-AUC.


The Problem

Cryptocurrency markets are notoriously volatile, often moving not just on fundamentals but on how news is framed and how persuasive or emotional it feels to the crowd. Traditional financial models miss these language cues entirely. We set out to build a system that could capture sentiment and persuasive tone in real time and use them to forecast price direction during volatile periods.


The Solution

We developed a machine learning pipeline to predict short-term Bitcoin price direction by combining market data with features extracted from Reddit and financial news headlines. This system simulated real-time signal generation and was evaluated on 20K+ Bitcoin trading records.


What set this apart was the dual-layer NLP approach:


  • Sentiment scores were extracted using FinBERT, capturing the market’s emotional tone.

  • Persuasion confidence scores were generated using GPT-4o, following SemEval 2023 Task 3, which defines 23 distinct persuasion strategies. Custom prompts were designed to evaluate persuasive intensity per headline.
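As a rough illustration of how a per-headline sentiment feature could be derived, the sketch below collapses three-class probabilities (positive, neutral, negative, the classes FinBERT-style models typically emit) into a single signed score. The post does not say how the FinBERT output was reduced to a score, so the "positive minus negative" rule and the function name `signed_sentiment` are assumptions made for illustration only.

```python
from typing import Dict


def signed_sentiment(probs: Dict[str, float]) -> float:
    """Collapse three-class probabilities into one score in [-1, 1].

    Assumption: the score is P(positive) - P(negative); the original
    project may have used a different reduction.
    """
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalise defensively in case the inputs do not sum exactly to 1.
    pos = probs.get("positive", 0.0) / total
    neg = probs.get("negative", 0.0) / total
    return pos - neg


# Hypothetical classifier output for a single headline.
example = {"positive": 0.70, "neutral": 0.20, "negative": 0.10}
score = signed_sentiment(example)
print(round(score, 2))
```

A signed score keeps the feature one-dimensional, which makes the later rolling and lag transformations straightforward. A persuasion confidence value per headline could be carried alongside it in the same way.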


To model how these signals influenced price over time, we used rolling windows and lag-based transformations, capturing the decay and momentum of sentiment.
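The rolling-window and lag idea can be sketched in a few lines. This is a minimal, dependency-free version that assumes the sentiment scores have already been aggregated into one value per time step; the window length of 3, the lag of 1, and the feature names are illustrative choices, not values taken from the project.

```python
from typing import List, Optional


def lag(values: List[float], k: int) -> List[Optional[float]]:
    """Shift a series back by k steps; the first k entries have no history."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return [None] * min(k, len(values)) + values[: max(len(values) - k, 0)]


def rolling_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean over the current value and the window - 1 before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical per-step sentiment scores.
sentiment = [0.2, -0.1, 0.4, 0.6, -0.3]
features = {
    "sent_lag1": lag(sentiment, 1),
    "sent_roll3": rolling_mean(sentiment, 3),
}
print(features)
```

Both transformations look only backwards in time, so a feature at step t never uses information from after t. With pandas, the same features would normally come from `Series.shift` and `Series.rolling(...).mean()`.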


A Random Forest classifier achieved the best performance, with 0.64 accuracy and 0.65 ROC-AUC, outperforming models from prior studies (Springer '21, MDPI '21) and all other tested algorithms, including MLP, SVM, LSTM, and XGBoost.
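For readers unfamiliar with the two metrics, here is a small from-scratch sketch of how accuracy and ROC-AUC can be computed for an up/down classifier. The labels and scores are made-up toy values, and the 0.5 decision threshold is an assumption; in practice a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    for illustration, not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = price went up, 0 = price went down.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]  # hypothetical model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, while ROC-AUC measures ranking quality across all thresholds. An ROC-AUC below 0.5, such as the 0.41 reported for the SVM, means the model's scores rank down-moves above up-moves more often than not.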


Although deployed in a batch setting, the architecture was built to simulate real-time inference, supporting near-instantaneous signal creation from live news streams. This makes it well suited to integration into a crypto trading assistant or dashboard.
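One way to simulate real-time inference on historical data is to replay headlines in timestamp order and, at each decision time, expose only those already published. The sketch below shows that idea under stated assumptions: the `replay` function, the one-hour lookback, and the mean-score signal are hypothetical and are not taken from the project's code.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, List, Optional, Tuple

Headline = Tuple[datetime, float]  # (publication time, sentiment score)


def replay(
    headlines: Iterable[Headline],
    decision_times: Iterable[datetime],
    lookback: timedelta,
) -> Iterator[Tuple[datetime, Optional[float]]]:
    """Yield, per decision time t, the mean score of headlines in (t - lookback, t].

    Headlines published after t are never visible, which mimics what a
    live system would have known at that moment.
    """
    ordered: List[Headline] = sorted(headlines, key=lambda h: h[0])
    for t in sorted(decision_times):
        visible = [s for ts, s in ordered if t - lookback < ts <= t]
        yield t, (sum(visible) / len(visible) if visible else None)


t0 = datetime(2024, 1, 1, 12, 0)
headlines = [
    (t0 - timedelta(minutes=50), 0.5),
    (t0 - timedelta(minutes=10), -0.1),
    (t0 + timedelta(minutes=5), 0.9),  # not yet published at t0
]
signals = list(replay(headlines, [t0, t0 + timedelta(hours=1)], timedelta(hours=1)))
for when, value in signals:
    print(when.isoformat(), value)
```

The strict cutoff at each decision time is what keeps the batch evaluation honest: a headline stamped five minutes after the decision cannot influence it.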


Model Evaluation Snapshot


We tested 7 different models and found that Random Forest consistently delivered the best tradeoff between performance and deployment simplicity:

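| Model                 | Accuracy | ROC-AUC |
|-----------------------|----------|---------|
| Logistic Regression   | 0.57     | 0.58    |
| SVM                   | 0.57     | 0.41    |
| Naïve Bayes           | 0.50     | 0.53    |
| Random Forest         | 0.64     | 0.65    |
| Multilayer Perceptron | 0.59     | 0.62    |
| XGBoost               | 0.54     | 0.56    |
| LSTM                  | 0.53     | 0.53    |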


Tech Stack


  • NLP Feature Engineering:

    • FinBERT (sentiment)

    • GPT-4o (persuasion scores via SemEval 2023 prompt templates)

    • Rolling window & lag-based features


  • Data Sources:

    • 26K+ Reddit/news headlines

    • 20K+ Bitcoin trading records


  • Modeling & Evaluation:

    • Random Forest classifier (best), compared with 6 other models

    • 1-hour prediction horizon, binary classification (up/down)


  • Deployment Design:

    • Simulated real-time inference

    • Web-scraped data using BeautifulSoup + APIs
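The 1-hour, up/down target listed under Modeling & Evaluation can be sketched as follows. This assumes one closing price per hour and treats an unchanged price as "down"; both are illustrative assumptions, since the post does not describe how the labels were built.

```python
from typing import List, Optional


def direction_labels(prices: List[float], horizon: int = 1) -> List[Optional[int]]:
    """Label each step 1 if the price `horizon` steps ahead is higher, else 0.

    The last `horizon` steps get None because their future price is unknown.
    Assumption: an unchanged price counts as 0 (down).
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    labels: List[Optional[int]] = []
    for i, price in enumerate(prices):
        j = i + horizon
        if j >= len(prices):
            labels.append(None)
        else:
            labels.append(1 if prices[j] > price else 0)
    return labels


# Hypothetical hourly closing prices.
hourly_close = [100.0, 101.5, 101.5, 99.0, 102.0]
labels = direction_labels(hourly_close)
print(labels)
```

Rows whose label is `None` would be dropped before training, since there is no future price to compare against.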


For more details, check out my GitHub page: https://github.com/pparashar21/CryptoStockPricePrediction




 
 
 
