Predicting Crypto Price Shifts using Sentiment & Persuasion Analysis
- Prasoon Prasoon
- Oct 6, 2024
A machine learning system that predicts directional Bitcoin price movements by analysing sentiment and persuasive tone in real-time news and Reddit headlines. By combining FinBERT and GPT-4o feature extraction with rolling-window feature engineering, the model outperformed published baselines by 12% in ROC-AUC.

The Problem
Cryptocurrency markets are notorious for their volatility, often moving not just on fundamentals but on how news is framed and how persuasive or emotional it feels to the crowd. Traditional financial models miss these language cues entirely. We set out to build a system that could capture sentiment and persuasive tone in real time and use those signals to forecast price direction during volatile periods.
The Solution
We developed a machine learning pipeline to predict short-term Bitcoin price direction by combining market data with features extracted from Reddit and financial news headlines. This system simulated real-time signal generation and was evaluated on 20K+ Bitcoin trading records.
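The post does not describe how headlines were joined to the trading records. As a minimal sketch, assuming hourly price bars and per-headline scores held in two pandas DataFrames with hypothetical column names (`timestamp`, `close`, `sentiment`, `persuasion`), the alignment and the 1-hour up/down label could look like this:

```python
import pandas as pd


def build_hourly_dataset(prices: pd.DataFrame, headlines: pd.DataFrame) -> pd.DataFrame:
    """Attach hourly-averaged headline scores to hourly price bars and add a 1-hour up/down label.

    Hypothetical schemas:
      prices:    'timestamp' (hourly bar close time), 'close'
      headlines: 'timestamp' (publication time), 'sentiment', 'persuasion'
    """
    bars = prices.sort_values("timestamp").set_index("timestamp")
    # Round each headline UP to the next bar, so it only feeds bars that close after it appeared.
    hourly_text = (
        headlines.assign(bar=headlines["timestamp"].dt.ceil("h"))
        .groupby("bar")[["sentiment", "persuasion"]]
        .mean()  # several headlines in one hour -> one averaged row
    )
    df = bars.join(hourly_text, how="left")
    # Hours without any headline get neutral (zero) scores.
    df[["sentiment", "persuasion"]] = df[["sentiment", "persuasion"]].fillna(0.0)
    # Target: 1 if the next hour closes higher than this one, else 0.
    df["target"] = (df["close"].shift(-1) > df["close"]).astype(int)
    return df.iloc[:-1]  # the final bar has no next-hour close to compare against


# Toy example data (not from the project).
prices = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=4, freq="h"),
    "close": [100.0, 101.0, 100.5, 102.0],
})
headlines = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 00:30", "2024-01-01 00:45", "2024-01-01 02:00"]),
    "sentiment": [0.4, 0.2, -0.5],
    "persuasion": [0.8, 0.6, 0.1],
})
print(build_hourly_dataset(prices, headlines))
```

Rounding each headline up to the next hour means it only influences predictions made after it was published, which keeps future news from leaking into the features.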
What set this apart was the dual-layer NLP approach:
Sentiment scores were extracted using FinBERT, capturing the market’s emotional tone.
Persuasion confidence scores were generated with GPT-4o using the SemEval 2023 Task 3 taxonomy, which defines 23 distinct persuasion techniques. Custom prompts were designed to score the persuasive intensity of each headline.
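The post does not say how FinBERT's three class probabilities were reduced to a single feature, or how GPT-4o's per-technique answers were aggregated. One plausible sketch, assuming a signed sentiment score of P(positive) minus P(negative) and a persuasion score equal to the strongest technique's confidence, is below. The technique list is an abbreviated subset of the 23, the prompt wording and function names are hypothetical, and the GPT-4o API call itself is left out. In practice the probabilities could come from a public FinBERT checkpoint such as `ProsusAI/finbert` loaded with Hugging Face `transformers`.

```python
import json

# Abbreviated, illustrative subset of the 23 SemEval 2023 Task 3 persuasion techniques.
TECHNIQUES = [
    "Loaded Language",
    "Appeal to Fear/Prejudice",
    "Exaggeration/Minimisation",
    "Appeal to Authority",
    "Slogans",
]


def finbert_score(probs: dict) -> float:
    """Collapse FinBERT's positive/negative/neutral probabilities into one signed score in [-1, 1]."""
    return probs.get("positive", 0.0) - probs.get("negative", 0.0)


def build_persuasion_prompt(headline: str) -> str:
    """Hypothetical prompt asking the LLM for a 0-1 confidence per technique as a JSON object."""
    listing = "\n".join(f"- {t}" for t in TECHNIQUES)
    return (
        "Rate from 0 to 1 how strongly the headline below uses each persuasion technique. "
        "Answer only with a JSON object mapping each technique name to its score.\n"
        f"Techniques:\n{listing}\n\nHeadline: {headline}"
    )


def persuasion_score(reply: str) -> float:
    """Parse the JSON reply; the headline's score is its strongest technique, clipped to [0, 1]."""
    scores = json.loads(reply)
    values = [min(max(float(scores.get(t, 0.0)), 0.0), 1.0) for t in TECHNIQUES]
    return max(values)


reply = '{"Loaded Language": 0.9, "Slogans": 0.3}'  # example reply text, not real model output
print(persuasion_score(reply))
```

Taking the maximum treats a headline as persuasive if it leans heavily on any single technique; a mean or a count above a threshold would be equally reasonable choices.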
To model how these signals influenced price over time, we used rolling windows and lag-based transformations, capturing the decay and momentum of sentiment.
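The exact window lengths and lags are not given in the post. A minimal pandas sketch with illustrative (hypothetical) settings shows the three kinds of temporal features described above: trailing rolling means for momentum, shifted copies for delayed effects, and an exponentially weighted mean for decay.

```python
import pandas as pd


def add_temporal_features(df: pd.DataFrame, cols=("sentiment", "persuasion"),
                          windows=(3, 6, 12), lags=(1, 2, 3), halflife=3.0) -> pd.DataFrame:
    """Add rolling, lagged and exponentially decayed versions of hourly text scores.

    Window sizes, lags and the half-life are illustrative, not the project's settings.
    """
    out = df.copy()
    for col in cols:
        for w in windows:
            # Momentum: average score over the trailing w hours, including the current one.
            out[f"{col}_roll{w}"] = out[col].rolling(window=w, min_periods=1).mean()
        for k in lags:
            # Delayed effect: the score from k hours earlier.
            out[f"{col}_lag{k}"] = out[col].shift(k)
        # Decay: older hours count exponentially less (weight halves every `halflife` rows).
        out[f"{col}_ewm"] = out[col].ewm(halflife=halflife).mean()
    # Drop the first rows, whose longest lag reaches back before the start of the data.
    return out.dropna(subset=[f"{c}_lag{max(lags)}" for c in cols])


# Toy hourly scores (not from the project).
hourly = pd.DataFrame({
    "sentiment": [0.1, 0.4, -0.2, 0.3, 0.5, 0.0],
    "persuasion": [0.2, 0.9, 0.1, 0.4, 0.7, 0.3],
})
print(add_temporal_features(hourly, windows=(2, 3), lags=(1, 2)).round(3))
```

Every feature at hour t uses only scores from hour t or earlier, so the same code can run unchanged on live data.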
A Random Forest classifier achieved the best performance, with 0.64 accuracy and 0.65 ROC-AUC, outperforming models from prior studies (Springer ’21, MDPI ’21) and all other tested algorithms, including MLP, SVM, LSTM, and XGBoost.
Although the system ran in a batch setting, the architecture was built to simulate real-time inference, so a signal can be generated as soon as a new headline arrives from a live news stream. This makes it a good fit for integration into a crypto trading assistant or dashboard.
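A minimal sketch of how simulated real-time inference might work, assuming the model consumes a small fixed-size window of recent hourly sentiment scores. The `StreamingSignal` class, its three features, and the 0.5 threshold are hypothetical; `predict_proba` stands for any callable that returns P(up).

```python
from collections import deque
from typing import Callable, List, Optional


class StreamingSignal:
    """Hypothetical wrapper that turns a stream of hourly sentiment scores into UP/DOWN signals."""

    def __init__(self, predict_proba: Callable[[List[float]], float],
                 window: int = 3, threshold: float = 0.5):
        if window < 2:
            raise ValueError("window must be at least 2 to build a lag-1 feature")
        self.predict_proba = predict_proba  # callable returning P(price goes up)
        self.history = deque(maxlen=window)  # keeps only the most recent `window` scores
        self.threshold = threshold

    def update(self, score: float) -> Optional[str]:
        """Ingest the newest hourly score and return a signal, or None while warming up."""
        self.history.append(score)
        if len(self.history) < self.history.maxlen:
            return None  # not enough history yet for the rolling feature
        values = list(self.history)
        # Feature order must match training: current score, rolling mean, lag-1 score.
        features = [values[-1], sum(values) / len(values), values[-2]]
        return "UP" if self.predict_proba(features) >= self.threshold else "DOWN"


# Toy stand-in model: probability of "up" rises with the current score.
signal = StreamingSignal(lambda f: 0.5 + 0.4 * f[0], window=3)
for s in [0.2, 0.4, 0.6, -0.9]:
    print(s, signal.update(s))
```

With a fitted scikit-learn classifier `clf`, the callable could be `lambda f: clf.predict_proba([f])[0, 1]`, provided the model was trained on the same three features in the same order.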
Model Evaluation Snapshot
We tested 7 different models and found that Random Forest consistently delivered the best tradeoff between performance and deployment simplicity:
Model | Accuracy | ROC-AUC |
Logistic Regression | 0.57 | 0.58 |
SVM | 0.57 | 0.41 |
Naïve Bayes | 0.50 | 0.53 |
Random Forest | 0.64 | 0.65 |
Multilayer Perceptron | 0.59 | 0.62 |
XGBoost | 0.54 | 0.56 |
LSTM | 0.53 | 0.53 |
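The evaluation protocol behind these numbers is not spelled out in the post. The sketch below assumes a chronological hold-out (train on earlier hours, test on later ones), since shuffling time-ordered data would let a model peek at the future. It compares three of the seven model families with scikit-learn on synthetic data; the hyperparameters and data are placeholders, so its scores say nothing about the table above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.naive_bayes import GaussianNB


def evaluate_models(X, y, test_frac=0.2, seed=0):
    """Fit each model on the earliest rows and score it on the most recent ones (no shuffling)."""
    split = int(len(X) * (1 - test_frac))
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        p_up = model.predict_proba(X_test)[:, 1]  # column 1 = class 1 = "price goes up"
        results[name] = {
            "accuracy": accuracy_score(y_test, model.predict(X_test)),
            "roc_auc": roc_auc_score(y_test, p_up),
        }
    return results


# Synthetic stand-in for the hourly feature matrix (rows must be in time order).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
for name, m in evaluate_models(X, y).items():
    print(f"{name}: accuracy={m['accuracy']:.2f}, ROC-AUC={m['roc_auc']:.2f}")
```

ROC-AUC is computed from predicted probabilities rather than hard labels, which is why a model can have reasonable accuracy but a poor ROC-AUC, as the SVM row above shows.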
Tech Stack
NLP Feature Engineering:
FinBERT (sentiment)
GPT-4o (persuasion scores via custom prompts built on the SemEval 2023 Task 3 taxonomy)
Rolling window & lag-based features
Data Sources:
26K+ Reddit/news headlines
20K+ Bitcoin trading records
Modeling & Evaluation:
Random Forest classifier (best), compared with 6 other models
1-hour prediction horizon, binary classification (up/down)
Deployment Design:
Simulated real-time inference
Web-scraped data using BeautifulSoup + APIs
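For the scraping step, a BeautifulSoup sketch is below. The markup, tag names and the `headline` CSS class are hypothetical, and the sample page is a toy; every real news site or forum page needs its own selectors.

```python
from bs4 import BeautifulSoup

# Toy page standing in for a scraped listing; tags and class names are hypothetical.
SAMPLE_HTML = """
<html><body>
  <article>
    <h3 class="headline">Sample headline A</h3>
    <time datetime="2024-10-01T14:05:00Z">14:05</time>
  </article>
  <article>
    <h3 class="headline">Sample headline B</h3>
    <time datetime="2024-10-01T15:40:00Z">15:40</time>
  </article>
  <article><h3 class="headline">No timestamp, skipped</h3></article>
</body></html>
"""


def extract_headlines(html: str) -> list:
    """Return (timestamp, headline) pairs from the page, skipping incomplete items."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article"):
        title = article.find("h3", class_="headline")
        stamp = article.find("time")
        if title is None or stamp is None or not stamp.has_attr("datetime"):
            continue  # an item without a headline or timestamp cannot be aligned to prices
        rows.append((stamp["datetime"], title.get_text(strip=True)))
    return rows


print(extract_headlines(SAMPLE_HTML))
```

Keeping the publication timestamp alongside each headline is what later allows it to be matched to the correct hourly price bar.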
For more details, check out my GitHub repository: https://github.com/pparashar21/CryptoStockPricePrediction