Skip to content

semework/llm-GPT-Alpaca-BERT-financial-sentiment-classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 

Repository files navigation

llm-GPT-Alpaca-BERT-financial-sentiment-classification      

seaborn      MIT License     langchain       alpaca-trade-api      circleci      circleci      langchain

Large Language Model (LLM) sentiment analysis of financial news using Alpaca, BERT and ChatGPT

An experiment with the fascinating potential of large language models to efficiently classify short news summaries and headlines into 'positive', 'neutral' and 'negative' sentiments. Here we use Alpaca, BERT and ChatGPT to:

  1. import news headings and summaries for specific stock symbols
  2. use the language (NLP) model Bidirectional Encoder Representations from Transformers (BERT) to tokenize and classify the news data into sentiments
  3. classify the same news data using OpenAI's gpt-3.5 to do the same

About this repo

A jupyter notebook (llm_sentiment_classifier.ipynb) which takes in financial news for ticker (AKA stock symbols) and returns sentiments is provided. Final results should look something like this:

Approach steps:

Step 0 - Install the necessary libraries - if they are not already installed. This is followed by importing the necessary libraries. Langchain sometimes has issues with dependencies and it is recommended to install with upgrade

pip -q install alpaca-trade-api alpaca-py transformers openai tiktoken
pip -q install langchain --upgrade

1 - Import keys saved in your environment

Most APIs provide a security option to ensure that you can store your authentication details in environment variables. This enables you:

  1. to authenticate your logins and code tracking in code depos such as GitHub
  2. in addition to #1, to control your privileges such as OpenAI's tokens
  3. to protect yourself from inadvertently exposing your secret IDs and keys in any environment such as live trading platforms such as Alpaca

For best practices read:

  1. OpenAI's advice: https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety
  2. Alapaca: https://medium.com/software-engineering-learnings/algorithmic-trading-with-alpaca-and-python-c81bad480053
  3. General: https://www.twilio.com/blog/how-to-set-environment-variables-html
openai.api_key = os.environ["OPENAI_API_KEY"]
API_KEY = os.getenv("APCA_API_KEY_ID")
API_SECRET = os.getenv("APCA_API_KEY_SECRET")
print('keys imported')

2 - Setup your llm model and construct a pydantic base model

Pydantic defines objects via models (classes which inherit from pydantic.BaseModel).

More info here: https://docs.pydantic.dev/latest/usage/models/ which states:

  • These models are similar to Python's dataclasses with some differences that streamline certain workflows related to validation, serialization, and JSON schema generation. Untrusted data can be passed to a model and, after parsing and validation, Pydantic guarantees that the fields of the resultant model instance will conform to the field types defined on the model.

3 - Prepare sentiment analysis models and pipeline

Approach and cautionary note:

  • we will use a fine-tuned (trained for financial news) Hugging Face model (BERT) to analyze the article's headline and summary sentiment
  • initial model downloads might take some time

Next, create an alpaca client, choose stock tickers and decide how many days of news to scrape


4 - Create a functions to classify individual news and calculate an average sentiment per ticker and to process all tickers

Important considerations:

  • more recent news might be more relevant
  • the sentiment confidence also gives us a clue how certain the algorithm is about its classification

Approach:

  • weigh recent news more heavily (straightforward linear increase, going from old to new - although there can be many variations of this approach such as inverted/hyperbolic, linear-with-noise, etc. approaches)
  • use sentiment confidence to adjust our weights. i.e. multiply recency score with the score
  • the function sentiment_to_weighed takes care of the weighing
  • the function sentiment_analysis takes in a list of tickers and returns a weighed sentiment per ticker
  • since OpenAI's tocken allotments deplete quickly, a few lines in sentiment_analysis are commented out and a flag is given. Uncomment them if you have enough tokens left (after changing do_llm flag to "1"). Note that llm sentiment classification is slow.

What you should see for each ticker:

Contributing and Permissions

Please do not directly copy anything without my concent. Feel free to reach out to me at https://www.linkedin.com/in/mulugeta-semework-abebe/ for ways to collaborate or use some components.

License

langchain under MIT and Alpaca trade api under Apache License 2.0. Please view LICENSE and (https://www.apache.org/licenses/LICENSE-2.0) for more details. For other packages click on corresponding links at the top of this page (first line).

About

classify financial news sentiments using Alpaca, BERT and GPT

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published