An experiment exploring the fascinating potential of large language models to efficiently classify short news headlines and summaries into 'positive', 'neutral', and 'negative' sentiments. Here we use Alpaca, BERT, and ChatGPT to:
- import news headlines and summaries for specific stock symbols
- use the NLP model Bidirectional Encoder Representations from Transformers (BERT) to tokenize the news data and classify it into sentiments (see the sketch after this list)
- classify the same news data with OpenAI's gpt-3.5
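As a taste of the BERT step, here is a minimal sketch that classifies a single headline. The checkpoint `ProsusAI/finbert` is an illustrative assumption; the notebook may load a different finance-tuned model.

```python
# Minimal sketch: classify one headline with a finance-tuned BERT.
# "ProsusAI/finbert" is an illustrative choice of checkpoint, not
# necessarily the one used in the notebook.
from transformers import pipeline

classifier = pipeline("text-classification", model="ProsusAI/finbert")
print(classifier("Company beats quarterly earnings estimates"))
# e.g. [{'label': 'positive', 'score': 0.95}]
```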
A Jupyter notebook (llm_sentiment_classifier.ipynb) which takes in financial news for tickers (AKA stock symbols) and returns sentiments is provided. Final results should look something like the sample output shown in the notebook.
Step 0 - Install the necessary libraries, if they are not already installed, then import them. LangChain sometimes has dependency issues, so it is recommended to install it with the `--upgrade` flag:
```bash
pip -q install alpaca-trade-api alpaca-py transformers openai tiktoken
pip -q install langchain --upgrade
```
Most APIs provide a security option whereby you store your authentication details in environment variables. This enables you:
- to authenticate your logins without committing credentials to code repositories such as GitHub
- in addition to the above, to control your privileges, such as OpenAI's tokens
- to protect yourself from inadvertently exposing your secret IDs and keys in any environment, including live trading platforms such as Alpaca
- OpenAI's advice: https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety
- Alpaca: https://medium.com/software-engineering-learnings/algorithmic-trading-with-alpaca-and-python-c81bad480053
- General: https://www.twilio.com/blog/how-to-set-environment-variables-html
```python
import os
import openai

# Read the keys from environment variables set beforehand
openai.api_key = os.environ["OPENAI_API_KEY"]
API_KEY = os.getenv("APCA_API_KEY_ID")
API_SECRET = os.getenv("APCA_API_KEY_SECRET")
print('keys imported')
```
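With the keys loaded, news can be pulled per ticker. Below is a minimal sketch using `alpaca-trade-api`'s news endpoint; the notebook's exact call may differ:

```python
import os
from alpaca_trade_api import REST

# Minimal sketch: fetch recent news items for one ticker.
# Assumes APCA_API_KEY_ID and APCA_API_KEY_SECRET are set in the environment.
api = REST(os.getenv("APCA_API_KEY_ID"), os.getenv("APCA_API_KEY_SECRET"))
for item in api.get_news("AAPL", limit=5):  # each item carries a headline and summary
    print(item.headline, "|", item.summary)
```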
More info here: https://docs.pydantic.dev/latest/usage/models/ which states:
- These models are similar to Python's dataclasses with some differences that streamline certain workflows related to validation, serialization, and JSON schema generation. Untrusted data can be passed to a model and, after parsing and validation, Pydantic guarantees that the fields of the resultant model instance will conform to the field types defined on the model.
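For illustration, here is a minimal Pydantic sketch; the model and field names are hypothetical, not the notebook's actual classes:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical model of a news item, purely for illustration.
class NewsItem(BaseModel):
    headline: str
    summary: str

# Parsing and validation happen at construction time.
item = NewsItem(headline="Fed holds rates", summary="No change expected.")

try:
    NewsItem(headline="Fed holds rates", summary=None)  # wrong type
except ValidationError as err:
    print(err)  # Pydantic rejects data that does not match the field types
```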
- we will use a fine-tuned (trained on financial news) Hugging Face model (BERT) to analyze the sentiment of each article's headline and summary
- initial model downloads might take some time
- more recent news might be more relevant
- the sentiment confidence also gives us a clue about how certain the algorithm is of its classification
- weigh recent news more heavily (a straightforward linear increase going from old to new, although there can be many variations of this approach, such as inverted/hyperbolic, linear-with-noise, etc.)
- use the sentiment confidence to adjust our weights, i.e. multiply the recency score by the sentiment score (a sketch of this appears below)
- the function `sentiment_to_weighed` takes care of the weighting
- the function `sentiment_analysis` takes in a list of tickers and returns a weighted sentiment per ticker
- since OpenAI's token allotments deplete quickly, a few lines in `sentiment_analysis` are commented out and gated behind a flag. Uncomment them if you have enough tokens left (after changing the `do_llm` flag to "1"). Note that LLM sentiment classification is slow (a minimal gpt-3.5 sketch also follows below).
Please do not directly copy anything without my consent. Feel free to reach out to me at https://www.linkedin.com/in/mulugeta-semework-abebe/ for ways to collaborate or use some components.
langchain is under the MIT license and the Alpaca trade API is under the Apache License 2.0. Please view LICENSE and https://www.apache.org/licenses/LICENSE-2.0 for more details. For other packages, click on the corresponding links at the top of this page (first line).