A Twitter bot, @mandarin_daily, that tweets Mandarin vocabulary with links to previous tweets for reinforcement.
Three times a day, the bot tweets a Chinese word, its pinyin, and its definition. Each tweet also links to earlier tweets for review: the corresponding tweet from last week, the corresponding tweet from last month, and the corresponding tweet from a random earlier date, so users can quickly revisit words they have already seen.
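The choice of which earlier dates to link can be sketched in Python as follows. This is a minimal sketch, not the bot's actual code: the function name `reference_dates` and its parameters are hypothetical, "last month" is assumed to mean 30 days earlier, and the random date is assumed to be any earlier date since the bot's first tweet. The bot would then look up the tweet in the same daily slot on each returned date.

```python
import datetime
import random
from typing import Optional


def reference_dates(today: datetime.date, first_date: datetime.date,
                    rng: Optional[random.Random] = None) -> dict:
    """Return the dates whose tweets should be linked from today's tweet.

    Hypothetical helper. Assumptions: "last month" is 30 days ago, and the
    random date is drawn uniformly from first_date up to yesterday.
    Dates earlier than first_date (before the bot existed) become None.
    """
    rng = rng or random.Random()
    last_week = today - datetime.timedelta(days=7)
    last_month = today - datetime.timedelta(days=30)

    # Number of dates in [first_date, today - 1 day].
    days_available = (today - first_date).days
    if days_available > 0:
        random_date = first_date + datetime.timedelta(days=rng.randrange(days_available))
    else:
        random_date = None

    return {
        "last_week": last_week if last_week >= first_date else None,
        "last_month": last_month if last_month >= first_date else None,
        "random": random_date,
    }
```

For example, `reference_dates(datetime.date(2021, 3, 31), datetime.date(2021, 1, 1))` returns 2021-03-24 for last week, 2021-03-01 for last month, and a random date between 2021-01-01 and 2021-03-30.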
The bot is implemented in Python 3.8. It accesses Twitter via the Twitter API. Data are stored in AWS DynamoDB. It runs on AWS Lambda, with scheduling handled by AWS EventBridge. Logs are written to AWS CloudWatch.
Words and pinyin are sourced from Wiktionary. Definitions and additional pinyin are scraped from MDBG's Chinese dictionary.
Settings for AWS DynamoDB and Twitter are configured in a configuration file. An example is available at mandarin_twitter_bot/config/example_config.conf.example:
[aws]
endpoint_url =
[twitter]
twitter_access_token =
twitter_access_token_secret =
twitter_bearer_token =
twitter_consumer_key =
twitter_consumer_secret =
twitter_user_username =
The configuration file is selected based on the environment variable TWITTER_BOT_SETTINGS_MODULE:
export TWITTER_BOT_SETTINGS_MODULE="mandarin_twitter_bot/config/{config_name}_config.conf"
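A minimal sketch of how a settings file selected this way might be read with Python's standard `configparser` module. The function name `load_settings` is hypothetical and the bot's own loader may behave differently; the section and key names come from the example configuration above.

```python
import configparser
import os


def load_settings() -> configparser.ConfigParser:
    """Load the configuration file named by TWITTER_BOT_SETTINGS_MODULE.

    Hypothetical loader. Raises KeyError if the variable is unset and
    FileNotFoundError if the file cannot be read.
    """
    path = os.environ["TWITTER_BOT_SETTINGS_MODULE"]
    parser = configparser.ConfigParser()
    # read() silently skips missing files and returns the files it did read,
    # so an empty result means the path was wrong.
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(path)
    return parser
```

Usage: `load_settings()["aws"]["endpoint_url"]`. Blank values such as `endpoint_url =` are read back as empty strings.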
There are three environments: production, local staging, and testing.
The production instance runs on AWS Lambda and DynamoDB.
- Export production settings.
export TWITTER_BOT_SETTINGS_MODULE="mandarin_twitter_bot/config/production_config.conf"
- Create DynamoDB tables.
sh mandarin_twitter_bot/deploy/create_tables.sh https://dynamodb.us-west-1.amazonaws.com/
- Zip the source code.
sh mandarin_twitter_bot/deploy/zip_code.sh mandarin_twitter_bot
- Zip Pip requirements. Because Lambda has boto3 pre-installed, it can be removed from requirements.txt to save space.
sh mandarin_twitter_bot/deploy/zip_pip_requirements.sh mandarin_twitter_bot/requirements.txt
- Create a Lambda function.
  - Set the runtime to Python 3.8.
  - Set the handler to lambda_function.lambda_handler.
  - Configure the function to use 128 MB of memory and to time out after 3 minutes.
  - Upload the zipped Pip requirements as a layer.
  - Upload the zipped code.
  - Set the TWITTER_BOT_SETTINGS_MODULE environment variable.
  - Add an EventBridge trigger that runs the Lambda function TWEETS_PER_DAY times per day.
    - For example, the cron expression 0 15,19,23 * * ? * runs the function at 3 p.m., 7 p.m., and 11 p.m. (UTC) every day.
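To pick a schedule, the hypothetical helper below builds an EventBridge cron expression from a list of UTC hours and checks that it fires exactly TWEETS_PER_DAY times. It is not part of the repository. EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), and `?` must be used in either the day-of-month or the day-of-week field.

```python
def eventbridge_cron(hours_utc, tweets_per_day, minute=0):
    """Build an EventBridge cron expression that fires once at each UTC hour.

    Hypothetical helper. Fields: minutes hours day-of-month month
    day-of-week year. '?' fills day-of-week because day-of-month is '*'.
    """
    hours = sorted(set(hours_utc))
    if len(hours) != tweets_per_day:
        raise ValueError(f"expected {tweets_per_day} distinct hours, got {len(hours)}")
    if any(not 0 <= h <= 23 for h in hours):
        raise ValueError("hours must be in 0..23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in 0..59")
    return f"{minute} {','.join(str(h) for h in hours)} * * ? *"
```

For example, `eventbridge_cron([23, 15, 19], 3)` produces the expression `0 15,19,23 * * ? *`. When the rule is created through the API or AWS CLI, the expression is written as `cron(0 15,19,23 * * ? *)`.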
Note: In production, the bot is run through lambda_function.py. On each invocation, lambda_handler makes at most three attempts to run the bot.
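The retry behavior described in the note could look roughly like the sketch below. It is an illustration under assumptions, not the contents of lambda_function.py: `run_bot` is a hypothetical placeholder for the bot's entry point. The sketch returns a failure result instead of raising, because EventBridge invokes Lambda asynchronously and Lambda can retry a failed asynchronous invocation on its own, which could post duplicate tweets.

```python
import logging

logger = logging.getLogger(__name__)

# The README states that the handler makes at most three attempts.
MAX_ATTEMPTS = 3


def run_bot():
    """Hypothetical placeholder for the bot's real entry point."""
    raise NotImplementedError("replace with the bot's tweeting logic")


def lambda_handler(event, context, run=run_bot, max_attempts=MAX_ATTEMPTS):
    """Run the bot, retrying on any exception, up to max_attempts times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            run()
            return {"ok": True, "attempts": attempt}
        except Exception as error:  # retry on any failure (network, API, etc.)
            last_error = error
            logger.warning("Attempt %d of %d failed: %s", attempt, max_attempts, error)
    # Report failure without raising, so Lambda does not retry the invocation.
    logger.error("All %d attempts failed", max_attempts)
    return {"ok": False, "attempts": max_attempts, "error": str(last_error)}
```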
A staging instance can be started for manual testing:
- Install Pip requirements under Python 3.8.
pip install -r mandarin_twitter_bot/requirements.txt
- Export staging settings.
export TWITTER_BOT_SETTINGS_MODULE="mandarin_twitter_bot/config/staging_config.conf"
- Start a Docker instance of DynamoDB Local.
docker-compose -f mandarin_twitter_bot/docker-compose.yml up
- Create DynamoDB tables. This requires the AWS CLI to be installed.
sh mandarin_twitter_bot/deploy/create_tables.sh http://localhost:8000
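Because DynamoDB Local may take a moment to start, it can help to wait for its port before running create_tables.sh. A minimal standard-library sketch; `wait_for_port` is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that DynamoDB Local is ready to serve requests.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8000)` for the staging instance, or port 8001 for the testing instance.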
A testing instance can be started for automated testing:
- Export test settings.
export TWITTER_BOT_SETTINGS_MODULE="mandarin_twitter_bot/config/test_config.conf"
- Start a Docker instance of DynamoDB Local.
docker-compose -f mandarin_twitter_bot/tests/docker-compose.yml up
- Create DynamoDB tables.
sh mandarin_twitter_bot/deploy/create_tables.sh http://localhost:8001
- Run automated tests.
python -m unittest discover mandarin_twitter_bot.tests
The words are drawn from Wiktionary's Mandarin Frequency Lists, which together contain the 10,000 most frequently used Mandarin words.
- Parse words from the source. This creates ten files, one for each of the lists.
python -m mandarin_twitter_bot.scripts.parse_words_from_wiktionary
- Create an input directory INPUT_DIR and move the created files into it. Create an output directory OUTPUT_DIR.
- Clean the words in the input directory, removing duplicates, and write them to a file words.txt in OUTPUT_DIR.
python -m mandarin_twitter_bot.scripts.clean_words INPUT_DIR OUTPUT_DIR
- Randomize the words.
sort -R OUTPUT_DIR/words.txt > randomized.txt
- Upload the word data to the currently configured instance of DynamoDB.
python -m mandarin_twitter_bot.scripts.upload_word_data randomized.txt
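The cleaning and randomizing steps can be approximated in Python as below. This is a hypothetical sketch, not the repository's clean_words script: it assumes each parsed file holds one word per line, keeps the first occurrence of each word, and uses `random.shuffle` in place of `sort -R`.

```python
import random
from pathlib import Path


def clean_words(input_dir, output_dir):
    """Merge the word files, drop blank lines and duplicates, write words.txt."""
    seen = set()
    words = []
    # Sorted so the output order is deterministic across runs.
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            word = line.strip()
            if word and word not in seen:
                seen.add(word)
                words.append(word)
    out_path = Path(output_dir) / "words.txt"
    out_path.write_text("".join(w + "\n" for w in words), encoding="utf-8")
    return out_path


def randomize(in_path, out_path, seed=None):
    """Write the lines of in_path to out_path in a uniformly random order."""
    words = Path(in_path).read_text(encoding="utf-8").splitlines()
    random.Random(seed).shuffle(words)
    Path(out_path).write_text("".join(w + "\n" for w in words), encoding="utf-8")
```

GNU `sort -R` orders lines by a random hash, so identical lines end up adjacent; that is harmless here because duplicates are removed first, and `shuf` is a common alternative.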