A super-simple asynchronous Python library for speaking with Twitter's streaming API. Its rudimentary HTTP client implements basic authentication and non-authenticating proxy support.
The idea was to make the least effort possible to get things working. All of the standard HTTP client libraries seemed to block until the end of transmission, making them inappropriate for use with the streaming API.
For a Twisted solution, see twitty-twister. Other interesting Python modules using the streaming API via different approaches are tweepy and instwitter-py.
Requires Python 2.5 or higher. On Python 2.5, simplejson is also required (it is included in Python 2.6 and later as json). For SSL support, the asyncore implementation now depends upon tlslite. There are no requirements beyond that.
The more elaborate example programs fixreplies.py and textori.py require the python-twitter library.
Because I like to offer flexibility, and started out without the utmost confidence in my original hacked-together HTTP client (although it has stood up fairly well), you have a choice of asynchronous IO "engines." If you have PycURL installed, you may choose its time-tested, non-blocking HTTP client implementation with the --curl command-line option, which works with any of the example applications. If you have Facebook/FriendFeed's open-source Tornado, you can choose its iostream sub-module with the --tornado option. In all other cases, it falls back to the basic implementation built upon the standard library, which can also be selected explicitly with --async. The twitasync.py, twittornado.py, and twitcurl.py modules transparently expose the same basic interface to the main twitstream module; all typical usage will focus on the twitstream module.
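The fallback order described above can be sketched roughly as follows. This is not the code twitstream itself uses: the function names available_engines and choose_engine are hypothetical, and the only facts relied on are the importable module names pycurl and tornado. The engine labels simply mirror the --curl, --tornado, and --async flags.

```python
def available_engines():
    """Return the engine labels usable on this machine (hypothetical helper)."""
    engines = ['async']  # the standard-library engine is always available
    for label, module in (('curl', 'pycurl'), ('tornado', 'tornado')):
        try:
            __import__(module)
        except ImportError:
            continue  # optional dependency not installed
        engines.append(label)
    return engines


def choose_engine(requested=None):
    """Honor the requested engine if it can be imported, else fall back."""
    if requested in available_engines():
        return requested
    return 'async'


if __name__ == '__main__':
    print("usable engines: %s" % ", ".join(available_engines()))
```

Probing with a plain import keeps the optional dependencies optional: nothing fails at startup if PycURL or Tornado is missing.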
twitstream-test.py is usable from the command line as a very rudimentary client:
twitstream-test.py spritzer
twitstream-test.py track ftw fail
twitstream-test.py follow 12 13 15 16 20 87
Every use of the streaming API requires authentication against a user account. The methods available to the general public are spritzer, track, and follow.
As a simple implementation of a tweet display roughly modeled on twistori, textori.py takes in keywords and pretty-prints a live tracking stream of the keywords entered. When no keywords are entered, the keywords listed below are used as the default:
textori.py love hate think believe feel wish
The code in this example is most notable for the tweet text unescaping and parsing all accomplished in a single (lengthy) callable.
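As a rough illustration of the unescaping step (not the callable textori.py actually uses), the sketch below assumes that status text arrives with <, > and & escaped as &lt;, &gt; and &amp;. The function names unescape and format_status are hypothetical; the 'user', 'screen_name' and 'text' keys are the same ones used in the minimal example in this README.

```python
def unescape(text):
    """Undo the HTML escaping assumed to be applied to status text."""
    # Replace &amp; last, so that '&amp;lt;' becomes '&lt;' and not '<'.
    return (text.replace('&lt;', '<')
                .replace('&gt;', '>')
                .replace('&amp;', '&'))


def format_status(status):
    """Build one display line from a status dictionary."""
    name = status.get('user', {}).get('screen_name')
    return "%s: %s" % (name, unescape(status.get('text', '')))


if __name__ == '__main__':
    sample = {'user': {'screen_name': 'test'},
              'text': 'I &lt;3 streams &amp; callbacks'}
    print(format_status(sample))
```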
As a proof of concept, there's the modestly-named fixreplies.py, which mines your friends, followers, favorites and/or conversations to derive a list of people to follow (which can cause a lot of API calls at startup). It then uses the streaming API's follow method to get all tweets to and from those chosen users. For example, the following command line checks your latest 500 status messages for people to whom you've replied, and filters out the people you do not already follow as well as a couple of celebrities that everyone seems to empathize with:
fixreplies.py --pages 5 --chat --friends --exclude=stephenfry,Oprah
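The reply-mining and exclusion steps can be pictured with the sketch below. It is not taken from fixreplies.py: it assumes each status is a plain dictionary carrying the in_reply_to_screen_name field, whereas the real program works through the python-twitter library, and the function name reply_targets is hypothetical. Filtering against the accounts you already follow is left out.

```python
def reply_targets(statuses, exclude=()):
    """Collect the screen names replied to, minus any excluded names."""
    excluded = set(name.lower() for name in exclude)
    seen = set()
    targets = []
    for status in statuses:
        name = status.get('in_reply_to_screen_name')
        if not name:
            continue  # not a reply
        key = name.lower()  # screen names are compared case-insensitively
        if key in excluded or key in seen:
            continue
        seen.add(key)
        targets.append(name)
    return targets


if __name__ == '__main__':
    sample = [{'in_reply_to_screen_name': 'alice'},
              {'in_reply_to_screen_name': 'Oprah'},
              {'in_reply_to_screen_name': None}]
    print(reply_targets(sample, exclude=['stephenfry', 'Oprah']))
```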
For Mac users, there is a --growl option, which uses the Growl notification framework and its Python interface available in the Growl SDK. The class does its best at distinguishing between categories of status messages, allowing a user to change display options.
This code example uses a variant of the status pretty-printing from the textori example. Its chief purpose is to use Twitter's traditional API in order to get more use out of the streaming API.
stats.py is a proof of concept showing that you don't need to print out every tweet in the callback. It sets up a counter/histogram on the desired status characteristic. When halted (interrupted with Ctrl-C, which raises a KeyboardInterrupt), it prints a summary of the statistic collected.
stats.py friends
stats.py timezone --max 15
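A counting callback in this spirit might look like the sketch below. It is not the stats.py implementation: the class name FieldCounter is hypothetical, and the choice of the user's time_zone field is an assumption made for illustration. Any callable that accepts a status dictionary can be passed where the minimal example in this README passes its callback function.

```python
from collections import defaultdict


class FieldCounter(object):
    """Callable that tallies one field of each status's 'user' dictionary."""

    def __init__(self, field):
        self.field = field
        self.counts = defaultdict(int)

    def __call__(self, status):
        user = status.get('user')
        if not user:
            return  # skip messages without a user, such as delete notices
        self.counts[user.get(self.field)] += 1

    def summary(self, limit=None):
        """Return (value, count) pairs, most frequent first."""
        ranked = sorted(self.counts.items(),
                        key=lambda item: (-item[1], str(item[0])))
        return ranked[:limit]


if __name__ == '__main__':
    counter = FieldCounter('time_zone')
    for zone in ('London', 'Tokyo', 'London'):
        counter({'user': {'time_zone': zone}})
    for value, count in counter.summary(limit=15):
        print("%6d  %s" % (count, value))
```

With the library, the counter instance would take the place of the callback, and summary() would be called once the stream is interrupted.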
If you want to examine statistics off-line, the latest batch of schema-free JSON document stores, like MongoDB or Apache CouchDB, make for good candidates. warehouse.py runs the spritzer method and stores each status message in the designated data store. The implementation currently includes adaptors for MongoDB and CouchDB, and would welcome models for your favorite ORM+RDBMS.
warehouse.py
warehouse.py mongo://localhost:27017/db/twitcollection
The most notable addition in this example is the correct handling of delete
updates: it attempts to delete the referenced status message if it is in the
database.
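The delete handling can be illustrated with an in-memory stand-in for the data store. This sketch is not the warehouse.py adaptor code: the class name MemoryStore is hypothetical, and it assumes that delete notices arrive as dictionaries of the form {'delete': {'status': {'id': ...}}} while ordinary statuses carry a top-level 'id'.

```python
class MemoryStore(object):
    """Callable that keeps statuses in a dict keyed by status id."""

    def __init__(self):
        self.docs = {}

    def __call__(self, message):
        if 'delete' in message:
            # Remove the referenced status if we have it; ignore it otherwise.
            status_id = message['delete'].get('status', {}).get('id')
            self.docs.pop(status_id, None)
        elif 'id' in message:
            self.docs[message['id']] = message


if __name__ == '__main__':
    store = MemoryStore()
    store({'id': 1, 'text': 'first'})
    store({'id': 2, 'text': 'second'})
    store({'delete': {'status': {'id': 1, 'user_id': 3}}})
    print(sorted(store.docs))
```

A database-backed adaptor would replace the dictionary operations with the store's own insert and delete calls.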
The interface provides relatively low-level specialized streaming HTTP GET and POST classes (currently geared specifically towards Twitter, and provided in asynchat, tornado.iostream, and libcurl flavors), a general twitstream function that accepts an API method name and routes the software there, and individual functions that match the API methods (including spritzer, track, and follow). Each of these returns a request-like object that, when invoked with the run() method, opens the connection and continues in a loop until interrupted. The only programming you need to provide is a function (or callable) that gets called with a dictionary containing the latest single status.
For example, the basic spritz.py example shows the minimum amount of work needed to have a fully working program, using some built-in facilities for command-line option processing, documentation, and prompting for a username and password. For something truly minimal, you could use something like this:
#!/usr/bin/env python
import twitstream

USER = 'test'
PASS = 'test'

# Define a function/callable to be called on every status:
def callback(status):
    # Parenthesized single-argument print works as a statement or a function.
    print("%s:\t%s\n" % (status.get('user', {}).get('screen_name'), status.get('text')))

if __name__ == '__main__':
    # Call a specific API method from the twitstream module:
    stream = twitstream.spritzer(USER, PASS, callback)

    # Loop forever on the streaming call:
    stream.run()