This is the output of an attempt to "open source" the Andreessen Horowitz Library. The inspiration for using high-res photos from a Wired article to catalog all 1,087 books in the lobby of a16z is documented on Medium.
The library is available in three files:
books.md
: This is the original markdown list of books and the source for the two data files that follow. Changes and/or corrections to the book data captured in the photos should be made here.
books_ratings.md
: This is an extension of the original list. Where possible, the Goodreads API was used to append additional attributes of each book to the original records in the books.md file. The books in the library are sorted from highest to lowest rating. This file can be (re)generated from the books.md file using the book_ratings.py Python script described below.
books_ratings.csv
: This is a CSV version of the expanded books_ratings.md, suitable for further analysis in Excel, Python or R. This file can be (re)generated from the books.md file using the book_ratings.py Python script described below.
The books.md file has the following fields:
- Book ID: Describes the location of the book in the library. The first part is the bookcase number (clockwise around the room), the second is the vertical shelf (‘A’ is the top shelf), and the third is the book’s position on that shelf (‘1’ is the far-left book).
- Title: The title of the book.
- Author: The author of the book.
- Links: Where possible, links to the books on Amazon and/or Goodreads.
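The Book ID can be split back into its three parts when you want to group books by bookcase or shelf. The following is a minimal sketch, not part of the repository. It assumes the ID is written as a number, a single shelf letter, and a number, optionally separated by "-", "." or a space (for example 3B12 or 3-B-12); check the IDs in books.md and adjust the pattern if they are written differently. The names BookLocation and parse_book_id are hypothetical.

```python
import re
from typing import NamedTuple


class BookLocation(NamedTuple):
    bookcase: int  # bookcase number, counted clockwise around the room
    shelf: str     # shelf letter, 'A' is the top shelf
    position: int  # position on the shelf, 1 is the far-left book


# Assumed layout: digits, one letter, digits, with an optional
# "-", "." or space between the parts. This is a guess at the format.
_BOOK_ID = re.compile(r"^\s*(\d+)[-. ]?([A-Za-z])[-. ]?(\d+)\s*$")


def parse_book_id(book_id: str) -> BookLocation:
    """Split a Book ID into bookcase, shelf and position."""
    match = _BOOK_ID.match(book_id)
    if match is None:
        raise ValueError(f"unrecognized Book ID: {book_id!r}")
    bookcase, shelf, position = match.groups()
    return BookLocation(int(bookcase), shelf.upper(), int(position))
```

Under these assumptions, parse_book_id("3B12") returns BookLocation(bookcase=3, shelf='B', position=12), and an ID that does not fit the pattern raises ValueError instead of being silently misread.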
books_ratings.md and books_ratings.csv have the following additional fields besides those listed above. These fields are derived through the Goodreads API where possible, using the books.md file as input to the book_ratings.py script (explained in more detail below). Not all books have a record in Goodreads, and the Goodreads API does not necessarily return complete data for every book. In both cases, default values are written to these additional fields. Knowing the default values lets you filter such records appropriately in any subsequent analysis.
- Book Title: This is the title of the book on Goodreads. Should be more or less the same as the Title field from the original list. The default value is an empty character string.
- Average Rating: This is the average rating given by reviewers on Goodreads for the book. The default value is '0.0'.
- Ratings Count: This is the number of ratings given for a book. The default value is '0'.
- Number Pages: This is the number of pages in the book. The default value is '0'.
- Publication Date: This is the publication date in the format Month/Day/Year. The default value is None/None/None. If only the year of publication is known, then only that portion of the date will be set, e.g. None/None/<Year>.
- Publisher: The publisher of the book. The default value is an empty character string.
- ISBN: The ISBN number of the book. The default value is an empty character string.
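Because missing Goodreads data is stored as default values rather than left blank, an analysis usually needs to filter those records out first. The sketch below shows one way to do that with the standard csv module. It assumes the CSV column headers match the field names listed above ("Ratings Count", "Publication Date"); verify this against the header row of books_ratings.csv. The helper names and the rows in SAMPLE are hypothetical placeholders, not real entries from the library.

```python
import csv
import io
from typing import List, Optional

# Placeholder rows for illustration only; not real library data.
SAMPLE = (
    "Book ID,Title,Book Title,Average Rating,Ratings Count,Publication Date\n"
    "1A1,Placeholder One,Placeholder One,4.2,100,None/None/1999\n"
    "1A2,Placeholder Two,,0.0,0,None/None/None\n"
)


def has_ratings(row: dict) -> bool:
    """False when Ratings Count still holds its default value of '0'."""
    return int(row["Ratings Count"] or 0) > 0


def publication_year(value: str) -> Optional[int]:
    """Return the year from 'Month/Day/Year', or None if it is 'None'."""
    year = value.strip().split("/")[-1]
    return int(year) if year.isdigit() else None


def rated_books(csv_file) -> List[dict]:
    """Read the CSV and keep only rows that carry Goodreads ratings."""
    return [row for row in csv.DictReader(csv_file) if has_ratings(row)]


# Usage with the placeholder data; for the real file, pass
# open("books_ratings.csv", newline="") instead of the StringIO.
books = rated_books(io.StringIO(SAMPLE))
years = [publication_year(row["Publication Date"]) for row in books]
```

Filtering on Ratings Count rather than Average Rating is a design choice: a count of zero means there is no rating data at all, so the 0.0 average for that row carries no information.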
The books_ratings.md and books_ratings.csv files are generated from the books.md file by the book_ratings.py Python script. This script uses the Goodreads API to get additional attributes of each book.
In order to use the script to (re)generate the additional attributes for the books in the library, you will need to do the following:
- Download and install python. This script was developed using Python 3.5.1.
- Install the goodreads Python package. Ensure you follow the instructions for installing the package on the project site. Note that this package is no longer maintained. It has been forked here in case you need another repository from which to download it.
- Install the pandas python library. This can be done using pip:
pip install pandas
More information on installing pandas can be found here.
- Request an API key from Goodreads here. You will not be required to authenticate to use this script so the API keys will be sufficient.
The script can be run from the command line. It expects the books.md file to be in the same directory as the script. The script also expects two command-line arguments, the API key and API secret you requested above:
python book_ratings.py <api_key> <api_secret>
The script will generate two files: books_ratings.md and books_ratings.csv.
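The two requirements of the invocation above (two positional arguments, and books.md next to the script) can be checked up front before any API calls are made. The following is an illustrative sketch of such a check using argparse and pathlib; it is not the actual contents of book_ratings.py, and the function names parse_args and find_input are hypothetical.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Read the two positional arguments the script expects."""
    parser = argparse.ArgumentParser(
        prog="book_ratings.py",
        description="Append Goodreads attributes to the books in books.md.",
    )
    parser.add_argument("api_key", help="Goodreads API key")
    parser.add_argument("api_secret", help="Goodreads API secret")
    return parser.parse_args(argv)


def find_input(script_dir: Path) -> Path:
    """Return the path to books.md, which must sit next to the script."""
    path = script_dir / "books.md"
    if not path.is_file():
        raise FileNotFoundError(f"expected {path} next to the script")
    return path
```

In a script, these would typically be called as parse_args() and find_input(Path(__file__).resolve().parent), so that a missing argument or input file is reported immediately with a clear message.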