Commit

Convokit 3.0.1 (#249)
seanzhangkx8 authored Nov 20, 2024
1 parent c37ec3e commit cf31dd3
Showing 10 changed files with 50 additions and 29 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/continuous-integration.yml
@@ -7,7 +7,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
-python-version: [3.9, '3.10', '3.11', '3.12']
+python-version: ['3.10', '3.11', '3.12']
mongodb-version: [5.0.2]

steps:
8 changes: 5 additions & 3 deletions README.md
@@ -4,13 +4,15 @@
<!-- ALL-CONTRIBUTORS-BADGE:END -->

[![pypi](https://img.shields.io/pypi/v/convokit.svg)](https://pypi.org/pypi/convokit/)
-[![py\_versions](https://img.shields.io/badge/python-3.9%2B-blue)](https://pypi.org/pypi/convokit/)
+[![py\_versions](https://img.shields.io/badge/python-3.10%2B-blue)](https://pypi.org/pypi/convokit/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![license](https://img.shields.io/badge/license-MIT-green)](https://github.com/CornellNLP/ConvoKit/blob/master/LICENSE.md)
[![Discord Community](https://img.shields.io/static/v1?logo=discord&style=flat&color=red&label=discord&message=community)](https://discord.gg/WMFqMWgz6P)


-This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a [single unified interface](https://convokit.cornell.edu/documentation/architecture.html) inspired by (and compatible with) scikit-learn. Several large [conversational datasets](https://github.com/CornellNLP/ConvoKit#datasets) are included together with scripts exemplifying the use of the toolkit on these datasets. The latest version is [3.0.1](https://github.com/CornellNLP/ConvoKit/releases/tag/v3.0.1) (released November 13, 2024); follow the [project on GitHub](https://github.com/CornellNLP/ConvoKit) to keep track of updates.
+This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a [single unified interface](https://convokit.cornell.edu/documentation/architecture.html) inspired by (and compatible with) scikit-learn. Several large [conversational datasets](https://github.com/CornellNLP/ConvoKit#datasets) are included together with scripts exemplifying the use of the toolkit on these datasets. The latest version is [3.0.1](https://github.com/CornellNLP/ConvoKit/releases/tag/v3.0.1) (released November 19, 2024); follow the [project on GitHub](https://github.com/CornellNLP/ConvoKit) to keep track of updates.

+Join our [Discord community](https://discord.gg/WMFqMWgz6P) to stay informed, connect with fellow developers, and be part of an engaging space where we share progress, discuss features, and tackle issues together.
+
Read our [documentation](https://convokit.cornell.edu/documentation) or try ConvoKit in our [interactive tutorial](https://colab.research.google.com/github/CornellNLP/ConvoKit/blob/master/examples/Introduction_to_ConvoKit.ipynb).

@@ -198,7 +200,7 @@ Name for download: `spolin-corpus`
In addition to the provided datasets, you may also use ConvoKit with your own custom datasets by loading them into a `convokit.Corpus` object. [This example script](https://github.com/CornellNLP/ConvoKit/blob/master/examples/converting_movie_corpus.ipynb) shows how to construct a Corpus from custom data.

## Installation
-This toolkit requires Python >= 3.9.
+This toolkit requires Python >= 3.10.

1. Download the toolkit: `pip3 install convokit`
2. Download Spacy's English model: `python3 -m spacy download en`
6 changes: 3 additions & 3 deletions docs/source/deli.rst
@@ -56,8 +56,6 @@ Metadata for each conversation includes:
Usage
-----

-Convert the DeliData Corpus into ConvoKit format using the following notebook: `Converting DeliData to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/DELI/ConvoKit_DeliData_Conversion.ipynb>`_
-
To download directly with ConvoKit:

>>> from convokit import Corpus, download
@@ -72,12 +70,14 @@ For some quick stats:
* Number of Utterances: 17111
* Number of Conversations: 500

+Additionally, if you want to process the original Deli data into ConvoKit format you can use the following script `Converting DeliData to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/DELI/ConvoKit_DeliData_Conversion.ipynb>`_
+
Additional note
---------------
Data License
^^^^^^^^^^^^

-ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable. The license of the original distribution applies.
+The license of the original distribution applies.

Contact
^^^^^^^
11 changes: 6 additions & 5 deletions docs/source/fomc.rst
@@ -1,11 +1,13 @@
Federal Open Market Committee (FOMC) Corpus
===========================================

-Transcripts of recurring meetings of the Federal Reserve’s Open Market Committee (FOMC), where important aspects of U.S. monetary policy are decided, covering the period 1977-2008. (108,504 conversational exchanges between 364 speakers of FOMC board members in 268 meetings).
+Transcripts of recurring meetings of the Federal Reserve’s Open Market Committee (FOMC), where important aspects of U.S. monetary policy are decided, covering the period 1977-2008. (108,504 conversational exchanges between 364 speakers of FOMC board members in 268 meetings).

Distributed together with:
`Talk it up or play it down? (Un)expected correlations between (de-)emphasis and recurrence of discussion points in consequential U.S. economic policy meetings <https://chenhaot.com/papers/de-emphasis-fomc.html>`_. Chenhao Tan and Lillian Lee. Presented in Text As Data 2016.

+Please cite this paper when using this corpus in your research.
+
Dataset details
---------------

@@ -35,13 +37,11 @@ Metadata for utterances include:
Conversational-level information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Conversations are indexed by a string representing the meeting date.
+Conversations are indexed by a string representing the meeting date.

Usage
-----------

-Convert the FOMC Corpus into ConvoKit format using this notebook `Converting FOMC Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/FOMC/fomc_to_convokit.ipynb>`_
-
To download directly with ConvoKit:

>>> from convokit import Corpus, download
@@ -55,11 +55,12 @@ Number of Speakers: 364
Number of Utterances: 108504
Number of Conversations: 268

+Additionally, if you want to process the original FOMC data into ConvoKit format you can use the following script `Converting FOMC Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/FOMC/fomc_to_convokit.ipynb>`_
+
Additional note
---------------

-The original dataset can be downloaded `here <https://chenhaot.com/pages/de-emphasis-fomc.html>`_. Refer to the original README for more explanations on dataset construction.
+The original dataset can be downloaded `here <https://chenhaot.com/pages/de-emphasis-fomc.html>`_. Refer to the original README for more explanations on dataset construction.

Contact
^^^^^^^
4 changes: 2 additions & 2 deletions docs/source/fora.rst
@@ -100,11 +100,11 @@ Additional note
Data License
^^^^^^^^^^^^

-ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable. The license of the original distribution applies.
+ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable. The license of the original distribution applies.

Contact
^^^^^^^

Questions about the conversion into ConvoKit format should be directed to Sean Zhang <kz88@cornell.edu>

-Questions about the Fora corpus should be directed to the corresponding authors Hope Schroeder <hopes@mit.edu>, Deb Roy <dkroy@mit.edu>, and Jad Kabbara <jkabbara@mit.edu> of the original paper.
+Questions about the Fora corpus should be directed to the corresponding authors Hope Schroeder <hopes@mit.edu>, Deb Roy <dkroy@mit.edu>, and Jad Kabbara <jkabbara@mit.edu> of the original paper.
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -8,7 +8,7 @@ Cornell Conversational Analysis Toolkit (ConvoKit) Documentation

This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a `single unified interface <https://convokit.cornell.edu/documentation/architecture.html>`_ inspired by (and compatible with) scikit-learn.
Several large `conversational datasets <https://github.com/CornellNLP/ConvoKit#datasets>`_ are included together with scripts exemplifying the use of the toolkit on these datasets.
-More information can be found at our `website <https://convokit.cornell.edu>`_. The latest version is `3.0.1 <https://github.com/CornellNLP/ConvoKit/releases/tag/v3.0.1>`_ (released Nov. 8, 2024).
+More information can be found at our `website <https://convokit.cornell.edu>`_. The latest version is `3.0.1 <https://github.com/CornellNLP/ConvoKit/releases/tag/v3.0.1>`_ (released November 19, 2024).

Contents
--------
2 changes: 1 addition & 1 deletion docs/source/install.rst
@@ -3,7 +3,7 @@ Installing ConvoKit

System Requirements
===================
-ConvoKit requires Python 3.9 or above.
+ConvoKit requires Python 3.10 or above.

Package Installation
====================
7 changes: 3 additions & 4 deletions docs/source/npr-2p.rst
@@ -42,8 +42,6 @@ Conversations are indexed by the id of the first utterance that appears in the c
Usage
-----

-Convert the NPR-2P Corpus into ConvoKit format using this notebook `Converting NPR-2P Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/NPR-2P/npr_to_convokit.ipynb>`_
-
To download directly with ConvoKit:

>>> from convokit import Corpus, download
@@ -53,10 +51,11 @@ To download directly with ConvoKit:
For some quick stats:

>>> corpus.print_summary_stats()
-Number of Speakers: 22267
+Number of Speakers: 22267
Number of Utterances: 428624
-Number of Conversations: 22149
+Number of Conversations: 22149

+Additionally, if you want to process the original NPR-2P data into ConvoKit format you can use the following script `Converting NPR-2P Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/NPR-2P/npr_to_convokit.ipynb>`_

Additional note
---------------
34 changes: 27 additions & 7 deletions docs/source/troubleshooting.rst
@@ -10,12 +10,32 @@ General checks
Issues
^^^^^^

-**Error associated with Numpy**
+**Error Associated with Numpy 2.0.0**

-Pre Spacy 3.8.2 is not compatible with numpy 2.0.0+ due to compatibility issues with thinc. Spacy 3.8.2 is compatible with numpy 2.0.0+ but currently requires thinc to be >=8.3.0, <8.4.0, so as a temporary solution ConvoKit now enforces spacy>=3.8.2, thinc >=8.3.0, <8.4.0. We will continue to keep an eye on spacy releases and update the requirements if there are new releases targeting this issue.
-For additional insight into the issue:
-`spaCy issue #13528 <https://github.com/explosion/spaCy/issues/13528>`_
-`thinc issue #939 <https://github.com/explosion/thinc/issues/939>`_
+The release of `numpy 2.0.0 <https://numpy.org/devdocs/release/2.0.0-notes.html>`_ is exciting,
+yet it breaks backward compatibility for packages that are built with numpy 1.x.
+While our new release (ConvoKit 3.0.1) addresses the problem and is built against the new numpy,
+there are ConvoKit dependency packages that still experience issues with numpy 2.0.0+.
+We have fixed known errors from testing, but the tests are far from exhaustive.
+Therefore, if you face an error likely triggered by adapting to numpy 2.0.0+ versions,
+we encourage you to please submit an issue on our GitHub, so we can address the problem as soon as possible. Thank you!
+
+For explanations of what errors numpy 2.0 could cause on packages that are not built against it,
+check the `numpy 2.0 migration guide <https://numpy.org/devdocs/numpy_2_0_migration_guide.html>`_.
+
+An example of an issue that we fixed is demonstrated below:
+
+Pre-Spacy 3.8.2 is not compatible with numpy 2.0.0+ due to compatibility issues with thinc.
+Spacy 3.8.2 is compatible with numpy 2.0.0+ but currently requires thinc to be >=8.3.0, <8.4.0.
+As a temporary solution, ConvoKit now enforces spacy>=3.8.2, thinc >=8.3.0, <8.4.0.
+We will continue to monitor spacy releases and update the requirements if there are new releases targeting this issue.
+
+For additional information about the issue, see:
+`spaCy issue <https://github.com/explosion/spaCy/issues/13528>`_,
+`thinc issue <https://github.com/explosion/thinc/issues/939>`_.
+
+The issues are more likely to appear when you install ConvoKit in an existing environment where other packages have already been pre-installed.
+Installing ConvoKit in a new environment following our installation guide should result in no errors.

-----------------------------
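
Reviewer note (not part of the commit): the troubleshooting text above says conflicts are most likely in an existing environment with pre-installed packages. A minimal sketch of how such an environment might be checked against the pins named in this commit (spacy>=3.8.2, thinc>=8.3.0,<8.4.0) is shown below. The helpers `parse_version`, `check_pins`, and `installed_versions` are hypothetical names for illustration and are not part of ConvoKit; the version parsing is deliberately simplified and ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Pins copied from the troubleshooting note and setup.py hunk in this commit.
# Each entry is (inclusive lower bound, exclusive upper bound or None).
PINS = {
    "spacy": ((3, 8, 2), None),
    "thinc": ((8, 3, 0), (8, 4, 0)),
}


def parse_version(text):
    """Turn '8.3.2' into (8, 3, 2), padding short versions with zeros.

    Simplification (assumption): only the leading numeric part is used.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    parts = tuple(int(p) for p in match.group().split(".")) if match else ()
    return (parts + (0, 0, 0))[:3]


def check_pins(installed):
    """Return a list of problems for a {package: version-or-None} mapping."""
    problems = []
    for name, (lower, upper) in PINS.items():
        version = installed.get(name)
        if version is None:
            problems.append(f"{name} is not installed")
            continue
        parsed = parse_version(version)
        if parsed < lower:
            problems.append(f"{name} {version} is older than {'.'.join(map(str, lower))}")
        elif upper is not None and parsed >= upper:
            problems.append(f"{name} {version} is not below {'.'.join(map(str, upper))}")
    return problems


def installed_versions(names):
    """Look up installed distribution versions; missing packages map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Report any conflicts in the current environment.
    for line in check_pins(installed_versions(PINS)) or ["no pin conflicts found"]:
        print(line)
```

Keeping `check_pins` a pure function over a plain mapping means it can be exercised without touching the real environment; only `installed_versions` queries what is actually installed.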

@@ -68,9 +88,9 @@ The two recommended fixes are to run:

and if that doesn't fix the issue, then run:

->>> open /Applications/Python\ 3.9/Install\ Certificates.command
+>>> open /Applications/Python\ 3.10/Install\ Certificates.command

-(Substitute 3.9 in the above command with your current Python version (e.g. 3.10 or 3.11 or 3.12) if necessary.)
+(Substitute 3.10 in the above command with your current Python version (e.g. 3.11 or 3.12 or 3.13) if necessary.)

Immutability of Metadata Fields
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3 changes: 1 addition & 2 deletions setup.py
@@ -59,7 +59,7 @@
"dnspython>=1.16.0",
"thinc>=8.3.0,<8.4.0",
"h5py==3.12.1",
-"numexpr>=2.8.0",
+"numexpr>=2.8.0",
"ruff>=0.4.8",
"bottleneck",
],
@@ -68,7 +68,6 @@
},
classifiers=[
"Programming Language :: Python",
-"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",