Commit

fix new dataset documentation
seanzhangkx8 committed Nov 20, 2024
1 parent fec1afa commit 5476b49
Showing 4 changed files with 14 additions and 14 deletions.
6 changes: 3 additions & 3 deletions docs/source/deli.rst
@@ -56,8 +56,6 @@ Metadata for each conversation includes:
Usage
-----

-Convert the DeliData Corpus into ConvoKit format using the following notebook: `Converting DeliData to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/DELI/ConvoKit_DeliData_Conversion.ipynb>`_
-
To download directly with ConvoKit:

>>> from convokit import Corpus, download
@@ -72,12 +70,14 @@ For some quick stats:
* Number of Utterances: 17111
* Number of Conversations: 500

+Additionally, if you want to process the original Deli data into ConvoKit format you can use the following script `Converting DeliData to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/DELI/ConvoKit_DeliData_Conversion.ipynb>`_
+
Additional note
---------------
Data License
^^^^^^^^^^^^

-ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable. The license of the original distribution applies.
+The license of the original distribution applies.

Contact
^^^^^^^
11 changes: 6 additions & 5 deletions docs/source/fomc.rst
@@ -1,11 +1,13 @@
Federal Open Market Committee (FOMC) Corpus
===========================================

-Transcripts of recurring meetings of the Federal Reserve’s Open Market Committee (FOMC), where important aspects of U.S. monetary policy are decided, covering the period 1977-2008. (108,504 conversational exchanges between 364 speakers of FOMC board members in 268 meetings).
+Transcripts of recurring meetings of the Federal Reserve’s Open Market Committee (FOMC), where important aspects of U.S. monetary policy are decided, covering the period 1977-2008. (108,504 conversational exchanges between 364 speakers of FOMC board members in 268 meetings).

Distributed together with:
`Talk it up or play it down? (Un)expected correlations between (de-)emphasis and recurrence of discussion points in consequential U.S. economic policy meetings <https://chenhaot.com/papers/de-emphasis-fomc.html>`_. Chenhao Tan and Lillian Lee. Presented in Text As Data 2016.

+Please cite this paper when using this corpus in your research.
+
Dataset details
---------------

@@ -35,13 +37,11 @@ Metadata for utterances include:
Conversational-level information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Conversations are indexed by a string representing the meeting date.
+Conversations are indexed by a string representing the meeting date.

Usage
-----------

-Convert the FOMC Corpus into ConvoKit format using this notebook `Converting FOMC Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/FOMC/fomc_to_convokit.ipynb>`_
-
To download directly with ConvoKit:

>>> from convokit import Corpus, download
@@ -55,11 +55,12 @@ Number of Speakers: 364
Number of Utterances: 108504
Number of Conversations: 268

+Additionally, if you want to process the original FOMC data into ConvoKit format you can use the following script `Converting FOMC Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/FOMC/fomc_to_convokit.ipynb>`_

Additional note
---------------

-The original dataset can be downloaded `here <https://chenhaot.com/pages/de-emphasis-fomc.html>`_. Refer to the original README for more explanations on dataset construction.
+The original dataset can be downloaded `here <https://chenhaot.com/pages/de-emphasis-fomc.html>`_. Refer to the original README for more explanations on dataset construction.

Contact
^^^^^^^
4 changes: 2 additions & 2 deletions docs/source/fora.rst
@@ -100,11 +100,11 @@ Additional note
Data License
^^^^^^^^^^^^

-ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable. The license of the original distribution applies.
+ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable. The license of the original distribution applies.

Contact
^^^^^^^

Questions about the conversion into ConvoKit format should be directed to Sean Zhang <[email protected]>

-Questions about the Fora corpus should be directed to the corresponding authors Hope Schroeder <[email protected]>, Deb Roy <[email protected]>, and Jad Kabbara <[email protected]> of the original paper.
+Questions about the Fora corpus should be directed to the corresponding authors Hope Schroeder <[email protected]>, Deb Roy <[email protected]>, and Jad Kabbara <[email protected]> of the original paper.
7 changes: 3 additions & 4 deletions docs/source/npr-2p.rst
@@ -42,8 +42,6 @@ Conversations are indexed by the id of the first utterance that appears in the c
Usage
-----

-Convert the NPR-2P Corpus into ConvoKit format using this notebook `Converting NPR-2P Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/NPR-2P/npr_to_convokit.ipynb>`_
-
To download directly with ConvoKit:

>>> from convokit import Corpus, download
@@ -53,10 +51,11 @@ To download directly with ConvoKit:
For some quick stats:

>>> corpus.print_summary_stats()
-Number of Speakers: 22267
+Number of Speakers: 22267
Number of Utterances: 428624
-Number of Conversations: 22149
+Number of Conversations: 22149

+Additionally, if you want to process the original NPR-2P data into ConvoKit format you can use the following script `Converting NPR-2P Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/NPR-2P/npr_to_convokit.ipynb>`_
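
The quick stats quoted in the hunks above (DeliData, FOMC, NPR-2P) can be compared by average conversation length. The following is a minimal sketch that uses only the utterance and conversation counts shown in this diff; the dictionary and function names are illustrative assumptions and are not part of ConvoKit.

```python
# Utterance and conversation counts as quoted in the docs touched by this commit.
# The keys and structure are illustrative; they are not a ConvoKit data format.
QUICK_STATS = {
    "DeliData": {"utterances": 17111, "conversations": 500},
    "FOMC": {"utterances": 108504, "conversations": 268},
    "NPR-2P": {"utterances": 428624, "conversations": 22149},
}


def mean_utterances_per_conversation(stats):
    """Return the average number of utterances per conversation for each corpus."""
    return {
        name: counts["utterances"] / counts["conversations"]
        for name, counts in stats.items()
    }


if __name__ == "__main__":
    for name, mean in mean_utterances_per_conversation(QUICK_STATS).items():
        print(f"{name}: {mean:.1f} utterances per conversation")
```

On these numbers, FOMC meetings average roughly 405 utterances each, far more than DeliData (about 34) or NPR-2P (about 19).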

Additional note
---------------
