fix new dataset documentation

CornellNLP · Nov 20, 2024 · 5476b49
1 parent fec1afa
commit 5476b49
Showing 4 changed files with 14 additions and 14 deletions.
diff --git a/docs/source/deli.rst b/docs/source/deli.rst
@@ -56,8 +56,6 @@ Metadata for each conversation includes:
 Usage
 -----
 
-Convert the DeliData Corpus into ConvoKit format using the following notebook: `Converting DeliData to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/DELI/ConvoKit_DeliData_Conversion.ipynb>`_
-
 To download directly with ConvoKit:
 
 >>> from convokit import Corpus, download
@@ -72,12 +70,14 @@ For some quick stats:
 * Number of Utterances: 17111
 * Number of Conversations: 500
 
+Additionally, if you want to process the original Deli data into ConvoKit format you can use the following script `Converting DeliData to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/DELI/ConvoKit_DeliData_Conversion.ipynb>`_
+
 Additional note
 ---------------
 Data License
 ^^^^^^^^^^^^
 
-ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable.  The license of the original distribution applies.
+The license of the original distribution applies.
 
 Contact
 ^^^^^^^

diff --git a/docs/source/fomc.rst b/docs/source/fomc.rst
@@ -1,11 +1,13 @@
 Federal Open Market Committee (FOMC) Corpus
 ===========================================
 
-Transcripts of recurring meetings of the Federal Reserve’s Open Market Committee (FOMC), where important aspects of U.S. monetary policy are decided, covering the period 1977-2008. (108,504 conversational exchanges between 364 speakers of FOMC board members in 268 meetings). 
+Transcripts of recurring meetings of the Federal Reserve’s Open Market Committee (FOMC), where important aspects of U.S. monetary policy are decided, covering the period 1977-2008. (108,504 conversational exchanges between 364 speakers of FOMC board members in 268 meetings).
 
 Distributed together with:
 `Talk it up or play it down? (Un)expected correlations between (de-)emphasis and recurrence of discussion points in consequential U.S. economic policy meetings <https://chenhaot.com/papers/de-emphasis-fomc.html>`_. Chenhao Tan and Lillian Lee. Presented in Text As Data 2016.
 
+Please cite this paper when using this corpus in your research.
+
 Dataset details
 ---------------
 
@@ -35,13 +37,11 @@ Metadata for utterances include:
 Conversational-level information
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Conversations are indexed by a string representing the meeting date. 
+Conversations are indexed by a string representing the meeting date.
 
 Usage
 -----------
 
-Convert the FOMC Corpus into ConvoKit format using this notebook `Converting FOMC Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/FOMC/fomc_to_convokit.ipynb>`_
-
 To download directly with ConvoKit:
 
 >>> from convokit import Corpus, download
@@ -55,11 +55,12 @@ Number of Speakers: 364
 Number of Utterances: 108504
 Number of Conversations: 268
 
+Additionally, if you want to process the original FOMC data into ConvoKit format you can use the following script `Converting FOMC Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/FOMC/fomc_to_convokit.ipynb>`_
 
 Additional note
 ---------------
 
-The original dataset can be downloaded `here <https://chenhaot.com/pages/de-emphasis-fomc.html>`_. Refer to the original README for more explanations on dataset construction. 
+The original dataset can be downloaded `here <https://chenhaot.com/pages/de-emphasis-fomc.html>`_. Refer to the original README for more explanations on dataset construction.
 
 Contact
 ^^^^^^^

diff --git a/docs/source/fora.rst b/docs/source/fora.rst
@@ -100,11 +100,11 @@ Additional note
 Data License
 ^^^^^^^^^^^^
 
-ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable.  The license of the original distribution applies.
+ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable. The license of the original distribution applies.
 
 Contact
 ^^^^^^^
 
 Questions about the conversion into ConvoKit format should be directed to Sean Zhang <[email protected]>
 
-Questions about the Fora corpus should be directed to the corresponding authors Hope Schroeder <[email protected]>, Deb Roy <[email protected]>, and Jad Kabbara <[email protected]> of the original paper.
+Questions about the Fora corpus should be directed to the corresponding authors Hope Schroeder <[email protected]>, Deb Roy <[email protected]>, and Jad Kabbara <[email protected]> of the original paper.
diff --git a/docs/source/npr-2p.rst b/docs/source/npr-2p.rst
@@ -42,8 +42,6 @@ Conversations are indexed by the id of the first utterance that appears in the c
 Usage
 -----
 
-Convert the NPR-2P Corpus into ConvoKit format using this notebook `Converting NPR-2P Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/NPR-2P/npr_to_convokit.ipynb>`_
-
 To download directly with ConvoKit:
 
 >>> from convokit import Corpus, download
@@ -53,10 +51,11 @@ To download directly with ConvoKit:
 For some quick stats:
 
 >>> corpus.print_summary_stats()
-Number of Speakers: 22267  
+Number of Speakers: 22267
 Number of Utterances: 428624
-Number of Conversations: 22149 
+Number of Conversations: 22149
 
+Additionally, if you want to process the original NPR-2P data into ConvoKit format you can use the following script `Converting NPR-2P Corpus to ConvoKit Format <https://github.com/CornellNLP/ConvoKit/blob/master/examples/dataset-examples/NPR-2P/npr_to_convokit.ipynb>`_
 
 Additional note
 ---------------