From 5476b49ca0222901cc796430277e4a21e7397524 Mon Sep 17 00:00:00 2001
From: seanzhangkx8 <106214464+seanzhangkx8@users.noreply.github.com>
Date: Tue, 19 Nov 2024 19:09:30 -0500
Subject: [PATCH] fix new dataset documentation

---
 docs/source/deli.rst   |  6 +++---
 docs/source/fomc.rst   | 11 ++++++-----
 docs/source/fora.rst   |  4 ++--
 docs/source/npr-2p.rst |  7 +++----
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/source/deli.rst b/docs/source/deli.rst
index 30af32c2..c582452b 100644
--- a/docs/source/deli.rst
+++ b/docs/source/deli.rst
@@ -56,8 +56,6 @@ Metadata for each conversation includes:
 Usage
 -----
 
-Convert the DeliData Corpus into ConvoKit format using the following notebook: `Converting DeliData to ConvoKit Format `_
-
 To download directly with ConvoKit:
 
 >>> from convokit import Corpus, download
@@ -72,12 +70,14 @@ For some quick stats:
 * Number of Utterances: 17111
 * Number of Conversations: 500
 
+Additionally, if you want to process the original Deli data into ConvoKit format you can use the following script `Converting DeliData to ConvoKit Format `_
+
 Additional note
 ---------------
 Data License
 ^^^^^^^^^^^^
 
-ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable. The license of the original distribution applies.
+The license of the original distribution applies.
 
 Contact
 ^^^^^^^
diff --git a/docs/source/fomc.rst b/docs/source/fomc.rst
index 068beefd..9bc11f5d 100644
--- a/docs/source/fomc.rst
+++ b/docs/source/fomc.rst
@@ -1,11 +1,13 @@
 Federal Open Market Committee (FOMC) Corpus
 ===========================================
 
-Transcripts of recurring meetings of the Federal Reserve’s Open Market Committee (FOMC), where important aspects of U.S. monetary policy are decided, covering the period 1977-2008. (108,504 conversational exchanges between 364 speakers of FOMC board members in 268 meetings).
+Transcripts of recurring meetings of the Federal Reserve’s Open Market Committee (FOMC), where important aspects of U.S. monetary policy are decided, covering the period 1977-2008. (108,504 conversational exchanges between 364 speakers of FOMC board members in 268 meetings).
 
 Distributed together with: `Talk it up or play it down? (Un)expected correlations between (de-)emphasis and recurrence of discussion points in consequential U.S. economic policy meetings `_. Chenhao Tan and Lillian Lee. Presented in Text As Data 2016.
 
+Please cite this paper when using this corpus in your research.
+
 Dataset details
 ---------------
 
@@ -35,13 +37,11 @@ Metadata for utterances include:
 Conversational-level information
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Conversations are indexed by a string representing the meeting date.
+Conversations are indexed by a string representing the meeting date.
 
 Usage
 -----------
 
-Convert the FOMC Corpus into ConvoKit format using this notebook `Converting FOMC Corpus to ConvoKit Format `_
-
 To download directly with ConvoKit:
 
 >>> from convokit import Corpus, download
@@ -55,11 +55,12 @@ Number of Speakers: 364
 Number of Utterances: 108504
 Number of Conversations: 268
 
+Additionally, if you want to process the original FOMC data into ConvoKit format you can use the following script `Converting FOMC Corpus to ConvoKit Format `_
 
 Additional note
 ---------------
 
-The original dataset can be downloaded `here `_. Refer to the original README for more explanations on dataset construction.
+The original dataset can be downloaded `here `_. Refer to the original README for more explanations on dataset construction.
 
 Contact
 ^^^^^^^
diff --git a/docs/source/fora.rst b/docs/source/fora.rst
index e0291945..7c7250ab 100644
--- a/docs/source/fora.rst
+++ b/docs/source/fora.rst
@@ -100,11 +100,11 @@ Additional note
 Data License
 ^^^^^^^^^^^^
 
-ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable.
The license of the original distribution applies.
+ConvoKit is not distributing the corpus separately, and thus no additional data license is applicable. The license of the original distribution applies.
 
 Contact
 ^^^^^^^
 
 Questions about the conversion into ConvoKit format should be directed to Sean Zhang
 
-Questions about the Fora corpus should be directed to the corresponding authors Hope Schroeder , Deb Roy , and Jad Kabbara of the original paper.
\ No newline at end of file
+Questions about the Fora corpus should be directed to the corresponding authors Hope Schroeder , Deb Roy , and Jad Kabbara of the original paper.
diff --git a/docs/source/npr-2p.rst b/docs/source/npr-2p.rst
index 85efd668..d7ac4838 100644
--- a/docs/source/npr-2p.rst
+++ b/docs/source/npr-2p.rst
@@ -42,8 +42,6 @@ Conversations are indexed by the id of the first utterance that appears in the c
 Usage
 -----
 
-Convert the NPR-2P Corpus into ConvoKit format using this notebook `Converting NPR-2P Corpus to ConvoKit Format `_
-
 To download directly with ConvoKit:
 
 >>> from convokit import Corpus, download
@@ -53,10 +51,11 @@ To download directly with ConvoKit:
 For some quick stats:
 
 >>> corpus.print_summary_stats()
-Number of Speakers: 22267
+Number of Speakers: 22267
 Number of Utterances: 428624
-Number of Conversations: 22149
+Number of Conversations: 22149
 
+Additionally, if you want to process the original NPR-2P data into ConvoKit format you can use the following script `Converting NPR-2P Corpus to ConvoKit Format `_
 
 Additional note
 ---------------