Import Microsoft Word Transcript Into R : Longer Safer Method
trinker edited this page Aug 22, 2012
If your transcripts are in Microsoft Word format, this tutorial demonstrates one procedure for cleaning and importing your data into R for use with `qdap`. An alternate method, described here, relies on the `read.transcript` function and automates much of the parsing for the researcher. It is recommended that the researcher use the `read.transcript` approach and rely on the method described in this tutorial only if a problem arises with `read.transcript`.
### The following video demonstrates how to clean a Microsoft Word based transcript and read it into R.

------------------------INSERT VIDEO UPON APPROVAL---------------
name | rich char | replacement |
---|---|---|
ellipsis | … | ... or (pause) |
left curly quote | “ | |
right curly quote | ” | |
left curly apostrophe | ‘ | ' |
right curly apostrophe | ’ | ' |
en dash | – | ... or (pause) |
em dash | — | ... or (pause) |
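
If some rich characters survive into R, the substitutions in the table above could also be applied there. The following is a minimal base R sketch, not part of `qdap`: `clean_rich()` is a hypothetical helper name, the sample sentence is made up, and replacing curly double quotes with a straight double quote is an assumption (the table leaves that cell blank).

```r
# Rich characters from the table, written as Unicode escapes so the
# script does not depend on the file's encoding.
rich  <- c("\u2026", "\u201c", "\u201d", "\u2018", "\u2019", "\u2013", "\u2014")

# Plain replacements, in the same order.
# Assumption: curly double quotes become a straight double quote.
plain <- c("...", "\"", "\"", "'", "'", "...", "...")

# clean_rich() is a hypothetical helper, not a qdap function.
clean_rich <- function(x) {
  for (i in seq_along(rich)) {
    # fixed = TRUE treats each character literally, not as a regex
    x <- gsub(rich[i], plain[i], x, fixed = TRUE)
  }
  x
}

cleaned <- clean_rich("\u201cWait\u2026 it\u2019s here \u2014 now\u201d")
print(cleaned)
```

Swapping `"..."` for `"(pause)"` in `plain` gives the other replacement listed in the table.
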
bracket types | names |
---|---|
<text> | angle |
(text) | round |
{text} | curly |
[text] | square |
library(qdap);library(gdata)
# doc depends on the name of the researcher's document
doc <- "TCH 7 Pre-data Les 2, Year 1, 1-15-09.csv"
dat1 <- read.csv(doc, header=FALSE, strip.white = TRUE, sep=",",
as.is=FALSE, na.strings= " ")
truncdf(dat1, 80)
htruncdf(dat1, 15, 80)
htruncdf(dat1)
left.just(htruncdf(dat1, 15, 80), 2)
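
For readers who want to try the import step without the original transcript file, here is a small base R sketch of the same idea. It writes a made-up two-column CSV to a temporary file, reads it with `read.csv`, and shortens long text for display. `trunc_cols()` is a hypothetical stand-in for viewing truncated columns and is not the `qdap` implementation of `truncdf`; the column names and `as.is = TRUE` (to keep the text as character rather than factor) are also assumptions.

```r
# Write a tiny headerless transcript (speaker, dialogue) to a temporary CSV.
# The contents are invented for illustration only.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Teacher,  Let's look at the first page of the story and talk about it together.",
  "Student,Okay."
), tmp)

# Read it in; strip.white = TRUE drops the padding around unquoted fields.
dat <- read.csv(tmp, header = FALSE, strip.white = TRUE, sep = ",",
                as.is = TRUE, na.strings = " ")
names(dat) <- c("person", "dialogue")  # assumed column names

# trunc_cols() is a hypothetical helper: cut every column to `width`
# characters so wide text columns are easier to inspect.
trunc_cols <- function(df, width = 10) {
  df[] <- lapply(df, function(col) substr(as.character(col), 1, width))
  df
}

print(trunc_cols(dat, 20))
```
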
The `bracketX` and `bracketXtract` functions
examp2 <- structure(list(person = structure(c(1L, 2L, 1L, 3L), .Label = c("bob",
"greg", "sue"), class = "factor"), text = c("I love chicken [unintelligible]!",
"Me too! (laughter) It's so good.[interupting]", "Yep it's awesome {reading}.",
"Agreed. {is so much fun}")), .Names = c("person", "text"), row.names = c(NA,
-4L), class = "data.frame")
examp2
bracketX(examp2$text, 'square')
bracketX(examp2$text, 'curly')
bracketX(examp2$text)
examp2
bracketXtract(examp2$text, 'square')
bracketXtract(examp2$text, 'curly')
bracketXtract(examp2$text)
paste2(bracketXtract(examp2$text, 'curly'), " ")
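
To make the idea behind bracket removal and extraction concrete, the sketch below approximates both with base R regular expressions, using the same example sentences. It is not how `qdap` implements `bracketX` or `bracketXtract`, and its output may differ from theirs; `patterns`, `strip_brackets()`, and `pull_brackets()` are hypothetical names introduced here.

```r
text <- c("I love chicken [unintelligible]!",
          "Me too! (laughter) It's so good.[interupting]",
          "Yep it's awesome {reading}.",
          "Agreed. {is so much fun}")

# One regular expression per bracket type named in the bracket table.
patterns <- c(
  angle  = "<[^>]*>",
  round  = "\\([^)]*\\)",
  curly  = "\\{[^}]*\\}",
  square = "\\[[^]]*\\]"
)

# Hypothetical helper: delete bracketed spans of the requested type(s).
strip_brackets <- function(x, type = names(patterns)) {
  for (p in patterns[type]) x <- gsub(p, "", x)
  # Collapse runs of whitespace left behind and trim the ends.
  trimws(gsub("\\s+", " ", x))
}

# Hypothetical helper: return the bracketed spans of one type,
# as a list with one element per input string.
pull_brackets <- function(x, type) {
  regmatches(x, gregexpr(patterns[[type]], x))
}

print(strip_brackets(text, "curly"))
print(strip_brackets(text))
print(pull_brackets(text, "curly"))
```

Keeping the patterns in a named vector means a new bracket type only needs one more entry, and both helpers pick it up.
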