---
title: "Descriptive Analysis of FEV Dataset"
author: "FE Harrell"
date: "`r Sys.Date()`"
output:
  html_document:
    highlight: pygments
    toc: yes
major: descriptive statistics; graphics
---
# Data Description and Univariable Statistics
```{r contents,results='asis'}
# results='asis' is set in the chunk header because html(contents(...)) emits HTML directly
require(Hmisc)
knitrSet(lang='markdown', w=4, h=3)  # set default figure width and height
getHdata(FEV)
html(contents(FEV), maxlevels=10, levelType='table')
```
```{r describe}
describe(FEV)
datadensity(FEV)
```

# Comparison of Distributions for Smokers and Non-Smokers
Some of the output below uses extended box plots.  A schematic for interpreting them follows.  See [this page](http://biostat.mc.vanderbilt.edu/HmiscNew) for more examples of `Hmisc` graphical summarization functions.
```{r ebp,w=5}
bpplt()
```
The table below includes quartiles for continuous variables.
```{r compare,w=3.5,h=1.5,top=1}
s <- summaryM(age + fev + height + sex ~ smoke, data=FEV)
print(s, npct='both', round=2)
plot(s, which='categorical')
```
```{r compareb,w=6,h=4}
plot(s, which='continuous')
```
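The quartiles in the table above can be cross-checked with base R.  The chunk below is a minimal sketch, not part of the original analysis: `toy_fev` and `toy_smoke` are simulated, hypothetical stand-ins for the `FEV` variables so that the chunk runs on its own.  Results from base `quantile()` may differ slightly from other software depending on the quantile definition used.

```{r quartilecheck}
# Simulated stand-in data (hypothetical values, not the FEV dataset)
set.seed(3)
toy_fev   <- c(rnorm(60, mean=2.5, sd=0.8), rnorm(20, mean=3.2, sd=0.7))
toy_smoke <- factor(rep(c('no', 'yes'), times=c(60, 20)))

# Quartiles of the continuous variable within each group:
# rows are the 25th, 50th, and 75th percentiles, columns are the groups
toy_q <- sapply(split(toy_fev, toy_smoke), quantile, probs=c(.25, .5, .75))
round(toy_q, 2)
```

With the real data, the same pattern would be applied to a continuous column of `FEV` split by `smoke`.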
Another way to describe categorical variables:
```{r comparec,w=5,h=2}
s <- summaryP(sex ~ smoke, data=FEV)
s
ggplot(s)
```
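The proportions summarized above can also be computed by hand with base R.  This is a minimal sketch on a small made-up data frame `toy` (hypothetical, with simplified `'no'`/`'yes'` smoking labels), not output from the FEV data.

```{r propsketch}
# Hypothetical toy data; the values are made up for illustration
toy <- data.frame(
  sex   = c('female', 'male', 'male', 'female', 'male', 'female', 'male', 'female'),
  smoke = c('no',     'no',   'yes',  'no',     'yes',  'yes',    'no',   'no')
)

# Cross-tabulate, then convert counts to proportions within each smoking group
toy_tab  <- table(toy$sex, toy$smoke)
toy_prop <- prop.table(toy_tab, margin=2)   # margin=2: each column sums to 1
round(toy_prop, 2)
```

Using `margin=2` conditions on the column variable, which here is the hypothetical smoking indicator, so each column gives the distribution of sex within one smoking group.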

Another way to describe continuous variables:
```{r compared,w=6,h=2.5}
bpplotM(age + fev + height ~ smoke, data=FEV)
```

# Relationship Between FEV1 and Other Variables
The following shows mean FEV1 stratified by sex and smoking status and, in an extraordinarily information-losing fashion, by quartiles of the continuous subject characteristics, using the `summary.formula` function in the `Hmisc` package.
```{r fev,h=5}
s <- summary(fev ~ age + height + sex + smoke, data=FEV)
s
plot(s)
```
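To make concrete what stratifying by quartiles involves, here is a rough base R sketch on simulated data.  The variables `sim_x` and `sim_y` are hypothetical and are not the FEV data.  The predictor is cut into four quartile groups and the mean response is reported per group, so every subject within a group is treated as if they shared the same predictor value.

```{r quartilesketch}
# Simulated data (hypothetical): a continuous predictor and a response
set.seed(1)
sim_x <- runif(200, min=3, max=19)
sim_y <- 1 + 0.2 * sim_x + rnorm(200, sd=0.5)

# Cut the predictor at its quartiles; include.lowest keeps the minimum value
sim_breaks <- quantile(sim_x, probs=0:4 / 4)
sim_group  <- cut(sim_x, breaks=sim_breaks, include.lowest=TRUE)

# Mean response within each quartile group, plus group sizes
sim_means <- tapply(sim_y, sim_group, mean)
round(sim_means, 2)
table(sim_group)
```

Four means replace 200 distinct predictor values, which is the information loss referred to above.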
Much more effective is to use the _loess_ smoother to show the relationship between continuous variables and FEV1.  Marginal distributions of age and height are shown using extended box plots at the top of the plots.  Faint tick marks show the data density of age and height for each smoking group.
```{r fevb,w=6,h=3}
par(mfrow=c(1,2))   # one row, two column graphic
summaryRc(fev ~ age + height + stratify(smoke), data=FEV,
          label.curves=list(keys='lines', cex=.5), bpplot='top', nloc=list(x=.9, y=.1))
```
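As a rough illustration of the smoothing idea, separate from `summaryRc` itself, the sketch below fits a nonparametric smooth within each of two groups.  It swaps in base R's `lowess()` for simplicity and uses simulated data; the names `sm_x`, `sm_y`, and `sm_group` are hypothetical and do not come from the FEV dataset.

```{r smoothsketch}
# Simulated data (hypothetical): two groups with a shifted linear trend
set.seed(2)
sm_n     <- 300
sm_group <- factor(sample(c('a', 'b'), sm_n, replace=TRUE))
sm_x     <- runif(sm_n, min=3, max=19)
sm_y     <- 1 + 0.2 * sm_x + 0.3 * (sm_group == 'b') + rnorm(sm_n, sd=0.4)

# Fit a separate lowess curve within each group;
# each fit is a list with sorted x values and the smoothed y values
sm_fits <- lapply(split(data.frame(x=sm_x, y=sm_y), sm_group),
                  function(d) lowess(d$x, d$y, f=2/3))

# Range of the smoothed response in each group
sapply(sm_fits, function(fit) round(range(fit$y), 2))
```

Unlike the quartile approach, the smooth uses every observed predictor value, so the shape of the relationship is estimated rather than imposed by arbitrary cut points.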

# Computing Environment
```{r rsession,echo=FALSE}
si <- sessionInfo(); si$loadedOnly <- NULL
print(si, locale=FALSE)
```
```{r cite,results='asis',echo=FALSE}
print(citation(), style='text')
```