added excercise for day 05
asntech committed Dec 6, 2024
1 parent 448ca4d commit 3395f97
Showing 16 changed files with 5,551 additions and 0 deletions.
867 changes: 867 additions & 0 deletions 05-docs/extras/Jupyter-notebook-bios259.ipynb

Large diffs are not rendered by default.

126 changes: 126 additions & 0 deletions 05-docs/extras/Quarto_excercise_bios259.qmd
@@ -0,0 +1,126 @@
---
title: "Quarto exercise -- BIOS259"
format: html
editor: visual
author:
  - name: Aziz Khan
    affiliation: Stanford University, CA, USA
  - name: Your Name
    affiliation: Your University Name
date: "`r format(Sys.time(), '%d %B %Y')`"
abstract: "This is a hands-on exercise for BIOS 259: The Art of Reproducible Science
  – a Stanford Biosciences mini-course on computational reproducibility\n"
tags:
  - reproducibility
  - notebook
  - iris
---

## Introduction

In this **Quarto document**, we'll explore `iris`, a dataset built into R, and create visualizations to analyze it. The goal is to demonstrate how Quarto combines code and narrative text to produce *reproducible research*.

## Load Data

We'll start by loading the `iris` dataset, which contains measurements of iris flowers.

```{r}
# Load the iris dataset
data(iris)
head(iris)
```

The iris dataset contains measurements of sepal length, sepal width, petal length, and petal width for **`r length(unique(iris$Species))` species** of iris flowers: setosa, versicolor, and virginica.

## Summary Statistics

Let's explore the structure of the iris dataset and summary statistics for each variable.

```{r}
# Explore dataset structure
str(iris)
# Summary statistics
summary(iris)
```
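
The summary above pools all three species. As a minimal sketch (an illustrative addition, not part of the original exercise), per-species means can be computed with base R's `aggregate()`. The name `species_means` is introduced here only for illustration.

```{r}
# Mean of each numeric measurement, grouped by species (base R only)
data(iris)
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means
```

The result has one row per species and one column per measurement, which makes it easy to compare against the plots that follow.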

## Data Visualization

### Scatter Plot

We'll create a scatter plot to visualize the relationship between sepal length and sepal width for each species of iris flowers.

```{r, fig.width=8, fig.height=5}
# Scatter plot of sepal length vs. sepal width
library(ggplot2)
iris_scatter_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  labs(title = "Scatter Plot",
       x = "Sepal Length", y = "Sepal Width")
# Print the plot
iris_scatter_plot
```

### Make the plot publication ready

```{r iris-figure1a, fig.width=8, fig.height=5}
# Load the cowplot package; library() stops with an error if it is missing
library(cowplot)
# Use the theme from cowplot
iris_scatter_plot <- iris_scatter_plot + theme_cowplot(12)
iris_scatter_plot
```

### Boxplot

Next, we'll create a boxplot to compare the distribution of petal lengths for each species of iris flowers.

```{r iris-figure1b, fig.width=4, fig.height=5}
# Boxplot of petal length by species
iris_box_plot <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  labs(title = "Boxplot",
       x = "Species", y = "Petal Length") +
  theme_cowplot(12)
iris_box_plot
```
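
To read the boxplot against numbers, the sketch below computes the quartiles of petal length for each species with base R's `split()`, `sapply()`, and `quantile()`. It is an illustrative addition rather than part of the original exercise, and `petal_quartiles` is a name introduced here. In `geom_boxplot()`, the box edges correspond to the first and third quartiles and the middle line to the median.

```{r}
# Quartiles of petal length for each species (rows: quartiles, columns: species)
data(iris)
petal_quartiles <- sapply(split(iris$Petal.Length, iris$Species),
                          quantile, probs = c(0.25, 0.5, 0.75))
petal_quartiles
```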

### Density Plot

Finally, let's add a density plot to visualize the distribution of sepal lengths for each species of iris flowers.

```{r iris-figure1c}
# Density plot of sepal length by species
iris_density_plot <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot",
       x = "Sepal Length", y = "Density") +
  theme_cowplot(12)
iris_density_plot
```

## Conclusion

In this Quarto document, we explored the iris dataset built into R and created visualizations to analyze the data. By combining code and narrative text in Quarto, we produced an analysis that others can easily share and reproduce.

The `cowplot` package provides the function `plot_grid()` to arrange plots into a grid and label them.

```{r iris-figure1, fig.width=14, fig.height=4}
# Arrange the three plots into a labeled grid
plot_grid(iris_scatter_plot, iris_box_plot, iris_density_plot,
          labels = c("A", "B", "C"),
          ncol = 3, rel_widths = c(2, 1.5, 2))
```

Feel free to modify the code, explore other datasets, or add additional visualizations to further analyze the data!

> **Note:** To generate publication-ready figures, you can try [ggpubr](https://rpkgs.datanovia.com/ggpubr/index.html).

## R session

It is good practice to print the R session information to record the versions of the packages used.

```{r}
devtools::session_info()
```
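
If `devtools` is not installed, base R's `sessionInfo()` records similar information. The sketch below is an assumption about how you might keep that record in a text file; the file name `session_info.txt` and the use of `tempdir()` are placeholders, not part of the original exercise.

```{r}
# Capture the session information and write it to a text file
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)
session_file
```

Writing to a project directory instead of `tempdir()` would let the file be committed alongside the rendered report.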
33 changes: 33 additions & 0 deletions 05-docs/extras/README.md
@@ -0,0 +1,33 @@
# Literate programming exercises
> Literate programming using R Markdown/Notebook, Jupyter Notebook, and Quarto

This repository contains three exercise files for practicing data analysis and visualization:

1. `R_notebook_bios259.Rmd`: R Notebook Markdown file.
2. `Quarto_excercise_bios259.qmd`: Quarto document.
3. `Jupyter-notebook-bios259.ipynb`: Jupyter Notebook.

Follow the instructions below to run each exercise:

## R Notebook Exercise

1. Open RStudio.
2. Open the `R_notebook_bios259.Rmd` file.
3. Install the R packages required by the document using `install.packages(c('cowplot','ggplot2'))`.
4. Preview the R Notebook document to produce the HTML report.
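
The install step can be made repeatable with a short check that reports only the packages that are actually missing. This is a sketch, not part of the repository; `required_packages`, `is_installed`, and `missing_packages` are illustrative names.

```r
# Packages used by the R Notebook exercise
required_packages <- c("cowplot", "ggplot2")

# requireNamespace() returns TRUE when a package can be loaded
is_installed <- vapply(required_packages, requireNamespace,
                       logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "),
          ". Install them with install.packages().")
} else {
  message("All required packages are installed.")
}
```

The check only reports what is missing, so it is safe to run repeatedly before opening the notebook.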

## Quarto Exercise

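1. Install the Quarto CLI if you haven't already (recent RStudio releases bundle it). Note that `install.packages("quarto")` installs only the R interface to Quarto, which still needs the CLI.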
2. Open the `Quarto_excercise_bios259.qmd` file in RStudio or a text editor.
3. Run the Quarto document to produce the output.
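
Rendering can also be started from the R console. The following is a hedged sketch that assumes the `quarto` R package and the Quarto CLI are both installed and that the working directory contains the file; otherwise it only prints a hint. The variable `qmd_file` is introduced here for illustration.

```r
qmd_file <- "Quarto_excercise_bios259.qmd"

if (requireNamespace("quarto", quietly = TRUE) && file.exists(qmd_file)) {
  # quarto_render() calls the Quarto CLI to render the document
  quarto::quarto_render(input = qmd_file)
} else {
  message("Install the quarto R package and run this from the directory containing ",
          qmd_file)
}
```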

## Jupyter Notebook Exercise

1. Make sure Jupyter Notebook and the `seaborn` Python package are installed. If they are not, use the `environment.yaml` provided in the repo.
2. Navigate to the directory containing `Jupyter-notebook-bios259.ipynb` in your terminal.
3. Run `jupyter notebook` to start the Jupyter Notebook server.
4. Open `Jupyter-notebook-bios259.ipynb` in the Jupyter interface and execute the code cells.

Feel free to explore and modify the exercises to practice your data analysis skills!

126 changes: 126 additions & 0 deletions 05-docs/extras/R_notebook_bios259.Rmd
@@ -0,0 +1,126 @@
---
title: "R Notebook exercise -- BIOS259"
author:
  - name: Aziz Khan
    affiliation: Stanford University, CA, USA
  - name: Your Name
    affiliation: Your University Name
date: "`r format(Sys.time(), '%d %B %Y')`"
output:
  html_notebook: default
abstract: "This is a hands-on exercise for BIOS 259: The Art of Reproducible Science
  – a Stanford Biosciences mini-course on computational reproducibility\n"
tags:
  - reproducibility
  - notebook
  - iris
---

## Introduction

In this **R Notebook**, we'll explore `iris`, a dataset built into R, and create visualizations to analyze it. The goal is to demonstrate how R Markdown combines code and narrative text to produce *reproducible research*.

## Load Data

We'll start by loading the `iris` dataset, which contains measurements of iris flowers.

```{r}
# Load the iris dataset
data(iris)
head(iris)
```

The iris dataset contains measurements of sepal length, sepal width, petal length, and petal width for **`r length(unique(iris$Species))` species** of iris flowers: setosa, versicolor, and virginica.

## Summary Statistics

Let's explore the structure of the iris dataset and summary statistics for each variable.

```{r}
# Explore dataset structure
str(iris)
# Summary statistics
summary(iris)
```

## Data Visualization

### Scatter Plot

We'll create a scatter plot to visualize the relationship between sepal length and sepal width for each species of iris flowers.

```{r, fig.width=8, fig.height=5}
# Scatter plot of sepal length vs. sepal width
library(ggplot2)
iris_scatter_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  labs(title = "Scatter Plot",
       x = "Sepal Length", y = "Sepal Width")
# Print the plot
iris_scatter_plot
```
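
As a numeric companion to the scatter plot, the sketch below computes the Pearson correlation between sepal length and sepal width within each species and for the pooled data. It is an illustrative addition, not part of the original exercise; `cor_by_species` and `cor_pooled` are names introduced here.

```{r}
# Correlation of sepal length and width, per species and pooled
data(iris)
cor_by_species <- sapply(split(iris, iris$Species),
                         function(d) cor(d$Sepal.Length, d$Sepal.Width))
cor_pooled <- cor(iris$Sepal.Length, iris$Sepal.Width)
cor_by_species
cor_pooled
```

Comparing the two results shows why coloring the points by species matters: the pooled relationship and the within-species relationships need not agree.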

### Make the plot publication ready

```{r, fig.width=8, fig.height=5}
# Load the cowplot package; library() stops with an error if it is missing
library(cowplot)
# Use the theme from cowplot
iris_scatter_plot <- iris_scatter_plot + theme_cowplot(12)
iris_scatter_plot
```

### Boxplot

Next, we'll create a boxplot to compare the distribution of petal lengths for each species of iris flowers.

```{r, fig.width=4, fig.height=5}
# Boxplot of petal length by species
iris_box_plot <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  labs(title = "Boxplot",
       x = "Species", y = "Petal Length") +
  theme_cowplot(12)
iris_box_plot
```

### Density Plot

Finally, let's add a density plot to visualize the distribution of sepal lengths for each species of iris flowers.

```{r}
# Density plot of sepal length by species
iris_density_plot <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot",
       x = "Sepal Length", y = "Density") +
  theme_cowplot(12)
iris_density_plot
```

## Conclusion

In this R Markdown document, we explored the iris dataset built into R and created visualizations to analyze the data. By combining code and narrative text in R Markdown, we produced an analysis that others can easily share and reproduce.

The `cowplot` package provides the function `plot_grid()` to arrange plots into a grid and label them.

```{r, fig.width=14, fig.height=4}
# Arrange the three plots into a labeled grid
plot_grid(iris_scatter_plot, iris_box_plot, iris_density_plot,
          labels = c("A", "B", "C"),
          ncol = 3, rel_widths = c(2, 1.5, 2))
```

Feel free to modify the code, explore other datasets, or add additional visualizations to further analyze the data!

> **Note:** To generate publication-ready figures, you can try [ggpubr](https://rpkgs.datanovia.com/ggpubr/index.html).

## R session

It is good practice to print the R session information to record the versions of the packages used.

```{r}
devtools::session_info()
```
2,120 changes: 2,120 additions & 0 deletions 05-docs/extras/R_notebook_bios259.nb.html

Large diffs are not rendered by default.

26 changes: 26 additions & 0 deletions 05-docs/extras/quorto.qmd
@@ -0,0 +1,26 @@
---
title: "The Art of Reproducibility"
author: "Aziz Khan"
format: revealjs
---

## Getting up

- Turn off alarm
- Get out of bed
-

## Going to sleep

- Get in bed
- Count sheep

## My Quarto Demo Document

## Introduction

Welcome to my Quarto demo document! In this document, we will learn the basics of Quarto and how to create beautiful and interactive documents.

## Getting Started

To get started with Quarto, you will need to install the Quarto CLI. You can do this by running the following command:
4 changes: 4 additions & 0 deletions 05-docs/gapminder-demo/.dockerignore
@@ -0,0 +1,4 @@
*.log
.git/
.pixi
pixi.lock
19 changes: 19 additions & 0 deletions 05-docs/gapminder-demo/Dockerfile
@@ -0,0 +1,19 @@
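# Use the official Pixi image, which ships with Pixi preinstalled.
# (A preceding python:3.9-slim stage would be discarded by this FROM.)
FROM ghcr.io/prefix-dev/pixi:latest

# Set working directory
WORKDIR /project

# Copy project files
COPY . .

# Install dependencies (WORKDIR already makes /project the current directory)
RUN pixi install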

# Default command
CMD ["pixi", "run", "preprocess"]
