Added exercise for day 05

asntech · Dec 6, 2024 · 3395f97
1 parent 448ca4d
commit 3395f97
Showing 16 changed files with 5,551 additions and 0 deletions.
diff --git a/05-docs/extras/Jupyter-notebook-bios259.ipynb b/05-docs/extras/Jupyter-notebook-bios259.ipynb
diff --git a/05-docs/extras/Quarto_excercise_bios259.qmd b/05-docs/extras/Quarto_excercise_bios259.qmd
@@ -0,0 +1,126 @@
+---
+title: "Quarto exercise -- BIOS259"
+format: html
+editor: visual
+author:
+- name: Aziz Khan
+  affiliation: Stanford University, CA, USA
+- name: Your Name
+  affiliation: Your University Name
+date: "`r format(Sys.time(), '%d %B %Y')`"
+abstract: "This is a hands-on exercise for BIOS 259: The Art of Reproducible Science
+  – a Stanford Biosciences mini-course on computational reproducibility\n"
+tags:
+- reproducibility
+- notebook
+- iris
+---
+
+## Introduction
+
+In this **Quarto document**, we'll explore the `iris` dataset built into base R and create visualizations to analyze the data. The goal is to demonstrate how Quarto combines code and narrative text to produce *reproducible research*.
+
+## Load Data
+
+We'll start by loading the `iris` dataset, which contains measurements of iris flowers.
+
+```{r}
+# Load the iris dataset
+data(iris)
+head(iris)
+```
+
+The iris dataset contains measurements of sepal length, sepal width, petal length, and petal width for **`r length(unique(iris$Species))` species** of iris flowers: setosa, versicolor, and virginica.
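+
+As an optional check that is not part of the original exercise, the base R sketch below tabulates how many flowers were measured for each species, which is where the inline species count above comes from.
+
+```{r}
+# Count the observations recorded for each species
+species_counts <- table(iris$Species)
+species_counts
+```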
+
+## Summary Statistics
+
+Let's explore the structure of the iris dataset and summary statistics for each variable.
+
+```{r}
+# Explore dataset structure
+str(iris)
+
+# Summary statistics
+summary(iris)
+
+```
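+
+`summary()` pools all three species together. A possible follow-up, sketched here with base R's `aggregate()` as an illustration rather than a required step, computes the mean of every measurement separately for each species.
+
+```{r}
+# Mean of each numeric column, grouped by species
+species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
+species_means
+```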
+
+## Data Visualization
+
+### Scatter Plot
+
+We'll create a scatter plot to visualize the relationship between sepal length and sepal width for each species of iris flowers.
+
+```{r, fig.width=8, fig.height=5}
+# Scatter plot of sepal length vs. sepal width
+library(ggplot2)
+iris_scatter_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
+  geom_point() +
+  labs(title = "Scatter Plot",
+       x = "Sepal Length", y = "Sepal Width")
+
+# print the plot
+iris_scatter_plot
+```
+
+### Make the plot publication ready
+
+```{r iris-figure1a, fig.width=8, fig.height=5}
+# Load the cowplot package (library() stops with an error if it is missing)
+library(cowplot)
+# Use the theme from cowplot
+iris_scatter_plot <- iris_scatter_plot + theme_cowplot(12)
+iris_scatter_plot
+```
+
+### Boxplot
+
+Next, we'll create a boxplot to compare the distribution of petal lengths for each species of iris flowers.
+
+```{r iris-figure1b, fig.width=4, fig.height=5}
+# Boxplot of petal length by species
+iris_box_plot <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
+  geom_boxplot() +
+  labs(title = "Boxplot",
+       x = "Species", y = "Petal Length") + theme_cowplot(12)
+
+iris_box_plot
+```
+
+### Density Plot
+
+Finally, let's add a density plot to visualize the distribution of sepal lengths for each species of iris flowers.
+
+```{r iris-figure1c}
+# Density plot of sepal length by species
+iris_density_plot <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
+  geom_density(alpha = 0.5) +
+  labs(title = "Density Plot",
+       x = "Sepal Length", y = "Density") + theme_cowplot(12)
+iris_density_plot
+```
+
+## Conclusion
+
+In this Quarto document, we explored the iris dataset available in base R and created visualizations to analyze the data. By combining code and narrative text in Quarto, we produced a reproducible analysis that others can easily share and rerun.
+
+The `cowplot` package provides the function `plot_grid()` to arrange plots into a grid and label them.
+
+```{r iris-figure1, fig.width=14, fig.height=4}
+#Arranging plots into a grid
+plot_grid(iris_scatter_plot, iris_box_plot, iris_density_plot, labels = c('A', 'B','C'),
+          ncol=3, rel_widths = c(2,1.5,2))
+
+```
+
+Feel free to modify the code, explore other datasets, or add additional visualizations to further analyze the data!
+
+> **Note:** To generate publication-ready figures, you can also try the [ggpubr](https://rpkgs.datanovia.com/ggpubr/index.html) package.
+
+## R session
+
+It is good practice to print the R session information to record the versions of the packages used.
+
+```{r}
+devtools::session_info()
+```
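+
+If `devtools` is not installed, base R's `sessionInfo()` reports the R version, platform, and loaded packages without any extra dependency. A minimal alternative, offered as a sketch:
+
+```{r}
+# Base R alternative to devtools::session_info()
+session_details <- sessionInfo()
+session_details
+```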
diff --git a/05-docs/extras/README.md b/05-docs/extras/README.md
@@ -0,0 +1,33 @@
+# Literate programming exercises 
+> Literate programming using R Markdown/Notebook, Jupyter Notebook, and Quarto
+
+This repository contains three exercise files for practicing data analysis and visualization:
+
+1. `R_notebook_bios259.Rmd`: R Notebook Markdown file.
+2. `Quarto_excercise_bios259.qmd`: Quarto document.
+3. `Jupyter-notebook-bios259.ipynb`: Jupyter Notebook.
+
+Follow the instructions below to run each exercise:
+
+## R Notebook Exercise
+
+1. Open RStudio.
+2. Open the `R_notebook_bios259.Rmd` file.
+3. Install any required R packages mentioned in the document using `install.packages(c('cowplot', 'ggplot2', 'devtools'))`.
+4. Preview the R Notebook document to produce the HTML report.
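+
+To avoid reinstalling packages that are already present, step 3 could be run as the sketch below. The package list is an assumption based on what the notebooks in this folder load (`ggplot2`, `cowplot`) or call (`devtools::session_info()`).
+
+```r
+# Packages used by the notebooks in this folder (assumed list)
+pkgs <- c("cowplot", "ggplot2", "devtools")
+
+# Keep only those that cannot be loaded yet
+is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
+missing_pkgs <- pkgs[!is_installed]
+
+# Install whatever is missing (does nothing when everything is present)
+if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
+```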
+
+## Quarto Exercise
+
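+1. Install the Quarto CLI if you haven't already (see <https://quarto.org/docs/get-started/>). The optional R interface to it can be installed with `install.packages("quarto")`.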
+2. Open the `Quarto_excercise_bios259.qmd` file in RStudio or a text editor.
+3. Render the Quarto document to produce the HTML output.
+
+## Jupyter Notebook Exercise
+
+1. Make sure Jupyter Notebook and the `seaborn` Python package are installed. If they are not, use the `environment.yaml` provided in the repo.
+2. Navigate to the directory containing `Jupyter-notebook-bios259.ipynb` in your terminal.
+3. Run `jupyter notebook` to start the Jupyter Notebook server.
+4. Open `Jupyter-notebook-bios259.ipynb` in the Jupyter interface and execute the code cells.
+
+Feel free to explore and modify the exercises to practice your data analysis skills!
+
diff --git a/05-docs/extras/R_notebook_bios259.Rmd b/05-docs/extras/R_notebook_bios259.Rmd
@@ -0,0 +1,126 @@
+---
+title: "R Notebook exercise -- BIOS259"
+author:
+- name: Aziz Khan
+  affiliation: Stanford University, CA, USA
+- name: Your Name
+  affiliation: Your University Name
+date: "`r format(Sys.time(), '%d %B %Y')`"
+output:
+  html_notebook: default
+abstract: "This is a hands-on exercise for BIOS 259: The Art of Reproducible Science
+  – a Stanford Biosciences mini-course on computational reproducibility\n"
+tags:
+- reproducibility
+- notebook
+- iris
+---
+
+## Introduction
+
+In this **R Notebook**, we'll explore the `iris` dataset built into base R and create visualizations to analyze the data. The goal is to demonstrate how R Markdown combines code and narrative text to produce *reproducible research*.
+
+## Load Data
+
+We'll start by loading the `iris` dataset, which contains measurements of iris flowers.
+
+```{r}
+# Load the iris dataset
+data(iris)
+head(iris)
+```
+
+The iris dataset contains measurements of sepal length, sepal width, petal length, and petal width for **`r length(unique(iris$Species))` species** of iris flowers: setosa, versicolor, and virginica.
+
+## Summary Statistics
+
+Let's explore the structure of the iris dataset and summary statistics for each variable.
+
+```{r}
+# Explore dataset structure
+str(iris)
+
+# Summary statistics
+summary(iris)
+
+```
+
+## Data Visualization
+
+### Scatter Plot
+
+We'll create a scatter plot to visualize the relationship between sepal length and sepal width for each species of iris flowers.
+
+```{r, fig.width=8, fig.height=5}
+# Scatter plot of sepal length vs. sepal width
+library(ggplot2)
+iris_scatter_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
+  geom_point() +
+  labs(title = "Scatter Plot",
+       x = "Sepal Length", y = "Sepal Width")
+
+# print the plot
+iris_scatter_plot
+```
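+
+The scatter plot shows the relationship visually. As an optional numeric complement that is not part of the original exercise, the sketch below computes the Pearson correlation between sepal length and sepal width, first across all flowers and then within each species.
+
+```{r}
+# Pearson correlation across all flowers
+cor_overall <- cor(iris$Sepal.Length, iris$Sepal.Width)
+cor_overall
+
+# The same correlation computed within each species
+cor_by_species <- sapply(split(iris, iris$Species),
+                         function(d) cor(d$Sepal.Length, d$Sepal.Width))
+cor_by_species
+```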
+
+### Make the plot publication ready
+
+```{r, fig.width=8, fig.height=5}
+# Load the cowplot package (library() stops with an error if it is missing)
+library(cowplot)
+# Use the theme from cowplot
+iris_scatter_plot <- iris_scatter_plot + theme_cowplot(12)
+iris_scatter_plot
+```
+
+### Boxplot
+
+Next, we'll create a boxplot to compare the distribution of petal lengths for each species of iris flowers.
+
+```{r, fig.width=4, fig.height=5}
+# Boxplot of petal length by species
+iris_box_plot <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
+  geom_boxplot() +
+  labs(title = "Boxplot",
+       x = "Species", y = "Petal Length") + theme_cowplot(12)
+
+iris_box_plot
+```
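+
+The center line of each box is the median. To read those values off directly, one option (a base R sketch, added for illustration) is `tapply()`:
+
+```{r}
+# Median petal length for each species
+median_petal <- tapply(iris$Petal.Length, iris$Species, median)
+median_petal
+```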
+
+### Density Plot
+
+Finally, let's add a density plot to visualize the distribution of sepal lengths for each species of iris flowers.
+
+```{r}
+# Density plot of sepal length by species
+iris_density_plot <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
+  geom_density(alpha = 0.5) +
+  labs(title = "Density Plot",
+       x = "Sepal Length", y = "Density") + theme_cowplot(12)
+iris_density_plot
+```
+
+## Conclusion
+
+In this R Markdown document, we explored the iris dataset available in base R and created visualizations to analyze the data. By combining code and narrative text in R Markdown, we produced a reproducible analysis that others can easily share and rerun.
+
+The `cowplot` package provides the function `plot_grid()` to arrange plots into a grid and label them.
+
+```{r, fig.width=14, fig.height=4}
+#Arranging plots into a grid
+plot_grid(iris_scatter_plot, iris_box_plot, iris_density_plot, labels = c('A', 'B','C'),
+          ncol=3, rel_widths = c(2,1.5,2))
+
+```
+
+Feel free to modify the code, explore other datasets, or add additional visualizations to further analyze the data!
+
+> **Note:** To generate publication-ready figures, you can also try the [ggpubr](https://rpkgs.datanovia.com/ggpubr/index.html) package.
+
+## R session
+
+It is good practice to print the R session information to record the versions of the packages used.
+
+```{r}
+devtools::session_info()
+```
diff --git a/05-docs/extras/R_notebook_bios259.nb.html b/05-docs/extras/R_notebook_bios259.nb.html
diff --git a/05-docs/extras/quorto.qmd b/05-docs/extras/quorto.qmd
@@ -0,0 +1,26 @@
+---
+title: "The Art of Reproducibility"
+author: "Aziz Khan"
+format: revealjs
+---
+
+## Getting up
+
+-   Turn off alarm
+-   Get out of bed
+- 
+
+## Going to sleep
+
+-   Get in bed
+-   Count sheep
+
+## My Quarto Demo Document
+
+## Introduction
+
+Welcome to my Quarto demo document! In this document, we will learn the basics of Quarto and how to create beautiful and interactive documents.
+
+## Getting Started
+
+To get started with Quarto, you will need to install the Quarto CLI. You can do this by running the following command:
diff --git a/05-docs/gapminder-demo/.dockerignore b/05-docs/gapminder-demo/.dockerignore
@@ -0,0 +1,4 @@
+*.log
+.git/
+.pixi
+pixi.lock
diff --git a/05-docs/gapminder-demo/Dockerfile b/05-docs/gapminder-demo/Dockerfile
@@ -0,0 +1,19 @@
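+# Base image: the official Pixi image, which ships with pixi preinstalled.
+# A separate Python base stage is unnecessary because only the final
+# FROM stage ends up in the built image.
+
+FROM ghcr.io/prefix-dev/pixi:latest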
+
+# Set working directory
+WORKDIR /project
+
+# Copy project files
+COPY . .
+
+# Install dependencies into the project environment
+# (WORKDIR above already makes /project the current directory)
+RUN pixi install
+
+# Default command
+CMD ["pixi", "run", "preprocess"]
+