AtlasofPhenotypes.Rmd

---
title: |
  | Genomic conflicts of natural selection for ecologically-relevant traits
  | Supplemental Appendix
author: |
  | **Megan Ruffley, Laura Leventhal, Shannon Hateley, 
  | Sue Rhee, Moises Exposito-Alonso**
  | 1 Department of Plant Biology, Carnegie Institution for Science, California, USA
  | 2 Department of Biology, Stanford University, California, USA
  | correspondence: meganrruffley@gmail.com
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  html_document:
    toc: yes
    toc_depth: '2'
    df_print: paged
mainfont: Times New Roman
indent: yes
editor_options:
  chunk_output_type: console
  preview: viewer
---
```{r echo=FALSE, eval=T, message=FALSE, warning=FALSE, error=FALSE, paged.print=FALSE}
  # pdf_document:
  #   toc: yes
  #   toc_depth: 2

# mypath <- "~/safedata/natvar"    ##the .rmd is in natvar/analyses/ 
# knitr::opts_knit$set(root.dir = mypath, warning=FALSE, errors=FALSE)
# setwd("./safedata/natvar/")
knitr::opts_knit$set(root.dir = "/natvar/")  # setwd() inside a chunk does not persist to later chunks
#devtools::install(".")
#devtools::load_all("~/safedata/natvar") # natvar
#install.packages("ggplot2", dependencies = T, lib="/home/mruffley/R/x86_64-pc-linux-gnu-library/3.6")
#devtools::install_version("ggplot2", version = "3.3.5", )
library(knitr)
library(tinytex)
#tinytex::install_tinytex()
#remotes::install_github('rstudio/rmarkdown')
library(dplyr)
library(ggplot2)
#library(moiR)
library(data.table)
#library(corrplot)
library(cowplot)
library(caret)
#library(ggfortify)
theme_set(theme_cowplot())
#install.packages("caret", dependencies=T)
#remove.packages("ggplot2")
#source("~/safedata/natvar/analysesphenotypeselection/R/phenoselection_multi_FUNCTIONS.R")
```

--------------------------------------------------------------------------------

################################################################################
# I. Phenotypic landscape of _Arabidopsis thaliana_
################################################################################

################################################################################
## I.1 Description and Curation of Phenotypes 
################################################################################

### Data retrieval
The phenotypes come from a combination of laboratory and field experiments that used subsets of the 1001 Genomes _A. thaliana_ ecotypes. Nearly 600 phenotypes come from the AraPheno database (Seren et al. 2017); the remainder are not included in that public database. We also rely heavily on fitness data from 8 outdoor field experiments in which fitness was measured under high- and low-rainfall environments (Exposito-Alonso et al. 2019). In total, we gathered data for 1862 phenotypes from over 108 published sources (__Table SI.1__). When an ecotype was replicated within an experiment, giving multiple measurements of the same phenotype, we took the mean of the replicates.
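
The replicate averaging can be sketched as follows. This is a minimal illustration, assuming a long-format table with hypothetical columns `ecotype`, `phenotype`, and `value`; these names and all values are invented and do not come from our data files.

```r
# Toy long-format data (invented): three replicates of acc_A, two of acc_B, one of acc_C
raw <- data.frame(
  ecotype   = c("acc_A", "acc_A", "acc_A", "acc_B", "acc_B", "acc_C"),
  phenotype = "toy_trait",
  value     = c(50, 54, NA, 61, 63, 70)
)

# One mean per ecotype and phenotype; rows with a missing value are dropped
rep_means <- aggregate(value ~ ecotype + phenotype, data = raw, FUN = mean)
rep_means
```

With the formula interface, `aggregate()` drops rows with missing values by default (`na.action = na.omit`), so a replicate without a measurement does not affect the mean.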

```{r, echo=FALSE, eval=F}
#d<-read.table("./data/pheno_fromgoogle.tsv",header = T)
d <- read.table(file="./data/atlas_dir.tsv", header = T, sep="\t")
d <- d[,-4]


#write.table(d, "./tables/Allphenotype_Info_tableSI1.tsv", col.names = T, row.names = F, quote = F, sep="\t")
knitr::kable(d, col.names = colnames(d), caption = "Table SI.1 Phenotypes gathered for this study with the original publication, general name of the phenotype, number of 1001 genomes accessions used in the original study, general function category, and stress strategy classification.")
```

### Missing Data
We assembled phenotype data for the 1001 Genomes accessions of _Arabidopsis thaliana_, 1,135 accessions in total. Phenotype coverage across accessions was incomplete: most phenotypes had data for fewer than 25% of the 1,135 accessions, a few had data for several hundred accessions, and no phenotype had complete coverage (__Fig. SI.1__).
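
As a small sketch of how coverage can be summarized, assume a matrix with accessions in rows, phenotypes in columns, and `NA` marking missing values. The toy matrix and the names `pheno_coverage` and `acc_coverage` are invented for illustration.

```r
# Toy matrix (invented): 4 accessions (rows) x 3 phenotypes (columns)
toy <- matrix(c(1.2,  NA,  NA,
                0.8, 3.1,  NA,
                 NA, 2.9,  NA,
                1.0, 3.3, 7.5),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("acc", 1:4), c("p1", "p2", "p3")))

# Fraction of accessions with data, per phenotype
pheno_coverage <- colMeans(!is.na(toy))

# Fraction of phenotypes with data, per accession
acc_coverage <- rowMeans(!is.na(toy))
```

Here `p3` is measured in only one of four accessions (coverage 0.25), while `acc4` has data for every phenotype (coverage 1).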

```{r, echo=F, eval=T, fig.cap="Figure SI.1 Phenotypic coverage across 1001 Genomes A. thaliana Accessions", fig.height=2.5, fig.width=6}
RERUN=F 
if(RERUN){
  pheno<- read.table(file="./data/atlas1001_phenotypes_matrix_MR.csv", sep=",", header = T)
  dim(pheno)
  id<-pheno[,1]
  pheno<- pheno[,-1]
  ## NOTE: despite the "miss" names, these are coverage fractions (share of non-missing values)
  phenomiss<-apply(pheno,2,function(x) sum(!is.na(x)) / length(x) )
  accmiss<-apply(pheno,1,function(x)sum(!is.na(x)) / length(x))
  plot_grid(
    qplot(phenomiss, xlab="% of accessions",ylab="# phenotypes",bins=25)+
      scale_x_continuous(breaks = seq(0,1,by=0.25),labels =seq(0,1,by=0.25)*100 ),
    qplot(accmiss, xlab="% of phenotypes",ylab="# accessions",bins=25)+
      scale_x_continuous(breaks = seq(0,1,by=0.25),labels =seq(0,1,by=0.25)*100 )
   ) -> phenotype_missingness
  write.table(file = './tables/atlas1001_phenotype_missingness-peraccession.tsv',
          x = data.frame(perc_miss=accmiss,id=id), sep = "\t", quote = F, row.names = F)
  write.table(file = './tables/atlas1001_phenotype_missingness-perphenotype.tsv',
          x = data.frame(perc_miss=phenomiss,phenotype=names(phenomiss)), sep = "\t", quote = F, row.names = F)

  save(file = "./figs/tmpobjects/phenotype_missingness.rda",phenotype_missingness)
  
  
  pdf(file="./figs/missingdatasummary.pdf", height=3, width = 3)
  qplot(phenomiss, xlab="% of accessions",ylab="# phenotypes",bins=25)+
      scale_x_continuous(breaks = seq(0,1,by=0.25),labels =seq(0,1,by=0.25)*100 )
  dev.off()
  
  
  phenotype_missingness
}else{
  load("./figs/tmpobjects/phenotype_missingness.rda")
  phenotype_missingness
}
```

### General function and drought adaptation classification
We classified all phenotypes into a general functional category (__Fig. SI.2A__) and were able to classify 1282 phenotypes as related to one of three drought adaptation strategies: _escape_, _avoidance_, and _tolerance_ (__Fig. SI.2B__; Kooyers 2015). We classified escape phenotypes as those that contribute to late germination, rapid growth, and fast reproduction. This category encompasses phenotypes related to dormancy induction and germination rate, growth rate, vernalization, and all traits related to reproduction, most importantly flowering time and days to fruit. Avoidance phenotypes are those that help plants endure drought by conserving and locating water. These include root growth and root angle phenotypes, leaf area, biomass accumulation, stomatal density and size, and delta_13C, a metric of water use efficiency. Tolerance phenotypes are primarily metabolite related, as metabolites may play an important role in coping with desiccation and in osmotic regulation. _A. thaliana_ has not been observed to employ a tolerance strategy for drought adaptation, so many of these metabolites may be unrelated to drought adaptation, or even to tolerance. The few metabolites known to be associated with drought and temperature stress, such as abscisic acid (ABA) accumulation and rhamnose, we assign to the avoidance strategy. Apart from those two exceptions, we do not consider the tolerance-classified phenotypes further.

```{r, echo=FALSE, eval=T, fig.cap="Figure SI.2 Raw counts of phenotypes classified as general phenotype categories and drought response strategies.", fig.height=3, fig.width=5}
## chunk working, tested 9/20
RERUN=F
if(RERUN){
  library(future)
  d<-read.table("./data/pheno_fromgoogle.tsv",header = T)
  d1<-table(d$phenotypecategory) %>% as.data.frame()
  d2<-table(d$stressstrategy) %>% as.data.frame()
  
  head(d)
   d[grep("DOG", d$phenotype),]
  d[grep("root", d$phenotype),]
  grep("root", d$phenotype)
  d[d$phenotype=="Growth_rate",]
  # d[d$phenotype=="Growth_rate",]
  # d_es <- d %>% filter(stressstrategy=="Avoidance")
  # summary(d_es$numaccessions)
  
  p1<-ggplot(d1) + geom_col(aes(x = Var1, y=Freq, fill=Var1 ), color='white', alpha=0.5) + xlab("") + ylab("# phenotypes")+
    theme(axis.text.x = element_text(angle = 45,hjust = 1)) +
    scale_fill_manual("",values = c("Defense"=alpha("red3"),"Development"=alpha('darkorange1'), 
                                    "Ionomics"=alpha("goldenrod1"), 
                                    "Metabolite"=alpha("yellow"), "Microbiome"=alpha("limegreen"), 
                                    "Reproduction"=alpha("dodgerblue1" ))) +
    theme(legend.position = c(.7, .7), legend.key.size =unit(.5, 'cm'), 
    legend.text = element_text(size=5), legend.title = element_blank()) 
  p2<-ggplot(d2) + geom_col(aes(x = Var1, y=Freq ,fill=Var1),color='white', alpha=0.5) + xlab("") +
    scale_fill_manual("",values = c("Avoidance"=alpha("red"),"Escape"=alpha('green4'), "Tolerance"=alpha("navy")))+    
    ylab("") + 
    theme(axis.text.x = element_text(angle = 45,hjust = 1)) +
    #theme(plot.margin=margin(l=-0.5,unit="cm")) +
    theme(legend.key.size =unit(.5, 'cm'),  legend.position = c(.01, .8),
    legend.text = element_text(size=5), legend.title = element_blank()) 

  PhenoCatStrat_Histograms<-plot_grid(p1,p2,nrow=1 ,rel_widths = c(1.6,1),align = "hv")
  save(file = "./figs/tmpobjects/PhenoCatStrat_Histograms.rda",PhenoCatStrat_Histograms)
  
  pdf(file="./figs/PhenoCatStrat_Histograms.pdf", width = 5, height=3.5)
  PhenoCatStrat_Histograms
  dev.off()
}else{
  load("./figs/tmpobjects/PhenoCatStrat_Histograms.rda")
  PhenoCatStrat_Histograms
  #![Histogram of phenotypes in functional categories (left) and drought response strategies (right)](../figs/PhenoHistogram.jpg){width=40% height=40%}
}
```

### Phenotypic Imputation 
We used the R package *missForest* to impute the full dataset (Stekhoven and Bühlmann 2012). Imputation was done iteratively for each phenotype across all 1,135 individuals, with all other phenotypes used as predictors. The out-of-bag approach resulted in an average normalized root mean squared error (NRMSE) of 0.195 (__Fig. SI.3__). We report the NRMSE for all phenotypes in Supplemental Table [tables/]. Although the imputed matrix is useful for certain purposes, such as identifying ecotypes with potentially extreme trait values for further experimentation, we do not recommend taking the imputed values at face value.
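
The normalization can be illustrated with a short sketch. It assumes NRMSE is the square root of a phenotype's out-of-bag MSE divided by the observed range of that phenotype, which is the form used in our analysis code. The helper name `nrmse_range` and all numbers are invented for illustration.

```r
# Range-normalized RMSE: sqrt(MSE) / (max - min) of the observed values
nrmse_range <- function(mse, observed) {
  rng <- max(observed, na.rm = TRUE) - min(observed, na.rm = TRUE)
  if (rng == 0) return(NA_real_)  # constant trait: the ratio is undefined
  sqrt(mse) / rng
}

observed <- c(10, 12, NA, 20, 18)  # invented trait values with one missing entry
toy_mse  <- 4                      # invented out-of-bag MSE for this trait

nrmse_range(toy_mse, observed)     # sqrt(4) / (20 - 10) = 0.2
```

Because the error is divided by the observed range, values are comparable across phenotypes measured on different scales; a value of 0.2 means the typical imputation error is one fifth of the observed range.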

```{r, echo=F, eval=T, fig.cap="Figure SI.3 Normalized root-MSE for all phenotypes.", fig.width=3, fig.height=2.5, warning=F}
RERUN=F 
if(RERUN){
  #getwd()
  #setwd("./safedata/natvar")
  library(missForest)
  library(stats)
  pheno<- read.table(file="./data/atlas1001_phenotypes_matrix.csv", sep=",", header = T)
  pheno[pheno== -9]<-NA   ## turn -9 into NA 
  #sum(pheno== -9, na.rm = T)  ##check
  dimp<- missForest(pheno[,-1], variablewise=T,maxiter = 5)
  #has lower NRMSE at 0.2098436 on June 8 2021
  atlas1001_phenotype_matrix_imputed_withID=data.frame(pheno$V1,dimp$ximp)
  names(atlas1001_phenotype_matrix_imputed_withID)[1]<-"id"
  error <- data.frame(pheno=colnames(pheno[,-1]), MSE=dimp$OOBerror) 
  NRMSE_list<-c()
  varExp_list <- c()
  ## seq_len() avoids the precedence trap in 1:ncol(pheno)-1, which starts at 0
  for (i in seq_len(ncol(pheno) - 1)){
    n <- sqrt(error[i,2]) / (max(pheno[,i+1], na.rm=T) - min(pheno[,i+1], na.rm=T))
    NRMSE_list <- c(NRMSE_list, n)
  }
  mean(NRMSE_list, na.rm = T) ##  0.1909089
  error$NRMSE <- NRMSE_list
  ## each value in the histogram is one phenotype
  nrmse_hist <- qplot(error[,3], xlab="NRMSE",ylab="# phenotypes",bins=25)+
    scale_x_continuous(breaks = seq(0,1,by=0.25),labels =seq(0,1,by=0.25) )
  
  ### Save
  # write.table(atlas1001_phenotype_matrix_imputed_onlypheno,
  #             file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv',
  #             row.names = F, quote = F, col.names = T)
  write.table(atlas1001_phenotype_matrix_imputed_withID, file="data/atlas1001_phenotype_matrix_imputed_withID.csv", sep=",", row.names = F, quote = F, col.names = T)
  write.table(error, file="./tables/atlas1001_imputationaccuracy-perphenotype.tsv", sep="\t", quote = F, row.names = F)
  save(file="./figs/tmpobjects/nrmse_hist.rda", nrmse_hist)
  
  tmp <- read.table(file="./data/atlas1001_phenotype_matrix_imputed_withID.csv", sep=",", header = T)
  head(tmp)
  sum(tmp$rFitness_mli == -9)
  #use_data(atlas1001_phenotype_matrix_imputed_onlypheno, overwrite = T)

}else{
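  error<- read.table(file="./tables/atlas1001_imputationaccuracy-perphenotype.tsv", header = T)
  head(error)
  ## the raw phenotype matrix is used below but is otherwise only read when RERUN=T
  pheno<- read.table(file="./data/atlas1001_phenotypes_matrix.csv", sep=",", header = T)
  pheno[pheno== -9]<-NA   ## turn -9 into NA 
  
  #er <- error[error$pheno %in% colnames(pheno), ]
  #colnames(pheno)[(!colnames(pheno) %in% er$pheno)]
  #sum(duplicated(colnames(pheno)))
  
  error$nMSE <- NA_real_   ## numeric NA keeps the column numeric
  for (i in 1:nrow(error)){
    error$nMSE[i] <- error$MSE[i]/var(pheno[,i+1], na.rm = T)
  }
  error$NRMSE <- as.numeric(error$NRMSE)
  range(error$NRMSE, na.rm = T)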
  
  summary(error$NRMSE)
  error[error$NRMSE==1, ]
  p<-error %>%  filter(NRMSE<0.155)
  nrow(p)
  u<-error %>%  filter(NRMSE>0.234)
  nrow(u)
  df <- error %>% filter(pheno!="Agro_PPV_infection")
   mean_nrmse <- mean(df$NRMSE, na.rm = T)
   summary(df$NRMSE)
   
   hpd_interval <- coda::HPDinterval(coda::as.mcmc(df$NRMSE), prob = 0.95)  # namespaced: coda is attached only further down

   
   l <-mean_nrmse - IQR(df$NRMSE, na.rm = T)/2
   u <- mean_nrmse + IQR(df$NRMSE, na.rm = T)/2
   
    sd_nrmse <- sd(df$NRMSE, na.rm = T)

    # Calculate standard error
    se_nrmse <- sd_nrmse / sqrt(nrow(df))
    
    margin_error <- qt(0.975, df = nrow(df) - 1) * se_nrmse

# Calculate confidence interval
ci_lower <- mean_nrmse - margin_error
ci_upper <- mean_nrmse + margin_error

    
  library(coda)
  library(ggplot2)
error$NRMSE
  p1 <- ggplot(error, aes(x = NRMSE)) + 
  geom_histogram(binwidth = 0.1, fill = "blue", color = "black") +  # Adjust binwidth and colors as needed
  labs(x = "NRMSE", y = "Frequency") +  # Customize titles +
    geom_vline(xintercept = c(0.071, 0.36), linetype = "dashed", color = "red", size = 1) +  # Add vertical lines at 'l' and 'u'
  theme_minimal() 
  p1
  # pdf(file="./figs/")  # disabled: the path names no file and the device was never closed
  
  
  mean(error$nMSE, na.rm=T)
  sum(error$nMSE>1.0, na.rm=T)
  length(na.omit(error$nMSE))
  nrow(error)

  nrmse_hist <- qplot(error[,3], xlab="NRMSE",ylab="# phenotypes")+
    scale_x_continuous(breaks = seq(0,1,by=0.25),labels =seq(0,1,by=0.25) ) +
    geom_vline(xintercept = c(0.071, 0.36), linetype = "dashed", color = "black", size = .8)
  
  pdf(file="./figs/NRMSE.pdf", width = 3, height = 3)
  nrmse_hist
  dev.off()
}
```

################################################################################
## I.2 Principal Components Analysis
################################################################################

We use principal components analysis (PCA) to decorrelate and decompose the escape and avoidance phenotypic variation. We restrict this analysis to a subset of 515 of the 1,135 accessions of the 1001 Genomes Project, namely those included in the field experiments. We first isolated phenotypes related to dormancy, vernalization, germination, flowering time, leaf traits, roots, stomata, growth rates, and stress response, which narrowed the set from 509 to 205 phenotypes. We then removed highly correlated phenotypes (absolute pairwise correlation > 0.7), leaving a total of 64 phenotypes. The number of phenotypes used in each category is shown below (__Fig. SI.4__). 
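
The filter-then-PCA workflow can be sketched on simulated data. The greedy rule below is a simple stand-in for `caret::findCorrelation`, which we use in the analysis, and is not identical to it. The function name `drop_correlated` and the toy variables are invented for illustration.

```r
set.seed(1)
n <- 100
a <- rnorm(n)
toy <- cbind(a      = a,
             a_copy = a + rnorm(n, sd = 0.1),  # nearly duplicates `a`
             b      = rnorm(n),
             c      = rnorm(n))
toy <- scale(toy)  # center and scale each column, as done before the PCA

# Greedy filter: keep a column only if its absolute correlation with
# every column already kept is at or below the cutoff
drop_correlated <- function(x, cutoff = 0.7) {
  r <- abs(cor(x))
  keep <- integer(0)
  for (j in seq_len(ncol(x))) {
    if (all(r[j, keep] <= cutoff)) keep <- c(keep, j)
  }
  x[, keep, drop = FALSE]
}

filtered <- drop_correlated(toy)
pca <- prcomp(filtered)
colnames(filtered)
```

In this toy example the near-duplicate column `a_copy` is removed and the PCA runs on the three remaining variables. Which member of a correlated pair is dropped depends on the rule used, so the greedy version may retain different columns than `findCorrelation` on real data.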

```{r, echo=F, eval=T, fig.cap="Figure SI.4 Counts of phenotypes and phenotype categories used in PCA.", fig.height=3, fig.width=3}
## this chunk narrows down which phenotypes used in the PCA
RERUN=F 
if(RERUN){
  atlasstrategies<-read.table("./data/pheno_fromgoogle.tsv",header = T)
  pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=",", header = T)  ## the file is written comma-separated
  #raw_pheno <- read.table(file = './data/atlas1001_phenotypes_matrix.csv')
  
  # load field experiment information to get the 515 accessions
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  whichfield<-which(pheno$id %in% idsfield)
  idex515<-whichfield
  
  load("./data/phenotypenames.rda")
  FT_phenos<-c("Flowering_time","FT16","FT10", "FT", "FT.1", "FT.2" )
  moiFT <- colnames(pheno)[grep("FT_", colnames(pheno))][1:8]
  dtf <- colnames(pheno)[grep("DTF", colnames(pheno))]
  ft_atwell_phenos <- c("X32_FT_Duration_GH", 
                        "X52_LC_Duration_GH", "X29_FLC",  "X69_FRI", "X58_LFS_GH","X86_FT_Field",  
                        "X49_FT_GH", "X31_FT10", "X30_FT16", "X102_FT22",  "X89_LD",  "X34_LDV",  
                        "X47_SD", "X104_SDV", "X37_0W", "X11_2W", "X100_4W",  "X48_8W",  "X99_8W_GH_FT",  "X53_0W_GH_FT")
  allflowering <- c(FT_phenos, ft_atwell_phenos, moiFT, dtf)
  allstomata<-c("StomatalIndex_mild_drought","StomatalIndex_well_watered","SIL_vs_SIC","Stomatal_density_in_cotyledon",
                "Stomatal_density_in_first_leaf","Stomatal_index_in_cotyledon","stomata_density","stomatasize", "Stomatal_index_in_first_leaf")
  allstress<-c("Delta_13C","drought_index", "ABA_96h_low_water_potential", "rhamnose_1_exp2", "proline_exp1", "leucine_exp1")
  allgrowth<-c("RGR","Growth_rate","rosette_DM","RosetteArea_mild_drought","RosetteArea_well_watered", "X90_Seedling_Growth", "RGRbv", 
               "RGRav")
  allgermination<-c("base_perc"      , "d11_10C_perc"   , "d11_4C_perc"    ,  "d15_10C_perc"  ,  "d15_4C_perc",
                     "d22_10C_perc"   ,  "d22_4C_perc"  ,   "d32_10C_perc", "d32_4C_perc"    ,  "d4_10C_perc" ,   "d4_4C_perc",
                     "d8_10C_perc"    ,  "d8_4C_perc" ,"DSDS50","DSDS10","DSDS90", "DTgerm","X39_After_Vern_Growth","X72_Vern_Growth",
                    "germination_3days", "germination_7days", "X45_Germ_22","X59_Germ_10", "X60_Germ_16", "X68_Germ_in_dark" , 
                    "X294_Germination_rate_21C",   "X106_Secondary_Dormancy", "X94_Seed_Dormancy",
                    "Germination_10C", "Germination_22C", "Germination_30C")
  leaf <- c("First_leaf_area","X92_Leaf_serr_22","X71_Trichome_avg_C","X6_Leaf_serr_10","X77_Trichome_avg_JA")
  
  allroot<-dplyr::filter(data.frame(phenotypenames), X1=="Busch_Slovak_PlantCell_2014_PID_24920330")$X2
  importanttraits<-c(allstomata,
                      allgrowth,
                      allgermination,
                      allflowering,leaf,
                      allroot,allstress)
  write.table(importanttraits, file="./data/importanttraits.csv",sep=",", col.names = F, quote = F, row.names = F )
  
  pheno[1:5,1:5]
  dim(pheno)
  idex515
  id <- pheno[idex515,1]
  target_pheno<- pheno[idex515,colnames(pheno) %in% importanttraits]
  #importanttraits[which(!importanttraits %in% colnames(target_pheno))]   ## should be 0
  target_pheno <- apply(target_pheno,2,fn)  ## fn() is not defined in this file; it must come from a helper that is loaded separately
  target_pheno <- apply(target_pheno,2,scale)
   target_pheno[1:5,1:5]
  Targ_cor<-cor(target_pheno)
  hist(target_pheno[,20])
  #corrplot(Targ_cor, method = "color", type = "lower", diag = F,tl.cex = .35, tl.srt = 45)
  
  ## Remove traits so that two traits are not more correlated than 0.7
  decorrelate<-findCorrelation(
                Targ_cor,
                cutoff = 0.7,
                verbose = FALSE,
                names = FALSE
              )
  
  decorrelate<- decorrelate[ !(decorrelate %in% which(colnames(target_pheno) %in% c("ABA","Delta_13C", "RGR", "FT16", "Growth_rate")))]
  target_pheno<-target_pheno[,-decorrelate]
  target_pheno <- cbind(id, target_pheno)
  head(target_pheno)
  saveRDS(target_pheno, file="./data/TargetPhenoMatrix.rda")
  
  
   ## manually categorize phenotypes
  phenoTypes <- data.frame(pheno=colnames(target_pheno), cat=rep(NA, 64))
  ## a single label is recycled over all matched indices
  phenoTypes$cat[grep("Root", phenoTypes$pheno)] <- "roots"
  phenoTypes$cat[grep("root", phenoTypes$pheno)] <- "roots"
  phenoTypes$cat[c(grep("FT", phenoTypes$pheno), 40,48)] <- "flowering time"
  phenoTypes$cat[grep("DTF", phenoTypes$pheno)] <- "flowering time"
  phenoTypes$cat[grep("Germ", phenoTypes$pheno)] <- "germination"
  phenoTypes$cat[c(grep("germ", phenoTypes$pheno),53)] <- "germination"
  phenoTypes$cat[grep("DSDS", phenoTypes$pheno)] <- "dormancy"
  phenoTypes$cat[grep("Dorm", phenoTypes$pheno)] <- "dormancy"
  phenoTypes$cat[grep("Stom", phenoTypes$pheno)] <- "stomata"
  grow_idx <- grep("Grow", phenoTypes$pheno)
  phenoTypes$cat[grow_idx[1:2]] <- "dormancy"
  phenoTypes$cat[grow_idx[3:4]] <- "growth"
  phenoTypes$cat[grep("RGR", phenoTypes$pheno)] <- "growth"
  phenoTypes$cat[c(grep("stom", phenoTypes$pheno),36)] <- "stomata"
  phenoTypes$cat[grep("Rose", phenoTypes$pheno)] <- "growth"
  phenoTypes$cat[grep("Delta", phenoTypes$pheno)] <- "stress"
  phenoTypes$cat[c(grep("leaf", phenoTypes$pheno), 45, 50, 52)] <- "leaf"
  phenoTypes$cat[c(30, 31, 32, 58, 59, 63, 64 )]<- "stress"

  cats <- phenoTypes %>% count(cat)

  TargetphenotypeCat<- ggplot(cats) + geom_col(aes(x = cat, y=n ), color='gray') + xlab("") + ylab("# phenotypes")+
     theme(axis.text.x = element_text(angle = 45,hjust = 1)) +
     # scale_fill_manual("",values = c("dormancy"=transparent("gray"),"floweing time"=transparent('gray'), 
     #                                 "germination"=transparent("goldenrod1"), 
     #                                 "growth"=transparent("yellow"), "roots"=transparent("limegreen"), 
     #                                 "stomata"=transparent("dodgerblue1" ), "stress"=transparent("purple"), 
     #                                 "leaf"=transparent("pink"))) +
     theme(legend.position = c(.75, .7), legend.key.size =unit(.5, 'cm'), 
     legend.text = element_text(size=8), legend.title = element_blank()) 
   TargetphenotypeCat
   save(file="./figs/tmpobjects/TargetphenotypeCategories.rda", TargetphenotypeCat)
   
   pdf("./figs/TargetphenotypeCat.pdf", width = 4, height = 3)
   TargetphenotypeCat
   dev.off()
  
}else{
  load("./figs/tmpobjects/TargetphenotypeCategories.rda")
  TargetphenotypeCat
  
  
}
```

```{r, echo=F, eval=T, warning=F, fig.cap="Figure 1A (in main text). PCA axes 1 (14.7%) and 2 (8.6%) of 64 target phenotypes for the subset of 515 accessions for which we have corresponding common garden fitness data.", fig.height=4, fig.width=6}
##This chunk makes Figure 1A.
RERUN=F 
if(RERUN){
  
  
  library(ggfortify)  # provides autoplot() for prcomp objects
  ## min-max scaling helper, defined here because it is used before the palette section
  normalize.<-function(x) (x-min(x,na.rm=T)) / (max(x,na.rm=T)-min(x,na.rm=T))

  pheno<- read.table(file="./data/atlas1001_phenotypes_matrix.csv", sep=",", header = T)
  fitness <- c(colnames(pheno)[grep("Fitness", colnames(pheno))][c(3,4,7,8)], colnames(pheno)[grep("Survival", colnames(pheno))][c(3,4,7,8)],  colnames(pheno)[grep("Seed", colnames(pheno))][c(3,4,7,8)])
  phenofit <- pheno[,colnames(pheno) %in% fitness]
  phenofit[phenofit == -9] <- NA
  which(phenofit==-9)
  
  phenofit<-data.frame(id=pheno$V1, phenofit)
  dim(phenofit)
  phenofit <- phenofit %>% mutate(normFit = normalize.(rFitness_mlp)) %>%
    mutate(normSurv = normalize.(rSurvival_fruit_mlp)) 
  
  saveRDS(phenofit, file="./data/phenofit.rda")
    
  ## load the target phenotypes before they are used
  target_pheno<-readRDS(file="./data/TargetPhenoMatrix.rda")
  target_pheno[1:5,1:5]
  ## align fitness rows to the row order of target_pheno so they match the PCA scores
  fit_dat <- phenofit[match(target_pheno[,"id"], phenofit$id), ]
  
  ## run pca
  pcaTarget<-prcomp(target_pheno[,-1])
  saveRDS(pcaTarget, file="./data/pcaTarget.rda") 
  load<-pcaTarget$sdev^2 /sum(pcaTarget$sdev^2)
  load
  cumsum(pcaTarget$sdev^2/sum(pcaTarget$sdev^2))
  # Allload<-as.matrix(pcaTarget$rotation[,1:length(pcaTarget$sdev)])
  # 
  # ## check loadings
  # sort(abs(Allload[,1]), decreasing = T)[1:20] ## FT, Growth_rate
  # sort(abs(Allload[,2]), decreasing = T)[1:20] ## Root growth
  # sort(abs(Allload[,3]), decreasing = T)[1:20]
  # 
  pcaallplot<-autoplot(pcaTarget,x = 1,
           loadings = T, loadings.label = T,
           loadings.colour = "darkgrey",loadings.label.colour="black",
           loadings.label.size = 4.5)
  plot_grid(pcaallplot)

  library(RColorBrewer)
    # df <- data.frame(pc1= pcaTarget$x[,1] ,
    #                pc2 = -pcaTarget$x[,2] ,
    #                pc3 = pcaTarget$x[,3])
  
  ## scores are weighted by the proportion of variance of their own axis
  df <- data.frame(pc1= pcaTarget$x[,1] * load[1],
                   pc2 = -pcaTarget$x[,2] * load[2],
                   pc3 = pcaTarget$x[,3] * load[3],
                   fit = fit_dat$normFit,
                   surv = fit_dat$normSurv)
  
  ## Make palette
  normalize.<-function(x) (x-min(x,na.rm=T)) / (max(x,na.rm=T)-min(x,na.rm=T))
  mypalette<-rgb(
    1-normalize.(df$pc1),
    normalize.(df$pc2),
    normalize.(df$pc1)
  )
  
  # sort(abs(Allload[,1]), decreasing = T)[1:30] ## FT, Growth_rate
  # sort(abs(Allload[,2]), decreasing = T)[1:30] ## Root growth
  # sort(abs(Allload[,3]), decreasing = T)[1:20]

 #png(file = './figs/PCrainbowScatterPlot_scaled.png')
 pdf(file = './figs/PCrainbowScatterPlot_scaled.pdf')
 PCplot_scaled <- ggplot(data=df, aes(x=pc1, y=pc2)) + 
    geom_point( aes(size=surv),color=mypalette, alpha=0.95) +
    ylab("PC2")+ xlab("PC1 ") + ylim(-1,1) + xlim(-1,1.1)+
    scale_size(range = c(2, 5)) +
    geom_hline(yintercept=0, linetype="dotted", color="grey50")+
    geom_vline(xintercept=0, linetype="dotted", color="grey50")+
    theme(legend.position = "right", legend.text = element_text("none")) +
    annotate("segment", x = 0, xend = -0.02, y = 0, yend = -.94, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = -0.03, y =-.96, label = "% germination")+
    annotate("segment", x = 0, xend = -.5, y = 0, yend = .32, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = -.62, y = .36, label = "dormancy") +
    annotate("segment", x = 0, xend = .78, y = 0, yend = .23, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = .83, y = .27, label = "flowering time") +
     annotate("segment", x = 0, xend = .48, y = 0, yend = .16, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = .54, y = .2, label = "vernalization") +
    annotate("segment", x = 0, xend = .73, y = 0, yend = .1, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = .88, y = .1, label = "delta_13C") +
    annotate("segment", x = 0, xend = .65, y = 0, yend = .45, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = .76, y = .46, label = "FLC exp") +
    annotate("segment", x = 0, xend = .80, y = 0, yend = 0, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = .92, y = -.01, label = "FRI exp") +
    annotate("segment", x = 0, xend = .88, y = 0, yend =-.13, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = 1, y = -.17, label = "growth rate") +
    annotate("segment", x = 0, xend = .42, y = 0, yend =-.12, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = .43, y = -.14, label = "leaf area") +
    annotate("segment", x = 0, xend = -.68, y = 0, yend = .18, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = -.74, y = .2, label = "RGR") +
    annotate("segment", x = 0, xend = .26, y = 0, yend = .77, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = .3, y =.81, label = "root RGR") +
    annotate("segment", x = 0, xend = .1, y = 0, yend = .63, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = .1, y =.67, label = "root angle") +
    annotate("segment", x = 0, xend = -.15, y = 0, yend = -.8, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = -.1, y =-.85, label = "root horizontal index") +
    annotate("segment", x = 0, xend = -.26, y = 0, yend = .34, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = -.3, y =.37, label = "ABA") +
    # annotate("segment", x = 0, xend = -3.7, y = 0, yend = -6.3, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    # annotate("text", x = -4.4, y =-6.5, label = "rhamnose") +
    annotate("segment", x = 0, xend = .53, y = 0, yend = -.47, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
    annotate("text", x = .7, y =-.5, label = "stomatal density")+
   coord_equal()
  plot_grid(PCplot_scaled)
  dev.off()
  save(PCplot_scaled, file="./figs/tmpobjects/Figure_1A_PCcolorgram_scaled.rda")
  
  
  #png(file = './figs/PCrainbowScatterPlot.png')
  # pdf(file = './figs/PCrainbowScatterPlot.pdf')
  # PCplot <- ggplot(data=df, aes(x=pc1, y=pc2)) + 
  #   geom_point(size=5, color=mypalette, alpha=0.95) +
  #   ylab("PC2")+ xlab("PC1 ") + ylim(-10,11) + xlim(-8,9) +
  #   geom_hline(yintercept=0, linetype="dotted", color="grey50")+
  #   geom_vline(xintercept=0, linetype="dotted", color="grey50")+
  #   theme(legend.position = "right", legend.text = element_text("none")) +
  #   annotate("segment", x = 0, xend = -0.-0.2, y = 0, yend = -9.4, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = -0.3, y =-9.6, label = "% germination")+
  #   annotate("segment", x = 0, xend = -5, y = 0, yend = 3.2, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = -6.2, y = 3.6, label = "dormancy") +
  #   annotate("segment", x = 0, xend = 7.3, y = 0, yend = 2, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = 7.7, y = 2.5, label = "flowering time") +
  #    annotate("segment", x = 0, xend = 4.8, y = 0, yend = 1.6, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = 5.4, y = 2, label = "vernalization") +
  #   annotate("segment", x = 0, xend = 5.7, y = 0, yend = 1, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = 7, y = 1, label = "delta_13C") +
  #   annotate("segment", x = 0, xend = 6.2, y = 0, yend = 4.5, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = 7.1, y = 4.6, label = "FLC exp") +
  #   annotate("segment", x = 0, xend = 7, y = 0, yend = 0, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = 8, y = -.1, label = "FRI exp") +
  #   annotate("segment", x = 0, xend = 8.2, y = 0, yend =-1.3, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = 7.9, y = -1.9, label = "growth rate") +
  #   annotate("segment", x = 0, xend = 4.2, y = 0, yend =-1.2, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = 4.3, y = -1.4, label = "leaf area") +
  #   annotate("segment", x = 0, xend = -6.8, y = 0, yend = 1.8, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = -7.4, y = 2., label = "RGR") +
  #   annotate("segment", x = 0, xend = 2.6, y = 0, yend = 7.7, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = 3, y =8.1, label = "root RGR") +
  #   annotate("segment", x = 0, xend = 1., y = 0, yend = 6.3, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = 1, y =6.7, label = "root angle") +
  #   annotate("segment", x = 0, xend = -1.5, y = 0, yend = -8, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = -1, y =-8.5, label = "root horizontal index") +
  #   annotate("segment", x = 0, xend = -2.6, y = 0, yend = 3.4, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = -3, y =3.7, label = "ABA") +
  #   # annotate("segment", x = 0, xend = -3.7, y = 0, yend = -6.3, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   # annotate("text", x = -4.4, y =-6.5, label = "rhamnose") +
  #   annotate("segment", x = 0, xend = 5.3, y = 0, yend = -4.7, arrow = arrow(length = unit(.3,"cm")), color="grey20", size=.5) + 
  #   annotate("text", x = 7, y =-3.8, label = "stomatal denisty")
  # 
  # plot_grid(PCplot)
  # dev.off()
  # save(PCplot, file="./figs/tmpobjects/Figure_1A_PCcolorgram.rda")

}else{
  load(file="./figs/tmpobjects/Figure_1A_PCcolorgram.rda")
  plot_grid(PCplot)
}
```

The PCA captures 23% of the total phenotypic variation in the first two axes (Fig. 1A). We then asked how this phenotypic variation looks across all 1001 Genomes accessions within the Eurasian limits (999 accessions). The PCA of all these accessions explains a similar share of variation in the first two axes (23.6%) as the PCA of the subset of samples.
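
As a minimal sketch of how the share of variance per axis can be read off a `prcomp` fit, the block below uses a simulated trait matrix. The object `toy_traits` and its dimensions are hypothetical and are not the study data.

```r
# Hypothetical toy data: 100 accessions x 6 traits (not the study data)
set.seed(1)
toy_traits <- matrix(rnorm(100 * 6), nrow = 100,
                     dimnames = list(NULL, paste0("trait", 1:6)))
toy_traits[, 2] <- toy_traits[, 1] + rnorm(100, sd = 0.5)  # induce a correlation

# Scale each trait to unit variance, then run the PCA
toy_pca <- prcomp(scale(toy_traits))

# Proportion of total variance captured by each axis
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)

# Share captured jointly by the first two axes
pc12_share <- sum(var_explained[1:2])
round(100 * var_explained, 1)
```

The proportions sum to one by construction, so the joint share of the first two axes is simply the sum of the first two entries.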

```{r, echo=F, eval=T, fig.cap="Figure SI.5 PCA axes 1 (15.6%) and 2 (7.9%) of 64 target phenotypes for all Eurasian accessions of A. thaliana (n=999)."}
RERUN=F 
if(RERUN){
  library(ggfortify)  # provides the autoplot() method for prcomp objects used below
  atlasstrategies<-read.table("./data/pheno_fromgoogle.tsv",header = T)
  pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=" ", header = T)
  dimp<-read.csv(file = './data/atlas_phenotype_matrix_withid.csv')
  tmp_data <-dimp[dimp[,1] %in% pheno$id,1:5]
  tmp_data <- tmp_data %>% filter(longitude>-15 &longitude<90) %>%  filter(latitude>32 &latitude<65)
  
  tmp_pheno <-  pheno[pheno$id %in% tmp_data$id,] 
  
  importanttraits <- read.table(file="./data/importanttraits.csv")
  tmp_pheno <- tmp_pheno[,colnames(tmp_pheno) %in% importanttraits[,1]]
  
  ## NOTE: fn() is not defined in this chunk; it must already exist in the session
  tmp_pheno <- apply(tmp_pheno,2,fn)
  tmp_pheno <- apply(tmp_pheno,2,scale)
  tmp_cor<-cor(tmp_pheno)
  #corrplot(tmp_cor, method = "color", type = "lower", diag = F,tl.cex = .75)
  
  ## Remove traits so that two traits are not more correlated than 0.7
  decorrelate<-findCorrelation(
                tmp_cor,
                cutoff = 0.7,
                verbose = FALSE,
                names = FALSE
              )
  
  decorrelate<- decorrelate[ !(decorrelate %in% which(colnames(tmp_pheno) %in% c("ABA","Delta_13C", "RGR", "FT16", "Growth_rate")))]
  tmp_pheno<-tmp_pheno[,-decorrelate]
 
  tmp_pca<-prcomp(tmp_pheno)
  tmp_pca$sdev^2/sum(tmp_pca$sdev^2)
  cumsum(tmp_pca$sdev^2/sum(tmp_pca$sdev^2))
  
  pcaallplot<-autoplot(tmp_pca,
           loadings = T, loadings.label = T, alpha=0.1, 
           loadings.colour = "darkgrey",loadings.label.colour="black",
           loadings.label.size = 2.5)
  saveRDS(file="./figs/tmpobjects/1001_pcaallplot.rda", pcaallplot)
  
  pdf(file="./figs/1001_pcaLoadingsPlot.pdf", width = 6, height = 4)
  print(plot_grid(pcaallplot))  # explicit print(): nothing auto-prints mid-block inside if(){}
  dev.off()
  
}else{
   pcaallplot <- readRDS(file="./figs/tmpobjects/1001_pcaallplot.rda")
  # pcaallplot <- readRDS(file="./figs/tmpobjects/1001_pcaallplot_scaled.rda")
  plot_grid(pcaallplot)
  
}
```

################################################################################
## I.3 Phenotype and Climate Associations
################################################################################

### Phenotype-phenotype associations

We counted how many phenotypes were correlated with flowering time, WUE, and growth rate using Pearson's correlation coefficient and a significance threshold of p = 0.05 (__Table SI.3__). The phenotypic trade-off between WUE and flowering time is not isolated; it is part of a large suite of correlated complex traits related to both seasonal and drought adaptation. Of all traits classified as escape or avoidance, 36-39% are significantly correlated with flowering time 32, 17% with WUE 28, and 25% with growth rate.
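
The counting step can be sketched as follows, assuming simulated traits in place of the escape and avoidance phenotypes. All object names here (`focal`, `others`, `r_signif`) are hypothetical, and base R `cor.test` stands in for the matrix-wide test used in the analysis chunk.

```r
# Hypothetical toy data: a focal trait plus 20 other traits (not the study data)
set.seed(2)
n <- 200
focal <- rnorm(n)
others <- sapply(1:20, function(i) {
  # the first 5 traits depend on the focal trait, the other 15 are independent noise
  if (i <= 5) focal + rnorm(n) else rnorm(n)
})
colnames(others) <- paste0("trait", 1:20)

# Pearson correlation test of every trait against the focal trait
tests  <- apply(others, 2, function(x) cor.test(x, focal, method = "pearson"))
r_vals <- sapply(tests, function(tt) unname(tt$estimate))
p_vals <- sapply(tests, function(tt) tt$p.value)

# Mask coefficients that do not pass the 0.05 threshold
r_signif <- ifelse(p_vals > 0.05, NA, r_vals)

# Fraction of traits significantly correlated with the focal trait
frac_signif <- mean(!is.na(r_signif))
frac_signif
```

Because no multiple-testing correction is applied in this sketch, some of the independent traits may pass the threshold by chance.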

```{r, echo=F, eval=T, message=F, warning=F}
RERUN=F 
if(RERUN){
  library(Hmisc)
   ## correlation Ft and WUE with all escape/avoid phenotype variables
  pheno <- read.csv(file = './data/atlas1001_phenotypes_matrix_MR.csv')
  pheno[1:5,1:5]
   d<-read.table("./data/pheno_fromgoogle.tsv",header = T)
  head(d)
  
  ss<- as.character(d$phenotype[d$stressstrategy=="Escape"][1:125])
  ss2 <- as.character(d$phenotype[d$stressstrategy=="Avoidance"][1:384])
  ss <- c(ss, ss2)
  colnames(pheno) <- gsub("X", "", colnames(pheno))
  sum(colnames(pheno) %in% ss)
  ss_pheno <- pheno[,colnames(pheno) %in% ss]
  #ss_pheno <- apply(ss_pheno, 2, as.numeric)
  87/509
  
  ft_wue <- cbind(ss_pheno[,"FT16"],  ss_pheno[,"Delta_13C"],  ss_pheno[,"FT10"],  ss_pheno[,"FT_mlp"],   ss_pheno[,"FT_mli"], ss_pheno[,"Growth_rate"])
  colnames(ft_wue) <- c("FT16", "Delta_13C", "FT10", "FT_mlp", "FT_mli", "Growth_rate")
  ss_pheno <- cbind(ft_wue, ss_pheno[, !colnames(ss_pheno) %in% c("FT16", "Delta_13C", "FT10", "FT_mlp", "FT_mli", "Growth_rate")])
  ss_pheno[1:5,1:15]
  dim(ss_pheno)
  ss_pheno_cor_test <- rcorr(as.matrix(ss_pheno), type = "pearson")  # rcorr() expects a numeric matrix
  
  ss_pheno_cor_test$r[ss_pheno_cor_test$P > 0.05] <- NA
  dim(ss_pheno_cor_test$r)
  ss_pheno_cor_test$r[,1:6]
  ss_pheno_cors_signif <- data.frame(ss_pheno_cor_test$r[order(ss_pheno_cor_test$r[,2]),1:6])
  dim(ss_pheno_cors_signif)
  head(ss_pheno_cors_signif)
  
  write.table(ss_pheno_cors_signif, file="./tables/FT_WUE_GR_TraitAssociations_signif_only_TableSI3.csv", sep=",", 
              quote = F, row.names = T, col.names = T)
  
  ### Summary of how many traits in escape and avoidance are signif correlated with FT and WUE
  colSums(!is.na(ss_pheno_cors_signif))/506
}else{
  ss_pheno_cors_signif <- read.table(file="./tables/FT_WUE_GR_TraitAssociations_signif_only_TableSI3.csv", sep=",", header = T)
  colSums(!is.na(ss_pheno_cors_signif))/506
}

```

### Phenotype-latitude associations

Using the latitude of the original collection location of the focal 515 _A. thaliana_ accessions, we estimated Pearson's correlation coefficient between each of the 64 target phenotypes and latitude. We also correlated latitude with the PC axes. PC 1 is significantly correlated with latitude across the natural range (Pearson's r = 0.35, p-value < 2.2 × 10^-16^), as are the phenotypes associated with PC 1 (__Table SI.4__).
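
A minimal sketch of the phenotype-latitude table, assuming three simulated traits. The names `toy_pheno`, `lat_table`, and the trait labels are hypothetical.

```r
# Hypothetical toy data (not the study data)
set.seed(3)
n <- 150
latitude <- runif(n, 32, 65)
toy_pheno <- data.frame(
  trait_a = 0.1 * latitude + rnorm(n),   # increases with latitude
  trait_b = -0.1 * latitude + rnorm(n),  # decreases with latitude
  trait_c = rnorm(n)                     # unrelated to latitude
)

# Pearson's r of every phenotype with latitude
lat_cors <- cor(toy_pheno, latitude, method = "pearson")[, 1]

# Table sorted from the most negative to the most positive correlation
lat_table <- data.frame(phenotype = names(lat_cors), latitude = lat_cors,
                        row.names = NULL)
lat_table <- lat_table[order(lat_table$latitude), ]

# Significance test for a single association
ct <- cor.test(toy_pheno$trait_a, latitude, method = "pearson")
c(r = unname(ct$estimate), p = ct$p.value)
```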

```{r, echo=F, eval=F, message=F, warning=F}
## This chunk makes Table SI.4
RERUN=F 
if(RERUN){
  target_pheno <- readRDS(file="./data/TargetPhenoMatrix.rda")
  head(target_pheno)
  dim(target_pheno)
  
  # load field experiment information to get the 515 accessions
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  
  ## phenotype
  dimp<-read.csv(file = './data/atlas_phenotype_matrix_withid.csv')
  dimp <- dimp[,c("id", "name", "latitude", "longitude", "FT16", "Delta_13C", "X94_Seed_Dormancy" )]
  index515_2029 <- which(dimp$id %in% idsfield)
  head(dimp)
  
  ## correlation of target phenotypes and latitude
  tp_dimp <- merge(target_pheno, by.x="id", dimp, by.y="id")
  head(tp_dimp)
  head(tp_dimp[,c(67, 2:65)])
  target_pheno_andLat <-  tp_dimp[,c(67, 2:65)]
  target_pheno_andLat_cors <- cor(target_pheno_andLat)
  lat_pheno_cors <- target_pheno_andLat_cors[-1,1]
  lat_pheno_cors <- data.frame(names(lat_pheno_cors), lat_pheno_cors)
  colnames(lat_pheno_cors) <- c("phenotype","latitude")
  lat_pheno_cors <- lat_pheno_cors[order(lat_pheno_cors[,2]),c(1,2)]
  write.table(lat_pheno_cors, file="./tables/lat_targetpheno_cors_TableSI4.tsv", row.names = F, col.names = T, quote = F, sep="\t")
  
  ## PCs correlation with latitude
  ## also look at how PCs correlate with climate data
  pcaTarget <- readRDS(file="./data/pcaTarget.rda")
  pcs_withid <- cbind(id, pcaTarget$x)
  
  head(pcaTarget)
  head(pcs_withid)
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  pheno <- read.csv(file = './data/atlas_phenotype_matrix_withid.csv')
  whichfield<-which(pheno$id %in% idsfield)
  idex515_2029<-whichfield
  # saveRDS(idex515, file="./data/idex515_2029.rda")
  # 
  # ##  some specific correlation tests
  head(df)
  df_withpcs <- merge(df, by.x="id", pcs_withid, by.y="id")
  head(df_withpcs)
  dorm_lat_cor <- cor.test(df$dorm, df$lat)
  # 
  pcs_clim <- data.frame(pc1 = pcaTarget$x[,1], pc2 = pcaTarget$x[,2],lat=dimp$latitude[idex515], long=dimp$longitude[idex515],   clim[idex515, ])
  pcs_clim_cor <- cor(pcs_clim, use = "pairwise.complete.obs", method = "pearson")
  # pcs_clim_cor[order(pcs_clim_cor[,1]), 1:2]
  # pc2 + cor with bio17, negative cor bio15
  
  cor.test(y=df_withpcs$PC1,x= df_withpcs$lat, method = "pearson")
  
}else{
  lat_pheno_cors <- read.table(file="./tables/lat_targetpheno_cors_TableSI4.tsv", header=T, sep="\t")
  knitr::kable(lat_pheno_cors, col.names = colnames(lat_pheno_cors), caption = "Table SI.4 Pearson's correlation coefficient estimates of target phenotypes with latitude.", fixed_thread=T )
}
```


### Phenotype-climate associations

We downloaded bioclimatic, temperature, precipitation, and evapotranspiration rate estimates from WorldClim 2.0 for all localities associated with the 1001 Genomes accessions. We again subset the accessions to the 515 for which fitness data are available and estimated Pearson's correlation coefficient between the climate data and target phenotypes such as flowering time, WUE (measured as delta_C13), and growth rate (__Table SI.5__). Additionally, we fit linear models of the target phenotypes as a function of the climate data. We fit these models both with the climate variable as the only predictor and with genomic principal components (PCs) from the 1001 Genomes accessions and latitude as covariates (__Fig. 1B__).
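
The comparison between a climate-only model and a covariate-adjusted model can be sketched as below. The data are simulated, and the column names (`precip`, `ft`, `GPC1` to `GPC3`) and effect sizes are assumptions chosen only for illustration.

```r
# Hypothetical toy data (not the study data); GPC1-GPC3 stand in for genomic PCs
set.seed(4)
n <- 300
toy <- data.frame(
  lat  = runif(n, 32, 65),
  GPC1 = rnorm(n), GPC2 = rnorm(n), GPC3 = rnorm(n)
)
toy$precip <- 800 - 5 * toy$lat + rnorm(n, sd = 50)
toy$ft <- 60 - 0.02 * toy$precip + 0.3 * toy$lat + 2 * toy$GPC1 + rnorm(n, sd = 3)

# Model 1: phenotype as a function of the climate variable only
fit_simple <- lm(ft ~ precip, data = toy)

# Model 2: same predictor, with latitude and genomic PCs as covariates
fit_adjusted <- lm(ft ~ precip + lat + GPC1 + GPC2 + GPC3, data = toy)

# Compare the climate coefficient and its p-value between the two models
coef_simple   <- summary(fit_simple)$coefficients["precip", ]
coef_adjusted <- summary(fit_adjusted)$coefficients["precip", ]
rbind(simple = coef_simple, adjusted = coef_adjusted)
```

When the climate variable covaries with latitude, as it does in this toy setup, the unadjusted coefficient absorbs part of the latitude effect, which is why both fits are reported side by side.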

```{r, echo=F, eval=T, message=F, warning=F, fig.cap="Figure 1B,C,D (main text). (top) Flowering time and Delta_C13 association. (middle) Climate associations with flowering time, and (bottom) delta_C13.", fig.height=8, fig.width=3.5}
## This chunk makes Figures 1 B, C & D
RERUN=F 
if(RERUN){
  setwd("./safedata/natvar/")
  library(raster)
  library(missForest)
  library(rgdal)
  library(tidyverse)
  library(Hmisc)
  library(corrplot)
  library(cowplot)
  theme_set(theme_cowplot())
  
  # load field experiment information to get the 515 accessions
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  index515_2029 <- which(dimp$id %in% idsfield)
  index515 <- which(dimp$id %in% idsfield)
  
  ## phenotype
  dimp_2029 <- read.csv(file = './data/atlas_phenotype_matrix_withid.csv') ##2029
  dimp2 <- dimp_2029[, c("id", "latitude", "longitude")]
  head(dimp2)
  
  dimp_1001 <- read.csv(file = './data/atlas1001_phenotypes_matrix_MR.csv')
  dimp_1001[1:5,1:5]
  dimp1 <- dimp_1001[, c("id", "FT16", "FT10", "FT_mli", "FT_mlp", "Flowering_time", "Delta_13C","Growth_rate", "X94_Seed_Dormancy", "rSurvival_fruit_mlp", "rSurvival_fruit_mli", "rFitness_mlp", "rFitness_mli" )]
  dimp1[1:5,1:5]
  
  
  # head(dimp2)
  #dimp1[,-1] <- apply(dimp1[,-1], 2, scale)
  dimp <- merge(dimp1, dimp2, by="id", all.x=T)
  head(dimp)
  
  ## climate
  clim <- read.table(file="./climate/2029gclimate.csv", sep=",",header = T)
  #clim <- read.table(file="./climate/worldclim2/2029g_climate_accessions.csv", sep=",",header = T)
  head(clim)
  clim <-  data.frame(id=dimp2$id, clim)
  
  ## Genetic PCs
  gimp <- read.csv(file="./data/atlas_phenotype_matrix_imputedwithpcs.csv")
  gimp <-  data.frame(id=dimp2$id, gimp[,1:20])
  cgimp <- merge(clim, gimp, by="id")
  # head(dimp)
  
  ## merge back and only the 1001 accessions
  df <- merge(dimp, cgimp, by="id")
  head(df)
  df <- data.frame(df)
  
  ## get mean of temp. min and max, mean precip, and meanpet
  df$meanMaxTemp <- apply(df[,c(57:68)], 1, mean)
  df$meanMinTemp <- apply(df[,c(45:51)], 1, mean)
  df$meanPrecip <- apply(df[,c(33:39)], 1, mean)
  df$meanPET <- apply(df[,c(69:75)], 1, mean)
  
  df <- df %>% filter(longitude>-15 &longitude<90) %>%  filter(latitude>32 &latitude<65)
  head(df)

  #### check key correlations
  ## wue and ft
  cor.test(df$FT16, df$Delta_13C)
  cor.test(df$FT16, df$bio18)
  

  df <- apply(df[,-1], 2, as.numeric)
  head(df)
  dim(df)
  df_cor_test <- rcorr(df[], type = "pearson")

  ## remove non-sig. corrs from table
  df_cor_test$r[df_cor_test$P > 0.05] <- NA
  sort(df_cor_test$r[,1])
  df_cors_signif <- df_cor_test$r[order(df_cor_test$r[,1]),1:14]
  head(df_cors_signif)
  
  df_cors_mens <- df_cors_signif[grep("mean", rownames(df_cors_signif)),]
  df_cors_bio <- df_cors_signif[grep("bio", rownames(df_cors_signif)),]
  df_cors_prec <- df_cors_signif[grep("prec", rownames(df_cors_signif)),]
 df_cors_temp <- df_cors_signif[c(grep("tmin", rownames(df_cors_signif)), grep("tmax", rownames(df_cors_signif))),]
 
  
 write_out <- rbind(df_cors_mens, df_cors_bio,df_cors_prec, df_cors_temp )
  write.table(write_out, file="./tables/Trait-Clima-Assoc-SuppT4.csv", sep=",", 
              quote = F, row.names = T, col.names = T)
  
  #  write.table(df_cors_bio, file="./tables/TraitBIOCLIMAssociations_signif_only_notlimitedto515.csv", sep=",", 
  #             quote = F, row.names = T, col.names = T)
  # 
  # write.table(df_cors_signif, file="./tables/TraitClimateAssociations_signif_only.csv", sep=",", 
  #             quote = F, row.names = T, col.names = T)
  
  # df_cor[1:5,1:5]
  # df_cor <- df_cor[order(df_cor[,1]),1:6]
  # #df_cor[grep("bio", rownames(df_cor)),]
  # write.table(df_cor, file="./tables/TraitClimateAssociations_TableSI5.csv", sep=",", 
  #             quote = F, row.names = T, col.names = T)
  
  # df_cor <- read.table(file="./tables/TraitClimateAssociations_TableSI5.csv", sep=",", header=T)
  # 

  plot(lm(df$FT16~df$bio12))
  plot(y=df$FT16, x=df$bio12)
  cor.test(df$FT16, df$bio1)
  
  ## look at correlation of ft and wue, while accounting for latitude and 7 genetic PCs
  df_i <- df[index515, ]
  df[1:5,1:5]
  mypalette <- readRDS(file="./data/mypalette.rda")
  colnames(df)[3:16] <- c("lat", "long", "ft", "ft2", "ft3","ft4","ft5", "wue", "gr", "dorm", "survmlp", "survmli", "fitmlp", "fitmli")
  df <-  data.frame(df)
  
  fit <- lm(wue ~ ft + lat + GPC1 + GPC2 + GPC3 + GPC4 + GPC5 + GPC6 + GPC7 , data = df)
  summary(fit)

  df$wue
  lildf <- na.omit(data.frame(ft = df$FT16,
                   bio= df$bio17,
                   pal = mypalette))
  
  df$ft
  cor.test(lildf$ft, lildf$bio)

  
 ft_annprecip <-  ggplot(lildf, aes(x=bio, y=ft)) + geom_point(color=lildf$pal, alpha=0.7) +
    scale_color_gradient(low="white", high="black") +  #ylim(1.75,5) +
    geom_smooth(aes(x=bio, y=ft), method=glm , color="#b2182b", se=T ) + 
    labs(x="mean Annual precip. (mm)", y="Flowering time @ 16C") 
 ft_annprecip
 pdf(file="./figs/ft_annual_precip_plot.pdf", height = 4, width = 4)
 print(ft_annprecip)  # explicit print() so the plot is drawn to the pdf device
 dev.off()
 
 summary(lm(data=df_2029, FT16 ~ bio12 + latitude + GPC1 + GPC2 + GPC3 + GPC4 + GPC5 ))
  
  
  #dim(justphenos)
  df$survmli[df$survmli==-9] <- NA
  df$fitmli[df$fitmli==-9] <- NA
  
  normalize.<-function(x) (x-min(x,na.rm=T)) / (max(x,na.rm=T)-min(x,na.rm=T))
  
  #df$survmlp <- scale(unlist(df$survmlp))
  df$survmlp <- normalize.(df$survmlp)
  df$fitmlp <- normalize.(df$fitmlp)
  df$survmli <- normalize.(df$survmli)
  
  hist(df$survmlp)
  r2 <- cor.test(df$wue, df$ft, method = "pearson")
  r2
  hist(df$survmlp)
  lildf <- na.omit(data.frame(ft = df$ft,
                   wue = df$wue,
                   fit = df$fitmlp,
                   fit2 =  df$survmlp,
                   pal = mypalette))
  cor.test(lildf$ft, lildf$wue)
  
  range(lildf$fit2)
   breaks <- c(0, .25, .5, .75, 1)  # Define the breaks
 sizes <- c(1, 1.5, 2, 3.5, 5)  # Define the sizes corresponding to each break

 ft_wue_plot <-  ggplot(lildf, aes(x=ft, y=wue)) + geom_point(aes(size=fit2), color=lildf$pal, alpha=0.7) +
    scale_color_gradient(low="white", high="black") +  #ylim(1.75,5) +
    geom_smooth(aes(x=ft, y=wue), method=glm , color="#b2182b", se=T ) + 
    labs(x="Flowering time @ 16C", y="Delta_13C") +
    scale_size(limits = c(0.53, 0.82), range = c(2, 5))
 ft_wue_plot
 

 plot_grid(ft_wue_plot, ft_wue_plot2, ft_wue_plot3, ft_wue_plot4, labels = c("surv.mlp", "surv.mli", "fit.mlp", "fit.mli"))
 
 
  pdf(file="./figs/FtWUEscatterPlot.pdf", width = 4, height = 3)
  print(ft_wue_plot)  # explicit print() so the plot is drawn to the pdf device
  dev.off()

  colnames(df)
  df$meanMaxtemp <-  apply(df[,c(58:69)], MARGIN = 1, FUN = mean)
  head(df)
   fit <- lm(FT16 ~ meanMaxtemp + latitude + GPC1 + GPC2 + GPC3 + GPC4, data = df)
  summary(fit)
  r2 <- cor.test(x=df$meanMaxtemp, y=df$FT10)
  r2
  lildf <- na.omit(data.frame(ft = df$ft,
                   bio1 = df$bio17,
                   fit = df$fitmlp,
                   fit2 =  df$survmlp,
                   pal = mypalette))
   FT_temp_plot_color <-  ggplot(lildf, aes(y=ft, x=bio1)) + 
     geom_point(aes(size=fit2), color=lildf$pal, alpha=.7) +
    scale_color_gradient(low="white", high="black") +
     geom_smooth(aes(y=ft, x=bio1), method=glm , color="#b2182b", se=T ) +
     labs(y="Flowering time (days)", x="Precip. of Warmest Quarter") +
     xlim(50,250) + 
    scale_size(limits = c(0.53, 0.82), range = c(2, 5))
   FT_temp_plot_color
 
  pdf(file="./figs/wue-ft-climate-ft.pdf", width = 4.3, height = 6)
  print(plot_grid(ft_wue_plot, FT_temp_plot_color, ncol=1, nrow=2))
  dev.off()
  
  ## compute the mean precipitation column before it is used in the model
  df$Athprecip <- apply(df[,c(34:40)], MARGIN = 1, FUN = mean)
  fit <- lm(FT16 ~ Athprecip + latitude + GPC1 + GPC2 + GPC3 + GPC4, data = df)
  summary(fit)
  colnames(df)
  r2 <- cor.test(x=df$Athprecip, y=df$FT16)
  
  r2
   
  lildf <- na.omit(data.frame(wue = df$wue,
                   bio1 = df$bio12,
                   pal = mypalette))
   wue_AthPrecip_plot_color <-  ggplot(lildf, aes(y=wue, x=bio1)) + geom_point(color=lildf$pal, cex=3, alpha=.7) +
    scale_color_gradient(low="white", high="black") +
     geom_smooth(aes(y=wue, x=bio1), method=glm , color="#b2182b", se=T ) +
     labs(y="Delta_13C", x="Mean Precip. Jan-Jul (mm)") 
   
   wue_AthPrecip_plot_color
   
 wue_pet_plot <-  ggplot(df) + geom_point(aes(y=wue, x=meanPET, color=lat), cex=2, alpha=1) +
    scale_color_gradient(low="white", high="black") +  xlim(1.75,5) +
  geom_smooth(aes(y=wue, x=meanPET), method=glm , color="#b2182b", se=T ) +
  labs(y="Delta_13C", x="Mean Evapotranspiration") 
  # +  annotate("text", y = -32, x =4.1, label = paste0("italic(p) <", 
  #                   format(lm$p.value, digits=2)), parse = TRUE, size=5) +
  # annotate("text", y = -32.5, x = 4.1, label = paste("italic(R^2) ==", 
  #                   format(lm$estimate, digits=2)), parse = TRUE, size=5, col="#b2182b") 
  #wue_pet_plot
  pdf(file="./figs/EvaptWUEscatterPlot.pdf")
  print(wue_pet_plot)  # explicit print() so the plot is drawn to the pdf device
  dev.off()
  dim(df)
  
  ## try something other than summer precip (biol17), trying biol 16 precip in wettest quarter
    head(df)
    colnames(df)
  df$lifetimePET <- apply(df[,c(65:69)], FUN = mean, MARGIN = 1)
  fit <- lm(ft ~ lifetimePET + lat + GPC1 + GPC2 + GPC3 + GPC4 + GPC5 + GPC6 + GPC7 , data = df) 
  summary(fit)
  r2 <- cor.test(x=df$lifetimePET, y=df$ft)
  lildf <- na.omit(data.frame(ft = df$ft,
                   meanpet = df$meanPET,  # column is created above as meanPET
                   pal = mypalette))
  ft_meanpet_plot_color <- ggplot(data=lildf, aes(x=meanpet, y=ft)) + 
    geom_point(size=3, color=lildf$pal, alpha=0.7) +
    geom_smooth(aes(x=meanpet, y=ft), method=glm , color="#b2182b", se=T ) +
    labs(x="Evapotranspiration Jan - April (mm/day)", y="Flowering time @ 16C")
  ft_meanpet_plot_color
  
  ## try something other than summer precip (biol17); trying biol 12 annual precip
  head(df)
  fit <- lm(ft2 ~ bio12 + lat + GPC1 + GPC2 + GPC3 , data = df) 
  summary(fit)
  
  
  ## FT2 and precip
  r2 <- cor.test(x=df$bio12, y=df$ft2)
  r2
    lildf <- na.omit(data.frame(ft = df$ft2 ,
                   bio12 = df$bio12,
                   pal = mypalette))
  ft_bio12_plot_color <- ggplot(data=lildf, aes(y=ft, x=bio12)) + 
    geom_point(size=3, color=lildf$pal, alpha=0.7) +
    geom_smooth(aes(y=ft, x=bio12), method=glm , color="#b2182b", se=T) +
    labs(y="Flowering time @ 16C", x="Annual Precip.")
  ft_bio12_plot_color
  
  
    pdf(file="./figs/ft_bio12scatterPlot.pdf")
  print(ft_bio12_plot_color)  # explicit print() so the plot is drawn to the pdf device
  dev.off()
  

  ## summer precip or precip in driest quarter
  fit <- lm(ft ~ bio17 + lat + GPC1 + GPC2 + GPC3 + GPC4 + GPC5 + GPC6 + GPC7 , data = df)
  summary(fit)
  r2 <- cor.test(x=df$bio17, y=df$ft)
  lildf <- na.omit(data.frame(ft = df$ft,
                   bio17 = df$bio17,
                   pal = mypalette))
  ft_bio17_plot_color <- ggplot(data=lildf, aes(x=ft, y=bio17)) + 
    geom_point(size=3, color=lildf$pal, alpha=0.7) +
    geom_smooth(aes(x=ft, y=bio17), method=glm , color="#b2182b", se=T ) +
    labs(x="Flowering time @ 16C", y=" Precipitation of Driest Quarter")
  ft_bio17_plot_color
  
  ft_bio17_plot_grey <-  ggplot(df) + geom_point(aes(x=ft, y=bio17, color=lat), cex=2, alpha=.7) +
    scale_color_gradient(low="white", high="black") +  #xlim(1.75,5) +
  geom_smooth(aes(x=ft, y=bio17), method=glm , color="#b2182b", se=T ) +
  labs(x="Flowering time @ 16C", y=" Precipitation of Driest Quarter") 
  #ft_bio17_plot
  pdf(file="./figs/ft_bio17scatterPlot.pdf")
  print(ft_bio17_plot_grey)  # ft_bio17_plot is never defined; print the grey version built above
  dev.off()
  
    
  fit <- lm(dorm ~ prec12+ lat + GPC1 + GPC2 + GPC3 + GPC4 + GPC5 + GPC6 + GPC7 , data = df)
  summary(fit)
  r2 <- cor.test(x=df$prec12, y=df$dorm)
  dorm_prec12_plot <-  ggplot(df) + geom_point(aes(x=prec12, y=dorm, color=lat), cex=2, alpha=.7) +
    scale_color_gradient(low="white", high="black") +  xlim(0, 110) +
  geom_smooth(aes(x=prec12, y=dorm), method=glm , color="#b2182b", se=T ) +
  labs(x="Precipitation in December", y="Primary Dormancy") 
  dorm_prec12_plot
  pdf(file="./figs/dorm_prec12scatterPlot.pdf")
  print(dorm_prec12_plot)  # explicit print() so the plot is drawn to the pdf device
  dev.off()
  
  bigplot<- plot_grid(ft_wue_plot, FT_temp_plot_color, wue_AthPrecip_plot_color, ncol=1, nrow=3)
  pdf(file="./figs/Fig1B-D_color.pdf", width = 4, height = 9)
  print(bigplot)  # explicit print() so the plot is drawn to the pdf device
  dev.off()
  
  
  bigplot_color <-  plot_grid(ft_wue_plot, ft_meanpet_plot_color, wue_AthPrecip_plot_color, ncol=1, nrow=3)
  saveRDS(bigplot_color, file="./figs/tmpobjects/ClimatAssociationswTraitsPlots_color.rda")
  
  saveRDS(bigplot, file="./figs/tmpobjects/ClimatAssociationswTraitsPlots.rda")
  pdf(file="./figs/Fig1B-D_color.pdf")
  print(bigplot_color)  # explicit print() so the plot is drawn to the pdf device
  dev.off()
  
  
  ## side look at how climate correlates with fecundity
  ## side look at how climate correlates with flowering time variance
  ## load df from above
  p <- read.table(file = './data/atlas1001_phenotypes_matrix.csv', sep=",", header = T)
  p<- p[idex515,]
  fitness <- colnames(p)[grep("Fitness", colnames(p))][1:8]
  survival <- colnames(p)[grep("Survival", colnames(p))][1:8]
  seeds <- colnames(p)[grep("rSeeds", colnames(p))][1:8]
  all_fitness <- pheno[idex515,c(fitness, survival, seeds)]
  dim(all_fitness)
  FTvariance_df <- readRDS(file="./data/FTvariance_df.rda")
  
  df2 <- data.frame(id= dimp$id, ft=dimp$FT16, wue=dimp$Delta_13C, dorm=dimp$X94_Seed_Dormancy, lat=dimp$latitude, long=dimp$longitude, clim)
  dim(df2)
  
  df2 <- df2[index515_2029, ]
  df2 <- cbind(df2[,2:6], FTvariance_df$FT_variance, all_fitness, df2[,7:ncol(df2)])
  head(df2)
  df_cor2 <- cor(df2, use = "pairwise.complete.obs", method = "pearson")
  ## look at ft variance
  df_cor2[order(df_cor2[,6]), 1:6]
  head(df_cor2)
  
  ## look at rSeeds_mli
  df_cor2[order(df_cor2[,"rSeeds_mli"]), 25:30]
  
  cor.test(df2$`FTvariance_df$FT_variance`, df2$dorm) ## positively correlated
  plot(df2$`FTvariance_df$FT_variance`, df2$dorm)
  cor.test(df2$`FTvariance_df$FT_variance`, df2$ft) ## negatively correlated
  plot(df2$`FTvariance_df$FT_variance`, df2$ft)
  
   cor.test(df2$rSeeds_mli, df2$dorm)
   cor.test(df2$rSeeds_mli, df2$wue)
   plot(df2$rSeeds_mli, df2$wue)
  
  ## also look at how PCs correlate with climate data
  pcaTarget <- readRDS(file="./data/pcaTarget.rda")
  pcs_withid <- cbind(id, pcaTarget$x)
  
  head(pcaTarget)
  head(pcs_withid)
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  pheno <- read.csv(file = './data/atlas_phenotype_matrix_withid.csv')
  whichfield<-which(pheno$id %in% idsfield)
  idex515_2029<-whichfield
  # saveRDS(idex515, file="./data/idex515_2029.rda")
  # 
  # ##  some specific correlation tests
  head(df)
  df_withpcs <- merge(df, by.x="id", pcs_withid, by.y="id")
  head(df_withpcs)
  dorm_lat_cor <- cor.test(df$dorm, df$lat)
  # 
  pcs_clim <- data.frame(pc1 = pcaTarget$x[,1], pc2 = pcaTarget$x[,2],lat=dimp$latitude[idex515], long=dimp$longitude[idex515],   clim[idex515, ])
  pcs_clim_cor <- cor(pcs_clim, use = "pairwise.complete.obs", method = "pearson")
  # pcs_clim_cor[order(pcs_clim_cor[,1]), 1:2]
  # pc2 + cor with bio17, negative cor bio15
  
  cor.test(y=df_withpcs$PC1,x= df_withpcs$lat, method = "pearson")

}else{
  # bigplot <- readRDS(file="./figs/tmpobjects/ClimatAssociationswTraitsPlots.rda")
  # bigplot
  bigplot_color <- readRDS(file="./figs/tmpobjects/ClimatAssociationswTraitsPlots_color.rda")
  bigplot_color
}
```

Many of the phenotypes are correlated with climate. For instance, flowering time is significantly related to precipitation-related climate variables: as precipitation decreases, flowering time increases (Table S2; Pearson's r = -0.14, p-value < 8e-6; Fig. 1C). Even when correcting for population structure and latitude, this climate variable remains a significant predictor of flowering time (p-value < 1e-3). In contrast, delta_C13 is negatively correlated with temperature-related climatic variables, with lower average temperatures corresponding to higher delta_C13 (Table S2), and higher delta_C13 is also associated with lower estimates of evapotranspiration (Pearson's r = -0.17, p-value < 3e-3; Fig. 1D). However, mean evapotranspiration does not remain a significant predictor of delta_C13 when correcting for population structure and latitude (p-value < 0.22).

### PC-Climate models

Using random forests (R package randomForest), we constructed two climate models to predict the phenotypic landscape of PC 1 and PC 2 across the entire range of _Arabidopsis_. To build the models, we used climatic data (temperature, precipitation, and bioclimatic variables; 55 total) associated with the localities of the focal 515 accessions to predict first PC 1 and then PC 2. Each model used 1000 decision trees. With these models, we then predicted the values of PC 1 and PC 2 for all of Eurasia between longitude -15 and 90, and latitude 34 and 65. After normalizing the predictions, we overlaid the two models using the same color scheme that appears in Figure 1A.
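
The normalization and color overlay can be sketched in base R as follows. The predicted values are made-up numbers for five map cells, and `normalize01` is a hypothetical helper. The channel assignment (red = 1 - PC 1, green = PC 2, blue = PC 1) is taken from the analysis chunk.

```r
# Min-max rescaling to [0, 1], ignoring missing values
normalize01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Hypothetical predicted PC scores for five map cells (not model output)
pc1_pred <- c(-4.2, -1.0, 0.3, 2.5, 6.1)
pc2_pred <- c( 1.8, -2.2, 0.0, 3.4, -0.7)

# One hex color per cell: red = 1 - PC1, green = PC2, blue = PC1
cell_cols <- rgb(red   = 1 - normalize01(pc1_pred),
                 green = normalize01(pc2_pred),
                 blue  = normalize01(pc1_pred))
cell_cols
```

A cell with the lowest predicted PC 1 therefore has a full red channel and an empty blue channel, and a cell with the highest predicted PC 1 has the reverse.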

```{r, echo=F, eval=T,message=F, warning=F, fig.cap="Figure 1E (main text). Projection of the expected strategy in the PCA (Figure 1A), based on a Random Forest niche model of the PC space by local climate in 515 A. thaliana accessions; scale is 100 km.", fig.height=4, fig.width=7}
## This chunk makes Figure 1E
RERUN=F 
if(RERUN){
  library(raster)
  library(rgdal)
  library(missForest)
  library(randomForest)
  library(coda)  # HPDinterval() and as.mcmc() used below
  library(tidyverse)
  library(cowplot)
  theme_set(theme_cowplot())
  
  ## load PCs
  pcaTarget <- readRDS(file="./data/pcaTarget.rda") 
  df <- data.frame(pc1= pcaTarget$x[,1],
                   pc2 = -pcaTarget$x[,2],
                   pc3 = pcaTarget$x[,3])
  
  dimp<-read.csv(file = './data/atlas_phenotype_matrix_withid.csv')
  gimp <- read.csv(file="./data/atlas_phenotype_matrix_imputedwithpcs.csv")
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  gimp <- gimp[dimp$id %in% idsfield,1:7]
  dimp <- dimp[dimp$id %in% idsfield,]
  dimp <- dimp[,c("id", "name", "latitude", "longitude", "FT16", "Delta_13C")]
  #dim(dimp)
  
  projectionCRS <- CRS("+proj=laea +lon_0=0.001 +lat_0=89.999 +ellps=sphere")
  Locations<-data.frame(longitude=dimp$longitude, latitude=dimp$latitude) # just moving the lat long data to new variable
  coordinates(Locations) = c("longitude", "latitude")  #housekeeping lines to project coordinates
  proj4string(Locations) <- CRS("+proj=longlat +ellps=WGS84")
  sPointsDF <- spTransform(Locations, CRS=projectionCRS)
  euroclim<-stack( './data/euroclim.grd')

  menv<-raster::extract(euroclim,Locations) ## we officially extract the climate points from our rasterstack using our new locations
  climatenames<-names(euroclim)
  menv<-data.frame(menv)
  menv<-na.roughfix(menv)
  
  dimp<- cbind(dimp, menv)
  #dimp[1:5, 1:10]
  
  # ## this is the order the pcs are in
  pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=" ", header = T)
  whichfield<-which(pheno$id %in% idsfield)
  idex515<-whichfield
  pheno <- pheno[idex515, 1:2]
  head(pheno)
   ## need to make sure climate data is in same order as PCs
  dat <- merge(dimp, by.x="id", pheno, by.y="id" )
  dat[1:5,1:10]
  colnames(dat)
  dat <- dat[,7:61]
  ncol(dat)
  # head(dat)
  
  ## PC1 model
  # ref_table <- data.frame(pc1=pcaTarget$x[,1], dat, gimp)
  # p<-glm(pc1~., data=ref_table)
  # summary(p)
  ref_table <- data.frame(pc1=pcaTarget$x[,1], dat)
  head(ref_table)
  mod1<- randomForest(data= ref_table, pc1 ~ ., ntree=1000, importance=T)
  nRMSE<- sqrt(mod1$mse[1000])/ (max(ref_table$pc1, na.rm=T) - min(ref_table$pc1, na.rm=T))
  tests <- c()
  
  ## repeated hold-out validation: 66% train / 34% test, 101 replicates
  while(length(tests)<101){
      trainsample<-sample(1:nrow(ref_table),round(nrow(ref_table)*0.66))
        train<-ref_table[trainsample,]
        test<-ref_table[-trainsample,]
        cv1<-randomForest(data=train, pc1 ~ .)  
        ct<- cor.test(predict(cv1,test),test[,'pc1']) ## not bad 0.73
        tests <- c(tests, ct$estimate)  # ct avoids shadowing base c()
  }
  
  mean(tests)
  hpd_interval <- HPDinterval(as.mcmc(tests), prob = 0.95)

  IQR(tests)
  
         trainsample<-sample(1:nrow(ref_table),round(nrow(ref_table)*0.66))
        train<-ref_table[trainsample,]
        test<-ref_table[-trainsample,]
        cv1<-randomForest(data=train, pc1 ~ .)  
        cor.test(predict(cv1,test),test[,'pc1']) ## not bad 0.73
  pre1<-raster::predict(euroclim,mod1)
  
  pc1raw<-pre1
  writeRaster(pc1raw, filename='./data/pc1raw.tiff', format="GTiff")

  ## PC2
  ref_table <- data.frame(pc2=pcaTarget$x[,2], dat)
  mod2<- randomForest(data= ref_table, pc2 ~ ., ntree=1000, importance=T)
  nRMSE<- sqrt(mod2$mse[1000])/ (max(ref_table$pc2, na.rm=T) - min(ref_table$pc2, na.rm=T))
  tests <- c()
  
  ## repeated hold-out validation: 66% train / 34% test, 101 replicates
  while(length(tests)<101){
      trainsample<-sample(1:nrow(ref_table),round(nrow(ref_table)*0.66))
        train<-ref_table[trainsample,]
        test<-ref_table[-trainsample,]
        cv1<-randomForest(data=train, pc2 ~ .)  
        ct<- cor.test(predict(cv1,test),test[,'pc2']) ## not bad 0.73
        tests <- c(tests, ct$estimate)  # ct avoids shadowing base c()
  }
  
  mean(tests)
  hpd_interval <- HPDinterval(as.mcmc(tests), prob = 0.95)

  
    trainsample<-sample(1:nrow(ref_table),round(nrow(ref_table)*0.66))
    train<-ref_table[trainsample,]
    test<-ref_table[-trainsample,]
    cv1<-randomForest(data=train, pc2 ~ .)  
    cor.test(predict(cv1,test),test[,'pc2']) ## not as good, 0.628
  pre2<-raster::predict(euroclim,mod2)
  pc2raw<-pre2
  writeRaster(pc2raw, filename='./data/pc2raw.tiff', format="GTiff")
  
  ## normalize predictions
  pc1pred<- (pre1 -min(values(pre1),na.rm=T) ) / (max(values(pre1),na.rm=T)- min(values(pre1),na.rm=T) )
  pc2pred<- (pre2 -min(values(pre2),na.rm=T) ) / (max(values(pre2),na.rm=T)- min(values(pre2),na.rm=T) )
  normalize.<-function(x) (x-min(x,na.rm=T)) / (max(x,na.rm=T)-min(x,na.rm=T))
  mypalette<-rgb(
    1-normalize.(df$pc1),
    normalize.(df$pc2),
    normalize.(df$pc1)
  )
  preds<-stack(
    1-pc1pred,
    pc2pred, # careful with the position
    pc1pred  # careful!
  )
  saveRDS(mypalette, file="./data/mypalette.rda")
  saveRDS(preds, file="./data/preds.rda")
  
  pdf('./figs/PCclimateProjMap.pdf',width = 12,height = 12)
  ## base graphics calls are issued one after another, not chained with `+`
  plotRGB(preds,r=1,g=2, b=3,scale=T)
  points(y=dimp$latitude, x= dimp$longitude,pch=4, cex=1, col=moiR::transparent("gray20",0.7))
  scalebar(1000, xy = c(73,34.5), lonlat = T, type='bar', divs=4, cex=.5)
  dev.off()
  
}else{
  library(raster)
  dimp<-read.csv(file = './data/atlas_phenotype_matrix_withid.csv')
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  dimp <- dimp[dimp$id %in% idsfield,]
  mypalette <- readRDS(file="./data/mypalette.rda")
  preds <- readRDS(file="./data/preds.rda")
  ## base graphics calls are issued one after another, not chained with `+`
  plotRGB(preds,r=1,g=2, b=3,scale=T)
  points(y=dimp$latitude, x= dimp$longitude,pch=4, cex=1, col=moiR::transparent("gray10",0.7))
  scalebar(1000, xy = c(73,34.5), lonlat = T, type='bar', divs=4, cex=.5)
}
```


```{r, echo=F, eval=F,message=F, warning=F}
### Phenotype-Climate models 
RERUN=F 
if(RERUN){
  library(raster)
  library(rgdal)
  library(randomForest)  # randomForest() and varImpPlot() used below
  library(tidyverse)
  library(cowplot)
  theme_set(theme_cowplot())
  
  dimp<-read.csv(file = './data/atlas_phenotype_matrix_withid.csv')
  dimp <- dimp[,c("id", "name", "latitude", "longitude", "FT16", "Delta_13C", "X94_Seed_Dormancy" )]
  
  ## climate
  clim <- read.table(file="./climate/2029gclimate.csv", sep=",",header = T)
  
  ## Flowering Time
dat<-dimp
dim(dat)
#ref_table <- data.frame(FT=df$FT, dat)
ref_table <- data.frame(FT=dat$FT16, dat[,c(7:61)])
ref_table<- na.omit(ref_table)
modFT<- randomForest(data= ref_table, FT ~ ., ntree=1000, importance=T)
  trainsample<-sample(1:nrow(ref_table),round(nrow(ref_table)*0.8))
  train<-ref_table[trainsample,]
  test<-ref_table[-trainsample,]
  cv1<-randomForest(data=train, FT ~ .)  
  cor(predict(cv1,test),test[,'FT']) ## not bad 0.7109
preFT<-raster::predict(euroclim,modFT)
preFTraw<-preFT
varImpPlot(modFT)
#writeRaster(pc1raw, filename='../data/pc1raw.tiff', format="GTiff")

## Germination
ref_table <- data.frame(germ=df$Germ, dat)
modGerm<- randomForest(data= ref_table, germ ~ ., ntree=500)
  trainsample<-sample(1:nrow(ref_table),round(nrow(ref_table)*0.8))
  train<-ref_table[trainsample,]
  test<-ref_table[-trainsample,]
  cv1<-randomForest(data=train, germ ~ .)  
  cor(predict(cv1,test),test[,'germ']) ## pretty good 0.647
preGerm<-raster::predict(euroclim,modGerm)
Germraw<-preGerm
#writeRaster(pc2raw, filename='../data/pc2raw.tiff', format="GTiff")

## Delta_13C
#ref_table <- data.frame(wue=df$WUE, dat)
ref_table <- data.frame(wue=dimp$Delta_13C, dat[,c(7:61)])
ref_table<- na.omit(ref_table)
dim(ref_table)
modWUE<- randomForest(data= ref_table, wue ~ ., ntree=1000, importance=T)
  trainsample<-sample(1:nrow(ref_table),round(nrow(ref_table)*0.8))
  train<-ref_table[trainsample,]
  test<-ref_table[-trainsample,]
  cv1<-randomForest(data=train, wue ~ .)  
  cor(predict(cv1,test),test[,'wue']) ## pretty good 0.42
preWUE<-raster::predict(euroclim,modWUE)
WUEraw<-preWUE


## function to plot variable importance
ImportancePlot <- function(df) {
  increMSE<- ggplot(df, aes(x=reorder(var, mse), y=mse)) +geom_col(aes(fill=type))+coord_flip() +
    scale_fill_manual(values=c("#4393c3", "#d6604d"))+
    labs(y="Increase in MSE", x="")+
    theme(legend.position="none")+
    theme(axis.text = element_text(size = 8)) +
     theme(axis.title = element_text(size = 11))
  nodepurity<- ggplot(df, aes(x=reorder(var, purity), y=purity)) +geom_col(aes(fill=type))+coord_flip() +
    scale_fill_manual(values=c("#4393c3", "#d6604d"))+
    labs(y="Increase in Node Purity", x="")+
    theme(legend.position="none")+
    theme(axis.text = element_text(size = 8))   +
    theme(axis.title = element_text(size = 11))
  plot_grid(increMSE, nodepurity)

}


summary(mod1)
## id all climatic variables as prec or temp
type = c(rep("temp", 11), rep("prec", 20), rep("temp", 23) )

ImportancePlot(df)
## PC 1
df_mod1 <- data.frame(var=rownames(mod1$importance), purity=mod1$importance[,2], 
                 mse=mod1$importance[,1],
                 type = type)
ImportancePlot(df_mod1)

pdf(file = "./figs/VarImpPlot_RF_PC1.pdf")
print(ImportancePlot(df_mod1)) # explicit print: plots are not auto-printed inside if(){}
dev.off()

## PC 2
df_mod2 <- data.frame(var=rownames(mod2$importance), purity=mod2$importance[,2], 
                 mse=mod2$importance[,1],
                 type = type)
ImportancePlot(df_mod2)

pdf(file = "./figs/VarImpPlot_RF_PC2.pdf")
print(ImportancePlot(df_mod2)) # explicit print: plots are not auto-printed inside if(){}
dev.off()

FTpred <- (preFT -min(values(preFT),na.rm=T) ) / (max(values(preFT),na.rm=T)- min(values(preFT),na.rm=T) )
Germpred <- (preGerm -min(values(preGerm),na.rm=T) ) / (max(values(preGerm),na.rm=T)- min(values(preGerm),na.rm=T) )
WUEpred <- (preWUE -min(values(preWUE),na.rm=T) ) / (max(values(preWUE),na.rm=T)- min(values(preWUE),na.rm=T) )

raster::plot(FTpred, col = c('#9e0142','#d53e4f','#f46d43','#fdae61','#fee08b','#ffffbf','#e6f598','#abdda4','#66c2a5','#3288bd','#5e4fa2')) 
raster::plot(WUEpred, col = c('#9e0142','#d53e4f','#f46d43','#fdae61','#fee08b','#ffffbf','#e6f598','#abdda4','#66c2a5','#3288bd','#5e4fa2')) 
  
}else{
  
}
```

################################################################################
# II. Natural selection conflict between drought escape and avoidance 
################################################################################

################################################################################
## II.1 Background 
################################################################################

### Common Garden Experiments (Exposito-Alonso _et al._ 2019)
We relied heavily on _A. thaliana_ common garden field experiments in wet and dry rainfall environments (Exposito-Alonso et al. 2019) to measure selection on all of these phenotypes. In these experiments, 515 _A. thaliana_ ecotypes were grown in two contrasting local environments, one in Madrid, Spain, and the other in Tubingen, Germany. Each site had two rainfall treatments (high and low rainfall) and two planting treatments (high and low density). Importantly, flowering time was measured for all individuals and populations of accessions, along with three measures of fitness: viability, fecundity, and lifetime fitness. Viability was measured as the proportion of surviving replicates, and fecundity as the total number of offspring, estimated from the silique count and a fixed ratio of seeds per silique. Lastly, lifetime fitness is the number of offspring, including 0s for no offspring, divided by the mean of the population.
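
As a minimal sketch of how the three fitness measures relate to one another, the toy example below computes them for three made-up accessions. The data, the column names, the value of `seeds_per_silique`, and the choice to average fecundity over surviving replicates only are all assumptions for illustration; none of them are taken from Exposito-Alonso et al. 2019.

```r
# Toy replicate-level data (hypothetical values, for illustration only)
reps <- data.frame(
  id       = rep(c("acc1", "acc2", "acc3"), each = 4),
  survived = c(1, 1, 0, 1,   0, 0, 1, 0,   1, 1, 1, 1),
  siliques = c(10, 6, 0, 8,  0, 0, 4, 0,   12, 9, 15, 4)
)
seeds_per_silique <- 30  # assumed fixed ratio, not the value used in the study

# Viability: proportion of replicates that survived, per accession
viability <- tapply(reps$survived, reps$id, mean)

# Fecundity: mean seed number per accession
# (assumption: averaged over surviving replicates only)
alive <- reps$survived == 1
fecundity <- tapply(reps$siliques[alive] * seeds_per_silique, reps$id[alive], mean)

# Lifetime fitness: mean offspring per replicate, counting 0s for dead plants
lifetime <- tapply(reps$siliques * seeds_per_silique, reps$id, mean)

# Relative lifetime fitness: divide by the mean across accessions
rel_fitness <- lifetime / mean(lifetime)

round(cbind(viability, fecundity, lifetime, rel_fitness), 2)
```

Under these assumptions lifetime fitness equals viability times fecundity, and relative fitness has a mean of 1 by construction.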

```{r, echo=F, eval=T,message=F, warning=F, fig.cap="Figure 2A,B,C (main text). Mean-centered flowering time related to normalized lifetime fitness, proportion of survival replicates (Survival), and number of seeds (Fecundity); data from Exposito-Alonso et al. 2019", fig.height=4, fig.width=9}
library(moiR)
RERUN=F 
if(RERUN){
  normalize.<-function(x) (x-min(x,na.rm=T)) / (max(x,na.rm=T)-min(x,na.rm=T))
  load("./data/d4.rda")
  head(d4)
  
  # fitness <- matrix(matrix(NA,    # Create empty data frame
  #                         nrow = 517,
  #                         ncol = 1))
  # for (k in unique(d4$code)){
  #   print(k)
  #   tmp <- d4 %>% filter(code==k) %>% 
  #     mutate(normFit = normalize.(Fitness)) %>% 
  #     mutate(normSurv = normalize.(Survival_fruit)) 
  #     
  #   name1 <- paste0("id_", k)
  #   name2 <- paste0("fitness_", k)
  #   name3 <- paste0('Survival_', k)
  #   name4 <- paste0('Seeds_', k)
  #   fitness_d4 <- data.frame(name1 = tmp$id,
  #                            name2 = tmp$normFit,
  #                            name3 = tmp$normSurv,
  #                            name4 = tmp$Seeds)
  #   colnames(fitness_d4) <- c(name1, name2, name3, name4)
  #   if (nrow(fitness_d4) != 517){
  #     fitness_d4 <- rbind(fitness_d4, rep(NA, ncol(fitness_d4)))
  #   }
  #   fitness <- cbind(fitness, fitness_d4)
  # }
  # head(fitness)
  # 
  
  # dat <- d4 %>% filter(code=="mlp"|code=="mli"|code=="tlp"|code=="tli")
  # dat <- dat %>% group_by(code) %>%
  #   mutate(mFT=Flowering_time-mean(na.omit(Flowering_time))) %>%
  #   ungroup
  # dat <- dat %>% group_by(code) %>%
  #   mutate(normFit = normalize.(Fitness)) %>%
  #   mutate(normSurv = normalize.(Survival_flowering))
  # head(dat)
  
  #dat_mli <- d4 %>% filter(code=="mlp"|code=="mli")
  dat_mli <- d4 %>% filter(code=="mli")
  dat_mli <- dat_mli %>% mutate(mFT=Flowering_time-mean(na.omit(Flowering_time)))
  dat_mli <- dat_mli %>% group_by(code) %>% 
    mutate(normFit = normalize.(Fitness)) %>%
    mutate(normSurv = normalize.(Survival_fruit))  %>% 
    ungroup
  
  dat_mlp <- d4 %>% filter(code=="mlp")
  dat_mlp <- dat_mlp %>% mutate(mFT=Flowering_time-mean(na.omit(Flowering_time)))
  dat_mlp <- dat_mlp %>% group_by(code) %>% 
    mutate(normFit = normalize.(Fitness)) %>%
    mutate(normSurv = normalize.(Survival_fruit))  %>% 
    ungroup
  

  head(dat_mli)
  range(dat_mli$normFit)
  
  p<- dat_mlp %>% filter(code=="mlp") # was dat_mli, which holds no "mlp" rows
  range(p$normFit)
  # cov( dat_mli$normFit,dat_mli$mFT, use="pairwise.complete.obs")
  # fit <- lm(dat_mli$normFit~ dat_mli$mFT)
  # hist(dat_mli$Fitness)
  
  #dat_tli <- d4 %>% filter(code=="thp"|code=="thi")
  dat_thi <- d4 %>% filter(code=="thi")
  dat_thi <- dat_thi %>% mutate(mFT=Flowering_time-mean(na.omit(Flowering_time)))
  dat_thi <- dat_thi %>% group_by(code) %>% 
    mutate(normFit = normalize.(Fitness)) %>%
    mutate(normSurv = normalize.(Survival_fruit)) %>% 
    ungroup
  head(dat_thi)
  summary(dat_thi$normFit)
  
  dat_thp <- d4 %>% filter(code=="thp")
  dat_thp <- dat_thp %>% mutate(mFT=Flowering_time-mean(na.omit(Flowering_time)))
  dat_thp <- dat_thp %>% group_by(code) %>% 
    mutate(normFit = normalize.(Fitness)) %>%
    mutate(normSurv = normalize.(Survival_fruit)) %>% 
    ungroup
  head(dat_thp)
  summary(dat_thp$normFit)

  # dat_tli$normFit <- normalize.(dat_tli$Fitness)
  # dat_tli$normSurv <- dat_tli$Survival_fruit
  # dat_tli$normFruit <- dat_tli$Seeds
  
  # fit <- lm(dat_tli$normFit~ dat_tli$mFT)
  #  lm(dat_mli$normSurv ~ dat_mli$mFT)
  #  cov(dat_tli$mFT, dat_tli$normFit, use="pairwise.complete.obs")
  #  cov(dat_mli$mFT, dat_mli$normFit, use="pairwise.complete.obs")
  # hist(dat_tli$Fitness)
  
  dat <- rbind(dat_mli, dat_mlp, dat_thi, dat_thp)
  head(dat)
  
  ## linear fit of normalized fitness on mean-centered flowering time (complete cases only)
  fit <- lm(normFit ~ poly(mFT,1), data = na.omit(dat_mlp))
  summary(fit)
  
  pdf(file = 'figs/Floweingtime_Fit_TubMad_polynomial.pdf')
  pfit <- ggplot(dat) + geom_point(aes(x=mFT, y=normFit, color=code),size=1, alpha=0.4) +
     geom_smooth(method="glm",formula = y ~ x, aes(x=mFT, y=normFit, color=code), se = F) +xlim(-40,25) + 
    scale_color_manual(values = c("#d6604d","#b2182b", "#4393c3", "#2166ac")) +
    ylab("lifetime fitness normalized") + xlab("Flowering time (mean centered)")  +
  theme(legend.position="none")
  print(pfit) # explicit print: ggplot objects are not auto-printed inside if(){}
  dev.off()
  
  # library(plyr)
  # models <- dlply(dat, "code", function(df) 
  #   lm(normSurv ~ mFT, data=df))
  # l_ply(models, summary, .print = TRUE)
  # 
  
  l_dat <- dat %>%  filter(code=="mli")
  unique(l_dat$site)
  
  cov(x = scale(l_dat$mFT), y=scale(l_dat$Fitness), method="pearson",
      use = "pairwise.complete.obs")
  cov(x = scale(l_dat$mFT), y=scale(l_dat$Survival_fruit), method="pearson",
      use = "pairwise.complete.obs")
  cov(x = scale(l_dat$mFT), y=scale(l_dat$Seeds), method="pearson",
      use = "pairwise.complete.obs")
  
  l_dat <- dat %>%  filter(code=="thi")
  range(l_dat$normFit)
  cov(x = scale(l_dat$Flowering_time), y=l_dat$normFit, method="pearson",
      use = "pairwise.complete.obs")
  cov(x = scale(l_dat$mFT), y=scale(l_dat$Survival_fruit), method="pearson",
      use = "pairwise.complete.obs")
  cov(x = scale(l_dat$mFT), y=scale(l_dat$Seeds), method="pearson",
      use = "pairwise.complete.obs")

  # fit <- lm(data = thi_dat, Seeds ~ mFT)
  # summary(fit)
  
  pdf(file = 'figs/Floweingtime_Surv_TubMad.pdf')
  psurv <- ggplot(dat) + geom_point(aes(x=mFT, y=normSurv, color=code),size=1 , alpha=0.6) +geom_smooth(method="lm", aes(x=mFT, y=normSurv,   color=code), se = F) +xlim(-40,25) + 
    scale_color_manual(values = c("#d6604d","#b2182b", "#4393c3", "#2166ac")) +
    ylab("Prop. of Surviving replicates") + xlab("Flowering time (mean centered)")  +
  theme(legend.position="none")
  print(psurv) # explicit print: ggplot objects are not auto-printed inside if(){}
  dev.off()

  pdf(file = './figs/Floweingtime_Fruit_TubMad.pdf')
  pfruit <- ggplot(dat) + geom_point(aes(x=mFT, y=Seeds, color=code), size=1, alpha=0.6) +geom_smooth(method="lm", aes(x=mFT, y=Seeds,   color=code), se = F) +xlim(-40,25) + 
    scale_color_manual(values = c("#d6604d","#b2182b", "#4393c3", "#2166ac")) +
    ylab("Seeds set (# of seeds)") + xlab("Flowering time (mean centered)")  +
  theme(legend.position="none")
  print(pfruit) # explicit print: ggplot objects are not auto-printed inside if(){}
  dev.off()
  
  pdf(file="./figs/survival_seeds_scatterplot.pdf", width=3.5, height=6)
  print(plot_grid(psurv, pfruit, ncol=1, nrow=2))
  dev.off()
  
  saveRDS(pfit, file="figs/Floweingtime_Fit_TubMad.rda")
  saveRDS(psurv, file="figs/Floweingtime_Surv_TubMad.rda")
  saveRDS(pfruit, file="figs/Floweingtime_Fruit_TubMad.rda")

  # pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=" ", header = T)
  # t <- data.frame(fit = pheno$rSurvival_fruit_tlp, ft =pheno$FT_thp, code=rep("tlp", length(pheno$rSurvival_fruit_tlp)))
  # m <- data.frame(fit = pheno$rSurvival_fruit_mlp, ft =pheno$FT_mlp, code=rep("mlp", length(pheno$rSurvival_fruit_mlp)))
  # tm <- rbind(t, m)  
  # tm$normFT <- tm$ft - mean(tm$ft)
  # ggplot(tm) + geom_point(aes(x=normFT, y=fit, color=code), alpha=0.6) +geom_smooth(method="lm", aes(x=normFT, y=fit,   color=code), se = F)+
  #    scale_color_manual(values = c("#b2182b", "#2166ac")) +
  #   ylab("Normalized lifetime fitness") + xlab("Flowering time centered (days)")
  # 
  
  head(dat)
  
  all_fits <- matrix(NA, nrow=12, ncol=3)
  all_fits[,1] <- rep(unique(dat$code), each=3)
  all_fits[,2] <- rep(c("Fitness", "Survival_fruit", "Seeds"), 4)
  ps <- c()
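  ## loop variables renamed so they do not shadow base::exp or get overwritten by the model object
  for (env_code in unique(dat$code)){
    tmp_dat <- dat %>%  filter(code==env_code)
    for (resp in c("Fitness", "Survival_fruit", "Seeds")){
      # regress each fitness measure on mean-centered flowering time
      fit <- lm(unlist(tmp_dat[,resp]) ~ tmp_dat$mFT, na.action=na.exclude)
      #attributes(summary(fit))
      #dim(summary(fit)$coefficients)
      # p-value of the flowering-time slope
      ps <- c(ps, summary(fit)$coefficients[2,4])
    }
  }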
  all_fits[,3] <- ps
  all_fits
  
  fit <- lm(unlist(tmp_dat[,"Seeds"]) ~ tmp_dat$mFT, na.action=na.exclude)
  cor.test(unlist(tmp_dat[,"Seeds"]), tmp_dat$mFT)
  summary(fit)
  summary(fit)$coefficients[2,4]
  exp<-"mli"
  
}else{
  pfit <- readRDS(file="./figs/Floweingtime_Fit_TubMad.rda")
  psurv <- readRDS(file="./figs/Floweingtime_Surv_TubMad.rda")
  pfruit <- readRDS(file="./figs/Floweingtime_Fruit_TubMad.rda")
  
  plot_grid( psurv, pfruit, pfit, nrow=1)
}
```


### Flowering time in _A. thaliana_: correlations across multiple common garden and greenhouse studies

In the database of 1862 phenotypes, flowering time is measured many times in different studies and various environments. Previous research has shown that flowering time is fairly heritable, and in our work we obtain heritability estimates of around 0.90. This suggests that flowering time, or at least relative flowering time, should be fairly consistent for accessions across different experiments, and potentially across different environments as well. We therefore measured how strongly flowering time is correlated across the studies that report it. Additionally, we computed the variance in flowering time (FT_variance) for each accession and correlated this variable with the other measures of flowering time (__Fig. SII.1__). 
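
The calculation described above can be sketched on simulated data as follows. The three "studies", their offsets, and their noise levels are hypothetical and only stand in for the real flowering time columns. The sketch scales each study, computes the per-accession variance across studies, and correlates everything.

```r
set.seed(1)
n_acc <- 50

# Hypothetical "true" flowering time per accession, measured in three
# simulated studies with different offsets, slopes, and noise
true_ft <- rnorm(n_acc, mean = 80, sd = 15)
ft <- data.frame(
  study_A = true_ft + rnorm(n_acc, sd = 5),
  study_B = 0.8 * true_ft + 10 + rnorm(n_acc, sd = 5),
  study_C = true_ft - 20 + rnorm(n_acc, sd = 8)
)

# Scale each study to mean 0 and unit variance so studies are comparable
ft_scaled <- scale(ft)

# Per-accession variance of scaled flowering time across studies
ft_variance <- apply(ft_scaled, 1, var)

# Pairwise Pearson correlations among the variance measure and the studies
ft_cor <- cor(cbind(FT_variance = ft_variance, ft_scaled))
round(ft_cor, 2)
```

Scaling first means that studies differing only in units or in a constant offset still correlate strongly, which matches the argument that relative flowering time is the quantity expected to be consistent.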

```{r, echo=F, eval=T, message=F, warning=F, fig.cap="Figure SII.1 Correlation of different flowering time measures across various studies.", fig.height=7, fig.width=7}
RERUN=F 
if(RERUN){
  pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=" ", header = T)

  # load field experiment information to get the 515 accessions
  load("./data/d4.rda")
  head(d4)
  dim(d4)
  idsfield<-unique(d4$id)
  whichfield<-which(pheno$id %in% idsfield)
  idex515<-whichfield
  
  
  # pheno <- read.table(file="./data/atlas1001_phenotypes_matrix.csv", header = T, sep = ",")
  # head(pheno[,1:5])
  #head(raw_pheno)

  colnames(pheno)[grep("F", colnames(pheno))]
  ## Start with target trait categories, go through each category and evaluate best phenotypes to use
  ## Flowering time
  FT_phenos<-c("Flowering_time","FT16","FT10", "FT", "FT.1", "FT.2", "X31_FT10", "X30_FT16" )
  moiFT <- colnames(pheno)[grep("FT_", colnames(pheno))][1:8]
  #dtf <- colnames(pheno)[grep("DTF", colnames(pheno))]
  # ft_atwell_phenos <- c("X32_FT_Duration_GH", 
  #                       "X52_LC_Duration_GH", "X29_FLC",  "X69_FRI", "X58_LFS_GH","X86_FT_Field",  "X49_FT_GH", "X31_FT10", "X30_FT16",   "X102_FT22",  "X89_LD",  "X34_LDV", "X47_SD", "X104_SDV", "X37_0W", "X11_2W", "X100_4W",  "X48_8W",  "X99_8W_GH_FT",  "X53_0W_GH_FT")
  # head(ft_atwell_phenos)
  allflowering <- c(FT_phenos, moiFT)
  length(allflowering)
  ftnames<- c("FT_Manzano-Piedras_2014", "FT16_1001GC_2016", "FT10_1001GC_2016", "FT_Vidigal_2016", "FT_Kooke_2016", "FT_Aranzana_2005","FT10_Atwell_2010", "FT16_Atwell_2010", moiFT)
  
  FTpheno<- pheno[,colnames(pheno) %in% allflowering]
  colnames(FTpheno) <- ftnames
  FT_df <- data.frame(id=pheno$id, FTpheno)
  dim(FT_df)
    
  write.table(FT_df, file="./tables/FTdata_imputed.csv", sep = ",", quote = F, row.names = F,
              col.names = T)
  #sum(colnames(pheno) %in% allflowering)
  
  FTpheno <- apply(FTpheno,2,fn)
  # scale all traits by mean and variance
  FTpheno <- apply(FTpheno,2,scale)
  head(FTpheno)
  ## add in variance in flowering time from all of these measures
  FTvar <- apply(FTpheno, MARGIN = 1, FUN = var)
  FTvariance_df <- data.frame(id =pheno[,1], FT_variance = FTvar, FTpheno)
 head(FTvariance_df)
  
  write.table(FTvariance_df, file="./tables/FTdata&variance_imputed.csv", sep = ",", quote = F, row.names = F,
              col.names = T)
  saveRDS(FTvariance_df, file="./data/FTvariance_df.rda")
  
  FTcor<-cor(FTvariance_df[,-1])
  corrplot::corrplot(FTcor, method = "color", type = "lower", diag = F,tl.cex = .75)
  saveRDS(FTcor, file="./data/FTcor.rda")
  
  pdf(file="./figs/FTcorplot.pdf")
  corrplot::corrplot(FTcor, method = "color", type = "lower", diag = F,tl.cex = .75)
  dev.off()
    
}else{
  FTcor <- readRDS(file="./data/FTcor.rda")
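  ## namespace-qualified call, so the chunk does not depend on library(corrplot) being attached
  corrplot::corrplot(FTcor, method = "color", type = "lower", diag = F,tl.cex = .75, addCoef.col = "black", number.cex = 0.75)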
}

```

################################################################################
## II.2 Total Phenotypic Selection
################################################################################

In addition to measuring selection on flowering time (Fig. 2A,B,C), we also measured the total selective effect on all non-fitness phenotypes in this study (1821). To make the measures of selection comparable, all fitness measures and phenotypes were scaled before calculating their covariance. Fitness was measured in two different environments, Madrid, Spain and Tubingen, Germany, under two rainfall treatments at each location (high and low rain) and two planting densities (high and low). This yields eight treatment combinations, with three fitness measures for each: survival, seed set, and lifetime fitness. All were used to test for selection on every non-fitness phenotype.

To compute phenotypic selection coefficients s (Lande and Arnold 1983), we use the covariance between relative lifetime fitness w (number of offspring, including 0s, divided by the mean of the population) and a given phenotype z (centered to mean 0 and scaled to unit variance): s = cov(w, z), where s is the total selection coefficient. Because z has unit variance, s also equals the slope of the regression w = a + sz. We computed s with 100 bootstrap replicates for each phenotype and considered an association with selection significant when the 95% CI of the estimate did not overlap 0.
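
A minimal base-R sketch of this calculation on simulated data is given below. The phenotype, the fitness model, and all variable names are hypothetical, and the sketch does not use the `PHENOSELECTION` helper called in the analysis chunk. The interval is a normal approximation built from the bootstrap standard error, which is one of several ways to form a bootstrap CI.

```r
set.seed(42)
n <- 200

# Hypothetical data: one phenotype and absolute fitness (offspring counts, incl. 0s)
pheno   <- rnorm(n, mean = 60, sd = 10)
fitness <- rpois(n, lambda = exp(1.5 - 0.03 * (pheno - 60)))

w <- fitness / mean(fitness)   # relative fitness, mean 1
z <- as.numeric(scale(pheno))  # phenotype, mean 0 and unit variance

# Total selection coefficient: covariance between relative fitness and phenotype
s_hat <- cov(w, z, use = "pairwise.complete.obs")

# Nonparametric bootstrap: resample accessions with replacement
n_boot <- 100
s_boot <- replicate(n_boot, {
  idx <- sample.int(n, replace = TRUE)
  cov(w[idx], z[idx], use = "pairwise.complete.obs")
})

# Normal-approximation 95% CI from the bootstrap standard error
se <- sd(s_boot)
ci <- c(lower = s_hat - 1.96 * se, upper = s_hat + 1.96 * se)

# Significant if the interval excludes 0
significant <- unname(ci["lower"] > 0 || ci["upper"] < 0)

c(s = s_hat, ci)
```

Because fitness is simulated to decline with the phenotype, the estimated s is negative here, which would be read as selection favoring lower trait values.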

```{r, echo=F, eval=F, message=F, warning=F}
RERUN=F 
if(RERUN){
  ## this chunk can also be run independently from natvar/analyses/runSelection.R
  source("~/safedata/natvar/analyses/phenoselection_multi_FUNCTIONS-copy.R")
  ## load datasets
  atlasstrategies<-read.table("./data/pheno_fromgoogle.tsv",header = T)
  atlasstrategies %>% filter(phenotype=="FT")
  pheno <- read.table(file = './data/atlas1001_phenotypes_matrix.csv', sep=",", header = T)
  pheno[1:5,1:5]
  load("./data/phenotypenames.rda")
  
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  whichfield<-which(pheno$V1 %in% idsfield)
  idex515<-whichfield
  
  ## removed duplicated phenotype from Martinez-berdeja and Kalladan
  pheno <- pheno[,c(1:1706, 1726:1883)] 
  pheno<- as.data.frame(pheno)
  pheno[1:5,1:5]
  phenotypenames <- phenotypenames[c(1:1705,1725:1882 ), ]
  p<- pheno[idex515,-1]
  colnames(p) <- paste(phenotypenames[,2])
  p<- cbind(id=pheno[idex515,1], p)
  p[1:5,1:5]
  
  # p_FT <- data.frame(id=p$id, FT_mli=p$FT_mli)
  # dat_FT <- data.frame(id=l_dat$id, FT_mli=l_dat$Flowering_time)
  # combine_FT <- merge(p_FT, dat_FT, by="id")
  # combine_FT
  # cbind(p$id, p$FT_mli, l_dat$id, l_dat$mFT)
  
  ## add in variance in Flowering time
  FTvariance_df <- readRDS(file="./data/FTvariance_df.rda")
  p[,ncol(p)+1] <- FTvariance_df$FT_variance
  dim(p)
  colnames(p)[1865] <- "var_ft"
  colnames(p)[1860:1865]
  
  pcaTarget <- readRDS(file="./data/pcaTarget.rda")
  pcaTarget
  # p <- cbind(p[,1], pcaTarget$x[,1:20])
  # head(p)
  
  # head(d4)
  # filed_mCent_fitness <- data.frame(na=rep(NA, 517))
  # for (cd in unique(d4$code)){
  #   tmp <- d4 %>% filter(code==cd)
  #   tmp <- tmp %>% mutate(mFT=Flowering_time-mean(na.omit(Flowering_time))) %>% 
  #     mutate(mSurvival_fruit=Survival_fruit-mean(na.omit(Survival_fruit))) %>% 
  #     mutate(mSeeds=Seeds-mean(na.omit(Seeds))) %>% 
  #     mutate(mFitness=Fitness-mean(na.omit(Fitness)))
  # 
  #   filed_mCent_fitness <- data.frame(filed_mCent_fitness, tmp$mSurvival_fruit,
  #                               tmp$mSeeds,
  #                               tmp$mFitness)
  #   print(cd)
  # }

  
  ## separate all fitness data from phenotypes
  ## data from Exposito-Alonso, Fournier-Level et al. 2011, Manzano-Piedras et al. 2014 field experiments and other phenotype data
  fitness <- colnames(p)[grep("Fitness", colnames(p))]
  survival <- c(colnames(p)[grep("Survival", colnames(p))][1:15], colnames(p)[grep("survival", colnames(p))][1])
  seeds <- colnames(p)[grep("Seeds", colnames(p))]
  #insitu<- colnames(p)[grep("insitu",colnames(p))]
  
  id <- grep("FT_", colnames(p))[1:8]
  p[,id][p[, id] == -9]  <- NA
  for (k in colnames(p)){
    p[,k][p[, k] == -9] <- NA
  }
  sum(p==-9, na.rm=T)
  
  all_fitness <- p[,c(fitness, survival, seeds)]
  for (k in colnames(all_fitness)){
    all_fitness[,k][all_fitness[, k] == -9] <- NA
  }
  sum(all_fitness==-9, na.rm=T)
  dim(all_fitness)
1823+41
  all_fitness$Seeds_Spain


  justphenos <- p[,!colnames(p) %in% c(fitness, survival, seeds)]
  id <- justphenos[,1]
  justphenos <- justphenos[,-1]
  
  #dim(justphenos)
  justphenos <- apply(justphenos, 2, fn)
  justphenos <- apply(justphenos, 2, scale)
  justphenos[1:5,1:5]
  justphenos <- cbind(justphenos, pcaTarget$x[,1:20])
  dim(justphenos)
  colnames(justphenos[,1820:1843])
  
  lsel<-c()
  #trait <- "rFitness_mli"
  for(i in c("rFitness","rSurvival_fruit","rSeeds")){
   for(j in c("mlp","mli","mhi","mhp", "tlp","tli","thi","thp")){
     trait<-paste0(i,"_",j)
     print(trait)
     w=all_fitness[,trait]/mean(all_fitness[,trait], na.rm=T)
     z=justphenos
     d1<-preparedata(w,z)
     res<-PHENOSELECTION(Variables=z, Fitness=w, replicates = 10)
     # tmp1<-do.call(cbind,parseformatted(res$gradient_linear)) %>% data.frame
     # tmp1$param<-"beta"
     tmp2<-do.call(cbind,parseformatted(res$coefficient_linear)) %>% data.frame
     tmp2$param<-"s"
     print(tmp2)
     sel<-tmp2 
     sel$mean<-fn(sel$mean)
     sel$se<-fn(sel$se)
     sel$lower=sel$mean - sel$se *1.96
     sel$upper=sel$mean + sel$se *1.96
     sel$trait<- colnames(justphenos)
     sel$fitness<-i
     sel$env<-j
     lsel<- rbind(lsel, sel)
    }
  }
  
  ## bootstrap statistic: covariance of every other column with column 1 of the resampled data
  coeflinear<- function(data, indices) {
    d1 <- data[indices,]
    #P<-cov(d1[,-1])
    #Pinv<-solve(P)
    s<- cov(d1, use="pairwise.complete.obs")[-1,1]
    #B<-Pinv%*%s
    return(s)
  }
  

  #saveRDS(lsel, file="SelectionRepResults_AllPhenos.rda")
  saveRDS(lsel, file="./data/SelectionRepResults_AllPhenos_112122_100bsreps.rda")
 
  ## the other fitness measures
  
  colnames(all_fitness)
  other_fitness <- all_fitness[,c(9:13, 22:29, 38:41 )]
  colnames(other_fitness)
  lsel_other<-c()
  #trait <- "rFitness_mli"
  for(i in colnames(other_fitness)){
     trait<-i
     print(trait)
     w=other_fitness[,trait]/mean(other_fitness[,trait], na.rm=T)
     z=justphenos
     d1<-preparedata(w,z)
     res<-PHENOSELECTION(Variables=z, Fitness=w, replicates = 10)
     # tmp1<-do.call(cbind,parseformatted(res$gradient_linear)) %>% data.frame
     # tmp1$param<-"beta"
     tmp2<-do.call(cbind,parseformatted(res$coefficient_linear)) %>% data.frame
     tmp2$param<-"s"
     print(tmp2)
     sel<-tmp2 
     sel$mean<-fn(sel$mean)
     sel$se<-fn(sel$se)
     sel$lower=sel$mean - sel$se *1.96
     sel$upper=sel$mean + sel$se *1.96
     sel$trait<- colnames(justphenos)
     sel$fitness<-i
     #sel$env<-j
     lsel_other<- rbind(lsel_other, sel)
    
  }
  
   saveRDS(lsel_other, file="./data/SelectionRepResults_OtherPhenos_112122_100bsreps.rda")
  
  
  res<-PHENOSELECTION(Variables=justphenos[,"FT_thi"], Fitness=all_fitness[,trait], replicates = 50)
  s<- cov(d1, use="pairwise.complete.obs")[-1,1]
  
  cov(y=all_fitness[,trait]/mean(all_fitness[,trait], na.rm=T), x=justphenos[,"FT_thi"]-mean(justphenos[,"FT_thi"], na.rm=T), use="pairwise.complete.obs")
  dim(d1)
  
  cbind(justphenos[,"FT_thi"], l_dat$Flowering_time)
  cbind(justphenos[,"FT_thi"]-mean(justphenos[,"FT_thi"], na.rm=T), l_dat$mFT)
  hist(justphenos[,"FT_thi"]/mean(justphenos[,"FT_thi"], na.rm=T))
  hist(l_dat$mFT)
       
  
  cbind(all_fitness[,"rFitness_thi"], l_dat$normFit)
    hist(all_fitness[,"rFitness_mlp"])
  hist(all_fitness[,"rFitness_mlp"]/mean(all_fitness[,"rFitness_mlp"], na.rm=T))
  hist(l_dat$normFit)
  
  
  justphenos[,1823:1843]
  cov(justphenos[,1824], all_fitness[,"rFitness_mlp"], use = "pairwise.complete.obs")
  justphenos
  
  l_dat <- dat %>%  filter(code=="thi")
  cov(x = l_dat$mFT, y=l_dat$normFit, method="pearson",
      use = "pairwise.complete.obs")
  cov(x = justphenos[,"FT_thi"]-mean(justphenos[,"FT_thi"], na.rm=T), y=all_fitness[,"rFitness_thi"], 
      method="pearson",
      use = "pairwise.complete.obs")
  

}else{
 
 lsel<- readRDS(file="./data/SelectionRepResults_AllPhenos_112122_100bsreps.rda")
 head(lsel)
 dim(lsel)
 
 lsel$trait[grep("PC", x=lsel$trait)]
 
 write.table(lsel, file="./tables/TotalSel_AllPhenotypes.tsv", sep="\t", quote = F, row.names = F, col.names = T)
 
 # 
 # lsel %>%  filter(env=="mli"|env=="mlp") %>%
 #   filter(trait=="FT_mli"|trait=="FT_mlp") %>%
 #   filter(fitness=="rFitness")
 #   #filter(signi!="ns")
 # 
 #  lsel %>%  filter(env=="thi"|env=="thp") %>%
 #   filter(trait=="FT_thi"|trait=="FT_thp") %>%
 #   filter(fitness=="rFitness")
 #   #filter(signi!="ns")
 # 
 #  lsel %>%  filter(env=="mli"|env=="mlp") %>%
 #   filter(trait=="FT_thi"|trait=="FT_thp")
 #   #filter(signi!="ns")
 # 
 # lsel %>%  filter(env=="mli"|env=="mlp") %>%
 #   filter(trait=="Flowering_time")
 # 
 #  lsel %>%  filter(env=="mli"|env=="mlp") %>%
 #   filter(trait=="FT.2")
 # 
 #  colnames(justphenos)[duplicated(colnames(justphenos))]


 Signif_vec <- c()
 names_vec <- c()
 fitnes_name <- c()
  for(i in c("rFitness","rSurvival_fruit","rSeeds")){
    for(j in c("mlp","mli","mhi","mhp", "tlp","tli","thi","thp" )){
      trait<-paste0(i,"_",j)


      tmp <- lsel %>% filter(fitness==i, env==j)
      significant_num <- sum(tmp$signi=="***" | tmp$signi=="**")
      names <- tmp$trait[tmp$signi=="***" | tmp$signi=="**"]

      # significant_num <- sum(tmp$signi!="ns")
      # names <- tmp$trait[tmp$signi!="ns"]

      fitnes_name <- c(fitnes_name, trait)
      Signif_vec <- c(Signif_vec, significant_num)
      names_vec <- c(names_vec, names)
    }
  }


  Signif_vec/ 1843
  col <- rep("madrid", length(fitnes_name))
  col[grep("_t", fitnes_name)] <- "tubing"

  df <- data.frame(sig = Signif_vec,
                   prop = Signif_vec/1826,
             fit = fitnes_name,
             col = col)
  colnames(df) <- c("Nsignif", "Prop.signif.", "fitness", "loc")
  saveRDS(df, file="./data/PhenotypeSelectionSummary.rda")
  
  
  df <- readRDS(file="./data/PhenotypeSelectionSummary.rda")
 
  tub_v_mad_selcoef <- ggplot(df) + geom_histogram(aes( x=`Nsignif`, fill=loc), bins = 10) +
    scale_fill_manual(values=c( '#e41a1c',"#234BC5"))
  saveRDS(tub_v_mad_selcoef, file="./figs/tmpobjects/tub_v_mad_selcoef.rda")
  # p <- df %>% filter(env=="tubing")
  # sum(p[,1])
  
  #hist(sort(table(names_vec), decreasing = T)[1:50])
  knitr::kable(df, col.names = colnames(df), caption = "Table SII.1 Summary of significant total selection coefficients measured for 1823 traits (plus 20 PCs) using 3 fitness traits, measured on individuals and populations, across 4 environments.")
  
  ## sort the most associated phenotypes with selection
  #sort(table(names_vec), decreasing = T)[1:50]

  # head(justphenos[,1824:1843])
  # 
  # ## there are two PC1 phenotypes (whoops, one is from diff. study. Remove it from summary)
  # pc_sel <- lsel %>% filter(trait=="PC1")
  # dim(pc_sel)/2
  # sq <- seq(2, nrow(pc_sel), by=2)
  # pc_sel <- pc_sel[sq,]
  # pc_sel %>% filter(signi!="ns")
  # 
  # # install.packages("RVAideMemoire")
  # # library(RVAideMemoire)
  # cor.test(justphenos[,1824], all_fitness[,"rSeeds_mlp"], method = "pearson")
  # 
  # pc2_sel <- lsel %>% filter(trait=="PC2")
  # dim(pc2_sel)/2
  # sq <- seq(2, nrow(pc2_sel), by=2)
  # pc2_sel <- pc2_sel[sq,]
  # pc2_sel %>% filter(signi!="ns")
  # 
  # 
  # pcs_sel_response <- rbind(pc_sel, pc2_sel)
  # dim(pcs_sel_response)
  # write.table(pcs_sel_response, file="./tables/PC1_2_selectionResponse.tsv",
  #             quote = F, row.names = F, col.names = T, sep="\t")
  # 
  # ## just some checking on the covariance measured
  # which(colnames(justphenos)=="PC1")
  # justphenos[,670:680]
  # justphenos[,colnames(justphenos)=="PC1"]
  # smp <- sample(seq(1,nrow(justphenos)), size = 300)
  # cov(justphenos[smp,1824], all_fitness[smp,"rFitness_mlp"]/mean(all_fitness[smp,"rFitness_mlp"], na.rm=T), use = "pairwise.complete.obs")
  # dim(all_fitness)
  # 
  ## look at flowering time phenotypes and their associations with fitness
  flw_phenos <- c("Flowering_time", "FT16", "FT10", "DTflower", "DTfruit", "DTbolt", "DTF_Oct_2040", "FT")
  ft_sel <- lsel %>% filter(trait %in% flw_phenos) %>%
    filter(fitness=="rSeeds") %>% 
    filter(signi!="ns")

  # write.table(ft_sel, file="./tables/FTphenotypesSignif_selectionResponse.tsv",
  #             quote = F, row.names = F, col.names = T, sep="\t")
  # 
  # expo_ft <- c("FT_tli", "FT_mhi", "FT_mli", "FT_thi", "FT_thp", "FT_tlp", "FT_mlp", "FT_mhp")
  # expo_ft_sel <- lsel %>% filter(trait %in% expo_ft)
  # expo_ft_sel$trait <- gsub(pattern = "FT_", x=expo_ft_sel$trait, replacement = "")
  # 
  # expo_ft_sel <- expo_ft_sel[expo_ft_sel$trait == expo_ft_sel$env,]
  # expo_ft_sel$trait <- gsub(pattern = "t", x=expo_ft_sel$trait, replacement = "FT_t")
  # expo_ft_sel$trait <- gsub(pattern = "m", x=expo_ft_sel$trait, replacement = "FT_m")
  # dim(expo_ft_sel)
  # write.table(expo_ft_sel, file="./tables/ExpAlo_FTphenotypes_selectionResponse.tsv",
  #             quote = F, row.names = F, col.names = T, sep="\t")
  # 
  # 
  # cor.test(justphenos[,"FT_mli"], all_fitness[,"rSeeds_mli"], use = "pairwise.complete.obs")
  # ft_sel[,1] < 0
  # ft_sel[which(ft_sel[,1]>0),]
  # 
  #  d<-read.table("./data/pheno_fromgoogle.tsv",header = T)
  # d[grep("leaf", d$phenotype),]
  # # "FT"
  # # "125_DTFmainEffect2009"
  # 
  # sort(table(names_vec), decreasing = T)[1:80]
  # ### look at Growth rate relate phenotypes
  # ## rosette dry mass and growth rate
  # gr_phens <- c("rosette_DM", "Growth_rate", "RGR")
  # gr_sel <- lsel %>% filter(trait %in% gr_phens) %>%
  #   filter(signi!="ns")
  # cor.test(justphenos[,"RGR"], all_fitness[,"rSeeds_mlp"], use = "pairwise.complete.obs")
  # 
  # rgr_phens <- c("RGR")
  # rgr_sel <- lsel %>% filter(trait %in% rgr_phens) %>%
  #   filter(signi!="ns")
  # 
  # write.table(gr_sel, file="./tables/GRphenotypes_selectionResponse.tsv",
  #              quote = F, row.names = F, col.names = T, sep="\t")
  # 
  # ## look at dormancy related traits
  # dorm_phens <- c("DSDS50", "DSDS50.1")
  # dorm_sel <- lsel %>% filter(trait %in% dorm_phens) %>%
  #   filter(signi!="ns")
  # dorm_sel
  # 
  # germ_phens <- c("base_perc")
  # germ_sel <- lsel %>% filter(trait %in% germ_phens) %>%
  #   filter(signi!="ns")
  # germ_sel
  # 
  # dorm_germ_sel <- rbind(dorm_sel, germ_sel)
  # write.table(dorm_germ_sel, file="./tables/DormPerc_phenotypes_selectionResponse.tsv",
  #              quote = F, row.names = F, col.names = T, sep="\t")
  # 
  # dcar_pheno <- lsel %>% filter(trait=="Delta_13C") 
  # dcar_pheno
  # write.table(dcar_pheno, file="./tables/DeltaC13_phenotypes_selectionResponse.tsv",
  #              quote = F, row.names = F, col.names = T, sep="\t")
  # 
  # leaf <-  c("First_leaf_area", "leafsize")
  #  leaf_pheno <- lsel %>% filter(trait%in%leaf) %>%
  #           filter(signi!="ns")
  # 
  # 

}
 
```  

```{r, echo=F, eval=F, warning=F, message=F}
  df <- readRDS(file="./data/PhenotypeSelectionSummary.rda")
  #write.table(df, file="./tables/PhenotypeSelectionSummary.tsv", sep="\t", quote = F, row.names = F, col.names = T)
  knitr::kable(df, col.names = colnames(df), caption = "Table SII.1 Summary of significant total selection coefficients measured for 1823 traits (plus 20 PCs) using 3 fitness traits, measured on individuals and populations, across 4 environments.", fixed_thread=T )
```


```{r, echo=F, eval=T, warning=F, message=F, fig.cap="Histogram of the number of phenotypes with a significant association with a fitness measure for each of the 24 fitness phenotypes; for which there are three fitness phenotypes across two environments (Tubingen and Madrid; t/m), two rainfall treatments (high and low; h/l) and two planting densities (individual and population; i/p)."}
## signif selection by env.
tub_v_mad_selcoef <- readRDS(file="./figs/tmpobjects/tub_v_mad_selcoef.rda")
tub_v_mad_selcoef
```


```{r, echo=F, eval=F, warning=F, message=F}
## PC 1 and 2 TOTAL SELECTION TABLE
RERUN=F 
if(RERUN){
}else{
  pcs_sel_response <- read.table(file="./tables/PC1_2_selectionResponse.tsv",
              header = T, sep="\t")
  knitr::kable(pcs_sel_response, caption = "Table SII.2 Total selection coefficients measured for PC axes 1 and 2 with all fitness traits, measured across two environments (Tubingen and Madrid; t/m), two rainfall treatments (high and low; h/l) and two planting densities (individual and population; i/p).", fixed_thread=T )
}
```

```{r, echo=F, eval=F, warning=F, message=F}
## Exposito-Alonso et al. 2019, FLOWERING TIME TOTAL SELECTION TABLE
RERUN=F 
if(RERUN){
}else{
    expo_ft_sel <- read.table(file="./tables/ExpAlo_FTphenotypes_selectionResponse.tsv",
              header = T, sep="\t")
  
  knitr::kable(expo_ft_sel, caption = "Table SII.3 Total selection coefficients measured for flowering time traits from Exposito-Alonso et al. (2019) with 3 fitness traits, measured across two environments (Tubingen and Madrid; t/m), two rainfall treatments (high and low; h/l) and two planting densities (individual and population; i/p).", fixed_thread=T )
}
```

```{r, echo=F, eval=F, warning=F, message=F}
## FLOWERING TIME TOTAL SELECTION TABLE
RERUN=F 
if(RERUN){
}else{
  ft_sel <- read.table(file="./tables/FTphenotypesSignif_selectionResponse.tsv",
              header = T, sep="\t")
  knitr::kable(ft_sel, caption = "Table SII.4 Significant total selection coefficients measured for flowering time traits from various other studies associated with 3 fitness traits from Exposito-Alonso et al. 2019, measured across two environments (Tubingen and Madrid; t/m), two rainfall treatments (high and low; h/l) and two planting densities (individual and population; i/p).", fixed_thread=T )
}
```

```{r, echo=F, eval=F, warning=F, message=F}
## GROWTH RATE RELATED TRAITS TOTAL SELECTION TABLE
RERUN=F 
if(RERUN){
}else{
   gr_sel <- read.table(file="./tables/GRphenotypes_selectionResponse.tsv",
               header=T, sep="\t")
   knitr::kable(gr_sel, caption = "Table SII.5 Significant total selection coefficients measured for growth rate related traits associated with 3 fitness traits measured across two environments (Tubingen and Madrid; t/m), two rainfall treatments (high and low; h/l) and two planting densities (individual and population; i/p).", fixed_thread=T )
}
```


```{r, echo=F, eval=F, warning=F, message=F}
## DORMANCY TOTAL SELECTION TABLE
RERUN=F 
if(RERUN){
}else{
   dorm_germ_sel <- read.table(file="./tables/DormPerc_phenotypes_selectionResponse.tsv",
               header = T, sep="\t")
   knitr::kable(dorm_germ_sel, caption = "Table SII.6 Significant total selection coefficients measured for primary and secondary dormancy related traits associated with 3 fitness traits measured across two environments (Tubingen and Madrid; t/m), two rainfall treatments (high and low; h/l) and two planting densities (individual and population; i/p).", fixed_thread=T )
}
```

```{r, echo=F, eval=F, warning=F, message=F}
## DELTA_C13 TOTAL SELECTION TABLE
RERUN=F 
if(RERUN){
}else{
   dcar_pheno <- read.table(file="./tables/DeltaC13_phenotypes_selectionResponse.tsv",
               header = T, sep="\t")
   knitr::kable(dcar_pheno, caption = "Table SII.7 Total selection coefficients measured for Delta_C13 associated with 3 fitness traits measured across two environments (Tubingen and Madrid; t/m), two rainfall treatments (high and low; h/l) and two planting densities (individual and population; i/p).", fixed_thread=T )
}
```

PC1 had the most significant associations with lifetime fitness: in all environments PC1 was under negative selection (Table SII.2), indicating that the escape strategy is favored by natural selection. Importantly, in the dry-hot environment fitness was negatively associated with PC1 ($s_{lf} = -0.152$, $p\text{-value}_{boot} < 1.0 \times 10^{-3}$), meaning that fitness was highest for accessions that were early flowering ($s_{lf} = -0.110$, $p\text{-value}_{boot} < 1.0 \times 10^{-3}$; Tables SII.3-SII.4), fast growing ($s_{lf} = -0.176$, $p\text{-value}_{boot} < 1.0 \times 10^{-3}$; Table SII.5), had high seed dormancy ($s_{lf} = 0.129$, $p\text{-value}_{boot} < 5.0 \times 10^{-2}$; Table SII.6), and had low water use efficiency (WUE; $s_{lf} = -0.110$, ns; Table SII.7), consistent with the individual phenotypes' associations with lifetime fitness. Overall, selection evaluated with lifetime fitness favored the escape strategy in all environments, supporting previous observations from common gardens of _A. thaliana_ [24], including additional associations from reanalyzed common garden fitness data [46, 47] and flowering time (Table SII.4).

When using survival as the fitness trait, we found the same pattern across the board: PC1 was generally under negative selection (dry-hot $s_{s} = -0.214$, $p\text{-value}_{boot} < 1.0 \times 10^{-3}$), along with the associated phenotypes (Tables SII.4-SII.7). Only when considering fecundity in the dry-hot environment did we see positive selection for PC1 ($s_{f} = 0.109$, $p\text{-value}_{boot} < 5 \times 10^{-2}$; Table SII.2). As expected, this pattern also holds for the phenotypes that make up the primary variation on PC1: late flowering ($s_{f} = 0.078$, ns; Pearson's $r = 0.104$, $p\text{-value} = 3.2 \times 10^{-2}$), slow relative growth rate ($s_{f} = -0.142$, $p\text{-value}_{boot} < 1 \times 10^{-2}$; Table SII.5), low dormancy ($s_{f} = -0.130$, $p\text{-value}_{boot} < 1.0 \times 10^{-3}$; Table SII.6), and high WUE ($s_{f} = 0.133$, ns; Table SII.7) are associated with high seed number, indicating selection for the avoidance strategy.

Shifts in the sign of natural selection between lifetime fitness and survival on the one hand and fecundity on the other led us to suspect a selection conflict between multiple strategies. From first principles, we expected natural selection to favor the avoidance strategy, expressed in traits such as WUE, in water-limited, overwintering common garden experiments.


################################################################################
## II.3 Multivariate Phenotype Selection
################################################################################

Because ecophysiological strategies are syndromes of many correlated traits, identifying the true targets of natural selection requires accounting for the correlations among the 12 main traits (Fig. 1A). We do so using $\beta = P^{-1}s$, where $\beta$ is the vector of selection gradients (direct selection), $P$ is the phenotypic variance-covariance matrix, and $s$ is the vector of total selection coefficients (Lande and Arnold 1983). In this way, we extract the independent effect of each trait on fitness, that is, the part of selection that cannot be attributed to its correlations with the other traits.
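
The relationship between total and direct selection can be illustrated with a minimal sketch on simulated data. Everything below (trait names, effect sizes, sample size) is hypothetical and chosen for illustration only; it is not the `PHENOSELECTION` routine used for the analyses in this appendix. The sketch assumes standardized traits and relative fitness, and shows that the gradient obtained from the variance-covariance matrix matches the partial regression coefficients of relative fitness on all traits.

```r
# Simulated example (hypothetical traits and effect sizes)
set.seed(1)
n <- 500

# Two correlated traits, then standardized to mean 0 and sd 1
z1 <- rnorm(n)
z2 <- 0.7 * z1 + sqrt(1 - 0.7^2) * rnorm(n)
Z  <- scale(cbind(trait1 = z1, trait2 = z2))

# Absolute fitness depends directly on trait1 only;
# trait2 is associated with fitness solely through its correlation with trait1
W <- 5 + 0.5 * Z[, "trait1"] + rnorm(n, sd = 0.5)
w <- W / mean(W)            # relative fitness

P    <- cov(Z)              # phenotypic variance-covariance matrix
s    <- cov(Z, w)           # total selection on each trait
beta <- solve(P, s)         # direct selection: beta = P^{-1} s

# Same quantity obtained as partial regression coefficients
beta_lm <- coef(lm(w ~ Z))[-1]
```

In this toy setting `s` is clearly non-zero for both traits, whereas `beta` for `trait2` is close to zero, which is the distinction between total and direct selection drawn in the text.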

```{r, echo=F, eval=F, warning=F, message=F}
RERUN=F 
if(RERUN){
}else{
  Pheno_Fitnes_Overlap<- read.table(file="./tables/PhenotypeFitnessOverlap.csv" ,sep=",", header = T,  row.names = 1)
  colnames(Pheno_Fitnes_Overlap) <- c("fitness_ExpAl2019", "FT", "dorm", "RootRGR", "RHindex", "DelC13", "StomDens", "StomIndex", "VernGrow", "Germ", "GR", "RGR", "ABA")
  
  knitr::kable(Pheno_Fitnes_Overlap[,1:7], caption = "Table SII.8 Phenotype sample overlap with corresponding accession fitness data; numbers indicate how many individuals with fitness data also have raw phenotype data; Part 1.", fixed_thread=T )
}
```

```{r, echo=F, eval=F, warning=F, message=F}
RERUN=F 
if(RERUN){
}else{
  Pheno_Fitnes_Overlap<- read.table(file="./tables/PhenotypeFitnessOverlap.csv" ,sep=",", header = T,  row.names = 1)
  colnames(Pheno_Fitnes_Overlap) <- c("fitness_ExpAl2019", "FT", "dorm", "RootRGR", "RHindex", "DelC13", "StomDens", "StomIndex", "VernGrow", "Germ", "GR", "RGR", "ABA")
  
  knitr::kable(Pheno_Fitnes_Overlap[,c(1,8:13)], caption = "Table SII.8 Phenotype sample overlap with corresponding accession fitness data; numbers indicate how many individuals with fitness data also have raw phenotype data; Part 2.", fixed_thread=T )
}
```

#### Selection analysis on 12 targeted traits 

```{r, echo=F, eval=T,message=F, warning=F, fig.cap="Figure SII.3 Multivariate selection analysis for 12 focal traits with dry-hot, cool-wet, and high/low planting density fitness data from Exposito-Alonso et al. 2019.", fig.height=7, fig.width=8.5}
RERUN=F 
if(RERUN){
  source("./analyses/phenoselection_multi_FUNCTIONS-copy.R")
  atlasstrategies<-read.table("./data/pheno_fromgoogle.tsv",header = T)
  pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=" ", header = T)
  #pheno <- read.table(file = './data/atlas1001_phenotypes_matrix_MR.csv', sep=",", header = T)
  dim(pheno)
  lsel<-list()
  # load field experiment information to get the 515 accessions
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  whichfield<-which(pheno$id %in% idsfield)
  idex515<-whichfield
  
  set.seed(0)
  #colnames(pheno)[grep("Vern",  colnames(pheno))]
  df <- pheno[idex515,colnames(pheno) %in% c("ABA_96h_low_water_potential", 
                                            #"Growth_rate", 
                                            "Delta_13C", 
                                            "DSDS10", 
                                            #"Stomatal_index_in_first_leaf", ## only 55
                                            "stomata_density",
                                            "stomatasize",
                                            "FT16",  
                                            "d8_10C_perc", 
                                            "RGR", 
                                            #"rhamnose_1_exp2", 
                                            "Root_horizontal_index_day001",
                                            "Relative_root_growth_rate_day002.day003")]
                                            #"First_leaf_area", 
                                            #"X72_Vern_Growth"
                                            #"X34_LDV")]
    
  se <- 515-colSums(is.na(df))
  summary(se/515)
  ## check raw phenotype database to see which have a good number of samples in 515 accessions
  head(df)
  
  # ## add in variance in Flowering time / nevermind, not really under selection, maybe with seeds
  # FTvariance_df <- readRDS(file="./data/FTvariance_df.rda")
  # df$varianceFT <- FTvariance_df$FT_variance
  # head(df)
  
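  ## rename by source column name rather than by position: the `%in%` subset
  ## above keeps the column order of `pheno`, not the order listed in c(...)
  ## (mapping inferred from the trait names; verify against the phenotype matrix)
  trait_names <- c(FT16 = "FloweringTime", DSDS10 = "Dormancy",
                   Relative_root_growth_rate_day002.day003 = "RootRGR",
                   Root_horizontal_index_day001 = "RootHorizIndex",
                   Delta_13C = "Delta13C", stomata_density = "StomataDensity",
                   stomatasize = "StomataSize", d8_10C_perc = "GerminationPerc",
                   RGR = "RGR", ABA_96h_low_water_potential = "ABA")
  colnames(df) <- unname(trait_names[colnames(df)])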
  #df <- apply(df,2,fn)
  df <- apply(df,2,scale)
  dim(df)
  head(df)
  
  fitness <- colnames(pheno)[grep("Fitness", colnames(pheno))][1:8]
  survival <- c(colnames(pheno)[grep("Survival", colnames(pheno))][1:12], colnames(pheno)[grep("survival", colnames(pheno))][1])[1:8]
  seeds <- colnames(pheno)[grep("Seeds", colnames(pheno))][1:8]

  # fitness <- colnames(pheno)[grep("Fitness", colnames(pheno))]
  # survival <- c(colnames(pheno)[grep("Survival", colnames(pheno))], colnames(pheno)[grep("survival", colnames(pheno))][1])
  # seeds <- colnames(pheno)[grep("Seeds", colnames(pheno))]
  all_fitness <- c(fitness, survival, seeds)
  
  ## get fitness data from the raw phenotype data, so no imputed fitness. Use imputed phenotypes though
  pheno2 <- read.table(file = './data/atlas1001_phenotypes_matrix_MR.csv', sep=",", header = T)
  dim(pheno2)
  
    all_fitness <- pheno2[,c(fitness, survival, seeds)]
  for (k in colnames(all_fitness)){
    all_fitness[,k][all_fitness[, k] == -9] <- NA
  }
  sum(all_fitness==-9, na.rm=T)
  
  # 
  # ## get the number of raw phenotypes covering the fitness data
  # Pheno_Fitnes_Overlap <- matrix(NA, nrow=length(all_fitness), ncol=13)
  # i<-1
  #   for (trait in all_fitness) {
  #     #trait<-paste0(i,"_",j)
  #     w=pheno2[,trait]/mean(pheno2[,trait], na.rm=T)
  #     print(trait)
  #     #w=relativefitness(atlas[,trait][idex515])
  #     z=df
  #     d1<-preparedata(w,z)
  #     d1 <- d1[!is.na(d1[,1]),]
  #     nrow(d1)
  #     dat <- nrow(d1)-colSums(is.na(d1))
  #     Pheno_Fitnes_Overlap[i,] <- dat
  #     i<- i+1
  #   }
  # colnames(Pheno_Fitnes_Overlap) <- c("fitness",  colnames(df))
  # rownames(Pheno_Fitnes_Overlap) <- all_fitness
  # write.table(Pheno_Fitnes_Overlap, file="./tables/PhenotypeFitnessOverlap.csv", sep=",", row.names = T, col.names = T, quote = F)
  
  dim(df)
  dim(pheno)
  # df <- df[,-10]
  sum(pheno[idex515,colnames(pheno) %in% all_fitness] == -9, na.rm = T)
  
    lsel<- c()
  for(i in c("rFitness","rSurvival_fruit","rSeeds")){
    for(j in c("mlp","mli","mhi", "mhp", "thi","thp", "tli", "tlp")){
  #for (trait in all_fitness) {
      trait<-paste0(i,"_",j)
      w=all_fitness[idex515,trait]/mean(all_fitness[idex515,trait], na.rm=T)
      print(trait)
      #w=relativefitness(atlas[,trait][idex515])
      z=df
      d1<-preparedata(w,z)
      d1 <- na.omit(d1)
      print(dim(d1))
      res<-PHENOSELECTION(Variables=z, Fitness=w, replicates = 100)
      tmp1<-do.call(cbind,parseformatted(res$gradient_linear)) %>% data.frame
      tmp1$param<-"beta"
      tmp2<-do.call(cbind,parseformatted(res$coefficient_linear)) %>% data.frame
      tmp2$param<-"s"
      sel<-rbind(tmp1,tmp2) 
      sel$mean<-as.numeric(sel$mean)
      sel$se<-as.numeric(sel$se)
      sel$lower=sel$mean - sel$se *1.96
      sel$upper=sel$mean + sel$se *1.96
      sel$trait<- colnames(df)
      sel$fitness<-i
      sel$env<-j
      lsel<-rbind(lsel,sel)
    }
  }
  
    
  lsel_beta <- lsel %>% filter(param=="beta")
  lsel_s <- lsel %>% filter(param=="s")
  lsel2<- merge(lsel_beta, lsel_s, by=c("trait", "fitness", "env"))
  head(lsel2)
  #saveRDS(lsel2, file="./data/multivariate_SelectionResults_NoGrowthRate.rda")
  saveRDS(lsel2, file="./data/multivariate_SelectionResults_targettraits_1122122.rda")
  
 lsel2 <- readRDS(file="./data/multivariate_SelectionResults_targettraits_1122122.rda")  

 sel_plot <- ggplot(lsel2)+
    geom_point(aes(x=mean.y, y=mean.x, shape=fitness, color=env, size=signi.x)) +
    scale_size_manual(values=c("ns" =.5, "*"=1, "**"=2, "***"= 3)) +
    scale_shape_manual(values=c(15, 19, 17)) +
    scale_color_manual(values=c( "#de6ddc", "#f54e91", '#e41a1c',"#F97559", "#234BC5", "#5B75C6", "#7ae6a3", "darkgreen")) + 
    geom_segment(aes(y=lower.x,yend=upper.x, x=mean.y,xend=mean.y, color=env), size=.25) +
    geom_segment(aes(y=mean.x,yend=mean.x, x=upper.y,xend=lower.y, color=env), size=.25) +
    geom_hline(yintercept=0, linetype="dashed")+
    geom_vline(xintercept=0, linetype="dashed")+
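    facet_wrap(~trait, scales = 'fixed') +
    ## set axis titles and limits in one scale call per axis; a separate
    ## xlim()/ylim() is silently replaced by a later scale_*_continuous()
    scale_x_continuous("s (total selection)", limits = c(-0.4, 0.2)) +
    scale_y_continuous("b (direct selection)", limits = c(-0.4, 0.4))
  ## ggplot objects are not auto-printed inside if(){}; print explicitly
  print(sel_plot)
  pdf(file = 'figs/MultivariatePhenSel_ALL_Envs_suppplot.pdf', width = 7.5, height=6)
  print(sel_plot)
  dev.off() 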
  
  
  # head(lsel)
  # 
  # lsel %>% filter(trait=="Vernalization") %>% 
  #   filter(signi!="ns")
  # 
  # lsel %>% filter(signi=="**") %>% 
  #   filter(param=="beta")
  # 
  #  lsel %>% filter(env %in% c("thp", "thi")) %>% 
  #    filter(signi!="ns") %>% filter(param=="beta")
  
  ## total number of times a trait is "directly" (beta) associated with fitness
  # b<-   lsel %>% group_by(fitness) %>% 
  #     filter(signi!="ns", param== "beta") %>% 
  #     summarise(table(fitness))
  # view(b)
  #   
  # plotSelection <- function(df){
  #   ggplot(df, aes(x=mean2, y=mean1, label=trait1)) +
  #     geom_text( size=4)+geom_point(alpha=.7)+
  #  # scale_size_manual(values=c("ns" =.5, "*"=1, "**"=2, "***"= 3)) +
  #   geom_segment(aes(y=mean1,yend=mean1, x=lower2,xend=upper2), size=.25) +
  #   geom_segment(aes(y=lower1,yend=upper1, x=mean2,xend=mean2), size=.25) +
  #   geom_hline(yintercept=0, linetype="dashed")+
  #   geom_vline(xintercept=0, linetype="dashed")+
  #   ylab("b (direct selection") + xlab("s (total selection)") 
  #  }
  
  fitnes_MP2014 <- lsel %>% filter(fitness=="Fitness") %>% 
      group_split(param) %>%  do.call(cbind,.)
  colnames(fitnes_MP2014) <- paste0(colnames(fitnes_MP2014), rep(seq(1, 2), each=ncol(lsel)))
  saveRDS(fitnes_MP2014, file="./figs/tmpobjects/fitness_MP2014.rda")
  # plotSelection(fitnes_MP2014)
  
  library(dplyr)
  library(tidyr)
  
  Fitness_Fourneir2011 <- lsel %>% filter(fitness%in% 
      c("Fitness_Finland", "Fitness_Germany", "Fitness_Spain", "Fitness_UnitedKingdom",
      "Seeds_Finland", "Seeds_Germany", "Seeds_Spain", "Seeds_UnitedKingdom",
      "Survival_Finland", "Survival_Germany", "Survival_Spain", "Survival_UnitedKingdom") ) %>% 
      separate(fitness, c("fit", "env")) %>% 
      group_split(param) %>%   do.call(cbind,.)
  colnames(Fitness_Fourneir2011) <- paste0(rep(c("b_", "s_"), each=ncol(lsel)+1), colnames(Fitness_Fourneir2011))
  saveRDS(Fitness_Fourneir2011, file="./figs/tmpobjects/Fitness_Fourneir2011.rda")
  
  
  ####### old plots of just exposito-alonso fitness data
  
    dim(lsel)
  lsel_beta <- lsel %>% filter(param=="beta")
  lsel_s <- lsel %>% filter(param=="s")
  lsel2<- merge(lsel_beta, lsel_s, by=c("trait", "fitness", "env"))
  head(lsel2)
  
  levels(lsel2$signi.x)
  all_selection_bytrait <- ggplot(lsel2)+
    geom_point(aes(x=mean.y, y=mean.x, shape=fitness, color=env, size=signi.x)) +
    scale_size_manual(values=c("ns" =.5, "*"=1, "**"=2, "***"= 3)) +
    scale_shape_manual(values=c(15, 19, 17)) +
    scale_color_manual(values=c( '#e41a1c',"#F97559", "#234BC5", "#5B75C6")) + 
    geom_segment(aes(y=lower.x,yend=upper.x, x=mean.y,xend=mean.y, color=env), size=.25) +
    geom_segment(aes(y=mean.x,yend=mean.x, x=upper.y,xend=lower.y, color=env), size=.25) +
    geom_hline(yintercept=0, linetype="dashed")+
    geom_vline(xintercept=0, linetype="dashed")+
    facet_wrap(~trait, scales = 'fixed') +
    scale_x_continuous("s (total selection)") + scale_y_continuous("b (direct selection)")
  all_selection_bytrait
  saveRDS(all_selection_bytrait, file="./figs/tmpobjects/all_selection_bytrait.rda")
  
  all_selection_bytrait <- readRDS( file="./figs/tmpobjects/all_selection_bytrait.rda")
  
  pdf(file = 'figs/MultivariatePhenotypeSelection.pdf')
  print(all_selection_bytrait)  # explicit print: no auto-printing inside if(){}
  dev.off()
  
  ## make small table for significant selection effects
  tmp <-  lsel2 %>% 
    filter(env%in%c("mlp", "mli"))
  #%>% 
   # filter(fitness=="rFitness") 
  ## %>% filter( signi.x%in% c("***", "**", "*") | signi.y%in% c("***", "**", "*")) 
  #tmp <- tmp[,-c(2,3)]
  
  tmp2 <-  lsel2 %>% 
    filter(env%in%c("thp", "thi")) 
  #%>% 
  #  filter(fitness=="rFitness") 
  ## %>% filter( signi.x%in% c("***", "**", "*") | signi.y%in% c("***", "**", "*")) 
  rbind(tmp, tmp2)
  
  ## format "estimate (lower, upper)" with 3 significant digits throughout
  fmt_ci <- function(est, lower, upper) {
    paste0(signif(est, 3), " (", signif(lower, 3), ", ", signif(upper, 3), ")")
  }
  tmpwrite <- data.frame(trait = tmp$trait, 
                         env = tmp$env,
                         fitness=tmp$fitness,
                         beta = fmt_ci(tmp$mean.x, tmp$lower.x, tmp$upper.x),
                         beta.s = tmp$signi.x,
                         #Direction = c("Avoid", "Avoid", "Escape",  "Avoid"), 
                         s= fmt_ci(tmp$mean.y, tmp$lower.y, tmp$upper.y),
                         s.s = tmp$signi.y)

  tmpwrite2 <- data.frame(trait = tmp2$trait, 
                          env = tmp2$env,
                          fitness=tmp2$fitness,
                          beta = fmt_ci(tmp2$mean.x, tmp2$lower.x, tmp2$upper.x),
                          beta.s = tmp2$signi.x,
                          #Direction = c("Avoid", "?", "Escape", "Escape"),
                          s= fmt_ci(tmp2$mean.y, tmp2$lower.y, tmp2$upper.y),
                          s.s = tmp2$signi.y)
  write.table(tmpwrite, file="./tables/Beta&SelEstimates_mlp.tsv", sep="\t", quote = F, row.names = F, col.names = T)
  write.table(tmpwrite2, file="./tables/Beta&SelEstimates_thp.tsv", sep="\t", quote = F, row.names = F, col.names = T)
  #knitr::kable(tmpwrite)
  
  ## make projection of selection on FT over degrees C
  #  lsel %>% 
  #     #filter(signi.x%in%c("***", "**")) %>% 
  #     filter(fitness=="rFitness") %>% 
  #     #filter(env%in%c("mli","mlp"))
  #     filter(trait=="FloweringTime")
  #  
  #  MadC <- c(22, 22)
  #  tubC <- c(10,10)
  #  Mad_ft_beta <- c(0.336, 0.153)
  #  Tub_ft_beta <- c(0.018, -0.008)
  #  
  # library(tidyverse)
  #  df <- data.frame(beta =c(Mad_ft_beta, Tub_ft_beta), loc= c("a", "b", "a", "b"),
  #                   C = c(22, 22, 10, 10))
  #  
  #  Mad_ft_beta- Tub_ft_beta / (MadC - tubC)
  
  # SelectionbyC <-ggplot(df, aes(y=beta, x=C, col=loc)) + geom_point() +
  #   geom_smooth(method = "lm") +
  #   scale_color_manual(values=c("gray33", "gray66")) +
  #   annotate("text", y=.3, x=16, label="+0.318 direct \nselection / degrees C", size=2.7) +
  #   theme(legend.position="none") + xlab("Degrees C") + ylab("FT b (direct selection)") + 
  #   #scale_color_manual(values=c("gray33", "gray66")) +
  #   annotate("text", y=.07, x=19, label="+0.162 direct \nselection / degrees C", size=2.7)
  # saveRDS(SelectionbyC, file="./figs/SelectionbyDegreesC.rda")

  
  ## What happens when you remove growth rate
  selection_noGrowthRate <- ggplot(lsel2)+
    geom_point(aes(x=mean.y, y=mean.x, shape=fitness, color=env, size=signi.x)) +
    scale_size_manual(values=c("ns" =.5, "*"=1, "**"=2, "***"= 3)) +
    scale_shape_manual(values=c(15, 19, 17)) +
    scale_color_manual(values=c( '#e41a1c',"#F97559", "#234BC5", "#5B75C6")) + 
    geom_segment(aes(y=lower.x,yend=upper.x, x=mean.y,xend=mean.y, color=env), size=.25) +
    geom_segment(aes(y=mean.x,yend=mean.x, x=upper.y,xend=lower.y, color=env), size=.25) +
    geom_hline(yintercept=0, linetype="dashed")+
    geom_vline(xintercept=0, linetype="dashed")+
    facet_wrap(~trait, scales = 'fixed') +
    scale_x_continuous("s (total selection)") + scale_y_continuous("b (direct selection)")
  selection_noGrowthRate
  
  saveRDS(selection_noGrowthRate, file="./figs/tmpobjects/MultivSel_noGrowthRate_.rda")
  
  # pdf(file = 'figs/MultivariatePhenSel_NoGrowthRate.pdf')
  # selection_noGrowthRate
  # dev.off()
  # 
  ## What happens when you remove Flowering time
  # selection_noFT <- ggplot(lsel)+
  #   geom_point(aes(x=mean.y, y=mean.x, shape=fitness, color=env, size=signi.x)) +
  #   scale_size_manual(values=c("ns" =.5, "*"=1, "**"=2, "***"= 3)) +
  #   scale_shape_manual(values=c(15, 19, 17)) +
  #   scale_color_manual(values=c( '#e41a1c',"#F97559", "#234BC5", "#5B75C6")) + 
  #   geom_segment(aes(y=lower.x,yend=upper.x, x=mean.y,xend=mean.y, color=env), size=.25) +
  #   geom_segment(aes(y=mean.x,yend=mean.x, x=upper.y,xend=lower.y, color=env), size=.25) +
  #   geom_hline(yintercept=0, linetype="dashed")+
  #   geom_vline(xintercept=0, linetype="dashed")+
  #   ylab("b (direct selection") + xlab("s (total selection)") +
  #   facet_wrap(~trait, scales = 'fixed') +
  #   scale_x_continuous("s (total selection)") + scale_y_continuous("b (direct selection")
  # pdf(file = 'figs/MultivariatePhenSel_NoFloweringtime.pdf')
  # selection_noFT
  # dev.off()
  ## Leave one out approximation 
  
  
}else{
  all_selection_bytrait <-  readRDS(file="./figs/tmpobjects/all_selection_bytrait.rda")
  all_selection_bytrait
  
}
```

To understand which phenotypes had the strongest impact on the selection estimates for flowering time, we iteratively re-estimated the multivariate selection coefficients ($s$ and $\beta$), removing one trait at a time. When growth rate was removed, the flowering time estimates changed only slightly; the main change was that selection shifted toward favoring ecotypes that did not require vernalization. Only when we removed both growth rate and growth after vernalization were the flowering time estimates markedly affected, and the significant positive selection for late flowering disappeared (__Figure SII.4__).
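
The leave-one-trait-out procedure described above can be sketched as follows. This is a simplified illustration on simulated data, not the code used to produce Figure SII.4: the helper `selection_gradient()`, the three trait names, and all effect sizes are hypothetical, and bootstrapped standard errors are omitted.

```r
# Hypothetical helper: selection gradients (beta) from standardized traits Z
# and relative fitness w, using beta = P^{-1} s
selection_gradient <- function(Z, w) {
  b <- solve(cov(Z), cov(Z, w))
  setNames(as.numeric(b), colnames(Z))
}

# Simulated data: growth rate affects fitness directly,
# flowering time is only correlated with growth rate
set.seed(2)
n <- 400
growth    <- rnorm(n)
flowering <- -0.6 * growth + sqrt(1 - 0.6^2) * rnorm(n)
dormancy  <- rnorm(n)
Z <- scale(cbind(FloweringTime = flowering,
                 GrowthRate    = growth,
                 Dormancy      = dormancy))
W <- 5 + 0.4 * Z[, "GrowthRate"] + rnorm(n, sd = 0.5)
w <- W / mean(W)

# Full model
beta_full <- selection_gradient(Z, w)

# Remove each non-focal trait in turn and re-estimate beta for the focal trait
focal  <- "FloweringTime"
others <- setdiff(colnames(Z), focal)
beta_loo <- sapply(others, function(dropped) {
  keep <- setdiff(colnames(Z), dropped)
  selection_gradient(Z[, keep, drop = FALSE], w)[[focal]]
})

# Change in the focal gradient attributable to each removed trait
shift <- beta_loo - beta_full[[focal]]
```

In this toy example the largest entry of `shift` corresponds to the removed trait that is both correlated with flowering time and directly associated with fitness, which is the kind of dependence the iterative analysis is meant to reveal.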

```{r, echo=F, eval=T, warning=F, fig.cap="Figure SII.4 Multivariate selection analysis for 10 focal traits (growth rate and vernalization growth removed) with dry-hot, cool-wet, and high/low planting density fitness data from Exposito-Alonso et al. 2019.",  fig.height=7, fig.width=8.5}
 selection_noGrowthRate <- readRDS(file="./figs/tmpobjects/MultivSel_noGrowthRate.rda")
selection_noGrowthRate
```

```{r, echo=F, eval=F, warning=F}
RERUN=F 
if(RERUN){
}else{
  tmpwrite<- read.table(file="./tables/Beta&SelEstimates_mlp.tsv",sep="\t", header = T)
  knitr::kable(tmpwrite,  caption = "Table SII.9 Direct and total selection estimates from the dry-hot environment (Madrid) for both planting densities (individual/population; i/p).")
}
```

```{r, echo=F, eval=F, warning=F}
RERUN=F 
if(RERUN){
}else{
  tmpwrite2<- read.table(file="./tables/Beta&SelEstimates_thp.tsv",sep="\t", header = T)
  knitr::kable(tmpwrite2, caption = "Table SII.10 Direct and total selection estimates from the cool-wet environment (Tubingen) for both planting densities (individual/population; i/p).")
}
```

#### Summarized plot of just MLI and THI for survival selection

```{r, echo=F, eval=F, warning=F}
### Summarized plots of survival selection for MLI and THI
RERUN=F 
if(RERUN){
  #lsel2 <-  readRDS(file="./data/multivariate_SelectionResults_targettraits.rda")
  lsel2 <- readRDS(file="./data/multivariate_SelectionResults_targettraits_1122122.rda")  
  head(lsel2)
  
  lsel4 <- lsel2 %>% 
    filter(env=="mli") %>% 
    #filter(env%in%c("mlp", "mli")) %>% 
    filter(fitness=="rSurvival_fruit")
  
    ## plot of relative survival, mli
  head(lsel4)
  
  #pdf(file = 'figs/rFit_mli_selection_points.pdf')
  ## open a single device (two pdf() calls would leave one device open)
  pdf(file = 'figs/rSurv_mli_selection_points.pdf')
  p_mli <- ggplot(lsel4, aes(x=mean.y, y=mean.x, label=trait))+
    annotate("rect", xmin = Inf, xmax = 0, ymin = Inf, ymax = 0, fill= "white", alpha=.5)  +
    annotate("rect", xmin = -Inf, xmax = 0, ymin = -Inf, ymax = 0 , fill= "white", alpha=.5) +
    annotate("rect", xmin = 0, xmax = Inf, ymin = 0, ymax = -Inf, fill= "gray88", alpha=.5) +
    annotate("rect", xmin = 0, xmax = -Inf, ymin = Inf, ymax = 0, fill= "gray88", alpha=.5)+
    geom_point() +
    #scale_color_manual(values=c("#234BC5", "#5B75C6", '#e41a1c',"#F97559")) +
    geom_segment(aes(y=lower.x,yend=upper.x, x=mean.y,xend=mean.y), size=.25) +
    geom_segment(aes(x=lower.y,xend=upper.y, y=mean.x,yend=mean.x), size=.25) +
    geom_hline(yintercept=0, linetype="dashed")+
    geom_vline(xintercept=0, linetype="dashed")+
    labs(y="b (direct selection)", x="s (total selection)") + 
    ylim(-0.33, 0.33) + xlim(-0.25, 0.25) 
    #geom_label(size=2) 
  print(p_mli)  # explicit print: no auto-printing inside if(){}
  dev.off()
  
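    # Kingsolver & Diamond 2011
    # install lmodel2 only if it is missing, rather than on every run
    if (!requireNamespace("lmodel2", quietly = TRUE)) install.packages("lmodel2")
    library(lmodel2)
    x_centered <- lsel4$mean.x - mean(lsel4$mean.x) # beta (direct selection)
    y_centered <- lsel4$mean.y - mean(lsel4$mean.y) # s (total selection)
    # Fit model II regressions; the SMA fit is the reduced (standard) major axis
    model <- lmodel2(y_centered ~ x_centered)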

  
    lsel5 <- lsel2 %>% 
    filter(env=="thi") %>% 
    #filter(env%in%c("mlp", "mli")) %>% 
    filter(fitness=="rSurvival_fruit")
  
  
    ## plot of relative survival, thi
  head(lsel5)
  
  #pdf(file = 'figs/rFit_thi_selection.pdf')
  ## open a single device (two pdf() calls would leave one device open)
  pdf(file = 'figs/rSurv_thi_selection_pointsonly.pdf')
  p_thi <- ggplot(lsel5, aes(x=mean.y, y=mean.x, label=trait))+
    annotate("rect", xmin = Inf, xmax = 0, ymin = Inf, ymax = 0, fill= "white", alpha=.5)  +
    annotate("rect", xmin = -Inf, xmax = 0, ymin = -Inf, ymax = 0 , fill= "white", alpha=.5) +
    annotate("rect", xmin = 0, xmax = Inf, ymin = 0, ymax = -Inf, fill= "gray88", alpha=.5) +
    annotate("rect", xmin = 0, xmax = -Inf, ymin = Inf, ymax = 0, fill= "gray88", alpha=.5)+
    geom_point() +
    #scale_color_manual(values=c("#234BC5", "#5B75C6", '#e41a1c',"#F97559")) +
    geom_segment(aes(y=lower.x,yend=upper.x, x=mean.y,xend=mean.y), size=.25) +
    geom_segment(aes(x=lower.y,xend=upper.y, y=mean.x,yend=mean.x), size=.25) +
    geom_hline(yintercept=0, linetype="dashed")+
    geom_vline(xintercept=0, linetype="dashed")+
    labs(y="b (direct selection)", x="s (total selection)")  + ylim(-.2, 0.2) + xlim(-.02, 0.02)
   # geom_label(size=1) 
  print(p_thi)  # explicit print: no auto-printing inside if(){}
  dev.off()
  
}else{

}
```


```{r eval=F, echo=F, fig.align='center', fig.cap="", fig.height=4, fig.width=4, warning=FALSE}
#### Simplified models of multivariate selection with only Flowering time, Growth rate, and DeltaC13

# In the multivariate selection analyses we noticed that the main effects of selection act on flowering time and growth rate, and that these traits most likely influence, or are influenced by, DeltaC13. Therefore, we ran a few simpler models of multivariate selection considering only these three phenotypes and their relationship to fitness, survival, and fecundity from Exposito-Alonso et al. 2019.

RERUN=F 
if(RERUN){
  ## relative path, consistent with the other chunks (working directory is the project root)
  source("./analyses/phenoselection_multi_FUNCTIONS-copy.R")
  atlasstrategies<-read.table("./data/pheno_fromgoogle.tsv",header = T)
  pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=" ", header = T)
  #pheno <- read.table(file = './data/atlas1001_phenotypes_matrix_MR.csv', sep=",", header = T)
  dim(pheno)
  pheno[1:5,1:5]
  lsel<-list()
  # load field experiment information to get the 515 accessions
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  whichfield<-which(pheno$id %in% idsfield)
  idex515<-whichfield
  
  set.seed(0)
  #colnames(pheno)[grep("Vern",  colnames(pheno))]
  df <- pheno[,colnames(pheno) %in% c( "Growth_rate", 
                                            "Delta_13C",
                                            #"stomata_density",
                                            #"stomatasize",
                                            "FT_mlp",
                                            "FT16")]
  
  df[,"FT_mlp"][df[,"FT_mlp"] == -9]  <- NA
  
  df
  library(missForest)
  dimp<- missForest(df, variablewise = T)
  df <- dimp$ximp
  
  fitness <- pheno[,colnames(pheno) %in% c( "rFitness_mlp", 
                                                   "rFitness_mli",
                                                   "rSeeds_mlp",
                                                   "rSeeds_mli",
                                                   "rSurvival_fruit_mlp",
                                                   "rSurvival_fruit_mli")]
                                                   # "rFitness_thp", 
                                                   # "rFitness_thi",
                                                   # "rSeeds_thp",
                                                   # "rSeeds_thi",
                                                   # "rSurvival_fruit_thp",
                                                   # "rSurvival_fruit_thi")]
  head(fitness)
  for (k in colnames(fitness)){
    fitness[,k][fitness[, k] == -9] <- NA
  }
  sum(fitness==-9, na.rm=T)
  
  #df <- apply(df, 2, scale)
  all_df <- data.frame(df, fitness)
  dim(all_df)
                   
  515-colSums(is.na(all_df)) ## check raw phenotype database to see which have a good number of samples in 515 accessions
  dim(df)
  sum(is.na(all_df))
  
  head(all_df)
  
  mod_results <- list()
  i<-1
  for (resp in colnames(fitness)){
    print(resp)
    df <- data.frame(resp = all_df[,resp], 
                     Delta_13C = all_df[,"Delta_13C"],
                     Growth_rate = all_df[,"Growth_rate"],
                     FT_mlp = all_df[,"FT_mlp"])
    #df<-na.omit(df)
    resis <- lm(formula = resp ~ Delta_13C + Growth_rate + FT_mlp, data = df)
    mod_results[[i]] <- summary(resis)
    i <- i +1
    
  }
 mod_results

  
}else{

}
```

```{r, echo=F, eval=F, warning=F, fig.cap="Change in direct effect on Flowering time over change in degrees C", fig.height=2.7, fig.width=2.7, message=F}
## chunk with selection in Madrid and Tubingen over C
RERUN=F 
if(RERUN){
}else{
  SelectionbyC <- readRDS(file="./figs/SelectionbyDegreesC.rda")
  SelectionbyC
}
```

#### Other Field experiments selection estimates

In addition to the fitness data from Exposito-Alonso et al. 2019, we used fitness data from other common garden experiments that grew A. thaliana in varying environments, specifically Fournier-Level et al. 2011 and Manzano-Piedras et al. 2014. The coverage of phenotypes and fitness data for each study is shown below.

```{r eval=T, echo=F, fig.align='center', fig.cap="Figure SII.5 Multivariate phenotype selection using fitness data from Manzano-Piedras et al. 2014.", fig.height=4, fig.width=4, warning=FALSE}
RERUN=F 
if(RERUN){
}else{
    plotSelection <- function(df){
    ggplot(df, aes(x=mean2, y=mean1, label=trait1)) +
      geom_text( size=3)+geom_point(alpha=.7)+
   # scale_size_manual(values=c("ns" =.5, "*"=1, "**"=2, "***"= 3)) +
    geom_segment(aes(y=mean1,yend=mean1, x=lower2,xend=upper2), size=.25) +
    geom_segment(aes(y=lower1,yend=upper1, x=mean2,xend=mean2), size=.25) +
    geom_hline(yintercept=0, linetype="dashed")+
    geom_vline(xintercept=0, linetype="dashed")+
    ylab("b (direct selection)") + xlab("s (total selection)") 
    }

    
  fitness_MP2014 <- readRDS(file="./figs/tmpobjects/fitness_MP2014.rda")
  plotSelection(fitness_MP2014)
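  ## PDF export left disabled: pdf() was called without an output file and the device was never closed
  # pdf(file = )
  # plotSelection(fitness_MP2014)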
}
```

The fitness data from Manzano-Piedras et al. 2014 come from 279 A. thaliana individuals from Iberian populations, grown in a relatively dry and hot region of southern Spain. Using these fitness data to measure selection on the target phenotypes revealed, first, that every trait was significantly associated with fitness when considered independently (total selection). However, when accounting for trait covariation, fewer traits were truly under direct selection. Notably, flowering time was under negative selection (beta = -0.163 (-0.34, 0.015)), indicating higher fitness for individuals that flower early. There was also significant negative selection on dormancy (beta = -0.447 (-0.63, -0.27)) and positive selection for stomata density (beta = 0.13 (-0.01, 0.27)) and post-vernalization growth (beta = 0.059 (-0.002, 0.12)). The selection against dormancy indicates higher fitness for individuals without high primary dormancy, that is, individuals that germinate before winter and overwinter as small rosettes. Individuals with higher stomata density and individuals that grew better after vernalization also had higher fitness; these individuals typically overwinter as well and have higher water use efficiency. The discrepancy in selection pressures for escape and avoidance strategies that we observed with the Exposito-Alonso et al. 2019 fitness data therefore persists in this dataset. This is not surprising, given that the populations under study came from the Iberian Peninsula, a region known to harbor diverse A. thaliana populations that include individuals that germinate early and overwinter as well as individuals that have high dormancy and germinate in early spring (Mendez-Vigo et al. 2012).
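
The distinction between total selection (s) and direct selection (beta) can be illustrated with a minimal R sketch on simulated data. The toy traits `z1` and `z2`, the effect sizes, and all variable names are hypothetical, and this is not the `PHENOSELECTION` routine used for the figures. It only assumes the standard formulation in which s is the covariance of a standardized trait with relative fitness and beta is the vector of partial regression coefficients.

```r
# Toy data (hypothetical): two correlated traits, only z1 affects fitness
set.seed(1)
n  <- 500
z1 <- rnorm(n)
z2 <- 0.7 * z1 + sqrt(1 - 0.7^2) * rnorm(n)   # z2 covaries with z1
W  <- exp(0.3 * z1 + rnorm(n, sd = 0.2))      # absolute fitness
w  <- W / mean(W)                             # relative fitness (mean 1)
Z  <- scale(cbind(z1 = z1, z2 = z2))          # standardized traits

# Total selection: covariance of each trait with relative fitness
s <- cov(Z, w)[, 1]

# Direct selection: beta = P^{-1} s, with P the phenotypic covariance matrix
P <- cov(Z)
beta_matrix <- solve(P, s)

# The same gradients from a multiple regression of fitness on all traits
beta_lm <- coef(lm(w ~ Z))[-1]

round(cbind(s = s, beta = beta_matrix), 3)
```

Here `z2` has no causal effect on fitness, yet it shows nonzero total selection because it covaries with `z1`; its direct selection estimate is close to zero once `z1` is accounted for.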

```{r, echo=F, eval=T, warning=F, fig.cap="Figure SII.6 Multivariate selection analysis for 12 focal traits using fitness data from Fournier-Level et al. 2011.", fig.align="center", fig.width=8, fig.height=6}
RERUN=F 
if(RERUN){
}else{
    plotSelection <- function(df){
    ggplot(df, aes(x=s_mean, y=b_mean, shape=b_fit, color=b_env, size=b_signi)) +
     #geom_text( size=4)+
     geom_point()+
     scale_size_manual(values=c("ns" =.5, "*"=2, "**"=3, "***"= 4)) +
     scale_color_manual(values=c("Finland"="#4292c6", "Germany" = "#41ab5d",
                                 "Spain" = "#d73027", "UnitedKingdon"="#253494")) +
     geom_segment(aes(y=b_mean,yend=b_mean, x=s_lower,xend=s_upper), size=.25) +
     geom_segment(aes(y=b_lower,yend=b_upper, x=s_mean,xend=s_mean), size=.25) + 
     geom_hline(yintercept=0, linetype="dashed")+
     geom_vline(xintercept=0, linetype="dashed") +
     ylab("b (direct selection)") + xlab("s (total selection)") +
     facet_wrap(~b_trait, scales = "fixed") 
    }
    
    Fitness_Fourneir2011 <- readRDS(file="./figs/tmpobjects/Fitness_Fourneir2011.rda")
    plotSelection(Fitness_Fourneir2011)
}
```

Figure SII.6 shows the same multivariate selection analysis using the fitness data from Fournier-Level et al. 2011.


```{r, echo=F, eval=F, warning=F}
#### Selection analysis on Principal Components
RERUN=F 
if(RERUN){

  pheno <- read.table(file = './data/atlas1001_phenotypes_matrix_MR.csv', sep=",", header = T)
  pcaTarget <- readRDS(file="./data/pcaTarget.rda") 
  pcaTarget$sdev^2/sum(pcaTarget$sdev^2)
  cumsum(pcaTarget$sdev^2/sum(pcaTarget$sdev^2))
  
  
  sort(abs(pcaTarget$rotation[,1]), decreasing = T)[1:10]
  plot(pcaTarget$x[,1], pheno$FT16[idex515])
  
  dim(pcaTarget)
  
  dat <- pcaTarget$x
  
  lsel<- c()
  for(i in c("rFitness","rSurvival_fruit","rSeeds")){
    for(j in c("mlp","mli","thi","thp")){
  #for (trait in all_fitness) {
      trait<-paste0(i,"_",j)
      w=pheno[idex515,trait]/mean(pheno[idex515,trait], na.rm=T)
      print(trait)
      #w=relativefitness(atlas[,trait][idex515])
      z=dat[,1:10]
      d1<-preparedata(w,z)
      d1 <- na.omit(d1)
      print(dim(d1))
      res<-PHENOSELECTION(Variables=z, Fitness=w, replicates = 100)
      tmp1<-do.call(cbind,parseformatted(res$gradient_linear)) %>% data.frame
      tmp1$param<-"beta"
      tmp2<-do.call(cbind,parseformatted(res$coefficient_linear)) %>% data.frame
      tmp2$param<-"s"
      sel<-rbind(tmp1,tmp2) 
      sel$mean<-fn(sel$mean)
      sel$se<-fn(sel$se)
      sel$lower=sel$mean - sel$se *1.96
      sel$upper=sel$mean + sel$se *1.96
      sel$trait<- paste0("PC", seq(1, 10))  # label rows PC1..PC10 (recycled across beta and s)
      sel$fitness<-i
      sel$env<-j
      lsel<-rbind(lsel,sel)
    }
  }
  
    lsel_beta <- lsel %>% filter(param=="beta")
  lsel_s <- lsel %>% filter(param=="s")
  lsel2<- merge(lsel_beta, lsel_s, by=c("trait", "fitness", "env"))
  head(lsel2)

 ggplot(lsel2)+
    geom_point(aes(x=mean.y, y=mean.x, shape=fitness, color=env, size=signi.x)) +
    scale_size_manual(values=c("ns" =.5, "*"=1, "**"=2, "***"= 3)) +
    scale_shape_manual(values=c(15, 19, 17)) +
    scale_color_manual(values=c( '#e41a1c',"#F97559", "#234BC5", "#5B75C6")) + 
    geom_segment(aes(y=lower.x,yend=upper.x, x=mean.y,xend=mean.y, color=env), size=.25) +
    geom_segment(aes(y=mean.x,yend=mean.x, x=upper.y,xend=lower.y, color=env), size=.25) +
    geom_hline(yintercept=0, linetype="dashed")+
    geom_vline(xintercept=0, linetype="dashed")+
    ylab("b (direct selection)") + xlab("s (total selection)") +
    facet_wrap(~trait, scales = 'fixed') +
    scale_x_continuous("s (total selection)") + scale_y_continuous("b (direct selection)")

  
}else{
  
}
  
```

################################################################################
## II.4 Multivariate prediction of selection response
################################################################################

We inferred the population responses to this natural selection by using the selection response equation (ref. 49): $\Delta z = G P^{-1} s$, where $G$ is the additive genetic variance-covariance matrix constructed from heritability and genetic correlation estimates among 12 traits, $P$ is the phenotypic variance-covariance matrix, and $s$ is the raw covariance of each trait with lifetime fitness in hot-dry conditions. 

For this, we used the estimates of s from lifetime fitness in both the hot-dry environment (Madrid (m), low water (l)) and the cool-wet environment (Tübingen (t), high water (h)), for both individual (i) and population (p) planting densities. 

We used SNP-based heritability estimates from the univariate GWA of the target phenotypes. Genetic correlations were estimated in two ways: from the correlation of summary statistics at LD-pruned alleles (Figure SII.7) and from the mGWA (Figure SII.8). 
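
As a worked illustration of the selection response equation, the following R sketch computes the response for three made-up traits. All numbers and names are hypothetical placeholders, not estimates from this study. Scaling genetic correlations by the square root of the product of heritabilities is one common way to build G for unit-variance traits and is assumed here only for illustration.

```r
# Hypothetical inputs for three standardized (unit-variance) traits
traits <- c("trait_A", "trait_B", "trait_C")
h2 <- c(0.6, 0.3, 0.5)                                   # assumed heritabilities
rg <- matrix(c( 1.0, 0.4, -0.2,
                0.4, 1.0,  0.1,
               -0.2, 0.1,  1.0),
             nrow = 3, dimnames = list(traits, traits))  # assumed genetic correlations
P  <- matrix(c( 1.0, 0.3, -0.1,
                0.3, 1.0,  0.2,
               -0.1, 0.2,  1.0),
             nrow = 3, dimnames = list(traits, traits))  # assumed phenotypic (co)variances
s  <- c(-0.20, 0.05, 0.10)                               # assumed trait-fitness covariances

# Additive genetic covariances: G_ij = rg_ij * sqrt(h2_i * h2_j), so diag(G) equals h2
G <- rg * sqrt(outer(h2, h2))

# Direct selection gradients: beta = P^{-1} s
beta <- solve(P, s)

# Predicted response: Delta z = G P^{-1} s
delta_z <- drop(G %*% beta)
round(delta_z, 3)
```

Because G has off-diagonal terms, a trait can respond to selection even when its own gradient is small, through its genetic covariance with traits that are under direct selection.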


```{r, echo=F, eval=T, warning=F, fig.cap="Figure SII.7 Estimated phenotypic response to multivariate selection, using genetic correlations estimated using LD-pruned GWA summary statistics", fig.width=8, fig.height=6, message=F}
RERUN=F 
if(RERUN){
#   library(RColorBrewer)
#   ##get genetic correlations in a quick way, then do them using LDSC
#   z <- readRDS(file="TopSNPs/Zscores.hq.rda")
#   z <- as.data.frame(z)
  targets <- c("ABA_96h_low_water_potential",
               "Growth_rate",
               "Delta_13C",
               "DSDS10",
              "Stomatal_index_in_first_leaf", ## only 55
               "stomata_density",
               #"stomatasize",
               "FT16",
               "d8_10C_perc",
               "RGR",
               #"rhamnose_1_exp2",
               "Root_horizontal_index_day001",
               "Relative_root_growth_rate_day002.day003",
               #"First_leaf_area",
               "X72_Vern_Growth")
#   target_z <- z[,colnames(z) %in% targets]
#   head(target_z)
#   # colnames(target_z) <- c("Flowering time @ 16C", "Growth rate", "RGR", "Stomatal_index_1stleaf", "Rhamnose", "Germination rate",
#   #                             "Delta_13C", "ABA", "Percent germination @ 10C", "Dormancy", "Root angle", "Rosette area", "Root RGR", 
#   #                             "Root width", "proline")
#   
#   
#   # colnames(target_z) <- c("FT @ 16C", "PrimaryDormancy", "Relative Root GR", "Root Horiz Index", "Rhamnose","Delta_13C", "Stomatal Index 1st   lf", "Perc. Germination","Growth rate","RGR"," ABA")
#   target_Zcor <- cor(target_z)
#   pdf(file="./figs/GeneticZCorsCorplot.pdf")
#   corrplot(target_Zcor, method = "color", type = "lower", diag = F,tl.cex = .35, col=brewer.pal(9,'Greys'), addCoef.col = "white",
#            number.cex = 0.45, tl.srt = 45)
#   dev.off()
  
  
  # imp_multivar_corr_matrix <- readRDS(file="./data/imp_multivar_corr_matrix.Rda")
  # # multivar_corr_matrix <- readRDS(file="./data/multivar_corr_matrix.Rda")
  # target_Zcors <- readRDS(file="./data/target_Zcors.rda")
  # colnames(target_Zcors)[8] <- "X72_Vern_Growth"
  # rownames(target_Zcors)[8] <- "X72_Vern_Growth"
  
  pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=" ", header = T)
  sum(colnames(pheno) %in% targets)
  targets %in% colnames(pheno)
  
  # load field experiment information to get the 515 accessions
  load("./data/d4.rda")
  idsfield<-unique(d4$id)
  whichfield<-which(pheno$id %in% idsfield)
  idex515<-whichfield
  
  df <- pheno[idex515,colnames(pheno) %in% targets]
  df <- apply(df,2,fn)
  df <- apply(df,2,scale)
  head(df)
  dim(df)
  
  ## load heritability estimated to put as diagonals in matrix
  st_all <- readRDS(file="./data/all_gwa_stats_table.rda") 
  
  head(st_all)
  # target_h2 <- st_all[st_all$phenotype %in% targets,colnames(st_all) %in% c("phenotype", "PVE_median", "pve.x", "pve.y")]
  # diag(target_Zcors) <- target_h2$pve.x[match(colnames(target_Zcors), target_h2$phenotype)]
  # target_h2$phenotype[1] <- "X72_Vern_Growth"
  #diag(imp_multivar_corr_matrix) <- target_h2$PVE_median[match(colnames(imp_multivar_corr_matrix), target_h2$phenotype)]
  # h2[grep("Root_horizontal_index_day001", h2$phenotype),]
  # diag(target_Zcor)[is.na(diag(target_Zcor))] <- 0
  
  # Gmatrix <- target_Zcors
  # Gmatrix <- imp_multivar_corr_matrix
  # saveRDS(Gmatrix, file="./data/Gmatrix_imp_multivar_corr_matrix.rda")
  # saveRDS(Gmatrix, file="./data/Gmatrix_Zcors.rda")
  
 Gmatrix <- readRDS(file="./data/Gmatrix_imp_multivar_corr_matrix.rda")
  
  source("~/safedata/natvar/analyses/phenoselection_multi_FUNCTIONS-copy.R")
  
  lsel<-list()
  set.seed(0)
  i<- "rSurvival_fruit"
  j <- "mli"
  
  for(i in c("rFitness","rSurvival_fruit","rSeeds")){
    for(j in c("mlp","mli","thi","thp")){
      trait<-paste0(i,"_",j)
      w=pheno[,trait][idex515]
      print(w)
      z=df
      d1<-preparedata(w,z)
 
      res<-PHENOSELECTION(Variables=z, Fitness=w, Gmatrix=Gmatrix, replicates = 500)
      #res<-PHENOSELECTION(Variables=z, Fitness=w,  replicates = 100)
      tmp1<-do.call(cbind,parseformatted(res$gradient_linear)) %>% data.frame
      tmp1$param<-"beta"
      tmp2<-do.call(cbind,parseformatted(res$coefficient_linear)) %>% data.frame
      tmp2$param<-"s"
      tmp3<-do.call(cbind,parseformatted(res$response_linear)) %>% data.frame
       tmp3$param<-"z"
      
      sel<-rbind(tmp1,tmp2, tmp3) 
      sel$mean<-fn(sel$mean)
      sel$se<-fn(sel$se)
      sel$lower=sel$mean - sel$se *1.96
      sel$upper=sel$mean + sel$se *1.96
      sel$trait<-colnames(df)
      sel$fitness<-i
      sel$env<-j
      lsel<-rbind(lsel,sel)
    }
  }
  
  dim(lsel)
  #saveRDS(lsel, file="./figs/tmpobjects/deltaZprojections.rda")
  lsel <- readRDS(file="./figs/tmpobjects/deltaZprojections.rda")
  
  
  lsel_beta <- lsel %>% filter(param=="beta")
  lsel_s <- lsel %>% filter(param=="s")
  lsel_z <- lsel %>% filter(param=="z")
  
  lsel2<- merge(lsel_beta, lsel_s, by=c("trait", "fitness", "env"))
  lsel3<-merge( lsel_beta, lsel_z, by=c("trait", "fitness", "env"))
  lsel4 <- merge( lsel_s, lsel_z, by=c("trait", "fitness", "env"))
  
 lsel_z <- lsel %>% 
   filter(param=="z") %>% 
   filter(fitness=="rFitness") #%>% 
   #filter(env=="mlp")
 
  # lsel_z <- lsel %>% 
  #  filter(param=="z") %>% 
  #  filter(fitness=="rSurvival_fruit") #%>% 
  #  #filter(env=="mlp")
 
  toplot<-data.frame(selmean=lsel_z$mean,
                   selse=lsel_z$se,
                   selsigni=lsel_z$signi,
                   trait=lsel_z$trait,
                   env= lsel_z$env) %>%
       mutate(selsign=selmean/abs(selmean)) %>%
       mutate(selmean=selmean) %>%
       mutate(lower=selmean - selse *1.96,
              upper=selmean + selse *1.96)

 deltaZ_summaryPlot <-  ggplot(toplot) +
        geom_point(aes(y=(selmean),x=trait, color=selsigni))+
        geom_hline(yintercept = 0,color='grey')+
        scale_color_manual(values=c("ns"="skyblue","*"="purple","**"= "orange","***"= "red")) +
        # geom_point(aes(y=(selmean),x=1:nrow(toplot)))+
        geom_segment(aes(y=(lower),yend=(upper),x=trait,xend=trait))+
        # scale_color_manual(breaks=c(-1,+1),values=c('orange','green'))+
        labs(color="",x='',y="Delta Z")+
        theme(axis.text.x = element_text(angle = 45, size = 8, hjust=1)) +
        facet_wrap(~env, scales = "free_y") 
        # scale_x_continuous(breaks=1:nrow(toplot), labels = )
        
        # scale_x_continuous(breaks=1:nrow(toplot), labels = )
  saveRDS(deltaZ_summaryPlot, file="./figs/deltaZ_summaryPlot_Zcors.rda")
  
  pdf(file="./figs/deltaZ_summaryPlot_mGWASZcors.pdf", width = 10, height = 6)
  deltaZ_summaryPlot
  dev.off()
  

  levels(lsel$signi)
  head(lsel2)
  
  head(lsel3)
  
  all_selection_bytrait2 <- ggplot(lsel3)+
  geom_point(aes(x=mean.x, y=mean.y, shape=fitness, color=env), size=3) +
  #scale_size_manual(values=c(3,0.5, 2,2.5)) +
  scale_shape_manual(values=c(15, 19, 17)) +
  scale_color_manual(values=c('#e41a1c',"#F97559", "#234BC5", "#5B75C6")) +
  geom_segment(aes(y=lower.y,yend=upper.y, x=mean.x, xend=mean.x, color=env), size=.25) +
  geom_segment(aes(x=lower.x,xend=upper.x, y=mean.y,yend=mean.y, color=env), size=.25) +
  geom_hline(yintercept=0, linetype="dashed")+
  geom_vline(xintercept=0, linetype="dashed")+
  ylab("delta_z") + xlab("b (direct selection)") +
  facet_wrap(~trait, scales = 'fixed')+
  theme(axis.text.x=element_text(angle=90)) 
  all_selection_bytrait2
  pdf(file="./figs/DeltaZbyBetaSelectionPlot.pdf")
  all_selection_bytrait2
  dev.off()

  all_selection_bytrait3 <- ggplot(lsel4)+
  geom_point(aes(x=mean.x, y=mean.y, shape=fitness, color=env), size=3) +
  #scale_size_manual(values=c(3,0.5, 2,2.5)) +
  scale_shape_manual(values=c(15, 19, 17)) +
  scale_color_manual(values=c('#e41a1c',"#F97559", "#234BC5", "#5B75C6")) +
  #geom_segment(aes(y=lower.y,yend=upper.y, x=mean.x, xend=mean.x, color=env), size=.25) +
  #geom_segment(aes(x=lower.x,xend=upper.x, y=mean.y,yend=mean.y), size=.25) +
  geom_hline(yintercept=0, linetype="dashed")+
  geom_vline(xintercept=0, linetype="dashed")+
  ylab("delta_z") + xlab("s (total selection)") +
  facet_wrap(~trait, scales = 'fixed')+
  theme(axis.text.x=element_text(angle=90)) 
  all_selection_bytrait3
    
  
}else{
  deltaZ_summaryPlot1 <- readRDS(file="./figs/deltaZ_summaryPlot_Zcors.rda")
  deltaZ_summaryPlot1

}
```

```{r, echo=F, eval=T, warning=F, message=F, fig.cap="Figure SII.8 Estimated phenotypic response to multivariate selection, using genetic correlations estimated using mGWA summary statistics", fig.width=8, fig.height=6}
 deltaZ_summaryPlot2 <- readRDS(file="./figs/deltaZ_summaryPlot_imputedZcors.rda")
#png(file="./figs/deltaZ_summaryPlot_imputedZcors.png", units="in",res=72, width = 12, height = 6, bg="transparent")
deltaZ_summaryPlot2
#dev.off()
```


```{r, echo=F, eval=F, warning=F, message=F}
RERUN=F 
if(RERUN){
  library(ggrepel)  # provides geom_label_repel() used in the plot of this chunk

  #st_all <- readRDS(file="./data/all_gwa_stats_table.rda")
  target_GWA_stats_only <-read.table("./tables/TaretPheno_GWAstats.tsv", sep="\t", header = T)
  
  lsel <- readRDS(file="./figs/tmpobjects/deltaZprojections.rda")
  
  lsel_beta <- lsel %>% filter(param=="beta")
  lsel_s <- lsel %>% filter(param=="s")
  lsel_z <- lsel %>% filter(param=="z")
  
  lsel2<- merge(lsel_beta, lsel_s, by=c("trait", "fitness", "env"))
  # lsel3<-merge( lsel_beta, lsel_z, by=c("trait", "fitness", "env"))
  # lsel4 <- merge( lsel_s, lsel_z, by=c("trait", "fitness", "env"))
  
  lsel5 <- merge(lsel2, lsel_z, by=c("trait", "fitness", "env"))
  head(lsel5)
  
 lsel_limit <- lsel5 %>% 
   #filter(param=="z") %>% 
   filter(fitness=="rFitness") %>% 
   filter(env=="mlp")
  head(lsel_limit)
 
  head(target_GWA_stats_only)
  ## hard code in name of traits to merge dataframes
  target_GWA_stats_only$trait <- c("X72_Vern_Growth", "ABA_96h_low_water_potential", "d8_10C_perc", "Delta_13C", 
                                   "DSDS10", "FT16", "Growth_rate", "Relative_root_growth_rate_day002.day003",
                                   "RGR", "Root_horizontal_index_day001", "stomata_density",
                                   "Stomatal_index_in_first_leaf")
  lsel_limit_h2 <- merge(lsel_limit, target_GWA_stats_only, by="trait")
 
  summary(lsel_limit_h2$pve_norm)
  summary(lsel_limit_h2$pve_raw)
  
  ## get error bars for delta z
  toplot<-data.frame(selmean=lsel_limit_h2$mean,
                   selse=lsel_limit_h2$se,
                   selsigni=lsel_limit_h2$signi,
                   trait=lsel_limit_h2$trait,
                   env= lsel_limit_h2$env) %>%
       mutate(selsign=selmean/abs(selmean)) %>%
       mutate(selmean=selmean) %>%
       mutate(lower=selmean - selse *1.96,
              upper=selmean + selse *1.96)
  
   ## get error bars for beta
  toplot_beta<-data.frame(selmean=lsel_limit_h2$mean.x,
                   selse=lsel_limit_h2$se.x,
                   selsigni=lsel_limit_h2$signi.x,
                   trait=lsel_limit_h2$trait,
                   env= lsel_limit_h2$env) %>%
       mutate(selsign=selmean/abs(selmean)) %>%
       mutate(selmean=selmean) %>%
       mutate(lower=selmean - selse *1.96,
              upper=selmean + selse *1.96)
  
  
  deltaZ_byBeta_wH2 <- ggplot(lsel_limit_h2) + geom_point(aes(x=mean.x, y=mean, size=pve_raw)) +  
    geom_hline(yintercept=0, linetype="dashed")+
    geom_vline(xintercept=0, linetype="dashed")+
    ylab("delta_z") + xlab("b (direct selection)") +
    #scale_size(range = c(0.2, 5)) +
    scale_size_binned(range=c(.5, 7), n.breaks = 20) +
    geom_segment(aes(y=toplot$lower,yend=toplot$upper, x=mean.x, xend=mean.x), size=.25) +
    geom_segment(aes(x=toplot_beta$lower,xend=toplot_beta$upper, y=mean,yend=mean), size=.25) +
     geom_label_repel(aes(x=mean.x, y=mean, label = phenotype),
                  box.padding   = 0.35, 
                  point.padding = 1,
                  segment.color = 'grey50', max.overlaps = 20) 
  deltaZ_byBeta_wH2
  
  saveRDS(deltaZ_byBeta_wH2, file="./figs/tmpobjects/deltaZ_byBeta_wH2.rda")
  
  pdf(file="./figs/deltaZ_byBeta_wH2.pdf", height = 7, width = 7)
  deltaZ_byBeta_wH2
  dev.off()


}else{
  
  deltaZ_byBeta_wH2 <- readRDS(file="./figs/tmpobjects/deltaZ_byBeta_wH2.rda")
  deltaZ_byBeta_wH2
}

```


################################################################################
# III. Genetic architecture of phenotypic landscape
################################################################################

################################################################################
## III.1 Curation of Phenotypes prior to GWAS
################################################################################

### Normalization of the Phenotypes
Before performing genome-wide association studies (GWAS), we quantile transformed all continuous phenotypes (Beasley et al. 2009), because using quantile-transformed phenotypes in GWAS has been shown to reduce type I error rates compared to non-normal, untransformed phenotypes (McCaw et al. 2019). For count data and data with many zeros, we first applied a log(x+1) transformation, followed by the quantile transformation. Binary phenotypes were not transformed. We normalized both the raw and the imputed phenotype data.

```{r, echo=F, eval=F}
## Phenotype correlations and transformations script in ~/natvar/analyses/PhenotypeCorrelations.R
RERUN=F 
if(RERUN){
  library(bestNormalize)
  
  ## functions needed
  is.binary <- function(v) {
    x <- unique(v)
    length(x) - sum(is.na(x)) == 2L}
  quantileT <- function(x){
    if (is.binary(x)) {
      return(x)
    } else if (sum(x==0, na.rm=T) > 4) {    ## if more than four values are 0
      log.data <- log10(x + 1)
      tq.data <- orderNorm(log.data, warn = FALSE)
      x <- signif(tq.data$x.t, 6)
      return(x)
    }else {
      data <- orderNorm(x, warn = FALSE)
      x <- signif(data$x.t, 6)
      return(x)}}
  
  ## raw pheno
  pheno<- read.table(file="./data/atlas1001_phenotypes_matrix.csv", sep=",", header = T)
  id<- pheno[,1]
  pheno<- pheno[,-1]
  ## imputed 
  pheno_imp <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID', sep=",", header = T)
  pheno_imp <- pheno_imp[,-c(1,2)]

  pheno_raw_quant <- apply(X = pheno, MARGIN = 2, FUN = quantileT)
  pheno_raw_quant <- cbind(id, pheno_raw_quant)
  pheno_imp_quant <- apply(X = pheno_imp, MARGIN = 2, FUN = quantileT)
  pheno_imp_quant <- cbind(id, pheno_imp_quant)

  write.table(pheno_raw_quant,
              file = 'data/atlas1001_rawPheno_Quantile.tsv', sep = "\t",
              row.names = F, quote = F, col.names = T)
  write.table(pheno_imp_quant,
              file = 'data/atlas1001_imputedPheno_Quantile.tsv', sep = "\t",
              row.names = F, quote = F, col.names = T)  
}
```
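The transformation rule described above can be sketched in base R as follows. This is an illustration rather than the `orderNorm()` call used in the chunk above: the `(rank - 0.5) / n` offset, the function name `inverse_normal`, and the example vector are assumptions.

```r
# Rank-based inverse normal transform (sketch).
# Ranks are mapped to (0, 1) and then to standard normal quantiles;
# missing values stay missing and tied values receive the same score.
inverse_normal <- function(x) {
  n_obs <- sum(!is.na(x))
  ranks <- rank(x, na.last = "keep")   # average ranks for ties, NA kept as NA
  qnorm((ranks - 0.5) / n_obs)
}

# Hypothetical count-like phenotype with zeros and one missing value
x <- c(0, 0, 1, 3, 10, 250, NA)

# log10(x + 1) first, then the quantile transform, as for zero-heavy data
x_transformed <- inverse_normal(log10(x + 1))
round(x_transformed, 3)
```

The extreme value 250 no longer dominates the scale after the transform, since only its rank enters the result.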

### Correlation of All Phenotypes
We assessed whether the correlation structure of the phenotypes remained consistent after imputation and quantile normalization. First, we broadly assessed the correlation structure across all of the phenotype datasets, that is, the raw data, the normalized data, the imputed data, and the imputed and normalized data (__Figure SIII.1__). Additionally, we examined the correlation structure for a subset of target traits associated with the escape and avoidance drought response strategies.

```{r, echo=F, eval=F, warning=F, fig.cap="Figure SIII.1 Pearson's correlation coefficients for all phenotypes across various datasets A) raw B) imputed C) raw normalized D) imputed normalized.", fig.width=8, fig.height=8, fig.retina = 1}
## don't run this chunk because building the correlation plot is memory intense
RERUN=F 
if(RERUN){
  library(reshape2)
  ## load phenotype data
  pheno<- read.table(file="./data/atlas1001_phenotypes_matrix.csv", sep=",", header = T)[,-1]
  pheno_imp <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=",", header = T)[,-c(1,2)]
  pheno_raw_quant <- read.table(file = 'data/atlas1001_rawPheno_Quantile.tsv', sep = "\t", header = T)[,-1]
  pheno_imp_quant <- read.table(file = 'data/atlas1001_imputedPheno_Quantile.tsv', sep = "\t", header = T)[,-1]
  
  ## do correlations
  dats <- list(pheno, pheno_imp, pheno_raw_quant, pheno_imp_quant)
  dats <- lapply(dats, cor, use = "pairwise.complete.obs")
  dats <- lapply(dats, melt)
  dats <- lapply(dats, na.omit)
  
raw <- ggplot(data = dats[[1]], aes(Var1, Var2, fill = value))+
  geom_tile()+
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1,1), space = "Lab",
                       name="") + ylab("phenotype") + xlab("phenotype")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+ coord_fixed()

imp <- ggplot(data = dats[[2]], aes(Var1, Var2, fill = value))+
  geom_tile()+
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1,1), space = "Lab",
                       name="") + ylab("phenotype") + xlab("phenotype")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+ coord_fixed()

raw_quant <- ggplot(data = dats[[3]], aes(Var1, Var2, fill = value))+
  geom_tile()+
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1,1), space = "Lab",
                       name="") + ylab("phenotype") + xlab("phenotype")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+ coord_fixed()

imp_quant <- ggplot(data = dats[[4]], aes(Var1, Var2, fill = value))+
  geom_tile()+
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1,1), space = "Lab",
                       name="") + ylab("phenotype") + xlab("phenotype")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        legend.position = "none")+ coord_fixed() 
  
comparisonPhenoCorPlots <- plot_grid(raw, imp, raw_quant, imp_quant, labels = c("A", "B", "C", "D"))
plot_grid(raw, imp, labels=c("raw", "imputed"))
# dats<-c()
# imp<-c()
# imp_quant<-c()
# pheno<-c()
# pheno_imp<- c()

tiff("./figs/PhenoCor.tiff", res = 270,units = "in", height = 6, width = 6)
comparisonPhenoCorPlots
dev.off()

#save(file="./figs/tmpobjects/comparisonPhenoCorPlots.rda", comparisonPhenoCorPlots)

}else{

  library(tiff)
  library(grid)
  getwd()
  p <- grid.raster(readTIFF("./figs/PhenoCor.tiff"))
  png(filename = "./figs/PhenoCor.png",
    width = 480, height = 480, units = "px", pointsize = 12)
  grid.raster(readTIFF("./figs/PhenoCor.tiff"))
  dev.off()
  
  
}
##![Phenotypic correlations for A) raw B) imputed C) raw quantile transformation D) imputed quantile transformation](../figs/PhenoCorComparisonPlot.pdf){width=50% height=50%}

```
![Figure SIII.1 Pearson's correlation coefficients for all phenotypes across various datasets A) raw B) imputed C) raw normalized D) imputed normalized.](../figs/PhenoCor.png)

In general, normalizing the raw phenotype data did not appear to disrupt the correlation structure across the phenotypes (__Figure SIII.1 A,C__). Imputation, however, introduced variation in the correlation structure that is not seen in the raw phenotype correlations (__Figure SIII.1 B__), and this new variation was exacerbated by normalizing the imputed data (__Figure SIII.1 D__). 
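
One simple way to put a number on how much a transformation shifts the correlation structure is to compare the off-diagonal entries of the two correlation matrices. The sketch below does this for simulated data. The data, the variable names, and the choice of summary are assumptions for illustration and were not part of the analysis reported here.

```r
# Simulated "raw" phenotypes (hypothetical), skewed and with some missing values
set.seed(2)
n <- 60
a <- rexp(n)
raw <- data.frame(a = a, b = a + rexp(n), c = rnorm(n))
raw$c[1:5] <- NA

# Rank-based normalization of each column (offset of 0.5 is an assumption)
rank_normalize <- function(x) {
  qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))
}
normalized <- as.data.frame(lapply(raw, rank_normalize))

# Pairwise-complete Pearson correlations for both versions of the data
cor_raw  <- cor(raw, use = "pairwise.complete.obs")
cor_norm <- cor(normalized, use = "pairwise.complete.obs")

# Compare each trait pair once (upper triangle only)
upper <- upper.tri(cor_raw)
comparison <- data.frame(
  pair = paste(rownames(cor_raw)[row(cor_raw)[upper]],
               colnames(cor_raw)[col(cor_raw)[upper]], sep = "-"),
  raw = cor_raw[upper],
  normalized = cor_norm[upper],
  shift = abs(cor_raw[upper] - cor_norm[upper]),
  stringsAsFactors = FALSE
)
comparison
```

Large values in the `shift` column would flag trait pairs whose correlation changed most between the two versions of the data.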


### Target Phenotype correlations 

More specifically, we assessed the differences between the phenotypic correlations of the raw and the imputed phenotype data for the 12 target phenotypes used in the multivariate selection analysis. 

```{r, echo=F, eval=T, message=F, warning=F, fig.cap="Figure SIII.2 Pearson's correlation coefficients for 12 target phenotypes used in multivariate selection analysis. The upper-right triangle shows the correlations of the imputed phenotype data and the lower-left triangle shows the correlations of the raw data.", fig.width=5, fig.height=5}
RERUN=F
if(RERUN){
  library(Hmisc)         # rcorr()
  library(corrplot)      # corrplot()
  library(RColorBrewer)  # brewer.pal()
  pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=" ", header = T)
  pheno_raw <- read.table(file = './data/atlas1001_phenotypes_matrix_MR.csv', sep=",", header = T)
  dim(pheno)
  
  set.seed(0)
  colnames(pheno)[grep("Vern",  colnames(pheno))]
  ## traits of interest; the same selection is applied to pheno and pheno_raw
  target_traits <- c("ABA_96h_low_water_potential",
                     "Growth_rate",
                     "Delta_13C",
                     "DSDS10",
                     "Stomatal_index_in_first_leaf", ## only 55
                     "stomata_density",
                     #"stomatasize",
                     "FT16",
                     "d8_10C_perc",
                     "RGR",
                     #"rhamnose_1_exp2",
                     "Root_horizontal_index_day001",
                     "Relative_root_growth_rate_day002.day003",
                     #"First_leaf_area",
                     #"X72_Vern_Growth"
                     "X34_LDV")
  df <- pheno[, colnames(pheno) %in% target_traits]
  df_raw <- pheno_raw[, colnames(pheno_raw) %in% target_traits]

  ## short display labels, assigned by position: they must follow the column order of the phenotype matrices
  trait_labels <- c("FT @ 16C", "PrimaryDormancy", "Relative Root GR", "Root Horiz Index","Delta_13C", "StomataDensity", "StomatalIndex 1stlf", "Vernalization", "Perc. Germination","Growth rate","RGR","ABA")
  colnames(df) <- trait_labels
  colnames(df_raw) <- trait_labels
  
  df <- apply(df, MARGIN = 2, as.numeric)
  df_raw <- apply(df_raw, MARGIN = 2, as.numeric)
  
  colSums(is.na(df_raw))
  ## only 3 data points shared between these phenotypes
  f <- na.omit(df_raw[,c("FT @ 16C", "Growth rate")])
  cor.test(f[,1], f[,2])
  
  #target_Pcor <- cor(df, use = "pairwise.complete.obs")
  target_Pcor <- rcorr(df, type = "pearson")
  #target_Pcor_raw <- cor(df_raw, use = "pairwise.complete.obs")
  target_Pcor_raw <- rcorr(df_raw, type="pearson")
  
  # replace the lower triangle with the raw correlations
  target_Pcor$r[lower.tri(target_Pcor$r)] <- target_Pcor_raw$r[lower.tri(target_Pcor_raw$r)]
  target_Pcor$P[lower.tri(target_Pcor$P)] <- target_Pcor_raw$P[lower.tri(target_Pcor_raw$P)]
  # make the diagonal 0
  diag(target_Pcor$r) <- 0
  diag(target_Pcor$P) <- 0
  
  dim(target_Pcor$P)
  
  saveRDS(target_Pcor, file="./data/Phenotype_Imp&Raw_TargetCorrelations.rda")
  png(file="./figs/Phenotype_Imp&Raw_Corplot.png")
  corrplot(corr = target_Pcor$r, p.mat = target_Pcor$P, method = "color", type = "full", diag = F,tl.cex = .7, col=brewer.pal(9,'RdBu'), addCoef.col = "white", number.cex = 1, tl.srt = 45)
  dev.off()
}else{
  library(corrplot)      ## corrplot()
  library(RColorBrewer)  ## brewer.pal()
  target_Pcor <- readRDS(file="./data/Phenotype_Imp&Raw_TargetCorrelations.rda")
  corrplot(corr = target_Pcor$r, p.mat = target_Pcor$P, method = "color", type = "full", diag = F,tl.cex = .7, col=brewer.pal(9,'RdBu'), addCoef.col = "white", number.cex = 1, tl.srt = 45)
  #corrplot(corr = target_Pcor$r, method = "color", type = "full", diag = F,tl.cex = .5, col=brewer.pal(9,'RdBu'), addCoef.col = "white", number.cex = 1, tl.srt = 45)
  
}

#![Phenotype correlations. The lower triangle shows pairwise-complete correlations between raw phenotype data; the upper triangle shows correlations of imputed data for the 1,135 natural accessions of A. thaliana](~/safedata/natvar/figs/Phenotype_Imp&Raw_Corplot.pdf)

```


################################################################################
## III.2 Genome-Wide Associations 
################################################################################

### Genome-Wide Associations (GWA)
We used 1,135 wild strains (ecotypes) and 11,769,920 SNPs from the 1001 Genomes Project (1001 Genomes Consortium 2016) to estimate the genetic basis of all 1850 phenotypes. For each phenotype, we conducted GWA using the univariate linear mixed model (LMM) implemented in GEMMA (Zhou and Stephens 2014). In this model, the genotype at each Single Nucleotide Polymorphism (SNP) is represented as the number of alternative alleles at that position, mean-centered and variance-scaled, and the effect size of a SNP is the deviation in the trait with respect to the population mean. Correlated effects of SNPs due to population structure and linkage disequilibrium are corrected using a background genome (strain) effect that follows a multivariate normal distribution centered at zero, with a variance given by a kinship matrix, which is intended to allow effect sizes to be calculated in an uncorrelated genotype space. This further enables us to study correlations between traits free of confounding from population genetic structure. No minor allele frequency cutoff was used, so SNPs of all allele frequencies were included in the model.
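
To illustrate the genotype coding described above, the following minimal sketch standardizes a small simulated genotype matrix and builds a relatedness matrix from it. The genotypes are simulated and the object names (`G`, `Z`, `K`) are placeholders that do not appear elsewhere in the analysis. The kinship matrix used in our GWA was the precomputed `1001gbi.sXX.txt` file passed to GEMMA, not this toy calculation.

```r
## Toy example: simulated genotypes, not the 1001 Genomes data
set.seed(1)
n_acc <- 20    # number of accessions (hypothetical)
n_snp <- 200   # number of SNPs (hypothetical)

## alternative-allele counts (0, 1 or 2) for each accession x SNP
alt_freq <- runif(n_snp, min = 0.1, max = 0.9)
G <- sapply(alt_freq, function(f) rbinom(n_acc, size = 2, prob = f))

## drop SNPs that are monomorphic in this sample (zero variance cannot be scaled)
G <- G[, apply(G, 2, var) > 0, drop = FALSE]

## mean-center and variance-scale each SNP
Z <- scale(G, center = TRUE, scale = TRUE)

## relatedness (kinship) matrix from the standardized genotypes
K <- tcrossprod(Z) / ncol(Z)

dim(K)              # n_acc x n_acc
range(colMeans(Z))  # all approximately 0
```

Each column of `Z` has mean zero and unit variance, so rare and common variants contribute on the same scale to `K`.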


```{r, echo=F, eval=F, warning=F }
RERUN=F 
if(RERUN){
  
  ## These are the four phenotype datasets you could run GWAS with; raw, imputed, normalized, imputed-normalized.
  pheno<- read.table(file="./data/atlas1001_phenotypes_matrix.csv", sep=",", header = T)[,-1884]
  #pheno_imp <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_onlypheno.csv', sep=",", header = T)[,-c(1,2,-1884)]
  #pheno_raw_quant <- read.table(file = 'data/atlas1001_rawPheno_Quantile.tsv', sep = "\t", header = T)[,-1]
  #pheno_imp_quant <- read.table(file = 'data/atlas1001_imputedPheno_Quantile.tsv', sep = "\t", header = T)[,-1]
  
  dirs <- read.table('./data-raw/atlas_phenotype_names.csv', sep=",", header = T)  ## load list of phenotype directories
  allInfo <- read.csv("./data-raw/allarabidopsisaccessions.csv",fill=T)    ## info about each ecotype
  fam <- read.table("./data-raw/1001gbi.fam")
  allpheno <- merge(x=fam, by.x='V1',all.x=TRUE , y=allInfo, by.y='pk')
    
  notparsed <- character(0)  ## folders without parsed phenotype files
  #for(i in 1:nrow(dirs)){
  for(i in 1:2){ # this was just for a test
    name <- paste0(getwd(),"/phenotypes/", as.character(dirs[i,1])) ## store name of directory you are working in currently
    folder <- paste0(name,"/data") ## name of the folder with the quantile transformed phenotype data
    phenos <- list.files(folder,pattern=".pheno") ## list of all phenotypes in that folder
    
    
    ###ended here
      ## print this to know where you are while running the loop
      print(name)
    
      if(length(phenos)==0){  ## record the name of the folder with phenotypes that are not parsed - shouldn't happen at this point
        print(paste("not parsed! ",name))
        notparsed<-append(notparsed,name)
      }else{
    
        print(paste("# phenotypes ",length(phenos)))
        for(j in 1:length(phenos)) {  ## go through all phenotypes
    
          phenoname<-gsub(pattern=".pheno",replacement="",phenos[j])
          if(phenoname=="phenotype_value") phenoname <- name
          tmp<-read.delim(paste0(folder,"/",phenos[j]),header=T)  ## read in phenotype data
          #tmp<-read.csv(paste0(folder,"/",phenos[j]),header=T)  ## read in phenotype data
          newfam<-merge(fam[,1:5],by.x='V1',tmp,by.y='accession_id',all.x=T)  ## create GWA
    
          ## accessions with multiple phenotype rows: keep the first value per ID
          newfam<-newfam %>% group_by(V1,V2,V3,V4,V5) %>%
            summarise(averagepheno=head(phenotype_value,1))
          colnames(newfam)[6] <- phenoname
          newfam[,phenoname][is.na(newfam[,phenoname])] <- -9
    
          # write fam
          gemmafolder<-paste0(name,"/1001")
          gemmafolderpheno<-paste0(gemmafolder,"/",phenoname)
          ## create the GEMMA folders if needed (no-op when they already exist)
          dir.create(gemmafolderpheno, recursive = TRUE, showWarnings = FALSE)
    
          write.table(quote=F,row.names=F,col.names=F,
                      file=paste0(gemmafolderpheno,'/1001gbi.fam'),
                      x=newfam)
          system(paste('ln -f ../../1001g/1001gbi.bim ', paste0(gemmafolderpheno,'/','1001gbi.bim')))
          system(paste('ln -f ../../1001g/1001gbi.bed ', paste0(gemmafolderpheno,'/','1001gbi.bed')))
          system(paste('ln -f ../../1001g/1001gbi.sXX.txt ', paste0(gemmafolderpheno,'/','1001gbi.sXX.txt')))
          write.table(quote=F,row.names=F,col.names=F,
                      file=paste0(gemmafolderpheno,'/','rungwa.sh'),
                      x=rbind(
                        "#!/bin/bash",
                        "#SBATCH --time=0-15:00",
                        "#SBATCH --cpus-per-task=1",
                        "#SBATCH --mem-per-cpu=6G",
                        "#SBATCH --partition=DPB,SHARED",
                        paste0("#SBATCH --job-name=",name),
                        paste0("#SBATCH --output=",name,".slurm.log"),
                        paste('gemma -bfile 1001gbi -miss 0.99 -maf 0 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -o ',paste0(name,'norm.lmm'))
                      )
          )
          ## submit only if the association output is missing; the prefix must match the -o argument above
          if(!file.exists(paste0(gemmafolderpheno,'/output/',name,'norm.lmm.assoc.txt'))) system( paste('cd ', gemmafolderpheno, ";",'sbatch rungwa.sh'))
          #system( paste('cd ', gemmafolderpheno, ";",'sbatch rungwa.sh'))
          #print(paste("count is", count, "-finished at i:", i, "and j:", j))
          #count <- count + 1
        } ## end for j loop
      } ## end else statement
    } ## end for i loop
    ## on all the datasets
    
    ## blsmm on the smaller dataset
  }else{
      
    }   
```

### Correlation of effect sizes using data normalization and imputation

Following the GWA with the raw, normalized, and imputed phenotype datasets, we compared the effect size estimates across datasets. For each phenotype, we calculated Pearson's correlation coefficients between the effect size estimates from each pair of runs.
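
The comparison can be sketched as follows for a single phenotype. The two small tables are simulated stand-ins for GEMMA association outputs. Only the column names `rs`, `af`, `beta` and `se` are taken from the analysis code; everything else (`assoc_a`, `assoc_b`, the values) is hypothetical.

```r
## Toy example: two simulated association tables for the same phenotype
set.seed(2)
snps <- paste0("snp", 1:500)
true_beta <- rnorm(500)

assoc_a <- data.frame(rs = snps,
                      af = runif(500, 0.01, 0.5),
                      beta = true_beta + rnorm(500, sd = 0.2),
                      se = runif(500, 0.05, 0.2))
## the second run is missing some SNPs, as can happen after filtering
assoc_b <- data.frame(rs = snps[1:450],
                      af = assoc_a$af[1:450],
                      beta = true_beta[1:450] + rnorm(450, sd = 0.2),
                      se = runif(450, 0.05, 0.2))

## keep SNPs present in both runs, then correlate the effect size estimates
both <- merge(assoc_a, assoc_b, by = "rs", suffixes = c(".a", ".b"))
beta_cor <- cor(both$beta.a, both$beta.b, method = "pearson")
beta_cor
```

Merging on `rs` before calling `cor()` ensures that the two effect size vectors are aligned SNP by SNP, even when the runs do not report exactly the same set of variants.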

```{r, echo=F, eval=T, message=F, warning=F, fig.cap="Figure SIII.3 Comparison of Pearson's correlation coefficients for effect size estimates from GWA using raw phenotype data, normalized phenotype data, and imputed phenotype data.", fig.width=8, fig.height=3.5}
## code to check correlations of effects sizes across the runs
## compilation code is in CheckGWA.R
RERUN=F 
if(RERUN){
  library(dplyr)
  library(data.table)
  library(ggplot2)
  
  ## 1. Prepare initial data
  setwd("~/safedata/natvar/phenotypes/")
  dirs <- read.table('../phenotypes/phenotypes_list.txt')  ## load list of phenotype directories
  
  #### To check Raw vs. Normalized, and/or Imputed vs. one of those
  GwaCors_RawvNorm <- matrix(NA,ncol = 4, nrow=2000)
  GwaCors_RawvImput <- matrix(NA,ncol = 4, nrow=2000)
  GwaCors_NormvImput <- matrix(NA,ncol = 4, nrow=2000)
  
  load(file = "../data/atlas1001_phenotype_matrix_imputed_onlypheno.rda")
  #length(colnames(atlas1001_phenotype_matrix_imputed_onlypheno)[-1])
  impute_phenoname_index <- colnames(atlas1001_phenotype_matrix_imputed_onlypheno)[-1]
  #head(impute_phenoname_index)
  
  c <- 1
  #i<-1
  for(i in 1:nrow(dirs)){
    #for(i in 1:2){ # this was just for a test
  
    ##Adjustments for schmity runs
    #name <- "Schmitt_Martinez-Berdeja_PNAS_2020_PID_foreditMR"
    name <- as.character(dirs[i,]) ## store name of directory you are working in currently
  
    folder1 <- paste0(name,"/transformed_data") ## name of the folder with the quantile transformed phenotype data
    phenos1 <- list.files(folder1,pattern=".pheno") ## list of all phenotypes in that folder
  
    folder2 <- paste0(name,"/data") ## name of the folder with the raw (untransformed) phenotype data
    phenos2 <- list.files(folder2,pattern=".pheno") ## list of all phenotypes in that folder
  
    ## print this to know where you are while running the loop
    print(name)
    if(length(phenos1)==0){  ## record the name of the folder with phenotypes that are not parsed - shouldn't happen at this point
      print(paste("not parsed! ",name))
      c <- c+1
      print(c)
      #notparsed<-append(notparsed,name)
    }else{
      for(j in 1:length(phenos1)) {  ## go through all phenotypes
  
        #print(j)
        phenoname1<-gsub(pattern=".pheno",replacement="",phenos1[j])
        phenoname2<-gsub(pattern=".pheno",replacement="",phenos2[j])
  
        gemmafolder<-paste0(name,"/1001")
        output_pheno1<-paste0(gemmafolder,"/",phenoname1,"/output/")
        output_pheno2<-paste0(gemmafolder,"/",phenoname2,"/output/")
  
        res1 <- list.files(output_pheno1)[1]
        id <- which(impute_phenoname_index == phenoname2)
        output_imputed <- paste0("../ImputedPhenoGwas/output/pheno", id, ".assoc.txt")
        print(c(phenoname2, impute_phenoname_index[id]))
  
        if (file.exists(paste0(output_pheno1, res1)) & file.exists(file=paste0(output_pheno2, res1)) &
            file.exists(file=output_imputed)){
  
          assoc_norm <- fread(file=paste0(output_pheno1, res1))
          assoc_raw <- fread(file=paste0(output_pheno2, res1))
          assoc_imput <- fread(file=output_imputed)
  
          ## Raw v. Norm
          tmp <- merge(x=assoc_raw, by.x = "rs", y=assoc_norm, by.y="rs", all.x = T)
          tmp <- tmp[!is.na(tmp$af.y),]
          af <- cor(tmp$af.x, tmp$af.y)
          beta <- cor(tmp$beta.x, tmp$beta.y)
          se <- cor(tmp$se.x, tmp$se.y)
          GwaCors_RawvNorm[c,2:4] <- c(af, beta, se)
          GwaCors_RawvNorm[c,1] <- phenoname2
          print(c("RawvNorm:", af, beta, se))
  
          ## Raw v. Imput
          tmp <- merge(x=assoc_raw, by.x = "rs", y=assoc_imput, by.y="rs", all.x = T)
          tmp <- tmp[!is.na(tmp$af.y),]
          af <- cor(tmp$af.x, tmp$af.y)
          beta <- cor(tmp$beta.x, tmp$beta.y)
          se <- cor(tmp$se.x, tmp$se.y)
          GwaCors_RawvImput[c,2:4] <- c(af, beta, se)
          GwaCors_RawvImput[c,1] <- phenoname2
          print(c("RawvImput:", af, beta, se))
  
          ## Norm v. Imput
          tmp <- merge(x=assoc_norm, by.x = "rs", y=assoc_imput, by.y="rs", all.x = T)
          tmp <- tmp[!is.na(tmp$af.y),]
          af <- cor(tmp$af.x, tmp$af.y)
          beta <- cor(tmp$beta.x, tmp$beta.y)
          se <- cor(tmp$se.x, tmp$se.y)
          GwaCors_NormvImput[c,2:4] <- c(af, beta, se)
          GwaCors_NormvImput[c,1] <- phenoname2
          print(c("NormvImput:", af, beta, se))
  
  
        }
        c <- c+1
        print(c)
        # samp <- sample(seq(1,nrow(tmp),1), size = 10000, replace = F)
        # par(mfrow=c(1,3))
        # plot(tmp$af.x[samp], tmp$af.y[samp], main="allele freq correlation")
        # plot(tmp$beta.x[samp], tmp$beta.y[samp], main="beta correlation")
        # plot(tmp$se.x[samp], tmp$se.y[samp], main="se correlation")
  
      } ## end for j loop
    } ##else
  } ## end for i loop
  
  
  # saveRDS(GwaCors_RawvNorm, file = "../MeganAnalysis/GwaCor_RawvNorm.rda")
  # saveRDS(GwaCors_RawvImput, file = "../MeganAnalysis/GwaCor_RawvImput.rda")
  # saveRDS(GwaCors_NormvImput, file = "../MeganAnalysis/GwaCor_NormvImput.rda")
  
    ### Comparing raw vs. normalized gwa output
  GwaCors_RawvNorm <- readRDS(file = "./MeganAnalysis/GwaCor_RawvNorm.rda")
  head(GwaCors_RawvNorm)
  colnames(GwaCors_RawvNorm) <- c("pheno", "af", "beta", "se")
  GwaCors_RawvNorm<- as.data.frame(GwaCors_RawvNorm[!is.na(GwaCors_RawvNorm[,2]),])
  GwaCors_RawvNorm[,3] <- as.numeric(paste(GwaCors_RawvNorm[,3]))
  raw_v_norm_PLT <- ggplot(GwaCors_RawvNorm, aes(x=beta)) + xlab("beta correlation") +
    geom_histogram(aes(y=..density..), bins =50 , colour="black", fill="white")+ ggtitle("Raw vs. Normalized") +
    geom_density(alpha=.2, fill="#FF6666")
    
  ## some code for plotting Raw v Imputed
  GwaCors_RawvImput <- readRDS( file = "./MeganAnalysis/GwaCor_RawvImput.rda")
  head(GwaCors_RawvImput)
  colnames(GwaCors_RawvImput) <- c("pheno", "af", "beta", "se")
  GwaCors_RawvImput<- as.data.frame(GwaCors_RawvImput[!is.na(GwaCors_RawvImput[,3]),])
  GwaCors_RawvImput[,3] <- as.numeric(paste(GwaCors_RawvImput[,3]))
  ## good plot
  raw_v_imputed_PLT <- ggplot(GwaCors_RawvImput, aes(x=beta)) + xlab("beta correlation") +
    geom_histogram(aes(y=..density..), bins =50 , colour="black", fill="white") + ggtitle("Raw vs. Imputed") +
    geom_density(alpha=.2, fill="#FF6666")

  ## some code for plotting Norm v Imputed
  GwaCors_NormvImput <- readRDS(file = "./MeganAnalysis/GwaCor_NormvImput.rda")
  colnames(GwaCors_NormvImput) <- c("pheno", "af", "beta", "se")
  GwaCors_NormvImput<- as.data.frame(GwaCors_NormvImput[!is.na(GwaCors_NormvImput[,3]),])
  GwaCors_NormvImput[,3] <- as.numeric(paste(GwaCors_NormvImput[,3]))
  ## good plot
  norm_v_imputed_PLT <- ggplot(GwaCors_NormvImput, aes(x=beta)) + xlab("beta correlation") +
    geom_histogram(aes(y=..density..), bins =50 , colour="black", fill="white") + ggtitle("Normalized vs. Imputed") +
    geom_density(alpha=.2, fill="#FF6666")
  
  saveRDS(raw_v_norm_PLT, file="./figs/tmpobjects/raw_v_norm_PLT.rda")
  saveRDS(raw_v_imputed_PLT, file="./figs/tmpobjects/raw_v_imputed_PLT.rda")
  saveRDS(norm_v_imputed_PLT, file="./figs/tmpobjects/norm_v_imputed_PLT.rda")
  
  
}else{
  
  raw_v_norm_PLT <- readRDS( file="./figs/tmpobjects/raw_v_norm_PLT.rda")
  raw_v_imputed_PLT <-  readRDS( file="./figs/tmpobjects/raw_v_imputed_PLT.rda")
  norm_v_imputed_PLT <- readRDS(file="./figs/tmpobjects/norm_v_imputed_PLT.rda")
  plot_grid(raw_v_norm_PLT, norm_v_imputed_PLT, raw_v_imputed_PLT, nrow = 1)

  ## Which phenotypes have very low correlations, and why? Typically those with a lot of missing data.
  ## See the script "CheckGWA.R" for more info.
  ## In raw vs. norm, most low correlations come from phenotypes missing data for almost all (>1000) individuals.
}
  
```

```{r, echo=F, eval=F}
## We also assessed the accuracy of the calls given that we did not use any allele frequency cutoff. We can post-filter the data and remove SNPs with very low call rates
## code assessing the missing data

```

Additionally, from the original imputed matrix from http://arapheno.1001genomes.org, which contains 10,709,466 SNPs, we subsetted 1,353,386 SNPs across all 1,135 _Arabidopsis thaliana_ wild strains for use in a Bayesian sparse linear mixed model (BSLMM) GWA, also implemented in GEMMA (Zhou and Stephens 2014), for each of the 1846 phenotypes. The full BSLMM is similar to the LMM. Unlike the LMM, which assumes that SNP effect sizes are normally distributed, the BSLMM assumes that effect sizes follow a mixture of normal distributions. Effect sizes are attributed either to direct effects, i.e. those of SNPs that directly affect the phenotype, or to indirect effects, i.e. the background effects that all SNPs have. As in the LMM, correlated effects of SNPs due to population structure and linkage disequilibrium are corrected using a background genome (strain) effect that follows a multivariate normal distribution centered at zero, with a variance given by a kinship matrix.
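
For reference, the BSLMM is commonly written as below. The notation is introduced here only for illustration and is not used elsewhere in this appendix: $y$ is the phenotype vector for $n$ accessions, $X$ the genotype matrix, $K$ the kinship matrix, $\pi$ the proportion of SNPs with a sparse (direct) effect, $\delta_0$ a point mass at zero, and $\tau^{-1}$ the residual variance. This summarizes the standard formulation and is not a statement of the exact GEMMA settings we used.

$$
\begin{aligned}
y &= 1_n \mu + X\tilde{\beta} + u + \varepsilon, \\
\tilde{\beta}_i &\sim \pi\, N(0, \sigma_a^2 \tau^{-1}) + (1-\pi)\, \delta_0, \\
u &\sim \mathrm{MVN}_n(0, \sigma_b^2 \tau^{-1} K), \\
\varepsilon &\sim \mathrm{MVN}_n(0, \tau^{-1} I_n).
\end{aligned}
$$

Here $\tilde{\beta}$ carries the sparse effects and $u$ the background effect shared through kinship. Setting $\pi = 0$ removes the sparse term and leaves a model with only the kinship random effect.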

We assessed the convergence of the BSLMM GWA analysis below.
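
A minimal sketch of the kind of summary used for this check: discard the early part of a chain, then report the mean, median and 95% equal-tail interval of a hyperparameter. The chain is simulated and `summarise_chain()` is a hypothetical helper, not a function from GEMMA or from our scripts. The `burnin` value is arbitrary.

```r
## Hypothetical helper: posterior summary of one hyperparameter after burn-in
summarise_chain <- function(x, burnin = 0) {
  kept <- x[seq_along(x) > burnin]          # drop the first `burnin` samples
  q <- quantile(kept, probs = c(0.5, 0.025, 0.975), na.rm = TRUE)
  c(mean = mean(kept, na.rm = TRUE),
    median = unname(q[1]),
    lower95 = unname(q[2]),
    upper95 = unname(q[3]))
}

## Toy chain: simulated values standing in for sampled PVE
set.seed(3)
pve_chain <- c(seq(0.05, 0.6, length.out = 200),        # drifting start (not yet converged)
               rbeta(1800, shape1 = 60, shape2 = 40))   # stationary part, mean 0.6

summarise_chain(pve_chain, burnin = 200)

## crude stationarity check: the two halves of the kept chain should agree
kept <- pve_chain[-(1:200)]
half <- length(kept) / 2
c(first_half = mean(kept[1:half]),
  second_half = mean(kept[(half + 1):length(kept)]))
```

Agreement between the two halves is only a rough indicator. Trace plots of each hyperparameter remain the more informative visual check.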

```{r, echo=F, eval=F, message=F, warning=F}
## code to check correlations of effects sizes across the runs
## compilation code is in CheckGWA.R
RERUN=F 
if(RERUN){
  ## 0.
library(dplyr)
library(data.table)
hqsn <- read.table(file="~/safedata/natvar/hqsnps515nature.txt")

## 1. Prepare initial data
setwd("~/safedata/natvar/phenotypes")
dirs <- read.table('../phenotypes/phenotypes_list.txt')

#mega_hypparams_table <- c()
mega_hypparams_table <- readRDS(file="../tables/mega_hypparams_table.rda")

notparsed <- character(0)  ## folders without parsed phenotype files
for(i in 52:nrow(dirs)){
  name <- as.character(dirs[i,]) ## store name of directory you are working in currently
  folder <- paste0(name,"/transformed_data") ## name of the folder with the quantile transformed phenotype data
  phenos <- list.files(folder,pattern=".pheno") ## list of all phenotypes in that folder

  ## print this to know where you are while running the loop
  print(name)

  if(length(phenos)==0){  ## record the name of the folder with phenotypes that are not parsed - shouldn't happen at this point
    print(paste("not parsed! ",name))
    notparsed<-append(notparsed,name)
  }else{
    print(paste("# phenotypes ",length(phenos)))
    for(j in 1:length(phenos)) {  ## go through all phenotypes

      phenoname<-gsub(pattern=".pheno",replacement="",phenos[j])
      phenoname <- gsub(pattern = "norm", replacement = "bslmm", x = phenoname)
      gemmafolder<-paste0(name,"/1001")
      gemmafolder_output <- paste0(gemmafolder,"/",phenoname, "/output")

      #hypParamFile <- paste0(gemmafolder_output, "/",name, "_", phenoname, ".lmm.hyp.txt")
      hypParamFile <- paste0(gemmafolder_output, "/",name, "_", phenoname, ".assoc.txt.hyp.txt")
      hyp.params<-read.table(hypParamFile,header=T)

      # Get mean, median, and 95% ETPI of hyperparameters
      # ==============================================================================
      # pve -> proportion of phenotypic variance explained by the genotypes
      # keep recorded MCMC samples 20,001-100,000 (the first 20,000 are dropped)
      hyp.params <- hyp.params[20001:100000,]
      pve<-c("PVE", mean(hyp.params$pve, na.rm=T),quantile(hyp.params$pve, probs=c(0.5,0.025,0.975), na.rm=T))

      # pge -> proportion of genetic variance explained by major effect loci
      pge<-c("PGE",mean(hyp.params$pge, na.rm=T),quantile(hyp.params$pge, probs=c(0.5,0.025,0.975), na.rm=T))

      # pi -> proportion of variants with non-zero effects
      pi<-c("pi",mean(hyp.params$pi, na.rm=T),quantile(hyp.params$pi, probs=c(0.5,0.025,0.975), na.rm=T))

      # n.gamma -> number of variants with major effect
      n.gamma<-c("n.gamma",mean(hyp.params$n_gamma, na.rm=T),quantile(hyp.params$n_gamma, probs=c(0.5,0.025,0.975), na.rm=T))
      # ==============================================================================

      # get table of hyperparameters
      # ==============================================================================
      hyp.params.table<-as.data.frame(rbind(pve,pge,pi,n.gamma),row.names=F)
      colnames(hyp.params.table)<-c("hyperparam", "mean","median","2.5%", "97.5%")
      # show table
      hyp.params.table

      parm <- rep(hyp.params.table[,1], each=4)
      #paste0(parm, colnames(hyp.params.table[2:5]))
      dat <- cbind(hyp.params.table[1,2:5], hyp.params.table[2,2:5], hyp.params.table[3,2:5], hyp.params.table[4,2:5])
      colnames(dat) <- paste(parm, colnames(hyp.params.table[2:5]), sep="_")
      dat

      mega_hypparams_table <- rbind(mega_hypparams_table, dat)

      # write table to file
      TableFile <- paste0(gemmafolder_output, "/",name, "_", phenoname, ".hyperparameteres.tsv")
      write.table(hyp.params.table, file=TableFile, sep="\t", quote=F, col.names = T)
      # ==============================================================================

      # plot traces and distributions of hyperparameters
      # ==============================================================================
      OutputPDF <- paste0(gemmafolder_output, "/",name, "_", phenoname, ".hyperparameteres.pdf")
      pdf(file=OutputPDF, width=8.3,height=11.7)

      #layout(matrix(c(1,1,2,3,4,4,5,6), 4, 2, byrow = TRUE))
      #layout(matrix(c(1,1,2,3,4,4,5,6), byrow = TRUE))

      par(mfrow = c(3, 3)) # Create a 3 x 3 plotting grid (8 panels are used)
      # PVE
      # ------------------------------------------------------------------------------
      plot(hyp.params$pve, type="l", ylab="PVE", main="PVE - trace")
      hist(hyp.params$pve, main="PVE - posterior distribution", xlab="PVE")
      #plot(density(hyp.params$pve), main="PVE - posterior distribution", xlab="PVE")
      # ------------------------------------------------------------------------------

      # PGE
      # ------------------------------------------------------------------------------
      plot(hyp.params$pge, type="l", ylab="PGE", main="PGE - trace")
      hist(hyp.params$pge, main="PGE - posterior distribution", xlab="PGE")
      #plot(density(hyp.params$pge), main="PGE - posterior distribution", xlab="PGE")
      # ------------------------------------------------------------------------------

      # pi
      # ------------------------------------------------------------------------------
      plot(hyp.params$pi, type="l", ylab="pi", main="pi")
      hist(hyp.params$pi, main="pi", xlab="pi")
      #plot(density(hyp.params$pi), main="pi", xlab="pi")
      # ------------------------------------------------------------------------------

      # No gamma
      # ------------------------------------------------------------------------------
      plot(hyp.params$n_gamma, type="l", ylab="n_gamma", main="n_gamma - trace")
      hist(hyp.params$n_gamma, main="n_gamma - posterior distribution", xlab="n_gamma")
      #plot(density(hyp.params$n_gamma), main="n_gamma - posterior distribution", xlab="n_gamma")
      ## ------------------------------------------------------------------------------

      #plot_grid(plot(hyp.params$pve, type="l", ylab="PVE", main="PVE - trace"),
               #hist(hyp.params$pve, main="PVE - posterior distribution", xlab="PVE"))


      dev.off()

      # Load parameters output
      # ==============================================================================
      #ParamFile <- paste0(gemmafolder_output, "/",name, "_", phenoname, ".lmm.param.txt")
      # ParamFile <- paste0(gemmafolder_output, "/",name, "_", phenoname, ".assoc.txt.param.txt")
      # params<-fread(ParamFile,header=T,sep="\t", data.table=F)
      #
      # # Get variants with sparse effect size on phenotypes
      # # ==============================================================================
      # # add sparse effect size (= beta * gamma) to data frame
      # params["eff"]<-abs(params$beta*params$gamma)
      #
      # # get variants with effect size > 0
      # params.effects<-params[params$eff>0,]
      #
      # # show number of variants with measurable effect
      # nrow(params.effects)
      #
      # # sort by descending effect size
      # params.effects.sort<-params.effects[order(-params.effects$eff),]
      #
      # # show top 10 variants with highest effect
      # head(params.effects.sort, 10)
      #
      # # variants with the highest sparse effects
      # # ------------------------------------------------------------------------------
      # # top 1% variants (above 99% quantile)
      # top1<-params.effects.sort[params.effects.sort$eff>quantile(params.effects.sort$eff,0.99),]
      # # top 0.1% variants (above 99.9% quantile)
      # top01<-params.effects.sort[params.effects.sort$eff>quantile(params.effects.sort$eff,0.999),]
      # # top 0.01% variants (above 99.99% quantile)
      # top001<-params.effects.sort[params.effects.sort$eff>quantile(params.effects.sort$eff,0.9999),]
      # # ------------------------------------------------------------------------------
      # #
      # # # write tables
      #  pref <- paste0(gemmafolder_output, "/",name, "_", phenoname)
      # write.table(top1, file=paste0(pref, ".top1eff.dsv"), quote=F, row.names=F, sep="\t")
      # write.table(top01, file=paste0(pref, ".top0.1eff.dsv"), quote=F, row.names=F, sep="\t")
      # write.table(top001, file=paste0(pref, ".top0.01eff.dsv"), quote=F, row.names=F, sep="\t")
      # # ==============================================================================

      # Get variants with high Posterior Inclusion Probability (PIP) == gamma
      # ==============================================================================
      # PIP is the frequency a variant is estimated to have a sparse effect in the MCMC

      # sort variants by descending PIP
      # params.pipsort<-params[order(-params$gamma),]
      #
      # # Show top 10 variants with highest PIP
      # head(params.pipsort,10)
      #
      # # sets of variants above a certain threshold
      # # variants with effect in 1% MCMC samples or more
      # pip01<-params.pipsort[params.pipsort$gamma>=0.01,]
      # # variants with effect in 10% MCMC samples or more
      # pip10<-params.pipsort[params.pipsort$gamma>=0.10,]
      # # variants with effect in 25% MCMC samples or more
      # pip25<-params.pipsort[params.pipsort$gamma>=0.25,]
      # # variants with effect in 50% MCMC samples or more
      # pip50<-params.pipsort[params.pipsort$gamma>=0.50,]
      #
      #
      # # plot variants PIPs across linkage groups/chromosomes
      # # ==============================================================================
      # # Prepare data
      # # ------------------------------------------------------------------------------
      # # add linkage group column (chr)
      # chr<-gsub("lg|_.+","",params$rs)
      # params["chr"]<-chr
      #
      # # sort by linkage group and position
      # params.sort<-params[order(as.numeric(params$chr), params$rs),]
      #
      # # get list of linkage groups/chromosomes
      # chrs<-sort(as.numeric(unique(chr)))
      # # ------------------------------------------------------------------------------
      #
      # # Plot to a png file because the number of dots is very high
      # # drawing this kind of plot over the network is very slow
      # # also opening vectorial files with many objects is slow
      # # ------------------------------------------------------------------------------
      # # ------------------------------------------------------------------------------
      # png(file=paste0(pref, ".pip_plot.png"), width=11.7,height=8.3,units="in",res=200)
      #
      # # set up empty plot
      # plot(-1,-1,xlim=c(0,nrow(params.sort)),ylim=c(0,.1),ylab="PIP",xlab="linkage group", xaxt="n")
      #
      #
      # # plot grey bands for chromosome/linkage groups
      # # ------------------------------------------------------------------------------
      # start<-1
      # lab.pos<-vector()
      # for (ch in chrs){
      #   size<-nrow(params.sort[params.sort$chr==ch,])
      #   cat ("CH: ", ch, "\n")
      #   colour<-"light grey"
      #   if (ch%%2 > 0){
      #     polygon(c(start,start,start+size,start+size,start), c(0,1,1,0,0), col=colour, border=colour)
      #   }
      #   cat("CHR: ", ch, " variants: ", size, "(total: ", (start+size), ")\n")
      #   txtpos<-start+size/2
      #   lab.pos<-c(lab.pos, txtpos)
      #
      #   start<-start+size
      # }
      #
      # # Add variants outside linkage groups
      # #chrs<-c(chrs,"NA")
      # size<-nrow(params.sort[params.sort$chr=="NA",])
      # lab.pos<-c(lab.pos, start+size/2)[1:5]
      # # ------------------------------------------------------------------------------
      #
      # # Add x axis labels
      # axis(side=1,at=lab.pos,labels=chrs,tick=F)
      #
      # # plot PIP for all variants
      # # ------------------------------------------------------------------------------
      # # rank of variants across linkage groups
      # x<-seq(1,length(params.sort$gamma),1)
      # # PIP
      # y<-params.sort$gamma
      # # sparse effect size, used for dot size
      # z<-params.sort$eff
      # # log-transform to enhance visibility
      # z[z==0]<-0.00000000001
      # z<-1/abs(log(z))
      # # plot
      # symbols(x,y,circles=z, bg="black",inches=1/5, fg=NULL,add=T)
      # # ------------------------------------------------------------------------------
      #
      # # highligh high PIP variants (PIP>=0.25)
      # # ------------------------------------------------------------------------------
      # # plot threshold line
      # abline(h=0.25,lty=3,col="dark grey")
      # # rank of high PIP variants across linkage groups
      # x<-match(params.sort$gamma[params.sort$gamma>=0.25],params.sort$gamma)
      # # PIP
      # y<-params.sort$gamma[params.sort$gamma>=0.25]
      # # sparse effect size, used for dot size
      # z<-params.sort$eff[params.sort$gamma>=0.25]
      # z<-1/abs(log(z))
      #
      # symbols(x,y,circles=z, bg="red",inches=1/5,fg=NULL,add=T)
      # # ------------------------------------------------------------------------------
      #
      # # add label high PIP variants
      # text(x,y,labels=params.sort$rs[params.sort$gamma>=0.25], adj=c(0,0), cex=0.8)
      # # ------------------------------------------------------------------------------
      # # ------------------------------------------------------------------------------
      #
      # # close device
      # dev.off()
      # ==============================================================================


    }
  }
}

saveRDS(mega_hypparams_table, file="../tables/mega_hypparams_table_2.rda")
}

```

![Example of BSLMM convergence check using Flowering Time of plants grown at 16C.](../phenotypes/1001_Consortium_Cell_2016_PID_27293186/1001/bslmm_FT16/output/1001_Consortium_Cell_2016_PID_27293186_bslmm_FT16.hyperparameteres.pdf)


################################################################################
## III.3 Heritability and Genetic Architecture
################################################################################

### Heritability from GEMMA
Heritability for each trait was estimated from the GWA. In essence, the relatedness (kinship) matrix is used to explain the variation in the phenotype, so that the relatedness among individuals measures the amount of additive genetic variation contributing to the phenotypic variation. We summarize the heritability estimates below for the aforementioned categorization of traits.
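
As a sketch of how the estimate can be read from a GEMMA log without relying on a fixed line number, the example below searches for the PVE line by pattern. The log lines are a made-up excerpt with invented values. The wording of the PVE line (`## pve estimate in the null model = ...`) is an assumption about the log format and should be checked against an actual `.log.txt` file. `extract_pve()` is a hypothetical helper.

```r
## Hypothetical helper: pull the PVE estimate out of the lines of a GEMMA log
extract_pve <- function(log_lines) {
  hit <- grep("pve estimate", log_lines, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_real_)
  ## the estimate is the last whitespace-separated field on the line
  fields <- strsplit(trimws(hit[1]), "\\s+")[[1]]
  as.numeric(fields[length(fields)])
}

## Made-up log excerpt (values invented for illustration)
toy_log <- c("## number of analyzed individuals = 1003",
             "## pve estimate in the null model = 0.7312",
             "## se(pve) in the null model = 0.0415")

extract_pve(toy_log)
```

Returning `NA` when no matching line is found makes missing or truncated logs easy to spot in the summary table, rather than silently reading an unrelated line.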

```{r, echo=FALSE, eval=T, warning=FALSE, fig.cap="Fig. SIII.4 Estimates of SNP-based heritability using raw phenotypes (top row) and normalized phenotypes (bottom row) categorized as drought strategies (left) and general functional categories (right).", fig.height=8, fig.width=8}
## chunk working, tested 9/20
RERUN=F
if(RERUN){
  library(stringr)
  setwd("~/safedata/natvar/")
  dirs <-read.table("./data/atlas_phenotype_names_numaccessions_willannotatemanual_strategies_MR.tsv",
                  header = T,
                  fill = T)
  mynames<- as.character(dirs$phenotype)
  
  ###====== for bslmm files=====#####
  #myfiles<-c(paste0('./phenotypes/',dirs[,1], "/1001/bslmm_", dirs[,2], "/output/", dirs[,1],"_bslmm_", dirs[,2],".assoc.txt.hyp.txt"))
  # hypstats <- matrix(NA, nrow=length(myfiles), ncol = 6)
  # colnames(hypstats) <- c("h", "pve" , "rho", "pge", "pi", "gamma")
  # list_hyp_tables <- list()
  #mega_hypparams_table <- c()
  
  ###====== for lmm files, normalized and raw =====#####
  #myfiles<-c(paste0('phenotypes/',dirs[,1], "/1001/norm_", dirs[,2], "/output/", dirs[,1],"norm.lmm.log.txt"))
   myfiles<-c(paste0('phenotypes/',dirs[,1], "/1001/", dirs[,2], "/output/", dirs[,1],".lmm.log.txt"))
  
  ##only needed with raw data
  ## quick hack bc intern overwrote output file and then left and I don't have permission to delete it.
  tmp_file <- myfiles[grep("Delta", myfiles)]
  #gsub("output", "output_old", tmp_file)
  myfiles[grep("Delta", myfiles)] <- gsub("output", "output_old", tmp_file)
  myfiles[grep("Delta", myfiles)]
  
  stats_df <- data.frame()


  fi <- 1
  for(fi in 1:length(myfiles)){
    print(fi)
    file=myfiles[fi]
    myname<-mynames[fi]
    
    print(file)
    print(myname)
    
    if(file.exists(file)) {
     
      ###======Run this chunk for raw and normalized lmm gwas===###
      all_lines <- readLines(file)
      head(all_lines)

      pve_split <- str_split(all_lines[25],pattern = " ")[[1]]
      pve <- as.numeric(pve_split[length(pve_split)])

      se_pve_split <- str_split(all_lines[26],pattern = " ")[[1]]
      se_pve <- as.numeric(se_pve_split[length(se_pve_split)])

      tmp_df <- data.frame(name = myname, pve, se_pve)
      stats_df <- rbind(stats_df, tmp_df)
      ###===============================================###
      
      ###======Run this chunk for bslmm gwas===###
      # hyp.params <- read.table(file, header =T)
      # # head(hyp.params)
      # 
      # h<-c("h",mean(hyp.params$h),quantile(hyp.params$h, probs=c(0.5,0.025,0.975)))
      # pve<-c("PVE", mean(hyp.params$pve),quantile(hyp.params$pve, probs=c(0.5,0.025,0.975), na.rm=T))
      # # rho-> approximation to proportion of genetic variance explained by variants with major effect (PGE)
      # # rho=0 -> pure LMM, highly polygenic basis
      # # rho=1 -> pure BVSR, few major effect loci
      # rho<-c("rho",mean(hyp.params$rho),quantile(hyp.params$rho, probs=c(0.5,0.025,0.975)))
      # pge<-c("PGE",mean(hyp.params$pge),quantile(hyp.params$pge, probs=c(0.5,0.025,0.975)))
      # # pi -> proportion of variants with non-zero effects
      # pi<-c("pi",mean(hyp.params$pi),quantile(hyp.params$pi, probs=c(0.5,0.025,0.975)))
      # # n.gamma -> number of variants with major effect
      # n.gamma<-c("n.gamma",mean(hyp.params$n_gamma),quantile(hyp.params$n_gamma,probs=c(0.5,0.025,0.975)))
      # hyp.params.table<-as.data.frame(rbind(h,pve,rho,pge,pi,n.gamma),row.names=F)
      # colnames(hyp.params.table)<-c("hyperparam", "mean","median","2.5%", "97.5%")
      # 
      # parm <- rep(hyp.params.table[,1], each=4)
      # #paste0(parm, colnames(hyp.params.table[2:5]))
      # dat <- cbind(hyp.params.table[1,2:5], hyp.params.table[2,2:5], hyp.params.table[3,2:5], 
      #              hyp.params.table[4,2:5], hyp.params.table[5,2:5], hyp.params.table[6,2:5], myname)
      # colnames(dat) <- c(paste(parm, colnames(hyp.params.table[2:5]), sep="_"), "pheno")
      # dat
      # 
      # 
      # mega_hypparams_table <- rbind(mega_hypparams_table, dat)
      ###===============================================###
    }else{
        newfile <- gsub(".lmm.log.txt", "norm.lmm.log.txt", x = file)
        if(file.exists(newfile)){
            all_lines <- readLines(newfile)
            head(all_lines)

            pve_split <- str_split(all_lines[25],pattern = " ")[[1]]
            pve <- as.numeric(pve_split[length(pve_split)])

            se_pve_split <- str_split(all_lines[26],pattern = " ")[[1]]
            se_pve <- as.numeric(se_pve_split[length(se_pve_split)])

            tmp_df <- data.frame(name = myname, pve, se_pve)
            stats_df <- rbind(stats_df, tmp_df)
        }
    }
  }

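  ## BSLMM GWA data //need to save this still
  ## NOTE: mega_hypparams_table is only filled when the commented-out BSLMM block in the loop above is run
  write.table(mega_hypparams_table, file="./tables/BSLMM_GWA_Param_Estimates.tsv", sep="\t", quote = F, col.names = T, row.names = F)
  ## NOTE: stats_df holds the estimates for whichever `myfiles` definition is active above (currently raw),
  ## so run only the matching write.table() call; running both writes the same table to both files
  ## Raw phenotype GWA
  write.table(stats_df, file="./tables/rawGWA_h2estimates.tsv", sep="\t", quote = F, col.names = T, row.names = F)
  ## Normalized phenotype GWA
  # write.table(stats_df, file="./tables/normGWA_h2estimates.tsv", sep="\t", quote = F, col.names = T, row.names = F)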
  
  
 stats_df <- read.table( file="./tables/rawGWA_h2estimates.tsv", sep="\t", header = T)
 stats_df[grep("FT", stats_df$name),]
 
  ######===========Plotting h2===========#########
  dirs <-read.table("./data/atlas_phenotype_names_numaccessions_willannotatemanual_strategies_MR.tsv",
                  header = T,
                  fill = T)
  class(dirs$phenotype)
  dirs$phenotype<- as.character(dirs$phenotype)
  # dirs$phenotype[duplicated(dirs$phenotype)]
  # which(duplicated(dirs$phenotype))
  dirs$phenotype[957]<- "FT.1"
  dirs$phenotype[1496]<-"FT.2"
  dirs$phenotype[1846]<-"DSDS50.1"
  dirs[dirs$phenotype=="ABA_96h_low_water_potential",]
  
  stats_df <- read.table(file="./tables/normGWA_h2estimates.tsv", header=T)
  class(stats_df$name)
  stats_df$name <-  as.character(stats_df$name)
  # stats_df$name[duplicated(stats_df$name)]
  # which(duplicated(stats_df$name))
  stats_df$name[955]<-"FT.1"
  stats_df$name[1494]<-"FT.2"
  stats_df$name[1844]<-"DSDS50.1"
  # head(stats_df)
  #aba.data<- data.frame(name="ABA", pve=0.144507, se_pve=0.26466)
  #stats_df<- rbind(stats_df, aba.data)
    
  stats_table_norm <- merge(dirs, by.x="phenotype", stats_df, by.y="name")
  dim(stats_table_norm)
  saveRDS(stats_table_norm, file="./data/stats_table_norm.rda")
  
  ## Raw data
  stats_df <- read.table(file="./tables/rawGWA_h2estimates.tsv", header=T)
  dim(stats_df)
  which(duplicated(stats_df$name))
  stats_df$name <-  as.character(stats_df$name)
  stats_df$name[956]<-"FT.1"
  stats_df$name[1495]<-"FT.2"
  stats_df$name[1845]<-"DSDS50.1"

  
  stats_table_raw <- merge(dirs, by.x="phenotype", stats_df, by.y="name")
  dim(stats_table_raw)
  saveRDS(stats_table_raw, file="./data/stats_table_raw.rda")
  
  
  ## remove phenotypes we couldn't get data from
  # dirs$phenotype[!dirs$phenotype %in% stats_df$name]
  # which( dirs$phenotype=="Agro_PPV_infection")
  # dirs[680:690,]
  # stats_df[680:690,]
  
  stats_table <- stats_table_raw
  ## Overlapping histograms for h2
  statsSS <- stats_table[!is.na(stats_table$stressstrategy),]
  p1 <- ggplot(statsSS, aes(x = pve, fill = stressstrategy)) +
    geom_density(alpha=0.25) + xlim(0,1) + xlab("h2") +
    theme(legend.position = c(0.35, .7), legend.key.size =unit(.7, 'cm'), 
          legend.text = element_text(size=7), legend.title = element_blank()) +
    scale_fill_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'), "Tolerance"=transparent("navy")))
  statsPC <- stats_table[!is.na(stats_table$phenotypecategory),]
  p2 <- ggplot(statsPC, aes(x = pve, fill = phenotypecategory), show.legend=F) +
    geom_density(alpha=0.25) + xlim(0,1) +xlab("h2") +
    scale_fill_manual("",values = c("Defense"=transparent("red3"),"Development"=transparent('darkorange1'), 
                                    "Ionomics"=transparent("goldenrod1"), 
                                    "Metabolite"=transparent("yellow"), "Microbiome"=transparent("limegreen"), 
                                    "Reproduction"=transparent("dodgerblue1" ))) +
    theme(legend.position = c(0.35, .7), legend.key.size =unit(.5, 'cm'), 
          legend.text = element_text(size=7), legend.title = element_blank()) + 
    ylab("")
  
  
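  h2_denistyplot<-plot_grid(p1, p2)
  ## stats_table is set to stats_table_raw above, so only the raw figure object is saved here;
  ## set stats_table <- stats_table_norm and rerun the plotting code before saving the normalized one
  saveRDS(file = "./figs/tmpobjects/raw_h2_densityplot.rda",h2_denistyplot)
  # saveRDS(file = "./figs/tmpobjects/norm_h2_densityplot.rda",h2_denistyplot)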
  
}else{
  ## we don't load the object anymore, instead we load the curated figure
  h2_denistyplot_raw <- readRDS(file = "./figs/tmpobjects/raw_h2_densityplot.rda")
  h2_denistyplot_norm <- readRDS(file = "./figs/tmpobjects/norm_h2_densityplot.rda")
  
  plot_grid(h2_denistyplot_raw, h2_denistyplot_norm, nrow=2, labels=c("raw", "normalized "))
}

```
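
The chunk above reads PVE and its standard error from fixed positions (lines 25 and 26) of each GEMMA log, which silently breaks if the log layout shifts. A minimal sketch of a label-based alternative follows. The helper name `parse_gemma_pve` is hypothetical, and the labels it searches for (`pve estimate`, `se(pve)`) and the `= <number>` line ending are assumptions about the log format that should be checked against the actual files.

```r
# Hypothetical helper: extract PVE and its standard error from a GEMMA LMM log
# by matching line labels rather than fixed line positions.
# Assumption: the log has lines containing "pve estimate" and "se(pve)",
# each ending in "= <number>".
parse_gemma_pve <- function(log_lines) {
  grab <- function(label) {
    hit <- grep(label, log_lines, fixed = TRUE, value = TRUE)
    if (length(hit) == 0) return(NA_real_)
    # keep whatever follows the last "=" on the first matching line
    as.numeric(trimws(sub(".*=", "", hit[1])))
  }
  data.frame(pve = grab("pve estimate"), se_pve = grab("se(pve)"))
}

# Usage on a mocked-up log (values are invented for illustration)
example_log <- c(
  "## number of analyzed individuals = 100",
  "## pve estimate in the null model = 0.45",
  "## se(pve) in the null model = 0.08"
)
parse_gemma_pve(example_log)
```

Returning `NA` when a label is missing keeps the loop over phenotypes running and makes unparsed logs easy to find afterwards with `is.na()`.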


```{r, echo=FALSE, eval=F, warning=FALSE, fig.cap="Estimates of heritability and rho (polygenicity) from BSLMM GWA.", fig.height=8, fig.width=8}
#### BSLMM heritability and estimate of polygenicity (rho)
## chunk working, tested 9/20
RERUN=F
if(RERUN){
  setwd("~/safedata/natvar/")
  dirs <-read.table("./data/atlas_phenotype_names_numaccessions_willannotatemanual_strategies_MR.tsv",
                  header = T,
                  fill = T)
  dirs$phenotype<- as.character(dirs$phenotype)
  # dirs$phenotype[duplicated(dirs$phenotype)]
  # which(duplicated(dirs$phenotype))
  dirs$phenotype[957]<- "FT.1"
  dirs$phenotype[1496]<-"FT.2"
  dirs$phenotype[1846]<-"DSDS50.1"
  
  mynames<- as.character(dirs$phenotype)
  mega_hypparams_table <- read.table(file="./tables/BSLMM_GWA_Param_Estimates.tsv", sep="\t", header = T)
  mega_hypparams_table$pheno[duplicated(mega_hypparams_table$pheno)]
  which(duplicated(mega_hypparams_table$pheno))
  mega_hypparams_table$pheno <- as.character(mega_hypparams_table$pheno )
  mega_hypparams_table$pheno[957]<- "FT.1"
  mega_hypparams_table$pheno[1495]<-"FT.2"
  mega_hypparams_table$pheno[1845]<-"DSDS50.1"

  #dirs <- dirs[-1033,]
  stats_table_bslmm <- merge(dirs,by.x="phenotype", mega_hypparams_table, by.y="pheno")
  which(stats_table_bslmm$phenotype=="ABA_96h_low_water_potential")
  head(stats_table_bslmm)
  saveRDS(stats_table_bslmm, file="./data/stats_table_bslmm.rda")
  
  
  ###=====Plotting=======###
  stats_table <- stats_table_bslmm
  ## Overlapping histograms for h2
  statsSS <- stats_table[!is.na(stats_table$stressstrategy),]
  head(statsSS)
  p1 <- ggplot(statsSS, aes(x = PVE_median, fill = stressstrategy)) +
    geom_density(alpha=0.25) + xlim(0,1) + xlab("h2") +
    theme(legend.position = c(0.35, .7), legend.key.size =unit(.7, 'cm'), 
          legend.text = element_text(size=7), legend.title = element_blank()) +
    scale_fill_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'), "Tolerance"=transparent("navy")))
  statsPC <- stats_table[!is.na(stats_table$phenotypecategory),]
  p2 <- ggplot(statsPC, aes(x = PVE_median, fill = phenotypecategory), show.legend=F) +
    geom_density(alpha=0.25) + xlim(0,1) +xlab("h2") +
    scale_fill_manual("",values = c("Defense"=transparent("red3"),"Development"=transparent('darkorange1'), 
                                    "Ionomics"=transparent("goldenrod1"), 
                                    "Metabolite"=transparent("yellow"), "Microbiome"=transparent("limegreen"), 
                                    "Reproduction"=transparent("dodgerblue1" ))) +
    theme(legend.position = c(0.35, .7), legend.key.size =unit(.5, 'cm'), 
          legend.text = element_text(size=7), legend.title = element_blank()) + 
    ylab("")
  h2_denistyplot<-plot_grid(p1, p2)
  saveRDS(file = "./figs/tmpobjects/bslmm_h2_densityplot.rda",h2_denistyplot)

  
  ## Overlapping histograms for rho
  #statsSS <- stats_table[!is.na(stats_table$stressstrategy),]
  head(statsSS)
  p3 <- ggplot(statsSS, aes(x = rho_median, fill = stressstrategy)) +
    geom_density(alpha=0.25) + xlim(0,1) + 
    theme(legend.position = "none") +
    scale_fill_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'), "Tolerance"=transparent("navy")))
  statsPC <- stats_table[!is.na(stats_table$phenotypecategory),]
  p4 <- ggplot(statsPC, aes(x = rho_median, fill = phenotypecategory), show.legend=F) +
    geom_density(alpha=0.25) + xlim(0,1) +
    theme(legend.position = "none") + ylab("") +
    scale_fill_manual("",values = c("Defense"=transparent("red3"),"Development"=transparent('darkorange1'), 
                                    "Ionomics"=transparent("goldenrod1"), 
                                    "Metabolite"=transparent("yellow"), "Microbiome"=transparent("limegreen"), 
                                    "Reproduction"=transparent("dodgerblue1" )))
  rho_densityplot <- plot_grid(p3, p4)
  saveRDS(file = "./figs/tmpobjects/bslmm_rho_densityplot.rda",rho_densityplot)
  
  h2_rho_denistyplots <- plot_grid(p1, p2, p3, p4)
  saveRDS(file = "./figs/tmpobjects/bslmm_h2_rho_denistyplots.rda",h2_rho_denistyplots)
  h2_rho_denistyplots
  
  
}else{
   h2_rho_denistyplots <- readRDS(file = "./figs/tmpobjects/bslmm_h2_rho_denistyplots.rda")
   h2_rho_denistyplots
}

```
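
The plots above use per-phenotype posterior medians such as `PVE_median` and `rho_median`, where `rho` near 0 indicates a highly polygenic (LMM-like) architecture and `rho` near 1 indicates a few major-effect loci (BVSR-like). The sketch below shows one way such summaries could be computed from a table of MCMC draws. The helper name `summarise_hyp` and the simulated input are illustrative assumptions, not the code used to build the saved tables.

```r
# Hypothetical helper: summarise posterior draws of BSLMM hyperparameters.
# Assumption: `hyp` is a data frame of MCMC draws with one numeric column per
# hyperparameter (for example h, pve, rho, pge, pi, n_gamma).
summarise_hyp <- function(hyp, probs = c(0.025, 0.975)) {
  out <- lapply(names(hyp), function(p) {
    x <- hyp[[p]]
    q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
    data.frame(hyperparam = p,
               mean = mean(x, na.rm = TRUE),
               median = median(x, na.rm = TRUE),
               lower = q[1],
               upper = q[2],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

# Usage with simulated draws (not real GEMMA output)
set.seed(1)
fake_hyp <- data.frame(pve = rbeta(1000, 5, 5), rho = rbeta(1000, 2, 8))
summarise_hyp(fake_hyp)
```

The long format (one row per hyperparameter) is a design choice for readability; it can be reshaped to the wide `PVE_median`-style columns used in the plotting code if needed.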


```{r, echo=FALSE, eval=F, warning=FALSE, fig.cap="Estimates of heritability and rho (polygenicity) from BSLMM GWA compared.", fig.height=8, fig.width=8}
RERUN=F
if(RERUN){
  stats_table <- readRDS(file="./data/stats_table_bslmm.rda")
  
  ##===PVE by Rho - stress strat===##
  head(stats_table)
  stats_table$phenotypecategory
  pve_by_rho_Plot <- ggplot(stats_table) + geom_point(aes(x=PVE_median, y=rho_median, size=numaccessions, color=stressstrategy))+
    xlab("PVE") + ylab("polygenic -> monogenic") + xlim(0,1) + ylim(0,1) +
    scale_color_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'), "Tolerance"=transparent("navy"), 
                                     "NA"=transparent("gray")))
  
  pve_by_rho_Plot
  
  ##===PVE by Rho - pheno category ===##
  pve_by_rho_Plot_phenoCat <- ggplot(stats_table) + geom_point(aes(x=PVE_median, y=rho_median, size=numaccessions/2, color=phenotypecategory))+
    xlab("PVE") + ylab("polygenic -> monogenic") + xlim(0,1) + ylim(0,1) +
    scale_color_manual("",values = c("Defense"=transparent("red3"),"Development"=transparent('darkorange1'), 
                                    "Ionomics"=transparent("goldenrod1"), 
                                    "Metabolite"=transparent("yellow"), "Microbiome"=transparent("limegreen"), 
                                    "Reproduction"=transparent("dodgerblue1" )))
  pve_by_rho_Plot_phenoCat
  pve_by_rho <- plot_grid(pve_by_rho_Plot, pve_by_rho_Plot_phenoCat, nrow=2)
  saveRDS(pve_by_rho, file="./figs/tmpobjects/bslmm_pve_by_rho_Plots.rda")
}else{
  pve_by_rho <- readRDS(file="./figs/tmpobjects/bslmm_pve_by_rho_Plots.rda")
  pve_by_rho
}
```


```{r, echo=FALSE, eval=F, warning=FALSE, fig.cap="Estimates of PGE (sparse effects only) from BSLMM GWA, and comparison of PGE and rho (polygenicity).", fig.height=8, fig.width=6}
RERUN=F
if(RERUN){
  stats_table <- readRDS(file="./data/stats_table_bslmm.rda")
  
  ##===PGE===##
  head(stats_table)
  PGE_plot <- ggplot(stats_table, aes(x = PGE_median, fill = stressstrategy)) +
    geom_density(alpha=0.25) + xlim(0,1) + xlab("PGE") +
    theme(legend.position = c(0.5, .7), legend.key.size =unit(.7, 'cm'), 
          legend.text = element_text(size=7), legend.title = element_blank()) +
    scale_fill_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'),
                                    "Tolerance"=transparent("navy"), "NA" = transparent("gray")))
  PGE_plot
  
  ##===PGE by Rho --stress strat====##
  PGE_by_rho_Plot <- ggplot(stats_table) + geom_point(aes(x=PGE_median, y=rho_median, size=numaccessions,
                                                            color=stressstrategy))+
    xlab("PGE (sparse effects)") + ylab("polygenic -> monogenic") + xlim(0,1) + ylim(0,1) +
    scale_color_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'),
                                     "Tolerance"=transparent("navy"), "NA" = transparent("gray")))
  PGE_by_rho_Plot
  
  PGE_by_n_gamma_Plot <- ggplot(stats_table) + geom_point(aes(x=PGE_median, y=n.gamma_median, size=numaccessions,
                                                            color=stressstrategy))+
    xlab("PGE (variance explained by sparse effects)") + ylab("number of major effect loci") + #xlim(0,1) + ylim(0,1) +
    scale_color_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'),
                                     "Tolerance"=transparent("navy"), "NA" = transparent("gray")))
  PGE_by_n_gamma_Plot
  
  target_GWA_stats_only <-read.table("./tables/TaretPheno_GWAstats.tsv", sep="\t", header = T)
  targetPGE_by_n_gamma_Plot <- ggplot(target_GWA_stats_only, aes(x=PGE, y=n.gamma, label=phenotype)) + geom_point()+
    xlab("PGE (variance explained by sparse effects)") + ylab("number of major effect loci") +geom_text(hjust=0, vjust=-.7)
  targetPGE_by_n_gamma_Plot
  
  pge_plots <- plot_grid(PGE_plot, PGE_by_rho_Plot,PGE_by_n_gamma_Plot,targetPGE_by_n_gamma_Plot, nrow=2)
  pge_plots
  saveRDS(pge_plots, file="./figs/tmpobjects/pge_plots.rda")
  
}else{
  pge_plots <- readRDS(file="./figs/tmpobjects/pge_plots.rda")
  pge_plots
}
```

```{r, echo=FALSE, eval=F, warning=FALSE, fig.cap="Estimates of pi and N_gamma (number of major effect loci) from BSLMM GWA.", fig.height=8, fig.width=4}
RERUN=F
if(RERUN){
  stats_table <- readRDS(file="./data/stats_table_bslmm.rda")
  
  ##====Pi=====##
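  head(stats_table)
  ## statsSS is not created in this chunk, so rebuild it from stats_table before plotting
  statsSS <- stats_table[!is.na(stats_table$stressstrategy),]
  PI_plot <- ggplot(statsSS, aes(x = pi_median, fill = stressstrategy)) +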
    geom_density(alpha=0.25) + xlab("pi") + ylim(0,100)+ xlim(0,0.03) +
    theme(legend.position = c(0.5, .7), legend.key.size =unit(.7, 'cm'), 
          legend.text = element_text(size=7), legend.title = element_blank()) +
    scale_fill_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'),
                                    "Tolerance"=transparent("navy")))
  PI_plot
  
  ##====N Gamma====##
  ngamma_plot <- ggplot(statsSS, aes(x = n.gamma_median, fill = stressstrategy)) +
    geom_density(alpha=0.25) + xlab("N gamma") +
    theme(legend.position = c(0.5, .7), legend.key.size =unit(1, 'cm'), 
          legend.text = element_text(size=7), legend.title = element_blank()) +
    scale_fill_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'),
                                    "Tolerance"=transparent("navy")))
  ngamma_plot
  
  pi_gamma_plot <- plot_grid(PI_plot, ngamma_plot, nrow=2)
  saveRDS(pi_gamma_plot, file="./figs/tmpobjects/pi_gamma_plot.rda")
}else{
  pi_gamma_plot <- readRDS(file="./figs/tmpobjects/pi_gamma_plot.rda")
  pi_gamma_plot
}
```

We also assessed how much the heritability estimates varied depending on (A) whether normalized or raw phenotype data were used, and (B) whether the raw data were analyzed with the LMM or the BSLMM algorithm for GWA.

```{r, echo=FALSE, eval=F, warning=FALSE, fig.cap="Comparison of estimates of SNP-based heritability from raw and normalized phenotype data (top), and from the BSLMM GWA and the LMM GWA using raw data (bottom).", fig.height=8, fig.width=6}
RERUN=F
if(RERUN){
  stats_table_bslmm <- readRDS(file="./data/stats_table_bslmm.rda")
  stats_table_norm <- readRDS(file="./data/stats_table_norm.rda")
  stats_table_raw <- readRDS(file="./data/stats_table_raw.rda")
  
  st_1 <- merge(stats_table_norm, by.x="phenotype", stats_table_raw, by.y="phenotype")
  st_1$phenotype[!st_1$phenotype %in% stats_table_norm$phenotype]
  st_all <- merge(stats_table_bslmm, st_1, by="phenotype")
  head(st_all)  
  saveRDS(st_all, file="./data/all_gwa_stats_table.rda")
  
  norm_v_raw_h2 <- ggplot(st_all) + geom_point(aes(x=pve.x, y=pve.y, size=numaccessions.x,
                                                            color=stressstrategy))+
    xlab("Norm h2") + ylab("h2") + xlim(0,1) + ylim(0,1) +
    scale_color_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'),
                                     "Tolerance"=transparent("navy"), "NA" = transparent("gray")))
  norm_v_raw_h2
  
  bslmm_v_raw_h2 <- ggplot(st_all) + geom_point(aes(x=pve.y, y=PVE_median, size=numaccessions.x,
                                                            color=stressstrategy))+
    xlab("h2") + ylab("bslmm h2") + xlim(0,1) + ylim(0,1) +
    scale_color_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'),
                                     "Tolerance"=transparent("navy"), "NA" = transparent("gray")))
  bslmm_v_raw_h2
  
  h2_comparison_plots <- plot_grid(norm_v_raw_h2, bslmm_v_raw_h2, nrow=2)
  saveRDS(h2_comparison_plots, file="./figs/tmpobjects/h2_comparison_plots.rda")
  
  # PGE_by_rho_Plot2 <- ggplot(stats_table) + geom_point(aes(x=PGE_median, y=rho_median, size=numaccessions,
  #                                                           color=phenotypecategory))+
  #   xlab("PGE (sparse effects)") + ylab("polygenicity") + xlim(0,1) + ylim(0,1) +
  #   scale_color_manual("",values = c("Defense"=transparent("red3"),"Development"=transparent('darkorange1'),
  #                                   "Ionomics"=transparent("goldenrod1"),
  #                                   "Metabolite"=transparent("yellow"), "Microbiome"=transparent("limegreen"),
  #                                   "Reproduction"=transparent("dodgerblue1" )))
  
}else{
  h2_comparison_plots <- readRDS(file="./figs/tmpobjects/h2_comparison_plots.rda")
  h2_comparison_plots
}
```
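
The comparison code refers to `pve.x` and `pve.y`, the default suffixes that base R `merge()` appends to clashing column names (here `.x` is the normalized estimate and `.y` the raw one). As a sketch of a more self-documenting alternative, the `suffixes` argument can name the columns by origin. The toy tables below are invented for illustration.

```r
# Toy tables standing in for the normalized and raw LMM summaries
norm_tbl <- data.frame(phenotype = c("A", "B", "C"), pve = c(0.40, 0.10, 0.75))
raw_tbl  <- data.frame(phenotype = c("A", "B", "D"), pve = c(0.35, 0.20, 0.50))

# Explicit suffixes make the origin of each column obvious after the merge;
# only phenotypes present in both tables are kept (inner join by default)
both <- merge(norm_tbl, raw_tbl, by = "phenotype", suffixes = c(".norm", ".raw"))
both
```

Adopting this in the appendix would require renaming `pve.x` and `pve.y` in the downstream plotting code, so it is offered only as an option.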


```{r, echo=FALSE, eval=F, warning=FALSE, fig.cap="Total number and type of phenotypes that had heritability estimated less than 0.01", fig.height=8, fig.width=4}
##We were curious about which phenotypes were consistently showing a heritability of less than 0.01; so we investigated.
RERUN=F
if(RERUN){
  st_all <- readRDS(file="./data/all_gwa_stats_table.rda")
  low_h2 <- st_all[st_all$pve.x < 0.01 & st_all$pve.y < 0.01,]
  head(low_h2)
  
  d1<-table(low_h2$phenotypecategory) %>% as.data.frame()
  d2<-table(low_h2$stressstrategy, useNA = "always") %>% as.data.frame()
  p1<-ggplot(d1) + geom_col(aes(x = Var1, y=Freq, fill=Var1 ), color='white') + xlab("") + ylab("# phenotypes")+
    theme(axis.text.x = element_text(angle = 45,hjust = 1)) +
    scale_fill_manual("",values = c("Defense"=transparent("red3"),"Development"=transparent('darkorange1'), 
                                    "Ionomics"=transparent("goldenrod1"), 
                                    "Metabolite"=transparent("yellow"), "Microbiome"=transparent("limegreen"), 
                                    "Reproduction"=transparent("dodgerblue1" ))) +
    theme(legend.position = c(.82, .7), legend.key.size =unit(.7, 'cm'), 
    legend.text = element_text(size=7), legend.title = element_blank()) 
  p2<-ggplot(d2) + geom_col(aes(x = Var1, y=Freq ,fill=Var1),color='white') + xlab("") +
    ## na.value is an argument of scale_fill_manual(), not an element of the values vector
    scale_fill_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'), "Tolerance"=transparent("navy")),
                                    na.value=transparent("grey10"))+    
    ylab("") + 
    theme(axis.text.x = element_text(angle = 45,hjust = 1)) +
    #theme(plot.margin=margin(l=-0.5,unit="cm")) +
    theme(legend.key.size =unit(1, 'cm'),  legend.position = c(.2, .8),
    legend.text = element_text(size=7), legend.title = element_blank()) 
  
  low_h2_plots <- plot_grid(p1, p2, nrow=2)
  low_h2_plots
  saveRDS(low_h2_plots, file="./figs/tmpobjects/low_h2_plots.rda")

}else{
  low_h2_plots <- readRDS(file="./figs/tmpobjects/low_h2_plots.rda")
  low_h2_plots
  
}
```


```{r, echo=FALSE, eval=T, warning=FALSE, fig.cap="Fig. SIII.5 Comparison of SNP-based heritability estimates from normalized and raw phenotype data, and from the LMM (with raw data) and BSLMM GWA algorithm, while removing all estimates where both methods estimated less than 0.01 or greater than 0.99 heritability.", fig.height=8, fig.width=6}
RERUN=F
if(RERUN){
  st_all <- readRDS(file="./data/all_gwa_stats_table.rda")
  dim(st_all)
  1848-670
  low_h2 <- st_all[st_all$pve.x <= 0.01 & st_all$pve.y <= 0.01,]
  low_high_h2 <- rbind(low_h2, st_all[st_all$pve.x >= 0.99 & st_all$pve.y >= 0.99,])
  dim(low_high_h2)
  
  
  head(low_h2)
  low_h2$stressstrategy[is.na(low_h2$stressstrategy)] <- as.factor("NA")
  
  not_low_h2 <- st_all[st_all$pve.x > 0.01 & st_all$pve.y > 0.01,]
  not_low_h2_1 <- not_low_h2[not_low_h2$pve.x < 0.99 & not_low_h2$pve.y < 0.99,]
  dim(not_low_h2_1)
  IQR(not_low_h2_1$PVE_median)
  IQR(not_low_h2_1$pve.x) ## this is from normalized data
  IQR(not_low_h2_1$pve.y) ## this is from raw data
  summary(not_low_h2_1$pve.y) ## this is from raw data
  IQR(not_low_h2_1$n.gamma_mean)
  summary(not_low_h2_1$n.gamma_mean)
  
  escape_h2 <- not_low_h2_1 %>% filter(stressstrategy=="Escape")
  head(escape_h2)
  IQR(escape_h2$pve.y) 
  mean(escape_h2$pve.y) 
  
  avoidance_h2 <- not_low_h2_1 %>% filter(stressstrategy=="Avoidance")
  mean(avoidance_h2$pve.y) 
  IQR(avoidance_h2$pve.y) 
  
  wilcox.test(escape_h2$pve.y, avoidance_h2$pve.y, alternative = "greater", conf.int = T, paired = F)
  wilcox.test(escape_h2$pve.x, avoidance_h2$pve.x, alternative = "greater", conf.int = T, paired = F)
  
  df <- data.frame(ss = c(as.character(escape_h2$stressstrategy), as.character(avoidance_h2$stressstrategy)),
                   h2 = c(escape_h2$pve.y, avoidance_h2$pve.y)) 
  
  boxplot_h2_escape_avoid <- ggplot(df) +
  aes(x = ss, y = h2) +
  geom_boxplot(fill = c(transparent("red"), transparent('green4'))) +
  xlab("stress strategy") +
  theme_minimal()
  boxplot_h2_escape_avoid
  saveRDS(boxplot_h2_escape_avoid, file="figs/tmpobjects/boxplot_h2_escape_avoid.rda")
  
  IQR(not_low_h2_1$n.gamma_median)
  
  not_low_h2_2 <- not_low_h2[not_low_h2$PVE_median < 0.99 & not_low_h2$pve.y < 0.99,]
  
  
  norm_v_raw_h2 <- ggplot(not_low_h2_1) + geom_point(aes(x=pve.x, y=pve.y, size=numaccessions.x,
                                                            color=stressstrategy))+
    xlab("Norm h2") + ylab("h2") + xlim(0,1) + ylim(0,1) +
    scale_color_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'),
                                     "Tolerance"=transparent("navy"), "NA" = transparent("gray")))
  norm_v_raw_h2
  
  not_low_h2 <- st_all[st_all$PVE_median > 0.01 & st_all$pve.y > 0.01,]
  bslmm_v_raw_h2 <- ggplot(not_low_h2) + geom_point(aes(x=pve.y, y=PVE_median, size=numaccessions.x,
                                                            color=stressstrategy))+
    xlab("h2") + ylab("bslmm h2") + xlim(0,1) + ylim(0,1) +
    scale_color_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'),
                                     "Tolerance"=transparent("navy"), "NA" = transparent("gray")))
  bslmm_v_raw_h2
  
  h2_comparison_plots <- plot_grid(norm_v_raw_h2, bslmm_v_raw_h2, nrow=2)
  h2_comparison_plots
  saveRDS(h2_comparison_plots, file="./figs/tmpobjects/h2_comparison_plots_h2sremoved.rda")
  
}else{
  h2_comparison_plots <- readRDS(file="./figs/tmpobjects/h2_comparison_plots_h2sremoved.rda")
  h2_comparison_plots
}

```
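
For the normalized-versus-raw comparison, the chunk above keeps a phenotype only when both estimates fall strictly between 0.01 and 0.99, and then compares Escape and Avoidance traits with a one-sided Wilcoxon rank-sum test. The sketch below reproduces that logic on toy data. The helper name `keep_interior` and all values are invented for illustration and are not estimates from this study.

```r
# Hypothetical helper: TRUE where both estimates lie strictly inside (lo, hi).
keep_interior <- function(a, b, lo = 0.01, hi = 0.99) {
  a > lo & a < hi & b > lo & b < hi
}

# Toy data (invented values, not estimates from this study)
toy <- data.frame(
  strategy = rep(c("Escape", "Avoidance"), each = 5),
  h2_norm  = c(0.62, 0.71, 0.83, 0.90, 0.005, 0.12, 0.21, 0.33, 0.40, 0.995),
  h2_raw   = c(0.60, 0.70, 0.80, 0.90, 0.004, 0.10, 0.20, 0.30, 0.40, 0.999)
)
toy_kept <- toy[keep_interior(toy$h2_norm, toy$h2_raw), ]

# One-sided test: are Escape estimates shifted above Avoidance estimates?
wt <- wilcox.test(toy_kept$h2_raw[toy_kept$strategy == "Escape"],
                  toy_kept$h2_raw[toy_kept$strategy == "Avoidance"],
                  alternative = "greater")
wt$p.value
```

Wrapping the boundary filter in one function keeps the 0.01 and 0.99 cutoffs in a single place, which avoids the thresholds drifting apart between chunks.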

```{r, echo=FALSE, eval=T, warning=FALSE, fig.width=4, fig.height=4, fig.cap="Figure SIII.6 Boxplot of SNP-based heritability estimates for escape traits and avoidance traits.", message=FALSE}
boxplot_h2_escape_avoid <- readRDS(file="figs/tmpobjects/boxplot_h2_escape_avoid.rda")
boxplot_h2_escape_avoid
```

```{r, echo=FALSE, eval=T, warning=FALSE, fig.width=6, fig.height=4, fig.cap="Figure SIII.7 Number of phenotypes with SNP-based heritability estimates within 0.15 of each other (red) compared to phenotypes with a greater than 0.15 difference across estimation techniques (purple).", message=FALSE}
## this chunk needs fixing: the object in the else branch doesn't load/run
RERUN=F
if(RERUN){
  st_all <- readRDS(file="./data/all_gwa_stats_table.rda")
  st_all[1:5,1:5]
  sum(st_all$phenotype %in% d$phenotype)
   d$phenotype[!d$phenotype %in% st_all$phenotype]
  
  pheno <- read.table(file="./data/atlas1001_phenotypes_matrix_MR.csv", header = T, sep=",")
  pheno[1:5,1:5]
  
  
  #bslmm.filt <- readRDS(file="./data/bslmm.filt.dat.rda")
  
  low_h2 <- st_all[st_all$pve.x <= 0.01 & st_all$pve.y <= 0.01,]
  low_high_h2 <- rbind(low_h2, st_all[st_all$pve.x > 0.99 & st_all$pve.y > 0.99,])
  head(low_h2)
  low_h2$stressstrategy[is.na(low_h2$stressstrategy)] <- as.factor("NA")
  
  not_low_h2 <- st_all[st_all$pve.x > 0.01 & st_all$pve.y > 0.01,]
  not_low_h2_1 <- not_low_h2[not_low_h2$pve.x < 0.99 & not_low_h2$pve.y < 0.99,]
  
  not_low_h2_2 <- not_low_h2[not_low_h2$PVE_median < 0.99 & not_low_h2$pve.y < 0.99,]
  
  ## make two groups, one where the estimates of h are consistent...and one where they are not consistent
  
   dist <- abs(st_all$pve.x) - abs(st_all$pve.y)
   hist(dist[dist >= -0.005 & dist <= 0.005])
   consistent_h2 <- st_all[dist >= -0.15 & dist <= 0.15,]
   head(consistent_h2)
   consistent_h2$h2comp <- rep("similarh2", nrow(consistent_h2))
   
   not_consistent_h2 <- st_all[dist < -0.15 | dist > 0.15,]
   head(not_consistent_h2)
   not_consistent_h2$h2comp <- rep("inconsistenth2", nrow(not_consistent_h2))
   
   new_st_all <- rbind(not_consistent_h2, consistent_h2)
   overlap_plot <- ggplot(new_st_all) + geom_histogram(aes(x=PVE_median, fill=h2comp), position = "identity") +
     scale_fill_manual(values=c("similarh2"=transparent("red"), "inconsistenth2" =transparent("blue"))) +
     theme(legend.position="none") + xlab("PVE") + ylab("")
  overlap_plot
  saveRDS(overlap_plot, file="./figs/tmpobjects/overlap_inconh2_plot.rda")
  
  d1<-table(not_consistent_h2$phenotypecategory) %>% as.data.frame()
  d2<-table(not_consistent_h2$stressstrategy, useNA = "always") %>% as.data.frame()
  p1<-ggplot(d1) + geom_col(aes(x = Var1, y=Freq, fill=Var1 ), color='white') + xlab("") + ylab("# phenotypes")+
    theme(axis.text.x = element_text(angle = 45,hjust = 1)) +
    scale_fill_manual("",values = c("Defense"=transparent("red3"),"Development"=transparent('darkorange1'), 
                                    "Ionomics"=transparent("goldenrod1"), 
                                    "Metabolite"=transparent("yellow"), "Microbiome"=transparent("limegreen"), 
                                    "Reproduction"=transparent("dodgerblue1" ))) +
    theme(legend.position = c(.82, .7), legend.key.size =unit(.7, 'cm'), 
    legend.text = element_text(size=4), legend.title = element_blank()) 
  p2<-ggplot(d2) + geom_col(aes(x = Var1, y=Freq ,fill=Var1),color='white') + xlab("") +
    ## na.value is an argument of scale_fill_manual(), not an element of the values vector
    scale_fill_manual("",values = c("Avoidance"=transparent("red"),"Escape"=transparent('green4'), "Tolerance"=transparent("navy")),
                                    na.value=transparent("grey10"))+    
    ylab("") + 
    theme(axis.text.x = element_text(angle = 45,hjust = 1)) +
    #theme(plot.margin=margin(l=-0.5,unit="cm")) +
    theme(legend.key.size =unit(1, 'cm'),  legend.position = c(.2, .8),
    legend.text = element_text(size=7), legend.title = element_blank()) 
  
  incon_h2_plots <- plot_grid(p1, p2, nrow=1)
  incon_h2_plots
  saveRDS(incon_h2_plots, file="./figs/tmpobjects/incon_h2_plots.rda")
  
   sum(new_st_all$h2comp=="similarh2")
   dim(consistent_h2)
   dim(not_consistent_h2)
   not_con_lis <- data.frame(not_consistent_h2$phenotype, not_consistent_h2$PVE_median, not_consistent_h2$pve.x, not_consistent_h2$pve.y)
   
   
   ## all target phenotypes have reliable/consistent estimates of h2
   consistent_h2$phenotype[grep("ABA", consistent_h2$phenotype)]
   consistent_h2$phenotype[grep("Growth", consistent_h2$phenotype)]
   consistent_h2$phenotype[grep("FT16", consistent_h2$phenotype)]
   consistent_h2$phenotype[grep("Dorm", consistent_h2$phenotype)]
   consistent_h2$phenotype[grep("perc", consistent_h2$phenotype)]
   consistent_h2$phenotype[grep("Vern", consistent_h2$phenotype)]
   consistent_h2$phenotype[grep("RGR", consistent_h2$phenotype)]
   consistent_h2$phenotype[grep("Root_horiz", consistent_h2$phenotype)]
   consistent_h2$phenotype[grep("Relative_root", consistent_h2$phenotype)]
   consistent_h2$phenotype[grep("stomata", consistent_h2$phenotype)]
   consistent_h2$phenotype[grep("first", consistent_h2$phenotype)] ## stomatal index has < 0.25 diff in h2
   consistent_h2$phenotype[grep("Delta", consistent_h2$phenotype)]
}else{
  overlap_plot <- readRDS(file="./figs/tmpobjects/overlap_inconh2_plot.rda")
  overlap_plot
}
```
  
```{r, echo=FALSE, eval=F, warning=FALSE, fig.width=7, fig.height=4, fig.cap="Categories of phenotypes whose h2 estimates differed by more than 0.15 in magnitude."}
incon_h2_plots <- readRDS(file="./figs/tmpobjects/incon_h2_plots.rda")
incon_h2_plots
```
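
The consistency classification behind this figure can be illustrated with a minimal sketch on made-up numbers. It assumes the three estimates live in columns named `pve.x`, `pve.y`, and `PVE_median` (as in the chunks of this appendix) and that the label `"differenth2"` is a hypothetical counterpart to `"similarh2"`; the actual rule used for the figure may differ.

```r
# Toy table standing in for the merged GWA statistics (values are made up).
h2_tab <- data.frame(
  phenotype  = c("traitA", "traitB", "traitC"),
  pve.x      = c(0.40, 0.10, 0.55),  # LMM, normalized phenotype
  pve.y      = c(0.45, 0.60, 0.50),  # LMM, raw phenotype
  PVE_median = c(0.42, 0.35, 0.75)   # BSLMM
)

# The largest pairwise gap among the three estimates equals max minus min.
est <- as.matrix(h2_tab[, c("pve.x", "pve.y", "PVE_median")])
h2_tab$h2_range <- apply(est, 1, function(x) diff(range(x)))

threshold <- 0.15  # cutoff quoted in the figure caption
h2_tab$h2comp <- ifelse(h2_tab$h2_range <= threshold, "similarh2", "differenth2")
table(h2_tab$h2comp)
```

Using the range across all estimates is one possible design choice; comparing only a specific pair of methods would give a less strict filter.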
  
  
```{r, echo=FALSE, eval=T, warning=FALSE}
RERUN=F
if(RERUN){
  st_all <- readRDS(file="./data/all_gwa_stats_table.rda")
   
  other_vern<- c("X89_LD",  "X34_LDV",  "X47_SD", "X104_SDV")
  other_vern <- gsub("X", "", other_vern)
  st_all[st_all$phenotype %in% other_vern,]
  ##maybe substitute vern after growth with X34_LDV
   
   head(st_all)
   ## make a table of target traits heritability estimates, and rho, and n_gamma
   target_pheno_GWA_stats <- st_all[st_all$phenotype %in% c("ABA_96h_low_water_potential", 
                                            "Growth_rate", 
                                            "Delta_13C", 
                                            "DSDS10", 
                                            "stomatasize", ## only 55
                                            "stomata_density",
                                            "FT16",  
                                            "d8_10C_perc", 
                                            "RGR", 
                                            "Root_horizontal_index_day001",
                                            "Relative_root_growth_rate_day002.day003",
                                            "72_Vern_Growth"), ]

   head(target_pheno_GWA_stats)
   target_GWA_stats_only <- target_pheno_GWA_stats[,c(1,3,7,11,15,19,23, 27,34, 40)]
   target_GWA_stats_only$phenotype <- c("growth after Vernalization", "ABA", "germination percentage", "delta_13C", "dormancy",
                                        "Flowering time", "growth rate", "root RGR", "RGR","root horizontal index", "stomata density",
                                        "stomata size")
   
   colnames(target_GWA_stats_only) <- gsub("_median", "", colnames(target_GWA_stats_only))
   colnames(target_GWA_stats_only)[9:10] <- c("pve_norm", "pve_raw")
   colnames(target_GWA_stats_only)[2] <- "N samples"
   target_GWA_stats_only[c(3:7, 9:10)]<- round(target_GWA_stats_only[c(3:7, 9:10)], digits = 4)
   
   write.table(target_GWA_stats_only, "./tables/TaretPheno_GWAstats.tsv", sep="\t", col.names = T,
               row.names = F, quote = F)
   
}else{
   target_GWA_stats_only <-read.table("./tables/TaretPheno_GWAstats.tsv", sep="\t", header = T)
   knitr::kable(target_GWA_stats_only, col.names = colnames(target_GWA_stats_only), caption = "Table SIII.1 Summary of heritability estimates and parameters from GWA of 12 target phenotypes. N samples is the number of samples with data for a given phenotype; h is the estimate from BSLMM (note that this is a hyperparameter and not exactly a heritability estimate); PVE is the proportion of variation explained from BSLMM (this is the heritability estimate from BSLMM); rho is a hyperparameter on the polygenicity of a given trait; PGE is the proportion of genetic variance explained by sparse effect terms (~major effect loci); pi is the proportion of SNPs with non-zero effects; N gamma is the estimated number of loci with large effects; pve_norm is the SNP-based heritability estimate from the GWA run with normalized data; and pve_raw is the SNP-based heritability estimate from the GWA run with raw data.")
}
  
```

```{r, echo=F, eval=F, warning=F,  fig.cap="Association between variance of the raw phenotypes and the estimated heritability (h2) from (A) GWA lmm with raw data, (B) lmm with normalized data, (C) in bslmm with raw data"}
### Does the amount of variation in the phenotypes associate with estimates of heritability?
## For this, we simply compared the variance of each phenotype to its estimated heritability; the results are below.

RERUN=F
if(RERUN){
  ## phenotype datasets
  pheno<- read.table(file="./data/atlas1001_phenotypes_matrix.csv", sep=",", header = T)[,-1]
  pheno_imp <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_onlypheno.csv', sep=",", header = T)[,-c(1,2)]
  pheno_raw_quant <- read.table(file = 'data/atlas1001_rawPheno_Quantile.tsv', sep = "\t", header = T)[,-1]
  pheno_imp_quant <- read.table(file = 'data/atlas1001_imputedPheno_Quantile.tsv', sep = "\t", header = T)[,-1]

  st_all <- readRDS(file="./data/all_gwa_stats_table.rda")
  
  ## h2 and rho estimates
  dirs <-read.table("./data/atlas_phenotype_names_numaccessions_willannotatemanual_strategies_MR.tsv",
                  header = T,
                  fill = T)
  
  df <- data.frame(pheno=colnames(pheno))
  combined_names <- merge(st_all, by.x="phenotype", df, by.y="pheno") # 1577
  combined_names$newpheno <- combined_names$phenotype 
  
  df_notin <- data.frame(pheno=colnames(pheno)[!colnames(pheno) %in% combined_names$newpheno])
  df_notin
  df_notin$pheno <- gsub("X","", df_notin$pheno)
  
  #st_all$phenotype[grep(".1", st_all$phenotype)]
  cnames_2 <- merge(st_all, by.x="phenotype", df_notin, by.y="pheno" )
  head(cnames_2)
  cnames_2$newpheno <- cnames_2$phenotype

  all_names <- rbind(combined_names, cnames_2)
  #df_notin2 <- data.frame(pheno=df_notin$pheno[!df_notin$pheno %in% all_names$newpheno])

  st_all$phenotype[which(st_all$phenotype=="HopAM1-cell_death")] <- "HopAM1.cell_death"
  st_all$phenotype[which(st_all$phenotype=="HopAM1-rosette_chlorosisis")] <- "HopAM1.rosette_chlorosisis"
  

  # ## hacky stuff..maybe not needed anymore
  # h2$phenotype <- c(paste(h2$phenotype[1:36]), paste0("X", h2$phenotype[37:130]), paste(h2$phenotype[131:298]), paste0("X",   h2$phenotype[299:332]),  
  #                   paste(h2$phenotype[333:529]), paste0("X", h2$phenotype[530:531]),  paste(h2$phenotype[532:571]), paste0("X", h2$phenotype[572:576]),  
  #                   paste(h2$phenotype[577:668]), paste0("X", h2$phenotype[669:670]),  paste(h2$phenotype[671:1417]), 
  #                   paste0("X", h2$phenotype[1418:1420]),  
  #                   paste(h2$phenotype[1421:1494]) , paste0("X", h2$phenotype[1495:1601]),  paste(h2$phenotype[1602]), 
  #                   paste0("X", h2$phenotype[1603:1608]), 
  #                   paste(h2$phenotype[1609:1653]), paste0("X", h2$phenotype[1654:1667]), paste(h2$phenotype[1668]), 
  #                   paste0("X", h2$phenotype[1669:1670]), 
  #                   paste(h2$phenotype[1671:1849]))
  
  
  # h2$phenotype[803]<-"HopAM1.cell_death"
  # h2$phenotype[804]<- "HopAM1.rosette_chlorosisis"
  # h2$phenotype[which(h2$phenotype=="FT")] <- c("FT", "FT.1", "FT.2")
  
  vars <- data.frame()
  dim(vars)
  i<-1
  for (i in 1:nrow(st_all)){
    print(i) ## doing it this way bc the phenotypes don't always match
    id <- match(st_all$phenotype[i], colnames(pheno))

     if (is.na(id)){
      newname <- paste0("X", st_all$phenotype[i])
      id <- match(newname, colnames(pheno))
      vars[i,1] <- var(pheno[,id[1]], na.rm = T)
      id_2 <- match(newname, colnames(pheno_imp))
      vars[i,2] <- var(pheno_imp[,id_2[1]], na.rm = T)
      id_3 <- match(newname, colnames(pheno_raw_quant))
      vars[i,3] <- var(pheno_raw_quant[,id_3[1]], na.rm = T)
      id_4 <- match(newname, colnames(pheno_imp_quant))
      vars[i,4] <- var(pheno_imp_quant[,id_4[1]], na.rm = T) ## index with id_4, the match against pheno_imp_quant
      print(c(id, id_2, id_3, id_4))
      vars[i,5] <- sum(!is.na(pheno[,id[1]]))
      vars[i,6] <- st_all$phenotype[i]
     }else{
      vars[i,1] <- var(pheno[,id[1]], na.rm = T)
      id_2 <- match(st_all$phenotype[i], colnames(pheno_imp))
      vars[i,2] <- var(pheno_imp[,id_2[1]], na.rm = T)
      id_3 <- match(st_all$phenotype[i], colnames(pheno_raw_quant))
      vars[i,3] <- var(pheno_raw_quant[,id_3[1]], na.rm = T)
      id_4 <- match(st_all$phenotype[i], colnames(pheno_imp_quant))
      vars[i,4] <- var(pheno_imp_quant[,id_4[1]], na.rm = T) ## index with id_4, the match against pheno_imp_quant
      print(c(id, id_2, id_3, id_4))
      vars[i,5] <- sum(!is.na(pheno[,id[1]]))
      vars[i,6] <- st_all$phenotype[i]
    }
  }  
  colnames(vars) <- c("raw", "imputed", "normalized", "imp_norm", "Nsamples", "pheno")
  head(vars)
  vars_all <- merge(st_all, by.x="phenotype", vars, by.y="pheno")
  head(vars_all)
  
  vars_all <- vars_all %>% filter(Nsamples > 200)
  
  # h2$raw_var <- vars[,1]
  # h2$imp_var <- vars[,2]
  # h2$rawquant_var <- vars[,3]
  # h2$impquant_var <- vars[,4]
  # h2$n <- vars[,5]
  # 
  # h2_verylow <- h2[which(h2$pve<0.001),]
  # dim(h2_verylow)
  # write.table(h2_verylow, file="./tables/h2_verylow.tsv", sep="\t", col.names =T)
  # 
  
  ## First correlate raw variance with all estimates of h2
  rawVar1_normh2 <- ggplot(vars_all, aes(x=raw, y=pve.x)) +
    geom_point(size=1) + ylim(0,1) + xlim(0,10) +
    xlab("var(Raw Phenotype)") +
    ylab("h2") +
    geom_smooth(method="lm", formula = y~x, color="black") 
  rawVar1_normh2
  rawVar1_rawh2 <- ggplot(vars_all, aes(x=raw, y=pve.y)) +
    geom_point(size=1) + ylim(0,1) + xlim(0,10) +
    xlab("var(Raw Phenotype)") +
    ylab("h2") +
    geom_smooth(method="lm", formula = y~x, color="black") 
  rawVar1_rawh2
  rawVar1_bslmmh2 <- ggplot(vars_all, aes(x=raw, y=PVE_median)) +
    geom_point(size=1) + ylim(0,1) + xlim(0,10) +
    xlab("var(Raw Phenotype)") +
    ylab("h2") +
    geom_smooth(method="lm", formula = y~x, color="black") 
  rawVar1_bslmmh2
  raw_h2_varplots <- plot_grid(rawVar1_normh2, rawVar1_rawh2, rawVar1_bslmmh2)
  saveRDS(raw_h2_varplots, file = "./figs/tmpobjects/raw_h2_varplots.rda")
  
  ## Correlate imputed variance with all estimates of h2
  impVar1_normh2 <- ggplot(vars_all, aes(x=imputed, y=pve.x)) +
    geom_point(size=1) + ylim(0,1) + xlim(0,10) +
    xlab("var(Imputed Phenotype)") +
    ylab("h2") +
    geom_smooth(method="lm", formula = y~x, color="black") 
  impVar1_normh2
  impVar1_rawh2 <- ggplot(vars_all, aes(x=imputed, y=pve.y)) +
    geom_point(size=1) + ylim(0,1) + xlim(0,10) +
    xlab("var(Imputed Phenotype)") +
    ylab("h2") +
    geom_smooth(method="lm", formula = y~x, color="black") 
  impVar1_rawh2
  impVar1_bslmmh2 <- ggplot(vars_all, aes(x=imputed, y=PVE_median)) +
    geom_point(size=1) + ylim(0,1) + xlim(0,10) +
    xlab("var(Imputed Phenotype)") +
    ylab("h2") +
    geom_smooth(method="lm", formula = y~x, color="black") 
  impVar1_bslmmh2
  imp_h2_varplots <- plot_grid(impVar1_normh2, impVar1_rawh2, impVar1_bslmmh2)
  saveRDS(imp_h2_varplots, file = "./figs/tmpobjects/imp_h2_varplots_norm.rda")
  
  normVar1_normh2 <- ggplot(vars_all, aes(x=normalized, y=pve.x)) +
    geom_point(size=1) + ylim(0,1) + xlim(0,1) +
    xlab("var(Normalized Phenotype)") +
    ylab("h2") +
    geom_smooth(method="lm", formula = y~x, color="black") 
  normVar1_normh2
  normVar1_rawh2 <- ggplot(vars_all, aes(x=normalized, y=pve.y)) +
    geom_point(size=1) + ylim(0,1) + xlim(0,1) +
    xlab("var(Normalized Phenotype)") +
    ylab("h2") +
    geom_smooth(method="lm", formula = y~x, color="black") 
  normVar1_rawh2
  normVar1_bslmmh2 <- ggplot(vars_all, aes(x=normalized, y=PVE_median)) +
    geom_point(size=1) + ylim(0,1) + xlim(0,1) +
    xlab("var(Normalized Phenotype)") +
    ylab("h2") +
    geom_smooth(method="lm", formula = y~x, color="black") 
  normVar1_bslmmh2
  norm_h2_varplots <- plot_grid(normVar1_normh2, normVar1_rawh2, normVar1_bslmmh2)
  saveRDS(norm_h2_varplots, file = "./figs/tmpobjects/norm_h2_varplots_highN.rda")
  
}else{
  raw_h2_varplots <- readRDS(file = "./figs/tmpobjects/raw_h2_varplots.rda")
  raw_h2_varplots
  
}
```
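
The name matching in the loop above (try the phenotype name as-is, then retry with an `X` prefix) can also be written without an explicit loop. The following is a minimal sketch on toy data, not the code used for the figures; the object `stat_names` and the toy values are assumptions made for illustration.

```r
# Toy phenotype matrix. read.table() with default check.names = TRUE
# prepends "X" to column names that start with a digit.
pheno <- data.frame(FT16 = c(60, 72, NA, 65),
                    X72_Vern_Growth = c(1.2, 0.8, 1.1, NA))
stat_names <- c("FT16", "72_Vern_Growth", "not_measured")

# Try the name as-is first, then fall back to the "X"-prefixed version.
idx <- match(stat_names, colnames(pheno))
miss <- is.na(idx)
idx[miss] <- match(paste0("X", stat_names[miss]), colnames(pheno))

# Variance and sample size per phenotype; NA when no column matches.
vars <- data.frame(
  pheno    = stat_names,
  raw      = vapply(idx, function(k) {
    if (is.na(k)) NA_real_ else var(pheno[[k]], na.rm = TRUE)
  }, numeric(1)),
  Nsamples = vapply(idx, function(k) {
    if (is.na(k)) NA_integer_ else sum(!is.na(pheno[[k]]))
  }, integer(1))
)
vars
```

Returning `NA` for unmatched names makes missing phenotypes visible in the output instead of silently reusing another column's index.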

```{r, echo=F, eval=F, warning=F,  fig.cap="Association between variance of the imputed phenotypes and the estimated heritability (h2) from (A) GWA lmm with raw data, (B) lmm with normalized data, (C) in bslmm with raw data"}
  imp_h2_varplots <- readRDS(file = "./figs/tmpobjects/imp_h2_varplots.rda")
  imp_h2_varplots
```


```{r, echo=F, eval=F, warning=F,  fig.cap="Association between variance of the normalized phenotypes and the estimated heritability (h2) from (A) GWA lmm with raw data, (B) lmm with normalized data, (C) in bslmm with raw data"}
  norm_h2_varplots <- readRDS(file = "./figs/tmpobjects/norm_h2_varplots.rda")
  norm_h2_varplots
```


```{r, echo=F, eval=F, warning=F,  fig.cap="Association between variance of the raw phenotypes with more than 200 samples, and the estimated heritability (h2) from (A) GWA lmm with raw data, (B) lmm with normalized data, (C) in bslmm with raw data"}

  ## phenotype datasets
 raw_h2_varplots <- readRDS(file = "./figs/tmpobjects/raw_h2_varplots_highN.rda")
 raw_h2_varplots
```

```{r, echo=F, eval=F, warning=F,  fig.cap="Association between variance of the imputed phenotypes with more than 200 samples, and the estimated heritability (h2) from (A) GWA lmm with raw data, (B) lmm with normalized data, (C) in bslmm with raw data"}
imp_h2_varplots <- readRDS(file = "./figs/tmpobjects/imp_h2_varplots_norm.rda")
imp_h2_varplots
```

```{r, echo=F, eval=F, warning=F,  fig.cap="Association between variance of the normalized phenotypes with more than 200 samples, and the estimated heritability (h2) from (A) GWA lmm with raw data, (B) lmm with normalized data, (C) in bslmm with raw data"}
norm_h2_varplots <- readRDS(file = "./figs/tmpobjects/norm_h2_varplots_highN.rda")
norm_h2_varplots
```

```{r, echo=F, eval=F}
### Structure of genetic variants across trait categories (tSNE plots)

##We visualized all Z scores across all phenotypes simultaneously (Fig. 1b,d) using a t-distributed stochastic neighbor embedding (t-SNE) algorithm (Maaten and Hinton 2008; Krijthe 2015). The Z scores were also correlated across m x m traits (Fig S3). The resulting m x m matrix . These correlations were also conducted for SNPs of different genome annotations using TAIR10 reference (http://arabidopsis.org). 
```

### Heritability Summaries

```{r, echo=FALSE, eval=T, warning=FALSE, fig.width=6, fig.height=4, fig.cap="Figure SIII.7 Number of phenotypes with SNP-based heritability estimates within 0.15 of each other (red) compared to phenotypes with a greater than 0.15 difference across estimation techniques (purple).", message=FALSE}
## this chunk needs fixing: the object in the else branch doesn't load/run
RERUN=F
if(RERUN){
  st_all <- readRDS(file="./data/all_gwa_stats_table.rda")
  dim(st_all)
  head(st_all)
  
  summary(st_all$se_pve.x)
  
  norm.filt <- st_all %>% filter(se_pve.x<0.25)
  mean(norm.filt$pve.x)
  mean(norm.filt$pve.x) - IQR(norm.filt$pve.x)/2
  mean(norm.filt$pve.x) + IQR(norm.filt$pve.x)/2
  
  raw.filt <- st_all %>% filter(se_pve.y<0.25)
  mean(raw.filt$pve.y)
  mean(raw.filt$pve.y) - IQR(raw.filt$pve.y)/2
  mean(raw.filt$pve.y) + IQR(raw.filt$pve.y)/2
  
  st_all$HPDrange <- st_all$PVE_97.5.-st_all$PVE_2.5. 
  hist(st_all$HPDrange)
  
  
  bm <- st_all %>% filter(HPDrange <=0.8)
  
  st_all %>% select(paper, phenotype)
  st_all[,13]
  to_write_h2_full_Table<- st_all[, c(2, 1, 3, 6:29, 34, 35, 40,41)]
  head(to_write_h2_full_Table)
  write.table(to_write_h2_full_Table, file="./tables/all_h2_Estiamtes_supp15.csv", col.names =T, row.names = F, quote = F, sep=",")
  
  IQR(to_write_h2_full_Table$pve.x)
  mean(to_write_h2_full_Table$pve.x) 
  mean(to_write_h2_full_Table$pve.x) - IQR(to_write_h2_full_Table$pve.x)/2
  
  
  bmfilt <- readRDS(file="./data/bslmm.filt.dat.rda")
  mean(bmfilt$PVE_mean)
  mean(bmfilt$PVE_mean) - IQR(bmfilt$PVE_mean)/2
  mean(bmfilt$PVE_mean) + IQR(bmfilt$PVE_mean)/2

  mean(bmfilt$n.gamma_mean)
  mean(bmfilt$n.gamma_mean) - IQR(bmfilt$n.gamma_mean)/2
  mean(bmfilt$n.gamma_mean) + IQR(bmfilt$n.gamma_mean)/2


  ## what is the distribution of std. error of pve.x and y, norm and raw
  percentiles <- c(5, 10, 25, 50, 75, 90, 95)
  percentile_values <- quantile(st_all$se_pve.y, probs = percentiles / 100)
  
  ## quick check to see dist. of error of h estimates
  ggplot(st_all, aes(x = se_pve.y)) +
    geom_density(fill = "skyblue", color = "darkblue", alpha = 0.7) + xlim(0,1)+
    theme_minimal() +
    geom_vline(xintercept = percentile_values, linetype = "dashed", color = "purple") +
    labs(title = "",
         x = "st.error of PVE",
         y = "Density")
  
  filt.dat <-  st_all %>% filter(se_pve.x<0.499) %>% filter(se_pve.y<.464)
  head(filt.dat)
  dim(filt.dat)

  ## quick check to see dist. of error of PVE estimates from BSLMM
  filt.dat$PVErange <- apply(filt.dat[, c( "PVE_2.5.", "PVE_97.5.")], 1, function(x) diff(range(x)))
  
   percentiles <- c(5, 10, 25, 50, 75, 90, 95)
   percentile_values <- quantile(filt.dat$PVErange, probs = percentiles / 100)
  
   ggplot(filt.dat, aes(x = PVErange)) +
    geom_density(fill = "skyblue", color = "darkblue", alpha = 0.7)+
    theme_minimal() +
    geom_vline(xintercept = percentile_values, linetype = "dashed", color = "purple") +
    labs(title = "",
         x = "2.5-97.5 of PVE",
         y = "Density")
    
   
    ## Do diff cut offs and look at trait architectures generally 
    

}else{
  
} 
```   
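
The repeated `mean(x) - IQR(x)/2` and `mean(x) + IQR(x)/2` expressions in the chunk above could be wrapped in a small helper. This is a minimal sketch; the function name `mean_iqr_band` is hypothetical and the input values are made up.

```r
# Hypothetical helper: the mean with half the interquartile range on each side,
# mirroring the mean(x) -/+ IQR(x)/2 expressions used for the summaries.
mean_iqr_band <- function(x, na.rm = TRUE) {
  m <- mean(x, na.rm = na.rm)
  half <- IQR(x, na.rm = na.rm) / 2
  c(lower = m - half, mean = m, upper = m + half)
}

pve <- c(0.10, 0.30, 0.50, 0.70, 0.90)  # made-up heritability estimates
mean_iqr_band(pve)
```

Note that this band is centred on the mean rather than the median, so it is a descriptive spread and not a confidence interval, and it need not coincide with the quartiles for skewed data.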


################################################################################
## III.4 Genetic correlations
################################################################################

### Approximation of all genetic correlations

We used a summary-statistic-based correlation similar to LD score regression (Bulik-Sullivan et al. 2015); the long-range linkage of _A. thaliana_ prevented us from obtaining reliable results from LD score using all genome-wide SNPs. Prior to correlating effect sizes, we transformed the effect size of each SNP into a Z score as: (beta/se)^2. We also used only independent (that is, from separate estimated linkage blocks), high-quality SNPs (those with a high call rate across samples), resulting in an independent set of ~60 K SNPs. 
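
The approach can be sketched on simulated summary statistics. This is an illustration under stated assumptions, not the analysis code: the matrices `beta` and `se`, the trait names, and the simulated values are all made up, and the sketch uses the signed ratio `beta/se` (as in the commented `z <- betas/se` line of the chunk in this section); squaring it gives the (beta/se)^2 form.

```r
set.seed(1)
n_snps <- 500
traits <- c("traitA", "traitB", "traitC")

# Simulated per-SNP effect sizes and standard errors (SNPs in rows, traits in columns).
beta <- matrix(rnorm(n_snps * 3, sd = 0.1), ncol = 3, dimnames = list(NULL, traits))
se   <- matrix(runif(n_snps * 3, 0.05, 0.15), ncol = 3, dimnames = list(NULL, traits))

# Give traitA and traitB a shared genetic signal.
beta[, "traitB"] <- beta[, "traitB"] + 0.5 * beta[, "traitA"]

# Standardize effects, then correlate traits across SNPs.
z <- beta / se
Zcor <- cor(z, method = "pearson", use = "pairwise.complete.obs")
diag(Zcor) <- NA  # self-correlations are uninformative
round(Zcor, 2)
```

In this toy example the traitA and traitB entry is clearly positive while the entries involving traitC stay near zero, which is the pattern the trait-by-trait correlation matrix is meant to capture.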

### Comparisons of genetic correlations and heritability for traits

```{r, echo=F, eval=F, warning=F,  fig.cap="PVE and heritability estimates by the average genetic correlation of the phenotype.", fig.width=8, fig.height=4}
RERUN=F
if(RERUN){
    ## Get Z scores sorted
    # betas <- fread(file="TopSNPs/allbetas.txt") ### these are updated runs as of April 2021
    # #betas<- as.data.frame(betas)
    # se <- fread(file="TopSNPs/allses.txt")
    # #se <- as.data.frame(se)
    # dim(betas)
    # z[1:5,1:15]
    # z <- betas/se
    # z <- na.roughfix(z) ## impute NAs for correlation
    #colnames(z) <- gsub(pattern = "-beta", x = colnames(z), replacement = "")
    #saveRDS(z, file="TopSNPs/Zscores.hq.rda")
    z <- readRDS(file="TopSNPs/Zscores.hq.rda")
    z[1:5,1:5]

    #atlasstrategies<-read.table("./data/pheno_fromgoogle.tsv",header = T)
    pheno<- read.table(file="./data/atlas1001_phenotypes_matrix.csv", sep=",", header = T)[,-1]
    pheno[1:5,1:5]
    # load field experiment information to get the 515 accessions
    # load("data/d4.rda")
    # idsfield<-unique(d4$id)
    # whichfield<-which(pheno$id %in% idsfield)
    # idex515<-whichfield

    ## this has the curated phenotype names
    load(file="data/allphenotypes.rda") #
    sum(colnames(z) %in% colnames(allphenotypes))
    sum(colnames(z) %in% colnames(pheno))
    #pheno<- pheno[,-1884]
    colnames(pheno)<- colnames(allphenotypes)
    #sum(colnames(z) %in% colnames(pheno))
    #pheno <- pheno[,colnames(pheno) %in% colnames(z)]


    dim(z)
    dim(pheno)
    # the offset is because of the duplicated martinez phenotypes
    # colnames(pheno)[grep(".1", colnames(pheno))]
    # grep(".1", colnames(pheno))[545:569]
    # colnames(pheno)[1692:1710]
    pheno <- pheno[,-c(1692:1710)]

    z <- as.data.frame(z)
    z[1:5,1:5]
    z <- z[, colnames(z) %in% colnames(pheno)]
    pheno <- pheno[, colnames(pheno) %in% colnames(z)]

    ## all good now
    dim(z)
    dim(pheno)

    Zcor <- matrix(NA, ncol=ncol(z), nrow=ncol(z))

    for (i in 1:ncol(z)){
      for (j in 1:ncol(z)){

        if (i == j){
          Zcor[i,j] <-  1.0
        }else{
          Zcor[i,j] <- cor(z[,i], z[,j])
        }
        print(i)
        print(j)
      }
    }


    ## correlate all genetics
    ##Zcor <- rcorr(z, type="pearson")
    Zcor <- cor(z, method="pearson", use="pairwise.complete.obs")
    dim(Zcor)
    hist(Zcor)
    Zcor[lower.tri(Zcor, diag = TRUE)] <- NA
    Zcor[1:5,1:5]
    #Zcor <- readRDS(file="./data/Zcor.rda")
    dim(Zcor)
    Zcor[1:5,1:5]


    Zcor <- readRDS(file="./data/Zcor_all.rda")
    st_all <- readRDS(file="./data/all_gwa_stats_table.rda")

    ## figure out which phenos are not overlapping and remove
    # sum(st_all$phenotype %in% colnames(Zcor))
    # dim(Zcor)
    # which(!colnames(Zcor) %in% st_all$phenotype)
    ## just the one that isn't included
    Zcor <- Zcor[-1030,-1030]
    
    ## remove diagonals = to 1 bc irrelevant
    diag(Zcor) <- rep(NA, length(diag(Zcor)))
    Zcor[1:10,1:10]
    
    ## do some filtering prior to averaging and finding the max
    importanttraits <- read.table(file="./data/importanttraits.csv",sep=",")
    importanttraits$V1 <- gsub("X", "", importanttraits$V1 )
    st_all <- st_all %>% filter(phenotype %in% importanttraits$V1)
   
    st_all <- st_all %>%  filter(pve.x>0.05, pve.y > 0.05, pve.x<0.95, pve.y<0.95, numaccessions>100)
    
    Zcor_filtered <- Zcor[rownames(Zcor) %in% st_all$phenotype, colnames(Zcor) %in% st_all$phenotype]
    
    Zcor_max <- apply(Zcor_filtered, MARGIN = 1, max, na.rm=T)
    Zcor_mean<- apply(Zcor_filtered, MARGIN = 1, mean, na.rm=T)
    Zcor_var <- apply(Zcor_filtered, MARGIN = 1, var, na.rm=T)

    ## merge sum stats into df
    Zdf <- data.frame(phenotype=names(Zcor_max),
                      Zmax = Zcor_max,
                      Zmean = abs(Zcor_mean),
                      Zvar = Zcor_var)
    
    st_all_Zs <- merge(Zdf, st_all, by="phenotype")
    st_all_Zs[1:10,1:35]
    
    ## first raw plots of relationships between herit/poly and genetic cors
    ## Rho
    ggplot(st_all_Zs) +geom_point(aes(x=rho_median, y=Zmean)) +
      geom_smooth(method="lm", aes(x=rho_median, y=Zmean))
    cor.test(x=st_all_Zs$rho_median, y=st_all_Zs$Zmean)
    
    ## n gamma
    ggplot(st_all_Zs) +geom_point(aes(x=n.gamma_median, y=Zmean))
    cor.test(x=st_all_Zs$n.gamma_median, y=st_all_Zs$Zmean)
    
    ## PGE median
    ggplot(st_all_Zs) +geom_point(aes(x=PGE_median, y=Zmean, size=numaccessions))
    cor.test(x=st_all_Zs$PGE_median, y=st_all_Zs$Zmean)
    
    ## PVE median
    ct <- cor.test(x=st_all_Zs$pve.y, y=st_all_Zs$Zmean)
    p1 <- ggplot(st_all_Zs) +geom_point(aes(x=pve.y, y=Zmean, size=numaccessions)) +
      ylab("mean Genetic Correlation") +xlab("h2 (estimated from raw pheno. in lmm)") +
      geom_smooth(method="lm",aes(x=pve.y, y=Zmean), color="red") +
      annotate("text", x = 0.2, y = 0.20, 
               label = paste("r =", signif(ct$estimate, digits = 3) ,"\n", "p-value =",  signif(ct$p.value, digits = 3))) + theme(legend.position = "none") ## ct$estimate is Pearson's r, not r2
    p1
    
    ct <- cor.test(x=st_all_Zs$pve.x, y=st_all_Zs$Zmean)
    p2 <- ggplot(st_all_Zs) +geom_point(aes(x=pve.x, y=Zmean, size=numaccessions)) +
      ylab("mean Genetic Correlation") +xlab("h2 (estimated from norm pheno. in lmm)") +
      geom_smooth(method="lm",aes(x=pve.x, y=Zmean), color="red") +
      annotate("text", x = 0.27, y = 0.2, 
               label = paste("r =", signif(ct$estimate, digits = 3) ,"\n", "p-value =",  signif(ct$p.value, digits = 3))) + theme(legend.position = "none") ## ct$estimate is Pearson's r, not r2
    p2
    
    ct <- cor.test(x=st_all_Zs$PVE_median, y=st_all_Zs$Zmean)
    p3 <- ggplot(st_all_Zs) +geom_point(aes(x=PVE_median, y=Zmean, size=numaccessions)) +
      ylab("mean Genetic Correlation") +xlab("PVE (estimated from bslmm)") +
      geom_smooth(method="lm",aes(x=PVE_median, y=Zmean), color="red") +
      annotate("text", x = 0.3, y = 0.2, 
               label = paste("r =", signif(ct$estimate, digits = 3) ,"\n", "p-value =",  signif(ct$p.value, digits = 3))) + theme(legend.position = "none") ## ct$estimate is Pearson's r, not r2
    p3
    pve_by_Zmean <- plot_grid(p1,p2,p3, nrow=1)
    saveRDS(pve_by_Zmean, file="./figs/pve_by_Zmean_plots.rda")

    ## Pi median
    ggplot(st_all_Zs) +geom_point(aes(x=pi_median, y=Zmean))
    cor.test(x=st_all_Zs$pi_median, y=st_all_Zs$Zmean)
    
    
}else{
  pve_by_Zmean <- readRDS( file="./figs/pve_by_Zmean_plots.rda")
  pve_by_Zmean
}
```
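
The per-trait summaries and the correlation test used in the chunk above can be sketched on simulated data. The trait names and heritability values below are made up for illustration. Note that `cor.test()` reports Pearson's r in `$estimate`; squaring it gives r2.

```r
set.seed(2)
traits <- paste0("trait", 1:6)

# Simulated trait-by-trait correlation matrix with the diagonal masked.
Zcor <- cor(matrix(rnorm(200 * 6), ncol = 6, dimnames = list(NULL, traits)))
diag(Zcor) <- NA

# Per-trait summaries of the correlations with all other traits.
Zdf <- data.frame(
  phenotype = rownames(Zcor),
  Zmax  = apply(Zcor, 1, max, na.rm = TRUE),
  Zmean = abs(rowMeans(Zcor, na.rm = TRUE)),
  Zvar  = apply(Zcor, 1, var, na.rm = TRUE)
)

# Made-up heritability estimates, merged by phenotype name.
h2  <- data.frame(phenotype = traits, pve.y = c(0.20, 0.35, 0.50, 0.60, 0.75, 0.90))
dat <- merge(Zdf, h2, by = "phenotype")

ct <- cor.test(dat$pve.y, dat$Zmean)  # Pearson correlation by default
r  <- unname(ct$estimate)
c(r = r, r2 = r^2, p = ct$p.value)
```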
    
From the multivariate GWA estimates, the genetic architectures of flowering time at 16°C 27 and WUE were indeed highly correlated (rg=0.34, 95% CI 0.25 to 0.42). Flowering time was also correlated with growth rate (rg=0.45, 95% CI 0.38 to 0.52), primary dormancy (R=-0.14, 95% CI -0.21 to -0.08), root RGR (R=0.32, 95% CI 0.25 to 0.38), RGR (R=-0.32, 95% CI -0.38 to -0.25), and stomatal index (R=0.33, 95% CI 0.26 to 0.39). Growth rate was highly correlated with dormancy (R=-0.41, 95% CI -0.49 to -0.34), root RGR (R=-0.38, 95% CI -0.46 to -0.31), WUE (R=0.33, 95% CI 0.25 to 0.42), stomatal index (R=0.44, 95% CI 0.37 to 0.51), and RGR (R=-0.38, 95% CI -0.46 to -0.31). WUE was also highly correlated with dormancy (R=-0.26, 95% CI -0.35 to -0.18), root RGR (R=0.37, 95% CI 0.28 to 0.47), and stomatal index (R=0.40, 95% CI 0.32 to 0.48). 


```{r, echo=F, eval=T, warning=F,  fig.cap="Figure SIII.8 Various genetic architecture estimates for target phenotypes by the distribution of their genetic correlations with other target phenotypes.", fig.width=8, fig.height=11}
RERUN=F
if(RERUN){ 
    Zcor <- readRDS(file="./data/Zcor_all.rda") ## from raw data using in lmm
    
    st_all <- readRDS(file="./data/all_gwa_stats_table.rda")

    ## figure out which phenos are not overlapping and remove
    # sum(st_all$phenotype %in% colnames(Zcor))
    # dim(Zcor)
    # which(!colnames(Zcor) %in% st_all$phenotype)
    ## just the one that isn't included
    Zcor <- Zcor[-1030,-1030]
    
    ## remove diagonals = to 1 bc irrelevant
    diag(Zcor) <- rep(NA, length(diag(Zcor)))
    Zcor[1:10,1:10]
    
    target <- c("ABA_96h_low_water_potential",
                                            "Growth_rate",
                                            "Delta_13C",
                                            "DSDS10",
                                           "Stomatal_index_in_first_leaf", ## only 55
                                            "stomata_density",
                                            #"stomatasize",
                                            "FT16",
                                            "d8_10C_perc",
                                            "RGR",
                                            #"rhamnose_1_exp2",
                                            "Root_horizontal_index_day001",
                                            "Relative_root_growth_rate_day002.day003",
                                            #"First_leaf_area",
                                            "72_Vern_Growth")

    
    ## do some filtering prior to averaging and finding the max
    st_all <- st_all %>% filter(phenotype %in% target)
   
    st_all <- st_all %>%  filter(pve.x>0.05, pve.y > 0.05, pve.x<0.95, pve.y<0.95, numaccessions>100)
    
    Zcor_filtered <- Zcor[rownames(Zcor) %in% st_all$phenotype, colnames(Zcor) %in% st_all$phenotype]
    
    Zcor_max <- apply(Zcor_filtered, MARGIN = 1, max, na.rm=T)
    Zcor_mean<- apply(Zcor_filtered, MARGIN = 1, mean, na.rm=T)
    Zcor_var <- apply(Zcor_filtered, MARGIN = 1, var, na.rm=T)

    ## merge sum stats into df
    Zdf <- data.frame(phenotype=names(Zcor_max),
                      Zmax = Zcor_max,
                      Zmean = abs(Zcor_mean),
                      Zvar = Zcor_var)
    
    st_all_Zs <- merge(Zdf, st_all, by="phenotype")
    st_all_Zs[1:10,1:35]
    
    #st_all_Zs$pve.x[grep(st_all_Zs$phenotype %in% colnames(Zcor_filtered))] 
    h2 <- st_all_Zs$pve.x[match(colnames(Zcor_filtered), st_all_Zs$phenotype)]
    rho <- st_all_Zs$rho_median[match(colnames(Zcor_filtered), st_all_Zs$phenotype)]
    PVEmed <- st_all_Zs$PVE_median[match(colnames(Zcor_filtered), st_all_Zs$phenotype)]
    PGEmed <- st_all_Zs$PGE_median[match(colnames(Zcor_filtered), st_all_Zs$phenotype)]
    ngamma <- st_all_Zs$n.gamma_median[match(colnames(Zcor_filtered), st_all_Zs$phenotype)]
    df <- data.frame(vec = as.vector(Zcor_filtered),
                     names=rep(colnames(Zcor_filtered), each=10),
                     h2 = rep(h2, each=10),
                     rho = rep(rho, each=10),
                     pve = rep(PVEmed, each=10),
                     pge = rep(PGEmed, each=10),
                     ngamma = rep(ngamma, each=10),
                     pvexpge = rep(PVEmed*PGEmed, each=10),
                     pge_ngamma = rep(PGEmed/ngamma, each=10))
    df<- na.omit(df)
    head(df)
  labs <- data.frame(lab = unique(df$names), rho = unique(rho), h2=unique(h2),
                     pve= unique(df$pve), pge=unique(df$pge), pvexpge = unique(df$pvexpge),
                     pge_ngamma = unique(df$pge_ngamma), ngamma = unique(df$ngamma))
  labs$edit_label <- c("FT @ 16C", "primary dormancy", "root RGR", 
                       "root horiz. index", "delta_13C", "stomata density",
                       "Germination %", "growth rate","RGR", "ABA")
  
   gencor_h2_boxplot <- ggplot(df, aes(x=as.factor(h2), y=abs(vec))) + geom_boxplot() +
     theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=0.5)) +
      xlab("heritability") + ylab("dist. of genetic correlations") +
      geom_text(data = labs, aes(x = as.factor(h2), y = 0.5, label = edit_label), size = 3, angle = 60)
   
   gencor_h2_boxplot
   saveRDS(gencor_h2_boxplot, file = "./figs/gencor_h2_boxplot.rda")
    
  
   gencor_rho_boxplot <- ggplot(df, aes(x=as.factor(rho), y=abs(vec))) + geom_boxplot() +
     theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=0.5)) +
      xlab("polygenicity prop.") + ylab("dist. of genetic correlations") +
      geom_text(data = labs, aes(x = as.factor(rho), y = 0.5, label = edit_label), size = 3, angle = 60)
   gencor_rho_boxplot


   gencor_PVEbyPGE_boxplot <- ggplot(df, aes(x=as.factor(pvexpge), y=abs(vec))) + geom_boxplot() +
     theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=0.5)) +
      xlab("total genetic variation explained by sparse effects") + ylab("dist. of genetic correlations") +
      geom_text(data = labs, aes(x = as.factor(pvexpge), y = 0.5, label = edit_label), size = 3, angle = 60)
   gencor_PVEbyPGE_boxplot
   
    gencor_Ngamma_boxplot <- ggplot(df, aes(x=as.factor(ngamma), y=abs(vec))) + geom_boxplot() +
     theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=0.5)) +
      xlab("number of major effect loci") + ylab("dist. of genetic correlations") +
      geom_text(data = labs, aes(x = as.factor(ngamma), y = 0.5, label = edit_label), size = 3, angle = 60)
   gencor_Ngamma_boxplot 
   
  gencor_pgeNgamma_boxplot <- ggplot(df, aes(x=as.factor(pge_ngamma), y=abs(vec))) + geom_boxplot() +
     theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=0.5)) +
      xlab("variance explained/major effect loci") + ylab("dist. of genetic correlations") +
      geom_text(data = labs, aes(x = as.factor(pge_ngamma), y = 0.5, label = edit_label), size = 3, angle = 60)
   gencor_pgeNgamma_boxplot
   
   gencors_bigplot <- plot_grid(gencor_h2_boxplot, gencor_rho_boxplot,
                              gencor_Ngamma_boxplot,  gencor_pgeNgamma_boxplot, nrow=4)
   
   gencors_bigplot
   saveRDS(gencors_bigplot, file = "./figs/gencors_bigplot.rda")
    
}else{
  gencors_bigplot <- readRDS(file = "./figs/gencors_bigplot.rda")
  gencors_bigplot
}
```
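
The boxplots above rely on unrolling the trait-by-trait correlation matrix into long form. A minimal sketch of that reshaping follows, using `nrow()` instead of a hard-coded repeat count so it works for any number of traits. The matrix entries and heritability values are made up for illustration.

```r
traits <- c("FT16", "RGR", "Growth_rate")
Zcor <- matrix(c(NA,   0.3, -0.4,
                 0.3,  NA,   0.2,
                 -0.4, 0.2,  NA),
               nrow = 3, dimnames = list(traits, traits))
h2 <- c(FT16 = 0.8, RGR = 0.4, Growth_rate = 0.6)  # made-up values

# as.vector() unrolls a matrix column by column, so each column name
# has to be repeated nrow() times to stay aligned with the values.
long <- data.frame(
  vec   = as.vector(Zcor),
  names = rep(colnames(Zcor), each = nrow(Zcor)),
  h2    = rep(unname(h2[colnames(Zcor)]), each = nrow(Zcor))
)
long <- na.omit(long)  # drops the masked diagonal

# Median absolute correlation per trait, the centre line of each box.
med <- tapply(abs(long$vec), long$names, median)
med
```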

### Heritability correlations (again)

```{r, echo=F, eval=F, warning=F}
RERUN=F
if(RERUN){ 

  bslmm.filt.dat <- readRDS(file="./data/bslmm.filt.dat.rda")
  head(bslmm.filt.dat)
  
  ## get new cats
  newcats <- read.table(file="./data/atlas_pheno_NewCategors.tsv", sep = "\t", header = T)[,c(2,4)]
  head(newcats)
  bslmm.filt.dat <- merge(bslmm.filt.dat, newcats, by="phenotype")
  
  cor.test(y=bslmm.filt.dat$PVE_mean, x=bslmm.filt.dat$pve.x)
  
  
  hist(bslmm.filt.dat$numaccessions)
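  ## keep phenotypes measured in more than 100 accessions (the object name says 50, the filter uses 100)
  above50<- bslmm.filt.dat %>% filter(numaccessions>100)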
   cor.test(y=above50$PVE_mean, x=above50$pve.x)
  
  above50$h2diff <- abs(above50$PVE_mean-above50$pve.x)
  hist(above50$h2diff )
  
  above50_filt <- above50 %>% filter(h2diff < 0.25)
   cor.test(y=above50_filt$PVE_mean, x=above50_filt$pve.x)
   
  ggplot(above50_filt, aes(y=PVE_mean, x=pve.x)) +
    geom_point(aes(size=se_pve.x))
  
  ## Used this in the paper after filtering out h2 estimates that differed between methods by 0.25 or more
  ## the remaining estimates had a correlation of 0.959
  mean(above50_filt$PVE_mean)
 c(mean(above50_filt$PVE_mean)-IQR(above50_filt$PVE_mean)/2, mean(above50_filt$PVE_mean)+ IQR(above50_filt$PVE_mean)/2)
 ## h2 = 0.49 [0.20-0.79]) 
 
 #mean(above50_filt$pve.x)
  mean(above50_filt$pve.y)
  c(mean(above50_filt$pve.y)-IQR(above50_filt$pve.y)/2, mean(above50_filt$pve.y)+ IQR(above50_filt$pve.y)/2)
  
  
    mean(above50_filt$n.gamma_mean)
 c(mean(above50_filt$n.gamma_mean)-IQR(above50_filt$n.gamma_mean)/2, mean(above50_filt$n.gamma_mean)+ IQR(above50_filt$n.gamma_mean)/2)
  ##  55.77 potentially causal loci [29.1-82.4]
     mean(above50_filt$PGE_mean)
 c(mean(above50_filt$PGE_mean)-IQR(above50_filt$PGE_mean)/2, mean(above50_filt$PGE_mean)+ IQR(above50_filt$PGE_mean)/2)
##PGE = 0.41 [0.28-0.55]
  
}else{
  
}
```
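
The heritability summaries in this section report each statistic as its mean with a range of plus or minus half the interquartile range. A minimal sketch of that calculation, assuming a hypothetical helper `mean_iqr_range()` and toy input values that are not taken from the data:

```r
# Hypothetical helper (not part of the analysis code): mean of x together
# with a range of +/- half the interquartile range around that mean.
mean_iqr_range <- function(x) {
  x <- x[!is.na(x)]              # drop missing values before summarising
  centre <- mean(x)
  half_iqr <- IQR(x) / 2
  c(mean = centre, lower = centre - half_iqr, upper = centre + half_iqr)
}

# Toy values for illustration only
toy <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
res <- mean_iqr_range(toy)
print(res)
```

Note that this range is centred on the mean, so it is not the same as the first and third quartiles unless the distribution is symmetric.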


### Genotype-Phenotype Correlations

![Genetic correlations of phenotypes compared to raw correlations of phenotypes](../figs/Pheno-Geno-correlation_2.pdf){width=50% height=50%}

    
```{r, echo=F, eval=T, warning=F, fig.cap="Figure SIII.9 Genetic correlations compared to phenotypic correlations between A) all pairs of traits, B) only avoidance classified traits, and C) only escape classified traits.", fig.width=6, fig.height=5}
RERUN=F
if(RERUN){   

  library(reshape2)
  # ## load phenotype data
  # pheno<- read.table(file="./data/atlas1001_phenotypes_matrix.csv", sep=",", header = T)[,-1]
  # pheno_imp <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=",", header = T)[,-c(1,2)]
  # pheno_raw_quant <- read.table(file = 'data/atlas1001_rawPheno_Quantile.tsv', sep = "\t", header = T)[,-1]
  # pheno_imp_quant <- read.table(file = 'data/atlas1001_imputedPheno_Quantile.tsv', sep = "\t", header = T)[,-1]
  # 
  # ## do correlations
  # dats <- list(pheno, pheno_imp, pheno_raw_quant, pheno_imp_quant)
  # for (i in 1:length(dats)){
  #   dats[[i]] <- cor(dats[[i]], use = "pairwise.complete.obs")}
  # 
  # saveRDS(dats, file="./data/List_PhenoCorrelations.rda")
  pheno_cor_dats <- readRDS(file="./data/List_PhenoCorrelations.rda")
  genetic_cor <- readRDS(file="./data/Zcor_all.rda")
  
  #head(dats)
  #dats <- lapply(dats, cor(use = "pairwise.complete.obs"))
  #dats <- lapply(dats, melt)
  #dats <- lapply(dats, na.omit)
  
  pheno_cor_dats[[1]][1:5,1:5]

  manip_cor <- function(pheno_or_genetic_cor_matrix){
    pheno_or_genetic_cor_matrix[lower.tri(pheno_or_genetic_cor_matrix, diag = TRUE)] <- NA
    melt_Pcor <- reshape2::melt(pheno_or_genetic_cor_matrix)
    melt_Pcor <- na.omit(melt_Pcor)
    return(melt_Pcor)
  }
  
  Pcors <- manip_cor(pheno_cor_dats[[1]])
  head(Pcors)
  sum(is.na(Pcors))
  
  Gcors <- manip_cor(genetic_cor)
  head(Gcors)
  sum(is.na(Gcors))
  
  ## merge melted correlation matrices
  GenPhenCor <- merge(x=Gcors, by.x=c("Var1", "Var2"), y=Pcors, by.y = c("Var1", "Var2"))
  dim(GenPhenCor)
  #GenPhenCor[which(is.na(GenPhenCor)),]
  #sum(is.na(GenPhenCor))
  
  colnames(GenPhenCor) <- c("pheno1", "pheno2", "gen", "phen")
  head(GenPhenCor) ## value x is genetic cor, value y is the phenotype cor
  
  # plot(GenPhenCor$gen ~ GenPhenCor$phen)
    
  hist(GenPhenCor$gen)
  sum(is.infinite(GenPhenCor$gen))
  hist(GenPhenCor$phen)
  
  ## names cors as either escape-escape, escape-avoid, or avoid-avoid
  dim(GenPhenCor)
  atlasstrategies<-read.table("./data/pheno_fromgoogle.tsv",header = T)
  head(atlasstrategies)
  sum(atlasstrategies$phenotype %in% GenPhenCor$pheno1)
  
  GenPhenCor$s1<- rep(NA, nrow(GenPhenCor))
  GenPhenCor$s2<- rep(NA, nrow(GenPhenCor))

  
  # strat1 <- c()
  # strat2 <- c()
  # GenPhenCor$pheno2 <- gsub(".1", "", GenPhenCor$pheno2)
  ## annotate every trait pair with the stress strategy of each trait
  for (i in seq_len(nrow(GenPhenCor))){
     check1 <- which(atlasstrategies$phenotype==as.character(GenPhenCor$pheno1[i]))
      if(length(check1)==0) {
        ## pheno1 has no strategy annotation: default its own label (s1) to "Escape"
        GenPhenCor$s1[i] <- "Escape"
      }else{GenPhenCor$s1[i] <- paste0(atlasstrategies$stressstrategy[check1])}
    
      check <- which(atlasstrategies$phenotype==as.character(GenPhenCor$pheno2[i]))
      if(length(check)==0) {
        GenPhenCor$s2[i] <- "Escape"
      }else{GenPhenCor$s2[i] <- paste0(atlasstrategies$stressstrategy[which(atlasstrategies$phenotype==as.character(GenPhenCor$pheno2[i]))])}
      print(i)
      print(c(GenPhenCor$s1[i], GenPhenCor$s2[i]))
      #strat1 <- c(strat1, paste0(atlasstrategies$stressstrategy[which(atlasstrategies$phenotype==as.character(GenPhenCor$pheno1[i]))]))
      #strat2 <- c(strat2, paste0(atlasstrategies$stressstrategy[which(atlasstrategies$phenotype==as.character(GenPhenCor$pheno2[i]))]))
    }
    
  saveRDS(GenPhenCor, file="./data/GenPhenCor_all.rda")
  saveRDS(GenPhenCor, file="./data/GenPhenCor_101822.rda")
  # GenPhenCor$strat1 <- strat1
  # GenPhenCor$strat2 <- strat2
  # length(strat2)
    
    #### PLOT
    #######################################################################################
  GenPhenCor <- readRDS(file="./data/GenPhenCor_101822.rda")
  head(GenPhenCor)
  
  total_genphenocor_plot <- ggplot(GenPhenCor, aes(x=phen, y=gen)) +
      geom_hex(bins = 10, binwidth = c(.07,.07)) +
      scale_fill_gradient(low="gray90",high="gray10",trans="log10") +
      labs(title = "",
           x = "Phenotype correlation (r)",
           y = "Genetic correlation (r)",
           fill = "log10 # of pairs") +
      geom_smooth(method=glm,
                  color="black", fill="blue") +
      theme_cowplot()
  pdf(file="./figs/mainGenPhenCorPlot.pdf")
  #png(file="./figs/mainGenPhenCorPlot.png")
  print(total_genphenocor_plot) ## explicit print: objects are not auto-printed inside if(){} blocks
  dev.off()
  saveRDS(total_genphenocor_plot, file="./figs/total_GenPhenCor_plot.rda")
  
  avoid_GenPhenCor <- GenPhenCor %>% filter(s1 == "Avoidance", s2 == "Avoidance")
  head(avoid_GenPhenCor)
  avoid_GenPhenCor_plot <- ggplot(avoid_GenPhenCor, aes(x=phen, y=gen)) +
      geom_hex(bins = 10, binwidth = c(.07,.07)) +
      scale_fill_gradient(low="white",high=transparent("red"),trans="log10") +
      labs(title = "",
           x = "Phenotype correlation (r)",
           y = "Genetic correlation (r)",
           fill = "log10 # of pairs") +
      geom_smooth(method=glm,
                  color="black", fill="blue") +
      theme_cowplot()
  avoid_GenPhenCor_plot
  
  escape_GenPhenCor <- GenPhenCor %>% filter(s1 == "Escape", s2 == "Escape")
  head(escape_GenPhenCor)
  escape_GenPhenCor_plot <- ggplot(escape_GenPhenCor, aes(x=phen, y=gen)) +
      geom_hex(bins = 10, binwidth = c(.07,.07)) +
      scale_fill_gradient(low="white",high=transparent("green4"),trans="log10") +
      labs(title = "",
           x = "Phenotype correlation (r)",
           y = "Genetic correlation (r)",
           fill = "log10 # of pairs") +
      geom_smooth(method=glm,
                  color="black", fill="blue") +
      theme_cowplot()
  escape_GenPhenCor_plot
  
  GenPhen_bigPlot <- plot_grid(total_genphenocor_plot, avoid_GenPhenCor_plot, escape_GenPhenCor_plot, nrow=2)
  
  saveRDS(GenPhen_bigPlot, file="./figs/GenPhen_bigPlot.rda")
  
}else{
  GenPhen_bigPlot <- readRDS(file="./figs/GenPhen_bigPlot.rda")
  GenPhen_bigPlot
  }
```
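
The comparison in Figure SIII.9 pairs each genetic correlation with the phenotypic correlation for the same two traits. A minimal base-R sketch of that pairing step, assuming two toy matrices (`G` and `P`, invented for illustration) and a hypothetical helper `upper_long()` standing in for the `reshape2::melt` call used in the analysis:

```r
# Hypothetical helper: keep the upper triangle of a square correlation
# matrix and return it in long format, one row per unordered trait pair.
upper_long <- function(m) {
  idx <- which(upper.tri(m), arr.ind = TRUE)
  data.frame(Var1 = rownames(m)[idx[, "row"]],
             Var2 = colnames(m)[idx[, "col"]],
             value = m[idx],
             stringsAsFactors = FALSE)
}

# Toy matrices for illustration only (not estimates from this study)
traits <- c("a", "b", "c")
G <- matrix(c(1, 0.2, 0.4,
              0.2, 1, -0.3,
              0.4, -0.3, 1), nrow = 3, dimnames = list(traits, traits))
P <- matrix(c(1, 0.1, 0.5,
              0.1, 1, -0.2,
              0.5, -0.2, 1), nrow = 3, dimnames = list(traits, traits))

# One row per trait pair, with the genetic and phenotypic value side by side
both <- merge(upper_long(G), upper_long(P),
              by = c("Var1", "Var2"), suffixes = c(".gen", ".phen"))
print(both)
```

Dropping the lower triangle and the diagonal before merging keeps each pair once, so the fitted trend is not driven by duplicated points or by the trivial self-correlations.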


```{r, echo=F, eval=F, message=F, warning=F, fig.cap="", fig.width=8, fig.height=8 }
##### Plotting phenotype target correlations alongside genetic correlations
RERUN=F
if(RERUN){

    imp_multivar_corr_matrix <- readRDS(file="./data/imp_multivar_corr_matrix.Rda")

    ## display labels for the target traits
    target_labels <- c("FT @ 16", "dormancy", "root RGR", "rootHoriz ind.", 
                       "Delta_13C", "stomata density",
                       "stomata ind.","post-vern. growth", "germination %", "growth rate", "RGR", "ABA")
    dimnames(imp_multivar_corr_matrix) <- list(target_labels, target_labels)
    target_Pcor <- readRDS(file="./data/Phenotype_Imp&Raw_TargetCorrelations.rda")
    
    
}else{
  
}
```


################################################################################
# IV. Mapping Multivariate and Covariate GWAS
################################################################################

################################################################################
## IV.1 Running Multivariate GWAS
################################################################################

To estimate genetic correlations, we also used the multivariate mixed model implemented in GEMMA (Zhou and Stephens 2014).
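
Each pair of traits is fitted separately. A minimal sketch of how one such command line could be assembled in R, assuming a hypothetical helper `gemma_pair_cmd()`; the binary path, file prefix, output name, and thresholds are placeholders, and the flags are the ones this appendix passes to GEMMA:

```r
# Hypothetical helper: build the GEMMA call for one bivariate model.
# trait1 / trait2 are phenotype indices for -n (index 1 is the sixth
# column of the .fam file); nothing is executed here.
gemma_pair_cmd <- function(trait1, trait2, out,
                           gemma = "./gemma", bfile = "1001gbi",
                           kinship = "1001gbi.sXX.txt",
                           miss = 0.05, maf = 0.05) {
  stopifnot(trait1 != trait2)    # a bivariate model needs two different traits
  paste(gemma, "-bfile", bfile, "-miss", miss, "-maf", maf, "-r2 1",
        "-k", kinship, "-lmm 4", "-n", trait1, trait2, "-o", out)
}

cmd <- gemma_pair_cmd(7, 3, out = "mGWAS_example")
print(cmd)
```

Returning the command as a string, rather than running it, makes it easy to inspect before writing it into a job script.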

```{r, echo=F, eval=T, message=F, warning=F, fig.cap="Figure SIII.10 Genetic correlations for target traits estimated from A) summary statistics with the LD-block approximation, B) imputed and normalized phenotypes used in multivariate GWAS, and C) normalized, non-imputed phenotypes used in multivariate GWAS. In case C, many phenotype pairs did not have enough overlapping samples to run the method.", fig.width=8, fig.height=8 }
RERUN=F
if(RERUN){
  
  setwd("~/safedata/natvar/")
  pheno<- read.table(file = 'data/atlas1001_rawPheno_Quantile.tsv', sep = "\t", header = T)
  ## used imputed data and got super weird estimates of heritability and genetic correlation
  #pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=" ", header = T)
  pheno <- read.table(file = 'data/atlas1001_imputedPheno_Quantile.tsv', sep = "\t", header = T)
  
  head(pheno)
  getwd()
  system(paste('ln -f ../1001g/1001gbi.bim ', paste0('./multivarGWAS/','1001gbi.bim')))
  system(paste('ln -f ../1001g/1001gbi.fam ', paste0('./multivarGWAS/','1001gbi.fam')))
  system(paste('ln -f ../1001g/1001gbi.bed ', paste0('./multivarGWAS/', '1001gbi.bed')))
  system(paste('ln -f ../1001g/1001gbi.sXX.txt ', paste0('./multivarGWAS/','1001gbi.sXX.txt')))
  
  fam <- read.table("./data-raw/1001gbi.fam")
  #phenoname <- c("FT16", "DSDS10", "Delta_13C", "Growth_rate")
  #target_phenos <- pheno[,c("id","FT16", "DSDS10", "Delta_13C", "Growth_rate")]
  
  target_phenos <- pheno[,colnames(pheno) %in% c("id",
                                           "ABA_96h_low_water_potential", 
                                            "Growth_rate", 
                                            "Delta_13C", 
                                            "DSDS10", 
                                           "Stomatal_index_in_first_leaf", ## only 55
                                            "stomata_density",
                                            #"stomatasize",
                                            "FT16",  
                                            "d8_10C_perc", 
                                            "RGR", 
                                            #"rhamnose_1_exp2", 
                                            "Root_horizontal_index_day001",
                                            "Relative_root_growth_rate_day002.day003",
                                            #"First_leaf_area", 
                                            "X72_Vern_Growth")]
  head(target_phenos)
  phenoname <- c("ABA_96h_low_water_potential", 
                  "Growth_rate", 
                  "Delta_13C", 
                  "DSDS10", 
                  "Stomatal_index_in_first_leaf", ## only 55
                  "stomata_density",
                  "FT16",  
                  "d8_10C_perc", 
                  "RGR", 
                  "Root_horizontal_index_day001",
                  "Relative_root_growth_rate_day002.day003",
                  #"First_leaf_area", 
                  "X72_Vern_Growth")
  
  head(target_phenos)
  
  newfam <- merge(fam,by.x="V1", target_phenos, by.y="id")
  head(newfam)
  newfam[,phenoname][is.na(newfam[,phenoname])] <- -9
  newfam <- newfam[,-6]
  pheno_names <- colnames(newfam)[6:17]
  
    #also write out the covariance between the traits and run univariate gwas on that
  getCovary <- function( pheno1, pheno2) {
    ## -9 is the missing-value code written to the .fam file, so exclude it
    ## before computing the trait means and standard deviations
    obs1 <- pheno1[pheno1 != -9]
    obs2 <- pheno2[pheno2 != -9]
    mn1 <- mean(obs1)
    mn2 <- mean(obs2)
    sd1 <- sd(obs1)
    sd2 <- sd(obs2)
    ## per-accession standardized cross-product of the two traits
    newpheno <- ((pheno1 - mn1) * (pheno2 - mn2)) / (sd1*sd2)
    ## accessions missing either trait stay coded as missing
    newpheno[pheno1 == -9 | pheno2 == -9] <- -9
    return(newpheno)
  }
  
  newfam$ftD13 <- getCovary(newfam$FT16, newfam$Delta_13C)
  newfam$ftDS <- getCovary(newfam$FT16, newfam$DSDS10)
  newfam$ftGR <- getCovary(newfam$FT16, newfam$Growth_rate)
  newfam$D13GR <- getCovary(newfam$Delta_13C, newfam$Growth_rate)
  newfam$DSGR <- getCovary(newfam$DSDS10, newfam$Growth_rate)
  head(newfam)
  
  getwd() 
  write.table(newfam, file="./multivarGWAS/1001gbi.fam", quote = F, row.names = F, col.names = F)

  
  trait1<-2
  trait2<-2
  count<-1
  setwd("~/safedata/natvar/multivarGWAS/")
  for (trait1 in 1:12) {
    for (trait2 in 1:12) {
      if (trait1 == trait2){
        print("same trait")
      }else if(file.exists(paste0("imp_mGWAS_",pheno_names[trait2],"_", pheno_names[trait1]))){
        print("combo already ran")
      }else{
        print(paste("run", count,"_", pheno_names[trait1],"_", pheno_names[trait2]))
        count<- count+1
        newdirname <- paste0("imp_mGWAS_",pheno_names[trait1],"_", pheno_names[trait2])
        system(paste("mkdir", newdirname))
        system(paste("cp ./1001gbi.fam", newdirname))
        setwd(paste0("~/safedata/natvar/multivarGWAS/", newdirname))
        write.table(quote=F,row.names=F,col.names=F,
              file=paste0('multivargwa.sh'),
              x=rbind(
                "#!/bin/bash",
                "#SBATCH --cpus-per-task=2",
                "#SBATCH --mem-per-cpu=8G",
                "#SBATCH --partition=DPB",
                paste0("#SBATCH --job-name=", pheno_names[trait1], "_", pheno_names[trait2]),
                paste0("#SBATCH --output=", pheno_names[trait1], "_", pheno_names[trait2], ".slurm.log"),
              # paste("../gemma -bfile ../1001gbi -miss 0.95 -maf 0.01 -r2 1 -k ../1001gbi.sXX.txt -lmm 4 -n", trait1, trait2, "-o", newdirname)))
                paste("../gemma -bfile ../1001gbi -miss 0.05 -maf 0.05 -r2 1 -k ../1001gbi.sXX.txt -lmm 4 -n", trait1, trait2, "-o", newdirname)))
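        ## NOTE: each system() call starts its own shell, so this activation does not persist to the sbatch call below
        system("conda activate gemma")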
        system('sbatch multivargwa.sh')
        setwd("~/safedata/natvar/multivarGWAS/")
      }
    }
  }
  
  
  # write.table(quote=F,row.names=F,col.names=F,
  #             file=paste0('multivargwa.sh'),
  #             x=rbind(
  #               "#!/bin/bash",
  #               "#SBATCH --cpus-per-task=2",
  #               "#SBATCH --mem-per-cpu=8G",
  #               "#SBATCH --partition=DPB",
  #               paste0("#SBATCH --job-name=Ft16_Delta13"),
  #               paste0("#SBATCH --output=Ft16_Delta13.slurm.log"),
  #               paste('./gemma -bfile 1001gbi -miss 0.95 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 2 4 -o Ft16_Delta13') ## running
  #               #paste('./gemma -bfile 1001gbi -miss 0.95 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 2 3 -o Ft16_Dorm') ## didn't run
  #               #paste('./gemma -bfile 1001gbi -miss 0.95 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 2 5 -o Ft16_GR') ## done
  #               #paste('./gemma -bfile 1001gbi -miss 0.05 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 4 5 -o Delta13_GR') ## nope not running
  #               #paste('./gemma -bfile 1001gbi -miss 0.95 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 3 5 -o Dorm_GR') ## done
  #               #paste('./gemma -bfile 1001gbi -miss 0.05 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 4 5 -o Delta13_Dorm') ## nope not running
  #               #paste('./gemma -bfile 1001gbi -miss 0.95 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 6 -o covary_ftD132') ## done
  #               #paste('./gemma -bfile 1001gbi -miss 0.95 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 7 -o covary_ftDS') ## done
  #               #paste('./gemma -bfile 1001gbi -miss 0.95 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 8 -o covary_ftGR') ## done
  #               #paste('./gemma -bfile 1001gbi -miss 0.95 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 9 -o covary_D13GR') ## done
  #               #paste('./gemma -bfile 1001gbi -miss 0.95 -maf 0.01 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 9 -o covary_DSGR') ## done
  #             )
  # )
  # system('sbatch multivargwa.sh')

  
  ### go through and get results from mvm gwas
  
  getwd()
  setwd("~/safedata/natvar/multivarGWAS/")
  dirs <- list.dirs(path = ".")
  # just_mGWAS <- na.omit(dirs[grep("mGWAS_", dirs)][133:length(dirs)])
  # mGWAS_output <- just_mGWAS[grep("output", just_mGWAS)]
  
  just_impmGWAS <- dirs[grep("imp_mGWAS_", dirs)]
  impmGWAS_output <- just_impmGWAS[grep("output", just_impmGWAS)]

  dir <- impmGWAS_output[1]
  notwork <- 0
  for (dir in impmGWAS_output){
    setwd("~/safedata/natvar/multivarGWAS/")
    setwd(dir)
    getwd()
    print(dir)
    files <- list.files(path ="." )
    if (length(files) < 1){
      print("didn't work")
      notwork <- notwork+1
    }
    else{
      cmd <- paste0("cp ", files[2], " ~/safedata/natvar/multivarGWAS/all_impmGWAS_logoutput/")
      system(cmd)
    }
  }
  
  # setwd("~/safedata/natvar/multivarGWAS/all_mGWAS_logOutput/")
  # file_list <- list.files(".")
  
  setwd("~/safedata/natvar/multivarGWAS/all_impmGWAS_logoutput/")
  file_list <- list.files(".")
  
  # multivar_corr_matrix <- matrix(NA, ncol=12, nrow=12)
  imp_multivar_corr_matrix <- matrix(NA, ncol=12, nrow=12)
  phenoname <- c("ABA_96h_low_water_potential", 
                  "Growth_rate", 
                  "Delta_13C", 
                  "DSDS10", 
                  "Stomatal_index_in_first_leaf", ## only 55
                  "stomata_density",
                  "FT16",  
                  "d8_10C_perc", 
                  "RGR", 
                  "Root_horizontal_index_day001",
                  "Relative_root_growth_rate_day002.day003",
                  "X72_Vern_Growth")
  rownames(imp_multivar_corr_matrix) <- phenoname
  colnames(imp_multivar_corr_matrix) <- phenoname
  diag(imp_multivar_corr_matrix) <- 0
  
  i<-2
  k<-4
  for (i in 1:length(phenoname)){
    for (k in 1:length(phenoname)){
      if (i == k){
        print("same pheno")
      }else{
        first_pass <- file_list[grep(phenoname[i], file_list)]
        second_pass <- first_pass[grep(phenoname[k], first_pass)]
        
        if(length(second_pass)>0) {
          
          all_lines <- readLines(second_pass)
          head(all_lines)
          
          ## base-R split (stringr is not loaded in this document)
          pve_split <- strsplit(all_lines[26], split = "\t")[[1]]
          pve_cor <- as.numeric(pve_split[1]) 
          imp_multivar_corr_matrix[phenoname[i], phenoname[k]] <- pve_cor
        
        }else{
          print("file does not exist")
        }
      }
    }
  }
  
  setwd("~/safedata/natvar/")
  saveRDS(imp_multivar_corr_matrix, file="./data/imp_multivar_corr_matrix.Rda")
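  ## the non-imputed matrix only exists if it was built in an earlier run of the commented-out mGWAS lines
  if (exists("multivar_corr_matrix")) saveRDS(multivar_corr_matrix, file="./data/multivar_corr_matrix.Rda")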

  Zcor <- readRDS(file="./data/Zcor_all.rda") ## from raw data using in lmm
  phenoname <- c("ABA_96h_low_water_potential", 
                    "Growth_rate", 
                    "Delta_13C", 
                    "DSDS10", 
                    "Stomatal_index_in_first_leaf", ## only 55
                    "stomata_density",
                    "FT16",  
                    "d8_10C_perc", 
                    "RGR", 
                    "Root_horizontal_index_day001",
                    "Relative_root_growth_rate_day002.day003",
                    "72_Vern_Growth")
  dim(Zcor)
  
  ## these are the three genetic correlation matrices
  
  target_Zcors <- Zcor[rownames(Zcor) %in% phenoname,colnames(Zcor) %in% phenoname]
  saveRDS(target_Zcors, file="./data/target_Zcors.rda")
  
}else{  
    phenoname <- c("ABA_96h_low_water_potential", 
                    "Growth_rate", 
                    "Delta_13C", 
                    "DSDS10", 
                    "Stomatal_index_in_first_leaf", ## only 55
                    "stomata_density",
                    "FT16",  
                    "d8_10C_perc", 
                    "RGR", 
                    "Root_horizontal_index_day001",
                    "Relative_root_growth_rate_day002.day003",
                    "X72_Vern_Growth")
    
    imp_multivar_corr_matrix <- readRDS(file="./data/imp_multivar_corr_matrix.Rda")
    multivar_corr_matrix <- readRDS(file="./data/multivar_corr_matrix.Rda")
    target_Zcors <- readRDS(file="./data/target_Zcors.rda")
    
    # colnames(target_Zcors)[8] <- "X72_Vern_Growth"
    # rownames(target_Zcors)[8] <- "X72_Vern_Growth"

    #colnames(imp_multivar_corr_matrix)[match(colnames(target_Zcors), colnames(imp_multivar_corr_matrix))]
    # match_id <- match(colnames(target_Zcors), colnames(imp_multivar_corr_matrix))
    # #which(colnames(imp_multivar_corr_matrix) == "X72_Vern_Growth")
    # 
    # imp_multivar_corr_matrix <- imp_multivar_corr_matrix[match_id,match_id]
    # multivar_corr_matrix <- multivar_corr_matrix[match_id,match_id]

    ## display labels shared by the three correlation matrices
    target_labels <- c("FT @ 16", "dormancy", "root RGR", "rootHoriz ind.", 
                       "Delta_13C", "stomata density",
                       "stomata ind.","post-vern. growth", "germination %", "growth rate", "RGR", "ABA")
    dimnames(multivar_corr_matrix) <- list(target_labels, target_labels)
    dimnames(imp_multivar_corr_matrix) <- list(target_labels, target_labels)
    dimnames(target_Zcors) <- list(target_labels, target_labels)
    library(RColorBrewer)
    library(corrplot) ## needed for corrplot() below
    
    par(mfrow=c(2,2))
    corrplot(target_Zcors, method = "color", type = "full", diag = F,tl.cex = .75, col=brewer.pal(9,'PiYG'), addCoef.col = "black", number.cex = .5, tl.srt = 45)

    corrplot(imp_multivar_corr_matrix, method = "color", type = "full", diag = F,tl.cex = .75, col=brewer.pal(9,'PiYG'), addCoef.col = "black", number.cex = .5, tl.srt = 45)

    corrplot(multivar_corr_matrix, method = "color", type = "full", diag = F,tl.cex = .75, col=brewer.pal(9,'PiYG'), addCoef.col = "black", number.cex = .5, tl.srt = 45)

    
  # ## pdf plot for manuscript
  # diag(target_Zcors) <- 0
  # imp_multivar_corr_matrix[upper.tri(target_Zcors)] <- target_Zcors[upper.tri(target_Zcors)]
  # 
  # png(file = "./figs/Imputed_Raw_SingleCorMatrix.png")
  #    corrplot(imp_multivar_corr_matrix, method = "color", type = "full",
  #             diag = F,tl.cex = 1.2,tl.col = "black", col=brewer.pal(9,'PiYG'),
  #             addCoef.col = "black", number.cex = .85, tl.srt = 45)
  # dev.off()
  # 
  # target_GWA_stats_only <-read.table("./tables/TaretPheno_GWAstats.tsv", sep="\t", header = T)
  # 
  # target_GWA_stats_only$phenotype[1] <- "X72_Vern_Growth"
  # reordered_targetGWA_stats <- target_GWA_stats_only[match(rownames(imp_multivar_corr_matrix), target_GWA_stats_only$phenotype),]
  # 
  # imp_multivar_corr_matrix
  # reordered_targetGWA_stats <- reordered_targetGWA_stats[c(1,2,4,9)]
  # colnames(reordered_targetGWA_stats) <- c("pheno", "N", "h2.bslmm", "h2")
  # reordered_targetGWA_stats[,3] <- signif(reordered_targetGWA_stats[,3], digits = 4)
  # reordered_targetGWA_stats[,4] <- signif(reordered_targetGWA_stats[,4], digits = 3)
  # write.table(reordered_targetGWA_stats, file="./tables/reordered_GWA_target_h2estimates.tsv", col.names = T, quote = F, row.names = F)
  }

```

### Plotting GWA 

##### Flowering time

```{r, echo=F, eval=T, message=F, warning=F}
RERUN=F
if(RERUN){
  library(lattice)
  source("./analyses/QQPlotbyMatthewFlickinger.R")
  
 # ft_bslmm <- data.table::fread(file="./phenotypes/1001_Consortium_Cell_2016_PID_27293186/1001/bslmm_FT16/output/1001_Consortium_Cell_2016_PID_27293186_bslmm_FT16.assoc.txt.param.txt") head(ft_bslmm)
 # 
 # ft_reg <- data.table::fread(file="./phenotypes/1001_Consortium_Cell_2016_PID_27293186/1001/FT16/output/1001_Consortium_Cell_2016_PID_27293186norm.lmm.assoc.txt") %>% select(chr,rs, ps, af, beta, se, p_score)
 # head(ft_reg)
 # ft_norm <- data.table::fread(file="./phenotypes/1001_Consortium_Cell_2016_PID_27293186/1001/norm_FT16/output/1001_Consortium_Cell_2016_PID_27293186norm.lmm.assoc.txt") %>% select(chr,rs, ps, af, beta, se, p_score)
 # head(ft_norm)
  
  #ft_bslmm <- data.table::fread(file="./gwaresults/1001_Consortium_Cell_2016_PID_27293186_bslmm_FT16.assoc.txt.param.txt")
  #dim(ft_bslmm)
  
  ft_gwa <- data.table::fread(file="./gwaresults/1001_Consortium_Cell_2016_PID_27293186norm.lmm.assoc.txt")
  head(ft_gwa)
  norm_ft <- data.table::fread(file="./gwaresults/normFT.lmm.assoc.txt")
  head(norm_ft)

  # qqplot <- qqunif.plot(ft_reg$p_score) 
  # qqplot
   qqplot <- qqunif.plot(norm_ft$p_score) 
   qqplot
  
  #install.packages("devtools", dependencies = T)
  #library(devtools)
  #install_github("drveera/ggman")
  library(ggman)
  
  thresh <- -log10(0.05/nrow(ft_gwa))
  filt_ft_gwa <- ft_gwa %>% filter(p_score<0.1) %>% 
    filter(se < 1)
ft_gwa1 <- ggman(filt_ft_gwa, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", relative.positions = T, sigLine = thresh, title="") +
  scale_color_manual(values=c("#1a1a1a","#bababa")) 
 ft_gwa1
   
 
   thresh <- -log10(0.05/nrow(norm_ft))
  filt_norm_ft <- norm_ft %>% filter(p_score<0.05) %>% 
    filter(se < .05)
ft_gwa2_norm <- ggman(filt_norm_ft, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", relative.positions = T, sigLine = thresh, title="") +
  scale_color_manual(values=c("#1a1a1a","#bababa")) 
 ft_gwa2_norm
 
   filt_norm_ft$log10 <- -log10(filt_norm_ft$p_score)
  filt_norm_ft %>% filter(log10 >thresh) %>% filter(chr==1) 
  
 
 pdf(file="./figs/ft_manhattanplot.pdf", width=8, height=3)
 print(ft_gwa2_norm) ## explicit print so the plot is written to the pdf device
 dev.off()
   
 ## Delta_C13
 dc13_gwa <- data.table::fread(file="./gwaresults/delta13C_raw.lmm.assoc.txt")
  head(dc13_gwa)
  norm_dc13 <- data.table::fread(file="./gwaresults/Meaux_Dittberner_MolEcol_2018_PID_30118161norm.lmm.assoc.txt")
  head(norm_dc13)
 
 
   #thresh <- -log10(0.05/nrow(dc13_gwa))
  filt_dc13_gwa <- dc13_gwa %>% filter(p_score<0.1) %>% 
    filter(se < 0.1)
   thresh <- -log10(0.05/nrow(norm_dc13))
 dc13_gwa1 <- ggman(filt_dc13_gwa, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", relative.positions = T, sigLine = thresh, title="") +
  scale_color_manual(values=c("#1a1a1a","#bababa")) 
 dc13_gwa1
 
   filt_norm_dc13 <- norm_dc13 %>% filter(p_score<0.1) %>% 
    filter(se < 0.1)
   thresh <- -log10(0.05/nrow(norm_dc13))
 dc13_gwa2 <- ggman(filt_norm_dc13, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", relative.positions = T, sigLine = thresh, title="") +
  scale_color_manual(values=c("#1a1a1a","#bababa")) 
 dc13_gwa2
   
 pdf("./figs/dc13_norm_manhattanPlot.pdf", width = 8, height = 2.5)
 print(dc13_gwa2) ## explicit print so the plot is written to the pdf device
 dev.off()
   
 
  filt_dc13_gwa$log10 <- -log10(filt_dc13_gwa$p_score)
  filt_dc13_gwa %>% filter(log10 >thresh) %>% filter(chr==1) 
  
    #FRI_chr_posrange <- c(4, 269026, 270358)
    ## +/- 1kb
    FRI_chr_posrange <- c(4, 268026, 271358)
  FRI_chr_posrange
  FRI_snps <- filt_norm_ft %>% filter(chr==4) %>% 
    filter(ps > FRI_chr_posrange[2]) %>% 
    filter(ps < FRI_chr_posrange[3])
  dim(FRI_snps)
  
  #FLC_chr_posrange <- c(5, 3173724, 3179155)
    FLC_chr_posrange <- c(5, 3172724, 3180155)
  FLC_chr_posrange
  FLC_snps <- filt_norm_ft %>% filter(chr==5) %>% 
    filter(ps > FLC_chr_posrange[2]) %>% 
    filter(ps < FLC_chr_posrange[3])
  dim(FLC_snps)
  
  fri_flc_highlights <- c(FRI_snps$rs, FLC_snps$rs)
  fri_flc_plot <- ggmanHighlight(ft_gwa2_norm, highlight = fri_flc_highlights, size = 0.3) 
  fri_flc_plot
  
  
}else{
  
}


```
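
The significance line in the Manhattan plots of this section is a Bonferroni threshold, -log10(0.05 / number of SNPs tested). A minimal sketch of that calculation, assuming a hypothetical helper `bonferroni_line()` and using the 546,668 SNPs reported for the FT16 and Delta_C13 models as an illustrative count:

```r
# Hypothetical helper: height of the Bonferroni significance line on the
# -log10(p) axis for a given number of tests.
bonferroni_line <- function(n_tests, alpha = 0.05) {
  stopifnot(n_tests > 0, alpha > 0, alpha < 1)
  -log10(alpha / n_tests)
}

line_y <- bonferroni_line(546668)
round(line_y, 2)  # 7.04
```

Because the threshold depends on the number of rows in each association table, it should be recomputed whenever the SNP filters change.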


##### Flowering time and Delta_C13

Linear mixed-effects models were fitted using the 248 samples phenotyped for both FT16 and Delta_C13, 546,668 biallelic SNPs, a minor allele frequency cutoff of 0.05, and a maximum of 5% missing genotype data across samples. The REML estimate of h2 was 0.86 (se 0.09) for FT16 and 0.38 (se 0.13) for Delta_C13. The estimated genetic correlation between the two traits is 0.44 (se 0.08).

For the imputed data...
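
A genetic correlation of this kind is the genetic covariance between two traits scaled by their genetic standard deviations, r_g = Vg[1,2] / sqrt(Vg[1,1] * Vg[2,2]). A minimal sketch of that definition, assuming a hypothetical 2 x 2 genetic covariance matrix `Vg` whose entries are invented for illustration and are not estimates from this analysis:

```r
# Toy genetic (co)variance matrix for two traits; values are made up.
Vg <- matrix(c(4, 1,
               1, 1), nrow = 2,
             dimnames = list(c("trait1", "trait2"), c("trait1", "trait2")))

# Genetic correlation from the definition ...
rg_manual <- Vg[1, 2] / sqrt(Vg[1, 1] * Vg[2, 2])

# ... and the same quantity from base R's covariance-to-correlation helper
rg_base <- cov2cor(Vg)[1, 2]

print(c(manual = rg_manual, base = rg_base))
```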

###### QQ plots

```{r, echo=F, eval=T, message=F, warning=F, fig.cap="Figure SIII.11 QQ plots for the multivariate GWA of flowering time and Delta_C13, run with raw and imputed data, each with and without PCs.", fig.width=8, fig.height=8 }
RERUN=F
if(RERUN){
  
  ## multivariate gwa
  tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Delta_13C/output/mGWAS_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(tmp1)
  
  tmp1<-data.table::fread(file="./gwaresults/mGWAS_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(tmp1)
  
  library(lattice)
  source("~/safedata/natvar/analyses/QQPlotbyMatthewFlickinger.R")
  qqplot <- qqunif.plot(tmp1$p_score) 
  qqplot
  
  ## mGWA with 5 genetic pcs
  pcs_tmp1 <- data.table::fread(file="./multivarGWAS/mGWAS_FT16_Delta_13C/output/mGWAS_pcs_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score)
  head(pcs_tmp1)

  pcs_qqplot <- qqunif.plot(pcs_tmp1$p_score)
  pcs_qqplot
  
  ## imputed multivariate gwa
  imp_tmp1 <- data.table::fread(file="./multivarGWAS/imp_mGWAS_FT16_Delta_13C/output/imp_mGWAS_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(imp_tmp1)
  
  imp_qqplot <- qqunif.plot(imp_tmp1$p_score) 
  imp_qqplot
  
  # imputed multivariate gwa with pcs
  imp_pcs_tmp1 <- data.table::fread(file="./multivarGWAS/imp_mGWAS_FT16_Delta_13C/output/imp_mGWAS_pcs_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score)
  head(imp_pcs_tmp1)

  imp_pcs_qqplot <- qqunif.plot(imp_pcs_tmp1$p_score)
  imp_pcs_qqplot
  
  ft_dC13_qqplots <- plot_grid(qqplot, pcs_qqplot, imp_qqplot, imp_pcs_qqplot, nrow = 2, ncol=2, labels=c("mGWAS", "pcs_mGWA", "imputed_mGWAS", "pcs_imputed_mGWA"))
  ft_dC13_qqplots
  saveRDS(ft_dC13_qqplots, file="./figs/tmpobjects/FT_dC13_qqplots.rda")
  
} else{
  
  ft_dC13_qqplots <- readRDS(file="./figs/tmpobjects/FT_dC13_qqplots.rda")
  ft_dC13_qqplots
  
}
```
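
For readers unfamiliar with this type of plot, the following base-R sketch shows the comparison a uniform QQ plot makes: observed -log10 p-values against the quantiles expected under a Uniform(0, 1) null. It uses simulated p-values and is not the `qqunif.plot` function sourced in the chunk above.

```r
# Simulated p-values under the null (hypothetical data)
set.seed(42)
p <- runif(10000)

# Observed and expected -log10 p-values, both ordered from most to least significant
observed <- -log10(sort(p))
expected <- -log10(ppoints(length(p)))

plot(expected, observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)",
     pch = 16, cex = 0.5)
abline(0, 1, col = "red")  # points follow this line when there is no inflation
```

Points rising above the diagonal across the whole range would suggest inflation, for example from uncorrected population structure, whereas a departure only in the upper tail is the pattern expected from true associations.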

###### mGWA top hit mapping / Manhattan plots

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Figure SIII.12 Manhattan plot of multivariate GWA with flowering time and delta_C13 without genetic pcs."}
RERUN=F
if(RERUN){
  
  # library(devtools)
  # install_github("drveera/ggman")  ## only needed once, to install ggman
  library(ggman)
  setwd("./safedata/natvar/")
  
  ### ======================================================== ###
  ## multivariate gwa
  ### ======================================================== ###
  tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Delta_13C/output/mGWAS_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(tmp1)
  
  thresh <- -log10(0.05/nrow(tmp1))
  
  ## colored plots, how to change colors??
  p1 <- ggman(tmp1, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", relative.positions = T, sigLine = thresh, title="") +  
    scale_color_manual(values=c("#54278f","#e6550d")) 
    
  tmp1$log10 <- -log10(tmp1$p_score)
  tmp1 %>% filter(log10 >thresh) %>% filter(chr==1) 
  
  
  grep("3876764", tmp1$ps)
  tmp1$rs[18450:18470]
  
  p1
  ggmanZoom(p1, start.position = 3869637, end.position = 3878812, chromosome = 1)
  
  
  saveRDS(p1, file="./figs/FT_dC13_mGWA_manhattanPlot_ggman.rda")
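  ## Frigida is chr 4 pos 269026-270358 (defined here so it exists before it is used)
  FRI_chr_posrange <- c(4, 269026, 270358)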
  FRI_snps <- tmp1 %>% filter(chr==4) %>% 
    filter(ps > FRI_chr_posrange[2]) %>% 
    filter(ps < FRI_chr_posrange[3])
  dim(FRI_snps)
  
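  ## FLC is chr 5 from pos 3173724-3179155 (defined here so it exists before it is used)
  FLC_chr_posrange <- c(5, 3173724, 3179155)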
  FLC_snps <- tmp1 %>% filter(chr==5) %>% 
    filter(ps > FLC_chr_posrange[2]) %>% 
    filter(ps < FLC_chr_posrange[3])
  dim(FLC_snps)
  
  fri_flc_highlights <- c(FRI_snps$rs, FLC_snps$rs)
  fri_flc_plot <- ggmanHighlight(p1, highlight = fri_flc_highlights, size = 0.3) +
    scale_color_manual(values=c("#636363","#bdbdbd")) 
  saveRDS(fri_flc_plot, file="./figs/FT_dC13_withFLCFRImarked_mGWA_manhattanPlot_ggman.rda")
  
  tmp1.gwas.sig <- tmp1[-log10(tmp1$p_score)>thresh,]
  
  which(-log10(tmp1$p_score)>thresh)
  tmp1[12010:12015]
  
   chrm1_peakhits<- tmp1.gwas.sig %>% filter(chr==1)
  write.table(chrm1_peakhits[6:25,]$rs, file="~/safedata/natvar/multivarGWAS/mGWAS_FT16_Delta_13C/peak1snps.txt", col.names = F, row.names = F, quote = F)
  
  chrm1_peakhits <- read.table(file="./multivarGWAS/mGWAS_FT16_Delta_13C/peak1snps.txt")
  
  ## in plink
  ## ~/plink-1.07-x86_64/plink --bfile 1001gbi --from rs2745383 --to rs 7021124 --make-bed --out peakchr1
  chrm1_peakhits$log10 <- -log10(chrm1_peakhits$p_score)
  
  chrm1_peakhits[3:25]
  
  ggplot(data=chrm1_peakhits[6:25,]) +  geom_point(aes(x= ps, y=log10))
  chrm1_peakhits <- chrm1_peakhits[6:25,]
  
  ld_mat <- fread(file="~/safedata/1001g/ld/r2.ld")
  
  ld_mat[1:5,]
  
  chrm1_peakhits$rs %in% ld_mat$SNP_A
  
  ld_mat_prune <- ld_mat[ld_mat$SNP_A %in% chrm1_peakhits$rs , ]
    ld_mat_prune <- ld_mat[ld_mat$SNP_A %in% chrm1_peakhits$rs & ld_mat$SNP_B %in% chrm1_peakhits$rs , ]
  
    
    sum(ld_mat$SNP_B %in% chrm1_peakhits$rs)
    
  ld_melt <- data.frame(A=ld_mat_prune$SNP_A, B=ld_mat_prune$SNP_B, r2=ld_mat_prune$R2)
  ld_melt <- ld_melt[order(ld_melt$A), ]  ## sort rows by SNP A
    
  library(reshape)
  
  LD_peak1 <- cast(ld_melt, A~B)
  LD_peak1
  saveRDS(LD_peak1, file="./data/LD_peak1.rda")
  
  
  gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_CDS.gff")
  head(gff)
  
  ## get locations of genes like FLC AT5G10140 and Frigida AT4G00650
  dim(gff)
  gene_list <- c()
  
  chr4_5_gff <-gff %>% filter(V1 %in% c(4,5))
  head(chr4_5_gff)
  
  i<-1
  
  gene_list_data <- c()
  target_Genes <- c("AT5G10140", "AT4G00650", "AT4G00640")
  for (i in 1:nrow(chr4_5_gff)) {
    part1 <- unlist(strsplit(as.character(chr4_5_gff$V9[i]),split = ".", fixed = T))[1]
    gene <- unlist(strsplit(part1, "=", fixed = T))[2]
    if (gene %in% target_Genes){
      gene_list_data <- rbind(gene_list_data, chr4_5_gff[i,])
    }
  }
  
  min(gene_list_data[gene_list_data$V1== 4,]$V4)
  max(gene_list_data[gene_list_data$V1== 4,]$V4)
  FRI_chr_posrange <- c(4, 269026, 270358)
  ## Frigida is chr 4 pos 269026-270358
  
  min(gene_list_data[gene_list_data$V1== 5,]$V4)
  max(gene_list_data[gene_list_data$V1== 5,]$V4)
  FLC_chr_posrange <- c(5, 3173724, 3179155)
  ## FLC is chr 5 from pos 3173724-3179155
  
  
  i<-56
  gene_list <- c()
  tmp1.gwas.sig[i,]
  ## column names (chr, ps, rs, ..., p_score) are inherited from tmp1, so no renaming is needed
  for (i in 1:nrow(tmp1.gwas.sig)) {
    tmp <- gff %>% filter(V1==paste0("Chr",tmp1.gwas.sig$chr[i])) %>% 
      filter(V4 < tmp1.gwas.sig$ps[i]) %>% 
      filter(V5 > tmp1.gwas.sig$ps[i])
    
    part1 <- unlist(strsplit(as.character(tmp[1,9]),split = ".", fixed = T))[1]
    gene <- unlist(strsplit(part1, "=", fixed = T))[2]
    
    gene_list <- c(gene_list, gene)
  }
  
  tmp1.gwas.sig$genes <- gene_list
  gene_list_writeout<- na.omit(gene_list)
  write.table(na.omit(gene_list), file="./tables/FT_dC13_mGWA_topHits_GeneNames.txt", quote = F, col.names = F, row.names = F)
  
  ##take this table to https://www.arabidopsis.org/servlets/Search?type=gene&action=new_search and get new tsv file of GO annotations
  GO_annots <- read.table(file="./tables/FT_dC13_mGWAtopHits_GOAnnotationsAll.tsv", sep="\t", header = T)
  head(GO_annots)
  
  GO_annots %>% filter(category=="proc" | category=="func")
  
  
  curated_GO_annots_FT_dC13_mGWA <- GO_annots[,c(1,2, 4, 7, 9)] %>% filter(category=="proc"| category=="func")
  saveRDS(curated_GO_annots_FT_dC13_mGWA, file="./figs/tmpobjects/curated_GO_annots_FT_dC13_mGWA.rda")
  
  ### ======================================================== ###
  ### repeat with mGWAS + 5 genetic pcs
  ### ======================================================== ###
  pcs_tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Delta_13C/output/mGWAS_pcs_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(pcs_tmp1)
  
  thresh <- -log10(0.05/nrow(pcs_tmp1))
  
  ## colored plots, how to change colors??
  p2 <- ggman(pcs_tmp1, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", 
              relative.positions = T, sigLine = 6, title="") +  
    scale_color_manual(values=c("#54278f","#e6550d"))
  p2
  saveRDS(p2, file="./figs/FT_pcs_dC13_mGWA_manhattanPlot_ggman.rda")
  
  pcs_tmp1.gwas.sig <- pcs_tmp1[-log10(pcs_tmp1$p_score)>6,]

  
  ## function to get gene names from GFF file that overlap with top hit SNPs
  getGenes_fromGFF <- function(dat){
    #gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_CDS.gff")
    gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_GFF3_genes_transposons.gff")
    head(gff)
    gene_list <- c()
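    for (i in 1:nrow(dat)) {
      ## seqids in this GFF3 carry a "Chr" prefix; whole-chromosome features are
      ## dropped so that tmp[1,] is a gene-level record (same logic as the imputed run below)
      chr_name <- paste0("Chr", dat$chr[i])
      tmp <- gff %>% filter(V1==chr_name) %>% 
        filter(V4 < dat$ps[i]) %>% 
        filter(V5 > dat$ps[i]) %>% 
        filter(V3!="chromosome")
      
      part1 <- unlist(strsplit(as.character(tmp[1,9]),split = ";", fixed = T))[1]
      gene <- unlist(strsplit(part1, "=", fixed = T))[2]
      gene_list <- c(gene_list, gene)
    }
    return(gene_list)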
  }
  
  pcs_tmp1.gwas.sig$genes <- getGenes_fromGFF(pcs_tmp1.gwas.sig)
  gene_list_writeout<- unique(na.omit(pcs_tmp1.gwas.sig$genes))
  write.table(gene_list_writeout, file="./tables/FT_dC13_pcs_mGWA_topHits_GeneNames.txt", quote = F, col.names = F, row.names = F)
  
  ##take this table to https://www.arabidopsis.org/tools/bulk/go/index.jsp  and get new tsv file of GO annotations
  GO_annots <- read.table(file="./tables/FT_dC13_pcs_mGWAtopHits_GOAnnotationsAll.tsv", sep="\t", header = T)
  head(GO_annots)
  
  GO_annots %>% filter(category=="proc" | category=="func")
  curated_GO_annots_FT_dC13_pcs_mGWA <- GO_annots[,c(1,2, 4, 7, 9,10)] %>% filter(category=="proc"| category=="func")
  saveRDS(curated_GO_annots_FT_dC13_pcs_mGWA, file="./figs/tmpobjects/curated_GO_annots_FT_dC13__pcs_mGWA.rda")
  
  
  big_Manhattan_Plot_FT_dC13_mGWA <- plot_grid(p1,p2, nrow=2, ncol=1)
  png(file="./figs/big_Manhattan_Plot_FT_dC13_mGWA.png")
  print(big_Manhattan_Plot_FT_dC13_mGWA)  ## explicit print is required inside if(){} to draw to the png device
  dev.off()
  saveRDS(big_Manhattan_Plot_FT_dC13_mGWA, file="./figs/big_Manhattan_Plot_FT_dC13_mGWA.rda")
  
}else{
  # big_Manhattan_Plot_FT_dC13_mGWA <- readRDS(file="./figs/big_Manhattan_Plot_FT_dC13_mGWA.rda")
  # big_Manhattan_Plot_FT_dC13_mGWA
  ## not working too large
  #knitr::include_graphics("./figs/big_Manhattan_Plot_FT_dC13_mGWA.png")
  
 FT_dC13_mGWA_manhattanPlot <- readRDS(file="./figs/FT_dC13_mGWA_manhattanPlot_ggman.rda")
 FT_dC13_mGWA_manhattanPlot
}
```

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Figure SIII.13 Manhattan plot of multivariate GWA with flowering time and delta_C13 with 5 genetic pcs included in the analysis."}
  library(ggman)
  setwd("~/safedata/natvar/")  
  FT_pcs_dC13_mGWA_manhattanPlot <- readRDS(file="./figs/FT_pcs_dC13_mGWA_manhattanPlot_ggman.rda")
  FT_pcs_dC13_mGWA_manhattanPlot
```  

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Figure SIII.14 Manhattan plot of multivariate GWA with flowering time and delta_C13 with FRI (chr4) and FLC (chr5) SNPs marked in red."}
  library(ggman)
  setwd("~/safedata/natvar/")  
fri_flc_plot <- readRDS(file="./figs/FT_dC13_withFLCFRImarked_mGWA_manhattanPlot_ggman.rda")
fri_flc_plot
```  

Note that the peak on chromosome 5 at AT5G10170, a gene annotated with embryo development ending in seed dormancy, is extremely close to the FLC gene, AT5G10140.
  
Also note the location of FRIGIDA (AT4G00650/AT4G00640).
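
As a minimal sketch of how the significance line and the highlighted FRI/FLC SNPs in the figures above are obtained, the example below applies a Bonferroni threshold and the two gene windows to a simulated association table. The column names (`chr`, `ps`, `rs`, `p_score`) and the gene coordinates follow the analysis code; the SNP grid, the p-values, and the planted signal are hypothetical.

```r
# Simulated association table: one SNP every 500 bp on 5 chromosomes (hypothetical data)
set.seed(7)
n_per_chr <- 8000
assoc <- data.frame(
  chr = rep(1:5, each = n_per_chr),
  ps = rep(seq(500, by = 500, length.out = n_per_chr), times = 5),
  p_score = runif(5 * n_per_chr, min = 1e-4, max = 1)
)
assoc$rs <- paste(assoc$chr, assoc$ps, sep = "_")

# Plant one strong signal inside the FLC window for illustration
assoc$p_score[assoc$chr == 5 & assoc$ps == 3176000] <- 1e-9

# Bonferroni threshold on the -log10 scale
thresh <- -log10(0.05 / nrow(assoc))
sig <- assoc[-log10(assoc$p_score) > thresh, ]

# SNPs falling inside the FRI (chr 4) and FLC (chr 5) gene windows
FRI_chr_posrange <- c(4, 269026, 270358)
FLC_chr_posrange <- c(5, 3173724, 3179155)
in_window <- function(dat, rng) {
  dat[dat$chr == rng[1] & dat$ps > rng[2] & dat$ps < rng[3], ]
}
fri_snps <- in_window(assoc, FRI_chr_posrange)
flc_snps <- in_window(assoc, FLC_chr_posrange)

sig$rs
```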

```{r, echo=F, eval=F, warning=F, message=F}
## Curated GO annotation terms for FT & dC13 mGWA
RERUN=F 
if(RERUN){
}else{
  curated_GO_annots_FT_dC13_mGWA <- readRDS(file="./figs/tmpobjects/curated_GO_annots_FT_dC13_mGWA.rda")
  #write.table(curated_GO_annots_FT_dC13_mGWA, file="./tables/curated_GO_annots_FT_dC13_mGWA.tsv", sep="\t", col.names = T, row.names = F, quote = F)
  knitr::kable(curated_GO_annots_FT_dC13_mGWA, caption = "Table S18 Curated GO annotations for FT and dC13 multivariate GWA top hits.", fixed_thread=T)
}
```  
  
  
```{r, echo=F, eval=F, warning=F, message=F}
## Curated GO annotation terms for FT & dC13 mGWA with pcs
RERUN=F 
if(RERUN){
}else{
  curated_GO_annots_FT_dC13_pcs_mGWA <-   readRDS(file="./figs/tmpobjects/curated_GO_annots_FT_dC13__pcs_mGWA.rda")
  write.table(curated_GO_annots_FT_dC13_pcs_mGWA, file="./tables/curated_GO_annots_FT_dC13_pcs_mGWA.tsv", sep="\t", col.names = T, row.names = F, quote = F)
  knitr::kable(curated_GO_annots_FT_dC13_pcs_mGWA, caption = "Table SIII.3 Curated GO annotations for the top hits of the FT and dC13 multivariate GWA using 5 genetic PCs.", fixed_thread=T)
}
```  
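
The gene lookup used for the top hits (a SNP is assigned to a feature when its position falls between the feature's start and end coordinates) can be sketched on a toy annotation table. The `V1`, `V3`, `V4`, `V5`, and `V9` columns mirror the unnamed GFF columns read by `read.table`; the gene identifiers and coordinates are made up, and unlike the analysis code this sketch uses inclusive bounds and keeps only `gene` features.

```r
# Toy GFF-like annotation (hypothetical features and identifiers)
gff <- data.frame(
  V1 = c("Chr1", "Chr1", "Chr1", "Chr4"),
  V3 = c("chromosome", "gene", "gene", "gene"),
  V4 = c(1, 1000, 5000, 2000),
  V5 = c(100000, 3000, 8000, 4000),
  V9 = c("ID=Chr1;Name=Chr1",
         "ID=GENE_A;Note=protein_coding_gene",
         "ID=GENE_B;Note=protein_coding_gene",
         "ID=GENE_C;Note=protein_coding_gene"),
  stringsAsFactors = FALSE
)

# Toy top hits: numeric chromosome and base-pair position
hits <- data.frame(chr = c(1, 1, 4, 1), ps = c(1500, 6000, 2500, 50000))

# Keep gene features only and extract the identifier from the attribute column
genes_only <- gff[gff$V3 == "gene", ]
genes_only$gene_id <- sub("^ID=([^;]+).*$", "\\1", genes_only$V9)

# Return the first gene overlapping a SNP, or NA when the SNP is intergenic
lookup_gene <- function(chr, ps) {
  idx <- which(genes_only$V1 == paste0("Chr", chr) &
                 genes_only$V4 <= ps & genes_only$V5 >= ps)
  if (length(idx) == 0) NA_character_ else genes_only$gene_id[idx[1]]
}
hits$gene <- mapply(lookup_gene, hits$chr, hits$ps)
hits
```

Filtering on the feature type first avoids returning the whole-chromosome record, which spans every SNP on its chromosome.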

###### Imputed traits; mGWA top hit mapping / Manhattan plots

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Figure SIII.15 Manhattan plot of multivariate GWA with imputed flowering time and delta_C13 phenotype data without genetic pcs."}
RERUN=F
if(RERUN){
  
  # library(devtools)
  # install_github("drveera/ggman")
  library(ggman)
  setwd("~/safedata/natvar/")
  
  ### ======================================================== ###
  ## multivariate gwa from imputed phenotypes ft16 & dC13
  ### ======================================================== ###
  imp_tmp1<-data.table::fread(file="./multivarGWAS/imp_mGWAS_FT16_Delta_13C/output/imp_mGWAS_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(imp_tmp1)
  
  ## Manhattan plot
  thresh <- -log10(0.05/nrow(imp_tmp1))
  imp_p1 <- ggman(imp_tmp1, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", relative.positions = T, sigLine = thresh, title="") +  
    scale_color_manual(values=c("#54278f","#e6550d"))
  imp_p1
  saveRDS(imp_p1, file="./figs/FT_dC13_imputed_mGWA_manhattanPlot_ggman.rda")
  
  tmp1.impgwas.sig <- imp_tmp1[-log10(imp_tmp1$p_score)>thresh,]
  
    getGenes_fromGFF <- function(dat){
    gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_CDS.gff")
    gene_list <- c()
    for (i in 1:nrow(dat)) {
      tmp <- gff %>% filter(V1==paste0("Chr", dat$chr[i])) %>%  ## GFF seqids carry a "Chr" prefix, as in the non-imputed lookup
        filter(V4 < dat$ps[i]) %>% 
        filter(V5 > dat$ps[i])
      
      part1 <- unlist(strsplit(as.character(tmp[1,9]),split = ".", fixed = T))[1]
      gene <- unlist(strsplit(part1, "=", fixed = T))[2]
      gene_list <- c(gene_list, gene)
    }
    return(gene_list)
  }
  
  ## run function and get genes
  tmp1.impgwas.sig$genes <- getGenes_fromGFF(tmp1.impgwas.sig)
  gene_list_writeout<- unique(na.omit(tmp1.impgwas.sig$genes ))
  write.table(gene_list_writeout, file="./tables/FT_dC13_imputed_mGWA_topHits_GeneNames.txt", quote = F, col.names = F, row.names = F)
  
  ##take this table to https://www.arabidopsis.org/tools/bulk/go/index.jsp  and get new tsv file of GO annotations
  GO_annots <- read.table(file="./tables/FT_dC13_imputed_mGWA_GOAnnotationsAll.tsv", sep="\t", header = T)
  head(GO_annots)
  
  GO_annots %>% filter(category=="proc")
  curated_GO_annots_FT_dC13_imputed_mGWA <- GO_annots[,c(1,2, 4, 7, 9)] %>% filter(category=="proc" | category=="func")
  saveRDS(curated_GO_annots_FT_dC13_imputed_mGWA, file="./figs/tmpobjects/curated_GO_annots_FT_dC13_imputed_mGWA.rda")
  
  ### ======================================================== ###
  ### repeat with imputed mGWAS + 5 genetic pcs
  ### ======================================================== ###
  pcs_imp_tmp1<-data.table::fread(file="./multivarGWAS/imp_mGWAS_FT16_Delta_13C/output/imp_mGWAS_pcs_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(pcs_imp_tmp1)
  
  thresh <- -log10(0.05/nrow(pcs_imp_tmp1))
  
  ## colored plots, how to change colors??
  imp_pcs_p2 <- ggman(pcs_imp_tmp1, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", 
              relative.positions = T, sigLine = thresh, title="") +  
    scale_color_manual(values=c("#54278f","#e6550d"))
  imp_pcs_p2
  saveRDS(imp_pcs_p2, file="./figs/FT_dC13_imp_pcs_mGWA_manhattanPlot_ggman.rda")
  
  pcs_imp_tmp1.gwas.sig <- pcs_imp_tmp1[-log10(pcs_imp_tmp1$p_score)>thresh,]

  dat <- pcs_imp_tmp1.gwas.sig
  ## function to get gene names from GFF file that overlap with top hit SNPs
  getGenes_fromGFF <- function(dat){
    #gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_CDS.gff")
    gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_GFF3_genes_transposons.gff")
    head(gff)
    gene_list <- c()
    i<-1
    for (i in 1:nrow(dat)) {
      chr_name <- paste0("Chr", dat$chr[i])
      tmp <- gff %>% filter(V1==chr_name) %>% 
        filter(V4 < dat$ps[i]) %>% 
        filter(V5 > dat$ps[i]) %>% 
        filter(V3!="chromosome")
      
      # tmp <- gff %>% filter(V1==dat$chr[i]) %>% 
      #   filter(V4 < dat$ps[i]) %>% 
      #   filter(V5 > dat$ps[i])
      
      #part1 <- unlist(strsplit(as.character(tmp[1,9]),split = ".", fixed = T))[1]
      part1 <- unlist(strsplit(as.character(tmp[1,9]),split = ";", fixed = T))[1]
      gene <- unlist(strsplit(part1, "=", fixed = T))[2]
      gene_list <- c(gene_list, gene)
    }
    return(gene_list)
  }
  
  
  # i<-56
  # head(gff)
  # gene_list <- c()
  # pcs_imp_tmp1.gwas.sig[53:56,]
  # pcs_imp_tmp1.gwas.sig[i,]
  # for (i in 1:nrow(pcs_imp_tmp1.gwas.sig)) {
  #   tmp <- gff %>% filter(V1==pcs_imp_tmp1.gwas.sig$chr[i]) %>% 
  #     filter(V4 <= pcs_imp_tmp1.gwas.sig$ps[i]) %>% 
  #     filter(V5 >= pcs_imp_tmp1.gwas.sig$ps[i])
  #   
  #   tmp2<- tmp$V5 > pcs_imp_tmp1.gwas.sig[i,]$ps
  #   part1 <- unlist(strsplit(as.character(tmp[1,9]),split = ".", fixed = T))[1]
  #   gene <- unlist(strsplit(part1, "=", fixed = T))[2]
  #   
  #   gene_list <- c(gene_list, gene)
  # }

  
  pcs_imp_tmp1.gwas.sig$genes <- getGenes_fromGFF(pcs_imp_tmp1.gwas.sig)
  gene_list_writeout<- unique(na.omit(pcs_imp_tmp1.gwas.sig$genes))
  write.table(gene_list_writeout, file="./tables/FT_dC13_imp_pcs_mGWA_topHits_GeneNames.txt", quote = F, col.names = F, row.names = F)
  
  ##take this table to https://www.arabidopsis.org/tools/bulk/go/index.jsp  and get new tsv file of GO annotations
  GO_annots <- read.table(file="./tables/FT_dC13_imp_pcs_GOAnnotationsAll.txt", sep="\t", header = T)
  head(GO_annots)
  
  GO_annots %>% filter(category=="proc"| category=="func")
  head(GO_annots)
  curated_GO_annots_FT_dC13_imp_pcs_mGWA <- GO_annots[,c(1,2, 4, 7, 9)] %>% filter(category=="proc" | category=="func")
  # head(curated_GO_annots_FT_dC13_imp_pcs_mGWA)
  # unique(curated_GO_annots_FT_dC13_imp_pcs_mGWA[,1])
  # lil_df <- GO_annots[,c(1, 4, 7)] %>% filter(category=="proc")
  # head(lil_df)
  # distinct(lil_df)
  # distinct(GO_annots[,c(1, 4)])
  # curated_GO_annots_FT_dC13_imp_pcs_mGWA <- distinct(GO_annots[,c(1, 4)])
  # 
  saveRDS(curated_GO_annots_FT_dC13_imp_pcs_mGWA, file="./figs/tmpobjects/curated_GO_annots_FT_dC13_imp_pcs_mGWA.rda")
  
  
}else{
  imp_p1 <- readRDS(file="./figs/FT_dC13_imputed_mGWA_manhattanPlot_ggman.rda")
  imp_p1
  
}
```
  

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Figure SIII.16 Manhattan plot of multivariate GWA with imputed flowering time and delta_C13 phenotype data, and with 5 genetic pcs included in the analysis."}
  library(ggman)
  setwd("~/safedata/natvar/")  
  imp_pcs_p2 <- readRDS(file="./figs/FT_dC13_imp_pcs_mGWA_manhattanPlot_ggman.rda")
  imp_pcs_p2
```  

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Figure SIII.17 Manhattan plot of multivariate GWA with imputed flowering time and delta_C13, and 5 genetic pcs, with FRI (chr4) and FLC (chr5) SNPs marked in red."}
RERUN=F
if(RERUN){  
  library(ggman)
  pcs_imp_tmp1<-data.table::fread(file="./multivarGWAS/imp_mGWAS_FT16_Delta_13C/output/imp_mGWAS_pcs_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(pcs_imp_tmp1)
  
  setwd("~/safedata/natvar/")  
  imp_pcs_p2 <- readRDS(file="./figs/FT_dC13_imp_pcs_mGWA_manhattanPlot_ggman.rda")
  imp_pcs_p2
  
  FRI_chr_posrange <- c(4, 269026, 270358)
  ## Frigida is chr 4 pos 269026-270358
  FLC_chr_posrange <- c(5, 3173724, 3179155)
  ## FLC is chr 5 from pos 3173724-3179155
  
    FRI_chr_posrange
  FRI_snps <- pcs_imp_tmp1 %>% filter(chr==4) %>% 
    filter(ps > FRI_chr_posrange[2]) %>% 
    filter(ps < FRI_chr_posrange[3])
  dim(FRI_snps)
  
  FLC_chr_posrange
  FLC_snps <- pcs_imp_tmp1 %>% filter(chr==5) %>% 
    filter(ps > FLC_chr_posrange[2]) %>% 
    filter(ps < FLC_chr_posrange[3])
  dim(FLC_snps)
  
  fri_flc_highlights <- c(FRI_snps$rs, FLC_snps$rs)
  fri_flc_plot_imp_mGWA_ft_dc13 <- ggmanHighlight(imp_pcs_p2, highlight = fri_flc_highlights, size = 0.3) +
    scale_color_manual(values=c("#636363","#bdbdbd")) 
  fri_flc_plot_imp_mGWA_ft_dc13
  saveRDS(fri_flc_plot_imp_mGWA_ft_dc13, file="./figs/FT_dC13_imp_pcs_withFLCFRImarked_mGWA_manhattanPlot_ggman.rda")
  
}else{
   fri_flc_plot_imp_mGWA_ft_dc13 <- readRDS(file="./figs/FT_dC13_imp_pcs_withFLCFRImarked_mGWA_manhattanPlot_ggman.rda")
   fri_flc_plot_imp_mGWA_ft_dc13
}  
```  


```{r, echo=F, eval=F, warning=F, message=F}
## Curated GO annotation terms for imputed FT & dC13 mGWA
RERUN=F 
if(RERUN){
}else{
    curated_GO_annots_FT_dC13_imputed_mGWA <- readRDS( file="./figs/tmpobjects/curated_GO_annots_FT_dC13_imputed_mGWA.rda")
    write.table(curated_GO_annots_FT_dC13_imputed_mGWA, file="./tables/curated_GO_annots_FT_dC13_imputed_mGWA.tsv", sep="\t",
                col.names = T, row.names = F, quote = F)
  knitr::kable(curated_GO_annots_FT_dC13_imputed_mGWA, caption = "Table SIII.4 Curated GO annotations for FT and dC13 imputed multivariate GWA top hits.", fixed_thread=T)
}
```  
  
```{r, echo=F, eval=F, warning=F, message=F}
## Curated GO annotation terms for imputed FT & dC13 mGWA with pcs
RERUN=F 
if(RERUN){
}else{
    curated_GO_annots_FT_dC13_imp_pcs_mGWA <- readRDS(file="./figs/tmpobjects/curated_GO_annots_FT_dC13_imp_pcs_mGWA.rda")
      write.table(curated_GO_annots_FT_dC13_imp_pcs_mGWA, file="./tables/curated_GO_annots_FT_dC13_imp_pcs_mGWA.tsv", sep="\t",
                col.names = T, row.names = F, quote = F)
  knitr::kable(curated_GO_annots_FT_dC13_imp_pcs_mGWA, caption = "Table SIII.5 Curated GO annotations for the top hits of the FT and dC13 imputed multivariate GWA using 5 genetic PCs.", fixed_thread=T)
}
```  

###### Zoom in on Peak 2nd try

```{r, echo=F, eval=F, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Figure SIII.12 Manhattan plot of multivariate GWA with flowering time and delta_C13 without genetic pcs."}
RERUN=F
if(RERUN){

  library(data.table)
  library(dplyr)
  #install.packages("topr")
  #library(topr)
  library(gridExtra)
  library(grid)
  library(ggplot2)
  library(cowplot)
  
  tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Delta_13C/output/mGWAS_FT16_Delta_13C.assoc.txt") %>%  dplyr::select(chr, ps, rs, p_score)
  head(tmp1)
  colnames(tmp1) <- c("CHROM", "POS", "ID", "P")
  
  data <- tmp1  
  data$logp <- -log10(data$P)
  thresh <- -log10(0.05/nrow(tmp1))
  data %>% filter(logp >thresh) %>% filter(CHROM==1) 
  

center <- 3876090
## Zoomed scatter of -log10(p) around `center`; returns the ggplot object
zoomGWAplot <- function(data, center, start = center-6000, stop = center+3000){
  
  data$logp <- -log10(data$P)
  thresh <- -log10(0.05/nrow(data))
  
  zoomplot <- ggplot(data) + geom_point(aes(y=logp, x=POS)) +
    xlim(start, stop) + ylim(6, 9) +
    geom_hline(yintercept=thresh, linetype="dashed", 
                color = "red", size=1)
  
  return(zoomplot)
}
zoomplot <- zoomGWAplot(data, center)
  
  ## what are the genes
  gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_GFF3_genes_transposons.gff")
  head(gff)
  
  gff_gene <- gff %>% dplyr::filter(V3=="exon") %>% dplyr::filter(V1=="Chr1")
  head(gff_gene)
  
  thresh <- -log10(0.05/nrow(tmp1))
  peak <- data %>% filter(logp >thresh) %>% filter(CHROM==1)
  peak<- peak[-c(1,2),]
  
  min(peak$POS)
  genes <- c()
  i<- 1
  for (i in 1:length(peak$POS)){
    
    
   gene <- gff_gene[peak$POS[i] > gff_gene$V4 & peak$POS[i] < gff_gene$V5, ]
   print(gene)
  }
  
  ## gene models overlapping the peak SNPs:
  # AT1G11500.1
  # AT1G11510.1
  # AT1G11520.1
  # AT1G11530.1
  # AT1G11540.1
  
  gff_gene <- gff %>% dplyr::filter(V3=="gene") %>% dplyr::filter(V1=="Chr1")
  head(gff_gene)
  gff_gene %>%filter(V9=="Parent=AT1G11510.1")
  
  gff_gene[1180,]
  
  ## start (V4) and end (V5) coordinates of the five genes under the peak
  segs <- as.matrix(gff_gene[1176:1180, c("V4","V5")])
  
  
  geneannotPlt <- ggplot(data) + geom_point(aes(y=logp, x=POS), col="white") +
    xlim(center-6000, center+3000) + ylim(7.4, 8.1) +
    ylab("") +
    geom_segment(aes(x=segs[1,1],xend=segs[1,2],y=8,yend=8)) +
    geom_segment(aes(x=segs[2,1],xend=segs[2,2],y=7.8,yend=7.8)) +
    geom_segment(aes(x=segs[3,1],xend=segs[3,2],y=7.6,yend=7.6)) +
    geom_segment(aes(x=segs[4,1],xend=segs[4,2],y=7.5,yend=7.5)) +
    geom_segment(aes(x=segs[5,1],xend=segs[5,2],y=8.0,yend=8.0)) 

  
  pdf("./figs/zoomPeak1_geneannot.pdf", height = 8, width=4 )
  print(plot_grid(zoomplot, geneannotPlt, nrow = 2, ncol=1 ))  ## explicit print is required inside if(){} to draw to the pdf device
  dev.off()
  
  top_hits<- readRDS(file="data/mGWA_FT&dC13_andLFitnessGWA_top05hits.rda")
  chr1peakhits <- top_hits %>%  filter(chr.x==1)
  ## LD heatplot
  setwd("~/safedata/natvar/")
  ld_mat <- fread(file="../1001g/ld/r2.ld")
  
  ld_mat[1:5,1:5]

  chr1peakhits[97:130,]
  
  ld_mat$SNP_A[17936898:18011855]
  ld_prunedmat <- ld_mat[17936898:18011855, ]
  ld_prunedmat <- ld_prunedmat[ld_prunedmat$SNP_B %in% chr1peakhits$rs , ]
  ld_prunedmat
  
  which(ld_mat$SNP_A == "1_3870913") ## first snp
  which(ld_mat$SNP_A == "1_3876764")
  
  chr1peakhits$rs %in% ld_mat$SNP_A
  
  ld_mat_prune <- ld_mat[ld_mat$SNP_A %in% chr1peakhits$rs , ]
  
  ld_mat_prune <- ld_mat[ld_mat$SNP_A %in% chr1peakhits$rs & ld_mat$SNP_B %in% chr1peakhits$rs , ]
  
    sum(ld_mat$SNP_B %in% chr1peakhits$rs)
    
  ld_melt <- data.frame(A=ld_mat_prune$SNP_A, B=ld_mat_prune$SNP_B, r2=ld_mat_prune$R2)
  ld_melt <- ld_melt[order(ld_melt$A), ]  ## sort rows by SNP A
    
  library(reshape)
  
  LD_peak1 <- cast(ld_melt, A~B)
  LD_peak1
  saveRDS(LD_peak1, file="./data/LD_peak1.rda")
  LD_peak1 <- readRDS(file="./data/LD_peak1.rda")
  LD_peak1[1:5, 1:5]
  
  
 LD_peak1[is.na(LD_peak1)] <- 0.0
  library(corrplot)
 LD_peak1 <- data.frame(LD_peak1)
 
 rownames(LD_peak1) <- LD_peak1[,1]
 LD_peak1 <- LD_peak1[,-1]
 class(LD_peak1)
 
 pdf(file="./figs/LDcorplotPeakchr1.pdf", height = 4, width = 4)
  corrplot(as.matrix(LD_peak1), 
         type="upper", 
         order="original", 
         method="square",
         tl.cex = 0.5,
         col=COL1(sequential = c("YlOrRd"), n = 100)
)
  dev.off()
  
  dim(LD_peak1)
  corrplot(as.matrix(LD_peak1), method = 'color', is.corr = F)
  
  ggplot(df_long, aes(x = Var1, y = Var2, fill = r)) +
  geom_raster() +
  geom_text(aes(label = label)) +
  scale_fill_distiller(palette = "Spectral") +
  theme_minimal() +
  theme(panel.grid = element_blank())
  
  
  ## example from topr, but unfortunately only for human gene annotations
  head(CD_UKBB)
  
  regionplot(CD_UKBB, gene="IL23R")
  locuszoom(R2_CD_UKBB)
}
```
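
The reshaping step in the chunk above, from a long table of pairwise r2 values to a square matrix suitable for a heat map, can be sketched in base R. The column names `SNP_A`, `SNP_B`, and `R2` follow the LD file read in the analysis; the SNP identifiers and r2 values are invented, and filling the diagonal with 1 is an assumption made for plotting.

```r
# Toy long-format LD table (hypothetical values)
ld_long <- data.frame(
  SNP_A = c("1_100", "1_100", "1_200"),
  SNP_B = c("1_200", "1_300", "1_300"),
  R2 = c(0.9, 0.4, 0.5),
  stringsAsFactors = FALSE
)

# Square matrix indexed by SNP identifier
snps <- sort(unique(c(ld_long$SNP_A, ld_long$SNP_B)))
ld_wide <- matrix(NA_real_, nrow = length(snps), ncol = length(snps),
                  dimnames = list(snps, snps))

# Fill both triangles so the matrix is symmetric; a SNP is in complete LD with itself
ld_wide[cbind(ld_long$SNP_A, ld_long$SNP_B)] <- ld_long$R2
ld_wide[cbind(ld_long$SNP_B, ld_long$SNP_A)] <- ld_long$R2
diag(ld_wide) <- 1
ld_wide
```

With real identifiers of the form `chr_position`, the rows and columns would need to be ordered by numeric position rather than alphabetically.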


###### Correlated Expression patterns

```{r, echo=F, eval=F, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Figure SIII.12 Manhattan plot of multivariate GWA with flowering time and delta_C13 without genetic pcs."}
RERUN=F
if(RERUN){
  
  library(Hmisc)
  library(corrplot)      # corrplot()
  library(RColorBrewer)  # brewer.pal()
  library(raster)        # stack()
  atlasstrategies<-read.table("./data/pheno_fromgoogle.tsv",header = T)
  pheno <- read.csv(file = 'data/atlas_phenotype_matrix_withid.csv') #phenotypes
  #pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=",", header = T)
  pheno[1:5,1:10]
  
  targets <-c("id","ABA_96h_low_water_potential", 
                "Growth_rate", 
               "Delta_13C", 
               "DSDS10", 
               #"Stomatal_index_in_first_leaf", ## only 55
               "stomata_density",
               "stomatasize",
               "FT16",  
               "d8_10C_perc", 
               "RGR", 
               #"rhamnose_1_exp2", 
               "Root_horizontal_index_day001",
               "Relative_root_growth_rate_day002.day003",
               #"First_leaf_area", 
               #"X72_Vern_Growth"
               "X34_LDV",
              "rFitness_mlp",
              "rFitness_mli",
              "rSurvival_fruit_mlp",
              "rSurvival_fruit_mli",
              "rSeeds_mli",
              "rSeeds_mlp")
  
  
  ## pull out phenotypes
  target_pheno<- pheno[ ,colnames(pheno) %in% targets]
  head(target_pheno)
  dim(target_pheno)
  target_pheno[,2:13] <- apply(target_pheno[2:13],2,scale)
  
  which(target_pheno==-9)
  
  ## Expression data
    setwd("/Carnegie/DPB/Data/Shared/Labs/Moi/Everyone/floe1/")
  load("data/TG_data_20180606.Rdata") # dataframes have tidy matrix column
  load("data/gene_infoV2.Rdata") # gene annotation
  ind2use<- which(TG.meta$batch_comb != "MU") # remember to remove low-quality batch MU, this is all of the samples not equal to MU, that we will keep
  
  ## see what types of RNA-seq data are available
  class(TG.genes)
  dim(TG.genes$raw)
  TG.genes$raw[1:5,1:10]

  d <- TG.genes$raw[,ind2use] ## other types of expression data can also be used; raw is relative transcripts per million, basically a ratio of transcripts for each gene
  d[1:5,1:10]
  
  ## very important line to transpose the data
  d<-data.frame(TG.meta$index[ind2use], t(d)) # Generate expression matrix (individuals=rows, genes=cols)
  dim(d)
  
  colnames(d)<-c("index",gene_infoV2$Name)
  d[1:5,1:5]
  dim(d)
  
  listGenes<- c("AT1G11500","AT1G11510","AT1G11540","AT1G11680","AT1G11740")
  
  
  Chr1peak_Genes <- cbind(d[,1],d[,colnames(d) %in% listGenes])
  head(Chr1peak_Genes)
   dim(Chr1peak_Genes)
  colnames(Chr1peak_Genes)[1] <- "index"
  
  ## merge phenotype and expression data
  exp_pheno_dat <- merge(Chr1peak_Genes, by.x="index", target_pheno, by.y="id")
  head(exp_pheno_dat)
  
  ## correlation line
  
  p<- data.frame(exp_pheno_dat[,-1])
  head(p)
  exp_pheno_rcorr<-rcorr(as.matrix(p), type = "spearman") ## Spearman correlations with p-values (rcorr needs a matrix)
  exp_pheno_cor<-cor(exp_pheno_dat[,-1], method="pearson", use = "pairwise.complete.obs") ## Pearson correlation matrix used for the plot below
  exp_pheno_cor
  
  #Targ_cor<-cor(target_pheno, method = "pearson", use = "pairwise.complete.obs")
  png(file="./figs/Chr1GeneCorExpressionCorplot.png", width = 800, height = 800)
  corrplot(corr = exp_pheno_cor[,1:5], method = "color", type = "lower", diag = F ,tl.cex = 1.2, col=brewer.pal(9,'RdBu'), addCoef.col = "black",    number.cex = .8, tl.srt = 45, sig.level = 0.05)
  
  dev.off()
  
  
  ## get localities of these accessions
  setwd("~/safedata/natvar/")
  all_ath_data <- read.table("./data/Arabidopsis_thaliana_world_accessions_list.tsv", sep="\t", header = T)
  dim(all_ath_data)
  
  #library(grDevices)
  
  ## merged little data of expression with huge matrix of all locality information
  Chr1peak_Genes_Locals <- merge(Chr1peak_Genes, by.x="index", all_ath_data, by.y="id")
  head(Chr1peak_Genes_Locals)

  ## generate new column for ratio of expression levels
  # d_g3pb_locals$g3pb_ratio <- d_g3pb_locals$G3BPL4/d_g3pb_locals$G3BP1
  # head(d_g3pb_locals)
  # 
  library(grDevices)
  # load europe raster stack
  euroclim<-stack( './data/euroclim.grd')
  
  rbPal <- colorRampPalette(c('yellow','red')) #builds a color palette 
  range(Chr1peak_Genes_Locals$AT1G11500)
  Chr1peak_Genes_Locals$Col <- rbPal(10)[as.numeric(cut(Chr1peak_Genes_Locals$AT1G11500,breaks = 10))]
  ## base graphics cannot be combined with `+`: draw the raster first, then overlay the points
  raster::plot(euroclim$bio12, col = brewer.pal(9,'Greys'))
  points(x = Chr1peak_Genes_Locals$longitude, y = Chr1peak_Genes_Locals$latitude, col=Chr1peak_Genes_Locals$Col, pch=16)
  
  
}else{
  
}
```
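
As a minimal illustration of the expression-phenotype correlation step, the sketch below computes a Spearman correlation with its p-value for one gene and one trait, using only complete pairs. The data are simulated and the variable names are hypothetical; the analysis chunk itself works on the merged expression and phenotype table.

```r
# Simulated expression of one gene and one phenotype across accessions (hypothetical data)
set.seed(3)
n <- 60
expr <- rnorm(n)
pheno <- expr + rnorm(n, sd = 0.5)
pheno[sample(n, 5)] <- NA  # some accessions lack the phenotype

# cor.test drops incomplete pairs; exact = FALSE uses the asymptotic p-value
ct <- cor.test(expr, pheno, method = "spearman", exact = FALSE)
rho <- unname(ct$estimate)
pval <- ct$p.value
c(rho = rho, p = pval)
```

When many genes and traits are tested together, the resulting p-values would need a multiple-testing correction, for example with `p.adjust`.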


##### Flowering time and Growth Rate

###### QQ plots

```{r, echo=F, eval=F, message=F, warning=F, fig.cap="QQplots for multivariate GWA with Flowering time and Growth Rate", fig.width=8, fig.height=8 }
RERUN=F
if(RERUN){
  
  ## multivariate gwa
  tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Growth_rate/output/mGWAS_FT16_Growth_rate.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(tmp1)
  
  library(lattice)
  source("~/safedata/natvar/analyses/QQPlotbyMatthewFlickinger.R")
  qqplot <- qqunif.plot(tmp1$p_score) 
  qqplot
  
  ## mGWA with 5 genetic pcs
  pcs_tmp1 <- data.table::fread(file="./multivarGWAS/mGWAS_FT16_Growth_rate/output/mGWAS_pcs_FT16_Growth_rate.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(pcs_tmp1)
  
  pcs_qqplot <- qqunif.plot(pcs_tmp1$p_score)
  pcs_qqplot
  
  ## imputed multivariate gwa
  imp_tmp1 <- data.table::fread(file="./multivarGWAS/imp_mGWAS_FT16_Growth_rate/output/imp_mGWAS_FT16_Growth_rate.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(imp_tmp1)
  
  imp_qqplot <- qqunif.plot(imp_tmp1$p_score) 
  imp_qqplot
  
  ## imputed multivariate gwa with pcs
  imp_pcs_tmp1 <- data.table::fread(file="./multivarGWAS/imp_mGWAS_FT16_Growth_rate/output/imp_mGWAS_pcs_FT16_Growth_rate.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score)
  head(imp_pcs_tmp1)

  imp_pcs_qqplot <- qqunif.plot(imp_pcs_tmp1$p_score)
  imp_pcs_qqplot
  
  ft_GR_qqplots <- plot_grid(qqplot, pcs_qqplot, imp_qqplot, imp_pcs_qqplot, nrow = 2, ncol=2, labels=c("mGWAS", "mGWAS_pcs", "imputed_mGWAS", "imp_pcs_mGWAS"))
  ft_GR_qqplots
  saveRDS(ft_GR_qqplots, file="./figs/tmpobjects/ft_GR_qqplots.rda")
  
} else{
  
  ft_GR_qqplots <- readRDS(file="./figs/tmpobjects/ft_GR_qqplots.rda")
  ft_GR_qqplots
  
}
```


###### mGWA top hit mapping / Manhattan plots

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Manhattan plot of multivariate GWA with flowering time and Growth rate without genetic pcs."}
RERUN=F
if(RERUN){
  
  # library(devtools)
  # install_github("drveera/ggman")
  library(ggman)
  setwd("~/safedata/natvar/")
  
  ### ======================================================== ###
  ## multivariate gwa
  ### ======================================================== ###
  tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Growth_rate/output/mGWAS_FT16_Growth_rate.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(tmp1)
  
  thresh <- -log10(0.05/nrow(tmp1))
  
  ## colored plots, how to change colors??
  p1 <- ggman(tmp1, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", relative.positions = T, sigLine = thresh, title="") +  
    scale_color_manual(values=c("#31a354", "#df65b0")) 
    
  p1
  saveRDS(p1, file="./figs/FT_GrowthRate_mGWA_manhattanPlot_ggman.rda")
  
  FRI_chr_posrange <- c(4, 269026, 270358)
  FRI_chr_posrange
  FRI_snps <- tmp1 %>% filter(chr==4) %>% 
    filter(ps > FRI_chr_posrange[2]) %>% 
    filter(ps < FRI_chr_posrange[3])
  dim(FRI_snps)
  
  FLC_chr_posrange <- c(5, 3173724, 3179155)
  FLC_chr_posrange
  FLC_snps <- tmp1 %>% filter(chr==5) %>% 
    filter(ps > FLC_chr_posrange[2]) %>% 
    filter(ps < FLC_chr_posrange[3])
  dim(FLC_snps)
  
  fri_flc_highlights <- c(FRI_snps$rs, FLC_snps$rs)
  fri_flc_plot <- ggmanHighlight(p1, highlight = fri_flc_highlights, size = 0.3) +
    scale_color_manual(values=c("#636363","#bdbdbd")) 
  fri_flc_plot
  saveRDS(fri_flc_plot, file="./figs/FT_GrowthRate_withFLCFRImarked_mGWA_manhattanPlot_ggman.rda")
  
  tmp1.gwas.sig <- tmp1[-log10(tmp1$p_score)>thresh,]
  
  ## Map each SNP to the first GFF feature (gene or transposon) that spans its position
  getGenes_fromGFF <- function(dat){
    #gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_CDS.gff")
    gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_GFF3_genes_transposons.gff")
    gene_list <- character(0)
    ## seq_len() avoids looping over 1:0 when no SNP passes the threshold
    for (i in seq_len(nrow(dat))) {
      chr_name <- paste0("Chr", dat$chr[i])
      tmp <- gff %>% filter(V1==chr_name) %>% 
        filter(V4 < dat$ps[i]) %>% 
        filter(V5 > dat$ps[i]) %>% 
        filter(V3!="chromosome")
      
      ## keep the value of the first attribute field (the part after "ID=")
      part1 <- unlist(strsplit(as.character(tmp[1,9]),split = ";", fixed = T))[1]
      gene <- unlist(strsplit(part1, "=", fixed = T))[2]
      gene_list <- c(gene_list, gene)
    }
    return(gene_list)
  }
  
  tmp1.gwas.sig$genes <- getGenes_fromGFF(tmp1.gwas.sig)
  gene_list_writeout<- unique(na.omit(tmp1.gwas.sig$genes))
  write.table(gene_list_writeout, file="./tables/FT_GrowthRate_mGWA_topHits_GeneNames.txt", quote = F, col.names = F, row.names = F)
  
  ##take this table to https://www.arabidopsis.org/tools/bulk/go/index.jsp and get new tsv file of GO annotations
  GO_annots <- read.table(file="./tables/FT_GrowthRate_mGWA_GOAnnotationsAll.txt", sep="\t", header = T)
  head(GO_annots)
  
  GO_annots %>% filter(category=="proc")
  curated_GO_annots_FT_GrowthRate_mGWA <- distinct(GO_annots[,c(1, 4, 7)]) %>% filter(category=="proc")
  saveRDS(curated_GO_annots_FT_GrowthRate_mGWA, file="./figs/tmpobjects/curated_GO_annots_FT_GrowthRate_mGWA.rda")
  
  ### ======================================================== ###
  ### repeat with mGWAS + 5 genetic pcs
  ### ======================================================== ###
  pcs_tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Growth_rate/output/mGWAS_pcs_FT16_Growth_rate.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(pcs_tmp1)
  
  thresh <- -log10(0.05/nrow(pcs_tmp1))
  
  ## colored plots, how to change colors??
  p2 <- ggman(pcs_tmp1, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", 
              relative.positions = T, sigLine = thresh, title="") +  
    scale_color_manual(values=c("#31a354", "#df65b0")) 
  p2
  saveRDS(p2, file="./figs/FT_GrowthRate_pcs_mGWA_manhattanPlot_ggman.rda")
  
  pcs_tmp1.gwas.sig <- pcs_tmp1[-log10(pcs_tmp1$p_score)>thresh,]

  pcs_tmp1.gwas.sig$genes <- getGenes_fromGFF(pcs_tmp1.gwas.sig)
  gene_list_writeout<- unique(na.omit(pcs_tmp1.gwas.sig$genes))
  write.table(gene_list_writeout, file="./tables/FT_GrowthRate_pcs_mGWA_topHits_GeneNames.txt", quote = F, col.names = F, row.names = F)
  
  ##take this table to https://www.arabidopsis.org/tools/bulk/go/index.jsp  and get new tsv file of GO annotations
  GO_annots <- read.table(file="./tables/FT_GrowthRate_pcs_mGWA_GOAnnotationsAll.txt", sep="\t", header = T)
  head(GO_annots)
  
  GO_annots %>% filter(category=="proc")
  curated_GO_annots_FT_GrowthRate_pcs_mGWA <- distinct(GO_annots[,c(1, 4, 7)]) %>% filter(category=="proc")
  saveRDS(curated_GO_annots_FT_GrowthRate_pcs_mGWA, file="./figs/tmpobjects/curated_GO_annots_FT_GrowthRate_pcs_mGWA.rda")
  
}else{
  
 FT_GrowthRate_mGWA_manhattanPlot <- readRDS(file="./figs/FT_GrowthRate_mGWA_manhattanPlot_ggman.rda")
 FT_GrowthRate_mGWA_manhattanPlot
}
```

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Manhattan plot of multivariate GWA with flowering time and Growth rate with 5 genetic pcs included in the analysis."}
  library(ggman)
  setwd("~/safedata/natvar/")  
  FT_GrowthRate_pcs_mGWA_manhattanPlot <- readRDS(file="./figs/FT_GrowthRate_pcs_mGWA_manhattanPlot_ggman.rda")
  FT_GrowthRate_pcs_mGWA_manhattanPlot
```  

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Manhattan plot of multivariate GWA with flowering time and Growth Rate with FRI (chr4) and FLC (chr5) SNPs marked in red."}
  library(ggman)
  setwd("~/safedata/natvar/")  
fri_flc_plot_GR <- readRDS(file="./figs/FT_GrowthRate_withFLCFRImarked_mGWA_manhattanPlot_ggman.rda")
fri_flc_plot_GR
```  
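The horizontal line in the Manhattan plots is a Bonferroni threshold, computed in the plotting chunks as `-log10(0.05/nrow(...))`. The block below is a minimal sketch of that calculation on a made-up table of five p-values; `bonferroni_line` and the toy data are hypothetical names used only for illustration, not part of the analysis code.

```r
# Hypothetical helper: Bonferroni line on the -log10 scale for n_tests SNPs
bonferroni_line <- function(n_tests, alpha = 0.05) {
  -log10(alpha / n_tests)
}

# Toy association table (made-up p-values, illustration only)
toy <- data.frame(
  rs = paste0("snp", 1:5),
  p_score = c(0.2, 1e-9, 0.004, 0.03, 0.5),
  stringsAsFactors = FALSE
)

# Same rule as in the plotting chunks: keep SNPs above the threshold line
thresh <- bonferroni_line(nrow(toy))
sig <- toy[-log10(toy$p_score) > thresh, ]
sig
```

With only five tests the line sits near 2; with the full SNP set it is correspondingly higher because the number of tests enters the denominator.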

Note that the peak on chromosome 5 at AT5G10170, a gene annotated with the GO term "embryo development ending in seed dormancy", is extremely close to the FLC gene, AT5G10140.
  
Also note the location of FRIGIDA (FRI; AT4G00650/AT4G00640) on chromosome 4.
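How close a hit lies to FRI or FLC can be checked against the gene windows used for highlighting in the plots. The sketch below reuses those two coordinate ranges; the helper `dist_to_window` and the SNP positions in `toy_hits` are hypothetical and serve only to illustrate the calculation.

```r
# Gene windows as used for the FRI/FLC highlighting in the Manhattan plots
gene_windows <- data.frame(
  gene  = c("FRI", "FLC"),
  chr   = c(4, 5),
  start = c(269026, 3173724),
  end   = c(270358, 3179155),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distance in bp from a SNP to a gene window
# (0 if the SNP falls inside the window, NA if on another chromosome)
dist_to_window <- function(chr, ps, window) {
  if (chr != window$chr) return(NA_real_)
  if (ps < window$start) return(window$start - ps)
  if (ps > window$end) return(ps - window$end)
  0
}

# Toy SNPs (made-up positions, illustration only)
toy_hits <- data.frame(
  rs  = c("snpA", "snpB", "snpC"),
  chr = c(5, 5, 4),
  ps  = c(3190000, 3175000, 269500),
  stringsAsFactors = FALSE
)

flc <- gene_windows[gene_windows$gene == "FLC", ]
toy_hits$dist_to_FLC <- mapply(dist_to_window, toy_hits$chr, toy_hits$ps,
                               MoreArgs = list(window = flc))
toy_hits
```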

```{r, echo=F, eval=T, warning=F, message=F}
## Curated GO annotation terms for FT & Growth Rate mGWA
RERUN=F 
if(RERUN){
}else{
  curated_GO_annots_FT_GrowthRate_mGWA <- readRDS(file="./figs/tmpobjects/curated_GO_annots_FT_GrowthRate_mGWA.rda")
  knitr::kable(curated_GO_annots_FT_GrowthRate_mGWA, caption = "Curated GO annotations for FT and Growth Rate multivariate GWA top hits.")
}
```  
  
  
```{r, echo=F, eval=T, warning=F, message=F}
## Curated GO annotation terms for FT & Growth Rate mGWA with pcs
RERUN=F 
if(RERUN){
}else{
  curated_GO_annots_FT_GrowthRate_pcs_mGWA <- readRDS(file="./figs/tmpobjects/curated_GO_annots_FT_GrowthRate_pcs_mGWA.rda")
  knitr::kable(curated_GO_annots_FT_GrowthRate_pcs_mGWA, caption = "Curated GO annotations for top hits of the FT and Growth Rate multivariate GWA using 5 genetic PCs.")
}
```  
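The gene names behind the GO tables above come from overlapping each significant SNP with the TAIR10 GFF features and reading the ID from the attribute column. A minimal sketch of that lookup follows, assuming a made-up three-row GFF table; the coordinates and gene IDs are illustrative only, and `gene_at` is a hypothetical helper rather than the function used for the analysis.

```r
# Toy GFF-like table (made-up coordinates and IDs); columns named as read.table
# would name them for a GFF file: V1 = seqid, V3 = type, V4 = start, V5 = end,
# V9 = attributes
toy_gff <- data.frame(
  V1 = c("Chr5", "Chr5", "Chr4"),
  V3 = c("gene", "gene", "gene"),
  V4 = c(3173000, 3185000, 269000),
  V5 = c(3180000, 3190000, 271000),
  V9 = c("ID=GENE_A;Note=protein_coding_gene;Name=GENE_A",
         "ID=GENE_B;Note=protein_coding_gene;Name=GENE_B",
         "ID=GENE_C;Note=protein_coding_gene;Name=GENE_C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: ID of the first feature spanning a SNP, or NA if none
gene_at <- function(chr, ps, gff) {
  hit <- gff[gff$V1 == paste0("Chr", chr) &
               gff$V4 < ps & gff$V5 > ps &
               gff$V3 != "chromosome", ]
  if (nrow(hit) == 0) return(NA_character_)
  first_field <- strsplit(hit$V9[1], ";", fixed = TRUE)[[1]][1]
  strsplit(first_field, "=", fixed = TRUE)[[1]][2]
}

gene_at(5, 3175000, toy_gff)  # SNP inside the first toy feature
gene_at(5, 3182000, toy_gff)  # SNP between features, so no gene is assigned
```

SNPs that fall between features return `NA`, which is why the written gene lists are passed through `na.omit()` before being submitted for GO annotation.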

###### imputed phenotype mGWA top hit mapping / Manhattan plots

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Manhattan plot of multivariate GWA with imputed flowering time and Growth rate without genetic pcs."}
RERUN=F
if(RERUN){
  
  # library(devtools)
  # install_github("drveera/ggman")
  library(ggman)
  setwd("~/safedata/natvar/")
  
  ### ======================================================== ###
  ## multivariate gwa from imputed phenotypes
  ### ======================================================== ###
  imp_tmp1<-data.table::fread(file="./multivarGWAS/imp_mGWAS_FT16_Growth_rate/output/imp_mGWAS_FT16_Growth_rate.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(imp_tmp1)
  
  thresh <- -log10(0.05/nrow(imp_tmp1))
  
  ## colored plots, how to change colors??
  imp_p1 <- ggman(imp_tmp1, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", relative.positions = T, sigLine = thresh, title="") +  
    scale_color_manual(values=c("#31a354", "#df65b0")) 
  imp_p1
  saveRDS(imp_p1, file="./figs/FT_GrowthRate_imputed_mGWA_manhattanPlot_ggman.rda")
  
  FRI_chr_posrange <- c(4, 269026, 270358)
  FRI_chr_posrange
  FRI_snps <- imp_tmp1 %>% filter(chr==4) %>% 
    filter(ps > FRI_chr_posrange[2]) %>% 
    filter(ps < FRI_chr_posrange[3])
  dim(FRI_snps)
  
  FLC_chr_posrange <- c(5, 3173724, 3179155)
  FLC_chr_posrange
  FLC_snps <- imp_tmp1 %>% filter(chr==5) %>% 
    filter(ps > FLC_chr_posrange[2]) %>% 
    filter(ps < FLC_chr_posrange[3])
  dim(FLC_snps)
  
  fri_flc_highlights <- c(FRI_snps$rs, FLC_snps$rs)
  fri_flc_plot <- ggmanHighlight(imp_p1, highlight = fri_flc_highlights, size = 0.3) +
    scale_color_manual(values=c("#636363","#bdbdbd")) 
  fri_flc_plot
  saveRDS(fri_flc_plot, file="./figs/FT_GrowthRate_imputed_withFLCFRImarked_mGWA_manhattanPlot_ggman.rda")
  
  imp_tmp1.gwas.sig <- imp_tmp1[-log10(imp_tmp1$p_score)>thresh,]
  
  ## Map each SNP to the first GFF feature (gene or transposon) that spans its position
  getGenes_fromGFF <- function(dat){
    #gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_CDS.gff")
    gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_GFF3_genes_transposons.gff")
    gene_list <- character(0)
    ## seq_len() avoids looping over 1:0 when no SNP passes the threshold
    for (i in seq_len(nrow(dat))) {
      chr_name <- paste0("Chr", dat$chr[i])
      tmp <- gff %>% filter(V1==chr_name) %>% 
        filter(V4 < dat$ps[i]) %>% 
        filter(V5 > dat$ps[i]) %>% 
        filter(V3!="chromosome")
      
      ## keep the value of the first attribute field (the part after "ID=")
      part1 <- unlist(strsplit(as.character(tmp[1,9]),split = ";", fixed = T))[1]
      gene <- unlist(strsplit(part1, "=", fixed = T))[2]
      gene_list <- c(gene_list, gene)
    }
    return(gene_list)
  }
  
  imp_tmp1.gwas.sig$genes <- getGenes_fromGFF(imp_tmp1.gwas.sig)
  gene_list_writeout<- unique(na.omit(imp_tmp1.gwas.sig$genes))
  write.table(gene_list_writeout, file="./tables/FT_GrowthRate_imp_mGWA_topHits_GeneNames.txt", quote = F, col.names = F, row.names = F)
  
  ##take this table to https://www.arabidopsis.org/tools/bulk/go/index.jsp and get new tsv file of GO annotations
  GO_annots <- read.table(file="./tables/FT_GrowthRate_imp_mGWA_GOAnnotations.txt", sep="\t", header = T)
  head(GO_annots)
  
  GO_annots %>% filter(category=="proc")
  curated_GO_annots_FT_GrowthRate_imp_mGWA <- distinct(GO_annots[,c(1, 4, 7)]) %>% filter(category=="proc")
  saveRDS(curated_GO_annots_FT_GrowthRate_imp_mGWA, file="./figs/tmpobjects/curated_GO_annots_imp_FT_GrowthRate_mGWA.rda")
  
  ### ======================================================== ###
  ### repeat with imputed mGWAS + 5 genetic pcs
  ### ======================================================== ###
  imp_pcs_tmp1<-data.table::fread(file="./multivarGWAS/imp_mGWAS_FT16_Growth_rate/output/imp_mGWAS_pcs_FT16_Growth_rate.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(imp_pcs_tmp1)
  
  thresh <- -log10(0.05/nrow(imp_pcs_tmp1))
  
  ## colored plots, how to change colors??
  imp_pcs_p2 <- ggman(imp_pcs_tmp1, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", 
              relative.positions = T, sigLine = thresh, title="") +  
    scale_color_manual(values=c("#31a354", "#df65b0")) 
  imp_pcs_p2
  saveRDS(imp_pcs_p2, file="./figs/FT_GrowthRate_imp_pcs_mGWA_manhattanPlot_ggman.rda")
  
  imp_pcs_tmp1.gwas.sig <- imp_pcs_tmp1[-log10(imp_pcs_tmp1$p_score)>thresh,]

  imp_pcs_tmp1.gwas.sig$genes <- getGenes_fromGFF(imp_pcs_tmp1.gwas.sig)
  gene_list_writeout<- unique(na.omit(imp_pcs_tmp1.gwas.sig$genes))
  write.table(gene_list_writeout, file="./tables/FT_GrowthRate_imp_pcs_mGWA_topHits_GeneNames.txt", quote = F, col.names = F, row.names = F)
  
  ##take this table to https://www.arabidopsis.org/tools/bulk/go/index.jsp  and get new tsv file of GO annotations
  GO_annots <- read.table(file="./tables/FT_GrowthRate_imp_pcs_mGWA_GOAnnotationsAll.txt", sep="\t", header = T)
  head(GO_annots)
  
  GO_annots %>% filter(category=="proc")
  curated_GO_annots_FT_GrowthRate_imp_pcs_mGWA <- distinct(GO_annots[,c(1, 4, 7)]) %>% filter(category=="proc")
  saveRDS(curated_GO_annots_FT_GrowthRate_imp_pcs_mGWA, file="./figs/tmpobjects/curated_GO_annots_FT_GrowthRate_imp_pcs_mGWA.rda")
  
}else{
  
 FT_GrowthRate_imp_mGWA_manhattanPlot <- readRDS(file="./figs/FT_GrowthRate_imputed_mGWA_manhattanPlot_ggman.rda")
 FT_GrowthRate_imp_mGWA_manhattanPlot
}
```

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Manhattan plot of multivariate GWA with imputed flowering time and Growth rate with 5 genetic pcs included in the analysis."}
  library(ggman)
  setwd("~/safedata/natvar/")  
  FT_GrowthRate_imp_pcs_mGWA_manhattanPlot <- readRDS(file="./figs/FT_GrowthRate_imp_pcs_mGWA_manhattanPlot_ggman.rda")
  FT_GrowthRate_imp_pcs_mGWA_manhattanPlot
```  


```{r, echo=F, eval=F, message=F, warning=F, fig.width=8, fig.height=4, fig.cap="Manhattan plot of multivariate GWA with imputed flowering time and Growth Rate with FRI (chr4) and FLC (chr5) SNPs marked in red."}
## chunk not working, may need to re-save file?
  library(ggman)
  setwd("~/safedata/natvar/")  
fri_flc_plot_GR_imp <- readRDS(file="./figs/FT_GrowthRate_imputed_withFLCFRImarked_mGWA_manhattanPlot_ggman.rda")
fri_flc_plot_GR_imp
```  


```{r, echo=F, eval=T, warning=F, message=F}
## Curated GO annotation terms for imputed FT & Growth Rate mGWA
RERUN=F 
if(RERUN){
}else{
  curated_GO_annots_FT_GrowthRate_imp_mGWA <- readRDS(file="./figs/tmpobjects/curated_GO_annots_imp_FT_GrowthRate_mGWA.rda")
  knitr::kable(curated_GO_annots_FT_GrowthRate_imp_mGWA, caption = "Curated GO annotations for imputed FT and Growth Rate multivariate GWA top hits.")
}
```  
  
  
```{r, echo=F, eval=T, warning=F, message=F}
## Curated GO annotation terms for imputed FT & Growth Rate mGWA with pcs
RERUN=F 
if(RERUN){
}else{
  curated_GO_annots_FT_GrowthRate_imp_pcs_mGWA <- readRDS(file="./figs/tmpobjects/curated_GO_annots_FT_GrowthRate_imp_pcs_mGWA.rda")
  
  knitr::kable(curated_GO_annots_FT_GrowthRate_imp_pcs_mGWA, caption = "Curated GO annotations for top hits of the imputed FT and Growth Rate multivariate GWA using 5 genetic PCs.")
}
```  


<!-- ################################################################################ -->
<!-- ## IV.2 Covariate GWAS -->
<!-- ################################################################################ -->

<!-- In an attempt to map alleles responsible for the covariation between flowering time, delta_13C, and growth rate, we calculate the covariance between these traits and run a univariate GWAS on them. -->

<!-- ```{r, echo=F, eval=T, message=F, warning=F, fig.cap="", fig.width=8, fig.height=8 } -->
<!-- RERUN=F -->
<!-- if(RERUN){ -->

<!--   setwd("~/safedata/natvar/") -->
<!--   #pheno<- read.table(file = 'data/atlas1001_rawPheno_Quantile.tsv', sep = "\t", header = T) -->
<!--   ## used imputed data and got super weird estimates of heritability and genetic correlation -->
<!--   #pheno <- read.table(file = 'data/atlas1001_phenotype_matrix_imputed_withID.csv', sep=" ", header = T) -->
<!--   pheno <- read.table(file = 'data/atlas1001_imputedPheno_Quantile.tsv', sep = "\t", header = T) -->

<!--   head(pheno) -->
<!--   # system(paste('ln -f ../1001g/1001gbi.bim ', paste0('./multivarGWAS/covariance_GWA/','1001gbi.bim'))) -->
<!--   # system(paste('ln -f ../1001g/1001gbi.fam ', paste0('./multivarGWAS/covariance_GWA/','1001gbi.fam'))) -->
<!--   # system(paste('ln -f ../1001g/1001gbi.bed ', paste0('./multivarGWAS/covariance_GWA/', '1001gbi.bed'))) -->
<!--   # system(paste('ln -f ../1001g/1001gbi.sXX.txt ', paste0('./multivarGWAS/covariance_GWA/','1001gbi.sXX.txt'))) -->

<!--   #system(paste('cp ../1001g/pca/1001pca.eigenvec', paste0('./multivarGWAS/covariance_GWA','1001pca.eigenvec'))) -->

<!--   # pcs <- read.table("./multivarGWAS/1001pca.eigenvec") -->
<!--   # head(pcs) -->
<!--   # newpcs <- cbind(rep(1, nrow(pcs)), pcs[,3:7]) -->
<!--   # head(newpcs) -->
<!--   # write.table(newpcs, file="./multivarGWAS/1001pcs1_5.eigenvec", col.names = F, row.names = F, quote = F) -->

<!--   fam <- read.table("./data-raw/1001gbi.fam") -->
<!--   head(fam) -->

<!--   ## grab target traits to do covariance GWA on -->
<!--   phenoname <- c("FT16", "DSDS10", "Delta_13C", "Growth_rate") -->
<!--   target_phenos <- pheno[,c("id","FT16", "DSDS10", "Delta_13C", "Growth_rate")] -->
<!--   target_phenos<- target_phenos[,-1] -->
<!--   head(target_phenos) -->

<!--   sum(target_phenos==-9,na.rm = T) -->

<!--   #also write out the covariance between the traits and run univariate gwas on that -->
<!--   pheno1<- target_phenos$FT16 -->
<!--   pheno2 <- target_phenos$Delta_13C -->
<!--   i<-1 -->
<!--   getCovary <- function( pheno1, pheno2) { -->
<!--     newpheno <- c() -->
<!--     mn1 <- mean(pheno1, na.rm=T) -->
<!--     mn2 <- mean(pheno2, na.rm=T) -->
<!--     sd1 <- sd(pheno1, na.rm = T) -->
<!--     sd2 <- sd(pheno2, na.rm = T) -->
<!--     for (i in 1:length(pheno1)){ -->
<!--       #print(c(pheno1[i], pheno2[i])) -->
<!--       # if (pheno1[i]==-9 || pheno2[i]==-9){ -->
<!--       #   newpheno <- c(newpheno, -9)  -->
<!--       #   #print("-9") -->
<!--       # }else{ -->
<!--       if (is.na(pheno1[i]) || is.na(pheno2[i])){ -->
<!--         newpheno <- c(newpheno, NA)  -->
<!--       }else{ -->
<!--         cov <- ((pheno1[i]- mn1) * (pheno2[i]- mn2)) / (sd1*sd2) -->
<!--         newpheno <- c(newpheno, cov) -->
<!--         #print(cov) -->
<!--       } -->
<!--     } -->
<!--     return(newpheno) -->
<!--   } -->

<!--   fam$ftD13 <- getCovary(target_phenos$FT16, target_phenos$Delta_13C) -->
<!--   fam$ftDS <- getCovary(target_phenos$FT16, target_phenos$DSDS10) -->
<!--   fam$ftGR <- getCovary(target_phenos$FT16, target_phenos$Growth_rate) -->
<!--   fam$D13GR <- getCovary(target_phenos$Delta_13C, target_phenos$Growth_rate) -->
<!--   fam$DSGR <- getCovary(target_phenos$DSDS10, target_phenos$Growth_rate) -->
<!--   head(fam) -->
<!--   newfam <- fam[,-6] -->
<!--   head(newfam) -->
<!--   sum(!is.na(newfam[,6])) -->
<!--   imp_colnames <- colnames(newfam[6:10]) -->

<!--   getwd()  -->
<!--   write.table(newfam, file="./multivarGWAS/covariance_Imp_GWA/1001gbi.fam", quote = F, row.names = F, col.names = F) -->

<!--   i<-1 -->
<!--   for (i in 1:5){ -->
<!--     setwd("~/safedata/natvar/multivarGWAS/covariance_Imp_GWA/") -->
<!--     #i_name <- paste0(i,"_pcs")      -->
<!--        write.table(quote=F,row.names=F,col.names=F, -->
<!--                    file=paste0('multivargwa_imp_',imp_colnames[i],'.sh'), -->
<!--                    x=rbind( -->
<!--                      "#!/bin/bash", -->
<!--                      "#SBATCH --cpus-per-task=2", -->
<!--                      "#SBATCH --mem-per-cpu=8G", -->
<!--                      "#SBATCH --partition=DPB", -->
<!--                      paste0("#SBATCH --job-name=", imp_colnames[i], "_imp_covaryGWA"), -->
<!--                      paste0("#SBATCH --output=", imp_colnames[i], "_imp_cGWA_.slurm.log"), -->
<!--                      # paste0('./gemma -bfile 1001gbi -miss 0.1 -maf 0.05 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n ',i,' -c 1001pcs1_5.eigenvec -o covary_imp_pcs_', imp_colnames[i]) -->
<!--                      paste0('./gemma -bfile 1001gbi -miss 0.1 -maf 0.05 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n ',i,' -o covary_imp_', imp_colnames[i]) -->
<!--                    ) -->
<!--        ) -->
<!--        #system("conda activate gemma") -->
<!--        system(paste0('sbatch multivargwa_imp_', imp_colnames[i],'.sh')) -->
<!--   } -->


<!-- }else{ -->

<!--   ## compile covariate results and color by selection coeff -->

<!-- } -->
<!-- ``` -->

<!-- #### covariance of target traits mapping -->

<!-- ```{r, echo=F, eval=T, message=F, warning=F, fig.cap="", fig.width=8, fig.height=4, fig.cap="Manhattan plot of GWA run with covariance between flowering time and dC13."} -->
<!-- RERUN=F -->
<!-- if(RERUN){ -->

<!--   library(ggman) -->
<!--   setwd("~/safedata/natvar/") -->

<!--   ### ======================================================== ### -->
<!--   ## covariate gwa from imputed phenotypes -->
<!--   ### ======================================================== ### -->
<!--   tmp1<-data.table::fread(file="./multivarGWAS/covary_output/covary_ftD132.assoc.txt") %>% select(chr,ps, rs, af, beta, se, p_score)  -->

<!--   tmp1<-data.table::fread(file="./multivarGWAS/covary_output/covary_new1_ftD13.assoc.txt") %>% select(chr,ps, rs, af, beta, se, p_score)  -->
<!--   head(tmp1) -->
<!--   summary(tmp1$se) -->
<!--   tmp1 <- tmp1 %>% filter(se<0.07) -->

<!--   thresh <- -log10(0.05/nrow(tmp1)) -->

<!--   ## colored plots, how to change colors?? -->
<!--   p1 <- ggman(tmp1, snp = "rs", bp = "ps", chrom = "chr", pvalue = "p_score", relative.positions = T, sigLine = thresh, title="") +   -->
<!--     scale_color_manual(values=c("#66c2a4", "#253494"))  -->
<!--   p1 -->
<!--   saveRDS(p1, file="./figs/covary_FT_dC13_GWA_manhattanPlot_ggman.rda") -->

<!--   FRI_chr_posrange <- c(4, 269026, 270358) -->
<!--   FRI_chr_posrange -->
<!--   FRI_snps <- tmp1 %>% filter(chr==4) %>%  -->
<!--     filter(ps > FRI_chr_posrange[2]) %>%  -->
<!--     filter(ps < FRI_chr_posrange[3]) -->
<!--   dim(FRI_snps) -->

<!--   FLC_chr_posrange <- c(5, 3173724, 3179155) -->
<!--   FLC_chr_posrange -->
<!--   FLC_snps <- tmp1 %>% filter(chr==5) %>%  -->
<!--     filter(ps > FLC_chr_posrange[2]) %>%  -->
<!--     filter(ps < FLC_chr_posrange[3]) -->
<!--   dim(FLC_snps) -->

<!--   fri_flc_highlights <- c(FRI_snps$rs, FLC_snps$rs) -->
<!--   fri_flc_plot <- ggmanHighlight(p1, highlight = fri_flc_highlights, size = 0.3) + -->
<!--     scale_color_manual(values=c("#636363","#bdbdbd"))  -->
<!--   fri_flc_plot -->
<!--   saveRDS(fri_flc_plot, file="./figs/FT_GrowthRate_imputed_withFLCFRImarked_mGWA_manhattanPlot_ggman.rda") -->

<!--   imp_tmp1.gwas.sig <- imp_tmp1[-log10(imp_tmp1$p_score)>thresh,] -->

<!--   getGenes_fromGFF <- function(dat){ -->
<!--     #gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_CDS.gff") -->
<!--     gff <- read.table(file="../arabidopsisthaliana_reference/TAIR10_GFF3_genes_transposons.gff") -->
<!--     head(gff) -->
<!--     gene_list <- c() -->
<!--     i<-1 -->
<!--     for (i in 1:nrow(dat)) { -->
<!--       chr_name <- paste0("Chr", dat$chr[i]) -->
<!--       tmp <- gff %>% filter(V1==chr_name) %>%  -->
<!--         filter(V4 < dat$ps[i]) %>%  -->
<!--         filter(V5 > dat$ps[i]) %>%  -->
<!--         filter(V3!="chromosome") -->

<!--       # tmp <- gff %>% filter(V1==dat$chr[i]) %>%  -->
<!--       #   filter(V4 < dat$ps[i]) %>%  -->
<!--       #   filter(V5 > dat$ps[i]) -->

<!--       #part1 <- unlist(strsplit(as.character(tmp[1,9]),split = ".", fixed = T))[1] -->
<!--       part1 <- unlist(strsplit(as.character(tmp[1,9]),split = ";", fixed = T))[1] -->
<!--       gene <- unlist(strsplit(part1, "=", fixed = T))[2] -->
<!--       gene_list <- c(gene_list, gene) -->
<!--     } -->
<!--     return(gene_list) -->
<!--     gff<-c() -->
<!--   } -->

<!--   imp_tmp1.gwas.sig$genes <- getGenes_fromGFF(imp_tmp1.gwas.sig) -->
<!--   gene_list_writeout<- unique(na.omit(imp_tmp1.gwas.sig$genes)) -->
<!--   write.table(gene_list_writeout, file="./tables/FT_GrowthRate_imp_mGWA_topHits_GeneNames.txt", quote = F, col.names = F, row.names = F) -->

<!--   ##take this table to https://www.arabidopsis.org/tools/bulk/go/index.jsp and get new tsv file of GO annotations -->
<!--   GO_annots <- read.table(file="./tables/FT_GrowthRate_imp_mGWA_GOAnnotations.txt", sep="\t", header = T) -->
<!--   head(GO_annots) -->

<!--   GO_annots %>% filter(category=="proc") -->
<!--   curated_GO_annots_FT_GrowthRate_imp_mGWA <- distinct(GO_annots[,c(1, 4, 7)]) %>% filter(category=="proc") -->
<!--   saveRDS(curated_GO_annots_FT_GrowthRate_imp_mGWA, file="./figs/tmpobjects/curated_GO_annots_imp_FT_GrowthRate_mGWA.rda") -->

<!-- }else{ -->

<!-- } -->
<!-- ``` -->


<!-- ```{r, echo=F, eval=F, message=F, warning=F, fig.cap="Manhattan plots and qqplots for covariance runs of GWAS using FT, Delta_C13, and growth rate."} -->
<!-- RERUN=F -->
<!-- if(RERUN){ -->

<!--   setwd("~/safedata/natvar/") -->
<!--   tmp1<-data.table::fread(file="./multivarGWAS/output/covary_new1_pcs_ftD13.assoc.txt") %>% select(chr,ps, rs, af, beta, p_score)  -->
<!--   head(tmp1) -->

<!--   tmp2<-data.table::fread(file="./multivarGWAS/output/covary_new1_ftD13.assoc.txt") %>% select(chr,ps, rs, af, beta, p_score)  -->
<!--   head(tmp2) -->

<!--   library(lattice) -->
<!--   source("~/safedata/natvar/analyses/QQPlotbyMatthewFlickinger.R") -->
<!--   qqplot <- qqunif.plot(tmp1$p_score)  -->
<!--   qqplot -->

<!--   qqplot2 <- qqunif.plot(tmp2$p_score)  -->
<!--   qqplot2 -->

<!--   plot_grid(qqplot, qqplot2) -->

<!--   tmp1$zscores <- (tmp1$beta/tmp1$se)^2 -->
<!--   summary(tmp1$zscores) -->

<!--   tmp1$genpos <- get_Genpos(tmp1$ps, tmp1$chr) -->
<!--   tmp1$log10 <- -log10(tmp1$p_score) -->
<!--   head(tmp1) -->
<!--   summary(tmp1$se) -->
<!--   thresh <- -log10(0.05/nrow(tmp1)) -->


<!--   get_Genpos <- function(pos, chr){ -->
<!--     posmin <- tapply(X = pos, INDEX = chr, FUN = min) ## the smallest position for each chrm -->
<!--     posmax <- tapply(X = pos, INDEX = chr, FUN = max) ## the largest position for each chrm -->
<!--     posshift <- head(c(0,cumsum(posmax)),-1) -->
<!--     names(posshift) <- levels(chr) -->
<!--     genpos <- pos + posshift[chr] -->
<!--     return(genpos) -->
<!--   } -->

<!--   #cbbPalette <- c("#E69F00", "#56B4E9", "#009E73", "#D55E00", "#CC79A7") -->
<!--   cbbPalette2 <- c( "#0868ac","#bae4bc", "#43a2ca", "#7bccc4", "#66c2a4") -->


<!--   library(lattice) -->
<!--   source("~/safedata/natvar/analyses/QQPlotbyMatthewFlickinger.R") -->
<!--   qqplot <- qqunif.plot(tmp1$p_score)  -->
<!--   qqplot -->

<!--   tmp1$zscores <- (tmp1$beta/tmp1$se)^2 -->
<!--   summary(tmp1$zscores) -->

<!--   tmp1$genpos <- get_Genpos(tmp1$ps, tmp1$chr) -->
<!--   tmp1$log10 <- -log10(tmp1$p_score) -->
<!--   head(tmp1) -->
<!--   summary(tmp1$se) -->
<!--   thresh <- -log10(0.05/nrow(tmp1)) -->

<!--   summary(tmp1$se) -->
<!--   head(tmp1) -->
<!--   tmp1_thinned <- tmp1[tmp1$se < .1,] -->
<!--   tmp1_thinned <- tmp1[tmp1$log10 > 2,] -->
<!--   dim(tmp1_thinned) -->

<!--   tmp1_thinned1 <- tmp1_thinned %>% filter(zscores > 20) -->
<!--    architechture_plot <-  ggplot(data=tmp1_thinned1) + -->
<!--         geom_point(aes(x=af, y=zscores, color=as.factor(chr))) + -->
<!--           scale_color_manual(values=cbbPalette2) + -->
<!--           ylab("Z scores") + xlab("allele freq.") + -->
<!--           geom_hline(yintercept = 20) + -->
<!--           geom_hline(yintercept = 46, linetype="dashed") -->
<!--           #theme(legend.position = "none")  -->
<!--   architechture_plot -->

<!--   tmp1_plot <-  ggplot(tmp1_thinned) +  -->
<!--     geom_point(aes(x=genpos, y=log10, color=as.factor(chr))) +  -->
<!--     scale_color_manual(values=cbbPalette2) + -->
<!--     ylab(expression(-log[10](P))) + -->
<!--     geom_hline(yintercept = 8, linetype="dashed") + -->
<!--     theme(axis.text.x = element_blank(), -->
<!--           axis.title.x = element_blank(), -->
<!--           legend.position = "none")  -->
<!--   tmp1_plot -->

<!--   covary_ftd13 <- plot_grid(tmp1_plot, qqplot, architechture_plot, nrow=2, ncol=2, rel_widths = c(2,1)) -->
<!--   saveRDS(covary_ftd13, file="./figs/tmpobjects/covaryFTD13_GWAS_qqplot.rda") -->
<!--   pdf(file="./figs/covaryFTD13_GWAS_qqplot.pdf") -->
<!--   png(file="./figs/covaryFTD13_GWAS_qqplot.png", units="in",res=72, width = 10, height = 8, bg="transparent") -->
<!--   covary_ftd13 -->
<!--   dev.off() -->

<!--   ### plot the alleles by the color of selection -->
<!--   surv <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rSurvival_fruit_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lm.assoc.txt") %>% select(chr,ps, rs,af, beta, se, p_score)  -->
<!--   seeds <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rSeeds_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lm.assoc.txt") %>% select(chr,ps, rs,af, beta, se, p_score)  -->
<!--     #fit1 <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rFitness_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lm.assoc.txt") %>% select(chr,ps, rs,af, beta, se, p_score)  -->
<!--   dim(fit1) -->
<!--   fit1$genpos <- get_Genpos(fit1$ps, fit1$chr) -->
<!--   head(fit1) -->

<!--   head(surv) -->
<!--   head(seeds) -->

<!--   colnames(surv) <- paste0(colnames(surv), "_surv") -->
<!--   colnames(seeds) <- paste0(colnames(seeds), "_seeds") -->

<!--   surv_seeds <- merge(surv, by.x="rs_surv", seeds, by.y="rs_seeds") -->
<!--   head(surv_seeds) -->
<!--   head(tmp1_thinned) -->

<!--   tmp1_thinned_fit <- merge(tmp1_thinned, by.x="rs", surv_seeds, by.y="rs_surv") -->
<!--   head(tmp1_thinned_fit) -->
<!--   tmp1_plot_colSurv <-  ggplot(tmp1_thinned_fit) +  -->
<!--     geom_point(aes(x=genpos, y=log10, color=beta_seeds)) +  -->
<!--     scale_color_gradient2(high="gray", low="blue") + -->
<!--     ylab(expression(-log[10](P))) + -->
<!--     geom_hline(yintercept = 8, linetype="dashed") + -->
<!--     theme(axis.text.x = element_blank(), -->
<!--           axis.title.x = element_blank(), -->
<!--           legend.position = "none")  -->
<!--   tmp1_plot_colSurv -->


<!--   tmp1_thinned_fit %>% filter(log10>8) -->


<!--   tmp1_thinned_fit <- merge(tmp1_thinned, by.x="genpos", fit1, by.y="genpos") -->
<!--   head(tmp1_thinned_fit) -->
<!--   tmp1_thinned_fit$col_fit <- rep(1, nrow(tmp1_thinned_fit)) -->
<!--   tmp1_thinned_fit$col_fit[tmp1_thinned_fit$beta.y <= 0] <- 0 -->


<!--   tmp1_plot_wfitness <-  ggplot(tmp1_thinned_fit) +  -->
<!--     geom_point(aes(x=genpos, y=log10, color=as.factor(col_fit))) +  -->
<!--     scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026"))+ -->
<!--     ylab(expression(-log[10](P))) + -->
<!--     geom_hline(yintercept = thresh, linetype="dashed") + -->
<!--     theme(axis.text.x = element_blank(), -->
<!--           axis.title.x = element_blank(), -->
<!--           legend.position = "none")  -->
<!--   tmp1_plot_wfitness -->

<!--   ftd13covar_wSurvival <- tmp1_plot_wfitness -->
<!--   ftd13covar_wSeeds <- tmp1_plot_wfitness -->
<!--   ftd13covar_wFitness <- tmp1_plot_wfitness -->

<!--   fitness_by_covaryFTD13 <- plot_grid(ftd13covar_wFitness, ftd13covar_wSurvival, ftd13covar_wSeeds, ncol=1, nrow=3, -->
<!--             labels=c("fitness", "survival", "seeds")) -->
<!--   saveRDS(fitness_by_covaryFTD13, file="./figs/tmpobjects/fitness_by_covaryFTD13.rda") -->
<!--   pdf(file="./figs/fitness_by_covaryFTD13.pdf") -->
<!--   png(file="./figs/fitness_by_covaryFTD13.png", units="in",res=72, width = 10, height = 8, bg="transparent") -->
<!--   fitness_by_covaryFTD13 -->
<!--   dev.off() -->


<!--   ##========================================================## -->
<!-- }else{ -->
<!--   covary_ftd13 <- readRDS(file="./figs/tmpobjects/covaryFTD13_GWAS_qqplot.rda") -->
<!--   covary_ftd13 -->
<!-- } -->


<!-- ```   -->


################################################################################
## IV.2 Mapping Selection on Alleles
################################################################################


```{r, echo=F, eval=F, message=F, warning=F, fig.width=8, fig.height=10, fig.cap="Manhattan plots of multivariate GWA with flowering time and delta_C13 without genetic pcs, colored by the three fitness measures from Exposito-Alonso et al. 2019."}

#### Map Selection of Alleles on FT & dC13 mGWA
RERUN=F
if(RERUN){
  
  # library(devtools)
  # install_github("drveera/ggman")
  library(ggman)
  setwd("~/safedata/natvar/")
  
  ### ======================================================== ###
  ## multivariate gwa
  ### ======================================================== ###
  tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Delta_13C/output/mGWAS_FT16_Delta_13C.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(tmp1)
  
  thresh <- -log10(0.05/nrow(tmp1))
  
  ### plot the alleles by the color of selection 
  
  ## SURVIVAL
  surv <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rSurvival_fruit_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lmm.assoc.txt") %>% select(chr,ps, rs,af, beta, se, p_score) 
  head(surv)
  
  tmp1_surv <- merge(tmp1, surv, by="rs")
  head(tmp1_surv)
  tmp1_surv$col_fit <- rep(1, nrow(tmp1_surv))
  tmp1_surv$col_fit[tmp1_surv$beta <= 0] <- 0
  
  ## colored plots, how to change colors??
  ft_dc13_survival <- ggman(tmp1_surv, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  geom_point(aes(color=as.factor(col_fit))) + ylim(2.5, 8.5) +
     scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026"))
  ft_dc13_survival
  
  
  ## Lifetime Fitness
  lfitness <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rFitness_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lmm.assoc.txt") %>% select(chr,ps, rs,af, beta, se, p_score) 
  head(lfitness)
  
  tmp1_fitness <- merge(tmp1, lfitness, by="rs")
  head(tmp1_fitness)
  tmp1_fitness$col_fit <- rep(1, nrow(tmp1_fitness))
  tmp1_fitness$col_fit[tmp1_fitness$beta <= 0] <- 0
  
  thresh <- -log10(0.05/nrow(tmp1_fitness))
  ## colored plots, how to change colors??
  ft_dc13_lfitness <- ggman(tmp1_fitness, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  geom_point(aes(color=as.factor(col_fit))) +  ylim(2.5, 8.5) +
     scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026"))
  ft_dc13_lfitness
  
  pdf(file="./figs/ft_dc13_lfitness_manhatPlot.pdf", useDingbats = F)
  ft_dc13_lfitness
  dev.off()
  
  ## special no axis/lines plot
    ## special plot that is mostly gray and black, except high value points
  ## make chr 1, 3, and 5 ==  2
  ## make chr 2, 4 == 3
  tmp1_fitness$chr_col <- rep(2, nrow(tmp1_fitness))
  tmp1_fitness$chr_col[tmp1_fitness$chr.x==2|tmp1_fitness$chr.x==4] <- 3
  tmp1_fitness$chr_col[-log10(tmp1_fitness$p_score.x)>thresh] <-    tmp1_fitness$col_fit[-log10(tmp1_fitness$p_score.x)>thresh]
  
  ## the 256 SNPs with the smallest mGWA p-values (order() returns row indices, sort() returns values)
  tmp1_fitness[order(tmp1_fitness$p_score.x)[1:256],]
  -log10(0.00000652)
  
    ft_dc13_fitness_gray <- ggman(tmp1_fitness, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  geom_hline(yintercept = 5.18, linetype="dashed") +
    geom_point(aes(color=as.factor(chr_col), size=as.factor(chr_col)), shape=19) + ylim(2, 8.5) + ylab("") + xlab("") +
    scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026", "2"="#636363", "3"="#bdbdbd")) + scale_size_manual(values=c("1"=2, "0" = 2, "2"=0.75, "3"=0.75))  +
       theme(legend.position = "none")
  ft_dc13_fitness_gray
  
  gene_pos_raw <- read.table(file="./data/gene_pos_raw.tsv", header=T)
  head(gene_pos_raw)
 df <- gene_pos_raw
  get_alleles_in_genes <- function(df, big_df){
    output_list <- c()
    for (i in 1:nrow(df)){
      snps <- big_df %>% filter(chr.x==df[i,"chr"]) %>% 
        filter(ps.x > df[i,"start"]) %>% 
        filter(ps.x < df[i,"end"])
      output_list <- c(output_list, snps$rs)
    }
    return(output_list)
  }
  
  alles_to_plot <- get_alleles_in_genes(gene_pos_raw, tmp1_fitness)
  
    fri_flc_plot <- ggmanHighlight(ft_dc13_fitness_gray, highlight = alles_to_plot)
  fri_flc_plot
  
  # tmp1_fitness$highlight <- 0
  # tmp1_fitness$highlight[tmp1_fitness$rs %in% alles_to_plot] <- 1
  # 
  
  # hghlight_plot <- ggman(tmp1_fitness, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  geom_hline(yintercept = 5.18, linetype="dashed") +
  #   geom_point(aes(color=as.factor(highlight)),size=1, shape=19) + ylim(2, 8.5) + ylab("") + xlab("") +
  #   scale_color_manual(values=c("1"="#bd0026", "0"="#bdbdbd")) +
  #      theme(legend.position = "none")
  # 
  # hghlight_plot
  
  
  pdf(file="./figs/NoteableAlles_plotted_onGWA.pdf", width = 7, height = 2.5)
  fri_flc_plot
  dev.off()
  
  
  png(file="./figs/ft_dc13_manhatPlot_fitsel.png", units="in",res=2400, width = 7, height = 3, bg="transparent")
  ft_dc13_fitness_gray
  dev.off()
  
  
  ## Seeds
  seeds <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rSeeds_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lmm.assoc.txt") %>% select(chr,ps, rs,af, beta, se, p_score) 
  head(seeds)
  
  tmp1_seeds <- merge(tmp1, seeds, by="rs")
  head(tmp1_seeds)
  tmp1_seeds$col_fit <- rep(1, nrow(tmp1_seeds))
  tmp1_seeds$col_fit[tmp1_seeds$beta <= 0] <- 0
  
  ## colored plots, how to change colors??
  ft_dc13_seeds <- ggman(tmp1_seeds, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  geom_point(aes(color=as.factor(col_fit))) +   ylim(2.5, 8.5) +
     scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026"))
  ft_dc13_seeds
  
  
  bigplot <- plot_grid(ft_dc13_survival, ft_dc13_lfitness, ft_dc13_seeds, nrow=3, ncol=1, labels=c("survival", "lifetime fitness", "seeds"))
  pdf(file="./figs/FT_dC13_mGWA_colorByAllFitness.pdf")
  bigplot
  dev.off()
  
  saveRDS(bigplot, file="./figs/tmpobjects/FT_dC13_mGWA_colorByAllFitness.rda")
  
}else{
  ft_dC13_mGWA_coloredByAllFitness <- readRDS(file="./figs/tmpobjects/FT_dC13_mGWA_colorByAllFitness.rda")
  ft_dC13_mGWA_coloredByAllFitness
}
  
```
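
The coloring logic in the chunk above can be illustrated on a toy table. This is a minimal sketch, not the analysis itself: the SNP ids, p-values, and betas are made up, and `toy_mgwa` and `toy_fit` are hypothetical stand-ins for the mGWA and fitness GWA outputs. It shows the Bonferroni line on the -log10 scale and the 1/0 coding that separates positive from zero-or-negative selection coefficients.

```r
# Toy stand-ins (hypothetical values) for the mGWA and fitness GWA tables
toy_mgwa <- data.frame(rs = c("s1", "s2", "s3", "s4"),
                       p_score = c(1e-9, 0.2, 1e-3, 0.5))
toy_fit  <- data.frame(rs = c("s1", "s2", "s3", "s4"),
                       beta = c(0.4, -0.1, 0, 0.3))

# Bonferroni significance line on the -log10 scale
thresh <- -log10(0.05 / nrow(toy_mgwa))

# Merge on SNP id; column names shared by both tables would get .x / .y suffixes
toy <- merge(toy_mgwa, toy_fit, by = "rs")

# 1 = positive selection coefficient, 0 = zero or negative
toy$col_fit <- as.integer(toy$beta > 0)

# SNPs above the significance line
toy$sig <- -log10(toy$p_score) > thresh
toy
```

Coding the sign as a factor level, rather than mapping `beta` directly to color, is what lets `scale_color_manual()` assign one fixed color per class.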


```{r, echo=F, eval=F, message=F, warning=F, fig.width=8, fig.height=10, fig.cap="Manhattan plots of the multivariate GWA of flowering time and growth rate, run without genetic PCs, colored by the three fitness measures from Exposito-Alonso et al. 2019."}
#### Map Selection of Alleles on FT & GR mGWA
RERUN=F
if(RERUN){
  
  # library(devtools)
  # install_github("drveera/ggman")
  library(ggman)
  setwd("~/safedata/natvar/")
  
  ### ======================================================== ###
  ## multivariate gwa
  ### ======================================================== ###
  tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Growth_rate/output/mGWAS_FT16_Growth_rate.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(tmp1)
  
  thresh <- -log10(0.05/nrow(tmp1))
  
  ### plot the alleles by the color of selection 
  
  ## SURVIVAL
  surv <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rSurvival_fruit_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lmm.assoc.txt") %>% select(chr,ps, rs,af, beta, se, p_score) 
  head(surv)
  
  tmp1_surv <- merge(tmp1, surv, by="rs")
  head(tmp1_surv)
  tmp1_surv$col_fit <- rep(1, nrow(tmp1_surv))
  tmp1_surv$col_fit[tmp1_surv$beta <= 0] <- 0
  
  ## colored plots, how to change colors??
  ft_GR_survival <- ggman(tmp1_surv, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  geom_point(aes(color=as.factor(col_fit))) + ylim(2.5, 5) +
     scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026")) 
  ft_GR_survival
  
  
  ## Lifetime Fitness
  lfitness <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rFitness_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lmm.assoc.txt") %>% select(chr,ps, rs,af, beta, se, p_score) 
  head(lfitness)
  
  tmp1_fitness <- merge(tmp1, lfitness, by="rs")
  head(tmp1_fitness)
  tmp1_fitness$col_fit <- rep(1, nrow(tmp1_fitness))
  tmp1_fitness$col_fit[tmp1_fitness$beta <= 0] <- 0
  
  ft_GR_lfitness <- ggman(tmp1_fitness, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  geom_point(aes(color=as.factor(col_fit))) +  ylim(2.5, 8.5) +
     scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026"))
  ft_GR_lfitness
  
  ## the combined three-panel PDF is written further down, once bigplot exists for this trait pair
  
  ##special lfitness plot for ft and gr
  ## make chr 1, 3, and 5 ==  2
  ## make chr 2, 4 == 3
  tmp1_fitness$chr_col <- rep(2, nrow(tmp1_fitness))
  tmp1_fitness$chr_col[tmp1_fitness$chr.x==2|tmp1_fitness$chr.x==4] <- 3
  tmp1_fitness$chr_col[-log10(tmp1_fitness$p_score.x)>thresh] <- tmp1_fitness$col_fit[-log10(tmp1_fitness$p_score.x)>thresh]
  
  index <- seq(1, nrow(tmp1_fitness), by=10)
  half_tmp1_fitness <- tmp1_fitness[index,]
  ft_GR_fitres_gray <- ggman(tmp1_fitness, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  
    geom_point(aes(color=as.factor(chr_col)), size=.75, shape=19) + ylim(2, 8.5) +
    scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026", "2"="#636363", "3"="#bdbdbd"))
    #scale_color_manual(values=c("#636363","#bdbdbd")) 
  ft_GR_fitres_gray

  pdf(file="./figs/ft_GR_fitres_gray.pdf", width = 5, height=2)
  ft_GR_fitres_gray
  dev.off()
  
  #tmp1_fitres_chr1 <- tmp1_fitres %>% filter(chr.x==1)
    ft_GR_gray_all <- ggman(tmp1_fitness, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  
    geom_point(aes(color=as.factor(chr_col), size=as.factor(chr_col)), shape=19) + ylim(2, 8.5) + ylab("") + xlab("") +
    scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026", "2"="#636363", "3"="#bdbdbd")) +
      scale_size_manual(values=c("1"=2, "0" = 2, "2"=0.75, "3"=0.75))+
      theme_void() +
      theme(legend.position = "none")
  ft_GR_gray_all
  
  png(file="./figs/ft_GR_gray_all_highlight.png", units="in",res=2400, width = 7, height = 1.9, bg="transparent")
  ft_GR_gray_all
  dev.off()
  
  
  ## Seeds
  seeds <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rSeeds_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lmm.assoc.txt") %>% select(chr,ps, rs,af, beta, se, p_score) 
  head(seeds)
  
  tmp1_seeds <- merge(tmp1, seeds, by="rs")
  head(tmp1_seeds)
  tmp1_seeds$col_fit <- rep(1, nrow(tmp1_seeds))
  tmp1_seeds$col_fit[tmp1_seeds$beta <= 0] <- 0
  
  ## colored plots, how to change colors??
  ft_GR_seeds <- ggman(tmp1_seeds, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  geom_point(aes(color=as.factor(col_fit))) +   ylim(2.5, 8.5) +
     scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026"))
  ft_GR_seeds
  
  
  bigplot <- plot_grid(ft_GR_survival, ft_GR_lfitness, ft_GR_seeds, nrow=3, ncol=1, labels=c("survival", "lifetime fitness", "seeds"))
  pdf(file="./figs/FT_GR_mGWA_colorByAllFitness.pdf")
  bigplot
  dev.off()
  
  saveRDS(bigplot, file="./figs/tmpobjects/FT_GR_mGWA_colorByAllFitness.rda")
  
}else{
  ft_GR_mGWA_coloredByAllFitness <- readRDS(file="./figs/tmpobjects/FT_GR_mGWA_colorByAllFitness.rda")
  ft_GR_mGWA_coloredByAllFitness
}
  
```

#### Control for the selective effect of growth rate on survival and lifetime fitness

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=10, fig.cap="Figure SIII.21 mGWA of flowering time and delta_C13, with SNPs colored by the sign of their selection coefficient: positively selected (green) or negatively selected (red). Selection coefficients come from (top) a GWA on the residuals of lifetime fitness regressed on growth rate and (bottom) a GWA on the residuals of survival regressed on growth rate."}
RERUN=F
if(RERUN){

  pheno_raw <- read.table(file="./data/atlas1001_phenotypes_matrix_MR.csv", 
                           header = T, sep=",")
  dat <- pheno_raw %>% select(id, rFitness_mlp, rSurvival_fruit_mlp, Growth_rate)
  head(dat)
  
  dat <- na.omit(dat)
  ## get residuals of fitness after accounting for growth rate
  fit_gr_lm <- lm(rFitness_mlp ~ Growth_rate, data=dat)
  fit_gr_lm.res = resid(fit_gr_lm)
  length(fit_gr_lm.res)
  fit_gr <- data.frame(id = dat$id, grres = fit_gr_lm.res)
  
  surv_gr_lm <- lm(rSurvival_fruit_mlp ~ Growth_rate, data=dat)
  surv_gr_lm.res = resid(surv_gr_lm)
  surv_gr <- data.frame(id = dat$id, sures = surv_gr_lm.res)
  
  fam <- read.table(file="../1001g/1001gbi.fam")
  fam <- fam[,1:5]
  head(fam)
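  ## add residuals by matching on accession id; unlike merge(), match() keeps the
  ## .fam row order, which has to stay aligned with the .bed genotypes
  new_fam <- fam
  new_fam$grres <- fit_gr$grres[match(new_fam$V1, fit_gr$id)]
  new_fam$sures <- surv_gr$sures[match(new_fam$V1, surv_gr$id)]
  head(new_fam)
  sum(!is.na(new_fam$sures))  # accessions with a survival residual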
  
  new_fam$grres[is.na(new_fam$grres)] <- -9
  new_fam$sures[is.na(new_fam$sures)] <- -9
  head(new_fam)
  write.table(new_fam, file="./FitnessResiduals/1001gbi.fam", quote = F, col.names = F, row.names = F)
  
  system(paste('ln -f ../1001g/1001gbi.bim ', paste0('./FitnessResiduals/','1001gbi.bim')))
  system(paste('ln -f ../1001g/1001gbi.bed ', paste0('./FitnessResiduals/', '1001gbi.bed')))
  system(paste('ln -f ../1001g/1001gbi.sXX.txt ', paste0('./FitnessResiduals/','1001gbi.sXX.txt')))

  ## change output file and -n to change phenotype - either fitness residuals or survival residuals
  setwd("~/safedata/natvar/")
  write.table(quote=F,row.names=F,col.names=F,
                   file="./FitnessResiduals/rungwa_sur.sh",
                   x=rbind(
                     "#!/bin/bash",
                     "#SBATCH --cpus-per-task=2",
                     "#SBATCH --mem-per-cpu=4G",
                     "#SBATCH --partition=DPB",
                     "#SBATCH --job-name=Survresiduals",
                     "#SBATCH --output=Survresiduals.slurm.log",
                     "./gemma -bfile 1001gbi -miss 0.1 -maf 0.05 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 2 -o Survresiduals")
                   )
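  ## submit from inside ./FitnessResiduals/ without changing R's working directory,
  ## so the relative paths read below still resolve
  system("cd ./FitnessResiduals && sbatch rungwa_sur.sh")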
  
  ### load results and color ft and dC13 mGWA by selection coefs.
  tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Delta_13C/output/mGWAS_FT16_Delta_13C.assoc.txt") %>% dplyr::select(chr,ps, rs, af, beta_1, beta_2, p_score)
  head(tmp1)
  
  ## or load results for ft and GR mGWA and color by lifetime fitness
  # tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Growth_rate/output/mGWAS_FT16_Growth_rate.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  # head(tmp1)
  
  thresh <- -log10(0.05/nrow(tmp1))
  
  ### plot the alleles by the color of selection 
  ## fitness residuals
  setwd("~/safedata/natvar")
  fit_res <- data.table::fread(file="./FitnessResiduals/output/Fitresiduals.assoc.txt") %>% dplyr::select(chr,ps, rs,af, beta, se, p_score)
  head(fit_res)
  
  tmp1_fitres <- merge(tmp1, fit_res, by="rs")
  head(tmp1_fitres)
  tmp1_fitres$col_fit <- rep(1, nrow(tmp1_fitres))
  tmp1_fitres$col_fit[tmp1_fitres$beta <= 0] <- 0
  
  thresh <- -log10(0.05/nrow(tmp1_fitres))
  limit <- quantile(tmp1_fitres$p_score.x, probs=0.0005)
  top_hits <- tmp1_fitres[tmp1_fitres$p_score.x < limit, ]
  
   mean(unlist(top_hits[c(71:97),"beta"]))
  mean(unlist(top_hits[c(71:97),"se"]))
  
  
  ## colored plots, how to change colors??
  ft_dc13_fitres <- ggman(tmp1_fitres, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  geom_point(aes(color=as.factor(col_fit)), size=.5, shape=21) + ylim(4, 8.5) +
     scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026"))
  ft_dc13_fitres
  
  pdf(file="./figs/ft_dc13_fitGRResiduals_manhatPlot_color.pdf", width = 5, height=2)
  ft_dc13_fitres
  dev.off()
  
  ## special plot that is mostly gray and black, except high value points
  head(tmp1_fitres)
  gwas.sig <- tmp1_fitres[-log10(tmp1_fitres$p_score.x)>thresh,]
  
  ## make chr 1, 3, and 5 ==  2
  ## make chr 2, 4 == 3
  tmp1_fitres$col_fit <- rep(1, nrow(tmp1_fitres))
  tmp1_fitres$col_fit[tmp1_fitres$beta <= 0] <- 0
  tmp1_fitres$chr_col <- rep(2, nrow(tmp1_fitres))
  tmp1_fitres$chr_col[tmp1_fitres$chr.x==2|tmp1_fitres$chr.x==4] <- 3
  tmp1_fitres$chr_col[-log10(tmp1_fitres$p_score.x)>thresh] <- tmp1_fitres$col_fit[-log10(tmp1_fitres$p_score.x)>thresh]
  
  ## now new column for size
  
  index <- seq(1, nrow(tmp1_fitres), by=10)
  half_tmp1_fitres <- tmp1_fitres[index,]
  ft_dc13_fitres_gray <- ggman(half_tmp1_fitres, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  
    geom_point(aes(color=as.factor(chr_col)), size=.75, shape=19) + ylim(2, 8.5) +
    scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026", "2"="#636363", "3"="#bdbdbd"))
    #scale_color_manual(values=c("#636363","#bdbdbd")) 
  ft_dc13_fitres_gray

  pdf(file="./figs/ft_dc13_fitGRResiduals_manhatPlot_gray.pdf", width = 5, height=2)
  ft_dc13_fitres_gray
  dev.off()
  
  tmp1_fitres_chr1 <- tmp1_fitres %>% filter(chr.x==1)
    ft_dc13_fitres_gray_all <- ggman(tmp1_fitres_chr1, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  
    geom_point(aes(color=as.factor(chr_col), size=as.factor(chr_col)), shape=19) + ylim(2, 8.5) + ylab("") + xlab("") +
    scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026", "2"="#636363", "3"="#bdbdbd")) +
      scale_size_manual(values=c("1"=2, "0" = 2, "2"=0.75, "3"=0.75))+
      theme_void() +
      theme(legend.position = "none")

  ft_dc13_fitres_gray_all
  
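  ## alternative output name for when tmp1 holds the FT & growth rate mGWA (see the commented fread above)
  # png(file="./figs/ft_GR_fitGRResiduals_manhatPlot_gray_chr1only.png", units="in",res=2400, width = 1.8, height = 1.9, bg="transparent")
  # ft_dc13_fitres_gray_all
  # dev.off()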
  
  
  png(file="./figs/ft_dc13_fitGRResiduals_manhatPlot_gray_chr1only.png", units="in",res=2400, width = 1.8, height = 1.9, bg="transparent")
  ft_dc13_fitres_gray_all
  dev.off()
  
  
  ### Survival residuals
  surv_res <- data.table::fread(file="./FitnessResiduals/output/Survresiduals.assoc.txt") %>% dplyr::select(chr,ps, rs,af, beta, se, p_score)
  head(surv_res)
  
  tmp1_surv_res <- merge(tmp1, surv_res, by="rs")
  head(tmp1_surv_res)
  tmp1_surv_res$col_fit <- rep(1, nrow(tmp1_surv_res))
  tmp1_surv_res$col_fit[tmp1_surv_res$beta <= 0] <- 0
  
  
    limit <- quantile(tmp1_surv_res$p_score.x, prob=0.0005)
  top_hits <- tmp1_surv_res[tmp1_surv_res$p_score.x < limit, ]
  
   mean(unlist(top_hits[c(70:96),"beta"]))
  mean(unlist(top_hits[c(70:96),"se"]))
  
  ## colored plots, how to change colors??
  ft_dc13_survres <- ggman(tmp1_surv_res, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +  geom_point(aes(color=as.factor(col_fit))) + ylim(2.5, 8.5) +
     scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026"))
  ft_dc13_survres
  
  residuals_fitness_manhatPlots <- plot_grid(ft_dc13_fitres, ft_dc13_survres, nrow=2,
                                             labels=c("fitness~growth rate residuals", "survival~growth rate residuals"))
  residuals_fitness_manhatPlots
  saveRDS(residuals_fitness_manhatPlots, file="./figs/tmpobjects/residuals_fitness_manhatPlots.rda")
  
  pdf(file="./figs/residuals_fitness_manhatPlots.pdf")
  residuals_fitness_manhatPlots
  dev.off()
  
  
}else{
    residuals_fitness_manhatPlots <- readRDS(file="./figs/tmpobjects/residuals_fitness_manhatPlots.rda")
    residuals_fitness_manhatPlots
}

```
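
The residual step in the chunk above can be sketched on toy data. This is an illustration under stated assumptions, not the pipeline itself: the phenotype values and ids are invented, and `fitness` and `growth` are hypothetical column names. It regresses fitness on growth rate, keeps the residuals, and attaches them to a `.fam`-style table with `match()` so that the row order of the genotype files is left untouched.

```r
# Hypothetical toy phenotypes; id 3 lacks growth rate and is dropped by na.omit()
dat <- data.frame(id = 1:6,
                  fitness = c(0.2, 0.5, 0.9, 0.4, 0.8, 0.1),
                  growth  = c(1.0, 2.0, NA, 1.5, 2.5, 0.5))
dat <- na.omit(dat)

# Residual fitness after regressing out growth rate
fit <- lm(fitness ~ growth, data = dat)
res <- data.frame(id = dat$id, grres = resid(fit))

# Toy .fam ids in genotype order (unsorted, and including unphenotyped ids)
fam <- data.frame(V1 = c(5, 3, 1, 7, 2))

# match() keeps the .fam row order, which must stay aligned with the .bed file
fam$grres <- res$grres[match(fam$V1, res$id)]

# -9 as the missing-phenotype code, as in the chunk above
fam$grres[is.na(fam$grres)] <- -9
fam
```

By construction the residuals are uncorrelated with growth rate, so a GWA on them estimates selection on alleles that is not explained by a linear effect of growth rate.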

#### Scatter plot

```{r, echo=F, eval=T, message=F, warning=F, fig.width=7, fig.height=4, fig.cap="Figure SIII.22 Top 0.05% of SNPs from the mGWA of flowering time and delta_C13, plotted by their estimated effect size on each trait and colored by their selection coefficient, estimated from linear models of (left) lifetime fitness as a function of growth rate and (right) survival as a function of growth rate."}
RERUN=F
if(RERUN){
  library(dplyr)
  library(data.table)
  tmp1<-data.table::fread(file="./multivarGWAS/mGWAS_FT16_Delta_13C/output/mGWAS_FT16_Delta_13C.assoc.txt") %>% dplyr::select(chr,ps, rs, af, beta_1, beta_2, p_score) 
  head(tmp1)
  
  ### plot the alleles by the color of selection
  ## Lifetime Fitness
  lfitness <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rFitness_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lmm.assoc.txt") %>% dplyr::select(chr,ps, rs,af, beta, se, p_score) 
  head(lfitness)
  
  ## merge mGWA and lifetime fitness gwa
  tmp1_lifeFit <- merge(tmp1, lfitness, by="rs")
  head(tmp1_lifeFit)
  tmp1_lifeFit$col_fit <- rep(1, nrow(tmp1_lifeFit))
  tmp1_lifeFit$col_fit[tmp1_lifeFit$beta <= 0] <- 0
  
  dim(tmp1_lifeFit)
  saveRDS(tmp1_lifeFit, file="data/mGWA_FT&dC13_andLFitnessGWA_all.rda")
  
  ## filter out to the top .1% of p-value hits; resulting in 547 hits
  ## or fitler to top 0.05% for 274 hits
  thresh <- -log10(0.5/nrow(tmp1))
  top_hits <- tmp1_lifeFit[-log10(tmp1_lifeFit$p_score.x) > thresh, ]
  saveRDS(top_hits, file="data/mGWA_FT&dC13_andLFitnessGWA_top05hits.rda")
   top_hits<- readRDS(file="data/mGWA_FT&dC13_andLFitnessGWA_top05hits.rda")
  
  limit <- quantile(tmp1_lifeFit$p_score.x, prob=0.0005)
  -log10(limit)
  top_hits <- tmp1_lifeFit[tmp1_lifeFit$p_score.x < limit, ]
  dim(top_hits)
  #5.137613  limit for .05% of top SNPs
  saveRDS(top_hits, file="data/mGWA_FT&dC13_andLFitnessGWA_top05hits.rda")
  top_hits <- readRDS(file="data/mGWA_FT&dC13_andLFitnessGWA_top05hits.rda")
  
  cor.test(top_hits$beta_1, top_hits$beta_2, method = "pearson")
  
  ft_d13_scatterPlot_lf <- ggplot(top_hits) + geom_point(aes(x=beta_1, y=beta_2, color=beta, size=beta)) +
    #scale_color_gradient2(low="#78c679", mid="white", high="#bd0026") +
    scale_color_gradient2(low="red3", mid="white", high="green3") + ## switch this color scale
    geom_hline(yintercept = 0, linetype = "dashed", color="black", linewidth=0.5) +
    geom_vline(xintercept = 0, linetype = "dashed", color="black", linewidth=0.5) +
    xlab("Allele effect in flowering time") + ylab("Allele effect in WUE")  
  
  
  top_hits$rs %in% chrm1_peakhits
  length(unlist(top_hits[c(97:111, 113:123),"beta"]))
 mean(unlist(top_hits[c(97:111, 113:123),"beta"]))
  mean(unlist(top_hits[c(97:111, 113:123),"se"]))
  
  
  top_hits %>% filter(beta_2 > 0.3 & beta_1 < 0)
  top_hits %>% filter(beta_2 < -0.3 & beta_1 > -0.2)
  
  ## Fisher exact test
  
  ##        FT-   FT+
  ## WUE+    1    210
  ## WUE-    60    3
  
  probs <- c(0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.55, 0.6, 0.7, 0.8, 0.9)
  results <- data.frame(probs=probs, pvalue=NA, lower=NA, upper=NA, oddr=NA)
  for (i in seq_along(probs)){
    limit <- quantile(tmp1_lifeFit$p_score.x, prob=probs[i])
    top_hits <- tmp1_lifeFit[tmp1_lifeFit$p_score.x < limit, ]
    
    cont.tab <- matrix(NA,2,2)
    cont.tab[1,1] <- sum(top_hits$beta_1<0 & top_hits$beta_2>0)
    cont.tab[1,2] <- sum(top_hits$beta_1>0 & top_hits$beta_2>0)
    cont.tab[2,1] <- sum(top_hits$beta_1<0 & top_hits$beta_2<0)
    cont.tab[2,2] <- sum(top_hits$beta_1>0 & top_hits$beta_2<0)
    
    test <- fisher.test(cont.tab)
    results$pvalue[i] <- test$p.value
    results$lower[i] <- test$conf.int[1]
    results$upper[i] <- test$conf.int[2]
    results$oddr[i] <- test$estimate
  }
  
  
  limit <- quantile(tmp1_lifeFit$p_score.x, prob=0.0005)
  -log10(limit)
  top_hits <- tmp1_lifeFit[tmp1_lifeFit$p_score.x < limit, ]
  
  cont.tab <- matrix(NA,2,2)
  cont.tab[1,1] <- sum(top_hits$beta_1<0 & top_hits$beta_2>0)
  cont.tab[1,2] <- sum(top_hits$beta_1>0 & top_hits$beta_2>0)
  cont.tab[2,1] <- sum(top_hits$beta_1<0 & top_hits$beta_2<0)
  cont.tab[2,2] <- sum(top_hits$beta_1>0 & top_hits$beta_2<0)
  
  mosaicplot(cont.tab,
  main = "Mosaic plot",
  color = TRUE
)
  chisq.test(cont.tab)$expected
  fisher.test(cont.tab)
  
    # theme(legend.position = "none",
    #     panel.grid = element_blank(),
    #     axis.title = element_blank(),
    #     axis.text = element_blank(),
    #     axis.ticks = element_blank(),
    #     panel.background = element_blank()) +
    #   theme_void() +
    #   theme(legend.position = "none")
  ft_d13_scatterPlot_lf
  
  pdf(file="./figs/ft_d13_scatterPlot_lifetimefitness.pdf", width = 5, height=5)
  ft_d13_scatterPlot_lf
  dev.off()
  
  #   ##save png of the points only
  # png(file="./figs/ft_d13_scatterPlot_lf_pointsONLY.png", units="in", res=12000, width = 2.2, height = 2.6, bg="white")
  # ft_d13_scatterPlot_lf
  # dev.off()
  
  saveRDS(ft_d13_scatterPlot_lf, file="./figs/tmpobjects/ft_d13_scatterPlot_lf.rda")
  
  ## SURVIVAL
  surv <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rSurvival_fruit_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lmm.assoc.txt") %>% dplyr::select(chr,ps, rs,af, beta, se, p_score) 
  head(surv)
  
  tmp1_surv <- merge(tmp1, surv, by="rs")
  head(tmp1_surv)
  tmp1_surv$col_fit <- rep(1, nrow(tmp1_surv))
  tmp1_surv$col_fit[tmp1_surv$beta <= 0] <- 0
  
  limit <- quantile(tmp1_surv$p_score.x, prob=0.0005)
  top_hits <- tmp1_surv[tmp1_surv$p_score.x < limit, ]
  
  ft_d13_scatterPlot_surv <- ggplot(top_hits) + geom_point(aes(x=beta_1, y=beta_2, color=beta, size=beta)) +
    #scale_color_gradient2(low="#78c679", mid="white", high="#bd0026") +
    scale_color_gradient2(low="red3", mid="white", high="green3") +
    geom_hline(yintercept = 0, linetype = "dashed", color="black", linewidth=0.5) +
    geom_vline(xintercept = 0, linetype = "dashed", color="black", linewidth=0.5) +
    xlab("Allele effect in flowering time") + ylab("Allele effect in WUE") 
  #   theme(legend.position = "none",
  #       panel.grid = element_blank(),
  #       axis.title = element_blank(),
  #       axis.text = element_blank(),
  #       axis.ticks = element_blank(),
  #       panel.background = element_blank()) +
  #     theme_void() +
  #     theme(legend.position = "none")
   ft_d13_scatterPlot_surv
  
   
     top_hits$rs %in% chrm1_peakhits
  length(unlist(top_hits[c(97:123),"beta"]))
 mean(unlist(top_hits[c(97:123),"beta"]))
  mean(unlist(top_hits[c(97:123),"se"]))
   
   ## FECUNDITY
  fec <- data.table::fread(file="./phenotypes/Exposito-Alonso_Nature_2019_PID_31462776/1001/rSeeds_mlp/output/Exposito-Alonso_Nature_2019_PID_31462776.lmm.assoc.txt") %>% dplyr::select(chr,ps, rs,af, beta, se, p_score) 
  head(fec)
  
  tmp1_fec <- merge(tmp1, fec, by="rs")
  head(tmp1_fec)
  tmp1_fec$col_fit <- rep(1, nrow(tmp1_fec))
  tmp1_fec$col_fit[tmp1_fec$beta <= 0] <- 0
  
  limit <- quantile(tmp1_fec$p_score.x, prob=0.0005)
  top_hits <- tmp1_fec[tmp1_fec$p_score.x < limit, ]
   
  length(unlist(top_hits[c(97:123),"beta"]))
 mean(unlist(top_hits[c(97:123),"beta"])) 
 
  mean(unlist(top_hits[c(97:123),"se"]))
  
  
  Big_scatterPlot <- plot_grid(ft_d13_scatterPlot_lf, ft_d13_scatterPlot_surv, labels = c("lifetime fitness", "survival"))
  Big_scatterPlot 
    
  
  pdf(file="./figs/ft_d13_scatterPlot_lf_and_survival.pdf", width = 12, height=5)
  Big_scatterPlot
  dev.off()
  

  saveRDS(Big_scatterPlot, file="./figs/tmpobjects/ft_d13_scatterPlot_lf_and_survival.rda")
  
  
}else{
  Big_scatterPlot <- readRDS(file="./figs/tmpobjects/ft_d13_scatterPlot_lf_and_survival.rda")
  Big_scatterPlot
}
```
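
The Fisher exact test used in the chunk above can be reproduced directly from the 2 x 2 counts noted in its comment. The sketch below copies only those four counts; the `dimnames` labels are added here for readability and are not part of the original code.

```r
# Counts copied from the comment in the chunk above
# rows: WUE+ / WUE-, columns: FT- / FT+
cont.tab <- matrix(c(1, 210,
                     60, 3),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(WUE = c("+", "-"), FT = c("-", "+")))

test <- fisher.test(cont.tab)
test$p.value    # probability of a table at least this extreme under independence
test$estimate   # conditional maximum-likelihood odds ratio
test$conf.int   # 95% confidence interval for the odds ratio
```

With this row and column layout, an odds ratio below 1 means that alleles tend to shift flowering time and WUE in the same direction.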

### Allele frequency change 

```{r, echo=F, eval=T, message=F, warning=F, fig.width=7, fig.height=4}

library(dplyr)
library(ggplot2)
library(cowplot)
theme_set(theme_cowplot())
library(RColorBrewer)

d<-readRDS("~/safedata/natvar/data/mGWA_FT&dC13_andLFitnessGWA_top05hits.rda")
d<-readRDS("./data/mGWA_FT&dC13_andLFitnessGWA_top05hits.rda")
tophits<-d
head(d)

## additive variance contributed by a SNP: 2*f*(1-f)*b^2, with f = allele frequency and b = allele effect

sum(2*d$af.x*(1-d$af.x) * d$beta_1^2 *d$beta_2^2)
sum(2*d$af.x*(1-d$af.x) * (d$beta_2^2))


# ggplot(d) +
#   geom_point(aes(y= beta, x= beta_1))+
#   geom_abline(intercept =  0, slope = 0, lty='dashed', col='lightgrey')
#
# ggplot(d) +
#   geom_point(aes(y= beta, x= beta_2))+
#   geom_abline(intercept =  0, slope = 0, lty='dashed', col='lightgrey')
head(d)
pdf("~/safedata/natvar/figs/gwa-wue-ft-selection.pdf",width = 6,height = 5, useDingbats = F)
ggplot(d,aes(x= beta_1, y= beta_2, color=beta)) +
  geom_point(color="black")+
  geom_point()+
  #geom_smooth(x= beta_1, y= beta_2) +
  geom_abline(intercept =  0, slope = 0, lty='dashed', col='lightgrey')+
  geom_vline(xintercept = 0, lty='dashed', col='lightgrey')+
  # scale_color_gradientn(colors = brewer.pal(9,"RdGn"))
# scale_color_gradient(low = "red", high = "green")
  scale_color_gradientn(colors = c(rev(brewer.pal(9,"Reds")), brewer.pal(9,"Greens")))

dev.off()

# Change in frequency
pdf("~/safedata/natvar/figs/gwa-wue-ft-selection-2.pdf",width = 6,height = 5)

d$deltap<-d$af.y * (1-d$af.y) * d$beta

#ggplot(d,aes(y= deltap, x= (beta_2 * beta_1)*abs(beta_1)/abs(beta_1), color=beta, size=abs(beta))) 
af_change_by_combined_effect <- ggplot(d,aes(y= deltap, x= (beta_2 * beta_1), color=beta)) +
  geom_point(color="black")+
  geom_point()+
  geom_abline(intercept =  0, slope = 0, lty='dashed', col='lightgrey')+
  geom_vline(xintercept = 0, lty='dashed', col='lightgrey')+
  # geom_smooth(method='glm')+
  labs(y="change in frequency (p)", x="effect on flowering time x effect on WUE (beta_1 * beta_2)")+
  # scale_color_gradient(low = "red", high = "green")
  scale_color_gradientn(colors = c(rev(brewer.pal(9,"Reds")), brewer.pal(9,"Greens")))

af_change_by_combined_effect
dev.off()

pdf("~/safedata/natvar/figs/gwa-wue-ft-AFchange_fig3c.pdf",width = 6,height = 5, useDingbats = F)
af_change_by_combined_effect
dev.off()
#
# dsub<-dplyr::filter(d,chr.x==1, ps.x  > 19700000, ps.x<19800000)
# ggplot(dsub,aes(y= deltap, x= beta_2, color=beta, size=beta)) +
#   geom_point()+
#   geom_abline(intercept =  0, slope = 0, lty='dashed', col='lightgrey')+
#   geom_vline(xintercept = 0, lty='dashed', col='lightgrey')+
#   geom_smooth(method='glm')+
#   labs(y="change in frequency (p)", x='effect on flowering time')+
#   scale_color_gradient(low = "red", high = "green")
# # scale_color_gradientn(colors = brewer.pal(9,"RdGn"))
#

## do alleles with same-sign effects on flowering time and WUE tend to be positively selected?
sign_tab <- with(d, table(same_sign = (beta_1 * beta_2) > 0, positive_sel = beta > 0))
sign_tab
fisher.test(sign_tab)

```
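
The two per-SNP quantities used in the chunk above can be checked with a small worked example. The values below are hypothetical; `p`, `s`, and `b` stand in for the allele frequency (`af`), the fitness GWA effect (`beta`), and a trait effect (`beta_1` or `beta_2`). The frequency change is the first-order approximation used in the chunk and ignores dominance and normalization by mean fitness.

```r
# Hypothetical allele frequencies (p), selection coefficients (s), trait effects (b)
p <- c(0.10, 0.50, 0.90)
s <- c(0.20, 0.20, -0.20)
b <- c(1.5, 1.5, 1.5)

# Expected change in frequency per generation: p * (1 - p) * s
deltap <- p * (1 - p) * s

# Additive variance contributed by each SNP: 2 * p * (1 - p) * b^2
va <- 2 * p * (1 - p) * b^2

data.frame(p, s, deltap, va)
```

Both quantities scale with p * (1 - p), so for a given effect size, intermediate-frequency alleles change fastest and contribute the most variance.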

################################################################################
# V. Breaking the Link with CRISPR
################################################################################

## Impute delta_C13

```{r, echo=F, eval=T, message=F, warning=F}
RERUN=F
if(RERUN){
  
  # Uli data
  flcKo_dC13 <-  read.table(file="./Data_code_Ruffley_Lutz_et_al/deltaC_extended_group_rep3.csv", sep=",", header=T)
  head(flcKo_dC13)
  
  ## imputed elsewhere
  
  
}else{
  
}

```


```{r, echo=F, eval=F, message=F, warning=F, fig.width=8, fig.height=8, fig.cap="Selection once genetic correlation has been altered by CRISPR editing."}
RERUN=F
if(RERUN){

  setwd("~/safedata/natvar/")
  
  ## drought exp. data
  seedweights<-read.csv("./LauraAnalysis/field_prep_year1/crispr/growth chamber/seed_weights.csv")
  head(seedweights)

  pheno<-read.csv("./LauraAnalysis/field_prep_year1/crispr/growth chamber/wrangled_data_crispr.csv")
  head(pheno)

  pheno<-merge(pheno,seedweights, by="NEW_ID") #deletes one individual... look into later
  pheno$weight_w_paper<-as.numeric(pheno$weight_w_paper)

  pheno$rSeedWeight <- pheno$weight_w_paper - 0.3992667
  mean(pheno$rSeedWeight, na.rm=T)
  hist(pheno$rSeedWeight)
  head(pheno)
  dim(pheno)

  #pheno$founder<-fn(pheno$founder)
  pheno$founder <- as.character(pheno$founder)
  pheno$founder[pheno$founder==""] <- "c"
  head(pheno)

  ### Look at natural ecotypes first
  founder_pheno <- pheno %>% filter(founder =="f")
  dim(founder_pheno)
  founder_pheno <- founder_pheno %>% filter(drought =="optimal")
  dim(founder_pheno)

  small_fdrPheno <- founder_pheno[, c("real_id", "days_to_flowering", "days_to_wilting", "rSeedWeight")]
  #small_fdrPheno <- na.omit(small_fdrPheno)
 uniq_ids <- unique(small_fdrPheno$real_id)
 
 
 days_to_flowering <- c()
 days_to_wilting <- c()
 rSeedWeight <- c()
 reps <- c()
 id <- uniq_ids[1]
 for (id in uniq_ids){
   tmp <- small_fdrPheno[small_fdrPheno$real_id==id,]
   reps <- c(reps, nrow(tmp))  # append so each accession keeps its own replicate count
   days_to_flowering <- c(days_to_flowering, mean(tmp$days_to_flowering, na.rm=T))
   days_to_wilting <- c(days_to_wilting, mean(tmp$days_to_wilting, na.rm=T))
   rSeedWeight <- c(rSeedWeight, mean(tmp$rSeedWeight, na.rm=T))
 }
 small_fdrPheno <- data.frame(id=uniq_ids, reps=reps,
                              days_to_flowering, days_to_wilting, rSeedWeight)

 
  ## Load Dittberner imputed data
  dittb_dC13 <- read.table(file="./data/atlas1001_phenotype_matrix_imputed_withID.csv",
                           header = T, sep=" ")
  dittb_dC13 <- dittb_dC13 %>% select(id, Delta_13C)

  sum(dittb_dC13$id %in% unique(founder_pheno$real_id))
  small_fdrPheno <- merge(small_fdrPheno, by.x="id", dittb_dC13, by.y="id")
  dim(small_fdrPheno)
  ## adjust all Delta_C13 values by col-0 value; except Col-0 not used in dittberner, so use a close-by central Germany accession
  small_fdrPheno$adj_Delta_13C <- small_fdrPheno$Delta_13C/abs(-37.21)
  head(small_fdrPheno)

  ### plots
 # small_fdrPheno<- small_fdrPheno[!is.na(small_fdrPheno$rSeedWeight), ]
 # small_fdrPheno_2 <- small_fdrPheno[, c("rSeedWeight", "Delta_13C")]
 # is.na(small_fdrPheno_2)
 #
  cor.test(small_fdrPheno$adj_Delta_13C, small_fdrPheno$rSeedWeight)
  cor.test(small_fdrPheno$adj_Delta_13C, small_fdrPheno$days_to_flowering)
  cor.test(small_fdrPheno$rSeedWeight, small_fdrPheno$days_to_flowering)
  
  ggplot(small_fdrPheno, aes(y=days_to_flowering, x=adj_Delta_13C, color = rSeedWeight)) +
    geom_point(aes(size = rSeedWeight)) +
      scale_color_distiller(name = "rSeedWeight", 
                        palette = "RdYlGn", 
                        direction = 1, 
                        limits = c(0.007, 0.066))
  
  hist(na.omit(rSeedWeight))
  
  range(dat$rSeedWeight)
  ggplot(dat, aes(y=days_to_flowering.x, x=adj_mean_dC13KO, color = rSeedWeight)) +
    geom_point(aes(size = rSeedWeight)) +
      scale_color_distiller(name = "rSeedWeight", 
                        palette = "RdYlGn", 
                        direction = 1, 
                        limits = c(0.003, 0.061))
  
  small_fdrPheno$group <- rep("wt", nrow(small_fdrPheno))
  dat$group <- rep("mutant", nrow(dat))
  
  df <- data.frame(id = c(small_fdrPheno$id, dat$id),
                   ft = c(small_fdrPheno$days_to_flowering, dat$days_to_flowering.x),
                   seed = c(small_fdrPheno$rSeedWeight, dat$rSeedWeight),
                   wue = c(small_fdrPheno$adj_Delta_13C, dat$adj_mean_dC13KO),
                   group = c(small_fdrPheno$group, dat$group))
  
  ggplot(df, aes(y=ft, x=wue, color=seed, size=seed)) +
    geom_point(aes(pch=group), alpha=0.7) +
    scale_color_distiller(name = "rSeedWeight", 
                        palette = "RdYlGn", 
                        direction = 1)+
    scale_shape_manual(values = c(18,16)) + xlim(-1.02, -0.90)+
    geom_smooth(data = subset(df, group == "wt"), aes(group = group), method = "lm", se = T, color = "gray70", linetype = "dashed", size = 0.5, alpha=0.1) +
  geom_smooth(data = subset(df, group == "mutant"), aes(group = group), method = "lm", se = T, color = "gray70", linetype = "dashed", size = 0.5, alpha=0.1)
  
  
  ## wt vs. mutant comparisons, stored separately so no result is overwritten
  wilcox_seed <- wilcox.test(small_fdrPheno$rSeedWeight, dat$rSeedWeight)
  wilcox_ft <- wilcox.test(small_fdrPheno$days_to_flowering, dat$days_to_flowering.x)
  wilcox_wue <- wilcox.test(small_fdrPheno$adj_Delta_13C, dat$adj_mean_dC13KO)
  df$group <- factor(df$group, levels = c("wt", "mutant"))
g1 <- ggplot(df, aes(x = group, y = wue, fill = group)) +
  geom_violin(trim = F, width = 0.5, adjust = 3) +  # Adjusting parameters for smoother violins
  geom_jitter(width = 0.1, alpha = 0.5) +
  scale_fill_manual(values = c("#66c2a5", "#bf812d")) +  # Custom fill colors for the groups
  theme_minimal()
g1
  g2 <- ggplot(df, aes(x = group, y = ft, fill = group)) +
  geom_violin(trim = F, width = 0.5, adjust = 3)+
  geom_jitter(width = 0.1, alpha = 0.5) +
  scale_fill_manual(values = c("#66c2a5", "#bf812d")) +  # Custom fill colors for the groups
  theme_minimal()
  
  plot_grid(g1,g2)
  
 seedWeight_dC13 <- ggplot(small_fdrPheno, aes(x=adj_Delta_13C, y=rSeedWeight)) + geom_point(col="#b30000", size=1) +
    geom_smooth(aes(y=rSeedWeight, x=adj_Delta_13C),method="lm", col="#b30000", linewidth=.2) +
    theme(legend.position="none") + xlab("") +ylab("seed weight (g)") +
    #ylim(0, 0.07) + xlim(-1,-0.85)  +
   geom_text(x=-.9, y=0.05, label="r2 = -0.41, p < 8.2e-3", size=4)
    #geom_text(x=-32, y=0.05, label="r2 = -0.41, p < 8.2e-3", size=4)
 pdf(file="figs/test_pointsplot.pdf")
  plot_grid(seedWeight_dC13)
  dev.off()

  ## empty plot
  seedWeight_dC13_pointsonly <- ggplot(small_fdrPheno, aes(x=adj_Delta_13C, y=rSeedWeight)) + geom_point(col="#b30000", size=.6) +
    geom_smooth(aes(y=rSeedWeight, x=adj_Delta_13C),method="lm", col="#b30000", linewidth=.2) +
    theme(legend.position="none") + xlab("") +ylab("seed weight (g)") +
    #ylim(0, 0.07) + xlim(-1,-0.85)  +
    theme(legend.position = "none",
        panel.grid = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.background = element_blank()) +
      theme_void() +
      theme(legend.position = "none")
  seedWeight_dC13_pointsonly

  png(file="figs/CRSPR_seedWeight_dC13_pointsonly.png", units="in", res=10000, width = 2, height = 2.1, bg="white")
  seedWeight_dC13_pointsonly
  dev.off()

  cor.test(small_fdrPheno$adj_Delta_13C, small_fdrPheno$days_to_flowering)
  ft_dC13 <- ggplot(small_fdrPheno, aes(x=adj_Delta_13C, y=days_to_flowering)) + geom_point(col="black", size=1) +
    geom_smooth(aes(y=days_to_flowering, x=adj_Delta_13C),method="lm", col="black", linewidth=.2) +
    theme(legend.position="none") + xlab("") +ylab("flowering time (days)") +
    ylim(0, 220) + #xlim(-1,-0.83) +
    geom_text(x=-0.89, y=50, label="r2 = 0.39, p < 1.1e-2", size=4)
    #geom_text(x=-32.5, y=50, label="r2 = 0.39, p < 1.1e-2", size=4)
  ft_dC13

  saveRDS(small_fdrPheno, file="./data/fdr_growthchamber_exp.rda")
  ### CRISPR lines FLC KO data
  #flc_droughtexp_data <- small_fdrPheno
  #saveRDS(flc_droughtexp_data, file="./data/flc_droughtexp_data.rda")
  flc_droughtexp_data <- readRDS(file="./data/flc_droughtexp_data.rda")

  head(flc_droughtexp_data)
  ## load imputed mutant dC13 data
  imp_dC13 <- readRDS(file="./data/imputed_mutant_dC13.rda")
  dim(imp_dC13)

  dat <- merge(flc_droughtexp_data, by.x="id", imp_dC13, by.y="id", all.x=T)
  head(dat)

  ## get col-0 mean mutant d_C13
  flcKo_dC13 <-  read.table(file="./Data_code_Ruffley_Lutz_et_al/deltaC_extended_group_rep3.csv", sep=",", header=T)
  flcKo_dC13_col <- flcKo_dC13 %>% filter(line=="COL")
flcKo_dC13_col$deltaC_mean_mean
  dat$adj_mean_dC13KO <- dat$mean_dC13_ko/abs(mean(flcKo_dC13_col$deltaC_mean_mean))

  cor.test(dat$adj_mean_dC13KO, dat$rSeedWeight)
   CSRP_seedWeight_dC13 <- ggplot(dat, aes(x=adj_mean_dC13KO, y=rSeedWeight.x)) + geom_point(col="#41ab5d", size=1) +
    geom_smooth(aes(y=rSeedWeight.x, x=adj_mean_dC13KO),method="lm", col="#41ab5d", linewidth=.2) +
    theme(legend.position="none") + xlab("") +ylab("seed weight (g)") +
    ylim(0, 0.07) + xlim(-1,-0.85) +
    geom_text(x=-.9, y=0.02, label="r2 = 0.62, p < 4.0e-6", size=4)
  CSRP_seedWeight_dC13

  cor.test(dat$adj_mean_dC13KO, dat$days_to_flowering.x)
  CSRP_ft_dC13 <- ggplot(dat, aes(x=adj_mean_dC13KO, y=days_to_flowering.x)) + geom_point(col="black", size=1) +
    geom_smooth(aes(y=days_to_flowering.x, x=adj_mean_dC13KO),method="lm", col="black", linewidth=.2) +
    theme(legend.position="none") + xlab("") +ylab("flowering time (days)") +
    ylim(0, 220) + xlim(-1,-0.85) +
    geom_text(x=-0.9, y=150, label="r2 = 0.45, p < 1.7e-3", size=4)
  CSRP_ft_dC13


  big_CRSPR_plot <- plot_grid(seedWeight_dC13, ft_dC13, CSRP_seedWeight_dC13, CSRP_ft_dC13, nrow=2, ncol=2)
  big_CRSPR_plot

  pdf(file="./figs/Crispr_4plot.pdf")
  big_CRSPR_plot
  dev.off()

  png(file="./figs/Crispr_4plot.png")
  big_CRSPR_plot
  dev.off()

  saveRDS(big_CRSPR_plot, file="./figs/tmpobjects/big_CRSPR_plot.rda")

}else{
  big_CRSPR_plot <- readRDS(file="./figs/tmpobjects/big_CRSPR_plot.rda")
  big_CRSPR_plot
}

```
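
The per-accession means in the chunk above are accumulated in an explicit loop. As a minimal sketch, the same summary can be written with base R `aggregate()`, assuming a data frame that carries the columns `real_id`, `days_to_flowering`, and `rSeedWeight` used above. The helper name `summarise_by_accession` and the toy values are hypothetical and are not part of the analysis.

```r
# Hypothetical helper (not from the original analysis): per-accession trait
# means and replicate counts, computed without an explicit loop.
summarise_by_accession <- function(dat) {
  mean_na <- function(x) mean(x, na.rm = TRUE)
  # Trait means per accession, ignoring missing values
  means <- aggregate(dat[, c("days_to_flowering", "rSeedWeight")],
                     by = list(id = dat$real_id), FUN = mean_na)
  # Number of rows per accession (rows with missing traits are still counted)
  counts <- aggregate(list(reps = dat$real_id),
                      by = list(id = dat$real_id), FUN = length)
  merge(counts, means, by = "id")
}

# Toy data, for illustration only
toy <- data.frame(
  real_id = c(1, 1, 2, 2, 2),
  days_to_flowering = c(20, 24, 60, NA, 64),
  rSeedWeight = c(0.03, 0.05, 0.01, 0.02, NA)
)
toy_summary <- summarise_by_accession(toy)
print(toy_summary)
```

Keeping the replicate count in the same table as the means makes it easy to check that every accession contributes the expected number of plants before it is merged with the Delta_13C data.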

#### WT/CRISPR Scatter Plots

```{r, echo=F, eval=F}

setwd("~/safedata/natvar/")

  ## Laura and Clara GC exp. data
  gc_exp_dat <-read.csv("./LauraAnalysis/field_prep_year1/crispr/growth chamber/wrangled_data_crispr.csv")
  head(gc_exp_dat)
  ## fitness from GC exp. 
  seedweights<-read.csv("./LauraAnalysis/field_prep_year1/crispr/growth chamber/seed_weights.csv")
  head(seedweights)

  ## merge exp. trait data with fitness
  df<-merge(gc_exp_dat,seedweights, by="NEW_ID") #deletes one individual... look into later
  

    df$weight_w_paper<-as.numeric(df$weight_w_paper)
    df$rSeedWeight <- df$weight_w_paper - 0.3992667
    df$founder <- as.character(df$founder)
    df$founder[df$founder==""] <- "c"
    
    wt <- df %>% filter(founder =="f")
    mutant <- df %>% filter(founder=="c")

    wt_opt <- wt %>% filter(drought=="optimal")
    #wt_late <- wt %>% filter(drought=="late") ## almost all NAs
    #wt_early <- wt %>% filter(drought=="early") ## almost all NAs
    mutant_opt <- mutant %>% filter(drought=="optimal")
    mutant_late <- mutant %>% filter(drought=="late")
    mutant_early <- mutant %>% filter(drought=="early")
    
    #wt$group_id <- as.factor(paste(wt$real_id, wt$drought, sep = "_"))

   wrangle_Expdata <- function(dat){   
      uniq_ids <- unique(dat$real_id)
      days_to_flowering <- c()
      dtf_sd <- c()
      days_to_wilting <- c()
      rSeedWeight <- c()
      sw_sd <- c()
      reps <- c()
      
      for (id in uniq_ids){
        tmp <- dat[dat$real_id==id,]
        tmp <- tmp[!is.na(tmp$rSeedWeight),]
        reps <- c(reps, nrow(tmp))
        
        if (nrow(tmp)==1){
          days_to_flowering <- c(days_to_flowering, tmp$days_to_flowering)
          dtf_sd <- c(dtf_sd, NA)
          days_to_wilting <- c(days_to_wilting, tmp$days_to_wilting)
          rSeedWeight <- c(rSeedWeight, tmp$rSeedWeight)
          sw_sd <- c(sw_sd, NA)
        }else{
          days_to_flowering <- c(days_to_flowering, mean(tmp$days_to_flowering, na.rm=T))
          dtf_sd <- c(dtf_sd, sd(tmp$days_to_flowering, na.rm=T))
          days_to_wilting <- c(days_to_wilting, mean(tmp$days_to_wilting, na.rm=T))
          rSeedWeight <- c(rSeedWeight, mean(tmp$rSeedWeight, na.rm=T))
          sw_sd <- c(sw_sd, sd(tmp$rSeedWeight))
        }
       }
      output <- data.frame(id=uniq_ids, reps=reps,
                             ft= days_to_flowering,ft_sd=dtf_sd, fit=rSeedWeight, sw_sd=sw_sd)
    return(output)
    }
  
 wt <- wrangle_Expdata(wt_opt)  
 mutant_opt <-  wrangle_Expdata(mutant_opt) 
 mutant_late <-  wrangle_Expdata(mutant_late) 
 mutant_early <-  wrangle_Expdata(mutant_early) 

 imp_dC13 <- readRDS(file="./data/imputed_mutant_dC13.rda")
 head(imp_dC13)
 dim(imp_dC13)
 
 ## start with optimal bc the most survived
 mutant_opt$group<- "mut"
 wt$group <- "wt"
 mutant_late$group <- "mut_late"
 mutant_early$group <- "mut_early"
 
 wt_dc13 <- merge(wt, by.x="id", imp_dC13[,c(1,2)], by.y="id", all.x=T )
 mut_dc13 <- merge(mutant_opt, by.x="id", imp_dC13[,c(1,3)], by.y="id", all.x=T )
 mut_late_dc13 <- merge(mutant_late, by.x="id", imp_dC13[,c(1,3)], by.y="id", all.x=T )
 mut_early_dc13 <- merge(mutant_early, by.x="id", imp_dC13[,c(1,3)], by.y="id", all.x=T )
 
 colnames(wt_dc13)[8] <- "wue"
 colnames(mut_dc13)[8] <- "wue"
 colnames(mut_late_dc13)[8] <- "wue"
 colnames(mut_early_dc13)[8] <- "wue"
 
 wt_dc13$adj_dc13 <-  wt_dc13$wue/abs(-37.21) ## close relative of col-0, a german wt accession
 mut_dc13$adj_dc13 <- mut_dc13$wue/abs(-31.14) ## d_c13 of col-0 by Uli
 mut_late_dc13$adj_dc13 <- mut_late_dc13$wue/abs(-31.14) ## d_c13 of col-0 by Uli
 mut_early_dc13$adj_dc13 <- mut_early_dc13$wue/abs(-31.14) ## d_c13 of col-0 by Uli
 
 dat<- rbind(wt_dc13, mut_dc13)
 head(dat)
 
 all_muts<-rbind(wt_dc13, mut_dc13, mut_late_dc13, mut_early_dc13)  
 

 # ft_uli <-  read.table(file="./Data_code_Ruffley_Lutz_et_al/lines_info_annot.csv", sep=",", header=T) %>% select(mut_line_ID, FT16_1001)
 # dat2 <- merge(dat, by.x="line", ft_uli, by.y="mut_line_ID")
 # 
 unique(dat$group)
 normalize.<-function(x) (x-min(x,na.rm=T)) / (max(x,na.rm=T)-min(x,na.rm=T))
 dat$norm_fit<- normalize.(dat$fit)
 dat$group
 
unique(all_muts$group)
all_muts$norm_fit<- normalize.(all_muts$fit)

library(randomForest)
#wt_dc13[,c(3,4,5,6,8,9)] <- na.roughfix(wt_dc13[,c(3,4,5,6,8,9)])
cor.test(wt_dc13$fit, wt_dc13$wue)
cor.test(mut_dc13$fit, mut_dc13$adj_dc13)


cor.test(mut_early_dc13$ft, mut_early_dc13$adj_dc13)
cor.test(mut_late_dc13$ft, mut_late_dc13$adj_dc13)


### look at all mutants (early, opt, and late drought) and wt
 ft_wue_allTreatments_Plot <- ggplot(all_muts, aes(y=ft, x=adj_dc13, color=group)) +
  geom_point(stroke=3, alpha=0.7, pch=16) + xlim(-1.03,-0.89) +
   scale_size_continuous(range = c(0.5, 6)) +
   #scale_fill_manual(values = c("wt"= "#4E9D79",  "mut"= "#CE6E2D", "mut_late"="#7f2704" , "mut_early"= "#fdae6b")) +
   scale_color_manual(values = c("wt"= "#4E9D79",  "mut"= "#CE6E2D", "mut_late"="#7f2704" , "mut_early"= "#fdae6b")) +
      theme(panel.grid.major = element_line(color = "gray90"),  # Light major gridlines
        panel.grid.minor = element_blank()) +
       geom_smooth(data = subset(all_muts, group == "wt"), aes(y=ft, x=adj_dc13,group = group),     method = "lm", se = T, color = "#4E9D79", linetype = "solid", size = 1, alpha=0.2,     inherit.aes = F) + 
       geom_smooth(data = subset(all_muts, group == "mut"), aes(y=ft, x=adj_dc13,group = group),    method = "lm", se = T, color = "#CE6E2D", linetype = "solid", size = 1,alpha=0.2,     inherit.aes = F) +
   geom_smooth(data = subset(all_muts, group == "mut_early"), aes(y=ft, x=adj_dc13,group = group),    method = "lm", se = T, color = "#fdae6b", linetype = "solid", size =1,alpha=0.2,     inherit.aes = F) +
   geom_smooth(data = subset(all_muts, group == "mut_late"), aes(y=ft, x=adj_dc13,group = group),    method = "lm", se = T, color = "#7f2704", linetype = "solid", size = 1,alpha=0.2,     inherit.aes = F) 
       # geom_errorbar(data = dat2, 
       #          aes(x = adj_dc13, ymin = ft - ft_sd, ymax = ft + ft_sd, color=group), 
       #          width = 0, 
       #          inherit.aes = FALSE) 
 ft_wue_allTreatments_Plot
 ## let's try keeping the color as in the other plot, but now looking at the relationship with fitness
 pdf(file="./figs/ft_wue_allTreatments_Plot.pdf", height = 5.5, width = 7)
  ft_wue_allTreatments_Plot
  dev.off()
 
  
  summary(lm(wt_dc13$fit~ wt_dc13$adj_dc13))
cor.test(normalize.(mut_dc13$fit), mut_dc13$adj_dc13)

  summary(lm(wt_dc13$fit~ wt_dc13$adj_dc13))
  cor.test(wt_dc13$fit, wt_dc13$ft)
  
cor.test(normalize.(mut_dc13$fit), mut_dc13$ft)

# cor.test(normalize.(mut_early_dc13$fit), mut_early_dc13$adj_dc13)
# cor.test(normalize.(mut_late_dc13$fit), mut_late_dc13$adj_dc13)
  
all_mut_fitness_scatter_plot<- ggplot(all_muts, aes(y=norm_fit, x=adj_dc13, color=group)) +
     geom_point(pch=16, size=3.5, alpha=0.65) +  # size=2.5, stroke=1
    xlim(-1.03,-0.89) +
    theme(panel.grid.major = element_line(color = "gray90"),  # Light major gridlines
        panel.grid.minor = element_blank()) +
    scale_color_manual(values = c("wt"= "#4E9D79",  "mut"= "#CE6E2D", "mut_late"="#7f2704" , "mut_early"= "#fdae6b")) +
    geom_smooth(data = subset(all_muts, group == "wt"), aes(y=norm_fit, x=adj_dc13,group = group),     method = "lm", se = T, color = "#4E9D79", linetype = "solid", size = 1, alpha=0.2,     inherit.aes = F) + 
       geom_smooth(data = subset(all_muts, group == "mut"), aes(y=norm_fit, x=adj_dc13,group = group),    method = "lm", se = T, color = "#CE6E2D", linetype = "solid", size = 1,alpha=0.2,     inherit.aes = F) +
   geom_smooth(data = subset(all_muts, group == "mut_early"), aes(y=norm_fit, x=adj_dc13,group = group),    method = "lm", se = T, color = "#fdae6b", linetype = "solid", size =1,alpha=0.2,     inherit.aes = F) +
   geom_smooth(data = subset(all_muts, group == "mut_late"), aes(y=norm_fit, x=adj_dc13,group = group),    method = "lm", se = T, color = "#7f2704", linetype = "solid", size = 1,alpha=0.2,     inherit.aes = F) 
    all_mut_fitness_scatter_plot
    
  pdf(file="./figs/all_mut_fitness_scatter_plot.pdf", height = 4.5, width = 6)
  all_mut_fitness_scatter_plot
  dev.off()
    
   # geom_errorbar(data = all_muts, 
   #              aes(x = adj_dc13, ymin = norm_fit - sw_sd, ymax = norm_fit + sw_sd, color=group), 
   #              width = 0, 
   #              inherit.aes = FALSE) 
   # 
  
  
 dat2 <- dat
 fitness_scatter_plot<- ggplot(dat2, aes(y=norm_fit, x=adj_dc13, fill=group, color=group)) +
     geom_point(pch=21, size=3, stroke=1) +  # size=2.5, stroke=1
    xlim(-1.03,-0.89) +
    theme(panel.grid.major = element_line(color = "gray90"),  # Light major gridlines
        panel.grid.minor = element_blank()) +
    scale_fill_manual(values = c("wt"= "#4E9D79",  "mut"= "#CE6E2D")) +
    scale_color_manual(values = c("wt"= "#4E9D79",  "mut"= "#CE6E2D")) +
    geom_smooth(data = subset(dat2, group == "wt"), aes(y=norm_fit, x=adj_dc13,group = group), method = "lm", se = T, color = "#4E9D79", linetype = "solid", size = 1, alpha=0.2, inherit.aes = F) + 
   geom_smooth(data = subset(dat2, group == "mut"), aes(y=norm_fit, x=adj_dc13,group = group), method = "lm", se = T, color = "#CE6E2D", linetype = "solid", size = 1,alpha=0.2, inherit.aes = F) +
   geom_errorbar(data = dat2, 
                aes(x = adj_dc13, ymin = norm_fit - sw_sd, ymax = norm_fit + sw_sd, color=group), 
                width = 0, 
                inherit.aes = FALSE) 
fitness_scatter_plot


   fitness_scatter_plot
  pdf(file="./figs/fitness_scatter_plot.pdf", height = 5.5, width = 7)
  fitness_scatter_plot
  dev.off()
   
   
  pdf(file="./figs/double_scatter_crisprexp.pdf", height = 8, width = 6)
  plot_grid(scat_plot, fitness_scatter_plot, ncol=1)
  dev.off()
  
   
  head(dat2)
   ft_scatter_plot<- ggplot(dat2, aes(y=ft, x=adj_dc13, fill=group, color=group)) +
   geom_point(pch=21, size=3, stroke=1) +
    xlim(-1.03,-0.89)+
    theme(panel.grid.major = element_line(color = "gray90"),  # Light major gridlines
        panel.grid.minor = element_blank()) +
    scale_fill_manual(values = c("wt"= "#4E9D79",  "mut"= "#CE6E2D")) +
    scale_color_manual(values = c("wt"= "#4E9D79",  "mut"= "#CE6E2D")) +
    geom_smooth(data = subset(dat2, group == "wt"), aes(y=ft, x=adj_dc13,group = group), method = "lm", se = T, color = "#4E9D79", linetype = "solid", size = 1, alpha=0.2, inherit.aes = F) + 
   geom_smooth(data = subset(dat2, group == "mut"), aes(y=ft, x=adj_dc13,group = group), method = "lm", se = T, color = "#CE6E2D", linetype = "solid", size = 1,alpha=0.2, inherit.aes = F) 
   # geom_errorbar(data = dat2, 
   #              aes(x = adj_dc13, ymin = ft - ft_sd, ymax = ft + ft_sd, color=group), 
   #              width = 0,
   #              inherit.aes = FALSE) 
   
   ft_scatter_plot
   
   dat2 <- dat
     dat2$group <- factor(dat2$group, levels = c("wt", "mut"))
  fitness_hist <- ggplot(dat2, aes(y = norm_fit, x=group, fill=group, alpha=0.8)) +
   geom_violin(trim = F, width = 1, adjust = 3) +  # Adjusting parameters for smoother violins
   geom_jitter(width = 0.2, alpha = 0.6) +
   scale_fill_manual(values = c("wt"= "#4E9D79", "mut" = "#CE6E2D")) +
   scale_color_manual(values = c("wt"= "#4E9D79", "mut" = "#CE6E2D")) 
 fitness_hist
  
  pdf(file="./figs/fitness_hist_plot.pdf", height = 4.5, width = 3)
 fitness_hist
 dev.off()
 
   
   pdf(file="./figs/ftVwue_scatter_crisprexp.pdf", height = 5.5, width = 7)
   ft_scatter_plot
   dev.off()
  
   pdf(file="./figs/4scatterplot_flc_plots.pdf", height = 9, width = 10.5)
   plot_grid(ft_wue_fitnessAsPoints, fitness_scatter_plot, ft_scatter_plot, scat_plot)
   dev.off()
  
####============================================###
  
 # Uli data
 flcKo_dC13 <-  read.table(file="./Data_code_Ruffley_Lutz_et_al/deltaC_extended_group_rep3.csv", sep=",", header=T)
 head(flcKo_dC13)

 mutant_uli <- flcKo_dC13 %>% filter(startsWith(.[[1]], "ID"))
 mutant_uli$group <- "mutant"
 mutant_uli$group[mutant_uli$dag_mean<=18] <- "mt18"
 wt_uli <-  flcKo_dC13 %>% filter(startsWith(.[[1]], "WT"))
 wt_uli$group <- "wt"
 wt_uli$group[wt_uli$dag_mean>102] <- "wt100"
 wt_uli$border <- wt_uli$color
 col_uli <- flcKo_dC13 %>% filter(line=="COL")
 col_uli$group <- "col"
 
 
 colors <- c("col" = "black","wt"= "#4E9D79", "wt100"= "#2DCECE", "mutant" = "#CE6E2D",  "mt18"= "#E5C71A")
 borders <- c("col" = "black", "wt"= "#4E9D79", "wt100"= "#4E9D79", "mutant" = "#CE6E2D", "mt18"=  "#CE6E2D" )
 
 dat <- rbind(mutant_uli, wt_uli, col_uli)
 
 # ft_uli <-  read.table(file="./Data_code_Ruffley_Lutz_et_al/lines_info_annot.csv", sep=",", header=T) %>% select(mut_line_ID, FT16_1001)
 # dat2 <- merge(dat, by.x="line", ft_uli, by.y="mut_line_ID")
 # 
 unique(dat$group)

 ## scatter plot
 scat_plot <-ggplot(dat, aes(x=deltaC_mean_mean, y=dag_mean, fill=group, color=group)) +
   geom_point(pch=21, size=3, stroke=1)+
  scale_fill_manual(values = colors) +
   scale_color_manual(values = borders) +
   xlim(-32, -29) + ylim(0,150) +
   theme(panel.grid.major = element_line(color = "gray90"),  # Light major gridlines
        panel.grid.minor = element_blank()) + # No minor gridlines 
 geom_smooth(data = subset(dat, group %in% c("wt", "col")), aes(x=deltaC_mean_mean, y=dag_mean),na.rm=T, method = "lm", se = T, color = "#4E9D79", linetype = "solid", size = 1, alpha=0.2, inherit.aes = F)+
   geom_smooth(data = subset(dat, group %in% c("mutant", "mt18")), aes(x=deltaC_mean_mean, y=dag_mean),na.rm=T, method = "lm", se = T, color = "#CE6E2D", linetype = "solid", size = 1, alpha=0.2, inherit.aes = F) +
   geom_errorbar(data = dat, 
                aes(x = deltaC_mean_mean, ymin = dag_mean - dag_std, ymax = dag_mean + dag_std, color=group), 
                width = 0, 
                inherit.aes = FALSE) +
    geom_errorbar(data = dat, 
                aes(y = dag_mean, xmin = deltaC_mean_mean - deltaC_mean_std, xmax = deltaC_mean_mean + deltaC_mean_std, color=group), 
                width = 0, 
                inherit.aes = FALSE)
 scat_plot
 
 pdf(file="./figs/Uli-exp-dc13_ft_scatter_plot.pdf", height = 5.5, width = 7)
 scat_plot
 dev.off()
 
 dat$label
 dat_subwt <- subset(dat, group %in% c("wt", "col", "wt100"))
 dat_subwt$id <- "W"
 dat_subwt$adj_dc13 <-  dat_subwt$deltaC_mean_mean/abs(-37.21)
 
 dat_submt <- subset(dat, group %in% c("mutant", "mt18"))
 dat_submt$id <- "M"
 dat_submt$adj_dc13 <-  dat_submt$deltaC_mean_mean/abs(-31.14)
 
 dat_subwt <- as.data.frame(dat_subwt)
 dat_submt <- as.data.frame(dat_submt)
 
  model_wt <- lm(dag_mean ~ deltaC_mean_mean, data = dat_subwt)
  model_mt <- lm(dag_mean ~ deltaC_mean_mean , data = dat_submt)

  summary(model_wt)
  summary(model_mt)
  
  combined_model<- lm(dag_mean ~ deltaC_mean_mean * id, data = rbind(dat_subwt, dat_submt))
  interaction_test <- summary(combined_model)

  anova_test <- anova(combined_model)
  anova_test

  get_slope <- function(data) {
  model <- lm(dag_mean ~ deltaC_mean_mean, data = data)
  return(coef(model)[2])  # Extract the coefficient of deltaC_mean_mean (slope)
}

# Perform bootstrap resampling for each group
set.seed(123)  # For reproducibility
n_bootstraps <- 100  # Number of bootstrap samples
slopes_group1 <- replicate(n_bootstraps, get_slope(dat_subwt[sample(x=seq(1, nrow(dat_subwt)), size =50, replace = TRUE),]))
slopes_group2 <- replicate(n_bootstraps, get_slope(dat_submt[sample(x=seq(1, nrow(dat_submt)), size =50, replace = TRUE),]))

# Calculate the difference in slopes
slope_diff <- slopes_group1 - slopes_group2

obs <- 10.44-4.5
hist(slope_diff)
abline(v=obs, col="red")  # base graphics layers are separate calls, not chained with `+`
  
  # Calculate coefficients of variation
cv_wt <- sd(dat_subwt$dag_mean) / mean(dat_subwt$dag_mean)
cv_mt <- sd(dat_submt$dag_mean) / mean(dat_submt$dag_mean)

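# t.test() needs more than one observation per group, and cv_wt and cv_mt are
# single values, so this call fails as written; kept for reference only.
# t_test_result <- t.test(cv_wt, cv_mt)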
  

 cor.test(dat_subwt$dag_mean, dat_subwt$deltaC_mean_mean)
 cor.test(dat_subwt$dag_mean, dat_subwt$adj_dc13)
 
 
 cor.test(dat_submt$dag_mean, dat_submt$adj_dc13)

 dat3 <- dat %>% filter(line!="COL")
 
 dat3$label <- factor(dat3$label, levels = c("mut", "wt"))
wue_hist <- ggplot(dat3, aes(x = deltaC_mean_mean, y=label, fill=label, alpha=0.8)) +
   geom_violin(trim = F, width = 0.5, adjust = 3) +  # Adjusting parameters for smoother violins
   geom_jitter(width = 0.2, alpha = 0.6)+
   scale_fill_manual(values = c("wt"= "#4E9D79", "mut" = "#CE6E2D")) +
   scale_color_manual(values = c("wt"= "#4E9D79", "mut" = "#CE6E2D")) 
 wue_hist
 
  pdf(file="./figs/Uli-exp-dc13_hist_plot.pdf", height = 2, width = 4.5)
 wue_hist
 dev.off()
 
  dat3$label <- factor(dat3$label, levels = c("wt", "mut"))
ft_hist <- ggplot(dat3, aes(y = dag_mean, x=label, fill=label, alpha=0.8)) +
   geom_violin(trim = F, width = 1.5, adjust = 3) +  # Adjusting parameters for smoother violins
   geom_jitter(width = 0.2, alpha = 0.6) +
   scale_fill_manual(values = c("wt"= "#4E9D79", "mut" = "#CE6E2D")) +
   scale_color_manual(values = c("wt"= "#4E9D79", "mut" = "#CE6E2D")) 
 ft_hist
  
  pdf(file="./figs/Uli-exp-ft_hist_plot.pdf", height = 4.5, width = 3)
 ft_hist
 dev.off()
 
 
 ## load imputed mutant dC13 data
 imp_dC13 <- readRDS(file="./data/imputed_mutant_dC13.rda")
 dim(imp_dC13)

 
```
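
The slope comparison in the chunk above resamples a fixed 50 rows per group and compares the result with a hard-coded observed difference. As a hedged sketch, one alternative is a percentile bootstrap that resamples each group at its own size and computes the observed difference from the same data. Everything here is an assumption for illustration: the data are simulated, and the names `slope`, `boot_slope_diff`, `x`, and `y` do not appear in the analysis.

```r
set.seed(1)  # for reproducibility of the simulated example

# Slope of y on x from a simple linear regression
slope <- function(d) unname(coef(lm(y ~ x, data = d))[2])

# Hypothetical helper: bootstrap distribution of the slope difference,
# resampling rows with replacement within each group at its own sample size
boot_slope_diff <- function(g1, g2, n_boot = 1000) {
  replicate(n_boot, {
    b1 <- g1[sample(nrow(g1), replace = TRUE), ]
    b2 <- g2[sample(nrow(g2), replace = TRUE), ]
    slope(b1) - slope(b2)
  })
}

# Simulated groups (NOT the experimental data), true slopes 10 and 4
x1 <- runif(40)
x2 <- runif(40)
g1 <- data.frame(x = x1, y = 10 * x1 + rnorm(40))
g2 <- data.frame(x = x2, y = 4 * x2 + rnorm(40))

obs_diff <- slope(g1) - slope(g2)         # observed difference in slopes
boot_diff <- boot_slope_diff(g1, g2)      # bootstrap distribution
ci <- quantile(boot_diff, c(0.025, 0.975))  # 95% percentile interval
print(obs_diff)
print(ci)
```

If the percentile interval excludes zero, the two slopes differ at roughly the 5% level under this resampling scheme; computing `obs_diff` from the data avoids retyping coefficients by hand.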

#### KO delta_C13 data

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=8, fig.cap="Delta_C13 correlations across mutants and wt from Uli, and raw and imputed Dittberner data."}
RERUN=F
if(RERUN){

  flc_droughtexp_data <- readRDS(file="./data/flc_droughtexp_data.rda")
  dim(flc_droughtexp_data)
  hist(flc_droughtexp_data$days_to_flowering)

  ## Load Uli's data
  #flcKo_dC13 <-  read.table(file="./data/deltaC_extended_rep3.csv", sep=",", header=T)
  flcKo_dC13 <-  read.table(file="./Data_code_Ruffley_Lutz_et_al/deltaC_extended_group_rep3.csv", sep=",", header=T)
  head(flcKo_dC13)
  dim(flcKo_dC13)
  
  extract_numbers <- function(text) {
  # Extract numbers using regular expression
  numbers <- regmatches(text, regexpr("ID(\\d{4})-", text))
  # Convert to numeric and return
  as.numeric(sub("ID", "", sub("-", "", numbers)))
}
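  flcKo_dC13_col <- flcKo_dC13 %>% filter(label=="COL")

  flcKo_dC13_mut <- flcKo_dC13 %>% filter(label=="mut")

  ## accession ID = first "-"-separated field of the line name (the "ID" prefix is stripped below)
  ## alternative (numeric IDs): c(extract_numbers(flcKo_dC13_mut$line), 7706)
  flcKo_dC13_mut$accession <- sapply(strsplit(as.character(flcKo_dC13_mut$line), split = "-"), `[`, 1)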
  
  mean_dC13_ko <- c()
  k<-4807
  for (k in unique(flcKo_dC13_mut$accession)){
    tmp <- flcKo_dC13_mut[flcKo_dC13_mut$accession==k, ]
    #print(mean(tmp$deltaC_mean_mean, na.rm=T))
    mean_dC13_ko <- c(mean_dC13_ko, mean(tmp$deltaC_mean_mean, na.rm=T))
  }
  dC13_mut <- data.frame(id = unique(flcKo_dC13_mut$accession),
                         mean_dC13_ko)
  dC13_mut$id <- gsub("ID", "", dC13_mut$id)
  dC13_mut$id[29] <- 7701  ## i THINK this one is 7701, but need to double check

  dC13_mut


  flcKo_dC13_wt <-flcKo_dC13 %>% filter(type=="WT")
  mean_dC13_wt <- c()
  for (k in unique(flcKo_dC13_wt$accession)){
    tmp <- flcKo_dC13_wt[flcKo_dC13_wt$accession==k, ]
    mean_dC13_wt <- c(mean_dC13_wt, mean(tmp$deltaC_mean))
  }
  dC13_wt <- data.frame(id = unique(flcKo_dC13_wt$accession),
                         mean_dC13_wt)
  dC13_wt$id <- gsub("ID", "", dC13_wt$id)
  dC13_wt

  ## how do Uli's ko and wt correlate?
  uli_dC13 <- merge(dC13_mut, dC13_wt, by="id")
  Uli_dC13_corplot <- ggplot(uli_dC13) + geom_point(aes(x=mean_dC13_wt, y=mean_dC13_ko)) +
    xlab("dC13 WT") + ylab("dC13 mut") +
    geom_smooth(method="lm", aes(x=mean_dC13_wt, y=mean_dC13_ko), col="black")+
    geom_text(x=-30.4, y=-30, label="r2 = 0.77, p < 0.0002")
  Uli_dC13_corplot
  cor.test(uli_dC13$mean_dC13_ko, uli_dC13$mean_dC13_wt)

  ## how do Uli mutants correlate with raw dittberner?
  dittb_dC13_raw <- read.table(file="./data/atlas1001_phenotypes_matrix_MR.csv",
                           header = T, sep=",")
  dittb_dC13_raw <- dittb_dC13_raw %>% select(id, Delta_13C)
  head(dittb_dC13_raw)
  dittb_dC13_raw %>%  filter(id==6909)

  dittb_raw_Uli_mut <- merge(dittb_dC13_raw, dC13_mut, by="id")
  ditRaw_Uli_dC13_mut_corplot <- ggplot(dittb_raw_Uli_mut) +
    geom_point(aes(x=Delta_13C, y=mean_dC13_ko)) +
    xlab("dC13 Dittberner raw") + ylab("dC13 flc mut") +
    geom_smooth(method="lm", aes(x=Delta_13C, y=mean_dC13_ko), col="black")+
    geom_text(x=-35, y=-29.5, label="r2 = 0.89, p < 0.007")
  ditRaw_Uli_dC13_mut_corplot
  cor.test(dittb_raw_Uli_mut$Delta_13C, dittb_raw_Uli_mut$mean_dC13_ko)

  ## how do Uli wt correlate with raw dittberner?
  dittb_raw_Uli_wt <- merge(dittb_dC13_raw, dC13_wt, by="id")
  ditRaw_Uli_dC13_wt_corplot <- ggplot(dittb_raw_Uli_wt) +
    geom_point(aes(x=Delta_13C, y=mean_dC13_wt)) +
    xlab("dC13 Dittberner raw") + ylab("dC13 WT") +
    geom_smooth(method="lm", aes(x=Delta_13C, y=mean_dC13_wt), col="black")+
    geom_text(x=-34, y=-29, label="r2 = 0.69, p < 0.2")
  ditRaw_Uli_dC13_wt_corplot
  cor.test(dittb_raw_Uli_wt$Delta_13C, dittb_raw_Uli_wt$mean_dC13_wt)


  ## how do Uli mutants correlate with imputed data
  dittb_dC13_imp <- read.table(file="./data/atlas1001_phenotype_matrix_imputed_withID.csv",
                           header = T, sep=",")
  head(dittb_dC13_imp)
  dittb_dC13_imp <- dittb_dC13_imp %>% select(id, Delta_13C)
  dittb_imp_Uli_mut <- merge(dittb_dC13_imp, dC13_mut, by="id")

  dittb_imp_Uli_mut_corplot <- ggplot(dittb_imp_Uli_mut) +
    geom_point(aes(x=Delta_13C, y=mean_dC13_ko)) +
    xlab("dC13 Dittberner imputed") + ylab("dC13 flc mutant") +
    geom_smooth(method="lm", aes(x=Delta_13C, y=mean_dC13_ko), col="black")+
    geom_text(x=-35, y=-29.5, label="r2 = 0.71, p < 1.5e-5")
  dittb_imp_Uli_mut_corplot
  cor.test(dittb_imp_Uli_mut$Delta_13C, dittb_imp_Uli_mut$mean_dC13_ko)

  
  plot_grid(ditRaw_Uli_dC13_mut_corplot, dittb_imp_Uli_mut_corplot)
  
  dC13_corplots <- plot_grid(Uli_dC13_corplot, ditRaw_Uli_dC13_mut_corplot,
                             ditRaw_Uli_dC13_wt_corplot, dittb_imp_Uli_mut_corplot,
                             nrow=2, ncol=2)
  saveRDS(dC13_corplots, file="./figs/tmpobjects/mut_wt_dC13_corplots.rda")
dC13_corplots_1 <- readRDS(file="./figs/tmpobjects/mut_wt_dC13_corplots.rda")

  dim(dittb_dC13_raw)
  dittb_dC13_raw <- na.omit(dittb_dC13_raw)

  dittb_raw_Uli_mut <- merge(dittb_dC13_raw, dC13_mut, by="id", all.x=T, all.y=T)

  dittb_raw_Uli_mut <- merge(dittb_raw_Uli_mut, flc_droughtexp_data, by="id",all.x=T, all.y=T)
  head(dittb_raw_Uli_mut)
  dim(dittb_raw_Uli_mut)
  dittb_raw_Uli_mut<-dittb_raw_Uli_mut[,-7]

  ## Impute missing mutant dC13 data using raw dittberner data and mutant ft and days to wilting
  require(missForest)
  ## with seed weight
  dimp_withSeed <- missForest(dittb_raw_Uli_mut[,-1], variablewise=T)
  id <- dittb_raw_Uli_mut[,1]
  imp_dC13_withSeeds <- cbind(id, dimp_withSeed$ximp)

  ## without seed weight
  dimp_woSeed <- missForest(dittb_raw_Uli_mut[,-c(1,6)], variablewise=T)
  id <- dittb_raw_Uli_mut[,1]
  imp_dC13 <- cbind(id, dimp_woSeed$ximp)

  cor.test(imp_dC13$Delta_13C.x, imp_dC13$mean_dC13_ko)
  ## correlation is 0.93, p < 2.2e-16
  imputed_dC13_plot<- ggplot(imp_dC13) + geom_point(aes(x=Delta_13C.x, y=mean_dC13_ko)) +
    ylab("dC13 mut imputed") + xlab("dC13 dittberner") +
    geom_smooth(method="lm", aes(x=Delta_13C.x, y=mean_dC13_ko), col="black") +
    geom_text(x=-35.8, y=-29.5, label="r2 = 0.87, p < 2.2e-16")

  saveRDS(imp_dC13_withSeeds, file="./data/imputed_mutant_dC13_withSeeds.rda")
  saveRDS(imp_dC13, file="./data/imputed_mutant_dC13.rda")

  dC13_corplots <- plot_grid(Uli_dC13_corplot, ditRaw_Uli_dC13_mut_corplot,
                             ditRaw_Uli_dC13_wt_corplot, dittb_imp_Uli_mut_corplot,
                             imputed_dC13_plot, nrow=3, ncol=2)
  saveRDS(dC13_corplots, file="./figs/tmpobjects/mut_wt_dC13_corplots.rda")


}else{
  library(ggplot2)
  library(scales)
  dC13_corplots <- readRDS(file="./figs/tmpobjects/mut_wt_dC13_corplots.rda")
  ## keep the plot as the last expression so it is the visible value of the if/else block
  dC13_corplots
  
}

```
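
The correlation label in the imputed dC13 panel above is typed by hand. As a minimal sketch, the same label could be derived from the `cor.test()` result so that it cannot drift from the data. The data frame `toy` and its values are hypothetical stand-ins for `imp_dC13`; only the column names are taken from the chunk above.

```r
# Toy data standing in for the Dittberner and imputed mutant dC13 columns
# (hypothetical values, not the real measurements)
set.seed(1)
toy <- data.frame(Delta_13C.x = rnorm(50, mean = -33, sd = 1))
toy$mean_dC13_ko <- toy$Delta_13C.x + rnorm(50, sd = 0.3)

ct <- cor.test(toy$Delta_13C.x, toy$mean_dC13_ko)  # Pearson correlation by default
r  <- unname(ct$estimate)
r2 <- r^2

# format.pval() returns "< eps" for p-values below eps; otherwise prepend "="
p_txt <- format.pval(ct$p.value, digits = 2, eps = 2.2e-16)
if (!startsWith(p_txt, "<")) p_txt <- paste("=", p_txt)

cor_label <- sprintf("r = %.2f, r2 = %.2f, p %s", r, r2, p_txt)
cor_label
```

The resulting string could then be passed as the `label` of the text annotation instead of a hard-coded value.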

#### GWA with knock-out dC13 data
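
The GWA in this section takes its phenotypes from a PLINK `.fam` file, where missing values are coded as `-9`. The sketch below illustrates, on toy data, two details that matter when phenotypes are merged into a `.fam` table: `merge()` sorts rows by the key, so the original sample order has to be restored to stay aligned with the `.bed` file, and missing phenotypes need the `-9` code. The accession ids and phenotype values are hypothetical.

```r
# Toy .fam table (six standard PLINK columns, no header); ids deliberately unsorted
fam <- data.frame(V1 = c(9003, 9001, 9002), V2 = c(9003, 9001, 9002),
                  V3 = 0, V4 = 0, V5 = 0, V6 = -9)

# Toy phenotype table with one accession missing (hypothetical values)
pheno <- data.frame(id = c(9001, 9003), mean_dC13_ko = c(-33.1, -34.2))

# Left join keeps every sample in the .fam file
new_fam <- merge(fam, pheno, by.x = "V1", by.y = "id", all.x = TRUE)

# merge() sorts by the key, so restore the original .fam order,
# which has to match the sample order in the .bed file
new_fam <- new_fam[match(fam$V1, new_fam$V1), ]

# Replace the old phenotype column and code missing values as -9
new_fam <- new_fam[, c("V1", "V2", "V3", "V4", "V5", "mean_dC13_ko")]
new_fam$mean_dC13_ko[is.na(new_fam$mean_dC13_ko)] <- -9
new_fam
```

The `match()` step assumes that sample ids are unique in both tables.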

```{r, echo=F, eval=T, message=F, warning=F, fig.width=8, fig.height=8, fig.cap="Manhattan plot of the GWA on knock-out Delta_C13, with SNPs above the significance cutoff colored by the sign of their seed weight effect."}
RERUN=F
if(RERUN){
  imp_dC13 <- readRDS(file="./data/imputed_mutant_dC13.rda")
  head(imp_dC13)
  dim(imp_dC13)

  ###imp_dC13_withSeeds <- readRDS( file="./data/imputed_mutant_dC13_withSeeds.rda")

  ## load lines used in drought exp. to filter to them only
  flc_droughtexp_data <- readRDS(file="./data/flc_droughtexp_data.rda")
  head(flc_droughtexp_data)

  imp_dC13 <- merge(flc_droughtexp_data, imp_dC13, by="id")
  imp_dC13 <- imp_dC13[,c(1,4, 7,8)]
  head(imp_dC13)

  fam <- read.table(file="./phenotypes/Meaux_Dittberner_MolEcol_2018_PID_30118161/1001/Delta_13C/1001gbi.fam")
  head(fam)
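  new_fam <- merge(fam, imp_dC13, by.x="V1", by.y="id", all.x=T)
  ## merge() sorts by the key; restore the .fam row order so it still matches the .bed file
  new_fam <- new_fam[match(fam$V1, new_fam$V1),]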
  head(new_fam)

  new_fam <- new_fam[,c(1,2,3,4,5,7,8,9)]
  new_fam$rSeedWeight[is.na(new_fam$rSeedWeight)] <- -9
  new_fam$mean_dC13_ko[is.na(new_fam$mean_dC13_ko)] <- -9
  new_fam$days_to_flowering.y[is.na(new_fam$days_to_flowering.y)] <- -9
  head(new_fam)
  write.table(new_fam, file="./FLC_koData/1001gbi.fam", quote = F, col.names = F, row.names = F)

  #system(paste('ln -f ../1001g/1001gbi.bim ', paste0('./FLC_koData/','1001gbi.bim')))
  #system(paste('ln -f ../1001g/1001gbi.bed ', paste0('./FLC_koData/', '1001gbi.bed')))
  #system(paste('ln -f ../1001g/1001gbi.sXX.txt ', paste0('./FLC_koData/','1001gbi.sXX.txt')))

  ## change the output file and -n to switch phenotype: either mutant dC13 or mutant seed weights
  setwd("~/safedata/natvar/")
  write.table(quote=F,row.names=F,col.names=F,
                   file="./FLC_koData/rungwa.sh",
                   x=rbind(
                     "#!/bin/bash",
                     "#SBATCH --cpus-per-task=2",
                     "#SBATCH --mem-per-cpu=4G",
                     "#SBATCH --partition=DPB",
                     "#SBATCH --job-name=rSeedWeight",
                     "#SBATCH --output=rSeedWeight.slurm.log",
                     "./gemma -bfile 1001gbi -miss 0.1 -maf 0.05 -r2 1 -k 1001gbi.sXX.txt -lmm 4 -n 1 -o rSeedWeight")
                   )
  setwd("./FLC_koData/")
  system("sbatch rungwa.sh")

  ### Also run the GWA on imputed seed weights to estimate selection on alleles; adapt the code above as needed

  ##########################################################
  ### GWAS results plotting
  ##########################################################
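  ## install ggman from GitHub only if it is not already available
  if(!requireNamespace("ggman", quietly=TRUE)) devtools::install_github("drveera/ggman")
  library(ggman)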
  setwd("~/safedata/natvar/")
  tmp1<-data.table::fread(file="./FLC_koData/output/mutdC13_ft.assoc.txt") %>% select(chr,ps, rs, af, beta_1, beta_2, p_score)
  head(tmp1)

  ### plot the alleles by the color of selection

  ## SEED WEIGHT
  seedWeight <- data.table::fread(file="./FLC_koData/output/rSeedWeight.assoc.txt") %>% select(chr,ps, rs,af, beta, se, p_score)
  head(seedWeight)

  tmp1_seedWeight <- merge(tmp1, seedWeight, by="rs")
  head(tmp1_seedWeight)
  tmp1_seedWeight$col_fit <- rep(1, nrow(tmp1_seedWeight))
  tmp1_seedWeight$col_fit[tmp1_seedWeight$beta <= 0] <- 0

  thresh <- -log10(0.05/nrow(tmp1_seedWeight))
  # nrow(tmp1_seedWeight)- 820884
  # #thresh <- -log10(0.05/11379)
  # thresh <- -log10(0.05/15510)

  dim(tmp1_seedWeight)
  summary(tmp1_seedWeight$se)
  #tmp1_seedWeight <- tmp1_seedWeight %>% filter(se < 0.003)
  tmp1_seedWeight$chr_col <- rep(2, nrow(tmp1_seedWeight))
  tmp1_seedWeight$chr_col[tmp1_seedWeight$chr.x==2|tmp1_seedWeight$chr.x==4] <- 3
  ## SNPs with -log10(p) > 5.8 take the seed weight effect color (col_fit) instead of the chromosome color
  sig <- -log10(tmp1_seedWeight$p_score.x) > 5.8
  tmp1_seedWeight$chr_col[sig] <- tmp1_seedWeight$col_fit[sig]

  head(tmp1_seedWeight)
  tmp2 <- tmp1_seedWeight[tmp1_seedWeight$col_fit==0,] %>% filter(chr.x==1)
  head(tmp2)
  tmp1_seedWeight$log <- -log10(tmp1_seedWeight$p_score.x)
  tmp1_seedWeight$log2 <- -log10(tmp1_seedWeight$p_score.y)
  tmp1_seedWeight %>% filter(log > 6)
  
  
  ## colored Manhattan plot; point colors are set per chr_col class in scale_color_manual() below
  dC13mut_ft_SeedWeight_GWAplot <- ggman(tmp1_seedWeight, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +
    geom_point(aes(color=as.factor(chr_col))) + ylim(2, 8) +
     #scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026")) +
     scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026", "2"="#636363", "3"="#bdbdbd"))
  dC13mut_ft_SeedWeight_GWAplot

  ## inside an if block ggplot objects are not auto-printed, so print() explicitly
  pdf(file="./figs/dC13mut_ft_SeedWeight_GWAplot.pdf", width=9, height=3)
  print(dC13mut_ft_SeedWeight_GWAplot)
  dev.off()


  ## points only plot; theme_void() replaces the whole theme, so only the legend needs removing afterwards
  dC13mut_ft_SeedWeight_GWAplot_onlypoints <- ggman(tmp1_seedWeight, snp = "rs", bp = "ps.x", chrom = "chr.x", pvalue = "p_score.x", relative.positions = T, sigLine = thresh, title="") +
    geom_point(aes(color=as.factor(chr_col), size=as.factor(chr_col))) + ylim(2, 8) +
    #scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026")) +
    scale_color_manual(values=c("1"="#78c679", "0" = "#bd0026", "2"="#636363", "3"="#bdbdbd")) +
    scale_size_manual(values=c("1"=2, "0" = 2, "2"=0.75, "3"=0.75)) +
    theme_void() +
    theme(legend.position = "none")
  dC13mut_ft_SeedWeight_GWAplot_onlypoints

  png(file="./figs/dC13mut_ft_SeedWeight_GWAplot_onlypoints.png", units="in",res=2400, width = 9, height = 3, bg="transparent")
  print(dC13mut_ft_SeedWeight_GWAplot_onlypoints)
  dev.off()


  #saveRDS(dC13mut_SeedWeight_GWAplot, file="./figs/tmpobjects/dC13mut_SeedWeight_GWAplot.rda")

  saveRDS(dC13mut_ft_SeedWeight_GWAplot, file="./figs/tmpobjects/dC13mut_ft_SeedWeight_GWAplot.rda")

  ## print() is required here: ggplot objects are not auto-printed inside an if block
  pdf(file="./figs/dC13mut_ft_SeedWeight_GWAplot.pdf", width=9, height=3)
  print(dC13mut_ft_SeedWeight_GWAplot)
  dev.off()

  png(file="./figs/dC13mut_ft_SeedWeight_GWAplot_thinned.png", units="in",res=72, width = 12, height = 6, bg="transparent")
  print(dC13mut_ft_SeedWeight_GWAplot)
  dev.off()

  pdf(file="./figs/dC13mut_ft_SeedWeight_GWAplot.rda.pdf")
  print(dC13mut_ft_SeedWeight_GWAplot)
  dev.off()

  png(file="./figs/dC13mut_ft_SeedWeight_GWAplot.rda.png", units="in",res=72, width = 12, height = 6, bg="transparent")
  print(dC13mut_ft_SeedWeight_GWAplot)
  dev.off()


}else{
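  ## note: the RERUN branch saves dC13mut_ft_SeedWeight_GWAplot.rda; this branch loads dC13mut_SeedWeight_GWAplot.rda
  dC13mut_SeedWeight_GWAplot <- readRDS(file="./figs/tmpobjects/dC13mut_SeedWeight_GWAplot.rda")
  dC13mut_SeedWeight_GWAplot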
}
```
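
The Manhattan plot above combines a Bonferroni significance line with four color classes: two alternating greys for the chromosomes, and green or red for associated SNPs depending on the sign of their seed weight effect. As a minimal sketch on a toy association table, the class assignment could look like the following. The table `gwa` and its values are hypothetical, and the sketch colors SNPs above the Bonferroni threshold, whereas the chunk above uses a fixed cutoff of 5.8 on the -log10 scale.

```r
# Toy association table (hypothetical values): p_score is the dC13 GWA p-value,
# beta the estimated effect of the same SNP on seed weight
gwa <- data.frame(chr     = c(1, 2, 3, 4, 5),
                  p_score = c(1e-9, 0.03, 1e-8, 0.2, 0.5),
                  beta    = c(0.4, -0.1, -0.3, 0.2, 0.1))

# Bonferroni threshold on the -log10 scale: 0.05 divided by the number of tests
thresh <- -log10(0.05 / nrow(gwa))

# Background classes: 2 for chromosomes 1, 3 and 5; 3 for chromosomes 2 and 4
gwa$chr_col <- ifelse(gwa$chr %in% c(2, 4), 3, 2)

# Associated SNPs: 1 if the seed weight effect is positive, 0 otherwise
sig <- -log10(gwa$p_score) > thresh
gwa$chr_col[sig] <- ifelse(gwa$beta[sig] > 0, 1, 0)
gwa
```

Converting `chr_col` with `as.factor()` then lets a named vector of colors in `scale_color_manual()` address each class by its label.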