generated from IslasGECI/seleccion_analista_2022
commit 8cb3951
Showing 25 changed files with 3,258 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
name: Tests | ||
on: push | ||
jobs: | ||
actions: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Check out repository | ||
uses: actions/checkout@v2 | ||
- name: Build image | ||
run: docker build --tag islasgeci . | ||
- name: Make submissions | ||
run: docker run --rm --volume ${PWD}:/workdir islasgeci make submissions | ||
- name: Evaluate a directory | ||
run: docker run --rm --volume ${PWD}/pollos_petrel:/submissions islasgeci/common_task_framework:latest geci-ctf evaluate examples/pollos_petrel/complete_dataset.csv /submissions --directory | ||
- name: Check formatting | ||
run: docker run islasgeci make check | ||
- name: Run tests and measure coverage | ||
run: docker run islasgeci make coverage | ||
- name: Evaluate resistance to mutations | ||
run: docker run islasgeci make mutants | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
Package: SeleccionAnalista2022 | ||
Title: Analyst Selection 2022 | ||
Authors@R: c( | ||
person(given = "Ciencia de Datos", | ||
family = "GECI", | ||
role = c("aut", "cre", "cph"), | ||
email = "[email protected]" | ||
)) | ||
Config/testthat/edition: 3 | ||
Description: Selection of a student to carry out a Data Science project at GECI | ||
Encoding: UTF-8 | ||
Imports: | ||
data.table, | ||
MASS, | ||
R6, | ||
tidyverse | ||
LazyData: true | ||
License: GPL-3 | ||
Roxygen: list(markdown = TRUE) | ||
RoxygenNote: 7.2.3 | ||
Suggests: | ||
devtools, | ||
roxygen2, | ||
testthat | ||
Type: Package | ||
Version: 0.1.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
FROM islasgeci/base:1.0.0 | ||
COPY . /workdir | ||
RUN apt update && apt install --yes \ | ||
gnuplot | ||
RUN pip install --upgrade pip && pip install \ | ||
black \ | ||
codecov \ | ||
flake8 \ | ||
mutmut \ | ||
mypy \ | ||
pylint \ | ||
pytest \ | ||
pytest-cov \ | ||
scikit-learn \ | ||
tensorflow | ||
RUN Rscript -e "install.packages(c('covr', 'devtools', 'DT', 'lintr', 'roxygen2', 'styler', 'testthat', 'vdiffr'), repos='http://cran.rstudio.com')" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,138 @@ | ||
submissions: \ | ||
pollos_petrel/example_python_submission.csv \ | ||
pollos_petrel/example_r_submission.csv | ||
|
||
pollos_petrel/example_python_submission.csv: setup_python src/example_submission.py | ||
@echo "Creating Python submission..." | ||
src/example_submission.py | ||
|
||
pollos_petrel/example_r_submission.csv: setup_r src/example_submission.R | ||
@echo "Creating R submission..." | ||
src/example_submission.R | ||
|
||
module = pollos_petrel | ||
|
||
define lint | ||
pylint \ | ||
--disable=bad-continuation \ | ||
--disable=missing-class-docstring \ | ||
--disable=missing-function-docstring \ | ||
--disable=missing-module-docstring \ | ||
${1} | ||
endef | ||
|
||
.PHONY: \ | ||
check \ | ||
clean \ | ||
coverage \ | ||
coverage_python \ | ||
coverage_r \ | ||
format \ | ||
init \ | ||
init_python \ | ||
init_r \ | ||
install_python \ | ||
install_r \ | ||
linter \ | ||
mutants \ | ||
mutants_python \ | ||
mutants_r \ | ||
setup \ | ||
setup_python \ | ||
setup_r \ | ||
submissions \ | ||
tests \ | ||
tests_python \ | ||
tests_r | ||
|
||
|
||
check: | ||
R -e "library(styler)" \ | ||
-e "resumen <- style_dir('R')" \ | ||
-e "resumen <- rbind(resumen, style_dir('src'))" \ | ||
-e "resumen <- rbind(resumen, style_dir('tests'))" \ | ||
-e "any(resumen[[2]])" \ | ||
| grep FALSE | ||
black --check --line-length 100 ${module} | ||
black --check --line-length 100 src | ||
black --check --line-length 100 tests | ||
flake8 --max-line-length 100 ${module} | ||
flake8 --max-line-length 100 src | ||
flake8 --max-line-length 100 tests | ||
mypy ${module} | ||
mypy src | ||
mypy tests | ||
|
||
clean: | ||
rm --force --recursive ${module}.egg-info | ||
rm --force --recursive ${module}/__pycache__ | ||
rm --force --recursive .*_cache | ||
rm --force --recursive SeleccionAnalista2022.Rcheck | ||
rm --force --recursive tests/__pycache__ | ||
rm --force --recursive tests/testthat/_snaps | ||
rm --force .mutmut-cache | ||
rm --force NAMESPACE | ||
rm --force SeleccionAnalista2022_*.tar.gz | ||
rm --force coverage.xml | ||
rm --force pollos_petrel/example_*_submission.csv | ||
|
||
coverage: coverage_python coverage_r | ||
|
||
coverage_python: setup_python | ||
pytest --cov=${module} --cov-report=term-missing --verbose | ||
|
||
coverage_r: setup_r | ||
Rscript tests/testthat/coverage.R | ||
|
||
format: | ||
black --line-length 100 ${module} | ||
black --line-length 100 src | ||
black --line-length 100 tests | ||
R -e "library(styler)" \ | ||
-e "style_dir('R')" \ | ||
-e "style_dir('src')" \ | ||
-e "style_dir('tests')" | ||
|
||
init: | ||
@echo "⛔ Please use 'make init_python' or 'make init_r' instead ⛔" | ||
|
||
init_python: setup_python tests_python | ||
|
||
init_r: setup_r tests_r | ||
|
||
install_python: | ||
pip install --editable . | ||
|
||
install_r: | ||
R -e "devtools::document()" && \ | ||
R CMD build . && \ | ||
R CMD check SeleccionAnalista2022_0.1.0.tar.gz && \ | ||
R CMD INSTALL SeleccionAnalista2022_0.1.0.tar.gz | ||
|
||
|
||
linter: | ||
$(call lint, ${module}) | ||
$(call lint, tests) | ||
|
||
mutants: mutants_python mutants_r | ||
|
||
mutants_python: setup_python tests_python | ||
mutmut run --paths-to-mutate ${module} | ||
mutmut run --paths-to-mutate src | ||
|
||
mutants_r: setup_r tests_r | ||
@echo "🙁🏹 No mutation testing on R 👾🎉👾" | ||
|
||
setup: setup_python setup_r | ||
|
||
setup_python: clean install_python | ||
|
||
setup_r: clean install_r | ||
|
||
tests: tests_python tests_r | ||
|
||
tests_python: | ||
pytest --verbose | ||
|
||
tests_r: | ||
Rscript -e "devtools::test(stop_on_failure = TRUE)" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
return_one <- function() { | ||
return(1) | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
library(tidyverse) | ||
|
||
read_training_dataset <- function() { | ||
training_dataset_path <- "/workdir/data/raw/train.csv" | ||
training_dataset <- read_csv(training_dataset_path) | ||
return(training_dataset) | ||
} | ||
|
||
|
||
get_target_mean <- function(dataset) { | ||
mean_target <- mean(dataset$target) | ||
return(mean_target) | ||
} | ||
|
||
|
||
read_testing_dataset <- function() { | ||
testing_dataset_path <- "/workdir/data/raw/test.csv" | ||
testing_dataset <- read_csv(testing_dataset_path) | ||
return(testing_dataset) | ||
} | ||
|
||
|
||
drop_all_but_id <- function(dataset) { | ||
dataset_only_id <- dataset %>% select("id") | ||
return(dataset_only_id) | ||
} | ||
|
||
|
||
add_mean_as_target <- function() { | ||
training_dataset <- read_training_dataset() | ||
target_mean <- get_target_mean(training_dataset) | ||
testing_dataset <- read_testing_dataset() | ||
submission <- drop_all_but_id(testing_dataset) %>% | ||
mutate("target" = target_mean) | ||
return(submission) | ||
} | ||
|
||
|
||
#' @export | ||
write_submission <- function() { | ||
submission_path <- "pollos_petrel/example_r_submission.csv" | ||
submission <- add_mean_as_target() | ||
write_csv(submission, submission_path) | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
<img src="https://www.islas.org.mx/img/logo.svg" align="right" width="256" /> | ||
|
||
# 2022 Selection Exam for Data Analyst at GECI | ||
|
||
- [See the call for applications](https://www.facebook.com/IslasGECI/posts/3250808525199345) | ||
|
||
## Predicting the age of black petrel (_petrel negro_) chicks from their morphometry | ||
|
||
You must estimate the age (in days) of a set of black petrel chicks from their morphometry. | ||
You will submit your answer as a two-column table: the first column is the chick's identifier and | ||
the second column is the estimated age. We expect to see gradual progress in short cycles. | ||
We would like you to solve the exam with many _pull requests_, each one making | ||
a very small step forward (fewer than 100 lines). We therefore suggest the following: | ||
|
||
1. Create a | ||
[_fork_](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo#fork-an-example-repository) | ||
of this repository | ||
1. Cover your code with tests | ||
1. Make GitHub Actions pass | ||
1. Open multiple small _pull requests_ (fewer than 100 lines each) | ||
1. Use GitHub (_issues_ and _pull requests_) as the main communication channel | ||
|
||
Please do not wait until you finish the exam to submit your first _pull request_. Remember that you | ||
cannot create any _pull request_ until you have your | ||
[_fork_](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo#fork-an-example-repository). | ||
|
||
## Rubric | ||
|
||
The goal of this selection exam is to assess skills for remote collaborative work. | ||
We will use the following criteria: | ||
|
||
- **Ability to collaborate remotely**: | ||
- [ ] Use of Git: commit messages explain the why, commits are small, and | ||
branch names convey the purpose of the changes | ||
- [ ] Communication skills through GitHub (_issues_ and _pull requests_): communication is | ||
kind, and descriptions are clear and formatted with _Markdown_ | ||
- [ ] Review requests: use of [GitHub's | ||
features](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/requesting-a-pull-request-review) | ||
to signal that a correction is finished and that a new review is being requested | ||
- [ ] Incorporating feedback: corrections requested in one _pull request_ are not | ||
repeated in later _pull requests_ | ||
|
||
- **Good programming practices**: | ||
- [ ] Clean code | ||
- [ ] Unit tests | ||
- [ ] Refactoring | ||
|
||
## Suggestions | ||
|
||
- Study these references: | ||
- [GECI Data Science style guide](https://islas.dev/guia_de_estilo/) | ||
- [How to Make Your Code Reviewer Fall in Love with You](https://mtlynch.io/code-review-love/) | ||
- [The pull request author’s guide to getting through code review](https://google.github.io/eng-practices/review/developer/) | ||
- Create small _pull requests_; a 100-line _pull request_ is too large. | ||
- Be kind, explain the why behind things, follow our [code of | ||
conduct](https://www.contributor-covenant.org/es/version/2/0/code_of_conduct/), and use simple, clear language. | ||
- Communicate a lot, and do it through GitHub. | ||
|
||
|
||
## Instructions | ||
|
||
1. Fit a model using the `train.csv` file | ||
1. Evaluate the fitted model on `test.csv` | ||
1. Save your model's answer in `<TU_NOMBRE>_submission.csv` | ||
|
||
> Replace `<TU_NOMBRE>` with your name. | ||
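
As a rough illustration of the three steps above, here is a minimal Python sketch of the simplest baseline: predict the mean training **target** for every chick in the test set. It follows the same idea as the R example in this repository. The function names and the tiny in-memory tables are made up for the example; real code would read `train.csv` and `test.csv` from disk instead.

```python
import csv
import io
from statistics import mean


def fit_mean_model(train_rows):
    """Fit the simplest possible model: the mean of the training targets."""
    return mean(float(row["target"]) for row in train_rows)


def predict(test_rows, target_mean):
    """Predict the same value for every chick in the test set."""
    return [{"id": row["id"], "target": target_mean} for row in test_rows]


def write_submission(predictions, handle):
    """Write the two-column submission table (id, target) as CSV."""
    writer = csv.DictWriter(handle, fieldnames=["id", "target"])
    writer.writeheader()
    writer.writerows(predictions)


# Tiny in-memory stand-ins for train.csv and test.csv (made-up values).
train_csv = io.StringIO("id,Masa,target\na,300,10\nb,420,30\n")
test_csv = io.StringIO("id,Masa\nc,350\nd,390\n")

# Step 1: fit the model on the training table.
target_mean = fit_mean_model(list(csv.DictReader(train_csv)))
# Step 2: evaluate the fitted model on the test table.
predictions = predict(list(csv.DictReader(test_csv)), target_mean)
# Step 3: save the answer (here into a buffer instead of a file).
submission = io.StringIO()
write_submission(predictions, submission)
```

The `submission` buffer ends up with the header `id,target` followed by one row per test chick. A real model would only need to replace `fit_mean_model` and `predict`.
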
## Setup | ||
|
||
Save your answer `<TU_NOMBRE>_submission.csv` in the `pollos_petrel/` folder. In this repo's Makefile, | ||
add the full path of your answer to the _phony_ target **submissions**: | ||
`pollos_petrel/<TU_NOMBRE>_submission.csv` | ||
|
||
The _phony_ target **submissions** should look like this: | ||
|
||
``` | ||
submissions: \ | ||
pollos_petrel/example_python_submission.csv \ | ||
pollos_petrel/example_r_submission.csv \ | ||
pollos_petrel/<OTRO_NOMBRE>_submission.csv \ | ||
pollos_petrel/<TU_NOMBRE>_submission.csv | ||
``` | ||
|
||
> Note the backslashes `\` at the end of every line except the last | ||
|
||
Add your answer `pollos_petrel/<TU_NOMBRE>_submission.csv` to the Makefile as a target. This table | ||
must have two columns: **id** and **target**. See the example: `pollos_petrel/example_submission.csv`. | ||
|
||
Example: | ||
|
||
id | target | ||
--------------|-------- | ||
2013-09-16-H9 | 0.83 | ||
2015-09-02-B5 | 0.94 | ||
2017-09-09-A9 | 0.50 | ||
|
||
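To make the expected format concrete, the following Python sketch checks that a submission has exactly the **id** and **target** columns, unique ids, and numeric targets. These checks are an assumption about what a well-formed table looks like. They are not the checks performed by `geci-ctf evaluate`, and `validate_submission` is a hypothetical helper.

```python
import csv
import io

EXPECTED_COLUMNS = ["id", "target"]


def validate_submission(handle):
    """Return a list of problems found in a submission CSV (empty if none)."""
    reader = csv.DictReader(handle)
    problems = []
    # The header must be exactly: id,target
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"expected columns {EXPECTED_COLUMNS}, got {reader.fieldnames}")
        return problems
    seen_ids = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        if row["id"] in seen_ids:
            problems.append(f"line {line_number}: duplicated id {row['id']}")
        seen_ids.add(row["id"])
        try:
            float(row["target"])
        except (TypeError, ValueError):
            problems.append(f"line {line_number}: target is not a number")
    return problems


# A well-formed table and a broken one (made-up values).
good = io.StringIO("id,target\n2013-09-16-H9,0.83\n2015-09-02-B5,0.94\n")
bad = io.StringIO("id,target\nx,abc\nx,1\n")

good_problems = validate_submission(good)
bad_problems = validate_submission(bad)
```

Here `good_problems` is empty, while `bad_problems` reports the non-numeric target and the duplicated id.
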
## Rules | ||
|
||
- The command `make pollos_petrel/<TU_NOMBRE>_submission.csv` must reproduce your answer | ||
(`pollos_petrel/<TU_NOMBRE>_submission.csv`) from the `test.csv` data. All code must | ||
run inside the container. Committing the answer itself or magic numbers is not allowed. You may only | ||
commit code. | ||
- This is an individual exam. You may only ask members of the GECI Data Science team | ||
for help. If you cannot solve the exam, you will not be able to fulfill the responsibilities of the | ||
position offered. Do not copy. | ||
|
||
## Description of the tables | ||
In the `pollos_petrel/` directory you will find three CSV files. | ||
|
||
- Use the `train.csv` file to fit your model (train your algorithm). | ||
- Use the `test.csv` file to evaluate your fitted model. | ||
- Use the `example_submission.csv` file as an example of an answer. | ||
|
||
## Description of the table fields | ||
- In every table, the first column is named **id** and contains a unique identifier for | ||
each record. | ||
- In the `train.csv` and `example_submission.csv` tables, the last column is named **target** and | ||
contains the age (in days) of the chicks. This column is the _response_. | ||
- In the `test.csv` and `train.csv` tables, the remaining columns (**Masa**, **Longitud_tarso**, | ||
..., **Longitud_pluma_exterior_de_la_cola**) are the _predictor_ variables. | ||
|
||
## Results | ||
|
||
We recommend that you submit at least two models. The best model is the one with the lowest mean | ||
absolute error ([MAE](https://en.wikipedia.org/wiki/Mean_absolute_error)). You can see your model's | ||
results in GitHub Actions, in the _Evaluate a directory_ step. | ||
|
||
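For reference, the MAE of n predictions is the average of the absolute differences between the observed and the predicted ages: MAE = (|y_1 - p_1| + ... + |y_n - p_n|) / n. The Python sketch below computes it for two made-up models. It only illustrates the metric and is not the implementation used by the evaluation step; `mean_absolute_error` and the sample values are assumptions for the example.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    total = sum(abs(o - p) for o, p in zip(observed, predicted))
    return total / len(observed)


# Made-up ages (in days) and the predictions of two hypothetical models.
observed_ages = [10, 20, 30]
model_a = [12, 18, 33]  # absolute errors: 2, 2, 3
model_b = [10, 25, 30]  # absolute errors: 0, 5, 0

mae_a = mean_absolute_error(observed_ages, model_a)
mae_b = mean_absolute_error(observed_ages, model_b)

# The model with the lowest MAE is considered the best one.
best_model = "model_a" if mae_a < mae_b else "model_b"
```

With these values `model_b` wins, since its total absolute error (5) is smaller than that of `model_a` (7) over the same three chicks.
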
## References | ||
|
||
- [GECI Data Science style guide](https://islas.dev/guia_de_estilo/) | ||
- [How to Make Your Code Reviewer Fall in Love with You](https://mtlynch.io/code-review-love/) | ||
- [Reviews on GitHub](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/requesting-a-pull-request-review) | ||
- [The pull request author’s guide to getting through code review](https://google.github.io/eng-practices/review/developer/) | ||
- [Forking a repository](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) | ||
|
||
--- | ||
|