diff --git a/benchmarking.qmd b/benchmarking.qmd
index 8c2cc629..3edd467f 100644
--- a/benchmarking.qmd
+++ b/benchmarking.qmd
@@ -776,13 +776,13 @@ There are several approaches that can be taken to improve data quality. These me
 
 - Data Cleaning: This involves handling missing values, correcting errors, and removing outliers. Clean data ensures that the model is not learning from noise or inaccuracies.
 
-- Data Interpretability and Explainability: Common techniques include [[LIME]{.underline}](https://arxiv.org/abs/1602.04938) which provides insight into the decision boundaries of classifiers, and [[Shapley values]{.underline}](https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf) which estimate the importance of individual samples in contributing to a model's predictions.
+- Data Interpretability and Explainability: Common techniques include LIME [@ribeiro2016should] which provides insight into the decision boundaries of classifiers, and Shapley values [@lundberg2017unified] which estimate the importance of individual samples in contributing to a model's predictions.
 
 - Feature Engineering: Transforming or creating new features can significantly improve model performance by providing more relevant information for learning.
 
 - Data Augmentation: Augmenting data by creating new samples through various transformations can help improve model robustness and generalization.
 
-- Active Learning: This is a semi-supervised learning approach where the model actively queries a human oracle to label the most informative samples [[[Coleman et al, 2020]{.underline}](https://arxiv.org/abs/2007.00077)]. This ensures that the model is trained on the most relevant data.
+- Active Learning: This is a semi-supervised learning approach where the model actively queries a human oracle to label the most informative samples [@coleman2022similarity]. This ensures that the model is trained on the most relevant data.
 
 - Dimensionality Reduction: Techniques like PCA can be used to reduce the number of features in a dataset, thereby reducing complexity and training time.
 
diff --git a/references.bib b/references.bib
index 249608f9..e422bf64 100644
--- a/references.bib
+++ b/references.bib
@@ -540,4 +540,36 @@ @article{xu2023demystifying
   author={Xu, Hu and Xie, Saining and Tan, Xiaoqing Ellen and Huang, Po-Yao and Howes, Russell and Sharma, Vasu and Li, Shang-Wen and Ghosh, Gargi and Zettlemoyer, Luke and Feichtenhofer, Christoph},
   journal={arXiv preprint arXiv:2309.16671},
   year={2023}
+}
+@inproceedings{coleman2022similarity,
+  title={Similarity search for efficient active learning and search of rare concepts},
+  author={Coleman, Cody and Chou, Edward and Katz-Samuels, Julian and Culatana, Sean and Bailis, Peter and Berg, Alexander C and Nowak, Robert and Sumbaly, Roshan and Zaharia, Matei and Yalniz, I Zeki},
+  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
+  volume={36},
+  number={6},
+  pages={6402--6410},
+  year={2022}
+}
+@inproceedings{ribeiro2016should,
+  title={" Why should i trust you?" Explaining the predictions of any classifier},
+  author={Ribeiro, Marco Tulio and Singh, Sameer and Guestrin, Carlos},
+  booktitle={Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining},
+  pages={1135--1144},
+  year={2016}
+}
+@article{lundberg2017unified,
+  title={A unified approach to interpreting model predictions},
+  author={Lundberg, Scott M and Lee, Su-In},
+  journal={Advances in neural information processing systems},
+  volume={30},
+  year={2017}
+}
+@inproceedings{coleman2022similarity,
+  title={Similarity search for efficient active learning and search of rare concepts},
+  author={Coleman, Cody and Chou, Edward and Katz-Samuels, Julian and Culatana, Sean and Bailis, Peter and Berg, Alexander C and Nowak, Robert and Sumbaly, Roshan and Zaharia, Matei and Yalniz, I Zeki},
+  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
+  volume={36},
+  number={6},
+  pages={6402--6410},
+  year={2022}
 }
\ No newline at end of file