diff --git a/images/ai_ops/cicd_pipelines.png b/images/ai_ops/cicd_pipelines.png new file mode 100644 index 00000000..b46e9692 Binary files /dev/null and b/images/ai_ops/cicd_pipelines.png differ diff --git a/images/ai_ops/clinaiops.png b/images/ai_ops/clinaiops.png new file mode 100644 index 00000000..2435d865 Binary files /dev/null and b/images/ai_ops/clinaiops.png differ diff --git a/images/ai_ops/clinaiops_loops.png b/images/ai_ops/clinaiops_loops.png new file mode 100644 index 00000000..81fa0268 Binary files /dev/null and b/images/ai_ops/clinaiops_loops.png differ diff --git a/images/ai_ops/data_cascades.png b/images/ai_ops/data_cascades.png new file mode 100644 index 00000000..ebe73a77 Binary files /dev/null and b/images/ai_ops/data_cascades.png differ diff --git a/images/ai_ops/edge_impulse_dashboard.png b/images/ai_ops/edge_impulse_dashboard.png new file mode 100644 index 00000000..2fe4765c Binary files /dev/null and b/images/ai_ops/edge_impulse_dashboard.png differ diff --git a/images/ai_ops/hidden_debt.png b/images/ai_ops/hidden_debt.png new file mode 100644 index 00000000..2cacb742 Binary files /dev/null and b/images/ai_ops/hidden_debt.png differ diff --git a/images/ai_ops/mlops_flow.png b/images/ai_ops/mlops_flow.png new file mode 100644 index 00000000..cb36c90d Binary files /dev/null and b/images/ai_ops/mlops_flow.png differ diff --git a/images/ai_ops/transfer_learning.png b/images/ai_ops/transfer_learning.png new file mode 100644 index 00000000..3f61b1cc Binary files /dev/null and b/images/ai_ops/transfer_learning.png differ diff --git a/ops.qmd b/ops.qmd index 3c2266d8..8ee71467 100644 --- a/ops.qmd +++ b/ops.qmd @@ -2,84 +2,812 @@ ![_DALL·E 3 Prompt: Create a detailed, wide rectangular illustration of an AI workflow. The image should showcase the process across six stages, with a flow from left to right: 1. Data collection, with diverse individuals of different genders and descents using a variety of devices like laptops, smartphones, and sensors to gather data. 2. Data processing, displaying a data center with active servers and databases with glowing lights. 3. Model training, represented by a computer screen with code, neural network diagrams, and progress indicators. 4. Model evaluation, featuring people examining data analytics on large monitors. 5. Deployment, where the AI is integrated into robotics, mobile apps, and industrial equipment. 6. Monitoring, showing professionals tracking AI performance metrics on dashboards to check for accuracy and concept drift over time. Each stage should be distinctly marked and the style should be clean, sleek, and modern with a dynamic and informative color scheme._](./images/cover_ml_ops.png) +This chapter explores the practices and architectures needed to effectively develop, deploy, and manage ML models across their entire lifecycle. We examine the various phases of the ML process including data collection, model training, evaluation, deployment, and monitoring. The importance of automation, collaboration, and continuous improvement is also discussed. We contrast different environments for ML model deployment, from cloud servers to embedded edge devices, and analyze their distinct constraints. Through concrete examples, we demonstrate how to tailor ML system design and operations for reliable and optimized model performance in any target environment. The goal is to provide readers with a comprehensive understanding of ML model management so they can successfully build and run ML applications that sustainably deliver value. 
::: {.callout-tip}

## Learning Objectives

* Understand what MLOps is and why it is needed

* Learn the architectural patterns for traditional MLOps

* Contrast traditional vs. embedded MLOps across the ML lifecycle

* Identify key constraints of embedded environments

* Learn strategies to mitigate embedded ML challenges

* Examine real-world case studies demonstrating embedded MLOps principles

* Appreciate the need for holistic technical and human approaches

:::

## Introduction

Machine Learning Operations (MLOps) is a systematic approach that combines machine learning (ML), data science, and software engineering to automate the end-to-end ML lifecycle. This includes everything from data preparation and model training to deployment and maintenance. MLOps ensures that ML models are developed, deployed, and maintained efficiently and effectively.

Let's start with a general (i.e., non-edge ML) example. Consider a ridesharing company that wants to deploy a machine learning model to predict rider demand in real time. The data science team spends months developing a model, but when it's time to deploy, they realize the model is incompatible with the engineering team's production environment. Deploying the model requires rebuilding it from scratch - costing weeks of additional work. This is where MLOps comes in.

With MLOps, there are protocols and tools in place to ensure that the model developed by the data science team can be seamlessly deployed and integrated into the production environment. In essence, MLOps removes friction during the development, deployment, and maintenance of ML systems. It improves collaboration between teams through defined workflows and interfaces. MLOps also accelerates iteration speed by enabling continuous delivery for ML models.

For the ridesharing company, implementing MLOps means their demand prediction model can be frequently retrained and deployed based on new incoming data. This keeps the model accurate despite changing rider behavior. MLOps also allows the company to experiment with new modeling techniques, since models can be quickly tested and updated.

Other MLOps benefits include enhanced model lineage tracking, reproducibility, and auditing. Cataloging ML workflows and standardizing artifacts - such as logging model versions, tracking data lineage, and packaging models and parameters - enables deeper insight into model provenance. Standardizing these artifacts facilitates tracing a model back to its origins, replicating the model development process, and examining how a model version has changed over time. This also facilitates regulatory compliance, which is especially critical in regulated industries like healthcare and finance, where being able to audit and explain models is important.

Major organizations adopt MLOps to boost productivity, increase collaboration, and accelerate ML outcomes. It provides the frameworks, tools, and best practices to manage ML systems throughout their lifecycle effectively.
This results in better-performing models, faster time-to-value, and sustained competitive advantage. As we explore MLOps further, consider how implementing these practices can help address embedded ML challenges today and in the future.

## Historical Context

MLOps has its roots in DevOps, a set of practices that combines software development (Dev) and IT operations (Ops) to shorten the development lifecycle and provide continuous delivery of high-quality software. The parallels between MLOps and DevOps are evident in their focus on automation, collaboration, and continuous improvement. In both cases, the goal is to break down silos between different teams (developers, operations, and, in the case of MLOps, data scientists and ML engineers) and to create a more streamlined and efficient process. Understanding the history of this evolution helps place MLOps in the context of traditional systems.

### DevOps

The term "DevOps" was first coined in 2009 by [Patrick Debois](https://www.jedi.be/), a consultant and Agile practitioner. Debois organized the first [DevOpsDays](https://www.devopsdays.org/) conference in Ghent, Belgium, in 2009, which brought together development and operations professionals to discuss ways to improve collaboration and automate processes.

DevOps has its roots in the [Agile](https://agilemanifesto.org/) movement, which began in the early 2000s. Agile provided the foundation for a more collaborative approach to software development and emphasized small, iterative releases. However, Agile primarily focused on collaboration within development teams. As Agile methodologies became more popular, organizations realized the need to extend this collaboration to operations teams as well.

The siloed nature of development and operations teams often led to inefficiencies, conflicts, and delays in software delivery. This need for better collaboration and integration between these teams led to the [DevOps](https://www.atlassian.com/devops) movement. In a sense, DevOps can be seen as an extension of the Agile principles to include operations teams.

The key principles of DevOps include collaboration, automation, continuous integration and delivery, and feedback. DevOps focuses on automating the entire software delivery pipeline, from development to deployment. It aims to improve the collaboration between development and operations teams, utilizing tools like [Jenkins](https://www.jenkins.io/), [Docker](https://www.docker.com/), and [Kubernetes](https://kubernetes.io/) to streamline the development lifecycle.

While Agile and DevOps share common principles around collaboration and feedback, DevOps specifically targets the integration of development and IT operations - expanding Agile beyond just development teams. It introduces practices and tools to automate software delivery and enhance the speed and quality of software releases.

### MLOps

[MLOps](https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning), in turn, stands for Machine Learning Operations and extends the principles of DevOps to the ML lifecycle. MLOps aims to automate and streamline the end-to-end ML lifecycle, from data preparation and model development to deployment and monitoring. The main focus of MLOps is to facilitate collaboration between data scientists, data engineers, and IT operations, and to automate the deployment, monitoring, and management of ML models.
Some key factors led to the rise of MLOps:

* **Data drift:** Data drift degrades model performance over time, motivating the need for rigorous monitoring and automated retraining procedures provided by MLOps.
* **Reproducibility:** The lack of reproducibility in machine learning experiments motivated the need for MLOps systems to track code, data, and environment variables to enable reproducible ML workflows.
* **Explainability:** The black-box nature and lack of explainability of complex models motivated the need for MLOps capabilities to increase model transparency and explainability.
* **Monitoring:** The inability to reliably monitor model performance post-deployment highlighted the need for MLOps solutions with robust model performance instrumentation and alerting.
* **Friction:** The friction in manually retraining and deploying models motivated the need for MLOps systems that automate machine learning deployment pipelines.
* **Optimization:** The complexity of configuring infrastructure for machine learning motivated the need for MLOps platforms with optimized, ready-made ML infrastructure.

While DevOps and MLOps share the common goal of automating and streamlining processes, they differ in focus and challenges. DevOps centers on improving collaboration between development and operations teams and automating software delivery. MLOps, in contrast, streamlines and automates the ML lifecycle and must handle additional complexities such as [data versioning](https://dvc.org/), [model versioning](https://dvc.org/), and [model monitoring](https://www.fiddler.ai/), while facilitating collaboration among data scientists, data engineers, and IT operations.

Here is a table that summarizes them side by side.

| Aspect | DevOps | MLOps |
|----------------------|----------------------------------|--------------------------------------|
| **Objective** | Streamlining software development and operations processes | Optimizing the lifecycle of machine learning models |
| **Methodology** | Continuous Integration and Continuous Delivery (CI/CD) for software development | Similar to CI/CD but focuses on machine learning workflows |
| **Primary Tools** | Version control (Git), CI/CD tools (Jenkins, Travis CI), Configuration management (Ansible, Puppet) | Data versioning tools, Model training and deployment tools, CI/CD pipelines tailored for ML |
| **Primary Concerns** | Code integration, Testing, Release management, Automation, Infrastructure as code | Data management, Model versioning, Experiment tracking, Model deployment, Scalability of ML workflows |
| **Typical Outcomes** | Faster and more reliable software releases, Improved collaboration between development and operations teams | Efficient management and deployment of machine learning models, Enhanced collaboration between data scientists and engineers |

## Key Components of MLOps

In this chapter, we will provide an overview of the core components of MLOps, an emerging set of practices that enables robust delivery and lifecycle management of ML models in production.
While some MLOps elements like automation and monitoring were covered in previous chapters, we will now consolidate them into an integrated framework and expand on additional capabilities like governance. Additionally, we will describe and link to popular tools used within each component, such as [LabelStudio](https://labelstud.io/) for data labeling. By the end, we hope that you will understand the end-to-end MLOps methodology that takes models from ideation to sustainable value creation within organizations.

### Data Management

Robust data management and data engineering actively empower successful [MLOps](https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning) implementations. Teams properly ingest, store, and prepare raw data from sensors, databases, apps, and other systems for model training and deployment.

Teams actively track changes to datasets over time using version control with [Git](https://git-scm.com/) and tools like [GitHub](https://github.com/) or [GitLab](https://about.gitlab.com/). Data scientists collaborate on curating datasets by merging changes from multiple contributors. Teams can review or roll back each iteration of a dataset if needed.

Teams meticulously label and annotate data using labeling software like [LabelStudio](https://labelstud.io/), which enables distributed teams to work on tagging datasets together. As the target variables and labeling conventions evolve, teams maintain access to earlier versions.

Teams store the raw dataset and all derived assets on cloud storage services like [Amazon S3](https://aws.amazon.com/s3/) or [Google Cloud Storage](https://cloud.google.com/storage), which provide scalable, resilient storage with versioning capabilities. Teams can set granular access permissions.

Robust data pipelines automate the extraction, joining, cleansing, and transformation of raw data into analysis-ready datasets. [Prefect](https://www.prefect.io/), [Apache Airflow](https://airflow.apache.org/), and [dbt](https://www.getdbt.com/) are workflow orchestrators that allow engineers to develop flexible, reusable data processing pipelines.

For instance, a pipeline may ingest data from [PostgreSQL](https://www.postgresql.org/) databases, REST APIs, and CSVs stored on S3. It can filter, deduplicate, and aggregate the data, handle errors, and save the output to S3. The pipeline can also push the transformed data into a feature store like [Tecton](https://www.tecton.ai/) or [Feast](https://feast.dev/) for low-latency access.

In an industrial predictive maintenance use case, sensor data is ingested from devices into S3. A Prefect pipeline processes the sensor data, joining it with maintenance records. The enriched dataset is stored in Feast so models can easily retrieve the latest data for training and predictions. A minimal sketch of such a pipeline is shown below.
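To make this concrete, here is a minimal sketch of what such a Prefect flow might look like. The bucket paths, column names, and task structure are illustrative assumptions rather than a reference implementation (reading `s3://` paths with pandas additionally assumes `s3fs` is installed):

```python
# A simplified, hypothetical sketch of a Prefect 2.x data pipeline.
import pandas as pd
from prefect import flow, task

@task(retries=3, retry_delay_seconds=60)
def extract_sensor_data() -> pd.DataFrame:
    # Pull raw vibration/temperature readings landed in S3 as CSV
    return pd.read_csv("s3://factory-raw/sensors/latest.csv")

@task
def transform(raw: pd.DataFrame) -> pd.DataFrame:
    # Deduplicate, drop malformed rows, and join with maintenance records
    clean = raw.drop_duplicates().dropna(subset=["machine_id", "vibration"])
    maint = pd.read_csv("s3://factory-raw/maintenance/records.csv")
    return clean.merge(maint, on="machine_id", how="left")

@task
def load(features: pd.DataFrame) -> None:
    # Write analysis-ready features back to S3 for the feature store to ingest
    features.to_parquet("s3://factory-features/sensor_features.parquet")

@flow(log_prints=True)
def sensor_feature_pipeline():
    raw = extract_sensor_data()
    features = transform(raw)
    load(features)
    print(f"Wrote {len(features)} feature rows")

if __name__ == "__main__":
    sensor_feature_pipeline()
```

Structuring the pipeline as small, retryable tasks lets the orchestrator recover from transient failures (e.g., a flaky S3 read) without rerunning the entire flow.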
### CI/CD Pipelines

![Figure 14.1: This diagram illustrates a CI/CD pipeline specifically tailored for MLOps. The process starts with a dataset and feature repository, which feeds into a dataset ingestion stage. Post-ingestion, the data undergoes validation to ensure its quality before being transformed for training. Parallel to this, a retraining trigger can initiate the pipeline based on specified criteria. The data then passes through a model training/tuning phase within a data processing engine, followed by model evaluation and validation. Once validated, the model is registered and stored in a machine learning metadata and artifact repository. The final stage involves deploying the trained model back into the dataset and feature repository, thereby creating a cyclical process for continuous improvement and deployment of machine learning models.](images/ai_ops/cicd_pipelines.png)

Continuous integration and continuous delivery (CI/CD) pipelines actively automate the progression of ML models from initial development into production deployment. Adapted for ML systems, CI/CD principles empower teams to rapidly and robustly deliver new models with minimized manual errors.

CI/CD pipelines (see Figure 14.1) orchestrate key steps, including checking out new code changes, transforming data, training and registering new models, validation testing, containerization, deploying to environments like staging clusters, and promoting to production. Teams leverage popular CI/CD solutions like [Jenkins](https://www.jenkins.io/), [CircleCI](https://circleci.com/) and [GitHub Actions](https://github.com/features/actions) to execute these MLOps pipelines, while [Prefect](https://www.prefect.io/), [Metaflow](https://metaflow.org/) and [Kubeflow](https://www.kubeflow.org/) offer ML-focused options.

For example, when a data scientist checks improvements to an image classification model into a [GitHub](https://github.com/) repository, this automatically triggers a Jenkins CI/CD pipeline. The pipeline reruns data transformations and model training on the latest data, tracking experiments with [MLflow](https://mlflow.org/). After automated validation testing, teams deploy the model container to a [Kubernetes](https://kubernetes.io/) staging cluster for further QA. Once approved, Jenkins facilitates a phased rollout of the model to production with [canary deployments](https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#canary-deployments) to catch any issues. If anomalies are detected, the pipeline enables teams to roll back to the previous model version gracefully.

By connecting the disparate steps from development to deployment under continuous automation, CI/CD pipelines empower teams to iterate and deliver ML models rapidly. Integrating MLOps tools like MLflow enhances model packaging, versioning, and pipeline traceability. CI/CD is integral for progressing models beyond prototypes into sustainable business systems.

### Model Training

In the model training phase, data scientists actively experiment with different ML architectures and algorithms to create optimized models that effectively extract insights and patterns from data. MLOps introduces best practices and automation to make this iterative process more efficient and reproducible.

Modern ML frameworks like [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/) and [Keras](https://keras.io/) provide pre-built components that simplify designing neural networks and other model architectures. Data scientists leverage built-in modules for layers, activations, losses, etc., and high-level APIs like Keras to focus more on model architecture.

MLOps enables teams to package model training code into reusable, tracked scripts and notebooks.
As models are developed, capabilities like [hyperparameter tuning](https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview), [neural architecture search](https://arxiv.org/abs/1808.05377) and [automatic feature selection](https://scikit-learn.org/stable/modules/feature_selection.html) rapidly iterate to find the best-performing configurations.

Teams put training code under version control using Git and host it in repositories like GitHub to track changes over time. This allows seamless collaboration between data scientists.

Notebooks like [Jupyter](https://jupyter.org/) make an excellent environment for interactive model development. The notebooks contain data ingestion, preprocessing, model declaration, training loop, evaluation, and export code in one reproducible document.

Finally, teams orchestrate model training as part of a CI/CD pipeline for automation. For instance, a Jenkins pipeline can trigger a Python script to load new training data, retrain a TensorFlow classifier, evaluate model metrics, and automatically register the model if performance thresholds are met.

An example workflow has a data scientist using a PyTorch notebook to develop a CNN model for image classification. The [fastai](https://www.fast.ai/) library provides high-level APIs to simplify training CNNs on image datasets. The notebook trains the model on sample data, evaluates accuracy metrics, and tunes hyperparameters like learning rate and layers to optimize performance. This reproducible notebook is version-controlled and integrated into a retraining pipeline.

Automating and standardizing model training empowers teams to accelerate experimentation and achieve the rigor needed for production ML systems.

### Model Evaluation

Before deploying models, teams perform rigorous evaluation and testing to validate that they meet performance benchmarks and are ready for release. MLOps introduces best practices around model validation, auditing and [canary testing](https://martinfowler.com/bliki/CanaryRelease.html).

Teams typically evaluate models against holdout [test datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) not used during training. The test data originates from the same distribution as production data. Teams calculate metrics like [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision), [AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve), [precision](https://en.wikipedia.org/wiki/Precision_and_recall), [recall](https://en.wikipedia.org/wiki/Precision_and_recall), and [F1 score](https://en.wikipedia.org/wiki/F1_score).

Teams also track the same metrics over time against test data samples. If evaluation data comes from live production streams, this catches [data drift](https://www.ibm.com/cloud/learn/data-drift) over time that degrades model performance.

Human oversight for model release remains important. Data scientists review performance across key segments and slices. Error analysis helps identify model weaknesses to guide enhancement. Teams apply [fairness](https://developers.google.com/machine-learning/fairness-overview) and [bias detection](https://developers.google.com/machine-learning/fairness-overview) techniques.

Canary testing releases a model to a small subset of users to evaluate real-world performance before wide deployment. Teams incrementally route traffic to the canary release while monitoring for issues, as the sketch below illustrates.
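The traffic-splitting logic behind a canary release can be surprisingly simple. Here is a minimal, illustrative sketch - not a production router - in which the endpoint names, canary fraction, and hashing scheme are all assumptions:

```python
# Hypothetical sketch of deterministic canary traffic splitting.
import hashlib

CANARY_FRACTION = 0.05  # start by routing 5% of requests to the new model

def route_request(user_id: str) -> str:
    """Deterministically assign a user to the stable or canary model.

    Hashing the user ID (rather than sampling randomly per request) keeps
    each user's experience consistent across requests.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    if bucket < CANARY_FRACTION * 10_000:
        return "model-v2-canary"   # hypothetical canary endpoint
    return "model-v1-stable"       # hypothetical current production endpoint

# Example: tally how traffic splits across 100,000 simulated users
counts = {"model-v1-stable": 0, "model-v2-canary": 0}
for i in range(100_000):
    counts[route_request(f"user-{i}")] += 1
print(counts)  # roughly 95% stable / 5% canary
```

Raising `CANARY_FRACTION` in steps (5%, 20%, 50%, 100%) while watching metrics is the incremental rollout the text describes.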
For example, a retailer evaluates a personalized product recommendation model against historical test data, reviewing accuracy and diversity metrics. Teams also calculate metrics on live customer data over time, detecting decreased accuracy over the last two weeks. Before full rollout, the new model is released to 5% of web traffic to ensure no degradation.

Automating evaluation and canary releases reduces deployment risks. But human review remains critical to assess less quantifiable dynamics of model behavior. Rigorous pre-deployment validation provides confidence in putting models into production.

### Model Deployment

To reliably deploy ML models to production, teams need to properly package, test and track them. MLOps introduces frameworks and procedures to actively version, deploy, monitor and update models in sustainable ways.

Teams containerize models using [Docker](https://www.docker.com/), which bundles code, libraries and dependencies into a standardized unit. Containers enable smooth portability across environments.

Frameworks like [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving) and [BentoML](https://bentoml.org/) help serve predictions from deployed models via performance-optimized APIs. These frameworks handle versioning, scaling and monitoring.

Teams first deploy updated models to staging or QA environments for testing before full production rollout. Shadow or canary deployments route a sample of traffic to test model variants. Teams incrementally increase access to new models.

Teams build robust rollback procedures in case issues emerge. Rollbacks revert to the last known good model version. Integration with CI/CD pipelines simplifies redeployment if needed.

Teams carefully track model artifacts like scripts, weights, logs and metrics for each version with ML metadata tools like [MLflow](https://mlflow.org/). This maintains lineage and auditability.

For example, a retailer containerizes a product recommendation model in TensorFlow Serving and deploys it to a [Kubernetes](https://kubernetes.io/) staging cluster. After monitoring and approving performance on sample traffic, Kubernetes shifts 10% of production traffic to the new model. If no issues are detected after a few days, the new model takes over 100% of traffic. But teams keep the previous version accessible for rollback if needed.

Model deployment processes enable teams to make ML systems resilient in production by accounting for all transition states.
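From the application side, a served model is just an HTTP endpoint. Here is a minimal sketch of a client querying a TensorFlow Serving model over its REST API; the hostname, model name, and feature vector are illustrative assumptions, while the `/v1/models/<name>:predict` URL shape and the `instances`/`predictions` JSON fields follow TensorFlow Serving's documented REST format:

```python
# Hypothetical client for a TensorFlow Serving REST endpoint.
import requests

SERVING_URL = "http://tf-serving.staging.internal:8501/v1/models/recommender:predict"

def get_recommendations(user_features: list[float]) -> list[float]:
    # TF Serving expects a batch of inputs under the "instances" key
    payload = {"instances": [user_features]}
    resp = requests.post(SERVING_URL, json=payload, timeout=2.0)
    resp.raise_for_status()
    # Predictions come back as a parallel list under "predictions"
    return resp.json()["predictions"][0]

# Example call with a made-up feature vector
scores = get_recommendations([0.3, 1.0, 0.0, 5.2])
print(scores)
```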
### Infrastructure Management

MLOps teams heavily leverage [infrastructure as code (IaC)](https://www.infoworld.com/article/3271126/what-is-iac-infrastructure-as-code-explained.html) tools and robust cloud architectures to actively manage the resources needed for development, training and deployment of ML systems.

Teams use IaC tools like [Terraform](https://www.terraform.io/), [CloudFormation](https://aws.amazon.com/cloudformation/) and [Ansible](https://www.ansible.com/) to programmatically define, provision and update infrastructure in a version-controlled manner. For MLOps, teams widely use Terraform to spin up resources on [AWS](https://aws.amazon.com/), [GCP](https://cloud.google.com/) and [Azure](https://azure.microsoft.com/).

For model building and training, teams dynamically provision compute resources like GPU servers, container clusters, storage and databases through Terraform as needed by data scientists. Code encapsulates and preserves infrastructure definitions.

Containers and orchestrators like Docker and Kubernetes provide the means for teams to package models and reliably deploy them across different environments. Containers can be predictably spun up or down automatically based on demand.

By leveraging cloud elasticity, teams scale resources up and down to meet spikes in workloads like hyperparameter tuning jobs or spikes in prediction requests. [Auto-scaling](https://aws.amazon.com/autoscaling/) enables optimized cost efficiency.

Infrastructure spans on-prem, cloud and edge devices. A robust technology stack provides flexibility and resilience. Monitoring tools give teams observability into resource utilization.

For example, a Terraform config may deploy a GCP Kubernetes cluster to host trained TensorFlow models exposed as prediction microservices. The cluster scales up pods to handle increased traffic. CI/CD integration seamlessly rolls out new model containers.

Carefully managing infrastructure through IaC and monitoring enables teams to prevent bottlenecks in operationalizing ML systems at scale.

### Monitoring

MLOps teams actively maintain robust monitoring to sustain visibility into ML models deployed in production. Monitoring continuously provides insights into model and system performance so teams can rapidly detect and address issues to minimize disruption.

Teams actively monitor key model aspects, including analyzing samples of live predictions to track metrics like accuracy and the [confusion matrix](https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html) over time.

When monitoring performance, it is important for teams to profile incoming data to check for model drift - a steady decline in model accuracy over time after production deployment. Model drift can occur in one of two ways: [concept drift](https://en.wikipedia.org/wiki/Concept_drift) and data drift. Concept drift refers to a fundamental change in the relationship between the input data and the target outcomes. For instance, as the COVID-19 pandemic progressed, e-commerce and retail sites had to correct their model recommendations, since purchase data was overwhelmingly skewed towards items like hand sanitizer. Data drift describes changes in the distribution of data over time. For example, image recognition algorithms used in self-driving cars will need to account for seasonality in observing their surroundings. Teams also track application performance metrics like latency and errors for model integrations.

From an infrastructure perspective, teams monitor for capacity issues like high CPU, memory and disk utilization as well as system outages. Tools like [Prometheus](https://prometheus.io/), [Grafana](https://grafana.com/) and [Elastic](https://www.elastic.co/) enable teams to actively collect, analyze, query and visualize diverse monitoring metrics. Dashboards make dynamics highly visible.

Teams configure alerting for key monitoring metrics like accuracy declines and system faults to enable proactively responding to events that threaten reliability. For example, drops in model accuracy trigger alerts for teams to investigate potential data drift and retrain models using updated, representative data samples.

Comprehensive monitoring enables teams to maintain confidence in model and system health after deployment. It empowers teams to catch and resolve deviations preemptively through data-driven alerts and dashboards. A minimal sketch of one such data drift check appears below.
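One simple way to instrument a data drift check is to compare the distribution of a feature in recent production traffic against a training-time reference window with a two-sample statistical test. The sketch below uses SciPy's Kolmogorov-Smirnov test; the feature values, window sizes, and alert threshold are illustrative assumptions:

```python
# Hypothetical sketch of a per-feature data drift check.
import numpy as np
from scipy.stats import ks_2samp

P_VALUE_THRESHOLD = 0.01  # alert when distributions differ significantly

def check_feature_drift(reference: np.ndarray, live: np.ndarray) -> bool:
    """Return True if the live feature distribution has drifted."""
    statistic, p_value = ks_2samp(reference, live)
    print(f"KS statistic={statistic:.3f}, p-value={p_value:.4f}")
    return p_value < P_VALUE_THRESHOLD

# Simulated example: training-time reference vs. a shifted production window
rng = np.random.default_rng(seed=0)
train_window = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod_window = rng.normal(loc=0.4, scale=1.0, size=5_000)  # mean has shifted

if check_feature_drift(train_window, prod_window):
    print("Drift detected - trigger an alert / retraining pipeline")
```

In practice, a check like this would run on a schedule for each important feature, with results exported to a dashboard and wired into the alerting described above.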
Active monitoring is essential for maintaining highly available, trustworthy ML systems.

### Governance

MLOps teams actively establish proper governance practices as a critical component. Governance provides oversight into ML models to ensure they are trustworthy, ethical, and compliant. Without governance, there are significant risks of models behaving in dangerous or prohibited ways when deployed in applications and business processes.

MLOps governance employs techniques to provide transparency into model predictions, performance, and behavior throughout the ML lifecycle. Explainability methods like [SHAP](https://github.com/slundberg/shap) and [LIME](https://github.com/marcotcr/lime) help auditors understand why models make certain predictions by highlighting influential input features behind decisions. [Bias detection](https://developers.google.com/machine-learning/fairness-overview) analyzes model performance across different demographic groups defined by attributes like age, gender and ethnicity to detect any systematic skews. Teams perform rigorous testing procedures on representative datasets to validate model performance before deployment.

Once in production, teams monitor [concept drift](https://en.wikipedia.org/wiki/Concept_drift) to track whether predictive relationships change over time in ways that degrade model accuracy. Teams analyze production logs to uncover patterns in the types of errors models generate. Documentation about data provenance, development procedures, and evaluation metrics provides additional visibility.

Platforms like [Watson OpenScale](https://www.ibm.com/cloud/watson-openscale) incorporate governance capabilities like bias monitoring and explainability directly into model building, testing and production monitoring. The key focus areas of governance are transparency, fairness, and compliance. This minimizes risks of models behaving incorrectly or dangerously when integrated into business processes. Embedding governance practices into MLOps workflows enables teams to ensure trustworthy AI.

### Communication & Collaboration

MLOps actively breaks down silos and enables the free flow of information and insights between teams through all ML lifecycle stages. Tools like [MLflow](https://mlflow.org/), [Weights & Biases](https://wandb.ai/), and shared data contexts provide traceability and visibility to improve collaboration.

Teams use MLflow to systematize tracking of model experiments, versions, and artifacts. Experiments can be programmatically logged from data science notebooks and training jobs. The model registry provides a central hub for teams to store production-ready models before deployment, with metadata like descriptions, metrics, tags and lineage. Integrations with [GitHub](https://github.com/) and [GitLab](https://about.gitlab.com/) facilitate code-change triggers.

Weights & Biases provides collaborative tools tailored to ML teams. Data scientists log experiments, visualize metrics like loss curves, and share experimentation insights with colleagues. Comparison dashboards highlight model differences. Teams discuss progress and next steps.

Establishing shared data contexts - glossaries, [data dictionaries](https://en.wikipedia.org/wiki/Data_dictionary), schema references - ensures alignment on data meaning and usage across roles. Documentation aids understanding for those without direct data access. A minimal sketch of experiment tracking with MLflow appears below.
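To make the experiment-tracking workflow concrete, here is a minimal sketch using the MLflow tracking API. The experiment name, parameters, and metric values are illustrative assumptions; the `set_experiment`, `start_run`, `log_param`, and `log_metric` calls are standard MLflow:

```python
# Hypothetical sketch of logging a training run with MLflow.
import mlflow

mlflow.set_experiment("churn-prediction")

with mlflow.start_run(run_name="gbm-baseline"):
    # Log hyperparameters so colleagues can reproduce this run
    mlflow.log_param("learning_rate", 0.05)
    mlflow.log_param("n_estimators", 300)

    # ... model training happens here ...
    validation_auc = 0.91  # placeholder for a real evaluation result

    # Log evaluation metrics for comparison across runs
    mlflow.log_metric("val_auc", validation_auc)

    # Artifacts such as plots or model files can also be attached, e.g.:
    # mlflow.log_artifact("plots/roc_curve.png")
```

Every teammate can then browse these runs in the MLflow UI, compare metrics side by side, and trace any registered model back to the exact parameters that produced it.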
+ +For example, a data scientist may use Weights & Biases to analyze an anomaly detection model experiment and share the evaluation results with other team members to discuss improvements. The final model can then be registered with MLflow before handing off for deployment. + +Enabling transparency, traceability and communication via MLOps empowers teams to remove bottlenecks and accelerate delivery of impactful ML systems. + + +## Hidden Technical Debt in ML Systems + +Technical debt is an increasingly pressing issue for ML systems (see Figure 14.2). This metaphor, originally proposed in the 1990s, likens the long-term costs of quick software development to financial debt. Just as some financial debt powers beneficial growth, carefully managed technical debt enables rapid iteration. However, left unchecked, accumulating technical debt can outweigh any gains. + +![Figure 14.2: The schematic represents various components that contribute to hidden technical debt in ML systems. It shows the interconnected nature of configuration, data collection, and feature extraction, which are foundational to the ML codebase. Data verification is highlighted as a critical step that precedes the utilization of machine resource management, analysis tools, and process management tools. These components, in turn, support the serving infrastructure required to deploy ML models. Finally, monitoring is depicted as an essential but often underemphasized component that operates alongside and provides feedback to the entire system, ensuring performance and reliability. [@sculley2015hidden]](images/ai_ops/hidden_debt.png) + +### Model Boundary Erosion +Unlike traditional software, ML lacks clear boundaries between components as seen in the diagram above. This erosion of abstraction creates entanglements that exacerbate technical debt in several ways: + +### Entanglement + +Tight coupling between ML model components makes isolating changes difficult. Modifying one part causes unpredictable ripple effects throughout the system. Changing anything changes everything (also known as CACE) is a phenomenon that applies to any tweak you make to your system. Potential mitigations include decomposing the problem when possible or closely monitoring for changes in behavior to contain their impact. + +### Correction Cascades + +![Figure 14.3: The flowchart depicts the concept of correction cascades in the ML workflow, from problem statement to model deployment. The arcs represent the potential iterative corrections needed at each stage of the workflow, with different colors corresponding to distinct issues such as interacting with physical world brittleness, inadequate application-domain expertise, conflicting reward systems, and poor cross-organizational documentation. The red arrows indicate the impact of cascades, which can lead to significant revisions in the model development process, while the dotted red line represents the drastic measure of abandoning the process to restart. This visual emphasizes the complex, interconnected nature of ML system development and the importance of addressing these issues early in the development cycle to mitigate their amplifying effects downstream. [@data_cascades]](images/ai_ops/data_cascades.png) + +Building models sequentially creates risky dependencies where later models rely on earlier ones. For example, taking an existing model and fine-tuning it for a new use case seems efficient. However, this bakes in assumptions from the original model that may eventually need correction. 
There are several factors that inform the decision to build models sequentially or not:

* **Dataset size and rate of growth:** With small, static datasets, it often makes sense to fine-tune existing models. For large, growing datasets, training custom models from scratch allows more flexibility to account for new data.
* **Available computing resources:** Fine-tuning requires fewer resources than training large models from scratch. With limited resources, leveraging existing models may be the only feasible approach.

While fine-tuning can be efficient, modifying foundational components later becomes extremely costly due to the cascading effects on subsequent models. Careful thought should be given to identifying points where introducing fresh model architectures, even with large resource requirements, can avoid correction cascades down the line (see Figure 14.3). There are still scenarios where sequential model building makes sense, so the decision entails weighing these tradeoffs around efficiency, flexibility, and technical debt.

### Undeclared Consumers

Once ML model predictions are made available, many downstream systems may silently consume them as inputs for further processing. However, the original model was not designed to accommodate this broad reuse. Due to the inherent opacity of ML systems, it becomes impossible to fully analyze the impact of the model's outputs as inputs elsewhere. Changes to the model can then have expensive and dangerous consequences by breaking undiscovered dependencies.

Undeclared consumers can also enable hidden feedback loops if their outputs indirectly influence the original model's training data. Mitigations include restricting access to predictions, defining strict service contracts, and monitoring for signs of unmodeled influences. Architecting ML systems to encapsulate and isolate their effects limits the risks from unanticipated propagation.

### Data Dependency Debt

Data dependency debt refers to unstable and underutilized data dependencies, which can have detrimental and hard-to-detect repercussions. While this is a key contributor to tech debt for traditional software as well, those systems benefit from widely available tools for static analysis by compilers and linkers to identify dependencies of these types. ML systems lack similar tooling.

One mitigation for unstable data dependencies is to use versioning, which ensures the stability of inputs but comes with the cost of managing multiple sets of data and the potential for staleness. A mitigation for underutilized data dependencies is to conduct exhaustive leave-one-feature-out evaluations.

### Analysis Debt from Feedback Loops

Unlike traditional software, ML systems can change their own behavior over time, making them difficult to analyze pre-deployment. This debt manifests in feedback loops, both direct and hidden.

Direct feedback loops occur when a model influences its own future inputs, such as by recommending products to users that in turn shape future training data. Hidden loops arise indirectly between models, such as two systems that interact via real-world environments. Gradual feedback loops are especially hard to detect. These loops lead to analysis debt - the inability to fully predict how a model will act after release. They undermine pre-deployment validation by enabling unmodeled self-influence.

Careful monitoring and canary deployments help detect feedback loops. But fundamental challenges remain in understanding complex model interactions.
Architectural choices that reduce entanglement and coupling mitigate analysis debt's compounding effect. + +### Pipeline Jungles + +ML workflows often lack standardized interfaces between components. This leads teams to incrementally "glue" together pipelines with custom code. What emerges are "pipeline jungles" – tangled preprocessing steps that are brittle and resist change. Avoiding modifications to these messy pipelines causes teams to experiment through alternate prototypes. Soon, multiple ways of doing everything proliferate. The lack of abstractions and interfaces then impedes sharing, reuse, and efficiency. + +Technical debt accumulates as one-off pipelines solidify into legacy constraints. Teams sink time into managing idiosyncratic code rather than maximizing model performance. Architectural principles like modularity and encapsulation are needed to establish clean interfaces. Shared abstractions enable interchangeable components, prevent lock-in, and promote best practice diffusion across teams. Breaking free of pipeline jungles ultimately requires enforcing standards that prevent accretion of abstraction debt. The benefits of interfaces and APIs that tame complexity outweigh the transitional costs. + +### Configuration Debt +ML systems involve extensive configuration of hyperparameters, architectures, and other tuning parameters. However, configuration is often an afterthought, lacking rigor and testing. Ad hoc configurations proliferate, amplified by the many knobs available for tuning complex ML models. + +This accumulation of technical debt has several consequences. Fragile and outdated configurations lead to hidden dependencies and bugs that cause production failures. Knowledge about optimal configurations is isolated rather than shared, leading to redundant work. Reproducing and comparing results becomes difficult when configuration lacks documentation. Legacy constraints accrete as teams fear changing poorly understood configurations. + +Addressing configuration debt requires establishing standards to document, test, validate, and centrally store configurations. Investing in more automated approaches such as hyperparameter optimization and architecture search reduces dependence on manual tuning. Better configuration hygiene makes iterative improvement more tractable by preventing complexity from compounding endlessly. The key is recognizing configuration as an integral part of the ML system lifecycle rather than an ad hoc afterthought. + +### The Changing World + +ML systems operate in dynamic real-world environments. Thresholds and decisions that are initially effective become outdated as the world evolves. But legacy constraints make it difficult to adapt systems to reflect changing populations, usage patterns, and other shifting contextual factors. + +This debt manifests in two main ways. First, preset thresholds and heuristics require constant re-evaluation and tuning as their optimal values drift. Second, validating systems through static unit and integration tests fails when inputs and behaviors are moving targets. + +Responding to a changing world in real-time with legacy ML systems is challenging. Technical debt accumulates as assumptions decay. The lack of modular architecture and ability to dynamically update components without side effects exacerbates these issues. + +Mitigating this requires building in configurability, monitoring, and modular updatability. 
Online learning, where models continuously adapt, and robust feedback loops to training pipelines help automatically tune systems to the world. But anticipating and architecting for change is essential to prevent the erosion of real-world performance over time.

### Navigating Technical Debt in Early Stages

It is understandable that technical debt accumulates naturally in the early stages of model development. When aiming to build MVP models quickly, teams often lack complete information on which components will reach scale or require modification. Some deferred work is expected.

However, even scrappy initial systems should follow principles like "Flexible Foundations" to avoid painting themselves into corners:

* Modular code and reusable libraries allow components to be swapped later
* Loose coupling between models, data stores, and business logic facilitates change
* Abstraction layers hide implementation details that may shift over time
* Containerized model serving keeps options open on deployment requirements

Decisions that seem expedient in the moment can seriously limit future flexibility. For example, baking key business logic into model code rather than keeping it separate makes subsequent model changes extremely difficult.

With thoughtful design, though, it is possible to build quickly at first while retaining degrees of freedom to improve. As the system matures, prudent break points emerge where introducing fresh architectures proactively avoids massive rework down the line. This balances urgent timelines with reducing future correction cascades.

### Summary

Although financial debt is a good metaphor for understanding the tradeoffs, it differs from technical debt in its measurability: technical debt cannot be fully tracked and quantified. This makes it hard for teams to navigate the tradeoffs between moving quickly - and inherently introducing more debt - versus taking the time to pay down that debt.

The [Hidden Technical Debt of Machine Learning Systems](https://papers.nips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf) paper spreads awareness of the nuances of ML-system-specific tech debt and encourages additional development in the broad area of maintainable ML.

## Roles and Responsibilities

Given the vastness of MLOps, successfully implementing ML systems requires diverse skills and close collaboration between people with different areas of expertise. While data scientists build the core ML models, it takes cross-functional teamwork to successfully deploy these models into production environments and enable them to deliver business value in a sustainable way.

MLOps provides the framework and practices for coordinating the efforts of the various roles involved in developing, deploying and running ML systems. Bridging traditional silos between data, engineering and operations teams is key to MLOps success. Enabling seamless collaboration through the machine learning lifecycle accelerates benefit realization while ensuring the long-term reliability and performance of ML models.

We will look at some of the key roles involved in MLOps and their primary responsibilities. Understanding the breadth of skills needed to operationalize ML models provides guidance on assembling MLOps teams. It also clarifies how the workflows between different roles fit together under the overarching MLOps methodology.
### Data Engineers

Data engineers are responsible for building and maintaining the data infrastructure and pipelines that feed data to ML models. They ensure data is smoothly moved from source systems into the storage, processing, and feature engineering environments needed for ML model development and deployment. Their main responsibilities include:

* Migrating raw data from on-prem databases, sensors, and apps into cloud-based data lakes like Amazon S3 or Google Cloud Storage. This provides cost-efficient, scalable storage.
* Building data pipelines with workflow schedulers like Apache Airflow, Prefect, and dbt. These extract data from sources, transform and validate data, and load it into destinations like data warehouses, feature stores or directly for model training.
* Transforming messy raw data into structured, analysis-ready datasets. This includes handling null or malformed values, deduplicating, joining disparate data sources, aggregating data and engineering new features.
* Maintaining data infrastructure components like cloud data warehouses ([Snowflake](https://www.snowflake.com/en/data-cloud/workloads/data-warehouse/), [Redshift](https://aws.amazon.com/redshift/), [BigQuery](https://cloud.google.com/bigquery?hl=en)), data lakes, and metadata management systems. Provisioning and optimizing data processing systems.
* Establishing data versioning, backup and archival processes for ML datasets and features. Enforcing data governance policies.

For example, a manufacturing firm may use Apache Airflow pipelines to extract sensor data from PLCs on the factory floor into an Amazon S3 data lake. The data engineers would then process this raw data to filter, clean, and join it with product metadata. These pipeline outputs would then load into a Snowflake data warehouse from which features can be read for model training and prediction.

The data engineering team builds and sustains the data foundation for reliable model development and operations. Their work enables data scientists and ML engineers to focus on building, training and deploying ML models at scale.

### Data Scientists

Data scientists focus on the research, experimentation, development and continuous improvement of ML models. They leverage their expertise in statistics, modeling and algorithms to create high-performing models. Their main responsibilities include:

* Working with business and data teams to identify opportunities where ML can add value. Framing the problem and defining success metrics.
* Performing exploratory data analysis to understand relationships in data and derive insights. Identifying relevant features for modeling.
* Researching and experimenting with different ML algorithms and model architectures based on the problem and data characteristics. Leveraging libraries like TensorFlow, PyTorch, Keras.
* Training and fine-tuning models by tuning hyperparameters, adjusting neural network architectures, feature engineering, etc. to maximize performance.
* Evaluating model performance through metrics like accuracy, AUC, F1 scores. Performing error analysis to identify areas for improvement.
* Developing new model versions by incorporating new data, testing different approaches, and optimizing model behavior. Maintaining documentation and lineage for models.

For example, a data scientist may leverage TensorFlow and [TensorFlow Probability](https://www.tensorflow.org/probability) to develop a demand forecasting model for retail inventory planning.
They would iterate on different sequence models like LSTMs and experiment with features derived from product, sales and seasonal data. The model would be evaluated based on error metrics versus actual demand before deployment. The data scientist monitors performance and retrains/enhances the model as new data comes in.

Data scientists drive model creation, improvement and innovation through their expertise in ML techniques. They collaborate closely with other roles to ensure models create maximum business impact.

### ML Engineers

ML engineers take the models that data scientists develop and enable them to be productized and deployed at scale. Their expertise makes models reliably serve predictions in applications and business processes. Their main responsibilities include:

* Taking prototype models from data scientists and hardening them for production environments through coding best practices.
* Building APIs and microservices for model deployment using tools like [Flask](https://flask.palletsprojects.com/en/3.0.x/), [FastAPI](https://fastapi.tiangolo.com/). Containerizing models with Docker.
* Managing model versions and staging new models into production using CI/CD pipelines. Implementing canary releases, A/B tests, and rollback procedures.
* Optimizing model performance for high scalability, low latency and cost-efficiency. Leveraging compression, quantization, multi-model serving.
* Monitoring models once in production and ensuring continued reliability and accuracy. Retraining models periodically.

For example, an ML engineer may take a TensorFlow fraud detection model developed by data scientists and containerize it using TensorFlow Serving for scalable deployment. The model would be integrated into the company's transaction processing pipeline via APIs. The ML engineer implements a model registry and CI/CD pipeline using MLflow and Jenkins to reliably deploy model updates. The ML engineer would then monitor the running model for continued performance using tools like Prometheus and Grafana. If model accuracy drops, they initiate retraining and deployment of a new model version.

The ML engineering team enables data science models to progress smoothly into sustainable and robust production systems. Their expertise in building modular, monitored systems delivers continuous business value.

### DevOps Engineers

DevOps engineers enable MLOps by building and managing the underlying infrastructure for developing, deploying, and monitoring ML models. They provide the cloud architecture and automation pipelines. Their main responsibilities include:

* Provisioning and managing cloud infrastructure for ML workflows using IaC tools like Terraform, Docker, Kubernetes.
* Developing CI/CD pipelines for model retraining, validation, and deployment. Integrating ML tools into the pipeline like MLflow, Kubeflow.
* Monitoring model and infrastructure performance using tools like [Prometheus](https://prometheus.io/), [Grafana](https://grafana.com/), [ELK stack](https://aws.amazon.com/what-is/elk-stack/). Building alerts and dashboards.
* Implementing governance practices around model development, testing, and promotion. Enabling reproducibility and traceability.
* Embedding ML models within applications. Exposing models via APIs and microservices for integration.
* Optimizing infrastructure performance and costs. Leveraging autoscaling, spot instances, and availability across regions.
For example, a DevOps engineer provisions a Kubernetes cluster on AWS using Terraform to run ML training jobs and online deployment. They build a CI/CD pipeline in Jenkins which triggers model retraining if new data is available. After automated testing, the model is registered with MLflow and deployed in the Kubernetes cluster. The engineer then monitors cluster health, container resource usage, and API latency using Prometheus and Grafana.

The DevOps team enables rapid experimentation and reliable deployments for ML through expertise in cloud, automation, and monitoring. Their work maximizes model impact while minimizing technical debt.

### Project Managers

Project managers play a vital role in MLOps by coordinating the activities between the different teams involved in delivering ML projects. They help drive alignment, accountability, and accelerated results. Their main responsibilities include:

* Working with stakeholders to define project goals, success metrics, timelines and budgets. Outlining specifications and scope.
* Creating a project plan spanning activities like data acquisition, model development, infrastructure setup, deployment, and monitoring.
* Coordinating design, development and testing efforts between data engineers, data scientists, ML engineers and DevOps roles.
* Tracking progress and milestones. Identifying roadblocks and resolving them through corrective actions. Managing risks and issues.
* Facilitating communication through status reports, meetings, workshops, documentation. Enabling seamless collaboration.
* Driving adherence to timelines and budget. Escalating anticipated overruns or shortfalls for mitigation.

For example, a project manager would create a project plan for the development and ongoing enhancement of a customer churn prediction model. They coordinate between data engineers building data pipelines, data scientists experimenting with models, ML engineers productionalizing models, and DevOps setting up deployment infrastructure. The project manager tracks progress via milestones like dataset preparation, model prototyping, deployment, and monitoring. They surface any risks, delays or budget issues to enact preventive solutions.

Skilled project managers enable MLOps teams to work synergistically to deliver maximum business value from ML investments rapidly. Their leadership and organizational skills keep diverse teams aligned.

## Embedded System Challenges

We will briefly review the challenges with embedded systems so that it sets the context for the specific challenges that emerge with embedded MLOps, which we will discuss in the following section.

### Limited Compute Resources

Embedded devices like microcontrollers and mobile phones have much more constrained compute power compared to data center machines or GPUs.
A typical microcontroller may have only kilobytes of RAM, a CPU clocked in megahertz, and no GPU. For example, a microcontroller in a smartwatch may only have a 32-bit processor running at 50MHz with 256KB of RAM. This can support relatively simple ML models like small linear regressions or random forests, but more complex deep neural networks would be infeasible. Strategies to mitigate this include quantization, pruning, efficient model architectures, and offloading certain computations to the cloud when connectivity allows. -## Deployment Strategies +### Constrained Memory -Explanation: Here, readers will be introduced to various deployment strategies that facilitate a smooth transition from development to production. It discusses approaches such as blue-green deployments, canary releases, and rolling deployments, which can help in maintaining system stability and minimizing downtime during updates. +With limited memory, storing large ML models and datasets directly on embedded devices is often infeasible. For example, a deep neural network model can easily take hundreds of MB, which exceeds the storage capacity of many embedded systems. Consider a wildlife camera that captures images to detect animals: it may have only a 2GB memory card, far too small for such a model. Consequently, memory usage must be optimized through methods like weight compression, lower-precision numerics, and streaming inference pipelines. -- Overview of different deployment strategies -- Blue-green deployments: Definition and benefits -- Canary releases: Phased rollouts and monitoring -- Rolling deployments: Ensuring continuous service availability -- Strategy selection: Factors to consider +### Intermittent Connectivity -## Workflow Automation +Many embedded devices operate in remote environments without reliable internet connectivity. This means we cannot rely on constant cloud access for convenient retraining, monitoring, and deployment. Instead, we need smart scheduling and caching strategies to optimize for intermittent connections. For example, a model predicting crop yield on a remote farm may need to make predictions daily, but only have connectivity to the cloud once a week when the farmer drives into town. The model needs to operate independently in between connections. -Explanation: Automation is at the heart of MLOps, helping to streamline workflows and enhance efficiency. This subsection highlights the significance of workflow automation in embedded MLOps, discussing various strategies and techniques for automating tasks such as testing, deployment, and monitoring, fostering a faster and error-free development lifecycle. +### Power Limitations -- Automated testing: unit tests, integration tests -- Automated deployment: scripting, configuration management -- Continuous monitoring: setting up automated alerts and dashboards -- Benefits of workflow automation: speed, reliability, repeatability +Embedded devices like phones, wearables, and remote sensors are battery-powered. Continual inference and communication can quickly drain those batteries, limiting functionality. For example, a smart collar tagging endangered animals runs on a small battery. Continuously running a GPS tracking model would drain the battery within days. The collar has to carefully schedule when to activate the model. Thus, embedded ML has to carefully manage tasks to conserve power.
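+
+To make the power constraint concrete, below is a minimal sketch of duty-cycled inference for a battery-powered tracker like the collar above. The sensor read and model are stand-in stubs, and a real device would use hardware deep-sleep rather than `time.sleep`:
+
+```python
+# Sketch: duty-cycled inference to conserve battery on an animal tracker.
+# read_gps() and run_model() are stand-ins for real device APIs.
+import random
+import time
+
+INFERENCE_PERIOD_S = 15 * 60  # assumed power budget: one inference per 15 minutes
+
+def read_gps() -> tuple[float, float]:
+    # Stand-in for a real GPS driver returning (latitude, longitude).
+    return (47.60 + random.random() * 0.01, -122.30 + random.random() * 0.01)
+
+def run_model(fix: tuple[float, float]) -> str:
+    # Stand-in for a small quantized on-device activity model.
+    return "moving" if random.random() > 0.5 else "resting"
+
+cache: list[str] = []
+while True:
+    cache.append(run_model(read_gps()))  # infer, then cache results for later sync
+    time.sleep(INFERENCE_PERIOD_S)       # a real MCU would deep-sleep here instead
+```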
Mitigation techniques include optimized hardware accelerators, prediction caching, and adaptive model execution. -## Model Versioning +### Fleet Management -Explanation: Model versioning is a pivotal aspect of MLOps, facilitating the tracking and management of different versions of machine learning models throughout their lifecycle. This subsection emphasizes the importance of model versioning in embedded systems, where memory and computational resources are limited, offering strategies for effective version management and rollback. +For mass-produced embedded devices, there can be millions of units deployed in the field for which updates must be orchestrated. Hypothetically, updating a fraud detection model on 100 million (future smart) credit cards requires securely pushing updates to each distributed device rather than to a centralized data center. Such distributed scale makes fleet-wide management much harder than managing a centralized server cluster. It requires intelligent protocols for over-the-air updates, handling connectivity issues, and monitoring resource constraints across devices. -- Importance of versioning in machine learning pipelines -- Tools for model versioning: DVC, MLflow -- Strategies for version control: naming conventions, metadata tagging -- Rollback strategies: handling model regressions and rollbacks +### On-Device Data Collection + +Collecting useful training data requires engineering both the on-device sensors and the software pipelines, unlike on servers where we can pull data from external sources. Challenges include handling sensor noise: for example, sensors on an industrial machine that detect vibrations and temperature to predict maintenance needs require careful tuning of sensor placement and sampling rates to capture useful data. -## Model Monitoring and Maintenance +### Device-Specific Personalization + +Adapting ML models to specific devices and users is important but poses privacy challenges. For example, a smart speaker learns an individual user's voice patterns and speech cadence to improve recognition accuracy, all while protecting privacy. On-device learning allows personalization without transmitting as much private data, but balancing model improvement, privacy preservation, and device constraints requires novel techniques. -Explanation: The process of monitoring and maintaining deployed models is crucial to ensure their long-term performance and reliability. This subsection underscores the significance of proactive monitoring and maintenance in embedded systems, discussing methodologies for monitoring model health, performance metrics, and implementing routine maintenance tasks to ensure optimal functionality. +### Safety Considerations -- The importance of monitoring deployed AI models -- Setting up monitoring systems: tools and techniques -- Tracking model performance: accuracy, latency, resource usage -- Maintenance strategies: periodic updates, fine-tuning -- Alerts and notifications: Setting up mechanisms for timely responses to issues -- Over the air updates -- Responding to anomalies: troubleshooting and resolution strategies +For embedded ML in safety-critical systems like self-driving vehicles, there are serious safety risks if models are not engineered carefully. Self-driving cars must undergo extensive track testing in simulated rain, snow, and obstacle scenarios to ensure safe operation before deployment. This requires extensive validation, fail-safes, simulators, and standards compliance.
-## Security and Compliance +### Diverse Hardware Targets -Explanation: Security and compliance are paramount in MLOps, safeguarding sensitive data and ensuring adherence to regulatory requirements. This subsection illuminates the critical role of implementing security measures and ensuring compliance in embedded MLOps, offering insights into best practices for data protection, access control, and regulatory adherence. +There is a diverse range of embedded processors, including ARM, x86, specialized AI accelerators, and FPGAs. Supporting this heterogeneity makes deployment challenging and calls for strategies like standardized frameworks, extensive testing, and model tuning for each platform. For example, an object detection model needs efficient implementations across embedded devices like a Raspberry Pi, Nvidia Jetson, and Google Edge TPU. -- Security considerations in embedded MLOps: data encryption, secure communications -- Compliance requirements: GDPR, HIPAA, and other regulations -- Strategies for ensuring compliance: documentation, audits, training -- Tools for security and compliance management: SIEM systems, compliance management platforms +### Testing Coverage + +Rigorously testing edge cases is difficult with constrained embedded resources for simulation, yet thorough testing is critical in systems like self-driving cars: exhaustively testing an autopilot model requires millions of simulated kilometers that expose it to extremely rare events like sensor failures. Therefore, strategies like synthetic data generation, distributed simulation, and chaos engineering help improve coverage. + +### Concept Drift Detection + +With limited monitoring data from each remote device, detecting changes in the input data over time is much harder. Drift can lead to degraded model performance, so lightweight methods are needed to identify when retraining is necessary. For example, a model predicting power grid loads shows declining performance as usage patterns change over time; with only local device data, this trend is difficult to spot. + +## Traditional MLOps vs. Embedded MLOps + +In traditional MLOps, ML models are typically deployed in cloud-based or server environments, where resources like computing power and memory are abundant. These environments facilitate the smooth operation of complex models that require significant computational resources. For instance, a cloud-based image recognition model might be used by a social media platform to tag photos with relevant labels automatically. In this case, the model can leverage the extensive resources available in the cloud to process vast amounts of data efficiently. + +On the other hand, embedded MLOps involves deploying ML models on embedded systems, specialized computing systems designed to perform specific functions within larger systems. Embedded systems are typically characterized by their limited computational resources and power. For example, an ML model might be embedded in a smart thermostat to optimize heating and cooling based on the user's preferences and habits. In this case, the model must be optimized to run efficiently on the thermostat's limited hardware, without compromising its performance or accuracy. + +The key difference between traditional and embedded MLOps lies in the resource constraints of embedded systems. While traditional MLOps can leverage abundant cloud or server resources, embedded MLOps must contend with the limitations of the hardware on which the model is deployed.
This requires careful optimization and fine-tuning of the model to ensure it can deliver accurate and valuable insights within the constraints of the embedded system. + +Furthermore, embedded MLOps must consider the unique challenges posed by integrating ML models with other components of the embedded system. For example, the model must be compatible with the system's software and hardware and must be able to interface seamlessly with other components, such as sensors or actuators. This requires a deep understanding of both ML and embedded systems, as well as close collaboration between data scientists, engineers, and other stakeholders. + +So, while traditional MLOps and embedded MLOps share the common goal of deploying and maintaining ML models in production environments, the unique challenges posed by embedded systems require a specialized approach: one that carefully balances the need for model accuracy and performance against the constraints of the hardware on which the model is deployed. + +To streamline our discussion, we will group the MLOps subtopics under broader categories. This structure will help you understand how different aspects of MLOps are interconnected and why each is important for the efficient operation of ML systems as we discuss the challenges in the context of embedded systems. + +* Model Lifecycle Management + * Data Management: Handling data ingestion, validation, and version control. + * Model Training: Techniques and practices for effective and scalable model training. + * Model Evaluation: Strategies for testing and validating model performance. + * Model Deployment: Approaches for deploying models into production environments. + +* Development and Operations Integration + * CI/CD Pipelines: Integrating ML models into continuous integration and continuous deployment pipelines. + * Infrastructure Management: Setting up and maintaining the infrastructure required for training and deploying models. + * Communication & Collaboration: Ensuring smooth communication and collaboration practices between data scientists, ML engineers, and operations teams. + +* Operational Excellence + * Monitoring: Techniques for monitoring model performance, data drift, and operational health. + * Governance: Implementing policies for model auditability, compliance, and ethical considerations. + +### Model Lifecycle Management + +![Figure 14.4: This diagram presents an overview of Model Lifecycle Management in an MLOps context, illustrating the flow from development to deployment and monitoring. The process begins with ML Development, where code and configurations are version-controlled. Data and model management are central to the process, involving datasets and feature repositories. Continuous training, model conversion, and model registry are key stages in the operationalization of training. Model deployment includes serving the model and managing serving logs. Alerting mechanisms are in place to flag issues, which feed into continuous monitoring to ensure model performance and reliability over time.
This integrated approach ensures that models are not only developed but also maintained effectively throughout their lifecycle.](images/ai_ops/mlops_flow.png) + +#### Data Management + +In traditional centralized MLOps, data is aggregated into large datasets and data lakes, then processed on cloud or on-prem servers. However, embedded MLOps relies on decentralized data from local on-device sensors. Devices collect smaller batches of incremental data, often noisy and unstructured. With connectivity constraints, this data cannot always be instantly transmitted to the cloud and needs to be intelligently cached and processed at the edge. + +Embedded devices can only preprocess and clean data minimally before transmission due to limited on-device compute. Early filtering and processing occur at edge gateways to reduce transmission loads. While cloud storage is still leveraged, more processing and storage happen at the edge to account for intermittent connectivity. Devices identify and transmit only the most critical subsets of data to the cloud. + +Labeling also faces challenges without centralized data access, requiring more automated techniques like federated learning where devices collaboratively label peers' data. With personal edge devices, data privacy and regulations are critical concerns. Data collection, transmission, and storage must be secure and compliant. + +For instance, a smartwatch may collect step count, heart rate, and GPS coordinates throughout the day. This data is cached locally and transmitted to an edge gateway when WiFi is available. The gateway processes and filters data before syncing relevant subsets with the cloud platform to retrain models. + +#### Model Training + +In traditional centralized MLOps, models are trained using abundant data via deep learning on high-powered cloud GPU servers. However, embedded MLOps faces severe constraints on model complexity, data availability, and compute resources for training. + +The volume of aggregated data is much lower, often requiring techniques like federated learning across devices to create training sets. The specialized nature of edge data also limits the availability of relevant public datasets for pre-training. With privacy concerns, data samples need to be tightly controlled and anonymized where possible. + +Furthermore, the models themselves need to use simplified architectures optimized for low-power edge hardware. Given the compute limitations, there is no access to high-end GPUs for intensive deep learning. Training leverages lower-powered edge servers and clusters with distributed approaches to spread the load. + +![Figure 14.5: The diagram illustrates the concept of transfer learning in model training within an MLOps framework. It showcases a neural network where the initial layers (W_A1 to W_A4), responsible for general feature extraction, are frozen (indicated by the green dashed line), meaning their weights are not updated during training. This reuse of pre-trained layers accelerates learning by utilizing knowledge gained from previous tasks. The latter layers (W_A5 to W_A7), depicted beyond the blue dashed line, are fine-tuned for the specific task at hand, focusing on task-specific feature learning. This approach allows the model to adapt to the new task using fewer resources and potentially achieve higher performance on specialized tasks by reusing the general features learned from a broader dataset.](images/ai_ops/transfer_learning.png) + +To mitigate data scarcity and irregularity, strategies like transfer learning become essential (see Figure 14.5).
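+
+As a minimal sketch of the pattern shown in Figure 14.5, the snippet below freezes a generic pre-trained backbone and attaches a small task-specific head. The base model, input size, and class count are illustrative assumptions, not a prescribed recipe:
+
+```python
+# Sketch: transfer learning by freezing a pre-trained feature extractor
+# (the W_A1..W_A4 layers in Figure 14.5) and training a new task head.
+import tensorflow as tf
+
+base = tf.keras.applications.MobileNetV2(
+    input_shape=(96, 96, 3), include_top=False, weights="imagenet")
+base.trainable = False  # freeze the general-purpose layers
+
+model = tf.keras.Sequential([
+    base,
+    tf.keras.layers.GlobalAveragePooling2D(),
+    tf.keras.layers.Dense(4, activation="softmax"),  # task-specific classes
+])
+model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
+# model.fit(small_edge_dataset, epochs=5)  # fine-tune on limited domain data
+```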
In transfer learning, models pre-train on large public datasets, then fine-tune on limited domain-specific edge data. Even incremental on-device learning to customize models helps overcome the decentralized nature of embedded data. The lack of broad labeled data also motivates semi-supervised techniques. + +For example, a smart home assistant may pre-train an audio recognition model on public YouTube clips, which bootstraps it with general knowledge. It is then fine-tuned on a small sample of home data to classify customized appliances and events, specializing the model. The model is finally distilled into a lightweight neural network optimized for microphone-enabled devices across the home. + +So embedded MLOps faces acute challenges in constructing training datasets, designing efficient models, and distributing compute for model development compared to traditional settings. Careful adaptation, such as transfer learning and distributed training, is required to train models given the embedded constraints. + +#### Model Evaluation + +In traditional centralized MLOps, models are evaluated primarily on accuracy metrics using holdout test datasets. However, embedded MLOps requires more holistic evaluation accounting for system constraints beyond just accuracy. + +Models need to be tested early and often on real deployed edge hardware covering diverse configurations. In addition to accuracy, factors like latency, CPU usage, memory footprint, and power consumption are critical evaluation criteria. Models are selected based on tradeoffs between these metrics to meet edge device constraints. + +Data drift must also be monitored: models trained on cloud data can degrade in accuracy over time on local edge data. Embedded data often has more variability than centralized training sets, so evaluating models across diverse operational edge data samples is key. However, obtaining the data needed to monitor drift can be challenging when devices are deployed in the wild and communication is a barrier. + +Ongoing monitoring provides visibility into real-world performance post-deployment, revealing bottlenecks not caught during testing. For instance, a smart camera model update may be canary tested on 100 cameras first and rolled back if degraded accuracy is observed, before expanding to all 5000 cameras. + +#### Model Deployment + +In traditional MLOps, new model versions are directly deployed onto servers via API endpoints. However, embedded devices require optimized delivery mechanisms to receive updated models. Over-the-air (OTA) updates provide a standardized approach to wirelessly distribute new software or firmware releases to embedded devices. Rather than direct API access, OTA packages allow remotely deploying models and dependencies as pre-built bundles. As an alternative, [federated learning](@sec-fl) allows model updates without direct access to raw training data. This decentralized approach has potential for continuous model improvement, but currently lacks robust MLOps platforms. + +For deeply embedded devices lacking connectivity, model delivery relies on physical interfaces like USB or UART serial connections. The model packaging still follows similar principles to OTA updates, but the deployment mechanism is tailored to the capabilities of the edge hardware. Moreover, specialized OTA protocols optimized for IoT networks are often used rather than standard WiFi or Bluetooth protocols. Key factors include efficiency, reliability, security, and telemetry like progress tracking.
Solutions like [Mender.io](https://mender.io/) provide embedded-focused OTA services handling differential updates across device fleets. + +### Development and Operations Integration + +#### CI/CD Pipelines + +In traditional MLOps, robust CI/CD infrastructure like Jenkins and Kubernetes enables automating pipelines for large-scale model deployment. However, embedded MLOps lacks this centralized infrastructure and needs more tailored CI/CD workflows for edge devices. + +Building CI/CD pipelines must account for a fragmented landscape of diverse hardware, firmware versions, and connectivity constraints. There is no standard platform on which to orchestrate pipelines, and tooling support is more limited. + +Testing needs to cover this wide spectrum of target embedded devices early, which is difficult without centralized access. Companies must invest significant effort into acquiring and managing test infrastructure across the heterogeneous embedded ecosystem. + +Over-the-air updates require setting up specialized servers to securely distribute model bundles to devices in the field. Rollout and rollback procedures must be carefully tailored for particular device families. + +With traditional CI/CD tools less applicable, embedded MLOps relies more on custom scripts and integration. Companies take varied approaches, from open-source frameworks to fully in-house solutions. Tight integration between developers, edge engineers, and end customers establishes trusted release processes. + +Therefore, embedded MLOps can't leverage centralized cloud infrastructure for CI/CD. Companies cobble together custom pipelines, testing infrastructure, and OTA delivery to deploy models across fragmented and disconnected edge systems. + +#### Infrastructure Management + +In traditional centralized MLOps, infrastructure entails provisioning cloud servers, GPUs, and high-bandwidth networks for intensive workloads like model training and serving predictions at scale. However, embedded MLOps requires more heterogeneous infrastructure spanning edge devices, gateways, and the cloud. + +Edge devices like sensors capture and preprocess data locally before intermittent transmission to avoid overloading networks. Gateways aggregate and process data from devices before sending select subsets to the cloud for training and analysis. The cloud provides centralized management and supplemental compute. + +This infrastructure needs tight integration, balancing processing and communication loads. Network bandwidth is limited, requiring careful data filtering and compression. Edge compute capabilities are modest compared to the cloud, imposing optimization constraints. + +Managing secure OTA updates across large device fleets presents challenges at the edge. Rollouts must be incremental and rollback-ready for quick mitigation (a sketch of staged rollout appears at the end of this subsection). Updating edge infrastructure requires coordination given decentralized environments. + +For example, an industrial plant may perform basic signal processing on sensors before sending data to an on-prem gateway. The gateway handles data aggregation, infrastructure monitoring, and OTA updates. Only curated data is transmitted to the cloud for advanced analytics and model retraining. + +In summary, embedded MLOps requires holistic management of distributed infrastructure spanning the constrained edge, gateways, and centralized cloud. Workloads are balanced across tiers while accounting for connectivity, compute, and security challenges.
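+
+To illustrate the incremental, rollback-ready rollouts described above, here is a minimal sketch of a deterministic staged rollout: each device hashes to a stable bucket, and a rollout wave covers the buckets below a cutoff. The hashing scheme and percentages are illustrative assumptions, not any particular vendor's protocol:
+
+```python
+# Sketch: deterministic staged rollout of an OTA model update across a fleet.
+import hashlib
+
+def in_rollout(device_id: str, rollout_percent: float) -> bool:
+    digest = hashlib.sha256(device_id.encode()).digest()
+    bucket = digest[0] * 100 / 255  # stable 0-100 bucket per device
+    return bucket < rollout_percent
+
+fleet = [f"device-{i:05d}" for i in range(10_000)]
+canary = [d for d in fleet if in_rollout(d, 1.0)]  # ~1% canary wave
+print(f"canary wave size: {len(canary)}")
+# If canary metrics look healthy, raise the cutoff toward 100 percent;
+# if not, halt the rollout and the rest of the fleet keeps the old model.
+```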
+ +#### Communication & Collaboration + +In traditional MLOps, collaboration tends to be centered around data scientists, ML engineers, and DevOps teams. But embedded MLOps requires tighter cross-functional coordination between additional roles to address system constraints. + +Edge engineers optimize model architectures for target hardware environments. They provide feedback to data scientists during development so models fit device capabilities early on. Similarly, product teams define operational requirements informed by end-user contexts. + +With more stakeholders across the embedded ecosystem, communication channels must facilitate information sharing between centralized and remote teams. Issue tracking and project management ensure alignment. + +Collaborative tools help teams optimize models for particular devices. Data scientists can log issues replicated from field devices so models specialize on niche data. Remote device access aids debugging and data collection. + +For example, data scientists may collaborate with field teams managing fleets of wind turbines to retrieve operational data samples. This data is used to specialize models in detecting anomalies specific to that turbine class. Model updates are first tested in simulation, then reviewed by engineers before field deployment. + +In essence, embedded MLOps mandates continuous coordination between data scientists, engineers, end customers, and other stakeholders throughout the ML lifecycle. Only through close collaboration can models be tailored and optimized for targeted edge devices. + +### Operational Excellence + +#### Monitoring + +In traditional MLOps, monitoring focuses on tracking model accuracy, performance metrics, and data drift centrally. But embedded MLOps must account for decentralized monitoring across diverse edge devices and environments. + +Edge devices require optimized data collection to transmit key monitoring metrics without overloading networks. Metrics help assess model performance, data patterns, resource usage, and other behaviors on remote devices. + +With limited connectivity, more analysis occurs at the edge before aggregating insights centrally. Gateways play a key role in monitoring fleet health and coordinating software updates. Confirmed indicators are eventually propagated to the cloud. + +Broad device coverage is challenging but critical. Issues specific to certain device types may arise, so monitoring needs to cover the full spectrum. Canary deployments help trial monitoring processes before scaling. + +Anomaly detection identifies incidents requiring rolling back models or retraining on new data. But interpreting alerts requires understanding unique device contexts based on input from engineers and customers. + +For example, an automaker may monitor autonomous vehicles for indicators of model degradation using caching, aggregation, and real-time streams. Engineers assess when identified anomalies warrant OTA updates to improve models based on factors like location and vehicle age. + +Embedded MLOps monitoring provides observability into model and system performance across decentralized edge environments. Careful data collection, analysis, and collaboration deliver meaningful insights to maintain reliability. + +#### Governance + +In traditional MLOps, governance focuses on model explainability, fairness, and compliance for centralized systems. But embedded MLOps must also address device-level governance challenges around data privacy, security, and safety.
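+
+The next paragraphs discuss device-level controls such as anonymization before transmission. As a minimal illustration of on-device scrubbing, the sketch below hashes identifiers and coarsens location before upload; the field names, salt handling, and rounding granularity are hypothetical:
+
+```python
+# Sketch: scrubbing a telemetry record on-device before cloud upload.
+import hashlib
+
+SALT = "per-device-secret"  # in practice, provisioned securely per device
+
+def scrub(record: dict) -> dict:
+    return {
+        # replace the stable user ID with a salted one-way hash
+        "user": hashlib.sha256((SALT + record["user_id"]).encode()).hexdigest()[:16],
+        # coarsen GPS to roughly 1 km so exact locations are never transmitted
+        "lat": round(record["lat"], 2),
+        "lon": round(record["lon"], 2),
+        "heart_rate": record["heart_rate"],  # keep the signal the model needs
+    }
+
+print(scrub({"user_id": "alice", "lat": 40.7128, "lon": -74.0060, "heart_rate": 72}))
+```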
+ +With sensors collecting personal and sensitive data, local data governance on devices is critical. Data access controls, anonymization, and encrypted caching help address privacy risks and compliance with regulations like HIPAA and GDPR. Updates must maintain security patches and settings. + +Safety governance considers the physical impacts of flawed device behavior. Failures could cause unsafe conditions in vehicles, factories, and critical systems. Redundancy, fail-safes, and warning systems help mitigate risks. + +Traditional governance concerns like bias monitoring and model explainability remain imperative but are harder to implement for embedded AI: peeking into black-box models on low-power devices poses challenges. + +For example, a medical device may scrub personal data on-device before transmission. Strict data governance protocols approve model updates. Model explainability is limited, so the focus is on detecting anomalous behavior. Backup systems prevent failures. + +In essence, embedded MLOps governance must span the dimensions of privacy, security, safety, transparency, and ethics. Specialized techniques and team collaboration are needed to establish trust and accountability within decentralized environments. + +### Comparison + +Here is a comparison table highlighting similarities and differences between Traditional MLOps and Embedded MLOps based on what we have learned thus far: + +| Area | Traditional MLOps | Embedded MLOps | |-|-|-| | Data Management | Large datasets, data lakes, feature stores | On-device data capture, edge caching and processing | | Model Development | Leverage deep learning, complex neural nets, GPU training | Constraints on model complexity, need for optimization | | Deployment | Server clusters, cloud deployment, low latency at scale | OTA deployment to devices, intermittent connectivity | | Monitoring | Dashboards, logs, alerts for cloud model performance | On-device monitoring of predictions, resource usage | | Retraining | Retrain models on new data | Federated learning from devices, edge retraining | | Infrastructure | Dynamic cloud infrastructure | Heterogeneous edge/cloud infrastructure | | Collaboration | Shared experiment tracking and model registry | Collaboration for device-specific optimization | + +So while embedded MLOps shares foundational MLOps principles, it must tailor workflows and infrastructure to the unique constraints of resource-constrained edge devices. + +## Commercial Offerings + +While they are no replacement for understanding the principles, an increasing number of commercial offerings help ease the burden of building ML pipelines and integrating tools to build, test, deploy, and monitor ML models in production. + +### Traditional MLOps + +Google, Microsoft, and Amazon all offer their own version of managed ML services. These include services that manage model training and experimentation, model hosting and scaling, and monitoring. These offerings are available via an API and client SDKs, as well as through web UIs. While it is possible to build your own end-to-end MLOps solution using pieces from each, the greatest ease-of-use benefits come from staying within a single provider ecosystem to take advantage of interservice integrations. + +We will provide a quick overview of the services offered that fit into each part of the MLOps life cycle described above, providing examples of offerings from different providers.
The space is moving very quickly; new companies and products are entering the scene rapidly, and these examples are not meant to serve as an endorsement of any particular company's offering. + +#### Data Management + +Data storage and versioning are table stakes for any commercial offering, and most take advantage of existing general-purpose storage solutions such as S3. Others use more specialized options such as git-based storage (example: [Hugging Face's Dataset Hub](https://huggingface.co/datasets)). This is an area where providers make it easy to support their competitors' data storage options, as they don't want this to be a barrier to adoption of the rest of their MLOps services. For example, Vertex AI's training pipeline seamlessly supports datasets stored in S3, Google Cloud Buckets, or Hugging Face's Dataset Hub. + +#### Model Training + +Managed training services are where cloud providers really shine, as they provide on-demand access to hardware that is out of reach for most smaller companies. They bill only for hardware during training time, and this puts GPU-accelerated training within reach of even the smallest developer teams. The level of control that developers have over their training workflow can vary widely depending on their needs. Some providers offer services that provide little more than access to the resources and rely on the developer to manage the training loop, logging, and model storage themselves. Other services are as simple as pointing to a base model and a labeled dataset to kick off a fully managed fine-tuning job (example: [Vertex AI Fine Tuning](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models)). + +A word of warning: as of 2023, demand for GPU hardware well exceeds supply, and as a result cloud providers are rationing access to their GPUs, which in some data center regions may be unavailable or require long-term contracts. + +#### Model Evaluation + +Model evaluation tasks typically involve monitoring the accuracy, latency, and resource usage of models in both the testing and production phases. Unlike in embedded systems, ML models deployed to the cloud benefit from constant internet connectivity and virtually unlimited logging capacity. As a result, it is often feasible to capture and log every request and response, which makes replaying or generating synthetic requests to compare different models and versions tractable. + +Some providers also offer services that automate experiment tracking when modifying model hyperparameters, recording the runs, performance, and generated artifacts from these training runs (example: [Weights & Biases](https://wandb.ai/)). + +#### Model Deployment + +Each provider typically has a service referred to as a “model registry”, where trained models are stored and accessed. Often these registries also provide access to base models that are either open source or provided by larger technology companies (or in some cases, like [LLAMA](https://ai.meta.com/llama/), both!). These model registries are a common place to compare all of the models and their versions together, enabling easy decisions on which to pick for a given use case (example: [Vertex AI's model registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction)). + +From the model registry, it is quick and simple to deploy a model to an inference endpoint, which handles the resource provisioning, model weight downloading, and hosting of a given model.
These services typically give access to the model via a REST API to which inference requests can be sent. Depending on the model type, the specific required resources can be configured, such as which type of GPU accelerator may be needed to hit the desired performance. Some providers may also offer serverless inference or batch inference options that do not require a persistent endpoint for accessing the model (example: [AWS SageMaker Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html)). + +### Embedded MLOps + +Despite the proliferation of new MLOps tools in response to the increase in demand, the challenges described earlier have constrained the availability of such tools in embedded systems environments. More recently, new tools such as Edge Impulse [@janapa2023edge] have made the development process somewhat easier, as we’ll describe below. + +#### Edge Impulse + +[Edge Impulse](https://edgeimpulse.com/) is an end-to-end development platform for creating and deploying machine learning models onto edge devices such as microcontrollers and small processors. It aims to make embedded machine learning more accessible to software developers through its easy-to-use web interface and integrated tools for data collection, model development, optimization, and deployment. Its key capabilities include: + +- Intuitive drag-and-drop workflow for building ML models without any coding required +- Tools for acquiring, labeling, visualizing, and preprocessing data from sensors +- Choice of model architectures, including neural networks and unsupervised learning +- Model optimization techniques to balance performance metrics and hardware constraints +- Seamless deployment onto edge devices through compilation, SDKs, and benchmarks +- Collaboration features for teams and integration with other platforms + +With Edge Impulse, developers with limited data science expertise can develop specialized ML models that run efficiently within small computing environments. It provides a comprehensive solution for creating embedded intelligence and taking machine learning to the edge. + +##### User Interface + +Edge Impulse was designed with seven key principles in mind: accessibility, end-to-end capabilities, a data-centric approach, iterativeness, extensibility, team orientation, and community support. The intuitive user interface (shown below) guides developers at all experience levels through uploading data, selecting a model architecture, training the model, and deploying it across relevant hardware platforms. It should be noted that, like any tool, Edge Impulse is intended to assist with, not replace, foundational considerations such as determining if ML is an appropriate solution or acquiring the requisite domain expertise for a given application. + +![Screenshot of Edge Impulse user interface for building workflows from input data to output features.](images/ai_ops/edge_impulse_dashboard.png) + +What makes Edge Impulse notable is its comprehensive yet intuitive end-to-end workflow. Developers start by uploading their data, either through file upload or command line interface (CLI) tools, after which they can examine raw samples and visualize the distribution of data in the training and test splits. Next, users can pick from a variety of preprocessing “blocks” to facilitate digital signal processing (DSP). While default parameter values are provided, users have the option to customize the parameters as needed, with considerations around memory and latency displayed.
Users can then choose their neural network architecture, without writing any code. + +Thanks to the platform’s visual editor, users can customize the components of the architecture and the specific parameters, all while ensuring that the model is still trainable. Users can also leverage unsupervised learning algorithms, such as K-means clustering and Gaussian mixture models (GMM). + +##### Optimizations + +Edge Impulse provides a confusion matrix summarizing key performance metrics, including per-class accuracy and F1 scores. To accommodate the resource constraints of TinyML applications, the platform also elucidates the tradeoffs between model performance, size, and latency using simulations in [Renode](https://renode.io/) and device-specific benchmarking. For streaming data use cases, a performance calibration tool leverages a genetic algorithm to find ideal post-processing configurations balancing false acceptance and false rejection rates. To optimize models, techniques like quantization, code optimization, and device-specific tuning are available. For deployment, models can be compiled in appropriate formats for target edge devices, and native firmware SDKs enable direct data collection on devices. + +In addition to streamlining development, Edge Impulse scales the modeling process itself. A key capability is the [EON Tuner](https://docs.edgeimpulse.com/docs/edge-impulse-studio/eon-tuner), an automated machine learning (AutoML) tool that assists users in hyperparameter tuning based on system constraints. It runs a random search to quickly generate configurations for digital signal processing and training steps. The resulting models are displayed for the user to select based on relevant performance, memory, and latency metrics. On the data side, active learning facilitates training on a small labeled subset, then manually or automatically labeling new samples based on proximity to existing classes, which improves data efficiency. + +##### Use Cases + +Beyond the accessibility of the platform itself, the Edge Impulse team has expanded the knowledge base of the embedded ML ecosystem. The platform lends itself to academic environments, having been used in online courses and on-site workshops globally. Numerous case studies featuring industry and research use cases have been published, most notably [Oura Ring](https://ouraring.com/), which uses ML to identify sleep patterns. The team has made repositories open source on GitHub, facilitating community growth. Users can also make projects public to share techniques and download Apache-licensed libraries. Organization-level access enables collaboration on workflows. + +Overall, Edge Impulse is uniquely comprehensive and easy to integrate into developer workflows. Larger platforms like Google and Microsoft focus more on the cloud than on embedded systems. TinyMLOps frameworks such as Neuton AI and Latent AI offer some functionality but lack Edge Impulse’s end-to-end capabilities. TensorFlow Lite Micro is the standard inference engine due to its flexibility, open-source status, and TensorFlow integration, but it uses more memory and storage than Edge Impulse’s EON Compiler. Other platforms are outdated, academic-focused, or less versatile. In summary, Edge Impulse aims to streamline and scale embedded ML through an accessible, automated platform. + +#### Limitations + +While Edge Impulse provides an accessible pipeline for embedded ML, there are still important limitations and risks to consider.
A key challenge is data quality and availability: the models are only as good as the data used to train them. Users must have sufficient labeled samples that capture the breadth of expected operating conditions and failure modes. Labeled anomalies and outliers are critical yet time-consuming to collect and identify. Insufficient or biased data leads to poor model performance regardless of the tool's capabilities. + +There are also inherent challenges in deploying to low-powered devices. Optimized models may still be too resource-intensive for ultra-low-power MCUs. Striking the right balance of compression versus accuracy takes some experimentation. The tool simplifies, but doesn't eliminate, the need for foundational ML and signal processing expertise. Embedded environments also constrain debugging and interpretability compared to the cloud. + +While impressive results are achievable, users shouldn’t view Edge Impulse as a “Push Button ML” solution. Careful project scoping, data collection, model evaluation, and testing are still essential. As with any development tool, reasonable expectations and diligence in application are advised. But for developers willing to invest the requisite data science and engineering effort, Edge Impulse can accelerate embedded ML prototyping and deployment. + +## Case Studies + +### Oura Ring + +The [Oura Ring](https://ouraring.com/) is a wearable that, when placed on the user’s finger, can measure activity, sleep, and recovery. Using sensors to track physiological metrics, the device uses embedded ML to predict the stages of sleep. To establish a baseline of legitimacy in the industry, Oura conducted a correlation experiment to evaluate the success of the device in predicting sleep stages against a baseline study, resulting in a 62% correlation compared to the baseline of 82-83%. Thus, the team set out to determine how they could improve their performance. + +The first challenge was to obtain better data, in terms of both quantity and quality. They could host a larger study to get a more comprehensive dataset, but the data would be noisy and at such a large scale that it would be difficult to aggregate, scrub, and analyze. This is where Edge Impulse comes in. + +Oura was able to host a massive sleep study of 100 men and women between the ages of 15 and 73 across three continents (Asia, Europe, North America). In addition to wearing the Oura Ring, participants were responsible for undergoing the industry-standard polysomnography (PSG) testing, which provided a “label” for this dataset. With 440 nights of sleep from 106 participants, the dataset totaled 3,444 hours in length across Ring and PSG data. With Edge Impulse, Oura was able to easily upload and consolidate the data from different sources into a private S3 bucket. They were also able to set up a Data Pipeline to merge data samples into individual files, as well as preprocess the data without having to conduct manual scrubbing. + +Because of the time saved on data processing thanks to Edge Impulse, the Oura team was able to focus on the key drivers of their prediction. In fact, they ended up extracting only three types of sensor data: heart rate, motion, and body temperature. After partitioning the data using five-fold cross validation and classifying sleep stage, the team was able to achieve a correlation of 79%, just a few percentage points off the standard.
They were able to readily deploy two types of models for sleep detection: one simplified model using just the ring’s accelerometer, and one more comprehensive model leveraging Autonomic Nervous System (ANS)-mediated peripheral signals and circadian features. With Edge Impulse, they plan to conduct further analyses of different activity types and leverage the scalability of the platform to continue to experiment with different sources of data and subsets of extracted features. + +While most ML research focuses on model-dominant steps such as training and fine-tuning, this case study underscores the importance of a holistic approach to MLOps, where even the initial steps of data aggregation and preprocessing have a fundamental impact on successful outcomes. + +### ClinAIOps + +Let’s take a look at MLOps in the context of medical health monitoring to better understand how MLOps “matures” in the context of a real-world deployment. Specifically, let’s consider continuous therapeutic monitoring (CTM) enabled by wearable devices and sensors, which provides the opportunity for more frequent and personalized adjustments to treatments by capturing detailed physiological data from patients. + +Wearable ML-enabled sensors allow continuous physiological and activity monitoring outside of clinics, opening up possibilities for timely, data-driven adjustments of therapies. For example, wearable insulin biosensors [@wearableinsulin] and wrist-worn ECG sensors for glucose monitoring [@glucosemonitor] can automate insulin dosing for diabetes, wrist-worn ECG and PPG sensors can adjust blood thinners based on atrial fibrillation patterns [@plasma; @afib], and accelerometers tracking gait can trigger preventative care for declining mobility in the elderly [@gaitathome]. The variety of signals that can now be captured passively and continuously allows therapy titration and optimization tailored to each patient’s changing needs. By closing the loop between physiological sensing and therapeutic response with TinyML and on-device learning, wearables are poised to transform many areas of personalized medicine. + +ML holds great promise in analyzing CTM data to provide data-driven recommendations for therapy adjustments. But simply deploying AI models in silos, without integrating them properly into clinical workflows and decision making, can lead to poor adoption or suboptimal outcomes. In other words, thinking about MLOps alone is insufficient to make these models useful in practice. What is needed are frameworks to seamlessly incorporate AI and CTM into real-world clinical practice, as this study shows. + +This case study analyzes “ClinAIOps” as a model for embedded ML operations in complex clinical environments [@Chen2023]. We provide an overview of the framework and why it's needed, walk through an application example, and discuss key implementation challenges related to model monitoring, workflow integration, and stakeholder incentives. Analyzing real-world examples like ClinAIOps illuminates crucial principles and best practices needed for reliable and effective AI Ops across many domains. + +Traditional MLOps frameworks are insufficient for integrating continuous therapeutic monitoring (CTM) and AI in clinical settings for a few key reasons: + +* MLOps focuses on the ML model lifecycle (training, deployment, monitoring), but healthcare involves coordinating multiple human stakeholders (patients and clinicians), not just models. + +* MLOps aims to automate IT system monitoring and management.
But optimizing patient health requires personalized care and human oversight, not just automation. + +* CTM and healthcare delivery are complex sociotechnical systems with many moving parts. MLOps doesn't provide a framework for coordinating human and AI decision-making. + +* There are ethical considerations regarding healthcare AI that require human judgment, oversight, and accountability. MLOps frameworks lack processes for ethical oversight. + +* Patient health data is highly sensitive and regulated. MLOps alone doesn't ensure that protected health information is handled to privacy and regulatory standards. + +* Clinical validation of AI-guided treatment plans is essential for provider adoption. MLOps doesn't incorporate domain-specific evaluation of model recommendations. + +* Optimizing healthcare metrics like patient outcomes requires aligning stakeholder incentives and workflows, which pure tech-focused MLOps overlooks. + +Thus, effectively integrating AI/ML and CTM in clinical practice requires more than just model and data pipelines; it requires coordinating complex human-AI collaborative decision-making, which ClinAIOps aims to address via its multi-stakeholder feedback loops. + +#### Feedback Loops + +The ClinAIOps framework (see Figure 14.7) provides these mechanisms through three feedback loops. The loops coordinate insights from continuous physiological monitoring, clinician expertise, and AI guidance, enabling data-driven precision medicine while maintaining human accountability. In this way, ClinAIOps provides a model for effective human-AI symbiosis in healthcare. + +These feedback loops, which we will discuss below, help to: + +* Maintain clinician responsibility and control over treatment plans by reviewing AI suggestions before they impact patients. +* Dynamically customize AI model behavior and outputs to each patient's changing health status. +* Improve model accuracy and clinical utility over time by learning from clinician and patient responses. +* Facilitate shared decision-making and personalized care during patient-clinician interactions. +* Enable rapid optimization of therapies based on frequent patient data that clinicians cannot manually analyze. + +![Figure 14.7: This diagram depicts the ClinAIOps cycle, highlighting the collaborative workflow between patients, clinicians, and AI developers in a healthcare setting. The patient is at the center, providing health challenges and goals which inform the therapy regimen. The clinician oversees this regimen, giving inputs for adjustments based on continuous monitoring data and health reports from the patient. AI developers play a crucial role by creating systems that generate alerts for therapy updates, which are then vetted by the clinician. This cycle ensures that therapy regimens are dynamically adapted to the patient's changing health status, facilitated by AI-driven insights and clinician expertise, ultimately striving for personalized and responsive patient care.](images/ai_ops/clinaiops.png) + +##### Patient-AI Loop + +The patient-AI loop enables frequent therapy optimization driven by continuous physiological monitoring. Patients are prescribed wearables like smartwatches or skin patches to passively collect relevant health signals. For example, a diabetic patient could have a continuous glucose monitor, or a heart disease patient may wear an ECG patch.
The patient's longitudinal health data streams are analyzed by an AI model in the context of their electronic medical records: their diagnoses, lab tests, medications, and demographics. The AI model suggests adjustments to the treatment regimen tailored to that individual, like changing a medication dose or administration schedule. Minor adjustments within a pre-approved safe range can be made by the patient independently, while major changes are reviewed by the clinician first. This tight feedback between the patient's physiology and AI-guided therapy allows data-driven, timely optimizations like automated insulin dosing recommendations based on real-time glucose levels for diabetes patients. + +##### Clinician-AI Loop + +The clinician-AI loop allows clinical oversight over AI-generated recommendations to ensure safety and accountability. The AI model provides the clinician with treatment recommendations, along with easily reviewed summaries of the relevant patient data the suggestions are based on. For instance, an AI may suggest lowering a hypertension patient's blood pressure medication dose based on continuously low readings. The clinician can choose to accept, reject, or modify the AI's proposed prescription changes. This clinician feedback further trains and improves the model. Additionally, the clinician sets the bounds for the types and extents of treatment changes the AI can autonomously recommend to patients. By reviewing AI suggestions, the clinician maintains ultimate treatment authority based on their clinical judgment and accountability. This loop allows them to efficiently oversee patient cases with AI assistance. + +##### Patient-Clinician Loop + +Instead of routine data collection, the clinician can focus on interpreting high-level data patterns and collaborating with the patient to set health goals and priorities. The AI assistance will also free up clinician time, allowing them to focus more deeply on listening to patients' stories and concerns. For instance, the clinician may discuss diet and exercise changes with a diabetes patient to improve their glucose control based on their continuous monitoring data. Appointment frequency can also be dynamically adjusted based on patient progress rather than following a fixed calendar. Freed from basic data gathering, the clinician can provide coaching and care customized to each patient informed by their continuous health data. The patient-clinician relationship is made more productive and personalized. + +#### Hypertension Example + +Let's consider an example. According to the Centers for Disease Control and Prevention, nearly half of U.S. adults have hypertension (48.1%, or 119.9 million people). Hypertension can be managed through ClinAIOps with the help of wearable sensors using the following approach: + +##### Data Collection + +The data collected would include continuous blood pressure monitoring using a wrist-worn device equipped with photoplethysmography (PPG) and electrocardiography (ECG) sensors to estimate blood pressure [@Zhang2017]. The wearable would also track the patient's physical activity via embedded accelerometers. The patient would log any antihypertensive medications they take, along with the time and dose. Additionally, the patient's demographic details and medical history from their electronic health record (EHR) would be incorporated. This multimodal real-world data provides valuable context for the AI model to analyze the patient's blood pressure patterns, activity levels, medication adherence, and responses to therapy.
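+
+A recurring mechanism in these loops is the clinician-set safe range that decides whether an AI dose suggestion goes straight to the patient or to the clinician for review. A toy sketch of that routing logic follows; the thresholds and units are illustrative only, not clinical guidance:
+
+```python
+# Toy sketch: route an AI dose suggestion by the size of the proposed change.
+# The band width would be set per patient by the clinician; values are made up.
+from dataclasses import dataclass
+
+@dataclass
+class DoseSuggestion:
+    current_mg: float
+    suggested_mg: float
+
+def route(s: DoseSuggestion, auto_band_mg: float = 2.5) -> str:
+    delta = abs(s.suggested_mg - s.current_mg)
+    if delta <= auto_band_mg:
+        return "notify_patient"   # minor change within the pre-approved band
+    return "clinician_review"     # major change requires clinician approval
+
+print(route(DoseSuggestion(10.0, 12.5)))  # notify_patient
+print(route(DoseSuggestion(10.0, 20.0)))  # clinician_review
+```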
+ +##### AI Model + +The on-device AI model would analyze the patient's continuous blood pressure trends, circadian patterns, physical activity levels, medication adherence behaviors, and other context. It would use ML to predict optimal antihypertensive medication doses and timing to control the individual's blood pressure. The model would send dosage change recommendations directly to the patient for minor adjustments, or to the clinician for review and approval of more significant modifications. By observing clinician feedback on its recommendations, as well as evaluating the resulting blood pressure outcomes in patients, the AI model could be continually retrained and improved to enhance performance. The goal is fully personalized blood pressure management optimized for each patient's needs and responses. + +##### Patient-AI Loop + +In the Patient-AI loop, the hypertensive patient would receive notifications on their wearable device or tethered smartphone app recommending adjustments to their antihypertensive medications. For minor dose changes within a pre-defined safe range, the patient could independently implement the AI model's suggested adjustment to their regimen. However, for more significant modifications, the patient would need to obtain clinician approval before changing their dosage. By providing personalized and timely medication recommendations, this automates an element of hypertension self-management for the patient. It can improve their adherence to the regimen as well as treatment outcomes. The patient is empowered to leverage AI insights to better control their blood pressure. + +##### Clinician-AI Loop + +In the Clinician-AI loop, the provider would receive summaries of the patient's continuous blood pressure trends and visualizations of their medication-taking patterns and adherence. They review the AI model's suggested antihypertensive dosage changes and decide whether to approve, reject, or modify the recommendations before they reach the patient. The clinician also specifies the boundaries for how much the AI can independently recommend changing dosages without clinician oversight. If the patient's blood pressure is trending at dangerous levels, the system alerts the clinician so they can promptly intervene and adjust medications or request an emergency room visit. By keeping the clinician in charge of approving major treatment changes, this loop maintains accountability and safety while allowing the clinician to harness AI insights. + +##### Patient-Clinician Loop + +In the Patient-Clinician loop (see Figure 14.8), the in-person visits would focus less on collecting data or making basic medication adjustments. Instead, the clinician could interpret high-level trends and patterns in the patient's continuous monitoring data and have focused discussions about diet, exercise, stress management, and other lifestyle changes to holistically improve their blood pressure control. The frequency of appointments could be dynamically optimized based on the patient's stability rather than following a fixed calendar. Since the clinician would not need to review all the granular data, they could concentrate on delivering personalized care and recommendations during visits. With continuous monitoring and AI-assisted optimization of medications between visits, the clinician-patient relationship focuses on overall wellness goals and becomes more impactful.
+
+![Figure 14.8: The ClinAIOps feedback loops applied to hypertension management. The flow begins with the patient self-monitoring by wearing a passive continuous blood-pressure monitor and logging antihypertensive medication usage. This data feeds the patient-AI loop, in which the AI processes the information and generates dosage titration recommendations. The patient then communicates the AI's findings and their personal experience to the clinician. In the clinician-AI loop, the clinician sets and adjusts the AI's parameters for dose titration, considers the patient's feedback, and assesses for any adverse events. The clinician may also adjust patient-specific recommendations, such as diet and exercise modifications. Finally, the AI plays a critical role in identifying severe cases of hypertension or hypotension, prompting urgent medical follow-up. This cyclic interaction yields a dynamic, personalized, and responsive healthcare management system.](images/ai_ops/clinaiops_loops.png)
+
+#### MLOps vs. ClinAIOps
+
+The hypertension example illustrates why traditional MLOps is insufficient for many real-world AI applications, and why frameworks like ClinAIOps are needed instead.
+
+With hypertension, simply developing and deploying an ML model for adjusting medications would fail without considering the broader clinical context. The patient, clinician, and health system each have concerns that shape adoption. The AI model cannot optimize blood pressure outcomes alone; it must integrate with workflows, behaviors, and incentives. The example highlights several key gaps in a pure MLOps approach:
+
+* The model itself would lack the real-world patient data, at scale, needed to reliably recommend treatments. ClinAIOps supplies this data by collecting feedback from clinicians and patients via continuous monitoring.
+* Clinicians would not trust model recommendations without transparency, explainability, and accountability. ClinAIOps keeps the clinician in the loop to build confidence.
+* Patients need personalized coaching and motivation, not just AI notifications. The ClinAIOps patient-clinician loop facilitates this.
+* Sensor reliability and data accuracy would be insufficient without clinical oversight. ClinAIOps validates recommendations through clinician review.
+* Liability for treatment outcomes is unclear with just an ML model. ClinAIOps maintains human accountability.
+* Health systems would have little incentive to change workflows without demonstrated value. ClinAIOps aligns the incentives of all stakeholders.
+
+The hypertension case shows the need to look beyond training and deploying a performant ML model to the entire human-AI socio-technical system. This is the key gap ClinAIOps addresses relative to traditional MLOps: where traditional MLOps focuses on automating ML model development and deployment, ClinAIOps incorporates clinical context and human-AI coordination through multi-stakeholder feedback loops.
+
+The table below compares the two approaches, underscoring that putting MLOps into practice requires thinking about much more than the models themselves.
+
+| Aspect | Traditional MLOps | ClinAIOps |
+|--------|-------------------|-----------|
+| Focus | ML model development and deployment | Coordinating human and AI decision-making |
+| Stakeholders | Data scientists, IT engineers | Patients, clinicians, AI developers |
+| Feedback loops | Model retraining, monitoring | Patient-AI, clinician-AI, patient-clinician |
+| Objective | Operationalize ML deployments | Optimize patient health outcomes |
+| Processes | Automated pipelines and infrastructure | Integrates clinical workflows and oversight |
+| Data considerations | Building training datasets | Privacy, ethics, protected health information |
+| Model validation | Testing model performance metrics | Clinical evaluation of recommendations |
+| Implementation | Focuses on technical integration | Aligns incentives of human stakeholders |
+
+#### Summary
+
+In complex domains like healthcare, successfully deploying AI requires moving beyond a narrow focus on training and deploying performant ML models. As the hypertension example illustrates, real-world integration of AI necessitates coordinating diverse stakeholders, aligning incentives, validating recommendations, and maintaining accountability. Frameworks like ClinAIOps, which facilitate collaborative human-AI decision making through integrated feedback loops, are needed to address these multifaceted challenges. Rather than merely automating tasks, AI must augment human capabilities and clinical workflows. Only then can it deliver a positive impact on patient outcomes, population health, and healthcare efficiency.

 ## Conclusion

-Explanation: As we wrap up this chapter, we consolidate the key takeaways regarding the implementation of MLOps in the embedded domain. This final section seeks to furnish readers with a holistic view of the principles and practices of embedded MLOps, encouraging a thoughtful approach to adopting MLOps strategies in their projects, with a glimpse into the potential future trends in this dynamic field.

+Embedded ML is poised to transform many industries by enabling AI capabilities directly on edge devices like smartphones, sensors, and IoT hardware. However, developing and deploying TinyML models on resource-constrained embedded systems poses unique challenges compared to traditional cloud-based MLOps.
+
+This chapter provided an in-depth analysis of the key differences between traditional and embedded MLOps across the model lifecycle, development workflows, infrastructure management, and operational practices. We discussed how factors like intermittent connectivity, decentralized data, and limited on-device compute necessitate innovative techniques such as federated learning, on-device inference, and model optimization. Architectural patterns like cross-device learning and hierarchical edge-cloud infrastructure help mitigate these constraints.
-- Recap of key concepts and best practices in embedded MLOps
-- Challenges and opportunities in implementing MLOps in embedded systems
-- Future directions: emerging trends and technologies in embedded MLOps
+Through concrete examples like the Oura Ring and ClinAIOps, we demonstrated applied principles for embedded MLOps. The case studies highlighted critical considerations beyond core ML engineering, such as aligning stakeholder incentives, maintaining accountability, and coordinating human-AI decision making. This underscores the need for a holistic approach spanning both technical and human elements.
+While embedded MLOps faces impediments, emerging tools like Edge Impulse and lessons from pioneers help accelerate TinyML innovation. A solid understanding of foundational MLOps principles tailored to embedded environments will empower more organizations to overcome constraints and deliver distributed AI capabilities. As frameworks and best practices mature, seamlessly integrating ML into edge devices and processes will transform industries through localized intelligence. \ No newline at end of file diff --git a/references.bib b/references.bib index 33302113..4b4cf6b6 100644 --- a/references.bib +++ b/references.bib @@ -1,85 +1,57 @@ +@article{10242251, + title = {Training Spiking Neural Networks Using Lessons From Deep Learning}, + author = {Eshraghian, Jason K. and Ward, Max and Neftci, Emre O. and Wang, Xinxin and Lenz, Gregor and Dwivedi, Girish and Bennamoun, Mohammed and Jeong, Doo Seok and Lu, Wei D.}, + year = 2023, + journal = {Proceedings of the IEEE}, + volume = 111, + number = 9, + pages = {1016--1054}, +} + @inproceedings{abadi2016deep, title = {Deep learning with differential privacy}, author = {Abadi, Martin and Chu, Andy and Goodfellow, Ian and McMahan, H Brendan and Mironov, Ilya and Talwar, Kunal and Zhang, Li}, year = 2016, booktitle = {Proceedings of the 2016 ACM SIGSAC conference on computer and communications security}, - pages = {308--318} -} - -@inproceedings{krishnan2023archgym, - title={ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design}, - author={Krishnan, Srivatsan and Yazdanbakhsh, Amir and Prakash, Shvetank and Jabbour, Jason and Uchendu, Ikechukwu and Ghosh, Susobhan and Boroujerdian, Behzad and Richins, Daniel and Tripathy, Devashree and Faust, Aleksandra and Janapa Reddi, Vijay}, - booktitle={Proceedings of the 50th Annual International Symposium on Computer Architecture}, - pages={1--16}, - year={2023} -} - -@misc{kuzmin2022fp8, - title={FP8 Quantization: The Power of the Exponent}, - author={Andrey Kuzmin and Mart Van Baalen and Yuwei Ren and Markus Nagel and Jorn Peters and Tijmen Blankevoort}, - year={2022}, - eprint={2208.09225}, - archivePrefix={arXiv}, - primaryClass={cs.LG} + pages = {308--318}, } @inproceedings{abadi2016tensorflow, - title = {$\{$TensorFlow$\}$: a system for $\{$Large-Scale$\}$ machine learning}, + title = {$\{$TensorFlow\$\}\$: a system for \$\{\$Large-Scale\$\}\$ machine learning}, author = {Abadi, Mart{\'\i}n and Barham, Paul and Chen, Jianmin and Chen, Zhifeng and Davis, Andy and Dean, Jeffrey and Devin, Matthieu and Ghemawat, Sanjay and Irving, Geoffrey and Isard, Michael and others}, year = 2016, booktitle = {12th USENIX symposium on operating systems design and implementation (OSDI 16)}, - pages = {265--283} -} - - -@article{shastri2021photonics, - title={Photonics for artificial intelligence and neuromorphic computing}, - author={Shastri, Bhavin J and Tait, Alexander N and Ferreira de Lima, Thomas and Pernice, Wolfram HP and Bhaskaran, Harish and Wright, C David and Prucnal, Paul R}, - journal={Nature Photonics}, - volume={15}, - number={2}, - pages={102--114}, - year={2021}, - publisher={Nature Publishing Group UK London} -} - -@inproceedings{jouppi2017datacenter, - title={In-datacenter performance analysis of a tensor processing unit}, - author={Jouppi, Norman P and Young, Cliff and Patil, Nishant and Patterson, David and Agrawal, Gaurav and Bajwa, Raminder and Bates, Sarah and Bhatia, Suresh and Boden, Nan and Borchers, Al and others}, - booktitle={Proceedings of the 44th annual 
international symposium on computer architecture}, - pages={1--12}, - year={2017} -} - -@inproceedings{ignatov2018ai, -title={Ai benchmark: Running deep neural networks on android smartphones}, - author={Ignatov, Andrey and Timofte, Radu and Chou, William and Wang, Ke and Wu, Max and Hartley, Tim and Van Gool, Luc}, - booktitle={Proceedings of the European Conference on Computer Vision (ECCV) Workshops}, - pages={0--0}, - year={2018} + pages = {265--283}, } - @inproceedings{adolf2016fathom, title = {Fathom: Reference workloads for modern deep learning methods}, author = {Adolf, Robert and Rama, Saketh and Reagen, Brandon and Wei, Gu-Yeon and Brooks, David}, year = 2016, booktitle = {2016 IEEE International Symposium on Workload Characterization (IISWC)}, pages = {1--10}, - organization = {IEEE} + organization = {IEEE}, } +@article{afib, + title = {Mobile Photoplethysmographic Technology to Detect Atrial Fibrillation}, + author = {Yutao Guo and Hao Wang and Hui Zhang and Tong Liu and Zhaoguang Liang and Yunlong Xia and Li Yan and Yunli Xing and Haili Shi and Shuyan Li and Yanxia Liu and Fan Liu and Mei Feng and Yundai Chen and Gregory Y.H. Lip and null null}, + year = 2019, + journal = {Journal of the American College of Cardiology}, + volume = 74, + number = 19, + pages = {2365--2375}, +} @misc{al2016theano, title = {Theano: A Python framework for fast computation of mathematical expressions}, - author = {The Theano Development Team and Rami Al-Rfou and Guillaume Alain and Amjad Almahairi and Christof Angermueller and Dzmitry Bahdanau and Nicolas Ballas and Frédéric Bastien and Justin Bayer and Anatoly Belikov and Alexander Belopolsky and Yoshua Bengio and Arnaud Bergeron and James Bergstra and Valentin Bisson and Josh Bleecher Snyder and Nicolas Bouchard and Nicolas Boulanger-Lewandowski and Xavier Bouthillier and Alexandre de Brébisson and Olivier Breuleux and Pierre-Luc Carrier and Kyunghyun Cho and Jan Chorowski and Paul Christiano and Tim Cooijmans and Marc-Alexandre Côté and Myriam Côté and Aaron Courville and Yann N. Dauphin and Olivier Delalleau and Julien Demouth and Guillaume Desjardins and Sander Dieleman and Laurent Dinh and Mélanie Ducoffe and Vincent Dumoulin and Samira Ebrahimi Kahou and Dumitru Erhan and Ziye Fan and Orhan Firat and Mathieu Germain and Xavier Glorot and Ian Goodfellow and Matt Graham and Caglar Gulcehre and Philippe Hamel and Iban Harlouchet and Jean-Philippe Heng and Balázs Hidasi and Sina Honari and Arjun Jain and Sébastien Jean and Kai Jia and Mikhail Korobov and Vivek Kulkarni and Alex Lamb and Pascal Lamblin and Eric Larsen and César Laurent and Sean Lee and Simon Lefrancois and Simon Lemieux and Nicholas Léonard and Zhouhan Lin and Jesse A. Livezey and Cory Lorenz and Jeremiah Lowin and Qianli Ma and Pierre-Antoine Manzagol and Olivier Mastropietro and Robert T. McGibbon and Roland Memisevic and Bart van Merriënboer and Vincent Michalski and Mehdi Mirza and Alberto Orlandi and Christopher Pal and Razvan Pascanu and Mohammad Pezeshki and Colin Raffel and Daniel Renshaw and Matthew Rocklin and Adriana Romero and Markus Roth and Peter Sadowski and John Salvatier and François Savard and Jan Schlüter and John Schulman and Gabriel Schwartz and Iulian Vlad Serban and Dmitriy Serdyuk and Samira Shabanian and Étienne Simon and Sigurd Spieckermann and S. Ramana Subramanyam and Jakub Sygnowski and Jérémie Tanguay and Gijs van Tulder and Joseph Turian and Sebastian Urban and Pascal Vincent and Francesco Visin and Harm de Vries and David Warde-Farley and Dustin J. 
Webb and Matthew Willson and Kelvin Xu and Lijun Xue and Li Yao and Saizheng Zhang and Ying Zhang}, + author = {The Theano Development Team and Rami Al-Rfou and Guillaume Alain and Amjad Almahairi and Christof Angermueller and Dzmitry Bahdanau and Nicolas Ballas and Fr\'{e}d\'{e}ric Bastien and Justin Bayer and Anatoly Belikov and Alexander Belopolsky and Yoshua Bengio and Arnaud Bergeron and James Bergstra and Valentin Bisson and Josh Bleecher Snyder and Nicolas Bouchard and Nicolas Boulanger-Lewandowski and Xavier Bouthillier and Alexandre de Br\'{e}bisson and Olivier Breuleux and Pierre-Luc Carrier and Kyunghyun Cho and Jan Chorowski and Paul Christiano and Tim Cooijmans and Marc-Alexandre C\^{o}t\'{e} and Myriam C\^{o}t\'{e} and Aaron Courville and Yann N. Dauphin and Olivier Delalleau and Julien Demouth and Guillaume Desjardins and Sander Dieleman and Laurent Dinh and M\'{e}lanie Ducoffe and Vincent Dumoulin and Samira Ebrahimi Kahou and Dumitru Erhan and Ziye Fan and Orhan Firat and Mathieu Germain and Xavier Glorot and Ian Goodfellow and Matt Graham and Caglar Gulcehre and Philippe Hamel and Iban Harlouchet and Jean-Philippe Heng and Bal\'{a}zs Hidasi and Sina Honari and Arjun Jain and S\'{e}bastien Jean and Kai Jia and Mikhail Korobov and Vivek Kulkarni and Alex Lamb and Pascal Lamblin and Eric Larsen and C\'{e}sar Laurent and Sean Lee and Simon Lefrancois and Simon Lemieux and Nicholas L\'{e}onard and Zhouhan Lin and Jesse A. Livezey and Cory Lorenz and Jeremiah Lowin and Qianli Ma and Pierre-Antoine Manzagol and Olivier Mastropietro and Robert T. McGibbon and Roland Memisevic and Bart van Merri\"{e}nboer and Vincent Michalski and Mehdi Mirza and Alberto Orlandi and Christopher Pal and Razvan Pascanu and Mohammad Pezeshki and Colin Raffel and Daniel Renshaw and Matthew Rocklin and Adriana Romero and Markus Roth and Peter Sadowski and John Salvatier and Fran\c{c}ois Savard and Jan Schl\"{u}ter and John Schulman and Gabriel Schwartz and Iulian Vlad Serban and Dmitriy Serdyuk and Samira Shabanian and \'{E}tienne Simon and Sigurd Spieckermann and S. Ramana Subramanyam and Jakub Sygnowski and J\'{e}r\'{e}mie Tanguay and Gijs van Tulder and Joseph Turian and Sebastian Urban and Pascal Vincent and Francesco Visin and Harm de Vries and David Warde-Farley and Dustin J. Webb and Matthew Willson and Kelvin Xu and Lijun Xue and Li Yao and Saizheng Zhang and Ying Zhang}, year = 2016, eprint = {1605.02688}, archiveprefix = {arXiv}, - primaryclass = {cs.SC} + primaryclass = {cs.SC}, } - @article{Aledhari_Razzak_Parizi_Saeed_2020, title = {Federated learning: A survey on enabling technologies, Protocols, and applications}, author = {Aledhari, Mohammed and Razzak, Rehma and Parizi, Reza M. and Saeed, Fahad}, @@ -87,46 +59,39 @@ @article{Aledhari_Razzak_Parizi_Saeed_2020 journal = {IEEE Access}, volume = 8, pages = {140699–140725}, - doi = {10.1109/access.2020.3013541} } - @article{aljundi_gradient_nodate, title = {Gradient based sample selection for online continual learning}, author = {Aljundi, Rahaf and Lin, Min and Goujaud, Baptiste and Bengio, Yoshua}, language = {en}, - file = {Aljundi et al. - Gradient based sample selection for online continu.pdf:/Users/alex/Zotero/storage/GPHM4KY7/Aljundi et al. 
- Gradient based sample selection for online continu.pdf:application/pdf} } - @inproceedings{altayeb2022classifying, title = {Classifying mosquito wingbeat sound using TinyML}, author = {Altayeb, Moez and Zennaro, Marco and Rovai, Marcelo}, year = 2022, booktitle = {Proceedings of the 2022 ACM Conference on Information Technology for Social Good}, - pages = {132--137} + pages = {132--137}, } - @misc{amodei_ai_2018, title = {{AI} and {Compute}}, author = {Amodei, Dario and Hernandez, Danny}, year = 2018, month = may, journal = {OpenAI Blog}, - url = {https://openai.com/research/ai-and-compute} + url = {https://openai.com/research/ai-and-compute}, } - @inproceedings{antol2015vqa, title = {Vqa: Visual question answering}, author = {Antol, Stanislaw and Agrawal, Aishwarya and Lu, Jiasen and Mitchell, Margaret and Batra, Dhruv and Zitnick, C Lawrence and Parikh, Devi}, year = 2015, booktitle = {Proceedings of the IEEE international conference on computer vision}, - pages = {2425--2433} + pages = {2425--2433}, } - @article{app112211073, title = {Hardware/Software Co-Design for TinyML Voice-Recognition Application on Resource Frugal Edge Devices}, author = {Kwon, Jisu and Park, Daejin}, @@ -134,32 +99,47 @@ @article{app112211073 journal = {Applied Sciences}, volume = 11, number = 22, - doi = {10.3390/app112211073}, - issn = {2076-3417}, url = {https://www.mdpi.com/2076-3417/11/22/11073}, - article-number = 11073 + article-number = 11073, } - @article{Ardila_Branson_Davis_Henretty_Kohler_Meyer_Morais_Saunders_Tyers_Weber_2020, title = {Common Voice: A Massively-Multilingual Speech Corpus}, author = {Ardila, Rosana and Branson, Megan and Davis, Kelly and Henretty, Michael and Kohler, Michael and Meyer, Josh and Morais, Reuben and Saunders, Lindsay and Tyers, Francis M. and Weber, Gregor}, year = 2020, - month = {May}, + month = may, journal = {Proceedings of the 12th Conference on Language Resources and Evaluation}, - pages = {4218-4222} + pages = {4218--4222}, } - @misc{awq, title = {AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration}, author = {Lin and Tang, Tang and Yang, Dang and Gan, Han}, year = 2023, - doi = {10.48550/arXiv.2306.00978}, url = {https://arxiv.org/abs/2306.00978}, - urldate = {2023-10-03} + urldate = {2023-10-03}, +} + +@misc{bailey_enabling_2018, + title = {Enabling {Cheaper} {Design}}, + author = {Bailey, Brian}, + year = 2018, + month = sep, + journal = {Semiconductor Engineering}, + url = {https://semiengineering.com/enabling-cheaper-design/}, + urldate = {2023-11-07}, + language = {en-US}, } +@article{bains2020business, + title = {The business of building brains}, + author = {Bains, Sunny}, + year = 2020, + journal = {Nat. 
Electron}, + volume = 3, + number = 7, + pages = {348--351}, +} @inproceedings{bamoumen2022tinyml, title = {How TinyML Can be Leveraged to Solve Environmental Problems: A Survey}, @@ -167,55 +147,48 @@ @inproceedings{bamoumen2022tinyml year = 2022, booktitle = {2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT)}, pages = {338--343}, - organization = {IEEE} + organization = {IEEE}, } - @article{banbury2020benchmarking, title = {Benchmarking tinyml systems: Challenges and direction}, author = {Banbury, Colby R and Reddi, Vijay Janapa and Lam, Max and Fu, William and Fazel, Amin and Holleman, Jeremy and Huang, Xinyuan and Hurtado, Robert and Kanter, David and Lokhmotov, Anton and others}, year = 2020, - journal = {arXiv preprint arXiv:2003.04821} + journal = {arXiv preprint arXiv:2003.04821}, } - @article{bank2023autoencoders, title = {Autoencoders}, author = {Bank, Dor and Koenigstein, Noam and Giryes, Raja}, year = 2023, journal = {Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook}, publisher = {Springer}, - pages = {353--374} + pages = {353--374}, } - @book{barroso2019datacenter, title = {The datacenter as a computer: Designing warehouse-scale machines}, author = {Barroso, Luiz Andr{\'e} and H{\"o}lzle, Urs and Ranganathan, Parthasarathy}, year = 2019, - publisher = {Springer Nature} + publisher = {Springer Nature}, } - @article{Bender_Friedman_2018, title = {Data statements for natural language processing: Toward mitigating system bias and enabling better science}, author = {Bender, Emily M. and Friedman, Batya}, year = 2018, journal = {Transactions of the Association for Computational Linguistics}, volume = 6, - pages = {587-604}, - doi = {10.1162/tacl_a_00041} + pages = {587--604}, } - @article{beyer2020we, title = {Are we done with imagenet?}, author = {Beyer, Lucas and H{\'e}naff, Olivier J and Kolesnikov, Alexander and Zhai, Xiaohua and Oord, A{\"a}ron van den}, year = 2020, - journal = {arXiv preprint arXiv:2006.07159} + journal = {arXiv preprint arXiv:2006.07159}, } - @article{biggio2014pattern, title = {Pattern recognition systems under attack: Design issues and research challenges}, author = {Biggio, Battista and Fumera, Giorgio and Roli, Fabio}, @@ -224,9 +197,30 @@ @article{biggio2014pattern publisher = {World Scientific}, volume = 28, number = {07}, - pages = 1460002 + pages = 1460002, +} + +@article{biggs2021natively, + title = {A natively flexible 32-bit Arm microprocessor}, + author = {Biggs, John and Myers, James and Kufel, Jedrzej and Ozer, Emre and Craske, Simon and Sou, Antony and Ramsdale, Catherine and Williamson, Ken and Price, Richard and White, Scott}, + year = 2021, + journal = {Nature}, + publisher = {Nature Publishing Group UK London}, + volume = 595, + number = 7868, + pages = {532--536}, } +@article{binkert2011gem5, + title = {The gem5 simulator}, + author = {Binkert, Nathan and Beckmann, Bradford and Black, Gabriel and Reinhardt, Steven K and Saidi, Ali and Basu, Arkaprava and Hestness, Joel and Hower, Derek R and Krishna, Tushar and Sardashti, Somayeh and others}, + year = 2011, + journal = {ACM SIGARCH computer architecture news}, + publisher = {ACM New York, NY, USA}, + volume = 39, + number = 2, + pages = {1--7}, +} @misc{blalock_what_2020, title = {What is the {State} of {Neural} {Network} {Pruning}?}, @@ -234,15 +228,22 @@ @misc{blalock_what_2020 year = 2020, month = mar, publisher = {arXiv}, - doi = {10.48550/arXiv.2003.03033}, url = 
{http://arxiv.org/abs/2003.03033}, urldate = {2023-10-20}, note = {arXiv:2003.03033 [cs, stat]}, - abstract = {Neural network pruning---the task of reducing the size of a network by removing parameters---has been the subject of a great deal of work in recent years. We provide a meta-analysis of the literature, including an overview of approaches to pruning and consistent findings in the literature. After aggregating results across 81 papers and pruning hundreds of models in controlled conditions, our clearest finding is that the community suffers from a lack of standardized benchmarks and metrics. This deficiency is substantial enough that it is hard to compare pruning techniques to one another or determine how much progress the field has made over the past three decades. To address this situation, we identify issues with current practices, suggest concrete remedies, and introduce ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods. We use ShrinkBench to compare various pruning techniques and show that its comprehensive evaluation can prevent common pitfalls when comparing pruning methods.}, - keywords = {Computer Science - Machine Learning, Statistics - Machine Learning}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/MA4QGZ6E/Blalock et al. - 2020 - What is the State of Neural Network Pruning.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/8DFKG4GL/2003.html:text/html} } +@inproceedings{brown_language_2020, + title = {Language {Models} are {Few}-{Shot} {Learners}}, + author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario}, + year = 2020, + booktitle = {Advances in {Neural} {Information} {Processing} {Systems}}, + publisher = {Curran Associates, Inc.}, + volume = 33, + pages = {1877--1901}, + url = {https://proceedings.neurips.cc/paper\_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html}, + urldate = {2023-11-07}, +} @article{brown2020language, title = {Language models are few-shot learners}, @@ -250,13 +251,22 @@ @article{brown2020language year = 2020, journal = {Advances in neural information processing systems}, volume = 33, - pages = {1877--1901} + pages = {1877--1901}, } +@article{burr2016recent, + title = {Recent progress in phase-change memory technology}, + author = {Burr, Geoffrey W and Brightsky, Matthew J and Sebastian, Abu and Cheng, Huai-Yu and Wu, Jau-Yi and Kim, Sangbum and Sosa, Norma E and Papandreou, Nikolaos and Lung, Hsiang-Lan and Pozidis, Haralampos and others}, + year = 2016, + journal = {IEEE Journal on Emerging and Selected Topics in Circuits and Systems}, + publisher = {IEEE}, + volume = 6, + number = 2, + pages = {146--162}, +} @inproceedings{cai_online_2021, title = {Online {Continual} {Learning} with {Natural} {Distribution} {Shifts}: {An} {Empirical} {Study} with {Visual} {Data}}, - shorttitle = {Online {Continual} {Learning} with {Natural} {Distribution} {Shifts}}, author = {Cai, Zhipeng and Sener, Ozan and Koltun, Vladlen}, 
year = 2021, month = oct, @@ -264,41 +274,34 @@ @inproceedings{cai_online_2021 publisher = {IEEE}, address = {Montreal, QC, Canada}, pages = {8261--8270}, - doi = {10.1109/ICCV48922.2021.00817}, isbn = {978-1-66542-812-5}, url = {https://ieeexplore.ieee.org/document/9710740/}, urldate = {2023-10-26}, language = {en}, - file = {Cai et al. - 2021 - Online Continual Learning with Natural Distributio.pdf:/Users/alex/Zotero/storage/R7ZMIM4K/Cai et al. - 2021 - Online Continual Learning with Natural Distributio.pdf:application/pdf} } - @article{cai_tinytl_nodate, - title = {{TinyTL}: {Reduce} {Memory}, {Not} {Parameters} for {Efficient} {On}-{Device} {Learning}}, + title = {{TinyTL}: {Reduce} {Memory}, {Not} {Parameters} for {Efficient} {On}-{Device} {Learning}}, author = {Cai, Han and Gan, Chuang and Zhu, Ligeng and Han, Song}, language = {en}, - file = {Cai et al. - TinyTL Reduce Memory, Not Parameters for Efficient.pdf:/Users/alex/Zotero/storage/J9C8PTCX/Cai et al. - TinyTL Reduce Memory, Not Parameters for Efficient.pdf:application/pdf} } - @article{cai2018proxylessnas, title = {Proxylessnas: Direct neural architecture search on target task and hardware}, author = {Cai, Han and Zhu, Ligeng and Han, Song}, year = 2018, - journal = {arXiv preprint arXiv:1812.00332} + journal = {arXiv preprint arXiv:1812.00332}, } - @article{cai2020tinytl, title = {Tinytl: Reduce memory, not parameters for efficient on-device learning}, author = {Cai, Han and Gan, Chuang and Zhu, Ligeng and Han, Song}, year = 2020, journal = {Advances in Neural Information Processing Systems}, volume = 33, - pages = {11285--11297} + pages = {11285--11297}, } - @article{Chapelle_Scholkopf_Zien, title = {Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book reviews]}, author = {Chapelle, O. and Scholkopf, B. and Zien, Eds., A.}, @@ -307,26 +310,21 @@ @article{Chapelle_Scholkopf_Zien volume = 20, number = 3, pages = {542–542}, - doi = {10.1109/tnn.2009.2015974} } - @misc{chen__inpainting_2022, title = {Inpainting {Fluid} {Dynamics} with {Tensor} {Decomposition} ({NumPy})}, author = {Chen (陈新宇), Xinyu}, year = 2022, month = mar, journal = {Medium}, - url = {https://medium.com/@xinyu.chen/inpainting-fluid-dynamics-with-tensor-decomposition-numpy-d84065fead4d}, + url = {https://medium.com/\@xinyu.chen/inpainting-fluid-dynamics-with-tensor-decomposition-numpy-d84065fead4d}, urldate = {2023-10-20}, - abstract = {Some simple examples for showing how to use tensor decomposition to reconstruct fluid dynamics}, - language = {en} + language = {en}, } - @misc{chen_tvm_2018, title = {{TVM}: {An} {Automated} {End}-to-{End} {Optimizing} {Compiler} for {Deep} {Learning}}, - shorttitle = {{TVM}}, author = {Chen, Tianqi and Moreau, Thierry and Jiang, Ziheng and Zheng, Lianmin and Yan, Eddie and Cowan, Meghan and Shen, Haichen and Wang, Leyuan and Hu, Yuwei and Ceze, Luis and Guestrin, Carlos and Krishnamurthy, Arvind}, year = 2018, month = oct, @@ -335,28 +333,32 @@ @misc{chen_tvm_2018 urldate = {2023-10-26}, note = {arXiv:1802.04799 [cs]}, language = {en}, - keywords = {Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Programming Languages}, - annote = {Comment: Significantly improved version, add automated optimization}, - file = {Chen et al. - 2018 - TVM An Automated End-to-End Optimizing Compiler f.pdf:/Users/alex/Zotero/storage/QR8MHJ38/Chen et al. 
- 2018 - TVM An Automated End-to-End Optimizing Compiler f.pdf:application/pdf} } - @article{chen2016training, title = {Training deep nets with sublinear memory cost}, author = {Chen, Tianqi and Xu, Bing and Zhang, Chiyuan and Guestrin, Carlos}, year = 2016, - journal = {arXiv preprint arXiv:1604.06174} + journal = {arXiv preprint arXiv:1604.06174}, } - @inproceedings{chen2018tvm, - title = {$\{$TVM$\}$: An automated $\{$End-to-End$\}$ optimizing compiler for deep learning}, + title = {$\{$TVM\$\}\$: An automated \$\{\$End-to-End\$\}\$ optimizing compiler for deep learning}, author = {Chen, Tianqi and Moreau, Thierry and Jiang, Ziheng and Zheng, Lianmin and Yan, Eddie and Shen, Haichen and Cowan, Meghan and Wang, Leyuan and Hu, Yuwei and Ceze, Luis and others}, year = 2018, booktitle = {13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)}, - pages = {578--594} + pages = {578--594}, } +@article{Chen2023, + title = {A framework for integrating artificial intelligence for clinical care with continuous therapeutic monitoring}, + author = {Chen, Emma and Prakash, Shvetank and Janapa Reddi, Vijay and Kim, David and Rajpurkar, Pranav}, + year = 2023, + month = nov, + day = {06}, + journal = {Nature Biomedical Engineering}, + url = {https://doi.org/10.1038/s41551-023-01115-0}, +} @article{chen2023learning, title = {Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning}, @@ -366,29 +368,44 @@ @article{chen2023learning publisher = {Springer}, volume = 2023, number = 1, - pages = 33 + pages = 33, +} + +@article{cheng2017survey, + title = {A survey of model compression and acceleration for deep neural networks}, + author = {Cheng, Yu and Wang, Duo and Zhou, Pan and Zhang, Tao}, + year = 2017, + journal = {arXiv preprint arXiv:1710.09282}, } +@article{chi2016prime, + title = {Prime: A novel processing-in-memory architecture for neural network computation in reram-based main memory}, + author = {Chi, Ping and Li, Shuangchen and Xu, Cong and Zhang, Tao and Zhao, Jishen and Liu, Yongpan and Wang, Yu and Xie, Yuan}, + year = 2016, + journal = {ACM SIGARCH Computer Architecture News}, + publisher = {ACM New York, NY, USA}, + volume = 44, + number = 3, + pages = {27--39}, +} @misc{chollet2015, title = {keras}, - author = {François Chollet}, + author = {Fran\c{c}ois Chollet}, year = 2015, journal = {GitHub repository}, publisher = {GitHub}, howpublished = {\url{https://github.com/fchollet/keras}}, - commit = {5bcac37} + commit = {5bcac37}, } - @article{chollet2018keras, title = {Introduction to keras}, author = {Chollet, Fran{\c{c}}ois}, year = 2018, - journal = {March 9th} + journal = {March 9th}, } - @inproceedings{chu2021discovering, title = {Discovering multi-hardware mobile models via architecture search}, author = {Chu, Grace and Arikan, Okan and Bender, Gabriel and Wang, Weijun and Brighton, Achille and Kindermans, Pieter-Jan and Liu, Hanxiao and Akin, Berkin and Gupta, Suyog and Howard, Andrew}, @@ -397,9 +414,19 @@ @inproceedings{chu2021discovering pages = {3022--3031}, eprint = {2008.08178}, archiveprefix = {arXiv}, - primaryclass = {cs.CV} + primaryclass = {cs.CV}, } +@article{chua1971memristor, + title = {Memristor-the missing circuit element}, + author = {Chua, Leon}, + year = 1971, + journal = {IEEE Transactions on circuit theory}, + publisher = {IEEE}, + volume = 18, + number = 5, + pages = {507--519}, +} @article{coleman2017dawnbench, title = {Dawnbench: An end-to-end deep learning benchmark and competition}, @@ 
-408,10 +435,9 @@ @article{coleman2017dawnbench journal = {Training}, volume = 100, number = 101, - pages = 102 + pages = 102, } - @inproceedings{coleman2022similarity, title = {Similarity search for efficient active learning and search of rare concepts}, author = {Coleman, Cody and Chou, Edward and Katz-Samuels, Julian and Culatana, Sean and Bailis, Peter and Berg, Alexander C and Nowak, Robert and Sumbaly, Roshan and Zaharia, Matei and Yalniz, I Zeki}, @@ -419,23 +445,40 @@ @inproceedings{coleman2022similarity booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, volume = 36, number = 6, - pages = {6402--6410} + pages = {6402--6410}, } - @misc{cottier_trends_2023, title = {Trends in the {Dollar} {Training} {Cost} of {Machine} {Learning} {Systems}}, author = {Cottier, Ben}, year = 2023, month = jan, journal = {Epoch AI Report}, - url = {https://epochai.org/blog/trends-in-the-dollar-training-cost-of-machine-learning-systems} + url = {https://epochai.org/blog/trends-in-the-dollar-training-cost-of-machine-learning-systems}, +} + +@article{dally_evolution_2021, + title = {Evolution of the {Graphics} {Processing} {Unit} ({GPU})}, + author = {Dally, William J. and Keckler, Stephen W. and Kirk, David B.}, + year = 2021, + month = nov, + journal = {IEEE Micro}, + volume = 41, + number = 6, + pages = {42--51}, + url = {https://ieeexplore.ieee.org/document/9623445}, + urldate = {2023-11-07}, + note = {Conference Name: IEEE Micro}, } +@inproceedings{data_cascades, + title = {"Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI}, + author = {Nithya Sambasivan and Shivani Kapania and Hannah Highfill and Diana Akrong and Praveen Kumar Paritosh and Lora Mois Aroyo}, + year = 2021, +} @misc{david_tensorflow_2021, title = {{TensorFlow} {Lite} {Micro}: {Embedded} {Machine} {Learning} on {TinyML} {Systems}}, - shorttitle = {{TensorFlow} {Lite} {Micro}}, author = {David, Robert and Duke, Jared and Jain, Advait and Reddi, Vijay Janapa and Jeffries, Nat and Li, Jian and Kreeger, Nick and Nappier, Ian and Natraj, Meghna and Regev, Shlomi and Rhodes, Rocky and Wang, Tiezhen and Warden, Pete}, year = 2021, month = mar, @@ -444,46 +487,73 @@ @misc{david_tensorflow_2021 urldate = {2023-10-26}, note = {arXiv:2010.08678 [cs]}, language = {en}, - keywords = {Computer Science - Artificial Intelligence, Computer Science - Machine Learning}, - file = {David et al. - 2021 - TensorFlow Lite Micro Embedded Machine Learning o.pdf:/Users/alex/Zotero/storage/YCFVNEVH/David et al. 
- 2021 - TensorFlow Lite Micro Embedded Machine Learning o.pdf:application/pdf} } - @article{david2021tensorflow, title = {Tensorflow lite micro: Embedded machine learning for tinyml systems}, author = {David, Robert and Duke, Jared and Jain, Advait and Janapa Reddi, Vijay and Jeffries, Nat and Li, Jian and Kreeger, Nick and Nappier, Ian and Natraj, Meghna and Wang, Tiezhen and others}, year = 2021, journal = {Proceedings of Machine Learning and Systems}, volume = 3, - pages = {800--811} + pages = {800--811}, +} + +@article{davies2018loihi, + title = {Loihi: A neuromorphic manycore processor with on-chip learning}, + author = {Davies, Mike and Srinivasa, Narayan and Lin, Tsung-Han and Chinya, Gautham and Cao, Yongqiang and Choday, Sri Harsha and Dimou, Georgios and Joshi, Prasad and Imam, Nabil and Jain, Shweta and others}, + year = 2018, + journal = {Ieee Micro}, + publisher = {IEEE}, + volume = 38, + number = 1, + pages = {82--99}, +} + +@article{davies2021advancing, + title = {Advancing neuromorphic computing with loihi: A survey of results and outlook}, + author = {Davies, Mike and Wild, Andreas and Orchard, Garrick and Sandamirskaya, Yulia and Guerra, Gabriel A Fonseca and Joshi, Prasad and Plank, Philipp and Risbud, Sumedh R}, + year = 2021, + journal = {Proceedings of the IEEE}, + publisher = {IEEE}, + volume = 109, + number = 5, + pages = {911--934}, } +@misc{dean_jeff_numbers_nodate, + title = {Numbers {Everyone} {Should} {Know}}, + author = {Dean. Jeff}, + url = {https://brenocon.com/dean\_perf.html}, + urldate = {2023-11-07}, +} @article{dean2012large, title = {Large scale distributed deep networks}, author = {Dean, Jeffrey and Corrado, Greg and Monga, Rajat and Chen, Kai and Devin, Matthieu and Mao, Mark and Ranzato, Marc'aurelio and Senior, Andrew and Tucker, Paul and Yang, Ke and others}, year = 2012, journal = {Advances in neural information processing systems}, - volume = 25 + volume = 25, } - @misc{deci, title = {The Ultimate Guide to Deep Learning Model Quantization and Quantization-Aware Training}, - url = {https://deci.ai/quantization-and-quantization-aware-training/} + url = {https://deci.ai/quantization-and-quantization-aware-training/}, } - @misc{deepcompress, title = {Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding}, author = {Han and Mao and Dally}, year = 2016, - doi = {10.48550/arXiv.1510.00149}, url = {https://arxiv.org/abs/1510.00149}, urldate = {2016-02-15}, - abstract = {Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing, finally, we apply Huffman coding. After the first two steps we retrain the network to fine tune the remaining connections and the quantized centroids. Pruning, reduces the number of connections by 9x to 13x; Quantization then reduces the number of bits that represent each connection from 32 to 5. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35x, from 240MB to 6.9MB, without loss of accuracy. 
Our method reduced the size of VGG-16 by 49x from 552MB to 11.3MB, again with no loss of accuracy. This allows fitting the model into on-chip SRAM cache rather than off-chip DRAM memory. Our compression method also facilitates the use of complex neural networks in mobile applications where application size and download bandwidth are constrained. Benchmarked on CPU, GPU and mobile GPU, compressed network has 3x to 4x layerwise speedup and 3x to 7x better energy efficiency.} } +@article{demler_ceva_2020, + title = {{CEVA} {SENSPRO} {FUSES} {AI} {AND} {VECTOR} {DSP}}, + author = {Demler, Mike}, + year = 2020, + language = {en}, +} @inproceedings{deng2009imagenet, title = {ImageNet: A large-scale hierarchical image database}, @@ -493,44 +563,38 @@ @inproceedings{deng2009imagenet booktitle = {2009 IEEE Conference on Computer Vision and Pattern Recognition(CVPR)}, volume = 00, pages = {248--255}, - doi = {10.1109/CVPR.2009.5206848}, url = {https://ieeexplore.ieee.org/abstract/document/5206848/}, added-at = {2018-09-20T15:22:39.000+0200}, biburl = {https://www.bibsonomy.org/bibtex/252793859f5bcbbd3f7f9e5d083160acf/analyst}, description = {ImageNet: A large-scale hierarchical image database}, interhash = {fbfae3e4fe1a81c477ba00efd0d4d977}, intrahash = {52793859f5bcbbd3f7f9e5d083160acf}, - keywords = {2009 computer-vision cvpr dataset ieee paper}, - timestamp = {2018-09-20T15:22:39.000+0200} + timestamp = {2018-09-20T15:22:39.000+0200}, } - @article{desai2016five, title = {Five Safes: designing data access for research}, author = {Desai, Tanvi and Ritchie, Felix and Welpton, Richard and others}, year = 2016, journal = {Economics Working Paper Series}, volume = 1601, - pages = 28 + pages = 28, } - @article{desai2020five, title = {Five Safes: designing data access for research; 2016}, author = {Desai, Tanvi and Ritchie, Felix and Welpton, Richard}, year = 2020, - journal = {URL https://www2. uwe. ac. uk/faculties/bbs/Documents/1601. pdf} + journal = {URL https://www2. uwe. ac. uk/faculties/bbs/Documents/1601. pdf}, } - @article{devlin2018bert, title = {Bert: Pre-training of deep bidirectional transformers for language understanding}, author = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina}, year = 2018, - journal = {arXiv preprint arXiv:1810.04805} + journal = {arXiv preprint arXiv:1810.04805}, } - @article{dhar2021survey, title = {A survey of on-device machine learning: An algorithms and learning theory perspective}, author = {Dhar, Sauptik and Guo, Junyao and Liu, Jiayi and Tripathi, Samarth and Kurup, Unmesh and Shah, Mohak}, @@ -539,38 +603,50 @@ @article{dhar2021survey publisher = {ACM New York, NY, USA}, volume = 2, number = 3, - pages = {1--49} + pages = {1--49}, } - @misc{dong2022splitnets, title = {SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems}, author = {Xin Dong and Barbara De Salvo and Meng Li and Chiao Liu and Zhongnan Qu and H. T. 
Kung and Ziyun Li}, year = 2022, eprint = {2204.04705}, archiveprefix = {arXiv}, - primaryclass = {cs.LG} + primaryclass = {cs.LG}, +} + +@article{Dongarra2009-na, + title = {The evolution of high performance computing on system z}, + author = {Dongarra, Jack J}, + year = 2009, + journal = {IBM Journal of Research and Development}, + volume = 53, + pages = {3--4}, } +@article{duarte2022fastml, + title = {FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning}, + author = {Duarte, Javier and Tran, Nhan and Hawks, Ben and Herwig, Christian and Muhizi, Jules and Prakash, Shvetank and Reddi, Vijay Janapa}, + year = 2022, + journal = {arXiv preprint arXiv:2207.07958}, +} @article{duisterhof2019learning, title = {Learning to seek: Autonomous source seeking with deep reinforcement learning onboard a nano drone microcontroller}, author = {Duisterhof, Bardienus P and Krishnan, Srivatsan and Cruz, Jonathan J and Banbury, Colby R and Fu, William and Faust, Aleksandra and de Croon, Guido CHE and Reddi, Vijay Janapa}, year = 2019, - journal = {arXiv preprint arXiv:1909.11236} + journal = {arXiv preprint arXiv:1909.11236}, } - @inproceedings{duisterhof2021sniffy, title = {Sniffy bug: A fully autonomous swarm of gas-seeking nano quadcopters in cluttered environments}, author = {Duisterhof, Bardienus P and Li, Shushuai and Burgu{\'e}s, Javier and Reddi, Vijay Janapa and de Croon, Guido CHE}, year = 2021, booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}, pages = {9099--9106}, - organization = {IEEE} + organization = {IEEE}, } - @article{dwork2014algorithmic, title = {The algorithmic foundations of differential privacy}, author = {Dwork, Cynthia and Roth, Aaron and others}, @@ -579,9 +655,14 @@ @article{dwork2014algorithmic publisher = {Now Publishers, Inc.}, volume = 9, number = {3--4}, - pages = {211--407} + pages = {211--407}, } +@article{el-rayis_reconfigurable_nodate, + title = {Reconfigurable {Architectures} for the {Next} {Generation} of {Mobile} {Device} {Telecommunications} {Systems}}, + author = {El-Rayis, Ahmed Osman}, + language = {en}, +} @article{electronics12102287, title = {Reviewing Federated Learning Aggregation Algorithms; Strategies, Contributions, Limitations and Future Perspectives}, @@ -590,22 +671,18 @@ @article{electronics12102287 journal = {Electronics}, volume = 12, number = 10, - doi = {10.3390/electronics12102287}, - issn = {2079-9292}, url = {https://www.mdpi.com/2079-9292/12/10/2287}, - article-number = 2287 + article-number = 2287, } - @misc{energyproblem, title = {Computing's energy problem (and what we can do about it)}, author = {ISSCC}, year = 2014, url = {https://ieeexplore.ieee.org/document/6757323}, - urldate = {2014-03-06} + urldate = {2014-03-06}, } - @article{esteva2017dermatologist, title = {Dermatologist-level classification of skin cancer with deep neural networks}, author = {Esteva, Andre and Kuprel, Brett and Novoa, Roberto A and Ko, Justin and Swetter, Susan M and Blau, Helen M and Thrun, Sebastian}, @@ -614,36 +691,97 @@ @article{esteva2017dermatologist publisher = {Nature Publishing Group}, volume = 542, number = 7639, - pages = {115--118} + pages = {115--118}, } - @misc{fahim2021hls4ml, title = {hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices}, author = {Farah Fahim and Benjamin Hawks and Christian Herwig and James Hirschauer and Sergo Jindariani and Nhan Tran and Luca P. 
Carloni and Giuseppe Di Guglielmo and Philip Harris and Jeffrey Krupa and Dylan Rankin and Manuel Blanco Valentin and Josiah Hester and Yingyi Luo and John Mamish and Seda Orgrenci-Memik and Thea Aarrestad and Hamza Javed and Vladimir Loncar and Maurizio Pierini and Adrian Alan Pol and Sioni Summers and Javier Duarte and Scott Hauck and Shih-Chieh Hsu and Jennifer Ngadiuba and Mia Liu and Duc Hoang and Edward Kreinar and Zhenbin Wu}, year = 2021, eprint = {2103.05579}, archiveprefix = {arXiv}, - primaryclass = {cs.LG} + primaryclass = {cs.LG}, +} + +@article{farah2005neuroethics, + title = {Neuroethics: the practical and the philosophical}, + author = {Farah, Martha J}, + year = 2005, + journal = {Trends in cognitive sciences}, + publisher = {Elsevier}, + volume = 9, + number = 1, + pages = {34--40}, } +@inproceedings{fowers2018configurable, + title = {A configurable cloud-scale DNN processor for real-time AI}, + author = {Fowers, Jeremy and Ovtcharov, Kalin and Papamichael, Michael and Massengill, Todd and Liu, Ming and Lo, Daniel and Alkalay, Shlomi and Haselman, Michael and Adams, Logan and Ghandi, Mahdi and others}, + year = 2018, + booktitle = {2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)}, + pages = {1--14}, + organization = {IEEE}, +} @misc{frankle_lottery_2019, title = {The {Lottery} {Ticket} {Hypothesis}: {Finding} {Sparse}, {Trainable} {Neural} {Networks}}, - shorttitle = {The {Lottery} {Ticket} {Hypothesis}}, author = {Frankle, Jonathan and Carbin, Michael}, year = 2019, month = mar, publisher = {arXiv}, - doi = {10.48550/arXiv.1803.03635}, url = {http://arxiv.org/abs/1803.03635}, urldate = {2023-10-20}, note = {arXiv:1803.03635 [cs]}, - abstract = {Neural network pruning techniques can reduce the parameter counts of trained networks by over 90\%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20\% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. 
Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.}, - keywords = {Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/6STHYGW5/Frankle and Carbin - 2019 - The Lottery Ticket Hypothesis Finding Sparse, Tra.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/QGNSCTQB/1803.html:text/html} } +@article{furber2016large, + title = {Large-scale neuromorphic computing systems}, + author = {Furber, Steve}, + year = 2016, + journal = {Journal of neural engineering}, + publisher = {IOP Publishing}, + volume = 13, + number = 5, + pages = {051001}, +} + +@article{gaitathome, + title = {Monitoring gait at home with radio waves in Parkinson's disease: A marker of severity, progression, and medication response}, + author = {Yingcheng Liu and Guo Zhang and Christopher G. Tarolli and Rumen Hristov and Stella Jensen-Roberts and Emma M. Waddell and Taylor L. Myers and Meghan E. Pawlik and Julia M. Soto and Renee M. Wilson and Yuzhe Yang and Timothy Nordahl and Karlo J. Lizarraga and Jamie L. Adams and Ruth B. Schneider and Karl Kieburtz and Terry Ellis and E. Ray Dorsey and Dina Katabi}, + year = 2022, + journal = {Science Translational Medicine}, + volume = 14, + number = 663, + pages = {eadc9669}, + url = {https://www.science.org/doi/abs/10.1126/scitranslmed.adc9669}, + eprint = {https://www.science.org/doi/pdf/10.1126/scitranslmed.adc9669}, +} + +@article{gale2019state, + title = {The state of sparsity in deep neural networks}, + author = {Gale, Trevor and Elsen, Erich and Hooker, Sara}, + year = 2019, + journal = {arXiv preprint arXiv:1902.09574}, +} + +@inproceedings{gannot1994verilog, + title = {Verilog HDL based FPGA design}, + author = {Gannot, G. 
and Ligthart, M.}, + year = 1994, + booktitle = {International Verilog HDL Conference}, + pages = {86--92}, +} + +@article{gates2009flexible, + title = {Flexible electronics}, + author = {Gates, Byron D}, + year = 2009, + journal = {Science}, + publisher = {American Association for the Advancement of Science}, + volume = 323, + number = 5921, + pages = {1566--1567}, +} @article{gaviria2022dollar, title = {The Dollar Street Dataset: Images Representing the Geographic and Socioeconomic Diversity of the World}, @@ -651,21 +789,28 @@ @article{gaviria2022dollar year = 2022, journal = {Advances in Neural Information Processing Systems}, volume = 35, - pages = {12979--12990} + pages = {12979--12990}, } - @article{Gebru_Morgenstern_Vecchione_Vaughan_Wallach_III_Crawford_2021, title = {Datasheets for datasets}, - author = {Gebru, Timnit and Morgenstern, Jamie and Vecchione, Briana and Vaughan, Jennifer Wortman and Wallach, Hanna and III, Hal Daumé and Crawford, Kate}, + author = {Gebru, Timnit and Morgenstern, Jamie and Vecchione, Briana and Vaughan, Jennifer Wortman and Wallach, Hanna and III, Hal Daum\'{e} and Crawford, Kate}, year = 2021, journal = {Communications of the ACM}, volume = 64, number = 12, pages = {86–92}, - doi = {10.1145/3458723} } +@article{glucosemonitor, + title = {Non-invasive Monitoring of Three Glucose Ranges Based On ECG By Using DBSCAN-CNN}, + author = {Li, Jingzhen and Tobore, Igbe and Liu, Yuhang and Kandwal, Abhishek and Wang, Lei and Nie, Zedong}, + year = 2021, + journal = {IEEE Journal of Biomedical and Health Informatics}, + volume = 25, + number = 9, + pages = {3340--3350}, +} @article{goodfellow2020generative, title = {Generative adversarial networks}, @@ -675,104 +820,136 @@ @article{goodfellow2020generative publisher = {ACM New York, NY, USA}, volume = 63, number = 11, - pages = {139--144} + pages = {139--144}, } +@article{goodyear2017social, + title = {Social media, apps and wearable technologies: navigating ethical dilemmas and procedures}, + author = {Goodyear, Victoria A}, + year = 2017, + journal = {Qualitative research in sport, exercise and health}, + publisher = {Taylor \& Francis}, + volume = 9, + number = 3, + pages = {285--302}, +} @misc{Google, - title = {Information quality & content moderation}, + title = {Information quality \& content moderation}, author = {Google}, - url = {https://blog.google/documents/83/} + url = {https://blog.google/documents/83/}, } - @misc{gordon_morphnet_2018, title = {{MorphNet}: {Fast} \& {Simple} {Resource}-{Constrained} {Structure} {Learning} of {Deep} {Networks}}, - shorttitle = {{MorphNet}}, author = {Gordon, Ariel and Eban, Elad and Nachum, Ofir and Chen, Bo and Wu, Hao and Yang, Tien-Ju and Choi, Edward}, year = 2018, month = apr, publisher = {arXiv}, - doi = {10.48550/arXiv.1711.06798}, url = {http://arxiv.org/abs/1711.06798}, urldate = {2023-10-20}, note = {arXiv:1711.06798 [cs, stat]}, - abstract = {We present MorphNet, an approach to automate the design of neural network structures. MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers. In contrast to previous approaches, our method is scalable to large networks, adaptable to specific resource constraints (e.g. the number of floating-point operations per inference), and capable of increasing the network's performance. 
When applied to standard network architectures on a wide variety of datasets, our approach discovers novel structures in each domain, obtaining higher performance while respecting the resource constraint.}, - keywords = {Computer Science - Machine Learning, Statistics - Machine Learning}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/GV7N4CZC/Gordon et al. - 2018 - MorphNet Fast & Simple Resource-Constrained Struc.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/K6FUV82F/1711.html:text/html} } - @inproceedings{gordon2018morphnet, title = {Morphnet: Fast \& simple resource-constrained structure learning of deep networks}, author = {Gordon, Ariel and Eban, Elad and Nachum, Ofir and Chen, Bo and Wu, Hao and Yang, Tien-Ju and Choi, Edward}, year = 2018, booktitle = {Proceedings of the IEEE conference on computer vision and pattern recognition}, - pages = {1586--1595} + pages = {1586--1595}, } - @article{gruslys2016memory, title = {Memory-efficient backpropagation through time}, author = {Gruslys, Audrunas and Munos, R{\'e}mi and Danihelka, Ivo and Lanctot, Marc and Graves, Alex}, year = 2016, journal = {Advances in neural information processing systems}, - volume = 29 + volume = 29, +} + +@article{gwennap_certus-nx_nodate, + title = {Certus-{NX} {Innovates} {General}-{Purpose} {FPGAs}}, + author = {Gwennap, Linley}, + language = {en}, } +@article{haensch2018next, + title = {The next generation of deep learning hardware: Analog computing}, + author = {Haensch, Wilfried and Gokmen, Tayfun and Puri, Ruchir}, + year = 2018, + journal = {Proceedings of the IEEE}, + publisher = {IEEE}, + volume = 107, + number = 1, + pages = {108--122}, +} @article{han2015deep, title = {Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding}, author = {Han, Song and Mao, Huizi and Dally, William J}, year = 2015, - journal = {arXiv preprint arXiv:1510.00149} + journal = {arXiv preprint arXiv:1510.00149}, } - @misc{han2016deep, title = {Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding}, author = {Song Han and Huizi Mao and William J. Dally}, year = 2016, eprint = {1510.00149}, archiveprefix = {arXiv}, - primaryclass = {cs.CV} + primaryclass = {cs.CV}, } +@article{hazan2021neuromorphic, + title = {Neuromorphic analog implementation of neural engineering framework-inspired spiking neuron for high-dimensional representation}, + author = {Hazan, Avi and Ezra Tsur, Elishai}, + year = 2021, + journal = {Frontiers in Neuroscience}, + publisher = {Frontiers Media SA}, + volume = 15, + pages = 627221, +} @misc{he_structured_2023, title = {Structured {Pruning} for {Deep} {Convolutional} {Neural} {Networks}: {A} survey}, - shorttitle = {Structured {Pruning} for {Deep} {Convolutional} {Neural} {Networks}}, author = {He, Yang and Xiao, Lingao}, year = 2023, month = mar, publisher = {arXiv}, - doi = {10.48550/arXiv.2303.00566}, url = {http://arxiv.org/abs/2303.00566}, urldate = {2023-10-20}, note = {arXiv:2303.00566 [cs]}, - abstract = {The remarkable performance of deep Convolutional neural networks (CNNs) is generally attributed to their deeper and wider architectures, which can come with significant computational costs. Pruning neural networks has thus gained interest since it effectively lowers storage and computational costs. 
In contrast to weight pruning, which results in unstructured models, structured pruning provides the benefit of realistic acceleration by producing models that are friendly to hardware implementation. The special requirements of structured pruning have led to the discovery of numerous new challenges and the development of innovative solutions. This article surveys the recent progress towards structured pruning of deep CNNs. We summarize and compare the state-of-the-art structured pruning techniques with respect to filter ranking methods, regularization methods, dynamic execution, neural architecture search, the lottery ticket hypothesis, and the applications of pruning. While discussing structured pruning algorithms, we briefly introduce the unstructured pruning counterpart to emphasize their differences. Furthermore, we provide insights into potential research opportunities in the field of structured pruning. A curated list of neural network pruning papers can be found at https://github.com/he-y/Awesome-Pruning}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/K5RGQQA9/He and Xiao - 2023 - Structured Pruning for Deep Convolutional Neural N.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/U7PVPU4C/2303.html:text/html} } - @inproceedings{he2016deep, title = {Deep residual learning for image recognition}, author = {He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian}, year = 2016, booktitle = {Proceedings of the IEEE conference on computer vision and pattern recognition}, - pages = {770--778} + pages = {770--778}, } - @inproceedings{hendrycks2021natural, title = {Natural adversarial examples}, author = {Hendrycks, Dan and Zhao, Kevin and Basart, Steven and Steinhardt, Jacob and Song, Dawn}, year = 2021, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, - pages = {15262--15271} + pages = {15262--15271}, } +@article{Hennessy2019-je, + title = {A new golden age for computer architecture}, + author = {Hennessy, John L and Patterson, David A}, + year = 2019, + month = jan, + journal = {Commun. ACM}, + publisher = {Association for Computing Machinery (ACM)}, + volume = 62, + number = 2, + pages = {48--60}, + copyright = {http://www.acm.org/publications/policies/copyright\_policy\#Background}, + language = {en}, +} @misc{hinton_distilling_2015, title = {Distilling the {Knowledge} in a {Neural} {Network}}, @@ -780,62 +957,47 @@ @misc{hinton_distilling_2015 year = 2015, month = mar, publisher = {arXiv}, - doi = {10.48550/arXiv.1503.02531}, url = {http://arxiv.org/abs/1503.02531}, urldate = {2023-10-20}, note = {arXiv:1503.02531 [cs, stat]}, - abstract = {A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. 
We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.}, - keywords = {Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Statistics - Machine Learning}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/VREDW45A/Hinton et al. - 2015 - Distilling the Knowledge in a Neural Network.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/8MNJG4RP/1503.html:text/html} } - @misc{hinton2015distilling, title = {Distilling the Knowledge in a Neural Network}, author = {Geoffrey Hinton and Oriol Vinyals and Jeff Dean}, year = 2015, eprint = {1503.02531}, archiveprefix = {arXiv}, - primaryclass = {stat.ML} + primaryclass = {stat.ML}, } - @article{Holland_Hosny_Newman_Joseph_Chmielinski_2020, title = {The Dataset Nutrition label}, author = {Holland, Sarah and Hosny, Ahmed and Newman, Sarah and Joseph, Joshua and Chmielinski, Kasia}, year = 2020, journal = {Data Protection and Privacy}, - doi = {10.5040/9781509932771.ch-001} } - @inproceedings{hong2023publishing, title = {Publishing Efficient On-device Models Increases Adversarial Vulnerability}, author = {Hong, Sanghyun and Carlini, Nicholas and Kurakin, Alexey}, year = 2023, booktitle = {2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)}, pages = {271--290}, - organization = {IEEE} + organization = {IEEE}, } - @misc{howard_mobilenets_2017, title = {{MobileNets}: {Efficient} {Convolutional} {Neural} {Networks} for {Mobile} {Vision} {Applications}}, - shorttitle = {{MobileNets}}, author = {Howard, Andrew G. and Zhu, Menglong and Chen, Bo and Kalenichenko, Dmitry and Wang, Weijun and Weyand, Tobias and Andreetto, Marco and Adam, Hartwig}, year = 2017, month = apr, publisher = {arXiv}, - doi = {10.48550/arXiv.1704.04861}, url = {http://arxiv.org/abs/1704.04861}, urldate = {2023-10-20}, note = {arXiv:1704.04861 [cs]}, - abstract = {We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/IJ9P9ID9/Howard et al. 
- 2017 - MobileNets Efficient Convolutional Neural Network.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/D9TS95GJ/1704.html:text/html} } - @misc{howard2017mobilenets, title = {MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications}, author = {Andrew G. Howard and Menglong Zhu and Bo Chen and Dmitry Kalenichenko and Weijun Wang and Tobias Weyand and Marco Andreetto and Hartwig Adam}, @@ -843,77 +1005,126 @@ @misc{howard2017mobilenets journal = {arXiv preprint arXiv:1704.04861}, eprint = {1704.04861}, archiveprefix = {arXiv}, - primaryclass = {cs.CV} + primaryclass = {cs.CV}, } +@article{huang2010pseudo, + title = {Pseudo-CMOS: A design style for low-cost and robust flexible electronics}, + author = {Huang, Tsung-Ching and Fukuda, Kenjiro and Lo, Chun-Ming and Yeh, Yung-Hui and Sekitani, Tsuyoshi and Someya, Takao and Cheng, Kwang-Ting}, + year = 2010, + journal = {IEEE Transactions on Electron Devices}, + publisher = {IEEE}, + volume = 58, + number = 1, + pages = {141--150}, +} @misc{iandola_squeezenet_2016, title = {{SqueezeNet}: {AlexNet}-level accuracy with 50x fewer parameters and {\textless}0.{5MB} model size}, - shorttitle = {{SqueezeNet}}, author = {Iandola, Forrest N. and Han, Song and Moskewicz, Matthew W. and Ashraf, Khalid and Dally, William J. and Keutzer, Kurt}, year = 2016, month = nov, publisher = {arXiv}, - doi = {10.48550/arXiv.1602.07360}, url = {http://arxiv.org/abs/1602.07360}, urldate = {2023-10-20}, note = {arXiv:1602.07360 [cs]}, - abstract = {Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet}, - keywords = {Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/X3ZX9UTZ/Iandola et al. 
- 2016 - SqueezeNet AlexNet-level accuracy with 50x fewer .pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/DHI96QVT/1602.html:text/html} } - @article{iandola2016squeezenet, title = {SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size}, author = {Iandola, Forrest N and Han, Song and Moskewicz, Matthew W and Ashraf, Khalid and Dally, William J and Keutzer, Kurt}, year = 2016, - journal = {arXiv preprint arXiv:1602.07360} + journal = {arXiv preprint arXiv:1602.07360}, } +@article{Ignatov2018-kh, + title = {{AI} Benchmark: Running deep neural networks on Android smartphones}, + author = {Ignatov, Andrey and Timofte, Radu and Chou, William and Wang, Ke and Wu, Max and Hartley, Tim and Van Gool, Luc}, + year = 2018, + publisher = {arXiv}, +} @inproceedings{ignatov2018ai, title = {Ai benchmark: Running deep neural networks on android smartphones}, author = {Ignatov, Andrey and Timofte, Radu and Chou, William and Wang, Ke and Wu, Max and Hartley, Tim and Van Gool, Luc}, year = 2018, booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops}, - pages = {0--0} + pages = {0--0}, } - @inproceedings{ijcai2021p592, title = {Hardware-Aware Neural Architecture Search: Survey and Taxonomy}, author = {Benmeziane, Hadjer and El Maghraoui, Kaoutar and Ouarnoughi, Hamza and Niar, Smail and Wistuba, Martin and Wang, Naigang}, year = 2021, - month = 8, + month = aug, booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, pages = {4322--4329}, - doi = {10.24963/ijcai.2021/592}, url = {https://doi.org/10.24963/ijcai.2021/592}, note = {Survey Track}, - editor = {Zhi-Hua Zhou} + editor = {Zhi-Hua Zhou}, } +@inproceedings{imani2016resistive, + title = {Resistive configurable associative memory for approximate computing}, + author = {Imani, Mohsen and Rahimi, Abbas and Rosing, Tajana S}, + year = 2016, + booktitle = {2016 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)}, + pages = {1327--1332}, + organization = {IEEE}, +} @misc{intquantfordeepinf, - title = {Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation)}, - author = {Wu and Judd, Zhang and Isaev, Micikevicius}, + title = {Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation}, + author = {Wu, Hao and Judd, Patrick and Zhang, Xiaojie and Isaev, Mikhail and Micikevicius, Paulius}, year = 2020, - doi = {10.48550/arXiv.2004.09602}, url = {https://arxiv.org/abs/2004.09602}, - urldate = {2020-04-20} + urldate = {2020-04-20}, +} + +@inproceedings{jacob2018quantization, + title = {Quantization and training of neural networks for efficient integer-arithmetic-only inference}, + author = {Jacob, Benoit and Kligys, Skirmantas and Chen, Bo and Zhu, Menglong and Tang, Matthew and Howard, Andrew and Adam, Hartwig and Kalenichenko, Dmitry}, + year = 2018, + booktitle = {Proceedings of the IEEE conference on computer vision and pattern recognition}, + pages = {2704--2713}, +} + +@article{janapa2023edge, + title = {Edge Impulse: An MLOps Platform for Tiny Machine Learning}, + author = {Janapa Reddi, Vijay and Elium, Alexander and Hymel, Shawn and Tischler, David and Situnayake, Daniel and Ward, Carl and Moreau, Louis and Plunkett, Jenny and Kelcey, Matthew and Baaijens, Mathijs and others}, + year = 2023, + journal = {Proceedings of Machine Learning and Systems}, + volume = 5, } +@misc{jia_dissecting_2018, + title = {Dissecting the {NVIDIA} {Volta} {GPU} {Architecture} via {Microbenchmarking}}, + author = {Jia, Zhe and Maggioni, Marco and Staiger, Benjamin and
Scarpazza, Daniele P.}, + year = 2018, + month = apr, + publisher = {arXiv}, + url = {http://arxiv.org/abs/1804.06826}, + urldate = {2023-11-07}, + note = {arXiv:1804.06826 [cs]}, +} @inproceedings{jia2014caffe, title = {Caffe: Convolutional architecture for fast feature embedding}, author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor}, year = 2014, booktitle = {Proceedings of the 22nd ACM international conference on Multimedia}, - pages = {675--678} + pages = {675--678}, } +@article{jia2019beyond, + title = {Beyond Data and Model Parallelism for Deep Neural Networks.}, + author = {Jia, Zhihao and Zaharia, Matei and Aiken, Alex}, + year = 2019, + journal = {Proceedings of Machine Learning and Systems}, + volume = 1, + pages = {1--13}, +} @article{jia2023life, title = {Life-threatening ventricular arrhythmia detection challenge in implantable cardioverter--defibrillators}, @@ -923,32 +1134,27 @@ @article{jia2023life publisher = {Nature Publishing Group UK London}, volume = 5, number = 5, - pages = {554--555} + pages = {554--555}, } - @misc{jiang2019accuracy, title = {Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search}, author = {Weiwen Jiang and Xinyi Zhang and Edwin H. -M. Sha and Lei Yang and Qingfeng Zhuge and Yiyu Shi and Jingtong Hu}, year = 2019, eprint = {1901.11211}, archiveprefix = {arXiv}, - primaryclass = {cs.DC} + primaryclass = {cs.DC}, } - @article{Johnson-Roberson_Barto_Mehta_Sridhar_Rosaen_Vasudevan_2017, title = {Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks?}, author = {Johnson-Roberson, Matthew and Barto, Charles and Mehta, Rounak and Sridhar, Sharath Nittur and Rosaen, Karl and Vasudevan, Ram}, year = 2017, journal = {2017 IEEE International Conference on Robotics and Automation (ICRA)}, - doi = {10.1109/icra.2017.7989092} } - @article{jordan_machine_2015, title = {Machine learning: {Trends}, perspectives, and prospects}, - shorttitle = {Machine learning}, author = {Jordan, M. I. and Mitchell, T. 
M.}, year = 2015, month = jul, @@ -956,33 +1162,42 @@ @article{jordan_machine_2015 volume = 349, number = 6245, pages = {255--260}, - doi = {10.1126/science.aaa8415}, - issn = {0036-8075, 1095-9203}, url = {https://www.science.org/doi/10.1126/science.aaa8415}, urldate = {2023-10-25}, language = {en}, - file = {Jordan and Mitchell - 2015 - Machine learning Trends, perspectives, and prospe.pdf:/Users/alex/Zotero/storage/RGU3CQ4Q/Jordan and Mitchell - 2015 - Machine learning Trends, perspectives, and prospe.pdf:application/pdf} } - @inproceedings{jouppi2017datacenter, title = {In-datacenter performance analysis of a tensor processing unit}, author = {Jouppi, Norman P and Young, Cliff and Patil, Nishant and Patterson, David and Agrawal, Gaurav and Bajwa, Raminder and Bates, Sarah and Bhatia, Suresh and Boden, Nan and Borchers, Al and others}, year = 2017, booktitle = {Proceedings of the 44th annual international symposium on computer architecture}, - pages = {1--12} + pages = {1--12}, } - -@article{kairouz2015secure, +@inproceedings{Jouppi2023TPUv4, + title = {TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings}, + author = {Jouppi, Norm and Kurian, George and Li, Sheng and Ma, Peter and Nagarajan, Rahul and Nai, Lifeng and Patil, Nishant and Subramanian, Suvinay and Swing, Andy and Towles, Brian and Young, Clifford and Zhou, Xiang and Zhou, Zongwei and Patterson, David A}, + year = 2023, + booktitle = {Proceedings of the 50th Annual International Symposium on Computer Architecture}, + location = {Orlando, FL, USA}, + publisher = {Association for Computing Machinery}, + address = {New York, NY, USA}, + series = {ISCA '23}, + isbn = 9798400700958, + url = {https://doi.org/10.1145/3579371.3589350}, + articleno = 82, + numpages = 14, +} + +@article{kairouz2015secure, title = {Secure multi-party differential privacy}, author = {Kairouz, Peter and Oh, Sewoong and Viswanath, Pramod}, year = 2015, journal = {Advances in neural information processing systems}, - volume = 28 + volume = 28, } - @article{karargyris2023federated, title = {Federated benchmarking of medical artificial intelligence with MedPerf}, author = {Karargyris, Alexandros and Umeton, Renato and Sheller, Micah J and Aristizabal, Alejandro and George, Johnu and Wuest, Anna and Pati, Sarthak and Kassem, Hasan and Zenk, Maximilian and Baid, Ujjwal and others}, @@ -991,28 +1206,25 @@ @article{karargyris2023federated publisher = {Nature Publishing Group UK London}, volume = 5, number = 7, - pages = {799--810} + pages = {799--810}, } - @article{kiela2021dynabench, title = {Dynabench: Rethinking benchmarking in NLP}, author = {Kiela, Douwe and Bartolo, Max and Nie, Yixin and Kaushik, Divyansh and Geiger, Atticus and Wu, Zhengxuan and Vidgen, Bertie and Prasad, Grusha and Singh, Amanpreet and Ringshia, Pratik and others}, year = 2021, - journal = {arXiv preprint arXiv:2104.14337} + journal = {arXiv preprint arXiv:2104.14337}, } - @inproceedings{koh2021wilds, title = {Wilds: A benchmark of in-the-wild distribution shifts}, author = {Koh, Pang Wei and Sagawa, Shiori and Marklund, Henrik and Xie, Sang Michael and Zhang, Marvin and Balsubramani, Akshay and Hu, Weihua and Yasunaga, Michihiro and Phillips, Richard Lanas and Gao, Irena and others}, year = 2021, booktitle = {International Conference on Machine Learning}, pages = {5637--5664}, - organization = {PMLR} + organization = {PMLR}, } - @article{kolda_tensor_2009, title = {Tensor {Decompositions} and {Applications}}, author = {Kolda, 
Tamara G. and Bader, Brett W.}, @@ -1022,16 +1234,11 @@ @article{kolda_tensor_2009 volume = 51, number = 3, pages = {455--500}, - doi = {10.1137/07070111X}, - issn = {0036-1445, 1095-7200}, url = {http://epubs.siam.org/doi/10.1137/07070111X}, urldate = {2023-10-20}, - abstract = {This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N -way array. Decompositions of higher-order tensors (i.e., N -way arrays with N ≥ 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, and elsewhere. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal component analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox, Tensor Toolbox, and Multilinear Engine are examples of software packages for working with tensors.}, language = {en}, - file = {Kolda and Bader - 2009 - Tensor Decompositions and Applications.pdf:/Users/jeffreyma/Zotero/storage/Q7ZG2267/Kolda and Bader - 2009 - Tensor Decompositions and Applications.pdf:application/pdf} } - @article{koshti2011cumulative, title = {Cumulative sum control chart}, author = {Koshti, VV}, @@ -1039,28 +1246,25 @@ @article{koshti2011cumulative journal = {International journal of physics and mathematical sciences}, volume = 1, number = 1, - pages = {28--32} + pages = {28--32}, } - @misc{krishna2023raman, title = {RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge}, - author = {Adithya Krishna and Srikanth Rohit Nudurupati and Chandana D G and Pritesh Dwivedi and André van Schaik and Mahesh Mehendale and Chetan Singh Thakur}, + author = {Adithya Krishna and Srikanth Rohit Nudurupati and Chandana D G and Pritesh Dwivedi and Andr\'{e} van Schaik and Mahesh Mehendale and Chetan Singh Thakur}, year = 2023, eprint = {2306.06493}, archiveprefix = {arXiv}, - primaryclass = {cs.NE} + primaryclass = {cs.NE}, } - @article{krishnamoorthi2018quantizing, title = {Quantizing deep convolutional networks for efficient inference: A whitepaper}, author = {Krishnamoorthi, Raghuraman}, year = 2018, - journal = {arXiv preprint arXiv:1806.08342} + journal = {arXiv preprint arXiv:1806.08342}, } - @article{Krishnan_Rajpurkar_Topol_2022, title = {Self-supervised learning in medicine and Healthcare}, author = {Krishnan, Rayan and Rajpurkar, Pranav and Topol, Eric J.}, @@ -1069,19 +1273,24 @@ @article{Krishnan_Rajpurkar_Topol_2022 volume = 6, number = 12, pages = {1346–1352}, - doi = {10.1038/s41551-022-00914-1} } +@inproceedings{krishnan2023archgym, + title = {ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design}, + author = {Krishnan, Srivatsan and Yazdanbakhsh, Amir and Prakash, Shvetank and Jabbour, Jason and Uchendu, Ikechukwu and Ghosh, Susobhan and Boroujerdian, Behzad and Richins, Daniel and Tripathy, Devashree and Faust, Aleksandra and Janapa Reddi, Vijay}, + year = 2023, + booktitle = {Proceedings of the 50th Annual International Symposium on Computer Architecture}, + pages = {1--16}, +} @article{krizhevsky2012imagenet, title = {Imagenet classification with deep convolutional 
neural networks}, author = {Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E}, year = 2012, journal = {Advances in neural information processing systems}, - volume = 25 + volume = 25, } - @inproceedings{kung1979systolic, title = {Systolic arrays (for VLSI)}, author = {Kung, Hsiang Tsung and Leiserson, Charles E}, @@ -1089,20 +1298,18 @@ @inproceedings{kung1979systolic booktitle = {Sparse Matrix Proceedings 1978}, volume = 1, pages = {256--282}, - organization = {Society for industrial and applied mathematics Philadelphia, PA, USA} + organization = {Society for industrial and applied mathematics Philadelphia, PA, USA}, } - @misc{kung2018packing, title = {Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization}, author = {H. T. Kung and Bradley McDanel and Sai Qian Zhang}, year = 2018, eprint = {1811.04770}, archiveprefix = {arXiv}, - primaryclass = {cs.LG} + primaryclass = {cs.LG}, } - @incollection{kurkova_survey_2018, title = {A {Survey} on {Deep} {Transfer} {Learning}}, author = {Tan, Chuanqi and Sun, Fuchun and Kong, Tao and Zhang, Wenchang and Yang, Chao and Liu, Chunfang}, @@ -1112,30 +1319,25 @@ @incollection{kurkova_survey_2018 address = {Cham}, volume = 11141, pages = {270--279}, - doi = {10.1007/978-3-030-01424-7_27}, isbn = {978-3-030-01423-0 978-3-030-01424-7}, - url = {http://link.springer.com/10.1007/978-3-030-01424-7_27}, + url = {http://link.springer.com/10.1007/978-3-030-01424-7\_27}, urldate = {2023-10-26}, note = {Series Title: Lecture Notes in Computer Science}, language = {en}, - editor = {Kůrková, Věra and Manolopoulos, Yannis and Hammer, Barbara and Iliadis, Lazaros and Maglogiannis, Ilias}, - file = {Tan et al. - 2018 - A Survey on Deep Transfer Learning.pdf:/Users/alex/Zotero/storage/5NZ36SGB/Tan et al. - 2018 - A Survey on Deep Transfer Learning.pdf:application/pdf} + editor = {K\r{u}rkov\'{a}, V\v{e}ra and Manolopoulos, Yannis and Hammer, Barbara and Iliadis, Lazaros and Maglogiannis, Ilias}, } - @misc{kuzmin2022fp8, title = {FP8 Quantization: The Power of the Exponent}, author = {Andrey Kuzmin and Mart Van Baalen and Yuwei Ren and Markus Nagel and Jorn Peters and Tijmen Blankevoort}, year = 2022, eprint = {2208.09225}, archiveprefix = {arXiv}, - primaryclass = {cs.LG} + primaryclass = {cs.LG}, } - @misc{kwon_tinytrain_2023, title = {{TinyTrain}: {Deep} {Neural} {Network} {Training} at the {Extreme} {Edge}}, - shorttitle = {{TinyTrain}}, author = {Kwon, Young D. and Li, Rui and Venieris, Stylianos I. and Chauhan, Jagmohan and Lane, Nicholas D. and Mascolo, Cecilia}, year = 2023, month = jul, @@ -1144,43 +1346,45 @@ @misc{kwon_tinytrain_2023 urldate = {2023-10-26}, note = {arXiv:2307.09988 [cs]}, language = {en}, - keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning}, - file = {Kwon et al. - 2023 - TinyTrain Deep Neural Network Training at the Ext.pdf:/Users/alex/Zotero/storage/L2ST472U/Kwon et al. 
- 2023 - TinyTrain Deep Neural Network Training at the Ext.pdf:application/pdf} } +@article{kwon2022flexible, + title = {Flexible sensors and machine learning for heart monitoring}, + author = {Kwon, Sun Hwa and Dong, Lin}, + year = 2022, + journal = {Nano Energy}, + publisher = {Elsevier}, + pages = 107632, +} @article{kwon2023tinytrain, title = {TinyTrain: Deep Neural Network Training at the Extreme Edge}, author = {Kwon, Young D and Li, Rui and Venieris, Stylianos I and Chauhan, Jagmohan and Lane, Nicholas D and Mascolo, Cecilia}, year = 2023, - journal = {arXiv preprint arXiv:2307.09988} + journal = {arXiv preprint arXiv:2307.09988}, } - @misc{Labelbox, journal = {Labelbox}, - url = {https://labelbox.com/} + url = {https://labelbox.com/}, } - @article{lai2018cmsis, title = {Cmsis-nn: Efficient neural network kernels for arm cortex-m cpus}, author = {Lai, Liangzhen and Suda, Naveen and Chandra, Vikas}, year = 2018, - journal = {arXiv preprint arXiv:1801.06601} + journal = {arXiv preprint arXiv:1801.06601}, } - @misc{lai2018cmsisnn, title = {CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs}, author = {Liangzhen Lai and Naveen Suda and Vikas Chandra}, year = 2018, eprint = {1801.06601}, archiveprefix = {arXiv}, - primaryclass = {cs.NE} + primaryclass = {cs.NE}, } - @inproceedings{lecun_optimal_1989, title = {Optimal {Brain} {Damage}}, author = {LeCun, Yann and Denker, John and Solla, Sara}, @@ -1190,46 +1394,39 @@ @inproceedings{lecun_optimal_1989 volume = 2, url = {https://proceedings.neurips.cc/paper/1989/hash/6c9882bbac1c7093bd25041881277658-Abstract.html}, urldate = {2023-10-20}, - abstract = {We have used information-theoretic ideas to derive a class of prac(cid:173) tical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, sev(cid:173) eral improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative informa(cid:173) tion to make a tradeoff between network complexity and training set error. Experiments confirm the usefulness of the methods on a real-world application.}, - file = {Full Text PDF:/Users/jeffreyma/Zotero/storage/BYHQQSST/LeCun et al. 
- 1989 - Optimal Brain Damage.pdf:application/pdf} } - @article{lecun1989optimal, title = {Optimal brain damage}, author = {LeCun, Yann and Denker, John and Solla, Sara}, year = 1989, journal = {Advances in neural information processing systems}, - volume = 2 + volume = 2, } - @article{li2014communication, title = {Communication efficient distributed machine learning with the parameter server}, author = {Li, Mu and Andersen, David G and Smola, Alexander J and Yu, Kai}, year = 2014, journal = {Advances in Neural Information Processing Systems}, - volume = 27 + volume = 27, } - @article{li2016lightrnn, title = {LightRNN: Memory and computation-efficient recurrent neural networks}, author = {Li, Xiang and Qin, Tao and Yang, Jian and Liu, Tie-Yan}, year = 2016, journal = {Advances in Neural Information Processing Systems}, - volume = 29 + volume = 29, } - @article{li2017deep, title = {Deep reinforcement learning: An overview}, author = {Li, Yuxi}, year = 2017, - journal = {arXiv preprint arXiv:1701.07274} + journal = {arXiv preprint arXiv:1701.07274}, } - @article{li2017learning, title = {Learning without forgetting}, author = {Li, Zhizhong and Hoiem, Derek}, @@ -1238,10 +1435,9 @@ @article{li2017learning publisher = {IEEE}, volume = 40, number = 12, - pages = {2935--2947} + pages = {2935--2947}, } - @article{li2019edge, title = {Edge AI: On-demand accelerating deep neural network inference via edge computing}, author = {Li, En and Zeng, Liekang and Zhou, Zhi and Chen, Xu}, @@ -1250,26 +1446,28 @@ @article{li2019edge publisher = {IEEE}, volume = 19, number = 1, - pages = {447--457} + pages = {447--457}, } +@inproceedings{Li2020Additive, + title = {Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks}, + author = {Yuhang Li and Xin Dong and Wei Wang}, + year = 2020, + booktitle = {International Conference on Learning Representations}, + url = {https://openreview.net/forum?id=BkgXT24tDS}, +} @misc{liao_can_2023, title = {Can {Unstructured} {Pruning} {Reduce} the {Depth} in {Deep} {Neural} {Networks}?}, - author = {Liao, Zhu and Quétu, Victor and Nguyen, Van-Tam and Tartaglione, Enzo}, + author = {Liao, Zhu and Qu\'{e}tu, Victor and Nguyen, Van-Tam and Tartaglione, Enzo}, year = 2023, month = aug, publisher = {arXiv}, - doi = {10.48550/arXiv.2308.06619}, url = {http://arxiv.org/abs/2308.06619}, urldate = {2023-10-20}, note = {arXiv:2308.06619 [cs]}, - abstract = {Pruning is a widely used technique for reducing the size of deep neural networks while maintaining their performance. However, such a technique, despite being able to massively compress deep models, is hardly able to remove entire layers from a model (even when structured): is this an addressable task? In this study, we introduce EGP, an innovative Entropy Guided Pruning algorithm aimed at reducing the size of deep neural networks while preserving their performance. The key focus of EGP is to prioritize pruning connections in layers with low entropy, ultimately leading to their complete removal. Through extensive experiments conducted on popular models like ResNet-18 and Swin-T, our findings demonstrate that EGP effectively compresses deep neural networks while maintaining competitive performance levels. Our results not only shed light on the underlying mechanism behind the advantages of unstructured pruning, but also pave the way for further investigations into the intricate relationship between entropy, pruning techniques, and deep learning performance. 
The EGP algorithm and its insights hold great promise for advancing the field of network compression and optimization. The source code for EGP is released open-source.}, - keywords = {Computer Science - Artificial Intelligence, Computer Science - Machine Learning}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/V6P3XB5H/Liao et al. - 2023 - Can Unstructured Pruning Reduce the Depth in Deep .pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/WSQ4ZUH4/2308.html:text/html} } - @misc{lin_-device_2022, title = {On-{Device} {Training} {Under} {256KB} {Memory}}, author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, @@ -1280,12 +1478,8 @@ @misc{lin_-device_2022 urldate = {2023-10-26}, note = {arXiv:2206.15472 [cs]}, language = {en}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - annote = {Comment: NeurIPS 2022}, - file = {Lin et al. - 2022 - On-Device Training Under 256KB Memory.pdf:/Users/alex/Zotero/storage/GMF6SWGT/Lin et al. - 2022 - On-Device Training Under 256KB Memory.pdf:application/pdf} } - @misc{lin_-device_2022-1, title = {On-{Device} {Training} {Under} {256KB} {Memory}}, author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, @@ -1296,41 +1490,29 @@ @misc{lin_-device_2022-1 urldate = {2023-10-25}, note = {arXiv:2206.15472 [cs]}, language = {en}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - annote = {Comment: NeurIPS 2022}, - file = {Lin et al. - 2022 - On-Device Training Under 256KB Memory.pdf:/Users/alex/Zotero/storage/DNIY32R2/Lin et al. - 2022 - On-Device Training Under 256KB Memory.pdf:application/pdf} } - @misc{lin_mcunet_2020, title = {{MCUNet}: {Tiny} {Deep} {Learning} on {IoT} {Devices}}, - shorttitle = {{MCUNet}}, author = {Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Cohn, John and Gan, Chuang and Han, Song}, year = 2020, month = nov, publisher = {arXiv}, - doi = {10.48550/arXiv.2007.10319}, url = {http://arxiv.org/abs/2007.10319}, urldate = {2023-10-20}, note = {arXiv:2007.10319 [cs]}, - abstract = {Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones. We propose MCUNet, a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture in the optimized search space. TinyNAS can automatically handle diverse constraints (i.e.device, latency, energy, memory) under low search costs.TinyNAS is co-designed with TinyEngine, a memory-efficient inference library to expand the search space and fit a larger model. TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing the memory usage by 4.8x, and accelerating the inference by 1.7-3.3x compared to TF-Lite Micro and CMSIS-NN. MCUNet is the first to achieves {\textgreater}70\% ImageNet top1 accuracy on an off-the-shelf commercial microcontroller, using 3.5x less SRAM and 5.7x less Flash compared to quantized MobileNetV2 and ResNet-18. 
On visual\&audio wake words tasks, MCUNet achieves state-of-the-art accuracy and runs 2.4-3.4x faster than MobileNetV2 and ProxylessNAS-based solutions with 3.7-4.1x smaller peak SRAM. Our study suggests that the era of always-on tiny machine learning on IoT devices has arrived. Code and models can be found here: https://tinyml.mit.edu.}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/IX2JN4P9/Lin et al. - 2020 - MCUNet Tiny Deep Learning on IoT Devices.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/BAKHZ46Y/2007.html:text/html}, language = {en}, - annote = {Comment: NeurIPS 2020 (spotlight)} } - @inproceedings{lin2014microsoft, title = {Microsoft coco: Common objects in context}, author = {Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence}, year = 2014, booktitle = {Computer Vision--ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13}, pages = {740--755}, - organization = {Springer} + organization = {Springer}, } - @article{lin2020mcunet, title = {Mcunet: Tiny deep learning on iot devices}, author = {Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han, Song and others}, @@ -1340,19 +1522,56 @@ @article{lin2020mcunet pages = {11711--11722}, eprint = {2007.10319}, archiveprefix = {arXiv}, - primaryclass = {cs.CV} + primaryclass = {cs.CV}, } - @article{lin2022device, title = {On-device training under 256kb memory}, author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, year = 2022, journal = {Advances in Neural Information Processing Systems}, volume = 35, - pages = {22941--22954} + pages = {22941--22954}, +} + +@misc{lin2022ondevice, + title = {On-Device Training Under 256KB Memory}, + author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, + year = 2022, + note = {arXiv:2206.15472 [cs]}, +} + +@article{lin2023awq, + title = {AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration}, + author = {Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song}, + year = 2023, + journal = {arXiv preprint arXiv:2306.00978}, +} + +@article{lindholm_nvidia_2008, + title = {{NVIDIA} {Tesla}: {A} {Unified} {Graphics} and {Computing} {Architecture}}, + author = {Lindholm, Erik and Nickolls, John and Oberman, Stuart and Montrym, John}, + year = 2008, + month = mar, + journal = {IEEE Micro}, + volume = 28, + number = 2, + pages = {39--55}, + url = {https://ieeexplore.ieee.org/document/4523358}, + urldate = {2023-11-07}, + note = {Conference Name: IEEE Micro}, +} +@article{loh20083d, + title = {3D-stacked memory architectures for multi-core processors}, + author = {Loh, Gabriel H}, + year = 2008, + journal = {ACM SIGARCH computer architecture news}, + publisher = {ACM New York, NY, USA}, + volume = 36, + number = 3, + pages = {453--464}, +} @misc{lu_notes_2016, title = {Notes on {Low}-rank {Matrix} {Factorization}}, author = {Lu, Yuan and Yang, Jie}, year = 2016, month = may, publisher = {arXiv}, - doi = {10.48550/arXiv.1507.00333}, url = {http://arxiv.org/abs/1507.00333}, urldate = {2023-10-20}, note = {arXiv:1507.00333 [cs]}, - abstract = {Low-rank matrix factorization (MF) is an important technique in data science.
The key idea of MF is that there exists latent structures in the data, by uncovering which we could obtain a compressed representation of the data. By factorizing an original matrix to low-rank matrices, MF provides a unified method for dimension reduction, clustering, and matrix completion. In this article we review several important variants of MF, including: Basic MF, Non-negative MF, Orthogonal non-negative MF. As can be told from their names, non-negative MF and orthogonal non-negative MF are variants of basic MF with non-negativity and/or orthogonality constraints. Such constraints are useful in specific senarios. In the first part of this article, we introduce, for each of these models, the application scenarios, the distinctive properties, and the optimizing method. By properly adapting MF, we can go beyond the problem of clustering and matrix completion. In the second part of this article, we will extend MF to sparse matrix compeletion, enhance matrix compeletion using various regularization methods, and make use of MF for (semi-)supervised learning by introducing latent space reinforcement and transformation. We will see that MF is not only a useful model but also as a flexible framework that is applicable for various prediction problems.}, - keywords = {Computer Science - Information Retrieval, Computer Science - Machine Learning, Mathematics - Numerical Analysis}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/4QED5ZU9/Lu and Yang - 2016 - Notes on Low-rank Matrix Factorization.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/XIBZBDJQ/1507.html:text/html} } +@inproceedings{luebke2008cuda, + title = {CUDA: Scalable parallel programming for high-performance scientific computing}, + author = {Luebke, David}, + year = 2008, + booktitle = {2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro}, + pages = {836--838}, +} @article{lundberg2017unified, title = {A unified approach to interpreting model predictions}, author = {Lundberg, Scott M and Lee, Su-In}, year = 2017, journal = {Advances in neural information processing systems}, - volume = 30 + volume = 30, +} + +@article{maass1997networks, + title = {Networks of spiking neurons: the third generation of neural network models}, + author = {Maass, Wolfgang}, + year = 1997, + journal = {Neural networks}, + publisher = {Elsevier}, + volume = 10, + number = 9, + pages = {1659--1671}, } +@article{markovic2020, + title = {Physics for neuromorphic computing}, + author = {Markovi{\'c}, Danijela and Mizrahi, Alice and Querlioz, Damien and Grollier, Julie}, + year = 2020, + journal = {Nature Reviews Physics}, + publisher = {Nature Publishing Group UK London}, + volume = 2, + number = 9, + pages = {499--510}, +} @article{mattson2020mlperf, title = {Mlperf training benchmark}, @@ -1385,29 +1628,58 @@ @article{mattson2020mlperf year = 2020, journal = {Proceedings of Machine Learning and Systems}, volume = 2, - pages = {336--349} + pages = {336--349}, } - @inproceedings{mcmahan2017communication, title = {Communication-efficient learning of deep networks from decentralized data}, author = {McMahan, Brendan and Moore, Eider and Ramage, Daniel and Hampson, Seth and y Arcas, Blaise Aguera}, year = 2017, booktitle = {Artificial intelligence and statistics}, pages = {1273--1282}, - organization = {PMLR} + organization = {PMLR}, } - @inproceedings{mcmahan2023communicationefficient, title = {Communication-efficient learning of deep networks from decentralized data}, author = {McMahan, Brendan 
and Moore, Eider and Ramage, Daniel and Hampson, Seth and y Arcas, Blaise Aguera}, year = 2017, booktitle = {Artificial intelligence and statistics}, pages = {1273--1282}, - organization = {PMLR} + organization = {PMLR}, +} + +@article{miller2000optical, + title = {Optical interconnects to silicon}, + author = {Miller, David AB}, + year = 2000, + journal = {IEEE Journal of Selected Topics in Quantum Electronics}, + publisher = {IEEE}, + volume = 6, + number = 6, + pages = {1312--1317}, +} + +@article{mittal2021survey, + title = {A survey of SRAM-based in-memory computing techniques and applications}, + author = {Mittal, Sparsh and Verma, Gaurav and Kaushik, Brajesh and Khanday, Farooq A}, + year = 2021, + journal = {Journal of Systems Architecture}, + publisher = {Elsevier}, + volume = 119, + pages = 102276, } +@article{modha2023neural, + title = {Neural inference at the frontier of energy, space, and time}, + author = {Modha, Dharmendra S and Akopyan, Filipp and Andreopoulos, Alexander and Appuswamy, Rathinakumar and Arthur, John V and Cassidy, Andrew S and Datta, Pallab and DeBole, Michael V and Esser, Steven K and Otero, Carlos Ortega and others}, + year = 2023, + journal = {Science}, + publisher = {American Association for the Advancement of Science}, + volume = 382, + number = 6668, + pages = {329--335}, +} @article{moshawrab2023reviewing, title = {Reviewing Federated Learning Aggregation Algorithms; Strategies, Contributions, Limitations and Future Perspectives}, @@ -1417,219 +1689,348 @@ @article{moshawrab2023reviewing publisher = {MDPI}, volume = 12, number = 10, - pages = 2287 + pages = 2287, +} + +@inproceedings{munshi2009opencl, + title = {The OpenCL specification}, + author = {Munshi, Aaftab}, + year = 2009, + booktitle = {2009 IEEE Hot Chips 21 Symposium (HCS)}, + pages = {1--314}, } +@article{musk2019integrated, + title = {An integrated brain-machine interface platform with thousands of channels}, + author = {Musk, Elon and others}, + year = 2019, + journal = {Journal of medical Internet research}, + publisher = {JMIR Publications Inc., Toronto, Canada}, + volume = 21, + number = 10, + pages = {e16194}, +} @inproceedings{nguyen2023re, title = {Re-thinking Model Inversion Attacks Against Deep Neural Networks}, author = {Nguyen, Ngoc-Bao and Chandrasegaran, Keshigeyan and Abdollahzadeh, Milad and Cheung, Ngai-Man}, year = 2023, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, - pages = {16384--16393} + pages = {16384--16393}, } +@misc{noauthor_amd_nodate, + title = {{AMD} {Radeon} {RX} 7000 {Series} {Desktop} {Graphics} {Cards}}, + url = {https://www.amd.com/en/graphics/radeon-rx-graphics}, + urldate = {2023-11-07}, +} @misc{noauthor_deep_nodate, title = {Deep {Learning} {Model} {Compression} (ii) {\textbar} by {Ivy} {Gu} {\textbar} {Medium}}, author = {Ivy Gu}, - year = {2023}, + year = 2023, url = {https://ivygdy.medium.com/deep-learning-model-compression-ii-546352ea9453}, - urldate = {2023-10-20} + urldate = {2023-10-20}, +} + +@misc{noauthor_evolution_2023, + title = {The {Evolution} of {Audio} {DSPs}}, + year = 2023, + month = oct, + journal = {audioXpress}, + url = {https://audioxpress.com/article/the-evolution-of-audio-dsps}, + urldate = {2023-11-07}, + language = {en}, +} + +@misc{noauthor_fpga_nodate, + title = {{FPGA} {Architecture} {Overview}}, + url = {https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/2023-1/fpga-architecture-overview.html}, + urldate = {2023-11-07}, +} + 
+@misc{noauthor_google_2023, + title = {Google {Tensor} {G3}: {The} new chip that gives your {Pixel} an {AI} upgrade}, + year = 2023, + month = oct, + journal = {Google}, + url = {https://blog.google/products/pixel/google-tensor-g3-pixel-8/}, + urldate = {2023-11-07}, + language = {en-us}, +} + +@misc{noauthor_hexagon_nodate, + title = {Hexagon {DSP} {SDK} {Processor}}, + journal = {Qualcomm Developer Network}, + url = {https://developer.qualcomm.com/software/hexagon-dsp-sdk/dsp-processor}, + urldate = {2023-11-07}, + language = {en}, +} + +@misc{noauthor_integrated_2023, + title = {Integrated circuit}, + year = 2023, + month = nov, + journal = {Wikipedia}, + url = {https://en.wikipedia.org/w/index.php?title=Integrated\_circuit\&oldid=1183537457}, + urldate = {2023-11-07}, + copyright = {Creative Commons Attribution-ShareAlike License}, + note = {Page Version ID: 1183537457}, + language = {en}, } +@misc{noauthor_intel_nodate, + title = {Intel\textregistered{} {Arc}\texttrademark{} {Graphics} {Overview}}, + journal = {Intel}, + url = {https://www.intel.com/content/www/us/en/products/details/discrete-gpus/arc.html}, + urldate = {2023-11-07}, + language = {en}, +} @misc{noauthor_introduction_nodate, title = {An {Introduction} to {Separable} {Convolutions} - {Analytics} {Vidhya}}, author = {Hegde, Sumant}, - year = {2023}, + year = 2023, url = {https://www.analyticsvidhya.com/blog/2021/11/an-introduction-to-separable-convolutions/}, - urldate = {2023-10-20} + urldate = {2023-10-20}, } - @misc{noauthor_knowledge_nodate, title = {Knowledge {Distillation} - {Neural} {Network} {Distiller}}, author = {IntelLabs}, - year = {2023}, - url = {https://intellabs.github.io/distiller/knowledge_distillation.html}, - urldate = {2023-10-20} + year = 2023, + url = {https://intellabs.github.io/distiller/knowledge\_distillation.html}, + urldate = {2023-10-20}, +} + +@misc{noauthor_project_nodate, + title = {Project {Catapult} - {Microsoft} {Research}}, + url = {https://www.microsoft.com/en-us/research/project/project-catapult/}, + urldate = {2023-11-07}, +} + +@misc{noauthor_what_nodate, + title = {What is an {FPGA}? {Field} {Programmable} {Gate} {Array}}, + journal = {AMD}, + url = {https://www.xilinx.com/products/silicon-devices/fpga/what-is-an-fpga.html}, + urldate = {2023-11-07}, + language = {en}, } +@misc{noauthor_who_nodate, + title = {Who {Invented} the {Microprocessor}? - {CHM}}, + url = {https://computerhistory.org/blog/who-invented-the-microprocessor/}, + urldate = {2023-11-07}, +} + +@inproceedings{Norman2017TPUv1, + title = {In-Datacenter Performance Analysis of a Tensor Processing Unit}, + author = {Jouppi, Norman P. and Young, Cliff and Patil, Nishant and Patterson, David and Agrawal, Gaurav and Bajwa, Raminder and Bates, Sarah and Bhatia, Suresh and Boden, Nan and Borchers, Al and Boyle, Rick and Cantin, Pierre-luc and Chao, Clifford and Clark, Chris and Coriell, Jeremy and Daley, Mike and Dau, Matt and Dean, Jeffrey and Gelb, Ben and Ghaemmaghami, Tara Vazir and Gottipati, Rajendra and Gulland, William and Hagmann, Robert and Ho, C. 
Richard and Hogberg, Doug and Hu, John and Hundt, Robert and Hurt, Dan and Ibarz, Julian and Jaffey, Aaron and Jaworski, Alek and Kaplan, Alexander and Khaitan, Harshit and Killebrew, Daniel and Koch, Andy and Kumar, Naveen and Lacy, Steve and Laudon, James and Law, James and Le, Diemthu and Leary, Chris and Liu, Zhuyuan and Lucke, Kyle and Lundin, Alan and MacKean, Gordon and Maggiore, Adriana and Mahony, Maire and Miller, Kieran and Nagarajan, Rahul and Narayanaswami, Ravi and Ni, Ray and Nix, Kathy and Norrie, Thomas and Omernick, Mark and Penukonda, Narayana and Phelps, Andy and Ross, Jonathan and Ross, Matt and Salek, Amir and Samadiani, Emad and Severn, Chris and Sizikov, Gregory and Snelham, Matthew and Souter, Jed and Steinberg, Dan and Swing, Andy and Tan, Mercedes and Thorson, Gregory and Tian, Bo and Toma, Horia and Tuttle, Erick and Vasudevan, Vijay and Walter, Richard and Wang, Walter and Wilcox, Eric and Yoon, Doe Hyun}, + year = 2017, + booktitle = {Proceedings of the 44th Annual International Symposium on Computer Architecture}, + location = {Toronto, ON, Canada}, + publisher = {Association for Computing Machinery}, + address = {New York, NY, USA}, + series = {ISCA '17}, + pages = {1--12}, + isbn = 9781450348928, + url = {https://doi.org/10.1145/3079856.3080246}, + numpages = 12, +} + +@article{Norrie2021TPUv2_3, + title = {The Design Process for Google's Training Chips: TPUv2 and TPUv3}, + author = {Norrie, Thomas and Patil, Nishant and Yoon, Doe Hyun and Kurian, George and Li, Sheng and Laudon, James and Young, Cliff and Jouppi, Norman and Patterson, David}, + year = 2021, + journal = {IEEE Micro}, + volume = 41, + number = 2, + pages = {56--63}, +} @article{Northcutt_Athalye_Mueller_2021, title = {Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks}, author = {Northcutt, Curtis G and Athalye, Anish and Mueller, Jonas}, - year = {2021}, - month = {Mar}, + year = 2021, + month = mar, - journal = {arXiv}, + journal = {arXiv preprint arXiv:2103.14749}, - doi = {  https://doi.org/10.48550/arXiv.2103.14749 arXiv-issued DOI via DataCite} } - @inproceedings{ooko2021tinyml, title = {TinyML in Africa: Opportunities and challenges}, author = {Ooko, Samson Otieno and Ogore, Marvin Muyonga and Nsenga, Jimmy and Zennaro, Marco}, - year = {2021}, + year = 2021, booktitle = {2021 IEEE Globecom Workshops (GC Wkshps)}, pages = {1--6}, - organization = {IEEE} + organization = {IEEE}, } - @misc{ou_low_2023, title = {Low {Rank} {Optimization} for {Efficient} {Deep} {Learning}: {Making} {A} {Balance} between {Compact} {Architecture} and {Fast} {Training}}, - shorttitle = {Low {Rank} {Optimization} for {Efficient} {Deep} {Learning}}, author = {Ou, Xinwei and Chen, Zhangxin and Zhu, Ce and Liu, Yipeng}, - year = {2023}, - month = {Mar}, + year = 2023, + month = mar, publisher = {arXiv}, url = {http://arxiv.org/abs/2303.13635}, urldate = {2023-10-20}, note = {arXiv:2303.13635 [cs]}, - abstract = {Deep neural networks have achieved great success in many data processing applications. However, the high computational complexity and storage cost makes deep learning hard to be used on resource-constrained devices, and it is not environmental-friendly with much power cost. In this paper, we focus on low-rank optimization for efficient deep learning techniques. In the space domain, deep neural networks are compressed by low rank approximation of the network parameters, which directly reduces the storage requirement with a smaller number of network parameters.
In the time domain, the network parameters can be trained in a few subspaces, which enables efficient training for fast convergence. The model compression in the spatial domain is summarized into three categories as pre-train, pre-set, and compression-aware methods, respectively. With a series of integrable techniques discussed, such as sparse pruning, quantization, and entropy coding, we can ensemble them in an integration framework with lower computational complexity and storage. Besides of summary of recent technical advances, we have two findings for motivating future works: one is that the effective rank outperforms other sparse measures for network compression. The other is a spatial and temporal balance for tensorized neural networks.}, - keywords = {Computer Science - Machine Learning}, - file = {arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/SPSZ2HR9/2303.html:text/html;Full Text PDF:/Users/jeffreyma/Zotero/storage/6TUEBTEX/Ou et al. - 2023 - Low Rank Optimization for Efficient Deep Learning.pdf:application/pdf} } - @article{pan_survey_2010, title = {A {Survey} on {Transfer} {Learning}}, author = {Pan, Sinno Jialin and Yang, Qiang}, - year = {2010}, - month = {Oct}, + year = 2010, + month = oct, journal = {IEEE Transactions on Knowledge and Data Engineering}, - volume = {22}, - number = {10}, + volume = 22, + number = 10, pages = {1345--1359}, - doi = {10.1109/TKDE.2009.191}, - issn = {1041-4347}, url = {http://ieeexplore.ieee.org/document/5288526/}, urldate = {2023-10-25}, language = {en}, - file = {Pan and Yang - 2010 - A Survey on Transfer Learning.pdf:/Users/alex/Zotero/storage/T3H8E5K8/Pan and Yang - 2010 - A Survey on Transfer Learning.pdf:application/pdf} } - @article{pan2009survey, title = {A survey on transfer learning}, author = {Pan, Sinno Jialin and Yang, Qiang}, - year = {2009}, + year = 2009, journal = {IEEE Transactions on knowledge and data engineering}, publisher = {IEEE}, - volume = {22}, - number = {10}, - pages = {1345--1359} + volume = 22, + number = 10, + pages = {1345--1359}, } - @article{parisi_continual_2019, title = {Continual lifelong learning with neural networks: {A} review}, - shorttitle = {Continual lifelong learning with neural networks}, author = {Parisi, German I. and Kemker, Ronald and Part, Jose L. and Kanan, Christopher and Wermter, Stefan}, - year = {2019}, - month = {May}, + year = 2019, + month = may, journal = {Neural Networks}, - volume = {113}, + volume = 113, pages = {54--71}, - doi = {10.1016/j.neunet.2019.01.012}, - issn = {08936080}, url = {https://linkinghub.elsevier.com/retrieve/pii/S0893608019300231}, urldate = {2023-10-26}, language = {en}, - file = {Parisi et al. - 2019 - Continual lifelong learning with neural networks .pdf:/Users/alex/Zotero/storage/TCGHD5TW/Parisi et al. 
- 2019 - Continual lifelong learning with neural networks .pdf:application/pdf} } - @article{paszke2019pytorch, title = {Pytorch: An imperative style, high-performance deep learning library}, author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and others}, - year = {2019}, + year = 2019, journal = {Advances in neural information processing systems}, - volume = {32} + volume = 32, } +@book{patterson2016computer, + title = {Computer organization and design ARM edition: the hardware software interface}, + author = {Patterson, David A and Hennessy, John L}, + year = 2016, + publisher = {Morgan Kaufmann}, +} @misc{Perrigo_2023, - title = {OpenAI used Kenyan workers on less than $2 per hour: Exclusive}, + title = {OpenAI used Kenyan workers on less than \$2 per hour: Exclusive}, author = {Perrigo, Billy}, - year = {2023}, - month = {Jan}, + year = 2023, + month = jan, journal = {Time}, publisher = {Time}, - url = {https://time.com/6247678/openai-chatgpt-kenya-workers/} + url = {https://time.com/6247678/openai-chatgpt-kenya-workers/}, } +@article{plasma, + title = {Noninvasive assessment of dofetilide plasma concentration using a deep learning (neural network) analysis of the surface electrocardiogram: A proof of concept study}, + author = {Attia, Zachi and Sugrue, Alan and Asirvatham, Samuel and Ackerman, Michael and Kapa, Suraj and Friedman, Paul and Noseworthy, Peter}, + year = 2018, + month = aug, + journal = {PLOS ONE}, + volume = 13, + pages = {e0201059}, +} @inproceedings{Prakash_2023, title = {{CFU} Playground: Full-Stack Open-Source Framework for Tiny Machine Learning ({TinyML}) Acceleration on {FPGAs}}, author = {Shvetank Prakash and Tim Callahan and Joseph Bushagour and Colby Banbury and Alan V. Green and Pete Warden and Tim Ansell and Vijay Janapa Reddi}, - year = {2023}, - month = {apr}, + year = 2023, + month = apr, booktitle = {2023 {IEEE} International Symposium on Performance Analysis of Systems and Software ({ISPASS})}, publisher = {{IEEE}}, - doi = {10.1109/ispass57527.2023.00024}, - url = {https://doi.org/10.1109%2Fispass57527.2023.00024} + url = {https://doi.org/10.1109\%2Fispass57527.2023.00024}, } - @inproceedings{prakash_cfu_2023, title = {{CFU} {Playground}: {Full}-{Stack} {Open}-{Source} {Framework} for {Tiny} {Machine} {Learning} ({tinyML}) {Acceleration} on {FPGAs}}, - shorttitle = {{CFU} {Playground}}, author = {Prakash, Shvetank and Callahan, Tim and Bushagour, Joseph and Banbury, Colby and Green, Alan V. and Warden, Pete and Ansell, Tim and Reddi, Vijay Janapa}, - year = {2023}, - month = {Apr}, + year = 2023, + month = apr, booktitle = {2023 {IEEE} {International} {Symposium} on {Performance} {Analysis} of {Systems} and {Software} ({ISPASS})}, pages = {157--167}, - doi = {10.1109/ISPASS57527.2023.00024}, url = {http://arxiv.org/abs/2201.01863}, urldate = {2023-10-25}, note = {arXiv:2201.01863 [cs]}, language = {en}, - keywords = {Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Hardware Architecture}, - file = {Prakash et al.
- 2023 - CFU Playground Full-Stack Open-Source Framework f.pdf:application/pdf} } - @article{preparednesspublic, title = {Public Health Law}, - author = {Preparedness, Emergency} + author = {Preparedness, Emergency}, } - @article{Pushkarna_Zaldivar_Kjartansson_2022, title = {Data cards: Purposeful and transparent dataset documentation for responsible ai}, author = {Pushkarna, Mahima and Zaldivar, Andrew and Kjartansson, Oddur}, - year = {2022}, + year = 2022, journal = {2022 ACM Conference on Fairness, Accountability, and Transparency}, - doi = {10.1145/3531146.3533231} } +@article{putnam_reconfigurable_2014, + title = {A reconfigurable fabric for accelerating large-scale datacenter services}, + author = {Putnam, Andrew and Caulfield, Adrian M. and Chung, Eric S. and Chiou, Derek and Constantinides, Kypros and Demme, John and Esmaeilzadeh, Hadi and Fowers, Jeremy and Gopal, Gopi Prashanth and Gray, Jan and Haselman, Michael and Hauck, Scott and Heil, Stephen and Hormati, Amir and Kim, Joo-Young and Lanka, Sitaram and Larus, James and Peterson, Eric and Pope, Simon and Smith, Aaron and Thong, Jason and Xiao, Phillip Yi and Burger, Doug}, + year = 2014, + month = oct, + journal = {ACM SIGARCH Computer Architecture News}, + volume = 42, + number = 3, + pages = {13--24}, + url = {https://dl.acm.org/doi/10.1145/2678373.2665678}, + urldate = {2023-11-07}, + language = {en}, +} @article{qi_efficient_2021, title = {An efficient pruning scheme of deep neural networks for {Internet} of {Things} applications}, author = {Qi, Chen and Shen, Shibo and Li, Rongpeng and Zhifeng, Zhao and Liu, Qing and Liang, Jing and Zhang, Honggang}, - year = {2021}, - month = {Jun}, + year = 2021, + month = jun, journal = {EURASIP Journal on Advances in Signal Processing}, volume = 2021, - doi = {10.1186/s13634-021-00744-4}, - abstract = {Nowadays, deep neural networks (DNNs) have been rapidly deployed to realize a number of functionalities like sensing, imaging, classification, recognition, etc. However, the computational-intensive requirement of DNNs makes it difficult to be applicable for resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs, by uncovering a more compact structure and learning the effective weights therein, on the basis of not compromising the expressive capability of DNNs. In particular, our algorithm can achieve efficient end-to-end training that transfers a redundant neural network to a compact one with a specifically targeted compression rate directly. We comprehensively evaluate our approach on various representative benchmark datasets and compared with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme is able to significantly reduce its FLOPs (floating-point operations) and number of parameters with a proportion of 76.2\% and 94.1\%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.}, - file = {Full Text PDF:/Users/jeffreyma/Zotero/storage/AGWCC5VS/Qi et al. 
- 2021 - An efficient pruning scheme of deep neural network.pdf:application/pdf} } - @misc{quantdeep, title = {Quantizing deep convolutional networks for efficient inference: A whitepaper}, author = {Krishnamoorthi, Raghuraman}, year = 2018, month = jun, publisher = {arXiv}, - doi = {10.48550/arXiv.1806.08342}, url = {https://arxiv.org/abs/1806.08342}, - urldate = {2018-06-21} + urldate = {2018-06-21}, } +@inproceedings{raina_large-scale_2009, + title = {Large-scale deep unsupervised learning using graphics processors}, + author = {Raina, Rajat and Madhavan, Anand and Ng, Andrew Y.}, + year = 2009, + month = jun, + booktitle = {Proceedings of the 26th {Annual} {International} {Conference} on {Machine} {Learning}}, + publisher = {ACM}, + address = {Montreal Quebec Canada}, + pages = {873--880}, + isbn = {978-1-60558-516-1}, + url = {https://dl.acm.org/doi/10.1145/1553374.1553486}, + urldate = {2023-11-07}, + language = {en}, +} @article{ramcharan2017deep, title = {Deep learning for image-based cassava disease detection}, @@ -1638,54 +2039,70 @@ @article{ramcharan2017deep journal = {Frontiers in plant science}, publisher = {Frontiers Media SA}, volume = 8, - pages = 1852 + pages = 1852, } +@article{Ranganathan2011-dc, + title = {From microprocessors to nanostores: Rethinking data-centric systems}, + author = {Ranganathan, Parthasarathy}, + year = 2011, + month = jan, + journal = {Computer (Long Beach Calif.)}, + publisher = {Institute of Electrical and Electronics Engineers (IEEE)}, + volume = 44, + number = 1, + pages = {39--48}, +} @misc{Rao_2021, author = {Rao, Ravi}, year = 2021, - month = {Dec}, + month = dec, journal = {www.wevolver.com}, - url = {https://www.wevolver.com/article/tinyml-unlocks-new-possibilities-for-sustainable-development-technologies} + url = {https://www.wevolver.com/article/tinyml-unlocks-new-possibilities-for-sustainable-development-technologies}, } - @article{Ratner_Hancock_Dunnmon_Goldman_Ré_2018, title = {Snorkel metal: Weak supervision for multi-task learning.}, - author = {Ratner, Alex and Hancock, Braden and Dunnmon, Jared and Goldman, Roger and Ré, Christopher}, + author = {Ratner, Alex and Hancock, Braden and Dunnmon, Jared and Goldman, Roger and R\'{e}, Christopher}, year = 2018, journal = {Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning}, - doi = {10.1145/3209889.3209898} } - @inproceedings{reddi2020mlperf, title = {Mlperf inference benchmark}, author = {Reddi, Vijay Janapa and Cheng, Christine and Kanter, David and Mattson, Peter and Schmuelling, Guenther and Wu, Carole-Jean and Anderson, Brian and Breughe, Maximilien and Charlebois, Mark and Chou, William and others}, year = 2020, booktitle = {2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)}, pages = {446--459}, - organization = {IEEE} + organization = {IEEE}, } - @inproceedings{ribeiro2016should, title = {"Why should I trust you?"
Explaining the predictions of any classifier}, author = {Ribeiro, Marco Tulio and Singh, Sameer and Guestrin, Carlos}, year = 2016, booktitle = {Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining}, - pages = {1135--1144} + pages = {1135--1144}, } - @book{rosenblatt1957perceptron, title = {The perceptron, a perceiving and recognizing automaton Project Para}, author = {Rosenblatt, Frank}, year = 1957, - publisher = {Cornell Aeronautical Laboratory} + publisher = {Cornell Aeronautical Laboratory}, } +@article{roskies2002neuroethics, + title = {Neuroethics for the new millenium}, + author = {Roskies, Adina}, + year = 2002, + journal = {Neuron}, + publisher = {Elsevier}, + volume = 35, + number = 1, + pages = {21--23}, +} @inproceedings{rouhani2017tinydl, title = {TinyDL: Just-in-time deep learning solution for constrained embedded systems}, @@ -1693,10 +2110,8 @@ @inproceedings{rouhani2017tinydl year = 2017, month = {05}, pages = {1--4}, - doi = {10.1109/ISCAS.2017.8050343} } - @article{rumelhart1986learning, title = {Learning representations by back-propagating errors}, author = {Rumelhart, David E and Hinton, Geoffrey E and Williams, Ronald J}, @@ -1705,23 +2120,37 @@ @article{rumelhart1986learning publisher = {Nature Publishing Group UK London}, volume = 323, number = 6088, - pages = {533--536} + pages = {533--536}, } - @article{ruvolo_ella_nodate, title = {{ELLA}: {An} {Efficient} {Lifelong} {Learning} {Algorithm}}, author = {Ruvolo, Paul and Eaton, Eric}, language = {en}, - file = {Ruvolo and Eaton - ELLA An Efficient Lifelong Learning Algorithm.pdf:/Users/alex/Zotero/storage/QA5G29GL/Ruvolo and Eaton - ELLA An Efficient Lifelong Learning Algorithm.pdf:application/pdf} } +@article{samajdar2018scale, + title = {Scale-sim: Systolic cnn accelerator simulator}, + author = {Samajdar, Ananda and Zhu, Yuhao and Whatmough, Paul and Mattina, Matthew and Krishna, Tushar}, + year = 2018, + journal = {arXiv preprint arXiv:1811.02883}, +} @misc{ScaleAI, journal = {ScaleAI}, - url = {https://scale.com/data-engine} + url = {https://scale.com/data-engine}, } +@article{schuman2022, + title = {Opportunities for neuromorphic computing algorithms and applications}, + author = {Schuman, Catherine D and Kulkarni, Shruti R and Parsa, Maryam and Mitchell, J Parker and Date, Prasanna and Kay, Bill}, + year = 2022, + journal = {Nature Computational Science}, + publisher = {Nature Publishing Group US New York}, + volume = 2, + number = 1, + pages = {10--19}, +} @inproceedings{schwarzschild2021just, title = {Just how toxic is data poisoning? 
a unified benchmark for backdoor and data poisoning attacks}, @@ -1729,9 +2158,16 @@ @inproceedings{schwarzschild2021just year = 2021, booktitle = {International Conference on Machine Learning}, pages = {9389--9398}, - organization = {PMLR} + organization = {PMLR}, } +@article{sculley2015hidden, + title = {Hidden technical debt in machine learning systems}, + author = {Sculley, David and Holt, Gary and Golovin, Daniel and Davydov, Eugene and Phillips, Todd and Ebner, Dietmar and Chaudhary, Vinay and Young, Michael and Crespo, Jean-Francois and Dennison, Dan}, + year = 2015, + journal = {Advances in neural information processing systems}, + volume = 28, +} @misc{see_compression_2016, title = {Compression of {Neural} {Machine} {Translation} {Models} via {Pruning}}, @@ -1739,25 +2175,35 @@ @misc{see_compression_2016 year = 2016, month = jun, publisher = {arXiv}, - doi = {10.48550/arXiv.1606.09274}, url = {http://arxiv.org/abs/1606.09274}, urldate = {2023-10-20}, note = {arXiv:1606.09274 [cs]}, - abstract = {Neural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes. This paper examines three simple magnitude-based pruning schemes to compress NMT models, namely class-blind, class-uniform, and class-distribution, which differ in terms of how pruning thresholds are computed for the different classes of weights in the NMT architecture. We demonstrate the efficacy of weight pruning as a compression technique for a state-of-the-art NMT system. We show that an NMT model with over 200 million parameters can be pruned by 40\% with very little performance loss as measured on the WMT'14 English-German translation task. This sheds light on the distribution of redundancy in the NMT architecture. Our main result is that with retraining, we can recover and even surpass the original performance with an 80\%-pruned model.}, - keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Neural and Evolutionary Computing}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/2CJ4TSNR/See et al. 
- 2016 - Compression of Neural Machine Translation Models v.pdf:application/pdf} } +@misc{segal1999opengl, + title = {The OpenGL graphics system: A specification (version 1.1)}, + author = {Segal, Mark and Akeley, Kurt}, + year = 1999, +} + +@article{segura2018ethical, + title = {Ethical implications of user perceptions of wearable devices}, + author = {Segura Anaya, LH and Alsadoon, Abeer and Costadopoulos, Nectar and Prasad, PWC}, + year = 2018, + journal = {Science and engineering ethics}, + publisher = {Springer}, + volume = 24, + pages = {1--28}, +} @inproceedings{seide2016cntk, title = {CNTK: Microsoft's open-source deep-learning toolkit}, author = {Seide, Frank and Agarwal, Amit}, year = 2016, booktitle = {Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining}, - pages = {2135--2135} + pages = {2135--2135}, } - @misc{sevilla_compute_2022, title = {Compute {Trends} {Across} {Three} {Eras} of {Machine} {Learning}}, author = {Sevilla, Jaime and Heim, Lennart and Ho, Anson and Besiroglu, Tamay and Hobbhahn, Marius and Villalobos, Pablo}, @@ -1768,11 +2214,8 @@ @misc{sevilla_compute_2022 urldate = {2023-10-25}, note = {arXiv:2202.05924 [cs]}, language = {en}, - keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computers and Society}, - file = {Sevilla et al. - 2022 - Compute Trends Across Three Eras of Machine Learni.pdf:/Users/alex/Zotero/storage/24N9RZ72/Sevilla et al. - 2022 - Compute Trends Across Three Eras of Machine Learni.pdf:application/pdf} } - @article{seyedzadeh2018machine, title = {Machine learning for estimation of building energy consumption and performance: a review}, author = {Seyedzadeh, Saleh and Rahimian, Farzad Pour and Glesk, Ivan and Roper, Marc}, @@ -1780,10 +2223,9 @@ @article{seyedzadeh2018machine journal = {Visualization in Engineering}, publisher = {Springer}, volume = 6, - pages = {1--20} + pages = {1--20}, } - @article{shamir1979share, title = {How to share a secret}, author = {Shamir, Adi}, @@ -1792,9 +2234,19 @@ @article{shamir1979share publisher = {ACM New York, NY, USA}, volume = 22, number = 11, - pages = {612--613} + pages = {612--613}, } +@article{shastri2021photonics, + title = {Photonics for artificial intelligence and neuromorphic computing}, + author = {Shastri, Bhavin J and Tait, Alexander N and Ferreira de Lima, Thomas and Pernice, Wolfram HP and Bhaskaran, Harish and Wright, C David and Prucnal, Paul R}, + year = 2021, + journal = {Nature Photonics}, + publisher = {Nature Publishing Group UK London}, + volume = 15, + number = 2, + pages = {102--114}, +} @article{Sheng_Zhang_2019, title = {Machine learning with crowdsourcing: A brief summary of the past research and Future Directions}, @@ -1804,122 +2256,151 @@ @article{Sheng_Zhang_2019 volume = 33, number = {01}, pages = {9837--9843}, - doi = {10.1609/aaai.v33i01.33019837} } - @misc{Sheth_2022, title = {Eletect - TinyML and IOT based Smart Wildlife Tracker}, author = {Sheth, Dhruv}, year = 2022, - month = {Mar}, + month = mar, journal = {Hackster.io}, - url = {https://www.hackster.io/dhruvsheth_/eletect-tinyml-and-iot-based-smart-wildlife-tracker-c03e5a} + url = {https://www.hackster.io/dhruvsheth\_/eletect-tinyml-and-iot-based-smart-wildlife-tracker-c03e5a}, } - @inproceedings{shi2022data, title = {Data selection for efficient model update in federated learning}, author = {Shi, Hongrui and Radu, Valentin}, year = 2022, booktitle = {Proceedings of the 2nd European Workshop on Machine
Learning and Systems}, - pages = {72--78} + pages = {72--78}, } - @article{smestad2023systematic, title = {A Systematic Literature Review on Client Selection in Federated Learning}, author = {Smestad, Carl and Li, Jingyue}, year = 2023, - journal = {arXiv preprint arXiv:2306.04862} + journal = {arXiv preprint arXiv:2306.04862}, } - @misc{smoothquant, title = {SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models}, author = {Xiao, Guangxuan and Lin, Ji and Seznec, Mickael and Wu, Hao and Demouth, Julien and Han, Song}, year = 2023, - doi = {10.48550/arXiv.2211.10438}, url = {https://arxiv.org/abs/2211.10438}, urldate = {2023-06-05}, - abstract = {Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accuracy and hardware efficiency at the same time. We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs. Based on the fact that weights are easy to quantize while activations are not, SmoothQuant smooths the activation outliers by offline migrating the quantization difficulty from activations to weights with a mathematically equivalent transformation. SmoothQuant enables an INT8 quantization of both weights and activations for all the matrix multiplications in LLMs, including OPT, BLOOM, GLM, MT-NLG, and LLaMA family. We demonstrate up to 1.56x speedup and 2x memory reduction for LLMs with negligible loss in accuracy. SmoothQuant enables serving 530B LLM within a single node. Our work offers a turn-key solution that reduces hardware costs and democratizes LLMs.} } +@inproceedings{suda2016throughput, + title = {Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks}, + author = {Suda, Naveen and Chandra, Vikas and Dasika, Ganesh and Mohanty, Abinash and Ma, Yufei and Vrudhula, Sarma and Seo, Jae-sun and Cao, Yu}, + year = 2016, + booktitle = {Proceedings of the 2016 ACM/SIGDA international symposium on field-programmable gate arrays}, + pages = {16--25}, +} @misc{surveyofquant, title = {A Survey of Quantization Methods for Efficient Neural Network Inference}, author = {Gholami, Amir and Kim, Sehoon and Dong, Zhen and Yao, Zhewei and Mahoney, Michael W. and Keutzer, Kurt}, year = 2021, - doi = {10.48550/arXiv.2103.13630}, url = {https://arxiv.org/abs/2103.13630}, urldate = {2021-06-21}, - abstract = {As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas.
Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.} } +@article{Sze2017-ak, + title = {Efficient processing of deep neural networks: A tutorial and survey}, + author = {Sze, Vivienne and Chen, Yu-Hsin and Yang, Tien-Ju and Emer, Joel}, + year = 2017, + month = mar, + copyright = {http://arxiv.org/licenses/nonexclusive-distrib/1.0/}, + archiveprefix = {arXiv}, + primaryclass = {cs.CV}, + eprint = {1703.09039}, +} + +@article{sze2017efficient, + title = {Efficient processing of deep neural networks: A tutorial and survey}, + author = {Sze, Vivienne and Chen, Yu-Hsin and Yang, Tien-Ju and Emer, Joel S}, + year = 2017, + journal = {Proceedings of the IEEE}, + publisher = {Ieee}, + volume = 105, + number = 12, + pages = {2295--2329}, +} @misc{tan_efficientnet_2020, title = {{EfficientNet}: {Rethinking} {Model} {Scaling} for {Convolutional} {Neural} {Networks}}, - shorttitle = {{EfficientNet}}, author = {Tan, Mingxing and Le, Quoc V.}, year = 2020, month = sep, publisher = {arXiv}, - doi = {10.48550/arXiv.1905.11946}, url = {http://arxiv.org/abs/1905.11946}, urldate = {2023-10-20}, note = {arXiv:1905.11946 [cs, stat]}, - abstract = {Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3\% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7\%), Flowers (98.8\%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. 
Source code is at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.}, - keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Statistics - Machine Learning}, - file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/KISBF35I/Tan and Le - 2020 - EfficientNet Rethinking Model Scaling for Convolu.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/TUD4PH4M/1905.html:text/html} } - @inproceedings{tan2019mnasnet, title = {Mnasnet: Platform-aware neural architecture search for mobile}, author = {Tan, Mingxing and Chen, Bo and Pang, Ruoming and Vasudevan, Vijay and Sandler, Mark and Howard, Andrew and Le, Quoc V}, year = 2019, booktitle = {Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, - pages = {2820--2828} + pages = {2820--2828}, } - @misc{tan2020efficientnet, title = {EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks}, author = {Mingxing Tan and Quoc V. Le}, year = 2020, eprint = {1905.11946}, archiveprefix = {arXiv}, - primaryclass = {cs.LG} + primaryclass = {cs.LG}, } +@article{tang2022soft, + title = {Soft bioelectronics for cardiac interfaces}, + author = {Tang, Xin and He, Yichun and Liu, Jia}, + year = 2022, + journal = {Biophysics Reviews}, + publisher = {AIP Publishing}, + volume = 3, + number = 1, +} + +@article{tang2023flexible, + title = {Flexible brain--computer interfaces}, + author = {Tang, Xin and Shen, Hao and Zhao, Siyuan and Li, Na and Liu, Jia}, + year = 2023, + journal = {Nature Electronics}, + publisher = {Nature Publishing Group UK London}, + volume = 6, + number = 2, + pages = {109--118}, +} @misc{Team_2023, title = {Data-centric AI for the Enterprise}, author = {Team, Snorkel}, year = 2023, - month = {Aug}, + month = aug, journal = {Snorkel AI}, - url = {https://snorkel.ai/} + url = {https://snorkel.ai/}, } - @misc{Thefutur92:online, - title = {The future is being built on Arm: Market diversification continues to drive strong royalty and licensing growth as ecosystem reaches quarter of a trillion chips milestone – Arm®}, + title = {The future is being built on Arm: Market diversification continues to drive strong royalty and licensing growth as ecosystem reaches quarter of a trillion chips milestone – Arm\textregistered{}}, author = {ARM.com}, note = {(Accessed on 09/16/2023)}, - howpublished = {\url{https://www.arm.com/company/news/2023/02/arm-announces-q3-fy22-results}} + howpublished = {\url{https://www.arm.com/company/news/2023/02/arm-announces-q3-fy22-results}}, } - @misc{threefloat, title = {Three Floating Point Formats}, author = {Google}, year = 2023, - url = {https://storage.googleapis.com/gweb-cloudblog-publish/images/Three_floating-point_formats.max-624x261.png}, - urldate = {2023-10-20} + url = {https://storage.googleapis.com/gweb-cloudblog-publish/images/Three\_floating-point\_formats.max-624x261.png}, + urldate = {2023-10-20}, } - @article{tirtalistyani2022indonesia, title = {Indonesia rice irrigation system: Time for innovation}, author = {Tirtalistyani, Rose and Murtiningrum, Murtiningrum and Kanwar, Rameshwar S}, @@ -1928,20 +2409,18 @@ @article{tirtalistyani2022indonesia publisher = {MDPI}, volume = 14, number = 19, - pages = 12477 + pages = 12477, } - @inproceedings{tokui2015chainer, title = {Chainer: a next-generation open source framework for deep learning}, author = {Tokui, Seiya and Oono, Kenta and Hido, Shohei and Clayton, Justin}, year = 2015, booktitle = {Proceedings of workshop on 
machine learning systems (LearningSys) in the twenty-ninth annual conference on neural information processing systems (NIPS)}, volume = 5, - pages = {1--6} + pages = {1--6}, } - @article{van_de_ven_three_2022, title = {Three types of incremental learning}, author = {Van De Ven, Gido M. and Tuytelaars, Tinne and Tolias, Andreas S.}, @@ -1951,40 +2430,44 @@ @article{van_de_ven_three_2022 volume = 4, number = 12, pages = {1185--1197}, - doi = {10.1038/s42256-022-00568-3}, - issn = {2522-5839}, url = {https://www.nature.com/articles/s42256-022-00568-3}, urldate = {2023-10-26}, language = {en}, - file = {Van De Ven et al. - 2022 - Three types of incremental learning.pdf:/Users/alex/Zotero/storage/5ZAHXMQN/Van De Ven et al. - 2022 - Three types of incremental learning.pdf:application/pdf} } - @article{vaswani2017attention, title = {Attention is all you need}, author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia}, year = 2017, journal = {Advances in neural information processing systems}, - volume = 30 + volume = 30, } - @misc{Vectorbo78:online, title = {Vector-borne diseases}, note = {(Accessed on 10/17/2023)}, - howpublished = {\url{https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases}} + howpublished = {\url{https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases}}, } - @misc{Verma_2022, title = {Elephant AI}, - author = {Verma, Team Dual_Boot: Swapnil}, + author = {Verma, Team Dual\_Boot: Swapnil}, year = 2022, - month = {Mar}, + month = mar, journal = {Hackster.io}, - url = {https://www.hackster.io/dual_boot/elephant-ai-ba71e9} + url = {https://www.hackster.io/dual\_boot/elephant-ai-ba71e9}, } +@article{verma2019memory, + title = {In-memory computing: Advances and prospects}, + author = {Verma, Naveen and Jia, Hongyang and Valavi, Hossein and Tang, Yinqi and Ozatay, Murat and Chen, Lung-Yen and Zhang, Bonan and Deaville, Peter}, + year = 2019, + journal = {IEEE Solid-State Circuits Magazine}, + publisher = {IEEE}, + volume = 11, + number = 3, + pages = {43--55}, +} @misc{villalobos_machine_2022, title = {Machine {Learning} {Model} {Sizes} and the {Parameter} {Gap}}, @@ -1996,27 +2479,22 @@ @misc{villalobos_machine_2022 urldate = {2023-10-25}, note = {arXiv:2207.02852 [cs]}, language = {en}, - keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Computation and Language}, - file = {Villalobos et al. - 2022 - Machine Learning Model Sizes and the Parameter Gap.pdf:/Users/alex/Zotero/storage/WW69A82B/Villalobos et al. 
- 2022 - Machine Learning Model Sizes and the Parameter Gap.pdf:application/pdf} } - @misc{villalobos_trends_2022, title = {Trends in {Training} {Dataset} {Sizes}}, author = {Villalobos, Pablo and Ho, Anson}, year = 2022, month = sep, journal = {Epoch AI}, - url = {https://epochai.org/blog/trends-in-training-dataset-sizes} + url = {https://epochai.org/blog/trends-in-training-dataset-sizes}, } - @misc{VinBrain, journal = {VinBrain}, - url = {https://vinbrain.net/aiscaler} + url = {https://vinbrain.net/aiscaler}, } - @article{vinuesa2020role, title = {The role of artificial intelligence in achieving the Sustainable Development Goals}, author = {Vinuesa, Ricardo and Azizpour, Hossein and Leite, Iolanda and Balaam, Madeline and Dignum, Virginia and Domisch, Sami and Fell{\"a}nder, Anna and Langhans, Simone Daniela and Tegmark, Max and Fuso Nerini, Francesco}, @@ -2025,25 +2503,60 @@ @article{vinuesa2020role publisher = {Nature Publishing Group}, volume = 11, number = 1, - pages = {1--10} + pages = {1--10}, +} + +@article{Vivet2021, + title = {IntAct: A 96-Core Processor With Six Chiplets 3D-Stacked on an Active Interposer With Distributed Interconnects and Integrated Power Management}, + author = {Vivet, Pascal and Guthmuller, Eric and Thonnart, Yvain and Pillonnet, Gael and Fuguet, C\'{e}sar and Miro-Panades, Ivan and Moritz, Guillaume and Durupt, Jean and Bernard, Christian and Varreau, Didier and Pontes, Julian and Thuries, S\'{e}bastien and Coriat, David and Harrand, Michel and Dutoit, Denis and Lattard, Didier and Arnaud, Lucile and Charbonnier, Jean and Coudrain, Perceval and Garnier, Arnaud and Berger, Fr\'{e}d\'{e}ric and Gueugnot, Alain and Greiner, Alain and Meunier, Quentin L. and Farcy, Alexis and Arriordaz, Alexandre and Ch\'{e}ramy, S\'{e}verine and Clermidy, Fabien}, + year = 2021, + journal = {IEEE Journal of Solid-State Circuits}, + volume = 56, + number = 1, + pages = {79--97}, } +@inproceedings{wang2020apq, + title = {APQ: Joint Search for Network Architecture, Pruning and Quantization Policy}, + author = {Wang, Tianzhe and Wang, Kuan and Cai, Han and Lin, Ji and Liu, Zhijian and Wang, Hanrui and Lin, Yujun and Han, Song}, + year = 2020, + booktitle = {2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, + pages = {2075--2084}, +} @article{warden2018speech, title = {Speech commands: A dataset for limited-vocabulary speech recognition}, author = {Warden, Pete}, year = 2018, - journal = {arXiv preprint arXiv:1804.03209} + journal = {arXiv preprint arXiv:1804.03209}, } - @book{warden2019tinyml, title = {Tinyml: Machine learning with tensorflow lite on arduino and ultra-low-power microcontrollers}, author = {Warden, Pete and Situnayake, Daniel}, year = 2019, - publisher = {O'Reilly Media} + publisher = {O'Reilly Media}, } +@article{wearableinsulin, + title = {Wearable Insulin Biosensors for Diabetes Management: Advances and Challenges}, + author = {Psoma, Sotiria D. 
and Kanthou, Chryso}, + year = 2023, + journal = {Biosensors}, + volume = 13, + number = 7, + url = {https://www.mdpi.com/2079-6374/13/7/719}, + article-number = 719, + pubmedid = 37504117, +} + +@book{weik_survey_1955, + title = {A {Survey} of {Domestic} {Electronic} {Digital} {Computing} {Systems}}, + author = {Weik, Martin H.}, + year = 1955, + publisher = {Ballistic Research Laboratories}, + language = {en}, +} @article{weiss_survey_2016, title = {A survey of transfer learning}, @@ -2054,67 +2567,92 @@ @article{weiss_survey_2016 volume = 3, number = 1, pages = 9, - doi = {10.1186/s40537-016-0043-6}, - issn = {2196-1115}, url = {http://journalofbigdata.springeropen.com/articles/10.1186/s40537-016-0043-6}, urldate = {2023-10-25}, language = {en}, - file = {Weiss et al. - 2016 - A survey of transfer learning.pdf:/Users/alex/Zotero/storage/3FN2Y6EA/Weiss et al. - 2016 - A survey of transfer learning.pdf:application/pdf} } +@article{wong2012metal, + title = {Metal--oxide RRAM}, + author = {Wong, H-S Philip and Lee, Heng-Yuan and Yu, Shimeng and Chen, Yu-Sheng and Wu, Yi and Chen, Pang-Shiu and Lee, Byoungil and Chen, Frederick T and Tsai, Ming-Jinn}, + year = 2012, + journal = {Proceedings of the IEEE}, + publisher = {IEEE}, + volume = 100, + number = 6, + pages = {1951--1970}, +} @inproceedings{wu2019fbnet, title = {Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search}, author = {Wu, Bichen and Dai, Xiaoliang and Zhang, Peizhao and Wang, Yanghan and Sun, Fei and Wu, Yiming and Tian, Yuandong and Vajda, Peter and Jia, Yangqing and Keutzer, Kurt}, year = 2019, booktitle = {Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, - pages = {10734--10742} + pages = {10734--10742}, } - @article{wu2022sustainable, title = {Sustainable ai: Environmental implications, challenges and opportunities}, author = {Wu, Carole-Jean and Raghavendra, Ramya and Gupta, Udit and Acun, Bilge and Ardalani, Newsha and Maeng, Kiwan and Chang, Gloria and Aga, Fiona and Huang, Jinshi and Bai, Charles and others}, year = 2022, journal = {Proceedings of Machine Learning and Systems}, volume = 4, - pages = {795--813} + pages = {795--813}, } - @inproceedings{xie2020adversarial, title = {Adversarial examples improve image recognition}, author = {Xie, Cihang and Tan, Mingxing and Gong, Boqing and Wang, Jiang and Yuille, Alan L and Le, Quoc V}, year = 2020, booktitle = {Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, - pages = {819--828} + pages = {819--828}, +} + +@article{xiong_mri-based_2021, + title = {{MRI}-based brain tumor segmentation using {FPGA}-accelerated neural network}, + author = {Xiong, Siyu and Wu, Guoqing and Fan, Xitian and Feng, Xuan and Huang, Zhongcheng and Cao, Wei and Zhou, Xuegong and Ding, Shijin and Yu, Jinhua and Wang, Lingli and Shi, Zhifeng}, + year = 2021, + month = sep, + journal = {BMC Bioinformatics}, + volume = 22, + number = 1, + pages = 421, + url = {https://doi.org/10.1186/s12859-021-04347-6}, + urldate = {2023-11-07}, } +@article{xiu2019time, + title = {Time Moore: Exploiting Moore's Law from the perspective of time}, + author = {Xiu, Liming}, + year = 2019, + journal = {IEEE Solid-State Circuits Magazine}, + publisher = {IEEE}, + volume = 11, + number = 1, + pages = {39--55}, +} @article{xu2018alternating, title = {Alternating multi-bit quantization for recurrent neural networks}, author = {Xu, Chen and Yao, Jianqiang and Lin, Zhouchen and Ou, Wenwu and Cao, Yuanbin and Wang, 
Zhirong and Zha, Hongbin}, year = 2018, - journal = {arXiv preprint arXiv:1802.00150} + journal = {arXiv preprint arXiv:1802.00150}, } - @article{xu2023demystifying, title = {Demystifying CLIP Data}, author = {Xu, Hu and Xie, Saining and Tan, Xiaoqing Ellen and Huang, Po-Yao and Howes, Russell and Sharma, Vasu and Li, Shang-Wen and Ghosh, Gargi and Zettlemoyer, Luke and Feichtenhofer, Christoph}, year = 2023, - journal = {arXiv preprint arXiv:2309.16671} + journal = {arXiv preprint arXiv:2309.16671}, } - @article{xu2023federated, title = {Federated Learning of Gboard Language Models with Differential Privacy}, author = {Xu, Zheng and Zhang, Yanxiang and Andrew, Galen and Choquette-Choo, Christopher A and Kairouz, Peter and McMahan, H Brendan and Rosenstock, Jesse and Zhang, Yuanbo}, year = 2023, - journal = {arXiv preprint arXiv:2305.18465} + journal = {arXiv preprint arXiv:2305.18465}, } - @article{yamashita2023coffee, title = {Coffee disease classification at the edge using deep learning}, author = {Yamashita, Jo{\~a}o Vitor Yukio Bordin and Leite, Jo{\~a}o Paulo RR}, @@ -2122,53 +2660,89 @@ @article{yamashita2023coffee journal = {Smart Agricultural Technology}, publisher = {Elsevier}, volume = 4, - pages = 100183 + pages = 100183, } - @misc{yang2020coexploration, title = {Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks}, author = {Lei Yang and Zheyu Yan and Meng Li and Hyoukjun Kwon and Liangzhen Lai and Tushar Krishna and Vikas Chandra and Weiwen Jiang and Yiyu Shi}, year = 2020, eprint = {2002.04116}, archiveprefix = {arXiv}, - primaryclass = {cs.LG} + primaryclass = {cs.LG}, } - @inproceedings{yang2023online, title = {Online Model Compression for Federated Learning with Large Models}, author = {Yang, Tien-Ju and Xiao, Yonghui and Motta, Giovanni and Beaufays, Fran{\c{c}}oise and Mathews, Rajiv and Chen, Mingqing}, year = 2023, booktitle = {ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, pages = {1--5}, - organization = {IEEE} + organization = {IEEE}, } +@misc{yik2023neurobench, + title = {NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking}, + author = {Jason Yik and Soikat Hasan Ahmed and Zergham Ahmed and Brian Anderson and Andreas G. Andreou and Chiara Bartolozzi and Arindam Basu and Douwe den Blanken and Petrut Bogdan and Sander Bohte and Younes Bouhadjar and Sonia Buckley and Gert Cauwenberghs and Federico Corradi and Guido de Croon and Andreea Danielescu and Anurag Daram and Mike Davies and Yigit Demirag and Jason Eshraghian and Jeremy Forest and Steve Furber and Michael Furlong and Aditya Gilra and Giacomo Indiveri and Siddharth Joshi and Vedant Karia and Lyes Khacef and James C. Knight and Laura Kriener and Rajkumar Kubendran and Dhireesha Kudithipudi and Gregor Lenz and Rajit Manohar and Christian Mayr and Konstantinos Michmizos and Dylan Muir and Emre Neftci and Thomas Nowotny and Fabrizio Ottati and Ayca Ozcelikkale and Noah Pacik-Nelson and Priyadarshini Panda and Sun Pao-Sheng and Melika Payvand and Christian Pehle and Mihai A. Petrovici and Christoph Posch and Alpha Renner and Yulia Sandamirskaya and Clemens JS Schaefer and Andr\'{e} van Schaik and Johannes Schemmel and Catherine Schuman and Jae-sun Seo and Sadique Sheik and Sumit Bam Shrestha and Manolis Sifalakis and Amos Sironi and Kenneth Stewart and Terrence C. 
Stewart and Philipp Stratmann and Guangzhi Tang and Jonathan Timcheck and Marian Verhelst and Craig M. Vineyard and Bernhard Vogginger and Amirreza Yousefzadeh and Biyan Zhou and Fatima Tuz Zohora and Charlotte Frenkel and Vijay Janapa Reddi}, + year = 2023, + eprint = {2304.04640}, + archiveprefix = {arXiv}, + primaryclass = {cs.AI}, +} + +@article{young2018recent, + title = {Recent trends in deep learning based natural language processing}, + author = {Young, Tom and Hazarika, Devamanyu and Poria, Soujanya and Cambria, Erik}, + year = 2018, + journal = {IEEE Computational Intelligence Magazine}, + publisher = {IEEE}, + volume = 13, + number = 3, + pages = {55--75}, +} @inproceedings{zennaro2022tinyml, title = {TinyML: applied AI for development}, author = {Zennaro, Marco and Plancher, Brian and Reddi, V Janapa}, year = 2022, booktitle = {The UN 7th Multi-stakeholder Forum on Science, Technology and Innovation for the Sustainable Development Goals}, - pages = {2022--05} + pages = {2022--05}, } - @article{zennarobridging, title = {Bridging the Digital Divide: the Promising Impact of TinyML for Developing Countries}, - author = {Zennaro, Marco and Plancher, Brian and Reddi, Vijay Janapa} + author = {Zennaro, Marco and Plancher, Brian and Reddi, Vijay Janapa}, } - @inproceedings{Zhang_2020_CVPR_Workshops, title = {Fast Hardware-Aware Neural Architecture Search}, author = {Zhang, Li Lyna and Yang, Yuqing and Jiang, Yuhang and Zhu, Wenwu and Liu, Yunxin}, year = 2020, - month = {June}, - booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops} + month = jun, + booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, +} + +@inproceedings{zhang2015fpga, + title = {Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks}, + author = {Zhang, Chen and Li, Peng and Sun, Guangyu and Guan, Yijin and Xiao, Bingjun and Cong, Jason}, + year = 2015, + booktitle = {Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)}, + volume = 15, + pages = {161--170}, +} +@article{Zhang2017, + title = {Highly wearable cuff-less blood pressure and heart rate monitoring with single-arm electrocardiogram and photoplethysmogram signals}, + author = {Zhang, Qingxue and Zhou, Dian and Zeng, Xuan}, + year = 2017, + month = feb, + day = {06}, + journal = {BioMedical Engineering OnLine}, + volume = 16, + number = 1, + pages = 23, + url = {https://doi.org/10.1186/s12938-017-0317-z}, +} @misc{zhang2019autoshrink, title = {AutoShrink: A Topology-aware NAS for Discovering Efficient Neural Architecture}, @@ -2176,21 +2750,18 @@ @misc{zhang2019autoshrink year = 2019, eprint = {1911.09251}, archiveprefix = {arXiv}, - primaryclass = {cs.LG} + primaryclass = {cs.LG}, } - @article{zhao2018federated, title = {Federated learning with non-iid data}, author = {Zhao, Yue and Li, Meng and Lai, Liangzhen and Suda, Naveen and Civin, Damon and Chandra, Vikas}, year = 2018, - journal = {arXiv preprint arXiv:1806.00582} + journal = {arXiv preprint arXiv:1806.00582}, } - @misc{zhou_deep_2023, title = {Deep {Class}-{Incremental} {Learning}: {A} {Survey}}, - shorttitle = {Deep {Class}-{Incremental} {Learning}}, author = {Zhou, Da-Wei and Wang, Qi-Wei and Qi, Zhi-Hong and Ye, Han-Jia and Zhan, De-Chuan and Liu, Ziwei}, year = 2023, month = feb, @@ -2199,1247 +2770,58 @@ urldate = {2023-10-26}, note = {arXiv:2302.03648 [cs]}, language = {en}, - keywords =
{Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning}, - annote = {Comment: Code is available at https://github.com/zhoudw-zdw/CIL\_Survey/}, - file = {Zhou et al. - 2023 - Deep Class-Incremental Learning A Survey.pdf:/Users/alex/Zotero/storage/859VZG7W/Zhou et al. - 2023 - Deep Class-Incremental Learning A Survey.pdf:application/pdf} -} - - -@misc{noauthor_who_nodate, - title = {Who {Invented} the {Microprocessor}? - {CHM}}, - url = {https://computerhistory.org/blog/who-invented-the-microprocessor/}, - urldate = {2023-11-07}, -} - - -@book{weik_survey_1955, - title = {A {Survey} of {Domestic} {Electronic} {Digital} {Computing} {Systems}}, - language = {en}, - publisher = {Ballistic Research Laboratories}, - author = {Weik, Martin H.}, - year = {1955}, -} - - -@inproceedings{brown_language_2020, - title = {Language {Models} are {Few}-{Shot} {Learners}}, - volume = {33}, - url = {https://proceedings.neurips.cc/paper_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html}, - abstract = {We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks. We also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.}, - urldate = {2023-11-07}, - booktitle = {Advances in {Neural} {Information} {Processing} {Systems}}, - publisher = {Curran Associates, Inc.}, - author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario}, - year = {2020}, - pages = {1877--1901}, -} - - -@misc{jia_dissecting_2018, - title = {Dissecting the {NVIDIA} {Volta} {GPU} {Architecture} via {Microbenchmarking}}, - url = {http://arxiv.org/abs/1804.06826}, - abstract = {Every year, novel NVIDIA GPU designs are introduced. This rapid architectural and technological progression, coupled with a reluctance by manufacturers to disclose low-level details, makes it difficult for even the most proficient GPU software designers to remain up-to-date with the technological advances at a microarchitectural level. To address this dearth of public, microarchitectural-level information on the novel NVIDIA GPUs, independent researchers have resorted to microbenchmarks-based dissection and discovery. 
This has led to a prolific line of publications that shed light on instruction encoding, and memory hierarchy's geometry and features at each level. Namely, research that describes the performance and behavior of the Kepler, Maxwell and Pascal architectures. In this technical report, we continue this line of research by presenting the microarchitectural details of the NVIDIA Volta architecture, discovered through microbenchmarks and instruction set disassembly. Additionally, we compare quantitatively our Volta findings against its predecessors, Kepler, Maxwell and Pascal.}, - urldate = {2023-11-07}, - publisher = {arXiv}, - author = {Jia, Zhe and Maggioni, Marco and Staiger, Benjamin and Scarpazza, Daniele P.}, - month = apr, - year = {2018}, - note = {arXiv:1804.06826 [cs]}, - keywords = {Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance}, -} - - -@article{jia2019beyond, - title={Beyond Data and Model Parallelism for Deep Neural Networks.}, - author={Jia, Zhihao and Zaharia, Matei and Aiken, Alex}, - journal={Proceedings of Machine Learning and Systems}, - volume={1}, - pages={1--13}, - year={2019} -} - - -@inproceedings{raina_large-scale_2009, - address = {Montreal Quebec Canada}, - title = {Large-scale deep unsupervised learning using graphics processors}, - isbn = {978-1-60558-516-1}, - url = {https://dl.acm.org/doi/10.1145/1553374.1553486}, - doi = {10.1145/1553374.1553486}, - language = {en}, - urldate = {2023-11-07}, - booktitle = {Proceedings of the 26th {Annual} {International} {Conference} on {Machine} {Learning}}, - publisher = {ACM}, - author = {Raina, Rajat and Madhavan, Anand and Ng, Andrew Y.}, - month = jun, - year = {2009}, - pages = {873--880}, -} - - -@misc{noauthor_amd_nodate, - title = {{AMD} {Radeon} {RX} 7000 {Series} {Desktop} {Graphics} {Cards}}, - url = {https://www.amd.com/en/graphics/radeon-rx-graphics}, - urldate = {2023-11-07}, -} - - -@misc{noauthor_intel_nodate, - title = {Intel® {Arc}™ {Graphics} {Overview}}, - url = {https://www.intel.com/content/www/us/en/products/details/discrete-gpus/arc.html}, - abstract = {Find out how Intel® Arc Graphics unlock lifelike gaming and seamless content creation.}, - language = {en}, - urldate = {2023-11-07}, - journal = {Intel}, -} - - -@article{lindholm_nvidia_2008, - title = {{NVIDIA} {Tesla}: {A} {Unified} {Graphics} and {Computing} {Architecture}}, - volume = {28}, - issn = {1937-4143}, - shorttitle = {{NVIDIA} {Tesla}}, - url = {https://ieeexplore.ieee.org/document/4523358}, - doi = {10.1109/MM.2008.31}, - abstract = {To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture. 
Its scalable parallel array of processors is massively multithreaded and programmable in C or via graphics APIs.}, - number = {2}, - urldate = {2023-11-07}, - journal = {IEEE Micro}, - author = {Lindholm, Erik and Nickolls, John and Oberman, Stuart and Montrym, John}, - month = mar, - year = {2008}, - note = {Conference Name: IEEE Micro}, - pages = {39--55}, -} - - -@article{dally_evolution_2021, - title = {Evolution of the {Graphics} {Processing} {Unit} ({GPU})}, - volume = {41}, - issn = {1937-4143}, - url = {https://ieeexplore.ieee.org/document/9623445}, - doi = {10.1109/MM.2021.3113475}, - abstract = {Graphics processing units (GPUs) power today’s fastest supercomputers, are the dominant platform for deep learning, and provide the intelligence for devices ranging from self-driving cars to robots and smart cameras. They also generate compelling photorealistic images at real-time frame rates. GPUs have evolved by adding features to support new use cases. NVIDIA’s GeForce 256, the first GPU, was a dedicated processor for real-time graphics, an application that demands large amounts of floating-point arithmetic for vertex and fragment shading computations and high memory bandwidth. As real-time graphics advanced, GPUs became programmable. The combination of programmability and floating-point performance made GPUs attractive for running scientific applications. Scientists found ways to use early programmable GPUs by casting their calculations as vertex and fragment shaders. GPUs evolved to meet the needs of scientific users by adding hardware for simpler programming, double-precision floating-point arithmetic, and resilience.}, - number = {6}, - urldate = {2023-11-07}, - journal = {IEEE Micro}, - author = {Dally, William J. and Keckler, Stephen W. and Kirk, David B.}, - month = nov, - year = {2021}, - note = {Conference Name: IEEE Micro}, - pages = {42--51}, -} - - -@article{demler_ceva_2020, - title = {{CEVA} {SENSPRO} {FUSES} {AI} {AND} {VECTOR} {DSP}}, - language = {en}, - author = {Demler, Mike}, - year = {2020}, -} - - -@misc{noauthor_google_2023, - title = {Google {Tensor} {G3}: {The} new chip that gives your {Pixel} an {AI} upgrade}, - shorttitle = {Google {Tensor} {G3}}, - url = {https://blog.google/products/pixel/google-tensor-g3-pixel-8/}, - abstract = {Tensor G3 on Pixel 8 and Pixel 8 Pro is more helpful, more efficient and more powerful.}, - language = {en-us}, - urldate = {2023-11-07}, - journal = {Google}, - month = oct, - year = {2023}, -} - - -@misc{noauthor_hexagon_nodate, - title = {Hexagon {DSP} {SDK} {Processor}}, - url = {https://developer.qualcomm.com/software/hexagon-dsp-sdk/dsp-processor}, - abstract = {The Hexagon DSP processor has both CPU and DSP functionality to support deeply embedded processing needs of the mobile platform for both multimedia and modem functions.}, - language = {en}, - urldate = {2023-11-07}, - journal = {Qualcomm Developer Network}, -} - - -@misc{noauthor_evolution_2023, - title = {The {Evolution} of {Audio} {DSPs}}, - url = {https://audioxpress.com/article/the-evolution-of-audio-dsps}, - abstract = {To complement the extensive perspective of another Market Update feature article on DSP Products and Applications, published in the November 2020 edition, audioXpress was honored to have the valuable contribution from one of the main suppliers in the field. 
In this article, Youval Nachum, CEVA’s Senior Product Marketing Manager, writes about \"The Evolution of Audio DSPs,\" discussing how DSP technology has evolved, its impact on the user experience, and what the future of DSP has in store for us.}, - language = {en}, - urldate = {2023-11-07}, - journal = {audioXpress}, - month = oct, - year = {2023}, -} - - -@article{xiong_mri-based_2021, - title = {{MRI}-based brain tumor segmentation using {FPGA}-accelerated neural network}, - volume = {22}, - issn = {1471-2105}, - url = {https://doi.org/10.1186/s12859-021-04347-6}, - doi = {10.1186/s12859-021-04347-6}, - abstract = {Brain tumor segmentation is a challenging problem in medical image processing and analysis. It is a very time-consuming and error-prone task. In order to reduce the burden on physicians and improve the segmentation accuracy, the computer-aided detection (CAD) systems need to be developed. Due to the powerful feature learning ability of the deep learning technology, many deep learning-based methods have been applied to the brain tumor segmentation CAD systems and achieved satisfactory accuracy. However, deep learning neural networks have high computational complexity, and the brain tumor segmentation process consumes significant time. Therefore, in order to achieve the high segmentation accuracy of brain tumors and obtain the segmentation results efficiently, it is very demanding to speed up the segmentation process of brain tumors.}, - number = {1}, - urldate = {2023-11-07}, - journal = {BMC Bioinformatics}, - author = {Xiong, Siyu and Wu, Guoqing and Fan, Xitian and Feng, Xuan and Huang, Zhongcheng and Cao, Wei and Zhou, Xuegong and Ding, Shijin and Yu, Jinhua and Wang, Lingli and Shi, Zhifeng}, - month = sep, - year = {2021}, - keywords = {Brain tumor segmatation, FPGA acceleration, Neural network}, - pages = {421}, -} - - -@article{gwennap_certus-nx_nodate, - title = {Certus-{NX} {Innovates} {General}-{Purpose} {FPGAs}}, - language = {en}, - author = {Gwennap, Linley}, -} - - -@misc{noauthor_fpga_nodate, - title = {{FPGA} {Architecture} {Overview}}, - url = {https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/2023-1/fpga-architecture-overview.html}, - urldate = {2023-11-07}, -} - - -@misc{noauthor_what_nodate, - title = {What is an {FPGA}? {Field} {Programmable} {Gate} {Array}}, - shorttitle = {What is an {FPGA}?}, - url = {https://www.xilinx.com/products/silicon-devices/fpga/what-is-an-fpga.html}, - abstract = {What is an FPGA - Field Programmable Gate Arrays are semiconductor devices that are based around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. FPGAs can be reprogrammed to desired application or functionality requirements after manufacturing.}, - language = {en}, - urldate = {2023-11-07}, - journal = {AMD}, -} - - -@article{putnam_reconfigurable_2014, - title = {A reconfigurable fabric for accelerating large-scale datacenter services}, - volume = {42}, - issn = {0163-5964}, - url = {https://dl.acm.org/doi/10.1145/2678373.2665678}, - doi = {10.1145/2678373.2665678}, - abstract = {Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurablefabric to accelerate portions of large-scale software services. 
Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables - In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the largescale reconfigurable fabric improves the ranking throughput of each server by a factor of 95\% for a fixed latency distribution--- or, while maintaining equivalent throughput, reduces the tail latency by 29\%}, - language = {en}, - number = {3}, - urldate = {2023-11-07}, - journal = {ACM SIGARCH Computer Architecture News}, - author = {Putnam, Andrew and Caulfield, Adrian M. and Chung, Eric S. and Chiou, Derek and Constantinides, Kypros and Demme, John and Esmaeilzadeh, Hadi and Fowers, Jeremy and Gopal, Gopi Prashanth and Gray, Jan and Haselman, Michael and Hauck, Scott and Heil, Stephen and Hormati, Amir and Kim, Joo-Young and Lanka, Sitaram and Larus, James and Peterson, Eric and Pope, Simon and Smith, Aaron and Thong, Jason and Xiao, Phillip Yi and Burger, Doug}, - month = oct, - year = {2014}, - pages = {13--24}, -} - - -@misc{noauthor_project_nodate, - title = {Project {Catapult} - {Microsoft} {Research}}, - url = {https://www.microsoft.com/en-us/research/project/project-catapult/}, - urldate = {2023-11-07}, -} - - -@misc{dean_jeff_numbers_nodate, - title = {Numbers {Everyone} {Should} {Know}}, - url = {https://brenocon.com/dean_perf.html}, - urldate = {2023-11-07}, - author = {Dean. Jeff}, -} - - -@misc{bailey_enabling_2018, - title = {Enabling {Cheaper} {Design}}, - url = {https://semiengineering.com/enabling-cheaper-design/}, - abstract = {Enabling Cheaper Design, At what point does cheaper design enable a significant growth in custom semiconductor content? Not everyone is onboard with the idea.}, - language = {en-US}, - urldate = {2023-11-07}, - journal = {Semiconductor Engineering}, - author = {Bailey, Brian}, - month = sep, - year = {2018}, -} - - -@misc{noauthor_integrated_2023, - title = {Integrated circuit}, - copyright = {Creative Commons Attribution-ShareAlike License}, - url = {https://en.wikipedia.org/w/index.php?title=Integrated_circuit&oldid=1183537457}, - abstract = {An integrated circuit (also known as an IC, a chip, or a microchip) is a set of electronic circuits on one small flat piece of semiconductor material, usually silicon. Large numbers of miniaturized transistors and other electronic components are integrated together on the chip. This results in circuits that are orders of magnitude smaller, faster, and less expensive than those constructed of discrete components, allowing a large transistor count. -The IC's mass production capability, reliability, and building-block approach to integrated circuit design have ensured the rapid adoption of standardized ICs in place of designs using discrete transistors. ICs are now used in virtually all electronic equipment and have revolutionized the world of electronics. 
Computers, mobile phones and other home appliances are now essential parts of the structure of modern societies, made possible by the small size and low cost of ICs such as modern computer processors and microcontrollers. -Very-large-scale integration was made practical by technological advancements in semiconductor device fabrication. Since their origins in the 1960s, the size, speed, and capacity of chips have progressed enormously, driven by technical advances that fit more and more transistors on chips of the same size – a modern chip may have many billions of transistors in an area the size of a human fingernail. These advances, roughly following Moore's law, make the computer chips of today possess millions of times the capacity and thousands of times the speed of the computer chips of the early 1970s. -ICs have three main advantages over discrete circuits: size, cost and performance. The size and cost is low because the chips, with all their components, are printed as a unit by photolithography rather than being constructed one transistor at a time. Furthermore, packaged ICs use much less material than discrete circuits. Performance is high because the IC's components switch quickly and consume comparatively little power because of their small size and proximity. The main disadvantage of ICs is the high initial cost of designing them and the enormous capital cost of factory construction. This high initial cost means ICs are only commercially viable when high production volumes are anticipated.}, - language = {en}, - urldate = {2023-11-07}, - journal = {Wikipedia}, - month = nov, - year = {2023}, - note = {Page Version ID: 1183537457}, -} - - -@article{el-rayis_reconfigurable_nodate, - title = {Reconfigurable {Architectures} for the {Next} {Generation} of {Mobile} {Device} {Telecommunications} {Systems}}, - language = {en}, - author = {El-Rayis, Ahmed Osman}, -} - - -@misc{noauthor_intel_nodate, - title = {Intel® {Stratix}® 10 {NX} {FPGA} {Overview} - {High} {Performance} {Stratix}® {FPGA}}, - url = {https://www.intel.com/content/www/us/en/products/details/fpga/stratix/10/nx.html}, - abstract = {View Intel® Stratix® 10 NX FPGAs and find product specifications, features, applications and more.}, - language = {en}, - urldate = {2023-11-07}, - journal = {Intel}, -} - - -@book{patterson2016computer, - title={Computer organization and design ARM edition: the hardware software interface}, - author={Patterson, David A and Hennessy, John L}, - year={2016}, - publisher={Morgan Kaufmann} -} - - -@article{xiu2019time, - title={Time Moore: Exploiting Moore's Law from the perspective of time}, - author={Xiu, Liming}, - journal={IEEE Solid-State Circuits Magazine}, - volume={11}, - number={1}, - pages={39--55}, - year={2019}, - publisher={IEEE} -} - - -@article{brown2020language, - title={Language models are few-shot learners}, - author={Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others}, - journal={Advances in neural information processing systems}, - volume={33}, - pages={1877--1901}, - year={2020} -} - - -@article{cheng2017survey, - title={A survey of model compression and acceleration for deep neural networks}, - author={Cheng, Yu and Wang, Duo and Zhou, Pan and Zhang, Tao}, - journal={arXiv preprint arXiv:1710.09282}, - year={2017} -} - - -@article{sze2017efficient, - title={Efficient processing of deep neural networks: A tutorial and survey},
- author={Sze, Vivienne and Chen, Yu-Hsin and Yang, Tien-Ju and Emer, Joel S}, - journal={Proceedings of the IEEE}, - volume={105}, - number={12}, - pages={2295--2329}, - year={2017}, - publisher={IEEE} -} - - -@article{young2018recent, - title={Recent trends in deep learning based natural language processing}, - author={Young, Tom and Hazarika, Devamanyu and Poria, Soujanya and Cambria, Erik}, - journal={IEEE Computational Intelligence Magazine}, - volume={13}, - number={3}, - pages={55--75}, - year={2018}, - publisher={IEEE} -} - - -@inproceedings{jacob2018quantization, - title={Quantization and training of neural networks for efficient integer-arithmetic-only inference}, - author={Jacob, Benoit and Kligys, Skirmantas and Chen, Bo and Zhu, Menglong and Tang, Matthew and Howard, Andrew and Adam, Hartwig and Kalenichenko, Dmitry}, - booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition}, - pages={2704--2713}, - year={2018} -} - - -@article{gale2019state, - title={The state of sparsity in deep neural networks}, - author={Gale, Trevor and Elsen, Erich and Hooker, Sara}, - journal={arXiv preprint arXiv:1902.09574}, - year={2019} -} - - -@inproceedings{zhang2015fpga, - title={Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks}, - author={Zhang, Chen and Li, Peng and Sun, Guangyu and Guan, Yijin and Xiao, Bingjun and Cong, Jason}, - booktitle={Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '15)}, - pages={161--170}, - year={2015} -} - - -@inproceedings{suda2016throughput, - title={Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks}, - author={Suda, Naveen and Chandra, Vikas and Dasika, Ganesh and Mohanty, Abinash and Ma, Yufei and Vrudhula, Sarma and Seo, Jae-sun and Cao, Yu}, - booktitle={Proceedings of the 2016 ACM/SIGDA international symposium on field-programmable gate arrays}, - pages={16--25}, - year={2016} -} - - -@inproceedings{fowers2018configurable, - title={A configurable cloud-scale DNN processor for real-time AI}, - author={Fowers, Jeremy and Ovtcharov, Kalin and Papamichael, Michael and Massengill, Todd and Liu, Ming and Lo, Daniel and Alkalay, Shlomi and Haselman, Michael and Adams, Logan and Ghandi, Mahdi and others}, - booktitle={2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)}, - pages={1--14}, - year={2018}, - organization={IEEE} -} - - -@article{jia2019beyond, - title={Beyond Data and Model Parallelism for Deep Neural Networks}, - author={Jia, Zhihao and Zaharia, Matei and Aiken, Alex}, - journal={Proceedings of Machine Learning and Systems}, - volume={1}, - pages={1--13}, - year={2019} -} - - -@inproceedings{zhu2018benchmarking, - title={Benchmarking and analyzing deep neural network training}, - author={Zhu, Hongyu and Akrout, Mohamed and Zheng, Bojian and Pelegris, Andrew and Jayarajan, Anand and Phanishayee, Amar and Schroeder, Bianca and Pekhimenko, Gennady}, - booktitle={2018 IEEE International Symposium on Workload Characterization (IISWC)}, - pages={88--100}, - year={2018}, - organization={IEEE} -} - - -@article{samajdar2018scale, - title={Scale-sim: Systolic CNN accelerator simulator}, - author={Samajdar, Ananda and Zhu, Yuhao and Whatmough, Paul and Mattina, Matthew and Krishna, Tushar}, - journal={arXiv preprint arXiv:1811.02883}, - year={2018} -} - - -@INPROCEEDINGS{munshi2009opencl, - author={Munshi, Aaftab}, - booktitle={2009 IEEE Hot Chips 21 Symposium
(HCS)}, - title={The OpenCL specification}, - year={2009}, - volume={}, - number={}, - pages={1-314}, - doi={10.1109/HOTCHIPS.2009.7478342} -} - - -@INPROCEEDINGS{luebke2008cuda, - author={Luebke, David}, - booktitle={2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro}, - title={CUDA: Scalable parallel programming for high-performance scientific computing}, - year={2008}, - volume={}, - number={}, - pages={836-838}, - doi={10.1109/ISBI.2008.4541126} -} - - -@misc{segal1999opengl, - title={The OpenGL graphics system: A specification (version 1.1)}, - author={Segal, Mark and Akeley, Kurt}, - year={1999} -} - - -@INPROCEEDINGS{gannot1994verilog, - author={Gannot, G. and Ligthart, M.}, - booktitle={International Verilog HDL Conference}, - title={Verilog HDL based FPGA design}, - year={1994}, - volume={}, - number={}, - pages={86-92}, - doi={10.1109/IVC.1994.323743} -} - - -@article{binkert2011gem5, - title={The gem5 simulator}, - author={Binkert, Nathan and Beckmann, Bradford and Black, Gabriel and Reinhardt, Steven K and Saidi, Ali and Basu, Arkaprava and Hestness, Joel and Hower, Derek R and Krishna, Tushar and Sardashti, Somayeh and others}, - journal={ACM SIGARCH computer architecture news}, - volume={39}, - number={2}, - pages={1--7}, - year={2011}, - publisher={ACM New York, NY, USA} -} - - -@ARTICLE{Vivet2021, author={Vivet, Pascal and Guthmuller, Eric and Thonnart, Yvain and Pillonnet, Gael and Fuguet, César and Miro-Panades, Ivan and Moritz, Guillaume and Durupt, Jean and Bernard, Christian and Varreau, Didier and Pontes, Julian and Thuries, Sébastien and Coriat, David and Harrand, Michel and Dutoit, Denis and Lattard, Didier and Arnaud, Lucile and Charbonnier, Jean and Coudrain, Perceval and Garnier, Arnaud and Berger, Frédéric and Gueugnot, Alain and Greiner, Alain and Meunier, Quentin L. 
and Farcy, Alexis and Arriordaz, Alexandre and Chéramy, Séverine and Clermidy, Fabien}, -journal={IEEE Journal of Solid-State Circuits}, -title={IntAct: A 96-Core Processor With Six Chiplets 3D-Stacked on an Active Interposer With Distributed Interconnects and Integrated Power Management}, -year={2021}, -volume={56}, -number={1}, -pages={79-97}, -doi={10.1109/JSSC.2020.3036341}} - - -@article{schuman2022, - title={Opportunities for neuromorphic computing algorithms and applications}, - author={Schuman, Catherine D and Kulkarni, Shruti R and Parsa, Maryam and Mitchell, J Parker and Date, Prasanna and Kay, Bill}, - journal={Nature Computational Science}, - volume={2}, - number={1}, - pages={10--19}, - year={2022}, - publisher={Nature Publishing Group US New York} -} - - -@article{markovic2020, - title={Physics for neuromorphic computing}, - author={Markovi{\'c}, Danijela and Mizrahi, Alice and Querlioz, Damien and Grollier, Julie}, - journal={Nature Reviews Physics}, - volume={2}, - number={9}, - pages={499--510}, - year={2020}, - publisher={Nature Publishing Group UK London} -} - - -@article{furber2016large, - title={Large-scale neuromorphic computing systems}, - author={Furber, Steve}, - journal={Journal of neural engineering}, - volume={13}, - number={5}, - pages={051001}, - year={2016}, - publisher={IOP Publishing} -} - - -@article{davies2018loihi, - title={Loihi: A neuromorphic manycore processor with on-chip learning}, - author={Davies, Mike and Srinivasa, Narayan and Lin, Tsung-Han and Chinya, Gautham and Cao, Yongqiang and Choday, Sri Harsha and Dimou, Georgios and Joshi, Prasad and Imam, Nabil and Jain, Shweta and others}, - journal={IEEE Micro}, - volume={38}, - number={1}, - pages={82--99}, - year={2018}, - publisher={IEEE} -} - - -@article{davies2021advancing, - title={Advancing neuromorphic computing with Loihi: A survey of results and outlook}, - author={Davies, Mike and Wild, Andreas and Orchard, Garrick and Sandamirskaya, Yulia and Guerra, Gabriel A Fonseca and Joshi, Prasad and Plank, Philipp and Risbud, Sumedh R}, - journal={Proceedings of the IEEE}, - volume={109}, - number={5}, - pages={911--934}, - year={2021}, - publisher={IEEE} -} - - -@article{modha2023neural, - title={Neural inference at the frontier of energy, space, and time}, - author={Modha, Dharmendra S and Akopyan, Filipp and Andreopoulos, Alexander and Appuswamy, Rathinakumar and Arthur, John V and Cassidy, Andrew S and Datta, Pallab and DeBole, Michael V and Esser, Steven K and Otero, Carlos Ortega and others}, - journal={Science}, - volume={382}, - number={6668}, - pages={329--335}, - year={2023}, - publisher={American Association for the Advancement of Science} -} - - -@article{maass1997networks, - title={Networks of spiking neurons: the third generation of neural network models}, - author={Maass, Wolfgang}, - journal={Neural networks}, - volume={10}, - number={9}, - pages={1659--1671}, - year={1997}, - publisher={Elsevier} } - -@article{10242251, -author={Eshraghian, Jason K. and Ward, Max and Neftci, Emre O.
and Wang, Xinxin and Lenz, Gregor and Dwivedi, Girish and Bennamoun, Mohammed and Jeong, Doo Seok and Lu, Wei D.}, -journal={Proceedings of the IEEE}, -title={Training Spiking Neural Networks Using Lessons From Deep Learning}, -year={2023}, -volume={111}, -number={9}, -pages={1016-1054}, -doi={10.1109/JPROC.2023.3308088} -} - - -@article{chua1971memristor, - title={Memristor-the missing circuit element}, - author={Chua, Leon}, - journal={IEEE Transactions on circuit theory}, - volume={18}, - number={5}, - pages={507--519}, - year={1971}, - publisher={IEEE} -} - - -@article{shastri2021photonics, - title={Photonics for artificial intelligence and neuromorphic computing}, - author={Shastri, Bhavin J and Tait, Alexander N and Ferreira de Lima, Thomas and Pernice, Wolfram HP and Bhaskaran, Harish and Wright, C David and Prucnal, Paul R}, - journal={Nature Photonics}, - volume={15}, - number={2}, - pages={102--114}, - year={2021}, - publisher={Nature Publishing Group UK London} -} - - -@article{haensch2018next, -title={The next generation of deep learning hardware: Analog computing}, -author={Haensch, Wilfried and Gokmen, Tayfun and Puri, Ruchir}, -journal={Proceedings of the IEEE}, -volume={107}, -number={1}, -pages={108--122}, -year={2018}, -publisher={IEEE} -} - - -@article{hazan2021neuromorphic, - title={Neuromorphic analog implementation of neural engineering framework-inspired spiking neuron for high-dimensional representation}, - author={Hazan, Avi and Ezra Tsur, Elishai}, - journal={Frontiers in Neuroscience}, - volume={15}, - pages={627221}, - year={2021}, - publisher={Frontiers Media SA} -} - - -@article{gates2009flexible, - title={Flexible electronics}, - author={Gates, Byron D}, - journal={Science}, - volume={323}, - number={5921}, - pages={1566--1567}, - year={2009}, - publisher={American Association for the Advancement of Science} -} - - -@article{musk2019integrated, - title={An integrated brain-machine interface platform with thousands of channels}, - author={Musk, Elon and others}, - journal={Journal of medical Internet research}, - volume={21}, - number={10}, - pages={e16194}, - year={2019}, - publisher={JMIR Publications Inc., Toronto, Canada} -} - - -@article{tang2023flexible, - title={Flexible brain--computer interfaces}, - author={Tang, Xin and Shen, Hao and Zhao, Siyuan and Li, Na and Liu, Jia}, - journal={Nature Electronics}, - volume={6}, - number={2}, - pages={109--118}, - year={2023}, - publisher={Nature Publishing Group UK London} -} - - -@article{tang2022soft, - title={Soft bioelectronics for cardiac interfaces}, - author={Tang, Xin and He, Yichun and Liu, Jia}, - journal={Biophysics Reviews}, - volume={3}, - number={1}, - year={2022}, - publisher={AIP Publishing} -} - - -@article{kwon2022flexible, - title={Flexible sensors and machine learning for heart monitoring}, - author={Kwon, Sun Hwa and Dong, Lin}, - journal={Nano Energy}, - pages={107632}, - year={2022}, - publisher={Elsevier} -} - - -@article{huang2010pseudo, - title={Pseudo-CMOS: A design style for low-cost and robust flexible electronics}, - author={Huang, Tsung-Ching and Fukuda, Kenjiro and Lo, Chun-Ming and Yeh, Yung-Hui and Sekitani, Tsuyoshi and Someya, Takao and Cheng, Kwang-Ting}, - journal={IEEE Transactions on Electron Devices}, - volume={58}, - number={1}, - pages={141--150}, - year={2010}, - publisher={IEEE} -} - - -@article{biggs2021natively, - title={A natively flexible 32-bit Arm microprocessor}, - author={Biggs, John and Myers, James and Kufel, Jedrzej and Ozer, Emre and Craske, Simon and 
Sou, Antony and Ramsdale, Catherine and Williamson, Ken and Price, Richard and White, Scott}, - journal={Nature}, - volume={595}, - number={7868}, - pages={532--536}, - year={2021}, - publisher={Nature Publishing Group UK London} -} - - -@article{farah2005neuroethics, - title={Neuroethics: the practical and the philosophical}, - author={Farah, Martha J}, - journal={Trends in cognitive sciences}, - volume={9}, - number={1}, - pages={34--40}, - year={2005}, - publisher={Elsevier} -} - - -@article{segura2018ethical, - title={Ethical implications of user perceptions of wearable devices}, - author={Segura Anaya, LH and Alsadoon, Abeer and Costadopoulos, Nectar and Prasad, PWC}, - journal={Science and engineering ethics}, - volume={24}, - pages={1--28}, - year={2018}, - publisher={Springer} -} - - -@article{goodyear2017social, - title={Social media, apps and wearable technologies: navigating ethical dilemmas and procedures}, - author={Goodyear, Victoria A}, - journal={Qualitative research in sport, exercise and health}, - volume={9}, - number={3}, - pages={285--302}, - year={2017}, - publisher={Taylor \& Francis} -} - - -@article{roskies2002neuroethics, - title={Neuroethics for the new millenium}, - author={Roskies, Adina}, - journal={Neuron}, - volume={35}, - number={1}, - pages={21--23}, - year={2002}, - publisher={Elsevier} -} - - -@article{duarte2022fastml, - title={FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning}, - author={Duarte, Javier and Tran, Nhan and Hawks, Ben and Herwig, Christian and Muhizi, Jules and Prakash, Shvetank and Reddi, Vijay Janapa}, - journal={arXiv preprint arXiv:2207.07958}, - year={2022} -} - - -@article{verma2019memory, - title={In-memory computing: Advances and prospects}, - author={Verma, Naveen and Jia, Hongyang and Valavi, Hossein and Tang, Yinqi and Ozatay, Murat and Chen, Lung-Yen and Zhang, Bonan and Deaville, Peter}, - journal={IEEE Solid-State Circuits Magazine}, - volume={11}, - number={3}, - pages={43--55}, - year={2019}, - publisher={IEEE} -} - - -@article{chi2016prime, - title={Prime: A novel processing-in-memory architecture for neural network computation in reram-based main memory}, - author={Chi, Ping and Li, Shuangchen and Xu, Cong and Zhang, Tao and Zhao, Jishen and Liu, Yongpan and Wang, Yu and Xie, Yuan}, - journal={ACM SIGARCH Computer Architecture News}, - volume={44}, - number={3}, - pages={27--39}, - year={2016}, - publisher={ACM New York, NY, USA} -} - - -@article{burr2016recent, - title={Recent progress in phase-change memory technology}, - author={Burr, Geoffrey W and Brightsky, Matthew J and Sebastian, Abu and Cheng, Huai-Yu and Wu, Jau-Yi and Kim, Sangbum and Sosa, Norma E and Papandreou, Nikolaos and Lung, Hsiang-Lan and Pozidis, Haralampos and others}, - journal={IEEE Journal on Emerging and Selected Topics in Circuits and Systems}, - volume={6}, - number={2}, - pages={146--162}, - year={2016}, - publisher={IEEE} -} - - -@article{loh20083d, - title={3D-stacked memory architectures for multi-core processors}, - author={Loh, Gabriel H}, - journal={ACM SIGARCH computer architecture news}, - volume={36}, - number={3}, - pages={453--464}, - year={2008}, - publisher={ACM New York, NY, USA} -} - - -@article{mittal2021survey, - title={A survey of SRAM-based in-memory computing techniques and applications}, - author={Mittal, Sparsh and Verma, Gaurav and Kaushik, Brajesh and Khanday, Farooq A}, - journal={Journal of Systems Architecture}, - volume={119}, - pages={102276}, - year={2021}, - publisher={Elsevier} 
-} - - -@article{wong2012metal, - title={Metal--oxide RRAM}, - author={Wong, H-S Philip and Lee, Heng-Yuan and Yu, Shimeng and Chen, Yu-Sheng and Wu, Yi and Chen, Pang-Shiu and Lee, Byoungil and Chen, Frederick T and Tsai, Ming-Jinn}, - journal={Proceedings of the IEEE}, - volume={100}, - number={6}, - pages={1951--1970}, - year={2012}, - publisher={IEEE} -} - - -@inproceedings{imani2016resistive, - title={Resistive configurable associative memory for approximate computing}, - author={Imani, Mohsen and Rahimi, Abbas and Rosing, Tajana S}, - booktitle={2016 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)}, - pages={1327--1332}, - year={2016}, - organization={IEEE} -} - - -@article{miller2000optical, - title={Optical interconnects to silicon}, - author={Miller, David AB}, - journal={IEEE Journal of Selected Topics in Quantum Electronics}, - volume={6}, - number={6}, - pages={1312--1317}, - year={2000}, - publisher={IEEE} -} - - -@article{zhou2022photonic, -title={Photonic matrix multiplication lights up photonic accelerator and beyond}, -author={Zhou, Hailong and Dong, Jianji and Cheng, Junwei and Dong, Wenchan and Huang, Chaoran and Shen, Yichen and Zhang, Qiming and Gu, Min and Qian, Chao and Chen, Hongsheng and others}, -journal={Light: Science \& Applications}, -volume={11}, -number={1}, -pages={30}, -year={2022}, -publisher={Nature Publishing Group UK London} -} - - -@article{bains2020business, - title={The business of building brains}, - author={Bains, Sunny}, - journal={Nat. Electron}, - volume={3}, - number={7}, - pages={348--351}, - year={2020} -} - - -@ARTICLE{Hennessy2019-je, - title = "A new golden age for computer architecture", - author = "Hennessy, John L and Patterson, David A", - abstract = "Innovations like domain-specific hardware, enhanced security, - open instruction sets, and agile chip development will lead the - way.", - journal = "Commun. ACM", - publisher = "Association for Computing Machinery (ACM)", - volume = 62, - number = 2, - pages = "48--60", - month = jan, - year = 2019, - copyright = "http://www.acm.org/publications/policies/copyright\_policy\#Background", - language = "en" -} - - -@ARTICLE{Dongarra2009-na, - title = "The evolution of high performance computing on system z", - author = "Dongarra, Jack J", - journal = "IBM Journal of Research and Development", - volume = 53, - pages = "3--4", - year = 2009 -} - - -@ARTICLE{Ranganathan2011-dc, - title = "From microprocessors to nanostores: Rethinking data-centric - systems", - author = "Ranganathan, Parthasarathy", - journal = "Computer", - publisher = "Institute of Electrical and Electronics Engineers (IEEE)", - volume = 44, - number = 1, - pages = "39--48", - month = jan, - year = 2011 -} - - -@ARTICLE{Ignatov2018-kh, - title = "{AI} Benchmark: Running deep neural networks on Android - smartphones", - author = "Ignatov, Andrey and Timofte, Radu and Chou, William and Wang, Ke - and Wu, Max and Hartley, Tim and Van Gool, Luc", - abstract = "Over the last years, the computational power of mobile devices - such as smartphones and tablets has grown dramatically, reaching - the level of desktop computers available not long ago. While - standard smartphone apps are no longer a problem for them, there - is still a group of tasks that can easily challenge even - high-end devices, namely running artificial intelligence - algorithms.
In this paper, we present a study of the current - state of deep learning in the Android ecosystem and describe - available frameworks, programming models and the limitations of - running AI on smartphones. We give an overview of the hardware - acceleration resources available on four main mobile chipset - platforms: Qualcomm, HiSilicon, MediaTek and Samsung. - Additionally, we present the real-world performance results of - different mobile SoCs collected with AI Benchmark that are - covering all main existing hardware configurations.", - publisher = "arXiv", - year = 2018 -} - - -@ARTICLE{Sze2017-ak, - title = "Efficient processing of deep neural networks: A tutorial and - survey", - author = "Sze, Vivienne and Chen, Yu-Hsin and Yang, Tien-Ju and Emer, - Joel", - abstract = "Deep neural networks (DNNs) are currently widely used for - many artificial intelligence (AI) applications including - computer vision, speech recognition, and robotics. While - DNNs deliver state-of-the-art accuracy on many AI tasks, it - comes at the cost of high computational complexity. - Accordingly, techniques that enable efficient processing of - DNNs to improve energy efficiency and throughput without - sacrificing application accuracy or increasing hardware cost - are critical to the wide deployment of DNNs in AI systems. - This article aims to provide a comprehensive tutorial and - survey about the recent advances towards the goal of - enabling efficient processing of DNNs. Specifically, it will - provide an overview of DNNs, discuss various hardware - platforms and architectures that support DNNs, and highlight - key trends in reducing the computation cost of DNNs either - solely via hardware design changes or via joint hardware - design and DNN algorithm changes. It will also summarize - various development resources that enable researchers and - practitioners to quickly get started in this field, and - highlight important benchmarking metrics and design - considerations that should be used for evaluating the - rapidly growing number of DNN hardware designs, optionally - including algorithmic co-designs, being proposed in academia - and industry. 
The reader will take away the following - concepts from this article: understand the key design - considerations for DNNs; be able to evaluate different DNN - hardware implementations with benchmarks and comparison - metrics; understand the trade-offs between various hardware - architectures and platforms; be able to evaluate the utility - of various DNN design techniques for efficient processing; - and understand recent implementation trends and - opportunities.", - month = mar, - year = 2017, - copyright = "http://arxiv.org/licenses/nonexclusive-distrib/1.0/", - archivePrefix = "arXiv", - primaryClass = "cs.CV", - eprint = "1703.09039" -} - - -@inproceedings{lin2022ondevice, - title = {On-Device Training Under 256KB Memory}, - author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, - booktitle = {ArXiv}, - year = {2022} -} - - -@article{lin2023awq, - title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration}, - author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song}, - journal={arXiv}, - year={2023} -} - - -@inproceedings{wang2020apq, - author={Wang, Tianzhe and Wang, Kuan and Cai, Han and Lin, Ji and Liu, Zhijian and Wang, Hanrui and Lin, Yujun and Han, Song}, - booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, - title={APQ: Joint Search for Network Architecture, Pruning and Quantization Policy}, - year={2020}, - volume={}, - number={}, - pages={2075-2084}, - doi={10.1109/CVPR42600.2020.00215} -} - - -@inproceedings{Li2020Additive, -title={Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks}, -author={Yuhang Li and Xin Dong and Wei Wang}, -booktitle={International Conference on Learning Representations}, -year={2020}, -url={https://openreview.net/forum?id=BkgXT24tDS} -} - - -@article{janapa2023edge, - title={Edge Impulse: An MLOps Platform for Tiny Machine Learning}, - author={Janapa Reddi, Vijay and Elium, Alexander and Hymel, Shawn and Tischler, David and Situnayake, Daniel and Ward, Carl and Moreau, Louis and Plunkett, Jenny and Kelcey, Matthew and Baaijens, Mathijs and others}, - journal={Proceedings of Machine Learning and Systems}, - volume={5}, - year={2023} -} - - -@article{zhuang2020comprehensive, - title={A comprehensive survey on transfer learning}, - author={Zhuang, Fuzhen and Qi, Zhiyuan and Duan, Keyu and Xi, Dongbo and Zhu, Yongchun and Zhu, Hengshu and Xiong, Hui and He, Qing}, - journal={Proceedings of the IEEE}, - volume={109}, - number={1}, - pages={43--76}, - year={2020}, - publisher={IEEE} -} - - -@article{zhuang_comprehensive_2021, - title = {A {Comprehensive} {Survey} on {Transfer} {Learning}}, - volume = {109}, - issn = {0018-9219, 1558-2256}, - url = {https://ieeexplore.ieee.org/document/9134370/}, - doi = {10.1109/JPROC.2020.3004555}, - language = {en}, - number = {1}, - urldate = {2023-10-25}, - journal = {Proceedings of the IEEE}, - author = {Zhuang, Fuzhen and Qi, Zhiyuan and Duan, Keyu and Xi, Dongbo and Zhu, Yongchun and Zhu, Hengshu and Xiong, Hui and He, Qing}, - month = jan, - year = {2021}, - pages = {43--76}, -} - - -@inproceedings{Norman2017TPUv1, -author = {Jouppi, Norman P.
and Young, Cliff and Patil, Nishant and Patterson, David and Agrawal, Gaurav and Bajwa, Raminder and Bates, Sarah and Bhatia, Suresh and Boden, Nan and Borchers, Al and Boyle, Rick and Cantin, Pierre-luc and Chao, Clifford and Clark, Chris and Coriell, Jeremy and Daley, Mike and Dau, Matt and Dean, Jeffrey and Gelb, Ben and Ghaemmaghami, Tara Vazir and Gottipati, Rajendra and Gulland, William and Hagmann, Robert and Ho, C. Richard and Hogberg, Doug and Hu, John and Hundt, Robert and Hurt, Dan and Ibarz, Julian and Jaffey, Aaron and Jaworski, Alek and Kaplan, Alexander and Khaitan, Harshit and Killebrew, Daniel and Koch, Andy and Kumar, Naveen and Lacy, Steve and Laudon, James and Law, James and Le, Diemthu and Leary, Chris and Liu, Zhuyuan and Lucke, Kyle and Lundin, Alan and MacKean, Gordon and Maggiore, Adriana and Mahony, Maire and Miller, Kieran and Nagarajan, Rahul and Narayanaswami, Ravi and Ni, Ray and Nix, Kathy and Norrie, Thomas and Omernick, Mark and Penukonda, Narayana and Phelps, Andy and Ross, Jonathan and Ross, Matt and Salek, Amir and Samadiani, Emad and Severn, Chris and Sizikov, Gregory and Snelham, Matthew and Souter, Jed and Steinberg, Dan and Swing, Andy and Tan, Mercedes and Thorson, Gregory and Tian, Bo and Toma, Horia and Tuttle, Erick and Vasudevan, Vijay and Walter, Richard and Wang, Walter and Wilcox, Eric and Yoon, Doe Hyun}, -title = {In-Datacenter Performance Analysis of a Tensor Processing Unit}, -year = {2017}, -isbn = {9781450348928}, -publisher = {Association for Computing Machinery}, -address = {New York, NY, USA}, -url = {https://doi.org/10.1145/3079856.3080246}, -doi = {10.1145/3079856.3080246}, -abstract = {Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU) --- deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95\% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X -- 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X -- 80X higher. 
Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.}, -booktitle = {Proceedings of the 44th Annual International Symposium on Computer Architecture}, -pages = {1-12}, -numpages = {12}, -keywords = {accelerator, neural network, MLP, TPU, CNN, deep learning, domain-specific architecture, GPU, TensorFlow, DNN, RNN, LSTM}, -location = {Toronto, ON, Canada}, -series = {ISCA '17} -} - - -@ARTICLE{Norrie2021TPUv2_3, - author={Norrie, Thomas and Patil, Nishant and Yoon, Doe Hyun and Kurian, George and Li, Sheng and Laudon, James and Young, Cliff and Jouppi, Norman and Patterson, David}, - journal={IEEE Micro}, - title={The Design Process for Google's Training Chips: TPUv2 and TPUv3}, - year={2021}, - volume={41}, - number={2}, - pages={56-63}, - doi={10.1109/MM.2021.3058217} -} - - -@inproceedings{Jouppi2023TPUv4, -author = {Jouppi, Norm and Kurian, George and Li, Sheng and Ma, Peter and Nagarajan, Rahul and Nai, Lifeng and Patil, Nishant and Subramanian, Suvinay and Swing, Andy and Towles, Brian and Young, Clifford and Zhou, Xiang and Zhou, Zongwei and Patterson, David A}, -title = {TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings}, -year = {2023}, -isbn = {9798400700958}, -publisher = {Association for Computing Machinery}, -address = {New York, NY, USA}, -url = {https://doi.org/10.1145/3579371.3589350}, -doi = {10.1145/3579371.3589350}, -abstract = {In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance; users can pick a twisted 3D torus topology if desired. Much cheaper, lower power, and faster than Infiniband, OCSes and underlying optical components are <5\% of system cost and <3\% of system power. Each TPU v4 includes SparseCores, dataflow processors that accelerate models that rely on embeddings by 5x--7x yet use only 5\% of die area and power. Deployed since 2020, TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. The TPU v4 supercomputer is 4x larger at 4096 chips and thus nearly 10x faster overall, which along with OCS flexibility and availability allows a large language model to train at an average of ~60\% of peak FLOPS/second. For similar sized systems, it is ~4.3x--4.5x faster than the Graphcore IPU Bow and is 1.2x--1.7x faster and uses 1.3x--1.9x less power than the Nvidia A100.
TPU v4s inside the energy-optimized warehouse scale computers of Google Cloud use ~2--6x less energy and produce ~20x less CO2e than contemporary DSAs in typical on-premise data centers.}, -booktitle = {Proceedings of the 50th Annual International Symposium on Computer Architecture}, -articleno = {82}, -numpages = {14}, -keywords = {warehouse scale computer, embeddings, supercomputer, domain specific architecture, reconfigurable, TPU, large language model, power usage effectiveness, CO2 equivalent emissions, energy, optical interconnect, IPU, machine learning, GPU, carbon emissions}, -location = {Orlando, FL, USA}, -series = {ISCA '23} -} - - @misc{zhou2021analognets, title = {AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator}, - author = {Chuteng Zhou and Fernando Garcia Redondo and Julian Büchel and Irem Boybat and Xavier Timoneda Comas and S. R. Nandakumar and Shidhartha Das and Abu Sebastian and Manuel Le Gallo and Paul N. Whatmough}, + author = {Chuteng Zhou and Fernando Garcia Redondo and Julian B\"{u}chel and Irem Boybat and Xavier Timoneda Comas and S. R. Nandakumar and Shidhartha Das and Abu Sebastian and Manuel Le Gallo and Paul N. Whatmough}, year = 2021, eprint = {2111.06503}, archiveprefix = {arXiv}, - primaryclass = {cs.AR} + primaryclass = {cs.AR}, } - -@article{wearableinsulin, - author = {Psoma, Sotiria D. and Kanthou, Chryso}, - title = {Wearable Insulin Biosensors for Diabetes Management: Advances and Challenges}, - journal = {Biosensors}, - volume = {13}, - year = {2023}, - number = {7}, - article-number = {719}, - url = {https://www.mdpi.com/2079-6374/13/7/719}, - pubmedid = {37504117}, - issn = {2079-6374}, - doi = {10.3390/bios13070719} -} - - -@article{glucosemonitor, - author={Li, Jingzhen and Tobore, Igbe and Liu, Yuhang and Kandwal, Abhishek and Wang, Lei and Nie, Zedong}, - journal={IEEE Journal of Biomedical and Health Informatics}, - title={Non-invasive Monitoring of Three Glucose Ranges Based On ECG By Using DBSCAN-CNN}, - year={2021}, - volume={25}, - number={9}, - pages={3340-3350}, - doi={10.1109/JBHI.2021.3072628} -} - - -@article{plasma, - author = {Attia, Zachi and Sugrue, Alan and Asirvatham, Samuel and Ackerman, Michael and Kapa, Suraj and Friedman, Paul and Noseworthy, Peter}, - year = {2018}, - month = {08}, - pages = {e0201059}, - title = {Noninvasive assessment of dofetilide plasma concentration using a deep learning (neural network) analysis of the surface electrocardiogram: A proof of concept study}, - volume = {13}, - journal = {PLOS ONE}, - doi = {10.1371/journal.pone.0201059} -} - - -@article{afib, - author = {Yutao Guo and Hao Wang and Hui Zhang and Tong Liu and Zhaoguang Liang and Yunlong Xia and Li Yan and Yunli Xing and Haili Shi and Shuyan Li and Yanxia Liu and Fan Liu and Mei Feng and Yundai Chen and Gregory Y.H. Lip}, - title = {Mobile Photoplethysmographic Technology to Detect Atrial Fibrillation}, - journal = {Journal of the American College of Cardiology}, - volume = {74}, - number = {19}, - pages = {2365-2375}, - year = {2019}, - doi = {10.1016/j.jacc.2019.08.019} -} - - -@article{gaitathome, - author = {Yingcheng Liu and Guo Zhang and Christopher G. Tarolli and Rumen Hristov and Stella Jensen-Roberts and Emma M. Waddell and Taylor L. Myers and Meghan E. Pawlik and Julia M. Soto and Renee M. Wilson and Yuzhe Yang and Timothy Nordahl and Karlo J. Lizarraga and Jamie L. Adams and Ruth B. Schneider and Karl Kieburtz and Terry Ellis and E.
Ray Dorsey and Dina Katabi }, - title = {Monitoring gait at home with radio waves in Parkinson's disease: A marker of severity, progression, and medication response}, - journal = {Science Translational Medicine}, - volume = {14}, - number = {663}, - pages = {eadc9669}, - year = {2022}, - doi = {10.1126/scitranslmed.adc9669}, - URL = {https://www.science.org/doi/abs/10.1126/scitranslmed.adc9669}, - eprint = {https://www.science.org/doi/pdf/10.1126/scitranslmed.adc9669} +@article{zhou2022photonic, + title = {Photonic matrix multiplication lights up photonic accelerator and beyond}, + author = {Zhou, Hailong and Dong, Jianji and Cheng, Junwei and Dong, Wenchan and Huang, Chaoran and Shen, Yichen and Zhang, Qiming and Gu, Min and Qian, Chao and Chen, Hongsheng and others}, + year = 2022, + journal = {Light: Science \& Applications}, + publisher = {Nature Publishing Group UK London}, + volume = 11, + number = 1, + pages = 30, } - -@article{Chen2023, - author={Chen, Emma and Prakash, Shvetank and Janapa Reddi, Vijay and Kim, David and Rajpurkar, Pranav}, - title={A framework for integrating artificial intelligence for clinical care with continuous therapeutic monitoring}, - journal={Nature Biomedical Engineering}, - year={2023}, - month={Nov}, - day={06}, - issn={2157-846X}, - doi={10.1038/s41551-023-01115-0}, - url={https://doi.org/10.1038/s41551-023-01115-0} +@inproceedings{zhu2018benchmarking, + title = {Benchmarking and analyzing deep neural network training}, + author = {Zhu, Hongyu and Akrout, Mohamed and Zheng, Bojian and Pelegris, Andrew and Jayarajan, Anand and Phanishayee, Amar and Schroeder, Bianca and Pekhimenko, Gennady}, + year = 2018, + booktitle = {2018 IEEE International Symposium on Workload Characterization (IISWC)}, + pages = {88--100}, + organization = {IEEE}, } - -@article{Zhang2017, - author={Zhang, Qingxue and Zhou, Dian and Zeng, Xuan}, - title={Highly wearable cuff-less blood pressure and heart rate monitoring with single-arm electrocardiogram and photoplethysmogram signals}, - journal={BioMedical Engineering OnLine}, - year={2017}, - month={Feb}, - day={06}, - volume={16}, - number={1}, - pages={23}, - issn={1475-925X}, - doi={10.1186/s12938-017-0317-z}, - url={https://doi.org/10.1186/s12938-017-0317-z} +@article{zhuang_comprehensive_2021, + title = {A {Comprehensive} {Survey} on {Transfer} {Learning}}, + author = {Zhuang, Fuzhen and Qi, Zhiyuan and Duan, Keyu and Xi, Dongbo and Zhu, Yongchun and Zhu, Hengshu and Xiong, Hui and He, Qing}, + year = 2021, + month = jan, + journal = {Proceedings of the IEEE}, + volume = 109, + number = 1, + pages = {43--76}, + url = {https://ieeexplore.ieee.org/document/9134370/}, + urldate = {2023-10-25}, + language = {en}, } -@misc{yik2023neurobench, - title={NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking}, - author={Jason Yik and Soikat Hasan Ahmed and Zergham Ahmed and Brian Anderson and Andreas G. Andreou and Chiara Bartolozzi and Arindam Basu and Douwe den Blanken and Petrut Bogdan and Sander Bohte and Younes Bouhadjar and Sonia Buckley and Gert Cauwenberghs and Federico Corradi and Guido de Croon and Andreea Danielescu and Anurag Daram and Mike Davies and Yigit Demirag and Jason Eshraghian and Jeremy Forest and Steve Furber and Michael Furlong and Aditya Gilra and Giacomo Indiveri and Siddharth Joshi and Vedant Karia and Lyes Khacef and James C. 
Knight and Laura Kriener and Rajkumar Kubendran and Dhireesha Kudithipudi and Gregor Lenz and Rajit Manohar and Christian Mayr and Konstantinos Michmizos and Dylan Muir and Emre Neftci and Thomas Nowotny and Fabrizio Ottati and Ayca Ozcelikkale and Noah Pacik-Nelson and Priyadarshini Panda and Sun Pao-Sheng and Melika Payvand and Christian Pehle and Mihai A. Petrovici and Christoph Posch and Alpha Renner and Yulia Sandamirskaya and Clemens JS Schaefer and André van Schaik and Johannes Schemmel and Catherine Schuman and Jae-sun Seo and Sadique Sheik and Sumit Bam Shrestha and Manolis Sifalakis and Amos Sironi and Kenneth Stewart and Terrence C. Stewart and Philipp Stratmann and Guangzhi Tang and Jonathan Timcheck and Marian Verhelst and Craig M. Vineyard and Bernhard Vogginger and Amirreza Yousefzadeh and Biyan Zhou and Fatima Tuz Zohora and Charlotte Frenkel and Vijay Janapa Reddi}, - year={2023}, - eprint={2304.04640}, - archivePrefix={arXiv}, - primaryClass={cs.AI} +@article{zhuang2020comprehensive, + title = {A comprehensive survey on transfer learning}, + author = {Zhuang, Fuzhen and Qi, Zhiyuan and Duan, Keyu and Xi, Dongbo and Zhu, Yongchun and Zhu, Hengshu and Xiong, Hui and He, Qing}, + year = 2020, + journal = {Proceedings of the IEEE}, + publisher = {IEEE}, + volume = 109, + number = 1, + pages = {43--76}, }