
A PyTorch implementation of "Capsule Graph Neural Network" (ICLR 2019). https://github.com/benedekrozemberczki/CapsGNN

Neural Architecture Search with Deep Neural Network and Monte Carlo Tree Search https://github.com/linnanwang/AlphaX-NASBench101

Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search https://github.com/lixincn2015/Partial-Order-Pruning

DeepSwarm: a neural architecture search framework https://github.com/Pattio/DeepSwarm

MegEngine Model Hub: pretrained models built on Megvii Research's leading deep learning algorithms, covering a variety of business scenarios https://github.com/MegEngine/Hub

ptgnn: Microsoft's open-source PyTorch graph neural network library https://github.com/microsoft/ptgnn

A Gluon implementation of Mnasnet https://github.com/chinakook/Mnasnet.MXNet

A brief look at GNNs: capabilities and limitations https://www.aminer.cn/research_report/5ea26452ab6e30e67b2c865f?download=false

A Gluon implementation of MobileNetV3 https://github.com/AmigoCDT/MXNet-MobileNetV3

Simple Self Attention Layer (SimpleSelfAttention) https://github.com/sdoria/SimpleSelfAttention
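
For orientation, here is what a plain scaled dot-product self-attention layer looks like; this is a generic sketch, not the SimpleSelfAttention variant from the repo above:

```python
# A generic scaled dot-product self-attention layer -- a minimal sketch for
# orientation, not the SimpleSelfAttention implementation itself.
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_qkv = nn.Linear(dim, 3 * dim, bias=False)  # joint Q/K/V projection

    def forward(self, x):                                  # x: (batch, seq, dim)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return torch.softmax(scores, dim=-1) @ v           # (batch, seq, dim)

print(SelfAttention(64)(torch.randn(2, 16, 64)).shape)     # torch.Size([2, 16, 64])
```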

A PyTorch implementation of Graph U-Nets https://github.com/HongyangGao/Graph-U-Nets

MobileNetV3 in PyTorch: pretrained models on ImageNet-1K with a detailed training procedure. https://github.com/PengBoXiangShang/MobileNetV3_PyTorch https://github.com/xiaolai-sqlai/mobilenetv3

Caffe Implementation of MobileNets V3 https://github.com/jixing0415/caffe-mobilenet-v3

An Illustrated Guide to Graph Neural Networks https://medium.com/dair-ai/an-illustrated-guide-to-graph-neural-networks-d5564a551783

Mobile Computer Vision @ Facebook - Mobile vision models and code https://github.com/facebookresearch/mobile-vision

nn_builder - Removes the need for boilerplate code when building neural networks https://github.com/p-christ/nn_builder

Implementations of the EfficientNet model (Keras, PyTorch, and the official TensorFlow TPU version). https://arxiv.org/abs/1905.11946 https://github.com/qubvel/efficientnet https://github.com/lukemelas/EfficientNet-PyTorch https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
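
EfficientNet's compound scaling grows depth, width, and input resolution together from a single coefficient φ; a quick sketch using the constants reported in the paper (α=1.2, β=1.1, γ=1.15, chosen so α·β²·γ² ≈ 2):

```python
# Compound scaling from the EfficientNet paper: one coefficient phi scales
# depth, width, and resolution jointly, with the paper's base constants
# alpha=1.2, beta=1.1, gamma=1.15 (alpha * beta^2 * gamma^2 ~= 2, so FLOPs
# grow roughly 2^phi).
alpha, beta, gamma = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for a given phi."""
    return alpha ** phi, beta ** phi, gamma ** phi

for phi in range(4):  # B0 corresponds to phi = 0
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```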

A GitHub list of must-read papers and recent advances in graph neural networks https://weibo.com/ttarticle/p/show?id=2309634376853627462065&u=1923399337&m=4376981174538555&cu=1923399337&ru=2210647444&rm=4376853626016899

This repository contains the PyTorch implementation of the paper Multi-Scale Dense Networks for Resource Efficient Image Classification https://github.com/kalviny/MSDNet-PyTorch

Graph U-Nets https://arxiv.org/abs/1905.05178

This is a pytorch re-implementation of Learning a Discriminative Filter Bank Within a CNN for Fine-Grained Recognition https://github.com/songdejia/DFL-CNN

Scalable asynchronous neural architecture and hyperparameter search for deep neural networks https://github.com/deephyper/deephyper

A GPipe implementation in PyTorch https://github.com/KakaoBrain/torchgpipe

Code for the CVPR 2019 paper Selective Kernel Networks; see Zhihu: https://zhuanlan.zhihu.com/p/59690223 https://github.com/implus/SKNet

MorphNet: towards faster and smaller neural networks https://ai.googleblog.com/2019/04/morphnet-towards-faster-and-smaller.html

A PyTorch implementation of relation extraction/classification with convolutional neural networks using multi-size kernels https://github.com/lemonhu/RE-CNN-pytorch

A PyTorch implementation of the CVPR 2019 paper "Selective Kernel Networks" https://github.com/pppLang/SKNet

Four CNN models for classifying fashion images https://towardsdatascience.com/the-4-convolutional-neural-network-models-that-can-classify-your-fashion-images-9fe7f3e5399d

A curated list of AutoML and lightweight models https://github.com/guan-yuan/awesome-AutoML-and-Lightweight-Models

Searching for optimal neural network architectures with a genetic algorithm https://github.com/Tsdevendra1/NEAT-Algorithm

Multi-label image classification with neural networks (Keras) https://medium.com/m/global-identity?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fmulti-label-image-classification-with-neural-network-keras-ddc1ab1afede

Implementing Transformers from scratch (PyTorch) https://github.com/pbloem/former

A curated list of AutoML literature, tools, and projects https://github.com/windmaple/awesome-AutoML

Implementations and pretrained models for the ShuffleNet series https://github.com/megvii-model/ShuffleNet-Series

HungaBunga: brute-force hyperparameter sweeps to optimize Scikit-Learn models https://github.com/ypeleg/HungaBunga

A PyTorch implementation of WS-DAN (Weakly Supervised Data Augmentation Network) for FGVC (Fine-Grained Visual Classification) https://github.com/GuYuc/WS-DAN.PyTorch

Pretrained Image & Video ConvNets for PyTorch: NASNet, ResNeXt (2D + 3D), ResNet (2D + 3D), InceptionV4, InceptionResnetV2, Xception, DPN, NonLocalNets, R(2+1)D nets, MultiView CNNs, Temporal Relation Networks, etc https://github.com/alexandonian/pretorched-x

NNI (Neural Network Intelligence): Microsoft's open-source AutoML toolkit for neural architecture search and hyperparameter tuning. It searches for the best network architecture and/or hyperparameters with a variety of tuning algorithms, and runs on a single machine, local multi-machine setups, or the cloud. https://github.com/Microsoft/nni/releases

Implementation of the paper: "Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition" (ICCV 2019) https://github.com/HCPLab-SYSU/SSGRL

A large collection of convolutional networks for computer vision, with implementations https://github.com/osmr/imgclsmob

Concise, Modular, Human-friendly PyTorch implementation of EfficientNet with Pre-trained Weights. https://github.com/ansleliu/EfficientNet.PyTorch

(ImageNet pretrained models) The official PyTorch implementation of the TPAMI paper "Res2Net: A New Multi-scale Backbone Architecture" https://github.com/Res2Net/Res2Net-PretrainedModels

A neural network model for logical reasoning, closely related to dual-system theories of cognition: Deep Reasoning Networks: Thinking Fast and Slow https://www.aminer.cn/pub/5d04e900da56295d08dd2ba8/deep-reasoning-networks-thinking-fast-and-slow

Integrating relations into neural networks: An Explicitly Relational Neural Network Architecture https://www.aminer.cn/pub/5cf48a3eda56291d582a0d77/an-explicitly-relational-neural-network-architecture

Efficient Transformers for research, in PyTorch and TensorFlow, using locality-sensitive hashing https://github.com/cerebroai/reformers

Code release for "Adversarial Robustness vs Model Compression, or Both?" https://github.com/yeshaokai/Robustness-Aware-Pruning-ADMM

Visual Attention Consistency Under Image Transforms for Multi-Label Image Classification https://github.com/hguosc/visual_attention_consistency

Geom-GCN: Geometric Graph Convolutional Networks https://github.com/graphdml-uiuc-jlu/geom-gcn

Code for ICLR 2020 paper 'AtomNAS: Fine-Grained End-to-End Neural Architecture Search' https://github.com/meijieru/AtomNAS

Source code accompanying our CVPR 2019 paper: "NetTailor: Tuning the architecture, not just the weights." https://github.com/pedro-morgado/nettailor

Official pyTorch implementation of "Dynamic-Net: Tuning the Objective Without Re-training for Synthesis Tasks" experiments https://github.com/AlonShoshan10/dynamic_net

R2D2: Reliable and Repeatable Detector and Descriptor https://github.com/naver/r2d2

Multi-level Wavelet-CNN for Image Restoration https://github.com/lpj0/MWCNN

Improved Wave-U-Net implemented in Pytorch https://github.com/f90/Wave-U-Net-Pytorch

This is a tensorflow implementation of high-resolution representations for ImageNet classification. https://github.com/yuanyuanli85/tf-hrnet

Codebase for Image Classification Research, written in PyTorch. https://github.com/facebookresearch/pycls

Official code for using / reproducing CDEP from the paper "Interpretations are useful: penalizing explanations to align neural networks with prior knowledge". https://github.com/laura-rieger/deep-explanation-penalization

partial residual networks https://github.com/WongKinYiu/PartialResidualNetworks

SPOS(Single Path One-Shot Neural Architecture Search with Uniform Sampling) rebuilt in Pytorch with single GPU. https://github.com/ShunLu91/Single-Path-One-Shot-NAS

This repository contains FCOS (ICCV'19) with VoVNet (CVPRW'19) efficient backbone networks; the code is based on a PyTorch implementation of FCOS https://github.com/vov-net/VoVNet-FCOS

NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning https://github.com/alexanderrichard/NeuralNetwork-Viterbi

Graph Neural Networks for Multi-Label Classification https://github.com/QData/LaMP

CORnet: Modeling the Neural Mechanisms of Core Object Recognition https://github.com/dicarlolab/CORnet

PyTorch code for our BMVC 2019 paper "Image Classification with Hierarchical Multigraph Networks" https://github.com/bknyaz/bmvc_2019

Efficient Graph Generation with Graph Recurrent Attention Networks, Deep Generative Model of Graphs, Graph Neural Networks, NeurIPS 2019 https://github.com/lrjconan/GRAN

Bridging the gap Between Stability and Scalability in Neural Architecture Search https://github.com/xiaomi-automl/SCARLET-NAS

A Pytorch implementation for the paper Local Relational Networks for Image Recognition https://github.com/gan3sh500/local-relational-nets

Knowledge-Aware Graph Networks for Commonsense Reasoning (EMNLP-IJCNLP 19) https://github.com/INK-USC/KagNet

PyTorch implementation of "Searching for A Robust Neural Architecture in Four GPU Hours", CVPR 2019 https://github.com/D-X-Y/GDAS

Companion code for the book 《深入浅出图神经网络:GNN原理解析》 (a plain-language introduction to GNN principles) https://github.com/FighterLYL/GraphNeuralNetwork

Semi-supervised classification with a Graph Convolutional Network implemented in Keras https://github.com/zhouchunpong/GCN_Keras

Optuna v1.0 released: "Optuna: An open source hyperparameter optimization framework to automate hyperparameter search" https://github.com/optuna/optuna New features: efficient hyperparameter optimization with state-of-the-art algorithms; support for many machine learning libraries, including PyTorch, TensorFlow, Keras, FastAI, Scikit-Learn, LightGBM, and XGBoost; parallel execution across multiple machines to reduce optimization time; search spaces described with Python control statements; multiple integrated visualizations for analyzing optimization results
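
A minimal toy example of the Optuna API sketched above; the objective is a stand-in for a real validation metric, and the define-by-run search space uses ordinary Python control flow:

```python
# A toy Optuna study: the search space is defined with ordinary Python
# control flow inside the objective, as the feature list above notes.
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    two_layers = trial.suggest_categorical("two_layers", [True, False])
    hidden = trial.suggest_int("hidden", 16, 256) if two_layers else 0
    return (lr - 1e-2) ** 2 + hidden * 1e-5   # pretend this is validation loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```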

《FrequentNet : A New Deep Learning Baseline for Image Classification》 https://arxiv.org/abs/2001.01034

DeepDanbooru: multi-label classification of anime-style girl images https://github.com/KichangKim/DeepDanbooru

Keras Tuner - hyperparameter tuning for humans https://github.com/keras-team/keras-tuner

BANANAS: a new neural architecture search (NAS) method https://github.com/naszilla/bananas

'(Generic) EfficientNets for PyTorch - Pretrained EfficientNet, MixNet, MobileNetV3, MNASNet A1 and B1, FBNet, Single-Path NAS' https://github.com/rwightman/gen-efficientnet-pytorch

CBNet: a novel composite backbone network architecture for object detection (review) https://medium.com/swlh/cbnet-a-novel-composite-backbone-network-architecture-for-object-detection-review-88b79a838ef1

A 21-second tour of the models that topped ImageNet: 60+ architectures on one stage https://mp.weixin.qq.com/s/8isaB1F_66Ykx3eojYMmRw

ICCV 2019 Tutorial: Everything You Need to Know to Reproduce SOTA Deep Learning Models https://github.com/zhreshold/ICCV19-GluonCV

Interpretable Convolutional Neural Networks https://github.com/zqs1022/interpretableCNN

Keras implementation of the graph attention networks (GAT) by Veličković et al. (2017; https://arxiv.org/abs/1710.10903) https://github.com/danielegrattarola/keras-gat
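
The core of GAT is a learned, softmax-normalized attention coefficient over each node's neighborhood; the equations from the paper, for reference:

```latex
% GAT attention from Velickovic et al. (2017): score each neighbor,
% softmax-normalize over the neighborhood, then aggregate.
e_{ij} = \mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\,[\mathbf{W}h_i \,\Vert\, \mathbf{W}h_j]\right), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}, \qquad
h_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\,\mathbf{W}h_j\Big)
```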

Advbox: a toolbox to generate adversarial examples that fool neural networks https://github.com/advboxes/AdvBox

This repository contains a MXNet implementation of the paper Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution. https://github.com/terrychenism/OctaveConv

NSGA-Net, a Neural Architecture Search Algorithm https://github.com/ianwhale/nsga-net

Official Implementation of "DeepCaps: Going Deeper with Capsule Networks" paper (CVPR 2019). https://github.com/brjathu/deepcaps

Channel Pruning via Automatic Structure Search https://github.com/lmbxmu/ABCPruner

ANRL: Attributed Network Representation Learning via Deep Neural Networks https://github.com/cszhangzhen/ANRL

Official repo for 'ELF: Embedded Localisation of Features in pre-trained CNN' (ICCV19) https://github.com/abenbihi/elf

MobileNetV3 https://github.com/kuan-wang/pytorch-mobilenet-v3 https://github.com/Randl/MobileNetV3-pytorch https://github.com/leaderj1001/MobileNetV3-Pytorch
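
One of MobileNetV3's small but effective ingredients is the hard-swish activation, h-swish(x) = x · ReLU6(x + 3) / 6, a cheap piecewise approximation of swish:

```python
# MobileNetV3's hard-swish activation: h-swish(x) = x * ReLU6(x + 3) / 6.
import torch
import torch.nn.functional as F

def hard_swish(x):
    return x * F.relu6(x + 3.0) / 6.0

print(hard_swish(torch.linspace(-4.0, 4.0, 9)))
```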

Making Convolutional Networks Shift-Invariant Again https://github.com/adobe/antialiased-cnns
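
The paper's core fix is to low-pass filter before subsampling; a minimal BlurPool-style sketch (fixed 3×3 binomial kernel applied depthwise at stride 2, simplified relative to the repo's implementation):

```python
# Anti-aliased downsampling: blur with a fixed 3x3 binomial kernel, then
# subsample -- a minimal sketch of the BlurPool idea, not the repo's code.
import torch
import torch.nn.functional as F

def blur_pool(x, stride=2):                    # x: (batch, channels, H, W)
    k = torch.tensor([1.0, 2.0, 1.0])
    k = (k[:, None] * k[None, :]) / 16.0       # 3x3 binomial low-pass filter
    k = k.expand(x.size(1), 1, 3, 3)           # one copy per channel (depthwise)
    return F.conv2d(x, k, stride=stride, padding=1, groups=x.size(1))

print(blur_pool(torch.randn(1, 8, 32, 32)).shape)  # torch.Size([1, 8, 16, 16])
```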

Attention in Graph Neural Networks using PyTorch https://arxiv.org/abs/1905.02850 https://github.com/bknyaz/graph_attention_pool

A PyTorch implementation of MixNet architecture: MixNet: Mixed Depthwise Convolutional Kernels. https://github.com/romulus0914/MixNet-Pytorch

Source code of the paper Revisiting Spatial-Temporal Similarity: A Deep Learning Framework for Traffic Prediction https://github.com/tangxianfeng/STDN

Implementation for: Graph-Based Global Reasoning Networks (CVPR19) https://github.com/facebookresearch/GloRe

TensorFlow Implementation of TCN (Temporal Convolutional Networks) https://github.com/Songweiping/TCN-TF

PyTorch implementation of binary neural networks https://github.com/wonnado/binary-nets

Implementation for the paper "DeepGBM: A Deep Learning Framework Distilled by GBDT for Online Prediction Tasks", which has been accepted by KDD'2019. https://github.com/motefly/DeepGBM

A roadmap (reading list) for deep learning on graphs https://github.com/guillaumejaume/graph-neural-networks-roadmap

GNN work is pouring out; to keep the paper list organized, it has been categorized by topic https://github.com/thunlp/GNNPapers

"On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data". https://github.com/lunanbit/UUlearning

Pytorch code for Unsupervised Embedding Learning via Invariant and Spreading Instance Feature in CVPR 2019. https://github.com/mangye16/Unsupervised_Embedding_Learning

EfficientNets snapshot https://github.com/mingxingtan/efficientnet

Octave Convolution Implementation in PyTorch https://github.com/braincreators/octconv

Pytorch implementation of Graph U-Nets (ICML19) https://arxiv.org/abs/1905.05178 https://github.com/HongyangGao/gunet

A list of knowledge distillation papers https://github.com/lhyfst/knowledge-distillation-papers

Graph Neural Networks: A Review of Methods and Applications https://arxiv.org/abs/1812.08434 https://github.com/thunlp/GNNPapers

A PyTorch implementation of "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" (KDD 2019). https://github.com/benedekrozemberczki/ClusterGCN

High-Resolution Network (HRNet): a universal neural architecture for visual recognition https://www.microsoft.com/en-us/research/blog/high-resolution-network-a-universal-neural-architecture-for-visual-recognition/

nlpgnn: a graph neural network toolbox for natural language processing https://github.com/kyzhouhzau/NLPGNN

DeepSNAP: a helper library for graph deep learning that bridges graph processing libraries and deep learning frameworks https://github.com/snap-stanford/deepsnap

GNN_Review - a GNN survey reading report covering many GNN papers https://github.com/LYuhang/GNN_Review

OpenNE-PyTorch: an open-source network embedding toolkit https://mp.weixin.qq.com/s/61FTg6WZOmyPgRNVjLYfxg https://github.com/thunlp/OpenNE/tree/pytorch

AAAI 2020. Spatial-Temporal Synchronous Graph Convolutional Networks: A New Framework for Spatial-Temporal Network Data Forecasting https://github.com/Davidham3/STSGCN

"Improved Residual Networks for Image and Video Recognition" https://github.com/iduta/iresnet

Associating Multi-Scale Receptive Fields for Fine-grained Recognition https://github.com/FouriYe/CNL-ICIP2020

Official repository for the "Big Transfer (BiT): General Visual Representation Learning" paper. https://github.com/google-research/big_transfer

Reference implementation for Blueprint Separable Convolutions (CVPR 2020) https://github.com/zeiss-microscopy/BSConv

Neural Architecture Transfer (Arxiv'20), PyTorch Implementation https://github.com/human-analysis/neural-architecture-transfer

The code for Robust Line Segments Matching via Graph Convolution Networks https://github.com/mameng1/GraphLineMatching

Code for paper "Learning Semantically Enhanced Feature for Fine-grained Image Classification" https://github.com/cswluo/SEF

Dynamic Group Convolution for Accelerating Convolutional Neural Networks (ECCV 2020) https://github.com/zhuogege1943/dgc

SCAN: Learning to Classify Images without Labels (ECCV 2020) https://github.com/wvangansbeke/Unsupervised-Classification

[ECCV 2020] NAS-DIP: Learning Deep Image Prior with Neural Architecture Search https://github.com/YunChunChen/NAS-DIP-pytorch

code for paper "Graph Structure of Neural Networks" https://github.com/facebookresearch/graph2nn

GNN-algorithms - detailed explanations and implementations of graph neural network algorithms https://github.com/wangyouze/GNN-algorithms

In-layer normalization techniques for training very deep neural networks https://theaisummer.com/normalization/

Performer: a Transformer architecture whose attention scales linearly https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html

GraphGym: a platform for designing and evaluating graph neural networks (GNNs) https://github.com/snap-stanford/GraphGym

The EfficientDet Architecture in PyTorch https://amaarora.github.io/2021/01/13/efficientdet-pytorch.html

MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks https://github.com/szq0214/MEAL-V2

Making Sense of CNNs: Interpreting Deep Representations & Their Invariances https://github.com/CompVis/invariances

Asymmetric Loss For Multi-Label Classification https://github.com/Alibaba-MIIL/ASL

Pytorch code for view-GCN [CVPR2020]. https://github.com/weixmath/view-GCN

Implementation of LambdaNetworks, a new approach to image recognition that reaches SOTA with less compute https://github.com/lucidrains/lambda-networks

Multi-level Wavelet Convolutional Neural Networks https://github.com/lpj-github-io/MWCNNv2

Transferable Recognition-Aware Image Processing https://github.com/liuzhuang13/Transferable_RA

This is a partial implementation of Generative Teaching Networks https://github.com/GoodAI/GTN

A general framework for inferring CNNs efficiently. Reduce the inference latency of MobileNet-V3 by 1.3x on an iPhone XS Max without sacrificing accuracy. https://github.com/blackfeather-wang/GFNet-Pytorch

CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances (NeurIPS 2020) https://github.com/alinlab/CSI

Codebase for Learning Invariances in Neural Networks https://github.com/g-benton/learning-invariances

This repo contains the code for the paper Rethinking Bottleneck Structure for Efficient Mobile Network Design (ECCV 2020) https://github.com/yitu-opensource/MobileNeXt

MONeT framework for reducing memory consumption of DNN training https://github.com/utsaslab/MONeT

Primal-Dual Mesh Convolutional Neural Networks https://github.com/MIT-SPARK/PD-MeshNet

Original PyTorch implementation of "Gradient Boosting Neural Networks: GrowNet" https://github.com/sbadirli/GrowNet

FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems https://github.com/amazon-research/FeatGraph

TensorFlow implementation of "ResNeSt: Split-Attention Networks" https://github.com/YeongHyeon/ResNeSt-TF2

This repository provides the code for training with Correctness Ranking Loss presented in the paper "Confidence-Aware Learning for Deep Neural Networks" accepted to ICML2020. https://github.com/daintlab/confidence-aware-learning

[Paper] SplitNet: Divide and Co-training. Also an image classification toolbox that includes ResNet, Wide-ResNet, ResNeXt, ResNeSt, ResNeXSt, SENet, Shake-Shake, DenseNet, PyramidNet, and EfficientNet. https://github.com/mzhaoshuai/SplitNet-Divide-and-Co-training

This is the code for the ICML'20 paper "Rethinking Bias-Variance Trade-off for Generalization of Neural Networks". https://github.com/yaodongyu/Rethink-BiasVariance-Tradeoff

Official code release for paper "Improving Confidence Estimates for Unfamiliar Examples" https://github.com/lizhitwo/ConfidenceEstimates

This repository contains PyTorch evaluation code, training code, and pretrained models for DeiT (Data-Efficient Image Transformers). https://github.com/facebookresearch/deit

Learning from Failure: Training Debiased Classifier from Biased Classifier (NeurIPS 2020) https://github.com/alinlab/LfF

Codes and datasets for AAAI-2021 paper "Learning to Pre-train Graph Neural Networks" https://github.com/rootlu/L2P-GNN

Simple NumPy implementation of the FAVOR+ attention mechanism introduced in Rethinking Attention with Performers https://github.com/teddykoker/performer
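
A rough NumPy sketch of the FAVOR+ idea: positive random features approximate the softmax kernel so attention cost becomes linear in sequence length. This simplification omits the orthogonal random features and numerical-stability tricks of real implementations:

```python
# FAVOR+ sketch: phi(x) = exp(omega^T x - |x|^2 / 2) / sqrt(m) gives an
# unbiased estimate of the softmax kernel, so attention factorizes as
# phi(Q) (phi(K)^T V) -- O(n m d) instead of O(n^2 d).
import numpy as np

def favor_attention(Q, K, V, m=64):
    d = Q.shape[-1]
    Q, K = Q / d ** 0.25, K / d ** 0.25        # fold in the 1/sqrt(d) scaling
    omega = np.random.randn(d, m)              # random projection directions
    phi = lambda X: np.exp(X @ omega - (X ** 2).sum(-1, keepdims=True) / 2) / np.sqrt(m)
    Qp, Kp = phi(Q), phi(K)                    # (n, m) positive features
    numerator = Qp @ (Kp.T @ V)                # linear-time numerator
    denominator = Qp @ Kp.sum(axis=0)[:, None] # row-wise softmax normalizer
    return numerator / denominator

n, d = 128, 32
Q, K, V = np.random.randn(3, n, d)
print(favor_attention(Q, K, V).shape)          # (128, 32)
```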

PyTorch and Torch implementation for our accepted CVPR 2020 paper (Oral): Controllable Orthogonalization in Training DNNs https://github.com/huangleiBuaa/ONI

This repo contains the code for the paper Rethinking Bottleneck Structure for Efficient Mobile Network Design https://github.com/zhoudaquan/rethinking_bottleneck_design

PyTorch implementation of Lambda Network and pretrained Lambda-ResNet https://github.com/d-li14/lambda.pytorch

AAAI'21: Data Augmentation for Graph Neural Networks https://github.com/zhao-tong/GAug

[ICLR 2021] "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective" https://github.com/VITA-Group/TENAS

Implementation of Hang et al. 2020 "Hyperspectral Image Classification with Attention Aided CNNs" for tree species prediction https://github.com/weecology/DeepTreeAttention

Semantic Segmentation PyTorch code for our paper: Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition https://github.com/iduta/pyconvsegnet

Code for the paper "Contrastive Clustering" https://github.com/Yunfan-Li/Contrastive-Clustering

This is an official implementation of our CVPR 2020 paper "Non-Local Neural Networks With Grouped Bilinear Attentional Transforms". https://github.com/BA-Transform/BAT-Image-Classification

Implementation of Spectral Leakage and Rethinking the Kernel Size in CNNs in Pytorch https://github.com/EvgenyKashin/non-leaking-conv

FcaNet: Frequency Channel Attention Networks https://github.com/cfzd/FcaNet

Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks https://github.com/wofmanaf/SA-Net

NFNets and Adaptive Gradient Clipping for SGD implemented in PyTorch https://github.com/vballoli/nfnets-pytorch

This repository provides a minimal implementation of adaptive gradient clipping (AGC), as proposed in High-Performance Large-Scale Image Recognition Without Normalization, in TensorFlow 2. https://github.com/sayakpaul/Adaptive-Gradient-Clipping
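
The AGC rule itself is compact: rescale a gradient whenever its norm exceeds a fixed fraction λ of the corresponding parameter's norm. A minimal PyTorch sketch using whole-tensor norms (the paper clips unit-wise, per output row):

```python
# Adaptive gradient clipping (AGC), simplified: clip grad g of parameter w
# when ||g|| / ||w|| exceeds `clip`, scaling g back onto that boundary.
import torch

@torch.no_grad()
def adaptive_grad_clip_(parameters, clip=0.01, eps=1e-3):
    for p in parameters:
        if p.grad is None:
            continue
        max_norm = clip * p.norm().clamp(min=eps)  # largest allowed grad norm
        g_norm = p.grad.norm()
        if g_norm > max_norm:
            p.grad.mul_(max_norm / (g_norm + 1e-6))

# usage: loss.backward(); adaptive_grad_clip_(model.parameters()); optimizer.step()
```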

A guide to getting started with graph machine learning https://gordicaleksa.medium.com/how-to-get-started-with-graph-machine-learning-afa53f6f963a

NitroML: a modular, portable, and scalable model-quality benchmarking framework for machine learning and AutoML https://github.com/google/nitroml

Model Search: an AutoML algorithm framework for large-scale model architecture search https://github.com/google/model_search

Pre-trained NFNets with 99% of the accuracy of the official paper "High-Performance Large-Scale Image Recognition Without Normalization". https://github.com/benjs/nfnets_pytorch

SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition (interpretability) https://github.com/wbw520/scouter

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training https://arxiv.org/abs/2102.08098 https://github.com/zhuchen03/gradinit

Theoretical foundations of graph neural networks https://www.bilibili.com/video/bv1iy4y1h7pT https://www.youtube.com/watch?v=uF53xsT7mjc https://petar-v.com/talks/GNN-Wednesday.pdf

GNN_note - organized notes and papers on graph neural networks https://github.com/joeat1/GNN_note

A2S2K-ResNet: Attention-Based Adaptive Spectral-Spatial Kernel ResNet for Hyperspectral Image Classification https://github.com/suvojit-0x55aa/A2S2K-ResNet

Relative Neural Architecture Search via Slow-Fast Learning https://github.com/EMI-Group/RelativeNAS

GraphGallery: a gallery for benchmarking Graph Neural Networks (GNNs) with TensorFlow 2.x and PyTorch backends https://github.com/EdisonLeeeee/GraphGallery

A list of papers on neural architecture search (NAS) https://github.com/jackguagua/awesome-nas-papers

Transformer in Transformer https://www.arxiv-vanity.com/papers/2103.00112/

Model Complexity of Deep Learning: A Survey https://www.arxiv-vanity.com/papers/2103.05127

TransFG: A Transformer Architecture for Fine-grained Recognition https://www.arxiv-vanity.com/papers/2103.07976

《HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark》(ICLR 2021) github.com/RICE-EIC/HW-NAS-Bench

EfficientNetV2: Smaller Models and Faster Training https://www.arxiv-vanity.com/papers/2104.00298

Every deployment framework has different characteristics, and efficient algorithm design must take the target platform's traits into account to achieve the best performance. Through neural network behavior analysis, researchers at Microsoft Research Asia identified seven strategies for improving neural network design. https://weibo.com/ttarticle/p/show?id=2309404623786484826193

EfficientNetV2: Smaller Models and Faster Training github.com/d-li14/efficientnetv2.pytorch

Truly shift-invariant convolutional neural networks github.com/achaman2/truly_shift_invariant_cnns

《GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network》(NeurIPS 2020) github.com/PruneTruong/GOCor

《Lite-HRNet: A Lightweight High-Resolution Network》(CVPR 2021) github.com/HRNet/Lite-HRNet

EfficientNetV2: Google's official EfficientNetV2 release with pretrained models github.com/google/automl/tree/master/efficientnetv2

A hands-on graph neural network (GNN) tutorial https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial7/GNN_overview.html

Graph Neural Networks: A Review of Methods and Applications https://www.sciencedirect.com/science/article/pii/S2666651021000012

《MLP-Mixer: An all-MLP Architecture for Vision》(2021) github.com/lucidrains/mlp-mixer-pytorch github.com/rishikksh20/MLP-Mixer-pytorch github.com/isaaccorley/mlp-mixer-pytorch github.com/lucidrains/mlp-gpt-jax github.com/sayakpaul/MLP-Mixer-CIFAR10
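
The Mixer block alternates a token-mixing MLP (across patches) with a channel-mixing MLP (per patch), each behind LayerNorm and a residual connection; a bare-bones sketch with simplified expansion ratios:

```python
# A bare-bones Mixer block: transpose to mix across the patch axis, then
# mix across channels -- a sketch, not any of the linked implementations.
import torch
import torch.nn as nn

def mlp(dim, hidden):
    return nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

class MixerBlock(nn.Module):
    def __init__(self, num_patches, channels):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(channels), nn.LayerNorm(channels)
        self.token_mlp = mlp(num_patches, num_patches * 4)   # mixes across tokens
        self.channel_mlp = mlp(channels, channels * 4)       # mixes across channels

    def forward(self, x):                        # x: (batch, patches, channels)
        y = self.norm1(x).transpose(1, 2)        # (batch, channels, patches)
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))

print(MixerBlock(196, 512)(torch.randn(2, 196, 512)).shape)
```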

《Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet》 github.com/lukemelas/do-you-even-need-attention

《ResMLP: Feedforward networks for image classification with data-efficient training》(2021) github.com/lucidrains/res-mlp-pytorch

《Self-Supervised Learning with Swin Transformers》(2021) github.com/SwinTransformer/Transformer-SSL

《Grammatically Recognizing Images with Tree Convolution》(KDD 2020) github.com/wanggrun/TreeConv

《Involution: Inverting the Inherence of Convolution for Visual Recognition》(CVPR 2021) github.com/rish-16/involution_pytorch

《Are Deep Neural Architectures Losing Information? Invertibility Is Indispensable》(2020) github.com/Lillian1082/IRAE_pytorch

《FNet: Mixing Tokens with Fourier Transforms》(2021) github.com/rishikksh20/FNet-pytorch
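
FNet's mixing sublayer is strikingly simple: replace self-attention with the real part of a 2D Fourier transform over the sequence and hidden dimensions. A one-function sketch of just that mixing step:

```python
# The FNet token-mixing step: a 2D DFT over sequence and hidden dims,
# keeping the real part (the rest of the block is an ordinary FFN).
import torch

def fourier_mix(x):  # x: (batch, seq_len, hidden)
    return torch.fft.fft2(x, dim=(-2, -1)).real

print(fourier_mix(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```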

《You Only Learn One Representation: Unified Network for Multiple Tasks》(2021) github.com/WongKinYiu/yolor

《Pay Attention to MLPs》(2021) github.com/lucidrains/g-mlp-pytorch

《Leveraging Sparse Linear Layers for Debuggable Deep Networks》(2021) github.com/MadryLab/DebuggableDeepNetworks

《Learning to combine top-down and bottom-up signals in recurrent neural networks with attention over modules》(2020) github.com/sarthmit/BRIMs

《Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks》(ICLR 2021) github.com/bboylyg/NAD

《Personalized Transformer for Explainable Recommendation》(ACL 2021) github.com/lileipisces/PETER

《Structure-Based Function Prediction using Graph Convolutional Networks》(2019) github.com/flatironinstitute/DeepFRI

《Editable Neural Networks》(ICLR 2020) github.com/xtinkt/editable

《Orthogonalizing Convolutional Layers with the Cayley Transform》(ICLR 2021) github.com/locuslab/orthogonal-convolutions

《Unifying Graph Convolutional Neural Networks and Label Propagation》(2020) github.com/hwwang55/GCN-LPA

《Convolutional Normalization: Improving Robustness and Training for Deep Neural Networks》(2021) github.com/shengliu66/ConvNorm

《Involution: Inverting the Inherence of Convolution for Visual Recognition》(CVPR 2021) github.com/ChristophReich1996/Involution

《O2U-Net: A Simple Noisy Label Detection Approach for Deep Neural Networks》(2019) github.com/hjimce/O2U-Net

《Pay Attention to MLPs》(2021) github.com/jaketae/g-mlp

《The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models》(CVPR 2021) github.com/VITA-Group/CV_LTH_Pre-training

An overview of graph neural networks (Colab) github.com/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/tutorial7/GNN_overview.ipynb github.com/phlippe/uvadlc_notebooks/tree/master/docs/tutorial_notebooks

《EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network》 #new Transformer-style architecture github.com/murufeng/EPSANet

STFT_Transformer - code for the STFT Transformer used in the BirdCLEF 2021 competition (Kaggle). github.com/jfpuget/STFT_Transformer

《GraphiT: Encoding Graph Structure in Transformers》(2021) github.com/inria-thoth/GraphiT

《Convolutional Dynamic Alignment Networks for Interpretable Classifications》(CVPR 2021) github.com/moboehle/CoDA-Nets

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition github.com/Andrew-Qibin/VisionPermutator

Rethinking Token-Mixing MLP for MLP-based Vision Backbone https://www.arxiv-vanity.com/papers/2106.14882

Global Filter Networks for Image Classification github.com/raoyongming/GFNet

A PyTorch implementation and walkthrough of the Graph Attention Network (GAT) https://nn.labml.ai/graphs/gat/index.html

《Rethinking Differentiable Search for Mixed-Precision Neural Networks》(CVPR 2020) github.com/zhaoweicai/EdMIPS

Query2Label: A Simple Transformer Way to Multi-Label Classification github.com/SlongLiu/query2labels

A gentle introduction to deep learning on graphs ericmjl.github.io/essays-on-data-science/machine-learning/graph-nets/

MicroNet: Improving Image Recognition with Extremely Low FLOPs github.com/liyunsheng13/micronet

Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer https://arxiv.org/abs/2108.09193

Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study github.com/VITA-Group/Deep_GCN_Benchmarking

An interactive introduction to graph neural networks #TODO https://distill.pub/2021/gnn-intro/

Understanding convolutions on graphs https://distill.pub/2021/understanding-gnns/
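
Both articles build up to the basic GCN propagation rule H' = σ(D̂^(-1/2)(A+I)D̂^(-1/2) H W); in plain NumPy:

```python
# One GCN layer: add self-loops, symmetrically normalize by degree,
# aggregate neighbor features, then apply the weight matrix and ReLU.
import numpy as np

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])                   # adjacency with self-loops
    D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)  # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
H = np.random.randn(3, 4)   # node features
W = np.random.randn(4, 2)   # layer weights
print(gcn_layer(A, H, W).shape)  # (3, 2)
```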

Sparse-MLP: A Fully-MLP Architecture with Conditional Computation https://arxiv.org/abs/2109.02008

Block Pruning For Faster Transformers https://arxiv.org/abs/2109.04838

Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? https://arxiv.org/abs/2109.05422

Why Transformers are so popular in computer vision #Transformer https://weibo.com/ttarticle/p/show?id=2309404684991857820078

A free lunch from ViT: Adaptive Attention Multi-scale Fusion Transformer for Fine-grained Visual Recognition https://arxiv.org/abs/2110.01240

Token Pooling in Vision Transformers https://arxiv.org/abs/2110.03860

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning github.com/ChongjianGE/CARE

《Object-Region Video Transformers》 github.com/roeiherz/ORViT

Jittor-MLP - Unofficial Implementation of MLP-Mixer, gMLP, resMLP, Vision Permutator, S2MLPv2, ConvMLP, ConvMixer in Jittor github.com/liuruiyang98/Jittor-MLP

《HRFormer: High-Resolution Transformer for Dense Prediction》 github.com/HRNet/HRFormer

configaformers: a highly configurable Transformer library that simplifies model architecture search and experimentation github.com/antofuller/configaformers

Swin Transformer V2: Scaling Up Capacity and Resolution github.com/microsoft/Swin-Transformer

MetaFormer is Actually What You Need for Vision github.com/sail-sg/poolformer

Understanding Transformers from scratch #COM (explained very clearly) https://e2eml.school/transformers.html

An overview of lightweight networks: awesome_lightweight_networks - MobileNetV1-V2, MobileNeXt, GhostNet, AdderNet, ShuffleNetV1-V2, Mobile+ViT, etc. github.com/murufeng/awesome_lightweight_networks

A graph neural network tutorial for math novices https://rish16.notion.site/Graph-Neural-Networks-for-Novice-Math-Fanatics-c51b922a595b4efd8647788475461d57

Neural Net Editor: a neural network editor, a flowchart editing tool built on imnodes github.com/scarsty/node-editor

awesome-vit: a curated list and summaries of Vision Transformer papers github.com/open-mmlab/awesome-vit

A collection of tools to design or visualize neural network architectures github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network

NN SVG: generates visualizations of deep-learning network architecture diagrams https://alexlenail.me/NN-SVG/index.html

Reasoning about interacting systems with graph neural networks medium.com/stanford-cs224w/how-to-analyze-interacting-systems-using-graph-neural-networks-940da9f9c013

Neural architecture search (NAS): fundamentals and main approaches https://theaisummer.com/neural-architecture-search/

VFormer: a modular Vision Transformer library for PyTorch github.com/SforAiDl/vformer

Transformer Recipe: a compilation of Transformer learning resources and reference implementations github.com/dair-ai/Transformers-Recipe/blob/main/README.md

GNNs Recipe: a collection of graph neural network resources github.com/dair-ai/GNNs-Recipe

Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations https://arxiv.org/abs/2202.07800

Learning to Merge Tokens in Vision Transformers https://arxiv.org/abs/2202.12015

attentions: several attention mechanisms implemented in PyTorch github.com/sooftware/attentions

Graph neural networks through the lens of differential geometry and algebraic topology towardsdatascience.com/graph-neural-networks-through-the-lens-of-differential-geometry-and-algebraic-topology-3a7c3c22d5f

EdgeFormer: Improving Light-weight ConvNets by Learning from Vision Transformers https://arxiv.org/abs/2203.03952

The math behind graph neural networks rish-16.github.io/posts/gnn-math/

Vision Transformer Cookbook with Tensorflow: a handbook of TensorFlow implementations of Vision Transformers github.com/taki0112/vit-tensorflow

Understanding The Robustness in Vision Transformers https://arxiv.org/abs/2204.12451

gtrick: a bag of tricks for graph neural networks github.com/sangyx/gtrick

Inception Transformer https://arxiv.org/abs/2205.12956

MinT: a minimal Transformer library implemented from scratch github.com/dpressel/mint

EfficientFormer: Vision Transformers at MobileNet Speed https://arxiv.org/abs/2206.01191

Can CNNs Be More Robust Than Transformers? https://arxiv.org/abs/2206.03452

Peripheral Vision Transformer https://arxiv.org/abs/2206.06801

External-Attention-pytorch - PyTorch implementations of various attention mechanisms, by xmu-xiaoma66 GitHub: https://github.com/xmu-xiaoma666/External-Attention-pytorch

[CV]《M&M Mix: A Multimodal Multiview Transformer Ensemble》X Xiong, A Arnab, A Nagrani, C Schmid [Google Research] (2022) https://arxiv.org/abs/2206.09852

[CV]《k-means Mask Transformer》Q Yu, H Wang, S Qiao, M Collins, Y Zhu, H Adam, A Yuille, L Chen [Johns Hopkins University & Google Research] (2022) https://arxiv.org/abs/2207.04044

Transformers in Vision, by Niccolò Zanichelli https://iaml-it.github.io/posts/2021-04-28-transformers-in-vision/

TinyViT: Fast Pretraining Distillation for Small Vision Transformers https://arxiv.org/abs/2207.10666

BMList - A List of Big Models, by OpenBMB GitHub: github.com/OpenBMB/BMList

Awesome Vision Transformer Collection - a large list of Vision Transformer variants and their downstream tasks, by Runwei Guan GitHub: github.com/GuanRunwei/Awesome-Vision-Transformer-Collection

[CV]《MViTv2: Improved Multiscale Vision Transformers for Classification and Detection》Y Li, C Wu, H Fan, K Mangalam, B Xiong, J Malik, C Feichtenhofer [Facebook AI Research & UC Berkeley] (2022) https://arxiv.org/abs/2112.01526

3D Vision with Transformers - a list of 3D computer vision papers with Transformers, by lahoud GitHub: github.com/lahoud/3d-vision-transformers

An in-depth reading of the wildly popular diffusion model by a PhD from Jun Zhu's group at Tsinghua https://www.zhihu.com/question/536012286?utm_id=0

A survey on generative diffusion models: 'A-Survey-on-Generative-Diffusion-Model' by Hanqun Cao GitHub: github.com/chq1155/A-Survey-on-Generative-Diffusion-Model

Awesome Graph Neural Network Systems - a list of awesome GNN systems, by Cheng Wan GitHub: github.com/chwan1016/awesome-gnn-systems

[CV]《Hydra Attention: Efficient Attention with Many Heads》D Bolya, C Fu, X Dai, P Zhang, J Hoffman [Georgia Tech & Meta AI] (2022) https://arxiv.org/abs/2209.07484

Stable Diffusion in Tensorflow / Keras - a TensorFlow port of the Stable Diffusion model, by Divam Gupta GitHub: github.com/divamgupta/stable-diffusion-tensorflow

stable-diffusion-colab-tools - very easy and useful Stable Diffusion tools for Google Colab, by karaage GitHub: github.com/karaage0703/stable-diffusion-colab-tools

Transformers-Tutorials - a collection of demo notebooks for the HuggingFace Transformers library, by NielsRogge GitHub: https://github.com/NielsRogge/Transformers-Tutorials

TrAVis: Transformer Attention Visualiser - visualize BERT attention in-browser, by Ayaka GitHub: github.com/ayaka14732/TrAVis

Transformer Engine - a library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper GPUs, for better performance and lower memory utilization in both training and inference, by NVIDIA GitHub: github.com/NVIDIA/TransformerEngine

[CV]《A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective》C Chen, Y Wu, Q Dai, H Zhou, M Xu, S Yang, X Han, Y Yu [The University of Hong Kong & The Chinese University of Hong Kong & ShanghaiTech University] (2022) https://arxiv.org/abs/2209.13232

[CV]《Improving Sample Quality of Diffusion Model Using Self-Attention Guidance》S Hong, G Lee, W Jang, S Kim [Korea University] (2022) https://arxiv.org/abs/2210.00939

A gentle deep dive into Transformers: 《Transformers》 by Lucas Beyer [Google] https://docs.google.com/presentation/d/1ZXFIhYczos679r70Yu8vV9uO6B1J0ztzeDxbnBxD1S0/edit#slide=id.g31364026ad_3_2

'VisionTransformer - ViT trained on COYO-Labeled-300M dataset' by kakaobrain GitHub: github.com/kakaobrain/coyo-vit

[CV]《RGB no more: Minimally-decoded JPEG Vision Transformers》J Park, J Johnson [University of Michigan] (2022) https://arxiv.org/abs/2211.16421

Optimized Transformer implementation - FlashAttention-based Transformers that train GPT-2/GPT-3 3-5x faster than the Huggingface implementation, by HazyResearch GitHub: github.com/HazyResearch/flash-attention/tree/main/training

Networks-Beyond-Attention - a compilation of modern convolutional architectures for vision and beyond that use no self-attention, by FocalNet GitHub: github.com/FocalNet/Networks-Beyond-Attention

All kinds of marvelous self-attention variants - Zhihu https://zhuanlan.zhihu.com/p/527688857

What research directions are left for diffusion models in 2023? - Zhihu https://www.zhihu.com/question/568791838

Explaining diffusion models, part 1: a theoretical derivation of DDPM - Zhihu https://zhuanlan.zhihu.com/p/565901160
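
For reference, the DDPM pieces the derivation walks through: a Gaussian forward process, its closed form from x₀, and the simplified noise-prediction objective:

```latex
% DDPM in three lines: the forward process adds Gaussian noise each step,
% admits a closed form from x_0, and training reduces to noise prediction.
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big)
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\mathbf{I}\big),
\qquad \bar\alpha_t = \textstyle\prod_{s=1}^{t}(1-\beta_s)
L_{\text{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\|^2\Big]
```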

Where do GPT's abilities come from? Tracing the emergent abilities of language models to their sources https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tracing-Emergent-Abilities-of-Language-Models-to-their-Sources-b9a57ac0fcf74f30a1ab9e3e36fa1dc1

Rethinking Vision Transformers for MobileNet Size and Speed https://arxiv.org/abs/2212.08059

A first look at the mechanistic interpretability of Transformers https://www.neelnanda.io/mechanistic-interpretability/getting-started

[CV]《BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons》Y Xu, X Chen, Y Wang [Huawei Noah’s Ark Lab] (2022) https://arxiv.org/abs/2212.14158

Graph Neural Networks (GNNs) Study Guide - a study guide to learn about Graph Neural Networks (GNNs), by DAIR.AI GitHub: github.com/dair-ai/GNNs-Recipe

[CV]《Rethinking Mobile Block for Efficient Neural Models》J Zhang, X Li, J Li, L Liu, Z Xue, B Zhang, Z Jiang, T Huang, Y Wang, C Wang [Tencent & Peking University & Wuhan University] (2023) https://arxiv.org/abs/2301.01146

[CV]《Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling》K Tian, Y Jiang, Q Diao, C Lin, L Wang, Z Yuan [Peking University & Bytedance Inc & University of Oxford] (2023) https://arxiv.org/abs/2301.03580

A timeline of the evolution of Transformers https://amatriain.net/blog/transformer-models-an-introduction-and-catalog-2d1e9039f376/?continueFlag=6817c7861421f8b7a171c6db348c259e

[LG]《Transformer models: an introduction and catalog》X Amatriain (2023) https://arxiv.org/abs/2302.07730

FastViT: an efficient hybrid Vision Transformer architecture that uses structural reparameterization and further architectural improvements to deliver excellent performance across computing platforms, with especially large runtime gains at higher resolutions. https://arxiv.org/abs/2303.14189

Proposes InceptionNeXt, a new CNN architecture that decomposes large kernels into small kernels plus identity mappings, balancing high performance with high efficiency. https://arxiv.org/abs/2303.16900 [CV]《InceptionNeXt: When Inception Meets ConvNeXt》W Yu, P Zhou, S Yan, X Wang [National University of Singapore & Sea AI Lab] (2023)

Proposes Slide Attention, a local attention mechanism that integrates efficiently into a range of Vision Transformer models and hardware devices and yields consistent gains on comprehensive benchmarks. https://arxiv.org/abs/2304.04237 [CV]《Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention》X Pan, T Ye, Z Xia, S Song, G Huang [Tsinghua University] (2023)

Proposes EfficientViT, a high-speed Vision Transformer that uses new building blocks and a cascaded group attention module to improve memory efficiency and channel communication, striking a good balance between speed and accuracy. https://arxiv.org/abs/2305.07027 [CV]《EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention》X Liu, H Peng, N Zheng, Y Yang, H Hu, Y Yuan [Microsoft Research & The Chinese University of Hong Kong] (2023)

C Transformers - Python bindings for Transformer models implemented in C/C++ using the GGML library, supporting models such as GPT-2 and GPT-J. Ravindra Marella GitHub: github.com/marella/ctransformers

Proposes Hiera, a minimalist hierarchical Vision Transformer that uses MAE pretraining to strip away unnecessary components, improving both accuracy and speed and setting the state of the art for image and video recognition. https://arxiv.org/abs/2306.00989 [CV]《Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles》C Ryali, Y Hu, D Bolya, C Wei, H Fan, P Huang, V Aggarwal, A Chowdhury, O Poursaeed, J Hoffman, J Malik, Y Li, C Feichtenhofer [Meta AI] (2023)

A comprehensive survey of Transformer applications in deep learning, covering the potential and future possibilities of Transformers across five major application domains: NLP, computer vision, multimodal processing, audio and speech processing, and signal processing. https://arxiv.org/abs/2306.07303 [LG]《A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks》S Islam, H Elmekki, A Elsebai, J Bentahar, N Drawel, G Rjoub, W Pedrycz [Concordia University] (2023)

Proposes Conformer, a structure that combines convolutional layers with Transformers for training large language models, effectively improving model performance. https://arxiv.org/abs/2307.00461 [CL]《Conformer LLMs -- Convolution Augmented Large Language Models》P Verma [Stanford University] (2023)

NaViT: a new Vision Transformer that packs sequences during training to handle inputs of arbitrary resolution and aspect ratio, surpassing the conventional Vision Transformer in training efficiency, model adaptability, and inference flexibility. https://arxiv.org/abs/2307.06304

Monarch Mixer BERT (M2-BERT): a new architecture that replaces the Transformer's main components with Monarch matrices, yielding a fully sub-quadratic architecture that matches quality while cutting parameters and FLOPs, with potential for handling longer sequences. 《Monarch Mixer: Revisiting BERT, Without Attention or MLPs · Hazy Research》 https://hazyresearch.stanford.edu/blog/2023-07-25-m2-bert

Proposes RMT, a method that combines RetNet with Transformers: by introducing an explicit decay mechanism and decomposing the modeling process along the two image axes, RMT achieves significant gains on computer vision tasks, highlighting RetNet's potential and advantages in the visual domain. https://arxiv.org/abs/2309.11523 [CV]《RMT: Retentive Networks Meet Vision Transformers》Q Fan, H Huang, M Chen, H Liu, R He [CASIA] (2023)

A visual explainer of how a Transformer works: ig.ft.com/generative-ai/

Proposes HyperAttention, an approximate attention mechanism that measures problem hardness with fine-grained parameters, achieving a linear-time sampling algorithm that can handle the long contexts of large language models, with further speedups from locality-sensitive hashing. https://arxiv.org/abs/2310.05869 [LG]《HyperAttention: Long-context Attention in Near-Linear Time》I Han, R Jarayam, A Karbasi, V Mirrokni, D P. Woodruff, A Zandieh [Google Research & Yale University] (2023)

Proposes MatFormer, a nested Transformer architecture that jointly optimizes models at multiple granularities to enable elastic inference for varied deployment needs. https://arxiv.org/abs/2310.07707 [LG]《MatFormer: Nested Transformer for Elastic Inference》Devvrit, S Kudugunta, A Kusupati, T Dettmers, K Chen, I Dhillon, Y Tsvetkov, H Hajishirzi, S Kakade, A Farhadi, P Jain [Google Research & University of Texas at Austin & University of Washington] (2023)