A Survey on Spoken Language Understanding: Recent Advances and New Frontiers

This repository contains a list of papers, code, datasets, and leaderboards in the SLU field. If you find any errors, please don't hesitate to open an issue or pull request.

If you find this repository helpful for your work, please cite the following paper. The BibTeX entry is listed below:

@misc{qin2021survey,
      title={A Survey on Spoken Language Understanding: Recent Advances and New Frontiers}, 
      author={Libo Qin and Tianbao Xie and Wanxiang Che and Ting Liu},
      year={2021},
      eprint={2103.03095},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contributor

Contributed by Libo Qin, Tianbao Xie, Yudi Zhang, Lehan Wang, Wanxiang Che.

Thanks for the support from our adviser Wanxiang Che!

Introduction

Spoken language understanding (SLU) is a critical component in task-oriented dialogue systems. It usually consists of two tasks, intent detection and slot filling, which extract semantic constituents from natural language utterances.
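To make the two tasks concrete: intent detection assigns one label to the whole utterance, while slot filling tags every token, commonly with the BIO scheme (B- begins a slot, I- continues it, O marks tokens outside any slot). The Python sketch below decodes BIO tags into (slot, value) pairs. The utterance and the ATIS-style labels are illustrative, and `bio_to_spans` is a hypothetical helper, not part of any codebase listed here.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (slot_type, value) pairs.

    An I- tag that does not continue a span of the same type is treated
    as starting a new span (a common, lenient convention).
    """
    spans: List[Tuple[str, List[str]]] = []
    prev_type = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prev_type = None
            continue
        prefix, slot_type = tag.split("-", 1)
        if prefix == "B" or slot_type != prev_type:
            spans.append((slot_type, [token]))  # open a new span
        else:
            spans[-1][1].append(token)  # extend the current span
        prev_type = slot_type
    return [(slot_type, " ".join(words)) for slot_type, words in spans]


# Illustrative ATIS-style example: one intent per utterance, one tag per token.
tokens = ["show", "flights", "from", "boston", "to", "new", "york"]
tags = ["O", "O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name", "I-toloc.city_name"]
intent = "atis_flight"

print(intent, bio_to_spans(tokens, tags))
# atis_flight [('fromloc.city_name', 'boston'), ('toloc.city_name', 'new york')]
```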

To ease the burden of collecting articles and datasets, this project gathers the relevant datasets, papers, code, and leaderboards for SLU.

The project is fully open source and includes:

- SLU dataset table: we compiled the datasets used in the SLU field. For each dataset, you can look up its general scale, basic structure, content, characteristics, source, and how to obtain it.
- Papers on the different frontiers of SLU: we classified the papers according to the current mainstream frontiers. Besides the paper title, each entry lists the datasets used, the publication venue and year, and links to the paper and code for quick reference.
- Leaderboards on the mainstream SLU datasets: we compiled leaderboards for the mainstream datasets, separating models with and without pre-training. In addition to the model name (with year) and its scores, each row gives the paper, the code link (if available), and the venue.

The taxonomy and frontiers of our survey are summarized in the figure below.

Quick path

- Resources
- Dataset
- Frontiers
- LeaderBoard
  - ATIS
    - Non-pretrained model
    - + Pretrained model
  - SNIPS
    - Non-pretrained model
    - + Pretrained model

Resources

Survey paper links

A Survey on Spoken Language Understanding: Recent Advances and New Frontiers arXiv 2021 [pdf]
Spoken language understanding: Systems for extracting semantic information from speech book [pdf]
Recent Neural Methods on Slot Filling and Intent Classification COLING 2020 [pdf]
A survey of joint intent detection and slot-filling models in natural language understanding arXiv 2021 [pdf]

Recent open-sourced code

Single Model

Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network (SNIPS) ACL 2020 [pdf] [code]
Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding (ATIS/Stanford Dialogue Dataset) COLING 2018 [pdf] [code]

Joint Model

A Co-Interactive Transformer for Joint Slot Filling and Intent Detection (ATIS/SNIPS) ICASSP 2021 [pdf] [code]
SlotRefine: A Fast Non-Autoregressive Model for Joint Intent Detection and Slot Filling (ATIS/SNIPS) EMNLP 2020 [pdf] [code]
Joint Slot Filling and Intent Detection via Capsule Neural Networks (ATIS/SNIPS) ACL 2019 [pdf] [code]
BERT for Joint Intent Classification and Slot Filling (ATIS/SNIPS/Stanford Dialogue Dataset) arXiv 2019 [pdf] [code]
A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling (ATIS/Stanford Dialogue Dataset/SNIPS) ACL 2019 [pdf] [code]
CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding (ATIS/SNIPS/CAIS) EMNLP 2019 [pdf] [code]
Slot-Gated Modeling for Joint Slot Filling and Intent Prediction (ATIS/Stanford Dialogue Dataset/SNIPS) NAACL 2018 [pdf] [code]
Joint Online Spoken Language Understanding and Language Modeling with Recurrent Neural Networks (ATIS) SIGDIAL 2016 [pdf] [code]

Complex SLU Model

How Time Matters: Learning Time-Decay Attention for Contextual Spoken Language Understanding in Dialogues (DSTC4) NAACL 2018 [pdf] [code]
Speaker Role Contextual Modeling for Language Understanding and Dialogue Policy Learning (DSTC4) IJCNLP 2017 [pdf] [code]
Dynamic time-aware attention to speaker roles and contexts for spoken language understanding (DSTC4) IEEE 2017 [pdf] [code]
Injecting Word Information with Multi-Level Word Adapter for Chinese Spoken Language Understanding (CAIS/ECDT-NLU) arXiv 2020 [pdf] [code]
CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding (ATIS/SNIPS/CAIS) EMNLP 2019 [pdf] [code]
Coach: A Coarse-to-Fine Approach for Cross-domain Slot Filling (SNIPS) ACL 2020 [pdf] [code]
CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP (SC2/4/MLDoc/Multi WOZ/Facebook Multilingual SLU Dataset) IJCAI 2020 [pdf] [code]
Cross-lingual Spoken Language Understanding with Regularized Representation Alignment (Multilingual spoken language understanding (SLU) dataset) EMNLP 2020 [pdf] [code]
Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems (Facebook Multilingual SLU Dataset/(DST)MultiWOZ) AAAI 2020 [pdf] [code]
MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark (MTOP/Multilingual ATIS) arXiv 2020 [pdf] [code]
Neural Architectures for Multilingual Semantic Parsing (GEO/ATIS) ACL 2017 [pdf] [code]
Few-shot Learning for Multi-label Intent Detection (TourSG/StanfordLU) AAAI 2021 [pdf] [code]
Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network (SNIPS and further constructed few-shot data) ACL 2020 [pdf] [code]

Dataset

Name	Intro	Links	Multi/Single Turn (M/S)	Detail	Size & Stats	Label
ATIS	1. The ATIS (Airline Travel Information Systems) dataset (Tur et al., 2010) is widely used in SLU research. 2. For natural language understanding.	Download: 1. https://github.com/yizhen20133868/StackPropagation-SLU/tree/master/data/atis 2. https://github.com/yvchen/JointSLU/tree/master/data Paper: https://www.aclweb.org/anthology/H90-1021.pdf	S	Airline travel information. However, this dataset has been shown to have a serious skew problem in its intent distribution.	Train: 4,478 Test: 893; 120 slot labels and 21 intents	Intent Slots
SNIPS	1. Collected by Snips for model evaluation. 2. For natural language understanding. 3. Homepage: https://medium.com/snips-ai/benchmarking-natural-language-understanding-systems-google-facebook-microsoft-and-snips-2b8ddcf9fb19	Download: https://github.com/snipsco/nlu-benchmark/tree/master/2017-06-custom-intent-engines Paper: https://arxiv.org/pdf/1805.10190.pdf	S	7 tasks: weather, play music, search, add to list, book, movie	Train: 13,084 Test: 700; 7 intents and 72 slot labels	Intent Slots
Facebook Multilingual SLU Dataset	1. Contains English, Spanish, and Thai across the weather, reminder, and alarm domains. 2. For cross-lingual SLU.	Download: https://fb.me/multilingual_task_oriented_data Paper: https://www.aclweb.org/anthology/N19-1380.pdf	S	Utterances are manually translated and annotated.	Train: English 30,521; Spanish 3,617; Thai 2,156 Dev: English 4,181; Spanish 1,983; Thai 1,235 Test: English 8,621; Spanish 3,043; Thai 1,692; 11 slots and 12 intents	Intent Slots
MIT Restaurant Corpus	The MIT Restaurant Corpus contains a training set and a test set in BIO format for NLU.	Download: https://groups.csail.mit.edu/sls/downloads/restaurant/	S	A single-domain dataset associated with restaurant reservations. MR contains ‘open-vocabulary’ slots, such as restaurant names.	Train: 7,760 Test: 1,521	Slots
MIT Movie Corpus	The MIT Movie Corpus is a semantically tagged training and test corpus in BIO format. The eng corpus contains simple queries, and the trivia10k13 corpus contains more complex queries.	Download: https://groups.csail.mit.edu/sls/downloads/movie/	S	The MIT Movie Corpus consists of two single-domain datasets: movie eng (ME) and movie trivia (MT). While both contain queries about film information, the trivia queries are more complex and specific.	eng corpus: Train: 9,775 Test: 2,443 Trivia corpus: Train: 7,816 Test: 1,953	Slots
Multilingual ATIS	ATIS manually translated into Hindi and Turkish.	Download: distributed through the LDC; available to LDC members or for a fee. Paper: http://shyamupa.com/papers/UFTHH18.pdf	S	3 languages	On top of the ATIS dataset, 893 and 715 utterances from the ATIS test split were translated and annotated for Hindi and Turkish evaluation, respectively; 600 utterances from the ATIS train split were also translated and annotated (separately for each language) for use as supervision. In total: 37,084 training examples and 7,859 test examples.	Intent Slots
Multilingual ATIS++	Extends the Multilingual ATIS corpus to nine languages across four language families.	Download: contact [email protected]. Paper: https://arxiv.org/abs/2004.14353	S	9 languages	See the paper for the full statistics table (too detailed to reproduce here).	Intent Slots
Almawave-SLU	1. A dataset for Italian SLU. 2. Generated through a semi-automatic procedure from SNIPS.	Download: contact any author of the paper at [first name initial].[last name]@almawave.it Paper: https://arxiv.org/pdf/1907.07526.pdf	S	6 domains: Music, Restaurants, TV, Movies, Books, Weather	Train: 7,142 Validation: 700 Test: 700; 7 intents and 39 slots	Intent Slots
Chatbot Corpus	1. Chatbot Corpus is based on questions gathered by a Telegram chatbot which answers questions about public transport connections, consisting of 206 questions 2. For intent classification test	Download: https://github.com/sebischair/NLU-Evaluation-Corpora Paper: https://www.aclweb.org/anthology/W17-5522.pdf	S	2 Intents: Departure Time, Find Connection 5 entity types: StationStart, StationDest, Criterion, Vehicle, Line	Train: 100 Test: 106	Intent Entity
StackExchange Corpus	1. StackExchange Corpus is based on data from two StackExchange platforms: ask ubuntu and Web Applications 2. Gathers 290 questions and answers in total, 100 from Web Applications and 190 from ask ubuntu 3. For intent classification test	Download: https://github.com/sebischair/NLU-Evaluation-Corpora Paper: https://www.aclweb.org/anthology/W17-5522.pdf	S	Ask ubuntu Intents: “Make Update”, “Setup Printer”, “Shutdown Computer”, and “Software Recommendation” Web Applications Intents: “Change Password”, “Delete Account”, “Download Video”, “Export Data”, “Filter Spam”, “Find Alternative”, and “Sync Accounts”	Total: 290 Ask ubuntu: 190 Web Application: 100	Intent Entity
MixSNIPS/MixATIS	Multi-intent datasets built from SNIPS and ATIS	Download: https://github.com/LooperXX/AGIF/tree/master/data Paper: https://www.aclweb.org/anthology/2020.findings-emnlp.163.pdf	S	Sentences with different intents are joined with conjunctions; sentences with 1, 2, and 3 intents make up 30%, 50%, and 20% of the data, respectively.	Train: 12,759 utterances Dev: 4,812 utterances Test: 7,848 utterances	Intent (multi), Slots
TOP semantic parsing	1. Hierarchical annotation scheme for semantic parsing. 2. Allows the representation of compositional queries. 3. Can be efficiently and accurately parsed by standard constituency parsing models.	Download: http://fb.me/semanticparsingdialog Paper: https://www.aclweb.org/anthology/D18-1300.pdf	S	Focused on navigation, events, and navigation to events. The evaluation script can be run from evaluate.py within the dataset.	44,783 annotations; Train: 31,279 Dev: 4,462 Test: 9,042	Intent, Slots in tree format
MTOP: Multilingual TOP	1. An almost-parallel multilingual task-oriented semantic parsing dataset covering 6 languages and 11 domains. 2. The first multilingual dataset that contains compositional representations allowing complex nested queries. 3. Dataset creation: i) generating synthetic utterances and annotating them in English; ii) translation, label transfer, post-processing, post-editing, and filtering for the other languages.	Download: https://fb.me/mtop_dataset Paper: https://arxiv.org/pdf/2008.09335.pdf	S	6 languages (both high- and low-resource): English, Spanish, French, German, Hindi, and Thai. A mix of simple and compositional nested queries across 11 domains, 117 intents, and 78 slots.	100k examples in total across 6 languages, roughly split 70:10:20 into train, eval, and test.	Two kinds of representations: 1. flat representation: intent and slots; 2. compositional decoupled representation: nested intents inside slots. More details in Section 3.2 of the paper.
CAIS	Collected from real-world speaker systems with manual annotations of slot tags and intent labels.	Download: https://github.com/Adaxry/CM-Net	S	1. The utterances were collected from Chinese Artificial Intelligence Speakers. 2. Adopts the BIOES tagging scheme for slots instead of the BIO2 scheme used in ATIS. 3. Intent labels are skewed toward PlayMusic.	Train: 7,995 utterances Dev: 994 utterances Test: 1,024 utterances	Slot tags and intent labels
Simulated Dialogues dataset	machines2machines (M2M)	Download: https://github.com/google-research-datasets/simulated-dialogue Paper: http://www.colips.org/workshop/dstc4/papers/60.pdf	M	Slots: Sim-R (Restaurant): price_range, location, restaurant_name, category, num_people, date, time; Sim-M (Movie): theatre_name, movie, date, time, num_people; Sim-GEN (Movie): theatre_name, movie, date, time, num_people	Train: Sim-R: 1,116 Sim-M: 384 Sim-GEN: 100k Dev: Sim-R: 349 Sim-M: 120 Sim-GEN: 10k Test: Sim-R: 775 Sim-M: 264 Sim-GEN: 10k	Dialogue state; user's act, slot, intent; system's act, slot
Schema-Guided Dialogue Dataset (SGD)	Dialogue simulation (automatic, based on identified scenarios), word replacement, and human paraphrasing	Download: https://github.com/google-research-datasets/dstc8-schema-guided-dialogue Paper: https://arxiv.org/pdf/1909.05855.pdf	M	Domains: 16, dialogues: 16,142, turns: 329,964, avg. turns per dialogue: 20.44, total unique tokens: 30,352, slots: 214, slot values: 14,319	NA	Schema representation: service_name; description; slot's name, description, is_categorical, possible_values; intent's name, description, is_transactional, required_slots, optional_slots, result_slots. Dialogue representation: dialogue_id, services, turns, speaker, utterance, frame, service, slot's name, start, exclusive_end; action's act, slot, values, canonical_values; service_call's method, parameters; service_results; state's active_intent, requested_slots, slot_values
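The CAIS row notes that CAIS tags slots with BIOES while ATIS uses BIO2, so training or evaluating across such datasets requires converting one scheme into the other. A minimal Python sketch of the BIO2 to BIOES direction follows, assuming well-formed BIO2 input (every span begins with a B- tag). The function name and the slot names `song` and `singer` are hypothetical.

```python
from typing import List


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert BIO2 tags to BIOES.

    Single-token spans become S-, the last token of a multi-token span
    becomes E-, and B-/I- are kept when the span continues.
    Assumes well-formed BIO2 input.
    """
    out: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, slot_type = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == f"I-{slot_type}"  # does the span go on?
        if prefix == "B":
            out.append(f"{'B' if continues else 'S'}-{slot_type}")
        else:  # prefix == "I"
            out.append(f"{'I' if continues else 'E'}-{slot_type}")
    return out


print(bio_to_bioes(["O", "B-song", "I-song", "I-song", "O", "B-singer"]))
# ['O', 'B-song', 'I-song', 'E-song', 'O', 'S-singer']
```

The reverse direction (BIOES to BIO2) only needs to map S- to B- and E- to I-.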

Frontiers

Single Slot Filling

Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network (SNIPS) ACL 2020 [pdf] [code]
A Hierarchical Decoding Model For Spoken Language Understanding From Unaligned Data (DSTC2) ICASSP 2019 [pdf]
Utterance Generation With Variational Auto-Encoder for Slot Filling in Spoken Language Understanding (ATIS/SNIPS/MIT Corpus) IEEE Signal Processing Letters 2019 [pdf]
Data Augmentation with Atomic Templates for Spoken Language Understanding (ATIS) EMNLP 2019 [pdf]
A New Concept of Deep Reinforcement Learning based Augmented General Sequence Tagging System (ATIS/CoNLL-2003) COLING 2018 [pdf]
Improving Slot Filling in Spoken Language Understanding with Joint Pointer and Attention (DSTC2) ACL 2018 [pdf]
Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding (ATIS/Stanford Dialogue Dataset) COLING 2018 [pdf] [code]
Encoder-Decoder with Focus-Mechanism for Sequence Labelling Based Spoken Language Understanding (ATIS) ICASSP 2017 [pdf]
Neural Models for Sequence Chunking (ATIS/LARGE) AAAI 2017 [pdf]
Labeled Data Generation with Encoder-decoder LSTM for Semantic Slot Filling (ATIS) INTERSPEECH 2016 [pdf]
Syntax or Semantics? Knowledge-Guided Joint Semantic Frame Parsing (ATIS/Cortana) IEEE Workshop on Spoken Language Technology 2016 [pdf]
Bi-Directional Recurrent Neural Network with Ranking Loss for Spoken Language Understanding (ATIS) ICASSP 2016 [pdf]
Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling (ATIS) EMNLP 2016 [pdf]
Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding (ATIS) IEEE/ACM TASLP 2015 [pdf]
Recurrent Neural Network Structured Output Prediction for Spoken Language Understanding (ATIS) - 2015 [pdf]
Spoken Language Understanding Using Long Short-Term Memory Neural Networks (ATIS) IEEE 2014 [pdf]
Recurrent conditional random field for language understanding (ATIS) IEEE 2014 [pdf]
Recurrent Neural Networks for Language Understanding (ATIS) INTERSPEECH 2013 [pdf]
Investigation of Recurrent-Neural-Network Architectures and Learning Methods for Spoken Language Understanding (ATIS) ISCA 2013 [pdf]
Large-scale personal assistant technology deployment: the Siri experience (-) INTERSPEECH 2013 [pdf]

Single Intent Detection

Zero-shot User Intent Detection via Capsule Neural Networks (SNIPS/CVA) EMNLP 2018 [pdf]
Intention Detection Based on Siamese Neural Network With Triplet Loss (SNIPS/ATIS/Facebook multilingual datasets/DailyDialog/MRDA) IEEE Access 2020 [pdf]
Multi-Layer Ensembling Techniques for Multilingual Intent Classification (ATIS) arXiv 2018 [pdf]
Deep Unknown Intent Detection with Margin Loss (SNIPS/ATIS) ACL 2019 [pdf]
Subword Semantic Hashing for Intent Classification on Small Datasets (The Chatbot Corpus/The AskUbuntu Corpus) IJCNN 2019 [pdf]
Dialogue intent classification with character-CNN-BGRU networks (the Chinese Wikipedia dataset) Multimedia Tools and Applications 2018 [pdf]
Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates (Alexa) InterSpeech 2018 [pdf]
Recurrent neural network and LSTM models for lexical utterance classification (ATIS/CB) INTERSPEECH 2015 [pdf]
Adversarial Training for Multi-task and Multi-lingual Joint Modeling of Utterance Intent Classification (collected by the author) ACL 2018 [pdf]
Exploiting Shared Information for Multi-Intent Natural Language Sentence Classification (collected by the author) ISCA 2013 [pdf]

Joint Model

Implicit joint modeling

Leveraging Non-Conversational Tasks for Low Resource Slot Filling: Does it help? (ATIS/MIT Restaurant and Movie/OntoNotes 5.0/OPUS News Commentary) SIGDIAL 2019 [pdf]
Simple, Fast, Accurate Intent Classification and Slot Labeling for Goal-Oriented Dialogue Systems (ATIS/SNIPS) SIGDIAL 2019 [pdf]
Multi-task learning for Joint Language Understanding and Dialogue State Tracking (M2M/DSTC2) SIGDIAL 2018 [pdf]
A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding (ATIS/CQUD) IJCAI 2016 [pdf]
Joint Online Spoken Language Understanding and Language Modeling with Recurrent Neural Networks (ATIS) SIGDIAL 2016 [pdf] [code]
Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM (ATIS) INTERSPEECH 2016 [pdf]
Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling (ATIS) INTERSPEECH 2016 [pdf]
Joint Semantic Utterance Classification and Slot Filling with Recursive Neural Networks (ATIS/Stanford Dialogue Dataset/Microsoft Cortana conversational understanding task (-)) IEEE SLT 2014 [pdf]
Convolutional Neural Network Based Triangular CRF for Joint Intent Detection and Slot Filling (ATIS) IEEE Workshop on Automatic Speech Recognition and Understanding 2013 [pdf]

Explicit joint modeling

A Result based Portable Framework for Spoken Language Understanding (KVRET) ICME 2021 [pdf]
A Co-Interactive Transformer for Joint Slot Filling and Intent Detection (ATIS/SNIPS) ICASSP 2021 [pdf] [code]
SlotRefine: A Fast Non-Autoregressive Model for Joint Intent Detection and Slot Filling (ATIS/SNIPS) EMNLP 2020 [pdf] [code]
Graph LSTM with Context-Gated Mechanism for Spoken Language Understanding (ATIS/SNIPS) AAAI 2020 [pdf]
Joint Slot Filling and Intent Detection via Capsule Neural Networks (ATIS/SNIPS) ACL 2019 [pdf] [code]
A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding (ATIS/SNIPS) EMNLP 2019 [pdf] [code]
A Joint Learning Framework With BERT for Spoken Language Understanding (ATIS/SNIPS/Facebook's Multilingual dataset) IEEE 2019 [pdf]
BERT for Joint Intent Classification and Slot Filling (ATIS/SNIPS/Stanford Dialogue Dataset) arXiv 2019 [pdf] [code]
A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling (ATIS/Stanford Dialogue Dataset/SNIPS) ACL 2019 [pdf] [code]
Joint Multiple Intent Detection and Slot Labeling for Goal-Oriented Dialog (ATIS/Stanford Dialogue Dataset/SNIPS) NAACL 2019 [pdf]
CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding (ATIS/SNIPS/CAIS) EMNLP 2019 [pdf] [code]
A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling (ATIS) NAACL 2018 [pdf]
Slot-Gated Modeling for Joint Slot Filling and Intent Prediction (ATIS/Stanford Dialogue Dataset/SNIPS) NAACL 2018 [pdf] [code]
A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding (ATIS) EMNLP 2018 [pdf]

Contextual SLU

Knowing Where to Leverage: Context-Aware Graph Convolutional Network with An Adaptive Fusion Layer for Contextual Spoken Language Understanding (Simulated Dialogues dataset) IEEE 2021 [pdf]
Dynamically Context-sensitive Time-decay Attention for Dialogue Modeling (DSTC4) IEEE 2019 [pdf]
Multi-turn Intent Determination for Goal-oriented Dialogue systems (Frames/Key-Value Retrieval) IJCNN 2019 [pdf]
Transfer Learning for Context-Aware Spoken Language Understanding (single-turn: ATIS/SNIPS multi-turn: Simulated Dialogues dataset) IEEE 2019 [pdf]
How Time Matters: Learning Time-Decay Attention for Contextual Spoken Language Understanding in Dialogues (DSTC4) NAACL 2018 [pdf] [code]
An Efficient Approach to Encoding Context for Spoken Language Understanding (Simulated Dialogues dataset) InterSpeech 2018 [pdf]
Speaker-sensitive dual memory networks for multi-turn slot tagging (Microsoft Cortana) IEEE 2017 [pdf]
Speaker Role Contextual Modeling for Language Understanding and Dialogue Policy Learning (DSTC4) IJCNLP 2017 [pdf] [code]
Sequential dialogue context modeling for spoken language understanding (collected by the author) SIGDIAL 2017 [pdf]
End-to-end joint learning of natural language understanding and dialogue manager (DSTC4) IEEE 2017 [pdf] [code]
Dynamic time-aware attention to speaker roles and contexts for spoken language understanding (DSTC4) IEEE 2017 [pdf] [code]
An Intelligent Assistant for High-Level Task Understanding (collected by the author) IUI 2016 [pdf]
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding (Collected from Microsoft Cortana) INTERSPEECH 2016 [pdf]
Leveraging behavioral patterns of mobile applications for personalized spoken language understanding (collected by the author) ICMI 2015 [pdf]
Contextual spoken language understanding using recurrent neural networks (single-turn: ATIS multi-turn: Microsoft Cortana) 2015 [pdf]
Contextual domain classification in spoken language understanding systems using recurrent neural network (collected by the author) IEEE 2014 [pdf]
Easy contextual intent prediction and slot detection (collected by the author) IEEE 2013 [pdf]

Multi-intent SLU

AGIF: An Adaptive Graph-Interactive Framework for Joint Multiple Intent Detection and Slot Filling (MixATIS/MixSNIPS) EMNLP 2020 [pdf] [code]
Joint Multiple Intent Detection and Slot Labeling for Goal-Oriented Dialog (ATIS/SNIPS/internal dataset) NAACL 2019 [pdf]
Two-stage multi-intent detection for spoken language understanding (Korean-language corpus for the TV guide domain collected by the author) Multimed Tools Appl 2017 [pdf]
Exploiting Shared Information for Multi-intent Natural Language Sentence Classification (inhouse corpus from Microsoft) Interspeech 2013 [pdf]
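The MixATIS/MixSNIPS benchmarks used by AGIF were built by joining single-intent sentences that have different intents with conjunctions, so that sentences with 1, 2, and 3 intents make up 30%, 50%, and 20% of the data. The Python sketch below mimics that recipe on a toy pool. It always uses "and" as the conjunction, gives the conjunction an O slot tag, and returns the intents as a sorted tuple; these are simplifying assumptions, not the AGIF authors' exact procedure or label format. The pool entries, slot names, and function name are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (tokens, slot tags, intent) for one single-intent utterance.
Example = Tuple[List[str], List[str], str]


def compose_multi_intent(
    pool: List[Example], rng: random.Random
) -> Tuple[List[str], List[str], Tuple[str, ...]]:
    """Join 1-3 single-intent utterances with distinct intents using 'and'.

    The number of intents is drawn with probabilities 0.3 / 0.5 / 0.2.
    The pool must contain at least 3 distinct intents.
    """
    k = rng.choices([1, 2, 3], weights=[0.3, 0.5, 0.2])[0]
    by_intent: Dict[str, List[Example]] = {}
    for ex in pool:
        by_intent.setdefault(ex[2], []).append(ex)
    intents = rng.sample(sorted(by_intent), k)  # k distinct intents
    tokens: List[str] = []
    tags: List[str] = []
    for j, intent in enumerate(intents):
        ex_tokens, ex_tags, _ = rng.choice(by_intent[intent])
        if j > 0:
            tokens.append("and")  # conjunction token gets the O tag
            tags.append("O")
        tokens.extend(ex_tokens)
        tags.extend(ex_tags)
    return tokens, tags, tuple(sorted(intents))


pool: List[Example] = [
    (["play", "jazz"], ["O", "B-genre"], "PlayMusic"),
    (["weather", "in", "paris"], ["O", "O", "B-city"], "GetWeather"),
    (["book", "a", "table"], ["O", "O", "O"], "BookRestaurant"),
]
tokens, tags, intents = compose_multi_intent(pool, random.Random(0))
print(" ".join(tokens), intents)
```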

Chinese SLU

Injecting Word Information with Multi-Level Word Adapter for Chinese Spoken Language Understanding (CAIS/ECDT-NLU) arXiv 2020 [pdf] [code]
CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding (ATIS/SNIPS/CAIS) EMNLP 2019 [pdf] [code]

Cross-domain SLU

Coach: A Coarse-to-Fine Approach for Cross-domain Slot Filling (SNIPS) ACL 2020 [pdf] [code]
Towards Scalable Multi-Domain Conversational Agents: The Schema-Guided Dialogue Dataset (SGD) AAAI 2020 [pdf]
Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents (ATIS/SNIPS) AAAI 2019 [pdf]
Zero-Shot Adaptive Transfer for Conversational Language Understanding (collected by author) AAAI 2019 [pdf]
Robust Zero-Shot Cross-Domain Slot Filling with Example Values (SNIPS/XSchema) ACL 2019 [pdf]
Concept Transfer Learning for Adaptive Language Understanding (ATIS/DSTC2&3) SIGDIAL 2018 [pdf]
Fast and Scalable Expansion of Natural Language Understanding Functionality for Intelligent Agents (generated by the author) NAACL 2018 [pdf]
Bag of Experts Architectures for Model Reuse in Conversational Language Understanding (generated by the author) NAACL-HLT 2018 [pdf]
Domain Attention with an Ensemble of Experts (corpus 7 Microsoft Cortana domains) ACL 2017 [pdf]
Towards Zero-Shot Frame Semantic Parsing for Domain Scaling (collected by the author) INTERSPEECH 2017 [pdf]
Zero-Shot Learning across Heterogeneous Overlapping Domains (inhouse data from Amazon) INTERSPEECH 2017 [pdf]
Domainless Adaptation by Constrained Decoding on a Schema Lattice (Cortana) COLING 2016 [pdf]
Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding (United Airlines/Airbnb/Greyhound bus service/OpenTable (data obtained from apps)) INTERSPEECH 2016 [pdf]
Natural Language Model Re-usability for Scaling to Different Domains (ATIS/MultiATIS) EMNLP 2016 [pdf]
Frustratingly Easy Neural Domain Adaptation (Cortana) COLING 2016 [pdf]
Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM (ATIS) INTERSPEECH 2016 [pdf]
A Model of Zero-Shot Learning of Spoken Language Understanding (generated by the author) EMNLP 2015 [pdf]
Online adaptative zero-shot learning spoken language understanding using word-embedding (DSTC2) IEEE 2015 [pdf]
Multi-Task Learning for Spoken Language Understanding with Shared Slots (collected by the author) INTERSPEECH 2011 [pdf]

Cross-lingual SLU

CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP (SC2/4/MLDoc/Multi WOZ/Facebook Multilingual SLU Dataset) IJCAI 2020 [pdf] [code]
Cross-lingual Spoken Language Understanding with Regularized Representation Alignment (Multilingual spoken language understanding (SLU) dataset) EMNLP 2020 [pdf] [code]
End-to-End Slot Alignment and Recognition for Cross-Lingual NLU (ATIS/MultiATIS) EMNLP 2020 [pdf]
Multi-Level Cross-Lingual Transfer Learning With Language Shared and Specific Knowledge for Spoken Language Understanding (Facebook Multilingual SLU Dataset) IEEE Access 2020 [pdf]
Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems (Facebook Multilingual SLU Dataset/(DST)MultiWOZ) AAAI 2020 [pdf] [code]
MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark (MTOP/Multilingual ATIS) arXiv 2020 [pdf] [code]
Cross-lingual Transfer Learning with Data Selection for Large-Scale Spoken Language Understanding (ATIS) EMNLP-IJCNLP 2019 [pdf]
Zero-shot Cross-lingual Dialogue Systems with Transferable Latent Variables (Facebook Multilingual SLU Dataset) EMNLP-IJCNLP 2019 [pdf]
Cross-Lingual Transfer Learning for Multilingual Task Oriented Dialog (Facebook Multilingual SLU Dataset) NAACL 2019 [pdf]
Almawave-SLU: A new dataset for SLU in Italian (Almawave-SLU) CEUR Workshop 2019 [pdf]
Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model (ATIS/SNIPS) arXiv 2019 [pdf]
(Almost) Zero-Shot Cross-Lingual Spoken Language Understanding (ATIS manually translated into Hindi and Turkish) IEEE/ICASSP 2018 [pdf]
Neural Architectures for Multilingual Semantic Parsing (GEO/ATIS) ACL 2017 [pdf] [code]
Multi-style adaptive training for robust cross-lingual spoken language understanding (English-Chinese ATIS) IEEE 2013 [pdf]
ASGARD: A Portable Architecture for Multilingual Dialogue Systems (collected from crowd-sourcing platform) ICASSP 2013 [pdf]
Combining multiple translation systems for Spoken Language Understanding portability (MEDIA) IEEE 2012 [pdf]
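CoSDA-ML, listed above, augments training data by code-switching. One simple way to picture dictionary-based code-switching is the Python sketch below: each token is selected with some probability and replaced by its translation from a randomly chosen bilingual dictionary. The toy dictionaries, the `ratio` parameter, and the function name are hypothetical. The sketch only handles one-to-one word translations, so the token count (and therefore token-level slot tags) stays aligned; it is not the paper's exact procedure.

```python
import random
from typing import Dict, List


def code_switch(tokens: List[str], lexicons: Dict[str, Dict[str, str]],
                ratio: float, rng: random.Random) -> List[str]:
    """Replace a random subset of tokens with dictionary translations.

    Each token is selected with probability `ratio`; a target language is
    then sampled, and the token is replaced only if that language's
    dictionary contains it. The token count is unchanged.
    """
    languages = sorted(lexicons)
    switched: List[str] = []
    for token in tokens:
        if rng.random() < ratio:
            lang = rng.choice(languages)
            switched.append(lexicons[lang].get(token, token))
        else:
            switched.append(token)
    return switched


# Hypothetical toy bilingual dictionaries (English -> target language).
lexicons = {
    "es": {"weather": "tiempo", "tomorrow": "mañana"},
    "de": {"weather": "Wetter", "tomorrow": "morgen"},
}
print(code_switch(["weather", "for", "tomorrow"], lexicons, ratio=0.5, rng=random.Random(0)))
```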

Low-resource SLU

Few-shot SLU

Few-shot Learning for Multi-label Intent Detection (TourSG/StanfordLU) AAAI 2021 [pdf] [code]
Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network (SNIPS and further constructed few-shot data) ACL 2020 [pdf] [code]
Data Augmentation for Spoken Language Understanding via Pretrained Models (ATIS/SNIPS) arXiv 2020 [pdf]
Data augmentation by data noising for open vocabulary slots in spoken language understanding (ATIS/SNIPS/MIT-Restaurant) NAACL-HLT 2019 [pdf]
Data Augmentation for Spoken Language Understanding via Joint Variational Generation (ATIS) AAAI 2019 [pdf]
Marrying Up Regular Expressions with Neural Networks: A Case Study for Spoken Language Understanding (ATIS) ACL 2018 [pdf]
Concept Transfer Learning for Adaptive Language Understanding (ATIS/DSTC2&3) SIGDIAL 2018 [pdf]

Zero-shot SLU

Coach: A Coarse-to-Fine Approach for Cross-domain Slot Filling (SNIPS) ACL 2020 [pdf] [code]
Zero-Shot Adaptive Transfer for Conversational Language Understanding (collected by the author) AAAI 2019 [pdf]
Toward zero-shot Entity Recognition in Task-oriented Conversational Agents (Entity gazetteers/Synthetic Gazetteers/Synthetic Utterances) SIGDIAL 2018 [pdf]
Zero-shot User Intent Detection via Capsule Neural Networks (SNIPS/CVA) EMNLP 2018 [pdf]
Towards Zero-Shot Frame Semantic Parsing for Domain Scaling INTERSPEECH 2017 [pdf]
Zero-Shot Learning across Heterogeneous Overlapping Domains INTERSPEECH 2017 [pdf]
A Model of Zero-Shot Learning of Spoken Language Understanding (generated by the author) EMNLP 2015 [pdf]
Zero-shot semantic parser for spoken language understanding (DSTC2&3) INTERSPEECH 2015 [pdf]

Unsupervised SLU

Dialogue State Induction Using Neural Latent Variable Models (MultiWOZ 2.1/SGD) IJCAI 2020 [pdf]

LeaderBoard

ATIS

Non-pretrained model

Model	Intent Acc (%)	Slot F1 (%)	Paper / Source	Code link	Conference
Co-Interactive(Qin et al., 2021)	97.7	95.9	A Co-Interactive Transformer for Joint Slot Filling and Intent Detection [pdf]	https://github.com/kangbrilliant/DCA-Net	ICASSP
Graph LSTM(Zhang et al., 2021)	97.20	95.91	Graph LSTM with Context-Gated Mechanism for Spoken Language Understanding [pdf]	-	AAAI
Stack Propagation(Qin et al., 2019)	96.9	95.9	A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding [pdf]	https://github.com/LeePleased/StackPropagation-SLU	EMNLP
SF-ID+CRF(SF first)(E et al., 2019)	97.76	95.75	A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling [pdf]		ACL
SF-ID+CRF(ID first)(E et al., 2019)	97.09	95.8	A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling [pdf]	https://github.com/ZephyrChenzf/SF-ID-Network-For-NLU	ACL
Capsule-NLU (Zhang et al., 2019)	95	95.2	Joint Slot Filling and Intent Detection via Capsule Neural Networks [pdf]	https://github.com/czhang99/Capsule-NLU	ACL
Utterance Generation With Variational Auto-Encoder(Guo et al., 2019)	-	95.04	Utterance Generation With Variational Auto-Encoder for Slot Filling in Spoken Language Understanding [pdf]	-	IEEE Signal Processing Letters
JULVA(full)(Yoo et al., 2019)	97.24	95.51	Data Augmentation for Spoken Language Understanding via Joint Variational Generation [pdf]	-	AAAI
CM-Net(Liu et al., 2019)	99.1	96.20	CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding [pdf]	https://github.com/Adaxry/CM-Net	EMNLP
Data noising method(Kim et al., 2019)	98.43	96.20	Data augmentation by data noising for open vocabulary slots in spoken language understanding [pdf]	-	NAACL-HLT
ACD(Zhu et al., 2018)	-	96.08	Concept Transfer Learning for Adaptive Language Understanding [pdf]	-	SIGDIAL
A Self-Attentive Model with Gate Mechanism(Li et al., 2018)	98.77	96.52	A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding [pdf]	-	EMNLP
Slot-Gated(Goo et al., 2018)	94.1	95.2	Slot-Gated Modeling for Joint Slot Filling and Intent Prediction [pdf]	https://github.com/MiuLab/SlotGated-SLU	NAACL
DRL based Augmented Tagging System(Wang et al., 2018)	-	97.86	A New Concept of Deep Reinforcement Learning based Augmented General Sequence Tagging System [pdf]	-	COLING
Bi-model(Wang et al., 2018)	98.76	96.65	A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling [pdf]	-	NAACL
Bi-model+decoder(Wang et al., 2018)	98.99	96.89	A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling [pdf]	-	NAACL
Seq2Seq DA for LU(Hou et al., 2018)	-	94.82	Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding [pdf]	https://github.com/AtmaHou/Seq2SeqDataAugmentationForLU	COLING
BLSTM-LSTM(Zhu et al., 2017)	-	95.79	Encoder-Decoder with Focus-Mechanism for Sequence Labelling Based Spoken Language Understanding [pdf]	-	ICASSP
neural sequence chunking model(Zhai et al., 2017)	-	95.86	Neural Models for Sequence Chunking [pdf]	-	AAAI
Joint Model of ID and SF(Zhang et al., 2016)	98.32	96.89	A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding [pdf]	-	IJCAI
Attention Encoder-Decoder NN (with aligned inputs)(Liu et al., 2016)	98.43	95.87	Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling [pdf]	-	InterSpeech
Attention BiRNN(Liu et al., 2016)	98.21	95.98	Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling [pdf]	-	InterSpeech
Joint SLU-LM model(Liu et al., 2016)	98.43	94.64	Joint Online Spoken Language Understanding and Language Modeling with Recurrent Neural Networks [pdf]	http://speech.sv.cmu.edu/software.html	SIGDIAL
RNN-LSTM(Hakkani-Tur et al., 2016)	94.3	92.6	Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM [pdf]	-	InterSpeech
R-biRNN(Vu et al., 2016)	-	95.47	Bi-directional recurrent neural network with ranking loss for spoken language understanding [pdf]	-	IEEE
Encoder-labeler LSTM(Kurata et al., 2016)	-	95.4	Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling [pdf]	-	EMNLP
Encoder-labeler Deep LSTM(Kurata et al., 2016)	-	95.66	Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling [pdf]	-	EMNLP
5xR-biRNN(Vu et al., 2016)	-	95.56	Bi-directional recurrent neural network with ranking loss for spoken language understanding [pdf]	-	IEEE
Data Generation for SF(Kurata et al., 2016)	-	95.32	Labeled Data Generation with Encoder-decoder LSTM for Semantic Slot Filling [pdf]	-	InterSpeech
RNN-EM(Peng et al., 2015)	-	95.25	Recurrent Neural Networks with External Memory for Language Understanding [pdf]	-	InterSpeech
RNN trained with sampled label(Liu et al., 2015)	-	94.89	Recurrent Neural Network Structured Output Prediction for Spoken Language Understanding [pdf]	-	-
RNN(Ravuri et al., 2015)	97.55	-	Recurrent neural network and LSTM models for lexical utterance classification [pdf]	-	InterSpeech
LSTM(Ravuri et al., 2015)	98.06	-	Recurrent neural network and LSTM models for lexical utterance classification [pdf]	-	InterSpeech
Hybrid RNN(Mesnil et al., 2015)	-	95.06	Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding [pdf]	-	IEEE/ACM-TASLP
RecNN(Guo et al., 2014)	95.4	93.22	Joint semantic utterance classification and slot filling with recursive neural networks [pdf]	-	IEEE-SLT
LSTM(Yao et al., 2014)	-	94.85	Spoken Language Understanding Using Long Short-Term Memory Neural Networks [pdf]	-	IEEE
Deep LSTM(Yao et al., 2014)	-	95.08	Spoken Language Understanding Using Long Short-Term Memory Neural Networks [pdf]	-	IEEE
R-CRF(Yao et al., 2014)	-	96.65	Recurrent conditional random field for language understanding [pdf]	-	IEEE
RecNN+Viterbi(Guo et al., 2014)	95.4	93.96	Joint semantic utterance classification and slot filling with recursive neural networks [pdf]	-	IEEE-SLT
CNN CRF(Xu et al., 2013)	94.09	5.42	Convolutional neural network based triangular crf for joint intent detection and slot filling [pdf]	-	IEEE
RNN(Yao et al., 2013)	-	94.11	Recurrent Neural Networks for Language Understanding [pdf]	-	InterSpeech
Bi-dir. Jordan-RNN(Mesnil et al., 2013)	-	93.98	Investigation of Recurrent-Neural-Network Architectures and Learning Methods for Spoken Language Understanding [pdf]	-	ISCA
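The leaderboards report intent accuracy (the fraction of utterances whose predicted intent equals the gold intent) and slot F1. Slot F1 is commonly computed at the span level in the style of the CoNLL `conlleval` script: a predicted slot counts as correct only if both its type and its exact token boundaries match a gold slot. The following is a minimal Python sketch of both metrics, assuming BIO-tagged slot sequences. It is not the evaluation code of any paper listed here, individual papers may differ in details (for example, how ill-formed tag sequences are handled), and the function names `bio_to_spans`, `slot_f1` and `intent_accuracy` are hypothetical.

```python
from typing import List, Set, Tuple

# A slot span: (slot type, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Extract slot spans from one BIO tag sequence.

    Like conlleval, an I- tag that does not continue an open span of the
    same type starts a new span instead of being discarded.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # continuation of the currently open span
        if label is not None:
            spans.add((label, start, i))  # close the open span before token i
        if tag == "O":
            start, label = None, None
        else:  # B-X, or an I-X that cannot continue the open span
            start, label = i, tag[2:]
    return spans


def slot_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged span-level F1 over all utterances (exact type and boundary match)."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def intent_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of utterances whose predicted intent equals the gold intent."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    # "flights from boston to new york" (ATIS-style slot names, illustrative only)
    gold_tags = [["O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name", "I-toloc.city_name"]]
    pred_tags = [["O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name", "O"]]
    print(slot_f1(gold_tags, pred_tags))  # 0.5
    print(intent_accuracy(["atis_flight", "atis_airfare"], ["atis_flight", "atis_flight"]))  # 0.5
```

In the usage example the prediction truncates "new york" to "new", so only one of the two predicted spans matches a gold span: precision, recall and F1 are all 0.5, which the tables would list as 50.0. Scoring whole spans rather than individual tags is what makes a single boundary error cost a full slot.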

+ Pretrained model

Model	Intent Acc	Slot F1	Paper/Source	Code link	Conference
Co-Interactive(Qin et al., 2021)	98.0	96.1	A Co-Interactive Transformer for Joint Slot Filling and Intent Detection [pdf]	https://github.com/kangbrilliant/DCA-Net	ICASSP
Stack Propagation+BERT(Qin et al., 2019)	97.5	96.1	A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding [pdf]	https://github.com/LeePleased/StackPropagation-SLU	EMNLP
Bert-Joint(Castellucci et al., 2019)	97.8	95.7	Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model [pdf]	-	arXiv
BERT-SLU(Zhang et al., 2019)	99.76	98.75	A Joint Learning Framework With BERT for Spoken Language Understanding [pdf]	-	IEEE
Joint BERT(Chen et al., 2019)	97.5	96.1	BERT for Joint Intent Classification and Slot Filling [pdf]	https://github.com/monologg/JointBERT	arXiv
Joint BERT+CRF(Chen et al., 2019)	97.9	96	BERT for Joint Intent Classification and Slot Filling [pdf]	https://github.com/monologg/JointBERT	arXiv
ELMo-Light (ELMoL) (Siddhant et al., 2019)	97.3	95.42	Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents [pdf]	-	AAAI

SNIPS

Non-pretrained model

Model	Intent Acc	Slot F1	Paper / Source	Code link	Conference
Co-Interactive(Qin et al., 2021)	98.8	95.9	A Co-Interactive Transformer for Joint Slot Filling and Intent Detection [pdf]	https://github.com/kangbrilliant/DCA-Net	ICASSP
Graph LSTM(Zhang et al., 2021)	98.29	95.30	Graph LSTM with Context-Gated Mechanism for Spoken Language Understanding [pdf]	-	AAAI
SF-ID Network(E et al., 2019)	97.43	91.43	A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling [pdf]	https://github.com/ZephyrChenzf/SF-ID-Network-For-NLU	ACL
CAPSULE-NLU(Zhang et al., 2019)	97.3	91.8	Joint Slot Filling and Intent Detection via Capsule Neural Networks [pdf]	https://github.com/czhang99/Capsule-NLU	ACL
StackPropagation(Qin et al., 2019)	98	94.2	A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding [pdf]	https://github.com/LeePleased/StackPropagation-SLU	EMNLP
CM-Net(Liu et al., 2019)	99.29	97.15	CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding [pdf]	https://github.com/Adaxry/CM-Net	EMNLP
Joint Multiple(Gangadharaiah et al., 2019)	97.23	88.03	Joint Multiple Intent Detection and Slot Labeling for Goal-Oriented Dialog [pdf]	-	NAACL
Utterance Generation With Variational Auto-Encoder(Guo et al., 2019)	-	93.18	Utterance Generation With Variational Auto-Encoder for Slot Filling in Spoken Language Understanding [pdf]	-	IEEE Signal Processing Letters
Slot Gated Intent Atten.(Goo et al., 2018)	96.8	88.3	Slot-Gated Modeling for Joint Slot Filling and Intent Prediction [pdf]	https://github.com/MiuLab/SlotGated-SLU	NAACL
Slot Gated Full Atten.(Goo et al., 2018)	97	88.8	Slot-Gated Modeling for Joint Slot Filling and Intent Prediction [pdf]	https://github.com/MiuLab/SlotGated-SLU	NAACL
Joint Variational Generation + Slot Gated Intent Atten.(Yoo et al., 2019)	96.7	88.3	Data Augmentation for Spoken Language Understanding via Joint Variational Generation [pdf]	-	AAAI
Joint Variational Generation + Slot Gated Full Atten.(Yoo et al., 2019)	97.3	89.3	Data Augmentation for Spoken Language Understanding via Joint Variational Generation [pdf]	-	AAAI

+ Pretrained model

Model	Intent Acc	Slot F1	Paper/Source	Code link	Conference
Co-Interactive(Qin et al., 2021)	98.8	97.1	A Co-Interactive Transformer for Joint Slot Filling and Intent Detection [pdf]	https://github.com/kangbrilliant/DCA-Net	ICASSP
StackPropagation + Bert(Qin et al., 2019)	99	97	A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding [pdf]	https://github.com/LeePleased/StackPropagation-SLU	EMNLP
Bert-Joint(Castellucci et al., 2019)	99	96.2	Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model [pdf]	-	arXiv
Bert-SLU(Zhang et al., 2019)	98.96	98.78	A Joint Learning Framework With BERT for Spoken Language Understanding [pdf]	-	IEEE
Joint BERT(Chen et al., 2019)	98.6	97	BERT for Joint Intent Classification and Slot Filling [pdf]	https://github.com/monologg/JointBERT	arXiv
Joint BERT + CRF(Chen et al., 2019)	98.4	96.7	BERT for Joint Intent Classification and Slot Filling [pdf]	https://github.com/monologg/JointBERT	arXiv
ELMo-Light(Siddhant et al., 2019)	98.38	93.29	Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents [pdf]	-	AAAI
ELMo(Peters et al., 2018; Siddhant et al., 2019)	99.29	93.9	Deep contextualized word representations [pdf]; Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents [pdf]	-	NAACL/AAAI

HIT-SCIR/Awesome-SLU-Survey

Folders and files

Latest commit

History

Repository files navigation

A Survey on Spoken Language Understanding: Recent Advances and New Frontiers

Contributor

Introduction

Quick path

Resources

survey paper links

recent open-sourced code

Single Model

Joint Model

Complex SLU Model

Dataset

Frontiers

Single Slot Filling

Single Intent Detection

Joint Model

Implicit joint modeling

Explicit joint modeling

Contextual SLU

Multi-intent SLU

Chinese SLU

Cross-domain SLU

Cross-lingual SLU

Low-resource SLU

Few-shot SLU

Zero-shot SLU

Unsupervised SLU

LeaderBoard

ATIS

Non-pretrained model

+ Pretrained model

SNIPS

Non-pretrained model

+ Pretrained model

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages