A curated list of awesome TLP Resources
The links and information below are provided as a convenience to the user community. Anyone who has a tool, technique, resource, or dataset that can be of benefit to the TLP COI is welcome to submit information and links to the webmaster for inclusion in this list. Any mention of computer hardware, software or services here does not constitute endorsement by NIST, nor does it indicate that the products are necessarily those best suited for the intended purpose.
Technical Language Processing (TLP) is a set of tools, techniques, and guidelines meant to tailor Natural Language Processing (NLP) tools to engineering (and other) expert-driven text-based data.
- 📃 Adapting natural language processing for technical text - Paper describing the need to bring engineering principles and practices to NLP specifically for the purpose of extracting actionable information from language generated by experts in their technical tasks, systems, and processes.
- 📃 Technical Language Processing: Unlocking Maintenance Knowledge - Paper that describes the TLP concept and provides a call to action for the community.
- 📃 NLP Workshop Report - Workshop report on current trends, successes, and challenges with respect to NLP for maintenance in manufacturing.
- 📃 NLP Standards Needs Report - Discussion on standards needs for NLP in maintenance.
- 🖥️ Nestor - Nestor Graphical User Interface (GUI) is a free toolkit that helps maintainers annotate their Maintenance Work Order (MWO) data through a process called "tagging".
- 📃 Hybrid Datafication Paper - A paper describing the tagging methodology that is used in Nestor.
- 🔌 Nestor GUI repository - The GitHub repository containing the open-source code for Nestor.
- 🔌 Redcoat - A web-based annotation tool that supports collaborative hierarchical entity typing.
- 🔌 MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources - MaintNet is a resource of technical language tools and data. Its tools include a technical language spellchecker and a part-of-speech (POS) tagger, among others.
- 🗄️ MaintNet Datasets - The datasets in MaintNet span maintenance records in the aviation, automotive, and facility industries.
- 📃 MaintNet Paper - Paper that describes the MaintNet library.
- 🖥️ Puggle - A Python package for working with the outputs of Information Extraction models and tools such as SpERT and QuickGraph. Also available on GitHub (link).
- 🖥️ Mudlark - A Python package for automatically cleaning the short text present in maintenance work orders and strategies. Also available on GitHub (link).
- 🗄️ 🏷️ MaintNorm - MaintNorm: A corpus and benchmark model for lexical normalisation and masking of industrial maintenance short text. Contains data, models and code.
- 🗄️ Excavator Maintenance Dataset - The Excavators Raw&Cleaned dataset provides both raw and cleaned MWOs, with the cleaning done by a rules-based process.
- 🗄️ Asset Management Parks System Work Orders: 1.67M rows; 56 columns - This dataset provides raw MWOs for park equipment.
- 🗄️ Handyman Work Order Charges: 127k rows; 32 columns - Contains information about work orders created to conduct emergency repair work when an owner fails to address a hazardous condition pursuant to the requirements of an HPD-issued violation.
- 🗄️ WSSC Completed Service Alert Work Orders: 235k rows; 9 columns - WSSC completed service alert work orders, published on a monthly basis.
- 🗄️ Traffic Signal Work Orders: 23.1k rows; 28 columns - This dataset contains records of work completed by traffic signal technicians in the city of Austin, TX.
- 🗄️ Water Resources Preventive and Corrective Work Orders: 117k rows; 10 columns - This dataset contains all of the preventive and corrective work orders at the Water Resources Department for the city of St. Petersburg, FL.
- 🗄️ National Highway traffic crash injury cases - The CIREN database consists of multiple discrete fields of data concerning severe motor vehicle crashes, including crash reconstruction and medical injury profiles.
- 🗄️ Initiating events for unexpected reactor trips: 3659 rows - Data reviewed for all unexpected reactor trips during power operations at commercial nuclear power plants. Source: US Nuclear Regulatory Commission.
- 📃 (online resource & slide deck) Text as Data: the Road to Technical Language Processing - Online course developed by Rachael Sexton.
- 📃 (slide deck & recording) Technical Language Processing tutorial - Tutorial on Technical Language Processing at the Prognostics & Health Management (PHM) Society Conference 2023. Contains a slide deck, a recording in three parts, and a notebook for getting started with the Excavator dataset.
- 📃 (online media) An Introduction to Technical Language Processing: Unlocking Maintenance Knowledge - Overview talk about TLP with examples.
- 🗄️📃 (and leaderboard) DesignQA - DesignQA is a benchmark for evaluating the proficiency of multimodal LLMs (MLLMs) in comprehending and applying engineering requirements in technical documentation. Two of the six benchmarks are also applicable to LLMs.
- 🗄️ FMC-MWO2KG - FMC-MWO2KG (The MWO2KG Failure Mode Classification Dataset) comprises 502 observation and label pairs for training, 62 pairs for validation and 62 pairs for testing.
- 📘 ISO 15926-4:2019 - Reference data for recording information about process plants.
- 📘 ISO 14224:2016 - Bases for the collection of reliability and maintenance (RM) data for equipment in the oil and gas industry.
- 📃 ROMAIN - Maintenance management ontology.
- 📃 An ontology for maintenance activities and its application to data quality. - Paper describing an ontology for maintenance activities, which is extensible across industries.
- 📃 An ontology for reasoning over engineering textual data stored in FMEA spreadsheet tables. - Paper describing an ontology for representing Failure Modes and Effects Analysis (FMEA).
- 📃 Avoiding Past Mistakes in Unethical Human Subjects Research: Moving From Artificial Intelligence Principles to Practice - Paper covering a brief history of the events that prompted today's ethical codes to protect human rights in research, the key ethical principles from the Belmont Report, and how these principles apply to AI research.
- 📃 Human Centric Technology Insertion - Provides a comprehensive look at technology insertion in the maintenance management workflow using well established error mitigation frameworks.
- 📃 MWO Categorization Errors - Paper analyzing human error in recording maintenance work order data in computerized maintenance management (CMM) systems.
- 📃 Condition Monitoring Annotations with BERT and Technical Language Substitution - Paper showing that substituting out-of-vocabulary technical words with natural language terms can improve the performance of pre-trained BERT models on other language domains.
- TLP COI - The TLP COI will bring together interested participants to discuss ongoing and future directions for text analysis of technical data.
- IOF Maintenance WG - Industrial Ontologies Foundry (IOF) maintenance management ontology Working Group (WG).
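
Several entries above mention rules-based cleaning of maintenance short text and substitution of technical shorthand with natural language terms. As a rough illustration of that idea only, here is a minimal Python sketch. It does not reproduce the method of any listed tool or paper, and the `SUBSTITUTIONS` table and the `clean_work_order` function are hypothetical names invented for this example.

```python
import re

# Hypothetical mapping from maintenance shorthand to plain-language terms.
# These entries are illustrative assumptions, not taken from any listed dataset.
SUBSTITUTIONS = {
    "hyd": "hydraulic",
    "rplc": "replace",
    "eng": "engine",
    "brkn": "broken",
}


def clean_work_order(text: str, substitutions: dict[str, str] = SUBSTITUTIONS) -> str:
    """Lowercase the text, strip punctuation, and expand known shorthand tokens."""
    text = text.lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    # Tokens without an entry in the table are passed through unchanged.
    return " ".join(substitutions.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    print(clean_work_order("RPLC hyd. hose - brkn"))  # replace hydraulic hose broken
```

A real pipeline would likely need a much larger, domain-reviewed substitution table and some handling of ambiguous shorthand. The packages and datasets listed above are the better starting point for that.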