DΕΤΕΧΑ: Declarative Extensible Text Exploration and Analysis

This repository contains a text mining library built on top of YeSQL. The library maps a range of text mining functionalities to reusable scalar, aggregate, and polymorphic table UDFs written in Python. Owing to YeSQL's performance characteristics and design, the framework executes critical text mining analytical tasks faster than other popular solutions and lets data scientists implement end-to-end text analytics pipelines within a declarative, extended SQL.
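YeSQL's own UDF API is not documented in this README. The sketch below is only a rough illustration of how scalar and aggregate UDFs differ. It uses Python's standard sqlite3 module, not YeSQL or DETEXA, and the function names token_count and top_term are hypothetical. Polymorphic table UDFs are not covered.

```python
import sqlite3
from collections import Counter


# Scalar UDF: one value in, one value out (a naive whitespace token count).
def token_count(text):
    return len(text.split()) if text is not None else 0


# Aggregate UDF: consumes many rows and returns a single value.
# sqlite3 calls step() once per row and finalize() once at the end.
class TopTerm:
    def __init__(self):
        self.counts = Counter()

    def step(self, text):
        if text is not None:
            self.counts.update(text.lower().split())

    def finalize(self):
        if not self.counts:
            return None
        term, _ = self.counts.most_common(1)[0]
        return term


conn = sqlite3.connect(":memory:")
conn.create_function("token_count", 1, token_count)
conn.create_aggregate("top_term", 1, TopTerm)

conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("text mining in SQL",), ("SQL for text analytics",), ("mining text",)],
)

# The scalar UDF runs once per row, while the aggregate collapses all rows.
print(conn.execute("SELECT id, token_count(body) FROM docs ORDER BY id").fetchall())
print(conn.execute("SELECT top_term(body) FROM docs").fetchone()[0])
```

Here the first query prints `[(1, 4), (2, 4), (3, 2)]` and the second prints `text`. In DETEXA, the equivalent logic would live in Python UDFs executed by YeSQL rather than sqlite3.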

Instructions

Install PyPy3 from https://www.pypy.org.

Start the terminal with pypy3 DETEXA/mterm.py [-d database.db -f funcs]. The bracketed arguments are optional.

Show the terminal help with .h

List all functions with .functions

Show the documentation of a specific function with .h functionname

Optionally, implement your own functions in a funcs directory and pass that directory to the terminal with -f funcs.

The built-in functions are stored in the DETEXA/functions/row, DETEXA/functions/aggregate, and DETEXA/functions/vtable directories.
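A minimal sketch of a custom row function follows. It assumes that DETEXA keeps the madIS-style convention of a plain Python function marked with a `registered = True` attribute and documented through its docstring. This is an assumption, so check it against the existing files in DETEXA/functions/row before relying on it. The file path funcs/row/mytext.py and the function wordcount are hypothetical.

```python
# funcs/row/mytext.py  (hypothetical file, laid out like DETEXA/functions/row)
import re


def wordcount(*args):
    """
    .. function:: wordcount(text) -> int

    Returns the number of alphanumeric words in text.
    (Hypothetical example function, not part of DETEXA.)
    """
    # Treat a missing argument or SQL NULL as an empty text.
    if not args or args[0] is None:
        return 0
    return len(re.findall(r"\w+", str(args[0])))


# Assumption: madIS-style registration flag. Confirm the convention
# by inspecting the built-in functions shipped with DETEXA.
wordcount.registered = True
```

If the convention holds, starting the terminal with pypy3 DETEXA/mterm.py -d database.db -f funcs should make wordcount callable from SQL like the built-in row functions.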

Publications

"DΕΤΕΧΑ: Declarative Extensible Text Exploration and Analysis", Yannis Foufoulas, Eleni Zacharia, Harry Dimitropoulos, Natalia Manola and Yannis Ioannidis, Theory and Practice of Digital Libraries (TPDL), 2022 (Nominated for the Best Paper Award)

"DΕΤΕΧΑ: Declarative Extensible Text Exploration and Analysis through SQL", Yannis Foufoulas, Eleni Zacharia, Harry Dimitropoulos, Natalia Manola and Yannis Ioannidis, International Journal on Digital Libraries (IJDL), 2023