Introducing TQL: A Domain-Specific Language for Efficient Data Discovery

Thursday 11 September 2025

The quest for a general-purpose data discovery tool has long been a challenge in machine learning and data science. Progress has been held back by the lack of an expressive formal language and a corresponding implementation. In a recent paper, researchers propose TQL, a domain-specific language whose syntax and semantics both draw on results from programming languages research.

TQL is built on top of relational algebra with types (RAT), an algebraic model that provides a formal foundation for querying data. This approach allows TQL to take advantage of the vast body of research in type theory, which has been shown to be effective in ensuring the correctness and reliability of programs.

The TQL language is designed to be expressive and flexible, allowing users to define complex queries by combining relational algebra operations with type constraints. The language also supports join operators and complex data types.

A key innovation of TQL is its use of type inference techniques from programming languages research to prune the search space of candidate inputs before evaluation. Because candidates that cannot satisfy a query's type constraints are discarded without being evaluated, TQL can evaluate queries over large datasets efficiently, even in the presence of missing or noisy data.
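The pruning idea can be illustrated with a minimal Python sketch. It is not TQL's actual algorithm or syntax: the names `Schema`, `satisfies`, and `prune`, the example tables, and the use of Python types as a stand-in for TQL's type system are all assumptions made for illustration. The sketch only checks whether each candidate's schema provides the columns and types a query requires, and drops the candidates that do not.

```python
from typing import Dict, List

# Hypothetical stand-in for a typed schema: column name -> Python type.
Schema = Dict[str, type]


def satisfies(candidate: Schema, required: Schema) -> bool:
    """True if the candidate has every required column with the required type."""
    return all(candidate.get(col) is typ for col, typ in required.items())


def prune(candidates: Dict[str, Schema], required: Schema) -> List[str]:
    """Return the names of candidates whose schema meets the requirement."""
    return [name for name, schema in candidates.items()
            if satisfies(schema, required)]


# Illustrative catalog of candidate inputs (invented for this example).
candidates = {
    "patients": {"id": int, "age": int, "name": str},
    "visits": {"id": int, "date": str},          # no "age" column
    "ages_as_text": {"id": int, "age": str},     # "age" has the wrong type
}

# Requirement a query might impose on its input.
required = {"id": int, "age": int}

survivors = prune(candidates, required)
print(survivors)  # ['patients']
```

Only the schemas are inspected here, so no rows are read for the two rejected candidates. That is the sense in which pruning happens before evaluation.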

The researchers have also implemented a modular proof-of-concept prototype for TQL, which demonstrates the feasibility of their approach. The prototype has two main components: a choice function that statically analyzes the abstract syntax tree (AST) of the query to select candidate inputs, and an interpreter that evaluates the query over these inputs.
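The two-stage design described above (choice function, then interpreter) can be sketched in Python as follows. This is a toy under stated assumptions, not the prototype's code: the AST node classes `Input`, `Select`, and `Project`, the functions `required_schema`, `choose`, and `evaluate`, and the catalog contents are all hypothetical. The query language is reduced to equality selection and projection over a single unknown input.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple, Union

Row = Dict[str, Any]
Schema = Dict[str, type]


@dataclass
class Input:
    """Placeholder for the dataset to be discovered."""


@dataclass
class Select:
    child: "Query"
    column: str
    value: Any  # rows are kept when row[column] == value


@dataclass
class Project:
    child: "Query"
    columns: Tuple[str, ...]


Query = Union[Input, Select, Project]


def required_schema(q: Query) -> Dict[str, Optional[type]]:
    """Static analysis: walk the AST and collect column constraints.
    A value of None means the column must exist but its type is free."""
    if isinstance(q, Input):
        return {}
    req = required_schema(q.child)
    if isinstance(q, Select):
        req[q.column] = type(q.value)  # literal's type constrains the column
    else:
        for col in q.columns:
            req.setdefault(col, None)
    return req


def choose(q: Query, catalog: Dict[str, Tuple[Schema, List[Row]]]) -> List[str]:
    """Choice function: keep candidates whose schema meets the constraints."""
    req = required_schema(q)
    return [name for name, (schema, _) in catalog.items()
            if all(col in schema and (typ is None or schema[col] is typ)
                   for col, typ in req.items())]


def evaluate(q: Query, rows: List[Row]) -> List[Row]:
    """Interpreter: evaluate the query over one chosen input."""
    if isinstance(q, Input):
        return rows
    inner = evaluate(q.child, rows)
    if isinstance(q, Select):
        return [r for r in inner if r[q.column] == q.value]
    return [{c: r[c] for c in q.columns} for r in inner]


# Invented catalog: name -> (schema, rows).
catalog = {
    "cities": ({"name": str, "country": str},
               [{"name": "Lyon", "country": "FR"},
                {"name": "Bonn", "country": "DE"}]),
    "codes": ({"name": str, "country": int},
              [{"name": "France", "country": 33}]),
    "rivers": ({"name": str, "length_km": int},
               [{"name": "Rhine", "length_km": 1}]),
}

# "Names of rows whose country is 'FR'", over an as-yet-unknown input.
query = Project(Select(Input(), "country", "FR"), ("name",))

chosen = choose(query, catalog)
results = {name: evaluate(query, catalog[name][1]) for name in chosen}
print(chosen)   # ['cities']
print(results)  # {'cities': [{'name': 'Lyon'}]}
```

Keeping `choose` and `evaluate` as separate functions mirrors the modularity the article mentions: the analysis can be made more precise, or the interpreter swapped out, without touching the other half.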

TQL has several potential applications in data science and machine learning, including data exploration, feature engineering, and model training. It could also be used as a tool for automating data discovery tasks, such as identifying relevant datasets or generating hypotheses about relationships between variables.

The development of TQL is a step towards a general-purpose data discovery system that can handle the complexities of modern data analysis tasks efficiently. By building on advances in programming languages research, the researchers have created a language that is both expressive and flexible, with the potential to influence how data discovery is done in data science.

The TQL system is still in its early stages, but it has already shown promising results in experimental evaluations. As the technology matures, it could be used in a wide range of applications, from scientific research to business intelligence. If it can evaluate complex queries over large datasets efficiently at scale, TQL could become a valuable tool for data scientists and analysts.

Cite this article: “Introducing TQL: A Domain-Specific Language for Efficient Data Discovery”, The Science Archive, 2025.

Data Discovery, Machine Learning, Data Science, Domain-Specific Language, Relational Algebra, Type Theory, Query Language, Data Exploration, Feature Engineering, Model Training

Reference: Andrew Kang, Yashnil Saha, Sainyam Galhotra, “Towards General-Purpose Data Discovery: A Programming Languages Approach” (2025).
