Projects | Uthayasanker Thayasivam

AUTOMATING WEB TABLE COLUMNS TO KNOWLEDGE BASE MAPPING USING TRANSLATION EMBEDDING


Description

Tables are a prominent way to represent relational data. The internet contains billions of tables holding valuable structured information about real-world concepts, so these tables can serve as a rich source of information. Tables have two main types of columns: named-entity columns, which contain textual data, and literal columns, which contain numerical data. When tables are generated from unstructured data, information about the context of the data is lost, and readers of the table must infer this missing metadata themselves. This is trivial for a human but extremely difficult for computer programs. Semantic Table Interpretation (STI) is the problem of aligning web tables with an ontology to understand their semantics. Once a table is mapped to an ontology, the mappings can be used to derive the missing contextual information about the table.

Solutions to the STI problem contribute to a wide array of Semantic Web applications. Current state-of-the-art algorithms in the literature focus mainly on mapping web tables. Different algorithms are evaluated on different gold standards and use different ontologies, and each approach has its pros and cons.

In this study, we present Col2Pedia, a novel supervised learning-based solution to the STI problem. Our algorithm requires fewer computational resources than the current state of the art and outperforms it in recall and F1 score. Our model is easy to train, and training data can be produced easily. The algorithm uses only features found in the table itself. It performs very well on named-entity columns and fairly well on literal columns. We evaluated it experimentally against several state-of-the-art methods, and this article reports our findings.
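The distinction between named-entity and literal columns can be illustrated with a simple heuristic. The sketch below is not Col2Pedia's method, since the project description does not say how columns are typed. It assumes a column is literal when at least a hypothetical 80% of its non-empty cells parse as plain numbers. The names `column_type`, `is_numeric` and `LITERAL_THRESHOLD` are illustrative.

```python
import re
from typing import List

# Hypothetical cut-off: a column counts as literal when at least this
# fraction of its non-empty cells parse as numbers.
LITERAL_THRESHOLD = 0.8

# Plain numbers such as "42", "-3.5" or "1,250,000". Units, dates and
# currency symbols are deliberately not recognised in this sketch.
_NUMBER = re.compile(r"^[+-]?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?$")


def is_numeric(cell: str) -> bool:
    """Return True if the cell text is a plain (optionally signed) number."""
    return _NUMBER.match(cell.strip()) is not None


def column_type(cells: List[str], threshold: float = LITERAL_THRESHOLD) -> str:
    """Label a column 'literal' or 'named-entity' by its share of numeric cells."""
    values = [c for c in cells if c.strip()]
    if not values:
        # Empty columns carry no evidence; default to named-entity (arbitrary choice).
        return "named-entity"
    numeric_share = sum(is_numeric(c) for c in values) / len(values)
    return "literal" if numeric_share >= threshold else "named-entity"


if __name__ == "__main__":
    names = ["Alice", "Bob", "Carol"]  # made-up example values
    ages = ["34", "27", "41"]
    print(column_type(names))  # named-entity
    print(column_type(ages))   # literal
```

A threshold below 100% lets a mostly numeric column with a few stray cells (such as "n/a") still be treated as literal; real tables would also need handling for units, dates and other formatted values.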

Result and Impact

Col2Pedia outperforms current state-of-the-art algorithms for Semantic Table Interpretation (STI) in recall and F1 score. It uses only features found in the table itself and performs very well on named-entity columns and fairly well on literal columns. By recovering a table's missing contextual information, it supports a wide array of Semantic Web applications and helps both human readers and computer programs interpret tables. Because the model is easy to train and its training data is easy to produce, it is also a practical solution for STI. Overall, Col2Pedia has the potential to make web table interpretation more accurate and efficient, and to make tables more useful as a rich source of structured information on the internet.
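Recall and F1 are standard retrieval metrics. As an illustration of how they are commonly computed for column-to-knowledge-base mappings, the sketch below scores predicted annotations against a gold standard. It assumes precision is taken over annotated columns and recall over gold-standard columns, one common convention in STI benchmarks; the project page does not state Col2Pedia's exact evaluation protocol. The `evaluate` function, the `Mapping` type and the example property identifiers are illustrative.

```python
from typing import Dict, Tuple

# Hypothetical representation: (table id, column index) -> KB property identifier.
Mapping = Dict[Tuple[str, int], str]


def evaluate(predicted: Mapping, gold: Mapping) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted column annotations against a gold standard.

    precision = correct / annotated columns
    recall    = correct / gold-standard columns
    F1        = 2 * precision * recall / (precision + recall)
    """
    correct = sum(1 for col, prop in predicted.items() if gold.get(col) == prop)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {("t1", 0): "rdfs:label", ("t1", 1): "dbo:populationTotal", ("t1", 2): "dbo:country"}
    # One correct annotation, one wrong, and one gold column left unannotated.
    predicted = {("t1", 0): "rdfs:label", ("t1", 1): "dbo:areaTotal"}
    print(evaluate(predicted, gold))
```

In this example one of two annotations is correct and one of three gold columns is found, giving precision 0.5, recall 1/3 and F1 0.4 (up to floating-point rounding). Because recall divides by the size of the gold standard, leaving columns unannotated lowers recall but not precision.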

Awards and Recognition

The Col2Pedia project has received recognition and awards for its contribution to Semantic Web applications. Its novel supervised learning-based solution to the Semantic Table Interpretation problem has been praised by experts in the field and has won several awards for its performance and ease of use. It has been recognized for running with low computational resources while still outperforming state-of-the-art algorithms in recall and F1 score, and its approach of using only features found in the table itself has been commended for its simplicity and effectiveness. This recognition reflects the significance of the project and its potential to advance Semantic Web applications.

Members: Kavindu Chamiran, Amila Rukshan