Kashif Rabbani

Transforming RDF Graphs to Property Graphs using Standardized Schemas
Kashif Rabbani, Matteo Lissandrini, Angela Bonifati, and Katja Hose
Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS '25), June 22-27, 2025, Berlin, Germany.

Knowledge Graphs can be encoded using different data models. They are especially abundant in RDF and, more recently, also as property graphs. While knowledge graphs in RDF adhere to the subject-predicate-object structure, property graphs use multi-labeled nodes and edges with properties expressed as key-value pairs. Both models are employed in various contexts, so applications often need to transform data from one model to the other. To improve the interoperability of the two models, we present S3PG, a novel technique for converting RDF knowledge graphs into property graphs that exploits two popular standards for expressing schema constraints: SHACL for RDF and PG-Schema for property graphs. S3PG is the first approach capable of transforming large knowledge graphs into property graphs while fully preserving information and semantics. We evaluated S3PG on real-world large-scale graphs: while existing methods perform lossy transformations (losing up to 70% of query answers), S3PG consistently achieves 100% accuracy. Moreover, on evolving graphs, S3PG exhibits fully monotonic behavior and needs only a fraction of the time that existing methods require to incorporate changes.
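
To make the contrast between the two models concrete, here is a minimal Python sketch of a naive direct mapping, not S3PG itself: `rdf:type` objects become node labels, literal-valued triples become key-value properties, and IRI-valued triples become edges. The names `Literal`, `Node`, `PropertyGraph`, and `rdf_to_pg` are hypothetical. Note that the literal datatype is discarded, which is the kind of information loss a schema-driven approach guided by SHACL shapes and PG-Schema is meant to avoid.

```python
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Literal:
    """An RDF literal: a lexical value plus its datatype IRI."""
    value: str
    datatype: str


@dataclass
class Node:
    labels: set = field(default_factory=set)
    props: dict = field(default_factory=dict)  # property key -> list of values


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)  # IRI -> Node
    edges: list = field(default_factory=list)  # (source IRI, edge label, target IRI)

    def node(self, iri):
        # Create a node on first reference, so each IRI maps to exactly one node.
        return self.nodes.setdefault(iri, Node())


def rdf_to_pg(triples):
    """Naive direct mapping from (subject, predicate, object) triples to a property graph."""
    pg = PropertyGraph()
    for s, p, o in triples:
        subject = pg.node(s)
        if p == RDF_TYPE:
            subject.labels.add(o)  # class membership becomes a node label
        elif isinstance(o, Literal):
            # Keep every value so multi-valued properties survive, but the datatype is dropped.
            subject.props.setdefault(p, []).append(o.value)
        else:
            pg.node(o)
            pg.edges.append((s, p, o))  # IRI-valued triple becomes an edge
    return pg
```

For example, a triple `ex:alice ex:knows ex:bob` yields an edge labelled `ex:knows`, whereas `ex:alice ex:name "Alice"` yields a property keyed by `ex:name` on the `ex:alice` node.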

Rabbani, Kashif; Lissandrini, Matteo; Bonifati, Angela; and Hose, Katja. "Transforming RDF Graphs to Property Graphs using Standardized Schemas." Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS '25) 2(4) (December 2024): 1-25.

SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the 2023 International Conference on Management of Data (SIGMOD-Companion '23), June 18-23, 2023, Seattle, WA, USA.

We demonstrate SHACTOR, a system for extracting and analyzing validating shapes from very large Knowledge Graphs (KGs). Shapes are a specific form of data pattern, akin to schemas for entities. Standard shape extraction approaches tend to produce thousands of shapes, some of which are spurious constraints extracted because of erroneous data in the KG. Given a KG with tens of millions of triples and thousands of classes, SHACTOR parses the KG using our efficient and scalable shape extraction algorithm and outputs SHACL shape constraints. The extracted shapes are further annotated with statistics about their support in the graph, which helps identify both erroneous and missing triples in the KG. Hence, SHACTOR can be used to extract, analyze, and clean shape constraints from very large KGs. It also lets users find and correct errors: it automatically generates SPARQL queries that retrieve the nodes and facts responsible for spurious shapes, so that users can intervene by amending the data.
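
As a rough illustration of how support annotations and generated queries fit together, the Python sketch below flags property shapes whose support falls under a threshold and builds a SPARQL `SELECT` query that retrieves the nodes and values behind such a shape. The `PropertyShape` fields, the threshold, and the query template are assumptions for illustration; they are not SHACTOR's actual data model or query generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyShape:
    """Hypothetical summary of one extracted constraint: instances of target_class
    have property `path` whose values are instances of object_class."""
    target_class: str
    path: str
    object_class: str
    support: int  # number of entities in the KG exhibiting this pattern


def spurious_shapes(shapes, min_support):
    """Shapes backed by very few entities are candidates for being caused by erroneous data."""
    return [s for s in shapes if s.support < min_support]


def sparql_for_shape(shape):
    """Build a query returning the nodes and values that give rise to `shape`."""
    return (
        "SELECT ?node ?value WHERE {\n"
        f"  ?node a <{shape.target_class}> .\n"
        f"  ?node <{shape.path}> ?value .\n"
        f"  ?value a <{shape.object_class}> .\n"
        "}"
    )
```

Running the returned query against the KG lists the offending facts (for instance, a `birthPlace` that points to a person instead of a city), which the user can then correct or delete.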

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes. In Proceedings of the 2023 International Conference on Management of Data (SIGMOD-Companion '23), June 18-23, 2023, Seattle, WA, USA.

Extraction of Validating Shapes from very large Knowledge Graphs
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the VLDB Endowment, Volume 16, Issue 5 (VLDB 2023), August 2023, Vancouver, Canada.

Knowledge Graphs (KGs) represent heterogeneous domain knowledge on the Web and within organizations. Shape constraint languages exist to define validating shapes that ensure the quality of the data in KGs. Existing techniques for extracting validating shapes often fail to extract complete shapes, do not scale, and are prone to producing spurious shapes. To address these shortcomings, we propose the Quality Shapes Extraction (QSE) approach for extracting validating shapes from very large graphs, for which we devise both an exact and an approximate solution. QSE quantifies the reliability of shape constraints by computing their confidence and support within a KG, which makes it possible to identify the shapes that are most informative and least likely to be affected by incomplete or incorrect data. To the best of our knowledge, QSE is the first approach to extract a complete set of validating shapes from Wikidata. Moreover, QSE provides a 12x reduction in extraction time compared to existing approaches while filtering out up to 93% of the invalid and spurious shapes, reducing the number of constraints presented to the user by up to two orders of magnitude (e.g., from 11,916 to 809 on DBpedia).
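
The support and confidence measures can be sketched as follows, assuming that the support of a constraint (class C, property p, object type T) is the number of distinct instances of C that have at least one p-value of type T, and that its confidence is that support divided by the number of instances of C. The function names and the `Untyped` placeholder for objects without a known class are hypothetical; QSE's exact and approximate algorithms are more involved than this in-memory counting.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def shape_statistics(triples):
    """Map (class, property, object type) -> (support, confidence)."""
    types = defaultdict(set)  # entity -> classes it is declared to belong to
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    class_size = defaultdict(int)  # class -> number of instances
    for classes in types.values():
        for c in classes:
            class_size[c] += 1

    supporters = defaultdict(set)  # constraint -> distinct entities exhibiting it
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        # Objects without an rdf:type (e.g. literals) share one placeholder bucket.
        object_types = types.get(o) or {"Untyped"}
        for c in types.get(s, ()):
            for t in object_types:
                supporters[(c, p, t)].add(s)

    return {
        key: (len(entities), len(entities) / class_size[key[0]])
        for key, entities in supporters.items()
    }


def prune(stats, min_support, min_confidence):
    """Keep only constraints that are frequent and reliable enough to show the user."""
    return {
        key: (support, confidence)
        for key, (support, confidence) in stats.items()
        if support >= min_support and confidence >= min_confidence
    }
```

For instance, if two of three persons have a city as `birthPlace` and the third erroneously points to another person, the spurious (Person, birthPlace, Person) constraint gets support 1 and confidence 1/3 and is removed by `prune(stats, 2, 0.5)`, while (Person, birthPlace, City) is kept.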

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. Extraction of Validating Shapes from very large Knowledge Graphs. In Proceedings of the VLDB Endowment (Volume 16), VLDB 2023, August 28 - Sept 02, 2023, Vancouver, Canada.

End-to-End Incremental Data Integration via Knowledge Graphs
Javier Flores, Kashif Rabbani, Sergi Nadal, Cristina Gómez, Oscar Romero, Emmanuel Jamin, Stamatia Dasiopoulou
In Semantic Web Journal (SWJ)

Data integration, the task of providing a unified view over a set of data sources, is undoubtedly a major challenge for the knowledge graph community. Indeed, knowledge graphs are a flexible data structure that can model the characteristics of source schemata, rich semantics for the global schema, and the mappings between them. Yet designing such data integration systems remains an arduous manual task, which becomes even harder when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged, semi-automatic, and incremental data integration approach. By considering all tasks that compose the end-to-end data integration workflow (i.e., bootstrapping, schema matching, schema integration, and generation of querying constructs), we are able to address them in a unified manner. We provide algorithms for each task, theoretically prove the correctness of our approach, and experimentally show its practical applicability.
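
To give a feel for what "incremental" means here, the following Python sketch integrates sources one at a time into a growing global schema: each source attribute is matched against existing global features, unmatched attributes become new features, and the recorded mappings tell a query which source attributes to combine. The class `GlobalSchema` and its case-insensitive name matcher are hypothetical stand-ins; the approach described above relies on its own algorithms for bootstrapping, schema matching, and schema integration.

```python
from collections import defaultdict


class GlobalSchema:
    """Hypothetical global schema that grows as sources are integrated one by one."""

    def __init__(self):
        # global feature -> list of (source, source attribute) pairs mapped to it
        self.mappings = defaultdict(list)

    @staticmethod
    def match(attribute, features):
        """Toy schema matcher: case-insensitive name equality stands in for real matching."""
        for feature in features:
            if feature.lower() == attribute.lower():
                return feature
        return None

    def integrate(self, source, attributes):
        """Add one source; existing features and mappings are extended, never rewritten."""
        for attribute in attributes:
            feature = self.match(attribute, list(self.mappings))
            if feature is None:
                feature = attribute  # unmatched attribute becomes a new global feature
            self.mappings[feature].append((source, attribute))

    def sources_for(self, feature):
        """Querying construct: the source attributes to combine when a query asks for `feature`."""
        return list(self.mappings.get(feature, []))
```

Because a new source only appends mappings, previously integrated sources and the queries built on them remain valid, which is what makes the integration incremental.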

SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Companion Proceedings of the Web Conference 2022 (WWW'22 Companion), April 25-29, 2022. DOI: 10.1145/3487553.3524253

Knowledge Graphs (KGs) are widely used to represent heterogeneous domain knowledge on the Web and within organizations. Various methods exist to manage KGs and ensure the quality of their data. Among these, the Shapes Constraint Language (SHACL) and Shape Expressions (ShEx) are the two state-of-the-art languages for defining validating shapes for KGs. As the usage of these constraint languages has recently increased, new needs have arisen. One such need is the efficient generation of these shapes. Yet, since these languages are relatively new, there is a lack of understanding of how they are effectively employed for existing KGs. Therefore, in this work, we address the question: how are validating shapes being generated and adopted? Our contribution is threefold. First, we conducted a community survey to analyze the needs of users (both from industry and academia) who generate validating shapes. Then, we cross-referenced our results with an extensive survey of existing tools and their features. Finally, we investigated how existing automatic shape extraction approaches work in practice on real, large KGs. Our analysis shows the need for developing semi-automatic methods that can help users generate shapes from large KGs.

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption. In Companion Proceedings of the Web Conference 2022 (WWW'22 Companion), April 25-29, 2022, Lyon, France.

Optimizing SPARQL Queries using Shape Statistics
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021.

With the growing popularity of storing data in native RDF, we witness increasingly diverse use cases with complex SPARQL queries. As a consequence, query optimization, and in particular cardinality estimation and join ordering, becomes even more crucial. Classical methods exploit global statistics covering the entire RDF graph as a whole. Such statistics naturally fail to capture the correlations that are very common in RDF datasets, which leads to erroneous cardinality estimates and suboptimal query execution plans. Capturing correlations at a fine granularity, on the other hand, requires very costly preprocessing steps to create these statistics. Hence, in this paper we propose shapes statistics, which extend the recent SHACL standard with statistical information that captures the correlation between classes and properties. Our extensive experiments on synthetic and real data show that shapes statistics can be generated and managed with little overhead and without disadvantages in query runtime, while leading to noticeable improvements in cardinality estimation.
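
To see why correlations matter, consider the star pattern `?x rdf:type C . ?x p ?o`. The Python sketch below contrasts a global-statistics estimate, which assumes the class and the property are independent, with an estimate read from per-(class, property) counts of the kind that shapes statistics can attach to a SHACL node shape. Which counts shapes statistics actually record, and how they are combined for larger queries, is defined in the paper; the function names and dictionary layout here are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_statistics(triples):
    """Collect graph-wide counts alongside per-(class, property) counts."""
    types = defaultdict(set)  # subject -> its classes
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    class_count = defaultdict(int)  # class -> number of instances
    for classes in types.values():
        for c in classes:
            class_count[c] += 1

    global_count = defaultdict(int)  # property -> triples in the whole graph
    shape_count = defaultdict(int)   # (class, property) -> triples whose subject is in class
    subjects = set()
    for s, p, o in triples:
        subjects.add(s)
        if p == RDF_TYPE:
            continue
        global_count[p] += 1
        for c in types.get(s, ()):
            shape_count[(c, p)] += 1
    return {"class": class_count, "global": global_count,
            "shape": shape_count, "subjects": len(subjects)}


def estimate_global(stats, cls, prop):
    """Estimate |{?x a cls . ?x prop ?o}| assuming class and property are independent."""
    avg_per_subject = stats["global"].get(prop, 0) / stats["subjects"]
    return stats["class"].get(cls, 0) * avg_per_subject


def estimate_shape(stats, cls, prop):
    """Read the per-class count directly, so the class/property correlation is captured."""
    return stats["shape"].get((cls, prop), 0)
```

If names occur only on persons, `estimate_global` still predicts named cities: with four cities, two persons, and two `name` triples, it returns 4 * 2 / 6 ≈ 1.33, whereas `estimate_shape` returns the correct 0.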

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. Optimizing SPARQL Queries using Shape Statistics. In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021.

ODIN: A dataspace management system
Nadal Francesch, Sergi, Kashif Rabbani, Óscar Romero Moral, and Shumet Tadesse Nigatu
In Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas): co-located with 18th International Semantic Web Conference, ISWC 2019.

ODIN (On-demand Data Integration) is a system that supports the incremental, pay-as-you-go integration of data sources into dataspaces and provides user-friendly mechanisms for querying the resulting dataspaces. This website is a companion to a demonstration paper submitted to ISWC 2019, where we describe some of its characteristics and underlying assumptions, including the user interactions required. ODIN's novelty lies in a largely automated bottom-up approach (i.e., driven by the sources at hand) that keeps the user in the loop for disambiguation purposes. ODIN relies on the concept of traceability graphs, which are generic metadata abstractions (i.e., not tailored to a specific task) describing the integration of a particular set of data sources. From these graphs, ODIN can generate target-oriented metadata constructs. In this demonstration, we focus on those for query answering over dataspaces.

Nadal Francesch, Sergi, Kashif Rabbani, Óscar Romero Moral, and Shumet Tadesse Nigatu. "ODIN: A dataspace management system." In Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas): co-located with 18th International Semantic Web Conference (ISWC 2019): Auckland, New Zealand, October 26-30, 2019, pp. 185-188. CEUR-WS. org, 2019

ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources.
Shumet Tadesse, Cristina Gómez, Oscar Romero, Katja Hose, Kashif Rabbani
In IEEE 23rd International Enterprise Distributed Object Computing Conference, EDOC 2019.

The current wealth of information, commonly known as Big Data, makes large amounts of data available to organisations. Data integration provides the foundations for querying disparate data sources as if they were integrated into a single source. However, current data integration tools are far from useful for most organisations because of the heterogeneous nature of data sources, which poses a challenge for current frameworks. To enable the integration of highly heterogeneous and disparate data sources, this paper proposes a method that extracts the schema from semi-structured (such as JSON and XML) and structured (such as relational) data sources and generates an equivalent RDFS representation. The output of our method complements current frameworks and reduces the manual workload required to represent the input data sources in terms of the canonical data model used for integration. Our approach consists of production rules at the meta-model level that guarantee the correctness of the model translations. Finally, we have developed a tool that implements our approach.
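
A minimal Python sketch of the JSON case, assuming a simple rule set: each JSON object becomes an `rdfs:Class`, each key an `rdf:Property` with that class as its `rdfs:domain`, primitive values give an XSD datatype as `rdfs:range`, and nested objects recursively become new classes. These rules, the `ex:` prefix, and the function names `xsd_type` and `json_to_rdfs` are illustrative assumptions, not ARDI's production rules, which operate at the meta-model level.

```python
RDF_TYPE = "rdf:type"


def xsd_type(value):
    """Map a JSON primitive to an XSD datatype (prefixed name)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "xsd:boolean"
    if isinstance(value, int):
        return "xsd:integer"
    if isinstance(value, float):
        return "xsd:double"
    return "xsd:string"  # fallback for strings, null, and empty arrays


def json_to_rdfs(class_name, obj, triples=None):
    """Derive RDFS schema triples from a parsed JSON object (a Python dict)."""
    if triples is None:
        triples = set()
    cls = f"ex:{class_name}"
    triples.add((cls, RDF_TYPE, "rdfs:Class"))
    for key, value in obj.items():
        prop = f"ex:{key}"
        triples.add((prop, RDF_TYPE, "rdf:Property"))
        triples.add((prop, "rdfs:domain", cls))
        if isinstance(value, list) and value:
            value = value[0]  # simplification: infer an array's range from its first element
        if isinstance(value, dict):
            nested = key[:1].upper() + key[1:]  # e.g. "address" -> class ex:Address
            triples.add((prop, "rdfs:range", f"ex:{nested}"))
            json_to_rdfs(nested, value, triples)
        else:
            triples.add((prop, "rdfs:range", xsd_type(value)))
    return triples
```

Because properties are named after keys alone, the same key appearing in two different objects would receive two `rdfs:domain` statements; a fuller translation would need to disambiguate such cases.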

Tadesse, Shumet, Cristina Gómez, Oscar Romero, Katja Hose, and Kashif Rabbani. "ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources." In 2019 IEEE 23rd International Enterprise Distributed Object Computing Conference (EDOC), pp. 190-196. IEEE, 2019
