Kashif Rabbani

Semantic Data Engineer (Research Ontologist), Novo Nordisk, Denmark

PhD in Computer Science, from Aalborg University, Denmark

Kashif has a PhD in Computer Science with a specialization in Graph Databases. Since January 2024, he has been working as a Semantic Knowledge Engineer (Ontologist) at Novo Nordisk in Copenhagen, Denmark. He received his PhD from the Department of Computer Science at Aalborg University, Denmark.
Previously, he received a Joint Master's Degree in Big Data Management and Analytics (BDMA) from ULB, Belgium; UPC, Spain; and TU-Berlin, Germany, in 2019. He has also served as a Data Engineer at everis Barcelona in the Semantic Business Unit (SEMBU). His research focuses on graph databases, knowledge graphs, query optimization, data modeling, databases, big data management, and analytics.

Highlights

Kashif defended his PhD thesis on July 29, 2024.
Since January 2024, Kashif has been working as a Research Ontologist at Novo Nordisk, Måløv, Copenhagen, Denmark.
During his PhD, Kashif worked on a research project called RelWeb (A Reliable Web of Data).
His PhD is about "Exploiting Schemas for Efficient Query Processing over Knowledge Graphs", supervised by Prof. Katja Hose and Prof. Matteo Lissandrini.
During his PhD, he explored various approaches to optimizing query processing on the Web, specifically over Knowledge Graphs, by exploiting schematic constraints expressed in languages such as SHACL and ShEx. He was also involved in educational activities, including supervising 5th/6th semester Software Engineering groups in their projects and serving as a teaching assistant for the Database course.
He has also served as an external reviewer of scientific papers for VLDB and ISWC.

PhD Thesis

Scalable Extraction and Adoption of Shapes for Improving Data Quality and Query Processing in Knowledge Graphs
Kashif Rabbani, Aalborg University, Denmark

Publications

Transforming RDF Graphs to Property Graphs using Standardized Schemas
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS '25), June 22-27, 2025, Berlin, Germany.

Knowledge Graphs can be encoded using different data models. They are especially abundant using RDF and recently also as property graphs. While knowledge graphs in RDF adhere to the subject-predicate-object structure, property graphs utilize multi-labeled nodes and edges, featuring properties as key-value pairs. Both models are employed in various contexts, thus applications often require transforming data from one model to another. To enhance the interoperability of the two models, we present a novel technique, S3PG, to convert RDF knowledge graphs into property graphs exploiting two popular standards to express schema constraints, i.e., SHACL for RDF and PG-Schema for property graphs. S3PG is the first approach capable of transforming large knowledge graphs to property graphs while fully preserving information and semantics. We have evaluated S3PG on real-world large-scale graphs, showing that, while existing methods exhibit lossy transformations (causing a loss of up to 70% of query answers), S3PG consistently achieves 100% accuracy. Moreover, when considering evolving graphs, S3PG exhibits fully monotonic behavior and requires only a fraction of the time to incorporate changes compared to existing methods.

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. "Transforming RDF Graphs to Property Graphs using Standardized Schemas." Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS '25) 2 (4) (December 2024): 1-25.
SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the 2023 International Conference on Management of Data, (SIGMOD-Companion '23) June 18-23, 2023, Seattle, WA, USA.

We demonstrate SHACTOR, a system for extracting and analyzing validating shapes from very large Knowledge Graphs (KGs). Shapes represent a specific form of data patterns, akin to schemas for entities. Standard shape extraction approaches are likely to produce thousands of shapes, and some of those represent spurious constraints extracted due to the presence of erroneous data in the KG. Given a KG having tens of millions of triples and thousands of classes, SHACTOR parses the KG using our efficient and scalable shapes extraction algorithm and outputs SHACL shapes constraints. The extracted shapes are further annotated with statistical information regarding their support in the graph, which allows to identify both erroneous and missing triples in the KG. Hence, SHACTOR can be used to extract, analyze, and clean shape constraints from very large KGs. Furthermore, it enables the user to also find and correct errors by automatically generating SPARQL queries over the graph to retrieve nodes and facts that are the source of the spurious shapes and to intervene by amending the data.

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes. In Proceedings of the 2023 International Conference on Management of Data, (SIGMOD-Companion '23) June 18-23, 2023, Seattle, WA, USA.
Extraction of Validating Shapes from very large Knowledge Graphs
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the Very Large Databases 2023 (Volume 16 Issue 5, VLDB-2023), August 2023, Vancouver, Canada.

Knowledge Graphs (KGs) represent heterogeneous domain knowledge on the Web and within organizations. There exist shapes constraint languages to define validating shapes to ensure the quality of the data in KGs. Existing techniques to extract validating shapes often fail to extract complete shapes, are not scalable, and are prone to produce spurious shapes. To address these shortcomings, we propose the Quality Shapes Extraction (QSE) approach to extract validating shapes in very large graphs, for which we devise both an exact and an approximate solution. QSE provides information about the reliability of shape constraints by computing their confidence and support within a KG and in doing so allows to identify shapes that are most informative and less likely to be affected by incomplete or incorrect data. To the best of our knowledge, QSE is the first approach to extract a complete set of validating shapes from WikiData. Moreover, QSE provides a 12x reduction in extraction time compared to existing approaches, while managing to filter out up to 93% of the invalid and spurious shapes, resulting in a reduction of up to 2 orders of magnitude in the number of constraints presented to the user, e.g., from 11,916 to 809 on DBpedia.

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. Extraction of Validating Shapes from very large Knowledge Graphs. In Proceedings of the Very Large Databases 2023 (Volume 16), August 28 - September 2, 2023, Vancouver, Canada.
End-to-End Incremental Data Integration via Knowledge Graphs
Javier Flores, Kashif Rabbani, Sergi Nadal, Cristina Gómez, Oscar Romero, Emmanuel Jamin, Stamatia Dasiopoulou
In Semantic Web Journal (SWJ)

Data integration, the task of providing a unified view over a set of data sources, is undoubtedly a major challenge for the knowledge graph community. Indeed, such a flexible data structure allows modeling the characteristics of source schemata, rich semantics for the global schema, and the mappings between them. Yet, the design of such data integration systems still entails an arduous manual task. This becomes aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged semi-automatic and incremental data integration approach. By considering all tasks that compose the end-to-end data integration workflow (i.e., bootstrapping, schema matching, schema integration, and generation of querying constructs), we are able to address them in a unified manner. We provide algorithms for each task, as well as theoretically prove the correctness of our approach and experimentally show its practical applicability.

SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Companion Proceedings of the Web Conference 2022 (WWW'22 Companion), April 25-29 2022. DOI: 10.1145/3487553.3524253

Knowledge Graphs (KGs) are widely used to represent heterogeneous domain knowledge on the Web and within organizations. Various methods exist to manage KGs and ensure the quality of their data. Among these, the Shapes Constraint Language (SHACL) and the Shapes Expression Language (ShEx) are the two state-of-the-art languages to define validating shapes for KGs. Since the usage of these constraint languages has recently increased, new needs arose. One such need is to enable the efficient generation of these shapes. Yet, since these languages are relatively new, we witness a lack of understanding of how they are effectively employed for existing KGs. Therefore, in this work, we answer How validating shapes are being generated and adopted? Our contribution is threefold. First, we conducted a community survey to analyze the needs of users (both from industry and academia) generating validating shapes. Then, we cross-referenced our results with an extensive survey of the existing tools and their features. Finally, we investigated how existing automatic shape extraction approaches work in practice on real, large KGs. Our analysis shows the need for developing semi-automatic methods that can help users generate shapes from large KGs.

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption. In Companion Proceedings of the Web Conference 2022 (WWW'22 Companion), April 25-29, 2022, Lyon, France.
Optimizing SPARQL Queries using Shape Statistics
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021.

With the growing popularity of storing data in native RDF, we witness more and more diverse use cases with complex SPARQL queries. As a consequence, query optimization - and in particular cardinality estimation and join ordering - becomes even more crucial. Classical methods exploit global statistics covering the entire RDF graph as a whole, which naturally fails to correctly capture correlations that are very common in RDF datasets, which then leads to erroneous cardinality estimations and suboptimal query execution plans. The alternative of trying to capture correlations in a fine-granular manner, on the other hand, results in very costly preprocessing steps to create these statistics. Hence, in this paper we propose shapes statistics, which extend the recent SHACL standard with statistic information to capture the correlation between classes and properties. Our extensive experiments on synthetic and real data show that shapes statistics can be generated and managed with only little overhead without disadvantages in query runtime while leading to noticeable improvements in cardinality estimation.

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. Optimizing SPARQL Queries using Shape Statistics. Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021
ODIN: A dataspace management system
Nadal Francesch, Sergi, Kashif Rabbani, Óscar Romero Moral, and Shumet Tadesse Nigatu
In Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas): co-located with 18th International Semantic Web Conference, ISWC 2019.

ODIN (On-demand Data Integration) is a system that supports the incremental pay-as-you-go integration of data sources into dataspaces and provides user-friendly querying mechanisms of the resulting dataspaces. This website is a companion of a demonstration paper submitted to ISWC 2019, where we describe some of its characteristics and underlying assumptions, including the user interactions required. ODIN's novelty lies in a largely automated bottom-up approach (i.e., driven by the sources at hand) that includes the user in the loop for disambiguation purposes. ODIN relies on the concept of traceability graphs, which are generic metadata abstractions (i.e., not tailored for a specific task) about the integration of a particular set of data sources. From these graphs, ODIN is capable of generating target-oriented metadata constructs. In this demonstration we focus on those for query answering over dataspaces.

Nadal Francesch, Sergi, Kashif Rabbani, Óscar Romero Moral, and Shumet Tadesse Nigatu. "ODIN: A dataspace management system." In Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas): co-located with 18th International Semantic Web Conference (ISWC 2019): Auckland, New Zealand, October 26-30, 2019, pp. 185-188. CEUR-WS.org, 2019.
ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources.
Shumet Tadesse, Cristina Gómez, Oscar Romero, Katja Hose, Kashif Rabbani
In IEEE 23rd International Enterprise Distributed Object Computing Conference, EDOC 2019.

The current wealth of information, typically known as Big Data, generates a large amount of available data for organisations. Data Integration provides foundations to query disparate data sources as if they were integrated into a single source. However, current data integration tools are far from being useful for most organisations due to the heterogeneous nature of data sources, which represents a challenge for current frameworks. To enable data integration of highly heterogeneous and disparate data sources, this paper proposes a method to extract the schema from semi-structured (such as JSON and XML) and structured (such as relational) data sources, and generate an equivalent RDFS representation. The output of our method complements current frameworks and reduces the manual workload required to represent the input data sources in terms of the integration canonical data model. Our approach consists of production rules at the meta-model level that guarantee the correctness of the model translations. Finally, a tool for implementing our approach has been developed.

Tadesse, Shumet, Cristina Gómez, Oscar Romero, Katja Hose, and Kashif Rabbani. "ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources." In 2019 IEEE 23rd International Enterprise Distributed Object Computing Conference (EDOC), pp. 190-196. IEEE, 2019

Master Thesis

Thesis title: "Supporting the Semi-Automatic Creation of the Target Schema in Data Integration Systems". BDMA 2019
Supervisors: Prof. Dr. Oscar Romero (UPC), Prof. Dr. Volker Markl (TU-Berlin), and Dr. Ralf-Detlef Kutsche (on behalf of the BDMA steering board). Advisors: Prof. Dr. Oscar Romero (UPC), Mr. Shumet Tadesse Nigatu (UPC), and Dr. Ralf-Detlef Kutsche (TU-Berlin). Thesis available here.

Educational activities

Kashif's educational activities focus on supervising and assisting multiple groups of computer science and software engineering students at the bachelor's level at Aalborg University.

Teaching Assistant for Database Management System Course
Fall 2020, Fall 2021, and Fall 2022, BSc Software Engineering Course, Computer Science Department, Aalborg University.
Group Supervisor for 5th semester students working on the Knox (Knowledge Engineering Toolbox) project
Fall 2020 and Fall 2021, BSc Software Engineering - Semester Project, Computer Science Department, Aalborg University.
Group Supervisor for final semester students working on analyzing large-scale ship AIS data
Spring 2021, BSc Software Engineering - Bachelors Project, Computer Science Department, Aalborg University.

Additionally, he participates in various related conferences and summer/winter schools.

He attended the Tenth European Big Data Management & Analytics Summer School (eBISS) in 2022, where he presented a poster about his PhD. The poster is available at this link.

Achievements, Awards, and Grants

PhD Study Abroad Travel Grants 2023
- I was awarded travel grants by Otto Mønsted and Danish Data Science Academy for study abroad during my PhD studies.
Big Data Talent Awards 2019 (Runner-up)
- I was selected as a runner-up for the Big Data Talent Awards 2019 at UPC Barcelona, Spain, for my master's thesis titled "Dataspaces: Pay-As-you-go Data Integration".
Erasmus Mundus Scholarship
- I was awarded a fully funded scholarship for my master's degree in Big Data Management and Analytics (BDMA).
Gold Medal - Bachelors in Computer Science (BSCS)
- I won a campus Gold Medal award at COMSATS University Islamabad, Pakistan for the highest CGPA: 3.84/4.00 (Batch Spring 2013-17) of my Bachelor's degree in Computer Science (BSCS).
Silver Medal - Bachelors in Computer Science (BSCS)
- I won an institute Silver Medal award at COMSATS University Islamabad, Pakistan for the highest CGPA: 3.84/4.00 among all the institutes of COMSATS University for my Bachelor's degree in Computer Science (BSCS).
Open House Final Year Project Award (2nd Position)
- I got 2nd position for my Final Year Project during my Bachelor's degree in Computer Science (BSCS) at COMSATS University, Islamabad, Pakistan.

Blog Posts

Distributed GraphLab
- Read it on medium.com, Published in 2020.
Data Streams Mining
- Read it on medium.com, Published in 2020.
Metadata (Data about data)
- Read it on medium.com, Published in 2020.
Dark Silicon (Toward Dark Silicon in Servers)
- Read it on medium.com, Published in 2020.

Side Projects

Kashif has published a few applications on the Apple App Store and continues to maintain them.

Resume

Reach out to me if you want to see my resume.

Updated: April 2025