Kashif Rabbani 
Semantic Data Engineer (Research Ontologist),
Novo Nordisk, Denmark
PhD in Computer Science, from Aalborg University, Denmark
Twitter
LinkedIn
GitHub
Google Scholar
Kashif has a PhD in Computer Science with a specialization in Graph Databases. He has been working as a Semantic Knowledge Engineer (Ontologist) at Novo Nordisk in Copenhagen, Denmark, since January 2024.
He received his PhD from the Department of Computer Science at Aalborg University, Denmark.
Previously, he received a Joint Master's Degree in Big Data Management and Analytics (BDMA) from ULB, Belgium; UPC, Spain; and TU-Berlin, Germany, in 2019. He has also served as a Data Engineer at everis Barcelona in the Semantic Business Unit (SEMBU). His research focuses on graph databases, knowledge graphs, query optimization, data modeling, databases, big data management, and analytics.
Highlights
- Kashif defended his PhD thesis on July 29, 2024.
- Since January 2024, Kashif has been working as a Research Ontologist at Novo Nordisk,
Måløv, Copenhagen, Denmark.
- During his PhD, Kashif worked on a research project called RelWeb (A Reliable Web of Data).
- His PhD is about "Exploiting Schemas for Efficient Query Processing over Knowledge Graphs", supervised by
Prof. Katja Hose and
Prof. Matteo Lissandrini.
- During his PhD, he explored various approaches to optimizing query processing on the Web, specifically over Knowledge Graphs, by exploiting schema constraints expressed in languages such as SHACL and ShEx. He was also involved in educational activities, such as supervising 5th- and 6th-semester Software Engineering groups in their projects and serving as a teaching assistant for the Database course.
-
He has also served as an external reviewer of scientific papers for VLDB and ISWC.
PhD Thesis
-
Scalable Extraction and Adoption of Shapes for Improving Data Quality and Query Processing in Knowledge Graphs
Kashif Rabbani,
Aalborg University, Denmark
Publisher Link
Publications
-
Transforming RDF Graphs to Property Graphs using Standardized Schemas
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS '25), June 22-27, 2025, Berlin, Germany.
Knowledge Graphs can be encoded using different data models. They are especially abundant using RDF and recently also as property graphs. While knowledge graphs in RDF adhere to the subject-predicate-object structure, property graphs utilize multi-labeled nodes and edges, featuring properties as key-value pairs. Both models are employed in various contexts, thus applications often require transforming data from one model to another. To enhance the interoperability of the two models, we present a novel technique, S3PG, to convert RDF knowledge graphs into property graphs exploiting two popular standards to express schema constraints, i.e., SHACL for RDF and PG-Schema for property graphs. S3PG is the first approach capable of transforming large knowledge graphs to property graphs while fully preserving information and semantics. We have evaluated S3PG on real-world large-scale graphs, showing that, while existing methods exhibit lossy transformations (causing a loss of up to 70% of query answers), S3PG consistently achieves 100% accuracy. Moreover, when considering evolving graphs, S3PG exhibits fully monotonic behavior and requires only a fraction of the time to incorporate changes compared to existing methods.
Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. "Transforming RDF Graphs to Property Graphs using Standardized Schemas." Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS '25) 2(4) (December 2024): 1-25.
-
SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the 2023 International Conference on Management of Data, (SIGMOD-Companion '23) June 18-23, 2023, Seattle, WA, USA.
We demonstrate SHACTOR, a system for extracting and analyzing validating shapes from very large
Knowledge Graphs (KGs). Shapes represent a specific form of data patterns, akin to schemas for
entities. Standard shape extraction approaches are likely to produce thousands of shapes, and some
of those represent spurious constraints extracted due to the presence of erroneous data in the KG.
Given a KG having tens of millions of triples and thousands of classes, SHACTOR parses the KG using
our efficient and scalable shapes extraction algorithm and outputs SHACL shapes constraints. The
extracted shapes are further annotated with statistical information regarding their support in the
graph, which allows to identify both erroneous and missing triples in the KG. Hence, SHACTOR can be
used to extract, analyze, and clean shape constraints from very large KGs. Furthermore, it enables
the user to also find and correct errors by automatically generating SPARQL queries over the graph
to retrieve nodes and facts that are the source of the spurious shapes and to intervene by amending
the data.
Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. SHACTOR: Improving the
Quality of Large-Scale Knowledge Graphs with Validating Shapes. In Proceedings of the 2023
International Conference on Management of Data, (SIGMOD-Companion '23)
June 18-23, 2023, Seattle, WA, USA.
-
Extraction of Validating Shapes from very large Knowledge Graphs
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the Very Large Databases 2023 (Volume 16, Issue 5, VLDB-2023), August 2023, Vancouver, Canada.
Knowledge Graphs (KGs) represent heterogeneous domain knowledge on the Web and within organizations.
There exist shapes constraint languages to define validating shapes to ensure the quality of the
data in KGs.
Existing techniques to extract validating shapes often fail to extract complete shapes, are not
scalable, and are prone to produce spurious shapes.
To address these shortcomings, we propose the Quality Shapes Extraction (QSE) approach to extract
validating shapes in very large graphs, for which we devise both an exact and an approximate
solution.
QSE provides information about the reliability of shape constraints by computing their confidence
and support within a KG and in doing so allows to identify shapes that are most informative and less
likely to be affected by incomplete or incorrect data.
To the best of our knowledge, QSE is the first approach to extract a complete set of validating
shapes from WikiData.
Moreover, QSE provides a 12x reduction in extraction time compared to existing approaches,
while managing to filter out up to 93% of the invalid and spurious shapes, resulting in a reduction
of up to 2 orders of magnitude in the number of constraints presented to the user, e.g., from 11,916
to 809 on DBpedia.
Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. Extraction of Validating
Shapes from very large Knowledge Graphs. In Proceedings of the Very Large Databases 2023
(Volume 16), August 28 - Sept 02, 2023, Vancouver, Canada.
-
End-to-End Incremental Data Integration via Knowledge Graphs
Javier Flores, Kashif Rabbani, Sergi Nadal, Cristina Gómez, Oscar Romero, Emmanuel Jamin, and Stamatia Dasiopoulou
In Semantic Web Journal (SWJ)
Data integration, the task of providing a unified view over a set of data sources, is undoubtedly a
major challenge for the knowledge graph community. Indeed, such flexible data structure allows to
model the characteristics of source schemata, rich semantics for the global schema and the mappings
between them. Yet, the design of such data integration systems still entails a manually arduous
task. This becomes aggravated when dealing with heterogeneous and evolving data sources. To overcome
these issues, we propose a fully-fledged semi-automatic and incremental data integration approach.
By considering all tasks that compose the end-to-end data integration workflow (i.e., bootstrapping,
schema matching, schema integration and generation of querying constructs), we are able to address
them in a unified manner. We provide algorithms for each task, as well as theoretically prove the
correctness of our approach and experimentally show its practical applicability.
-
SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Companion Proceedings of the Web Conference 2022 (WWW'22 Companion), April 25-29, 2022. DOI: 10.1145/3487553.3524253
Knowledge Graphs (KGs) are widely used to represent heterogeneous domain knowledge on the Web and
within organizations. Various methods exist to manage KGs and ensure the quality of their data.
Among these, the Shapes Constraint Language (SHACL) and the Shapes Expression Language (ShEx) are
the two state-of-the-art languages to define validating shapes for KGs. Since the usage of these
constraint languages has recently increased, new needs arose. One such need is to enable the
efficient generation of these shapes. Yet, since these languages are relatively new, we witness a
lack of understanding of how they are effectively employed for existing KGs. Therefore, in this
work, we answer the question: how are validating shapes being generated and adopted? Our contribution is
threefold. First, we conducted a community survey to analyze the needs of users (both from industry
and academia) generating validating shapes. Then, we cross-referenced our results with an extensive
survey of the existing tools and their features. Finally, we investigated how existing automatic
shape extraction approaches work in practice on real, large KGs. Our analysis shows the need for
developing semi-automatic methods that can help users generate shapes from large KGs.
Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. SHACL and ShEx in the
Wild: A Community Survey on Validating Shapes Generation and Adoption.
In Companion Proceedings of the Web Conference 2022 (WWW'22 Companion), April 25-29, 2022,
Lyon, France.
-
Optimizing SPARQL Queries using Shape Statistics
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021.
With the growing popularity of storing data in native RDF, we witness more and more diverse use
cases with complex SPARQL queries. As a consequence, query optimization - and in particular
cardinality estimation and join ordering - becomes even more crucial. Classical methods exploit
global statistics covering the entire RDF graph as a whole, which naturally fails to correctly
capture correlations that are very common in RDF datasets, which then leads to erroneous cardinality
estimations and suboptimal query execution plans. The alternative of trying to capture correlations
in a fine-granular manner, on the other hand, results in very costly preprocessing steps to create
these statistics. Hence, in this paper we propose shapes statistics, which extend the recent SHACL
standard with statistic information to capture the correlation between classes and properties. Our
extensive experiments on synthetic and real data show that shapes statistics can be generated and
managed with only little overhead without disadvantages in query runtime while leading to noticeable
improvements in cardinality estimation.
Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. Optimizing SPARQL Queries
using Shape Statistics. Proceedings of the 24th International Conference on Extending
Database Technology, EDBT 2021
-
ODIN: A dataspace management system
Sergi Nadal Francesch, Kashif Rabbani, Óscar Romero Moral, and Shumet Tadesse Nigatu
In Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry,
and Outrageous Ideas): co-located with 18th International Semantic Web Conference, ISWC 2019.
ODIN (On-demand Data Integration) is a system that supports the incremental pay-as-you-go
integration of data sources into dataspaces and provides user-friendly querying mechanisms of the
resulting dataspaces. This website is a companion of a demonstration paper submitted to ISWC 2019,
where we describe some of its characteristics and underlying assumptions, including the user
interactions required. ODIN's novelty lies in a largely automated bottom-up approach (i.e., driven
by the sources at hand) that includes the user in the loop for disambiguation purposes. ODIN relies
on the concept of traceability graphs, which are generic metadata abstractions (i.e., not tailored
for a specific task) about the integration of a particular set of data sources. From these graphs,
ODIN is capable of generating target-oriented metadata constructs. In this demonstration we focus on
those for query answering over dataspaces.
Nadal Francesch, Sergi, Kashif Rabbani, Óscar Romero Moral, and Shumet Tadesse Nigatu.
"ODIN: A dataspace management system." In Proceedings of the ISWC 2019 Satellite Tracks
(Posters & Demonstrations, Industry, and Outrageous Ideas): co-located with 18th International
Semantic Web Conference (ISWC 2019): Auckland, New Zealand, October 26-30, 2019, pp. 185-188.
CEUR-WS.org, 2019
-
ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources.
Shumet Tadesse, Cristina Gómez, Oscar Romero, Katja Hose, Kashif Rabbani
In IEEE 23rd International Enterprise Distributed Object Computing Conference, EDOC 2019.
The current wealth of information, typically known as Big Data, generates a large amount of
available data for organisations. Data Integration provides foundations to query disparate data
sources as if they were integrated into a single source. However, current data integration tools are
far from being useful for most organisations due to the heterogeneous nature of data sources, which
represents a challenge for current frameworks. To enable data integration of highly heterogeneous
and disparate data sources, this paper proposes a method to extract the schema from semi-structured
(such as JSON and XML) and structured (such as relational) data sources, and generate an equivalent
RDFS representation. The output of our method complements current frameworks and reduces the manual
workload required to represent the input data sources in terms of the integration canonical data
model. Our approach consists of production rules at the meta-model level that guarantee the
correctness of the model translations. Finally, a tool for implementing our approach has been
developed.
Tadesse, Shumet, Cristina Gómez, Oscar Romero, Katja Hose, and Kashif Rabbani.
"ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources." In 2019 IEEE
23rd International Enterprise Distributed Object
Computing Conference (EDOC), pp. 190-196. IEEE, 2019
Master Thesis
- Thesis title: "Supporting the Semi-Automatic Creation of the Target Schema in Data Integration Systems".
BDMA 2019
Supervisors: Prof. Dr. Oscar Romero (UPC), Prof. Dr. Volker Markl (TU-Berlin), and Dr. Ralf-Detlef Kutsche
(on behalf of the BDMA steering board).
Advisors: Prof. Dr. Oscar Romero (UPC), Mr. Shumet Tadesse Nigatu (UPC), and Dr. Ralf-Detlef Kutsche
(TU-Berlin).
Thesis available
here.
Educational activities
Kashif's educational activities focus on supervising and assisting
multiple groups of computer science and software engineering students at the bachelor's level at Aalborg University.
-
Teaching Assistant for Database Management System Course
Fall 2020, Fall 2021, and Fall 2022, BSc Software Engineering Course, Computer Science Department, Aalborg University.
-
Group Supervisor for 5th semester students working on the Knox (Knowledge Engineering Toolbox) project
Fall 2020 and Fall 2021, BSc Software Engineering - Semester Project, Computer Science Department, Aalborg University.
-
Group Supervisor for final semester students working on analyzing large-scale ship AIS data
Spring 2021, BSc Software Engineering - Bachelors Project, Computer Science Department, Aalborg University.
Additionally, he participates in various related conferences and summer/winter schools.
Achievements, Awards, and Grants
-
PhD Study Abroad Travel Grants 2023
- I was awarded travel grants by Otto Mønsted and Danish
Data Science Academy for study abroad during my PhD studies.
-
Big Data Talent Awards 2019 (Runner-up)
- I was selected as a runner-up for the Big Data Talent Awards 2019 at UPC Barcelona, Spain, for my master's thesis titled "Dataspaces: Pay-As-you-go Data Integration".
-
Erasmus Mundus Scholarship
- I was awarded a fully funded scholarship for my master's degree in
Big Data Management and Analytics (BDMA).
-
Gold Medal - Bachelors in Computer Science (BSCS)
-
Silver Medal - Bachelors in Computer Science (BSCS)
- I won an institute Silver Medal award at COMSATS University Islamabad, Pakistan for the highest CGPA: 3.84/4.00 among all the institutes of COMSATS University for my Bachelor's degree in Computer Science (BSCS).
-
Open House Final Year Project Award (2nd Position)
- I got 2nd position for my Final Year Project during my Bachelor's
degree in Computer Science (BSCS) at COMSATS University, Islamabad, Pakistan.
Blog Posts
Side Projects
Resume
Reach out to me if you want to see my resume.
Updated: April 2025