Kashif Rabbani

Computer Scientist -- Data Engineering & Science

Novo Nordisk, Denmark

Kashif is a data & software engineer with a doctorate in Computer Science and 6+ years of experience delivering high-impact, data-driven solutions across research and industry. His work spans scalable data platforms, distributed processing, and modern graph-based modeling.

Technical Skills & Expertise

Graphs & Semantic Technologies
Expert
TopBraid EDG, Neo4j, GraphDB, AWS Neptune, Protege/Semaphore, Virtuoso, Apache Jena, RDF4J, SPARQL, SHACL, OWL
AI & Machine Learning
Expert
OpenAI, LangChain, Graph-RAG, Vector-RAG, Milvus DB, Prompt Engineering, Fine-Tuning, CurateGPT
Programming & Development
Advanced
Python, Java, JavaScript, Swift, Spring Framework, Flask
Big Data & Cloud
Advanced
Databricks, Apache Spark, AWS, Azure, PostgreSQL, MongoDB
Analytics & Visualization
Advanced
Tableau, Power BI, Matplotlib, Plotly, Highcharts
Research & Development
Expert
Research Paper Writing, Peer Review, Docker, CI/CD, LaTeX

PhD Thesis

Scalable Extraction and Adoption of Shapes for Improving Data Quality and Query Processing in Knowledge Graphs
Aalborg University, Denmark
Supervised by Prof. Katja Hose and Prof. Matteo Lissandrini

Publications

Transforming RDF Graphs to Property Graphs using Standardized Schemas
Kashif Rabbani, Matteo Lissandrini, Angela Bonifati, and Katja Hose
Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS '25), June 22-27, 2025, Berlin, Germany.
Knowledge Graphs can be encoded using different data models. They are especially abundant using RDF and recently also as property graphs. While knowledge graphs in RDF adhere to the subject-predicate-object structure, property graphs utilize multi-labeled nodes and edges, featuring properties as key-value pairs. Both models are employed in various contexts, thus applications often require transforming data from one model to another. To enhance the interoperability of the two models, we present a novel technique, S3PG, to convert RDF knowledge graphs into property graphs exploiting two popular standards to express schema constraints, i.e., SHACL for RDF and PG-Schema for property graphs. S3PG is the first approach capable of transforming large knowledge graphs to property graphs while fully preserving information and semantics. We have evaluated S3PG on real-world large-scale graphs, showing that, while existing methods exhibit lossy transformations (causing a loss of up to 70% of query answers), S3PG consistently achieves 100% accuracy. Moreover, when considering evolving graphs, S3PG exhibits fully monotonic behavior and requires only a fraction of the time to incorporate changes compared to existing methods.

Rabbani, Kashif; Lissandrini, Matteo; Bonifati, Angela; and Hose, Katja. "Transforming RDF Graphs to Property Graphs using Standardized Schemas." Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS '25) 2(4) (December 2024): 1-25.
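
The two data models contrasted in the abstract can be illustrated with a deliberately naive mapping. This is not S3PG and uses no SHACL or PG-Schema; it is only a sketch under the assumption that `rdf:type` triples become node labels, literal-valued triples become node properties, and all other triples become edges. The `Node` class, the `rdf_to_pg` function, and the sample data are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A property-graph node: a set of labels plus key-value properties."""
    labels: set = field(default_factory=set)
    props: dict = field(default_factory=dict)


def rdf_to_pg(triples):
    """Naively map (subject, predicate, object, object_is_literal) tuples
    to a property graph made of nodes and (source, label, target) edges."""
    nodes, edges = {}, []
    for s, p, o, is_literal in triples:
        node = nodes.setdefault(s, Node())
        if p == "rdf:type":
            # Each type becomes a label, so a node may carry several labels.
            node.labels.add(o)
        elif is_literal:
            # Values are kept in a list so that a multi-valued RDF property
            # is not silently overwritten by a single key-value slot.
            node.props.setdefault(p, []).append(o)
        else:
            nodes.setdefault(o, Node())
            edges.append((s, p, o))
    return nodes, edges


# Hypothetical sample data.
SAMPLE = [
    ("alice", "rdf:type", "Person", False),
    ("alice", "rdf:type", "Student", False),
    ("alice", "name", "Alice", True),
    ("alice", "name", "Ali", True),
    ("alice", "knows", "bob", False),
]
nodes, edges = rdf_to_pg(SAMPLE)
```

Storing every property as a list is one simple way to avoid dropping values; a schema-aware approach could instead decide per property whether a single value or a collection is appropriate.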

SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the 2023 International Conference on Management of Data (SIGMOD-Companion '23), June 18-23, 2023, Seattle, WA, USA.
We demonstrate SHACTOR, a system for extracting and analyzing validating shapes from very large Knowledge Graphs (KGs). Shapes represent a specific form of data patterns, akin to schemas for entities. Standard shape extraction approaches are likely to produce thousands of shapes, and some of those represent spurious constraints extracted due to the presence of erroneous data in the KG. Given a KG having tens of millions of triples and thousands of classes, SHACTOR parses the KG using our efficient and scalable shapes extraction algorithm and outputs SHACL shapes constraints. The extracted shapes are further annotated with statistical information regarding their support in the graph, which makes it possible to identify both erroneous and missing triples in the KG. Hence, SHACTOR can be used to extract, analyze, and clean shape constraints from very large KGs. Furthermore, it enables the user to also find and correct errors by automatically generating SPARQL queries over the graph to retrieve nodes and facts that are the source of the spurious shapes and to intervene by amending the data.

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes. In Proceedings of the 2023 International Conference on Management of Data (SIGMOD-Companion '23), June 18-23, 2023, Seattle, WA, USA.
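
The idea of generating a SPARQL query that retrieves the nodes behind a suspicious shape can be sketched as follows. This is an illustrative guess at what such a query might look like, not SHACTOR's actual query template; the function name `spurious_shape_query`, its parameters, and the example IRIs are all hypothetical.

```python
def spurious_shape_query(target_class, prop, object_class, limit=100):
    """Build a SPARQL SELECT query listing the nodes of `target_class` whose
    `prop` value is an instance of `object_class` (all given as full IRIs)."""
    return (
        "SELECT ?node ?value WHERE {\n"
        f"  ?node a <{target_class}> ;\n"
        f"        <{prop}> ?value .\n"
        f"  ?value a <{object_class}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Hypothetical example: find students whose "takes" value is itself a student.
query = spurious_shape_query(
    "http://example.org/Student",
    "http://example.org/takes",
    "http://example.org/Student",
    limit=10,
)
```

The returned string could be sent to any SPARQL endpoint; the retrieved `?node` and `?value` bindings are the candidate facts a user would inspect and possibly amend.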

Extraction of Validating Shapes from very large Knowledge Graphs
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the Very Large Databases 2023 (Volume 16, Issue 5, VLDB 2023), August 2023, Vancouver, Canada.
Knowledge Graphs (KGs) represent heterogeneous domain knowledge on the Web and within organizations. There exist shapes constraint languages to define validating shapes to ensure the quality of the data in KGs. Existing techniques to extract validating shapes often fail to extract complete shapes, are not scalable, and are prone to produce spurious shapes. To address these shortcomings, we propose the Quality Shapes Extraction (QSE) approach to extract validating shapes in very large graphs, for which we devise both an exact and an approximate solution. QSE provides information about the reliability of shape constraints by computing their confidence and support within a KG and in doing so makes it possible to identify shapes that are most informative and less likely to be affected by incomplete or incorrect data. To the best of our knowledge, QSE is the first approach to extract a complete set of validating shapes from WikiData. Moreover, QSE provides a 12x reduction in extraction time compared to existing approaches, while managing to filter out up to 93% of the invalid and spurious shapes, resulting in a reduction of up to 2 orders of magnitude in the number of constraints presented to the user, e.g., from 11,916 to 809 on DBpedia.

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. Extraction of Validating Shapes from very large Knowledge Graphs. In Proceedings of the Very Large Databases 2023 (Volume 16), August 28 - Sept 02, 2023, Vancouver, Canada.
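
The role of support and confidence in pruning spurious shapes can be illustrated with a small sketch. It assumes simplified definitions: support is the number of entities of a class that use a property with values of a given object class, and confidence is that number divided by the total number of entities of the class. This is a toy in-memory version, not the QSE algorithm; the function names, the `"Literal"` fallback for untyped objects, and the sample triples are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy graph as (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "rdf:type", "Student"),
    ("bob", "rdf:type", "Student"),
    ("carol", "rdf:type", "Student"),
    ("db101", "rdf:type", "Course"),
    ("alice", "takes", "db101"),
    ("bob", "takes", "db101"),
    ("carol", "takes", "alice"),  # likely erroneous: a student is not a course
]


def shape_statistics(triples):
    """Return {(class, property, object_class): (support, confidence)}."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)

    class_size = defaultdict(int)
    for classes in types.values():
        for c in classes:
            class_size[c] += 1

    # Collect entities per candidate constraint so each entity counts once.
    entities = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            continue
        for c in types.get(s, ()):
            # Untyped objects are treated as literals (a simplification).
            for oc in types.get(o, {"Literal"}):
                entities[(c, p, oc)].add(s)

    return {k: (len(v), len(v) / class_size[k[0]]) for k, v in entities.items()}


def prune(stats, min_support=1, min_confidence=0.0):
    """Keep only constraints that meet both thresholds."""
    return {
        k: v for k, v in stats.items()
        if v[0] >= min_support and v[1] >= min_confidence
    }


stats = shape_statistics(TRIPLES)
reliable = prune(stats, min_confidence=0.5)
```

On this toy graph the constraint "Student takes Course" is backed by two of the three students, while "Student takes Student" is backed by one, so a confidence threshold of 0.5 keeps only the former.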

End-to-End Incremental Data Integration via Knowledge Graphs
Javier Flores, Kashif Rabbani, Sergi Nadal, Cristina Gómez, Oscar Romero, Emmanuel Jamin, Stamatia Dasiopoulou
In Semantic Web Journal (SWJ)
Data integration, the task of providing a unified view over a set of data sources, is undoubtedly a major challenge for the knowledge graph community. Indeed, such a flexible data structure allows modeling the characteristics of source schemata, rich semantics for the global schema, and the mappings between them. Yet, the design of such data integration systems still entails an arduous manual task. This becomes aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged semi-automatic and incremental data integration approach. By considering all tasks that compose the end-to-end data integration workflow (i.e., bootstrapping, schema matching, schema integration, and generation of querying constructs), we are able to address them in a unified manner. We provide algorithms for each task, theoretically prove the correctness of our approach, and experimentally show its practical applicability.

SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Companion Proceedings of the Web Conference 2022 (WWW'22 Companion), April 25-29, 2022. DOI: 10.1145/3487553.3524253
Knowledge Graphs (KGs) are widely used to represent heterogeneous domain knowledge on the Web and within organizations. Various methods exist to manage KGs and ensure the quality of their data. Among these, the Shapes Constraint Language (SHACL) and the Shapes Expression Language (ShEx) are the two state-of-the-art languages to define validating shapes for KGs. Since the usage of these constraint languages has recently increased, new needs arose. One such need is to enable the efficient generation of these shapes. Yet, since these languages are relatively new, we witness a lack of understanding of how they are effectively employed for existing KGs. Therefore, in this work, we answer the question: how are validating shapes being generated and adopted? Our contribution is threefold. First, we conducted a community survey to analyze the needs of users (both from industry and academia) generating validating shapes. Then, we cross-referenced our results with an extensive survey of the existing tools and their features. Finally, we investigated how existing automatic shape extraction approaches work in practice on real, large KGs. Our analysis shows the need for developing semi-automatic methods that can help users generate shapes from large KGs.

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption. In Companion Proceedings of the Web Conference 2022 (WWW'22 Companion), April 25-29, 2022, Lyon, France.

Optimizing SPARQL Queries using Shape Statistics
Kashif Rabbani, Matteo Lissandrini, and Katja Hose
In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021.
With the growing popularity of storing data in native RDF, we witness more and more diverse use cases with complex SPARQL queries. As a consequence, query optimization - and in particular cardinality estimation and join ordering - becomes even more crucial. Classical methods exploit global statistics covering the entire RDF graph as a whole, which naturally fails to correctly capture correlations that are very common in RDF datasets, which then leads to erroneous cardinality estimations and suboptimal query execution plans. The alternative of trying to capture correlations in a fine-granular manner, on the other hand, results in very costly preprocessing steps to create these statistics. Hence, in this paper we propose shapes statistics, which extend the recent SHACL standard with statistic information to capture the correlation between classes and properties. Our extensive experiments on synthetic and real data show that shapes statistics can be generated and managed with only little overhead without disadvantages in query runtime while leading to noticeable improvements in cardinality estimation.

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. Optimizing SPARQL Queries using Shape Statistics. Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021
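
The difference between global statistics and per-class statistics can be shown with a toy cardinality estimate for the pattern "?x a C ; p ?y". The sketch below is a simplification under stated assumptions, not the estimator from the paper: the global estimate assumes class membership and property usage are independent, while the shape-based estimate reads a count stored per (class, property) pair. All function names and the generated data are hypothetical.

```python
from collections import Counter, defaultdict


def build_toy_graph():
    """Eight professors who each teach one course, and two students who teach nothing."""
    triples = []
    for i in range(8):
        triples.append((f"prof{i}", "rdf:type", "Professor"))
        triples.append((f"prof{i}", "teaches", f"course{i}"))
    for i in range(2):
        triples.append((f"stud{i}", "rdf:type", "Student"))
    return triples


def collect_statistics(triples):
    """Gather global predicate counts and per-(class, predicate) counts."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    class_size = Counter(c for classes in types.values() for c in classes)
    predicate_count, shape_count = Counter(), Counter()
    for s, p, o in triples:
        if p == "rdf:type":
            continue
        predicate_count[p] += 1
        for c in types.get(s, ()):
            shape_count[(c, p)] += 1
    return {
        "subjects": len(types),  # number of typed entities
        "class_size": class_size,
        "predicate_count": predicate_count,
        "shape_count": shape_count,
    }


def estimate_global(stats, cls, pred):
    """Independence assumption: scale the predicate count by the class fraction."""
    return stats["predicate_count"][pred] * stats["class_size"][cls] / stats["subjects"]


def estimate_with_shapes(stats, cls, pred):
    """Use the count recorded for this class-property combination."""
    return stats["shape_count"][(cls, pred)]


stats = collect_statistics(build_toy_graph())
global_student = estimate_global(stats, "Student", "teaches")
shape_student = estimate_with_shapes(stats, "Student", "teaches")
shape_professor = estimate_with_shapes(stats, "Professor", "teaches")
```

Because teaching is perfectly correlated with being a professor in this toy graph, the global estimate predicts a non-zero result for students, whereas the per-class count correctly gives zero.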

ODIN: A dataspace management system
Nadal Francesch, Sergi, Kashif Rabbani, Óscar Romero Moral, and Shumet Tadesse Nigatu
In Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas): co-located with 18th International Semantic Web Conference, ISWC 2019.
ODIN (On-demand Data Integration) is a system that supports the incremental pay-as-you-go integration of data sources into dataspaces and provides user-friendly querying mechanisms of the resulting dataspaces. This website is a companion of a demonstration paper submitted to ISWC 2019, where we describe some of its characteristics and underlying assumptions, including the user interactions required. ODIN's novelty lies in a largely automated bottom-up approach (i.e., driven by the sources at hand) that includes the user in the loop for disambiguation purposes. ODIN relies on the concept of traceability graphs, which are generic metadata abstractions (i.e., not tailored for a specific task) about the integration of a particular set of data sources. From these graphs, ODIN is capable of generating target-oriented metadata constructs. In this demonstration we focus on those for query answering over dataspaces.

Nadal Francesch, Sergi, Kashif Rabbani, Óscar Romero Moral, and Shumet Tadesse Nigatu. "ODIN: A dataspace management system." In Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas): co-located with 18th International Semantic Web Conference (ISWC 2019): Auckland, New Zealand, October 26-30, 2019, pp. 185-188. CEUR-WS.org, 2019.

ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources.
Shumet Tadesse, Cristina Gómez, Oscar Romero, Katja Hose, Kashif Rabbani
In IEEE 23rd International Enterprise Distributed Object Computing Conference, EDOC 2019.
The current wealth of information, typically known as Big Data, generates a large amount of available data for organisations. Data Integration provides foundations to query disparate data sources as if they were integrated into a single source. However, current data integration tools are far from being useful for most organisations due to the heterogeneous nature of data sources, which represents a challenge for current frameworks. To enable data integration of highly heterogeneous and disparate data sources, this paper proposes a method to extract the schema from semi-structured (such as JSON and XML) and structured (such as relational) data sources, and generate an equivalent RDFS representation. The output of our method complements current frameworks and reduces the manual workload required to represent the input data sources in terms of the integration canonical data model. Our approach consists of production rules at the meta-model level that guarantee the correctness of the model translations. Finally, a tool for implementing our approach has been developed.

Tadesse, Shumet, Cristina Gómez, Oscar Romero, Katja Hose, and Kashif Rabbani. "ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources." In 2019 IEEE 23rd International Enterprise Distributed Object Computing Conference (EDOC), pp. 190-196. IEEE, 2019

Master Thesis
Supporting the Semi-Automatic Creation of the Target Schema in Data Integration Systems

BDMA 2019
Supervisors: Prof. Dr. Oscar Romero (UPC), Prof. Dr. Volker Markl (TU-Berlin), and Dr. Ralf-Detlef Kutsche (on behalf of the BDMA steering board)
Advisors: Prof. Dr. Oscar Romero (UPC), Mr. Shumet Tadesse Nigatu (UPC), and Dr. Ralf-Detlef Kutsche (TU-Berlin)

Achievements, Awards, and Grants

IFIP TC2 Manfred Paul Award for Excellence in Software: Theory and Practice
Awarded to the paper entitled "Extraction of Validating Shapes from Very Large Knowledge Graphs".
Big Data Talent Awards 2019 (Runner-up)
I was selected as a runner-up for the Big Data Talent Awards 2019 at UPC Barcelona, Spain, for my master's thesis titled "Dataspaces: Pay-As-you-go Data Integration".
Erasmus Mundus Scholarship
I was awarded a fully funded scholarship for my master's degree in Big Data Management and Analytics (BDMA).
Gold Medal - Bachelors in Computer Science (BSCS)
I won a campus Gold Medal award at COMSATS University Islamabad, Pakistan for the highest CGPA: 3.84/4.00 (Batch Spring 2013-17) of my Bachelor's degree in Computer Science (BSCS).
Silver Medal - Bachelors in Computer Science (BSCS)
I won an institute Silver Medal award at COMSATS University Islamabad, Pakistan for the highest CGPA: 3.84/4.00 among all the institutes of COMSATS University for my Bachelor's degree in Computer Science (BSCS).
Open House Final Year Project Award (2nd Position)
I got 2nd position for my Final Year Project during my Bachelor's degree in Computer Science (BSCS) at COMSATS University, Islamabad, Pakistan.

Educational Activities

Kashif's educational activities focus on supervising and assisting multiple groups of computer science and software engineering students at bachelor's level at Aalborg University.

Teaching & Supervision
Teaching Assistant - Database Management Systems
Duration: Fall 2020, Fall 2021, Fall 2022
Program: BSc Software Engineering
Department: Computer Science, Aalborg University
Group Supervisor - Knox Project
Project: Knowledge Engineering Toolbox
Duration: Fall 2020, Fall 2021
Level: 5th Semester - BSc Software Engineering
Group Supervisor - AIS Data Analysis
Project: Large Scale Ships AIS Data Analysis
Duration: Spring 2021
Level: Final Semester - Bachelor's Project
Academic Participation
Conference & Summer School Participation
Actively participates in various related conferences and summer/winter schools to stay current with research trends and methodologies.
eBISS 2022 - Research Presentation
Attended the Tenth European Big Data Management & Analytics Summer School (eBISS) in 2022 and presented a poster about PhD research.

iOS Development (Swift)

Mobile App Developer & Publisher
I have developed and published 6 iOS applications using Swift on the Apple App Store. These apps demonstrate my mobile development skills and cover various domains including education and religious studies, showcasing my ability to create user-friendly mobile experiences.
Tech Stack
Language: Swift
Platform: iOS, iPadOS
IDE: Xcode
Apps Published: 6
Status: Active & Maintained

Resume & Contact

Resume & References Available
For my detailed resume, references, or additional information about my work and research, please feel free to reach out to me directly.

Updated: July 2025