Schema Discovery

The main objective of this task is to design, develop and implement the components needed for processing Property Graphs (PGs) at scale, including schema discovery. Methods for schema discovery, which are necessary for establishing mappings and transformation rules between multiple PGs, are currently under development. Property graph schemas are being defined as part of the LDBC standardisation activities, and these definitions are expected to be adopted by graph database vendors. Schema inference methods can be used to extract standard schemas from PGs, which can then be used to specify mappings across different PGs.

Most graph database systems do not require an a priori schema definition, precisely in order to support seamless data evolution and scalability. However, tasks such as data integration are error-prone in the absence of schema information. This task will capitalise on existing work on semantic and JSON schema discovery to enable schema discovery for PGs. To this end, approaches that cluster PGs will be exploited, using both rule-based and machine learning techniques. The key idea behind schema identification is that the properties attached to nodes establish a pattern shared by similar nodes.
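
The idea that property sets establish a pattern for similar nodes can be illustrated with a minimal rule-based sketch. It is not the method developed in this task: the dictionary representation of nodes, the function names (`property_signature`, `jaccard`, `infer_node_types`) and the Jaccard similarity threshold are all assumptions made for illustration. Nodes whose property-key sets are sufficiently similar are grouped greedily into a candidate node type, and each property is then reported as mandatory (present on every member) or optional.

```python
from collections import defaultdict


def property_signature(node):
    """Return the set of property keys attached to a node."""
    return frozenset(node["properties"].keys())


def jaccard(a, b):
    """Jaccard similarity of two key sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def infer_node_types(nodes, threshold=0.5):
    """Greedily group nodes with similar property-key sets into candidate types.

    `nodes` is an iterable of dicts of the (hypothetical) form
    {"id": ..., "properties": {...}}. `threshold` is an illustrative cut-off.
    """
    types = []  # each entry: keys seen so far, member ids, per-key counts
    for node in nodes:
        sig = property_signature(node)
        best, best_score = None, 0.0
        for candidate in types:
            score = jaccard(sig, candidate["keys"])
            if score >= threshold and score > best_score:
                best, best_score = candidate, score
        if best is None:
            # No existing type is similar enough: open a new candidate type.
            best = {"keys": set(), "members": [], "counts": defaultdict(int)}
            types.append(best)
        best["keys"] |= sig
        best["members"].append(node["id"])
        for key in sig:
            best["counts"][key] += 1

    # Summarise each type: a key is mandatory if every member carries it.
    schema = []
    for t in types:
        size = len(t["members"])
        schema.append({
            "members": t["members"],
            "mandatory": sorted(k for k, c in t["counts"].items() if c == size),
            "optional": sorted(k for k, c in t["counts"].items() if c < size),
        })
    return schema
```

For example, given two nodes with properties {name, age} and {name, age, email} and two nodes with properties {title, year}, this sketch yields two candidate types: one with mandatory properties age and name and optional property email, and one with mandatory properties title and year. The result of such a greedy grouping depends on the node order and on the chosen threshold, which is one reason clustering and machine learning approaches are of interest here.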