A model for the integration of multiple models within the Australian national Foundation Spatial Data Framework scenario.

Figure 1. Overview of this Supermodel

1. Metadata

IRI

https://linked.data.gov.au/def/fsdf-supermodel

Title

FSDF Supermodel Specification

Description

This Model - the FSDF Supermodel - is the overarching data model that provides integration logic for all FSDF elements. It is based on the general-purpose Supermodel Model.

Created

2022-02-24

Modified

2022-08-05

Issued

2022-08-05

Creator

Geoscience Australia & SURROUND Australia Pty Ltd

Publisher

Intergovernmental Committee on Surveying & Mapping

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

2. Preamble

2.1. Abstract

This Model - the FSDF Supermodel - is the overarching data model that provides integration logic for all FSDF datasets. It is based on the general-purpose Supermodel Model.

This model is effectively an update to the methodology of the Location Index (Loc-I) Project, which introduced a generic spatial dataset model, the Loc-I Ontology, and a series of models for the specific FSDF datasets considered by the Loc-I Project. This Supermodel is a more formalised implementation of the Loc-I Project’s vision, one that relies on updated background models, particularly GeoSPARQL, and on updated or new dataset models.

2.2. Namespaces

This model is built on a "baseline" of Semantic Web models which use a variety of namespaces. Prefixes for these namespaces, used throughout this document, are listed below. Additional namespaces and prefixes are listed in later sections of this document where they only apply to that section.

Table 1. Namespaces
Prefix Namespace Description

dcterms:

http://purl.org/dc/terms/

Dublin Core Terms vocabulary namespace

ex:

http://example.com/

Generic examples namespace

geo:

http://www.opengis.net/ont/geosparql#

GeoSPARQL ontology namespace

owl:

http://www.w3.org/2002/07/owl#

Web Ontology Language ontology namespace

rdfs:

http://www.w3.org/2000/01/rdf-schema#

RDF Schema ontology namespace

sosa:

http://www.w3.org/ns/sosa/

Sensor, Observation, Sample, and Actuator ontology namespace

skos:

http://www.w3.org/2004/02/skos/core#

Simple Knowledge Organization System (SKOS) ontology namespace

suterms:

https://linked.data.gov.au/def/supermodel/terms/

Supermodel Terms & Definitions Vocabulary

2.3. Terms & Definitions

The following terms appear in this document and, when they do, the definitions in this section apply to them.

These terms are presented as a formal Semantic Web vocabulary at

Backbone Model

suterms:backbone-model

An integrative, summary model that allows for crosswalking of elements within the Component Models of a Supermodel.

Background Model

suterms:background-model

A standard and common Semantic Web model used as an "upper" or higher-order/abstract model for all other Supermodel models to conform to when modelling something within the Background Model’s purview.

Central Class

suterms:central-class

Central Classes are the generic data classes at the centre of Data Domains, with high-level relationships between them defined in this Supermodel.

These classes are taken from general standards - usually well-known international standards - and specialised and extended within implementation scenarios to cater for specific needs.

Component Model

suterms:component-model

An individual model of something of importance within a Supermodel scenario.

Data Domain

suterms:data-domain

High-level conceptual areas within which Geoscience Australia has data.

These Data Domains are not themed scientifically - 'geology', 'hydrogeology', etc. - but are instead based on parts of the Observations and Measurements [ISO19156] standard, realised in Semantic Web form in the SOSA Ontology, part of the Semantic Sensor Network Ontology [SSN].

Current Data Domains are shown in Figure 1.

FSDF

:fsdf-defn - defined here

The Foundation Spatial Data Framework (FSDF): a project to deliver national coverages of the best available, most current, authoritative foundation data which is standardised and quality controlled. See https://link.fsdf.org.au.

Knowledge Graph

suterms:knowledge-graph

A Knowledge Graph is a dataset that uses a graph data structure - nodes and edges - with strongly-defined elements.

Linked Data

suterms:linked-data

A set of technologies and conventions defined by the World Wide Web Consortium that aim to present data in both human- and machine-readable form over the Internet.

Linked Data is strongly-defined with each element having either a local definition or a link to an available definition on the Internet.

Linked Data is graph-based in nature; that is, it consists of nodes and edges that can forever be linked to further concepts with defined relationships.

Location Index

suterms:location-index

A project aiming to provide a consistent way to seamlessly integrate spatial data from distributed sources.

Null Profile

:null-profile-defn - defined here

A Null Profile is a Profile of a Standard that implements no additional constraints on the Standard it profiles. A Null Profile’s purpose is to act as a conformance target for the Standard by supplying itemised requirements and machine-executable validators when the Standard itself cannot have these elements added to it.

Ontology

suterms:ontology

In computer science and information science, an ontology encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse.

The word ontology was originally defined as "the branch of philosophy that studies concepts such as existence, being, becoming, and reality", and the computer science term is derived from that definition.

Profile

https://www.w3.org/TR/dx-prof/#dfn-profile

A data standard that constrains, extends, combines, or provides guidance or explanation about the usage of other standards.

This definition includes what are sometimes called "data profiles", "application profiles", "metadata application profiles", or "metadata profiles". In this document, "profile" and these other variants are all referred to as just "profiles".

Note
This definition has been taken from [PROF] and altered slightly for clarity here
Semantic Web

suterms:semantic-web

The World Wide Web Consortium's vision of an Internet-based web of Linked Data.

Semantic Web is used to refer to something more than just the technologies and conventions of Linked Data; the term also encompasses a specific set of interoperable data models - often called ontologies - published by the W3C, other standards bodies and some well-known companies.

The 'semantic' refers to the strongly-defined nature of the elements in the Semantic Web: the meaning of Semantic Web data is as precisely defined as any data can be.

Vocabulary

suterms:vocabulary

A managed codelist or taxonomy of concepts.

Note
Supermodels tend to use vocabularies formulated according to the [SKOS] taxonomy model or lists of objects defined using basic [RDF], [RDFS] or [OWL] elements.

2.4. Conventions

All model diagrams use elements introduced in Figure 5. These elements are defined in the [RDF], [RDFS] and [OWL] ontologies.

All code snippets in this document, used to show formal and machine-readable versions of concepts, are expressed using the Turtle RDF syntax [TTL].
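
For readers unfamiliar with Turtle, the following minimal, hedged snippet (using the generic ex: examples namespace from Table 1 and entirely hypothetical identifiers) illustrates the syntax used throughout:

PREFIX ex: <http://example.com/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>

ex:thing-x
    a geo:Feature ;                      # 'a' is Turtle shorthand for rdf:type; ';' separates property/value pairs
    ex:someProperty "a literal value" .  # '.' terminates the set of statements about ex:thing-x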

3. Introduction

This Supermodel is based on previous work from the Location Index (Loc-I) Project. For clarity, the Loc-I Project is described first, followed by how this Supermodel inherits from it.

3.1. Loc-I Project

The Location Index (Loc-I) project, established in 2018, created a methodology, data models and an informal framework to allow for a consistent way to seamlessly integrate spatial data from distributed sources. The target was Australian spatial data "of national significance", meaning most - initially all - of the data considered was Australian Federal government data.

See the project website, http://www.ga.gov.au/locationindex, for more project information.

3.2. Loc-I Technical Implementation

The technical implementation of Loc-I was based on Semantic Web principles allowing datasets to be published as Linked Data independently, by data holders - different government departments, companies etc. - and consumed with minimal effort required for integration.

The technical implementation relied on data from the various datasets sharing common patterns: principally, how the datasets packaged their content and how real-world objects, their spatiality and their non-spatial properties were modelled. To this end, a number of Background Models were used, to which all Loc-I dataset models - here called Component Models - conformed. Additionally, a Loc-I Ontology was created which both included modelling elements needed for the Loc-I Project that were not present in Background Models and acted as a conceptual, if not technical, conformance target for all Component Models.

Figure 2 below shows the original detailed architecture diagram used to explain Loc-I’s parts from 2018 - 2021.

Figure 2. Original Loc-I Detailed Architecture (from CSIRO)

3.3. Loc-I to Supermodel

The FSDF Project has adopted a more rigorously defined Supermodel concept to formalise things of importance to Loc-I-like work but which the Loc-I Project didn’t define. For example, the categorisation of relevant models as Background Models, Component Models and so on. The major differences/additions are:

  • Formalised terminology

  • Model categorisation

  • Explicit integration

    • By explicitly defining a Backbone Model for each scenario deployment, the Supermodel conventions indicate precisely the minimum requirements for Component Models to be able to be integrated

  • Validatable profiling instead of model specialisation

    • Loc-I relied on defining an ontology to which datasets were expected to conform

    • The Supermodel implements a Profile of Background Models to which Component Datasets must conform

    • The Profile provides executable data validators

In addition to these model/methodological changes, this particular Supermodel deployment has updated scenario-specific things, in particular:

  • Use of GeoSPARQL 1.1

    • The Loc-I Project motivated extensions to the GeoSPARQL 1.0 ontology which were captured in the GeoSPARQL Extensions Ontology (GeoX). That ontology was then used by many Loc-I Project dataset models (Component Models)

    • The update to GeoSPARQL, GeoSPARQL 1.1, absorbed many of these updates and so there is no longer a need to use GeoX

  • Collection-based Feature organisation

    • Several Loc-I dataset models (Component Models) used specialised properties to indicate aggregations of Features, e.g. the ASGS Ontology’s aggregatesTo & isAggregationOf

    • The latest issue of the ASGS dataset within this Supermodel, online at https://linked.data.gov.au/dataset/asgsed3, uses only Collection membership (all Meshblock Features are members of the Meshblocks Feature Collection) and standard topological relations, e.g. each SA2 is geo:sfWithin an SA3; this pattern is sketched in Turtle after this list

    • This takes advantage of GeoSPARQL 1.1’s collections which match OGC API structures and removes non-standard spatial object relations

    • Specialised Feature aggregations may be re-added to objects within this Supermodel, if required

  • Topological querying instead of Linksets

    • Loc-I Linksets are datasets that declare topological relations between Features

    • GeoSPARQL allows topological relationships to be calculated using topological functions

    • The Loc-I Project established only a limited set of Linksets, e.g. the Current Addresses to 2016 Mesh Blocks Linkset

    • This Supermodel deployment includes a cache of all datasets, https://cache.linked.fsdf.org.au, on which topological queries across all datasets can be performed
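
As an illustration of the Collection-based organisation and standard topological relations described above, here is a hedged Turtle sketch using hypothetical ex: identifiers rather than real ASGS IRIs:

PREFIX ex: <http://example.com/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

ex:meshblocks
    a geo:FeatureCollection ;
    rdfs:member ex:meshblock-001 .   # Collection membership replaces custom aggregation properties

ex:meshblock-001 a geo:Feature .

ex:sa2-x
    a geo:Feature ;
    geo:sfWithin ex:sa3-y .          # a standard Simple Features topological relation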

To conclude this comparison of Loc-I and this Supermodel, here is a table mapping elements and terminology.

Table 2. Loc-I/FSDF Supermodel element comparison
Loc-I FSDF Supermodel Notes

Upper ontologies
GeoSPARQL 1.0, DCAT etc.

Background Models

The Supermodel precisely lists Background Models in the Background Models section, where Loc-I left their discovery to general documentation or dataset model imports

Loc-I Ontology &
GeoX Ontology

Backbone Model

The Backbone Model profiles GeoSPARQL 1.1 and thus incorporates equivalent (updated) GeoX modelling. Loc-I Ontology Linksets are not used. Loc-I Ontology Datasets are replaced with Backbone Model profiling of DCAT

Dataset models
GNAF, ASGS, Geofabric

Component Models

The FSDF Supermodel formally lists, in this Supermodel document, the Component Models defined to be within it

GNAF Ontology

ANZ National Address Model

The new model is a more standards-based form of the previous one and all aspects of the original model are covered by the new one

ASGS Ontology

Backbone Model

The current delivery of the ASGS at https://asgs.linked.fsdf.org.au uses no properties other than those already in the Backbone Model; therefore, no custom Component Model is needed

Geofabric Ontology

unchanged

The Geofabric data online at https://geofabric.linked.fsdf.org.au currently uses only one element of the Geofabric Ontology, the property hasDownstreamCatchment; however, more use of that ontology, or an extended version of it, may be made in the future

Placenames Ontology

unchanged

While not an official original Loc-I dataset, Placenames was implemented in Loc-I style at https://fsdf.org.au/dataset/placenames/. It has been brought into the Supermodel as a standard Component Model using the same ontology, albeit with updates

Geometry Data Service

Graph Cache

Loc-I built a non-Semantic geometry data service for some cross-dataset spatial queries. The Supermodel implements a total cache of all Component Models' content in semantic form for GeoSPARQL spatial querying and other semantic querying

The Loc-I project also implemented a series of custom clients for Loc-I systems. These clients demonstrated:

  • downloading lists of Loc-I identifiers (IRIs) for classes of object for offline use

  • reapportioning numerical observations data formed according to one set of geometries to another

  • finding Features that intersect with a given point or Feature

Some of this functionality is now available through the SPARQL Endpoints available for all FSDF APIs and also the API accessing a cached copy of all data.

Functionality not covered by SPARQL Endpoints is being implemented in new FSDF clients which Geoscience Australia will release when ready.

4. Supermodel

Figure 3. The various models of this Supermodel

4.1. Overview

This Section describes the structure of this Supermodel, aspects of the modelling involved and how to use this Supermodel. Following Sections describe the elements of the Supermodel in detail.

4.2. Structure

The high-level structure of this Supermodel consists of:

  • Background Models

    • Standard and common Semantic Web models used as "upper" or higher-order/abstract models for all other Supermodel models to conform to when modelling something within the Background Model’s purview.

    • For example, the Provenance Ontology [PROV] models provenance, and all Supermodel models follow it when doing provenance work

    • GeoSPARQL [GEO] serves as the background model for spatial objects - features and their geometries

  • Backbone Model

    • This is a profile of the Background Models and includes validators

    • Data must conform to this model in order to be considered within this Supermodel

    • This model is a bare minimum: Component Models can, and already do, extend beyond this model to cater for their specific needs

  • Component Models

    • These are individual models for datasets within this Supermodel

    • Not all datasets require a Component Model; for example, the ASGS is currently modelled using Backbone Model elements only

  • Supporting Vocabularies

    • Vocabularies that support the Backbone and Component models

    • They must conform to the VocPub Profile of SKOS

    • They may contain specialised elements beyond VocPub/SKOS too

Further details of and definitions for these elements are provided in the Terms & Definitions section, above, and in the Supermodel Model.

The next section deals with some aspects of how the models are created.

4.3. Modelling Methods

The modelling language/system used for all Supermodel elements expressed formally is the Web Ontology Language [OWL]. OWL diagramming is used for formal model images and, when it is, this is noted in the figure description. The figure below is a key for all OWL diagramming elements.

Figure 4. Diagram elements key

4.3.1. Object Modelling

The elements from the above subsection are shown in relation to one another in the figure below.

Figure 5. OWL objects and their relations

The elements shown above are identified with prefixed IRIs that correspond to entries in the Namespace Table. A short explanation of the diagram key elements, with a Turtle rendering of them after the list, is:

  • owl:Class - represents any conceptual class of objects. Classes are expected to contain individuals - instances of the class - and the class, as a whole, may have relations to other classes

  • owl:NamedIndividual - an individual of an owl:Class. For example, for the class ships, an individual might be Titanic

  • rdf:Property - a relationship between classes, individuals, or any objects and Literals

  • rdfs:subClassOf - an rdf:Property indicating that the domain (from object) is a subclass of the range (to object). An example is the class student, which is a subclass of person: all students are clearly persons but not vice versa

  • rdf:type - the property that relates an owl:NamedIndividual to the owl:Class that it’s a member of

  • Literal - a simple literal data property, e.g. the string "Nicholas", or the number 42. Specific literal types are usually indicated when used
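
The same elements can be expressed in Turtle. Below is a hedged rendering using the ship and student examples above; all ex: identifiers (including ex:name) are illustrative only:

PREFIX ex: <http://example.com/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

ex:Ship a owl:Class .

ex:Titanic
    a owl:NamedIndividual , ex:Ship ;  # rdf:type relates the individual to its classes
    ex:name "Titanic" .                # a property with a Literal value

ex:Person a owl:Class .

ex:Student
    a owl:Class ;
    rdfs:subClassOf ex:Person .        # all Students are Persons but not vice versa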

The remaining diagrams in this document use extensions to this basic model, for example Figure 3 uses colour-coded specialised forms of owl:Class (subclasses of it).

4.3.2. Provenance

General provenance/lineage information about anything - a rock sample, a dataset, a term in a vocabulary etc. - is described using the Provenance Ontology [PROV], which views everything in the world as being of one or more of the types shown in Figure 6.

Figure 6. PROV main classes and main relations

According to PROV, all things are either a:

  • prov:Entity - a physical, digital, conceptual, or other kind of thing with some fixed aspects

  • prov:Agent - something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent’s activity

  • prov:Activity - something that occurs over a period of time and acts upon or with entities

While not often front of mind for objects in any Data Domain, provenance relations always apply. For example, a sosa:Sample within the Sampling domain is a prov:Entity and will necessarily have been created via a sosa:Sampling, which is a prov:Activity. Another example: an sdo:Person related to a dcat:Dataset via the property dcterms:creator in the DataCataloging domain is a specialised form of a prov:Agent related to a prov:Entity via prov:wasAttributedTo.
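
For instance, the Sample/Sampling relations just described could be written in Turtle as follows; the ex: identifiers are hypothetical but the class and property IRIs are standard PROV and SOSA terms:

PREFIX ex: <http://example.com/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX sosa: <http://www.w3.org/ns/sosa/>

ex:sample-01
    a sosa:Sample , prov:Entity ;      # a Sample is also a PROV Entity
    prov:wasGeneratedBy ex:sampling-01 .

ex:sampling-01
    a sosa:Sampling , prov:Activity .  # the Sampling Activity that created the Sample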

4.4. Ensuring Data Conformance

First the requirements that data must conform to are described; then the Validation section describes how to test conformance.

4.4.1. Conformance Requirements

Data wishing to be used within this Supermodel must conform to:

  1. Relevant Background Models

  2. The Backbone Model

  3. A relevant Component Model, if one exists

Relevant Background Models

All data within the Supermodel will need to conform to at least some Background Models. Working out which ones are relevant is done by looking at the conceptual scope of the various listed Background Models and comparing the conceptual scope of the data to them.

For example, if the data is spatial - and most of it will be - it will need to conform to GeoSPARQL [GEO]. If it’s observations information, the Data Cube Vocabulary [QB].

Many Background Models are generic and have a wide scope and thus most data will need to conform to most Background Models.

For example, if the data contains provenance information, regardless of whether it’s a spatial or observations dataset, it will need to conform to the Provenance Ontology [PROV].

Backbone Model

All data will need to conform to the Backbone Model as this model is used to ensure all data can work together.

This model is only concerned with minimum requirements for data, so data conforming to this model may have any other things in it - details specific to that dataset’s concern - that are un-handled/unknown in the Backbone Model. That’s fine, as long as the minimal requirements are met.

Component Model

Many datasets will have a Component Model implemented for them. If they do, data within that dataset must conform to it. If no Component Model has been implemented, the dataset is a direct implementation of the Backbone Model and need only conform to that.

4.4.2. Validation

To ensure that data within a dataset conforms to the models it needs to, automated validation of data must occur. This Supermodel implements validators for the Backbone Model that must be used to test data with.

This Supermodel is also either obtaining or implementing validators for all Background Models over time. Validators implemented for Background Models in this Supermodel are implemented within Null Profiles of them, since most of the Background Models are previously defined, controlled standards that cannot have the profiling elements relevant to Supermodels added to them directly.

4.5. How to use this Supermodel

This Supermodel provides a general structure for datasets that want to integrate within the FSDF Data Platform. The common tasks you might perform with the Supermodel are:

  1. Model a new dataset as an FSDF Supermodel generic dataset

  2. Validate new dataset data according to the FSDF Supermodel

  3. Create an extended/specialised Component Model for a dataset

  4. Validate extended/specialised new dataset data according to the extended/specialised FSDF Supermodel Component Model

  5. Create a dataset of observations - population/statistical or natural world - linked to Component Models

Detailed suggestions as to how to achieve these tasks are given below.

4.5.1. 1. Model a new dataset

Individual datasets are modelled as Component Models. The most basic of Component Models contain Dataset, FeatureCollection & Feature classes modelled using the DCAT & GeoSPARQL Background Models with certain relations. The details of this modelling are given in the first part of the Component Models section.
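
For orientation, a minimal, hedged Turtle sketch of that basic pattern follows, using hypothetical ex: identifiers; a fuller example showing mandatory properties is given in the Backbone Model’s Requirements section:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX ex: <http://example.com/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

ex:dataset-x
    a dcat:Dataset ;
    rdfs:member ex:fc-y .        # a Dataset contains Feature Collections

ex:fc-y
    a geo:FeatureCollection ;
    rdfs:member ex:feature-z .   # a Feature Collection contains Features

ex:feature-z a geo:Feature .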

To model a highly specialised dataset, you will need to implement the most basic Component Model elements and also model the specialised elements relevant to your dataset. No specific guidance about your dataset can be given here; however, the Component Models section does indicate existing datasets that contain a large amount of specialisation from which you may draw inspiration.

In all cases, you can use the tools listed in the Validators section to test any data you’ve created to see if it really is valid according to this Supermodel.

4.5.2. 2. Validate new dataset

Data validators are available, for all elements of this Supermodel, so you can use them to validate your data. See the Validators section.

4.5.3. 3. Create an extended/specialised 'Component Model'

As per subsection 1. above, we can’t give specific details about specialised modelling here since we don’t know about your particular dataset; however, we can indicate existing specialised datasets (see the start of the Component Models section) and make a few general points:

  • this Supermodel is concerned with the modelling of spatial datasets as Component Models with Dataset, FeatureCollection & Feature with certain relations between them

  • most specialisation is likely to occur by adding special properties to Feature instances, as sketched in Turtle after this list

    • for example, the Feature instances within the FSDF’s Power Stations FeatureCollection contain properties relevant to power generation, such as primaryfuelType indicating coal, biogas etc., and these are important for knowledge of Power Stations but don’t affect the general spatial feature modelling of this Supermodel in any way

  • spatial relations - between Feature instances within one FeatureCollection or even across Dataset instances - are expected and can be modelled using GeoSPARQL’s Simple Features Topological Relations Family.

    • No custom modelling is likely required for standard spatial relations

  • The geometries of Feature instances can be represented in several ways and Feature instances can have multiple geometries

    • Boundaries at different levels of resolution may be given or geometries with different roles, e.g. high and low tide boundaries
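
Drawing these points together, a specialised Feature might look like the following hedged Turtle sketch; ex:primaryFuelType and all other ex: identifiers are hypothetical renderings, not the actual Power Stations model:

PREFIX ex: <http://example.com/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>

ex:power-station-x
    a geo:Feature ;
    ex:primaryFuelType ex:coal ;                # a specialised, dataset-specific property
    geo:sfWithin ex:feature-y ;                 # a standard GeoSPARQL topological relation
    geo:hasGeometry ex:geom-hi , ex:geom-lo .   # multiple geometries, e.g. different resolutions

ex:geom-hi a geo:Geometry .
ex:geom-lo a geo:Geometry .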

4.5.4. 4. Validate extended/specialised new dataset data

As per section 2. above, see the Validators section. Of course, your specialised modelling won’t have a validator for it; however, you can certainly ensure that your new data is valid according to this Supermodel.

4.5.5. 5. Create a dataset of observations

The spatial datasets within this Supermodel are intended to present spatial objects that observations' data can be referenced against. For example, Australian census data is keyed to the Mesh Blocks and other spatial areas of the ASGS dataset, water data in the Bureau of Meteorology’s AWRIS system are keyed to catchments within the Geofabric dataset.

You can create your own observations data and key them to any datasets that exist within this Supermodel or to datasets that you make that are compatible with this Supermodel’s elements.
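
A hedged Turtle sketch of one such observation, keyed to a spatial Feature using SOSA (all ex: identifiers are hypothetical):

PREFIX ex: <http://example.com/>
PREFIX sosa: <http://www.w3.org/ns/sosa/>

ex:obs-01
    a sosa:Observation ;
    sosa:hasFeatureOfInterest ex:meshblock-001 ;  # a Feature within a Supermodel dataset
    sosa:observedProperty ex:population-count ;   # a hypothetical observable property
    sosa:hasSimpleResult 423 .                    # the observed value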

5. Background Models

Background Models are:

standard and common Semantic Web models used as "upper" or higher-order/abstract models for all other Supermodel models to conform to when modelling something within the Background Model’s purview.
Background Models definition from the Terms & Definitions section

5.1. List

The particular Background Models in this FSDF Supermodel are given in the table below, with a description of the conceptual area they cover indicated as 'Domain', to assist in assessment of their relevance to Supermodel data.

Table 3. Background Models
Background Model Reference Domain

Web Ontology Language

[OWL]

General Modelling: all the other Background Models are OWL models

schema.org

schema.org

General Modelling: Agents (People & Organisations), licensing etc.

Data Catalog Vocabulary

[DCAT]

Dataset Metadata

The Provenance Ontology

[PROV]

Data Metadata: attribution of data to owners/publishers etc. Data lineage: what things datasets derive from

GeoSPARQL 1.1

[GEO]

Spatiality: Feature/Geometry links, topological relations, spatial scalar values (e.g. area)

Data Cube Vocabulary

[QB]

Data Dimensions: for observations data, e.g. population census

Sensor, Observation, Sample, and Actuator (SOSA)

[SSN]

Data Dimensions: for spatial and natural-world data, e.g. from satellites

Simple Knowledge Organization System

[SKOS]

Vocabularies

Vocabulary Publications Profile of SKOS

[VOCPUB]

A profile of SKOS requiring certain properties for vocabularies and their elements

5.2. Domain Details

The domains for each of the Background Models are noted in the table above; here now are indicative models for each of them, indicating the main classes and properties of concern within each domain.

5.2.1. General Modelling

All of the Background Models, the Backbone Model and all other Supermodel models use the Web Ontology Language, OWL, for their modelling structures. While a description of OWL is out of scope for this document, a key is given below for the main OWL elements seen in subsequent figures.

Figure 7. Figure key for the OWL modelling used in subsequent Figures

OWL is itself built on [RDF] & [RDFS], but we only reference OWL in this Background Models section as OWL "covers" these lower-level models.

5.2.2. Dataset Metadata

Metadata for datasets within this Supermodel is based on the Data Catalog Vocabulary [DCAT] with elements of The Provenance Ontology [PROV] for a few purposes, such as Data/Agent relations. Note that DCAT recommends this use of PROV. The figure below gives an informal overview of the concerns in this domain.

Figure 8. Data Cataloguing Model, based on DCAT & PROV

Specific properties for dataset-level metadata, such as license, copyright notices, who the publisher is etc., are mostly taken from schema.org, which is a general-purpose OWL (or at least OWL-compatible!) vocabulary of classes and properties.

schema.org is also used for Agent/Agent relations, as per the figure below.

Figure 9. Organisation Model, based on schema.org

5.2.3. Spatiality

This Supermodel’s core concern of modelling spatiality is based on use of the GeoSPARQL 1.1 Standard [GEO], which concerns itself with the elements in the figure below. The figure is a partial reproduction of GeoSPARQL’s overview diagram.

Figure 10. Classes and properties of the GeoSPARQL model from [GEO], Figure 3

Essentially all spatial relations between objects and the associations of objects with spatiality (Features with Geometries) and the details of Geometry data are defined by GeoSPARQL.

5.2.4. Data Dimensions

The dimensions of data are, in general, modelled in relation to observations according to the Data Cube Vocabulary [QB]; however, the dimensions (observable properties) of spatial and real-world objects are modelled using the Sensor, Observation, Sample, and Actuator (SOSA) ontology within the Semantic Sensor Network [SSN] standard. SOSA is, within this Supermodel at least, a domain-specialised version of QB.

Figure 11. Observations Model, based on Data Cube Vocabulary overview
Figure 12. Observations Model for Spatial & Real-World Features

The separate modelling for spatial/real-world features' properties is due to widespread use of SOSA for observations in that domain, for example the Geoscience Australia Samples catalogue (http://sss.pid.geoscience.gov.au/sample/).

The net effect of both QB and SOSA is to define the data types for observations and the observable properties/dimensions that they are observations of.

5.2.5. Vocabularies

Many of the models in this Supermodel rely on vocabularies of individual items; for example, Data/Agent relations rely on vocabularies of Agent roles. When vocabularies are modelled, this Supermodel uses a profile of the Simple Knowledge Organization System, [SKOS], called VocPub [VOCPUB]. VocPub just requires certain metadata, allowed but not mandated by SKOS, to be present within vocabularies for data management purposes. At a whole-of-vocabulary level, VocPub requires very similar metadata to DCAT, thus a Vocabulary appears as a form of Dataset.

All the current FSDF vocabularies at https://linked.fsdf.org.au/vocab conform to VocPub.

Figure 13. Basic SKOS/VocPub data model
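
A minimal, hedged Turtle sketch of a vocabulary in this style follows; the ex: identifiers are hypothetical and [VOCPUB] should be consulted for the authoritative list of required properties:

PREFIX ex: <http://example.com/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

ex:agent-roles
    a skos:ConceptScheme ;
    skos:prefLabel "Agent Roles"@en ;
    skos:definition "A hypothetical vocabulary of Agent role types."@en ;
    skos:hasTopConcept ex:custodian .

ex:custodian
    a skos:Concept ;
    skos:prefLabel "custodian"@en ;
    skos:definition "The Agent responsible for the maintenance of a resource."@en ;
    skos:inScheme ex:agent-roles .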

6. Backbone Model

The Backbone Model of this Supermodel is the model to which all data MUST conform. The model is a profile of the Background Models, which means it implements no new modelling elements of its own but just constrains existing elements in the Background Models. Thus, anything that conforms to the Backbone Model will conform to the Background Models also. This form of profiling is formally defined in The Profiles Vocabulary [PROF].

This Backbone Model is quite "lite" in that it has comparatively few requirements. This is due to its role: to ensure that all data in the Supermodel exhibits a minimum set of properties and patterns for interoperability. The Backbone Model doesn’t try to model all things within all sub-domains of this Supermodel: that is the job of the various Component Models.

This Backbone Model is defined at https://linked.data.gov.au/def/fsdf-backbone.

The profile’s main elements are articulated here for completeness of documentation.

6.1. Definition

This is part of the formal definition of this profile:

<https://linked.data.gov.au/def/fsdf-backbone>
    a prof:Profile ;
    sdo:name "FSDF Backbone Model Profile" ;
    sdo:description "This is a profile of DCAT, GeoSPARQL & VocPub to be used as a conformance target for data within the FSDF Supermodel"@en ;
    prof:profileOf
        <https://www.w3.org/TR/vocab-dcat/> ,
        <http://www.opengis.net/doc/IS/geosparql/1.1> ,
        <https://www.w3.org/TR/vocab-data-cube/> ,
        <https://www.w3.org/TR/vocab-ssn/> ,
        <https://w3id.org/profile/vocpub> ;
    ...
.

This code identifies the profile, https://linked.data.gov.au/def/fsdf-backbone, and, apart from basic human-readable annotations, states that it is a profile of (profileOf) the main Background Models.

The full profile definition, online at https://linked.data.gov.au/def/fsdf-backbone, gives further details such as the listing of resources within the profile, which includes its specification & validators - also described below.

6.2. Requirements

Here the requirements for data to conform to this profile are listed. The Requirements are identified (GM1 etc.) and are referenced by validation rules in the Validation section below. Note that some of these Requirements reference whole other Standards and Profiles, so all the Requirements from those are relevant too.

The capitalised, italicised words such as MUST, MAY etc., have meanings as per [RFC2119].

The rdf namespace referred to is http://www.w3.org/1999/02/22-rdf-syntax-ns#.

Domain ID Name Definition

General Modelling

GM1

OWL Conformance

Data MUST conform to OWL

GM2

Class Modelling

Classes of object MUST be modelled as owl:Class instances

GM3

Property Modelling

Properties and predicates MUST be modelled as either rdf:Property instances or instances of the various OWL properties

Dataset Metadata

DM1

DCAT Conformance

Data MUST conform to DCAT

DM2

Dataset Mandatory Properties

dcat:Dataset instances MUST be presented with at least the following properties: dcterms:identifier - as an xsd:token value, dcterms:title & dcterms:description - as string or langString values, sdo:dateCreated & sdo:dateModified - as xsd:date, xsd:dateTime or xsd:dateTimeStamp values

DM3

Dataset Agents

dcat:Dataset instances MUST indicate at least creator & publisher Agents via use of the sdo:creator and sdo:publisher properties. The values for these properties MUST be IRIs, not strings

DM4

Dataset Provenance

dcat:Dataset instances SHOULD indicate provenance in one of three ways: if derived from an RDF dataset, then with prov:wasDerivedFrom - with an IRI value; if derived from an online but non-RDF data source, then with dcterms:source - with an xsd:anyURI value; and if neither, then with a written statement given with dcterms:provenance - a string or langString value

DM5

Dataset Spatiality

dcat:Dataset instances MUST indicate the total spatial footprint of the elements within them by indicating a geo:Geometry object via a geo:hasGeometry or geo:hasBoundingBox property

DM6

Dataset Feature Collections

dcat:Dataset instances MUST indicate at least one geo:FeatureCollection instance within them with the rdfs:member property

Spatiality

S1

GeoSPARQL Conformance

Spatial data MUST conform to the GeoSPARQL 1.1 Standard

S2

Feature Collection Mandatory Properties

geo:FeatureCollection instances MUST be presented with at least the following properties: dcterms:identifier - as an xsd:token value, dcterms:title & dcterms:description - as string or langString values

S3

Dataset Spatiality

geo:FeatureCollection instances MUST indicate the total spatial footprint of the elements within them by indicating a geo:Geometry object via a geo:hasGeometry or geo:hasBoundingBox property

S4

Feature Collection Features

geo:FeatureCollection instances MUST indicate at least one geo:Feature instance within them with the rdfs:member property

S5

Feature Mandatory Properties

geo:Feature instances MUST be presented with at least the following properties: dcterms:identifier - as an xsd:token value

Data Dimensions

DD1

Data Cube Vocabulary Conformance

Non-physical sciences observations data MUST conform to the Data Cube Vocabulary

DD2

SOSA Conformance

Physical sciences observations data MUST conform to the SOSA ontology

Vocabularies

V1

VocPub Conformance

Vocabularies MUST conform to the VocPub Profile of SKOS
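
To illustrate several of these Requirements together, here is a hedged Turtle sketch of a minimal conformant Dataset. All ex: identifiers are hypothetical, the geometry objects are left empty for brevity and the sdo: prefix is assumed to be the https://schema.org/ namespace:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ex: <http://example.com/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sdo: <https://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ex:dataset-x
    a dcat:Dataset ;
    dcterms:identifier "dataset-x"^^xsd:token ;            # DM2
    dcterms:title "Example Dataset"@en ;
    dcterms:description "A minimal example Dataset."@en ;
    sdo:dateCreated "2022-01-01"^^xsd:date ;
    sdo:dateModified "2022-06-01"^^xsd:date ;
    sdo:creator ex:agent-a ;                               # DM3: IRI values, not strings
    sdo:publisher ex:agent-a ;
    dcterms:provenance "Hand-made for this example."@en ;  # DM4: written statement option
    geo:hasBoundingBox [ a geo:Geometry ] ;                # DM5
    rdfs:member ex:fc-y .                                  # DM6

ex:fc-y
    a geo:FeatureCollection ;
    dcterms:identifier "fc-y"^^xsd:token ;                 # S2
    dcterms:title "Example Feature Collection"@en ;
    dcterms:description "A minimal example Feature Collection."@en ;
    geo:hasBoundingBox [ a geo:Geometry ] ;                # S3
    rdfs:member ex:feature-z .                             # S4

ex:feature-z
    a geo:Feature ;
    dcterms:identifier "feature-z"^^xsd:token .            # S5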

6.3. Validation

To prove that data does conform to this Backbone Model, it must be validated. Since all the expected data for this Supermodel is RDF data, SHACL [SHACL] validation may be used.

This Profile presents its own validator, which only includes tests for the rules specific to this profile and not those of the things this Profile profiles. However, a compounded validator is also given below, which includes this Profile’s validator and the validators of all the Standards and Profiles that this Profile profiles, where they have them. The Standards' and Profiles' validators are also listed individually.

Note
Since some of the Standards that this Profile profiles do not present SHACL validators, we use Null Profiles for them, where a Null Profile is a Profile that implements no constraints on the Standard it profiles and exists only to provide a validator for it.

For total validation, the compounded validator should be used. For partial validation, use each of the individual ones.
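
To give a sense of what a validator contains, below is an illustrative SHACL shape, in Turtle, of the kind that might implement Requirement S5. It is a hedged sketch only, not an excerpt from the published validators:

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ex: <http://example.com/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ex:FeatureShape
    a sh:NodeShape ;
    sh:targetClass geo:Feature ;      # applies to every geo:Feature instance in the data
    sh:property [
        sh:path dcterms:identifier ;
        sh:minCount 1 ;               # S5: an identifier is mandatory
        sh:datatype xsd:token ;       # S5: it must be an xsd:token value
        sh:message "Requirement S5: geo:Feature instances must have a dcterms:identifier xsd:token value" ;
    ] .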

6.3.1. Process

To validate RDF data, a SHACL validation tool, such as pySHACL (online tools for validation exist too, see [tooling] below), is used with the data to be validated and the validator as inputs. The data to be validated must include all the elements necessary for validation, for example, if a valid Dataset/Agent relation includes the requirement for the Agent to be classed as an sdo:Person or an sdo:Organization then the data to be validated must declare this classification, rather than leaving it up to external resources.

Note
Validators that find nothing to validate will return true, so if the data to be validated contains no instances of classes known to the validator, no sensible result will be obtained.

Regarding scale: validation is a resource-intensive task, so large datasets should not be validated without dedicated systems. It is probably appropriate to validate only a sample of a Dataset's contents, especially if the content is produced by a script or some other method that makes similar Feature Collections & Features.

6.3.2. Validators

Standard / Profile Validator IRI

Backbone Model

Backbone Model Validator

https://linked.data.gov.au/def/fsdf-backbone/validator

Backbone Model

Backbone Model Compounded Validator

https://linked.data.gov.au/def/fsdf-backbone/validator-compounded

DCAT

DCAT Null Profile Validator

https://w3id.org/profile/dcat-null

GeoSPARQL 1.1

GeoSPARQL Validator

http://www.opengis.net/def/geosparql/validator

Data Cube Vocabulary

Data Cube Vocabulary Null Profile Validator

https://w3id.org/profile/qb-null

SOSA

SOSA Null Profile Validator

https://w3id.org/profile/sosa-null

VocPub

VocPub Validator

https://w3id.org/profile/vocpub/validator

6.3.3. Tooling

Several online SHACL validation tools exist that may be used with the validators above:

We recommend either the public RDFTools Online tool, since it is actively maintained, and includes some of the validators listed above, or the Geoscience Australia copy of RDFTools with all the validators above preloaded:

7. Component Models

This section contains the details of the various Component Models used for the individual datasets within the FSDF Data Platform.

Several datasets in the Platform use only the Backbone Model for their model and thus require no specialised Component Model. It is expected that many straightforward spatial datasets may be created as plain Backbone Model-only datasets in this way if they only contain collections of named Features without special relationships or properties.

The table below lists the current major Datasets in the FSDF Data Platform and their Component Models or use of the Backbone Model.

Dataset Persistent Identifier Model Notes

Australian Statistical Geographies Standard, Edition 3

https://linked.data.gov.au/dataset/asgsed3

Backbone Model

Previous editions of the ASGS dataset used specialised Component Models, in particular the ASGS Ontology, but the current Edition 3 uses only the Backbone Model

Australian Hydrological Geospatial Fabric (Geofabric), v3

https://linked.data.gov.au/dataset/geofabric

Geofabric Ontology

The Geofabric Ontology is a very small specialisation of the Backbone Model: it declares specialised classes of geo:Feature (Catchment, RiverRegion etc.) but only one special property, hasDownstreamCatchment, for Catchment instances

(Australian) Geocoded National Address File (G-NAF)

https://linked.data.gov.au/dataset/gnaf

ANZ National Address Model

The G-NAF uses a highly specialised Component Model that models many aspects of Addresses

Placenames

https://linked.data.gov.au/dataset/placenames

Place Names Ontology

The Place Names Ontology originally made for the Loc-I Project is currently used; however, a future version might use parts of the ANZ National Address Model, which covers Place Names also

The table below lists the current smaller "FSDF" Datasets in the FSDF Data Platform.

Dataset Persistent Identifier Model Notes

Electrical Infrastructure

https://linked.data.gov.au/dataset/power-infrastructure

Backbone Model

Facilities

https://linked.data.gov.au/dataset/facilities

Backbone Model

Sandgate example

http://example.com/dataset/sandgate

Backbone Model

This dataset does not have a Persistent Identifier as it’s an example dataset only

8. Supporting Vocabularies

The vocabularies supporting the Component Models of the Datasets listed in the Section above all belong to one of the following types:

  1. Well-known, public vocabularies

    • For example, the vocabulary of Role types for Agents with respect to Datasets is the ISO’s Role Code vocabulary

  2. Vocabularies within Component Models

  3. Vocabularies created for Datasets within this Supermodel but not within Component Models

All vocabularies are discoverable by inspecting the data that refers to them. Those within Component Models are also discoverable via their models, and those in Category 3 are additionally discoverable via the FSDF Vocabulary Server:

References