@cristianvasquez
Last active April 15, 2025 11:37
Transforming XML to RDF through RDB

When generating notice RDF, the ted-rdf-mapping-eforms repository maps N source schemas (the eForms SDK versions) to a single target ontology (ePO v4.0), resulting in N individual mappings. Every update to the ontology requires updating all N mappings, which adds complexity and cost.

Introducing an intermediate model as a pivot simplifies ontology upgrades. The idea is to map the N source schemas to the pivot, and then map the pivot to RDF. During an ontology upgrade, this reduces the number of mappings to maintain to one: the pivot-to-RDF mapping.
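A back-of-the-envelope model makes the trade-off explicit. It is a sketch that assumes the pivot schema itself stays stable across ontology versions; `MaintenanceCost`, `direct_cost`, and `pivot_cost` are illustrative names, not part of any existing codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaintenanceCost:
    total_mappings: int      # mappings that exist and must be kept working
    touched_on_upgrade: int  # mappings to rewrite for a new ontology version


def direct_cost(n_schemas: int) -> MaintenanceCost:
    # N schema-to-ontology mappings; an ontology upgrade touches every one.
    return MaintenanceCost(total_mappings=n_schemas, touched_on_upgrade=n_schemas)


def pivot_cost(n_schemas: int) -> MaintenanceCost:
    # N schema-to-pivot mappings plus one pivot-to-RDF mapping; assuming the
    # pivot schema is stable, an ontology upgrade touches only the last one.
    return MaintenanceCost(total_mappings=n_schemas + 1, touched_on_upgrade=1)


print(direct_cost(10))  # MaintenanceCost(total_mappings=10, touched_on_upgrade=10)
print(pivot_cost(10))   # MaintenanceCost(total_mappings=11, touched_on_upgrade=1)
```

The pivot adds one mapping overall (N + 1 instead of N), but an ontology upgrade then touches a single mapping instead of N. A new SDK version still needs its own source-side mapping in either design.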

One approach for this pivot model is to use an in-memory relational database (RDB), where the XML data is mapped to relational tables. RDF can then be produced from the relational schema using a mapping language such as R2RML.

This method is expected to improve both performance and the clarity of the mapping process.

Implementation

  • Initialization
    • Set up an in-memory relational database, prepopulating it with tables that contain controlled vocabularies.
  • Populating Data
    • Traverse the XML data to fill the corresponding database tables.
      • Converting XML into relational tables is a well-known problem, and tools exist to automate this process, often utilizing XSD files. Ideally, a declarative language should be used for this translation.
  • Mapping to RDF
    • Convert the populated relational tables into RDF triples; this step is O(n) in the size of the data.
      • During development, tools like Ontop can be used to execute SPARQL queries directly against the relational database.
      • While I haven't used the latest version of R2RML, my previous experience with D2RQ yielded excellent results.
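The three implementation steps above can be sketched end to end with only the Python standard library. This is an illustration under stated assumptions, not the proposed implementation: an in-memory SQLite database stands in for the RDB, a small declarative rule table (`XML_RULES`, hypothetical) drives the XML-to-table step, and an R2RML-inspired mapping written as plain Python dicts (not actual R2RML Turtle) emits N-Triples. The XML, table names, and `example.org` IRIs are placeholders, not eForms or ePO structures; in practice the last step would be handed to an R2RML processor or to Ontop.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified notice XML; real eForms notices use UBL namespaces
# and a much richer structure.
NOTICE_XML = """<Notice id="N-1">
  <Lot id="LOT-0001"><Title>Road maintenance</Title><Nature>works</Nature></Lot>
  <Lot id="LOT-0002"><Title>Snow removal</Title><Nature>services</Nature></Lot>
</Notice>"""
EX = "http://example.org/"  # placeholder namespace, not the ePO IRIs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def initialise(conn):
    """Initialisation: create tables and prepopulate a controlled vocabulary."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE contract_nature (code TEXT PRIMARY KEY, iri TEXT NOT NULL);
        CREATE TABLE notice (id TEXT PRIMARY KEY);
        CREATE TABLE lot (id TEXT PRIMARY KEY,
                          notice_id TEXT NOT NULL REFERENCES notice(id),
                          title TEXT,
                          nature TEXT REFERENCES contract_nature(code));""")
    conn.executemany("INSERT INTO contract_nature VALUES (?, ?)",
                     [(c, f"{EX}contract-nature/{c}")
                      for c in ("works", "services", "supplies")])

# Declarative XML-to-table rules: table -> (path selecting one element per row,
# {column: "@attr" for an attribute, or a child path for its text}).
XML_RULES = {"lot": ("Lot", {"id": "@id", "title": "Title", "nature": "Nature"})}

def extract(elem, path):
    return elem.get(path[1:]) if path.startswith("@") else elem.findtext(path)

def populate(conn, xml_text):
    """Populating data: insert one row per element matched by XML_RULES."""
    root = ET.fromstring(xml_text)
    notice_id = root.get("id")
    conn.execute("INSERT INTO notice (id) VALUES (?)", (notice_id,))
    for table, (row_path, columns) in XML_RULES.items():
        for elem in root.iterfind(row_path):
            row = {col: extract(elem, p) for col, p in columns.items()}
            row["notice_id"] = notice_id  # every table here hangs off the notice
            # Identifiers come from XML_RULES, never from the input document.
            sql = (f"INSERT INTO {table} ({', '.join(row)}) "
                   f"VALUES ({', '.join('?' for _ in row)})")
            conn.execute(sql, tuple(row.values()))

# R2RML-inspired (not actual R2RML) triples map: SQL query, subject template,
# rdf:type, and predicate -> (column, "iri" or "literal").
TRIPLES_MAPS = [{
    "sql": "SELECT l.id, l.title, cn.iri AS nature_iri FROM lot l "
           "LEFT JOIN contract_nature cn ON cn.code = l.nature",
    "subject": EX + "lot/{id}",
    "class": EX + "Lot",
    "predicates": {EX + "title": ("title", "literal"),
                   EX + "contractNature": ("nature_iri", "iri")},
}]

def literal(value):
    """Plain N-Triples string literal with minimal escaping."""
    v = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + v.replace("\n", "\\n").replace("\r", "\\r") + '"'

def to_ntriples(conn):
    """Mapping to RDF: a single pass over each query result."""
    conn.row_factory = sqlite3.Row
    triples = []
    for tm in TRIPLES_MAPS:
        for row in conn.execute(tm["sql"]):
            s = "<" + tm["subject"].format(**dict(row)) + ">"
            triples.append(f"{s} <{RDF_TYPE}> <{tm['class']}> .")
            for pred, (col, kind) in tm["predicates"].items():
                if row[col] is None:  # as in R2RML, NULLs produce no triple
                    continue
                obj = f"<{row[col]}>" if kind == "iri" else literal(row[col])
                triples.append(f"{s} <{pred}> {obj} .")
    return triples

def xml_to_ntriples(xml_text):
    conn = sqlite3.connect(":memory:")
    initialise(conn)
    populate(conn, xml_text)
    return to_ntriples(conn)

print("\n".join(xml_to_ntriples(NOTICE_XML)))
```

Keeping both the XML-to-table rules and the triples map as data rather than code mirrors the preference for declarative languages in the steps above: an ontology upgrade would, in this sketch, only change `TRIPLES_MAPS`.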

Pros and Cons

Pros

  • Lower Migration Costs: When updating to a new ontology version, only the R2RML mapping needs to be modified, rather than N individual mappings.
  • Built-in Data Integrity: Database constraints, such as foreign keys tied to controlled vocabularies, help ensure data integrity during the transformation process.
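As a small illustration of the data-integrity point, the sketch below uses SQLite purely as a convenient in-memory engine; the `contract_nature` and `lot` tables are hypothetical. A lot whose contract-nature code is missing from the prepopulated vocabulary is rejected at insert time. Note that SQLite does not enforce foreign keys by default; they must be enabled on each connection.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only once this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE contract_nature (code TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE lot (id TEXT PRIMARY KEY, "
                 "nature TEXT NOT NULL REFERENCES contract_nature(code))")
    # Prepopulate the controlled vocabulary.
    conn.executemany("INSERT INTO contract_nature VALUES (?)",
                     [("works",), ("services",), ("supplies",)])
    return conn


def try_insert(conn, lot_id, nature):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO lot VALUES (?, ?)", (lot_id, nature))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(try_insert(conn, "LOT-0001", "works"))  # True: known code
print(try_insert(conn, "LOT-0002", "work"))   # False: typo rejected by the FK
```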

Cons

  • XML-to-Relational Challenges: It remains to be seen whether all SDK versions can be translated effectively into standard relational tables, and further investigation is needed to confirm full compatibility. As a first experiment, I asked ChatGPT to derive a relational DB schema from the XML; its proposal appears to make sense.
cristianvasquez commented Apr 15, 2025

Bonus: Generating RDF via HTML

One of the simplest ways to extract RDF is to leverage the HTML representation of a notice.

Step-by-step:

  1. Fetch the HTML of the notice page, e.g., https://ted.europa.eu/en/notice/-/detail/246765-2025.
  2. Process the HTML to inject embedded RDFa (RDF-in-attributes) containing semantic annotations aligned with ePO.
  3. Triplify the document using these annotations to generate RDF data.
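Steps 2 and 3 can be sketched with the standard library, assuming the notice HTML is well-formed enough for `xml.etree.ElementTree`; real TED pages would likely need a proper HTML parser. The CSS class names, the injection `RULES`, and the `example.org` vocabulary are hypothetical stand-ins for the actual page markup and ePO terms. The extractor handles only a tiny subset of RDFa (`vocab`, `about`, `typeof`, and `property` with text content), so a conforming RDFa processor should be used for real output.

```python
import xml.etree.ElementTree as ET

VOCAB = "http://example.org/vocab#"  # placeholder for the ePO namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical, simplified notice markup (assumed to be well-formed XHTML).
NOTICE_HTML = """<div class="notice">
  <h1 class="notice-title">Road maintenance</h1>
  <p>Buyer: <span class="buyer-name">City of Example</span></p>
</div>"""

# Step 2 rules: CSS class -> RDFa attributes to add (hypothetical classes).
RULES = {
    "notice": {"vocab": VOCAB, "typeof": "Notice"},
    "notice-title": {"property": "title"},
    "buyer-name": {"property": "buyerName"},
}


def inject_rdfa(html, notice_iri):
    """Step 2: add RDFa attributes to elements whose class matches a rule."""
    root = ET.fromstring(html)
    for elem in root.iter():
        for cls in elem.get("class", "").split():
            for attr, value in RULES.get(cls, {}).items():
                elem.set(attr, value)
            if cls == "notice":
                elem.set("about", notice_iri)  # the notice is the subject
    return root


def triplify(elem, subject=None, vocab=None, out=None):
    """Step 3: collect (subject, predicate, object) triples from a tiny RDFa
    subset: vocab, about, typeof, and property with plain-text content."""
    out = [] if out is None else out
    vocab = elem.get("vocab", vocab)
    subject = elem.get("about", subject)
    if subject and vocab:
        if elem.get("typeof"):
            out.append((subject, RDF_TYPE, vocab + elem.get("typeof")))
        if elem.get("property"):
            text = "".join(elem.itertext()).strip()
            out.append((subject, vocab + elem.get("property"), text))
    for child in elem:  # children inherit the current subject and vocab
        triplify(child, subject, vocab, out)
    return out


doc = inject_rdfa(NOTICE_HTML, "http://example.org/notice/1")
for triple in triplify(doc):
    print(triple)
```

Keeping the injection rules in one table means that a change in the HTML view ideally only requires updating those rules, which limits the impact of the main drawback listed below.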

Pros:

  • Easy to implement.
  • No need to manage different notice versions, because that work is already done by the team that produces the HTML.

Cons:

  • Relies on the current structure of the HTML view, which may change over time.
