Accessing RDF/XML/OWL file nodes using Perl


I have RDF/XML data that I'd like to parse so that I can access its nodes. It looks like this:

    <!-- http://purl.obolibrary.org/obo/VO_0000185 -->

    <owl:Class rdf:about="&obo;VO_0000185">
        <rdfs:label>influenza virus gene</rdfs:label>
        <rdfs:subClassOf rdf:resource="&obo;VO_0000156"/>
        <obo:IAO_0000117>yh</obo:IAO_0000117>
    </owl:Class>

    <!-- http://purl.obolibrary.org/obo/VO_0000186 -->

    <owl:Class rdf:about="&obo;VO_0000186">
        <rdfs:label>rna vaccine</rdfs:label>
        <owl:equivalentClass>
            <owl:Class>
                <owl:intersectionOf rdf:parseType="Collection">
                    <rdf:Description rdf:about="&obo;VO_0000001"/>
                    <owl:Restriction>
                        <owl:onProperty rdf:resource="&obo;BFO_0000161"/>
                        <owl:someValuesFrom rdf:resource="&obo;VO_0000728"/>
                    </owl:Restriction>
                </owl:intersectionOf>
            </owl:Class>
        </owl:equivalentClass>
        <rdfs:subClassOf rdf:resource="&obo;VO_0000001"/>
        <obo:IAO_0000116>using rna may eliminate problem of having tailor vaccine each individual patient specific immunity. advantage of rna can used immunity types , can taken single cell. dna vaccines need produce rna prompts manufacture of proteins. however, rna vaccine eliminates step dna rna.</obo:IAO_0000116>
        <obo:IAO_0000115>a vaccine uses rna(s) derived pathogen organism.</obo:IAO_0000115>
        <obo:IAO_0000117>yh</obo:IAO_0000117>
    </owl:Class>

The complete RDF/XML file can be found at http://svn.code.sf.net/p/vaccineontology/code/trunk/src/ontology/vo.owl.

What I want to do is the following:

  1. Find the chunk that contains the entry <rdfs:subClassOf rdf:resource="&obo;VO_0000001"/>
  2. Access the literal term defined by <rdfs:label>...</rdfs:label>

So in the above example, the code should go through the second chunk and output: "rna vaccine".

I'm stuck with the following code: it can't access the node. What's the right way to do it? Solutions other than using XML::LibXML are welcome.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;
    use Carp;
    use File::Basename;
    use XML::LibXML 1.70;

    # obtained from http://svn.code.sf.net/p/vaccineontology/code/trunk/src/ontology/vo.owl
    my $filename = "vo.owl";

    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_file( $filename );

    foreach my $chunk ($doc->findnodes('/owl:Class')) {
        my ($label)    = $chunk->findnodes('./rdfs:label');
        my ($subclass) = $chunk->findnodes('./rdfs:subClassOf');
        print $label->to_literal;
        print $subclass->to_literal;
    }
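The code above finds nothing for two reasons: the XPath '/owl:Class' only matches a root element (the owl:Class elements sit inside rdf:RDF), and prefixes such as owl: and rdfs: are not reliably resolved when the document node is the context, so they are safest registered on an XML::LibXML::XPathContext. Below is a minimal sketch of the XPath route with both fixed. It assumes the data keeps exactly the layout shown in the question and, so it runs on its own, uses a trimmed inline copy of the two classes instead of vo.owl. The DOCTYPE declaring &obo; is an assumption, though any file using &obo; must declare it somewhere. expand_entities => 1 is passed so that &obo; arrives as the full URI.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed inline copy of the question's data (assumption: &obo; is declared
# in an internal DTD subset, as it must be somewhere for the file to parse).
my $xml = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY obo "http://purl.obolibrary.org/obo/">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="&obo;VO_0000185">
    <rdfs:label>influenza virus gene</rdfs:label>
    <rdfs:subClassOf rdf:resource="&obo;VO_0000156"/>
  </owl:Class>
  <owl:Class rdf:about="&obo;VO_0000186">
    <rdfs:label>rna vaccine</rdfs:label>
    <rdfs:subClassOf rdf:resource="&obo;VO_0000001"/>
  </owl:Class>
</rdf:RDF>
XML

# expand_entities => 1: substitute &obo; so attribute values hold full URIs.
my $doc = XML::LibXML->load_xml(string => $xml, expand_entities => 1);

# Register every prefix used in the XPath expressions below.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(rdf  => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
$xpc->registerNs(rdfs => 'http://www.w3.org/2000/01/rdf-schema#');
$xpc->registerNs(owl  => 'http://www.w3.org/2002/07/owl#');

my $parent = 'http://purl.obolibrary.org/obo/VO_0000001';

# '//owl:Class' searches the whole tree; the predicate keeps only classes
# with an rdfs:subClassOf pointing at $parent. '@' is escaped inside qq{}.
my @labels;
for my $class ($xpc->findnodes(
        qq{//owl:Class[rdfs:subClassOf/\@rdf:resource = "$parent"]})) {
    push @labels, map { $_->textContent } $xpc->findnodes('rdfs:label', $class);
}

print "$_\n" for @labels;
```

This only works while the data keeps this particular shape, which is exactly the fragility the answer below describes.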

Parsing RDF as if it were XML is folly. The exact same data can appear in many different ways. For example, all of the following RDF files carry the same data, and any conforming RDF implementation must handle them identically...

    <!-- Example 1 -->
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description rdf:about="#me">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
        <foaf:name>Toby Inkster</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <!-- Example 2 -->
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <foaf:Person rdf:about="#me">
        <foaf:name>Toby Inkster</foaf:name>
      </foaf:Person>
    </rdf:RDF>

    <!-- Example 3 -->
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <foaf:Person rdf:about="#me" foaf:name="Toby Inkster" />
    </rdf:RDF>

    <!-- Example 4 -->
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description rdf:about="#me"
        rdf:type="http://xmlns.com/foaf/0.1/Person"
        foaf:name="Toby Inkster" />
    </rdf:RDF>

    <!-- Example 5 -->
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description rdf:ID="me">
        <rdf:type>
          <rdf:Description rdf:about="http://xmlns.com/foaf/0.1/Person" />
        </rdf:type>
        <foaf:name>Toby Inkster</foaf:name>
      </rdf:Description>
    </rdf:RDF>

    <!-- Example 6 -->
    <foaf:Person
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:foaf="http://xmlns.com/foaf/0.1/"
        rdf:about="#me"
        foaf:name="Toby Inkster" />

I could list half a dozen other variations too, but I'll stop there. And this RDF file contains just two statements (I'm a Person; my name is "Toby Inkster"), whereas the OP's data contains over 50,000 statements.
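To make the equivalence concrete, here is a minimal sketch that parses examples 1 and 3 from above with RDF::Trine's RDF/XML parser and compares the resulting statements. Example 1 is given a foaf namespace declaration so that it is well-formed. The sketch assumes RDF::Trine is installed and uses a hypothetical base URI, http://example.org/doc, to resolve "#me". Both documents should produce the same two statements even though their XML trees look nothing alike.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use RDF::Trine;

# Examples 1 and 3 from above (example 1 with xmlns:foaf declared).
my %docs = (
    example1 => <<'RDF',
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="#me">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
    <foaf:name>Toby Inkster</foaf:name>
  </rdf:Description>
</rdf:RDF>
RDF
    example3 => <<'RDF',
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="#me" foaf:name="Toby Inkster" />
</rdf:RDF>
RDF
);

# Hypothetical base URI: "#me" resolves to http://example.org/doc#me.
my $base = 'http://example.org/doc';

my %triples;
for my $name (sort keys %docs) {
    # Parse each document into its own in-memory model.
    my $model = RDF::Trine::Model->temporary_model;
    RDF::Trine::Parser->new('rdfxml')
        ->parse_into_model($base, $docs{$name}, $model);

    # Stringify every statement and sort, so statement order is irrelevant.
    my @statements;
    my $iter = $model->get_statements(undef, undef, undef);
    while (my $st = $iter->next) {
        push @statements, $st->as_string;
    }
    $triples{$name} = join "\n", sort @statements;
}

print $triples{example1}, "\n";
print $triples{example1} eq $triples{example3} ? "same graph\n" : "different graphs\n";
```

An XPath expression written against one of these shapes would miss the other. An RDF parser reduces both to the same statements.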

And that's just the XML serialization of RDF; there are other serializations too.

If you try handling all of that with XPath, you're likely to end up a lunatic locked away in a tower somewhere, muttering in your sleep: the triples; the triples...

Luckily, Greg Williams has taken that mental health bullet for you. RDF::Trine and RDF::Query are not only the best RDF frameworks for Perl; they're amongst the best in any programming language.

Here is how the OP's task could be achieved using RDF::Trine and RDF::Query:

    #!/usr/bin/env perl

    use v5.12;
    use RDF::Trine;
    use RDF::Query;

    # Persistent SQLite-backed store, so the file is only downloaded once.
    my $model = 'RDF::Trine::Model'->new(
        'RDF::Trine::Store::DBI'->new(
            'vo',
            'dbi:SQLite:dbname=/tmp/vo.sqlite',
            '',  # no username
            '',  # no password
        ),
    );

    'RDF::Trine::Parser::RDFXML'->new->parse_url_into_model(
        'http://svn.code.sf.net/p/vaccineontology/code/trunk/src/ontology/vo.owl',
        $model,
    ) unless $model->size > 0;

    my $query = RDF::Query->new(<<'SPARQL');
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?super_label ?sub_label
    {
        ?sub rdfs:subClassOf ?super .
        ?sub rdfs:label ?sub_label .
        ?super rdfs:label ?super_label .
    }
    LIMIT 5
    SPARQL

    print $query->execute($model)->as_string;

Sample output:

    +----------------------------+----------------------------------+
    | super_label                | sub_label                        |
    +----------------------------+----------------------------------+
    | "aves vaccine"             | "ducks vaccine"                  |
    | "route of administration"  | "intravaginal route"             |
    | "shigella gene"            | "aroa shigella"                  |
    | "papillomavirus vaccine"   | "bovine papillomavirus vaccine"  |
    | "virus protein"            | "feline leukemia virus protein"  |
    +----------------------------+----------------------------------+

Update: here's a SPARQL query that can be plugged into the script above to retrieve the data you wanted:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX obo:  <http://purl.obolibrary.org/obo/>
    SELECT ?subclass ?label
    {
        ?subclass
            rdfs:subClassOf obo:VO_0000001 ;
            rdfs:label ?label .
    }
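For a self-contained check of that query, here is a minimal sketch that swaps the SQLite store and the remote download for RDF::Trine's in-memory temporary_model. It parses a hypothetical two-class excerpt of vo.owl, with &obo; written out as the full URI so that no DTD is needed. It then reads each ?label binding with literal_value. It assumes RDF::Trine and RDF::Query are installed. The WHERE keyword is optional in SPARQL and is included here only for readability.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use RDF::Trine;
use RDF::Query;

# Hypothetical two-class excerpt of vo.owl; &obo; expanded by hand.
my $rdfxml = <<'RDF';
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/VO_0000185">
    <rdfs:label>influenza virus gene</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/VO_0000156"/>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/VO_0000186">
    <rdfs:label>rna vaccine</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/VO_0000001"/>
  </owl:Class>
</rdf:RDF>
RDF

# In-memory model instead of the SQLite-backed store.
my $model = RDF::Trine::Model->temporary_model;
RDF::Trine::Parser->new('rdfxml')
    ->parse_into_model('http://purl.obolibrary.org/obo/', $rdfxml, $model);

my $query = RDF::Query->new(<<'SPARQL') or die "could not parse SPARQL query";
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo:  <http://purl.obolibrary.org/obo/>
SELECT ?subclass ?label WHERE {
    ?subclass rdfs:subClassOf obo:VO_0000001 ;
              rdfs:label ?label .
}
SPARQL

# Each result row maps variable names (without '?') to RDF::Trine nodes.
my @labels;
my $results = $query->execute($model);
while (my $row = $results->next) {
    push @labels, $row->{label}->literal_value;
}

print "$_\n" for @labels;
```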
