Accessing RDF/XML/OWL file nodes using Perl
I have RDF/XML data that I'd like to parse so I can access a node in it. It looks like this:
    <!-- http://purl.obolibrary.org/obo/VO_0000185 -->
    <owl:Class rdf:about="&obo;VO_0000185">
        <rdfs:label>influenza virus gene</rdfs:label>
        <rdfs:subClassOf rdf:resource="&obo;VO_0000156"/>
        <obo:IAO_0000117>YH</obo:IAO_0000117>
    </owl:Class>

    <!-- http://purl.obolibrary.org/obo/VO_0000186 -->
    <owl:Class rdf:about="&obo;VO_0000186">
        <rdfs:label>RNA vaccine</rdfs:label>
        <owl:equivalentClass>
            <owl:Class>
                <owl:intersectionOf rdf:parseType="Collection">
                    <rdf:Description rdf:about="&obo;VO_0000001"/>
                    <owl:Restriction>
                        <owl:onProperty rdf:resource="&obo;BFO_0000161"/>
                        <owl:someValuesFrom rdf:resource="&obo;VO_0000728"/>
                    </owl:Restriction>
                </owl:intersectionOf>
            </owl:Class>
        </owl:equivalentClass>
        <rdfs:subClassOf rdf:resource="&obo;VO_0000001"/>
        <obo:IAO_0000116>Using RNA may eliminate the problem of having to tailor a vaccine for each individual patient with their specific immunity. The advantage of RNA is that it can be used for all immunity types and can be taken from a single cell. DNA vaccines need to produce RNA, which then prompts the manufacture of proteins. However, an RNA vaccine eliminates the step from DNA to RNA.</obo:IAO_0000116>
        <obo:IAO_0000115>A vaccine that uses RNA(s) derived from a pathogen organism.</obo:IAO_0000115>
        <obo:IAO_0000117>YH</obo:IAO_0000117>
    </owl:Class>
The complete RDF/XML file can be found here.
What I want to do is the following:
- Find the chunk that contains the entry
    <rdfs:subClassOf rdf:resource="&obo;VO_0000001"/>
- Access the literal term defined by
    <rdfs:label>...</rdfs:label>
So in the example above, the code would go through the second chunk and output: "RNA vaccine".
I'm stuck with the following code. I couldn't access the node. What's the right way to do it? Solutions other than using XML::LibXML are welcome.
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;
    use Carp;
    use File::Basename;
    use XML::LibXML 1.70;

    # Obtained from http://svn.code.sf.net/p/vaccineontology/code/trunk/src/ontology/vo.owl
    my $filename = "vo.owl";

    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_file( $filename );

    foreach my $chunk ($doc->findnodes('/owl:Class')) {
        my ($label)    = $chunk->findnodes('./rdfs:label');
        my ($subclass) = $chunk->findnodes('./rdfs:subClassOf');
        print $label->to_literal;
        print $subclass->to_literal;
    }
Parsing RDF as if it were XML is folly. The exact same data can appear in many different ways. For example, all of the following RDF files carry the same data. Any conforming RDF implementation must handle them identically...
    <!-- example 1 -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
        <rdf:Description rdf:about="#me">
            <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
            <foaf:name>Toby Inkster</foaf:name>
        </rdf:Description>
    </rdf:RDF>

    <!-- example 2 -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
        <foaf:Person rdf:about="#me">
            <foaf:name>Toby Inkster</foaf:name>
        </foaf:Person>
    </rdf:RDF>

    <!-- example 3 -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
        <foaf:Person rdf:about="#me" foaf:name="Toby Inkster" />
    </rdf:RDF>

    <!-- example 4 -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
        <rdf:Description rdf:about="#me"
                         rdf:type="http://xmlns.com/foaf/0.1/Person"
                         foaf:name="Toby Inkster" />
    </rdf:RDF>

    <!-- example 5 -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
        <rdf:Description rdf:ID="me">
            <rdf:type>
                <rdf:Description rdf:about="http://xmlns.com/foaf/0.1/Person" />
            </rdf:type>
            <foaf:name>Toby Inkster</foaf:name>
        </rdf:Description>
    </rdf:RDF>

    <!-- example 6 -->
    <foaf:Person xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:foaf="http://xmlns.com/foaf/0.1/"
                 rdf:about="#me"
                 foaf:name="Toby Inkster" />
I could list half a dozen other variations too, but I'll stop there. And this RDF file contains just two statements - I'm a Person; my name is "Toby Inkster" - whereas the OP's data contains over 50,000 statements.
And that is just the XML serialization of RDF; there are other serializations too.
If you try handling all of that with XPath, you're likely to end up a lunatic locked away in a tower somewhere, muttering in your sleep about the triples; the triples...
Luckily, Greg Williams has taken that mental health bullet for you. RDF::Trine and RDF::Query are not only the best RDF frameworks for Perl; they're amongst the best in any programming language.
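To make the point about equivalent serializations concrete, here is a minimal sketch, assuming RDF::Trine is installed. It parses two of the variants listed above (the nested-element form and the attribute form) and compares the resulting triples as sorted N-Triples lines. The base URI `http://example.org/toby` and the helper name `triples_of` are made up for illustration; a base is only needed to resolve the relative `#me`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use RDF::Trine;

# Hypothetical base URI (an assumption), used to resolve rdf:about="#me".
my $base = 'http://example.org/toby';

# Variant with the name as a child element.
my $nested = <<'XML';
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="#me">
    <foaf:name>Toby Inkster</foaf:name>
  </foaf:Person>
</rdf:RDF>
XML

# Variant with the name as a property attribute.
my $attribute = <<'XML';
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="#me" foaf:name="Toby Inkster" />
</rdf:RDF>
XML

# Parse RDF/XML into a temporary model and return its triples
# as a sorted, newline-joined N-Triples string.
sub triples_of {
    my ($rdfxml) = @_;
    my $model = RDF::Trine::Model->temporary_model;
    RDF::Trine::Parser->new('rdfxml')->parse_into_model($base, $rdfxml, $model);
    my $ntriples = RDF::Trine::Serializer->new('ntriples')
                                         ->serialize_model_to_string($model);
    return join "\n", sort split /\n/, $ntriples;
}

my $same = triples_of($nested) eq triples_of($attribute);
print $same ? "same triples\n" : "different triples\n";
```

A plain string comparison is enough here only because neither variant introduces blank nodes; documents with blank nodes would need a proper graph-equivalence check instead.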
Here is how the OP's task could be achieved using RDF::Trine and RDF::Query:
    #!/usr/bin/env perl
    use v5.12;
    use RDF::Trine;
    use RDF::Query;

    # Keep the parsed triples in an SQLite-backed store so the file
    # only has to be downloaded and parsed once.
    my $model = 'RDF::Trine::Model'->new(
        'RDF::Trine::Store::DBI'->new(
            'vo',
            'dbi:SQLite:dbname=/tmp/vo.sqlite',
            '',  # no username
            '',  # no password
        ),
    );

    # Load the ontology only if the store is still empty.
    'RDF::Trine::Parser::RDFXML'->new->parse_url_into_model(
        'http://svn.code.sf.net/p/vaccineontology/code/trunk/src/ontology/vo.owl',
        $model,
    ) unless $model->size > 0;

    my $query = RDF::Query->new(<<'SPARQL');
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?super_label ?sub_label
    {
        ?sub rdfs:subClassOf ?super .
        ?sub rdfs:label ?sub_label .
        ?super rdfs:label ?super_label .
    }
    LIMIT 5
    SPARQL

    print $query->execute($model)->as_string;
Sample output:
    +----------------------------+----------------------------------+
    | super_label                | sub_label                        |
    +----------------------------+----------------------------------+
    | "aves vaccine"             | "ducks vaccine"                  |
    | "route of administration"  | "intravaginal route"             |
    | "shigella gene"            | "aroa shigella"                  |
    | "papillomavirus vaccine"   | "bovine papillomavirus vaccine"  |
    | "virus protein"            | "feline leukemia virus protein"  |
    +----------------------------+----------------------------------+
Update: here's a SPARQL query that can be plugged into the script above to retrieve the data you wanted:
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX obo: <http://purl.obolibrary.org/obo/>
    SELECT ?subclass ?label
    {
        ?subclass rdfs:subClassOf obo:VO_0000001 ;
                  rdfs:label ?label .
    }
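For readers who want the labels as plain Perl strings rather than a printed table, the following is a rough sketch of iterating over the query results, assuming RDF::Trine and RDF::Query are installed. To stay self-contained it parses a cut-down inline stand-in for the ontology instead of downloading vo.owl. The inline data, the base URI, the helper name `labels_of_subclasses`, and the upper-case spelling of the class IRIs (`VO_0000001`) are all assumptions made for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use RDF::Trine;
use RDF::Query;

# Reduced stand-in for vo.owl (an assumption): two classes, full IRIs
# instead of the &obo; entity so that no DOCTYPE is needed.
my $rdfxml = <<'XML';
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/VO_0000185">
    <rdfs:label>influenza virus gene</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/VO_0000156"/>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/VO_0000186">
    <rdfs:label>RNA vaccine</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/VO_0000001"/>
  </owl:Class>
</rdf:RDF>
XML

# In-memory model; the base URI is a placeholder and unused by this data.
my $model = RDF::Trine::Model->temporary_model;
RDF::Trine::Parser->new('rdfxml')
    ->parse_into_model('http://example.org/', $rdfxml, $model);

# Return the rdfs:label values of every direct subclass of VO_0000001.
sub labels_of_subclasses {
    my ($model) = @_;
    my $query = RDF::Query->new(<<'SPARQL') or die RDF::Query->error;
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT ?subclass ?label
WHERE {
    ?subclass rdfs:subClassOf obo:VO_0000001 ;
              rdfs:label ?label .
}
SPARQL
    my @labels;
    my $results = $query->execute($model);
    while (my $row = $results->next) {
        # Each row maps variable names to RDF::Trine nodes.
        push @labels, $row->{label}->literal_value;
    }
    return @labels;
}

print "$_\n" for labels_of_subclasses($model);
```

Against the real ontology, the same function should work with the SQLite-backed model from the script above in place of the temporary one. Note that the query only follows direct `rdfs:subClassOf` links, not transitive ones.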