regex - A simple XML parser specific to R
I've read the questions on why you should never use regex on {ht,x}ml, such as this one -- "regex indent xml file" -- but I thought I'd post a function I wrote that does absolutely nothing except indent XML lines based on their levels of subordination.
To meet the guidelines of SO, I'll Jeopardy-ize my solution :-) , so--
What will go wrong when I start using this function to format XML files that some unnamed bad person sent me sans indents?
    xmlit <- function(x, indchar = '\t') {
        # Require x to be a vector of char strings, one
        # per line of the XML file.
        # Add an indent for every line below one starting "<[^/?!]", and
        # remove an indent for every line below "</"

        indit <- ''
        y <- vector('character', length(x))
        for (j in 1:length(x)) {
            # first add whatever indent we're at
            y[j] <- paste(indit, x[j], collapse = '', sep = '')
            # check for openers: '<' but not '</' or '/>'
            if (grepl('<[^/?!]', x[j]) & !grepl('/>', x[j]) & !grepl('</', x[j])) {
                indit <- paste(indit, indchar, collapse = '', sep = '')
            } else {
                # check for closers: '</'
                if (grepl('<[/]', x[j]) & !grepl('<[^/?!]', x[j])) {
                    # move the existing line out by one indent
                    y[j] <- substr(y[j], 2, 1000)
                    indit <- substr(indit, 2, 1000)
                }
            }
        }
        # Note that I'm depending on every level to have a matching closer,
        # and in particular on the last line being a closer.
        return(invisible(y))
    }
There is an assumption that an opening tag must be the first thing on a line. If it is not, there are problems:
    > cat(xmlit(c("<begin>","<foo/><begin>","</begin>","</begin>")), sep="\n")
    <begin>
        <foo/><begin>
    </begin>
    /begin>
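In that output, the line `<foo/><begin>` contains `/>`, so it is never counted as an opener. The second `</begin>` then strips a character from an indent that is already empty, which eats the `<`. One way to relax the one-tag-per-line assumption while staying with regular expressions is to count every opener and closer on a line and track a numeric depth. The following is only a minimal sketch: `xml_indent_by_depth` is a hypothetical name, not part of the original post. It still assumes there are no comments, CDATA sections, or attribute values containing `/>`.

```r
# Hypothetical helper (not from the original post): indent by net tag depth.
xml_indent_by_depth <- function(x, indchar = "\t") {
  # Count the matches of a pattern in each element of s.
  count <- function(pattern, s) {
    m <- gregexpr(pattern, s, perl = TRUE)
    vapply(m, function(hit) sum(hit > 0), integer(1))
  }
  x <- trimws(x)
  closers   <- count("</", x)                     # closing tags: </name>
  selfclose <- count("/>", x)                     # self-closing tags: <name/>
  openers   <- count("<[^/?!]", x) - selfclose    # real openers: <name ...>
  leading_close <- grepl("^</", x)

  depth <- 0L
  y <- character(length(x))
  for (j in seq_along(x)) {
    # A line that starts with a closer is printed one level further out.
    here <- max(depth - leading_close[j], 0L)
    y[j] <- paste0(strrep(indchar, here), x[j])
    # Never let the depth go negative, even on unbalanced input.
    depth <- max(depth + openers[j] - closers[j], 0L)
  }
  y
}

out <- xml_indent_by_depth(c("<begin>", "<foo/><begin>", "</begin>", "</begin>"))
cat(out, sep = "\n")
```

Because the depth is an integer and the indent is rebuilt with `strrep()`, a stray closer can no longer chop characters off the line itself, and `indchar` may be longer than one character.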
For a subset of XML with enough assumptions about (additional) structure, regular expressions can work. If the assumptions are violated, well, that's why there are parsers.
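As an illustration of the parser route, here is a minimal sketch that assumes the `xml2` package is installed (it is not mentioned in the original post). It parses the whole document and lets the serialiser handle the indentation, so it does not matter where the tags fall on a line. The exact whitespace of the result depends on `xml2` and the underlying libxml2.

```r
# Assumption: the xml2 package is available; skip quietly if it is not.
lines <- c("<begin>", "<foo/><begin>", "</begin>", "</begin>")

if (requireNamespace("xml2", quietly = TRUE)) {
  # Parse the whole document instead of inspecting it line by line.
  doc <- xml2::read_xml(paste(lines, collapse = "\n"))

  # as.character() serialises the parsed tree; formatting is on by default.
  formatted <- as.character(doc)
  cat(formatted)
}
```

Unlike the line-based approach, this fails with an error on input that is not well formed, which is usually what you want to know about.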