regex - A simple XML parser specific to R


I've read the questions on why you should never use regex on {HT,X}ML, such as this one -- Regex to indent an XML file -- but thought I'd post a function I wrote that does absolutely nothing but indent XML lines based on their levels of subordination.

To meet the guidelines of SO, I'll Jeopardy-ize my solution :-) , so--

What will go wrong when I start using this function to format XML files that some unnamed bad person sent me sans indents?

    xmlit <- function(x, indchar = '\t') {
      # Requires x to be a vector of character strings, one
      # per line of the XML file.
      # Adds an indent for every line below one containing an opener
      # ("<" not followed by "/", "?" or "!"), and removes an indent
      # for every line below a "</".
      indit <- ''
      y <- vector('character', length(x))
      for (j in seq_along(x)) {
        # First add whatever indent we're at.
        y[j] <- paste(indit, x[j], collapse = '', sep = '')
        # Check for openers: '<' but not '</' or '/>'.
        if (grepl('<[^/?!]', x[j]) & !grepl('/>', x[j]) & !grepl('</', x[j])) {
          indit <- paste(indit, indchar, collapse = '', sep = '')
        } else {
          # Check for closers: '</'.
          if (grepl('<[/]', x[j]) & !grepl('<[^/?!]', x[j])) {
            # Move the existing line back out one indent.
            y[j] <- substr(y[j], 2, 1000)
            indit <- substr(indit, 2, 1000)
          }
        }
      }
      # Note that I'm depending on every level to have a matching closer,
      # and in particular on the last line being a closer.
      return(invisible(y))
    }

There is an assumption that an opening tag must be the first thing on a line. If it is not, there are problems:

    > cat(xmlit(c("<begin>","<foo/><begin>","</begin>","</begin>")), sep="\n")
    <begin>
            <foo/><begin>
    </begin>
    /begin>

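The "/>" on the second line hides the "<begin>" that follows it, so the indent is never increased for the nested element, and the last closer then loses its leading "<" instead of an indent character.

For a subset of XML with enough assumptions about (additional) structure, regular expressions can work. If those assumptions are violated, well, that's why there are parsers.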
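To show how much the result depends on those assumptions, here is a minimal sketch that relaxes just one of them: it counts every tag on a line instead of expecting a single tag per line. The name xmlit_count is hypothetical and not part of the original post. It is still regex-based, so comments, CDATA sections, and tags split across lines would likely break it.

```r
# Hypothetical helper (not from the original post): indent by counting
# every tag on a line rather than assuming one tag per line.
xmlit_count <- function(x, indchar = "\t") {
  depth <- 0L
  y <- character(length(x))
  for (j in seq_along(x)) {
    line <- trimws(x[j])
    # Pull out every "<...>" token on this line.
    tags <- regmatches(line, gregexpr("<[^>]*>", line))[[1]]
    closers <- sum(grepl("^</", tags))
    # Openers: not "</", "<?" or "<!", and not self-closing ("/>").
    openers <- sum(grepl("^<[^/?!]", tags) & !grepl("/>$", tags))
    # A line that starts with a closer is printed one level shallower.
    lead <- if (grepl("^</", line)) 1L else 0L
    y[j] <- paste0(strrep(indchar, max(depth - lead, 0L)), line)
    # Never let the depth go negative on unbalanced input.
    depth <- max(depth + openers - closers, 0L)
  }
  y
}

cat(xmlit_count(c("<begin>", "<foo/><begin>", "</begin>", "</begin>")), sep = "\n")
```

On the four-line input from the example, this should keep the final "</begin>" intact and indent the inner closer by one level. A real XML parser remains the safer choice for anything beyond such simple files.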

