regex - A simple XML parser specific to R


I've read the questions on why you should never use regex on {ht,x}ml, such as this one (Regex to indent an XML file), but I thought I'd post a function I wrote that does absolutely nothing but indent XML lines based on their levels of subordination.

To meet the guidelines of SO, I'll Jeopardy-ize my solution :-) , so:

What will go wrong when I start using this function to format XML files that some unnamed bad person sent me sans indents?

    xmlit <- function(x, indchar = '\t') {
      # Require x to be a vector of char strings, one
      # per line of the XML file.
      # Add an indent for every line below one starting with "<[!/]", and
      # remove an indent for every line below "</".
      indit <- ''
      y <- vector('character', length(x))
      for (j in 1:length(x)) {
        # First add whatever indent we're at.
        y[j] <- paste(indit, x[j], collapse = '', sep = '')
        # Check for openers: '<' but not '</' or '/>'.
        if (grepl('<[^/?!]', x[j]) & !grepl('/>', x[j]) & !grepl('</', x[j])) {
          indit <- paste(indit, indchar, collapse = '', sep = '')
        } else {
          # Check for closers: '</'.
          if (grepl('<[/]', x[j]) & !grepl('<[^/?!]', x[j])) {
            # Move the existing line out by one indent.
            y[j] <- substr(y[j], 2, 1000)
            indit <- substr(indit, 2, 1000)
          }
        }
      }
      # Note that I'm depending on every level to have a matching closer,
      # and in particular on the last line being a closer.
      return(invisible(y))
    }

There is an assumption that an opening tag must be the first thing on a line. If it is not, there are problems:

    > cat(xmlit(c("<begin>","<foo/><begin>","</begin>","</begin>")), sep="\n")
    <begin>
            <foo/><begin>
    </begin>
    /begin>
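To see why the second line goes wrong, it may help to run the function's own tests on that line in isolation. This is only a diagnostic sketch: the patterns are copied from `xmlit`, while the variable names (`has_open`, `has_self`, `has_close`, and so on) are made up here for illustration.

```r
# The offending line from the example: a self-closing tag followed by an opener.
line <- "<foo/><begin>"

# The three pattern checks used inside xmlit().
has_open  <- grepl("<[^/?!]", line)  # TRUE: "<f" and "<b" both match
has_self  <- grepl("/>", line)       # TRUE: from "<foo/>"
has_close <- grepl("</", line)       # FALSE: no closing tag on this line

# The line counts as neither an opener nor a closer,
# so the second <begin> never increases the indent.
is_opener <- has_open & !has_self & !has_close  # FALSE
is_closer <- has_close & !has_open              # FALSE
```

Because the nested `<begin>` is never counted, only one level of indent exists when the two closers arrive. The first closer strips the tab, and the second strips the leading `<` of the line itself, which gives the `/begin>` in the output.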

For a subset of XML with enough assumptions about (additional) structure, regular expressions can work. If those assumptions are violated, well, that's why there are parsers.
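As a rough illustration of trading one assumption for another, here is a sketch that indents per tag instead of per line. It is not from the original post, and the name `indent_tokens` is hypothetical. It is still regex-based, so it assumes no `>` inside attribute values, comments or CDATA, and it treats whitespace between tags as insignificant. Anything beyond that is a job for a real parser.

```r
# Hypothetical helper: indent per tag rather than per line (base R only).
indent_tokens <- function(x, indchar = "\t") {
  txt <- paste(x, collapse = "")
  # Split into tags ("<...>") and the text runs between them.
  tokens <- regmatches(txt, gregexpr("<[^>]*>|[^<]+", txt))[[1]]
  tokens <- trimws(tokens)
  tokens <- tokens[nzchar(tokens)]

  depth <- 0L
  out <- character(length(tokens))
  for (i in seq_along(tokens)) {
    tok <- tokens[i]
    is_closer <- startsWith(tok, "</")
    # An opener is a tag that is not a closer, declaration, comment,
    # or self-closing element.
    is_opener <- grepl("^<[^/?!]", tok) && !endsWith(tok, "/>")
    # Closers step out before printing; never go below zero.
    if (is_closer) depth <- max(depth - 1L, 0L)
    out[i] <- paste0(strrep(indchar, depth), tok)
    # Openers step in after printing.
    if (is_opener) depth <- depth + 1L
  }
  out
}

res <- indent_tokens(c("<begin>", "<foo/><begin>", "</begin>", "</begin>"))
cat(res, sep = "\n")
```

On the four-line example above, this puts `<foo/>` and the inner `<begin>` on separate lines at one indent and leaves the final `</begin>` intact. Note that it also rewrites line breaks, which the original function never does.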

