python - Confused about regular expression -


In the book Programming Collective Intelligence there is this regular expression:

splitter = re.compile('\\W*')

From the context, it looks like this matches any non-alphanumeric character. But I am confused, because it seems like it matches a backslash followed by one or more W's. What does it really match?

Your regex is equivalent to \W*. It matches 0 or more non-alphanumeric characters.

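Actually, you are using a normal Python string literal instead of a raw string. In a Python string literal, to get a literal backslash you need to escape it as \\, because the backslash has a special meaning there. The regex engine also treats a backslash as special, so to match a literal backslash the regex itself must contain \\. Written as a normal string literal, each of those two backslashes has to be escaped again, which makes it \\\\.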

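So, to match \ followed by 0 or more W, you would need \\\\W* in a normal string literal. You can simplify this by using a raw string: there, \\ is enough to match a literal \, so the pattern becomes r'\\W*'. That's because backslashes are not handled in any special way by Python when they are used inside a raw string.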
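To make the two layers of escaping concrete, here is a minimal sketch comparing normal and raw string literals for both patterns. The variable names and the `sample` input are made up for illustration; only `re.match` from the standard library is used.

```python
import re

# Normal string literal: Python turns each "\\" into one backslash
# before the regex engine ever sees the pattern.
normal_nonword = '\\W*'       # regex engine receives: \W*
raw_nonword = r'\W*'          # raw string: the same three characters

normal_backslash = '\\\\W*'   # regex engine receives: \\W*
raw_backslash = r'\\W*'       # raw string: the same four characters

# Each pair is the very same string, just spelled differently.
assert normal_nonword == raw_nonword
assert normal_backslash == raw_backslash
print(len(normal_nonword), len(normal_backslash))

# Hypothetical input: a backslash, two W's, two dollar signs.
sample = r'\WW$$'

# \W* consumes non-word characters, so it stops at the first W.
print(re.match(raw_nonword, sample).group())

# \\W* consumes a literal backslash followed by any number of W's.
print(re.match(raw_backslash, sample).group())
```

The first match should contain just the backslash, and the second the backslash plus both W's, which is the difference the question is asking about.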

The example below should help you understand this:

>>> import re
>>> s = "\\WWWW$$$$"   # a backslash, four W's, four $

# Without raw string
>>> splitter = re.compile('\\W*')    # match non-alphanumeric characters
>>> re.findall(splitter, s)
['\\', '', '', '', '', '$$$$', '']

>>> splitter = re.compile('\\\\W*')  # match `\` followed by 0 or more `W`
>>> re.findall(splitter, s)
['\\WWWW']

# With raw string
>>> splitter = re.compile(r'\W*')    # same as the first one; only a single `\` is needed
>>> re.findall(splitter, s)
['\\', '', '', '', '', '$$$$', '']

>>> splitter = re.compile(r'\\W*')   # same as the second one; two `\` are needed
>>> re.findall(splitter, s)
['\\WWWW']
