python - Confused about regular expression
In the book Programming Collective Intelligence there is a regular expression,
splitter = re.compile('\\W*'). From context it looks like this matches any non-alphanumeric character. But I am confused, because it seems to match a backslash followed by one or more W's. What does it really match?
Your regex is equivalent to \W*. It matches zero or more non-alphanumeric characters (more precisely, anything other than letters, digits and the underscore).
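As a rough illustration of how such a splitter behaves, here is a minimal sketch. The sample text is made up, and the pattern uses \W+ rather than the book's \W*: that substitution is an assumption made here because, on Python 3.7 and later, re.split also splits at empty matches, so \W* would break the text into single characters.

```python
import re

# Hypothetical sample text, not taken from the book.
text = "Hello, world! It's 2024."

# '\\W+' (string literal) and r'\W+' (raw string) denote the same pattern:
# one or more characters that are not letters, digits or underscore.
splitter = re.compile(r'\W+')

# Split on runs of non-word characters and drop empty pieces.
words = [w.lower() for w in splitter.split(text) if w]
print(words)  # ['hello', 'world', 'it', 's', '2024']
```

Note that the apostrophe is also a non-word character, so "It's" is split into two pieces.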
Note that you are using an ordinary Python string literal instead of a raw string. In a Python string literal, to get a literal backslash you need to escape it (\\), because the backslash has a special meaning there. The regex engine then treats a backslash as special as well, so to match a literal backslash you need to escape both backslashes, which makes it \\\\.
So, to match a \ followed by zero or more W's, you would need \\\\W* in a string literal. You can simplify this by using a raw string, where \\ is enough to match a literal \. That is because backslashes are not handled in any special way when used inside a raw string.
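The two levels of escaping can be checked directly on the pattern strings before any matching happens. This is only a small sketch; the variable names are chosen here for illustration.

```python
import re

# The string literal collapses "\\" into one backslash; the raw string keeps it as typed.
plain = '\\W*'
raw = r'\W*'
print(plain == raw, len(raw))  # True 3  (the characters are \, W, *)

# To hand the regex engine an escaped backslash, the pattern itself needs two backslashes.
literal_plain = '\\\\W*'
literal_raw = r'\\W*'
print(literal_plain == literal_raw, len(literal_raw))  # True 4

# This pattern matches a literal backslash followed by zero or more W characters.
matched = re.match(literal_raw, '\\WWW!').group()
print(matched)  # \WWW
```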
The following interpreter session illustrates this:
>>> import re
>>> s = "\WWWW$$$$"  # "\W" is not a recognized string escape, so the backslash is kept
>>> # Without a raw string
>>> splitter = re.compile('\\W*')  # matches non-alphanumeric characters
>>> re.findall(splitter, s)
['\\', '', '', '', '', '$$$$', '']
>>> splitter = re.compile('\\\\W*')  # matches `\` followed by 0 or more `W`
>>> re.findall(splitter, s)
['\\WWWW']
>>> # With a raw string
>>> splitter = re.compile(r'\W*')  # same as the first one; a single `\` is enough
>>> re.findall(splitter, s)
['\\', '', '', '', '', '$$$$', '']
>>> splitter = re.compile(r'\\W*')  # same as the second one; two `\` are needed
>>> re.findall(splitter, s)
['\\WWWW']