python - Confused about regular expression
In the book Programming Collective Intelligence there is this regular expression:
splitter = re.compile('\\W*')
From context it looks like this matches any non-alphanumeric character. But I am confused, because it seems to match a backslash followed by zero or more W's. What does it really match?
Your regex is equivalent to \W*. It matches 0 or more non-alphanumeric characters (more precisely, characters that are not letters, digits, or the underscore).
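One way to check this is to look at the characters the literal actually holds. This is only a small sketch; the variable names and the sample characters are made up for illustration.

```python
import re

# The ordinary string literal from the book.
pattern_text = '\\W*'

# After Python processes the literal, only three characters remain: \ W *
pattern_length = len(pattern_text)

# The same three characters written as a raw string.
same_as_raw = pattern_text == r'\W*'

# \W matches a single character that is NOT a letter, digit, or underscore.
non_word = re.compile('\\W')
matches_comma = non_word.match(',') is not None
matches_letter = non_word.match('a') is not None
matches_underscore = non_word.match('_') is not None

print(pattern_text, pattern_length, same_as_raw)
print(matches_comma, matches_letter, matches_underscore)
```

So the regex engine never sees two backslashes here; it sees the \W class followed by *.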
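Actually, you are using an ordinary Python string literal instead of a raw string. In an ordinary string literal the backslash has a special meaning, so to write a literal backslash you need to escape it: \\. That is why '\\W*' reaches the regex engine as \W*. To match a literal backslash in the regex itself, the regex needs \\, and in an ordinary string literal you have to escape both of those backslashes, which makes it \\\\.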
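The two layers of escaping can be seen separately in a short sketch. The names regex_text and sample are hypothetical, chosen only for this illustration.

```python
import re

# Layer 1: the Python string literal.
# Four backslashes in the source collapse to two backslash characters.
regex_text = '\\\\'
regex_length = len(regex_text)

# Layer 2: the regex engine.
# It reads those two characters as one escaped, literal backslash.
backslash = re.compile(regex_text)

sample = 'a\\b'  # three characters: a, backslash, b
found = backslash.findall(sample)

# A pattern consisting of a single backslash is incomplete for the
# regex engine, so compiling it fails.
try:
    re.compile('\\')
    lone_backslash_ok = True
except re.error:
    lone_backslash_ok = False

print(regex_length, found, lone_backslash_ok)
```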
So, to match \ followed by 0 or more W, you need \\\\W* in an ordinary string literal. You can simplify this by using a raw string, where \\ matches a literal \. That's because backslashes are not handled in any special way inside a raw string.
The example below should help you understand this:
>>> import re
>>> s = "\\WWWW$$$$"                 # a backslash followed by WWWW$$$$
>>> # without raw string
>>> splitter = re.compile('\\W*')    # matches non-alphanumeric characters
>>> re.findall(splitter, s)
['\\', '', '', '', '', '$$$$', '']
>>> splitter = re.compile('\\\\W*')  # matches `\` followed by 0 or more `W`
>>> re.findall(splitter, s)
['\\WWWW']
>>> # with raw string
>>> splitter = re.compile(r'\W*')    # same as the first one; only a single `\` is needed
>>> re.findall(splitter, s)
['\\', '', '', '', '', '$$$$', '']
>>> splitter = re.compile(r'\\W*')   # same as the second one; two `\\` are needed
>>> re.findall(splitter, s)
['\\WWWW']