Python: extract a value with BeautifulSoup and regex
I obtained the list below using regex and BeautifulSoup. I need to extract the uid values, e.g. 5968723334.
[u'/home.html', u'browse_settings.html', u'browse.html?', u'test.html?uid=5415292833', u'test.html?uid=5968723334', u'test.html?uid=5968723334', u'test.html?uid=5453943714', u'test.html?uid=5453943714', u'test.html?uid=6740871094', u'test.html?uid=6740871094', u'test.html?uid=5991868792', u'test.html?uid=5991868792', u'test.html?uid=25072413', u'test.html?uid=25072413', u'test.html?uid=6739965683', u'test.html?uid=6739965683', u'test.html?uid=7272910004', u'test.html?uid=7272910004', u'test.html?uid=13179298', u'test.html?uid=13179298', u'test.html?uid=5392816266', u'test.html?uid=5392816266', u'test.html?uid=5992588819', u'test.html?uid=5992588819', u'test.html?uid=6727114420', u'test.html?uid=6727114420', u'test.html?uid=7263648884', u'test.html?uid=7263648884', u'test.html?uid=5447240210', u'test.html?uid=5447240210', u'test.html?uid=5460515002', u'test.html?uid=5460515002', u'test.html?uid=5400731231', u'test.html?uid=5400731231', u'browse.html?params=_f_18_24_gb_0___grid_1', u'/home.html?t=1374068507', u'/account_info.html', u'http://www.example.com/browse.html?params=_f_18_24_gb_0___grid_0', u'http://www.example.com/contact.html', u'/logout.html', u'#top', u'/terms_of_service.html', u'http://safety.example.com']
I've managed to extract one uid so far, but I'd like to extract all of the uids:
>>> m = re.search("uid=(\d*)", soup.contents[0])
>>> print m
<_sre.SRE_Match object at 0x211b210>
>>> print m.group(1)
5442562712
Please help!
You can loop through the list and apply the regular expression to each URL:
import re

uid = re.compile(r"uid=(\d*)")
uids = [match.group(1) for match in filter(None, map(uid.search, list_of_urls))]
The above is a compact version of:
import re

uid = re.compile(r"uid=(\d*)")
uids = []
for url in list_of_urls:
    match = uid.search(url)
    if match is not None:
        uids.append(match.group(1))
The code takes into account URLs that do not contain a uid number.
Demo:
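One edge case worth noting: because the pattern uses `\d*`, a URL such as `test.html?uid=` still matches and yields an empty string. A minimal sketch, assuming such entries should be skipped, swaps in `\d+` instead. The names `loose`, `strict` and `sample_urls` are hypothetical, and the list is a shortened stand-in for the real one:

```python
import re

# Hypothetical shortened sample; the last entry has an empty uid value.
sample_urls = ['/home.html', 'test.html?uid=5415292833', 'test.html?uid=']

loose = re.compile(r"uid=(\d*)")   # zero or more digits: also matches a bare "uid="
strict = re.compile(r"uid=(\d+)")  # one or more digits: requires an actual number

# search() returns None when there is no match, so "if m" drops those URLs.
loose_uids = [m.group(1) for m in map(loose.search, sample_urls) if m]
strict_uids = [m.group(1) for m in map(strict.search, sample_urls) if m]

print(loose_uids)   # ['5415292833', '']
print(strict_uids)  # ['5415292833']
```

Whether the empty string is acceptable depends on how the uids are used afterwards; for the list in the question both patterns give the same result.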
>>> import re
>>> list_of_urls = [u'/home.html', u'browse_settings.html', u'browse.html?', u'test.html?uid=5415292833', u'test.html?uid=5968723334', u'test.html?uid=5968723334', u'test.html?uid=5453943714', u'test.html?uid=5453943714', u'test.html?uid=6740871094', u'test.html?uid=6740871094', u'test.html?uid=5991868792', u'test.html?uid=5991868792', u'test.html?uid=25072413', u'test.html?uid=25072413', u'test.html?uid=6739965683', u'test.html?uid=6739965683', u'test.html?uid=7272910004', u'test.html?uid=7272910004', u'test.html?uid=13179298', u'test.html?uid=13179298', u'test.html?uid=5392816266', u'test.html?uid=5392816266', u'test.html?uid=5992588819', u'test.html?uid=5992588819', u'test.html?uid=6727114420', u'test.html?uid=6727114420', u'test.html?uid=7263648884', u'test.html?uid=7263648884', u'test.html?uid=5447240210', u'test.html?uid=5447240210', u'test.html?uid=5460515002', u'test.html?uid=5460515002', u'test.html?uid=5400731231', u'test.html?uid=5400731231', u'browse.html?params=_f_18_24_gb_0___grid_1', u'/home.html?t=1374068507', u'/account_info.html', u'http://www.example.com/browse.html?params=_f_18_24_gb_0___grid_0', u'http://www.example.com/contact.html', u'/logout.html', u'#top', u'/terms_of_service.html', u'http://safety.example.com']
>>> uid = re.compile(r"uid=(\d*)")
>>> [match.group(1) for match in filter(None, map(uid.search, list_of_urls))]
[u'5415292833', u'5968723334', u'5968723334', u'5453943714', u'5453943714', u'6740871094', u'6740871094', u'5991868792', u'5991868792', u'25072413', u'25072413', u'6739965683', u'6739965683', u'7272910004', u'7272910004', u'13179298', u'13179298', u'5392816266', u'5392816266', u'5992588819', u'5992588819', u'6727114420', u'6727114420', u'7263648884', u'7263648884', u'5447240210', u'5447240210', u'5460515002', u'5460515002', u'5400731231', u'5400731231']
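Most uids appear twice in the result because the corresponding links appear twice in the list. If only distinct values are wanted, a sketch along these lines keeps the first occurrence of each uid in order. `unique_uids` and `UID_PATTERN` are hypothetical names, the pattern uses `\d+` on the assumption that empty uids should be ignored, and the sample list is shortened:

```python
import re

UID_PATTERN = re.compile(r"uid=(\d+)")

def unique_uids(urls):
    """Return uid values in first-seen order, without duplicates."""
    seen = set()
    result = []
    for url in urls:
        match = UID_PATTERN.search(url)
        if match is None:
            continue  # this URL carries no uid parameter
        value = match.group(1)
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result

# Hypothetical shortened sample taken from the list in the question.
sample_urls = [
    '/home.html',
    'test.html?uid=5415292833',
    'test.html?uid=5968723334',
    'test.html?uid=5968723334',
    '/logout.html',
]
print(unique_uids(sample_urls))  # ['5415292833', '5968723334']
```

A plain `set` would also remove the duplicates, but it would not preserve the order in which the links appear on the page.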