regex - Creating a keyword-based search in Python
I have a giant CSV file with close to 6k entries. The file looks like this:
PDB ID    NDB ID    Structure Title                           Citation Title                        Abstract
1et4      1et4      structure of haemoglobin mrna aptamer.    solution structure of mrna aptamer    research performed, structure of mrna obtained
My end goal is to display output for a given keyword, like so:
Keyword: mrna
PDB ID    NDB ID    Structure Title    Citation Title    Abstract    Location of first hit (struc/citation/abstract)
What would be a good starting point for me? Also, do I have to use something called regex for this?
Disclaimer: this is part of a research project, not school homework.
Pseudocode or a template would be great for me.
You can parse the CSV file and create two data structures, both dictionaries.
One dictionary will contain each line, keyed on the PDB ID. The other dictionary will store sets of PDB IDs, keyed on keywords.
Below is only example code, because I'm ignoring the headers. You'll want to parse the CSV properly...
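    from collections import defaultdict

    entries = {}                 # PDB ID -> full line
    keywords = defaultdict(set)  # keyword -> set of PDB IDs

    # First pass: store each line, keyed on its PDB ID (the first field).
    with open('my_csv.csv') as f:
        for line in f:
            entries[line.split()[0]] = line  # keying on PDB ID

    # Second pass: map every remaining word on the line to that PDB ID.
    with open('my_csv.csv') as f:
        for line in f:
            for kw in line.split()[1:]:
                keywords[kw].add(line.split()[0])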
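For a more careful parse, here is a minimal sketch using the standard library's `csv` module. It assumes the file is comma-delimited and has a header row with the column names shown in the question. The `SAMPLE` text, the `build_index` name, and the punctuation stripping are my own illustrations, not anything from the real file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the real file's delimiter and headers may differ.
SAMPLE = """PDB ID,NDB ID,Structure Title,Citation Title,Abstract
1et4,1et4,structure of haemoglobin mrna aptamer,solution structure of mrna aptamer,"research performed, structure of mrna obtained"
"""

def build_index(fileobj, id_column="PDB ID"):
    """Return (entries, keywords) built from an open CSV file object."""
    entries = {}                 # PDB ID -> full row as a dict
    keywords = defaultdict(set)  # lowercase word -> set of PDB IDs
    for row in csv.DictReader(fileobj):
        pdb_id = row[id_column]
        entries[pdb_id] = row
        for column, text in row.items():
            # Skip the ID column and any missing or overflow fields.
            if column == id_column or not isinstance(text, str):
                continue
            for word in text.lower().split():
                word = word.strip(".,;:()")  # drop surrounding punctuation
                if word:
                    keywords[word].add(pdb_id)
    return entries, keywords

entries, keywords = build_index(io.StringIO(SAMPLE))
print(sorted(keywords["mrna"]))  # ['1et4']
```

With a real file you would pass an open handle instead of the `StringIO`, for example `open('my_csv.csv', newline='')`.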
Once you have the two data structures, it should be trivial to look up a keyword in the keywords dict, iterate over the set, and print out each line for the relevant PDB ID.
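As a rough sketch of that lookup, including the "location of first hit" column from the question: the two dictionaries below are small hand-written stand-ins, and `SEARCH_COLUMNS`, `first_hit`, and `search` are hypothetical names. It uses a plain substring test, so no regex is needed for a simple keyword lookup like this.

```python
# Hypothetical stand-ins for the two dictionaries described above.
entries = {
    "1et4": {
        "PDB ID": "1et4",
        "NDB ID": "1et4",
        "Structure Title": "structure of haemoglobin mrna aptamer",
        "Citation Title": "solution structure of mrna aptamer",
        "Abstract": "research performed, structure of mrna obtained",
    },
}
keywords = {"mrna": {"1et4"}, "haemoglobin": {"1et4"}, "obtained": {"1et4"}}

# Columns checked, in order, to report where the keyword first appears.
SEARCH_COLUMNS = ("Structure Title", "Citation Title", "Abstract")

def first_hit(row, keyword):
    """Return the first searched column whose text contains the keyword."""
    for column in SEARCH_COLUMNS:
        if keyword in row[column].lower():
            return column
    return None

def search(keyword):
    """Return (PDB ID, NDB ID, first-hit column) for each matching entry."""
    keyword = keyword.lower()
    results = []
    for pdb_id in sorted(keywords.get(keyword, ())):
        row = entries[pdb_id]
        results.append((pdb_id, row["NDB ID"], first_hit(row, keyword)))
    return results

for hit in search("mrna"):
    print(*hit, sep="\t")
```

Using `keywords.get(keyword, ())` means a keyword that is not in the index simply yields no results instead of raising a `KeyError`.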