csv - Unable to loop through dictionary to compare 2 strings and return max ratio in Python


I have a list of items stored in a CSV file. I'm trying to compare an item name against the CSV list to see if there is a match. I load the CSV list into a dictionary and pass it to a function. Each item in the dictionary is compared with the input item to give a matching ratio. I want to return the item with the highest ratio, and that highest ratio must be higher than a set threshold.

Example of the items in the CSV file:

    001 green apple
    002 red apple
    003 orange
    004 mango

This is what I have tried so far:

    import csv
    from fuzzywuzzy import fuzz  # assumed source of fuzz; the import is not shown in the post

    def fuzzy_token_set_matching(index_dict, str_for_comparison):
        matching_threshold = 70

        # if I check the dict size here, it's 0
        print(len(index_dict))

        for index, indexed_string in index_dict.items():
            max_ratio = 0
            # compare the input name vs. the name in the dictionary
            fuzz_matching_ratio = fuzz.token_sort_ratio(indexed_string, str_for_comparison)

            if fuzz_matching_ratio > max_ratio:
                max_ratio = fuzz_matching_ratio

                if max_ratio > matching_threshold:
                    return index, indexed_string
                else:
                    return None

    input_file = 'index.csv'
    output_file = 'results.csv'

    # load the index list into a dictionary
    with open(input_file, mode='r') as index_infile:
        index_reader = csv.reader(index_infile)
        index_dict = {rows[0]: rows[2] for rows in index_reader}

    print(fuzzy_token_set_matching(index_dict, 'green apple'))

    >>> current results return:
    0

    >>> correct result:
    001 green apple

For some reason I'm getting None for every result, even though an exact match should return a ratio of 100.

The issue you have is that you're returning after the first pass of the loop, when there are more items to consider. Here's the relevant part of your code:

    for index, indexed_string in index_dict.items():
        #...
        if max_ratio > matching_threshold:
            return index, indexed_string
        else:
            return None

You don't want the else clause to run every time, only if the loop ends without a match that meets the threshold. Try this instead:

    for index, indexed_string in index_dict.items():
        #...
        if max_ratio > matching_threshold:
            return index, indexed_string
    return None

You could let the function end without the explicit return None line, since that's the default, but I'd recommend keeping the return statement to make it clear that it's intentional.
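Here's a minimal, self-contained sketch of that first-match version. To keep it runnable without extra packages it uses difflib.SequenceMatcher from the standard library as a stand-in scorer, so the scores will not be the same as those from fuzz.token_sort_ratio. The names simple_ratio and first_match_above are made up for illustration.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Stand-in scorer (assumption): 0-100 similarity, NOT fuzz.token_sort_ratio."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def first_match_above(index_dict, target, threshold=70, scorer=simple_ratio):
    """Return the first (index, string) whose score exceeds threshold, else None."""
    for index, indexed_string in index_dict.items():
        if scorer(indexed_string, target) > threshold:
            return index, indexed_string
    # Reached only when the loop finished without returning a match.
    return None


# Sample data taken from the question's item list.
items = {"001": "green apple", "002": "red apple", "003": "orange", "004": "mango"}

print(first_match_above(items, "green apple"))  # exact match on the first item
print(first_match_above(items, "kiwi"))         # nothing clears the threshold
```

Passing the scorer in as a parameter means you should be able to swap in fuzz.token_sort_ratio without touching the loop.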

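Note that this will return the first match that exceeds the threshold, not the best match. If you want the best match only, you'll want to save the index and string that go with the max ratio too, and move both parts of the if out of the loop: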

    max_ratio = 0  # initialize once, before the loop, not on every pass
    for index, indexed_string in index_dict.items():
        #...
        if fuzz_matching_ratio > max_ratio:
            max_ratio = fuzz_matching_ratio
            max_index = index
            max_string = indexed_string

    if max_ratio > matching_threshold:
        return max_index, max_string
    else:
        return None
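As a rough end-to-end sketch of the best-match approach, the following puts the loop into a complete function. It again uses difflib.SequenceMatcher as a stand-in scorer so it runs with the standard library alone; that is an assumption for illustration, and its scores differ from fuzz.token_sort_ratio. The names simple_ratio and best_match_above are hypothetical.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Stand-in scorer (assumption): 0-100 similarity, NOT fuzz.token_sort_ratio."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match_above(index_dict, target, threshold=70, scorer=simple_ratio):
    """Return the (index, string) with the highest score if it exceeds threshold."""
    max_ratio = 0          # set once, before the loop
    max_index = None
    max_string = None
    for index, indexed_string in index_dict.items():
        ratio = scorer(indexed_string, target)
        if ratio > max_ratio:
            # Remember the best candidate seen so far; do not return yet.
            max_ratio = ratio
            max_index = index
            max_string = indexed_string

    # Only after every item has been scored do we apply the threshold.
    if max_ratio > threshold:
        return max_index, max_string
    return None


# Sample data taken from the question's item list.
items = {"001": "green apple", "002": "red apple", "003": "orange", "004": "mango"}

print(best_match_above(items, "red apple"))
print(best_match_above(items, "kiwi"))
```

With this stand-in scorer, "green apple" also scores above 70 against "red apple", so a first-match loop would stop at item 001, while the best-match version keeps going and picks item 002.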
