csv - Unable to loop through a dictionary to compare 2 strings and return the max ratio in Python
I have a list of items stored in a CSV file. I'm trying to compare an item name against the CSV list to see if there is a match. I load the CSV list into a dictionary and pass it to a function. Each item in the dictionary is compared with the input item to give a matching ratio. I want to return the item with the highest ratio, and that highest ratio must be higher than a set threshold.
Example of the items in the CSV file:

001 green apple
002 red apple
003 orange
004 mango
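A minimal sketch of loading such a file into a dictionary, assuming each row has exactly two comma-separated columns (id, name). The sample data, the helper name load_index, and the use of io.StringIO in place of index.csv are illustrative assumptions. If the real file has only two columns, the question's rows[2] would raise IndexError, so this sketch reads the name from row[1].

```python
import csv
import io

# Hypothetical stand-in for index.csv, assuming two comma-separated columns: id, name.
SAMPLE_CSV = "001,green apple\n002,red apple\n003,orange\n004,mango\n"


def load_index(infile):
    """Build {id: name} from a two-column CSV file object (hypothetical helper)."""
    reader = csv.reader(infile)
    # row[1] is the name column under the two-column assumption;
    # blank lines come back as empty lists and are skipped.
    return {row[0]: row[1] for row in reader if row}


index_dict = load_index(io.StringIO(SAMPLE_CSV))
print(index_dict)
```

Against the real file, the dictionary has to be built while the file is still open, for example with open('index.csv', newline='') as f: index_dict = load_index(f). Passing newline='' is what the csv module documentation recommends for files read with csv.reader.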
This is what I have tried so far:
import csv
from fuzzywuzzy import fuzz  # assumed: the question does not show where fuzz is imported from

def fuzzy_token_set_matching(index_dict, str_for_comparison):
    matching_threshold = 70
    # if I try the dict size here, it's 0
    print(len(index_dict))
    for index, indexed_string in index_dict.items():
        max_ratio = 0
        # compare input name vs name in dictionary
        fuzz_matching_ratio = fuzz.token_sort_ratio(indexed_string, str_for_comparison)
        if fuzz_matching_ratio > max_ratio:
            max_ratio = fuzz_matching_ratio
        if max_ratio > matching_threshold:
            return index, indexed_string
        else:
            return None

input_file = 'index.csv'
output_file = 'results.csv'

# load index list into dictionary
with open(input_file, mode='r') as index_infile:
    index_reader = csv.reader(index_infile)
    index_dict = {rows[0]: rows[2] for rows in index_reader}

print(fuzzy_token_set_matching(index_dict, 'green apple'))

Current result: 0
Correct result: 001 green apple
For some reason I'm getting None for every result, even though an exact match should return a ratio of 100.
The issue you have is that you're returning after the first pass of the loop, when there are more items to consider. Here's the relevant part of your code:
for index, indexed_string in index_dict.items():
    #...
    if max_ratio > matching_threshold:
        return index, indexed_string
    else:
        return None
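A minimal reproduction of this early return, assuming a stand-in scorer: difflib.SequenceMatcher scaled to 0-100 replaces fuzz.token_sort_ratio only so the sketch runs with the standard library, and its scores will differ from fuzzywuzzy's. The function name first_item_only and the sample dictionary are hypothetical. Because the first item does not clear the threshold, the else branch returns None before "green apple" is ever looked at.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # Stand-in for fuzz.token_sort_ratio (assumption): difflib similarity on a 0-100 scale.
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def first_item_only(index_dict, query, threshold=70):
    for index, indexed_string in index_dict.items():
        max_ratio = ratio(indexed_string, query)
        if max_ratio > threshold:
            return index, indexed_string
        else:
            return None  # exits on the very first item, match or not


items = {"004": "mango", "001": "green apple"}
print(first_item_only(items, "green apple"))  # None
```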
You don't want the else clause to run every time, only when the loop ends without a match that meets the threshold. Try this instead:
for index, indexed_string in index_dict.items():
    #...
    if max_ratio > matching_threshold:
        return index, indexed_string
return None
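The same reproduction with the return None moved after the loop, again assuming the difflib stand-in for fuzz.token_sort_ratio; first_match is a hypothetical name. Now every item is checked before the function gives up.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # Stand-in for fuzz.token_sort_ratio (assumption): difflib similarity on a 0-100 scale.
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def first_match(index_dict, query, threshold=70):
    for index, indexed_string in index_dict.items():
        if ratio(indexed_string, query) > threshold:
            return index, indexed_string
    return None  # reached only if no item passed the threshold


items = {"004": "mango", "001": "green apple"}
print(first_match(items, "green apple"))  # ('001', 'green apple')
```

This returns the first item above the threshold in the dictionary's iteration order, which is not necessarily the closest one.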
You could let the function end without an explicit return None line, since that's the default, but I'd recommend keeping the return statement to make it clear that it's intentional.
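A small illustration of that default, with hypothetical function names: a Python function that falls off the end returns None, so both versions below behave the same, and the explicit line only documents the intent.

```python
def no_explicit_return(items):
    for item in items:
        if item == "mango":
            return item
    # No return statement here: Python returns None implicitly.


def explicit_return(items):
    for item in items:
        if item == "mango":
            return item
    return None  # same result, but states the intent


print(no_explicit_return(["orange"]), explicit_return(["orange"]))  # None None
```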
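Note that this returns the first match that exceeds the threshold, not the best match. If you want only the best match, save the index and string of the highest ratio as you go, initialise max_ratio once before the loop (not inside it), and move both parts of the if statement out of the loop: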
max_ratio = 0
max_index = max_string = None
for index, indexed_string in index_dict.items():
    fuzz_matching_ratio = fuzz.token_sort_ratio(indexed_string, str_for_comparison)
    if fuzz_matching_ratio > max_ratio:
        max_ratio = fuzz_matching_ratio
        max_index = index
        max_string = indexed_string
if max_ratio > matching_threshold:
    return max_index, max_string
else:
    return None
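A self-contained sketch of the best-match version, assuming the same difflib stand-in for fuzz.token_sort_ratio; best_match and the sample dictionary are hypothetical. Every item is scored first, and the threshold is checked only once at the end.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # Stand-in for fuzz.token_sort_ratio (assumption): difflib similarity on a 0-100 scale.
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def best_match(index_dict, query, threshold=70):
    max_ratio = 0  # initialised once, before the loop
    max_index = max_string = None
    for index, indexed_string in index_dict.items():
        score = ratio(indexed_string, query)
        if score > max_ratio:
            max_ratio, max_index, max_string = score, index, indexed_string
    # Decide only after every item has been scored.
    if max_ratio > threshold:
        return max_index, max_string
    return None


items = {"002": "red apple", "001": "green apple", "004": "mango"}
print(best_match(items, "green apple"))  # ('001', 'green apple')
```

With this stand-in scorer "red apple" also scores above 70 against "green apple", so a first-match version would return it here because it comes first; the best-match version still picks the exact match.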