python - analyze a text file in parallel with mpi4py
I have a tab-separated input text file:
0	.4
1	.9
2	.2
3	.12
4	.55
5	.98
I analyze it in plain Python as:
lines = open("songs.tsv").readlines()

def extract_hotness(line):
    return float(line.split()[1])

songs_hotness = map(extract_hotness, lines)
max_hotness = max(songs_hotness)
How can I perform the same operation in parallel using mpi4py? I started implementing it with scatter, but that won't work straight away, because scatter needs a list with exactly as many elements as there are nodes (processes).
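One way around the scatter length requirement is to group the lines into exactly one sublist per process before scattering. The following is a minimal sketch under assumptions, not a definitive implementation: the helper names `split_into_chunks` and `parallel_max` are hypothetical, the file is assumed small enough to be read whole on rank 0, and the script is assumed to be launched with something like `mpiexec -n 4 python script.py songs.tsv`.

```python
import sys


def extract_hotness(line):
    # The second whitespace-separated column holds the hotness value.
    return float(line.split()[1])


def split_into_chunks(items, n):
    # Hypothetical helper: deal items round-robin into exactly n sublists,
    # so the outer list has one element per process, as comm.scatter expects.
    return [items[i::n] for i in range(n)]


def parallel_max(path):
    # Imported here so the helpers above can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    chunks = None
    if rank == 0:
        # Assumption: the whole file fits in memory on rank 0.
        with open(path) as f:
            lines = [line for line in f if line.strip()]
        chunks = split_into_chunks(lines, size)

    # Each process receives one sublist of lines.
    my_lines = comm.scatter(chunks, root=0)

    # A chunk can be empty when there are fewer lines than processes.
    local_max = max(map(extract_hotness, my_lines), default=float("-inf"))

    # Combine the per-process maxima; the result is None on ranks other than 0.
    return comm.reduce(local_max, op=MPI.MAX, root=0)


if __name__ == "__main__" and len(sys.argv) == 2:
    result = parallel_max(sys.argv[1])
    if result is not None:  # only rank 0 holds the reduced value
        print(result)
```

The lowercase `scatter` and `reduce` methods pickle arbitrary Python objects, which keeps the sketch short; for large numeric data the buffer-based uppercase variants are usually the faster choice.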
Processing a text file in parallel is difficult. Where do you split the file? Are you even reading from a parallel file system? You might consider MPI-IO if you have a large enough input file. If you go that route, these answers, although provided in a C context, describe challenges that still hold in mpi4py: https://stackoverflow.com/a/31726730/1024740 , https://stackoverflow.com/a/12942718/1024740
Another approach is not to scatter the data, but to read it all in on rank 0 and broadcast it to everyone else. That approach requires enough memory to stage all of the input data at once, or else a master-worker scheme in which the data is not read in one shot.
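The read-on-rank-0-and-broadcast idea could look roughly like the sketch below. It is an illustration under assumptions rather than a definitive implementation: `my_share` and `broadcast_max` are hypothetical names, and every process is assumed to have enough memory to hold the complete list of lines.

```python
import sys


def extract_hotness(line):
    # The second whitespace-separated column holds the hotness value.
    return float(line.split()[1])


def my_share(lines, rank, size):
    # Hypothetical helper: every process holds all lines and works on every
    # size-th line starting at its own rank, so the shares are disjoint and
    # together cover the whole input.
    return lines[rank::size]


def broadcast_max(path):
    # Imported here so the helpers above can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    lines = None
    if rank == 0:
        # Only rank 0 touches the file system.
        with open(path) as f:
            lines = [line for line in f if line.strip()]

    # Afterwards every process holds the full list of lines.
    lines = comm.bcast(lines, root=0)

    share = my_share(lines, rank, size)
    local_max = max(map(extract_hotness, share), default=float("-inf"))

    # allreduce leaves the global maximum on every process.
    return comm.allreduce(local_max, op=MPI.MAX)


if __name__ == "__main__" and len(sys.argv) == 2:
    result = broadcast_max(sys.argv[1])
    if MPI_RANK_ZERO := (__import__("mpi4py.MPI", fromlist=["MPI"]).COMM_WORLD.Get_rank() == 0):
        print(result)
```

Broadcasting avoids building a per-process list on rank 0, at the cost of every process holding the complete input.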