python - How to create multiple copies of rows by multiplication in pandas?
I have a file with several columns, one of which is an attribute called count. It indicates that there are multiple counts of the same score.
I want to multiply the number of rows so that it is representative of the number found in the count column.
I tried using DataFrame.mul, but it multiplied the count values and returned NaN for the string values.
What function should I call to accomplish my goal? E.g.

    "survey" list question description option count
    c3 2o15 survey rate hotel & accomodations fair 2

should be transformed to:

    "survey" list question description option count
    c3 2o15 survey rate hotel & accomodations fair 2
    c3 2o15 survey rate hotel & accomodations fair 2
This was my flawed previous attempt:

    import pandas as pd
    data = pd.read_excel('/users/dheepan.ramanan/documents/c3data/structureddata.xlsx')
    main = pd.DataFrame(data)
    multiplier = pd.DataFrame(data['count'])
    main.mul(multiplier)

       count description list option question "survey"
    0    121         NaN  NaN    NaN      NaN      NaN
    1    100         NaN  NaN    NaN      NaN      NaN
Sorry if this is a simple question, I'm new to pandas.
thanks!
I don't think pandas is tailored for such applications (it is more about dropping duplicates than creating them).
Edit: I saw what you are trying to achieve (counting the number of 'options') after I answered; you should try df.groupby(by='option').sum()['count']
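As a rough sketch of that groupby suggestion: the frame below is made up for illustration (only the column names 'option' and 'count' come from the question). It selects the count column before summing, which avoids trying to sum the string columns.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "option": ["fair", "good", "fair", "poor"],
    "count": [2, 3, 1, 4],
})

# Total count per option, without duplicating any rows.
totals = df.groupby(by="option")["count"].sum()
print(totals)
```

If the end goal is only a tally per option, this may make duplicating rows unnecessary.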
Anyway, here is something that works:
In [1]: # create sample data to play with
        import numpy as np
        import pandas as pd
        df = pd.DataFrame(np.random.randn(3, 2), columns=['a', 'b'])
        # randint(1, 4, 3) replaces the deprecated random_integers(1, 3, 3)
        df['count'] = np.random.randint(1, 4, 3)
        df.index = ['c' + str(x) for x in df.index]
        df = np.round(df, 1)
        df
Out[1]:
      a    b  count
c0  1.6  2.0      3
c1  0.7  1.6      2
c2  0.9 -0.4      1

In [2]: # function to duplicate rows
        def duplicate_rows(df, countcol):
            for _, row in df.iterrows():
                for i in range(int(row[countcol]) - 1):
                    # append the row at the end of the dataframe
                    # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
                    df = pd.concat([df, row.to_frame().T])
            # remove countcol (could use drop for that...)
            notcountcols = [x for x in df.columns if x != countcol]
            df = df[notcountcols]
            # optional: sort by index
            df = df.sort_index()
            return df

In [3]: df_dup = duplicate_rows(df, 'count')
        df_dup
Out[3]:
      a    b
c0  1.6  2.0
c0  1.6  2.0
c0  1.6  2.0
c1  0.7  1.6
c1  0.7  1.6
c2  0.9 -0.4
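A loop-free alternative is sketched below. It is not the method from the answer above; it swaps in the commonly used Index.repeat idiom. The frame is a hard-coded copy of the sample values shown above, and the sketch assumes the index labels are unique and the counts are non-negative integers.

```python
import pandas as pd

# Hypothetical sample data mirroring the example frame above.
df = pd.DataFrame(
    {"a": [1.6, 0.7, 0.9], "b": [2.0, 1.6, -0.4], "count": [3, 2, 1]},
    index=["c0", "c1", "c2"],
)

# Repeat each index label as many times as its 'count' value,
# then look the repeated labels up with .loc (assumes a unique index).
repeated = df.loc[df.index.repeat(df["count"])]

# Drop the count column, as duplicate_rows does.
df_dup = repeated.drop(columns="count")
print(df_dup)
```

Because the selection happens in one step, this avoids growing the DataFrame row by row, which tends to get slow on larger inputs.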