python - How to create multiple copies of rows by multiplication in pandas?

- March 15, 2011

I have a file with several columns, one of which is an attribute called count. It indicates that there are multiple counts of the same score.

I want to multiply the number of rows so that it is representative of the number found in the count column.

I tried using DataFrame.mul, but it multiplied the count values and returned NaN for the string values.

What function should I call to accomplish this goal? E.g.

    "survey"   list    question    description option  count
    c3 2o15 survey      rate hotel & accomodations      fair    2

should be transformed to:

    "survey"   list    question    description option  count
    c3 2o15 survey      rate hotel & accomodations      fair    2
    c3 2o15 survey      rate hotel & accomodations      fair    2

This is my flawed previous attempt:

    import pandas as pd

    data = pd.read_excel('/users/dheepan.ramanan/documents/c3data/structureddata.xlsx')
    main = pd.DataFrame(data)
    multiplier = pd.DataFrame(data['count'])
    main.mul(multiplier)

       count  description  list  option  question  ï»¿"survey"
    0    121          NaN   NaN     NaN       NaN         NaN
    1    100          NaN   NaN     NaN       NaN         NaN
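The NaN columns in this output are consistent with how DataFrame.mul aligns two frames on their column labels before multiplying. A minimal sketch, assuming a hypothetical two-row frame in place of the spreadsheet (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: one text column and a count column.
main = pd.DataFrame({"option": ["fair", "good"], "count": [11, 10]})
multiplier = pd.DataFrame(main["count"])

# mul aligns both frames on index and column labels. Only 'count' exists in
# both frames, so it is multiplied by itself; 'option' has no partner column
# in the multiplier and comes back as NaN.
result = main.mul(multiplier)
print(result)
```

So the call squares the counts instead of repeating rows, which is why a different approach is needed.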

Sorry if this is a simple question, I'm new to pandas.

Thanks!

I don't think pandas is tailored for such applications (it is geared more towards dropping duplicates than creating them).

Edit: I only saw what you were trying to achieve (counting the number of 'options') after I answered. For that, you should try df.groupby(by='option').sum()['count']
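As a rough illustration of that groupby call, assuming a small made-up frame with only 'option' and 'count' columns (the values are hypothetical, not taken from the original file):

```python
import pandas as pd

# Hypothetical survey responses; only 'option' and 'count' matter here.
df = pd.DataFrame({
    "option": ["fair", "good", "fair", "poor"],
    "count": [2, 5, 1, 3],
})

# Sum the counts per option without creating any duplicate rows.
totals = df.groupby(by="option").sum()["count"]
print(totals)
```

This gives one total per option, which avoids materialising the repeated rows at all.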

Anyway, here is something that works:

    In [1]:
    import numpy as np
    import pandas as pd

    # create sample data to play with
    df = pd.DataFrame(np.random.randn(3, 2), columns=['a', 'b'])
    # randint's upper bound is exclusive; random_integers(1, 3, 3) is deprecated
    df['count'] = np.random.randint(1, 4, 3)
    df.index = ['c' + str(x) for x in df.index]
    df = np.round(df, 1)
    df

    Out[1]:
          a     b  count
    c0  1.6   2.0      3
    c1  0.7   1.6      2
    c2  0.9  -0.4      1

    In [2]:
    # function to duplicate rows
    def duplicate_rows(df, countcol):
        for _, row in df.iterrows():
            for i in range(int(row[countcol]) - 1):
                # append the row at the end of the dataframe
                # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
                df = pd.concat([df, row.to_frame().T])

        # remove countcol (could also use drop for that...)
        notcountcols = [x for x in df.columns if x != countcol]
        df = df[notcountcols]
        # optional: sort by index
        df.sort_index(inplace=True)
        return df

    In [3]:
    df_dup = duplicate_rows(df, 'count')
    df_dup

    Out[3]:
          a     b
    c0  1.6   2.0
    c0  1.6   2.0
    c0  1.6   2.0
    c1  0.7   1.6
    c1  0.7   1.6
    c2  0.9  -0.4
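For larger frames, a loop-free variant may be preferable. The sketch below is an alternative to duplicate_rows, not the method shown above: it assumes the counts are non-negative integers and uses numpy.repeat to build positional row indices. The name repeat_rows is hypothetical.

```python
import numpy as np
import pandas as pd

# Same sample values as the output shown above, hard-coded so the run is repeatable.
df = pd.DataFrame(
    {"a": [1.6, 0.7, 0.9], "b": [2.0, 1.6, -0.4], "count": [3, 2, 1]},
    index=["c0", "c1", "c2"],
)

def repeat_rows(df, countcol):
    """Return a copy of df with each row repeated df[countcol] times."""
    # Repeat each row position by its count, e.g. counts [3, 2, 1] -> [0, 0, 0, 1, 1, 2].
    positions = np.repeat(np.arange(len(df)), df[countcol].to_numpy())
    # Select by position (safe even if index labels are not unique) and drop the count column.
    return df.iloc[positions].drop(columns=countcol)

df_dup = repeat_rows(df, "count")
print(df_dup)
```

Selecting by position keeps the original row order, so no sort is needed afterwards, and rows with a count of 0 simply disappear.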
