python - Merge a lot of DataFrames together, without loop and not using concat -
i have >1000 dataframes, each have >20k rows , several columns, need merge common column, idea can illustrated this:
data1=pd.dataframe({'name':['a','c','e'], 'value':[1,3,4]}) data2=pd.dataframe({'name':['a','d','e'], 'value':[3,3,4]}) data3=pd.dataframe({'name':['d','e','f'], 'value':[1,3,5]}) data4=pd.dataframe({'name':['d','f','g'], 'value':[0,3,4]}) #some or them may have more or less columns others: #data5=pd.dataframe({'name':['d','f','g'], 'value':[0,3,4], 'score':[1,3,4]}) final_data=data1 i, v in enumerate([data2, data3, data4]): if i==0: final_data=pd.merge(final_data, v, how='outer', left_on='name', right_on='name', suffixes=('_0', '_%s'%(i+1))) #in real case right_on may = columns other 'name' #dependents on dataframe, requirement can #ignored in minimal example. else: final_data=pd.merge(final_data, v, how='outer', left_on='name', right_on='name', suffixes=('', '_%s'%(i+1)))
result:
name value_0 value_1 value value_3 0 1 3 nan nan 1 c 3 nan nan nan 2 e 4 4 3 nan 3 d nan 3 1 0 4 f nan nan 5 3 5 g nan nan nan 4 [6 rows x 5 columns]
it works, anyway can done without loop?
also, why column name of second last column not value_2
?
p.s. know in minimal example, result can achieved by:
pd.concat([item.set_index('name') item in [data1, data2, data3, data4]], axis=1)
but in real case due way how dataframes constructed , information stored in index columns, not ideal solution without additional tricks. so, let's not consider route.
does make sense merge it, then? what's wrong panel?
> data = [data1, data2, data3, data4] > p = pd.panel(dict(zip(map(str, range(len(data))), data))) > p.to_frame().t major 0 1 2 minor name value name value name value 0 1 c 3 e 4 1 3 d 3 e 4 2 d 1 e 3 f 5 3 d 0 f 3 g 4 # , kicks > p.transpose(2, 0, 1).to_frame().reset_index().pivot_table(values='value', rows='name', cols='major') major 0 1 2 3 name 1 3 nan nan c 3 nan nan nan d nan 3 1 0 e 4 4 3 nan f nan nan 5 3 g nan nan nan 4
Comments
Post a Comment