pandas - How to use the merge function to merge the common values in two DataFrames?


I have two DataFrames that I want to merge on the column "id".

df1 :

    id   reputation
    1    10
    3    5
    4    40

df2 :

    id   reputation
    1    10
    2    5
    3    5
    6    55

I want the output to be:

dfoutput :

    id   reputation
    1    10
    2    5
    3    5
    4    40
    6    55

I wish to keep the values from both DataFrames and merge duplicate values into one. I know I have to use the merge() function, but I don't know which arguments to pass.

You could concatenate the DataFrames, group by id, and aggregate by taking the first item in each group.

    In [62]: pd.concat([df1,df2]).groupby('id').first()
    Out[62]:
        reputation
    id
    1           10
    2            5
    3            5
    4           40
    6           55

    [5 rows x 1 columns]

Or, to preserve id as a column rather than as the index, use as_index=False:

    In [68]: pd.concat([df1,df2]).groupby('id', as_index=False).first()
    Out[68]:
       id  reputation
    0   1          10
    1   2           5
    2   3           5
    3   4          40
    4   6          55

    [5 rows x 2 columns]
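For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the concatenate-and-group approach. The two frames are rebuilt from the sample data in the question; the variable name `combined` is only illustrative.

```python
import pandas as pd

# The two frames from the question.
df1 = pd.DataFrame({"id": [1, 3, 4], "reputation": [10, 5, 40]})
df2 = pd.DataFrame({"id": [1, 2, 3, 6], "reputation": [10, 5, 5, 55]})

# Stack the rows of both frames, then keep the first non-null value
# seen for each id. Rows from df1 are stacked first, so df1 takes
# precedence if the two frames ever disagree on a reputation.
combined = pd.concat([df1, df2]).groupby("id", as_index=False).first()

print(combined)
```

Note that the order of the list passed to pd.concat decides which frame wins on a conflict; in the sample data the overlapping ids carry identical values, so the order does not matter there.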

As karld. suggests, an excellent alternative is to use combine_first:

    In [99]: df1.set_index('id').combine_first(df2.set_index('id')).reset_index()
    Out[99]:
       id  reputation
    0   1          10
    1   2           5
    2   3           5
    3   4          40
    4   6          55

    [5 rows x 2 columns]
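A step-by-step sketch of the same combine_first idea may make the alignment clearer. It assumes the sample frames from the question; the names `left`, `right` and `combined` are illustrative, and the final cast is a precaution because the result dtype may vary between pandas versions.

```python
import pandas as pd

# The two frames from the question.
df1 = pd.DataFrame({"id": [1, 3, 4], "reputation": [10, 5, 40]})
df2 = pd.DataFrame({"id": [1, 2, 3, 6], "reputation": [10, 5, 5, 55]})

# Index both frames by id so rows are aligned on id, not on position.
left = df1.set_index("id")
right = df2.set_index("id")

# Values from `left` are kept; ids missing from df1 are filled from `right`.
combined = left.combine_first(right).reset_index()

# Depending on the pandas version, the filled column may come back as
# float, so cast it back to int (safe here because no NaN remains).
combined["reputation"] = combined["reputation"].astype(int)

print(combined)
```

As with the groupby version, the caller (df1 here) takes precedence when both frames hold a value for the same id.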

This solution appears to be faster for large DataFrames:

    import pandas as pd
    import numpy as np

    # Two large frames whose id ranges overlap everywhere except the first and last 10 ids.
    n = 10**6
    df1 = pd.DataFrame({'id': np.arange(n), 'reputation': np.random.randint(5, size=n)})
    df2 = pd.DataFrame({'id': np.arange(10, 10+n), 'reputation': np.random.randint(5, size=n)})

    In [95]: %timeit df1.set_index('id').combine_first(df2.set_index('id')).reset_index()
    10 loops, best of 3: 174 ms per loop

    In [96]: %timeit pd.concat([df1,df2]).groupby('id', as_index=False).first()
    1 loops, best of 3: 221 ms per loop
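Since the question asks about merge() specifically, here is a hedged sketch of one way it could be done with an outer join. This is not part of the answer above, only an illustration; it assumes that rows sharing an id also share the same reputation, as in the sample data.

```python
import pandas as pd

# The two frames from the question.
df1 = pd.DataFrame({"id": [1, 3, 4], "reputation": [10, 5, 40]})
df2 = pd.DataFrame({"id": [1, 2, 3, 6], "reputation": [10, 5, 5, 55]})

# An outer join on every shared column keeps each distinct
# (id, reputation) pair once: rows present in both frames collapse
# into one, and rows found in only one frame are carried over.
merged = pd.merge(df1, df2, on=["id", "reputation"], how="outer")

# Make the row order explicit rather than relying on the join's ordering.
merged = merged.sort_values("id").reset_index(drop=True)

print(merged)
```

If the same id appears with different reputation values in the two frames, this join keeps both rows, so the groupby or combine_first approaches are a better fit when one frame should take precedence.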
