# Efficiently copying values from one ndarray to another on unequal sized arrays

Stack Overflow Asked by DoubleDouble on July 22, 2020

I have two arrays of different sizes, but I am trying to overwrite some values within the first array with values from the second array on the matching "keys". My actual problem may have many, many rows, and I have already determined that this is currently bottlenecking my program.

edit: I failed to recognize that there may be duplicate keys in a1, which should stay duplicated. I added one such row to the np.array examples.

example:

import numpy as np

# first two columns are 'keys', overwrite the 3rd column in a1 with the 3rd column from a2
# some values may be missing from a2. Those should keep the value in a1

a1 = np.array([[ 0.0,  2.0,  10.0 ],
[ 0.0,  2.0,  10.0 ],
[ 0.0,  3.0,  10.0 ],
[ 1.0,  3.0,  10.0 ],
[ 1.0, 13.0,  10.0 ],
[ 2.0,  2.0,  10.0 ],
[ 2.0,  5.0,  10.0 ]])

a2 = np.array([[ 0.0,  2.0,  0.0   ],
[ 0.0,  3.0,  0.713 ],
[ 1.0,  3.0,  0.713 ],
[ 1.0, 13.0,  1.0   ],
[ 2.0,  2.0,  0.0   ]])

# wanted result:
np.array([[ 0.0,  2.0,  0.0   ],
[ 0.0,  2.0,  0.0   ],
[ 0.0,  3.0,  0.713 ],
[ 1.0,  3.0,  0.713 ],
[ 1.0, 13.0,  1.0   ],
[ 2.0,  2.0,  0.0   ],
[ 2.0,  5.0,  10.0  ]])


The brute-force approach is to take each row in a2 and loop through each row in a1, replacing values on matches. Is there a way to do this that runs more efficiently? Some way to vectorize the operation on at least one of the loops? My actual case involves many rows in both arrays, and this takes a long time.
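For reference, the brute-force version described above might look like the following sketch; the nested loop is what makes it slow for large arrays:

```python
import numpy as np

a1 = np.array([[0.0,  2.0, 10.0],
               [0.0,  2.0, 10.0],
               [0.0,  3.0, 10.0],
               [1.0,  3.0, 10.0],
               [1.0, 13.0, 10.0],
               [2.0,  2.0, 10.0],
               [2.0,  5.0, 10.0]])

a2 = np.array([[0.0,  2.0, 0.0],
               [0.0,  3.0, 0.713],
               [1.0,  3.0, 0.713],
               [1.0, 13.0, 1.0],
               [2.0,  2.0, 0.0]])

# For every row of a2, scan all of a1 and overwrite the third column
# wherever the first two columns match: O(len(a1) * len(a2)) row comparisons.
for row2 in a2:
    for row1 in a1:
        if row1[0] == row2[0] and row1[1] == row2[1]:
            row1[2] = row2[2]
```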

If column three is getting updated and you want to use pandas:

import numpy as np
import pandas as pd

a1 = np.array([[ 0.0,  2.0,  10.0 ],
[ 0.0,  2.0,  10.0 ],
[ 0.0,  3.0,  10.0 ],
[ 1.0,  3.0,  10.0 ],
[ 1.0, 13.0,  10.0 ],
[ 2.0,  2.0,  10.0 ],
[ 2.0,  5.0,  10.0 ]])

a2 = np.array([[ 0.0,  2.0,  0.0   ],
[ 0.0,  3.0,  0.713 ],
[ 1.0,  3.0,  0.713 ],
[ 1.0, 13.0,  1.0   ],
[ 2.0,  2.0,  0.0   ]])

d1 = pd.DataFrame(a1)

d2 = pd.DataFrame(a2)

d3 = d2.set_index([0,1])[[2]].combine_first(d1.set_index([0,1])[[2]]).reset_index().to_numpy()
d3


Output:

array([[ 0.   ,  2.   ,  0.   ],
[ 0.   ,  2.   ,  0.   ],
[ 0.   ,  3.   ,  0.713],
[ 1.   ,  3.   ,  0.713],
[ 1.   , 13.   ,  1.   ],
[ 2.   ,  2.   ,  0.   ],
[ 2.   ,  5.   , 10.   ]])


Answered by Scott Boston on July 22, 2020

Concatenate a2 and a1 and keep only the rows that are unique in the first two columns. Since a2 comes first, the a2 row wins whenever a key appears in both arrays.

a_all = np.r_[a2, a1]
a_all = a_all[np.unique(a_all[:, :2], axis=0, return_index=True)[1]]
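Note that np.unique both sorts the result by key and de-duplicates it, so the repeated [0.0, 2.0] key in a1 collapses to a single row (6 rows instead of the 7 in the wanted result). A quick check, assuming the example arrays from the question:

```python
import numpy as np

a1 = np.array([[0.0,  2.0, 10.0],
               [0.0,  2.0, 10.0],
               [0.0,  3.0, 10.0],
               [1.0,  3.0, 10.0],
               [1.0, 13.0, 10.0],
               [2.0,  2.0, 10.0],
               [2.0,  5.0, 10.0]])

a2 = np.array([[0.0,  2.0, 0.0],
               [0.0,  3.0, 0.713],
               [1.0,  3.0, 0.713],
               [1.0, 13.0, 1.0],
               [2.0,  2.0, 0.0]])

# a2 first, so for duplicate keys np.unique's first occurrence comes from a2
a_all = np.r_[a2, a1]
# return_index gives the first occurrence of each unique key pair,
# in key-sorted order
a_all = a_all[np.unique(a_all[:, :2], axis=0, return_index=True)[1]]
print(a_all.shape)  # (6, 3): the duplicate [0.0, 2.0] key has collapsed
```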


Answered by V. Ayrat on July 22, 2020

The solution has two parts. First, you need to identify which rows of a1 have a key that appears in a2, and then you need to figure out which row of a2 each of those rows is associated with.

Here's my solution:

equiv = np.all(np.equal(a1[:, None, :2], a2[None, :, :2]), -1)
mask = np.any(equiv, 1)
ind = np.argmax(equiv, 1)
a1[mask, 2] = a2[ind[mask], 2]



I start by broadcasting both arrays to conforming dimensions and computing an equivalence matrix that tells me, for each pair of rows from a1 and a2, whether both key elements are equal.

Then it's easy to build a boolean mask of which rows of a1 have a match in a2 (np.any along the a2 axis), and to find the index of the matching a2 row for each (np.argmax along the same axis).

Finally, you associate every value of the last column of a1 that has a correspondence in a2 with the associated element in a2.
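Putting the pieces together, a complete, runnable version of this approach might look like this sketch (using np.any for the mask and np.argmax for the matching row index, per the steps above):

```python
import numpy as np

a1 = np.array([[0.0,  2.0, 10.0],
               [0.0,  2.0, 10.0],
               [0.0,  3.0, 10.0],
               [1.0,  3.0, 10.0],
               [1.0, 13.0, 10.0],
               [2.0,  2.0, 10.0],
               [2.0,  5.0, 10.0]])

a2 = np.array([[0.0,  2.0, 0.0],
               [0.0,  3.0, 0.713],
               [1.0,  3.0, 0.713],
               [1.0, 13.0, 1.0],
               [2.0,  2.0, 0.0]])

# equiv[i, j] is True when a1 row i and a2 row j share both key columns
equiv = np.all(np.equal(a1[:, None, :2], a2[None, :, :2]), -1)
# rows of a1 that have any match in a2
mask = np.any(equiv, 1)
# index of the matching a2 row for each a1 row (arbitrary where mask is False)
ind = np.argmax(equiv, 1)
# overwrite only the matched rows; unmatched rows keep their a1 value
a1[mask, 2] = a2[ind[mask], 2]
```

Note this builds a len(a1) x len(a2) boolean matrix, so memory grows with the product of the two row counts.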

Answered by asimoneau on July 22, 2020

Would you consider other packages like Pandas?

import pandas as pd

d2 = pd.DataFrame(a2).set_index([0,1])
d1 = pd.DataFrame(a1).set_index([0,1])

d1.update(d2)
d1.reset_index().values


Output:

array([[ 0.   ,  2.   ,  0.   ],
[ 0.   ,  2.   ,  0.   ],
[ 0.   ,  3.   ,  0.713],
[ 1.   ,  3.   ,  0.713],
[ 1.   , 13.   ,  1.   ],
[ 2.   ,  2.   ,  0.   ],
[ 2.   ,  5.   , 10.   ]])


Answered by Quang Hoang on July 22, 2020