Stack Overflow Asked on December 2, 2021
I have three arrays, such that:
Data_Arr = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5])
ID_Arr = np.array([1, 2, 3, 4, 5])
Value_Arr = np.array([0.1, 0.6, 0.3, 0.8, 0.2])
I want to create a new array with the same shape as Data_Arr, where each element is taken from Value_Arr at the index position where ID_Arr matches that element of Data_Arr. So far I have this in a loop, but it's very slow, as my Data array is very large:
import numpy as np
out = np.zeros_like(Data_Arr, dtype=float)
for i in range(len(Data_Arr)):
    # The boolean mask scans all of ID_Arr for every element of Data_Arr
    out[i] = Value_Arr[ID_Arr == Data_Arr[i]][0]
Is there a more Pythonic way of doing this that avoids the loop (it doesn't have to use NumPy)?
Actual data looks like:
Data_Arr = [ 852116 852116 852116 ... 1001816 1001816 1001816]
ID_Arr = [ 852116 852117 852118 ... 1001814 1001815 1001816]
Value_Arr = [1.5547194 1.5547196 1.5547197 ... 1.5536859 1.5536858 1.5536857]
Shapes are:
Data_Arr = (4021165,)
ID_Arr = (149701,)
Value_Arr = (149701,)
Based on approaches from this post, here are the adaptations.
# https://stackoverflow.com/a/62658135/ @Divakar
a, b, invalid_specifier = ID_Arr, Data_Arr, 0
sidx = a.argsort()
idx = np.searchsorted(a, b, sorter=sidx)
# Remove out-of-bounds indices, as they won't be matches
idx[idx == len(a)] = 0
# Get traced-back indices corresponding to the original order of a
idx0 = sidx[idx]
# Mask out non-matches with invalid_specifier
out = np.where(a[idx0] == b, Value_Arr[idx0], invalid_specifier)
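For reuse, the searchsorted steps above can be wrapped in a function. This is a minimal sketch, assuming the IDs are unique; the name `map_by_id_searchsorted` and its `fill` parameter are hypothetical, and the small arrays include an unsorted ID_Arr and a data value (7) with no matching ID.

```python
import numpy as np

def map_by_id_searchsorted(ids, values, data, fill=0.0):
    """Hypothetical helper: values[k] where ids[k] == data[i], else `fill`.

    Assumes `ids` contains no duplicates.
    """
    ids, values, data = np.asarray(ids), np.asarray(values), np.asarray(data)
    sidx = ids.argsort()                           # order that sorts ids
    pos = np.searchsorted(ids, data, sorter=sidx)  # insertion points in sorted ids
    pos[pos == len(ids)] = 0                       # clamp out-of-bounds positions
    orig = sidx[pos]                               # positions in the original ids
    matched = ids[orig] == data                    # True only for exact matches
    return np.where(matched, values[orig], fill)

Data_Arr = np.array([1, 1, 2, 3, 7, 5])
ID_Arr = np.array([2, 1, 3, 4, 5])                 # deliberately unsorted
Value_Arr = np.array([0.6, 0.1, 0.3, 0.8, 0.2])

res = map_by_id_searchsorted(ID_Arr, Value_Arr, Data_Arr)
# expected values: 0.1, 0.1, 0.6, 0.3, 0.0 (no ID 7), 0.2
print(res)
```

The match check is what makes the clamp to 0 safe: a clamped position points at some real ID, but the comparison with `data` rejects it.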
Lookup based -
# https://stackoverflow.com/a/62658135/ @Divakar
def find_indices_lookup(a, b, invalid_specifier=-1):
    # Set up an array to which we will assign ranged numbers
    N = max(a.max(), b.max()) + 1
    lookup = np.full(N, invalid_specifier)
    # Index into lookup with b to trace back the positions. Non-matching ones
    # keep invalid_specifier, since they were never assigned a ranged number
    lookup[a] = np.arange(len(a))
    indices = lookup[b]
    return indices

idx = find_indices_lookup(ID_Arr, Data_Arr)
out = np.where(idx != -1, Value_Arr[idx], 0)
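A small run with a data value (6) that has no matching ID shows how the -1 sentinel flows through. The function is reproduced so the snippet runs on its own, and the arrays are made up for illustration. Note that `Value_Arr[-1]` silently reads the last element, which is why the `np.where` mask is required rather than optional.

```python
import numpy as np

# Copy of find_indices_lookup, reproduced so this snippet is self-contained
def find_indices_lookup(a, b, invalid_specifier=-1):
    N = max(a.max(), b.max()) + 1
    lookup = np.full(N, invalid_specifier)
    lookup[a] = np.arange(len(a))   # position of each ID in a
    return lookup[b]

ID_Arr = np.array([2, 1, 3, 4, 5])
Value_Arr = np.array([0.6, 0.1, 0.3, 0.8, 0.2])
Data_Arr = np.array([1, 2, 6, 5])   # 6 has no matching ID

idx = find_indices_lookup(ID_Arr, Data_Arr)
# idx holds positions into ID_Arr: 1, 0, -1, 4
out = np.where(idx != -1, Value_Arr[idx], 0)
# out: 0.1, 0.6, 0.0, 0.2 (the -1 slot would otherwise read Value_Arr[-1] == 0.2)
print(idx, out)
```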
Faster/simpler variant
And a simplified and hopefully faster version would be a direct lookup of values -
a, b = ID_Arr, Data_Arr
N = max(a.max(), b.max()) + 1
# One slot per possible ID; slots for IDs not in a stay 0
lookup = np.zeros(N, dtype=Value_Arr.dtype)
lookup[a] = Value_Arr
out = lookup[b]
If all values from Data_Arr are guaranteed to be in ID_Arr, every slot of the lookup array that gets read has been assigned, so we can use np.empty in place of np.zeros for the lookup array and thus gain a further performance boost.
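One variant of the direct lookup (not part of the answer above) fills the table with NaN instead of 0, so unmatched data values are easy to detect afterwards. This sketch assumes the IDs are non-negative integers and Value_Arr is a float array; the small arrays are made up. The memory note assumes the largest ID is 1001816, as in the question's printout.

```python
import numpy as np

ID_Arr = np.array([2, 1, 3, 4, 5])
Value_Arr = np.array([0.6, 0.1, 0.3, 0.8, 0.2])
Data_Arr = np.array([1, 1, 2, 6, 5])   # 6 has no matching ID

N = max(ID_Arr.max(), Data_Arr.max()) + 1
# NaN marks slots that no ID filled (requires a float dtype)
lookup = np.full(N, np.nan, dtype=Value_Arr.dtype)
lookup[ID_Arr] = Value_Arr
out = lookup[Data_Arr]
unmatched = np.isnan(out)

# The table has one slot per integer up to the largest ID, so its size depends on
# the ID range, not on len(ID_Arr): about 1,001,817 float64 slots (~8 MB) for the
# question's real data.
print(out, unmatched.sum())
```

The trade-off is memory proportional to the largest ID, in exchange for a single O(1) gather per element with no sorting.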
Answered by Divakar on December 2, 2021
Since ID_Arr is sorted, we can directly use np.searchsorted and index Value_Arr with the result:
Value_Arr[np.searchsorted(ID_Arr, Data_Arr)]
array([0.1, 0.1, 0.1, 0.6, 0.6, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.8, 0.8,
0.2, 0.2, 0.2])
If ID_Arr isn't sorted (note: if there may be out-of-bounds indices, we should remove them; see Divakar's answer):
s_ind = ID_Arr.argsort()
ss = np.searchsorted(ID_Arr, Data_Arr, sorter=s_ind)
out = Value_Arr[s_ind[ss]]
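Without a bounds and match check, np.searchsorted alone can go wrong in two ways when Data_Arr holds a value that is not in ID_Arr: a value between two IDs silently takes a neighbour's value, and a value above the largest ID yields position len(ID_Arr), which is out of bounds. A small demonstration with made-up arrays:

```python
import numpy as np

ID_Arr = np.array([10, 20, 30])
Value_Arr = np.array([1.0, 2.0, 3.0])
Data_Arr = np.array([10, 25, 30])   # 25 is not an ID

s_ind = ID_Arr.argsort()
ss = np.searchsorted(ID_Arr, Data_Arr, sorter=s_ind)
out = Value_Arr[s_ind[ss]]
# 25 lands at position 2 (between 20 and 30), so it silently gets 3.0
print(out)

# A value above the largest ID gives position len(ID_Arr) and fails on indexing
ss_big = np.searchsorted(ID_Arr, np.array([40]), sorter=s_ind)
print(ss_big)   # equals len(ID_Arr)
try:
    Value_Arr[s_ind[ss_big]]
except IndexError as exc:
    print("IndexError:", exc)
```

So this shortcut is only safe when every value of Data_Arr is known to be present in ID_Arr.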
Checking with the arrays suggested by alaniwi:
Data_Arr = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5])
ID_Arr = np.array([2, 1, 3, 4, 5])
Value_Arr = np.array([0.6, 0.1, 0.3, 0.8, 0.2])
out_op = np.zeros_like(Data_Arr, dtype=float)
for i in range(len(Data_Arr)):
    out_op[i] = Value_Arr[ID_Arr == Data_Arr[i]][0]
s_ind = ID_Arr.argsort()
ss = np.searchsorted(ID_Arr, Data_Arr, sorter=s_ind)
out_answer = Value_Arr[s_ind[ss]]
np.array_equal(out_op, out_answer)
#True
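The same kind of check can be run on larger random data and against more than one approach. This is a sketch with hypothetical test data (unique, shuffled IDs; every data value drawn from the IDs), comparing the question's loop with the argsort/searchsorted version and the direct lookup table.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical test data: unique, unsorted IDs with random float values
ID_Arr = rng.permutation(np.arange(100, 200))
Value_Arr = rng.random(ID_Arr.size)
Data_Arr = rng.choice(ID_Arr, size=1000)   # every data value has a matching ID

# Reference: the question's loop (fine at this size)
out_loop = np.zeros_like(Data_Arr, dtype=float)
for i in range(len(Data_Arr)):
    out_loop[i] = Value_Arr[ID_Arr == Data_Arr[i]][0]

# argsort + searchsorted
s_ind = ID_Arr.argsort()
out_ss = Value_Arr[s_ind[np.searchsorted(ID_Arr, Data_Arr, sorter=s_ind)]]

# Direct lookup table
lookup = np.zeros(max(ID_Arr.max(), Data_Arr.max()) + 1, dtype=Value_Arr.dtype)
lookup[ID_Arr] = Value_Arr
out_lut = lookup[Data_Arr]

print(np.array_equal(out_loop, out_ss), np.array_equal(out_loop, out_lut))
```

Exact equality is appropriate here because every approach copies the same stored values rather than computing new ones.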
Answered by yatu on December 2, 2021
Looks like you want:
out = Value_Arr[ID_Arr[Data_Arr - 1] - 1]
Note that the `- 1`s are due to the fact that Python/NumPy uses 0-based indexing.
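This expression relies on ID_Arr being exactly 1, 2, ..., n, as in the toy example; in that case ID_Arr[Data_Arr - 1] - 1 is simply Data_Arr - 1. The question's real IDs look contiguous as well (1001816 - 852116 + 1 = 149701, matching the shape of ID_Arr), so, assuming they are sorted with no gaps, the same idea works with an offset. A sketch that checks those assumptions before indexing; the name `map_contiguous_ids` is hypothetical:

```python
import numpy as np

def map_contiguous_ids(ids, values, data):
    """Hypothetical helper: offset indexing, valid only if
    ids == [start, start+1, ..., start+n-1]."""
    start = ids[0]
    if not np.array_equal(ids, np.arange(start, start + len(ids))):
        raise ValueError("ids are not sorted and contiguous")
    offs = data - start
    # Values below start would wrap around silently, so reject out-of-range data
    if offs.min() < 0 or offs.max() >= len(ids):
        raise ValueError("data contains values outside the ID range")
    return values[offs]

Data_Arr = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5])
ID_Arr = np.array([1, 2, 3, 4, 5])
Value_Arr = np.array([0.1, 0.6, 0.3, 0.8, 0.2])

out = map_contiguous_ids(ID_Arr, Value_Arr, Data_Arr)
# Same result as the expression above on this data
print(np.array_equal(out, Value_Arr[ID_Arr[Data_Arr - 1] - 1]))
```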
Answered by Quang Hoang on December 2, 2021