Stack Overflow Asked by 119631 on December 23, 2021
For this question, I refer to the example in the Python docs discussing the "use of the SharedMemory class with NumPy arrays, accessing the same numpy.ndarray from two distinct Python shells".
A major change that I'd like to make is to manipulate an array of class objects rather than of integer values, as I demonstrate below.
import numpy as np
from multiprocessing import shared_memory

# a simplistic class example
class A:
    def __init__(self, x):
        self.x = x

# numpy array of class objects
a = np.array([A(1), A(2), A(3)])

# create a shared memory instance
shm = shared_memory.SharedMemory(create=True, size=a.nbytes, name='psm_test0')

# numpy array backed by shared memory
b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)

# copy the original data into shared memory
b[:] = a[:]

print(b)
# array([<__main__.A object at 0x7fac56cd1190>,
#        <__main__.A object at 0x7fac56cd1970>,
#        <__main__.A object at 0x7fac56cd19a0>], dtype=object)
Now, in a different shell, we attach to the shared memory space and try to manipulate the contents of the array.
import numpy as np
from multiprocessing import shared_memory
# attach to the existing shared space
existing_shm = shared_memory.SharedMemory(name='psm_test0')
c = np.ndarray((3,), dtype=object, buffer=existing_shm.buf)
Even before we manipulate c, merely printing it results in a segmentation fault. Granted, I cannot expect to observe behaviour that has not been written into the module, so my question is: what can I do to work with a shared array of objects?
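(Some context on the crash, not stated in the original question: an object array does not store the objects in its buffer at all, only one pointer per element, and those pointers are meaningless in any other process. A quick check of the array from above illustrates this:)

```python
import numpy as np

class A:
    def __init__(self, x):
        self.x = x

a = np.array([A(1), A(2), A(3)])

# An object array's buffer holds one pointer per element, not the
# object data itself -- so copying the buffer into shared memory
# copies addresses that are only valid in the creating process.
print(a.dtype)   # object
print(a.nbytes)  # 3 * pointer size (e.g. 24 on a 64-bit build)
```

This is why attaching from a second shell and dereferencing those stale pointers segfaults.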
I'm currently pickling the list, but protected reads/writes add a fair bit of overhead. I've also tried using a Namespace, which was quite slow because indexed writes are not allowed. Another idea could be to use a ctypes Structure in a ShareableList, but I wouldn't know where to start with that.
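(As a possible starting point for the ShareableList idea, here is a minimal sketch storing pickled objects as bytes items; note this is an assumption about the approach, not code from the question, and each slot's size is fixed at creation, so an updated pickle must not be longer than the original one in that slot:)

```python
import pickle
from multiprocessing import shared_memory

class A:
    def __init__(self, x):
        self.x = x

# A ShareableList can hold bytes items, so pickled objects fit.
sl = shared_memory.ShareableList([pickle.dumps(A(i)) for i in range(3)])

# Indexed reads/writes work; every access pickles or unpickles,
# and writes must not exceed the slot's original byte length.
obj = pickle.loads(sl[1])
obj.x += 10
sl[1] = pickle.dumps(obj)

result = pickle.loads(sl[1]).x
print(result)  # 11

sl.shm.close()
sl.shm.unlink()
```

Another process could attach via shared_memory.ShareableList(name=sl.shm.name) and do the same indexed reads and writes.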
In addition, there is also a design aspect: it appears that there is an open bug in shared_memory that may affect my implementation, in which several processes work on different elements of the array.
Is there a more scalable way of sharing a large list of objects between several processes so that at any given time all running processes interact with a unique object/element in the list?
UPDATE: At this point, I will also accept partial answers that talk about whether this can be achieved with Python at all.
So, I did a bit of research (Shared Memory Objects in Multiprocessing) and came up with a few ideas:

Serialize the objects, then save them as byte strings to a numpy array. Two things are problematic here:

One needs to pass the data type from the creator of 'psm_test0' to any consumer of 'psm_test0'. This could be done with another shared memory, though.

pickle and unpickle is essentially like deepcopy, i.e. it actually copies the underlying data.
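(A sketch of the first point, passing the dtype string through a second, small shared memory block; the name 'psm_test0_meta' is just an assumed convention between creator and consumer, not anything mandated by the module:)

```python
from multiprocessing import shared_memory

# The creator writes the dtype string into a tiny side channel.
dtype_str = 'S105'
meta = shared_memory.SharedMemory(create=True, size=16, name='psm_test0_meta')
meta.buf[:len(dtype_str)] = dtype_str.encode('ascii')

# A consumer attaches by the agreed name and strips the zero padding.
peer = shared_memory.SharedMemory(name='psm_test0_meta')
recovered = bytes(peer.buf).rstrip(b'\x00').decode('ascii')
print(recovered)  # S105

peer.close()
meta.close()
meta.unlink()
```

The consumer can then pass the recovered string straight to np.ndarray(..., dtype=recovered, ...) instead of hard-coding 'S105'.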
The code for the 'main' process reads:
import pickle
from multiprocessing import shared_memory
import numpy as np

# a simplistic class example
class A:
    def __init__(self, x):
        self.x = x

    def pickle(self):
        return pickle.dumps(self)

    @classmethod
    def unpickle(cls, bts):
        return pickle.loads(bts)

if __name__ == '__main__':
    # Test pickling procedure
    a = A(1)
    print(A.unpickle(a.pickle()).x)
    # >>> 1

    # numpy array of byte strings
    a_arr = np.array([A(1).pickle(), A(2).pickle(), A('This is a really long test string which should exceed 42 bytes').pickle()])

    # create a shared memory instance
    shm = shared_memory.SharedMemory(
        create=True,
        size=a_arr.nbytes,
        name='psm_test0'
    )

    # numpy array backed by shared memory
    b_arr = np.ndarray(a_arr.shape, dtype=a_arr.dtype, buffer=shm.buf)

    # copy the original data into shared memory
    b_arr[:] = a_arr[:]

    print(b_arr.dtype)
    # 'S105'
and for the consumer:
import numpy as np
from multiprocessing import shared_memory
from test import A
# attach to the existing shared space
existing_shm = shared_memory.SharedMemory(name='psm_test0')
c = np.ndarray((3,), dtype='S105', buffer=existing_shm.buf)
# Test data transfer
arr = [a.x for a in list(map(A.unpickle, c))]
print(arr)
# [1, 2, ...]
I'd say you have a few ways of going forward:

Stay with simple data types.

Implement something using the C API, but I can't really help you there.

Use Rust.

Use Managers. You may lose out on some performance (I'd like to see a real benchmark, though), but you get a relatively safe and simple interface for shared objects.

Use Redis, which also has Python bindings...
Answered by AlexNe on December 23, 2021