Stack Overflow Asked on November 12, 2021
I am running multiple processes with multiprocessing.Pool:
import spacy
import multiprocessing
import logging

# global variable
nlp_bert = spacy.load("en_trf_bertbaseuncased_lg")
logging.basicConfig(level=logging.DEBUG)

def job_pool(data, job_number, job_to_do, groupby=None, split_col=None, **kwargs):
    pool = multiprocessing.Pool(processes=job_number)
    jobs = pool.map(job_to_do, data)
    return jobs

def job(slice):
    logging.debug('this shows')
    w1 = nlp_bert('word')
    w2 = nlp_bert('other')
    logging.debug(w1.similarity(w2))
    logging.debug("this doesn't")

job_pool([1, 2, 3, 4], 4, job)
The call to nlp_bert never returns inside the worker, and no error is raised: the first debug message is logged, but the similarity and the final message never appear. How can I find out what is going wrong? I already have logging set to debug level.
The same calls work outside of multiprocessing, i.e. when I just put them in a script and run the following:
import spacy
nlp_bert = spacy.load("en_trf_bertbaseuncased_lg")
w1 = nlp_bert('word')
w2 = nlp_bert('other')
print(w1.similarity(w2))
0.8381155446247196
I’m using:
It turns out this is a known issue: PyTorch's multithreading can deadlock when it runs in child processes.
https://github.com/explosion/spaCy/issues/4667
A workaround is to add the following:
import torch
torch.set_num_threads(1)
Answered by forgetso on November 12, 2021
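The question also asks how to find out where a silent worker is stuck. One possible approach, sketched here and not part of the original answer, is to arm the standard library's faulthandler watchdog inside the job. The names suspect_work, watched_job and HANG_TIMEOUT_S are hypothetical, and suspect_work is only a stand-in for the real nlp_bert calls.

```python
import faulthandler
import time

# Assumed threshold: tune it to how long one job should normally take.
HANG_TIMEOUT_S = 30.0


def suspect_work(item):
    """Hypothetical stand-in for the real job body (e.g. the nlp_bert calls)."""
    time.sleep(0.01)
    return item * 2


def watched_job(item, timeout=HANG_TIMEOUT_S):
    """Run suspect_work; dump thread tracebacks to stderr if it stalls."""
    # Arm a watchdog. If this call is still running after `timeout` seconds,
    # faulthandler writes the traceback of every thread to stderr.
    # exit=False keeps the process alive after the dump.
    faulthandler.dump_traceback_later(timeout, exit=False)
    try:
        return suspect_work(item)
    finally:
        # Disarm the watchdog once the work returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

Passing watched_job to pool.map in place of job should make a hung worker print the Python line it is blocked on. The dump shows Python frames only, so a deadlock in native code appears as the Python call that entered it.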