Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I am doing a machine learning project in Python, so I have to do parallel predict function, which I'm using in my program.

from multiprocessing.dummy import Pool
from multiprocessing import cpu_count


def multi_predict(X, predict, *args, **kwargs):
    pool = Pool(cpu_count())
    results = pool.map(predict, X)
    pool.close()
    pool.join()
    return results

The problem is that all my CPUs loaded only on 20-40% (in sum it's 100%). I use multiprocessing.dummy because I have some problems with multiprocessing module in pickling function.

question from:https://stackoverflow.com/questions/26432411/multiprocessing-dummy-in-python-is-not-utilising-100-cpu

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
895 views
Welcome To Ask or Share your Answers For Others

1 Answer

When you use multiprocessing.dummy, you're using threads, not processes:

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.

That means you're restricted by the Global Interpreter Lock (GIL), and only one thread can actually execute CPU-bound operations at a time. That's going to keep you from fully utilizing your CPUs. If you want get full parallelism across all available cores, you're going to need to address the pickling issue you're hitting with multiprocessing.Pool.

Note that multiprocessing.dummy might still be useful if the work you need to parallelize is IO bound, or utilizes a C-extension that releases the GIL. For pure Python code, however, you'll need multiprocessing.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...