Python multiprocessing

Experimental html version of downloadable textbook, see http://www.tacc.utexas.edu/~eijkhout/istc/istc.html
42.1 : Software and hardware
42.2 : Process
42.2.1 : Arguments
42.2.2 : Process details
42.3 : Pools and mapping
42.4 : Shared data
42.4.1 : Pipes
42.4.2 : Queues

42 Python multiprocessing

Python has a multiprocessing module in its standard library. This is a parallel processing library that relies on subprocesses, rather than threads.

42.1 Software and hardware

The cpu_count function reports how many cores Python detects:

// pool.py
import multiprocessing as mp
nprocs = mp.cpu_count()
print(f"I detect {nprocs} cores")
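
The number that cpu_count reports is the number of logical CPUs in the machine, which is not necessarily the number your process is allowed to use, for instance inside a batch job or container. The following is a minimal sketch of a more careful query; the function name usable_cpus is made up for this illustration, and os.sched_getaffinity is only available on some platforms (Linux among them), hence the fallback.

```python
import multiprocessing as mp
import os

def usable_cpus():
    """Number of CPUs the current process may run on (hypothetical helper)."""
    # os.sched_getaffinity does not exist on every platform,
    # so fall back to the machine-wide count where it is missing.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return mp.cpu_count()

if __name__ == '__main__':
    print(f"The machine has {mp.cpu_count()} logical CPUs")
    print(f"This process may use {usable_cpus()} of them")
```

Using the smaller number when sizing a pool avoids oversubscribing the cores you were actually given.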

42.2 Process

A process is an object, of type Process, that will execute a Python function:

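// quicksort.py
import multiprocessing as mp
import random

def quicksort( numbers ) :
    # sequential quicksort; lists of length 0 or 1 are already sorted
    if len(numbers)<=1:
        return numbers
    pivot = numbers[0]
    left  = [ i for i in numbers[1:] if i<pivot ]
    right = [ i for i in numbers[1:] if i>=pivot ]
    return quicksort(left) + [pivot] + quicksort(right)

def parallel_quicksort( numbers ) :
    # sort the two parts in parallel; pool workers are daemonic and
    # can not create pools of their own, so they sort sequentially
    if len(numbers)<=1:
        sorted_numbers = numbers
    else:
        pivot = numbers[0]
        left  = [ i for i in numbers[1:] if i<pivot ]
        right = [ i for i in numbers[1:] if i>=pivot ]
        with mp.Pool(2) as pool:
            [sortleft,sortright] = pool.map( quicksort,[left,right] )
        sorted_numbers = sortleft + [pivot] + sortright
    # a Process can not return a value, so print the result
    print(sorted_numbers)

if __name__ == '__main__':
    numbers = [ random.randint(1,50) for i in range(32) ]
    process = mp.Process(target=parallel_quicksort,args=[numbers])
    process.start()
    process.join()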

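Creating a process does not start it: for that, call its start method. That call returns immediately while the process runs in the background; completion of the process is not guaranteed until you call its join method, which waits for the process to finish: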

if __name__ == '__main__':
    for p in processes:
        p.start()
    for p in processes:
        p.join()

By placing the start and join calls elsewhere in the program, rather than in two regular loops like this, arbitrarily complicated patterns of concurrency can be produced.
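
The difference between start and join can be made visible with the is_alive method and the exitcode attribute of a process. The following is a minimal sketch, not one of the example files of this text; the function work and its half-second sleep are stand-ins for a real computation.

```python
import multiprocessing as mp
import time

def work(seconds):
    # stand-in for a real computation
    time.sleep(seconds)

if __name__ == '__main__':
    proc = mp.Process(target=work, args=(0.5,))
    print("before start:", proc.is_alive())   # created, but not running
    proc.start()
    print("after start: ", proc.is_alive())   # running in the background
    # the parent is free to do other work here
    proc.join()                                # wait for the process to finish
    print("after join:  ", proc.is_alive(), "exit code", proc.exitcode)
```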

42.2.1 Arguments

Arguments can be passed to the function of the process with the args keyword. This accepts a list (or tuple) of arguments, leading to a somewhat strange syntax for a single argument, since a one-element tuple needs a trailing comma:

proc = Process(target=print_func, args=(name,))
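
As a sketch of the argument passing mechanism, the following passes one positional argument, and then a positional and a keyword argument through the kwargs keyword of Process. The functions format_greeting and greet are hypothetical names for this illustration only.

```python
import multiprocessing as mp

def format_greeting(name, greeting="Hello"):
    return f"{greeting}, {name}"

def greet(name, greeting="Hello"):
    print(format_greeting(name, greeting))

if __name__ == '__main__':
    # args=("world") would be a plain string, not a tuple, and would be
    # unpacked into five one-character arguments: hence the comma
    p1 = mp.Process(target=greet, args=("world",))
    # keyword arguments are passed as a dictionary
    p2 = mp.Process(target=greet, args=("world",),
                    kwargs={"greeting": "Goodbye"})
    for p in (p1, p2):
        p.start()
    for p in (p1, p2):
        p.join()
```

Writing the arguments as a list, args=["world"], avoids the trailing comma altogether.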

42.2.2 Process details

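Note the test on __main__: with the spawn start method, which is the default on macOS and Windows, each process that is started imports the current file in order to find the function specified. Without this clause, that import would attempt to execute the process start calls again, before getting to the function execution. With the fork start method the child is a copy of the parent, and the clause is not strictly needed, but including it keeps the program portable.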
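
To see this import happen, the following sketch requests the spawn start method explicitly through get_context, and puts a print statement at module level. It assumes the code is saved in a file and run as a script; the function names describe and hello are made up for the illustration.

```python
import multiprocessing as mp

# Module-level code: executed by the parent, and executed again
# by a spawned child when it imports this file.
print(f"importing the module in {mp.current_process().name}")

def describe():
    return f"hello from {mp.current_process().name}"

def hello():
    print(describe())

if __name__ == '__main__':
    # Only the parent gets here: in the child the module is
    # imported under a different name, so the test fails there.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=hello)
    proc.start()
    proc.join()
```

The module-level line is printed once for the parent and once more for the child, while the lines under the test are executed by the parent only.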

Processes have a name that you can retrieve as current_process().name. The default is Process-5 and such, but you can specify custom names:

Process(name="Your name here")

The target function of a process can get hold of that process with the current_process function.

Of course you can also query os.getpid(), but that does not offer any further possibilities.

def say_name(iproc):
    print(f"Process {os.getpid()} has name: {mp.current_process().name}")
if __name__ == '__main__':
    processes = [ mp.Process(target=say_name,name=f"proc{iproc}",args=[iproc])
                  for iproc in range(6) ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

42.3 Pools and mapping

Often you want a number of processes to apply the same function to a number of arguments, for instance in a parameter sweep. For this, create a Pool object, and apply the map method to it:

pool = mp.Pool( nprocs )
results = pool.map( print_value,range(1,2*nprocs) )

Note that this is also the easiest way to get return values from a process, which is not directly possible with a Process object. Other approaches are using a shared object, or an object in a Queue or Pipe object; see below.
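
A complete version of such a mapping could look as follows. This is a sketch in which the function square stands in for whatever is computed in the parameter sweep; the with statement cleans up the pool once the results are in.

```python
import multiprocessing as mp

def square(x):
    # stand-in for the computation done for one parameter value
    return x*x

if __name__ == '__main__':
    nprocs = mp.cpu_count()
    with mp.Pool(nprocs) as pool:
        # map blocks until all results are available, and returns
        # them in the order of the inputs
        results = pool.map(square, range(1, 2*nprocs))
    print(results)
```

Since there are more arguments than processes, the pool hands out the arguments to whichever process is available.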

42.4 Shared data

The best way to deal with shared data is to create a Value or Array object, which come equipped with a lock for safe updating.

pi = mp.Value('d')
pi.value = 0

For instance, one could stochastically calculate $\pi$ by

  1. generating random points in $[0,1)^2$, and
  2. recording how many fall in the unit circle, after which
  3. $\pi$ is $4\times$ the ratio between points in the circle and the total number of points.

// pi.py
import random

def calc_pi1(pi,n):
    for i in range(n):
        x = random.random()
        y = random.random()
        with pi.get_lock():
            if x*x+y*y<1:
                pi.value += 1.
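
A driver for this function could be structured as follows. This is a sketch under assumptions: the number of processes and the number of points per process are arbitrary choices, and calc_pi1 is repeated from above only to make the code self-contained.

```python
import multiprocessing as mp
import random

def calc_pi1(pi, n):
    # as above: count the points that fall inside the circle
    for i in range(n):
        x = random.random()
        y = random.random()
        with pi.get_lock():
            if x*x+y*y<1:
                pi.value += 1.

if __name__ == '__main__':
    nprocs, n = 4, 10000           # arbitrary choices for this sketch
    pi = mp.Value('d', 0.0)        # shared counter, initially zero
    processes = [ mp.Process(target=calc_pi1, args=(pi, n))
                  for iproc in range(nprocs) ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    # all processes updated the same counter
    print(f"pi is approximately {4*pi.value/(nprocs*n)}")
```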

Exercise

Do you see a way to improve the speed of this calculation?

42.4.1 Pipes

A pipe, object type Pipe, corresponds to what used to be called a channel in older parallel programming systems: a FIFO object into which one process can place items, and from which another process can take them. However, a pipe is not associated with any particular pair of processes: creating the pipe gives the entrance and exit of the pipe. (By default a pipe is bidirectional, so which end is called the entrance is only a convention.)

q_entrance,q_exit = mp.Pipe()

And they can be passed to any process

producer1 = mp.Process(target=add_to_pipe,args=(1,q_entrance))
producer2 = mp.Process(target=add_to_pipe,args=(2,q_entrance))
printer = mp.Process(target=print_from_pipe,args=(q_exit,))

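which can then put and get items, using the send and recv methods. Note that the Python documentation warns that data in a pipe may become corrupted if two processes write to the same end at the same time; with more than one producer, a Queue is the safer choice.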

// pipemulti.py
import time

def add_to_pipe(v,q):
    for i in range(10):
        print(f"put {v}")
        q.send(v)
        time.sleep(1)
    q.send("END")
def print_from_pipe(q):
    ends = 0
    while True:
        v = q.recv()
        print(f"Got: {v}")
        if v=="END":
            ends += 1
        if ends==2:
            break
    print("pipe is empty")
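
For the common case of one producer and one consumer, a one-directional pipe makes the roles of the two ends explicit. The following is a minimal sketch: with duplex=False the first connection returned can only receive and the second can only send. The function names producer and consumer, and the use of None as end marker, are choices made for this illustration.

```python
import multiprocessing as mp

def producer(conn, items):
    for item in items:
        conn.send(item)
    conn.send(None)        # end marker: nothing more will follow
    conn.close()

def consumer(conn):
    received = []
    while True:
        item = conn.recv()
        if item is None:   # end marker seen
            break
        received.append(item)
    return received

if __name__ == '__main__':
    # receiving end first, sending end second
    recv_end, send_end = mp.Pipe(duplex=False)
    proc = mp.Process(target=producer, args=(send_end, [1, 2, 3]))
    proc.start()
    print(consumer(recv_end))   # the parent acts as the consumer
    proc.join()
```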

42.4.2 Queues
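
As a brief illustration of the Queue object mentioned above as a way of getting results back from a process, here is a minimal sketch, not one of the example files of this text. It assumes only the put and get methods of mp.Queue; the function square_into_queue is a made-up name. Unlike a pipe, a queue can safely be written to by several processes.

```python
import multiprocessing as mp

def square_into_queue(x, queue):
    # report the argument together with the result, since
    # results can arrive in any order
    queue.put((x, x*x))

if __name__ == '__main__':
    queue = mp.Queue()
    processes = [ mp.Process(target=square_into_queue, args=(x, queue))
                  for x in range(4) ]
    for p in processes:
        p.start()
    # take the results off the queue before joining: a process that
    # has put items on a queue waits for them to be flushed before it exits
    results = dict( queue.get() for p in processes )
    for p in processes:
        p.join()
    print(results)
```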
