Python Threading (Part 1)

What is Threading / Why You Should Care

In most languages, you operate in a pretty linear fashion: do x, then do y, then z, etc. Sometimes this isn't the most efficient approach. Imagine you want to crawl a bunch of websites, you don't care what order they are done in, and the tasks operate mostly independently of each other. In this case, you might want to consider using threads, spawning off a child thread for each individual thing you want to do.

            +
            |
            |+------> +->
            |       +    +
       main |       |    |
     thread |       |    |child threads
            |       |    |
            |       |    |
            |       |    |
            |<----+ v <--+
            |
            v

Personally, I usually end up using threading when I need to do:

Example / Technique 1

This is probably the most common example you see of people using the threading module. Basically, define a function, then in some kind of loop pass that function as the target of a threading.Thread and call start() on it.

import threading

def seagull_sound():
  print('caw!')

for i in range(0, 10):
  thread = threading.Thread(target=seagull_sound)
  thread.start()

When I run this:

Toms-MacBook-Air-2:Desktop tom$ python caw.py
caw!
 caw!
 caw!
caw!
 caw!
caw!
caw!
caw!
caw!
 caw!

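Note that it acts kind of weird with extra spaces. I originally blamed my terminal, but the more likely cause is that print writes its text and the trailing newline in separate steps, so when 10 threads write to the same stdout simultaneously, their output can interleave mid-line.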
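If the jumbled output bothers you, one option is to let only one thread print at a time using a threading.Lock. This is a minimal sketch rather than the only way to do it. The name print_lock is made up for illustration, and the join() calls at the end just make the main thread wait for its children.

```python
import threading

# Hypothetical name: a single lock shared by every thread that prints.
print_lock = threading.Lock()

def seagull_sound():
  # Only one thread can hold the lock at a time, so each line is
  # written out whole before the next thread gets a turn.
  with print_lock:
    print('caw!')

threads = [threading.Thread(target=seagull_sound) for _ in range(10)]
for thread in threads:
  thread.start()
for thread in threads:
  thread.join()  # wait for every child thread to finish
```

The trade-off is that the threads now take turns at the print, which is fine here because printing is all they do.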

So when we ran this, it created 10 threads and basically all of them output 'caw!' in (relative) concurrency. Technically, this isn't true parallelism: CPython's global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, and you'd need the multiprocessing module to use multiple cores, but I don't really want to get lost in the weeds here.

Example / Technique 2

I favor this implementation of threading over the one above. Instead of wrapping a function in a Thread object, you can just create a class which inherits from threading.Thread and overrides the run() method:

import threading

class SeaGull(threading.Thread):
  def __init__(self):
    ''' re-implement the constructor '''
    threading.Thread.__init__(self)

  def run(self):
    ''' make the sound of a seagull '''
    print('caw!')

for i in range(0, 10):
  SeaGull().start()  # call threading's start() method

Then, when I run this:

Toms-MacBook-Air-2:Desktop tom$ python caw2.py
caw!caw!

caw!
caw!
 caw!
caw!
 caw!
caw!
caw!
caw!

We can see the output is basically the same: it just attempts to hammer out all the print statements at the same time. I prefer the latter of the two options; having your object be a subclass of threading.Thread is a slick implementation.
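Neither example waits for its threads to finish, even though the diagram at the top shows the children rejoining the main thread. Thread.join() is the call that does that. Here's a rough sketch, assuming you want to pick something up from each thread once it is done. The sound attribute is my own addition for illustration.

```python
import threading

class SeaGull(threading.Thread):
  def __init__(self):
    threading.Thread.__init__(self)
    self.sound = None  # hypothetical attribute, filled in by run()

  def run(self):
    ''' executed in the child thread once start() is called '''
    self.sound = 'caw!'

gulls = [SeaGull() for _ in range(10)]
for gull in gulls:
  gull.start()
for gull in gulls:
  gull.join()  # block until this thread's run() has returned

# Every thread has finished, so reading the results here is safe.
sounds = [gull.sound for gull in gulls]
print(sounds)
```

Starting all the threads before joining any of them matters: calling join() right after each start() would run the threads one at a time.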

Passing Data to your Thread

The above examples using seagulls are pretty dumb. Suppose you want to output the seagull sound and also print a number identifying which thread is making the sound. We can make a small modification to our code to support this:

import threading

class SeaGull(threading.Thread):
  def __init__(self, thread_no):
    ''' re-implement the constructor '''
    threading.Thread.__init__(self)
    self._thread_no = thread_no

  def run(self):
    ''' make the sound of a seagull '''
    print('[%d] caw!' % self._thread_no)

for i in range(0, 10):
  SeaGull(i).start() # note that we're passing in i now

And when we run this:

Toms-MacBook-Air-2:Desktop tom$ python caw2.py
[0] caw!
 [1] caw!
[2] caw!
 [3] caw!
[4] caw!
 [5] caw!
[6] caw!
[7] caw!
[8] caw!
[9] caw!
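If you'd rather stick with the function style from Technique 1, the same data can be handed over with the args parameter of threading.Thread. A small sketch along those lines: the sounds list and the join() calls are my additions, there so the results can be printed in order once all the threads are done.

```python
import threading

def seagull_sound(thread_no, sounds):
  # Appending to a list from several threads is safe in CPython.
  sounds.append('[%d] caw!' % thread_no)

sounds = []   # hypothetical shared list that collects each thread's line
threads = []
for i in range(10):
  # args is a tuple of positional arguments passed on to the target
  thread = threading.Thread(target=seagull_sound, args=(i, sounds))
  thread.start()
  threads.append(thread)

for thread in threads:
  thread.join()  # wait until every thread has appended its line

for line in sorted(sounds):
  print(line)
```

Collecting the lines and printing them from the main thread also sidesteps the jumbled output, since only one thread ever writes to stdout.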

What you can't/shouldn't do

I put this section last because you're probably not going to read it and instead make some mistakes yourself. That's cool too. In general, I find the below items to be "code smells" for when you are mis-using threading or potentially doing something stupid:
