# Python Threading (Part 1)

Abstract: I give some code examples of how to do magic with threading in Python.

## What is Threading / Why You Should Care

In most languages, you operate in a pretty linear fashion: do x, then do y, then z, etc. Sometimes this isn't the most efficient approach. Imagine you want to crawl a bunch of websites, you don't care what order they are done in, and the tasks operate mostly independently of each other. In this case, you might want to consider spawning off a child "thread" for each individual thing you want to do.

[Diagram: the main thread spawns child threads, which run alongside it and later return to it]

Personally, I usually end up using threading when I need to do:

• Crawling multiple websites on many domains, i.e. I want to scrape a bunch of sites that are not hosted in the same place.
• Computation that can be done independently; usually this is de-duping or other processes that do not need to work together.

## Example / Technique 1

This is probably the most common example you see of people using the threading module. Basically, define a function and then call that function in some kind of loop using threading.Thread.

    import threading

    def seagull_sound():
        print('caw!')

    for i in range(0, 10):
        # each Thread runs seagull_sound() once
        thread = threading.Thread(target=seagull_sound)
        thread.start()

When I run this:

    Toms-MacBook-Air-2:Desktop tom$ python caw.py
    caw!
    caw!
    caw!
    caw!
    caw!
    caw!
    caw!
    caw!
    caw!
    caw!

Note that the output can act kind of weird, with extra blank lines or two caws on one line. That's because all 10 threads write to the same stdout simultaneously, and print writes the text and the trailing newline separately, so the pieces can interleave.
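
If the jumbled output bothers you, one option is to guard the print with a lock so only one thread writes at a time. Here's a minimal sketch, assuming Python 3; `print_lock` and `caw_log` are names I made up for illustration:

```python
import threading

print_lock = threading.Lock()   # hypothetical name: one lock shared by all threads
caw_log = []                    # hypothetical name: records what was printed

def seagull_sound():
    # only one thread at a time can hold the lock, so each 'caw!'
    # and its newline get written together
    with print_lock:
        print('caw!')
        caw_log.append('caw!')

threads = [threading.Thread(target=seagull_sound) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()   # wait for every thread to finish
```

The trade-off is that the threads now take turns printing, which is fine here since printing is all they do.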

So when we ran this, it created 10 threads, and basically all of them output 'caw!' (relatively) concurrently. Technically, this isn't true parallelism: Python's global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so for that you'd need the multiprocessing module. I don't really want to get lost in the weeds here, though.
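
Even with that limitation, threads still pay off when the work is mostly waiting (network, disk), since a waiting thread lets the others run. Here's a rough sketch, assuming Python 3, where `time.sleep` stands in for a slow network call and `fake_fetch` is a made-up function:

```python
import threading
import time

def fake_fetch():
    # stand-in for a slow network request (an assumption for this demo);
    # a sleeping thread lets the other threads run in the meantime
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=fake_fetch) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.perf_counter() - start

# the ten 0.2 second waits overlap instead of adding up to 2 seconds
print('10 fake fetches took %.2f seconds' % elapsed)
```

If `fake_fetch` were pure number crunching instead of waiting, you shouldn't expect the same speedup from threads.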

## Example / Technique 2

I favor this implementation of threading over the one above. Instead of creating a Thread object around a function, you can create a class which inherits from threading.Thread and reimplements the run() method:

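    import threading

    class SeaGull(threading.Thread):
        def __init__(self, number):
            ''' re-implement the constructor '''
            # the parent constructor has to run before start() is called
            threading.Thread.__init__(self)
            self.number = number  # index shown in the output

        def run(self):
            ''' make the sound of a seagull '''
            print('[%d] caw!' % self.number)

    for i in range(0, 10):
        SeaGull(i).start()  # call threading's start() method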


Then, when I run this:

    [0] caw!
    [1] caw!
    [2] caw!
    [3] caw!
    [4] caw!
    [5] caw!
    [6] caw!
    [7] caw!
    [8] caw!
    [9] caw!
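
One thing the class-based version makes easy is getting a result back: store it on the instance inside run() and read it after join(). A small sketch, assuming Python 3; the `SquareWorker` class and its `result` attribute are made up for this example:

```python
import threading

class SquareWorker(threading.Thread):
    ''' hypothetical worker that squares one number '''

    def __init__(self, number):
        super().__init__()
        self.number = number
        self.result = None  # filled in by run()

    def run(self):
        # runs in the child thread once start() is called
        self.result = self.number * self.number

workers = [SquareWorker(i) for i in range(5)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()  # block until this worker's run() has returned

results = [worker.result for worker in workers]
print(results)  # [0, 1, 4, 9, 16]
```

Reading `result` only after join() matters: before that, run() may not have finished yet.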


## What you can't/shouldn't do

I put this section last because you're probably not going to read it and instead make some mistakes yourself. That's cool too. In general, I find the items below to be "code smells" that you are misusing threading or potentially doing something stupid:

• You find yourself saying "I want all my threads to talk to each other." There are ways to have threads communicate, but if you're doing a lot of it, it might be easier to solve the problem another way.
• You want to pass a huge list of parameters into your thread. This is generally a code smell; threads should be pretty simple and independent.
• You want to run an insane number of threads. Use your best judgement.
• You're trying to do some kind of map-reduce-like thing where you can parallelize some data processing (there are better modules for this, such as mrjob or even Python's multiprocessing module)
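
For the "insane number of threads" case, a fixed-size pool keeps the thread count bounded no matter how many tasks you throw at it. Here's a sketch using `concurrent.futures.ThreadPoolExecutor` from the standard library (Python 3); `fake_crawl` and the example URLs are placeholders I made up, not real crawling code:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_crawl(url):
    # placeholder for real work such as downloading the page;
    # here it only returns the length of the URL string
    return len(url)

# 100 made-up URLs, but never more than 8 threads alive at once
urls = ['http://example.com/page/%d' % i for i in range(100)]

with ThreadPoolExecutor(max_workers=8) as pool:
    # map() hands the URLs to the pool and yields results in input order
    lengths = list(pool.map(fake_crawl, urls))

print('crawled %d urls' % len(lengths))
```

The pool size of 8 is arbitrary here; the point is that it is a number you choose, not one that grows with the size of your input.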