The computer networks that make up the internet are complex and handle an immense amount of traffic. So sometimes a request will fail intermittently, and you want to retry until it succeeds. This is easy to do in requests with urllib3's Retry.

I was trying to download data from Common Crawl's S3 exports, but occasionally the process would fail due to a network or server error. My process would keep the successful downloads using an AtomicFileWriter, but I'd have to restart the process by hand. Here's what the new code looks like:

from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # requests vendors urllib3; import Retry directly

CC_DATA_URL = 'https://commoncrawl.s3.amazonaws.com/'

# Retry up to 5 times, sleeping with exponential backoff between attempts
RETRY_STRATEGY = Retry(
    total=5,
    backoff_factor=1
)
...
We use a requests.Session to store the retry logic, mounting it via an HTTPAdapter. An added advantage of Session is that it reuses TCP connections, which in my case made downloading twice as fast.
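The elided mounting step might look something like this sketch; the status_forcelist values are my assumption (retrying on common transient HTTP errors), not part of the original snippet:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

CC_DATA_URL = 'https://commoncrawl.s3.amazonaws.com/'

# Retry up to 5 times with exponential backoff. status_forcelist is an
# assumption: it also retries on transient server-side errors, not just
# connection failures.
retry_strategy = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry_strategy)
# Mount the adapter for both schemes so every request made through this
# session picks up the retry behaviour.
session.mount('https://', adapter)
session.mount('http://', adapter)

# Downloads then go through the session, e.g. session.get(CC_DATA_URL + path)
```

With this in place, transient failures are retried transparently inside `session.get`, so the surrounding download loop no longer needs its own error handling for flaky connections.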