How to set proxy in Python Requests

Introduction

As a seasoned developer with a keen interest in web scraping and data extraction, I've often leveraged Python for its simplicity and power. In this realm, understanding and utilizing proxies becomes a necessity, especially to navigate the complexities of web requests, IP bans, and rate limiting. In this article, I'll share my insights and experiences on using proxies with Python's Requests library. We'll start from the basics and gradually move to more advanced techniques like retries and rotating proxies. My journey through these concepts has been filled with trial and error, and I aim to provide you with a clear path to mastering proxies in Python, peppered with practical, real-world code examples. If you are relatively new to web scraping in Python, and to proxies in particular, I recommend my blog post about choosing a proxy for web scraping. And don't forget that you will probably need to check that the proxy is working properly; this is where my simple bash proxy checker might save you a lot of keystrokes over time.

Prerequisites & Installation

Before we dive into the nuances of proxy usage in Python, let's set the stage with the necessary prerequisites. First and foremost, you need a basic understanding of Python (if it needs mentioning, I will be using Python version 3 in the code examples below).

If you're already comfortable with Python's syntax and basic libraries, you're good to go. Additionally, familiarity with HTTP requests and responses will be beneficial, as proxies predominantly deal with these elements.

Installing the Requests library is our starting point. This library simplifies HTTP requests in Python, providing an intuitive and user-friendly way to send and receive data. You can install it using pip, Python's package manager: just run pip install requests. Once you have Requests installed, the next step is to ensure you have access to a proxy or a list of proxies.

Proxies can be free or paid, and the choice depends on your specific needs and the level of reliability you require. In my experience, paid proxies tend to be much more reliable for real-world scraping tasks, while free proxies are usually very time consuming to work with and perform poorly - I don't really recommend them unless you only need a handful of requests for basic tests.

How to use a proxy with Python Requests

Basic example

Using a proxy with Python Requests is straightforward. In its simplest form, you define your proxy and pass it as a parameter to the requests.get() or requests.post() method. Here's a basic example:

import requests

# Replace with your proxy URL
proxy = "http://your_proxy_here"

# Using the proxy with a GET request
response = requests.get("http://example.com", proxies={"http": proxy, "https": proxy})
print(response.text)

In this code, replace "http://your_proxy_here" with your actual proxy URL. This example demonstrates a GET request, but the same logic applies to other types of HTTP requests.
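
The same proxies dictionary works for other request methods as well. Here's a minimal sketch of a POST request through the same proxy; the target URL and payload are just placeholders:

import requests

# Replace with your proxy URL
proxy = "http://your_proxy_here"
proxies = {"http": proxy, "https": proxy}

# The same proxies dict is passed to POST (and PUT, DELETE, etc.)
payload = {"query": "example"}
response = requests.post("http://example.com/search", data=payload, proxies=proxies)
print(response.status_code)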

With authentication

When using authenticated proxies, you need to provide a username and password alongside the proxy URL. This can be a bit tricky, as the credentials need to be included in the proxy URL itself. Here's a basic example of how to use authenticated proxies in Python:

import requests

# Your proxy credentials
proxy_user = "user"
proxy_pass = "password"

# Your proxy host and port (scheme is added below)
proxy_host = "your_proxy_here"

# Forming the authenticated proxy URL
proxies = {
    "http": f"http://{proxy_user}:{proxy_pass}@{proxy_host}",
    "https": f"http://{proxy_user}:{proxy_pass}@{proxy_host}"
}

# Using the authenticated proxy with a GET request
response = requests.get("http://example.com", proxies=proxies)
print(response.text)
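
One caveat worth mentioning: if your username or password contains special characters such as @ or :, the proxy URL above will break unless you URL-encode the credentials first. Here's a quick sketch using the standard library's urllib.parse.quote; the credentials and host are just placeholders:

import requests
from urllib.parse import quote

# Percent-encode credentials so special characters don't break the URL
proxy_user = quote("user", safe="")
proxy_pass = quote("p@ss:word", safe="")
proxy_host = "your_proxy_here"

proxies = {
    "http": f"http://{proxy_user}:{proxy_pass}@{proxy_host}",
    "https": f"http://{proxy_user}:{proxy_pass}@{proxy_host}"
}

response = requests.get("http://example.com", proxies=proxies)
print(response.status_code)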

With retries

In a real-world scenario, proxies might fail, and it's crucial to handle these failures gracefully. Retries are an effective way to ensure your request eventually goes through. Here's how I implement retries in my projects:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Replace with your proxy URL
proxy = "http://your_proxy_here"

# Session with retry strategy
session = requests.Session()
retries = Retry(total=5, backoff_factor=0.1, status_forcelist=[500, 502, 503, 504])
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))

# Making a request using the session
response = session.get("http://example.com", proxies={"http": proxy, "https": proxy})
print(response.text)

This approach uses a session object with a mounted HTTPAdapter. The Retry object defines the retry strategy, indicating the number of retries and the status codes that trigger a retry.
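
A related convenience: instead of passing proxies to every call, you can set them once on the session and they will apply to all requests made through it. A minimal sketch, assuming the same placeholder proxy URL as before:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=5, backoff_factor=0.1, status_forcelist=[500, 502, 503, 504])
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))

# Set the proxies once; every request made through this session will use them
session.proxies.update({"http": "http://your_proxy_here", "https": "http://your_proxy_here"})

response = session.get("http://example.com")
print(response.text)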

Rotating proxies from the list

When dealing with a large number of requests, using a single proxy might not be sufficient due to rate limiting or IP bans. Rotating proxies can solve this problem.

I should note that some proxy providers now offer rotating IPs behind a single proxy address. In that case, you don't need to cycle through a list of proxies - you just need to retry your request (as in the code example above). The code below handles a different case: you have multiple proxies with separate addresses and need to rotate through them yourself.

Here's how I rotate proxies in Python:

import requests
from itertools import cycle

# List of proxies
proxies = ["http://proxy1.example.com", "http://proxy2.example.com", "http://proxy3.example.com"]
proxy_pool = cycle(proxies)

# Function to make a request with rotating proxies
def make_request(url):
    for _ in range(len(proxies)):
        proxy = next(proxy_pool)
        try:
            # Timeout keeps a dead proxy from hanging the whole run
            response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
            return response
        except requests.exceptions.RequestException:
            # Log the failure here if desired, then move on to the next proxy
            continue
    return None

# Usage
response = make_request("http://example.com")
print(response.text if response else "Request failed")

In this snippet, cycle from itertools is used to rotate through the proxy list. Each request attempts to use a different proxy, providing a simple yet effective way to manage multiple proxies.
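
If you want to combine rotation with the retry strategy from earlier, one option is to route each attempt through a retry-enabled session while still cycling proxies per request. Here's a rough sketch along those lines, using the same placeholder proxy addresses; the helper name is just my own:

import requests
from itertools import cycle
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

proxies_list = ["http://proxy1.example.com", "http://proxy2.example.com", "http://proxy3.example.com"]
proxy_pool = cycle(proxies_list)

session = requests.Session()
retries = Retry(total=2, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504])
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))

def fetch_with_rotation(url):
    for _ in range(len(proxies_list)):
        proxy = next(proxy_pool)
        try:
            # Each proxy gets its own low-level retries before we move on
            return session.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
        except requests.exceptions.RequestException:
            continue
    return None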

Conclusion

In summary, understanding and efficiently utilizing proxies in Python can significantly enhance your web scraping capabilities. By integrating basic proxy usage, implementing retries, and rotating through multiple proxies, you can overcome common challenges like IP bans and rate limiting. This knowledge is not just limited to Python; it's applicable to other languages and frameworks, as I've explored in my articles on Puppeteer and Node.js web scraping. Remember, the key to successful web scraping lies in being respectful to the target websites, adhering to their terms of service, and using proxies judiciously to avoid any unethical practices.

As always, I'm eager to share more insights and practical tips in future articles. If you found this article helpful, you might also enjoy my YouTube video on discovering hidden website APIs via Chrome DevTools. Happy coding and scraping!