
Proxy settings in Scrapy

Scrapy is an open-source Python framework that collects information from websites, processes it, and exports it in structured formats such as CSV or JSON. This data is then used for marketing, research, journalism, and other purposes.

Web scraping as such is not prohibited by law, but the owners of many sites do not welcome it on their resources. Security systems can track the IP address that scraping requests come from and block it. This is why proxy servers are set up for Scrapy.

Proxies will give you the opportunity to:

  • Hide your IP address and use multiple addresses at once for multi-threaded tasks. This will help avoid blocking on many sites.
  • Get more specific and relevant information from web resources.
  • Automate the process of scraping and analyzing information.

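The most reliable proxies for scraping are HTTP(S) and SOCKS5. They are suitable for scraping a large amount of information and protecting your data in the process. Note that Scrapy's built-in "HttpProxyMiddleware" works with HTTP(S) proxy URLs; a SOCKS5 proxy typically needs an additional component in front of it.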

How to set up a proxy in Scrapy

There are two ways to configure a proxy in Scrapy.

Method 1: Using your own middleware

This method is considered safer and more reliable, because the proxy is set in one place for every request. You need to create your own downloader middleware.

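  1. Open your Scrapy project and create a downloader middleware class (for example, in middlewares.py).
  2. In its process_request method, set your proxy data in the format: request.meta["proxy"] = "type://Username:Password@IP-address:Port".
  3. Enable this middleware in the DOWNLOADER_MIDDLEWARES setting and give it a lower order number than "HttpProxyMiddleware", so that it runs before it.
  4. Save the files. Done!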
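
The steps of Method 1 can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: the project name "myproject", the class name CustomProxyMiddleware, and the proxy address and credentials are all placeholders assumed for the example.

```python
# middlewares.py (hypothetical module in a project assumed to be named "myproject")

# Placeholder proxy URL: replace the scheme, credentials, host, and port with your own.
PROXY_URL = "http://username:password@203.0.113.10:8080"


class CustomProxyMiddleware:
    """Downloader middleware that routes every request through one proxy."""

    def process_request(self, request, spider):
        # The built-in HttpProxyMiddleware reads this key later in the chain
        # and turns any credentials in the URL into a Proxy-Authorization header.
        request.meta["proxy"] = PROXY_URL
        # Returning None tells Scrapy to continue processing the request normally.
        return None


# settings.py (separate file): lower numbers run earlier for outgoing requests,
# so 740 places this middleware just before HttpProxyMiddleware (750 by default).
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.CustomProxyMiddleware": 740,
# }
```

Because the middleware touches only request.meta, it does not need to import Scrapy itself, and every spider in the project picks up the proxy without any changes to spider code.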

Method 2: Using request parameters

In this case, you pass the proxy server as a parameter of an individual request.

  1. Open your Scrapy spider.
  2. Make sure the built-in "HttpProxyMiddleware" is enabled (it is by default); it reads the proxy from each request.
  3. In the request's "meta" parameter, enter your proxy data in the format: "proxy": "type://Username:Password@IP-address:Port".
  4. Save the file and get to work.

How to check if a proxy is working in Scrapy

You can check whether you have configured the proxy correctly using a special test site. To do this:

  • Find any site that can determine your IP address (type in "My IP" or "Check IP" and select any website).
  • Scrape it with Scrapy.

If the result shows the address of your proxy server, the setup was successful.