Initial setup

Learn how to get scrapy-zyte-api installed and configured on an existing Scrapy project.

Tip

Zyte’s web scraping tutorial covers scrapy-zyte-api setup as well.

Requirements

You need at least:

scrapy-poet integration requires Scrapy 2.6+.

Installation

For a basic installation:

pip install scrapy-zyte-api

For scrapy-poet integration, install the provider extra:

pip install "scrapy-zyte-api[provider]"

For x402 support, install the x402 extra:

pip install "scrapy-zyte-api[x402]"

Note that you can install multiple extras:

pip install "scrapy-zyte-api[provider,x402]"

Configuration

To configure scrapy-zyte-api, set up authentication and either enable the add-on (Scrapy ≥ 2.10) or configure all components separately.

Authentication

Sign up for a Zyte API account, copy your API key and do either of the following:

  • Define an environment variable named ZYTE_API_KEY with your API key:

    • On Windows (CMD):

      > set ZYTE_API_KEY=YOUR_API_KEY
      
    • On macOS and Linux:

      $ export ZYTE_API_KEY=YOUR_API_KEY
      
  • Add your API key to your settings module:

    settings.py
    ZYTE_API_KEY = "YOUR_API_KEY"
    

To use x402 instead, see x402.
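
As a quick sanity check before a crawl, the sketch below reports whether an API key is available from either of the two places described above. The helper name has_zyte_api_key is hypothetical and not part of scrapy-zyte-api; it takes the environment and the settings as plain mappings so that it can run without Scrapy.

```python
import os


def has_zyte_api_key(environ=None, settings=None):
    """Return True if ZYTE_API_KEY is set in the environment or in settings.

    Hypothetical helper for illustration only; scrapy-zyte-api performs its
    own lookup and does not need this function.
    """
    environ = os.environ if environ is None else environ
    settings = {} if settings is None else settings
    # Treat a missing key and an empty string the same way: not configured.
    return bool(environ.get("ZYTE_API_KEY") or settings.get("ZYTE_API_KEY"))


if __name__ == "__main__":
    print(has_zyte_api_key())
```

The function only confirms that a value is present. It does not check that the key is valid for your Zyte API account.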

Enabling the add-on

If you are using Scrapy 2.10 or higher, you can set up scrapy-zyte-api integration using the following add-on with any priority:

settings.py
ADDONS = {
    "scrapy_zyte_api.Addon": 500,
}

Note

The add-on enables transparent mode by default.
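
To decide between the add-on and the separate configuration, you can compare your installed Scrapy version against 2.10. The following is a minimal sketch; supports_addons is a hypothetical helper, and it assumes a version string that starts with plain numeric major and minor components, such as "2.11.2".

```python
def supports_addons(scrapy_version):
    """Return True if the given Scrapy version string is 2.10 or higher."""
    # Compare only the leading major and minor components as integers,
    # so that "2.10" sorts after "2.9" (a plain string comparison would not).
    parts = scrapy_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= (2, 10)


if __name__ == "__main__":
    for version in ("2.6.3", "2.10.0", "2.11.2"):
        print(version, supports_addons(version))
```

In a real project you would pass scrapy.__version__ to the function.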

Enabling all components separately

If enabling the add-on is not an option, you can set up scrapy-zyte-api integration as follows:

settings.py
DOWNLOAD_HANDLERS = {
    "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
}
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
}
SPIDER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware": 100,
    "scrapy_zyte_api.ScrapyZyteAPIRefererSpiderMiddleware": 1000,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

By default, scrapy-zyte-api does not change how your spider behaves. To make your spider use Zyte API for all requests, also enable transparent mode:

settings.py
ZYTE_API_TRANSPARENT_MODE = True

For scrapy-poet integration, configure scrapy-poet first, and then add the following provider to the SCRAPY_POET_PROVIDERS setting:

settings.py
SCRAPY_POET_PROVIDERS = {
    "scrapy_zyte_api.providers.ZyteApiProvider": 1100,
}

If you already had a custom value for REQUEST_FINGERPRINTER_CLASS, set that value on ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS instead.

settings.py
ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS = "myproject.CustomRequestFingerprinter"
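
For context, a request fingerprinter in Scrapy is a class with a fingerprint() method that takes a request and returns bytes. The sketch below shows what a class like the myproject.CustomRequestFingerprinter placeholder above might look like. The URL-only hashing is an assumption made for illustration, not a recommended fingerprinting strategy.

```python
import hashlib


class CustomRequestFingerprinter:
    """Illustrative fingerprinter that identifies a request by its URL only."""

    def fingerprint(self, request):
        # Requests with the same URL get the same fingerprint, regardless
        # of method, headers, or body (a deliberate simplification).
        return hashlib.sha1(request.url.encode("utf-8")).digest()
```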

For session management support, add the following downloader middleware to the DOWNLOADER_MIDDLEWARES setting:

settings.py
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPISessionDownloaderMiddleware": 667,
}
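
Because a second DOWNLOADER_MIDDLEWARES assignment in the same settings module replaces the first, both middlewares need to end up in a single dictionary. A sketch of the combined setting, using the priorities shown in this section:

```python
# settings.py (sketch): both scrapy-zyte-api downloader middlewares in one
# dictionary, so that neither assignment overwrites the other.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
    "scrapy_zyte_api.ScrapyZyteAPISessionDownloaderMiddleware": 667,
}
```

If your project already defines other downloader middlewares, keep them in the same dictionary as well.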

Changing reactors may require code changes

If your TWISTED_REACTOR setting was not set to "twisted.internet.asyncioreactor.AsyncioSelectorReactor" before, you will be changing the Twisted reactor that your Scrapy project uses, and your existing code may need changes, such as:

  • Handling a pre-installed reactor.

    Some Twisted imports install the default, non-asyncio Twisted reactor as a side effect. Once a reactor is installed, it cannot be replaced for the rest of the run time.

  • Integrating Deferred code and asyncio code.

    Note that you might be using Deferreds without realizing it through some Scrapy functions and methods. For example, when you yield the return value of self.crawler.engine.download() from a spider callback, you are yielding a Deferred.
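
One way to spot a pre-installed reactor early is to check whether a reactor has been registered before Scrapy gets the chance to install the asyncio one. The sketch below relies on Twisted registering the installed reactor as twisted.internet.reactor in sys.modules. The helper name reactor_already_installed is hypothetical.

```python
import sys


def reactor_already_installed(modules=None):
    """Return True if a Twisted reactor has already been installed.

    Twisted stores the installed reactor in sys.modules under the name
    "twisted.internet.reactor", so the presence of that entry means that
    some import or call has already picked a reactor.
    """
    modules = sys.modules if modules is None else modules
    return "twisted.internet.reactor" in modules


if __name__ == "__main__":
    print(reactor_already_installed())
```

Calling this near the top of a script, after your own imports, can help narrow down which import installs the default reactor.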

x402

It is possible to use Zyte API without a Zyte API account by using the x402 protocol to handle payments:

  1. Read the Zyte Terms of Service. By using Zyte API, you are accepting them.

  2. During installation, make sure to install the x402 extra.

  3. Configure the private key of your Ethereum account to authorize payments.

Configuring your Ethereum private key

It is recommended to configure your Ethereum private key through an environment variable, so that it also works when you use python-zyte-api:

  • On Windows (CMD):

    > set ZYTE_API_ETH_KEY=YOUR_ETH_PRIVATE_KEY
    
  • On macOS and Linux:

    $ export ZYTE_API_ETH_KEY=YOUR_ETH_PRIVATE_KEY
    

Alternatively, you can add your Ethereum private key to the settings module:

settings.py
ZYTE_API_ETH_KEY = "YOUR_ETH_PRIVATE_KEY"