Initial setup

Learn how to get scrapy-zyte-api installed and configured on an existing Scrapy project.

Tip

Zyte’s web scraping tutorial covers scrapy-zyte-api setup as well.

Requirements

You need at least:

A Zyte API subscription (there’s a free trial).
Python 3.8+
Scrapy 2.0.1+

scrapy-poet integration requires higher versions:

Scrapy 2.6+

Installation

For a basic installation:

pip install scrapy-zyte-api

For scrapy-poet integration:

pip install scrapy-zyte-api[provider]

Configuration

To configure scrapy-zyte-api, set your API key and either enable the add-on (Scrapy ≥ 2.10) or configure all components separately.

Warning

Changing reactors may require code changes.

Setting your API key

Copy your Zyte API key and add it to your project settings.py:

ZYTE_API_KEY = "YOUR_API_KEY"

Alternatively, you can set your API key in the ZYTE_API_KEY environment variable.
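If you rely on the environment variable, a small pre-flight check can catch a missing key before a crawl starts. The sketch below is illustrative only: resolve_zyte_api_key is a hypothetical helper, not part of scrapy-zyte-api, and the lookup order it uses (setting first, then environment variable) is an assumption made for this example.

```python
import os
from typing import Mapping, Optional


def resolve_zyte_api_key(
    settings: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key from the settings or the environment, or raise.

    Hypothetical helper: the precedence used here is an assumption for
    illustration, not documented scrapy-zyte-api behavior.
    """
    # Fall back to the real process environment when none is passed in.
    environ = os.environ if environ is None else environ
    key = settings.get("ZYTE_API_KEY") or environ.get("ZYTE_API_KEY")
    if not key:
        raise RuntimeError(
            "ZYTE_API_KEY is not set in settings.py or in the environment"
        )
    return str(key)
```

You could call it with a plain dictionary of your settings early in a launch script, so that a missing key fails fast with a clear message.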

Enabling the add-on

If you are using Scrapy 2.10 or higher, you can set up scrapy-zyte-api integration using the following add-on with any priority:

settings.py

ADDONS = {
    "scrapy_zyte_api.Addon": 500,
}

Note

The add-on enables transparent mode by default.
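Because the add-on needs Scrapy 2.10 or higher, it can help to check the installed version before choosing between the add-on and the separate components. The following is a minimal sketch of such a check; release_tuple and supports_addons are hypothetical names introduced here, and the parsing only looks at the leading numeric part of the version string.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. '2.11.0rc1' -> (2, 11, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def supports_addons(scrapy_version: str) -> bool:
    """Return True if the given Scrapy version is 2.10 or higher."""
    return release_tuple(scrapy_version) >= (2, 10)
```

In a real project you would pass scrapy.__version__ to supports_addons.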

Enabling all components separately

If enabling the add-on is not an option, you can set up scrapy-zyte-api integration as follows:

settings.py

DOWNLOAD_HANDLERS = {
    "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
}
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 1000,
}
SPIDER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware": 100,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

By default, scrapy-zyte-api doesn’t change the spider behavior. To switch your spider to use Zyte API for all requests, set the following setting as well:

settings.py

ZYTE_API_TRANSPARENT_MODE = True

For scrapy-poet integration, add the following provider to the SCRAPY_POET_PROVIDERS setting:

settings.py

SCRAPY_POET_PROVIDERS = {
    "scrapy_zyte_api.providers.ZyteApiProvider": 1100,
}

If you already had a custom value for REQUEST_FINGERPRINTER_CLASS, set ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS to that value instead:

settings.py

ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS = "myproject.CustomRequestFingerprinter"
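For context, a Scrapy request fingerprinter is a class with a fingerprint() method that takes a request and returns bytes. The sketch below shows what a class at the path used above might look like. It is a hypothetical example: hashing only the method and URL is an assumption chosen for brevity, not what your project or Scrapy's default fingerprinter does.

```python
import hashlib


class CustomRequestFingerprinter:
    """Hypothetical fingerprinter that identifies requests by method and URL."""

    def fingerprint(self, request) -> bytes:
        # Requests with the same method and URL get the same fingerprint;
        # headers and body are deliberately ignored in this sketch.
        data = f"{request.method} {request.url}".encode("utf-8")
        return hashlib.sha1(data).digest()
```

Any object with method and url attributes works as input here, which makes the class easy to exercise outside of a running crawl.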

Changing reactors may require code changes

If your TWISTED_REACTOR setting was not set to "twisted.internet.asyncioreactor.AsyncioSelectorReactor" before, you will be changing the Twisted reactor that your Scrapy project uses, and your existing code may need changes, such as:

Handling a pre-installed reactor.

Some Twisted imports install the default, non-asyncio Twisted reactor as a side effect. Once a reactor is installed, it cannot be changed for the whole run time.

Awaiting on Deferreds.

Note that you might be using Deferreds without realizing it through some Scrapy functions and methods. For example, when you yield the return value of self.crawler.engine.download() from a spider callback, you are yielding a Deferred.
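To track down a pre-installed reactor, one option is to check whether a reactor has already been registered at a given point in your code. The sketch below assumes that Twisted registers an installed reactor under the name twisted.internet.reactor in sys.modules; reactor_already_installed is a hypothetical helper, not a Scrapy or Twisted function.

```python
import sys
from typing import Mapping, Optional

REACTOR_MODULE = "twisted.internet.reactor"


def reactor_already_installed(
    modules: Optional[Mapping[str, object]] = None,
) -> bool:
    """Return True if a Twisted reactor appears to be installed already.

    Assumption: an installed reactor is registered in sys.modules under
    REACTOR_MODULE. Pass a custom mapping to test without Twisted.
    """
    modules = sys.modules if modules is None else modules
    return REACTOR_MODULE in modules
```

Calling this right after a suspicious import, and again before Scrapy starts, can help narrow down which import installs the default reactor as a side effect.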