Initial setup

Learn how to install and configure scrapy-zyte-api in an existing Scrapy project.

Tip

Zyte’s web scraping tutorial covers scrapy-zyte-api setup as well.

Requirements

You need at least:

scrapy-poet integration requires higher versions:

  • Scrapy 2.6+

Installation

For a basic installation:

pip install scrapy-zyte-api

For scrapy-poet integration:

pip install scrapy-zyte-api[provider]

Configuration

To configure scrapy-zyte-api, set your API key and either enable the add-on (Scrapy ≥ 2.10) or configure all components separately.

Setting your API key

Copy your Zyte API key and add it to the settings.py of your project:

ZYTE_API_KEY = "YOUR_API_KEY"

Alternatively, you can set your API key in the ZYTE_API_KEY environment variable.

Enabling the add-on

If you are using Scrapy 2.10 or higher, you can set up scrapy-zyte-api integration using the following add-on with any priority:

settings.py
ADDONS = {
    "scrapy_zyte_api.Addon": 500,
}

Note

The add-on enables transparent mode by default.
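
Which of the two configuration paths applies depends on the Scrapy version. A minimal sketch of that check follows; the helper name supports_addons is hypothetical, and it assumes that comparing only the major and minor version numbers is enough (pre-release suffixes are ignored):

```python
import re


def supports_addons(scrapy_version: str) -> bool:
    """Return True if the given Scrapy version string is 2.10 or higher.

    Hypothetical helper, not part of Scrapy or scrapy-zyte-api.
    """
    match = re.match(r"(\d+)\.(\d+)", scrapy_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {scrapy_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Compare as integer tuples so that 2.10 sorts after 2.9.
    return (major, minor) >= (2, 10)
```

In a project with Scrapy installed, the function could be called as supports_addons(scrapy.__version__). The tuple comparison avoids the trap of comparing "2.10" and "2.9" as strings or floats.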

Enabling all components separately

If enabling the add-on is not an option, you can set up scrapy-zyte-api integration as follows:

settings.py
DOWNLOAD_HANDLERS = {
    "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
}
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 1000,
}
SPIDER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware": 100,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

By default, scrapy-zyte-api does not change the behavior of your spiders. To make your spiders use Zyte API for all requests, also set the following setting:

settings.py
ZYTE_API_TRANSPARENT_MODE = True
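
With transparent mode off, a request can still opt into Zyte API through its meta. The sketch below builds such meta dictionaries using the zyte_api_automap request meta key of scrapy-zyte-api; the helper name zyte_api_meta is hypothetical, and browserHtml appears only as an example Zyte API parameter:

```python
from typing import Any, Dict, Optional


def zyte_api_meta(params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build request meta that opts a single request into Zyte API.

    Hypothetical helper. With no params, the request relies on
    automatically mapped parameters only; with params, those are
    passed on to Zyte API as well.
    """
    return {"zyte_api_automap": dict(params) if params else True}


# In a spider callback (requires Scrapy, so shown here as a comment):
#   yield scrapy.Request(url, meta=zyte_api_meta({"browserHtml": True}))
```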

For scrapy-poet integration, add the following provider to the SCRAPY_POET_PROVIDERS setting:

settings.py
SCRAPY_POET_PROVIDERS = {
    "scrapy_zyte_api.providers.ZyteApiProvider": 1100,
}

If you already had a custom value for REQUEST_FINGERPRINTER_CLASS, set that value on ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS instead:

settings.py
ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS = "myproject.CustomRequestFingerprinter"
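
For illustration, a fallback fingerprinter such as the hypothetical myproject.CustomRequestFingerprinter above could look like the sketch below. It relies on the Scrapy convention that a request fingerprinter exposes a fingerprint(request) method returning bytes. Hashing only the URL is an assumption made to keep the example short, not a recommendation:

```python
import hashlib
from types import SimpleNamespace


class CustomRequestFingerprinter:
    """Hypothetical fingerprinter that identifies requests by URL only."""

    def fingerprint(self, request) -> bytes:
        # Requests with the same URL get the same fingerprint, whatever
        # their method, body, or headers.
        return hashlib.sha1(request.url.encode("utf-8")).digest()


# Stand-in for scrapy.Request, so the sketch runs without Scrapy installed.
demo_request = SimpleNamespace(url="https://example.com/page")
demo_fingerprint = CustomRequestFingerprinter().fingerprint(demo_request)
```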

Changing reactors may require code changes

If your TWISTED_REACTOR setting was not set to "twisted.internet.asyncioreactor.AsyncioSelectorReactor" before, you will be changing the Twisted reactor that your Scrapy project uses, and your existing code may need changes, such as:

  • Handling a pre-installed reactor.

    Some Twisted imports install the default, non-asyncio Twisted reactor as a side effect. Once a reactor is installed, it cannot be changed for the rest of the run.

  • Awaiting on Deferreds.

    Note that you might be using Deferreds without realizing it through some Scrapy functions and methods. For example, when you yield the return value of self.crawler.engine.download() from a spider callback, you are yielding a Deferred.
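
One way to diagnose the first item is to check whether a reactor was already installed before Scrapy had a chance to install the asyncio one. The sketch below assumes that the presence of the twisted.internet.reactor module among the imported modules means a reactor is installed, which is how Twisted registers it. The helper name reactor_already_installed is hypothetical:

```python
import sys
from typing import Mapping, Optional


def reactor_already_installed(
    modules: Optional[Mapping[str, object]] = None,
) -> bool:
    """Return True if a Twisted reactor appears to be installed already.

    Hypothetical helper. `modules` defaults to sys.modules and is only a
    parameter so that the check can be exercised without Twisted.
    """
    if modules is None:
        modules = sys.modules
    # Twisted registers the installed reactor under this module name.
    return "twisted.internet.reactor" in modules
```

Calling it right after a suspicious import can help narrow down which import installs the default reactor. For the second item, Scrapy documents scrapy.utils.defer.maybe_deferred_to_future as a way to await a Deferred from an async def callback.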