Initial setup
Learn how to install and configure scrapy-zyte-api in an existing Scrapy project.
Tip
Zyte’s web scraping tutorial covers scrapy-zyte-api setup as well.
Requirements
You need at least:
A Zyte API subscription (there’s a free trial).
Python 3.10+
Scrapy 2.0.1+
scrapy-poet integration requires Scrapy 2.6+.
Installation
For a basic installation:
pip install scrapy-zyte-api
For scrapy-poet integration, install the provider extra:
pip install scrapy-zyte-api[provider]
For x402 support, install the x402 extra:
pip install scrapy-zyte-api[x402]
Note that you can install multiple extras:
pip install scrapy-zyte-api[provider,x402]
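After installing, it can help to confirm that the environment meets the requirements listed earlier. The following is a minimal sketch that uses only the standard library. The helper names `python_is_supported` and `installed_versions` are illustrative and not part of scrapy-zyte-api, and the sketch only reports installed versions; it does not compare them against the minimums.

```python
import sys
from importlib import metadata


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


def installed_versions(packages=("Scrapy", "scrapy-zyte-api")):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```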
Configuration
To configure scrapy-zyte-api, set up authentication and either enable the add-on (Scrapy ≥ 2.10) or configure all components separately.
Authentication
Sign up for a Zyte API account, copy your API key and do either of the following:
Define an environment variable named ZYTE_API_KEY with your API key:
On Windows’ CMD:
> set ZYTE_API_KEY=YOUR_API_KEY
On macOS and Linux:
$ export ZYTE_API_KEY=YOUR_API_KEY
Add your API key to your settings module:
settings.py
ZYTE_API_KEY = "YOUR_API_KEY"
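If you rely on the environment variable, a launcher script can check for it up front and stop with a clear message when it is missing. This is a minimal sketch; `require_env` is a hypothetical helper, not part of scrapy-zyte-api.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; define it before starting the crawl")
    return value


if __name__ == "__main__":
    # Only confirm that the key is present; avoid printing the key itself.
    require_env("ZYTE_API_KEY")
    print("ZYTE_API_KEY is defined")
```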
Enabling the add-on
If you are using Scrapy 2.10 or higher, you can set up scrapy-zyte-api integration using the following add-on with any priority:
ADDONS = {
    "scrapy_zyte_api.Addon": 500,
}
Note
The add-on enables transparent mode by default.
Enabling all components separately
If enabling the add-on is not an option, you can set up scrapy-zyte-api integration as follows:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
}
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
}
SPIDER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware": 100,
    "scrapy_zyte_api.ScrapyZyteAPIRefererSpiderMiddleware": 1000,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
By default, scrapy-zyte-api doesn’t change the spider behavior. To switch your spider to use Zyte API for all requests, set the following setting as well:
ZYTE_API_TRANSPARENT_MODE = True
For scrapy-poet integration, configure scrapy-poet first, and then add the following provider to the SCRAPY_POET_PROVIDERS setting:
SCRAPY_POET_PROVIDERS = {
    "scrapy_zyte_api.providers.ZyteApiProvider": 1100,
}
If you already had a custom value for REQUEST_FINGERPRINTER_CLASS, move that value to ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS instead:
ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS = "myproject.CustomRequestFingerprinter"
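For reference, a request fingerprinter in Scrapy is a class that exposes a `fingerprint(request)` method returning bytes. The following is only a sketch of what a hypothetical `myproject.CustomRequestFingerprinter` might look like. Hashing just the method and URL is an assumption made for illustration; it is not what Scrapy's default fingerprinter or scrapy-zyte-api does.

```python
import hashlib


class CustomRequestFingerprinter:
    """Hypothetical fallback fingerprinter: identifies a request by method and URL only."""

    def fingerprint(self, request) -> bytes:
        # Requests with the same method and URL get the same fingerprint,
        # regardless of headers or body.
        key = f"{request.method.upper()} {request.url}".encode("utf-8")
        return hashlib.sha1(key).digest()
```

Any object with `method` and `url` attributes works with this sketch, which keeps it easy to exercise outside a crawl.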
For session management support, add ScrapyZyteAPISessionDownloaderMiddleware to the DOWNLOADER_MIDDLEWARES setting, alongside the main downloader middleware:
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
    "scrapy_zyte_api.ScrapyZyteAPISessionDownloaderMiddleware": 667,
}
Changing reactors may require code changes
If your TWISTED_REACTOR setting was not set to "twisted.internet.asyncioreactor.AsyncioSelectorReactor" before, you will be changing the Twisted reactor that your Scrapy project uses, and your existing code may need changes, such as:
Handling a pre-installed reactor.
Some Twisted imports install the default, non-asyncio Twisted reactor as a side effect. Once a reactor is installed, it cannot be changed for the rest of the run.
Integrating Deferred code and asyncio code.
Note that you might be using Deferreds without realizing it through some Scrapy functions and methods. For example, when you yield the return value of self.crawler.engine.download() from a spider callback, you are yielding a Deferred.
x402
It is possible to use Zyte API without a Zyte API account by using the x402 protocol to handle payments:
Read the Zyte Terms of Service. By using Zyte API, you are accepting them.
During installation, make sure to install the x402 extra.
Configure the private key of your Ethereum account to authorize payments.
Configuring your Ethereum private key
It is recommended to configure your Ethereum private key through an environment variable, so that it also works when you use python-zyte-api:
On Windows’ CMD:
> set ZYTE_API_ETH_KEY=YOUR_ETH_PRIVATE_KEY
On macOS and Linux:
$ export ZYTE_API_ETH_KEY=YOUR_ETH_PRIVATE_KEY
Alternatively, you can add your Ethereum private key to the settings module:
ZYTE_API_ETH_KEY = "YOUR_ETH_PRIVATE_KEY"