Initial setup
Learn how to get scrapy-zyte-api installed and configured on an existing Scrapy project.
Tip
Zyte’s web scraping tutorial covers scrapy-zyte-api setup as well.
Requirements
You need at least:
A Zyte API subscription (there’s a free trial).
Python 3.9+
Scrapy 2.0.1+
scrapy-poet integration requires higher versions:
Scrapy 2.6+
Installation
For a basic installation:
pip install scrapy-zyte-api
For scrapy-poet integration:
pip install scrapy-zyte-api[provider]
Configuration
To configure scrapy-zyte-api, set your API key and either enable the add-on (Scrapy ≥ 2.10) or configure all components separately.
Setting your API key
Add your Zyte API key to your project settings.py:
ZYTE_API_KEY = "YOUR_API_KEY"
Alternatively, you can set your API key in the ZYTE_API_KEY environment variable.
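As a rough illustration of the two options above, the following sketch resolves the key the way a project might before starting a crawl. The helper name resolve_zyte_api_key and the precedence order (settings first, then the environment) are assumptions made for this example, not documented scrapy-zyte-api behavior.

```python
import os


def resolve_zyte_api_key(settings, environ=None):
    """Return the API key from settings, falling back to the environment.

    Hypothetical helper for illustration; not part of scrapy-zyte-api.
    """
    environ = os.environ if environ is None else environ
    # Assumed precedence: an explicit setting wins over the environment.
    key = settings.get("ZYTE_API_KEY") or environ.get("ZYTE_API_KEY")
    if not key:
        raise ValueError("ZYTE_API_KEY is not set in settings or environment")
    return key
```

A check like this can fail fast with a clear message instead of letting a crawl start without credentials.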
Enabling the add-on
If you are using Scrapy 2.10 or higher, you can set up scrapy-zyte-api integration using the following add-on with any priority:
ADDONS = {
"scrapy_zyte_api.Addon": 500,
}
Note
The add-on enables transparent mode by default.
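Because the add-on needs Scrapy 2.10 or higher, a project that supports several Scrapy versions may want to decide between the two configuration paths programmatically. The sketch below shows one way to do that; the function name supports_addons is hypothetical and only the major and minor version numbers are compared.

```python
import re


def supports_addons(scrapy_version):
    """Return True if the given Scrapy version string is 2.10 or higher.

    Hypothetical helper for illustration; compares (major, minor) as
    integers so that "2.10" is correctly treated as newer than "2.9".
    """
    match = re.match(r"(\d+)\.(\d+)", scrapy_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {scrapy_version!r}")
    return (int(match.group(1)), int(match.group(2))) >= (2, 10)
```

In a real project you would pass scrapy.__version__ to this function and fall back to enabling all components separately when it returns False.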
Enabling all components separately
If enabling the add-on is not an option, you can set up scrapy-zyte-api integration as follows:
DOWNLOAD_HANDLERS = {
"http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
"https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
}
DOWNLOADER_MIDDLEWARES = {
"scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
}
SPIDER_MIDDLEWARES = {
"scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware": 100,
"scrapy_zyte_api.ScrapyZyteAPIRefererSpiderMiddleware": 1000,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
By default, scrapy-zyte-api doesn’t change the spider behavior. To switch your spider to use Zyte API for all requests, set the following setting as well:
ZYTE_API_TRANSPARENT_MODE = True
For scrapy-poet integration, configure scrapy-poet first, and then add the following provider to the SCRAPY_POET_PROVIDERS setting:
SCRAPY_POET_PROVIDERS = {
"scrapy_zyte_api.providers.ZyteApiProvider": 1100,
}
If you already had a custom value for REQUEST_FINGERPRINTER_CLASS, set that value on ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS instead:
ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS = "myproject.CustomRequestFingerprinter"
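The move from REQUEST_FINGERPRINTER_CLASS to the fallback setting can be sketched as a small transformation over a settings dictionary. The function name apply_fingerprinter is hypothetical, and a plain dict stands in for your project settings; only the setting names and the fingerprinter path come from this page.

```python
ZYTE_FINGERPRINTER = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"


def apply_fingerprinter(settings):
    """Return a copy of settings that uses the scrapy-zyte-api fingerprinter.

    Hypothetical helper for illustration. Any custom fingerprinter that was
    already configured is preserved as the fallback.
    """
    updated = dict(settings)
    previous = updated.get("REQUEST_FINGERPRINTER_CLASS")
    if previous and previous != ZYTE_FINGERPRINTER:
        updated["ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS"] = previous
    updated["REQUEST_FINGERPRINTER_CLASS"] = ZYTE_FINGERPRINTER
    return updated
```

If no custom fingerprinter was set, the fallback setting is left untouched.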
For session management support, add the following downloader middleware to the DOWNLOADER_MIDDLEWARES setting:
DOWNLOADER_MIDDLEWARES = {
"scrapy_zyte_api.ScrapyZyteAPISessionDownloaderMiddleware": 667,
}
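Note that assigning DOWNLOADER_MIDDLEWARES a second time in the same settings.py replaces the earlier value, as with any Python variable. Assuming you are enabling all components separately and also want session management, the combined setting would presumably look like this sketch, which uses only the middleware paths and priorities shown on this page:

```python
# Single DOWNLOADER_MIDDLEWARES definition that keeps the base
# scrapy-zyte-api middleware and adds the session management one.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
    "scrapy_zyte_api.ScrapyZyteAPISessionDownloaderMiddleware": 667,
}
```

Any other downloader middlewares your project already defines belong in the same dictionary.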
Changing reactors may require code changes
If your TWISTED_REACTOR setting was not set to "twisted.internet.asyncioreactor.AsyncioSelectorReactor" before, you will be changing the Twisted reactor that your Scrapy project uses, and your existing code may need changes, such as:
Handling a pre-installed reactor.
Some Twisted imports install the default, non-asyncio Twisted reactor as a side effect. Once a reactor is installed, it cannot be changed for the whole run time.
Awaiting on Deferreds.
Note that you might be using Deferreds without realizing it through some Scrapy functions and methods. For example, when you yield the return value of self.crawler.engine.download() from a spider callback, you are yielding a Deferred.
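For the pre-installed reactor case, one common way to spot the problem is to check whether twisted.internet.reactor has already been imported, since importing that module is what installs the default reactor. The sketch below is a minimal diagnostic under that assumption; the function name reactor_already_imported is hypothetical.

```python
import sys


def reactor_already_imported(modules=None):
    """Return True if twisted.internet.reactor has been imported.

    Hypothetical diagnostic helper. Importing twisted.internet.reactor
    installs the default reactor, so its presence in sys.modules suggests
    a reactor was installed before Scrapy could apply TWISTED_REACTOR.
    """
    modules = sys.modules if modules is None else modules
    return "twisted.internet.reactor" in modules
```

Calling this at the top of a script that starts a crawl can help locate the import that installs the reactor too early.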