Settings
Settings for scrapy-zyte-api.
ZYTE_API_AUTOMAP_PARAMS
Default: {}
dict of parameters to be combined with automatic request parameters.
These parameters are merged with zyte_api_automap parameters. zyte_api_automap parameters take precedence.
This setting has no effect on requests with manual request parameters.
When using transparent mode, be careful about which parameters you define in this setting. In transparent mode, all Scrapy requests go through Zyte API, even requests that Scrapy sends automatically, such as those for robots.txt files when ROBOTSTXT_OBEY is True, or those for sitemaps when using SitemapSpider. Certain parameters, like browserHtml or screenshot, are not meant to be used for every single request.
If zyte_api_default_params in Request.meta is set to False, this setting is ignored for that request.
See Default parameters.
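The merge described above can be pictured as a shallow dictionary merge in which request-level keys win. The following is a minimal sketch under that assumption, for the case where zyte_api_automap is a dict; merge_automap_params is a hypothetical helper, not part of the scrapy-zyte-api API, and the real implementation may treat nested values or special cases differently.

```python
from typing import Any, Dict, Optional


def merge_automap_params(
    setting_params: Dict[str, Any],
    request_params: Dict[str, Any],
    meta: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch: combine ZYTE_API_AUTOMAP_PARAMS with zyte_api_automap."""
    meta = meta or {}
    # zyte_api_default_params=False in Request.meta skips the setting entirely.
    if meta.get("zyte_api_default_params") is False:
        return dict(request_params)
    # Shallow merge: keys from the request override keys from the setting.
    return {**setting_params, **request_params}


# Example: a project-wide default and a per-request override.
settings_value = {"geolocation": "US", "httpResponseHeaders": True}
request_value = {"geolocation": "DE"}
print(merge_automap_params(settings_value, request_value))
# {'geolocation': 'DE', 'httpResponseHeaders': True}
```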
ZYTE_API_BROWSER_HEADERS
Default: {"Referer": "referer"}
Determines headers that can be mapped as requestHeaders.
It is a dict where keys are header names and values are the keys that represent them in requestHeaders.
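To make the shape of this mapping concrete, here is a minimal sketch that turns request headers into a requestHeaders-style dict. map_browser_headers is a hypothetical helper; the case-insensitive matching is an assumption, and the sketch simply drops headers that have no entry in the mapping, which says nothing about how scrapy-zyte-api itself treats them.

```python
from typing import Dict, Mapping


def map_browser_headers(
    headers: Mapping[str, str],
    mapping: Mapping[str, str],
) -> Dict[str, str]:
    """Hypothetical sketch: build a requestHeaders-style dict from request headers."""
    # Assumption: header names are matched case-insensitively.
    lookup = {name.lower(): key for name, key in mapping.items()}
    result: Dict[str, str] = {}
    for name, value in headers.items():
        key = lookup.get(name.lower())
        if key is not None:
            result[key] = value
    return result


print(map_browser_headers(
    {"Referer": "https://example.com/", "Accept-Language": "en"},
    {"Referer": "referer"},  # the default value of ZYTE_API_BROWSER_HEADERS
))
# {'referer': 'https://example.com/'}
```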
ZYTE_API_DEFAULT_PARAMS
Default: {}
dict of parameters to be combined with manual request parameters.
You may set zyte_api to an empty dict to only use the parameters defined here for that request.
These parameters are merged with zyte_api parameters. zyte_api parameters take precedence.
This setting has no effect on requests with automatic request parameters.
If zyte_api_default_params in Request.meta is set to False, this setting is ignored for that request.
See Default parameters.
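As an illustration of the empty-dict case, a request such as Request("https://example.com", meta={"zyte_api": {}}) would end up with only the defaults. The sketch below models that as a shallow merge, assuming zyte_api keys override the setting; effective_params is hypothetical, and keys the plugin adds on its own, such as the request URL, are left out.

```python
from typing import Any, Dict

# Example project settings (settings.py); the value is illustrative.
ZYTE_API_DEFAULT_PARAMS: Dict[str, Any] = {"geolocation": "US"}


def effective_params(zyte_api: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: shallow merge in which zyte_api keys take precedence."""
    return {**ZYTE_API_DEFAULT_PARAMS, **zyte_api}


# An empty zyte_api dict yields only the defaults.
print(effective_params({}))
# {'geolocation': 'US'}

# Keys set in zyte_api override the defaults.
print(effective_params({"geolocation": "FR", "browserHtml": True}))
# {'geolocation': 'FR', 'browserHtml': True}
```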
ZYTE_API_ENABLED
Default: True
Can be set to False to disable scrapy-zyte-api.
ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS
Default: scrapy_poet.ScrapyPoetRequestFingerprinter if scrapy-poet is installed, else scrapy.utils.request.RequestFingerprinter
Request fingerprinter to use for requests that do not go through Zyte API. See Request fingerprinting.
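The conditional default above can be expressed as a small check for whether the scrapy_poet module is importable. This is a sketch of the documented rule only; default_fallback_fingerprinter is a hypothetical helper, and the plugin may detect scrapy-poet in a different way.

```python
import importlib.util


def default_fallback_fingerprinter() -> str:
    """Hypothetical sketch: pick the documented default import path."""
    # Treat scrapy-poet as installed if its top-level module can be found.
    if importlib.util.find_spec("scrapy_poet") is not None:
        return "scrapy_poet.ScrapyPoetRequestFingerprinter"
    return "scrapy.utils.request.RequestFingerprinter"


print(default_fallback_fingerprinter())
```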
ZYTE_API_KEY
Default: None
Your Zyte API key.
You can alternatively define an environment variable with the same name.
Tip
On Scrapy Cloud, this setting is defined automatically.
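A minimal sketch of looking up the key from either source. Which source wins when both are set is not stated above; this sketch assumes the setting takes precedence over the environment variable, and resolve_api_key is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(settings: Mapping[str, Optional[str]]) -> Optional[str]:
    """Hypothetical sketch: prefer the ZYTE_API_KEY setting, else the environment."""
    key = settings.get("ZYTE_API_KEY")
    if key:
        return key
    # Fall back to an environment variable with the same name.
    return os.environ.get("ZYTE_API_KEY")


# Usage: in settings.py, ZYTE_API_KEY = "YOUR_API_KEY"
# or, in a shell, export ZYTE_API_KEY=YOUR_API_KEY
```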
ZYTE_API_LOG_REQUESTS
Default: False
Set this to True and LOG_LEVEL to "DEBUG" to enable the logging of debug messages that indicate the JSON object sent on every Zyte API request.
For example:
Sending Zyte API extract request: {"url": "https://example.com", "httpResponseBody": true}
See also: ZYTE_API_LOG_REQUESTS_TRUNCATE.
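The example message above matches what json.dumps produces for the request parameters. The sketch below reproduces that format; format_request_log is hypothetical and only mirrors the example line, not the plugin's logging code, and it ignores the truncation controlled by ZYTE_API_LOG_REQUESTS_TRUNCATE.

```python
import json
from typing import Any, Dict

# settings.py: both settings are needed for the message to appear.
# ZYTE_API_LOG_REQUESTS = True
# LOG_LEVEL = "DEBUG"


def format_request_log(params: Dict[str, Any]) -> str:
    """Hypothetical sketch: build a message in the format shown above."""
    return f"Sending Zyte API extract request: {json.dumps(params)}"


print(format_request_log({"url": "https://example.com", "httpResponseBody": True}))
# Sending Zyte API extract request: {"url": "https://example.com", "httpResponseBody": true}
```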
ZYTE_API_LOG_REQUESTS_TRUNCATE
Default: 64
Determines the maximum length of any string value in the JSON object logged when ZYTE_API_LOG_REQUESTS is enabled, excluding object keys.
To disable truncation, set this to 0.
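The truncation rule can be sketched as a recursive walk that shortens string values while leaving keys alone. truncate_strings is a hypothetical helper, and the "..." suffix marking truncated values is an assumption about the log format, not something stated above.

```python
from typing import Any


def truncate_strings(obj: Any, limit: int) -> Any:
    """Hypothetical sketch: shorten string values, leaving dict keys untouched."""
    if limit <= 0:
        return obj  # 0 disables truncation
    if isinstance(obj, str):
        # Assumption: truncated values are marked with a trailing "...".
        return obj[:limit] + "..." if len(obj) > limit else obj
    if isinstance(obj, dict):
        return {key: truncate_strings(value, limit) for key, value in obj.items()}
    if isinstance(obj, list):
        return [truncate_strings(item, limit) for item in obj]
    return obj  # numbers, booleans and None are logged as-is


params = {"url": "https://example.com/" + "a" * 100, "httpResponseBody": True}
print(truncate_strings(params, 64)["url"])
```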
ZYTE_API_MAX_REQUESTS
Default: None
When set to an integer value greater than 0, the spider closes once the number of Zyte API requests reaches that value.
Note that requests with error responses that cannot be retried or exceed their retry limit also count here.
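The stopping rule can be modeled as a simple counter. RequestBudget is a hypothetical class that only illustrates the condition described above; it does not reflect how scrapy-zyte-api counts requests internally, for example how individual retry attempts are counted.

```python
from typing import Optional


class RequestBudget:
    """Hypothetical sketch of the ZYTE_API_MAX_REQUESTS stopping rule."""

    def __init__(self, max_requests: Optional[int]) -> None:
        # None or a value that is not > 0 means "no limit".
        self.max_requests = max_requests if max_requests and max_requests > 0 else None
        self.count = 0

    def record(self) -> None:
        # Call once per counted request; per the description above, this
        # includes requests whose errors cannot be retried or that ran out
        # of retries.
        self.count += 1

    def should_close(self) -> bool:
        return self.max_requests is not None and self.count >= self.max_requests


budget = RequestBudget(3)
for _ in range(3):
    budget.record()
print(budget.should_close())  # True
```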
ZYTE_API_PROVIDER_PARAMS
Default: {}
Defines additional request parameters to use in Zyte API requests sent by the scrapy-poet integration.
For example:
```python
ZYTE_API_PROVIDER_PARAMS = {
    "requestCookies": [
        {"name": "a", "value": "b", "domain": "example.com"},
    ],
}
```
ZYTE_API_RETRY_POLICY
Default: "zyte_api.aio.retry.zyte_api_retrying"
Determines the retry policy for Zyte API requests.
It must be a string with the import path of a tenacity.AsyncRetrying subclass.
Note
Settings must be picklable, and retry policies are not, so you cannot assign a retry policy class directly to this setting; you must use its import path as a string instead.
See Retries.
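To see why a string works where the policy object does not, note that a dotted path pickles fine and can be resolved back to the object when needed. The sketch below shows that resolution with importlib; load_from_path is hypothetical (Scrapy ships a similar helper, scrapy.utils.misc.load_object), and json.dumps stands in for a real retry policy so the example runs without extra packages.

```python
import importlib
import pickle
from typing import Any

# settings.py (hypothetical module path for a custom policy):
# ZYTE_API_RETRY_POLICY = "myproject.retry_policies.CUSTOM_RETRY_POLICY"


def load_from_path(path: str) -> Any:
    """Hypothetical sketch: resolve "package.module.name" to the named object."""
    module_path, _, name = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, name)


# A string survives pickling, so it is safe to keep in settings.
setting_value = "json.dumps"  # stand-in; a real value would name a retry policy
assert pickle.loads(pickle.dumps(setting_value)) == setting_value
print(load_from_path(setting_value)({"ok": True}))  # {"ok": true}
```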
ZYTE_API_SKIP_HEADERS
Default: ["Cookie"]
Determines headers that must not be mapped as customHttpRequestHeaders.
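Zyte API expects customHttpRequestHeaders as a list of objects with name and value keys. The sketch below builds that list while leaving out headers named in the skip list; to_custom_headers is hypothetical, it covers only the skip list (not any other header handling the plugin does), and the case-insensitive comparison is an assumption.

```python
from typing import Dict, Iterable, List, Mapping


def to_custom_headers(
    headers: Mapping[str, str],
    skip: Iterable[str] = ("Cookie",),
) -> List[Dict[str, str]]:
    """Hypothetical sketch: build customHttpRequestHeaders, omitting skipped names."""
    # Assumption: skipped names are compared case-insensitively.
    skipped = {name.lower() for name in skip}
    return [
        {"name": name, "value": value}
        for name, value in headers.items()
        if name.lower() not in skipped
    ]


print(to_custom_headers({"Accept": "text/html", "Cookie": "a=b"}))
# [{'name': 'Accept', 'value': 'text/html'}]
```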
ZYTE_API_TRANSPARENT_MODE
Default: False
See Transparent mode.
ZYTE_API_USE_ENV_PROXY
Default: False
Set to True to make Zyte API requests respect system proxy settings. See Using a proxy.
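As an illustration of what system proxy settings typically look like, assuming the common *_proxy environment-variable convention, Python's standard library can report which proxies such variables define. This only demonstrates the convention; it is not how scrapy-zyte-api reads proxy settings, and the proxy URL is a placeholder.

```python
import os
import urllib.request

# Placeholder proxy URL, set here only for the demonstration.
os.environ["https_proxy"] = "http://proxy.example:3128"

# urllib.request.getproxies() reads the conventional *_proxy environment
# variables (and, on some platforms, OS-level proxy configuration).
proxies = urllib.request.getproxies()
print(proxies.get("https"))
```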