Request mapping
When you enable automatic request parameter mapping, be it through transparent mode or for a specific request, some Zyte API parameters are chosen automatically for you, and you can then change them further if you wish.
Automatic mapping
Request.url becomes url, same as in requests with manual parameters.
If Request.method is something other than "GET", it becomes httpRequestMethod.
Request.body becomes httpRequestBody.
Request.headers become customHttpRequestHeaders for HTTP requests and requestHeaders for browser requests. See Header mapping and Unsupported scenarios for details.
If ZYTE_API_EXPERIMENTAL_COOKIES_ENABLED is True, COOKIES_ENABLED is True (default), and Request.meta does not set dont_merge_cookies to True:
experimental.responseCookies becomes True.
Cookies from the cookiejar become experimental.requestCookies. All cookies from the cookie jar are set, regardless of their cookie domain. This is because Zyte API requests may involve requests to different domains (e.g. when following cross-domain redirects, or during browser rendering).
See also: ZYTE_API_MAX_COOKIES, ZYTE_API_COOKIE_MIDDLEWARE.
httpResponseBody and httpResponseHeaders are set to True.
This is subject to change without prior notice in future versions of scrapy-zyte-api, so please account for the following:
If you are requesting a binary resource, such as a PDF file or an image file, set httpResponseBody to True explicitly in your requests:
Request(
    url="https://toscrape.com/img/zyte.png",
    meta={
        "zyte_api_automap": {"httpResponseBody": True},
    },
)
In the future, we may stop setting httpResponseBody to True by default, and instead use a different, new Zyte API parameter that only works for non-binary responses (e.g. HTML, JSON, plain text).
If you need to access response headers, be it through response.headers or through response.raw_api_response["httpResponseHeaders"], set httpResponseHeaders to True explicitly in your requests:
Request(
    url="https://toscrape.com/",
    meta={
        "zyte_api_automap": {"httpResponseHeaders": True},
    },
)
At the moment scrapy-zyte-api requests response headers because some response headers are necessary to properly decode the response body as text. In the future, Zyte API may be able to handle this decoding automatically, so scrapy-zyte-api would stop setting httpResponseHeaders to True by default.
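For the cookie mapping in the list above to apply, the two settings it names must both be true. A minimal sketch of the relevant part of a Scrapy settings module, assuming a standard settings.py layout:

```python
# settings.py (sketch)

# Opt in to the experimental cookie mapping described above.
ZYTE_API_EXPERIMENTAL_COOKIES_ENABLED = True

# Scrapy's own cookie handling must stay enabled. True is already the
# default, so this line only makes the requirement explicit.
COOKIES_ENABLED = True

# Note: a request whose meta sets "dont_merge_cookies" to True is still
# excluded from cookie mapping, regardless of these settings.
```

With these in place, experimental.responseCookies and experimental.requestCookies are filled in as described above.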
For example, the following Scrapy request:
from scrapy import Request

Request(
    method="POST",
    url="https://httpbin.org/anything",
    headers={"Content-Type": "application/json"},
    body=b'{"foo": "bar"}',
    cookies={"a": "b"},
)
Results in a request to the Zyte API data extraction endpoint with the following parameters:
{
    "customHttpRequestHeaders": [
        {
            "name": "Content-Type",
            "value": "application/json"
        }
    ],
    "experimental": {
        "requestCookies": [
            {
                "name": "a",
                "value": "b",
                "domain": ""
            }
        ],
        "responseCookies": true
    },
    "httpResponseBody": true,
    "httpResponseHeaders": true,
    "httpRequestBody": "eyJmb28iOiAiYmFyIn0=",
    "httpRequestMethod": "POST",
    "url": "https://httpbin.org/anything"
}
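To make the mapping rules concrete, the following standard-library sketch reproduces the parameters above from the same inputs. It is a simplified illustration, not the scrapy-zyte-api implementation: the automap_sketch function and its cookies_enabled argument are hypothetical names, header filtering is ignored, and it assumes an HTTP (non-browser) request. Note that httpRequestBody carries the request body encoded as Base64.

```python
import base64
import json


def automap_sketch(method, url, headers=None, body=b"", cookies=None,
                   cookies_enabled=False):
    """Approximate the default mapping for an HTTP request.

    Hypothetical helper for illustration only.
    """
    params = {
        "url": url,
        "httpResponseBody": True,
        "httpResponseHeaders": True,
    }
    if method != "GET":
        params["httpRequestMethod"] = method
    if body:
        # The raw body bytes are sent as Base64 text.
        params["httpRequestBody"] = base64.b64encode(body).decode("ascii")
    if headers:
        params["customHttpRequestHeaders"] = [
            {"name": name, "value": value} for name, value in headers.items()
        ]
    if cookies_enabled:
        # Stands in for the cookie conditions listed in this section.
        experimental = {"responseCookies": True}
        if cookies:
            experimental["requestCookies"] = [
                {"name": name, "value": value, "domain": ""}
                for name, value in cookies.items()
            ]
        params["experimental"] = experimental
    return params


params = automap_sketch(
    method="POST",
    url="https://httpbin.org/anything",
    headers={"Content-Type": "application/json"},
    body=b'{"foo": "bar"}',
    cookies={"a": "b"},
    cookies_enabled=True,
)
print(json.dumps(params, indent=4))
```

Calling the same function with method="GET" and no body omits httpRequestMethod and httpRequestBody, matching the first rules in the list.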
Header mapping
When mapping headers, some headers are dropped based on the values of the ZYTE_API_SKIP_HEADERS and ZYTE_API_BROWSER_HEADERS settings. By default, these settings drop the headers that Zyte API does not support.
Even if not defined in ZYTE_API_SKIP_HEADERS, additional headers may be dropped from HTTP requests (customHttpRequestHeaders):
The Accept and Accept-Language headers are dropped if their values are not user-defined, i.e. they come from the default global value (setting priority of 0) of the DEFAULT_REQUEST_HEADERS setting.
The Accept-Encoding header is dropped if its value is not user-defined, i.e. it was set by the HttpCompressionMiddleware.
The User-Agent header is dropped if its value is not user-defined, i.e. it comes from the default global value (setting priority of 0) of the USER_AGENT setting.
To force the mapping of these headers, define the corresponding setting (if any), set them in the DEFAULT_REQUEST_HEADERS setting, or set them in Request.headers from a spider callback. They will be mapped even if defined with their default value.
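As an illustration of the settings route, the sketch below defines the headers at project level so that they count as user-defined. The USER_AGENT value is a hypothetical placeholder; the Accept and Accept-Language values shown happen to match Scrapy's global defaults, which is enough for them to be mapped once they are set explicitly.

```python
# settings.py (sketch)

# Hypothetical user agent string; any explicitly defined value is mapped.
USER_AGENT = "my-crawler/1.0 (+https://example.com)"

# Defining these here, even with default values, marks them as user-defined.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}
```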
Headers will also be mapped if set to a non-default value elsewhere, e.g. in a custom downloader middleware, as long as it is done before the scrapy-zyte-api downloader middleware, which is responsible for the mapping, processes the request. Here “before” means a lower value than 633 in the DOWNLOADER_MIDDLEWARES setting.
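A minimal sketch of such a middleware follows. The class name, the myproject.middlewares import path, and the priority of 600 are all hypothetical; the only requirement taken from the text is that the priority is lower than 633. It assumes the scrapy-zyte-api downloader middleware is already configured at 633.

```python
class CustomHeaderMiddleware:
    """Hypothetical downloader middleware that sets a non-default header."""

    def process_request(self, request, spider):
        # Set the header before the mapping middleware sees the request.
        request.headers["Accept-Language"] = "fr"
        return None  # let the request continue through the chain


# settings.py (sketch): 600 < 633, so this middleware runs first.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.CustomHeaderMiddleware": 600,
}
```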
Similarly, you can add any of those headers to the ZYTE_API_SKIP_HEADERS setting to prevent their mapping.
Also note that Scrapy sets the Referer header by default in all requests that come from spider callbacks. To unset the header on a given request, set the header value to None on that request. To unset it from all requests, set the REFERER_ENABLED setting to False. To unset it only from Zyte API requests, add it to the ZYTE_API_SKIP_HEADERS setting and remove it from the ZYTE_API_BROWSER_HEADERS setting.
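The last option could look roughly like the sketch below. The value shapes are assumptions that should be checked against the scrapy-zyte-api settings reference: it assumes ZYTE_API_SKIP_HEADERS is a list of header names whose default contains "Cookie", and that ZYTE_API_BROWSER_HEADERS is a dict keyed by header name whose default contains only "Referer".

```python
# settings.py (sketch; value shapes are assumptions, see the note above)

# Add Referer to the headers that are never mapped for HTTP requests.
ZYTE_API_SKIP_HEADERS = ["Cookie", "Referer"]

# Remove Referer from the headers that may be mapped for browser requests.
ZYTE_API_BROWSER_HEADERS = {}

# Alternative that affects every request, not only Zyte API ones:
# REFERER_ENABLED = False
```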
Unsupported scenarios
To maximize support for potential future changes in Zyte API, automatic request parameter mapping allows some parameter values and parameter combinations that Zyte API does not currently support, and may never support:
Request.method becomes httpRequestMethod even for unsupported httpRequestMethod values, and even if httpResponseBody is unset.
You can set customHttpRequestHeaders or requestHeaders to True to force their mapping from Request.headers in scenarios where they would not be mapped otherwise. Conversely, you can set customHttpRequestHeaders or requestHeaders to False to prevent their mapping from Request.headers.
Request.body becomes httpRequestBody even if httpResponseBody is unset.
You can set httpResponseBody to False (which unsets the parameter), and not set other outputs (browserHtml, screenshot, product…) to True. In this case, Request.headers is mapped as requestHeaders.
You can set httpResponseBody to True or use automatic extraction from httpResponseBody, and also set browserHtml or screenshot to True or use automatic extraction from browserHtml. In this case, Request.headers is mapped both as customHttpRequestHeaders and as requestHeaders, and browserHtml is used as response.body.
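The scenarios above are all expressed through the zyte_api_automap key of Request.meta. The sketch below shows a few such meta values as plain dictionaries; the variable names are hypothetical, and, as noted above, Zyte API may reject some of the resulting parameter combinations.

```python
# Force Request.headers to be mapped as customHttpRequestHeaders.
force_http_headers = {
    "zyte_api_automap": {"customHttpRequestHeaders": True},
}

# Prevent Request.headers from being mapped at all.
no_header_mapping = {
    "zyte_api_automap": {
        "customHttpRequestHeaders": False,
        "requestHeaders": False,
    },
}

# Unset httpResponseBody without requesting any other output;
# Request.headers is then mapped as requestHeaders.
no_outputs = {
    "zyte_api_automap": {"httpResponseBody": False},
}

# Request both outputs; headers are mapped both ways and
# browserHtml is used as response.body.
both_outputs = {
    "zyte_api_automap": {"httpResponseBody": True, "browserHtml": True},
}
```

Each dictionary would be passed as the meta argument of a Request, in the same way as the zyte_api_automap examples earlier in this section.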