Request mapping
When you enable automatic request parameter mapping, be it through transparent mode or for a specific request, some Zyte API parameters are chosen automatically for you, and you can then change them further if you wish.
Automatic mapping
- Request.url becomes url, same as in requests with manual parameters.
- If Request.method is something other than "GET", it becomes httpRequestMethod.
- Request.body becomes httpRequestBody.
- Request.headers become customHttpRequestHeaders for HTTP requests and requestHeaders for browser requests. See Header mapping and Unsupported scenarios for details.
  If serp is enabled, request header mapping is disabled.
- If ZYTE_API_EXPERIMENTAL_COOKIES_ENABLED is True, COOKIES_ENABLED is True (default), and Request.meta does not set dont_merge_cookies to True:
  - experimental.responseCookies becomes True.
  - Cookies from the cookiejar become experimental.requestCookies.
    All cookies from the cookie jar are set, regardless of their cookie domain. This is because Zyte API requests may involve requests to different domains (e.g. when following cross-domain redirects, or during browser rendering).
  See also: ZYTE_API_MAX_COOKIES, ZYTE_API_COOKIE_MIDDLEWARE.
- httpResponseBody and httpResponseHeaders are set to True.
  This is subject to change without prior notice in future versions of scrapy-zyte-api, so please account for the following:
  - If you are requesting a binary resource, such as a PDF file or an image file, set httpResponseBody to True explicitly in your requests:
        Request(
            url="https://toscrape.com/img/zyte.png",
            meta={
                "zyte_api_automap": {"httpResponseBody": True},
            },
        )
    In the future, we may stop setting httpResponseBody to True by default, and instead use a different, new Zyte API parameter that only works for non-binary responses (e.g. HTML, JSON, plain text).
  - If you need to access response headers, be it through response.headers or through response.raw_api_response["httpResponseHeaders"], set httpResponseHeaders to True explicitly in your requests:
        Request(
            url="https://toscrape.com/",
            meta={
                "zyte_api_automap": {"httpResponseHeaders": True},
            },
        )
    At the moment scrapy-zyte-api requests response headers because some response headers are necessary to properly decode the response body as text. In the future, Zyte API may be able to handle this decoding automatically, so scrapy-zyte-api would stop setting httpResponseHeaders to True by default.
For example, the following Scrapy request:
    Request(
        method="POST",
        url="https://httpbin.org/anything",
        headers={"Content-Type": "application/json"},
        body=b'{"foo": "bar"}',
        cookies={"a": "b"},
    )
Results in a request to the Zyte API data extraction endpoint with the following parameters:
    {
        "customHttpRequestHeaders": [
            {
                "name": "Content-Type",
                "value": "application/json"
            }
        ],
        "experimental": {
            "requestCookies": [
                {
                    "name": "a",
                    "value": "b",
                    "domain": ""
                }
            ],
            "responseCookies": true
        },
        "httpResponseBody": true,
        "httpResponseHeaders": true,
        "httpRequestBody": "eyJmb28iOiAiYmFyIn0=",
        "httpRequestMethod": "POST",
        "url": "https://httpbin.org/anything"
    }
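To make the rules above concrete, the following standard-library sketch reproduces the example parameters from plain Python values. The function name automap and its signature are hypothetical and exist only for illustration; this is not how scrapy-zyte-api implements the mapping, and it ignores header filtering, browser requests and cookie jar handling.

```python
import base64


def automap(method, url, headers=None, body=b"", cookies=None, cookies_enabled=True):
    """Hypothetical helper that mirrors the mapping rules listed above."""
    # url is always mapped; response body and headers are requested by default.
    params = {"url": url, "httpResponseBody": True, "httpResponseHeaders": True}
    # Only non-GET methods are sent explicitly.
    if method != "GET":
        params["httpRequestMethod"] = method
    # The request body is sent Base64-encoded, as in the example above.
    if body:
        params["httpRequestBody"] = base64.b64encode(body).decode("ascii")
    # Headers of an HTTP request become a list of name/value pairs.
    if headers:
        params["customHttpRequestHeaders"] = [
            {"name": name, "value": value} for name, value in headers.items()
        ]
    # With cookie support enabled, response cookies are requested and
    # request cookies are forwarded (domain left empty, as in the example).
    if cookies_enabled:
        experimental = {"responseCookies": True}
        if cookies:
            experimental["requestCookies"] = [
                {"name": name, "value": value, "domain": ""}
                for name, value in cookies.items()
            ]
        params["experimental"] = experimental
    return params


params = automap(
    "POST",
    "https://httpbin.org/anything",
    headers={"Content-Type": "application/json"},
    body=b'{"foo": "bar"}',
    cookies={"a": "b"},
)
```

Calling the helper with the values of the Scrapy request shown earlier yields a dictionary with the same keys and values as the JSON above.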
Header mapping
When mapping headers, some headers are dropped based on the values of the
ZYTE_API_SKIP_HEADERS and ZYTE_API_BROWSER_HEADERS
settings. With their default values, these settings drop the headers that Zyte
API does not support.
Even if not defined in ZYTE_API_SKIP_HEADERS, additional headers may
be dropped from HTTP requests (customHttpRequestHeaders):
- The Accept and Accept-Language headers are dropped if their values are not user-defined, i.e. they come from the default global value (setting priority of 0) of the DEFAULT_REQUEST_HEADERS setting.
- The Accept-Encoding header is dropped if its value is not user-defined, i.e. it was set by the HttpCompressionMiddleware.
- The User-Agent header is dropped if its value is not user-defined, i.e. it comes from the default global value (setting priority of 0) of the USER_AGENT setting.
To force the mapping of these headers, define the corresponding setting
(if any), set them in the DEFAULT_REQUEST_HEADERS setting, or set them in
Request.headers from a spider callback.
They will be mapped even if defined with their default value.
Headers will also be mapped if set to a non-default value elsewhere, e.g. in a
custom downloader middleware, as long as it is done before the scrapy-zyte-api
downloader middleware, which is responsible for the mapping, processes the
request. Here “before” means a lower value than 633 in the
DOWNLOADER_MIDDLEWARES setting.
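As an illustration of the ordering requirement, here is a minimal sketch of a custom downloader middleware registered before the scrapy-zyte-api one. The class name, the module path myproject.middlewares, the priority 600 and the header value are all hypothetical; the 633 entry is assumed to match how the scrapy-zyte-api middleware is registered in your project. A SimpleNamespace stands in for a Scrapy request so the snippet runs without Scrapy installed.

```python
from types import SimpleNamespace


class CustomUserAgentMiddleware:
    """Hypothetical downloader middleware that sets a non-default User-Agent."""

    def process_request(self, request, spider):
        # Runs before the mapping because of the lower priority value below.
        request.headers["User-Agent"] = "my-crawler/1.0"
        return None  # let the request continue through the middleware chain


# settings.py fragment: lower numbers process the request earlier.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.CustomUserAgentMiddleware": 600,  # hypothetical path
    "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
}

# Stand-in for a Scrapy request, used only to exercise the middleware here.
request = SimpleNamespace(headers={})
CustomUserAgentMiddleware().process_request(request, spider=None)
```

Any priority value below 633 would satisfy the requirement described above.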
Similarly, you can add any of those headers to the
ZYTE_API_SKIP_HEADERS setting to prevent their mapping.
Also note that Scrapy sets the Referer header by default in all requests
that come from spider callbacks. To unset the header on a given request, set
the header value to None on that request. To unset it from all requests,
set the REFERER_ENABLED setting to
False. To unset it only from Zyte API requests, add it to the
ZYTE_API_SKIP_HEADERS setting and remove it from the
ZYTE_API_BROWSER_HEADERS setting.
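The two project-wide options could look roughly like the following settings dictionaries, usable for example as a spider's custom_settings. This is a sketch under assumptions: it presumes that ZYTE_API_SKIP_HEADERS takes a list of header names (with "Cookie" as an existing entry you want to keep) and that ZYTE_API_BROWSER_HEADERS takes a mapping whose only relevant entry is the Referer one. Check the settings reference for the actual defaults before copying these values. The dictionary names are hypothetical.

```python
# Option 1: stop Scrapy from setting Referer on any request.
DISABLE_REFERER_EVERYWHERE = {
    "REFERER_ENABLED": False,
}

# Option 2: keep Referer for regular requests, drop it only from Zyte API ones.
DISABLE_REFERER_FOR_ZYTE_API = {
    # Assumed shape: a list of header names that are never mapped.
    "ZYTE_API_SKIP_HEADERS": ["Cookie", "Referer"],
    # Assumed shape: a mapping of headers allowed in browser requests;
    # leaving Referer out removes it from requestHeaders as well.
    "ZYTE_API_BROWSER_HEADERS": {},
}
```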
Unsupported scenarios
To maximize support for potential future changes in Zyte API, automatic request parameter mapping allows some parameter values and parameter combinations that Zyte API does not currently support, and may never support:
- Request.method becomes httpRequestMethod even for unsupported httpRequestMethod values, and even if httpResponseBody is unset.
- You can set customHttpRequestHeaders or requestHeaders to True to force their mapping from Request.headers in scenarios where they would not be mapped otherwise.
  Conversely, you can set customHttpRequestHeaders or requestHeaders to False to prevent their mapping from Request.headers.
- Request.body becomes httpRequestBody even if httpResponseBody is unset.
- You can set httpResponseBody to False (which unsets the parameter), and not set other outputs (browserHtml, screenshot, product…) to True. In this case, Request.headers is mapped as requestHeaders.
- You can set httpResponseBody to True or use automatic extraction from httpResponseBody, and also set browserHtml or screenshot to True or use automatic extraction from browserHtml. In this case, Request.headers is mapped both as customHttpRequestHeaders and as requestHeaders, and browserHtml is used as response.body.
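The header-related rules in this list can be summarized with a small decision function. This is a simplified model rather than the library's logic: the function name header_params is hypothetical, automatic extraction outputs and serp are ignored, and it assumes that httpResponseBody defaults to True only when no browser output (browserHtml or screenshot) is requested.

```python
def header_params(automap):
    """Return which Zyte API header parameters Request.headers would map to.

    `automap` is the dictionary passed as the zyte_api_automap request meta key.
    """
    browser = bool(automap.get("browserHtml") or automap.get("screenshot"))
    # Assumption: httpResponseBody is only set by default for HTTP requests.
    http = automap.get("httpResponseBody", not browser)

    targets = set()
    if http:
        targets.add("customHttpRequestHeaders")
    # Browser outputs, or no output at all, map headers as requestHeaders.
    if browser or not http:
        targets.add("requestHeaders")

    # Explicit True/False values force or prevent the mapping.
    for name in ("customHttpRequestHeaders", "requestHeaders"):
        if automap.get(name) is True:
            targets.add(name)
        elif automap.get(name) is False:
            targets.discard(name)
    return sorted(targets)


both = header_params({"httpResponseBody": True, "browserHtml": True})
no_output = header_params({"httpResponseBody": False})
prevented = header_params({"customHttpRequestHeaders": False})
```

Here both contains the two parameter names, no_output contains only requestHeaders, and prevented is empty, matching the scenarios described in the list.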