Request fingerprinting
The request fingerprinter class of scrapy-zyte-api ensures that Scrapy 2.7 and later generate unique request fingerprints for Zyte API requests based on some of their parameters.
For example, a request for browserHtml and a request for screenshot with the same target URL are considered different requests. Similarly, requests with the same target URL but different actions are also considered different requests.
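A parameter-aware fingerprint can be pictured as a hash over the target URL plus the Zyte API parameters. The sketch below shows only that idea. It is not the actual scrapy-zyte-api implementation, which decides for itself which parameters count and how URLs are canonicalized. The function name `zyte_api_fingerprint` is hypothetical.

```python
import hashlib
import json


def zyte_api_fingerprint(url: str, params: dict) -> str:
    """Hash a target URL together with its Zyte API parameters.

    Illustrative sketch only (hypothetical helper): the real scrapy-zyte-api
    fingerprinter decides which parameters count and how URLs are canonicalized.
    """
    # sort_keys=True makes the hash independent of dict key order.
    payload = json.dumps({"url": url, "params": params}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


url = "https://toscrape.com"
html_fp = zyte_api_fingerprint(url, {"browserHtml": True})
screenshot_fp = zyte_api_fingerprint(url, {"screenshot": True})
# Same output type as html_fp, but with an action performed first.
scrolled_fp = zyte_api_fingerprint(
    url, {"browserHtml": True, "actions": [{"action": "scrollBottom"}]}
)
```

Because the parameters are part of the hashed payload, `html_fp`, `screenshot_fp` and `scrolled_fp` all differ even though the URL is the same.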
Use ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS to define custom request fingerprinting for requests that do not go through Zyte API.
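In Scrapy 2.7 and later, a request fingerprinter only needs a `fingerprint(request)` method that returns `bytes`. The sketch below is a hypothetical fallback fingerprinter that, unlike Scrapy's default, keeps URL fragments. It assumes the setting accepts an import path, as Scrapy's REQUEST_FINGERPRINTER_CLASS does; the class name and module path are made up for illustration. `SimpleNamespace` objects stand in for `scrapy.Request` so the sketch runs without Scrapy.

```python
import hashlib
from types import SimpleNamespace


class FragmentAwareFingerprinter:
    """Hypothetical fallback fingerprinter for non-Zyte API requests.

    Hashes the method, the full URL (fragment included) and the body,
    joined with NUL bytes, and returns the digest as bytes.
    """

    def fingerprint(self, request) -> bytes:
        parts = [
            request.method.encode("ascii"),
            request.url.encode("utf-8"),
            request.body or b"",
        ]
        return hashlib.sha1(b"\0".join(parts)).digest()


# settings.py (hypothetical module path):
# ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS = (
#     "myproject.fingerprinting.FragmentAwareFingerprinter"
# )

# Stand-ins exposing the scrapy.Request attributes the sketch reads.
fingerprinter = FragmentAwareFingerprinter()
first = SimpleNamespace(method="GET", url="https://toscrape.com#1", body=b"")
second = SimpleNamespace(method="GET", url="https://toscrape.com#2", body=b"")
```

Here `first` and `second` get different fingerprints because the fragment is hashed as part of the URL.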
Request fingerprinting before Scrapy 2.7
If you use a Scrapy version older than 2.7, Zyte API parameters are not taken into account for request fingerprinting. This can cause some Scrapy components, like the duplicate request filter or the HTTP cache extension, to treat two different requests as the same request.
To avoid most issues, use automatic request parameters, either through transparent mode or by setting zyte_api_automap to True in Request.meta, and then use Request attributes instead of Request.meta as much as possible. Unlike Request.meta, Request attributes do affect request fingerprints in Scrapy versions older than 2.7.
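The difference between meta and Request attributes can be modelled with a simplified version of pre-2.7 fingerprinting, which hashed the request method, URL and body (plus, optionally, headers) and never read meta. This sketch skips URL canonicalization and headers, and `RequestStub` and `legacy_fingerprint` are hypothetical names, not Scrapy APIs.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RequestStub:
    """Hypothetical stand-in for scrapy.Request, for illustration only."""

    url: str
    method: str = "GET"
    body: bytes = b""
    meta: dict = field(default_factory=dict)


def legacy_fingerprint(request: RequestStub) -> str:
    """Simplified model of pre-2.7 fingerprinting.

    Only method, URL and body are hashed; meta is never read, so Zyte API
    parameters stored there cannot change the result.
    """
    h = hashlib.sha1()
    for part in (request.method.encode("ascii"), request.url.encode("utf-8"), request.body):
        h.update(part + b"\0")  # NUL separator between fields
    return h.hexdigest()


# Parameters only in meta: the two requests collide.
html = RequestStub("https://toscrape.com", meta={"zyte_api_automap": {"browserHtml": True}})
shot = RequestStub("https://toscrape.com", meta={"zyte_api_automap": {"screenshot": True}})
# Method and body are Request attributes: the fingerprint changes.
post = RequestStub(
    "https://toscrape.com", method="POST", body=b"q=1", meta={"zyte_api_automap": True}
)
```

In this model `html` and `shot` share a fingerprint, while `post` gets a different one because its method and body are part of the hash.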
For requests that must have the same Request attributes but should still be considered different, such as browser-based requests with different URL fragments, you can set dont_filter=True when creating your requests to prevent Scrapy's duplicate filter from filtering any of them out.
For example:
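from scrapy import Request, Spider


class ToScrapeSpider(Spider):
    name = "toscrape"

    def start_requests(self):
        # The URLs differ only by fragment, so without dont_filter=True
        # the duplicate filter would drop the second request.
        yield Request(
            "https://toscrape.com#1",
            meta={"zyte_api_automap": {"browserHtml": True}},
            dont_filter=True,
        )
        yield Request(
            "https://toscrape.com#2",
            meta={"zyte_api_automap": {"browserHtml": True}},
            dont_filter=True,
        )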
Note, however, that other Scrapy components, like the HTTP cache extension, still consider these two requests identical.
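Why dont_filter=True helps the duplicate filter but not the HTTP cache can be seen in a toy model. The scheduler skips the duplicate check for dont_filter requests, while a fingerprint-keyed cache still maps both URLs to one entry once fragments are dropped, as Scrapy's default fingerprinting does. `DupeFilterModel` and the dict cache are hypothetical simplifications, not Scrapy code.

```python
import hashlib
from urllib.parse import urldefrag


def fingerprint(url: str) -> str:
    # Like Scrapy's default, drop the URL fragment before hashing.
    return hashlib.sha1(urldefrag(url).url.encode("utf-8")).hexdigest()


class DupeFilterModel:
    """Toy model of the scheduler's duplicate check (not Scrapy code)."""

    def __init__(self):
        self.seen = set()

    def accepts(self, url: str, dont_filter: bool = False) -> bool:
        if dont_filter:
            return True  # the duplicate check is skipped entirely
        fp = fingerprint(url)
        if fp in self.seen:
            return False
        self.seen.add(fp)
        return True


urls = ["https://toscrape.com#1", "https://toscrape.com#2"]

filtered = DupeFilterModel()
without_flag = [filtered.accepts(u) for u in urls]  # [True, False]

unfiltered = DupeFilterModel()
with_flag = [unfiltered.accepts(u, dont_filter=True) for u in urls]  # [True, True]

# A cache keyed by fingerprint stores a single entry for both URLs.
cache = {}
for u in urls:
    cache.setdefault(fingerprint(u), u)
```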