Request fingerprinting

The request fingerprinter class of scrapy-zyte-api ensures that Scrapy 2.7 and later generate unique request fingerprints for Zyte API requests based on some of their parameters.

For example, a request for browserHtml and a request for screenshot with the same target URL are considered different requests. Similarly, requests with the same target URL but different actions are also considered different requests.
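The idea can be illustrated with a minimal sketch that uses only the standard library. This is not the algorithm that scrapy-zyte-api implements: the function name toy_fingerprint and the choice of SHA-1 over a JSON serialization are assumptions made for illustration. It only shows why hashing the parameters together with the URL yields different fingerprints for the same target URL.

```python
import hashlib
import json


def toy_fingerprint(url: str, api_params: dict) -> bytes:
    """Hash the URL together with the request parameters (illustrative only)."""
    # sort_keys makes the serialization independent of dict key order.
    payload = json.dumps({"url": url, "params": api_params}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).digest()


url = "https://toscrape.com"
html_fp = toy_fingerprint(url, {"browserHtml": True})
screenshot_fp = toy_fingerprint(url, {"screenshot": True})
actions_fp = toy_fingerprint(
    url,
    {"browserHtml": True, "actions": [{"action": "scrollBottom"}]},
)

# Same target URL, different parameters: three distinct fingerprints.
all_distinct = len({html_fp, screenshot_fp, actions_fp}) == 3
```

Repeating a call with the same URL and parameters produces the same fingerprint, which is what lets components such as the duplicate filter recognize a repeated request.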

Use ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS to define a custom request fingerprinter class for requests that do not go through Zyte API.
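As a rough sketch, a fallback fingerprinter can be any class that follows the Scrapy request fingerprinter interface, that is, one with a fingerprint(request) method that returns bytes. The class name MethodAndUrlFingerprinter, its hashing logic, and the module path myproject.fingerprinting are hypothetical. The sketch also assumes that the setting accepts an import path string, as the REQUEST_FINGERPRINTER_CLASS setting of Scrapy does.

```python
import hashlib


class MethodAndUrlFingerprinter:
    """Hypothetical fallback fingerprinter: hashes only the HTTP method and URL."""

    def fingerprint(self, request) -> bytes:
        # A Scrapy request fingerprinter returns bytes that identify the request.
        data = f"{request.method} {request.url}".encode("utf-8")
        return hashlib.sha1(data).digest()


# In settings.py (the module path below is an assumption):
ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS = (
    "myproject.fingerprinting.MethodAndUrlFingerprinter"
)
```

A fingerprinter this simple ignores the request body and headers, so it is only suitable where the method and URL alone are enough to tell requests apart.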

Request fingerprinting before Scrapy 2.7

If you use a Scrapy version older than Scrapy 2.7, Zyte API parameters are not taken into account for request fingerprinting. This can cause some Scrapy components, like the duplicate request filter or the HTTP cache extension, to treat two different requests as the same request.

To avoid most issues, use automatic request parameters, either through transparent mode or by setting zyte_api_automap to True in Request.meta, and then use Request attributes instead of Request.meta as much as possible. Unlike Request.meta, Request attributes do affect request fingerprints in Scrapy versions older than Scrapy 2.7.
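The difference can be sketched with a toy model that needs only the standard library. FakeRequest is a hypothetical stand-in for scrapy.Request, and attribute_only_fingerprint is a simplified assumption about what a fingerprint that ignores Request.meta looks like. Neither is the actual Scrapy code.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request, reduced to the relevant fields."""

    url: str
    method: str = "GET"
    body: bytes = b""
    meta: dict = field(default_factory=dict)


def attribute_only_fingerprint(request: FakeRequest) -> bytes:
    """Hash the method, URL, and body; meta is deliberately ignored."""
    digest = hashlib.sha1()
    digest.update(request.method.encode("utf-8"))
    digest.update(request.url.encode("utf-8"))
    digest.update(request.body)
    return digest.digest()


url = "https://toscrape.com"

# The HTTP method is set only through meta, so the fingerprints collide.
meta_get = FakeRequest(url, meta={"zyte_api_automap": {}})
meta_post = FakeRequest(url, meta={"zyte_api_automap": {"httpRequestMethod": "POST"}})
collide = attribute_only_fingerprint(meta_get) == attribute_only_fingerprint(meta_post)

# The HTTP method is set through a request attribute, so the fingerprints differ.
attr_get = FakeRequest(url, meta={"zyte_api_automap": True})
attr_post = FakeRequest(url, method="POST", meta={"zyte_api_automap": True})
differ = attribute_only_fingerprint(attr_get) != attribute_only_fingerprint(attr_post)
```

In this model, the two requests that differ only in meta are indistinguishable, while the two that differ in the method attribute are not, which is the reason to prefer Request attributes on older Scrapy versions.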

For requests that must have the same Request attributes but should still be considered different, such as browser-based requests with different URL fragments, you can set dont_filter=True when creating your request to prevent the Scrapy duplicate filter from filtering any of them out. For example:

from scrapy import Request

# Inside a spider callback: both requests bypass the duplicate filter.
yield Request(
    "https://toscrape.com#1",
    meta={"zyte_api_automap": {"browserHtml": True}},
    dont_filter=True,
)
yield Request(
    "https://toscrape.com#2",
    meta={"zyte_api_automap": {"browserHtml": True}},
    dont_filter=True,
)

Note, however, that other Scrapy components, like the HTTP cache extension, would still consider these two requests identical.