scrapy-poet integration

If you completed the scrapy-poet integration steps during the initial setup, you can request supported page inputs in your page objects:

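import attrs
import scrapy
from scrapy_poet import DummyResponse
from web_poet import BrowserResponse
from zyte_common_items import BasePage, Product


@attrs.define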
class ProductPage(BasePage):
    response: BrowserResponse
    product: Product


class ZyteApiSpider(scrapy.Spider):
    ...

    def parse_page(self, response: DummyResponse, page: ProductPage):
        ...

Or request them directly in the callback:

class ZyteApiSpider(scrapy.Spider):
    ...

    def parse_page(
        self,
        response: DummyResponse,
        browser_response: BrowserResponse,
        product: Product,
    ):
        ...

Dependency annotations

ZyteApiProvider understands and makes use of some dependency annotations.

Note

Dependency annotations require Python 3.9+.

Item annotations

Item dependencies such as zyte_common_items.Product can be annotated directly. The only currently supported annotation is scrapy_zyte_api.ExtractFrom:

from typing import Annotated

import attrs
from scrapy_zyte_api import ExtractFrom
from zyte_common_items import BasePage, Product


@attrs.define
class MyPageObject(BasePage):
    product: Annotated[Product, ExtractFrom.httpResponseBody]

The provider will set the extraction options based on the annotations, so for this code extractFrom will be set to httpResponseBody in productOptions.
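As a rough illustration, the request parameters resulting from the annotation above might look like the dictionary below. The URL is a placeholder, and the assumption that no other parameters are sent is made only to keep the sketch short.

```python
# Sketch of the Zyte API parameters expected for
# Annotated[Product, ExtractFrom.httpResponseBody].
# The URL is a placeholder; the provider may send additional parameters.
expected_params = {
    "url": "https://example.com/product",
    "product": True,
    "productOptions": {"extractFrom": "httpResponseBody"},
}

print(expected_params["productOptions"])
```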

Geolocation

You can specify the geolocation field by adding a scrapy_zyte_api.Geolocation dependency and annotating it with a country code:

from typing import Annotated

import attrs
from scrapy_zyte_api import Geolocation
from zyte_common_items import BasePage, Product


@attrs.define
class MyPageObject(BasePage):
    product: Product
    geolocation: Annotated[Geolocation, "DE"]

Browser actions

You can specify browser actions by adding a scrapy_zyte_api.Actions dependency and annotating it with actions passed to the scrapy_zyte_api.actions() function:

from typing import Annotated

import attrs
from scrapy_zyte_api import Actions, actions
from zyte_common_items import BasePage, Product


@attrs.define
class MyPageObject(BasePage):
    product: Product
    actions: Annotated[
        Actions,
        actions(
            [
                {
                    "action": "click",
                    "selector": {"type": "css", "value": "button#openDescription"},
                    "delay": 0,
                    "button": "left",
                    "onError": "return",
                },
                {"action": "waitForTimeout", "timeout": 5, "onError": "return"},
            ]
        ),
    ]

You can access the results of these actions in the Actions.results attribute of the dependency in the resulting page object:

def validate_input(self):
    for action_result in self.actions.results:
        if action_result["status"] != "success":
            return Product(is_valid=False)
    return None

Custom parameters

The scrapy-poet integration ignores both manual and automatic Zyte API request parameters.

Whenever you can, use inputs and dependency annotations to get additional Zyte API parameters into Zyte API requests made by the scrapy-poet integration.

If that is not possible, you can add Zyte API parameters to requests made by the scrapy-poet integration with the zyte_api_provider request metadata key or the ZYTE_API_PROVIDER_PARAMS setting.

When zyte_api_provider or ZYTE_API_PROVIDER_PARAMS include one of the Zyte API extraction option parameters (e.g. productOptions for product), but the final Zyte API request does not include the corresponding extraction type, the unused options are automatically removed. So, it is safe to use ZYTE_API_PROVIDER_PARAMS to set the default options for various extraction types:

settings.py

ZYTE_API_PROVIDER_PARAMS = {
    "productOptions": {"extractFrom": "httpResponseBody"},
    "productNavigationOptions": {"extractFrom": "httpResponseBody"},
}
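The automatic removal of unused options described above can be pictured with a small sketch. The helper drop_unused_options is hypothetical and is not part of scrapy-zyte-api; it assumes that each "...Options" key pairs with an extraction type of the same name without the suffix, and the URL is a placeholder.

```python
def drop_unused_options(params: dict) -> dict:
    """Hypothetical helper, not part of scrapy-zyte-api.

    Drops every "<type>Options" entry whose "<type>" is not requested.
    """
    cleaned = {}
    for key, value in params.items():
        if key.endswith("Options"):
            extraction_type = key[: -len("Options")]
            if not params.get(extraction_type):
                continue  # options for an extraction type that is not requested
        cleaned[key] = value
    return cleaned


defaults = {
    "productOptions": {"extractFrom": "httpResponseBody"},
    "productNavigationOptions": {"extractFrom": "httpResponseBody"},
}
# Only product extraction is requested here (placeholder URL).
request_params = {"url": "https://example.com/p/1", "product": True, **defaults}

cleaned = drop_unused_options(request_params)
print(cleaned)
```

In this sketch, productOptions survives because product is requested, while productNavigationOptions is dropped.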

When both zyte_api_provider and ZYTE_API_PROVIDER_PARAMS are defined, they are combined, with zyte_api_provider taking precedence in case of conflict.
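A minimal sketch of that precedence rule, using plain dictionaries. It assumes a shallow merge at the top level; whether nested dictionaries are merged key by key is not stated here, so the sketch does not attempt it. The geolocation values are example data.

```python
# Value of the ZYTE_API_PROVIDER_PARAMS setting (example data).
provider_params_setting = {
    "productOptions": {"extractFrom": "httpResponseBody"},
    "geolocation": "US",
}

# Request metadata as it could be passed to a request (example data).
request_meta = {"zyte_api_provider": {"geolocation": "DE"}}

# Assumed shallow merge: keys from zyte_api_provider override the setting.
combined = {
    **provider_params_setting,
    **request_meta.get("zyte_api_provider", {}),
}
print(combined)
```

Here geolocation ends up as "DE" because the request metadata wins the conflict, and productOptions is kept from the setting.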