scrapy-poet integration
If during the initial setup you followed the required steps for scrapy-poet integration, you can request supported page inputs in your page objects:
@attrs.define
class ProductPage(BasePage):
response: BrowserResponse
product: Product
class ZyteApiSpider(scrapy.Spider):
...
def parse_page(self, response: DummyResponse, page: ProductPage):
...
Or request them directly in the callback:
class ZyteApiSpider(scrapy.Spider):
...
def parse_page(self,
response: DummyResponse,
browser_response: BrowserResponse,
product: Product,
):
...
Dependency annotations
ZyteApiProvider understands and makes use of some dependency annotations.
Item annotations
Item dependencies such as zyte_common_items.Product can be annotated
directly. The only currently supported annotation is
scrapy_zyte_api.ExtractFrom:
from typing import Annotated
from scrapy_zyte_api import ExtractFrom
@attrs.define
class MyPageObject(BasePage):
product: Annotated[Product, ExtractFrom.httpResponseBody]
The provider will set the extraction options based on the annotations, so for
this code extractFrom will be set to httpResponseBody in
productOptions.
Geolocation
You can specify the geolocation field by adding a
scrapy_zyte_api.Geolocation dependency and annotating it with a
country code:
from typing import Annotated
from scrapy_zyte_api import Geolocation
@attrs.define
class MyPageObject(BasePage):
product: Product
geolocation: Annotated[Geolocation, "DE"]
Browser actions
You can specify browser actions by adding a scrapy_zyte_api.Actions
dependency and annotating it with actions passed to the
scrapy_zyte_api.actions() function:
from typing import Annotated
from scrapy_zyte_api import Actions, actions
@attrs.define
class MyPageObject(BasePage):
product: Product
actions: Annotated[
Actions,
actions(
[
{
"action": "click",
"selector": {"type": "css", "value": "button#openDescription"},
"delay": 0,
"button": "left",
"onError": "return",
},
{"action": "waitForTimeout", "timeout": 5, "onError": "return"},
]
),
]
You can access the results of these actions in the
Actions.results attribute of the dependency in the
resulting page object:
def validate_input(self):
for action_result in self.actions.result:
if action_result["status"] != "success":
return Product(is_valid=False)
return None
Custom attribute extraction
You can request custom attribute extraction by using either a
zyte_common_items.CustomAttributes dependency (if you need both the
attribute values and the attribute extraction metadata) or a
zyte_common_items.CustomAttributesValues dependency (if you only need
the values). You need to annotate it with input data as a dictionary and, if
needed, a dictionary with extraction options. You should use the
scrapy_zyte_api.custom_attrs() function to create the annotation:
from typing import Annotated
from scrapy_zyte_api import custom_attrs
from zyte_common_items import CustomAttributes
@attrs.define
class MyPageObject(BasePage):
product: Product
custom_attributes: Annotated[
CustomAttributes,
custom_attrs(
{"name": {"type": "string", "description": "name of the product"}},
{"method": "generate"},
),
]
You can then access the results as the dependency value:
def parse_page(self, response: DummyResponse, page: MyPageObject):
...
for k, v in page.custom_attributes.values.items():
...
Custom parameters
scrapy-poet integration ignores both manual and automatic Zyte API parameters.
Whenever you can, use inputs and dependency annotations to get additional Zyte API parameters into Zyte API requests made by the scrapy-poet integration.
If that is not possible, you can add Zyte API parameters to requests made by
the scrapy-poet integration with the zyte_api_provider request
metadata key or the ZYTE_API_PROVIDER_PARAMS setting.
When zyte_api_provider or ZYTE_API_PROVIDER_PARAMS
include one of the Zyte API extraction option parameters (e.g.
productOptions for product), but the final Zyte API request does not
include the corresponding extraction type, the unused options are automatically
removed. So, it is safe to use ZYTE_API_PROVIDER_PARAMS to set the
default options for various extraction types:
ZYTE_API_PROVIDER_PARAMS = {
"productOptions": {"extractFrom": "httpResponseBody"},
"productNavigationOptions": {"extractFrom": "httpResponseBody"},
}
When both zyte_api_provider and ZYTE_API_PROVIDER_PARAMS
are defined, they are combined, with zyte_api_provider taking
precedence in case of conflict.