scrapy-poet integration
If you followed the required steps for scrapy-poet integration during the initial setup, you can request supported page inputs in your page objects:
@attrs.define
class ProductPage(BasePage):
    response: BrowserResponse
    product: Product


class ZyteApiSpider(scrapy.Spider):
    ...

    def parse_page(self, response: DummyResponse, page: ProductPage):
        ...
Or request them directly in the callback:
class ZyteApiSpider(scrapy.Spider):
    ...

    def parse_page(
        self,
        response: DummyResponse,
        browser_response: BrowserResponse,
        product: Product,
    ):
        ...
Dependency annotations
ZyteApiProvider understands and makes use of some dependency annotations.
Note
Dependency annotations require Python 3.9+.
Item annotations
Item dependencies such as zyte_common_items.Product can be annotated directly. The only currently supported annotation is scrapy_zyte_api.ExtractFrom:
from typing import Annotated

from scrapy_zyte_api import ExtractFrom


@attrs.define
class MyPageObject(BasePage):
    product: Annotated[Product, ExtractFrom.httpResponseBody]
The provider sets the extraction options based on the annotations, so for this code extractFrom will be set to httpResponseBody in productOptions.
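To illustrate how an annotation can end up as an extraction option, here is a minimal sketch that uses only the standard library. It is not the actual ZyteApiProvider implementation: the ExtractFrom enum, the Product class, and the extraction_params() helper below are hypothetical stand-ins, and the mapping from the class name Product to the parameter name product is an assumption made for this example.

```python
from enum import Enum
from typing import Annotated, get_args, get_origin, get_type_hints


class ExtractFrom(str, Enum):
    # Hypothetical stand-in for scrapy_zyte_api.ExtractFrom.
    httpResponseBody = "httpResponseBody"
    browserHtml = "browserHtml"


class Product:
    # Hypothetical stand-in for zyte_common_items.Product.
    pass


class MyPageObject:
    product: Annotated[Product, ExtractFrom.httpResponseBody]


def extraction_params(page_cls):
    """Build Zyte API-style parameters from ExtractFrom annotations.

    Hypothetical helper, not part of scrapy-zyte-api.
    """
    params = {}
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(page_cls, include_extras=True)
    for hint in hints.values():
        if get_origin(hint) is not Annotated:
            continue
        item_cls, *metadata = get_args(hint)
        # Assumed naming rule: "Product" -> "product".
        name = item_cls.__name__[0].lower() + item_cls.__name__[1:]
        params[name] = True
        for value in metadata:
            if isinstance(value, ExtractFrom):
                params[f"{name}Options"] = {"extractFrom": value.value}
    return params


print(extraction_params(MyPageObject))
```

The sketch relies on typing.get_type_hints() with include_extras=True, which is available from Python 3.9, the same minimum version that dependency annotations require.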
Geolocation
You can specify the geolocation field by adding a scrapy_zyte_api.Geolocation dependency and annotating it with a country code:
from typing import Annotated

from scrapy_zyte_api import Geolocation


@attrs.define
class MyPageObject(BasePage):
    product: Product
    geolocation: Annotated[Geolocation, "DE"]
Browser actions
You can specify browser actions by adding a scrapy_zyte_api.Actions dependency and annotating it with actions passed to the scrapy_zyte_api.actions() function:
from typing import Annotated

from scrapy_zyte_api import Actions, actions


@attrs.define
class MyPageObject(BasePage):
    product: Product
    actions: Annotated[
        Actions,
        actions(
            [
                {
                    "action": "click",
                    "selector": {"type": "css", "value": "button#openDescription"},
                    "delay": 0,
                    "button": "left",
                    "onError": "return",
                },
                {"action": "waitForTimeout", "timeout": 5, "onError": "return"},
            ]
        ),
    ]
You can access the results of these actions in the Actions.results attribute of the dependency in the resulting page object:
def validate_input(self):
    # Mark the product as invalid if any browser action did not succeed.
    for action_result in self.actions.results:
        if action_result["status"] != "success":
            return Product(is_valid=False)
    return None
Custom parameters
scrapy-poet integration ignores both manual and automatic Zyte API parameters.
Whenever you can, use inputs and dependency annotations to get additional Zyte API parameters into Zyte API requests made by the scrapy-poet integration.
If that is not possible, you can add Zyte API parameters to requests made by the scrapy-poet integration with the zyte_api_provider request metadata key or the ZYTE_API_PROVIDER_PARAMS setting.
When zyte_api_provider or ZYTE_API_PROVIDER_PARAMS include one of the Zyte API extraction option parameters (e.g. productOptions for product), but the final Zyte API request does not include the corresponding extraction type, the unused options are automatically removed. So, it is safe to use ZYTE_API_PROVIDER_PARAMS to set the default options for various extraction types:
ZYTE_API_PROVIDER_PARAMS = {
    "productOptions": {"extractFrom": "httpResponseBody"},
    "productNavigationOptions": {"extractFrom": "httpResponseBody"},
}
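The removal of unused options can be pictured with the following sketch. It is a simplified illustration and not the provider's actual code: drop_unused_options() is a hypothetical helper, and it assumes that every key ending in "Options" belongs to the extraction type named by the rest of the key.

```python
def drop_unused_options(params):
    """Return a copy of params without options for unrequested extraction types.

    Hypothetical helper: assumes "<type>Options" always pairs with "<type>".
    """
    suffix = "Options"
    cleaned = {}
    for key, value in params.items():
        # Skip e.g. "productNavigationOptions" when "productNavigation" is absent.
        if key.endswith(suffix) and not params.get(key[: -len(suffix)]):
            continue
        cleaned[key] = value
    return cleaned


defaults = {
    "productOptions": {"extractFrom": "httpResponseBody"},
    "productNavigationOptions": {"extractFrom": "httpResponseBody"},
}
# A request that only asks for product extraction.
request_params = {**defaults, "product": True}
cleaned = drop_unused_options(request_params)
print(cleaned)
```

In this sketch, a request that only asks for product keeps productOptions and loses productNavigationOptions, which is why setting defaults for several extraction types at once does not leak unrelated options into a request.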
When both zyte_api_provider and ZYTE_API_PROVIDER_PARAMS are defined, they are combined, with zyte_api_provider taking precedence in case of conflict.
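The precedence rule can be sketched with plain dictionaries. This is only an illustration under an assumption: it performs a shallow, key-level merge, and the text above does not say whether nested dictionaries are merged recursively. The combine_provider_params() helper is hypothetical.

```python
def combine_provider_params(setting_params, meta_params):
    """Combine setting-level and request-level parameters.

    Hypothetical helper; assumes a shallow merge where request metadata wins.
    """
    return {**setting_params, **meta_params}


# Value of the ZYTE_API_PROVIDER_PARAMS setting.
setting_params = {
    "productOptions": {"extractFrom": "httpResponseBody"},
    "productNavigationOptions": {"extractFrom": "httpResponseBody"},
}
# Value of the zyte_api_provider request metadata key.
meta_params = {"productOptions": {"extractFrom": "browserHtml"}}

combined = combine_provider_params(setting_params, meta_params)
print(combined)
```

Here the request-level productOptions replaces the setting-level one, while productNavigationOptions, which is only defined in the setting, is carried over unchanged.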