Spiders Contracts

Testing spiders can be tedious: nothing prevents you from writing unit tests, but the task quickly becomes cumbersome. Scrapy offers an integrated way of testing your spiders by means of contracts.

Contracts let you test each callback of your spider by hardcoding a sample URL and checking various constraints on how the callback processes the response. Each contract is prefixed with an @ and included in the callback's docstring. See the following example:

def parse(self, response):
    """
    This function parses a sample response. Some contracts are mingled
    with this docstring.

    @url http://www.example.com/s?field-keywords=selfish+gene
    @returns items 1 16
    @returns requests 0 0
    @scrapes Title Author Year Price
    """

You can use the following contracts:

class scrapy.contracts.default.UrlContract(method: Callable[..., Any], *args: Any)

Sets (@url) the sample URL used when checking the other contract conditions of a callback.

This contract is mandatory: callbacks lacking it are ignored when running the checks.

@url url

class scrapy.contracts.default.CallbackKeywordArgumentsContract(method: Callable[..., Any], *args: Any)

Sets (@cb_kwargs) the cb_kwargs attribute of the sample request.

Its value must be a valid JSON dictionary.

@cb_kwargs {"arg1": "value1", "arg2": "value2", ...}

class scrapy.contracts.default.MetadataContract(method: Callable[..., Any], *args: Any)

Sets (@meta) the meta attribute of the sample request.

Its value must be a valid JSON dictionary.

@meta {"arg1": "value1", "arg2": "value2", ...}
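Because both @cb_kwargs and @meta take a JSON dictionary, keys and strings must be double-quoted rather than written as Python literals. The sketch below shows a callback whose docstring carries both contracts and uses the standard json module to confirm that each value decodes to a dict. The callback name parse_product, its category and page parameters, the URL path, and the contract_value helper are hypothetical; the helper only illustrates the JSON requirement and is not Scrapy's own docstring parser.

```python
import json


def parse_product(self, response, category, page):
    """Hypothetical callback that receives two keyword arguments.

    @url http://www.example.com/products
    @cb_kwargs {"category": "books", "page": 2}
    @meta {"source": "contract-check"}
    """


def contract_value(callback, name):
    """Return the JSON-decoded value of the first ``@name`` docstring line.

    Illustrative helper only, not part of Scrapy.
    """
    prefix = f"@{name} "
    for line in callback.__doc__.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            value = json.loads(line[len(prefix):])
            if not isinstance(value, dict):
                raise ValueError(f"@{name} must be a JSON dictionary")
            return value
    raise LookupError(f"no @{name} contract found")


# Both values decode to plain dicts, so they are valid contract arguments.
cb_kwargs = contract_value(parse_product, "cb_kwargs")
meta = contract_value(parse_product, "meta")
```

With these contracts, the sample request would be expected to call parse_product with category and page as keyword arguments.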

class scrapy.contracts.default.ReturnsContract(*args: Any, **kwargs: Any)

Sets (@returns) lower and upper bounds for the items and requests returned by a callback.

The upper bound is optional:

@returns item(s)|request(s) [min [max]]

For example:

@returns request
@returns request 2
@returns request 2 10
@returns request 0 10

Set both bounds to the same value to require an exact number:

@returns request 2 2
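The bound semantics can be sketched in a few lines of plain Python. The within_bounds helper is hypothetical and is not Scrapy's implementation. It treats both bounds as inclusive, which is what makes equal bounds mean "exactly", and it takes the lower bound explicitly rather than assuming a default.

```python
def within_bounds(count, lower, upper=None):
    """Return True when ``count`` satisfies the given bounds.

    ``upper=None`` means the contract gave no upper bound.
    """
    if count < lower:
        return False
    return upper is None or count <= upper


# "@returns request 2": at least two requests, no upper limit.
at_least_two = [within_bounds(n, 2) for n in (1, 2, 50)]

# "@returns request 2 10": between two and ten requests, inclusive.
two_to_ten = [within_bounds(n, 2, 10) for n in (1, 2, 10, 11)]

# "@returns request 2 2": exactly two requests.
exactly_two = [within_bounds(n, 2, 2) for n in (1, 2, 3)]
```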

class scrapy.contracts.default.ScrapesContract(method: Callable[..., Any], *args: Any)

Checks (@scrapes) that all items returned by a callback have the specified fields.

@scrapes field_1 field_2 ...
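As an illustration of what this check amounts to, the sketch below reports which required fields are absent from each item. The missing_fields helper and the sample items are hypothetical, and the items are assumed to be plain dicts; Scrapy's own check may differ in how it handles other item types.

```python
def missing_fields(items, required):
    """Map the index of each incomplete item to the fields it lacks."""
    problems = {}
    for index, item in enumerate(items):
        missing = [field for field in required if field not in item]
        if missing:
            problems[index] = missing
    return problems


# Fields as they would appear in "@scrapes Title Author Year Price".
required = "Title Author Year Price".split()

# Made-up sample data: the second item lacks two of the required fields.
items = [
    {"Title": "Sample title", "Author": "Sample author", "Year": 2001, "Price": "9.99"},
    {"Title": "Untitled draft", "Price": "4.50"},
]

problems = missing_fields(items, required)
```

An empty result means every item has all the listed fields, which is the condition the contract requires.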

Use the check command to run the contract checks.

Custom Contracts

If you need more power than the built-in Scrapy contracts provide, you can create and load your own contracts in the project by using the SPIDER_CONTRACTS setting:

SPIDER_CONTRACTS = {
    "myproject.contracts.ResponseCheck": 10,
    "myproject.contracts.ItemValidate": 10,
}

Each contract must inherit from Contract and can override three methods:

class scrapy.contracts.Contract(method: Callable[..., Any], *args: Any)

Base class for custom contracts.

method is the callback function to which the contract is associated.

args is the list of arguments that follow the contract name in the docstring, split on whitespace.

Subclasses may override adjust_request_args(), and define a pre_process method or a post_process method, or both.

adjust_request_args(args: dict[str, Any]) → dict[str, Any]

Receive a dict with the default arguments for the sample request and return it, either unmodified or with changes.

Request is used by default, but this can be changed with the request_cls attribute. If multiple contracts in the chain define this attribute, the last one is used.

pre_process(response): This allows hooking in various checks on the response received from the sample request, before it is passed to the callback.

post_process(output): This allows processing the output of the callback. Iterators are converted to lists before being passed to this hook.

Raise ContractFail from pre_process or post_process if expectations are not met:

class scrapy.exceptions.ContractFail: Error raised in case of a failing contract.

Here is a demo contract which checks the presence of a custom header in the response received:

from scrapy.contracts import Contract
from scrapy.exceptions import ContractFail


class HasHeaderContract(Contract):
    """
    Demo contract which checks the presence of a custom header
    @has_header X-CustomHeader
    """

    name = "has_header"

    def pre_process(self, response):
        for header in self.args:
            if header not in response.headers:
                raise ContractFail(f"{header} not present")
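The demo contract only uses pre_process. The other two hooks, adjust_request_args and post_process, might be combined as in the minimal sketch below. The contract name non_empty, the class NonEmptyFieldsContract, and the User-Agent value are hypothetical, and the sketch assumes the callback returns plain dicts as items. The try/except fallback defines simple stand-ins only so the example can be run without Scrapy installed; in a real project you would import Contract and ContractFail directly.

```python
try:
    from scrapy.contracts import Contract
    from scrapy.exceptions import ContractFail
except ImportError:
    # Stand-ins so the sketch can be run without Scrapy installed.
    class ContractFail(AssertionError):
        pass

    class Contract:
        def __init__(self, method, *args):
            self.args = args


class NonEmptyFieldsContract(Contract):
    """
    Hypothetical contract: send a fixed User-Agent with the sample request
    and require the listed fields to be non-empty in every returned dict.
    @non_empty title price
    """

    name = "non_empty"

    def adjust_request_args(self, args):
        # args holds the arguments that will build the sample request.
        headers = dict(args.get("headers") or {})
        headers["User-Agent"] = "contract-check"
        args["headers"] = headers
        return args

    def post_process(self, output):
        # output is a list here, even if the callback is a generator.
        for entry in output:
            if not isinstance(entry, dict):
                continue  # skip requests and other non-dict output
            for field in self.args:
                # Falsy values such as "" or None count as empty.
                if not entry.get(field):
                    raise ContractFail(f"{field} is empty or missing")
```

Like any custom contract, it would take effect only once it is listed in SPIDER_CONTRACTS and referenced in a callback docstring, for example as @non_empty title price.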

Detecting check runs

When scrapy check is running, the SCRAPY_CHECK environment variable is set to the string true. You can use os.environ to adjust your spiders or your settings when scrapy check is used:

import os
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"

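    def __init__(self, *args, **kwargs):
        # Let the base Spider class finish its own initialization first.
        super().__init__(*args, **kwargs)
        if os.environ.get("SCRAPY_CHECK"):
            pass  # Do some scraper adjustments when a check is running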
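To make the adjustment concrete, the test can be wrapped in a small helper that reads the environment each time it is called, so it can be used from __init__ as in the example above. The names is_check_run and page_limit, and the limits 1 and 50, are hypothetical; this is a sketch of one possible adjustment, not a Scrapy API.

```python
import os


def is_check_run(environ=None):
    """Return True when SCRAPY_CHECK is set to a non-empty value."""
    if environ is None:
        environ = os.environ
    return bool(environ.get("SCRAPY_CHECK"))


def page_limit(environ=None):
    """Hypothetical adjustment: follow a single page during a check run."""
    return 1 if is_check_run(environ) else 50
```

Passing a mapping explicitly, as in page_limit({"SCRAPY_CHECK": "true"}), makes the behaviour easy to exercise in a unit test without touching the real environment.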