Scrapy 1.0 documentation

This documentation contains everything you need to know about Scrapy.

Getting help

Having trouble? We’d like to help!

Try the FAQ – it’s got answers to some common questions.
Looking for specific information? Try the Index or Module Index.
Search for information in the archives of the scrapy-users mailing list, or post a question.
Ask a question in the #scrapy IRC channel.
Report bugs with Scrapy in our issue tracker.

Command line tool: Learn about the command-line tool used to manage your Scrapy project.
Spiders: Write the rules to crawl your websites.
Selectors: Extract the data from web pages using XPath or CSS expressions.
Scrapy shell: Test your extraction code in an interactive environment.
Items: Define the data you want to scrape.
Item Loaders: Populate your items with the extracted data.
Item Pipeline: Post-process and store your scraped data.
Feed exports: Output your scraped data using different formats and storages.
Requests and Responses: Understand the classes used to represent HTTP requests and responses.
Link Extractors: Convenient classes to extract links to follow from pages.
Settings: Learn how to configure Scrapy and see all available settings.
Exceptions: See all available exceptions and their meaning.

Frequently Asked Questions: Get answers to the most frequently asked questions.
Debugging Spiders: Learn how to debug common problems with your Scrapy spiders.
Spiders Contracts: Learn how to use contracts for testing your spiders.
Common Practices: Get familiar with some common Scrapy practices.
Broad Crawls: Tune Scrapy for crawling a lot of domains in parallel.
Using Firefox for scraping: Learn how to scrape with Firefox and some useful add-ons.
Using Firebug for scraping: Learn how to scrape efficiently using Firebug.
Debugging memory leaks: Learn how to find and get rid of memory leaks in your crawler.
Downloading and processing files and images: Download files and/or images associated with your scraped items.
Ubuntu packages: Install the latest Scrapy packages easily on Ubuntu.
Deploying Spiders: Deploy your Scrapy spiders and run them on a remote server.
AutoThrottle extension: Adjust crawl rate dynamically based on load.
Benchmarking: Check how Scrapy performs on your hardware.
Jobs: pausing and resuming crawls: Learn how to pause and resume crawls for large spiders.

Architecture overview: Understand the Scrapy architecture.
Downloader Middleware: Customize how pages get requested and downloaded.
Spider Middleware: Customize the input and output of your spiders.
Extensions: Extend Scrapy with your custom functionality.
Core API: Use it in extensions and middlewares to extend Scrapy functionality.
Signals: See all available signals and how to work with them.
Item Exporters: Quickly export your scraped items to a file (XML, CSV, etc.).

Release notes: See what has changed in recent Scrapy versions.
Contributing to Scrapy: Learn how to contribute to the Scrapy project.
Versioning and API Stability: Understand Scrapy versioning and API stability.