When you create a Scrapy project with scrapy startproject myproject, you'll find a pipelines.py file already available for creating your own pipelines. Putting your pipelines in this file isn't mandatory, but it is good practice. The following shows how to create a pipeline using the pipelines.py file:
pipelines.py
class MyPipeline(object):
    def process_item(self, item, spider):
        # process your `item` here
        return item
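To make the placeholder comment more concrete, here is a minimal sketch of a pipeline that tidies up one field. The class name StripTitlePipeline and the title field are hypothetical, and the sketch assumes the spider yields plain dict items; other item types would need different field access.

```python
class StripTitlePipeline:
    """Hypothetical pipeline: trims surrounding whitespace from a 'title' field."""

    def process_item(self, item, spider):
        # Assumption: items are plain dicts; only touch the field if it is a string
        title = item.get("title")
        if isinstance(title, str):
            item["title"] = title.strip()
        # Always return the item so later pipelines can keep processing it
        return item


# Pipelines are ordinary classes, so they can be exercised without running a crawl
cleaned = StripTitlePipeline().process_item({"title": "  Hello  "}, spider=None)
```

Because process_item is a regular method, calling it directly like this is a convenient way to check a pipeline's behaviour before enabling it.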
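To enable the pipeline, you need to declare it in your settings. Go to your settings.py file and find (or add) the ITEM_PIPELINES variable. Update it with the path to your pipeline class and its priority relative to other pipelines. The integer sets the order in which pipelines run: lower values run first, and values are conventionally kept in the 0-1000 range: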
settings.py
ITEM_PIPELINES = {
    'myproject.pipelines.MyPipeline': 300,
}
Now every item that your spider returns will go through this pipeline.
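When several pipelines are enabled, each item passes through them in order of their priority value. The following plain-Python sketch only imitates that ordering to show why it matters; it is not Scrapy's internal code, and the class names, the fields, and the run_pipelines helper are all hypothetical.

```python
class AddCurrencyPipeline:
    """Hypothetical pipeline: fills in a default currency."""

    def process_item(self, item, spider):
        item.setdefault("currency", "USD")
        return item


class FormatPricePipeline:
    """Hypothetical pipeline: relies on 'currency' having been set already."""

    def process_item(self, item, spider):
        item["display"] = f"{item['price']} {item['currency']}"
        return item


# Same shape as ITEM_PIPELINES, but keyed by class instead of dotted path
pipelines = {FormatPricePipeline: 800, AddCurrencyPipeline: 300}


def run_pipelines(item, pipelines, spider=None):
    # Illustration only: apply the pipelines from lowest to highest value
    for pipeline_cls in sorted(pipelines, key=pipelines.get):
        item = pipeline_cls().process_item(item, spider)
    return item


result = run_pipelines({"price": 10}, pipelines)
```

Here AddCurrencyPipeline (300) runs before FormatPricePipeline (800), so the currency is already present when the display string is built. Swapping the two values would make the formatting step fail with a KeyError.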