Python: downloading images into folders

Question

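I have a problem with Python and Scrapy. I think the script works and puts all the data into MongoDB, but when it scrapes, the photos only end up in the database. I want them downloaded into this structure: /project/photos/link-page/name.jpg

Here is my code. This is items.py: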

import scrapy
from PIL import Image
class RedditItem(scrapy.Item):
    '''
    Defining the storage containers for the data we
    plan to scrape
    '''

    title = scrapy.Field()
    photoLink = scrapy.Field()

This is from settings.py:

ITEM_PIPELINES = {'scrapy.contrib.pipeline.images.ImagesPipeline': 1}
IMAGES_STORE = '/ProjectX/reddit/reddit/photos/'

And here is scrapper.py:

from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import HtmlResponse
from scrapy.selector import Selector
from datetime import datetime as dt
import scrapy
from reddit.items import RedditItem
from PIL import Image

# Spider method (the enclosing spider class is not shown)
def parse_following_urls(self, response):
    item = RedditItem()
    item['title'] = response.css('h1.kiwii-font-xlarge::text').extract_first()
    item['photoLink'] = response.css("div.kiwii-carousel-picture span::attr(src)").extract()

Tags: python-2.7 scrapy

1 Answer

If you want to store images as, for example, {IMAGES_STORE}/link-page/name.jpg, you need to subclass the default ImagesPipeline class and override its file_path method.

For example:

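from scrapy.pipelines.images import ImagesPipeline

class MyImagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        # Return a path relative to IMAGES_STORE, e.g. 'link-page/name.jpg'.
        # Placeholder: falls back to the default path until you add your own logic.
        return super(MyImagesPipeline, self).file_path(request, response=response, info=info)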
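
The path logic left open in file_path above could look roughly like the sketch below. It is only one possible approach, not something taken from the question: build_image_path is a hypothetical helper, and it assumes the folder should be the last segment of the page URL and the file name the last segment of the image URL. It is written for Python 3; on Python 2.7 the import would be `from urlparse import urlparse`.

```python
import posixpath
from urllib.parse import urlparse


def build_image_path(page_url, image_url):
    """Return a path relative to IMAGES_STORE, shaped like 'link-page/name.jpg'.

    Hypothetical helper: the naming scheme is an assumption, adjust as needed.
    """
    # Folder: last non-empty segment of the page URL's path, else the host name.
    page = urlparse(page_url)
    segments = [s for s in page.path.split("/") if s]
    folder = segments[-1] if segments else page.netloc
    # File name: last segment of the image URL's path (query string is ignored).
    name = posixpath.basename(urlparse(image_url).path) or "image.jpg"
    return posixpath.join(folder, name)


print(build_image_path("https://example.com/ads/red-bike/",
                       "https://img.example.com/p/123.jpg?w=800"))
# red-bike/123.jpg
```

Inside file_path, request.url is the image URL. The page URL is not available there by default, so one option is to override get_media_requests and attach it to each request's meta (for instance under a key such as 'page_url', a name chosen here purely for illustration), then read it back in file_path.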

Then register the subclass as a pipeline in your settings file, in place of the default ImagesPipeline:

ITEM_PIPELINES = {'your_project.pipelines.MyImagesPipeline': 1}
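
Putting the settings together, a settings.py fragment might look like the sketch below. It assumes the subclass is named MyImagesPipeline and lives in your_project/pipelines.py (a placeholder module path), and that the default get_media_requests is kept. By default the images pipeline reads URLs from an item field called image_urls, so the IMAGES_URLS_FIELD setting is used here to point it at the photoLink field from the question.

```python
# settings.py (sketch; 'your_project' is a placeholder package name)

# Use the custom subclass instead of the stock ImagesPipeline.
ITEM_PIPELINES = {'your_project.pipelines.MyImagesPipeline': 1}

# Root directory for downloads; file_path() results are relative to it.
IMAGES_STORE = '/ProjectX/reddit/reddit/photos/'

# Item field holding the list of image URLs (the default is 'image_urls').
IMAGES_URLS_FIELD = 'photoLink'
```

If get_media_requests is overridden to build the requests yourself, the IMAGES_URLS_FIELD line is not needed.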