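I'm trying to extract only the image URLs of each product from a JSON structure: only the ones with a .jpg extension, together with the name available in "alt". For example (also shown below): "attributes" > "media_map" > ("b", "c", "d", "e" are available) > "src", and then "medium", "lg", "xl", "xxl".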
```
  "a218": {
    "label": "Shape",
    "field_type": "button_select",
    "value_order": [
      "v766",
      "v767"
    ],
    "values": {
      "v766": {
        "label": "Round",
        "value": "S6CBRO",
        "price": 35
      },
      "v767": {
        "label": "Rectangle",
        "value": "S6CBRE",
        "price": 35,
        "hypotheticalPrice": 24.5
      }
    }
  }
},
"inventory": {
  "stock": 0,
  "sold": 0,
  "total": 0
},
"optional": {},
"media_map": {
  "b": {
    "src": {
      "xs": "https://ctl.s6img.com/society6/img/xVx1vleu7iLcR79ZkRZKqQiSzZE/w_125/artwork/~artwork/s6-0041/a/18613683_5971445",
      "lg": "https://ctl.s6img.com/society6/img/W-ESMqUtC_oOEUjx-1E_SyIdueI/w_550/artwork/~artwork/s6-0041/a/18613683_5971445",
      "xl": "https://ctl.s6img.com/society6/img/z90VlaYwd8cxCqbrZ1ttAxINpaY/w_700/artwork/~artwork/s6-0041/a/18613683_5971445",
      "xxl": null
    },
    "type": "image",
    "alt": "I'M NOT ALWAYS A BITCH (Red) Cutting Board",
    "meta": null
  },
  "c": {
    "src": {
      "xs": "https://ctl.s6img.com/society6/img/KQJbb4jG0gBHcqQiOCivLUbKMxI/w_125/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg",
      "lg": "https://ctl.s6img.com/society6/img/ztGrxSpA7FC1LfzM3UldiQkEi7g/w_550/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg",
      "xl": "https://ctl.s6img.com/society6/img/PHjp9jDic2NGUrpq8k0aaxsYZr4/w_700/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg",
      "xxl": "https://ctl.s6img.com/society6/img/m-1HhSM5CIGl6DY9ukCVxSmVDIw/w_1500/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg"
```
Below is my code. I'm able to access "media_map", but I don't know how to get the URLs with the .jpg extension:
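```
import csv
import json
from urllib.request import urlopen

from bs4 import BeautifulSoup

contents = []
with open('urls.csv', 'r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)  # Add each row to contents; the URL is the row's first field

newlist = []
for url in contents:
    try:
        page = urlopen(url[0]).read()
        soup = BeautifulSoup(page, 'html.parser')
        # Take the 8th <script> tag and skip the 24 characters before the JSON
        scripts = soup.find_all('script')[7].text.strip()[24:]
        data = json.loads(scripts)
        link = data['product']['response']['product']['data']['attributes']['media_map']
    except Exception as e:  # placeholder: the posted snippet ends before its except clause
        print(f'Failed to process {url[0]}: {e}')
```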
Every product has "b", "c", "d" or "b", "c", "d", "e", "f", and some products have only "b" and "c".
I'm new to scraping and I'm stuck at this point.
Instead of

```
link = data['product']['response']['product']['data']['attributes']['media_map']
```

use

```
mediaMap = data['product']['response']['product']['data']['attributes']['media_map']
```

and then you can extract whatever you want from `mediaMap`.
If you want the alt texts:

```
mediaAlts = [m['alt'] for m in mediaMap.values() if 'alt' in m]
```

(If you only want the first one, take `mediaAlts[0]`.)
Or, if you only want the alt texts of images:

```
imgAlts = [
    m['alt'] for m in mediaMap.values()
    if 'alt' in m
    and 'type' in m and m['type'] == 'image'
]
```
If you want all the src links from the first object in media_map:

```
m1srcs = list(list(mediaMap.values())[0]['src'].values())
```
To filter these down to jpg only (the `isinstance` check skips `null` entries such as "xxl"):

```
m1srcs = [s for s in m1srcs if isinstance(s, str) and s.endswith('.jpg')]
```
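One caveat with `endswith('.jpg')`: it would miss a URL that carries a query string or an upper-case extension. The URLs in the question have neither, so the following is only a defensive sketch. The helper name `is_jpg_url` and the sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def is_jpg_url(value):
    """Return True if value is a string URL whose path ends in .jpg (case-insensitive)."""
    if not isinstance(value, str):  # src entries such as "xxl" can be null/None
        return False
    # urlparse separates the path from any ?query or #fragment
    return urlparse(value).path.lower().endswith('.jpg')


# Hypothetical src mapping, shaped like one "src" object from media_map
sample_src = {
    'xs': 'https://example.com/img/w_125/board.jpg',
    'lg': 'https://example.com/img/w_550/board.JPG?v=2',
    'xl': 'https://example.com/img/w_700/artwork/18613683_5971445',
    'xxl': None,
}

jpg_links = [s for s in sample_src.values() if is_jpg_url(s)]
print(jpg_links)
```

If the site only ever serves plain `.jpg` paths, the simpler `endswith` check is enough.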
Edit:

For all jpg images that have alts:

```
altJpgs = [
    src
    for srcs in [
        [
            s for s in mv['src'].values()
            if isinstance(s, str) and s.endswith('.jpg')
        ]
        for mv in mediaMap.values()
        if isinstance(mv, dict) and 'src' in mv
        and 'alt' in mv  # has alt
        and 'type' in mv and mv['type'] == 'image'  # has type listed as image
    ]
    for src in srcs
]
```
Or, since a for loop is probably more readable than a list comprehension in this case:

```
altJpgs = []
for mv in mediaMap.values():
    if not isinstance(mv, dict) or 'src' not in mv: continue
    if 'alt' not in mv: continue
    if 'type' not in mv or mv['type'] != 'image': continue
    for s in mv['src'].values():
        if isinstance(s, str) and s.endswith('.jpg'):
            altJpgs.append(s)
```
(Edit or remove any of the `if ...` lines to adjust the filters.)
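Since the question asks for the alt text together with the jpg URLs, the loop above could also keep each alt next to its links. This is a minimal sketch, assuming `media_map` has the shape shown in the question; the function name `jpgs_by_alt` and the shortened sample data are illustrative only.

```python
def jpgs_by_alt(media_map):
    """Collect (alt, url) pairs for every .jpg src of image entries in media_map."""
    pairs = []
    for entry in media_map.values():
        if not isinstance(entry, dict):
            continue
        # Keep only entries marked as images that carry an alt text
        if entry.get('type') != 'image' or 'alt' not in entry:
            continue
        src = entry.get('src')
        if not isinstance(src, dict):
            continue
        for url in src.values():
            # Skip null sizes and URLs without a .jpg extension
            if isinstance(url, str) and url.endswith('.jpg'):
                pairs.append((entry['alt'], url))
    return pairs


# Shortened, made-up stand-in for the media_map in the question
sample_media_map = {
    'b': {
        'src': {'xs': 'https://example.com/w_125/artwork/18613683_5971445', 'xxl': None},
        'type': 'image',
        'alt': 'Cutting Board',
        'meta': None,
    },
    'c': {
        'src': {
            'xs': 'https://example.com/w_125/cutting-board.jpg',
            'lg': 'https://example.com/w_550/cutting-board.jpg',
        },
        'type': 'image',
        'alt': 'Cutting Board',
        'meta': None,
    },
}

for alt, url in jpgs_by_alt(sample_media_map):
    print(alt, url)
```

Inside the scraping loop you would call `jpgs_by_alt(mediaMap)` for each product and append the pairs to whatever list you are collecting.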