My DataFrame has two columns, diff and diff2.
Example values of diff:
{'paths': {'modified': {'/v1/authorization/details/byDate': {'operations': {'modified': {'POST': {'requestBody': {'added': True}}}}}}}, 'endpoints': {'modified': {'{ method: POST, path: /v1/authorization/details/byDate }': {'requestBody': {'added': True}}}}}
{'info': {'version': {'from': '1.0.2', 'to': '1.0.3'}}, 'paths': {'modified': {'/equipment-status': {'operations': {'modified': {'GET': {'parameters': {'modified': {'query': {'pei': {'schema': {'pattern': {'from': '^(imei-[0-9]{15}|imeisv-[0-9]{16}|.+)$', 'to': '^(imei-[0-9]{15}|imeisv-[0-9]{16}|mac([0-9a-fA-F]{2})((-[0-9a-fA-F]{2}){5})|.+)$'}}}}}}}}}}}}, 'endpoints': {'modified': {'{ method: GET, path: /equipment-status }': {'parameters': {'modified': {'query': {'pei': {'schema': {'pattern': {'from': '^(imei-[0-9]{15}|imeisv-[0-9]{16}|.+)$', 'to': '^(imei-[0-9]{15}|imeisv-[0-9]{16}|mac([0-9a-fA-F]{2})((-[0-9a-fA-F]{2}){5})|.+)$'}}}}}}}}}, 'externalDocs': {'description': {'from': '3GPP TS 29.511 V15.4.0; 5G System; Equipment Identity Register Services; Stage 3', 'to': '3GPP TS 29.511 V16.0.0; 5G System; Equipment Identity Register Services; Stage 3'}}}
Example values of diff2:
Backward compatibility errors (1):
error at specs/389643.json, in API POST /v1/authorization/details/byDate added required request body [added-required-request-body].
Backward compatibility errors (1):
warning at specs/419378.json, in API GET /equipment-status changed the pattern for the 'query' request parameter 'pei' from '^(imei-[0-9]{15}|imeisv-[0-9]{16}|.+)$' to '^(imei-[0-9]{15}|imeisv-[0-9]{16}|mac([0-9a-fA-F]{2})((-[0-9a-fA-F]{2}){5})|.+)$' [request-parameter-pattern-changed]. This is a warning because it is difficult to automatically analyze if the new pattern is a superset of the previous pattern(e.g. changed from '[0-9]+' to '[0-9]*')
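I want to check whether the keywords in diff2 (they always start right after the word "API") match any of the keys present in diff, and assign a label based on that.
If all keywords match and no unmatched words remain, I want to label the change as Breaking.
If some words match (those from diff2) while others do not (all the remaining ones from diff), I want the label to be Both.
If diff2 is NaN, the change is Non-Breaking.
So for the first example the change is Breaking, and for the second it is Both.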
The expected output looks like this:
diff diff_2 Change
{'paths': {'modified': {'/v1/authorization/details/byDate' ./ API POST /v1/authorization/details/byDate Breaking
Any suggestions or ideas on how to do this would be greatly appreciated.
I'm not entirely sure what you want to do, since your example is not fully reproducible, so here is my own example:
import pandas as pd

df = pd.DataFrame(
    {
        "diff": [
            {
                "paths": {
                    "modified": {
                        "/v1/authorization/details/byDate": {
                            "operations": {
                                "modified": {"POST": {"requestBody": {"added": True}}}
                            }
                        }
                    }
                },
            },
        ]
        * 3
        + [pd.NA, 2, "aaa"],
        "diff2": [
            "Backward compatibility errors (1): error at specs/389643.json, in API POST /v1/authorization/details/byDate added",
            "Backward compatibility errors (1): error at specs/390643.json, in API GET /v1/authorization/details/byDate added",
            "Backward compatibility errors (1): error at specs/391643.json, in API PUSH /v2/authorization/details/byDate removed",
            "",
            "",
            "",
        ],
    }
)
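First, define a recursive helper function to get all keys from a nested dictionary:

def get_keys_from_dict(d, keys=None):
    # Start a fresh list on the first call; reuse it in recursive calls
    keys = [] if keys is None else keys
    if not isinstance(d, dict):
        return None
    for k, v in d.items():
        keys.append(k)
        if isinstance(v, dict):
            get_keys_from_dict(v, keys)
        elif isinstance(v, list):
            # Dictionaries may also be nested inside lists
            for i in v:
                get_keys_from_dict(i, keys)
    return keys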
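To see what this helper collects, here is a small self-contained check. It is only a sketch: the helper is restated so the snippet runs on its own, and the sample dictionary is the first diff value from the example DataFrame.

```python
def get_keys_from_dict(d, keys=None):
    # Restated helper: collect every key of a nested dict, depth first
    keys = [] if keys is None else keys
    if not isinstance(d, dict):
        return None
    for k, v in d.items():
        keys.append(k)
        if isinstance(v, dict):
            get_keys_from_dict(v, keys)
        elif isinstance(v, list):
            for i in v:
                get_keys_from_dict(i, keys)
    return keys


sample = {
    "paths": {
        "modified": {
            "/v1/authorization/details/byDate": {
                "operations": {
                    "modified": {"POST": {"requestBody": {"added": True}}}
                }
            }
        }
    }
}

keys = get_keys_from_dict(sample)
print(keys)
# ['paths', 'modified', '/v1/authorization/details/byDate', 'operations',
#  'modified', 'POST', 'requestBody', 'added']

# Anything that is not a dict yields None
print(get_keys_from_dict("aaa"))  # None
```

Note that only keys are collected, never values, so a value such as a version string or a regex pattern in diff will not take part in the comparison.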
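Using str.split, define another helper function to get all keywords that follow the word "API" in the string:

def get_keywords_from_string(string):
    # Non-string values (e.g. NaN) and strings without "API" yield no keywords
    if not isinstance(string, str) or "API" not in string:
        return []
    return [item for item in string.split("API")[1].split(" ") if item]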
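As a quick illustration of the splitting step, the sketch below restates the helper in compact form and runs it on one shortened message from the example DataFrame and on the full message from the question. The variable names `short_msg` and `full_msg` are only for this illustration.

```python
def get_keywords_from_string(string):
    # Restated helper: whitespace-separated words after the first "API"
    return (
        [item for item in string.split("API")[1].split(" ") if item] if string else []
    )


short_msg = (
    "Backward compatibility errors (1): error at specs/389643.json, "
    "in API POST /v1/authorization/details/byDate added"
)
full_msg = (
    "error at specs/389643.json, in API POST /v1/authorization/details/byDate "
    "added required request body [added-required-request-body]."
)

short_keywords = get_keywords_from_string(short_msg)
full_keywords = get_keywords_from_string(full_msg)

print(short_keywords)
# ['POST', '/v1/authorization/details/byDate', 'added']
print(full_keywords)
# ['POST', '/v1/authorization/details/byDate', 'added', 'required',
#  'request', 'body', '[added-required-request-body].']
print(get_keywords_from_string(""))
# []
```

With the full message, trailing words such as "required" or the bracketed rule name also become keywords. They are unlikely to appear among the keys of diff, so you may need to trim the message first if those words should not count as mismatches.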
And one more to compare the two keyword lists, using the Python built-in functions all and any:

def compare(keywords, other_keywords):
    # No label when either side is missing or empty
    if not keywords or not other_keywords:
        return ""
    results = [item in keywords for item in other_keywords]
    if all(results):
        return "Breaking"
    if any(results):
        return "Both"
    return "Non-Breaking"
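The three possible labels can be checked in isolation. This is a minimal sketch with the function restated; the `keys` list is hand-written to mirror the keys of the first diff value, and the keyword lists mirror the three diff2 messages.

```python
def compare(keywords, other_keywords):
    # Restated helper: label by how many of other_keywords occur in keywords
    if not keywords or not other_keywords:
        return ""
    results = [item in keywords for item in other_keywords]
    if all(results):
        return "Breaking"
    if any(results):
        return "Both"
    return "Non-Breaking"


keys = [
    "paths",
    "modified",
    "/v1/authorization/details/byDate",
    "operations",
    "POST",
    "requestBody",
    "added",
]

# Every keyword is a key of the diff
print(compare(keys, ["POST", "/v1/authorization/details/byDate", "added"]))
# Breaking

# "GET" is missing, the other two keywords match
print(compare(keys, ["GET", "/v1/authorization/details/byDate", "added"]))
# Both

# No keyword matches at all
print(compare(keys, ["PUSH", "/v2/authorization/details/byDate", "removed"]))
# Non-Breaking

# An empty keyword list gives an empty label
print(repr(compare(keys, [])))
# ''
```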
Finally, combine these functions and apply them to the DataFrame row by row:

df["Change"] = df.apply(
    lambda x: compare(
        get_keys_from_dict(x["diff"], []),
        get_keywords_from_string(x["diff2"]),
    ),
    axis=1,
)
Then:

print(df)
# Output
                                                diff  ...        Change
0  {'paths': {'modified': {'/v1/authorization/det...  ...      Breaking
1  {'paths': {'modified': {'/v1/authorization/det...  ...          Both
2  {'paths': {'modified': {'/v1/authorization/det...  ...  Non-Breaking
3                                               <NA>  ...
4                                                  2  ...
5                                                aaa  ...
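The rows with an empty diff2 end up with an empty label here, whereas the question asks for Non-Breaking when diff2 is missing. One possible variant is sketched below in plain Python. The names `is_missing` and `label_change` are hypothetical, and it assumes that "missing" means None, a float NaN, or a blank string; a pd.NA value would need an extra check such as pd.isna.

```python
import math


def is_missing(value):
    # Assumption: None, float NaN and blank strings all count as "no diff2"
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def label_change(diff_keys, diff2):
    # Hypothetical variant of compare() that maps a missing diff2 to Non-Breaking
    if is_missing(diff2):
        return "Non-Breaking"
    if not isinstance(diff2, str) or "API" not in diff2:
        return ""  # cannot be classified
    keywords = [w for w in diff2.split("API")[1].split(" ") if w]
    matches = [w in (diff_keys or []) for w in keywords]
    if keywords and all(matches):
        return "Breaking"
    if any(matches):
        return "Both"
    return "Non-Breaking"


keys = ["paths", "modified", "/v1/authorization/details/byDate",
        "operations", "POST", "requestBody", "added"]

print(label_change(keys, float("nan")))  # Non-Breaking
print(label_change(keys, ""))            # Non-Breaking
print(label_change(keys, "in API POST /v1/authorization/details/byDate added"))
# Breaking
print(label_change(keys, "in API GET /v1/authorization/details/byDate added"))
# Both
```

It could be applied in the same way as before, passing the result of get_keys_from_dict for the diff column and the raw diff2 value for each row.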