Dynamically looping through JSON data in Python

Question

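I have two JSON files on a server. The first is a data frame stored as JSON, with 21 columns.

The second JSON object is a collection of filters to apply to the first JSON (the data file). After each filter is applied, I want to dynamically calculate the reduction in the loan amount column.

Both JSON files are on the server. A sample of the filter file looks like this: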

[{
        "criteria_no.": 1,
        "expression": "!=",
        "attributes": "Industry_name",
        "value": "Clasentrix"

    },{ 
        "criteria_no.": 2,
        "expression": "=",
        "attributes": "currency",
        "value": ["EUR","GBP","INR"]


    },{
        "criteria_no.": 3,
        "expression": ">",
        "attributes": "Industry_Rating",
        "value": "A3"

    },{
        "criteria_no.": 4,
        "expression": "<",
        "attributes": "Due_date",
        "value": "01/01/2025"

    }
    ]

Loading it in Python looks like this:

import urllib2, json
url = urllib2.urlopen('http://.../server/criteria_sample.json')
obj = json.load(url)
print obj

[{u'attributes': u'Industry_name', u'expression': u'!=', u'value': u'Clasentrix', u'criteria_no.': 1}, {u'attributes': u'currency', u'expression': u'=', u'value': [u'EUR', u'GBP', u'INR'], u'criteria_no.': 2}, {u'attributes': u'Industry_Rating', u'expression': u'>', u'value': u'A3', u'criteria_no.': 3}, {u'attributes': u'Due_date', u'expression': u'<', u'value': u'01/01/2025', u'criteria_no.': 4}]

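In the sample JSON, the "attributes" values are simply columns of the data file. As I mentioned, it has 21 columns, and "Industry_name", "currency", "Industry_Rating" and "Due_date" are four of them. "loan_amount" is another column in the data file.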

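This criteria list is only a sample; in practice there are n such criteria (filters). I want these filters applied to the data file dynamically, and I want to calculate the resulting reduction in the loan amount. Take the first filter, which says the "Industry_name" column must not contain "Clasentrix". So I want to filter the data file on "Industry_name" so that no 'Clasentrix' rows remain. Suppose 11 of the 61 observations in the data file have 'Clasentrix'. Then we take the sum of the loan amount over all 61 rows and subtract the sum over those 11 'Clasentrix' rows. The amount removed this way is the reduction from applying the first filter.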
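The arithmetic in this paragraph can be written as a formula. Here $a_i$ is the loan_amount of row $i$, $N$ is the number of rows (61 in the example), and $E$ is the set of the 11 'Clasentrix' rows. The notation $K_k$ is added here to cover the case where filters are applied one after another: it stands for the rows that pass criteria 1 through $k$.

$$\Delta_1 = \sum_{i=1}^{N} a_i - \sum_{i \notin E} a_i = \sum_{i \in E} a_i, \qquad \Delta_{1..k} = \sum_{i=1}^{N} a_i - \sum_{i \in K_k} a_i$$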

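I want to calculate this reduction dynamically in Python for each of the n criteria. Inside a loop, each entry of the filter JSON should become a filter built from its attribute, expression and value. For the first criterion that filter is "Industry_name != 'Clasentrix'". For the second it should be "currency = ['EUR','GBP','INR']", and so on. Each filter should be applied to the rows of the data, and I also want the corresponding reduction for each one.

I'm struggling to write the Python code for this. Sorry that my post is so long, but please help: how can I dynamically calculate the reduction for each of the n criteria?

Thanks in advance!!

Update: here are some sample rows from the first data file: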

[{
        "industry_id.": 1234,
        "loan_id": 1113456,
        "Industry_name": "Clasentrix",
        "currency": "EUR",
        "Industry_Rating": "Ba3",
        "Due_date": "20/02/2020",
        "loan_amount": 563332790,
        "currency_rate": 0.67,
        "country": "USA"


    },{ 
        "industry_id.": 6543,
        "loan_id": 1125678,
        "Industry_name": "Wolver",
        "currency": "GBP",
        "Industry_Rating": "Aa3",
        "Due_date": "23/05/2020",
        "loan_amount": 33459087,
        "currency_rate": 0.8,
        "country": "UK"


    },{
        "industry_id.": 1469,
        "loan_id": "8876548",
        "Industry_name": "GroupOn",
        "currency": "EUR",
        "Industry_Rating": "Aa1",
        "Due_date": "16/09/2021",
        "loan_amount": 66543278,
        "currency_rate": 0.67,
        "country": "UK"
    },{
        "industry_id.": 1657,
        "loan_id": "6654321",
        "Industry_name": "Clasentrix",
        "currency": "EUR",
        "Industry_Rating": "Ba3",
        "Due_date": "15/07/2020",
        "loan_amount": 5439908765,
        "currency_rate": 0.53,
        "country": "USA"

    }
    ]
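For reference, the per-criterion reduction from the question can be computed on these sample rows without pandas. The following is a minimal standard-library sketch. It assumes the data file has already been loaded as a list of dicts, one per row. It also assumes that a list value means "is one of" for "=" and "is none of" for "!=". The names `SCALAR_OPS`, `matches` and `reductions` are hypothetical. Note that "<" and ">" compare raw values here, so dates and ratings would need converting first. Each criterion is applied on its own to all rows.

```python
import operator

# ASSUMED semantics of "expression": a scalar value is compared with the named
# operator; a list value means "is one of" (=) or "is none of" (!=).
# "<" and ">" compare raw values, so dates and ratings need converting first.
SCALAR_OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


def matches(row, criterion):
    """Return True if one data row passes one criterion."""
    field = row[criterion["attributes"]]
    expr, value = criterion["expression"], criterion["value"]
    if isinstance(value, list):
        if expr == "=":
            return field in value
        if expr == "!=":
            return field not in value
        raise ValueError("operator {!r} is not defined for a list".format(expr))
    return SCALAR_OPS[expr](field, value)


def reductions(rows, criteria, amount_key="loan_amount"):
    """Map each criteria_no. to the amount that criterion alone removes."""
    total = sum(r[amount_key] for r in rows)
    result = {}
    for c in criteria:
        kept = sum(r[amount_key] for r in rows if matches(r, c))
        result[c["criteria_no."]] = total - kept
    return result


# Sample rows and criteria 1-2 from the question, trimmed to the fields used.
rows = [
    {"Industry_name": "Clasentrix", "currency": "EUR", "loan_amount": 563332790},
    {"Industry_name": "Wolver", "currency": "GBP", "loan_amount": 33459087},
    {"Industry_name": "GroupOn", "currency": "EUR", "loan_amount": 66543278},
    {"Industry_name": "Clasentrix", "currency": "EUR", "loan_amount": 5439908765},
]
criteria = [
    {"criteria_no.": 1, "expression": "!=", "attributes": "Industry_name", "value": "Clasentrix"},
    {"criteria_no.": 2, "expression": "=", "attributes": "currency", "value": ["EUR", "GBP", "INR"]},
]
print(reductions(rows, criteria))  # → {1: 6003241555, 2: 0}
```

On these four rows, criterion 1 removes the two 'Clasentrix' loans (563332790 + 5439908765). Criterion 2 removes nothing, because every sample row is in EUR or GBP.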

Answer 1

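You can use pandas to load the JSON data into a DataFrame and turn each criterion into a query string. Some processing is needed to turn the criteria JSON into valid queries. In the code below, dates are still compared as strings. You may need to handle the date queries explicitly so that the strings are converted to dates first.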

import pandas as pd
import json
# ...
criteria = json.load(url)
df = pd.DataFrame(json.load(data_url)) # data_url is the handle of the data file
print("Loan total without filters is {}".format(df["loan_amount"].sum()))

for c in criteria:
    # pandas query syntax uses == for equality; use a local variable so the
    # criteria themselves are left unchanged
    op = "==" if c["expression"] == "=" else c["expression"]

    # If the value is a string we need to surround it in quotation marks
    # Note this can break if any values contain "
    if isinstance(c["value"], str):
        query = '{} {} "{}"'.format(c["attributes"], op, c["value"])
    else:
        # query() treats a list such as ['EUR', 'GBP', 'INR'] compared with ==
        # as a membership test
        query = '{} {} {}'.format(c["attributes"], op, c["value"])
    loan_total = df.query(query)["loan_amount"].sum()
    print("With criterion {}, {}, loan total is {}".format(c["criteria_no."], query, loan_total))

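The comment above notes that pasting values into the query string breaks when a value contains a quote. One way around this is pandas' `@name` syntax in `DataFrame.query`, which looks up a local variable instead of parsing the value as text. The sketch below also uses the explicit `in` / `not in` operators for list values. `SCALAR_OPS`, `LIST_OPS` and `filtered_total` are hypothetical names. The sketch assumes column names are valid Python identifiers, as they are in the sample.

```python
import pandas as pd

# Hypothetical translation table from the criteria file's operators to query syntax.
SCALAR_OPS = {"=": "==", "!=": "!=", "<": "<", ">": ">"}
LIST_OPS = {"=": "in", "!=": "not in"}


def filtered_total(df, criterion, amount_key="loan_amount"):
    """Sum the loan amount over the rows that satisfy one criterion."""
    value = criterion["value"]
    ops = LIST_OPS if isinstance(value, list) else SCALAR_OPS
    # The value is never pasted into the string: query() resolves @value from
    # this function's local variables, so quotes inside a value cannot break it.
    expr = "{} {} @value".format(criterion["attributes"], ops[criterion["expression"]])
    return df.query(expr)[amount_key].sum()
```

Calling `filtered_total(df, c)` inside the loop would replace the string-building branch. The date and rating caveats still apply to "<" and ">".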
Alternatively, you can turn each criterion into a boolean index vector, like this:

def criterion_filter(s, expression, value):
    """Return a boolean mask selecting the entries of s that pass one criterion."""
    if isinstance(value, list):
        # A list value means "is one of" (=) or "is none of" (!=)
        if expression == "=":
            return s.isin(value)
        elif expression == "!=":
            return ~s.isin(value)
    else:
        if expression == "=":
            return s == value
        elif expression == "!=":
            return s != value
        elif expression == "<":
            return s < value
        elif expression == ">":
            return s > value
    raise ValueError("Unsupported expression {!r} for value {!r}".format(expression, value))

for c in criteria:
    filt = criterion_filter(df[c["attributes"]], c["expression"], c["value"])
    loan_total = df.loc[filt, "loan_amount"].sum()
    print("With criterion {}, loan total is {}".format(c["criteria_no."], loan_total))
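In both approaches, criteria 3 and 4 compare raw strings, which can give misleading results. Dates written as DD/MM/YYYY sort by day first, and "A3" versus "Aa3" compares character codes. The sketch below uses values copied from the sample rows. It parses the dates with an explicit format and orders the ratings through an ordered categorical. The rating scale and the reading of ">" as "rated better than" are assumptions (Moody's-style long-term scale), not something the question confirms.

```python
import pandas as pd

# Values copied from the question's sample rows.
due_raw = pd.Series(["20/02/2020", "23/05/2020", "16/09/2021", "15/07/2020"])
ratings_raw = pd.Series(["Ba3", "Aa3", "Aa1", "Ba3"])

# Criterion 4: parse both sides with an explicit day-first format.
due = pd.to_datetime(due_raw, format="%d/%m/%Y")
cutoff = pd.to_datetime("01/01/2025", format="%d/%m/%Y")
date_mask = due < cutoff

# Criterion 3. ASSUMPTION: Moody's-style long-term scale, listed worst to best,
# and ">" meaning "rated better than". Check both against your data.
RATING_SCALE = [
    "C", "Ca", "Caa3", "Caa2", "Caa1", "B3", "B2", "B1",
    "Ba3", "Ba2", "Ba1", "Baa3", "Baa2", "Baa1",
    "A3", "A2", "A1", "Aa3", "Aa2", "Aa1", "Aaa",
]
rating_dtype = pd.CategoricalDtype(categories=RATING_SCALE, ordered=True)
rating_mask = ratings_raw.astype(rating_dtype) > "A3"

# For comparison: the same tests applied to the raw strings.
print((due_raw < "01/01/2025").tolist())  # → [False, False, False, False]
print(date_mask.tolist())                 # → [True, True, True, True]
print((ratings_raw > "A3").tolist())      # → [True, True, True, True]
print(rating_mask.tolist())               # → [False, True, True, False]
```

To use this with `criterion_filter`, convert the DataFrame column once before filtering. Convert the criterion's value the same way, so that both sides of "<" or ">" have the same type.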

Edit: to compute the cumulative reduction in the loan total, you can combine the index vectors with the & operator.

loans = [df["loan_amount"].sum()]
print("Loan total without filters is {}".format(loans[0]))
filt = True  # becomes a boolean Series after the first & with a mask
for c in criteria:
    filt &= criterion_filter(df[c["attributes"]], c["expression"], c["value"])
    loans.append(df.loc[filt, "loan_amount"].sum())
    print("Adding criterion {} reduces the total by {}".format(c["criteria_no."],
          loans[-2] - loans[-1]))
    print("The cumulative reduction is {}".format(loans[0] - loans[-1]))
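To see the cumulative loop produce numbers, here is a self-contained run on the question's four sample rows. It is restricted to criteria 1 and 2, which compare plain strings. `criterion_mask` is a hypothetical, trimmed stand-in for `criterion_filter` that handles only "=" and "!=". `cumulative_reductions` is likewise a hypothetical name. The function returns the step and cumulative reductions instead of printing them.

```python
import pandas as pd

# Sample rows from the question (only the columns used here).
df = pd.DataFrame({
    "Industry_name": ["Clasentrix", "Wolver", "GroupOn", "Clasentrix"],
    "currency": ["EUR", "GBP", "EUR", "EUR"],
    "loan_amount": [563332790, 33459087, 66543278, 5439908765],
})
criteria = [
    {"criteria_no.": 1, "expression": "!=", "attributes": "Industry_name", "value": "Clasentrix"},
    {"criteria_no.": 2, "expression": "=", "attributes": "currency", "value": ["EUR", "GBP", "INR"]},
]


def criterion_mask(s, expression, value):
    """Boolean mask for one criterion (only = and != are handled in this sketch)."""
    hit = s.isin(value) if isinstance(value, list) else s == value
    if expression == "=":
        return hit
    if expression == "!=":
        return ~hit
    raise ValueError("unsupported expression: " + expression)


def cumulative_reductions(df, criteria):
    """Return (criteria_no., step reduction, cumulative reduction) per criterion."""
    mask = pd.Series(True, index=df.index)
    start = prev = df["loan_amount"].sum()
    out = []
    for c in criteria:
        # Rows must pass every criterion applied so far
        mask &= criterion_mask(df[c["attributes"]], c["expression"], c["value"])
        now = df.loc[mask, "loan_amount"].sum()
        out.append((c["criteria_no."], prev - now, start - now))
        prev = now
    return out


for step in cumulative_reductions(df, criteria):
    print(step)
```

Starting the mask as an all-True Series, instead of the scalar `True`, keeps its type the same on every iteration.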
