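In my project I analyze the questions of a given exam. Say each exam has 10 questions.
For each question, I compute a few things and save them using the constructor of the class QuestionData (defined in the file question_data.py). Each QuestionData object has a pandas DataFrame, some dictionaries, some float attributes and a numpy array.
Next, the exam analysis is done with the class ExamData, which also has some simple attributes, some dictionaries and a list of all the QuestionData objects.
Ultimately, what I need is to return the ExamData object as JSON so that it can be sent back as a response.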
I'm using conda and Python 3.12.4. I figured it would be wise to start by serializing a single QuestionData object. I tried the __dict__ trick explained here, but it failed with:
AttributeError: 'weakref.ReferenceType' object has no attribute '__dict__'. Did you mean: '__dir__'?
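For readers unfamiliar with the trick, here is a minimal sketch of how this kind of failure can arise. It assumes the trick was the common json.dumps(obj, default=lambda o: o.__dict__) form, and it uses made-up classes (Target, Holder) rather than the real QuestionData. The fallback is called again for every nested value that json cannot encode, so any nested object without a __dict__ (here a weak reference; presumably something similar sits inside the pandas objects) raises AttributeError.

```python
import json
import weakref


class Target:
    """Hypothetical class that only exists so we can point a weakref at it."""


class Holder:
    """Hypothetical stand-in for an object that indirectly holds a weakref."""

    def __init__(self, target):
        self.value = 1.5
        self._ref = weakref.ref(target)  # weakref objects have no __dict__


target = Target()
holder = Holder(target)

try:
    # First call: default(holder) returns holder.__dict__, which is fine.
    # Second call: default(<weakref>) fails because there is no __dict__.
    json.dumps(holder, default=lambda o: o.__dict__)
    failed = False
    message = ""
except AttributeError as exc:
    failed = True
    message = str(exc)

print(failed, message)
```

The takeaway is that the trick only works when every nested attribute is either JSON-native or a plain object with a __dict__ of its own.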
Then I tried to install orjson using conda install orjson, but it refuses to work because of SSL:
>conda install orjson
Collecting package metadata (current_repodata.json): failed
CondaSSLError: OpenSSL appears to be unavailable on this machine. OpenSSL is required to
download and install packages.
Exception: HTTPSConnectionPool(host='repo.anaconda.com', port=443): Max retries exceeded with url: /pkgs/main/win-64/current_repodata.json (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available."))
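The above happened after I let it update openssl from 3.0.14-h827c3e9_0 --> 3.0.15-h827c3e9_0, which was a requirement for installing orjson. Any ideas?
I have plenty of experience with various programming languages, OOP and JSON, but I'm new to Python, so please be gentle.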
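Since the conda error quoted above says the SSL module is not available, one quick check is whether the Python interpreter in the active environment can import ssl at all. This is only a diagnostic sketch (the helper name describe_ssl is made up here); it does not repair the environment, it just reports what the interpreter sees.

```python
def describe_ssl() -> str:
    """Report whether the ssl module can be imported by this interpreter."""
    try:
        import ssl
    except ImportError as exc:
        return f"ssl module unavailable: {exc}"
    # OPENSSL_VERSION is the version string of the OpenSSL library in use.
    return f"ssl module available, linked against {ssl.OPENSSL_VERSION}"


print(describe_ssl())
```

If the import fails, the problem lies in the environment itself rather than in orjson.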
Code:
question_data.py:
import pandas as pd
import numpy as np
import scipy.stats as sps
import string

class QuestionData:
    def __init__(self, data, item: str):
        options_list = ...
        # df for answer analysis
        self._options_data = pd.DataFrame(index=options_list)
        # percent chosen column
        self._options_data["pct"] = ...
        # mean ability for chosen answer
        self._options_data["theta_mean"] = ...
        # ability sd for chosen answer
        self._options_data["theta_sd"] = ...
        # corr of chosen answer with ability
        self._options_data["theta_corr"] = ...
        # item delta
        self._delta = ...
        # biserial of key with theta
        self._key_biserial = ...
        # initial IRT params. To be done later
        self._IRT_params = {"a": 1, "b": 0, "c": 0}
        self._IRT_info = {"theta_MI": 0, "info_theta_MI": 0}
        # response times vector
        self._response_time = data._response_times[str(item)].to_numpy()
exam_data.py:
from question_data import QuestionData
from datetime import datetime
from dateutil import relativedelta

class ExamData:
    _quantile_list = [5, 25, 50, 75, 95]
    _date_format = '%d/%m/%Y'

    def __init__(self, data):
        fromDate = datetime.strptime(data._details["fromDate"], self._date_format)
        toDate = datetime.strptime(data._details["toDate"], self._date_format)
        delta = relativedelta.relativedelta(toDate, fromDate)
        self._report_duration = {"years": delta.years, "months": delta.months, "days": delta.days}
        self._exposure_num = ...
        self._total_times = data._response_times.sum(axis=1)
        self._time_quantiles = dict(zip(self._quantile_list,
                                        [self._total_times.quantile(q/100) for q in self._quantile_list]))
        self._q_list = ...
        self._q_data = dict(zip(self._q_list,
                                [QuestionData(data, q) for q in self._q_list]))
An example of what I would like to get:
Question data:
{
"_options_data": {"pct": {...}, "theta_mean": {...}, ...}, //<pandas df serialization>
"_delta": 10,
"_IRT_info": {"theta_MI": 0, "info_theta_MI": 0},
"_response_time": [25.5, 41.6, 30.9, ...],
...
}
Exam data:
{
"_report_duration": {"years": 0, "months": 0, "days": 17},
"_exposure_num": 150,
"_time_quantiles": {"5": 117.89, "25": 167.15, "50": 224.1, ...},
"_total_times": {"id1": 120.3, "id2": 149.9, ...}, //<pandas series serialization>
"_q_data": {"Q1": <QuestionData Object>, "Q2": <QuestionData Object>, ...},
...
}
In the end, the simplest solution was to write my own serializer, a simple extension of this post.
import json
import numpy as np
import pandas as pd
from question_data import QuestionData
from exam_data import ExamData

# JSON serializer class so we can easily handle numpy+pandas objects
class CustomTypeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.generic):
            return obj.item()
        elif isinstance(obj, (np.ndarray, pd.Series)):
            return obj.tolist()
        elif isinstance(obj, pd.DataFrame):
            return obj.T.to_dict()
        elif isinstance(obj, (QuestionData, ExamData)):
            return obj.__dict__
        elif hasattr(obj, 'to_json'):
            return obj.to_json(orient='records')
        return super().default(obj)
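The reason a small encoder like this is enough is that json only calls default for objects it cannot encode natively, and then encodes whatever default returns, calling default again for nested values as needed. Below is a standard-library-only sketch of that mechanism. The classes Inner, Outer and SketchEncoder are illustrative assumptions, with a date playing the role that the numpy and pandas objects play above.

```python
import json
from datetime import date


class Inner:
    """Hypothetical nested object (plays the role of QuestionData)."""

    def __init__(self):
        self.when = date(2024, 1, 31)  # not JSON-native, needs default()


class Outer:
    """Hypothetical container object (plays the role of ExamData)."""

    def __init__(self):
        self.scores = {5: 1.5, 25: 2.5}  # int keys, like _time_quantiles
        self.inner = Inner()


class SketchEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for values json cannot encode on its own.
        if isinstance(obj, date):
            return obj.isoformat()
        if isinstance(obj, (Inner, Outer)):
            # The returned dict is encoded recursively, so Inner and date
            # values nested inside it come back through default() as well.
            return obj.__dict__
        # Anything else: let the base class raise TypeError.
        return super().default(obj)


encoded = json.dumps(Outer(), cls=SketchEncoder)
decoded = json.loads(encoded)
print(decoded)
```

Two details are visible in the decoded result: the nested object is expanded without any extra code, and the integer dictionary keys come out as strings, because JSON object keys are always strings.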
Then, when needed, use it as follows:
import json
from question_data import QuestionData
from exam_data import ExamData

data = ...
ed = ExamData(data)
q1d = ed._q_data["q1"]  # QuestionData object
json_str1 = json.dumps(ed, cls=CustomTypeEncoder)  # this works perfectly
json_str2 = json.dumps(q1d, cls=CustomTypeEncoder)  # this too
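When comparing the encoder's output with the example shapes in the question, it may help to see what each pandas call returns. The sketch below uses made-up stand-ins for _options_data and _total_times; the option labels "A" and "B", the ids and all the numbers are assumptions for illustration only.

```python
import json
import pandas as pd

# Hypothetical stand-ins for _options_data and _total_times.
options = pd.DataFrame(
    {"pct": [0.25, 0.75], "theta_mean": [-0.5, 0.5]},
    index=["A", "B"],
)
totals = pd.Series([120.3, 149.9], index=["id1", "id2"])

by_column = options.to_dict()   # keyed by column, then by index label
by_row = options.T.to_dict()    # keyed by index label, then by column
as_list = totals.tolist()       # values only, the index is dropped
as_mapping = totals.to_dict()   # index label -> value

for shape in (by_column, by_row, as_list, as_mapping):
    print(json.dumps(shape))
```

With obj.T.to_dict() and tolist(), the encoder above produces the row-keyed frame and the index-free list. If the consumer needs the column-keyed frame and the id-keyed series shown in the question's example, to_dict() on both would be the variant to try.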