As part of my assignment I'm applying linear and lasso regression, and this is Question 7.
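Based on the scores from Question 6, what gamma value corresponds to a model that is underfitting (and has the worst test set accuracy)? What gamma value corresponds to a model that is overfitting (and has the worst test set accuracy)? What choice of gamma would be best for a model with good generalization performance on this dataset (high accuracy on both the training and test sets)?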
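Hint: try plotting the scores from Question 6 to visualize the relationship between gamma and accuracy. Remember to comment out the matplotlib import line before submitting.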
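A minimal plotting sketch for the hint, assuming you pass in the gamma grid (e.g. `np.logspace(-4, 1, 6)`) and the two mean-score arrays returned by `answer_six()`. The function name `plot_validation_curve` is hypothetical; gamma goes on a log-scaled x axis because the grid is log-spaced.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt


def plot_validation_curve(gammas, train_mean, test_mean):
    """Plot mean training and test accuracy against gamma (hypothetical helper).

    gammas, train_mean and test_mean are assumed to be equal-length sequences,
    e.g. np.logspace(-4, 1, 6) and the two arrays returned by answer_six().
    """
    fig, ax = plt.subplots()
    # semilogx draws a line plot with a log-scaled x axis
    ax.semilogx(gammas, train_mean, marker="o", label="training accuracy")
    ax.semilogx(gammas, test_mean, marker="o", label="test (cross-validation) accuracy")
    ax.set_xlabel("gamma")
    ax.set_ylabel("mean accuracy")
    ax.set_title("Validation curve: SVC with RBF kernel")
    ax.legend(loc="best")
    return fig, ax
```

Usage, assuming `answer_six()` and `np` are available in the notebook: `train_mean, test_mean = answer_six()`, then `plot_validation_curve(np.logspace(-4, 1, 6), train_mean, test_mean)` followed by `plt.show()`. As the hint says, comment the matplotlib lines out before submitting.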
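This function should return one tuple with the gamma values in this order: (Underfitting, Overfitting, Good_Generalization). Please note that there is only one correct solution.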
I really need help; I can't think of any way to solve this last question. What code should I use to determine (Underfitting, Overfitting, Good_Generalization), and why?
Thanks,
Dataset: http://archive.ics.uci.edu/ml/datasets/Mushroom?ref=datanews.io
Here is my code from Question 6:
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import validation_curve

def answer_six():
    # X_subset and y_subset are assumed to be defined earlier in the assignment.
    # The assignment requires SVC with kernel='rbf', C=1, random_state=0.
    # C: penalty parameter C of the error term.
    # random_state: seed of the pseudo-random number generator used when
    # shuffling the data for probability estimates.
    # The radial basis function (RBF) kernel is a popular kernel used in
    # many kernelized learning algorithms, in particular in support vector
    # machine classification.
    model = SVC(kernel='rbf', C=1, random_state=0)

    # Six gamma values spaced evenly on a log scale: 1e-4, 1e-3, ..., 1e1.
    gamma = np.logspace(-4, 1, 6)

    # Validation curve for the model on the subsets: for each gamma value,
    # compute training and cross-validated test accuracy.
    train_scores, test_scores = validation_curve(
        model, X_subset, y_subset,
        param_name='gamma', param_range=gamma, scoring='accuracy')

    # Each row is one gamma value and each column one CV fold, so the mean
    # along axis=1 gives one average score per gamma value.
    sc = (train_scores.mean(axis=1), test_scores.mean(axis=1))
    return sc

answer_six()
OK, first familiarize yourself with overfitting. You should produce a plot like this one: Article on this topic
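On the left (small gamma) you are underfitting; on the right (large gamma) you are overfitting. Where both errors are low, you have good generalization.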
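The reading of the plot above can be turned into code. The following is a minimal sketch under stated assumptions, not the grader's definition: underfitting is taken as the gamma with the lowest training accuracy (both curves low), overfitting as the gamma with the largest gap between training and test accuracy (training high, test low), and good generalization as the gamma with the highest test accuracy. The helper name `pick_gammas` is hypothetical.

```python
def pick_gammas(gammas, train_mean, test_mean):
    """Heuristically pick (underfitting, overfitting, good_generalization) gammas.

    Assumed rules (a heuristic, not an official definition):
      - underfitting: lowest training accuracy (both curves are low)
      - overfitting: largest train - test gap (training high, test low)
      - good generalization: highest test accuracy
    Works with plain lists or NumPy arrays of equal length.
    """
    rows = list(zip(gammas, train_mean, test_mean))
    underfit = min(rows, key=lambda r: r[1])[0]
    overfit = max(rows, key=lambda r: r[1] - r[2])[0]
    good = max(rows, key=lambda r: r[2])[0]
    # On ties, min/max keep the first (smallest) gamma encountered.
    return (float(underfit), float(overfit), float(good))
```

Usage, assuming the Question 6 setup: `pick_gammas(np.logspace(-4, 1, 6), *answer_six())`. Check the tuple against the plot before submitting; the heuristic only formalizes what the curves show.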
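Both curves are functions of gamma, which here plays the role of the regularizer: a small gamma gives a smoother, simpler decision boundary and a large gamma a more complex one.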