I am trying to understand how to use Scikit for supervised machine learning

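How do I use Scikit for supervised machine learning so that, when I introduce new setA and setB data, it can try to identify whether each new data point belongs to setA or setB?
Apologies that the dataset is small and "made up". I just want to apply the same method with Scikit to other datasets.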

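Supervised learning can be used to classify instances (rows of data) into several categories (or, in your case, just 2 sets). What is missing in your example above is a variable that says which set each row belongs to.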
import numpy as np
from sklearn.linear_model import LogisticRegression

# numpy will help us concatenate the columns into a 2-dimensional array,
# so instead of having 3 separate arrays, we have 1 array with 3 columns and 18 rows
Variable1A = [3,4,4,5,4,5,5,6,7,7,5,4,5,6,4,9,3,4]
Variable2A = [5,4,4,3,4,5,4,5,4,3,4,5,3,4,3,4,4,3]
Variable3A = [7,8,4,5,6,7,3,3,3,4,4,9,7,6,8,6,7,8]
# our target variable for A: one label (1) per row, 18 in total
target_variable_A = [1] * len(Variable1A)

Variable1B = [7,8,11,12,7,9,8,7,8,11,15,9,7,6,9,9,7,11]
Variable2B = [1,2,3,3,4,2,4,1,0,1,2,1,3,4,3,1,2,3]
Variable3B = [12,18,14,15,16,17,13,13,13,14,14,19,17,16,18,16,17,18]
# target variable for B: one label (0) per row, 18 in total
target_variable_B = [0] * len(Variable1B)

# let's create a dataset C with only 4 rows, for which we need to predict whether
# each row belongs to "1" (dataset A) or "0" (dataset B)
Variable1C = [7,4,4,12]
Variable2C = [1,4,4,3]
Variable3C = [12,8,4,15]

# make the objects 2-dimensional arrays (1 array with X rows and 3 columns/variables)
Dataset_A = np.column_stack((Variable1A, Variable2A, Variable3A))
Dataset_B = np.column_stack((Variable1B, Variable2B, Variable3B))
Dataset_C = np.column_stack((Variable1C, Variable2C, Variable3C))
print("dataset A rows", Dataset_A.shape[0], "dataset A columns", Dataset_A.shape[1])
print("dataset B rows", Dataset_B.shape[0], "dataset B columns", Dataset_B.shape[1])
print("dataset C rows", Dataset_C.shape[0], "dataset C columns", Dataset_C.shape[1])
# prints:
# dataset A rows 18 dataset A columns 3
# dataset B rows 18 dataset B columns 3
# dataset C rows 4 dataset C columns 3

# since we now have an identifier that tells us whether a row belongs to A or B (1 or 0),
# we can append the sets together
Dataset_AB = np.concatenate((Dataset_A, Dataset_B), axis=0)  # 36 rows and 3 columns
target_variable_AB = np.concatenate((target_variable_A, target_variable_B), axis=0)
print("dataset AB rows", Dataset_AB.shape[0], "dataset AB columns", Dataset_AB.shape[1])
print("target variable rows", target_variable_AB.shape[0])
# prints:
# dataset AB rows 36 dataset AB columns 3
# target variable rows 36

# now we select the most common supervised scikit model: Logistic Regression
model = LogisticRegression()  # create an instance of the model
model.fit(Dataset_AB, target_variable_AB)  # the model learns to distinguish A (1) from B (0)

# now we make predictions for the new dataset C
predictions_for_C = model.predict(Dataset_C)
print(predictions_for_C)
# this prints [0 1 1 0]
# so the first case belongs to set B, the second to A, the third to A and the fourth to B
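If you want each prediction reported as a set name instead of 0/1, and want to see how confident the model is, the following is a minimal sketch. It rebuilds the same A/B data (label 1 for set A, 0 for set B, as in the code above), uses scikit-learn's predict_proba, and maps labels to names with np.where; the variable names set_a, set_b and new_rows are illustrative, not from the original.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy data as above: 18 rows x 3 variables per set
set_a = np.column_stack((
    [3,4,4,5,4,5,5,6,7,7,5,4,5,6,4,9,3,4],
    [5,4,4,3,4,5,4,5,4,3,4,5,3,4,3,4,4,3],
    [7,8,4,5,6,7,3,3,3,4,4,9,7,6,8,6,7,8],
))
set_b = np.column_stack((
    [7,8,11,12,7,9,8,7,8,11,15,9,7,6,9,9,7,11],
    [1,2,3,3,4,2,4,1,0,1,2,1,3,4,3,1,2,3],
    [12,18,14,15,16,17,13,13,13,14,14,19,17,16,18,16,17,18],
))
X = np.vstack((set_a, set_b))
y = np.array([1] * len(set_a) + [0] * len(set_b))  # 1 = set A, 0 = set B

model = LogisticRegression(max_iter=1000).fit(X, y)

new_rows = np.array([[7, 1, 12], [4, 4, 8], [4, 4, 4], [12, 3, 15]])
pred = model.predict(new_rows)
# predict_proba columns follow model.classes_, which is [0, 1] here,
# so column 1 is the estimated probability of label 1 (set A)
proba_a = model.predict_proba(new_rows)[:, 1]
names = np.where(pred == 1, "set A", "set B")
for row, name, p in zip(new_rows.tolist(), names, proba_a):
    print(row, name, f"P(set A) = {p:.2f}")
```

For a binary logistic regression, predict returns label 1 exactly when that probability is above 0.5, so the printed names and probabilities always agree.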

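Your question is very broad, so this is only a brief overview. Rather than formatting your data this way, you want to put both sets into a single list/array, with another column indicating which set each row belongs to. Something like this: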

data = [
    [3, 5, 7, 0],
    [4, 4, 8, 0],   # these rows have 0 as the last element to represent group A
    ...
    [7, 1, 12, 1],
    [8, 2, 18, 1],  # these have 1 as the last element to represent group B
    ...
]

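Alternatively, and more commonly, you would keep only the feature columns in data (usually called X) and have a separate array y containing just [0, 0, 0, ..., 1, 1, 1, ...], indicating the group membership of each row. What you want to avoid is storing information about a data point in a variable's name; instead, you want the "set A or set B" information stored in a variable's values (here, in the values of the last column of data, or in y).

Whatever you do, you will almost certainly want to use numpy arrays or pandas data structures to hold your data, rather than plain lists.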
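A minimal sketch of that layout, assuming the label is stored in the last column of data (0 for group A, 1 for group B, as in the data example): NumPy slicing separates the feature columns X from the label vector y. Only the first three rows of each set from the question are used, to keep it short.

```python
import numpy as np

# Each row: three feature values followed by the group label (0 = A, 1 = B)
data = np.array([
    [3, 5, 7, 0],
    [4, 4, 8, 0],
    [4, 4, 4, 0],
    [7, 1, 12, 1],
    [8, 2, 18, 1],
    [11, 3, 14, 1],
])

X = data[:, :-1]  # every column except the last: the features
y = data[:, -1]   # the last column only: group membership of each row

print(X.shape, y)  # (6, 3) [0 0 0 1 1 1]
```

X and y in this form can be passed directly to any scikit-learn classifier's fit method.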
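There are many tutorials and examples of how to use scikit-learn, as well as example datasets that may be more useful than your made-up example. "Supervised machine learning" is a broad term covering many different methods for determining which group a data point belongs to, so you will have to read up on and try different classification algorithms. All of this information can be found by searching and/or browsing the scikit-learn documentation.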
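Trying different classification algorithms can start as simply as looping over a few scikit-learn estimators with the same X and y. This sketch assumes the toy A/B data from the question and scores each model with 3-fold cross-validation via cross_val_score; the choice of estimators, their parameters and the fold count are arbitrary, and with only 36 rows the scores show little beyond the fact that the code runs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

set_a = np.column_stack((
    [3,4,4,5,4,5,5,6,7,7,5,4,5,6,4,9,3,4],
    [5,4,4,3,4,5,4,5,4,3,4,5,3,4,3,4,4,3],
    [7,8,4,5,6,7,3,3,3,4,4,9,7,6,8,6,7,8],
))
set_b = np.column_stack((
    [7,8,11,12,7,9,8,7,8,11,15,9,7,6,9,9,7,11],
    [1,2,3,3,4,2,4,1,0,1,2,1,3,4,3,1,2,3],
    [12,18,14,15,16,17,13,13,13,14,14,19,17,16,18,16,17,18],
))
X = np.vstack((set_a, set_b))
y = np.array([0] * len(set_a) + [1] * len(set_b))  # 0 = set A, 1 = set B

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=3),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    # with an integer cv and a classifier, scikit-learn uses stratified folds,
    # so each fold keeps the A/B proportion even though rows are ordered A then B
    scores[name] = cross_val_score(clf, X, y, cv=3).mean()
    print(f"{name}: mean accuracy {scores[name]:.2f}")
```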
    
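The labels of the training data are what the model is trained to predict, i.e. the outcome for each sample used in training. In the problem you gave there are basically 2 sets, set A and set B, so you have to use a binary classifier such as a logistic regression model. Label each element as 1 or 0 based on the set it belongs to; that is, if element E belongs to set A, label it 1, otherwise 0. Then import the logistic regression classifier from scikit-learn in Python.
The next thing is to merge the two sets, for example set A followed by set B or vice versa, and merge the labels you have already prepared in the same order.

You can use pandas or numpy to stack these sets and prepare the labelled dataset.
Now you have a properly labelled dataset.

You can now call the fit function of the logistic regression classifier with the dataset (containing both the set A and set B elements) and the label set.

After calling the predict function on the data you want to classify, you will get a predicted class of 0 or 1.

If you want the set itself, you can use a dictionary that maps the keys 1 and 0 to the values "set A" and "set B". That way you can get the set from the prediction.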

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression as lr

# set A
firstA = [3,4,4,5,4,5,5,6,7,7,5,4,5,6,4,9,3,4]
secondA = [5,4,4,3,4,5,4,5,4,3,4,5,3,4,3,4,4,3]
thirdA = [7,8,4,5,6,7,3,3,3,4,4,9,7,6,8,6,7,8]
# set B
firstB = [7,8,11,12,7,9,8,7,8,11,15,9,7,6,9,9,7,11]
secondB = [1,2,3,3,4,2,4,1,0,1,2,1,3,4,3,1,2,3]
thirdB = [12,18,14,15,16,17,13,13,13,14,14,19,17,16,18,16,17,18]

# stacking up and building the dataset
# note: in this approach each 18-value variable list is one row (sample),
# so data has 6 rows and 18 columns
Aset = [firstA, secondA, thirdA]
Bset = [firstB, secondB, thirdB]
data = pd.DataFrame(Aset + Bset)

# labels: the three set A rows get 0, the three set B rows get 1
label = np.array([0, 0, 0, 1, 1, 1])

# training the model, then predicting the set of one new 18-value row
model = lr()
model.fit(data, label)
k = model.predict([[17,18,14,15,16,17,13,13,13,41,14,19,17,16,18,16,17,28]])

# mapping (set A elements were labelled 0 and set B elements 1)
dic = {0: "set A", 1: "set B"}
print(dic[int(k[0])])
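For comparison, the same steps (label each element, merge the sets with their labels, fit, predict, map the class back to a set name) can be sketched with pandas while treating each row, one value from each of the three variables, as a sample instead of each 18-value list. This is a minimal sketch; the column names v1, v2, v3 and label are illustrative, and set A is labelled 0 and set B 1 to match the dictionary used above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

set_a = pd.DataFrame({
    "v1": [3,4,4,5,4,5,5,6,7,7,5,4,5,6,4,9,3,4],
    "v2": [5,4,4,3,4,5,4,5,4,3,4,5,3,4,3,4,4,3],
    "v3": [7,8,4,5,6,7,3,3,3,4,4,9,7,6,8,6,7,8],
})
set_b = pd.DataFrame({
    "v1": [7,8,11,12,7,9,8,7,8,11,15,9,7,6,9,9,7,11],
    "v2": [1,2,3,3,4,2,4,1,0,1,2,1,3,4,3,1,2,3],
    "v3": [12,18,14,15,16,17,13,13,13,14,14,19,17,16,18,16,17,18],
})

# label each element by the set it belongs to
set_a["label"] = 0
set_b["label"] = 1

# merge set A followed by set B; labels stay aligned because they are a column
data = pd.concat([set_a, set_b], ignore_index=True)

features = ["v1", "v2", "v3"]
model = LogisticRegression(max_iter=1000)
model.fit(data[features], data["label"])

# classify two new rows and map the predicted class back to a set name
new_rows = pd.DataFrame({"v1": [7, 4], "v2": [1, 4], "v3": [12, 8]})
dic = {0: "set A", 1: "set B"}
predicted_sets = [dic[int(k)] for k in model.predict(new_rows[features])]
print(predicted_sets)
```

Keeping the label as a column of the same DataFrame means the two sets and their labels cannot drift out of order when they are merged.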

© www.soinside.com 2019 - 2025. All rights reserved.