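I am trying to run XGBoost on a Methyl450k dataset. The data has roughly 480,000+ individual CpG sites, each with a beta value between 0 and 1. Below is a sample of the data (10 CpG columns plus the response):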
cg13869341 cg14008030 cg12045430 cg20826792 cg00381604 cg20253340 cg21870274 cg03130891 cg24335620 cg16162899 response
1 0.8612869 0.6958909 0.07918330 0.10816711 0.03484078 0.4875475 0.7475878 0.11051578 0.7120003 0.8453396 0
2 0.8337106 0.6276754 0.09811698 0.08934333 0.03348864 0.6300766 0.7753453 0.08652890 0.6465146 0.8137132 0
3 0.8516102 0.6575332 0.13310207 0.07990076 0.04195286 0.4325115 0.7257208 0.14334007 0.7384455 0.8054013 0
4 0.8970384 0.6955810 0.08134887 0.08950676 0.03578006 0.4711689 0.7214661 0.08299838 0.7718571 0.8151683 0
5 0.8562323 0.7204416 0.08078766 0.14902533 0.04274820 0.4769631 0.8034706 0.16473891 0.7143823 0.8475410 0
6 0.8613325 0.6527599 0.10158672 0.15459204 0.04839691 0.4805285 0.8004808 0.12598627 0.8218743 0.8222552 0
7 0.9168869 0.5963966 0.11457045 0.13245761 0.03720798 0.5067649 0.6806004 0.13601034 0.7063457 0.8509160 0
8 0.9002366 0.6898320 0.07029171 0.07158694 0.03875135 0.7065322 0.8167016 0.15394095 0.7226098 0.8310477 0
9 0.8876504 0.6172154 0.13511072 0.15276686 0.06149520 0.5642073 0.7177438 0.14752285 0.6846876 0.8360360 0
10 0.8992898 0.6361644 0.15423780 0.19111275 0.05700406 0.4941239 0.7819968 0.10109936 0.6680640 0.8504023 0
11 0.8997905 0.5906462 0.10411472 0.15006796 0.04157008 0.4931531 0.7857664 0.13430963 0.6946644 0.8326747 0
12 0.9009607 0.6721858 0.09081460 0.11057752 0.05824153 0.4683763 0.7655608 0.01755990 0.7113345 0.8346149 0
13 0.9036750 0.6313643 0.07477824 0.12089404 0.04738597 0.5502747 0.7520128 0.16332395 0.7036665 0.8564414 0
14 0.8420276 0.6265071 0.15351674 0.13647090 0.04901864 0.5037902 0.7446693 0.10534171 0.7727812 0.8317943 0
15 0.8995276 0.6515500 0.09214429 0.08973162 0.04231420 0.5071999 0.7484940 0.21822470 0.6859165 0.7775508 0
16 0.9071643 0.7945852 0.15809474 0.11264440 0.04793316 0.5256078 0.8425513 0.17150603 0.7581367 0.8271037 0
17 0.8691358 0.6206902 0.11868549 0.15944891 0.03523320 0.4581166 0.8058461 0.11557264 0.6960848 0.8579109 1
18 0.8330247 0.7030860 0.12832663 0.12936172 0.03534059 0.4687507 0.7630222 0.12176819 0.7179690 0.8775521 1
19 0.9015574 0.6592869 0.12693119 0.14671845 0.03819418 0.4395692 0.7420882 0.10293369 0.7047038 0.8435531 1
20 0.8568249 0.6762936 0.18220218 0.10123198 0.04963466 0.5781550 0.6324743 0.06676272 0.6805745 0.8291353 1
21 0.8799152 0.6736554 0.15056617 0.16070673 0.04944037 0.4015415 0.4587438 0.10392791 0.7467060 0.7396137 1
22 0.8730770 0.6663321 0.10802390 0.14481460 0.04448009 0.5177664 0.6682854 0.16747621 0.7161234 0.8309462 1
23 0.9359656 0.7401368 0.16730300 0.11842173 0.03388908 0.4906018 0.5730439 0.15970761 0.7904663 0.8136450 1
24 0.9320397 0.6978085 0.10474803 0.10607080 0.03268366 0.5362214 0.7832729 0.15564091 0.7171350 0.8511477 1
25 0.8444256 0.7516799 0.16767449 0.12025258 0.04426417 0.5040725 0.6950104 0.16010829 0.7026808 0.8800469 1
26 0.8692707 0.7016945 0.10123979 0.09430876 0.04037325 0.4877716 0.7053603 0.09539885 0.8316933 0.8165352 1
27 0.8738410 0.6230674 0.12793232 0.14837137 0.04878595 0.4335648 0.6547601 0.13714725 0.6944921 0.8788708 1
28 0.9041870 0.6201079 0.12490195 0.16227251 0.04812720 0.4845896 0.6619842 0.13093443 0.7415606 0.8479339 1
29 0.8618622 0.7060291 0.09453812 0.14068246 0.04799782 0.5474036 0.6088231 0.23338428 0.6772588 0.7795908 1
30 0.8776350 0.7132561 0.12100425 0.17367148 0.04399987 0.5661632 0.6905305 0.12971867 0.6788903 0.8198201 1
31 0.9134456 0.7249370 0.07144695 0.08759897 0.04864476 0.6682650 0.7445900 0.16374150 0.7322691 0.8071598 1
32 0.8706637 0.6743936 0.15291891 0.11422262 0.04284591 0.5268217 0.7207478 0.14296945 0.7574967 0.8609048 1
33 0.8821504 0.6845216 0.12004074 0.14009196 0.05527732 0.5677475 0.6379840 0.14122421 0.7090634 0.8386022 1
34 0.9061180 0.5989445 0.09160787 0.14325261 0.05142950 0.5399465 0.6718870 0.08454002 0.6709083 0.8264233 1
35 0.8453511 0.6759766 0.13345672 0.16310764 0.05107034 0.4666146 0.7343603 0.12733287 0.7062292 0.8471812 1
36 0.9004188 0.6114532 0.11837118 0.14667433 0.05050403 0.4975502 0.7258132 0.14894363 0.7195090 0.8382364 1
37 0.9051729 0.6652954 0.15153241 0.14571184 0.05026702 0.4855397 0.7226850 0.12179138 0.7430388 0.8342340 1
38 0.9112012 0.6314450 0.12681305 0.16328649 0.04076789 0.5382251 0.7404122 0.13971506 0.6607798 0.8657917 1
39 0.8407927 0.7148585 0.12792107 0.15447060 0.05287096 0.6798039 0.7182050 0.06549068 0.7433669 0.7948445 1
40 0.8554747 0.7356683 0.22698080 0.21692162 0.05365043 0.4496654 0.7353112 0.13341649 0.8032266 0.7883068 1
41 0.8535359 0.5729331 0.14392737 0.16612463 0.04651752 0.5228045 0.7397588 0.09967424 0.7906682 0.8384434 1
42 0.8059968 0.7148594 0.16774123 0.19006840 0.04990847 0.5929818 0.7011064 0.17921090 0.8121909 0.8481069 1
43 0.8856906 0.6987405 0.19262137 0.18327412 0.04816967 0.4340002 0.6569263 0.13724290 0.7600389 0.7788117 1
44 0.8888717 0.6760166 0.17025712 0.21906969 0.04812641 0.4173613 0.7927178 0.17458413 0.6806101 0.8297604 1
45 0.8691575 0.6682723 0.11932277 0.13669098 0.04014911 0.4680455 0.6186511 0.10002737 0.8012731 0.7177891 1
46 0.9148742 0.7797494 0.13313955 0.15166151 0.03934042 0.4818276 0.7484973 0.16354624 0.6979735 0.8164431 1
47 0.9226736 0.7211714 0.08036409 0.10395457 0.04063595 0.4014187 0.8026643 0.17762644 0.7194800 0.8156545 1
I tried to implement the algorithm in R, but I keep running into errors.
Attempt:
> train <- beta_values1_updated[training1, ]
> test <- beta_values1_updated[-training1, ]
> labels <- train$response
> ts_label <- test$response
> new_tr <- model.matrix(~.+0,data = train[,-c("response"),with=F])
Error in `[.data.frame`(train, , -c("response"), with = F) :
unused argument (with = F)
> new_ts <- model.matrix(~.+0,data = test[,-c("response"),with=F])
Error in `[.data.frame`(test, , -c("response"), with = F) :
unused argument (with = F)
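I was trying to follow the tutorial here:
Any insight into how to implement the XGBoost algorithm correctly would be greatly appreciated.
Edit:
I am adding the rest of the code to show where I run into problems with the tutorial: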
train<-data.table(train)
test<-data.table(test)
new_tr <- model.matrix(~.+0,data = train[,-c("response"),with=F])
new_ts <- model.matrix(~.+0,data = test[,-c("response"),with=F])
#convert factor to numeric
labels <- as.numeric(labels)-1
ts_label <- as.numeric(ts_label)-1
#preparing matrix
dtrain <- xgb.DMatrix(data = new_tr,label = labels)
dtest <- xgb.DMatrix(data = new_ts,label=ts_label)
params <- list(booster = "gbtree", objective = "binary:logistic", eta=0.3, gamma=0, max_depth=6, min_child_weight=1, subsample=1, colsample_bytree=1)
xgbcv <- xgb.cv( params = params, data = dtrain, nrounds = 100, nfold = 5, showsd = T, stratified = T, print.every.n = 10, early.stop.round = 20, maximize = F)
[1] train-error:0.000000+0.000000 test-error:0.000000+0.000000
Multiple eval metrics are present. Will use test_error for early stopping.
Will train until test_error hasn't improved in 20 rounds.
[11] train-error:0.000000+0.000000 test-error:0.000000+0.000000
[21] train-error:0.000000+0.000000 test-error:0.000000+0.000000
Stopping. Best iteration:
[1] train-error:0.000000+0.000000 test-error:0.000000+0.000000
Warning messages:
1: 'print.every.n' is deprecated.
Use 'print_every_n' instead.
See help("Deprecated") and help("xgboost-deprecated").
2: 'early.stop.round' is deprecated.
Use 'early_stopping_rounds' instead.
See help("Deprecated") and help("xgboost-deprecated").
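The author of this tutorial is using the data.table package. As you can read here, with = F tells data.table to treat the column argument as column names rather than as an expression, and it is sometimes used to select a single column. Make sure data.table and the other packages the tutorial relies on are installed and loaded. Also make sure your dataset is a data.table object, since a plain data.frame does not accept the with argument.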
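As a minimal sketch of the point above: the toy data frame `df` and its columns `cg1` and `cg2` are made up for illustration and only stand in for your `train` object. Once the data is converted to a data.table, the tutorial's `with = FALSE` call should go through. A base R alternative that does not need data.table is shown alongside it.

```r
library(data.table)

# Hypothetical toy data standing in for `train` (not the real Methyl450k values)
set.seed(1)
df <- data.frame(
  cg1 = runif(6),
  cg2 = runif(6),
  response = c(0, 0, 0, 1, 1, 1)
)

# Convert the data.frame to a data.table so that `with = FALSE` is understood
dt <- as.data.table(df)

# Drop the response column by name, as in the tutorial
features <- dt[, -c("response"), with = FALSE]
new_tr <- model.matrix(~ . + 0, data = features)

# Base R alternative on the plain data.frame: pick columns by name with setdiff()
keep <- setdiff(names(df), "response")
new_tr_base <- model.matrix(~ . + 0, data = df[, keep, drop = FALSE])

print(colnames(new_tr))
```

Either matrix can then be passed to xgb.DMatrix together with the label vector, as in the code from the question.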