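Can I use a random forest for online learning? I have several million data points, and the classifier cannot get through the cross-validation step.
Can I split the data into chunks and process them sequentially?
Current code: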
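from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# load_dataset() is my own helper (not shown here)
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
print('Planting trees...')
clf = RandomForestClassifier(
    n_estimators=50,
    max_depth=None,
    min_samples_split=2,  # must be at least 2; current scikit-learn rejects a value of 1
    random_state=0
)
print('Growing trees...')
classifier = clf.fit(X_train, y_train)
# see how we did
# note: cross_val_score clones the estimator and refits it on folds of the data passed in
print('Testing trees...')
scores = cross_val_score(classifier, X_test, y_test)
print(scores)
print('accuracy: %.3f' % scores.mean())  # %d would truncate the mean to an integer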
Could I change it to something like this:
for chunk in df:
    clf.fit(...)
    cross_validate...
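One caveat about the loop above: calling `clf.fit` on each chunk refits the forest from scratch every time, so only the last chunk would be reflected in the model. `RandomForestClassifier` has no `partial_fit`. A possible workaround, sketched here and not meant as a definitive answer, is `warm_start=True`: raising `n_estimators` before each `fit` call keeps the existing trees and trains only the new ones on the current chunk. The synthetic data, the helper name `fit_forest_in_chunks`, and the `trees_per_chunk` parameter are all made up for illustration. The sketch also assumes every chunk contains every class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def fit_forest_in_chunks(chunks, trees_per_chunk=10, random_state=0):
    """Grow a forest chunk by chunk (hypothetical helper, for illustration).

    Each chunk contributes `trees_per_chunk` new trees; trees fitted on
    earlier chunks are kept unchanged thanks to warm_start=True.
    Assumes every chunk contains samples of every class.
    """
    clf = RandomForestClassifier(
        n_estimators=trees_per_chunk,
        warm_start=True,
        random_state=random_state,
    )
    for i, (X_chunk, y_chunk) in enumerate(chunks):
        if i > 0:
            # Ask for more trees; fit() then trains only the additional ones.
            clf.n_estimators += trees_per_chunk
        clf.fit(X_chunk, y_chunk)
    return clf


# Synthetic stand-in for the real dataset (an assumption for this sketch).
X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
X_train, y_train = X[:5000], y[:5000]
X_test, y_test = X[5000:], y[5000:]

# Five chunks of 1000 rows each; in practice these would be read from disk.
chunks = zip(np.array_split(X_train, 5), np.array_split(y_train, 5))
clf = fit_forest_in_chunks(chunks, trees_per_chunk=10)

print('trees in forest:', len(clf.estimators_))  # 5 chunks x 10 trees = 50
print('held-out accuracy: %.3f' % clf.score(X_test, y_test))
```

Each tree here sees only one chunk, so this is not equivalent to training on the full dataset. Scoring once on a held-out test set, as in the last line, also avoids the repeated refits that cross-validation would require.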