Help me understand cross-validation
I have this code:
X_col = train.drop(columns=['user_id', 'ts', 'date']).columns.tolist()
y_col = 'user_id'
X_num = train[X_col].loc[:, train[X_col].dtypes == 'float'].columns.tolist()
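If I do this:
from sklearn.compose import make_column_transformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# The scaler sits inside the pipeline, so it is refit on each CV training fold
preprocessor = make_column_transformer(
    (StandardScaler(), X_num),
)
pipe_et = Pipeline(
    [
        ('preprocessor', preprocessor),
        ("model_et", ExtraTreesClassifier(random_state=STATE, max_depth=18, n_estimators=450)),
    ]
)
scor = cross_val_score(pipe_et, train[X_col], train[y_col], cv=3, scoring='f1_weighted')
print(scor)
scor.mean()
I get: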
[0.11863361 0.13799628 0.11946796]
0.1253659515939661
But if I do this:
scal = StandardScaler()
train_scal = train[X_col].copy()
train_scal[X_num] = scal.fit_transform(train[X_num])
et = ExtraTreesClassifier(random_state=STATE, max_depth=18, n_estimators=450)
scor = cross_val_score(et, train_scal, train[y_col], cv=3, scoring='f1_weighted')
print(scor)
scor.mean()
I get:
[0.08459432 0.11228319 0.07813365]
0.09167038497134483
Why is that?
Anyway, I also tried it this way with the default hyperparameters:
preprocessor = make_column_transformer(
    (StandardScaler(), X_num),
)
pipe_et = Pipeline(
    [
        ('preprocessor', preprocessor),
        ("model_et", ExtraTreesClassifier(random_state=STATE)),
    ]
)
scor = cross_val_score(pipe_et, train[X_col], train[y_col], cv=3, scoring='f1_weighted')
print(scor)
scor.mean()
[0.1137086 0.12649808 0.11097906]
0.11706191336989713
and with the preprocessor step commented out:
preprocessor = make_column_transformer(
    (StandardScaler(), X_num),
)
pipe_et = Pipeline(
    [
        #('preprocessor', preprocessor),
        ("model_et", ExtraTreesClassifier(random_state=STATE)),
    ]
)
scor = cross_val_score(pipe_et, train[X_col], train[y_col], cv=3, scoring='f1_weighted')
print(scor)
scor.mean()
[0.08419376 0.10285775 0.07655608]
0.08786919734511862
Same for the second approach, scaling before cross-validation:
scal = StandardScaler()
train_scal = train[X_col].copy()
train_scal[X_num] = scal.fit_transform(train[X_num])
et = ExtraTreesClassifier(random_state=STATE)
scor = cross_val_score(et, train_scal, train[y_col], cv=3, scoring='f1_weighted')
print(scor)
scor.mean()
[0.08393461 0.10239762 0.07679076]
0.08770765993873268
And like this, with no scaling at all:
et = ExtraTreesClassifier(random_state=STATE)
scor = cross_val_score(et, train[X_col], train[y_col], cv=3, scoring='f1_weighted')
print(scor)
scor.mean()
[0.08419376 0.10285775 0.07655608]
0.08786919734511862
Apparently I'm missing something, but I can't figure out what it is that I'm missing! :)
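One check that might help narrow this down is to compare how many columns go into the column transformer with how many come out. In scikit-learn, `make_column_transformer` takes a `remainder` parameter that defaults to `'drop'`, so columns not listed in any transformer are discarded, while `remainder='passthrough'` keeps them. Whether that explains the scores above is only a guess. The sketch below uses a made-up toy frame, since `train` is not shown here; the names `toy`, `f1`, `f2`, `n1` and `num_cols` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two float columns and one integer column.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "f1": rng.normal(size=10),
    "f2": rng.normal(size=10),
    "n1": rng.integers(0, 5, size=10),
})
num_cols = toy.select_dtypes(include="float").columns.tolist()

# Default remainder='drop': only the listed (float) columns survive.
drop_ct = make_column_transformer((StandardScaler(), num_cols))
# remainder='passthrough': unlisted columns are kept unscaled.
keep_ct = make_column_transformer(
    (StandardScaler(), num_cols), remainder="passthrough"
)

n_in = toy.shape[1]
n_drop = drop_ct.fit_transform(toy).shape[1]
n_keep = keep_ct.fit_transform(toy).shape[1]
print(n_in, n_drop, n_keep)  # 3 2 3
```

Running the same comparison on `train[X_col]` with `X_num` would show whether the pipeline's model sees the same feature set as the model fitted on `train_scal`.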