Yongfa Information Network

Newbie asking how to use MATLAB's sequentialfs function

Answers: 1  Bounty: 0
Solved: 2021-11-12 13:23
Best answer
sequentialfs Sequential feature selection.

INMODEL = sequentialfs(FUN,X,Y) selects a subset of features from X
that best predict the data in Y, by sequentially selecting features
until there is no improvement in prediction. X is a data matrix whose
rows correspond to points (or observations) and whose columns
correspond to features (or predictor variables). Y is a column vector
of response values or class labels for each observation in X. X and Y
must have the same number of rows. FUN is a function handle, created
using @, that defines the criterion that sequentialfs uses to select
features and to determine when to stop. sequentialfs returns INMODEL, a
logical vector indicating which features are finally chosen.

Starting from an empty feature set, sequentialfs creates candidate
feature subsets by adding in each of the features not yet selected. For
each candidate feature subset, sequentialfs performs 10-fold
cross-validation by repeatedly calling FUN with different training and
test subsets of X and Y, as follows:

CRITERION = FUN(XTRAIN,YTRAIN,XTEST,YTEST)

XTRAIN and YTRAIN contain the same subset of rows of X and Y, while
XTEST and YTEST contain the complementary subset of rows. XTRAIN and
XTEST contain the data taken from the columns of X that correspond to
the current candidate feature set.
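As a minimal sketch of such a criterion function for a regression model (the name regf and the plain least-squares fit are illustrative choices, not part of sequentialfs):

```matlab
function dev = regf(XTRAIN, ytrain, XTEST, ytest)
    % Fit an ordinary least-squares model on the training fold.
    b = XTRAIN \ ytrain;
    % Predict the held-out fold and return the SUM of squared errors;
    % sequentialfs itself divides by the number of test observations,
    % so FUN should not compute a mean here.
    yfit = XTEST * b;
    dev  = sum((ytest - yfit).^2);
end
```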

Each time it is called, FUN must return a scalar value CRITERION.
Typically, FUN uses XTRAIN and YTRAIN to train or fit a model, then
predicts values for XTEST using that model, and finally returns some
measure of distance or loss of those predicted values from YTEST. In
the cross-validation calculation for a given candidate feature set,
sequentialfs sums the values returned by FUN across all test sets, and
divides that sum by the total number of test observations. It then uses
that mean value to evaluate each candidate feature subset. Two commonly
used loss measures for FUN are the sum of squared errors for regression
models (sequentialfs computes the mean squared error in this case), and
the number of misclassified observations for classification models
(sequentialfs computes the misclassification rate in this case).

Note: sequentialfs divides the sum of the values returned by FUN across
all test sets by the total number of test observations, therefore FUN
should not divide its output value by the number of test observations.

Given the mean CRITERION values for each candidate feature subset,
sequentialfs chooses the one that minimizes the mean CRITERION value.
This process continues until adding more features does not decrease the
criterion.
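Putting these rules together, a minimal forward-selection call might look like the following sketch (the data are synthetic and the anonymous criterion is one of many possible choices):

```matlab
% Synthetic data: only columns 2 and 5 actually drive the response.
X = randn(100, 8);
y = X(:,2) + 2*X(:,5) + 0.1*randn(100, 1);

% Criterion: sum of squared errors of a least-squares fit per fold.
fun = @(XT, yT, Xt, yt) sum((yt - Xt * (XT \ yT)).^2);

inmodel = sequentialfs(fun, X, y);   % 1-by-8 logical vector
```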

INMODEL = sequentialfs(FUN,X,Y,Z,...) allows any number of input
variables X, Y, Z, ... . sequentialfs chooses features (columns) only
from X, but otherwise imposes no interpretation on X, Y, Z, ... .
All data inputs, whether column vectors or matrices, must have the same
number of rows. sequentialfs calls FUN with training and test subsets
of X, Y, Z, ..., as follows:

CRITERION = FUN(XTRAIN,YTRAIN,ZTRAIN,...,XTEST,YTEST,ZTEST,...)

sequentialfs creates XTRAIN, YTRAIN, ZTRAIN, ... and XTEST, YTEST,
ZTEST, ... by selecting subsets of the rows of X, Y, Z, ... . FUN must
return a scalar value CRITERION, but may compute that value in any way.
Elements of the logical vector INMODEL correspond to columns of X, and
indicate which features are finally chosen.

[INMODEL,HISTORY] = sequentialfs(FUN,X,...) returns information on
which feature is chosen in each step. HISTORY is a scalar structure
with the following fields:

Crit    A vector containing the criterion values computed at
        each step.
In      A logical matrix in which row I indicates which
        features are included at step I.
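For instance (with a synthetic setup and an illustrative least-squares criterion), the HISTORY output can be inspected like this:

```matlab
X = randn(100, 8);
y = X(:,3) - X(:,7) + 0.1*randn(100, 1);
fun = @(XT, yT, Xt, yt) sum((yt - Xt * (XT \ yT)).^2);

[inmodel, history] = sequentialfs(fun, X, y);
history.Crit              % criterion value after each step
find(history.In(end,:))   % columns selected at the final step
```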

[...] = sequentialfs(..., 'PARAM1',val1, 'PARAM2',val2, ...) specifies
one or more of the following name/value pairs:

'CV'        The validation method used to compute the criterion for
            each candidate feature subset. Choices are:
            a positive integer K  - Use K-fold cross-validation
                                    (without stratification). K should
                                    be greater than one.
            a CVPARTITION object  - Perform cross-validation specified
                                    by the CVPARTITION object.
            'resubstitution'      - Use resubstitution, i.e., the
                                    original data are passed to FUN as
                                    both the training and test data to
                                    compute the criterion.
            'none'                - Call FUN as CRITERION =
                                    FUN(X,Y,Z,...), without separating
                                    test and training sets.
            The default value of 'CV' is 10, i.e., 10-fold
            cross-validation (without stratification).

So-called "wrapper" methods use a function FUN that
implements a learning algorithm. These methods usually
apply cross-validation to select features. So-called
"filter" methods use a function that measures the
characteristics (such as correlation) of the data to select
features.
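As a sketch of a filter-style criterion (correlation-based, with cross-validation turned off; the data and anonymous function are illustrative):

```matlab
X = randn(100, 8);
y = X(:,1) + 0.5*X(:,4) + 0.1*randn(100, 1);

% Filter criterion: higher absolute correlation with y is better, so
% negate it because sequentialfs minimizes the criterion.
fun = @(X, y) -sum(abs(corr(X, y)));

% 'CV','none' calls FUN on the full data; a stopping rule such as
% 'NFeatures' is needed because this criterion always decreases as
% features are added.
inmodel = sequentialfs(fun, X, y, 'CV', 'none', 'NFeatures', 3);
```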

'MCReps' A positive integer indicating the number of Monte-Carlo
repetitions for cross-validation. The default value is 1.
'MCReps' must be 1 if 'CV' is 'none' or 'resubstitution'.

'Direction' The direction in which to perform the sequential search.
The default is 'forward'. If 'Direction' is 'backward',
sequentialfs begins with a feature set including all
features and removes features sequentially until the
criterion increases.
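A backward-elimination call differs only in the name/value pair; a self-contained sketch with synthetic data:

```matlab
X = randn(100, 8);
y = 3*X(:,6) + 0.1*randn(100, 1);
fun = @(XT, yT, Xt, yt) sum((yt - Xt * (XT \ yT)).^2);

% Start from all eight features and remove them one at a time.
inmodel = sequentialfs(fun, X, y, 'Direction', 'backward');
```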

'KeepIn' A logical vector, or a vector of column numbers, specifying a
set of features which must be included. The default is
empty.

'KeepOut' A logical vector, or a vector of column numbers, specifying a
set of features which must be excluded. The default is
empty.

'NFeatures' The number of features at which sequentialfs should stop.
INMODEL includes exactly this many features. The default
value is empty, indicating that sequentialfs should stop
when a local minimum of the criterion is found. A
non-empty value for 'NFeatures' overrides 'MaxIter' and
'TolFun' in 'Options'.

'NullModel' A logical value, indicating whether or not the null model
(containing no features from X) should be included in the
feature selection procedure and in the HISTORY output. The
default is FALSE.

'Options' Options structure for the iterative sequential search
algorithm, as created by STATSET. sequentialfs uses the
following fields:

'Display' Level of display output. Choices are 'off' (the
default), 'final', and 'iter'.
'MaxIter' Maximum number of steps allowed. The default is
Inf.
'TolFun' Positive number giving the termination tolerance
for the criterion. The default is 1e-6 if
'Direction' is 'forward', or 0 if 'Direction' is
'backward'.
'TolTypeFun' 'abs', to use 'TolFun' as an absolute tolerance, or
'rel', to use it as a relative tolerance. The
default is 'rel'.
'UseParallel'
'UseSubstreams'
'Streams' These fields specify whether to perform cross-
validation computations in parallel, and how to use
random numbers during cross-validation.
For information on these fields see PARALLELSTATS.
NOTE: If supplied, 'Streams' must be of length one.

Examples:
% Perform sequential feature selection for CLASSIFY on iris data with
% noisy features and see which non-noise features are important
load('fisheriris');
X = randn(150,10);
X(:,[1 3 5 7]) = meas;
y = species;
opt = statset('Display','iter');
% Generating a stratified partition is usually preferred to
% evaluate classification algorithms.
cvp = cvpartition(y,'k',10);
[fs,history] = sequentialfs(@classf,X,y,'cv',cvp,'options',opt);

where CLASSF is a MATLAB function such as:

function err = classf(xtrain,ytrain,xtest,ytest)
    yfit = classify(xtest,xtrain,ytrain,'quadratic');
    err = sum(~strcmp(ytest,yfit));