Yongfa Information Network

Newbie asking how to use MATLAB's sequentialfs function

Answers: 1  Bounty: 0
Solved: 2021-11-12 13:23
Best answer
sequentialfs Sequential feature selection.

INMODEL = sequentialfs(FUN,X,Y) selects a subset of features from X
that best predict the data in Y, by sequentially selecting features
until there is no improvement in prediction. X is a data matrix whose
rows correspond to points (or observations) and whose columns
correspond to features (or predictor variables). Y is a column vector
of response values or class labels for each observation in X. X and Y
must have the same number of rows. FUN is a function handle, created
using @, that defines the criterion that sequentialfs uses to select
features and to determine when to stop. sequentialfs returns INMODEL, a
logical vector indicating which features are finally chosen.

Starting from an empty feature set, sequentialfs creates candidate
feature subsets by adding in each of the features not yet selected. For
each candidate feature subset, sequentialfs performs 10-fold
cross-validation by repeatedly calling FUN with different training and
test subsets of X and Y, as follows:

CRITERION = FUN(XTRAIN,YTRAIN,XTEST,YTEST)

XTRAIN and YTRAIN contain the same subset of rows of X and Y, while
XTEST and YTEST contain the complementary subset of rows. XTRAIN and
XTEST contain the data taken from the columns of X that correspond to
the current candidate feature set.
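As a minimal sketch of such a criterion function for a regression model (the name regf and the plain least-squares fit are illustrative choices, not part of sequentialfs):

```matlab
function dev = regf(XTRAIN, ytrain, XTEST, ytest)
    % Fit an ordinary least-squares model on the training fold.
    b = XTRAIN \ ytrain;
    % Predict the held-out fold and return the SUM of squared errors;
    % sequentialfs itself divides by the number of test observations,
    % so FUN should not compute a mean here.
    yfit = XTEST * b;
    dev  = sum((ytest - yfit).^2);
end
```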

Each time it is called, FUN must return a scalar value CRITERION.
Typically, FUN uses XTRAIN and YTRAIN to train or fit a model, then
predicts values for XTEST using that model, and finally returns some
measure of distance or loss of those predicted values from YTEST. In
the cross-validation calculation for a given candidate feature set,
sequentialfs sums the values returned by FUN across all test sets, and
divides that sum by the total number of test observations. It then uses
that mean value to evaluate each candidate feature subset. Two commonly
used loss measures for FUN are the sum of squared errors for regression
models (sequentialfs computes the mean squared error in this case), and
the number of misclassified observations for classification models
(sequentialfs computes the misclassification rate in this case).

Note: sequentialfs divides the sum of the values returned by FUN across
all test sets by the total number of test observations, therefore FUN
should not divide its output value by the number of test observations.

Given the mean CRITERION values for each candidate feature subset,
sequentialfs chooses the one that minimizes the mean CRITERION value.
This process continues until adding more features does not decrease the
criterion.
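Putting these rules together, a minimal forward-selection call might look like the following sketch (the data are synthetic and the anonymous criterion is one of many possible choices):

```matlab
% Synthetic data: only columns 2 and 5 actually drive the response.
X = randn(100, 8);
y = X(:,2) + 2*X(:,5) + 0.1*randn(100, 1);

% Criterion: sum of squared errors of a least-squares fit per fold.
fun = @(XT, yT, Xt, yt) sum((yt - Xt * (XT \ yT)).^2);

inmodel = sequentialfs(fun, X, y);   % 1-by-8 logical vector
```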

INMODEL = sequentialfs(FUN,X,Y,Z,...) allows any number of input
variables X, Y, Z, ... . sequentialfs chooses features (columns) only
from X, but otherwise imposes no interpretation on X, Y, Z, ... .
All data inputs, whether column vectors or matrices, must have the same
number of rows. sequentialfs calls FUN with training and test subsets
of X, Y, Z, ..., as follows:

CRITERION = FUN(XTRAIN,YTRAIN,ZTRAIN,...,XTEST,YTEST,ZTEST,...)

sequentialfs creates XTRAIN, YTRAIN, ZTRAIN, ... and XTEST, YTEST,
ZTEST, ... by selecting subsets of the rows of X, Y, Z, ... . FUN must
return a scalar value CRITERION, but may compute that value in any way.
Elements of the logical vector INMODEL correspond to columns of X, and
indicate which features are finally chosen.

[INMODEL,HISTORY] = sequentialfs(FUN,X,...) returns information on
which feature is chosen in each step. HISTORY is a scalar structure
with the following fields:

Crit    A vector containing the criterion values computed at
        each step.
In      A logical matrix in which row I indicates which
        features are included at step I.
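For instance (with a synthetic setup and an illustrative least-squares criterion), the HISTORY output can be inspected like this:

```matlab
X = randn(100, 8);
y = X(:,3) - X(:,7) + 0.1*randn(100, 1);
fun = @(XT, yT, Xt, yt) sum((yt - Xt * (XT \ yT)).^2);

[inmodel, history] = sequentialfs(fun, X, y);
history.Crit              % criterion value after each step
find(history.In(end,:))   % columns selected at the final step
```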

[...] = sequentialfs(..., 'PARAM1',val1, 'PARAM2',val2, ...) specifies
one or more of the following name/value pairs:

'CV'        The validation method used to compute the criterion for
            each candidate feature subset. Choices are:
            a positive integer K  - Use K-fold cross-validation
                                    (without stratification). K should
                                    be greater than one.
            a CVPARTITION object  - Perform cross-validation specified
                                    by the CVPARTITION object.
            'resubstitution'      - Use resubstitution, i.e., the
                                    original data are passed to FUN as
                                    both the training and test data to
                                    compute the criterion.
            'none'                - Call FUN as CRITERION =
                                    FUN(X,Y,Z,...), without separating
                                    test and training sets.
            The default value of 'CV' is 10, i.e., 10-fold
            cross-validation (without stratification).

So-called "wrapper" methods use a function FUN that
implements a learning algorithm. These methods usually
apply cross-validation to select features. So-called
"filter" methods use a function that measures the
characteristics (such as correlation) of the data to select
features.
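As a sketch of a filter-style criterion (correlation-based, with cross-validation turned off; the data and anonymous function are illustrative):

```matlab
X = randn(100, 8);
y = X(:,1) + 0.5*X(:,4) + 0.1*randn(100, 1);

% Filter criterion: higher absolute correlation with y is better, so
% negate it because sequentialfs minimizes the criterion.
fun = @(X, y) -sum(abs(corr(X, y)));

% 'CV','none' calls FUN on the full data; a stopping rule such as
% 'NFeatures' is needed because this criterion always decreases as
% features are added.
inmodel = sequentialfs(fun, X, y, 'CV', 'none', 'NFeatures', 3);
```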

'MCReps' A positive integer indicating the number of Monte-Carlo
repetitions for cross-validation. The default value is 1.
'MCReps' must be 1 if 'CV' is 'none' or 'resubstitution'.

'Direction' The direction in which to perform the sequential search.
The default is 'forward'. If 'Direction' is 'backward',
sequentialfs begins with a feature set including all
features and removes features sequentially until the
criterion increases.
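A backward-elimination call differs only in the name/value pair; a self-contained sketch with synthetic data:

```matlab
X = randn(100, 8);
y = 3*X(:,6) + 0.1*randn(100, 1);
fun = @(XT, yT, Xt, yt) sum((yt - Xt * (XT \ yT)).^2);

% Start from all eight features and remove them one at a time.
inmodel = sequentialfs(fun, X, y, 'Direction', 'backward');
```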

'KeepIn' A logical vector, or a vector of column numbers, specifying a
set of features which must be included. The default is
empty.

'KeepOut' A logical vector, or a vector of column numbers, specifying a
set of features which must be excluded. The default is
empty.

'NFeatures' The number of features at which sequentialfs should stop.
INMODEL includes exactly this many features. The default
value is empty, indicating that sequentialfs should stop
when a local minimum of the criterion is found. A
non-empty value for 'NFeatures' overrides 'MaxIter' and
'TolFun' in 'Options'.

'NullModel' A logical value, indicating whether or not the null model
(containing no features from X) should be included in the
feature selection procedure and in the HISTORY output. The
default is FALSE.

'Options' Options structure for the iterative sequential search
algorithm, as created by STATSET. sequentialfs uses the
following fields:

'Display' Level of display output. Choices are 'off' (the
default), 'final', and 'iter'.
'MaxIter' Maximum number of steps allowed. The default is
Inf.
'TolFun' Positive number giving the termination tolerance
for the criterion. The default is 1e-6 if
'Direction' is 'forward', or 0 if 'Direction' is
'backward'.
'TolTypeFun' 'abs', to use 'TolFun' as an absolute tolerance, or
'rel', to use it as a relative tolerance. The
default is 'rel'.
'UseParallel'
'UseSubstreams'
'Streams' These fields specify whether to perform cross-
validation computations in parallel, and how to use
random numbers during cross-validation.
For information on these fields see PARALLELSTATS.
NOTE: If supplied, 'Streams' must be of length one.

Examples:
% Perform sequential feature selection for CLASSIFY on iris data with
% noisy features and see which non-noise features are important
load('fisheriris');
X = randn(150,10);
X(:,[1 3 5 7]) = meas;
y = species;
opt = statset('Display','iter');
% Generating a stratified partition is usually preferred to
% evaluate classification algorithms.
cvp = cvpartition(y,'k',10);
[fs,history] = sequentialfs(@classf,X,y,'cv',cvp,'options',opt);

where CLASSF is a MATLAB function such as:

function err = classf(xtrain,ytrain,xtest,ytest)
    yfit = classify(xtest,xtrain,ytrain,'quadratic');
    err = sum(~strcmp(ytest,yfit));