Title: | Robust Functional Linear Regression |
---|---|
Description: | Functions for implementing robust methods for functional linear regression. Both scalar-on-function and function-on-function linear regression models are considered. For more details, see Beyaztas, U., and Shang, H. L. (2021) <arXiv:2111.01238> and Beyaztas, U., and Shang, H. L. (2022) <arXiv:2203.05065>. |
Authors: | Ufuk Beyaztas [aut, cre, cph] |
Maintainer: | Ufuk Beyaztas <[email protected]> |
License: | GPL-3 |
Version: | 1.2 |
Built: | 2025-03-01 05:22:24 UTC |
Source: | https://github.com/cran/robflreg |
This package provides robust methods for fitting and analyzing functional linear regression models.
Ufuk Beyaztas and Han Lin Shang
Maintainer: Ufuk Beyaztas <[email protected]>
B. Akturk, U. Beyaztas, H. L. Shang, A. Mandal (2024) Robust functional logistic regression, Advances in Data Analysis and Classification, in press.
U. Beyaztas, H. L. Shang and A. Mandal (2024) Robust function-on-function interaction regression, Statistical Modelling: An International Journal, in press.
U. Beyaztas, M. Tez and H. L. Shang (2024) Robust scalar-on-function partial quantile regression, Journal of Applied Statistics, in press.
U. Beyaztas and H. L. Shang (2023) Robust functional linear regression models, The R Journal, 15(1), 212-233.
M. Mutis, U. Beyaztas, G. G. Simsek and H. L. Shang (2023) A robust scalar-on-function logistic regression for classification, Communications in Statistics - Theory and Methods, 52(23), 8538-8554.
S. Saricam, U. Beyaztas, B. Asikgil and H. L. Shang (2022) On partial least-squares estimation in scalar-on-function regression models, Journal of Chemometrics, 36(12), e3452.
U. Beyaztas and H. L. Shang (2022) A comparison of parameter estimation in function-on-function regression, Communications in Statistics - Simulation and Computation, 51(8), 4607-4637.
U. Beyaztas and H. L. Shang (2022) A robust functional partial least squares for scalar-on-multiple-function regression, Journal of Chemometrics, 36(4), e3394.
U. Beyaztas, H. L. Shang and A. Alin (2022) Function-on-function partial quantile regression, Journal of Agricultural, Biological, and Environmental Statistics, 27(1), 149-174.
U. Beyaztas and H. L. Shang (2021) A partial least squares approach for function-on-function interaction regression, Computational Statistics, 36(2), 911-939.
U. Beyaztas and H. L. Shang (2021) A robust partial least squares approach for function-on-function regression, Brazilian Journal of Probability and Statistics, 36(2), 199-219.
U. Beyaztas and H. L. Shang (2021) Function-on-function linear quantile regression, Mathematical Modelling and Analysis, 27(2), 322-341.
U. Beyaztas and H. L. Shang (2020) On function-on-function regression: partial least squares approach, Environmental and Ecological Statistics, 27(1), 95-114.
U. Beyaztas and H. L. Shang (2019) Forecasting functional time series using weighted likelihood methodology, Journal of Statistical Computation and Simulation, 89(16), 3046-3060.
This function provides a unified simulation structure for the function-on-function regression model
$$Y(t) = \sum_{m=1}^{M} \int X_m(s) \beta_m(s, t) \, ds + \epsilon(t),$$
where $Y(t)$ denotes the functional response, $X_m(s)$ denotes the $m$-th functional predictor, $\beta_m(s, t)$ denotes the $m$-th bivariate regression coefficient function, and $\epsilon(t)$ is the error function.
generate.ff.data(n.pred, n.curve, n.gp, out.p = 0)
n.pred |
An integer, denoting the number of functional predictors to be generated. |
n.curve |
An integer, specifying the number of observations for each functional variable to be generated. |
n.gp |
An integer, denoting the number of grid points, i.e., a fine grid on the interval [0, 1]. |
out.p |
A numeric value between 0 and 1, denoting the proportion of outliers in the generated data. |
In the data generation process, the functional predictors are first simulated as finite linear combinations of smooth basis terms, where the random coefficients are generated from a Normal distribution with mean one and a variance governed by a uniformly generated random number between 1 and 4. The bivariate regression coefficient functions are generated from a coefficient space that includes ten different bivariate functions, whose amplitude parameter is generated from a uniform distribution between 1 and 3. The error function $\epsilon(t)$, on the other hand, is generated from an Ornstein-Uhlenbeck process driven by a Wiener process, with constant drift and diffusion parameters and a randomly generated initial value. If outliers are allowed in the generated data, i.e., out.p > 0, then a randomly selected out.p proportion of the observations is generated differently from the process described above. In more detail, the outlying observations are generated using bivariate regression coefficient functions (possibly different from the previously generated coefficient functions) whose amplitude parameter is instead generated from a uniform distribution between 1 and 2, and their functional predictors are generated from a modified version of the process above, again with coefficients drawn from a Normal distribution with mean one but with a different variance. All functions are generated at equally spaced points in the interval [0, 1].
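The error-curve step can be illustrated with a short Euler-Maruyama simulation of an Ornstein-Uhlenbeck path. This is only a sketch: the function name sim_ou and the constants (theta, mu, sigma, and the initial value) are arbitrary choices for illustration, not the values used internally by generate.ff.data().
# Illustrative Ornstein-Uhlenbeck error curve on [0, 1] via Euler-Maruyama.
# The parameter values below are arbitrary, not those used by generate.ff.data().
sim_ou <- function(n.gp, theta = 1, mu = 0, sigma = 0.5, eps0 = 0) {
  dt <- 1 / (n.gp - 1)
  eps <- numeric(n.gp)
  eps[1] <- eps0
  for (j in 2:n.gp) {
    # mean-reverting drift plus a Wiener-process increment
    eps[j] <- eps[j - 1] + theta * (mu - eps[j - 1]) * dt + sigma * sqrt(dt) * rnorm(1)
  }
  eps
}
err <- sim_ou(n.gp = 101)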
A list object with the following components:
Y |
An n.curve × n.gp matrix containing the observations of the generated functional response. |
X |
A list with length n.pred. The elements are the n.curve × n.gp matrices containing the observations of the generated functional predictors. |
f.coef |
A list with length n.pred. Each element is a matrix and contains the generated bivariate regression coefficient function. |
out.indx |
A vector with length n.curve × out.p containing the indices of the outlying observations. |
Ufuk Beyaztas and Han Lin Shang
E. Garcia-Portugues and J. Alvarez-Liebana and G. Alvarez-Perez and W. Gonzalez-Manteiga (2021) "A goodness-of-fit test for the functional linear model with functional response", Scandinavian Journal of Statistics, 48(2), 502-528.
library(fda)
library(fda.usc)
set.seed(2022)
sim.data <- generate.ff.data(n.pred = 5, n.curve = 200, n.gp = 101, out.p = 0.1)
Y <- sim.data$Y
X <- sim.data$X
coeffs <- sim.data$f.coef
out.indx <- sim.data$out.indx
fY <- fdata(Y, argvals = seq(0, 1, length.out = 101))
plot(fY[-out.indx,], lty = 1, ylab = "", xlab = "Grid point", main = "Response",
     mgp = c(2, 0.5, 0), ylim = range(fY))
lines(fY[out.indx,], lty = 1, col = "black") # Outlying functions
This function is used to simulate data for the scalar-on-function regression model
$$Y = \sum_{m=1}^{M} \int X_m(s) \beta_m(s) \, ds + \epsilon,$$
where $Y$ denotes the scalar response, $X_m(s)$ denotes the $m$-th functional predictor, $\beta_m(s)$ denotes the $m$-th regression coefficient function, and $\epsilon$ is the error process.
generate.sf.data(n, n.pred, n.gp, out.p = 0)
n |
An integer, specifying the number of observations for each variable to be generated. |
n.pred |
An integer, denoting the number of functional predictors to be generated. |
n.gp |
An integer, denoting the number of grid points, i.e., a fine grid on the interval [0, 1]. |
out.p |
A numeric value between 0 and 1, denoting the proportion of outliers in the generated data. |
In the data generation process, the functional predictors are first simulated as finite linear combinations of smooth basis terms, where the random coefficients are generated from a Normal distribution with mean one and a variance governed by a uniformly generated random number between 1 and 4. The regression coefficient functions are generated from a coefficient space that includes ten different functions, whose amplitude parameter is generated from a uniform distribution between 1 and 3. The error process is generated from the standard normal distribution. If outliers are allowed in the generated data, i.e., out.p > 0, then a randomly selected out.p proportion of the observations is generated differently from the process described above. In more detail, the outlying observations are generated using regression coefficient functions (possibly different from the previously generated coefficient functions) whose amplitude parameter is instead generated from a uniform distribution between 3 and 5, and their functional predictors are generated from a modified version of the process above, again with coefficients drawn from a Normal distribution with mean one but with a different variance. Moreover, in this case, the error process is generated from a normal distribution with mean 1 and variance 1. All functional predictors are generated at equally spaced points in the interval [0, 1].
A list object with the following components:
Y |
An n × 1 matrix containing the observations of the generated scalar response. |
X |
A list with length n.pred. The elements are the n × n.gp matrices containing the observations of the generated functional predictors. |
f.coef |
A list with length n.pred. Each element is a vector and contains the generated regression coefficient function. |
out.indx |
A vector with length n × out.p containing the indices of the outlying observations. |
Ufuk Beyaztas and Han Lin Shang
library(fda.usc)
library(fda)
set.seed(2022)
sim.data <- generate.sf.data(n = 400, n.pred = 5, n.gp = 101, out.p = 0.1)
Y <- sim.data$Y
X <- sim.data$X
coeffs <- sim.data$f.coef
out.indx <- sim.data$out.indx
plot(Y[-out.indx,], type = "p", pch = 16, xlab = "Index", ylab = "", main = "Response",
     ylim = range(Y))
points(out.indx, Y[out.indx,], type = "p", pch = 16, col = "blue") # Outliers
fX1 <- fdata(X[[1]], argvals = seq(0, 1, length.out = 101))
plot(fX1[-out.indx,], lty = 1, ylab = "", xlab = "Grid point", main = expression(X[1](s)),
     mgp = c(2, 0.5, 0), ylim = range(fX1))
lines(fX1[out.indx,], lty = 1, col = "black") # Leverage points
This function is used to obtain the estimated bivariate regression coefficient functions for a function-on-function regression model (see the description in rob.ff.reg) based on an output object obtained from rob.ff.reg.
get.ff.coeffs(object)
object |
The output object of rob.ff.reg. |
In the estimation of the bivariate regression coefficient functions, the estimated functional principal components of the response and predictor variables are combined with the estimated coefficient matrix obtained from the regression of the principal component scores of the response on those of the predictors; that is, each estimated coefficient surface is reconstructed from the predictor and response eigenfunctions and the corresponding matrix of score-regression coefficients.
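In common FPCA-based notation (a sketch; the symbols below are assumptions rather than the package's own notation), the reconstruction for the $m$-th predictor can be written as
$$\hat{\beta}_m(s, t) = \sum_{k=1}^{K_m} \sum_{l=1}^{L} \hat{b}_{m,kl} \, \hat{\phi}_{mk}(s) \, \hat{\psi}_l(t),$$
where $\hat{\phi}_{mk}(s)$ and $\hat{\psi}_l(t)$ are the estimated eigenfunctions of the $m$-th predictor and of the response, and $\hat{b}_{m,kl}$ are the estimated score-regression coefficients.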
A list object with the following components:
vars |
A numeric vector specifying the indices of functional predictors used in the function-on-function regression model |
gpY |
A vector containing the grid points of the functional response |
gpX |
A list with length equal to the number of functional predictors used in the model. The elements are vectors containing the grid points of the corresponding functional predictors. |
coefficients |
A list with length equal to the number of functional predictors used in the model. Each element is a matrix containing the estimated bivariate regression coefficient function for the corresponding functional predictor. |
Ufuk Beyaztas and Han Lin Shang
sim.data <- generate.ff.data(n.pred = 5, n.curve = 200, n.gp = 101)
Y <- sim.data$Y
X <- sim.data$X
gpY <- seq(0, 1, length.out = 101) # grid points of Y
gpX <- rep(list(seq(0, 1, length.out = 101)), 5) # grid points of Xs
model.fit <- rob.ff.reg(Y, X, model = "full", emodel = "classical", gpY = gpY, gpX = gpX)
coefs <- get.ff.coeffs(model.fit)
This function is used to obtain the estimated regression coefficient functions and the estimated coefficients of the scalar predictors (if any are included in the model) for a scalar-on-function regression model (see the description in rob.sf.reg) based on an output object obtained from rob.sf.reg.
get.sf.coeffs(object)
object |
The output object of rob.sf.reg. |
In the estimation of the regression coefficient functions, the estimated functional principal components of the predictor variables are combined with the estimated coefficients obtained from the regression of the scalar response on the principal component scores of the functional predictors; that is, each estimated coefficient function is reconstructed from the predictor eigenfunctions and the corresponding score-regression coefficients.
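In the same assumed notation as above (a sketch, not package-verbatim), the reconstruction for the $m$-th predictor is
$$\hat{\beta}_m(s) = \sum_{k=1}^{K_m} \hat{b}_{mk} \, \hat{\phi}_{mk}(s),$$
where $\hat{\phi}_{mk}(s)$ are the estimated eigenfunctions of the $m$-th predictor and $\hat{b}_{mk}$ are the estimated score-regression coefficients.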
A list object with the following components:
gp |
A list with length n.pred. The elements are vectors containing the grid points of the corresponding functional predictors. |
coefficients |
A list with length n.pred. Each element is a vector containing the estimated regression coefficient function for the corresponding functional predictor. |
scl.coefficients |
A vector consisting of the estimated coefficients of the scalar predictors (if any). |
Ufuk Beyaztas and Han Lin Shang
sim.data <- generate.sf.data(n = 400, n.pred = 5, n.gp = 101)
Y <- sim.data$Y
X <- sim.data$X
gp <- rep(list(seq(0, 1, length.out = 101)), 5) # grid points of Xs
model.fit <- rob.sf.reg(Y, X, emodel = "classical", gp = gp)
coefs <- get.sf.coeffs(model.fit)
This function is used to perform functional principal component analysis.
getPCA(data, nbasis, ncomp, gp, emodel = c("classical", "robust"))
data |
An n × p matrix containing the observations of the functional data, where n is the number of curves and p is the number of grid points. |
nbasis |
An integer specifying the number of B-spline basis expansion functions used to approximate the functional principal components. |
ncomp |
An integer specifying the number of functional principal components to be computed. |
gp |
A vector containing the grid points at which the functional data in data are observed. |
emodel |
Method to be used for functional principal component decomposition. Possibilities are "classical" and "robust". |
The functional principal component decomposition of a functional datum $X(t)$ is given by
$$X(t) = \bar{X}(t) + \sum_{k=1}^{K} \xi_k \phi_k(t),$$
where $\bar{X}(t)$ is the mean function, $\phi_k(t)$ is the $k$-th weight function, and $\xi_k$ is the corresponding principal component score, given by
$$\xi_k = \int \left( X(t) - \bar{X}(t) \right) \phi_k(t) \, dt.$$
When computing the estimated functional principal components, the functional data are first expressed in a B-spline basis expansion. Then, the functional principal components are obtained as the principal components extracted from the matrix $A W^{1/2}$, where $A$ is the matrix of basis expansion coefficients and $W$ is the inner product matrix of the basis functions, i.e., $W = \int \Phi(t) \Phi^\top(t) \, dt$. Finally, the $k$-th weight function is given by $\phi_k(t) = \Phi(t)^\top W^{-1/2} v_k$, where $v_k$ is the $k$-th eigenvector of the sample covariance matrix of $A W^{1/2}$.
If emodel = "classical"
, then, the standard functional principal component decomposition is used as given by
Ramsay and Dalzell (1991).
If emodel = "robust"
, then, the robust principal component algorithm of Hubert, Rousseeuw and Verboven (2002) is used.
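The basis-expansion construction described above can be sketched in a few lines with the fda package. This is a minimal illustration of the classical decomposition only, using toy data and a symmetric square root of the inner product matrix; it does not reproduce getPCA() (in particular, none of the robust machinery), and the object names are hypothetical.
library(fda)

n <- 100; p <- 101
gp <- seq(0, 1, length.out = p)
# toy functional data: each row is a curve observed at the grid points gp
data <- t(sapply(1:n, function(i) rnorm(1, 1, 0.2) * sin(2 * pi * gp) + rnorm(p, sd = 0.05)))

nbasis <- 20
bs <- create.bspline.basis(rangeval = range(gp), nbasis = nbasis)
A <- t(smooth.basis(gp, t(data), bs)$fd$coefs)   # n x nbasis matrix of basis coefficients
W <- inprod(bs, bs)                              # inner product matrix of the basis functions
eW <- eigen(W)
Whalf <- eW$vectors %*% diag(sqrt(eW$values)) %*% t(eW$vectors)        # W^{1/2}
Winvhalf <- eW$vectors %*% diag(1 / sqrt(eW$values)) %*% t(eW$vectors) # W^{-1/2}

ncomp <- 4
eig <- eigen(cov(A %*% Whalf))                   # PCA of A W^{1/2}
Phi <- eval.basis(gp, bs)                        # p x nbasis matrix of basis evaluations
weights <- Phi %*% Winvhalf %*% eig$vectors[, 1:ncomp]   # weight functions on the grid
Acent <- scale(A, center = TRUE, scale = FALSE)
scores <- Acent %*% Whalf %*% eig$vectors[, 1:ncomp]     # principal component scores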
A list object with the following components:
PCAcoef |
A functional data object for the eigenfunctions. |
PCAscore |
A matrix of principal component scores. |
meanScore |
A functional data object for the mean function. |
bs_basis |
A functional data object for B-spline basis expansion. |
evalbase |
A matrix of the B-spline basis expansion functions. |
ncomp |
An integer denoting the computed number of functional principal components. |
gp |
A vector containing the grid points at which the functional data are observed. |
emodel |
A character vector denoting the method used for functional principal component decomposition. |
Ufuk Beyaztas and Han Lin Shang
J. O. Ramsay and C. J. Dalzell (1991) "Some tools for functional data analysis (with discussion)", Journal of the Royal Statistical Society: Series B, 53(3), 539-572.
M. Hubert and P. J. Rousseeuw and S. Verboven (2002) "A fast robust method for principal components with applications to chemometrics", Chemometrics and Intelligent Laboratory Systems, 60(1-2), 101-111.
P. Filzmoser and H. Fritz and K. Kalcher (2021) pcaPP: Robust PCA by Projection Pursuit, R package version 1.9-74, URL: https://cran.r-project.org/web/packages/pcaPP/index.html.
J. L. Bali and G. Boente and D. E. Tyler and J.-L. Wang (2011) "Robust functional principal components: A projection-pursuit approach", The Annals of Statistics, 39(6), 2852-2882.
sim.data <- generate.ff.data(n.pred = 5, n.curve = 200, n.gp = 101)
Y <- sim.data$Y
gpY <- seq(0, 1, length.out = 101) # grid points
rob.fpca <- getPCA(data = Y, nbasis = 20, ncomp = 4, gp = gpY, emodel = "robust")
This function is used to compute the functional principal component scores of a test sample based on the output obtained from getPCA.
getPCA.test(object, data)
object |
An output object of getPCA. |
data |
An n × p matrix containing the observations of the test-sample functional data, observed at the same grid points as the data used in getPCA. |
See getPCA for details.
A matrix of principal component scores for the functional data.
Ufuk Beyaztas and Han Lin Shang
sim.data <- generate.ff.data(n.pred = 5, n.curve = 200, n.gp = 101)
Y <- sim.data$Y
Y.train <- Y[1:100,]
Y.test <- Y[101:200,]
gpY <- seq(0, 1, length.out = 101) # grid points
rob.fpca <- getPCA(data = Y.train, nbasis = 20, ncomp = 4, gp = gpY, emodel = "robust")
rob.fpca.test <- getPCA.test(object = rob.fpca, data = Y.test)
Hourly river flow measurements obtained from January 2009 to December 2014 (6 years in total) in the Mary River, Australia.
data(MaryRiverFlow)
Ufuk Beyaztas and Han Lin Shang
data(MaryRiverFlow)
# Plot
library(fda.usc)
fflow <- fdata(MaryRiverFlow, argvals = 1:24)
plot(fflow, lty = 1, ylab = "", xlab = "Hour", main = "", mgp = c(2, 0.5, 0), ylim = range(fflow))
This function is used to obtain image plots of the bivariate regression coefficient functions of a function-on-function regression model based on an output object obtained from get.ff.coeffs.
plot_ff_coeffs(object, b)
object |
The output object of get.ff.coeffs. |
b |
An integer indicating which bivariate regression coefficient function is to be plotted. |
No return value, called for side effects.
Ufuk Beyaztas and Han Lin Shang
D. Nychka and R. Furrer and J. Paige and S. Sain (2021) fields: Tools for spatial data. R package version 14.1, URL: https://github.com/dnychka/fieldsRPackage.
sim.data <- generate.ff.data(n.pred = 5, n.curve = 200, n.gp = 101)
Y <- sim.data$Y
X <- sim.data$X
gpY <- seq(0, 1, length.out = 101) # grid points of Y
gpX <- rep(list(seq(0, 1, length.out = 101)), 5) # grid points of Xs
model.fit <- rob.ff.reg(Y, X, model = "full", emodel = "classical", gpY = gpY, gpX = gpX)
coefs <- get.ff.coeffs(model.fit)
plot_ff_coeffs(object = coefs, b = 1)
This function is used to obtain plots of the regression coefficient functions of a scalar-on-function regression model based on an output object obtained from get.sf.coeffs.
plot_sf_coeffs(object, b)
object |
The output object of get.sf.coeffs. |
b |
An integer indicating which regression coefficient function is to be plotted. |
No return value, called for side effects.
Ufuk Beyaztas and Han Lin Shang
sim.data <- generate.sf.data(n = 400, n.pred = 5, n.gp = 101)
Y <- sim.data$Y
X <- sim.data$X
gp <- rep(list(seq(0, 1, length.out = 101)), 5) # grid points of Xs
model.fit <- rob.sf.reg(Y, X, emodel = "classical", gp = gp)
coefs <- get.sf.coeffs(model.fit)
plot_sf_coeffs(object = coefs, b = 1)
This function is used to make predictions for a new set of functional predictors based upon a fitted function-on-function regression model in the output of rob.ff.reg.
predict_ff_regression(object, Xnew)
object |
An output object obtained from rob.ff.reg. |
Xnew |
A list of matrices consisting of the new observations of the functional predictors. The argument Xnew must have the same length and structure as the input argument X used to fit the model in rob.ff.reg. |
A matrix of predicted functions of the response variable for the given set of new functional predictors Xnew. The number of rows of the matrix of predicted values equals the number of rows of the elements of Xnew, and its number of columns equals the number of columns of Y, the input in rob.ff.reg.
Ufuk Beyaztas and Han Lin Shang
set.seed(2022)
sim.data <- generate.ff.data(n.pred = 5, n.curve = 200, n.gp = 101, out.p = 0.1)
out.indx <- sim.data$out.indx
Y <- sim.data$Y
X <- sim.data$X
indx.test <- sample(c(1:200)[-out.indx], 60)
indx.train <- c(1:200)[-indx.test]
Y.train <- Y[indx.train,]
Y.test <- Y[indx.test,]
X.train <- X.test <- list()
for(i in 1:5){
  X.train[[i]] <- X[[i]][indx.train,]
  X.test[[i]] <- X[[i]][indx.test,]
}
gpY <- seq(0, 1, length.out = 101) # grid points of Y
gpX <- rep(list(seq(0, 1, length.out = 101)), 5) # grid points of Xs
model.MM <- rob.ff.reg(Y = Y.train, X = X.train, model = "full", emodel = "robust",
                       fmodel = "MM", gpY = gpY, gpX = gpX)
pred.MM <- predict_ff_regression(object = model.MM, Xnew = X.test)
round(mean((Y.test - pred.MM)^2), 4) # 0.5925 (MM method)
This function is used to make predictions for a new set of functional and scalar (if any) predictors based upon a fitted scalar-on-function regression model in the output of rob.sf.reg.
predict_sf_regression(object, Xnew, Xnew.scl = NULL)
object |
An output object obtained from rob.sf.reg. |
Xnew |
A list of matrices consisting of the new observations of the functional predictors. The argument Xnew must have the same length and structure as the input argument X used to fit the model in rob.sf.reg. |
Xnew.scl |
A matrix consisting of the new observations of the scalar predictors. The argument Xnew.scl must have the same structure as the input argument X.scl used to fit the model in rob.sf.reg. |
A matrix of predicted values of the scalar response variable for the given set of new functional and scalar (if any) predictors Xnew and Xnew.scl, respectively. The number of rows of the matrix of predicted values equals the number of rows of the elements of Xnew and of Xnew.scl (if any).
Ufuk Beyaztas and Han Lin Shang
set.seed(2022)
sim.data <- generate.sf.data(n = 400, n.pred = 5, n.gp = 101, out.p = 0.1)
out.indx <- sim.data$out.indx
indx.test <- sample(c(1:400)[-out.indx], 120)
indx.train <- c(1:400)[-indx.test]
Y <- sim.data$Y
X <- sim.data$X
Y.train <- Y[indx.train,]
Y.test <- Y[indx.test,]
X.train <- X.test <- list()
for(i in 1:5){
  X.train[[i]] <- X[[i]][indx.train,]
  X.test[[i]] <- X[[i]][indx.test,]
}
gp <- rep(list(seq(0, 1, length.out = 101)), 5) # grid points of Xs
model.tau <- rob.sf.reg(Y.train, X.train, emodel = "robust", fmodel = "tau", gp = gp)
pred.tau <- predict_sf_regression(object = model.tau, Xnew = X.test)
round(mean((Y.test - pred.tau)^2), 4) # 1.868 (tau method)
This function is used to fit both the classical and robust function-on-function regression model
$$Y(t) = \sum_{m=1}^{M} \int X_m(s) \beta_m(s, t) \, ds + \epsilon(t),$$
where $Y(t)$ denotes the functional response, $X_m(s)$ denotes the $m$-th functional predictor, $\beta_m(s, t)$ denotes the $m$-th bivariate regression coefficient function, and $\epsilon(t)$ is the error function.
rob.ff.reg(Y, X, model = c("full", "selected"), emodel = c("classical", "robust"),
  fmodel = c("MCD", "MLTS", "MM", "S", "tau"), nbasisY = NULL, nbasisX = NULL,
  gpY = NULL, gpX = NULL, ncompY = NULL, ncompX = NULL)
Y |
An n × p matrix containing the observations of the functional response Y, where n is the number of curves and p is the number of grid points. |
X |
A list whose elements are the matrices of observations of the functional predictors; the m-th element is an n × p_m matrix, where p_m is the number of grid points of the m-th functional predictor. |
model |
Model to be fitted. Possibilities are "full" and "selected". |
emodel |
Method to be used for functional principal component decomposition. Possibilities are "classical" and "robust". |
fmodel |
Fitting model used to estimate the function-on-function regression model. Possibilities are "MCD", "MLTS", "MM", "S", and "tau". |
nbasisY |
An integer value specifying the number of B-spline basis expansion functions to be used to approximate the functional principal components for the response variable Y. |
nbasisX |
A vector with length equal to the number of functional predictors; its m-th element specifies the number of B-spline basis expansion functions used to approximate the functional principal components of the m-th functional predictor. |
gpY |
A vector containing the grid points of the functional response Y. |
gpX |
A list with length equal to the number of functional predictors; its elements are vectors containing the grid points of the corresponding functional predictors. |
ncompY |
An integer specifying the number of functional principal components to be computed for the functional response Y. |
ncompX |
A vector with length equal to the number of functional predictors; its m-th element specifies the number of functional principal components to be computed for the m-th functional predictor. |
When fitting a function-on-function regression model based on functional principal component analysis, both the functional response and the functional predictors are first decomposed by the functional principal component analysis method:
$$Y(t) \approx \bar{Y}(t) + \sum_{l=1}^{L} \nu_l \psi_l(t), \qquad X_m(s) \approx \bar{X}_m(s) + \sum_{k=1}^{K_m} \xi_{mk} \phi_{mk}(s),$$
where $\bar{Y}(t)$ and $\bar{X}_m(s)$ are the mean functions, $\psi_l(t)$ and $\phi_{mk}(s)$ are the weight functions, and $\nu_l$ and $\xi_{mk}$ are the principal component scores for the functional response and the $m$-th functional predictor, respectively. Assume that the $m$-th bivariate regression coefficient function admits the expansion
$$\beta_m(s, t) = \sum_{k=1}^{K_m} \sum_{l=1}^{L} b_{m,kl} \, \phi_{mk}(s) \, \psi_l(t),$$
where $b_{m,kl} = \int \int \beta_m(s, t) \phi_{mk}(s) \psi_l(t) \, ds \, dt$. Then, the following multiple regression model is obtained for the principal component scores of the functional response:
$$\nu_l = \sum_{m=1}^{M} \sum_{k=1}^{K_m} b_{m,kl} \, \xi_{mk} + e_l.$$
If model = "full"
, then, all the functional predictor variables are used in the model.
If model = "selected"
, then, only the significant functional predictor variables determined by the forward variable selection procedure of Beyaztas and Shang (2021) are used in the model.
If emodel = "classical"
, then, the least-squares method is used to estimate the function-on-function regression model.
If emodel = "robust"
, then, the robust functional principal component analysis of Bali et al. (2011) along with the method specified in fmodel
is used to estimate the function-on-function regression model.
If fmodel = "MCD"
, then, the minimum covariance determinant estimator of Rousseeuw et al. (2004) is used to estimate the function-on-function regression model.
If fmodel = "MLTS"
, then, the multivariate least trimmed squares estimator of Agullo et al. (2008) is used to estimate the function-on-function regression model.
If fmodel = "MM"
, then, the MM estimator of Kudraszow and Maronna (2011) is used to estimate the function-on-function regression model.
If fmodel = "S"
, then, the S estimator of Bilodeau and Duchesne (2000) is used to estimate the function-on-function regression model.
If fmodel = "tau"
, then, the tau estimator of Ben et al. (2006) is used to estimate the function-on-function regression model.
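As a concrete (non-robust) illustration of the score-space regression described above, the following sketch fits the multiple regression of the response scores on the pooled predictor scores with ordinary least squares; rob.ff.reg() carries out these steps internally and, for emodel = "robust", replaces the lm() step with the chosen robust multivariate estimator. The intermediate object names (pcaY, pcaX, nuY, xiX, B) are hypothetical.
library(robflreg)

set.seed(1)
sim <- generate.ff.data(n.pred = 2, n.curve = 100, n.gp = 101)
gp <- seq(0, 1, length.out = 101)

# FPCA of the response and of each functional predictor (classical, for simplicity)
pcaY <- getPCA(data = sim$Y, nbasis = 20, ncomp = 4, gp = gp, emodel = "classical")
pcaX <- lapply(sim$X, getPCA, nbasis = 20, ncomp = 4, gp = gp, emodel = "classical")

# multiple regression of the response scores on the pooled predictor scores
nuY <- pcaY$PCAscore                                    # n x L matrix of response scores
xiX <- do.call(cbind, lapply(pcaX, `[[`, "PCAscore"))   # n x (sum K_m) matrix of predictor scores
score.fit <- lm(nuY ~ xiX)                              # least squares; robust fits replace this step
B <- coef(score.fit)                                    # estimated b coefficients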
A list object with the following components:
data |
A list of matrices including the original functional response and functional predictors. |
fitted.values |
An n × p matrix containing the fitted values of the functional response. |
residuals |
An n × p matrix containing the residual functions. |
fpca.results |
A list object containing the functional principal component analysis results of the functional response and functional predictor variables. |
model.details |
A list object containing model details, such as number of basis functions, number of principal components, and grid points used for each functional variable. |
Ufuk Beyaztas and Han Lin Shang
J. Agullo and C. Croux and S. V. Aelst (2008), "The multivariate least-trimmed squares estimator", Journal of Multivariate Analysis, 99(3), 311-338.
M. G. Ben and E. Martinez and V. J. Yohai (2006), "Robust estimation for the multivariate linear model based on a scale", Journal of Multivariate Analysis, 97(7), 1600-1622.
U. Beyaztas and H. L. Shang (2021), "A partial least squares approach for function-on-function interaction regression", Computational Statistics, 36(2), 911-939.
J. L. Bali and G. Boente and D. E. Tyler and J.-L. Wang (2011), "Robust functional principal components: A projection-pursuit approach", The Annals of Statistics, 39(6), 2852-2882.
M. Bilodeau and P. Duchesne (2000), "Robust estimation of the SUR model", The Canadian Journal of Statistics, 28(2), 277-288.
N. L. Kudraszow and R. A. Maronna (2011), "Estimates of MM type for the multivariate linear model", Journal of Multivariate Analysis, 102(9), 1280-1292.
P. J. Rousseeuw and K. V. Driessen and S. V. Aelst and J. Agullo (2004), "Robust multivariate regression", Technometrics, 46(3), 293-305.
sim.data <- generate.ff.data(n.pred = 5, n.curve = 200, n.gp = 101)
Y <- sim.data$Y
X <- sim.data$X
gpY <- seq(0, 1, length.out = 101) # grid points of Y
gpX <- rep(list(seq(0, 1, length.out = 101)), 5) # grid points of Xs
model.MM <- rob.ff.reg(Y = Y, X = X, model = "full", emodel = "robust", fmodel = "MM",
                       gpY = gpY, gpX = gpX)
This function is used to detect outliers in the functional response based on a fitted function-on-function regression model in the output of rob.ff.reg.
rob.out.detect(object, alpha = 0.01, B = 200, fplot = FALSE)
object |
An output object obtained from rob.ff.reg. |
alpha |
Percentile of the distribution of the functional depth. The default value is 0.01. |
B |
The number of bootstrap samples. The default value is 200. |
fplot |
If TRUE, a plot of the functional response with the detected outlying curves highlighted is produced. |
The functional depth-based outlier detection method of Febrero-Bande et al. (2008), together with the h-modal depth proposed by Cuevas et al. (2007), is applied to the estimated residual functions obtained from rob.ff.reg to determine the outliers in the response variable. This method makes it possible to detect both magnitude and shape outliers in the response variable (Hullait et al., 2021).
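A simplified sketch of the flagging idea follows: compute the h-modal depth of each residual curve with fda.usc and flag curves whose depth falls below a bootstrap-based cutoff. This is illustrative only and omits details (e.g., the smoothing and trimming used in the bootstrap) of the full procedure implemented in rob.out.detect(); the function name flag_outliers and its arguments are hypothetical.
library(fda.usc)

flag_outliers <- function(resid.mat, gp, alpha = 0.01, B = 200) {
  fres <- fdata(resid.mat, argvals = gp)
  dep <- depth.mode(fres)$dep            # h-modal depth of each residual curve
  # bootstrap cutoff: alpha-quantile of depths computed on resampled curves
  boot.q <- replicate(B, {
    idx <- sample(nrow(resid.mat), replace = TRUE)
    quantile(depth.mode(fdata(resid.mat[idx, ], argvals = gp))$dep, probs = alpha)
  })
  which(dep < median(boot.q))            # indices of curves flagged as outlying
}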
A vector containing the indices of outlying observations in the functional response.
Ufuk Beyaztas and Han Lin Shang
M. Febrero-Bande and P. Galeano and W. Gonzalez-Manteiga (2008), "Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels", Environmetrics, 19(4), 331-345.
A. Cuevas and M. Febrero and R. Fraiman (2007), "Robust estimation and classification for functional data via projection-based depth notions", Computational Statistics, 22(3), 481-496.
H. Hullait and D. S. Leslie and N. G. Pavlidis and S. King (2021), "Robust function-on-function regression", Technometrics, 63(3), 396-409.
sim.data <- generate.ff.data(n.pred = 5, n.curve = 200, n.gp = 101, out.p = 0.1)
out.indx <- sim.data$out.indx
Y <- sim.data$Y
X <- sim.data$X
gpY <- seq(0, 1, length.out = 101) # grid points of Y
gpX <- rep(list(seq(0, 1, length.out = 101)), 5) # grid points of Xs
model.MM <- rob.ff.reg(Y = Y, X = X, model = "full", emodel = "robust", fmodel = "MM",
                       gpY = gpY, gpX = gpX)
rob.out.detect(object = model.MM, fplot = TRUE)
sort(out.indx)
This function is used to fit both the classical and robust scalar-on-function regression model
$$Y = \sum_{m=1}^{M} \int X_m(s) \beta_m(s) \, ds + Z \gamma + \epsilon,$$
where $Y$ denotes the scalar response, $X_m(s)$ denotes the $m$-th functional predictor, $\beta_m(s)$ denotes the $m$-th regression coefficient function, $Z$ denotes the matrix of scalar predictors, $\gamma$ denotes the vector of coefficients for the scalar predictors, and $\epsilon$ is the error term, which is assumed to follow a standard normal distribution.
rob.sf.reg(Y, X, X.scl = NULL, emodel = c("classical", "robust"),
  fmodel = c("LTS", "MM", "S", "tau"), nbasis = NULL, gp = NULL, ncomp = NULL)
Y |
An n × 1 matrix containing the observations of the scalar response Y. |
X |
A list whose elements are the matrices of observations of the functional predictors; the m-th element is an n × p_m matrix, where p_m is the number of grid points of the m-th functional predictor. |
X.scl |
An n × k matrix containing the observations of the k scalar predictors, if any. |
emodel |
Method to be used for functional principal component decomposition. Possibilities are "classical" and "robust". |
fmodel |
Fitting model used to estimate the scalar-on-function regression model. Possibilities are "LTS", "MM", "S", and "tau". |
nbasis |
A vector with length equal to the number of functional predictors; its m-th element specifies the number of B-spline basis expansion functions used to approximate the functional principal components of the m-th functional predictor. |
gp |
A list with length equal to the number of functional predictors; its elements are vectors containing the grid points of the corresponding functional predictors. |
ncomp |
A vector with length equal to the number of functional predictors; its m-th element specifies the number of functional principal components to be computed for the m-th functional predictor. |
When fitting a scalar-on-function regression model based on functional principal component analysis, the functional predictors are first decomposed by the functional principal component analysis method:
$$X_m(s) \approx \bar{X}_m(s) + \sum_{k=1}^{K_m} \xi_{mk} \phi_{mk}(s),$$
where $\bar{X}_m(s)$ is the mean function, $\phi_{mk}(s)$ is the weight function, and $\xi_{mk}$ is the principal component score for the $m$-th functional predictor. Assume that the $m$-th regression coefficient function admits the expansion
$$\beta_m(s) = \sum_{k=1}^{K_m} b_{mk} \, \phi_{mk}(s),$$
where $b_{mk} = \int \beta_m(s) \phi_{mk}(s) \, ds$. Then, the following multiple regression model is obtained for the scalar response (with the scalar predictors, if any, entering the model linearly):
$$Y = b_0 + \sum_{m=1}^{M} \sum_{k=1}^{K_m} b_{mk} \, \xi_{mk} + \epsilon.$$
If emodel = "classical"
, then, the least-squares method is used to estimate the scalar-on-function regression model.
If emodel = "robust"
, then, the robust functional principal component analysis of Bali et al. (2011) along with the method specified in fmodel
is used to estimate the scalar-on-function regression model.
If fmodel = "LTS"
, then, the least trimmed squares robust regression of Rousseeuw (1984) is used to estimate the scalar-on-function regression model.
If fmodel = "MM"
, then, the MM-type regression estimator described in Yohai (1987) and Koller and Stahel (2011) is used to estimate the scalar-on-function regression model.
If fmodel = "S"
, then, the S estimator is used to estimate the scalar-on-function regression model.
If fmodel = "tau"
, then, the tau estimator proposed by Salibian-Barrera et al. (2008) is used to estimate the scalar-on-function regression model.
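A minimal illustration of the scalar-on-function score regression follows: the scalar response is regressed on the pooled functional principal component scores, with lm() for the classical fit and, as one robust alternative, an MM-type fit from the robustbase package; rob.sf.reg() performs these steps internally with the estimator selected via fmodel. The intermediate object names (pcaX, scores, fit.ls, fit.mm) are hypothetical.
library(robflreg)
library(robustbase)   # used here only for the illustrative MM-type fit

set.seed(1)
sim <- generate.sf.data(n = 200, n.pred = 2, n.gp = 101)
gp <- seq(0, 1, length.out = 101)

pcaX <- lapply(sim$X, getPCA, nbasis = 20, ncomp = 4, gp = gp, emodel = "robust")
scores <- do.call(cbind, lapply(pcaX, `[[`, "PCAscore"))  # n x (sum K_m) predictor scores

y <- as.numeric(sim$Y)
fit.ls <- lm(y ~ scores)      # classical least squares (emodel = "classical")
fit.mm <- lmrob(y ~ scores)   # one robust alternative (an MM-type estimator)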
A list object with the following components:
data |
A list of matrices including the original scalar response and both the scalar and functional predictors. |
fitted.values |
An n × 1 matrix containing the fitted values of the scalar response. |
residuals |
An n × 1 matrix containing the residuals. |
fpca.results |
A list object containing the functional principal component analysis results of the functional predictor variables. |
model.details |
A list object containing model details, such as number of basis functions, number of principal components, and grid points used for each functional predictor variable. |
Ufuk Beyaztas and Han Lin Shang
J. L. Bali and G. Boente and D. E. Tyler and J.-L. Wang (2011), "Robust functional principal components: A projection-pursuit approach", The Annals of Statistics, 39(6), 2852-2882.
P. J. Rousseeuw (1984), "Least median of squares regression", Journal of the American Statistical Association, 79(388), 871-881.
P. J. Rousseeuw and K. van Driessen (1999) "A fast algorithm for the minimum covariance determinant estimator", Technometrics, 41(3), 212-223.
V. J. Yohai (1987), "High breakdown-point and high efficiency estimates for regression", The Annals of Statistics, 15(2), 642-656.
M. Koller and W. A. Stahel (2011), "Sharpening Wald-type inference in robust regression for small samples", Computational Statistics & Data Analysis, 55(8), 2504-2515.
M. Salibian-Barrera and G. Willems and R. Zamar (2008), "The fast-tau estimator for regression", Journal of Computational and Graphical Statistics, 17(3), 659-682.
sim.data <- generate.sf.data(n = 400, n.pred = 5, n.gp = 101)
Y <- sim.data$Y
X <- sim.data$X
gp <- rep(list(seq(0, 1, length.out = 101)), 5) # grid points of Xs
model.tau <- rob.sf.reg(Y, X, emodel = "robust", fmodel = "tau", gp = gp)