% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cv_similarity.R
\name{cv_similarity}
\alias{cv_similarity}
\title{Compute similarity measures to evaluate possible extrapolation in testing folds}
\usage{
cv_similarity(
  cv,
  x,
  r,
  num_plot = seq_along(cv$folds_list),
  method = "MESS",
  num_sample = 10000L,
  jitter_width = 0.1,
  points_size = 2,
  points_alpha = 0.7,
  points_colors = NULL,
  progress = TRUE
)
}
\arguments{
\item{cv}{a blockCV cv_* object: one of \code{cv_spatial}, \code{cv_cluster}, \code{cv_buffer}
or \code{cv_nndm}.}

\item{x}{a simple features (sf) or SpatialPoints object of the spatial sample data used for creating
the \code{cv} object.}

\item{r}{a terra SpatRaster object of the environmental predictors that will be used for modelling. It
is used to calculate the similarity between the training and testing points.}

\item{num_plot}{a vector of indices of the folds to plot.}

\item{method}{the similarity method; one of MESS, L1 or L2. See the Details section.}

\item{num_sample}{number of random samples drawn from the raster to calculate similarity distances (only for L1 and L2).}

\item{jitter_width}{numeric; the width of the jitter applied to the points.}

\item{points_size}{numeric; the size of points.}

\item{points_alpha}{numeric; the opacity of points.}

\item{points_colors}{character; a character vector of colours for points.}

\item{progress}{logical; whether to show a progress bar for random fold selection.}
}
\value{
a ggplot object
}
\description{
This function evaluates environmental similarity between training and testing folds,
helping to detect potential extrapolation in the testing data. It supports three
similarity measures: Multivariate Environmental Similarity Surface (MESS), Manhattan
distance (L1), and Euclidean distance (L2).
}
\details{
The MESS is calculated as described in Elith et al. (2010). MESS represents
how similar a point in a testing fold is to a training fold (as a reference
set of points), with respect to a set of predictor variables in \code{r}.
Negative values mark sites where at least one variable has a value outside the range of
environments in the reference set; these are novel environments.

When using the L1 (Manhattan) or L2 (Euclidean) distance options (experimental), the
function performs the following steps for each test sample:

\enumerate{
\item Calculates the minimum distance between each test sample and all training samples
   in the same fold using the selected metric (L1 or L2).
\item Calculates a baseline distance: the average of the minimum distances between a set
   of random background samples (defined by \code{num_sample}) from the raster and all training/test
   samples combined.
\item Computes a similarity score by subtracting the test sample’s minimum distance from
   the baseline average. A higher score indicates the test sample is more similar to
   the training data, while lower or negative scores indicate novelty.
}

This provides a simple, distance-based novelty metric, useful for assessing
extrapolation or dissimilarity in prediction scenarios. Note that this approach is
experimental.
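
The three steps above can be sketched in base R as follows. This is only an illustrative sketch, not the
code used by \code{cv_similarity}: the helper \code{similarity_sketch} and its arguments are hypothetical,
the inputs are assumed to be numeric matrices of predictor values (one row per sample), and no scaling of
the predictors is applied.

\preformatted{
# Hypothetical helper, not part of blockCV: p = 1 gives L1, p = 2 gives L2.
similarity_sketch <- function(train, test, background, p = 2) {
  # distance from each row of `a` to its nearest row in `b`
  min_dist <- function(a, b) {
    apply(a, 1, function(row) {
      d <- abs(sweep(b, 2, row))   # absolute differences per predictor
      min(rowSums(d^p)^(1 / p))    # nearest neighbour under the chosen metric
    })
  }
  test_min <- min_dist(test, train)                           # step 1
  baseline <- mean(min_dist(background, rbind(train, test)))  # step 2
  baseline - test_min                                         # step 3
}

# toy values, made up for illustration
train <- matrix(c(0, 0, 1, 1), ncol = 2, byrow = TRUE)
test <- matrix(c(0, 1, 5, 5), ncol = 2, byrow = TRUE)
background <- matrix(c(0, 0, 2, 2), ncol = 2, byrow = TRUE)
similarity_sketch(train, test, background, p = 1)
# scores: 0 and -7
}

In this toy case the second test point lies far from both training points, so its score is strongly
negative, which is the pattern the function reports as novelty.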
}
\examples{
\donttest{
library(blockCV)

# import presence-absence species data
points <- read.csv(system.file("extdata/", "species.csv", package = "blockCV"))
# make an sf object from data.frame
pa_data <- sf::st_as_sf(points, coords = c("x", "y"), crs = 7845)

# load raster data
path <- system.file("extdata/au/", package = "blockCV")
files <- list.files(path, full.names = TRUE)
covars <- terra::rast(files)

# hexagonal spatial blocking by specified size and random assignment
sb <- cv_spatial(x = pa_data,
                 column = "occ",
                 size = 450000,
                 k = 5,
                 iteration = 1)

# compute similarity between training and testing folds to check for extrapolation
cv_similarity(cv = sb, r = covars, x = pa_data)
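
# the experimental L1 and L2 options described in Details use the same interface;
# this call assumes the method is selected with the string "L2", and it limits the
# plot to the first two folds with a smaller background sample
cv_similarity(cv = sb, r = covars, x = pa_data,
              method = "L2", num_plot = 1:2, num_sample = 5000L)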

}
}
\references{
Elith, J., Kearney, M., & Phillips, S. (2010). The art of modelling range-shifting species. Methods in Ecology and Evolution, 1(4), 330–342.
}
\seealso{
\code{\link{cv_spatial}}, \code{\link{cv_cluster}}, \code{\link{cv_buffer}}, and \code{\link{cv_nndm}}
}
