This vignette introduces the smap family of functions.
These functions are useful when you want to apply custom
igraph-based operations to glycan structure vectors.
This guide assumes you are comfortable with R programming and have some familiarity with graph concepts. If you are just getting started, read the “Getting Started with glyrepr” vignette first.
Before using smap, it helps to understand why these
functions exist.
Working with glycan structures means working with graphs, and graph operations are computationally expensive. When you are analyzing thousands of glycans from a large-scale study, this becomes a real bottleneck.
glyrepr implements an optimization called unique
structure storage. Instead of storing thousands of identical
graphs, it stores only the unique ones and keeps track of which original
positions they belong to.
Let’s see this in action:
# Our test data: some common glycan structures
iupacs <- c(
"Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-", # N-glycan core
"Gal(b1-3)GalNAc(a1-", # O-glycan core 1
"Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-", # O-glycan core 2
"Man(a1-3)[Man(a1-6)]Man(a1-3)[Man(a1-6)]Man(a1-", # Branched mannose
"GlcNAc6Ac(b1-4)Glc3Me(a1-" # With decorations
)
struc <- as_glycan_structure(iupacs)
# Now create a realistic dataset with lots of repetition.
large_struc <- rep(struc, 1000) # 5,000 total structures
large_struc
#> <glycan_structure[5000]>
#> [1] Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-
#> [2] Gal(b1-3)GalNAc(a1-
#> [3] Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-
#> [4] Man(a1-3)[Man(a1-6)]Man(a1-3)[Man(a1-6)]Man(a1-
#> [5] GlcNAc6Ac(b1-4)Glc3Me(a1-
#> [6] Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-
#> [7] Gal(b1-3)GalNAc(a1-
#> [8] Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-
#> [9] Man(a1-3)[Man(a1-6)]Man(a1-3)[Man(a1-6)]Man(a1-
#> [10] GlcNAc6Ac(b1-4)Glc3Me(a1-
#> ... (4990 more not shown)
#> # Unique structures: 5
Notice that the object reports only 5 unique structures. The vector has 5,000 elements, but only 5 unique graphs are stored internally.
We can verify that directly:
library(lobstr)
#>
#> Attaching package: 'lobstr'
#> The following object is masked from 'package:dplyr':
#>
#> src
obj_sizes(struc, large_struc)
#> * 15.18 kB
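#> * 40.69 kB
The memory saving is substantial: large_struc holds 1,000 times as many elements as struc, yet it takes less than three times the memory (40.69 kB versus 15.18 kB). For repeated structures, the optimized representation is much smaller than storing every graph independently.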
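The idea behind unique structure storage can be sketched in base R. This is only an illustration under simplifying assumptions: it uses plain strings instead of graphs, the names `unique_values` and `codes` are hypothetical, and it is not a description of how glyrepr stores its data internally.

```r
# Hypothetical illustration with strings standing in for graphs.
values <- rep(c("core", "O1", "O2"), times = 1000)  # 3,000 elements, 3 unique

# Store each distinct value once, plus one integer code per position.
unique_values <- unique(values)
codes <- match(values, unique_values)

# The full vector can be rebuilt from the compact form at any time.
restored <- unique_values[codes]
identical(restored, values)  # TRUE
length(unique_values)        # 3
```

The compact form grows with the number of positions only through the integer codes, which is why repetition is cheap.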
smap Family
There is one important consequence of this internal representation:
regular lapply() or purrr::map() functions do
not operate directly on a glycan structure vector as if it were a list
of graphs.
# This will not work and will raise an error.
tryCatch(
purrr::map_int(large_struc, ~ igraph::vcount(.x)),
error = function(e) cat("Error:", rlang::cnd_message(e))
)
#> Error: ℹ In index: 1.
#> Caused by error in `ensure_igraph()`:
#> ! Must provide a graph object (provided wrong object type).
Why does this fail? Because purrr
functions don’t understand the internal structure optimization of
glycan_structure objects.
The smap functions are structure-aware alternatives to
purrr mapping functions. They understand the unique
structure optimization and work directly with the underlying graph
objects.
vertex_counts <- smap_int(large_struc, ~ igraph::vcount(.x))
vertex_counts[1:10]
#> [1] 5 2 3 5 2 5 2 3 5 2
The “s” stands for “structure”: these functions operate on the
underlying igraph objects that represent glycan
structures.
smap Toolkit
The smap family provides glycan-aware equivalents for
the purrr mapping and predicate functions listed below:
| purrr | smap | purrr | smap |
|---|---|---|---|
| `map()` | `smap()` | `map2()` | `smap2()` |
| `map_lgl()` | `smap_lgl()` | `map2_lgl()` | `smap2_lgl()` |
| `map_int()` | `smap_int()` | `map2_int()` | `smap2_int()` |
| `map_dbl()` | `smap_dbl()` | `map2_dbl()` | `smap2_dbl()` |
| `map_chr()` | `smap_chr()` | `map2_chr()` | `smap2_chr()` |
| `some()` | `ssome()` | `pmap()` | `spmap()` |
| `every()` | `severy()` | `pmap_*()` | `spmap_*()` |
| `none()` | `snone()` | `imap()` | `simap()` |
| | | `imap_*()` | `simap_*()` |
As a simple rule, replace map with smap,
pmap with spmap, and imap with
simap. The function signatures are designed to feel
familiar if you already use purrr.
Count vertices in each structure:
vertex_counts <- smap_int(large_struc, igraph::vcount)
summary(vertex_counts)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 2.0 2.0 3.0 3.4 5.0 5.0
Find structures with more than 4 vertices:
has_many_vertices <- smap_lgl(large_struc, ~ igraph::vcount(.x) > 4)
sum(has_many_vertices)
#> [1] 2000
Get the degree sequence of each structure:
degree_sequences <- smap(large_struc, ~ igraph::degree(.x))
degree_sequences[1:3]
#> [[1]]
#> 1 2 3 4 5
#> 1 1 3 2 1
#>
#> [[2]]
#> 1 2
#> 1 1
#>
#> [[3]]
#> 1 2 3
#> 1 1 2
Check if any structure has isolated vertices:
Verify all structures are connected:
smap()
Quick examples of the extended family:
# smap2: Apply function with additional parameters
thresholds <- c(3, 4, 5)
large_enough <- smap2_lgl(struc[1:3], thresholds, function(g, threshold) {
igraph::vcount(g) >= threshold
})
large_enough
#> [1] TRUE FALSE FALSE
# simap: Include position information
indexed_report <- simap_chr(large_struc[1:3], function(g, i) {
paste0("#", i, ": ", igraph::vcount(g), " vertices")
})
indexed_report
#> [1] "#1: 5 vertices" "#2: 2 vertices" "#3: 3 vertices"
Performance note: simap functions do
not benefit from the unique structure optimization. Since each element
has a different index, the combination of
(structure, index) is always unique, breaking the
deduplication that makes other smap functions fast. Use
simap only when you truly need position information.
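The reason the index defeats deduplication can be seen in a small base-R sketch. It uses strings in place of structures and is only meant to illustrate the counting argument, not glyrepr's implementation.

```r
# Strings stand in for structures: 12 elements, 3 unique.
keys <- rep(c("a", "bb", "ccc"), times = 4)

# Without the index, only 3 distinct inputs need to be evaluated.
length(unique(keys))   # 3

# With the index attached, every (key, index) pair is distinct,
# so there is nothing left to deduplicate.
pairs <- paste(keys, seq_along(keys), sep = "@")
length(unique(pairs))  # 12
```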
The main performance benefit of smap functions comes
from automatic deduplication:
# Create a large dataset with high redundancy
huge_struc <- rep(struc, 5000) # 25,000 structures, only 5 unique
cat("Dataset size:", length(huge_struc), "structures\n")
#> Dataset size: 25000 structures
# Count distinct structures with unique(); this assumes unique() is supported for glycan structure vectors.
cat("Unique structures:", length(unique(huge_struc)), "\n")
#> Unique structures: 5
cat("Redundancy factor:", length(huge_struc) / length(unique(huge_struc)), "x\n")
#> Redundancy factor: 5000 x
library(tictoc)
# Optimized approach: smap only processes 5 unique structures
tic("smap_int (optimized)")
vertex_counts_optimized <- smap_int(huge_struc, igraph::vcount)
toc()
#> smap_int (optimized): 0.002 sec elapsed
# Naive approach: extract all graphs and process each one
tic("Naive approach (all graphs)")
all_graphs <- get_structure_graphs(huge_struc) # Extracts all 25,000 graphs
vertex_counts_naive <- purrr::map_int(all_graphs, igraph::vcount)
toc()
#> Naive approach (all graphs): 0.076 sec elapsed
# Verify results are equivalent (though data types may differ)
all.equal(vertex_counts_optimized, vertex_counts_naive)
#> [1] TRUE
The higher the redundancy, the larger the performance gain: in the run above, smap_int took 0.002 seconds, while the naive approach took 0.076 seconds. In real glycoproteomics datasets with repeated structures, this optimization can provide speedups of about 10x.
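The deduplication pattern itself can be sketched in a few lines of base R. The helper `dedup_map_int()` and the counter `calls` are hypothetical names introduced for this illustration; they are not part of glyrepr, and the sketch does not claim to mirror how smap is implemented.

```r
# Hypothetical helper: evaluate f once per unique key, then expand the results.
dedup_map_int <- function(keys, f) {
  unique_keys <- unique(keys)
  unique_results <- vapply(unique_keys, f, integer(1), USE.NAMES = FALSE)
  unique_results[match(keys, unique_keys)]
}

# Count how often the mapped function is actually called.
calls <- 0L
count_chars <- function(key) {
  calls <<- calls + 1L
  nchar(key)
}

keys <- rep(c("a", "bb", "ccc", "dddd", "eeeee"), times = 5000)
result <- dedup_map_int(keys, count_chars)

length(result)  # 25000
calls           # 5
```

The function runs 5 times instead of 25,000, so the saving grows with both the redundancy of the data and the cost of each call.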
The function you pass to smap must accept an
igraph object as its first argument. You can use
purrr-style lambda notation, as in the earlier examples, or a regular function:
# Create a compact structure summary.
structure_analysis <- smap(large_struc, function(g) {
list(
vertices = igraph::vcount(g),
edges = igraph::ecount(g),
diameter = if (igraph::is_connected(g)) igraph::diameter(g) else NA_real_, # scalar test, so if/else rather than ifelse()
clustering = igraph::transitivity(g, type = "global")
)
})
# Convert to a more usable format
analysis_df <- do.call(rbind, lapply(structure_analysis, data.frame))
head(analysis_df)
#> vertices edges diameter clustering
#> 1 5 4 3 0
#> 2 2 1 1 NaN
#> 3 3 2 1 0
#> 4 5 4 2 0
#> 5 2 1 1 NaN
#> 6 5 4 3 0
smap Functions
Use smap functions when:
- You want to apply igraph-based functions to glycan structures.
Use regular R functions when:
Special note on simap:
While simap functions are convenient for position-aware
operations, they do not provide performance benefits over regular
imap functions. The inclusion of index information breaks
the unique structure optimization, making each
(structure, index) pair unique even when structures are
identical.
Here’s how you might build a custom glycan analysis pipeline using
smap functions:
# Custom motif detector
detect_branching <- function(g) {
degrees <- igraph::degree(g)
any(degrees >= 3)
}
# Apply to a large dataset using unique structure optimization.
has_branching <- smap_lgl(large_struc, detect_branching)
cat("Structures with branching:", sum(has_branching), "out of", length(large_struc), "\n")
#> Structures with branching: 2000 out of 5000
# Use smap2 to check structures against complexity thresholds
complexity_thresholds <- rep(c(3, 4, 5, 2, 4), 1000) # Thresholds for each structure
meets_threshold <- smap2_lgl(large_struc, complexity_thresholds, function(g, threshold) {
igraph::vcount(g) >= threshold
})
cat("Structures meeting complexity threshold:", sum(meets_threshold), "out of", length(large_struc), "\n")
#> Structures meeting complexity threshold: 2000 out of 5000
The smap family provides structure-aware mapping
functions for glycan structure vectors. It lets you write custom
graph-based analyses while preserving the unique structure optimization
used by glyrepr.
Key takeaways:
- smap functions are purrr-like tools that understand glycan structure vectors.
- Use smap for structures, and use regular R or purrr functions for other data types.
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: aarch64-apple-darwin23
#> Running under: macOS Tahoe 26.5.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Asia/Shanghai
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] tictoc_1.2.1 lobstr_1.2.1 dplyr_1.2.1 tibble_3.3.1 purrr_1.2.2
#> [6] glyrepr_0.12.1
#>
#> loaded via a namespace (and not attached):
#> [1] jsonlite_2.0.0 compiler_4.6.0 tidyselect_1.2.1 stringr_1.6.0
#> [5] jquerylib_0.1.4 yaml_2.3.12 fastmap_1.2.0 R6_2.6.1
#> [9] generics_0.1.4 igraph_2.3.2 knitr_1.51 backports_1.5.1
#> [13] checkmate_2.3.4 rstackdeque_1.1.1 bslib_0.11.0 pillar_1.11.1
#> [17] rlang_1.2.0 utf8_1.2.6 cachem_1.1.0 stringi_1.8.7
#> [21] xfun_0.58 sass_0.4.10 otel_0.2.0 cli_3.6.6
#> [25] magrittr_2.0.5 digest_0.6.39 lifecycle_1.0.5 prettyunits_1.2.0
#> [29] vctrs_0.7.3 evaluate_1.0.5 glue_1.8.1 rmarkdown_2.31
#> [33] tools_4.6.0 pkgconfig_2.0.3 htmltools_0.5.9