The Soil Data Access (SDA) system provides REST-based access to the SSURGO and STATSGO2 databases maintained by USDA-NRCS. Instead of downloading large databases, SDA allows you to query exactly what you need via a web service.
soilDB provides several functions that help with
generating requests to the web service at a low level, as well as
higher-level functions that can return complex data structures like the
SoilProfileCollection from the aqp package or
spatial object types.
SDA provides access to a SQL Server database containing a variety of soil information.
The SDA REST endpoint is available at:
https://sdmdataaccess.sc.egov.usda.gov/tabular/post.rest
All queries to SDA are executed through the SDA_query()
function in soilDB, which handles the HTTP POST requests
and JSON response parsing automatically.
SDA contains both the detailed Soil Survey Geographic Database (SSURGO) and the generalized U.S. General Soil Map (STATSGO2).
SSURGO legends are identified by soil survey area symbols (e.g. CA630, IN005), while STATSGO2 records use the area symbol 'US'. See the Data Filters section below for details on filtering out STATSGO2 data.
Use SDA for current data from NRCS without manual downloads, queries across multiple survey areas, quick one-time analyses, or access to interpretations and aggregated properties. Use local SSURGO databases for repeated queries to the same survey areas (faster), offline access, complete survey databases with all tables, or direct SQLite manipulation.
The same get_SDA_*() functions documented in this
vignette can be applied to local SSURGO databases by passing the
dsn parameter. See the Local
SSURGO vignette for downloading and creating local SSURGO GeoPackage
databases, and for examples of using these functions with local
data.
SDA_query() Function

The SDA_query() function executes SQL queries directly against the SDA database:
# Simple query to get the first 5 map units from an area
q <- "
SELECT TOP 5
mapunit.mukey,
mapunit.nationalmusym,
legend.areasymbol,
mapunit.musym,
mapunit.muname
FROM legend
INNER JOIN mapunit ON mapunit.lkey = legend.lkey
WHERE legend.areasymbol = 'CA630'
ORDER BY mapunit.musym
"
result <- SDA_query(q)
head(result)

If SDA_query() errors for any reason, the result will not be a data.frame with data. Instead, it will be an object of class "try-error". You can handle the possibility of error using inherits(). For example:
result <- SDA_query(q)
# check if the result is an error response
if (!inherits(result, 'try-error')){
head(result)
} else {
# stop with error message extracted from result
  stop("Error occurred! ", result[1])
}

You can set up a loop to retry a query that fails in the event of intermittent network or server issues. You should wait briefly (Sys.sleep()) between requests.
For repeated failures, it is best to increase the wait time between each try, up to a maximum of, for example, 3 attempts. The following example uses “exponential backoff”, which increases the wait time between requests from ~3 seconds to ~20 seconds over the course of several attempts.
n_tries <- 1
while (n_tries <= 3){
result <- SDA_query(q)
# check if the result is an error response
if (!inherits(result, 'try-error')){
head(result)
# if no error, break out of the loop
break
} else {
# on error, increment the number of tries
# and sleep (time in seconds)
Sys.sleep(exp(n_tries))
n_tries <- n_tries + 1
if (n_tries > 3) {
    stop("Error occurred! ", result[1])
}
}
}

SDA uses the Microsoft SQL Server Transact-SQL (T-SQL) dialect. Key differences between T-SQL and SQLite (and other SQL dialects) that are relevant to queries used in soilDB include:
| Feature | T-SQL | SQLite | Notes |
|---|---|---|---|
| Row limit | `TOP 10` | `LIMIT 10` | `TOP` is placed after `SELECT`; `LIMIT` goes at the end of the query |
| NULL functions | `ISNULL(col, 0)` | `IFNULL(col, 0)` | Different function names |
| String concatenation | `col1 + col2` | `col1 \|\| col2` | Different operators |
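For example, the same row-limited query expressed in each dialect (a sketch using the mapunit table columns shown throughout this document):

```sql
-- T-SQL (remote SDA): TOP goes immediately after SELECT
SELECT TOP 10 mukey, muname FROM mapunit;

-- SQLite (local SSURGO database): LIMIT goes at the end
SELECT mukey, muname FROM mapunit LIMIT 10;
```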
When using SDA_query() with a local SQLite
database (via dsn parameter), soilDB
automatically converts some T-SQL syntax to SQLite equivalents to allow
for both to work. This behavior is triggered when the dsn
argument is specified. You can view the queries (and save them to run
yourself) by passing the query_string=TRUE argument.
Many custom SSURGO queries follow a standard pattern of relationships: from legend, to mapunit, to component, and then to a table related to component, such as the component horizon (chorizon) table.
In the SSURGO data model, primary and foreign key physical column names all end with the word "key". The primary key is a unique identifier for a particular record.
Foreign keys are references to primary keys of other tables and define the relationships between tables that can be used for joins.
Here we select a handful of values from all of the tables mentioned above, corresponding to the first 10 component horizon (chorizon) records in SDA.
# Get map unit, component, and horizon data
q <- "
SELECT TOP 10
mu.mukey,
leg.areasymbol,
mu.musym,
co.compname,
co.comppct_r,
ch.hzname,
ch.hzdept_r,
ch.hzdepb_r,
ch.claytotal_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
LEFT JOIN chorizon AS ch ON ch.cokey = co.cokey
WHERE leg.areasymbol = 'CA630'
AND co.majcompflag = 'Yes'
AND ch.claytotal_r IS NOT NULL
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r
"
result <- SDA_query(q)
head(result, 10)

We supplied all of the filtering constraints in a WHERE clause for clarity, with joins only using primary and foreign keys. In general, the same constraints can be expressed as part of a join to the relevant tables, which can be more efficient in complex queries.
One of the most important considerations in SSURGO queries is data completeness: not all relationships in the SSURGO database are populated for all soils or components. Understanding when to use LEFT JOIN vs. INNER JOIN is critical for correct analysis.
Use INNER JOIN when the relationship must
exist for meaningful results:
# These relationships are always present in SSURGO:
# - mapunit always belongs to a legend (survey area)
# - component always belongs to a mapunit
q <- "
SELECT TOP 10
leg.areasymbol,
mu.musym,
co.compname,
co.comppct_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
WHERE leg.areasymbol = 'CA630'
ORDER BY mu.musym, co.comppct_r DESC
"
result <- SDA_query(q)
# Every row represents a valid component with all parent relationships
head(result)

Use LEFT JOIN when a parent record may exist without child records. Two example cases:

Case 1: Components Without Horizons

Some soil components in SSURGO do not have associated horizon data. These are often “non-soil” components (compkind = “Miscellaneous Area”) or minor components (majcompflag = “No”), which may lack detailed horizon descriptions:
# Some components have NO horizons defined
# INNER JOIN would exclude these components entirely
# LEFT JOIN preserves components even without horizon data
q <- "
SELECT
leg.areasymbol,
mu.mukey,
mu.musym,
co.compname,
co.comppct_r,
ch.chkey,
ch.hzdept_r,
ch.hzdepb_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
LEFT JOIN chorizon AS ch ON ch.cokey = co.cokey
WHERE leg.areasymbol = 'CA630'
AND co.majcompflag = 'Yes'
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r
"
result <- SDA_query(q)
# Check for components without horizons:
# Rows with NA in horizon columns indicate a component with no horizon data
if (is.data.frame(result) && nrow(result) > 0) {
components_no_horizons <- result[is.na(result$chkey), ]
if (nrow(components_no_horizons) > 0) {
cat('Found', nrow(components_no_horizons), 'components without horizon data:\n')
print(components_no_horizons[, c('musym', 'compname', 'comppct_r')])
}
}

Comparison: INNER JOIN would have excluded those components:
# Using INNER JOIN on horizons drops components that lack horizon data
q_inner <- "
SELECT
leg.areasymbol,
mu.mukey,
mu.musym,
co.compname,
co.comppct_r,
ch.desgnmaster,
ch.hzdept_r,
ch.hzdepb_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
INNER JOIN chorizon AS ch ON ch.cokey = co.cokey
WHERE leg.areasymbol = 'CA630'
AND co.majcompflag = 'Yes'
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r
"
result_inner <- SDA_query(q_inner)
# number of rows in LEFT JOIN result
nrow(result)
# number of rows in INNER JOIN result
nrow(result_inner)

Case 2: Horizons Without Rock Fragments

Similarly, not all horizons have rock fragment volume recorded in the chfrags child table. An empty chfrags table conventionally indicates a horizon with no fragments, but in some cases it may be empty even for soils that have non-zero fragment weight fractions (for example, those with human artifacts). When you need to include horizons regardless of fragment availability, use LEFT JOIN:
# Some horizons have NO rock fragment data recorded
# LEFT JOIN preserves horizons even without fragment information
q <- "
SELECT
mu.musym,
co.compname,
ch.desgnmaster,
ch.hzdept_r,
ch.hzdepb_r,
rf.fragsize_h,
rf.fragvol_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
LEFT JOIN chorizon AS ch ON ch.cokey = co.cokey
LEFT JOIN chfrags AS rf ON rf.chkey = ch.chkey
WHERE leg.areasymbol = 'CA630'
AND co.majcompflag = 'Yes'
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r
"
result <- SDA_query(q)
# Check for horizons without fragments:
if (is.data.frame(result) && nrow(result) > 0) {
horizons_no_frags <- result[!duplicated(result$chkey) & is.na(result$fragsize_h), ]
if (nrow(horizons_no_frags) > 0) {
cat('Found', nrow(horizons_no_frags), 'horizons without rock fragment data:\n')
print(horizons_no_frags[, c('musym', 'compname', 'desgnmaster', 'fragvol_r')])
}
}

| Situation | Join Type | Reason |
|---|---|---|
| Query result must have values at all levels | INNER | Filter out incomplete records |
| Parent may exist without children (components without horizons, horizons without fragments) | LEFT | Preserve parent records even when children are missing |
| Calculating aggregations (COUNT, SUM, AVG) with missing data | Depends | LEFT includes zero-occurrence groups; INNER ignores them |
| Applying WHERE filters to child tables | Typically LEFT | INNER JOIN combined with WHERE can lose parents unintentionally |
Tip: When using LEFT JOIN, always check for
NA values in child table columns to identify records with
missing child data.
SDA supports standard SQL aggregate functions (COUNT, SUM, AVG, MIN, MAX, etc.):
# Get component statistics per map unit
q <- "
SELECT
mu.mukey,
mu.musym,
mu.muname,
COUNT(co.cokey) AS n_components,
SUM(co.comppct_r) AS total_comppct,
AVG(co.comppct_r) AS avg_comppct
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
WHERE leg.areasymbol = 'CA630'
GROUP BY mu.mukey, mu.musym, mu.muname
ORDER BY mu.musym
"
result <- SDA_query(q)
head(result)

Here we use SQL GROUP BY to define the columns that determine unique groups. Columns that are not part of the GROUP BY statement must be wrapped in an aggregation expression to be used in the SELECT clause.
When calculating map unit weighted averages using custom SQL, be aware that comppct_r (component percentage) does not always sum to 100% for a map unit, due to minor components that are not exported, or missing data.
A robust weighted average should normalize weights by the sum of the component percentages present in the data, rather than assuming a sum of 100. This is especially important if you rely on an INNER JOIN to the chorizon table, as many map units have at least some components with no horizon data.
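A minimal sketch of this normalization, using surface-horizon clay content as an illustrative property (column names are from the standard SSURGO component and chorizon tables; the NULLIF() guard against a zero denominator is an assumption of this sketch, not a fixed recipe):

```sql
-- component-percentage weighted average of surface horizon clay, per map unit;
-- normalized by the comppct_r actually present in the joined data
SELECT co.mukey,
       SUM(co.comppct_r * ch.claytotal_r) /
         NULLIF(SUM(co.comppct_r), 0) AS wtavg_clay
FROM component AS co
INNER JOIN chorizon AS ch ON ch.cokey = co.cokey
WHERE ch.hzdept_r = 0
  AND ch.claytotal_r IS NOT NULL
GROUP BY co.mukey
```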
When querying SDA, especially without specific map unit keys or area symbols, it is often important to exclude STATSGO2 data to prevent mixing these two distinct datasets.
Common filter to exclude STATSGO2:
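Since STATSGO2 records are stored under the national legend with area symbol 'US', a simple WHERE clause fragment excludes them:

```sql
-- exclude the generalized STATSGO2 (national) legend
WHERE legend.areasymbol != 'US'
```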
When aggregating soil properties, it is sometimes important to exclude layers that do not represent mineral soil so that depth-weighted averages and other statistics are not skewed.
Note that if zero values are present for a soil property they can
drag down averages. If you include a weighting factor in a custom query
for the thickness of a layer with a NULL value, that is
essentially the same as if the property value were 0.
The two most common types of non-mineral soil layers to exclude are:
- Bedrock and other Root-Limiting Layers: Below the “bottom of the soil” (e.g., R, Cr horizons).
- Dry Surface Organic Layers (Duff): High-carbon, low bulk density, dry organic (O) horizons.
Since SSURGO data spans decades of standards, no single column perfectly identifies these non-soil layers.
The safest approach is to use the new computed horztype column in combination with “in lieu” texture class codes. Currently horztype only supports identification of dry organic layers at the surface, but it may be extended to other material types in the future.
1. Filter Bedrock:
Bedrock layers often have NULL property values, which is
not much of a problem provided there are data available for the
overlying soil layers; but you will need to exclude the thickness of the
bedrock layers for accurate depth-weighted average calculations.
The horztype column does not capture bedrock. You must
filter by texture code:
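A sketch of the corresponding WHERE clause fragment, using the same texture codes applied in the sample query later in this section (ch is assumed to alias the chorizon table):

```sql
-- exclude bedrock in lieu texture codes: modern 'BR',
-- plus obsolete 'WB' (weathered) and 'UWB' (unweathered)
AND ch.texturerv NOT IN ('BR', 'WB', 'UWB')
```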
This includes the modern “bedrock” (BR) code, as well as the obsolete “weathered bedrock” (WB) and “unweathered bedrock” (UWB) codes. For the purposes of excluding non-soil bedrock material, they can be treated as equivalent, and together they cover many cases.
There are a variety of other “in lieu” texture codes used to identify cemented materials (“CEM”, “IND”, “MAT”) which may or may not have other texture information and may also need to be considered for exclusion.
While it might be tempting to just filter on horizon designation (e.g. excluding R or Cr layers explicitly), as discussed below this is not completely reliable on its own.
2. Filter Dry Organics (Duff):
The horztype column is currently primarily for
identifying “Dry Organic” layers. This is a new column that has been
programmatically populated to help users filter out a common set of
data.
Combine horztype with texture codes like "SPM" (the least-decomposed dry organic soil material: fibric material) for more robust filtering logic. You can also include "MPM" and "HPM" if you want to exclude more decomposed dry organic materials:
/* Exclude "Duff" / Dry Organics */
AND ISNULL(horztype, '') != 'Dry Organic'
AND texturerv NOT IN ('SPM', 'MPM', 'HPM')

The population of O horizons in SSURGO is inconsistent due to evolving data collection standards. While modern surveys often include numerical data for organic surface layers, legacy data may only mention them in descriptions without populated properties.
Retain these layers when analyzing data completeness or identifying diagnostic features like folistic epipedons. However, for analyses focused on mineral soil properties, excluding them is often necessary to prevent skewing depth-weighted averages.
Note: We generally want to keep “Wet Organic” layers (PEAT, MPT [mucky peat], and MUCK), as these may be diagnostic for Histosols, Histic epipedons, etc., which are important concepts for the identification of hydric soils.
3. Filter by Horizon Designation
It is also possible, but not completely reliable due to soil survey vintage and patterns in historical data population, to filter by horizon designation.
For instance, a logical statement to exclude the master horizon designation “O” for organics:
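A sketch of such a fragment (ch is assumed to alias the chorizon table; ISNULL() guards against unpopulated designations):

```sql
-- exclude layers whose master horizon designation is 'O'
AND ISNULL(ch.desgnmaster, '') != 'O'
```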
While using horizon designations is conceptually appealing, it is problematic when it is the only logic you are using. Horizon designations were not historically populated for the aggregate soil component data (SSURGO).
In the last two decades, layers in the chorizon table have been assigned morphologic horizon designations, but this has not been done retroactively. Some older soil surveys still use “H” as the master horizon designation for all layers (e.g. H1, H2, H3 instead of A, B, C). In that case, the only way to identify bedrock or organic layers is to use other data elements, such as the in lieu texture class.
This section demonstrates filtering horizon data based on material types.
We will start with a custom query that targets components with “Histic” taxonomic subgroup formative element.
q <- "
SELECT TOP 10
component.compname,
ch.hzname,
ch.hzdept_r,
ch.hzdepb_r,
ch.texturerv,
ch.horztype,
ch.claytotal_r,
ch.om_r
FROM chorizon AS ch
INNER JOIN component ON component.cokey = ch.cokey AND taxsubgrp LIKE '%Histic%'
ORDER BY component.cokey, hzdept_r
"
result <- SDA_query(q)
result

Here is a sample query to select mineral and wet organic soil layers:
q <- "
SELECT TOP 10
ch.hzname,
ch.hzdept_r,
ch.hzdepb_r,
ch.texturerv,
ch.horztype,
ch.claytotal_r,
ch.om_r
FROM chorizon AS ch
WHERE
-- 1. Exclude Bedrock
ch.texturerv NOT IN ('BR', 'WB', 'UWB')
-- 2. Exclude Dry Organic / Duff
AND ISNULL(ch.horztype, '') != 'Dry Organic'
AND ch.texturerv NOT IN ('SPM', 'MPM', 'HPM')
ORDER BY cokey, hzdept_r
"
result <- SDA_query(q)
head(result)

In contrast to bedrock, organic soil materials are soil materials, whether wet or dry, but it is important to know when you may need to look past them. Organic layers have very low bulk density and very high organic carbon content, both of which can skew “mineral soil” averages if included in depth-weighted calculations.
Take, for example, the K factor used for estimating erosion hazard: users often want properties for an exposed mineral soil surface rather than those of an organic layer over the mineral soil surface. In general, the concept of K factor does not apply to the organic surface.
To demonstrate, we will look at the whole SSURGO database. We count
the number of horizons with master designation “O” and horizon top depth
0 within groups of different combinations of
Kw and Kf, and sort in decreasing order.
q <- "
SELECT
COUNT(chkey) AS n,
kwfact,
kffact
FROM chorizon
WHERE desgnmaster = 'O' AND hzdept_r = 0
GROUP BY kwfact, kffact
ORDER BY n DESC
"
result <- SDA_query(q)
result$percent <- round(prop.table(result$n) * 100, 1)
head(result, 10)

This query shows that many O horizons at the soil surface have NULL K factor values.
The takeaway: if your analysis relies on having K factor everywhere, you will either need to filter out the O horizons, or find a way to impute a value to replace the NULLs.
Rock fragments are particles larger than 2 mm. In SSURGO, fragment data are stored in several places: aggregated at the horizon level, and in child tables of chorizon.
Fragments are classified by size class (e.g. gravel, cobbles, stones, boulders) and recorded as volume percentages. Derived weight fractions are also stored at the horizon level.
# Query horizons with their rock fragment content
q <- "
SELECT TOP 20
mu.musym,
co.compname,
ch.hzname,
ch.hzdept_r,
ch.hzdepb_r,
rf.fragkind,
rf.fragsize_h,
rf.fragvol_r
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
LEFT JOIN chorizon AS ch ON ch.cokey = co.cokey
LEFT JOIN chfrags AS rf ON rf.chkey = ch.chkey
WHERE leg.areasymbol = 'CA630'
AND co.majcompflag = 'Yes'
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r, rf.fragsize_h
"
result <- SDA_query(q)
head(result, 20)

Above you can see that multiple fragment size classes can exist in a single horizon (a many-to-one relationship). Horizons without fragments generally have no records in the chfrags table.

- The chfrags table contains individual fragment records per horizon and size class (e.g. at least one each for gravel, cobbles, stones, and boulders, where present)
- fragsize_h indicates the upper end of the fragment size range for a specific record
- fragvol_r is the representative volume percentage for that specific record
Sum fragment volumes to get total rock fragment content per horizon:
# Total rock fragment content per horizon
q <- "
SELECT
mu.musym,
co.compname,
co.comppct_r,
ch.hzname,
ch.hzdept_r,
ch.hzdepb_r,
SUM(rf.fragvol_r) AS calc_fragvoltot_r,
COUNT(DISTINCT rf.fragsize_h) AS n_frag_classes
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
LEFT JOIN chorizon AS ch ON ch.cokey = co.cokey
LEFT JOIN chfrags AS rf ON rf.chkey = ch.chkey
WHERE leg.areasymbol = 'CA630'
AND co.majcompflag = 'Yes'
GROUP BY mu.musym, co.compname, co.comppct_r, ch.hzname,
ch.hzdept_r, ch.hzdepb_r, ch.chkey
ORDER BY mu.musym, co.comppct_r DESC, ch.hzdept_r
"
result <- SDA_query(q)
head(result)

Note: T-SQL is strict about aggregation syntax. Every column in the SELECT clause that is not wrapped in an aggregate function (SUM, COUNT, etc.) MUST be listed in the GROUP BY clause.
Use get_SDA_property() to retrieve rock fragment
properties with aggregation:
# Get rock fragment weight percentages (3-10 inch class)
frag_3_10 <- get_SDA_property(
property = "Rock Fragments 3 - 10 inches - Rep Value", # equivalent to 'frag3to10_r'
method = "Dominant Component (Numeric)", # dominant component, weighted average by depth
areasymbols = "CA630",
top_depth = 0,
bottom_depth = 100 # depth range for averaging
)
head(frag_3_10)
# get fragment volume (fragvoltot_l, fragvoltot_r, fragvoltot_h)
fragvol_total <- get_SDA_property(
property = "fragvoltot_r",
method = "Weighted Average", # weighted by component percentage and depth
top_depth = 0,
bottom_depth = 100,
areasymbols = "CA630"
)
head(fragvol_total)

Note: Fragment weight fraction properties in chorizon (like frag3to10_r and fraggt10_r) are always populated, but currently the fragvoltot_* columns are not, and data availability depends on the vintage of the specific SSURGO map unit data.
The SDA system has important hard limits:

- Record Limit: Queries that return more than 100,000 records are rejected
- Result Size: Response serialization is limited to approximately 32 Mb
- Memory Limit: Queries generally cannot exceed approximately 300 Mb of memory usage
- Timeouts and Complexity: Long-running queries may time out or hit other internal limits during query planning
Note on Local vs. Remote: These limits primarily
apply to the Remote SDA API. When querying a
local SQLite database (e.g. created via
createSSURGO() or downloadSSURGO()), you are
generally unrestricted by record counts or timeouts. Complex tabular
queries can often be run on entire states or the whole country locally
without the need for chunking or simplification.
This query will return >100k records (all components across all of SSURGO and STATSGO2), which is bad:
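A sketch of such an over-broad query (illustrative only; this would be rejected by the record limit):

```sql
-- no WHERE clause: selects every component in the database
SELECT cokey, compname, comppct_r
FROM component
```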
This query selects a specific component name pattern in
WHERE clause (good):
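A sketch of a constrained version (the 'Musick' component name pattern is the same example used elsewhere in this document):

```sql
-- constrained: only components whose name matches a pattern
SELECT cokey, compname, comppct_r
FROM component
WHERE compname LIKE 'Musick%'
```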
Use a TOP clause to test with a small result set first.
TOP can be used after sorting with ORDER BY to easily
find the values associated with an ordered sequence. For instance, if
you ORDER BY comppct_r DESC (order by component percentage
in descending order), then take TOP 1 you will get the
records associated with the dominant component. This is a very common
aggregation scheme available as an option in most tools that use SSURGO
data.
TOP is also generally useful for testing queries, or checking the structure of a table result for a few rows. If testing is successful, remove TOP, add filtering with WHERE or in a JOIN condition (see below), or increase the TOP limit.
Efficient queries often perform filtering operations as part of their JOIN statements. This can improve server-side query planning and reduce overhead that is not needed for the final result.
Return aggregated results (fewer rows) instead of raw data, and constrain the component name as part of the JOIN condition instead of in a trailing WHERE clause:
SELECT mu.mukey,
  COUNT(*) AS n_hz,
  AVG(ch.claytotal_r) AS avg_clay
FROM component co
INNER JOIN mapunit mu ON mu.mukey = co.mukey AND co.compname LIKE 'Musick%'
INNER JOIN chorizon ch ON ch.cokey = co.cokey
GROUP BY mu.mukey

Efficient queries are also explicit about the columns in the SELECT clause.
While SELECT * is fine to use for testing, it is prone
to issues if schemas change, as well as returning more data than is
necessary in most cases.
SELECT * will return all columns, usually including many
columns you do not need:
SELECT TOP 1 *
FROM mapunit
INNER JOIN component ON mapunit.mukey = component.mukey
WHERE compname LIKE 'Musick%'

Use SELECT column1, column2, ... for concise results, with no duplicated columns in complex joins:
with no duplicated columns in complex joins:
SELECT TOP 1 mapunit.mukey, musym, muname,
compname, localphase, comppct_r, majcompflag, compkind, cokey
FROM mapunit
INNER JOIN component ON mapunit.mukey = component.mukey
WHERE compname LIKE 'Musick%'If a column name occurs multiple times in different tables, you need
specify which one you want it to come from with
<tablename>.<columname> syntax.
If you do need all columns from a specific table use
<tablename>.* syntax, for example here we get all
columns from chfrags, and specific columns from
chorizon and component:
SELECT hzname, chfrags.*, component.cokey
FROM chfrags
INNER JOIN chorizon ON chorizon.chkey = chfrags.chkey
INNER JOIN component ON component.cokey = chorizon.cokey
WHERE compname LIKE 'Musick%'

Use makeChunks() for large vectors (see next section).

makeChunks() is a helper function in the soilDB package. It takes a vector of values as input and assigns each value a group (chunk) number. You can then calculate the unique chunk numbers and iterate over them to process data in bite-sized pieces. This is covered extensively in the next section.
Check and filter for non-NULL - SDA uses IS NULL / IS NOT NULL:
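For example, a WHERE clause fragment using a chorizon column from the earlier examples:

```sql
-- keep only horizons with populated clay content
WHERE claytotal_r IS NOT NULL
```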
Use ISNULL() to provide defaults, such as converting missing clay content to 0:
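A sketch (note that substituting 0 for NULL will change averages, as discussed in the weighted average section above):

```sql
-- replace NULL clay content with 0
SELECT ISNULL(claytotal_r, 0) AS clay_filled
FROM chorizon
```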
Use CAST() for explicit type conversion:
SELECT TOP 10
claytotal_r,
CAST(claytotal_r AS INT) AS clay_int,
CAST(chkey AS CHAR(10)) AS chkey_str
FROM chorizon
WHERE claytotal_r != CAST(claytotal_r AS INT)

Note: in the example above, casting to INT is equivalent to truncating the value by removing the decimals; it does not follow the rounding rules that R's round() does, for example.
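For instance, with an illustrative literal value:

```sql
-- CAST to INT truncates; R's round() would round to the nearest integer
SELECT CAST(2.7 AS INT) AS truncated   -- 2, whereas round(2.7) in R gives 3
```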
When you need data for many map units, components, soil survey areas, or other entities, a single query can exceed SDA’s limits.
SDA is a shared public resource, so it is generally encouraged to make requests that are as small and efficient as possible.
The makeChunks() function divides a large vector into
smaller chunks for iterative querying.
Before implementing any chunked query approach, you should also be certain that your query runs properly on a small set, for example a single chunk, or a result truncated using a TOP clause.
makeChunks()

Suppose you have 5000 map unit keys to query. We can break that set up into chunks of 1000 map units, and iterate over each chunk using lapply():
large_mukey_list <- 461994:466995
# Divide into chunks of 1000
chunks <- soilDB::makeChunks(large_mukey_list, 1000)
# Now loop through chunks
results_list <- lapply(split(large_mukey_list, chunks), function(chunk_keys) {
# Build IN clause
in_clause <- soilDB::format_SQL_in_statement(chunk_keys)
# construct query
q <- sprintf(
"SELECT mukey, musym, muname FROM mapunit WHERE mukey IN %s", in_clause
)
cat(sprintf("Chunk mukey %d to %d\n", min(chunk_keys), max(chunk_keys)))
SDA_query(q)
})
# Combine results
all_results <- do.call(rbind, results_list)

The above approach is readily converted to parallel processing, for example using future.apply::future_lapply(), or other custom iterator functions that allow for progress reporting and other diagnostics.
If you do choose to use parallel processing, you should limit the number of concurrent connections made to the API to a reasonable number, say, 2 to 5 requests at a time, with a maximum of about 10. You should also implement proper error handling and retry mechanisms (see Handling Errors section above).
Here we query soil properties for 6 soil survey areas in chunks of 2.
test_areasymbols <- c("CA067", "CA077", "CA632", "CA644", "CA630", "CA649")
# Chunk into groups of 2
chunks <- soilDB::makeChunks(test_areasymbols, 2)
results_list <- lapply(split(test_areasymbols, chunks), function(chunk_keys) {
# Build IN clause
in_clause <- soilDB::format_SQL_in_statement(chunk_keys)
q <- sprintf("
SELECT
mu.mukey,
mu.musym,
mu.muname,
co.cokey,
co.compname,
SUM(hzdepb_r - hzdept_r) AS sum_hz_thickness
FROM legend AS leg
INNER JOIN mapunit AS mu ON mu.lkey = leg.lkey
INNER JOIN component AS co ON co.mukey = mu.mukey
INNER JOIN chorizon AS ch ON ch.cokey = co.cokey
WHERE leg.areasymbol IN %s
GROUP BY mu.mukey, mu.musym, mu.muname, co.cokey, co.compname
", in_clause)
cat(sprintf("Chunk %s completed\n", paste0(chunk_keys, collapse = ", ")))
SDA_query(q)
})
all_results <- do.call(rbind, results_list)
head(all_results)

Note: The above sample query is not a good way to determine soil thickness, as it does not exclude the thickness of non-soil layers (e.g. bedrock) that should not be counted as part of the soil depth. As an exercise, use the logic from the previous sections to improve this example to filter out bedrock.
- Typical queries: 1000-10000 items per chunk (safe for most queries)
- Simple lookups: 10000-100000 items per chunk (less data per item)
- Complex joins: 500-1000 items per chunk (more processing per item)

Monitor your query results and adjust based on SDA response times.
You may get an error:
Invalid query: The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
If you see this, first confirm that your query syntax is correct for the result you are trying to obtain, for instance by trying it on a minimal small set of data. Critically evaluate the need for joins, subqueries, and complex filtering logic. Identify any areas of repeated logic and refactor to simplify redundant steps. If your query works as expected on a small data set, and you have optimized it, then consider breaking your big query up into (more) chunks.
This section mainly focuses on the details of custom spatial queries using the low-level SDA_query() function, but there are several functions that provide a high-level convenience interface for spatial queries in soilDB. First we will discuss converting spatial data to Well-Known Text for use in generating custom queries. Then we will discuss the high- and low-level interfaces in soilDB.
Important: SDA requires spatial inputs (WKT) to be in WGS84 (EPSG:4326) geographic coordinates.
Passing projected coordinates (like UTM or Albers) will result in query failures or zero results.
SDA does store an alternate projected version of the geometry data (e.g. mupolygonproj, rather than mupolygongeo). The projection is Web Mercator ("EPSG:3857"), so it is best avoided in analysis, though it is an option for visualization in web map applications that lack the capability to project the data.
You can use sf to define a geometry, such as a bounding rectangle, and convert it to WKT (Well-Known Text) for spatial queries:
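A minimal sketch using sf (the bounding box coordinates are illustrative; the resulting bbox_sf and wkt objects are the kind of inputs used in the examples that follow):

```r
library(sf)

# define a small bounding rectangle in WGS84 geographic coordinates
bbox_sf <- st_as_sfc(st_bbox(
  c(xmin = -120.9, ymin = 37.7, xmax = -120.8, ymax = 37.8),
  crs = st_crs(4326)
))

# convert the geometry to Well-Known Text for use in custom queries
wkt <- st_as_text(bbox_sf)
wkt
```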
The soilDB package provides two higher-level functions
(SDA_spatialQuery() and fetchSDA_spatial())
that make obtaining spatial data easier. Both of these functions
internally use SDA_query() and are often a better option
for most use cases compared to crafting a custom spatial query.
SDA_spatialQuery(): spatial inputs
(e.g. point location, AOI rectangle) => spatial or tabular
outputs
fetchSDA_spatial(): tabular inputs
(e.g. map unit key, area symbol, ecological site ID) => spatial
outputs
The functions take different inputs, and support various outputs
including SSURGO and STATSGO2 map unit polygon geometry, special feature
points/lines/polygons, soil survey area boundaries, Major Land Resource
Area (MLRA) boundaries, or simple tabular information using spatial data
as input. These functions will handle projection to
"EPSG:4326" internally as long as the input data has the
correct CRS metadata.
# Example: Retrieve map unit polygons that intersect a bounding box
# This handles the WKT conversion and query construction automatically
# geomIntersection = TRUE clips the polygons to the bounding box
polygons <- SDA_spatialQuery(bbox_sf, what = "mupolygon", geomIntersection = TRUE)
polygons

Note that bbox_sf could be any geometry. Here it is a
rectangular polygon, but it could be a point, a collection of points,
lines, or similar.
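The companion function works in the opposite direction. A sketch, using an example mukey from the CA630 area (requires network access to SDA):

```r
library(soilDB)

# fetchSDA_spatial(): tabular input (map unit key) => spatial output
# 461994 is an example mukey used elsewhere in this document
mu_poly <- fetchSDA_spatial(
  x = 461994,
  by.col = "mukey",
  geom.src = "mupolygon"
)
mu_poly
```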
SDA_spatialQuery() allows you to get tabular data based
on a spatial geometry.
For instance, to simply identify what soil survey area or map unit a single point overlaps with, you can do something like the following.
# Example: get some legend and mapunit-level tabular data at a point
point_sf <- sf::st_as_sf(data.frame(wkt = "POINT (-120.85 37.75)"),
wkt = "wkt",
crs = "EPSG:4326")
ssa <- SDA_spatialQuery(
point_sf,
what = "areasymbol"
)
ssa
mu <- SDA_spatialQuery(
point_sf,
what = "mukey"
)
mu

Above we pass a WKT string (the result of
sf::st_centroid(bbox_sf)) within a data.frame to
sf::st_as_sf(). The wkt argument indicates which
column contains the WKT string used to construct our spatial query
object.
This section will be expanded in the future, but for now an example application of the two higher-level functions is available in the “Dominant Ecological Site” vignette.
For optimal performance, separate your spatial and tabular queries.
Fetch the list of intersecting map unit keys (mukey) first,
then use those keys to query attribute tables.
This avoids joining heavy geometry columns with complex attribute tables, which can trigger the 32Mb serialization limit. Then we can use the WKT in a custom SDA query to get map units intersecting the bounding box:
# Step 1: Get the mukeys that intersect the bounding box
q <- sprintf("
SELECT DISTINCT mu.mukey
FROM mapunit AS mu
INNER JOIN mupolygon AS mp ON mp.mukey = mu.mukey
WHERE mp.mupolygongeo.STIntersects(geometry::STGeomFromText('%s', 4326)) = 1
", wkt)
spatial_result <- SDA_query(q)
head(spatial_result, 5)

This query uses special geospatial functions defined in SQL Server to perform the geometric operations (in this case “intersection”).
In addition to standard T-SQL spatial functions, SDA provides pre-defined Table-Valued Functions (TVFs) optimized for intersections.
- SDA_Get_Mukey_from_intersection_with_WktWgs84(wkt): Returns mukey values intersecting the input WKT.
- SDA_Get_MupolygonWktWgs84_from_Mukey(mukey): Returns WKT geometry for a specific mukey.

This makes the logic much simpler for the common operations of
finding mukey given a spatial location or geometry:
# Input MUST be WKT in WGS84 (EPSG:4326)
q <- "SELECT * FROM SDA_Get_Mukey_from_intersection_with_WktWgs84('POINT(-120.9 37.7)')"
result <- SDA_query(q)

You can also do the inverse (map unit key to geometry):
# Step 2: Get Geometry from Map Unit Key (mukey)
# Useful for retrieving the polygon boundary for a specific map unit
target_mukey <- 461994
q <- sprintf("SELECT * FROM SDA_Get_MupolygonWktWgs84_from_Mukey('%s')", target_mukey)
# Result contains the mukey and the geometry in WKT format
geometry_df <- SDA_query(q)
head(geometry_df)

Using the sf (or terra) package, you can
convert the resulting WKT text column to a spatial object:
if (requireNamespace("sf", quietly = TRUE) &&
!is.null(geometry_df)) {
# Parse WKT column (usually named 'mupolygongeo' in mupolygon table, but 'MupolygonWktWgs84' in TVF result)
poly_sf <- sf::st_as_sfc(geometry_df$MupolygonWktWgs84, crs = 4326)
plot(poly_sf, main = paste("Geometry for mukey=", target_mukey))
}

Note: The TVF function name implies the CRS is WGS84
(authority:code EPSG:4326, SRS ID 4326; or,
more correctly, OGC:CRS84).
See the SDA Advanced Query Guide for more details on TVF functions and other high-level features built into SDA.
We can use the result of our spatial filtering with property queries.
Here we extract the unique map unit keys and pass them as the
mukeys argument to get_SDA_property():
# Step 2: Use the mukeys to fetch tabular data
# First, get mukeys in bounding box
spatial_mukeys <- unique(spatial_result$mukey)
# Then query properties for those mukeys
if (length(spatial_mukeys) > 0) {
clay_in_bbox <- get_SDA_property(
property = "Total Clay - Rep Value",
method = "Weighted Average",
mukeys = spatial_mukeys,
top_depth = 0,
bottom_depth = 50
)
head(clay_in_bbox)
}

This shows how to get the data; see the Local SSURGO vignette for more
spatial visualization examples using sf.
get_SDA_property(): Component and Horizon Properties

The get_SDA_property() function queries the “SSURGO On
Demand” service to retrieve soil component and horizon properties with
automatic aggregation to the map unit level.
By default get_SDA_property() includes minor components
(include_minors=TRUE) but excludes miscellaneous areas
(miscellaneous_areas=FALSE). Even miscellaneous areas that
have some soil data are excluded.
SDA has 80+ horizon properties and 40+ component properties.
Here are some common examples.
Component properties:

- "Taxonomic Order", "Taxonomic Suborder", "Taxonomic Temperature Regime"
- "Drainage Class", "Hydrologic Group"
- "Corrosion of Steel", "Corrosion of Concrete"
- "Range Production - Favorable/Normal/Unfavorable Year"

Horizon properties:
- Water: "Available Water Capacity - Rep Value", "0.1 bar H2O - Rep Value", "0.33 bar H2O - Rep Value", "15 bar H2O - Rep Value"
- Bulk Density: "Bulk Density 0.1 bar H2O - Rep Value", "Bulk Density 0.33 bar H2O - Rep Value", "Bulk Density 15 bar H2O - Rep Value", "Bulk Density oven dry - Rep Value"
- Texture: "Total Sand - Rep Value", "Total Silt - Rep Value", "Total Clay - Rep Value", and sand/silt sub-fractions
- Chemistry: "pH 1:1 water - Rep Value", "Electrical Conductivity - Rep Value", "Cation Exchange Capacity - Rep Value", "Sodium Adsorption Ratio - Rep Value", "Calcium Carbonate - Rep Value", "Gypsum - Rep Value"
- Other: "Saturated Hydraulic Conductivity - Rep Value", "Organic Matter - Rep Value", "Free Iron - Rep Value", "Oxalate Iron - Rep Value"
Alternately, you can directly use physical column names from the
SSURGO schema like "claytotal_r",
"ph1to1h2o_r", "ksat_r", etc.
Dominant component (category) gets the categorical value from the dominant soil component. This is a distinct method because categorical data need to be handled differently from numeric data; the numeric method is described below.
Dominant component (numeric) retrieves properties from the dominant component, optionally averaged over a depth range (in centimeters):
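A sketch of this call, using the CA630 example area (requires network access to SDA):

```r
library(soilDB)

# Representative clay content (0-50 cm) of the dominant component per map unit
clay_dcn <- get_SDA_property(
  property = "Total Clay - Rep Value",
  method = "Dominant Component (Numeric)",
  areasymbols = "CA630",
  top_depth = 0,
  bottom_depth = 50
)
head(clay_dcn)
```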
The weighted average method retrieves component percentage and
depth-weighted average across all components in a map unit. In this
example we explicitly exclude minor components by changing the default
include_minors argument.
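A sketch of this call, again using the CA630 example area (requires network access to SDA):

```r
library(soilDB)

# Depth-weighted, component-percentage-weighted average clay (0-50 cm),
# excluding minor components
clay_wtavg <- get_SDA_property(
  property = "Total Clay - Rep Value",
  method = "Weighted Average",
  areasymbols = "CA630",
  include_minors = FALSE,
  top_depth = 0,
  bottom_depth = 50
)
head(clay_wtavg)
```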
Method "Min/Max" is used to get minimum or maximum values across all components in a map unit.
Dominant condition aggregates components with the same category value for the properties of interest, then picks the dominant percentage. This gives extra weight to categories where multiple soils in the same map unit have the same value.
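A sketch of the dominant condition method applied to a categorical property (requires network access to SDA):

```r
library(soilDB)

# Most extensive drainage class condition per map unit
drainage_dcon <- get_SDA_property(
  property = "Drainage Class",
  method = "Dominant Condition",
  areasymbols = "CA630"
)
head(drainage_dcon)
```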
Method "none" returns one row per component or component
horizon, depending on the property selected, with no map unit
aggregation. This generally returns much more data, but allows for
custom aggregation outside of the SQL query, and deeper inspection of
the source data.
You cannot mix component-level and horizon-level properties with this
method. Run it twice, once for horizon-level and once for
component-level, then join the component data to the horizon data using
the cokey column to get a single data frame.
Also, the top_depth and bottom_depth
arguments are deliberately ignored with this method. Apply your own
filtering to the result if you only need a specific depth range.
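A minimal sketch of that two-step workflow (column names other than cokey come from the SSURGO schema; requires network access to SDA):

```r
library(soilDB)

# Component-level property, one row per component
comp <- get_SDA_property(
  property = "Drainage Class",
  method = "None",
  areasymbols = "CA630"
)

# Horizon-level property, one row per component horizon
hz <- get_SDA_property(
  property = "Total Clay - Rep Value",
  method = "None",
  areasymbols = "CA630"
)

# Join component data to horizon data on cokey
combined <- merge(comp, hz, by = "cokey", suffixes = c(".comp", ".hz"))

# Apply custom depth filtering, e.g. horizons with a top depth above 25 cm
shallow <- subset(combined, hzdept_r < 25)
```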
We can use query_string = TRUE to see the SQL generated
by get_SDA_property():
q <- get_SDA_property(
property = "Taxonomic Suborder",
method = "Dominant Component (Category)",
areasymbols = "CA630",
query_string = TRUE
)
# View first 500 characters of query
cat(substr(q, 1, 500), "...\n")

get_SDA_interpretation(): Soil Interpretations

Soil interpretations are ratings that assess how suitable or limited
a soil is for various land uses. An interpretation for a specific
component or map unit includes a rating class and a numeric value
(“fuzzy rating”). The get_SDA_interpretation() function
retrieves these ratings with multiple aggregation methods.
SDA contains 600+ interpretations organized by category:
| Category | Examples |
|---|---|
| FOR (Forestry) | Potential Seedling Mortality, Road Suitability, Harvest Suitability, Equipment Operability |
| AGR (Agriculture) | Crop Yield, Irrigation Suitability, Pesticide Pollution Potential |
| ENG (Engineering) | Septic Tank Absorption Fields, Construction Roads, Dwelling Foundations, Dam Sites |
| URB/REC (Urban/Recreation) | Trails, Playgrounds, Picnic Areas, Recreational Development |
| AWM (Animal Waste Management) | Wastewater Application Sites, Manure Storage |
| WMS (Water Management System) | Irrigation Suitability, Drainage, Wetland Rating |
| CPI (Commodity Crop Productivity Index) | National Commodity Crop Productivity Index |
Get the rating from the dominant soil component:
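A sketch of this call, using an engineering interpretation named earlier in this document (requires network access to SDA):

```r
library(soilDB)

# Rating of the dominant component per map unit
interp_dc <- get_SDA_interpretation(
  rulename = "ENG - Septic Tank Absorption Fields",
  method = "Dominant Component",
  areasymbols = "CA630"
)
head(interp_dc)
```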
Aggregate similar ratings, then return the dominant:
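A sketch of the dominant condition variant, using the same interpretation (requires network access to SDA):

```r
library(soilDB)

# Most extensive rating class per map unit
interp_dcon <- get_SDA_interpretation(
  rulename = "ENG - Septic Tank Absorption Fields",
  method = "Dominant Condition",
  areasymbols = "CA630"
)
head(interp_dcon)
```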
Component-percentage-weighted average of ratings:
# Get weighted average agricultural rating
agr_weighted <- get_SDA_interpretation(
rulename = "AGR - Winter Wheat Yield (MT)",
method = "Weighted Average",
areasymbols = "MT041",
include_minors = TRUE
)
head(agr_weighted)

Note: in this example, a regional (Montana-specific) interpretation is used, so we used a Montana areasymbol (MT041). Some interpretations are only exported for specific states, regions, or soil survey areas.
Return one row per component with no map unit aggregation:
Use wide_reason = TRUE to pivot reason columns into
separate sub-rule columns:
# Get detailed ratings with reasons pivoted
detailed <- get_SDA_interpretation(
rulename = "FOR - Mechanical Planting Suitability",
method = "Dominant Component",
areasymbols = "CA630",
wide_reason = TRUE
)
head(detailed)

get_SDA_hydric(): Hydric Soil Classifications

Hydric soils are inundated or saturated long enough during the
growing season to develop anaerobic conditions. The
get_SDA_hydric() function evaluates the proportion of
hydric components in a map unit.
Returns an overall classification for each map unit:
hydric_class <- get_SDA_hydric(
areasymbols = c("CA077", "CA630"),
method = "MAPUNIT"
)
head(hydric_class)
# Check unique ratings
unique(hydric_class$HYDRIC_RATING)

Possible classifications:

- "Nonhydric" - No hydric components
- "Hydric" - All major components are hydric
- "Predominantly Hydric" - Hydric components >= 50%
- "Partially Hydric" - One or more major components are hydric
- "Predominantly Nonhydric" - Hydric components < 50%
The output includes:

- hydric_majors - Percentage of hydric major components
- hydric_inclusions - Percentage of hydric minor/inclusion components
- total_comppct - Total component percentage (should sum to 100)
Get hydric status of the dominant component only:
Aggregate by hydric condition (Yes/No), then pick the dominant:
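Both component-level methods can be sketched as follows (method names per ?get_SDA_hydric; requires network access to SDA):

```r
library(soilDB)

# Hydric status of the dominant component only
hydric_dc <- get_SDA_hydric(
  areasymbols = "CA077",
  method = "DOMINANT COMPONENT"
)

# Dominant hydric condition (Yes/No) across components
hydric_dcon <- get_SDA_hydric(
  areasymbols = "CA077",
  method = "DOMINANT CONDITION"
)
head(hydric_dcon)
```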
get_SDA_muaggatt(): Map Unit Aggregate Attributes

Map unit aggregate attributes are pre-computed statistics summarizing the components in a map unit. These include surface texture, slope range, permeability, and other physical properties.

This function returns a data frame with one row per map unit,
containing dozens of pre-calculated aggregate properties. See the
get_SDA_muaggatt() documentation for the full list of
available columns.
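A sketch of a basic call (the selected columns, e.g. slopegraddcp and aws0100wta, are standard muaggatt table columns; requires network access to SDA):

```r
library(soilDB)

# One row per map unit with pre-computed aggregate attributes
muagg <- get_SDA_muaggatt(areasymbols = "CA630")

# e.g. dominant-component slope gradient and 0-100 cm available water storage
head(muagg[, c("mukey", "muname", "slopegraddcp", "aws0100wta")])
```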
get_SDA_pmgroupname(): Parent Material Groups

Parent material groups classify the primary geological or organic
origin of soil material. The get_SDA_pmgroupname() function
retrieves parent material classifications by component.
Get parent material from the dominant component:
Aggregate parent materials by group, then return the dominant:
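Both aggregation methods can be sketched as follows (method names per ?get_SDA_pmgroupname; requires network access to SDA):

```r
library(soilDB)

# Parent material group of the dominant component
pm_dc <- get_SDA_pmgroupname(
  areasymbols = "CA630",
  method = "DOMINANT COMPONENT"
)

# Dominant parent material condition across components
pm_dcon <- get_SDA_pmgroupname(
  areasymbols = "CA630",
  method = "DOMINANT CONDITION"
)
head(pm_dcon)
```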
By default, simplify = TRUE groups parent materials into
broader categories. Set simplify = FALSE to get detailed
group names:
# Simplified (broader groups)
pm_simple <- get_SDA_pmgroupname(
mukeys = c(461994, 461995),
simplify = TRUE
)
head(pm_simple)
# Detailed (specific groups)
pm_detailed <- get_SDA_pmgroupname(
mukeys = c(461994, 461995),
simplify = FALSE
)
head(pm_detailed)

get_SDA_coecoclass(): Component Ecological Classes

Ecological classes describe potential natural vegetation and
ecological conditions. The get_SDA_coecoclass() function
retrieves ecological site classifications by component.
A worked example of this function is available in the “Dominant Ecological Site” vignette.
Return component-level ecological classes without aggregation:
The default behavior targets the NRCS Ecological Site database-related entries, but there are many other ecological/vegetation class types in SSURGO.
Columns include:

- ecoclassid - Ecological Class ID
- ecoclassname - Full ecological class name
- ecoclasstypename - Type (e.g., "NRCS Rangeland Site", "NRCS Forestland Site")
- ecoclassref - Reference document
Get ecological site from the dominant component only:
Aggregate ecological classes by ID, then return the dominant:
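Both methods can be sketched as follows (requires network access to SDA):

```r
library(soilDB)

# Ecological site of the dominant component only
eco_dc <- get_SDA_coecoclass(
  method = "Dominant Component",
  areasymbols = "CA630"
)

# Dominant ecological site condition per map unit
eco_dcon <- get_SDA_coecoclass(
  method = "Dominant Condition",
  areasymbols = "CA630"
)
head(eco_dcon)
```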
Return all ecological sites per map unit in wide format:
eco_all <- get_SDA_coecoclass(
method = "All",
mukeys = c(461994, 461995),
threshold = 5
)
head(eco_all)

Filtering by ecological class type:
# Get rangeland sites only
eco_range <- get_SDA_coecoclass(
method = "Dominant Condition",
areasymbols = "CA630",
ecoclasstypename = "NRCS Rangeland Site"
)
head(eco_range)

get_SDA_cosurfmorph(): Component Surface Morphometry

Surface morphometry describes the three-dimensional shape and
position of a landscape. The get_SDA_cosurfmorph() function
returns component-level geomorphic classifications aggregated in various
ways.
SDA contains four cosurfmorph tables:
| Table | Variables | Description |
|---|---|---|
| cosurfmorphgc | geomposmntn, geomposhill, geomposflats, geompostrce | 3D geomorphic classification |
| cosurfmorphhpp | hillslopeprof | 2D hillslope position profile |
| cosurfmorphss | shapeacross, shapedown | Surface shape (convex/concave/linear) |
| cosurfmorphmr | geomicrorelief | Microrelief (slope complexity) |
# Get geomorphic position by component name
geom_3d <- get_SDA_cosurfmorph(
table = "cosurfmorphgc",
areasymbols = "CA630"
)
head(geom_3d)

Output includes columns like:

- compname - Component name
- geomposmntn, geomposmntn_n - Mountain position (count)
- p_geomposmntn - Mountain position proportion
fetchSDA(): Get Soil Profile Collection

The fetchSDA() function is a high-level wrapper around
SDA_query() that simplifies the process of downloading and
assembling soil profile data into a SoilProfileCollection
object from the aqp package. It automatically handles the
complex joins between map unit, component, and horizon tables.
You can fetch all components and horizons for a specific area or set
of map units using the WHERE argument:
library(aqp)
# Query soil components by areasymbol and musym
s <- fetchSDA(WHERE = "areasymbol = 'IN005' AND musym = 'MnpB2'")
# The result is a SoilProfileCollection
s
# Check the horizon data
head(horizons(s))

This function is very powerful for getting a quick snapshot of the soil profiles in an area.
It is worth noting that fetchSDA() returns data from
both SSURGO and STATSGO2. To limit to SSURGO, use
WHERE = "areasymbol != 'US'" (see the Data Filtering
section), or specify a SSURGO area symbol.
The duplicates argument (defaulting to
FALSE) controls how components that appear in multiple
survey areas are handled.
- duplicates = FALSE (default): Returns only one instance of a component per nationalmusym. This eliminates “duplication” where multiple survey area (legend) map units are linked to the same source mapunit and component records. This is generally preferred for statistical analysis of soil properties.
- duplicates = TRUE: Returns a record for each unique map unit key (mukey). This is useful for visualization or when you need to account for every occurrence of a component across different survey area correlations.

fetchLDM(): Get Laboratory Data

The fetchLDM() function provides access to the Kellogg
Soil Survey Laboratory (KSSL) Data Mart via Soil Data Access. It allows
for querying laboratory data by various criteria and returning it as a
SoilProfileCollection.
You can fetch lab data by specifying an identifier (like an area
symbol) and the column it corresponds to (what
argument):
# Fetch KSSL data for a specific soil survey area (CA630)
# 'what' argument specifies we are searching by 'area_code'
# 'tables' argument specifies which data tables to include (defaults to core tables)
ldm_data <- fetchLDM("CA630", what = "area_code")
# The result is a SoilProfileCollection
ldm_data
# Inspect site data
head(site(ldm_data))

You can also search by correlated or sampled-as taxon names using a
custom WHERE clause:
# Fetch lab data where the correlated or sampled name is 'Musick' or 'Holland'
# CASE statement handles differences between correlated and sampled names
ldm_taxon <- fetchLDM(WHERE = "(CASE WHEN corr_name IS NOT NULL
THEN LOWER(corr_name)
ELSE LOWER(samp_name)
END) IN ('musick', 'holland')")
ldm_taxon

By default, fetchLDM() retrieves a standard set of
tables. You can request specific data, such as physical properties,
using the tables argument:
# Fetch physical properties for soils correlated as "Typic Argialbolls"
ldm_phys <- fetchLDM(x = "Typic Argialbolls",
what = "corr_taxsubgrp",
tables = "lab_physical_properties")
# Inspect the available horizon data columns
names(horizons(ldm_phys))

fetchLDM() supports retrieving data from various KSSL
tables, including chemical properties, mineralogy, and x-ray
analysis.
Compare the same property using different aggregation methods:
# Get clay content using three different methods
methods <- c("Dominant Component (Numeric)", "Weighted Average", "Max")
clay_results <- data.frame()
for (method in methods) {
result <- get_SDA_property(
property = "Total Clay - Rep Value",
method = method,
areasymbols = "CA630",
top_depth = 0,
bottom_depth = 50
)
result$method <- method
clay_results <- rbind(clay_results, result)
}
# compare methods for a single map unit
subset(clay_results, mukey == 1865918)

Combine soil properties and interpretations for a land use suitability assessment:
# Get drainage class and hydrologic group
drain_hydro <- get_SDA_property(
property = c("Drainage Class", "Hydrologic Group"),
method = "Dominant Condition",
areasymbols = "CA630"
)
# Get an engineering interpretation
eng_interp <- get_SDA_interpretation(
rulename = "ENG - Septic Tank Absorption Fields",
method = "Dominant Condition",
areasymbols = "CA630"
)
# Explicitly merge on mukey (and other shared columns to avoid duplication)
suitability <- merge(drain_hydro, eng_interp,
by = c("mukey", "areasymbol", "musym", "muname"))
head(suitability)

Combine hydric classifications with wetland-related interpretations:
# Get a generalized mapunit-level hydric classification
# see ?get_SDA_hydric for details on method="mapunit"
hydric_stat <- get_SDA_hydric(
areasymbols = "CA077",
method = "MAPUNIT"
)
wet_interp <- get_SDA_interpretation(
rulename = "WMS - Excavated Ponds (Aquifer-fed)",
method = "Dominant Condition",
areasymbols = "CA077"
)
wetland_assess <- merge(hydric_stat, wet_interp,
by = c("mukey", "areasymbol", "musym", "muname"),
all.x = TRUE)
subset(wetland_assess, rating_WMSExcavatedPondsAquiferfed < 0.6)

Here we use base R aggregate() (feel free to swap in
your favorite R aggregation method, e.g. dplyr,
data.table, or collapse) to summarize across
survey areas:
# Get properties for two areas
props_ca630 <- get_SDA_property(
property = "Total Clay - Rep Value",
method = "Dominant Component (Numeric)",
areasymbols = "CA630"
)
props_ca649 <- get_SDA_property(
property = "Total Clay - Rep Value",
method = "Dominant Component (Numeric)",
areasymbols = "CA649"
)
# Combine
all_props <- rbind(props_ca630, props_ca649)
# Aggregate by area symbol
area_summary <- aggregate(
claytotal_r ~ areasymbol,
data = all_props,
FUN = function(x) {
c(
mean = mean(x, na.rm = TRUE),
median = median(x, na.rm = TRUE),
sd = sd(x, na.rm = TRUE),
n = sum(!is.na(x))
)
}
)
area_summary

Use method = "None" to perform custom component-level
analysis:
# Get all component properties without aggregation
clay_by_comp <- get_SDA_property(
property = "Total Clay - Rep Value",
method = "None",
areasymbols = "CA630",
top_depth = 0,
bottom_depth = 25,
include_minors = TRUE
)
head(clay_by_comp)
# Calculate average clay by major vs. minor components
clay_summary <- aggregate(
claytotal_r ~ majcompflag,
data = clay_by_comp,
FUN = mean
)
clay_summary

- Use SDA_query() for custom SQL queries when predefined functions don’t fit your needs
- Use T-SQL syntax (TOP, ISNULL(), CAST()) for SDA queries
- Choose the appropriate aggregation method (Dominant Component, Weighted Average, None, etc.) for your analysis
- Use the dsn parameter to query local SSURGO databases (see Local SSURGO vignette)
- Use spatial queries with SDA_query() for location-based searches (or use SDA_spatialQuery())
- Test queries with TOP 10, or a small number of inputs, to verify syntax and result table structure
- Use query_string = TRUE to inspect generated SQL
- Filter results with the areasymbols or mukeys parameters
- Break large requests into chunks (see the makeChunks() section for suggested chunk sizes)

| Table | Primary Key | Parent Table | Description |
|---|---|---|---|
| legend | lkey | N/A | Soil survey area (e.g., CA630) |
| mapunit | mukey | legend | Delineated soil mapping unit |
| component | cokey | mapunit | Soil component within a map unit |
| chorizon | chkey | component | Soil horizon within a component |
| cointerp | cointerpkey | component | Component-level interpretations |
| coecoclass | coecoclasskey | component | Component ecological classes |
| copmgrp | copmgrpkey | component | Component parent material groups |
| mupolygon | mupolygonkey | N/A | Map unit polygons |
| Column | Table | Description | Example |
|---|---|---|---|
| areasymbol | legend | Soil survey area code | “CA630” |
| musym | mapunit | Map unit symbol | “7159” |
| muname | mapunit | Map unit name | “Sierra coarse sandy loam, 3 to 8 percent slopes” |
| compname | component | Component/soil series name | “Sierra” |
| comppct_r | component | Component percentage (representative) | 85 |
| majcompflag | component | Major component flag | “Yes”/“No” |
| claytotal_r | chorizon | Total clay percentage | 12.5 |
| hzdept_r | chorizon | Horizon depth top (cm) | 0 |
| hzdepb_r | chorizon | Horizon depth bottom (cm) | 25 |
| hydricrating | component | Hydric soil status | “Yes”/“No” |
| mrulename | cointerp | Interpretation rule name | “FOR - Mechanical Planting Suitability” |
| interphr | cointerp | Interpretation rating/class | “Somewhat Limited” |
| mupolygongeo | mupolygon | Map unit polygon geometry (WKT; WGS84 geographic coordinates in decimal degrees) | “POLYGON (…)” |