% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/vcf-tables.R
\name{vcftable}
\alias{vcftable}
\title{Read VCF/BCF contents into an R data structure}
\usage{
vcftable(
  vcffile,
  region = "",
  samples = "-",
  vartype = "all",
  format = "GT",
  ids = NULL,
  qual = 0,
  pass = FALSE,
  info = TRUE,
  collapse = TRUE,
  setid = FALSE,
  mac = 0,
  rmdup = FALSE
)
}
\arguments{
\item{vcffile}{path to the VCF/BCF file}

\item{region}{region to subset in bcftools-like style: "chr1", "chr1:1-10000000"}

\item{samples}{samples to subset in bcftools-like style:
a comma-separated list of samples to include (or to exclude, with a "^" prefix),
e.g. "id01,id02", "^id01,id02".}

\item{vartype}{restrict to a specific type of variant. Supports "snps", "indels", "sv", "multisnps", "multiallelics". The default "all" applies no restriction.}

\item{format}{the FORMAT tag to extract. The default is "GT".}

\item{ids}{character vector. restrict to sites whose ID is in the given vector. The default NULL does not filter any sites.}

\item{qual}{numeric. restrict to variants with QUAL > qual.}

\item{pass}{logical. restrict to variants with FILTER = "PASS".}

\item{info}{logical. whether to keep the INFO column in the returned list; set to FALSE to drop it.}

\item{collapse}{logical. It acts on the extracted FORMAT tag. If the tag is "GT", the raw genotype matrix for diploid samples has dimensions (M, 2 * N),
where M is the number of markers and N is the number of samples. The default TRUE collapses the genotypes of each sample so that the matrix is (M, N).
Set this to FALSE to preserve the phasing order, e.g. "1|0" is parsed as c(1, 0) with collapse=FALSE.
If the tag is not "GT", collapse=TRUE tries to turn the list of extracted vectors into a matrix.
However, this can cause problems when a variant is multiallelic and therefore yields more values than the others.}

\item{setid}{logical. reset ID column as CHR_POS_REF_ALT.}

\item{mac}{integer. restrict to variants with minor allele count higher than the value.}

\item{rmdup}{logical. remove duplicated sites by keeping the first occurrence of POS. (default: FALSE)}
}
\value{
Return a list containing the following components:
\describe{
\item{samples}{: character vector; \cr
the sample IDs in the VCF file after subsetting
}

\item{chr}{: character vector; \cr
the CHR column in the VCF file
}

\item{pos}{: character vector; \cr
the POS column in the VCF file
}

\item{id}{: character vector; \cr
the ID column in the VCF file
}

\item{ref}{: character vector; \cr
the REF column in the VCF file
}

\item{alt}{: character vector; \cr
the ALT column in the VCF file
}

\item{qual}{: character vector; \cr
the QUAL column in the VCF file
}

\item{filter}{: character vector; \cr
the FILTER column in the VCF file
}

\item{info}{: character vector; \cr
the INFO column in the VCF file
}

\item{format}{: matrix of either integer or numeric values depending on the tag to extract; \cr
the specified tag in the FORMAT column to be extracted
}
}
}
\description{
The Swiss Army knife for reading VCF/BCF into R data types rapidly and easily.
}
\details{
\code{vcftable} uses the C++ API of vcfpp, which is a wrapper of htslib, to read VCF/BCF files.
Thus, it offers the full functionality of htslib, such as restricting to specific variant types,
samples and regions. For memory efficiency, \code{vcftable} is designed
to parse only one tag at a time in the FORMAT column of the VCF. By default, only the matrix of genotypes,
i.e. the "GT" tag, is returned by \code{vcftable}, but many other tags are supported through the \code{format} option.
}
\examples{
library('vcfppR')
vcffile <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
res <- vcftable(vcffile, "chr21:1-5050000", vartype = "snps")
str(res)
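# A hedged sketch of the collapse and samples arguments, reusing vcffile and
# res from above. It assumes the bundled example file holds diploid genotypes
# for at least two samples; the object names phased, keep and res_sub are
# illustrative only.
# With collapse = FALSE the two alleles of each sample are kept separately,
# so the genotype matrix is expected to have two columns per sample.
phased <- vcftable(vcffile, "chr21:1-5050000", vartype = "snps", collapse = FALSE)
str(phased)
# Restrict to the first two samples using the comma-separated samples syntax.
keep <- paste(res$samples[1:2], collapse = ",")
res_sub <- vcftable(vcffile, "chr21:1-5050000", samples = keep, vartype = "snps")
res_sub$samples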
}
\author{
Zilong Li \email{zilong.dk@gmail.com}
}
