findOverlaps-methods {GenomicRanges}R Documentation

GRanges, GRangesList and GappedAlignments Interval Overlaps

Description

Finds interval overlaps between a GRanges, GRangesList or GappedAlignments object and another object containing ranges.

Usage

## S4 method for signature 'GenomicRanges,GenomicRanges'
findOverlaps(query, subject,
    maxgap = 0L, minoverlap = 1L,
    type = c("any", "start", "end", "within", "equal"),
    select = c("all", "first"), ignore.strand = FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
countOverlaps(query, subject,
    maxgap = 0L, minoverlap = 1L,
    type = c("any", "start", "end", "within", "equal"), 
    ignore.strand = FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
subsetByOverlaps(query, subject,
    maxgap = 0L, minoverlap = 1L,
    type = c("any", "start", "end", "within", "equal"),
    ignore.strand = FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
match(x, table,
    nomatch = NA_integer_, incomparables = NULL)
# Also:  x %in% table

Arguments

query, subject, x, table

A GRanges, GRangesList or GappedAlignments object. RangesList and RangedData are also accepted for one of query or subject (x or table for match).

maxgap, minoverlap, type, select

See findOverlaps in the IRanges package for a description of these arguments.

ignore.strand

When set to TRUE, the strand information is ignored in the overlap calculations.

nomatch

The integer value to be returned in the case when no match is found.

incomparables

This value is ignored.

Details

When the query and the subject are GRanges or GRangesList objects, findOverlaps uses the triplet (sequence name, range, strand) to determine which features (see paragraph below for the definition of feature) from the query overlap which features in the subject, where a strand value of "*" is treated as occurring on both the "+" and "-" strand. An overlap is recorded when a feature in the query and a feature in the subject have the same sequence name, have a compatible pairing of strands (e.g. "+"/"+", "-"/"-", "*"/"+", "*"/"-", etc.), and satisfy the interval overlap requirements. Strand is taken as "*" for RangedData and RangesList.

In the context of findOverlaps, a feature is a collection of ranges that are treated as a single entity. For GRanges objects, a feature is a single range; while for GRangesList objects, a feature is a list element containing a set of ranges. In the results, the features are referred to by number, which run from 1 to length(query)/length(subject).

When the query or the subject (or both) is a GappedAlignments object, it is first turned into a GRangesList object (with as( , "GRangesList")) and then the rules described previously apply.

Value

For findOverlaps either a RangesMatching object when select = "all" or an integer vector when select = "first".

For countOverlaps an integer vector containing the tabulated query overlap hits.

For subsetByOverlaps an object of the same class as query containing the subset that overlapped at least one entity in subject.

For match same as findOverlaps when select = "first".

For %in% the logical vector produced by !is.na(match(x, table)).

For RangedData and RangesList, with the exception of subsetByOverlaps, the results align to the unlisted form of the object. This turns out to be fairly convenient for RangedData (not so much for RangesList, but something has to give).

Author(s)

P. Aboyoun, S. Falcon, M. Lawrence, N. Gopalakrishnan and H. Pages

See Also

findOverlaps, RangesMatching-class, GRanges-class, GRangesList-class, GappedAlignments-class

Examples

## ---------------------------------------------------------------------
## WITH GRanges AND/OR GRangesList OBJECTS
## ---------------------------------------------------------------------

## GRanges object:
gr <-
  GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges =
          IRanges(1:10, width = 10:1, names = head(letters,10)),
          strand =
          Rle(strand(c("-", "+", "*", "+", "-")),
              c(1, 2, 2, 3, 2)),
          score = 1:10,
          GC = seq(1, 0, length=10))
gr

## GRangesList object:
gr1 <-
  GRanges(seqnames = "chr2", ranges = IRanges(3, 6),
          strand = "+", score = 5L, GC = 0.45)
gr2 <-
  GRanges(seqnames = c("chr1", "chr1"),
          ranges = IRanges(c(7,13), width = 3),
          strand = c("+", "-"), score = 3:4, GC = c(0.3, 0.5))
gr3 <-
  GRanges(seqnames = c("chr1", "chr2"),
          ranges = IRanges(c(1, 4), c(3, 9)),
          strand = c("-", "-"), score = c(6L, 2L), GC = c(0.4, 0.1))
grlist <- GRangesList("gr1" = gr1, "gr2" = gr2, "gr3" = gr3)

## Overlapping two GRanges objects:
table(gr %in% gr1)
countOverlaps(gr, gr1)
findOverlaps(gr, gr1)
subsetByOverlaps(gr, gr1)

countOverlaps(gr, gr1, type = "start")
findOverlaps(gr, gr1, type = "start")
subsetByOverlaps(gr, gr1, type = "start")

findOverlaps(gr, gr1, select = "first")

findOverlaps(gr1, gr)
findOverlaps(gr1, gr, type = "start")
findOverlaps(gr1, gr, type = "within")
findOverlaps(gr1, gr, type = "equal")

## Overlapping a GRanges and a GRangesList object:
table(grlist %in% gr)
countOverlaps(grlist, gr)
findOverlaps(grlist, gr)
subsetByOverlaps(grlist, gr)
countOverlaps(grlist, gr, type = "start")
findOverlaps(grlist, gr, type = "start")
subsetByOverlaps(grlist, gr, type = "start")
findOverlaps(grlist, gr, select = "first")

## Overlapping two GRangesList objects:
countOverlaps(grlist, rev(grlist))
findOverlaps(grlist, rev(grlist))
subsetByOverlaps(grlist, rev(grlist))

## ---------------------------------------------------------------------
## WITH A GappedAlignments OBJECT
## ---------------------------------------------------------------------
library(Rsamtools)  # because file ex1.bam is in this package
galn_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
galn <- readGappedAlignments(galn_file)

subject <- granges(galn)[1]

## Note the absence of query no. 9 (i.e. 'galn[9]') in this result:
as.matrix(findOverlaps(galn, subject))

## This is because, by default, findOverlaps()/countOverlaps() are
## strand specific:
galn[8:10]
countOverlaps(galn[8:10], subject)
countOverlaps(galn[8:10], subject, ignore.strand=TRUE)

## Advanced examples:
subsetByOverlaps(galn, subject)
table(match(galn, subject), useNA = "ifany")
table(galn %in% subject)

[Package GenomicRanges version 1.6.2 Index]