Package 'pafr' reference manual

Title:	Read, Manipulate and Visualize 'Pairwise mApping Format' Data
Description:	Provides functions to read, process and visualize pairwise sequence alignments in the 'PAF' format used by 'minimap2' and other whole-genome aligners. 'minimap2' is described by Li H. (2018) <doi:10.1093/bioinformatics/bty191>.
Authors:	David Winter [aut, cre] , Kate Lee [ctb] , Murray Cox [ctb]
Maintainer:	David Winter <[email protected]>
License:	MIT + file LICENSE
Version:	0.0.2
Built:	2025-03-03 04:49:29 UTC
Source:	https://github.com/dwinter/pafr

Coerce a data.frame or tibble into a pafr object

Description

The main reason to use this function is speed up the process of reading in a large paf file that has no tags. Functions like read.table, read_delim (reader) and fread (data.table) can process a 12 column file more quickly than pafr's read_paf. If you you do not need tag data for your analyses or visualizations, it might make sense to use a fast reading function to get a 12 column data.frame, convert that data.frame into a 'pafr object with this function. The 'pafr' object can then work easily with the functions in this package.

Usage

as_paf(paf_data_frame)
as_paf(paf_data_frame)

Arguments

paf_data_frame

a data.frame object with 12 columns. Column names and types will be over-written in the returned object

Value

a pafr object

Extract the sizes of all sequences in a paf alignment

Description

Extract the sizes of all sequences in a paf alignment

Usage

chrom_sizes(ali)
chrom_sizes(ali)

Arguments

ali

pafr or tibble containing the genome alignment (as returned by read_paf)

Value

list with two elements (tlens and qlens) Each element is a dataframe with one column of sequence names and another column containing the length of each sequence

Examples

ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
chrom_sizes(ali)
ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
chrom_sizes(ali)

Generate a dot plot from a paf alignment

Description

Generate a dot plot from a paf alignment

Usage

dotplot(
  ali,
  order_by = c("size", "qstart", "provided"),
  label_seqs = FALSE,
  dashes = TRUE,
  ordering = list(),
  alignment_colour = "black",
  xlab = "query",
  ylab = "target",
  line_size = 2
)
dotplot(
  ali,
  order_by = c("size", "qstart", "provided"),
  label_seqs = FALSE,
  dashes = TRUE,
  ordering = list(),
  alignment_colour = "black",
  xlab = "query",
  ylab = "target",
  line_size = 2
)

Arguments

`ali`	pafr or tibble containing the genome alignment (as returned by `read_paf`)
`order_by`	How the query and target sequences should be ordered in the dot plot. Option must be one of 'size' (smallest-to-largest), 'qstart' (query organised smallest to largest, target by first match in the query genome) or 'provided' (ordering as specified in the `ordering` argument)
`label_seqs`	boolean If TRUE, label centre of query and target sequences in margins of the dot plot
`dashes`	boolean If TRUE, add dashes to borders of query and target sequences in the dot plot
`ordering`	If `order_by` is set to TRUE, this variable should be a list with two elements specifying the order of query and then target sequences in the dot plot. This option is ignored if `order_by` is set to other values
`alignment_colour`	character The colour used to draw each aligned section in the dot plot (defaults to black)
`xlab`	character The x-axis label (defaults to 'query')
`ylab`	character The y-axis label (defaults to 'target')
`line_size`	The width of the line used to represent an alignment in the dot plot (defaults to 2)

Examples

ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
dotplot(ali)
dotplot(ali) + theme_bw()
dotplot(ali, label_seqs=TRUE, order_by="qstart", alignment_colour="blue")
ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
dotplot(ali)
dotplot(ali) + theme_bw()
dotplot(ali, label_seqs=TRUE, order_by="qstart", alignment_colour="blue")

Remove secondary alignments from a pafr alignment

Description

Remove secondary alignments from a pafr alignment

Usage

filter_secondary_alignments(ali, remove_inversions = FALSE)
filter_secondary_alignments(ali, remove_inversions = FALSE)

Arguments

`ali`	Genomic alignment in `pafr` or `tbl_df` format, as returned by `read_paf`
`remove_inversions`	logical If TRUE, also remove inversions (tp flag 'I' or 'i') from the alignment

Examples

ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") ) 
ali
filter_secondary_alignments(ali)
ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") ) 
ali
filter_secondary_alignments(ali)

Number formatters for scales in base pairs

Description

For use with ggplot2

Usage

Gb_lab(x)

Mb_lab(x)

Kb_lab(x)
Gb_lab(x)

Mb_lab(x)

Kb_lab(x)

Arguments

`x`	The data (in base pairs) to be formatted as Gb, Mb or Kb

Value

A character vector with scale labels

Examples

## Not run: 
ali <- read_paf(system.file("extdata", "fungi.paf", package="pafr"))
doplot(ali) + scale_x_continuous("Genomic position", label=Mb_lab)

## End(Not run)
## Not run: 
ali <- read_paf(system.file("extdata", "fungi.paf", package="pafr"))
doplot(ali) + scale_x_continuous("Genomic position", label=Mb_lab)

## End(Not run)

Highlight segments of a query or target genome in a dot plot

Description

This plot is intended to be used in conjunction with link{dotplot}. Adding higlight_query or highlight_target to a dotplot function call (see examples below) will add a rectangular 'highlight' corresponding to a particular genomic interval in the corresponding genome.

Usage

highlight_query(bed, fill = "yellow", colour = "black", alpha = 0.6)

highlight_target(bed, fill = "yellow", colour = "black", alpha = 0.6)
highlight_query(bed, fill = "yellow", colour = "black", alpha = 0.6)

highlight_target(bed, fill = "yellow", colour = "black", alpha = 0.6)

Arguments

`bed`	`data.frame` or `tbl_df` containing a bed file, as returned by `read_bed`. Should contain three columns named 'chrom', 'start' and 'end'
`fill`	character Fill colour for highlight segment
`colour`	character Outline colour for highlight segment
`alpha`	character Opacity ([0-1]) for highlight segment

Examples

ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
cen <- read_bed(system.file("extdata", "Q_centro.bed", package="pafr"))
dotplot(ali) + highlight_query(cen)
interval <- data.frame(chrom="T_chr3", start=2000000, end=3000000)
dotplot(ali, label_seqs=TRUE) + 
   highlight_target(interval)
ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
cen <- read_bed(system.file("extdata", "Q_centro.bed", package="pafr"))
dotplot(ali) + highlight_query(cen)
interval <- data.frame(chrom="T_chr3", start=2000000, end=3000000)
dotplot(ali, label_seqs=TRUE) + 
   highlight_target(interval)

Plot the regions of one genome that are covered by alignments in a paf file

Description

Each sequence in the focal genome is displayed as a rectangle, with regions covered by an alignment shaded as per the fill argument described below. Uncovered regions remain white.

Usage

plot_coverage(
  ali,
  target = TRUE,
  fill = "forestgreen",
  direct_label = TRUE,
  label_colour = "black",
  xlab = "Position in sequence",
  x_labeller = Mb_lab
)
plot_coverage(
  ali,
  target = TRUE,
  fill = "forestgreen",
  direct_label = TRUE,
  label_colour = "black",
  xlab = "Position in sequence",
  x_labeller = Mb_lab
)

Arguments

`ali`	alignment An alignment as read by `read_paf`
`target`	logical If TRUE, display coverage for the target genome; if FALSE, display coverage for the query
`fill`	character How to colour the alignment blocks. If the character provided is the name of a column in the alignment, this column will be passed to `ggplot2` to shade alignment blocks. Otherwise, the character is treated as a single colour to be used for all alignment blocks.
`direct_label`	logical If TRUE, use geom_text to directly label the name of the focal sequences; if FALSE, no direct labels are drawn
`label_colour`	character Colour used for direct labels
`xlab`	string Name for the x-axis
`x_labeller`	function Function to be used to label the x-axis (defaults to `Mb_lab`

Details

Note that this function uses theme_coverage_plot to style the graph. Using another ggplot theme on the plot may produce unexpected results.

Examples

ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
plot_coverage(ali)
plot_coverage(ali, fill='qname', direct_label=FALSE) + 
   scale_fill_brewer(palette="Set1")
ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
plot_coverage(ali)
plot_coverage(ali, fill='qname', direct_label=FALSE) + 
   scale_fill_brewer(palette="Set1")

Plot synteny between a query and target sequence in a PAF alignment

Description

Plot synteny between a query and target sequence in a PAF alignment

Usage

plot_synteny(
  ali,
  q_chrom,
  t_chrom,
  centre = TRUE,
  rc = FALSE,
  xlab = "Position in query",
  ylab = "",
  x_labeller = Mb_lab
)
plot_synteny(
  ali,
  q_chrom,
  t_chrom,
  centre = TRUE,
  rc = FALSE,
  xlab = "Position in query",
  ylab = "",
  x_labeller = Mb_lab
)

Arguments

`ali`	pafr or tibble containing the genome alignment (as returned by `read_paf`)
`q_chrom`	character Name for the query sequence
`t_chrom`	character Name for the target sequence
`centre`	logical If TRUE (default), adjust the position of the target sequence, so it is centred on the query. If not, both sequences start at position zero
`rc`	logical If TRUE, use the reverse and complement for the target sequence
`xlab`	string Name for the x-axis
`ylab`	string Name for the y-axis
`x_labeller`	Function to be used to label the x-axis

Value

A ggplot object that displays synteny between query and target sequences

Examples

ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
long_ali <- subset(ali, alen > 1e4)
plot_synteny(long_ali, q_chrom="Q_chr3", t_chrom="T_chr4", centre=TRUE)
plot_synteny(long_ali, q_chrom="Q_chr5", t_chrom="T_chr5", centre=TRUE)
plot_synteny(long_ali, q_chrom="Q_chr5", t_chrom="T_chr5", centre=TRUE, rc=TRUE)
ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
long_ali <- subset(ali, alen > 1e4)
plot_synteny(long_ali, q_chrom="Q_chr3", t_chrom="T_chr4", centre=TRUE)
plot_synteny(long_ali, q_chrom="Q_chr5", t_chrom="T_chr5", centre=TRUE)
plot_synteny(long_ali, q_chrom="Q_chr5", t_chrom="T_chr5", centre=TRUE, rc=TRUE)

Read genomic intervals in bed format

Description

The first three columns of the file specified by file_name must contain data in the standard bed format (i.e., a genomic interval represented by 0-based half-open interval with seq-id, start and end position). These columns will be renamed to 'chrom', 'start' and 'end', respectively. Any other columns present in the data will be left unmodified.

Usage

read_bed(file_name, tibble = FALSE, ...)
read_bed(file_name, tibble = FALSE, ...)

Arguments

`file_name`	Path to the bed file to be read in
`tibble`	logical If TRUE, the genomic intervals are returned as a tidy `tbl_df`.
`...`	Other arguments passed to `read.table`

Details

The file is read into memory with read.table, with the argument sep set to '\t' and stringsAsFactors set to FALSE. All other arguments are left as default, but arguments can be passed from read_bed to read.table.

Value

Either a data.frame or a tbl_df with at least three columns named 'chrom', 'start' and 'end'

Examples

bed_path <- system.file("extdata", "Q_centro.bed", package="pafr")
centro <- read_bed(bed_path)
centro
# Can pass arguments to read.table
miss_two <- read_bed(bed_path, skip=2)
miss_two 
bed_path <- system.file("extdata", "Q_centro.bed", package="pafr")
centro <- read_bed(bed_path)
centro
# Can pass arguments to read.table
miss_two <- read_bed(bed_path, skip=2)
miss_two

Read a genomic alignment in PAF format

Description

See the package vignette for detailed information on the file format and its representation as an R object.

Usage

read_paf(file_name, tibble = FALSE, include_tags = TRUE)
read_paf(file_name, tibble = FALSE, include_tags = TRUE)

Arguments

`file_name`	Path to the .paf file
`tibble`	logical If TRUE, the genomic alignments are returned as a tidy `tbl_df`
`include_tags`	logical if TRUE (default) read additional information about each alignment encoded as PAF tags. Setting this to FALSE will speed up parsing of paf alignments, specially those with large CIGAR strings/

Value

Either a pafr object, which acts as a data.frame, or a tbl_df containing information on genomic alignments. The contents of this table are described in detail in the pafr package vingette.

Examples

ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
ali
ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
ali

A minimalistic ggplot2 theme designed for use with genome coverage plots

Description

This theme is used as the default when plot_coverage is called, so you should usually only call this function to modify the appearance of the coverage plot.

Usage

theme_coverage_plot(facet_labs = TRUE, show_legend = TRUE)
theme_coverage_plot(facet_labs = TRUE, show_legend = TRUE)

Arguments

`facet_labs`	logical If TRUE (default), label sequences using the facet labels; if FALSE, sequences are labeled directly using `geom_text`
`show_legend`	logical If TRUE (default), label display any legend associated with the fill parameter of `plot_coverage`; if FALSE, do not display a legend

Examples

ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
plot_coverage(ali) + theme_coverage_plot(show_legend=FALSE)
ali <- read_paf( system.file("extdata", "fungi.paf", package="pafr") )
plot_coverage(ali) + theme_coverage_plot(show_legend=FALSE)

Package 'pafr'

Help Index

Coerce a data.frame or tibble into a pafr object

Description

Usage

Arguments

Value

See Also

Extract the sizes of all sequences in a paf alignment

Description

Usage

Arguments

Value

Examples

Generate a dot plot from a paf alignment

Description

Usage

Arguments

Examples

Remove secondary alignments from a pafr alignment

Description

Usage

Arguments

Examples

Number formatters for scales in base pairs

Description

Usage

Arguments

Value

Examples

Highlight segments of a query or target genome in a dot plot

Description

Usage

Arguments

Examples

Plot the regions of one genome that are covered by alignments in a paf file

Description

Usage

Arguments

Details

Examples

Plot synteny between a query and target sequence in a PAF alignment

Description

Usage

Arguments

Value

Examples

Read genomic intervals in bed format

Description

Usage

Arguments

Details

Value

Examples

Read a genomic alignment in PAF format

Description

Usage

Arguments

Value

Examples

A minimalistic ggplot2 theme designed for use with genome coverage plots

Description

Usage

Arguments

Examples