agptools commands

agptools's command-line interface consists of several sub-commands, in the style of, e.g., samtools or git. Here we document each of the commands including an example of their usage.

Example AGP

Here is an example input AGP file that will be used to demonstrate the usage of the modules in the following sections:

scaffold_16  1         1096465   1   W  tig00005080 1    1096465 -
scaffold_16  1096466   1096965   2   N  500     scaffold        yes  na
scaffold_16  1096966   1973201   3   W  tig00001012 1    876236  +
scaffold_16  1973202   1973701   4   N  500     scaffold        yes  na
scaffold_16  1973702   4258994   5   W  tig00182876 1    2285293 -
scaffold_16  4258995   4259494   6   N  500     scaffold        yes  na
scaffold_16  4259495   11764263  7   W  tig00000113 1    7504769 +
scaffold_16  11764264  11764763  8   N  500     scaffold        yes  na
scaffold_16  11764764  13768005  9   W  tig00004962 1    2003242 -
scaffold_16  13768006  13768505  10  N  500     scaffold        yes  na
scaffold_16  13768506  17994060  11  W  tig00004933 1    4225555 -
scaffold_16  17994061  17994560  12  N  500     scaffold        yes  na
scaffold_16  17994561  21066363  13  W  tig00000080 1    3071803 -
scaffold_16  21066364  21066863  14  N  500     scaffold        yes  na
scaffold_16  21066864  21807834  15  W  tig00183148 1    740971  +

It contains eight contigs (and seven gaps, naturally).
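To make the column layout concrete, here is a minimal Python sketch that reads AGP rows like the ones above and counts contigs and gaps. It is not part of agptools; the `Component` type and `parse_agp_line` function are hypothetical names chosen for illustration, and the sketch splits on any whitespace so that it works on the space-aligned example shown here.

```python
from typing import NamedTuple


class Component(NamedTuple):
    obj: str     # scaffold name (AGP column 1)
    start: int   # 1-based start on the scaffold (column 2)
    end: int     # 1-based inclusive end on the scaffold (column 3)
    part: int    # part number within the scaffold (column 4)
    kind: str    # component type: "W" for a contig, "N" for a gap
    rest: tuple  # columns 6-9, whose meaning depends on the type


def parse_agp_line(line: str) -> Component:
    """Parse one AGP row (hypothetical helper, not the agptools API)."""
    f = line.split()
    return Component(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], tuple(f[5:]))


example = """\
scaffold_16  1         1096465   1   W  tig00005080 1    1096465 -
scaffold_16  1096466   1096965   2   N  500     scaffold        yes  na
scaffold_16  1096966   1973201   3   W  tig00001012 1    876236  +
"""

rows = [parse_agp_line(line) for line in example.splitlines() if line.strip()]
contigs = [r for r in rows if r.kind == "W"]
gaps = [r for r in rows if r.kind == "N"]
print(len(contigs), "contigs and", len(gaps), "gaps")
```

Running the same counting over the full example file would give the eight contigs and seven gaps mentioned above.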

flip

The flip command reverse complements a segment of a scaffold. The use case for this is the scaffolder putting a piece of a scaffold in the wrong orientation.

The required arguments for the flip command are a list of flips to make and the input AGP you want to modify. The list of flips has three columns:

1. The name of the scaffold you want to flip a piece of.
2. The beginning position in base pairs, in scaffold coordinates, of the piece you want to flip.
3. The end position in base pairs, in scaffold coordinates, of the piece you want to flip.

Both coordinates are in the usual DNA range format (i.e., "1-100" takes everything from the first base of the sequence to the 100th base of the sequence, including the first and 100th base). The begin coordinate must be at the beginning of a component and the end coordinate must be at the end of a component. To do something more complicated, like reverse complement only part of a contig, use the split and join modules instead.

Here is an example flips file:

scaffold_16   1        1973201
scaffold_16   11764764 21066363

Here is the command you would use to perform these flips on the example file shown in the introduction:

agptools flip flips.tsv test_in.agp > test_flip.agp

And here is the output:

scaffold_16  1         876236    1    W   tig00001012 1    876236      -
scaffold_16  876237    876736    2    N   500     scaffold        yes     na
scaffold_16  876737    1973201   3    W   tig00005080 1    1096465     +
scaffold_16  1973202   1973701   4    N   500     scaffold        yes     na
scaffold_16  1973702   4258994   5    W   tig00182876 1    2285293     -
scaffold_16  4258995   4259494   6    N   500     scaffold        yes     na
scaffold_16  4259495   11764263  7    W   tig00000113 1    7504769     +
scaffold_16  11764264  11764763  8    N   500     scaffold        yes     na
scaffold_16  11764764  14836566  9    W   tig00000080 1    3071803     +
scaffold_16  14836567  14837066  10   N   500     scaffold        yes     na
scaffold_16  14837067  19062621  11   W   tig00004933 1    4225555     +
scaffold_16  19062622  19063121  12   N   500     scaffold        yes     na
scaffold_16  19063122  21066363  13   W   tig00004962 1    2003242     +
scaffold_16  21066364  21066863  14   N   500     scaffold        yes     na
scaffold_16  21066864  21807834  15   W   tig00183148 1    740971      +
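The bookkeeping behind a flip can be sketched in a few lines of Python. This is an illustration only, not the agptools implementation: the tuple layout `(kind, name, length, orientation)` and the `flip` and `with_coords` functions are assumptions made for this example. The segment's components are reversed in order, contig orientations are swapped, and coordinates are recomputed from the component lengths.

```python
FLIP = {"+": "-", "-": "+"}


def with_coords(parts):
    """Yield (start, end, part) using 1-based inclusive scaffold coordinates."""
    pos = 1
    for part in parts:
        length = part[2]
        yield pos, pos + length - 1, part
        pos += length


def flip(parts, begin, end):
    """Reverse the components spanning begin..end (hypothetical sketch)."""
    placed = list(with_coords(parts))
    starts = [s for s, _, _ in placed]
    ends = [e for _, e, _ in placed]
    if begin not in starts or end not in ends:
        raise ValueError("flip must begin and end on component boundaries")
    i, j = starts.index(begin), ends.index(end)
    if i > j:
        raise ValueError("begin must come before end")
    # Reverse the order and swap +/-; gaps keep their "na" orientation.
    segment = [
        (kind, name, length, FLIP.get(orient, orient))
        for kind, name, length, orient in reversed(parts[i:j + 1])
    ]
    return parts[:i] + segment + parts[j + 1:]


# First three components of the example AGP.
parts = [
    ("W", "tig00005080", 1096465, "-"),
    ("N", "gap", 500, "na"),
    ("W", "tig00001012", 876236, "+"),
]
flipped = flip(parts, 1, 1973201)
for start, end, (kind, name, _, orient) in with_coords(flipped):
    print(start, end, kind, name, orient)
```

The coordinates this prints for the three components match the first three rows of the output above.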

split

The split module breaks a scaffold into multiple scaffolds. There are two common cases where this operation may be necessary:

1. The scaffolder (or even contig assembler) joined two sequences together that aren't actually on the same chromosome.
2. The scaffolder joined two sequences together that are in fact on the same chromosome, but it did it in the wrong order, so you want to split the scaffold up into pieces and then put the pieces back together in a different order later with the join module.

The two required inputs to the split module are the AGP you want to operate on and a list of splits you want to make. The output is the edited AGP file. The format of the list of splits is a tab-separated file with the following columns:

1. Name of the scaffold you want to split into parts.
2. A comma-separated list of breakpoint coordinates where you want to break the scaffold.

Let's say you want to split the example AGP into three pieces: one containing the first three contigs, one containing the next four, and one containing the last contig. Here is what the relevant line of your splits file would look like:

scaffold_16  4258995,21066364

Note that I used the BEGIN coordinate of the gaps, but you can use any coordinate inside the gap and it will have the same result.

Here is the command:

agptools split splits.txt test.agp > split_out.agp

Here is the output:

scaffold_16.1   1        1096465    1   W   tig00005080  1  1096465 -
scaffold_16.1   1096466  1096965    2   N   500 scaffold    yes na
scaffold_16.1   1096966  1973201    3   W   tig00001012  1  876236  +
scaffold_16.1   1973202  1973701    4   N   500 scaffold    yes na
scaffold_16.1   1973702  4258994    5   W   tig00182876  1  2285293 -
scaffold_16.2   1        7504769    1   W   tig00000113  1  7504769 +
scaffold_16.2   7504770  7505269    2   N   500 scaffold    yes na
scaffold_16.2   7505270  9508511    3   W   tig00004962  1  2003242 -
scaffold_16.2   9508512  9509011    4   N   500 scaffold    yes na
scaffold_16.2   9509012  13734566   5   W   tig00004933  1  4225555 -
scaffold_16.2   13734567 13735066   6   N   500 scaffold    yes na
scaffold_16.2   13735067 16806869   7   W   tig00000080  1  3071803 -
scaffold_16.3   1        740971     1   W   tig00183148  1  740971  +
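The reason any coordinate inside a gap gives the same result can be seen in a small Python sketch. This is not the agptools code: the `split` function and the `(kind, name, length, orientation)` tuples are assumptions for illustration, and the sketch only handles breakpoints that fall inside gaps. A gap containing a breakpoint is dropped and a new piece starts after it.

```python
def split(parts, breakpoints):
    """Split a list of components at breakpoints lying in gaps (sketch)."""
    pieces, current, pos = [], [], 1
    remaining = sorted(breakpoints)
    for part in parts:
        kind, _, length, _ = part
        end = pos + length - 1
        hits = [b for b in remaining if pos <= b <= end]
        if hits:
            if kind != "N":
                raise ValueError("this sketch only handles breakpoints in gaps")
            # Drop the gap itself and start a new piece after it.
            pieces.append(current)
            current = []
            remaining = [b for b in remaining if b not in hits]
        else:
            current.append(part)
        pos = end + 1
    pieces.append(current)
    return pieces


# First five components of the example AGP.
parts = [
    ("W", "tig00005080", 1096465, "-"),
    ("N", "gap", 500, "na"),
    ("W", "tig00001012", 876236, "+"),
    ("N", "gap", 500, "na"),
    ("W", "tig00182876", 2285293, "-"),
]
at_gap_start = split(parts, [1973202])   # first base of the second gap
inside_gap = split(parts, [1973500])     # somewhere in the middle of it
print(at_gap_start == inside_gap)
for piece in at_gap_start:
    print([name for _, name, _, _ in piece])
```

Each resulting piece would then be renumbered from coordinate 1 and given a suffixed name such as scaffold_16.1, as in the output above.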

join

The join module is for taking two different scaffolds and joining them into one scaffold. Common use-cases for this include:

* The scaffolder failed to join two contigs that belong together
* You want to split up the pieces of a scaffold and put them back together again in a different order

The two required arguments for this module are a file specifying what joins you want to make, and the AGP file you want to modify. The joins list contains one join per line. Each line is a comma-separated list of scaffolds you want to put together in the correct order. You can prefix the name of a scaffold with '+' or '-' to specify its orientation; scaffolds with no orientation specified are '+' by default.

You can also change the default size, type, and evidence of the newly created gaps with command-line arguments. See the help message for details.

Here is an example joins file:

scaffold_16.2,-scaffold_16.3,+scaffold_16.1

Here is an example command:

agptools join joins.txt split_out.agp > join_out.agp

And here is the output of that command:

scaffold_16.2p16.3p16.1 1        7504769    1   W   tig00000113 1   7504769 +
scaffold_16.2p16.3p16.1 7504770  7505269    2   N   500 scaffold    yes na
scaffold_16.2p16.3p16.1 7505270  9508511    3   W   tig00004962 1   2003242 -
scaffold_16.2p16.3p16.1 9508512  9509011    4   N   500 scaffold    yes na
scaffold_16.2p16.3p16.1 9509012  13734566   5   W   tig00004933 1   4225555 -
scaffold_16.2p16.3p16.1 13734567 13735066   6   N   500 scaffold    yes na
scaffold_16.2p16.3p16.1 13735067 16806869   7   W   tig00000080 1   3071803 -
scaffold_16.2p16.3p16.1 16806870 16807369   8   N   500 scaffold    yes na
scaffold_16.2p16.3p16.1 16807370 17548340   9   W   tig00183148 1   740971  -
scaffold_16.2p16.3p16.1 17548341 17548840   10  N   500 scaffold    yes na
scaffold_16.2p16.3p16.1 17548841 18645305   11  W   tig00005080 1   1096465 -
scaffold_16.2p16.3p16.1 18645306 18645805   12  N   500 scaffold    yes na
scaffold_16.2p16.3p16.1 18645806 19522041   13  W   tig00001012 1   876236  +
scaffold_16.2p16.3p16.1 19522042 19522541   14  N   500 scaffold    yes na
scaffold_16.2p16.3p16.1 19522542 21807834   15  W   tig00182876 1   2285293 -

You can also specify a new name to use instead of the "16.2p16.3p16.1" scheme. Add a column to the joins file after a tab giving the new name. For example,

scaffold_16.2,-scaffold_16.3,+scaffold_16.1       chr1

will result in

chr1   1        7504769    1   W   tig00000113 1   7504769 +
chr1   7504770  7505269    2   N   500 scaffold    yes na
chr1   7505270  9508511    3   W   tig00004962 1   2003242 -
chr1   9508512  9509011    4   N   500 scaffold    yes na
chr1   9509012  13734566   5   W   tig00004933 1   4225555 -
chr1   13734567 13735066   6   N   500 scaffold    yes na
chr1   13735067 16806869   7   W   tig00000080 1   3071803 -
chr1   16806870 16807369   8   N   500 scaffold    yes na
chr1   16807370 17548340   9   W   tig00183148 1   740971  -
chr1   17548341 17548840   10  N   500 scaffold    yes na
chr1   17548841 18645305   11  W   tig00005080 1   1096465 -
chr1   18645306 18645805   12  N   500 scaffold    yes na
chr1   18645806 19522041   13  W   tig00001012 1   876236  +
chr1   19522042 19522541   14  N   500 scaffold    yes na
chr1   19522542 21807834   15  W   tig00182876 1   2285293 -
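As a rough illustration of how a joins line can be interpreted, the following Python sketch parses one line and concatenates the named scaffolds. It is not the agptools implementation: `parse_join`, `join`, the tuple layout, and the toy scaffolds "a" and "b" are all hypothetical, and the 500 bp gap between scaffolds is taken from the example output above rather than from the tool's source.

```python
FLIP = {"+": "-", "-": "+"}


def parse_join(line):
    """Return ([(scaffold, orientation), ...], new_name or None)."""
    spec, _, new_name = line.rstrip("\n").partition("\t")
    members = []
    for token in spec.strip().split(","):
        token = token.strip()
        orient = "+"  # scaffolds without a prefix default to '+'
        if token[:1] in ("+", "-"):
            orient, token = token[0], token[1:]
        members.append((token, orient))
    return members, (new_name.strip() or None)


def join(scaffolds, members, gap=("N", "gap", 500, "na")):
    """Concatenate scaffolds in order, reversing those marked '-' (sketch)."""
    joined = []
    for name, orient in members:
        parts = scaffolds[name]
        if orient == "-":
            parts = [(k, n, length, FLIP.get(o, o))
                     for k, n, length, o in reversed(parts)]
        if joined:
            joined.append(gap)  # new gap between consecutive scaffolds
        joined.extend(parts)
    return joined


# Toy scaffolds as lists of (kind, name, length, orientation).
scaffolds = {
    "a": [("W", "c1", 10, "+"), ("N", "gap", 500, "na"), ("W", "c2", 20, "-")],
    "b": [("W", "c3", 5, "+")],
}
members, new_name = parse_join("a,-b\tchr1")
joined = join(scaffolds, members)
print(members, new_name)
print([(name, orient) for _, name, _, orient in joined])
```

Note how the single contig of "b" ends up with a '-' orientation, just as tig00183148 does in the join output above.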

assemble

Once you've got your final agp, you probably want to create a fasta of these nice new corrected scaffolds. The assemble module takes a fasta file containing the original contigs and an agp of how you want to assemble these contigs into scaffolds, and outputs a fasta of the assembled scaffolds. For example,

agptools assemble contigs.fa corrected_scaffolds.agp > corrected_scaffolds.fa

Please note that if using SALSA as your scaffolder, it makes some breaks to input contigs based on the Hi-C data, and then gives the pieces different names (e.g., "contig1" to "contig1_1" and "contig1_2"), so you should use the fasta containing broken contigs as the contigs argument to assemble rather than the actual original contigs you started out with. This file is called assembly.clean.fasta and lives in the same directory as the rest of the SALSA output.
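Conceptually, assembling a scaffold means walking its AGP rows in order, pasting in each contig (reverse complemented when its orientation is '-') and filling gaps with Ns. The Python sketch below illustrates that idea on toy data. It is not the agptools code: the `assemble` and `reverse_complement` functions, the contig names, and the in-memory dictionary of sequences are assumptions for this example, and it only complements the unambiguous bases A, C, G, and T.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement a DNA string (A/C/G/T only in this sketch)."""
    return seq.translate(COMPLEMENT)[::-1]


def assemble(contigs, agp_rows):
    """Build {scaffold name: sequence} from contig sequences and AGP rows."""
    scaffolds = {}
    for row in agp_rows:
        f = row.split()
        obj, kind = f[0], f[4]
        if kind in ("N", "U"):
            # Gap row: column 6 is the gap length.
            piece = "N" * int(f[5])
        else:
            # Contig row: columns 6-9 are name, begin, end, orientation.
            seq = contigs[f[5]][int(f[6]) - 1:int(f[7])]
            piece = reverse_complement(seq) if f[8] == "-" else seq
        scaffolds[obj] = scaffolds.get(obj, "") + piece
    return scaffolds


# Toy contigs and a toy AGP, with tiny sequences so the result is easy to read.
contigs = {"c1": "AACG", "c2": "TTG"}
agp_rows = [
    "s1 1 4 1 W c1 1 4 -",
    "s1 5 7 2 N 3 scaffold yes na",
    "s1 8 10 3 W c2 1 3 +",
]
result = assemble(contigs, agp_rows)
print(result["s1"])  # CGTTNNNTTG
```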

remove

You may want to get rid of some scaffolds because they contain contamination, duplication, or something else. The remove module can help with that. Just give it a list of scaffolds you don't want in the final assembly and it will remove them. The list of scaffolds to remove should have one scaffold per line, e.g.,

scaffold_5
scaffold_7

Running this command:

agptools remove scaffolds_to_remove.txt scaffolds.agp > corrected_scaffolds.agp

would output scaffolds.agp but with all lines corresponding to scaffold_5 and scaffold_7 removed.
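The effect is a simple filter on the first column of the AGP. As a minimal Python sketch of that idea (the `remove_scaffolds` function and the toy rows are hypothetical and are not the agptools implementation):

```python
def remove_scaffolds(agp_lines, unwanted):
    """Yield AGP lines whose scaffold (column 1) is not in `unwanted`."""
    unwanted = set(unwanted)
    for line in agp_lines:
        fields = line.split()
        # Keep blank lines and comments; drop rows of unwanted scaffolds.
        if not fields or line.startswith("#") or fields[0] not in unwanted:
            yield line


agp_lines = [
    "scaffold_4 1 100 1 W tigA 1 100 +",
    "scaffold_5 1 200 1 W tigB 1 200 -",
    "scaffold_7 1 300 1 W tigC 1 300 +",
    "scaffold_8 1 400 1 W tigD 1 400 +",
]
kept = list(remove_scaffolds(agp_lines, ["scaffold_5", "scaffold_7"]))
for line in kept:
    print(line)
```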

rename

Often, you end up with scaffolds that correspond to whole chromosomes, and you therefore want to give them names befitting chromosomes. The input file for this command has two required columns and an optional third one:

1. Current name of scaffold
2. New name of scaffold
3. Orientation, either + or - (optional). If this column is -, the new scaffold will be reverse-oriented compared to the input. If this column is left blank, the new scaffold will be exactly the same as the old one, just with a different name.
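To illustrate the optional third column, here is a small Python sketch that reads one line of such a renaming file. The `parse_rename` function and the example names are hypothetical, and the sketch assumes the columns are separated by whitespace, which the text above does not specify.

```python
def parse_rename(line):
    """Return (old_name, new_name, orientation) for one renaming line."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("need at least the current name and the new name")
    old, new = fields[0], fields[1]
    # A missing or blank third column means the scaffold keeps its orientation.
    orient = fields[2] if len(fields) > 2 else "+"
    if orient not in ("+", "-"):
        raise ValueError("orientation must be + or -")
    return old, new, orient


renames = [parse_rename(line) for line in [
    "scaffold_16\tchr1",
    "scaffold_3\tchr2\t-",
]]
print(renames)
```

With this reading, scaffold_16 would simply be relabeled chr1, while scaffold_3 would become chr2 in reverse orientation.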

transform

You may have bed files of alignments or annotations where the coordinates are given in reference to the original contigs, but you want those coordinates transformed to your new scaffolds. This module makes those transformations. For example, if you had this bed file:

tig00005080   10952    10960
tig00004962   1        2003242

and wanted to convert it to scaffold coordinates, you would run the command:

agptools transform contig_coordinates.bed example.agp > scaffold_coordinates.bed

and the output would be

scaffold_16   10952     10960
scaffold_16   11764764  13768005