This package contains command line utilities for preprocessing, computing feature count density (coverage), sorting, and indexing data files. See also http://www.broadinstitute.org/software/igv/igvtools_commandline. --------------------------------------------------------------------------- Starting with shell scripts --------------------------------------------------------------------------- The utilities are invoked from one of the following scripts: igvtools (command line version for linux and Mac OS 10.x) igvtools_gui (gui version for linux and Mac OS 10.x) igvtools_gui.command (alternative double-clickable gui version for Mac OS 10.x) igvtools.bat (command line version for windows) igvtools_gui.bat (gui version for windows) The general form of the command-line version is: igvtools [command] [options][arguments] or igvtools.bat [command] [options][arguments] Recognized commands, options,arguments, and file types are described below. --------------------------------------------------------------------------- Starting with java --------------------------------------------------------------------------- Igvtools can also be started directly using java as shown below. This option allows more control over java parameters, such as the maximum memory to allocate. In the example below igvtools is started with 1500 MB of memory allocated java -Xmx1500m -jar igvtools.jar [command] [options][arguments] To start with a gui the command is java -Xmx1500m -jar igvtools.jar gui --------------------------------------------------------------------------- Memory settings --------------------------------------------------------------------------- The scripts above allocate a fixed amount of memory. If this amount is not available on your platform you will get an obscure error along the lines of "Could not start the Virtual Machine". If this happens you will need to edit the scripts to reduce the amount of memory requested, or use the java startup option. The memory is set via a "-Xmx" parameter. For example -Xmx1500m requests 1500 MB, -Xmx1g requests 1 gigabyte. --------------------------------------------------------------------------- Genome --------------------------------------------------------------------------- The genome argument in the tile and count command can be either an id, or a full path to an IGV .genome file. The id for IGV supplied genomes are listed below. Genome definitions corresponding to these files are in the "genomes" subdirectory of the igvtools install. The id is derived by removing the .extension from the filename. --------------------------------------------------------------------------- COMMANDS --------------------------------------------------------------------------- The recognized commands are tile, count, sort, and index. Note that these utilities are for working with ascii file formats, including SAM, but do not work with BAM files. For manipulating BAM files use samtools. --------------------------------------------------------------------------- Command "tile" --------------------------------------------------------------------------- Warning: This command is deprecated. Use "toTDF" instead. --------------------------------------------------------------------------- Command "toTDF" --------------------------------------------------------------------------- The "toTDF" command converts a sorted data input file to a binary tiled data (.tdf) file. Input file formats supported are .wig, .cn, .igv, and .gct, TCGA mage-tab files, and "list" files. List files are text files containing a list of files in one of the supported formats, one file per line. When using a list file the format of the contained files must be specified explicitly with the "fileType" parameter. List files must end with the extension ".list". File paths can be absolute or relative to the directory containing the list file. Usage: igvtools toTDF [options] [inputFile] [outputFile] [genome] Required arguments: inputFile The input file (see supported formats above). outputFile Binary output file. Must end in ".tdf". genome A genome id or filename. See details below. Default is hg18. Options: -z, --maxZoom num Specifies the maximum zoom level to precompute. The default value is 7 and is sufficient for most files. To reduce file size at the expense of IGV performance this value can be reduced. -f, --windowFunctions list A comma delimited list specifying window functions to use when reducing the data to precomputed tiles. Possible values are min, max, mean, median, p2, p10, p90, and p98. The "p" values represent percentile, so p2=2nd percentile, etc. -p, --probeFile file Specifies a "bed" file to be used to map probe identifiers to locations. This option is useful when preprocessing gct files. The bed file should contain 4 columns: chr start end name where name is the probe name in the gct file. --fileType Explicitly specify the file type. This is a required parameter for TCGA mage-tab and ".list" files. Possible values are mage-tab, .wig, .cn, .igv, and .gct. Only mage-tab files downloaded from the TCGA data center or related sights are supported at this time. Example: igvtoolsh toTDF -z 5 copyNumberFile.cn copyNumberFile.tdf hg18 Notes: Data file formats, with the exception of .gct files, must be sorted by start position. If neccessary files can be sorted with the "sort" command described below. Attempting to preprocess an unsorted file will result in an error. --------------------------------------------------------------------------- Command "count" --------------------------------------------------------------------------- The "count" command computes average feature density over a specified window size across the genome. Common usages include computing coverage for alignment files and counting hits in Chip-seq experiments. Supported file formats are .sam, .bam, .aligned, .sorted.txt, and .bed, and .bam.list files. The latter format is a plain text file containing a list of alignment or bed files, one file per line. Usage: igvtools count [options] [inputFile] [outputFile] [genome] Required arguments: inputFile The input file (see supported formats above). outputFile Either a binary tdf file, a text wig file, or both. The output file type is determined by file extension, for example "output.tdf". To output both formats supply two file names separated by a commas, for example "outputBinary.tdf,outputText.wig". genome A genome id or filename. See details below. Default is hg18. Options: -z, --maxZoom num Specifies the maximum zoom level to precompute. -w, --windowSize num The window size over which coverage is averaged. Defaults to 25 bp. -e, --extFactor num The read or feature is extended by the specified distance in bp prior to counting. This option is useful for chip-seq and rna-seq applications. The value is generally set to the average fragment length of the library. -f, --windowFunctions list A comma delimited list specifying window functions to use when reducing the data to precomputed tiles. Possible values are min, max, mean, median, p2, p10, p90, and p98. The "p" values represent percentile, so p2=2nd percentile, etc. --strands [arg] By default, counting is combined among both strands. This setting outputs the count for each strand separately. Legal argument values are 'read' or 'first'. 'read' Separates count by 'read' strand, 'first' uses the first in pair strand" --bases Count the occurrence of each base (A,G,C,T,N). Takes no arguments --query [querystring] Only count a specific region. Query string has syntax :-. e.g. chr1:100-1000. Input file must be indexed. --minMapQuality [mqual] Set the minimum mapping quality of reads to include. Default is 0. --includeDuplicates Include duplicate alignments in count. Default false. If this flag is included, duplicates are counted. Takes no arguments Notes: The input file must be sorted by start position. The samtools package can be used to sort .bam files. Other files types can be sorted with the "sort" command (see below). Example: igvtools count -z 5 -w 25 -e 250 alignments.bam alignments.cov.tdf hg18 --------------------------------------------------------------------------- Command "sort" --------------------------------------------------------------------------- Sorts the input file by start position. This command supports the following file formats: .cn, .igv, .sam, .aligned, and .bed. NOTE: This command will not sort a binary (BAM) file. Use samtools to sort and index BAM files. Usage: igvtools sort [options] [inputFile] [outputFile] Options: -t, --tmpDir tmpdir Specify a temporary working directory. For large input files this directory will be used to store intermediate results of the sort. The default is the users temp directory. -m, --maxRecords number The maximum number of records to keep in memory during the sort. The default value is 500000. Increase this number if you receive "too many open files" errors. Decrease it if you experience "out of memory" errors. --------------------------------------------------------------------------- Command "index" --------------------------------------------------------------------------- Creates an index for an alignment or the bed feature file formats. Indexes required for loading alignment files into IGV, and can significantly improve performance for large feature files. The input file must be sorted by start position. This command does not take an output file argument, rather the filename is generated by appending ".sai" (for alignments) or ".idx" (for features) to the input filename. IGV relies on this naming convention to find the index. Supported file formats are .sam, .aligned, .sorted.txt, and .bed. NOTE: This command will not index a binary (BAM) file. Use samtools to sort and index BAM files. Usage: igvtools index [inputFile] --------------------------------------------------------------------------- Command "formatexp" --------------------------------------------------------------------------- Format GCT or RES files for display. This should only be used if the file has not previously been log-transformed and has no negative numbers. The module: 1. Takes the log2 of the data. 2. Computes the median and subtracts it from each log2 probe value (i.e., centers on the median). 3. Computer the MAD (mean absolute deviation) using the definition here: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/mad.html 4. Divides each log2 probe value by the MAD. Supported input file formats are: .gct and .res Usage: igvtools formatexp [inputFile] [outputFile] --------------------------------------------------------------------------- Command "gui" --------------------------------------------------------------------------- Start the igvtools gui Usage: igvtools gui --------------------------------------------------------------------------- Command "help" --------------------------------------------------------------------------- "igvtools help" will display a list of available commands. "igvtools help [command]" displays help on a particular command. Example: igvtools help index --------------------------------------------------------------------------- Command "version" --------------------------------------------------------------------------- Prints the igvtools version number.