Importing and Demultiplexing Illumina BCL Files

The primary sequencing output of an Illumina sequencer consists of per-cycle base call (bcl) files, which must first be converted to fastq format before the data can be passed to downstream applications. The Partek® Flow® software comes with a conversion tool that imports data in the bcl file format. In addition to converting the files, the tool demultiplexes them in the same step, so its primary result is a set of demultiplexed fastq files.

To base a new project on bcl files, first select the Import data option on the Data tab and then click the Import bcl files button. The configuration dialog opens (Figure 1).

Figure 1. Bcl file import setup dialog. Required inputs are the RunInfo.xml file, the SampleSheet.csv file, and a directory hosting the .bcl files.

The RunInfo.xml file is generated by the primary analysis software and contains information on the run, the flow cell, the instrument, the time stamp, and the read structure (number of reads, number of cycles per read, and whether a read is an index read). The SampleSheet.csv file describes the relationship between the samples and the indices specified during library creation. Although it has four sections, only two of them (Settings and Data) are important for the data import and conversion. Finally, the bcl files hold the base calls and are contained within the BaseCalls directory (note that the Select base calls directory option needs to point to the directory, not to an individual bcl file). For more information on the files, consult the Illumina documentation.
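To make the read structure concrete, the following minimal Python sketch lists the reads declared in a RunInfo.xml file. It assumes the commonly seen layout in which each read is a Read element with Number, NumCycles, and IsIndexedRead attributes; the embedded XML, including the run, flow cell, and instrument names, is a made-up example rather than output from a real run.

```python
import xml.etree.ElementTree as ET

# Hypothetical RunInfo.xml content; all identifiers and cycle counts are invented.
SAMPLE_RUNINFO = """<RunInfo Version="2">
  <Run Id="EXAMPLE_RUN" Number="1">
    <Flowcell>EXAMPLE_FLOWCELL</Flowcell>
    <Instrument>EXAMPLE_INSTRUMENT</Instrument>
    <Date>200101</Date>
    <Reads>
      <Read Number="1" NumCycles="151" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="4" NumCycles="151" IsIndexedRead="N" />
    </Reads>
  </Run>
</RunInfo>"""


def read_structure(xml_text):
    """Return (read number, cycle count, is index read) for each Read element."""
    root = ET.fromstring(xml_text)
    reads = []
    for read in root.iter("Read"):
        reads.append(
            (
                int(read.get("Number")),
                int(read.get("NumCycles")),
                read.get("IsIndexedRead") == "Y",
            )
        )
    return reads


for number, cycles, is_index in read_structure(SAMPLE_RUNINFO):
    kind = "index read" if is_index else "sequencing read"
    print(f"Read {number}: {cycles} cycles ({kind})")
```

For a file on disk, the same function can be applied to the text of RunInfo.xml read from the run folder.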

Selecting the Configure option under the Advanced options section gives granular control over the import (Figure 2). The Select tiles option (--tiles) restricts processing to a subset of the tiles available in the flow cell; its input is a comma-separated list of regular expressions.

Min trimmed read length (--minimum-trimmed-read-length) specifies the minimum read length after adapter removal. Mask short adapter reads (--mask-short-adapter-reads) applies when trimming would leave a read shorter than Min trimmed read length. In that case, the read is forced to a length equal to Min trimmed read length: the adapter bases that fall within that length are replaced by Ns instead of being removed. If the number of remaining (unmasked) bases is below the Mask short adapter reads value, all the bases in the read are replaced by Ns.

Adapter stringency (--adapter-stringency) specifies the minimum match rate that triggers the masking or trimming of adapters. The rate is calculated as MatchCount / (MatchCount + MismatchCount). Only the reads exceeding the specified rate of sequence identity with adapters are trimmed.

Barcode mismatches (--barcode-mismatches) controls the number of allowed mismatches per index sequence. Do not split files by lane (--no-lane-splitting) prevents splitting of fastq files by lane, i.e. the converter merges multiple lanes and generates one fastq file per sample.
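The interplay of the adapter options can be illustrated with a small Python sketch. It is a simplified model of the behavior described above, not the converter's actual code: the function names are hypothetical, the adapter start position is assumed to be known already, and the read is assumed to be at least as long as the minimum trimmed read length.

```python
def adapter_match_rate(match_count, mismatch_count):
    """Match rate as given in the text: MatchCount / (MatchCount + MismatchCount)."""
    total = match_count + mismatch_count
    if total == 0:
        return 0.0  # no aligned positions, so nothing supports an adapter hit
    return match_count / total


def trim_and_mask(read, adapter_start, min_trimmed_len, mask_short_len):
    """Trim a read at the adapter, masking instead of trimming short reads.

    read            -- base sequence, adapter included
    adapter_start   -- 0-based position where the adapter begins (assumed known)
    min_trimmed_len -- value of Min trimmed read length
    mask_short_len  -- value of Mask short adapter reads
    """
    kept = read[:adapter_start]  # bases that precede the adapter
    if len(kept) >= min_trimmed_len:
        return kept  # long enough: plain trimming
    if len(kept) < mask_short_len:
        return "N" * min_trimmed_len  # too few real bases: mask the whole read
    # Pad with Ns up to the minimum length instead of trimming further.
    return kept + "N" * (min_trimmed_len - len(kept))


print(adapter_match_rate(9, 1))
print(trim_and_mask("ACGTACGTAC" + "AGATCGGAAG", 10, 5, 3))
print(trim_and_mask("ACGT" + "AGATCGGAAGAGC", 4, 8, 3))
print(trim_and_mask("AC" + "AGATCGGAAGAGC", 2, 8, 3))
```

The three calls to trim_and_mask cover the three cases in turn: ordinary trimming, padding with Ns up to the minimum length, and masking of the entire read.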

Figure 2. Advanced options of the bcl importer.
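Because each advanced option corresponds to a command-line flag, the settings in the dialog can be thought of as a bcl2fastq command line. The Python sketch below assembles such a command as an argument list. It is an illustration only: how Partek Flow invokes the converter internally is an assumption, the helper name and all paths and values are hypothetical, and the --runfolder-dir, --output-dir, and --sample-sheet flags come from the bcl2fastq2 command line rather than from this page.

```python
import shlex


def build_bcl2fastq_command(
    runfolder_dir,
    output_dir,
    sample_sheet,
    *,
    tiles=None,
    minimum_trimmed_read_length=None,
    mask_short_adapter_reads=None,
    adapter_stringency=None,
    barcode_mismatches=None,
    no_lane_splitting=False,
):
    """Assemble a bcl2fastq argument list; options left unset are omitted."""
    cmd = [
        "bcl2fastq",
        "--runfolder-dir", runfolder_dir,
        "--output-dir", output_dir,
        "--sample-sheet", sample_sheet,
    ]
    if tiles:
        # Select tiles: comma-separated list of regular expressions.
        cmd += ["--tiles", ",".join(tiles)]
    valued_options = [
        ("--minimum-trimmed-read-length", minimum_trimmed_read_length),
        ("--mask-short-adapter-reads", mask_short_adapter_reads),
        ("--adapter-stringency", adapter_stringency),
        ("--barcode-mismatches", barcode_mismatches),
    ]
    for flag, value in valued_options:
        if value is not None:
            cmd += [flag, str(value)]
    if no_lane_splitting:
        cmd.append("--no-lane-splitting")
    return cmd


# Illustrative values only; none of them are recommended settings.
command = build_bcl2fastq_command(
    "/data/example_run",
    "/data/example_run/fastq",
    "/data/example_run/SampleSheet.csv",
    tiles=["s_1", "s_2"],
    barcode_mismatches=1,
    no_lane_splitting=True,
)
print(shlex.join(command))
```

The function only builds the argument list and does not run the converter, so it can be inspected safely on a machine without bcl2fastq installed.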

The result of the import is an Unaligned reads data node, containing demultiplexed fastq files.

For more information about the BCL to FASTQ conversion tool, including information on the proper folder structure and instructions for formatting the SampleSheet.csv file, please consult the bcl2fastq2 Conversion Software Guide.
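As a rough orientation to the sample sheet layout, the Python sketch below splits a sample sheet into its bracketed sections and reads the Data section as a table. The sheet contents are a made-up example: the sample names and index sequences are hypothetical, and a real sheet may carry additional columns and settings, so the conversion guide remains the authority on the format.

```python
import csv
import io

# Hypothetical sample sheet; sample names and index sequences are invented.
SAMPLE_SHEET = """[Header]
Date,2020-01-01
[Reads]
151
151
[Settings]
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
[Data]
Sample_ID,Sample_Name,index,index2
S1,control,ATTACTCG,TATAGCCT
S2,treated,TCCGGAGA,ATAGAGGC
"""


def split_sections(text):
    """Map each [Section] name to the list of CSV rows that follow it."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines and rows made only of commas
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(row)
    return sections


def data_records(sections):
    """Return the Data section as a list of dicts keyed by its header row."""
    rows = sections.get("Data", [])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


sections = split_sections(SAMPLE_SHEET)
print(list(sections))
for record in data_records(sections):
    print(record["Sample_ID"], record["index"], record["index2"])
```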

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.