A Python 3.6.1+ package to trim and extract flags from FASTA and FASTQ files.
Requirements
fastx-barber has been tested with Python 3.6.1, 3.7, and 3.8. We recommend installing it using pipx (see below) to avoid dependency conflicts with other packages. The packages it depends on are listed in our dependency graph. We use poetry to handle our dependencies.
Install
We recommend installing fastx-barber using pipx. Check the pipx installation instructions if you don’t have it yet! Once you have pipx ready on your system, install the latest stable release of fastx-barber by running: `pipx install fastx-barber`. If you see the stars (✨ 🌟 ✨), then the installation went well!
Features
- Works on both FASTA and FASTQ files.
- Selects reads based on a pattern (regex).
- Trims reads by pattern (regex), length, or single-base quality.
- Extracts parts (flags) of reads based on a pattern (either a regular expression or a simple alphanumeric pattern in the format `AAA111BBB222`, where `AAA` and `BBB` are flag names and `111` and `222` are flag lengths), and stores them in the read headers.
- Optionally extracts the corresponding portions of the quality string (only for FASTQ files).
- Optionally filters based on the quality score of extracted flags (only for FASTQ files).
- Supports the Sanger QSCORE definition (not the old Solexa/Illumina one).
- Supports custom PHRED offset.
- Optionally exports reads that do not pass the specified filters.
- Optionally splits output based on flag value.
- Optionally calculates the frequency of each value of a set of flags (flagstats).
- Filtering by flag quality, splitting by flag value, and calculating flag value frequency are also available as separate scripts. This makes it possible to perform these operations on files with previously extracted flags.
- Filters a FASTX file with extracted flags by applying patterns to different flags.
- Generates a BED file with the locations of a substring in FASTX records.
- Regular expressions support fuzzy matching (fuzzy matching might affect the barber’s speed).
- Optionally exports reads that do not match the provided pattern(s).
- Parallelizes processing by splitting the FASTX file into chunks.
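
To illustrate the simple alphanumeric flag pattern and the Sanger quality convention mentioned in the list above, here is a minimal sketch in plain Python. It is not fastx-barber's implementation or API: the function names (`parse_simple_pattern`, `extract_flags`, `phred_scores`) and the flag names `UMI` and `BC` are hypothetical, and the sketch assumes that flags sit back to back at the start of the read and that qualities use a PHRED offset of 33.

```python
import re
from typing import Dict, List, Tuple


def parse_simple_pattern(pattern: str) -> List[Tuple[str, int]]:
    """Split a pattern such as 'UMI8BC4' into (flag name, flag length) pairs."""
    pairs = re.findall(r"([A-Za-z]+)(\d+)", pattern)
    # Reject patterns that are not made only of name+length pairs.
    if not pairs or "".join(name + length for name, length in pairs) != pattern:
        raise ValueError(f"not a valid simple pattern: {pattern!r}")
    return [(name, int(length)) for name, length in pairs]


def extract_flags(text: str, layout: List[Tuple[str, int]]) -> Dict[str, str]:
    """Cut consecutive substrings of 'text' (sequence or quality string),
    one per flag, starting from the first character (an assumption of this sketch)."""
    flags: Dict[str, str] = {}
    start = 0
    for name, length in layout:
        flags[name] = text[start:start + length]
        start += length
    return flags


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to integer scores (Sanger convention,
    offset 33 by default; pass another offset for other encodings)."""
    return [ord(char) - offset for char in quality]


layout = parse_simple_pattern("UMI8BC4")
seq_flags = extract_flags("ACGTACGTTTGGCCAA", layout)
qual_flags = extract_flags("IIIIIIII####IIII", layout)
# Lowest base quality observed in each extracted flag.
min_quality = {name: min(phred_scores(q)) for name, q in qual_flags.items()}
```

Applying the same layout to the sequence and to the quality string yields, for each flag, both its bases and the matching portion of the quality string, which is the information a per-flag quality filter would need.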
Usage
Run:
- `fbarber` to access the barber’s services.
- `fbarber flag` to extract or manipulate read flags.
- `fbarber match` to select reads based on a pattern (regular expression).
- `fbarber trim` to trim your reads.
Add `-h` to see the full help page of a command or visit the usage page!
Contributing
We welcome any contributions to fastx-barber. In short, we use black to standardize code format. Any code change also needs to pass mypy checks. For more details, please refer to our contribution guidelines if this is your first time contributing! Also, check out our code of conduct.
License
MIT License - Copyright (c) 2020 Gabriele Girelli