The swBuildDb module builds SHAPEwarp-compliant databases from RNA reactivity profiles, which must be provided in RNA Framework's XML format.
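To illustrate the input side, the Python sketch below reads one transcript from an RNA Framework-style XML file. The element layout it expects is an assumption based on typical RNA Framework output, so check it against your own files: a <transcript> element with an id attribute, holding a <sequence> and a comma-separated <reactivity> list with NaN for missing values. parse_rf_xml and read_rf_xml are hypothetical helpers, not part of SHAPEwarp.

```python
import xml.etree.ElementTree as ET


def parse_rf_xml(xml_data):
    """Return (transcript_id, sequence, reactivities) from RNA Framework-style XML.

    Assumed layout (hypothetical; verify against your own files):
      <data ...>
        <transcript id="..." length="...">
          <sequence>ACGU...</sequence>
          <reactivity>NaN,0.12,...</reactivity>
        </transcript>
      </data>
    """
    root = ET.fromstring(xml_data)  # accepts str or bytes
    transcript = root.find("transcript")
    if transcript is None:
        raise ValueError("no <transcript> element found")
    # Sequence and reactivity text may be wrapped over several lines: drop all whitespace.
    sequence = "".join(transcript.findtext("sequence", default="").split())
    raw = "".join(transcript.findtext("reactivity", default="").split())
    # float() parses "NaN" (any case) as nan, i.e. a nucleotide without data.
    reactivities = [float(v) for v in raw.split(",")] if raw else []
    if len(reactivities) != len(sequence):
        raise ValueError("sequence and reactivity lengths differ")
    return transcript.get("id"), sequence, reactivities


def read_rf_xml(path):
    """Read the file as bytes so that any XML encoding declaration is honored."""
    with open(path, "rb") as handle:
        return parse_rf_xml(handle.read())
```

Reading the file as bytes and letting the XML parser decode it avoids problems with files that carry an encoding declaration.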

Usage

To list the available parameters, type:

$ swBuildDb --help
Parameter              Type    Description
-o or --output         string  Output database folder (Default: sw_db/)
-ow or --overwrite             Overwrites the output database folder if it already exists
--threads              int     Number of processors to use (Default: 1)
--blockSize            int     Size (in nt) of the blocks used for shuffling (Default: 10)
--inBlockShuffle               Besides shuffling blocks, also shuffles residues within each block
--chunkSize            int     For each shuffling, only a chunk of this size (in nt) is extracted and used to build the shuffled database (Default: 1000)
                               Note: the default works well for short queries (<1000 nt); to search longer queries, increase chunkSize
--shufflings           int     Number of shufflings to perform for each database entry (Default: 100)
--foldDb                       Uses the provided SHAPE profiles to first calculate base-pairing probability profiles, which are then used to build the database
                               Note: query searches against such a database must be performed with the foldQuery option of SHAPEwarp

Probability profile database construction options
--maxBPspan            int     Maximum allowed base-pairing distance, in nt (Default: 600)
--noLonelyPairs                Disallows lonely pairs (helices of 1 bp)
--noClosingGU                  Disallows G:U wobbles at the ends of helices
--slope                float   Slope for converting SHAPE reactivities into pseudo-free energy contributions (Default: 1.8)
--intercept            float   Intercept for converting SHAPE reactivities into pseudo-free energy contributions (Default: -0.6)
--temperature          float   Folding temperature, in °C (Default: 37.0)
--winSize              int     Size (in nt) of the sliding window for partition function calculation (Default: 800)
--offset               int     Offset (in nt) by which the partition function window slides (Default: 200)
--winTrim              int     Number of bases to trim from both ends of each partition function window to avoid terminal biases (Default: 50)
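The --slope and --intercept parameters match the Deigan et al. conversion used by RNAstructure and ViennaRNA, in which the pseudo-free energy term for nucleotide i is dG(i) = slope * ln(reactivity(i) + 1) + intercept. Assuming swBuildDb applies this same formula, the minimal Python sketch below computes the per-nucleotide term. The handling of missing (NaN) and negative values is an illustrative assumption, and pseudo_free_energy is a hypothetical name.

```python
import math


def pseudo_free_energy(reactivity, slope=1.8, intercept=-0.6):
    """Pseudo-free energy term (kcal/mol) for one nucleotide, Deigan et al. form:

        dG = slope * ln(reactivity + 1) + intercept

    Illustrative assumptions: a missing value (None or NaN) contributes 0, and
    negative reactivities are clamped to 0 before the log transform.
    """
    if reactivity is None or math.isnan(reactivity):
        return 0.0
    return slope * math.log(max(reactivity, 0.0) + 1.0) + intercept


# Per-nucleotide terms for a short profile, using the swBuildDb defaults.
profile = [0.0, 0.1, float("nan"), 1.5, -0.2]
terms = [pseudo_free_energy(r) for r in profile]
```

With the defaults, an unreactive nucleotide (reactivity 0) receives -0.6 kcal/mol, which favors pairing, while the term turns positive, penalizing pairing, once the reactivity exceeds roughly 0.4 (e^(1/3) - 1).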
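The --winSize, --offset and --winTrim options describe a sliding-window partition function calculation. One plausible way to tile a transcript with them is sketched below in Python: windows of winSize nt start every offset nt, the last window is shifted so that it ends at the 3' terminus, and winTrim nt are discarded from each window edge except at the transcript ends. The exact tiling swBuildDb performs is not documented here, so treat partition_windows as a hypothetical helper that only shows how the three values interact.

```python
def partition_windows(length, win_size=800, offset=200, win_trim=50):
    """Return (win_start, win_end, keep_start, keep_end) tuples, 0-based, end-exclusive.

    Hypothetical tiling: windows of win_size nt start every `offset` nt; the last
    window is shifted so that it ends exactly at the transcript end. win_trim nt
    are discarded from each window edge, except at the transcript termini, where
    no neighboring window exists to cover them.
    """
    if offset <= 0:
        raise ValueError("offset must be positive")
    if length <= win_size:
        # A single window covers the whole transcript; nothing to trim.
        return [(0, length, 0, length)]
    starts = list(range(0, length - win_size + 1, offset))
    if starts[-1] + win_size < length:
        starts.append(length - win_size)  # final window flush with the 3' end
    windows = []
    for start in starts:
        end = start + win_size
        keep_start = start if start == 0 else start + win_trim
        keep_end = end if end == length else end - win_trim
        windows.append((start, end, keep_start, keep_end))
    return windows
```

With the defaults, a 1,100 nt transcript yields windows starting at 0, 200 and 300. In this scheme the trimmed windows cover every position only if offset <= winSize - 2 * winTrim, which the defaults satisfy (200 <= 700).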

Note

Shuffling SHAPE data in 10-nt blocks (blockSize = 10) yields more realistic profiles, as it preserves the relationship between neighboring residues. Enabling inBlockShuffle may yield hits with lower E-values, increasing the chance of recovering more distant matches, but it also increases the chance of recovering false-positive matches.
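To make the blockSize / inBlockShuffle trade-off concrete, the Python sketch below permutes a profile in blocks, so residues within a block stay adjacent. It optionally shuffles residues inside each block as well, then takes one chunk of at most chunkSize values per shuffling. Items can be plain reactivities or (nucleotide, reactivity) pairs. The random chunk start and the function names are assumptions for illustration, not swBuildDb's actual implementation.

```python
import random


def block_shuffle(profile, block_size=10, in_block_shuffle=False, rng=None):
    """Shuffle a profile in blocks of block_size items.

    Only the block order is permuted, so each block keeps its internal neighbor
    relationships; with in_block_shuffle=True the items inside each block are
    shuffled too. A trailing block shorter than block_size is moved like the others.
    """
    if rng is None:
        rng = random.Random()
    blocks = [list(profile[i:i + block_size]) for i in range(0, len(profile), block_size)]
    rng.shuffle(blocks)
    if in_block_shuffle:
        for block in blocks:
            rng.shuffle(block)
    return [item for block in blocks for item in block]


def shuffled_chunks(profile, shufflings=100, chunk_size=1000, block_size=10,
                    in_block_shuffle=False, seed=None):
    """Yield one chunk of at most chunk_size items per shuffling (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(shufflings):
        shuffled = block_shuffle(profile, block_size, in_block_shuffle, rng)
        # Assumption: the chunk starts at a random position within the shuffled profile.
        start = rng.randint(0, max(0, len(shuffled) - chunk_size))
        yield shuffled[start:start + chunk_size]
```

With in_block_shuffle=False, every original block still appears contiguous and in its original internal order, which is the property that keeps block-shuffled profiles realistic. In this sketch, an entry shorter than chunk_size is used in full for every shuffling.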