Parente operates in two phases. First, it uses training haplotypes to learn the distribution of scores for IBD segments and the distribution of scores for non-IBD segments. Second, it uses these distributions to infer IBD segments: they allow Parente to apply the embedded likelihood ratio test (eLRT) and to create block-specific thresholds.
By default, Parente uses the eLRT with a fixed threshold. Using program options, one can instead use the standard likelihood ratio test (LRT) or block-specific thresholds.
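To make the likelihood-ratio idea concrete, here is a generic sketch of a likelihood ratio computed from empirical score distributions. It is not Parente's eLRT, whose details are not given here; the histogram binning, the add-one smoothing, the toy scores, and the function names bin_index, bin_probabilities, and log_likelihood_ratio are all illustrative assumptions.

```python
import math
from bisect import bisect_right


def bin_index(score, edges):
    """Index of the histogram bin containing `score`; out-of-range scores go to the end bins."""
    i = bisect_right(edges, score) - 1
    return min(max(i, 0), len(edges) - 2)


def bin_probabilities(samples, edges):
    """Per-bin probabilities of `samples`, with add-one smoothing so no bin is zero."""
    counts = [1] * (len(edges) - 1)
    for s in samples:
        counts[bin_index(s, edges)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def log_likelihood_ratio(score, p_ibd, p_non_ibd, edges):
    """log P(score | IBD) - log P(score | non-IBD); positive values favor IBD."""
    i = bin_index(score, edges)
    return math.log(p_ibd[i]) - math.log(p_non_ibd[i])


# Toy training scores (made up for illustration only).
edges = [0, 1, 2, 3, 4]
p_ibd = bin_probabilities([3.1, 3.5, 2.8, 3.9], edges)
p_non = bin_probabilities([0.2, 0.9, 1.1, 0.5], edges)
print(log_likelihood_ratio(3.2, p_ibd, p_non, edges))  # positive: score looks IBD-like
```

A threshold on this ratio then decides whether a score is called IBD; how Parente chooses that threshold is governed by its --threshtype option.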
Parente runs on one contiguous chromosome at a time.
When you provide Parente with training haplotypes, it performs an internal simulation of IBD and non-IBD segments along the entire chromosome. The results of this simulation are then used when performing inference.
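As a rough illustration of how IBD and non-IBD pairs can be simulated from a haplotype panel, the sketch below builds two genotype vectors that either share one haplotype (IBD) or share none (non-IBD). This is a simplification under stated assumptions, not Parente's simulator: the shared haplotype spans the whole region rather than a segment, genotypes are taken as sums of 0/1 haplotype codes with -1 as missing, and simulate_pair is a hypothetical name.

```python
import random


def genotype(h1, h2):
    """Sum two 0/1 haplotypes into 0/1/2 genotypes; a -1 (assumed missing) propagates."""
    return [-1 if a < 0 or b < 0 else a + b for a, b in zip(h1, h2)]


def simulate_pair(haps, ibd, rng):
    """Simulate genotypes for two individuals from a haplotype panel.

    ibd=True: both individuals carry the same haplotype (one shared copy).
    ibd=False: the four haplotypes are all different panel entries.
    Requires at least four haplotypes in `haps`.
    """
    i0, i1, i2, i3 = rng.sample(range(len(haps)), 4)
    first = genotype(haps[i0], haps[i1])
    second = genotype(haps[i0], haps[i2]) if ibd else genotype(haps[i2], haps[i3])
    return first, second


# Toy panel: five haplotypes over three markers (illustrative values only).
panel = [[0, 1, 1], [1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]
rng = random.Random(42)  # fixed seed, analogous in spirit to the --seed option
a, b = simulate_pair(panel, ibd=True, rng=rng)
print(a, b)
```

Because the IBD pair shares a haplotype, the two simulated individuals can never be opposite homozygotes (0 and 2) at the same marker, which is the kind of signal an IBD score exploits.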
Usage: parente train [options] <training.hap> <markers.map/tped> <out_prefix>

Computes score distributions from training haplotypes to facilitate inference.

Options:
  -h, --help              Display this help message
  -t, --threads INT       The number of threads [4]
  -e, --geno-err FLT      The modeled genotyping error rate [0.005]
  -w, --window-size INT   Window size [5]
  -n, --num-pairs INT     Num training pairs to generate for each data set. [1000]
  -s, --seed INT          Seed for simulating IBD and non-IBD segment pairs. [-1]
When performing IBD inference, Parente targets a particular IBD segment size to detect and uses this target size when generating blocks of consecutive windows. The default target size is 4 cM, and the generated blocks are between 3.5 and 3.9 cM long, depending on the SNP density in the region where each block is created. The minimum and maximum block sizes adjust automatically to the target segment size, as described in the parente infer options, or they can be set manually. Note that the target IBD segment size is effectively a minimum IBD segment size: any longer IBD segment can still be detected by examining a portion of it.
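The default block bounds quoted above follow from the formulas in the --min-block-size and --max-block-size help text. This small sketch reproduces that arithmetic. It assumes the minimum is computed from the already-resolved maximum, and block_size_bounds is a hypothetical helper, not part of Parente.

```python
def block_size_bounds(target_cm=4.0, min_block_cm=-1.0, max_block_cm=-1.0):
    """Resolve (min, max) block sizes in cM from the rules in the infer help text.

    Assumption: the min-block formula uses the max block size after it has been resolved.
    """
    if max_block_cm <= 0:  # -B default: target - 0.1
        max_block_cm = target_cm - 0.1
    if min_block_cm < 0:  # -b default: max - 0.1 * target
        min_block_cm = max_block_cm - 0.1 * target_cm
    return min_block_cm, max_block_cm


lo, hi = block_size_bounds()  # default 4 cM target
print(f"{lo:.1f}-{hi:.1f} cM")  # 3.5-3.9 cM
```

Under the same rules, a 6 cM target (-s 6) would give blocks between 5.3 and 5.9 cM.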
By default, Parente uses the eLRT with a fixed threshold for inference. These defaults can be changed with the --lrt and --threshtype flags, respectively.
The threshold argument is interpreted based on which threshold type is being used:
For very high-specificity scenarios, using --lrt 2 and --threshtype max is recommended.
Using --lrt 1 is not recommended because of its lower performance. If you do use it, we strongly recommend also using --threshtype max to achieve a reasonable false positive rate.
Usage: parente infer [options] <train_prefix> <data.geno> <threshold> <out_prefix>

Infers IBD segments.

Options:
  -h, --help                     Display this help message
  -s, --target-segment-size FLT  Target (minimum) IBD segment size in cM [4]
  -t, --threads INT              Sets number of threads [4]
  -l, --lrt INT                  Use LRT (1) or eLRT (2) [2]
      --threshtype STR           Threshold type: fixed or max [fixed]
  -e, --geno-err FLT             Set the modeled genotyping error rate [0.005]
  -S, --smoothing-factor FLT     Divide the error rate by this amount [100]
  -b, --min-block-size FLT       Minimum block size (in cM) to accept when creating blocks.
                                 If < 0, then it is set to:
                                 <max-block-size> - 0.1 * <target-segment-size> [-1]
  -B, --max-block-size FLT       Maximum block size (in cM) to accept when creating blocks.
                                 If <= 0, then it is set to: <target-segment-size> - 0.1 [-1]
  -p, --partition                Only infer IBD between even-indexed and odd-indexed
                                 individuals, otherwise infer for all pairs. [false]
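When scripting runs, it can help to assemble the infer command programmatically. The sketch below uses only flags listed in the help text above. infer_command is a hypothetical helper; it assumes long options take their value as the next argument and that <train_prefix> is the <out_prefix> given to parente train. It also warns when --lrt 1 is used without --threshtype max, mirroring the recommendation above.

```python
import shlex
import warnings


def infer_command(train_prefix, geno, threshold, out_prefix, *, lrt=2,
                  threshtype="fixed", target_cm=4.0, threads=4, partition=False):
    """Build a `parente infer` argument list from flags listed in its help text."""
    if lrt not in (1, 2):
        raise ValueError("lrt must be 1 (LRT) or 2 (eLRT)")
    if threshtype not in ("fixed", "max"):
        raise ValueError("threshtype must be 'fixed' or 'max'")
    if lrt == 1 and threshtype != "max":
        warnings.warn("with --lrt 1, --threshtype max is strongly recommended")
    args = ["parente", "infer",
            "--lrt", str(lrt),
            "--threshtype", threshtype,
            "--target-segment-size", str(target_cm),
            "--threads", str(threads)]
    if partition:
        args.append("--partition")
    # Positional arguments come after the options, as in the usage line.
    return args + [train_prefix, geno, str(threshold), out_prefix]


# "<threshold>" is a placeholder: its meaning depends on --threshtype.
cmd = infer_command("train_out", "data.geno", "<threshold>", "ibd_out",
                    lrt=2, threshtype="max")
print(shlex.join(cmd))
```

The resulting list can be passed directly to subprocess.run, which avoids shell-quoting problems with unusual file names.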
#hapname1 hapname2 hapname3
1 1 1
0 1 1
1 0 1
1 0 -1
#john jane sally
2 1 2
0 1 -1
2 2 2
1 2 1
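Both example files share one layout: a #-prefixed header of names, then one row per marker with one whitespace-separated integer column per haplotype or individual. The reader below is a sketch based only on these examples (in particular, treating -1 as a missing value is an assumption), and parse_parente_matrix is a hypothetical name.

```python
def parse_parente_matrix(text):
    """Parse .hap/.geno text into {name: [code per marker]}.

    Layout inferred from the examples: a '#'-prefixed header of names, then one
    whitespace-separated row per marker with one integer column per name.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[0].lstrip("#").split()
    rows = [[int(v) for v in ln.split()] for ln in lines[1:]]
    for n, row in enumerate(rows, start=1):
        if len(row) != len(names):
            raise ValueError(f"marker row {n}: expected {len(names)} values, got {len(row)}")
    # Transpose: one list of per-marker codes for each haplotype or individual.
    return {name: [row[j] for row in rows] for j, name in enumerate(names)}


geno = parse_parente_matrix("#john jane sally\n2 1 2\n0 1 -1\n2 2 2\n1 2 1\n")
print(geno["sally"])  # [2, -1, 2, 1]
```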
This file describes summary statistics of block scores based on the training data. Block scores are computed using the .win.lrt1 and .win.lrt2 files along with the .fblock file to sum the window scores that belong to each block. These block summary statistics are used for block-specific thresholding. It follows the same format as the window model file, but refers to blocks instead of windows, and the comma-separated values refer to windows instead of SNPs.
This file is produced by parente infer when using --threshtype max or if verbosity is set to a value greater than 0.
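The block-score summary described above can be sketched as follows: sum the window scores that fall in each block, then summarize those block scores across training pairs. How the .win.lrt1, .win.lrt2, and .fblock files are laid out, and which statistics Parente actually stores, are not described here, so the in-memory inputs, the toy values, and the mean/standard deviation/maximum summary are assumptions.

```python
from statistics import mean, pstdev


def block_scores(window_scores, blocks):
    """Sum the window scores belonging to each block (blocks = lists of window indices)."""
    return [sum(window_scores[w] for w in block) for block in blocks]


def block_summary(training_window_scores, blocks):
    """Per-block mean, population standard deviation, and max over training pairs."""
    per_pair = [block_scores(ws, blocks) for ws in training_window_scores]
    summary = []
    for b in range(len(blocks)):
        vals = [scores[b] for scores in per_pair]
        summary.append({"mean": mean(vals), "sd": pstdev(vals), "max": max(vals)})
    return summary


# Two hypothetical training pairs, four windows, two blocks of two windows each.
blocks = [[0, 1], [2, 3]]
training = [[1.0, 2.0, 0.5, 0.5], [0.0, 1.0, 1.0, 1.0]]
print(block_summary(training, blocks))
```

Keeping statistics per block, rather than pooled, is what makes block-specific thresholds possible: blocks in regions with different SNP density can have very different score ranges.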
One can use parente tool tped_to_ints to convert from PLINK-formatted data to Parente genotype or haplotype files.
First, make sure your PLINK-formatted data is in transposed format, that is, that you have .tped and .tfam files. You can convert data in .ped format (e.g., data.ped) with the command below, which creates data.tped and data.tfam.
plink --file data --transpose --out data
You must also have a .frq file, which the conversion tool uses to determine which allele is the major allele and which is the minor allele. It is generally advised to use the same .frq file for all your experiments on the same population. If a separate .frq file is generated from each data set, markers with high minor allele frequencies (close to 0.5) can have their major/minor allele encoding flipped between data sets. You can generate data.frq using the --freq flag, as in the example below.
plink --tped data.tped --tfam data.tfam --freq --out data
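The reason to reuse one .frq file can be checked directly: if two data sets disagree on which allele is A1 for a marker, that marker's encoding flips between them. The sketch below parses PLINK .frq text by its header columns, treating A1 as the minor allele (PLINK's default), and lists such markers. read_minor_alleles and flipped_snps are hypothetical helpers, not part of Parente, and the toy .frq contents are made up.

```python
def read_minor_alleles(frq_text):
    """Map SNP ID -> (A1, A2) from PLINK .frq text, treating A1 as the minor allele."""
    lines = frq_text.strip().splitlines()
    header = lines[0].split()
    snp_i, a1_i, a2_i = (header.index(c) for c in ("SNP", "A1", "A2"))
    result = {}
    for line in lines[1:]:
        fields = line.split()
        result[fields[snp_i]] = (fields[a1_i], fields[a2_i])
    return result


def flipped_snps(frq_a, frq_b):
    """SNPs present in both files whose minor allele differs, i.e. whose encoding would flip."""
    a, b = read_minor_alleles(frq_a), read_minor_alleles(frq_b)
    return sorted(s for s in a.keys() & b.keys() if a[s][0] != b[s][0])


# Toy .frq contents: rs2 has MAF near 0.5, so its minor allele differs between sets.
set1 = " CHR SNP A1 A2 MAF NCHROBS\n 1 rs1 A G 0.10 200\n 1 rs2 C T 0.49 200\n"
set2 = " CHR SNP A1 A2 MAF NCHROBS\n 1 rs1 A G 0.12 180\n 1 rs2 T C 0.48 180\n"
print(flipped_snps(set1, set2))  # ['rs2']
```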
To convert to the Parente haplotype file format, you can use the command below which will generate data.hap.
parente tool tped_to_ints data data.frq hap data
To convert to the Parente genotype file format, you can use the command below which will generate data.geno.
parente tool tped_to_ints data data.frq geno data
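For intuition about what the conversion does, here is a sketch of turning one .tped line into Parente-style codes. It assumes, without confirmation from this document, that 1 marks the minor allele, 0 the major allele, and -1 a missing call (PLINK writes missing alleles as 0), that a genotype is the count of minor alleles, and that phased .tped allele columns map to haplotypes in order. tped_row_to_codes is hypothetical; use parente tool tped_to_ints for real conversions.

```python
def tped_row_to_codes(line, minor, mode="geno"):
    """Convert one .tped line to per-sample integer codes.

    ASSUMED encoding (not confirmed by Parente docs): minor allele -> 1,
    other allele -> 0, missing ('0' in PLINK) -> -1.
    mode="hap": one code per allele column (each treated as a haplotype).
    mode="geno": minor-allele count per individual (0/1/2), -1 if either allele is missing.
    `minor` maps SNP ID -> minor allele, e.g. taken from a .frq file's A1 column.
    """
    fields = line.split()
    snp, alleles = fields[1], fields[4:]  # CHR, SNP, genetic pos, bp, then allele pairs
    codes = [-1 if a == "0" else int(a == minor[snp]) for a in alleles]
    if mode == "hap":
        return codes
    if mode != "geno":
        raise ValueError("mode must be 'hap' or 'geno'")
    return [-1 if x < 0 or y < 0 else x + y
            for x, y in zip(codes[0::2], codes[1::2])]


row = "1 rs2 0 12345 C T T T 0 0"  # three individuals; the third is missing
print(tped_row_to_codes(row, {"rs2": "T"}))  # [1, 2, -1]
```

Because the code for every allele depends on the minor allele taken from the .frq file, this also shows why a different .frq file can silently flip a marker's encoding.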