Welcome to WGDI’s documentation!
Description
WGDI (Whole-Genome Duplication Integrated analysis) is a Python-based command-line tool that facilitates comprehensive analysis of recursive polyploidization events and cross-species genome alignments. WGDI supports three main workflows (polyploid inference, hierarchical inference of genomic homology, and ancestral chromosome karyotyping) that can improve the detection of WGD and the characterization of WGD-related events based on high-quality chromosome-level genomes. Notably, it can extract complete synteny blocks and facilitate reconstruction of detailed karyotype evolution. This toolkit is freely available on GitHub (https://github.com/SunPengChuan/wgdi).
Introduction

The WGDI workflow consists of three main parts: (1) polyploidy inference using dotplots, collinearity extraction, and Ks distributions; (2) hierarchical inference of genomic homology resulting from recursive paleopolyploidizations; and (3) reconstruction of subgenomes and ancestral genomes, along with other evolutionary scenarios.
WGDI contains multiple subroutines. The user only needs to modify the configuration file and then pass it to the subroutine to be executed, for example wgdi -d your.conf. The subroutines of WGDI are described in detail below.
WGDI subroutine and function
Parameters | Functions | Description |
---|---|---|
-h | Help | Show help message and exit |
-v | Version | Show program’s version number |
-d | DotPlot | Show homologous gene dotplot |
-icl | Collinearity | Improved version of ColinearScan |
-ks | CalKs | Calculate Ka/Ks for homologous gene pairs by YN00 |
-bi | BlockInfo | Combine collinearity and Ks to infer whole-genome duplication |
-c | Correspondence | Extract event-related genomic alignment |
-bk | BlockKs | Show Ks of blocks in a dotplot |
-kp | KsPeaks | A simple way to get Ks peaks |
-pf | PeaksFit | Gaussian fitting of the Ks distribution |
-pc | polyploidy_classification | Show event-related genomic alignment in a dotplot |
-km | karyotype_mapping | Mapping from the known karyotype result to this species |
-k | karyotype | Show genome evolution from reconstructed ancestors |
-a | Alignment | Alignment of hierarchical and event-related gene collinearity |
-at | AlignmentTrees | Phylogenetic trees constructed from collinear genes |
-p | P-index | The polyploidy index characterizes the degree of divergence among the subgenomes of a polyploid |
-r | Retain | Show subgenomes in gene retention or genome fractionation |
-ci | Circos | A simple way to run circos |
-conf | Configure | Display and modify the environment variable |
Installation
WGDI is a Python package with a command-line interface for the analysis of whole-genome duplications. It can be deployed on Windows, Linux, and Mac OS, and can be installed via pip or conda.
Bioconda
conda install -c bioconda wgdi
PyPI
pip install wgdi
GitHub
git clone https://github.com/SunPengChuan/wgdi.git
cd wgdi
python setup.py install
Dependencies
Some parts of WGDI use the following third-party software:
PAML | MAFFT | MUSCLE | PAL2NAL | IQTREE
After downloading and installing the above packages, run wgdi -conf help > conf.ini to write out a file in which the paths of the installed software can be configured.
[ini]
mafft_path = C:\bio\mafft-win\mafft.bat
pal2nal_path = C:\bio\pal2nal.v14\pal2nal.pl
yn00_path = C:\bio\paml4.9j\bin\yn00.exe
muscle_path = C:\bio\muscle3.8.31_i86win32.exe
Add the paths of your software to the conf.ini file, and then execute wgdi -conf conf.ini to complete the path configuration.
[ini]
mafft_path = /usr/bin/mafft
pal2nal_path = /usr/local/bin/pal2nal.v14/pal2nal.pl
yn00_path = /usr/bin/yn00
muscle_path = /usr/bin/muscle
iqtree_path = /usr/bin/iqtree
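Before running wgdi -conf conf.ini, it can help to confirm that each configured path actually exists. The following is a minimal sketch using only Python's standard configparser module; it is not part of WGDI, and the helper name missing_tools is hypothetical.

```python
import configparser
import os


def missing_tools(conf_text):
    """Return the option names in the [ini] section whose path does not exist."""
    # interpolation=None keeps any '%' characters in paths literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return [name for name, path in parser.items("ini")
            if not os.path.exists(path)]


# Hypothetical example content; replace it with the text of your own conf.ini.
example = """
[ini]
mafft_path = /usr/bin/mafft
yn00_path = /no/such/dir/yn00
"""
print(missing_tools(example))
```

Any name printed by this check points at a path that should be corrected in conf.ini before it is registered with wgdi -conf.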
Uninstall
If you don’t need wgdi, you can uninstall it with pip uninstall wgdi or conda remove wgdi.
Usage
We support the use of WGDI to complete the work on the icon number.
Dotplot
Dotplot shows homologous gene dotplot.
Parameters
Parameters | Standards and instructions |
---|---|
blast_reverse | Type: bool; Default: false. Swap the first two columns of the blast result. |
multiple | Type: int; Default: 1. The number of best homologous genes shown as red dots. |
score | Type: int; Default: 100. Score value in the blast result. |
evalue | Type: float; Default: 1e-5. E-value in the blast result. |
repeat_number | Type: int; Default: 10. The maximum number of homologous genes kept per gene; hits beyond this number are removed. |
position | Type: {order, start, end}; Default: order. The gene position, taken from the corresponding column of the gff file. |
ancestor_left | Type: file; Default: none. The ancestral chromosome regions of the species on the left of the dotplot. |
ancestor_top | Type: file; Default: none. The ancestral chromosome regions of the species on the top of the dotplot. |
markersize | Type: float; Default: 0.5. The size of the points in the plot. |
figsize | Type: int,int; Default: 10,10. Controls the proportions of the saved picture. |
savefig | Type: {*.png, *.pdf, *.svg}; Default: *.png. The picture can be saved in png, pdf, or svg format. |
Enter the working folder and run wgdi -d help >> total.conf to write out the parameter file.
[dotplot]
blast = blast file
blast_reverse = false
gff1 = gff1 file
gff2 = gff2 file
lens1 = lens1 file
lens2 = lens2 file
genome1_name = Genome1 name
genome2_name = Genome2 name
multiple = 1
score = 100
evalue = 1e-5
repeat_number = 10
position = order
ancestor_left = none
ancestor_top = none
markersize = 0.5
figsize = 10,10
savefig = savefile(.png,.pdf)
Quick start
After the parameters are modified properly, then run wgdi -d total.conf
Example
The original results are easily accessible at wgdi-example


Improved collinearity
The algorithm for extracting collinearity is based on dynamic programming, similar to ColinearScan and MCScan.
Parameters
Parameters | Standards and instructions |
---|---|
multiple | Type: int; Default: 1. The number of best homologous genes shown as red dots. |
evalue | Type: float; Default: 1e-5. E-value in the blast result. |
score | Type: int; Default: 100. Score value in the blast result. |
grading | Type: int,int,int; Default: 50,40,25. Scores assigned according to the dot colors in the dotplot: by default 50 for red, 40 for blue, and 25 for gray. |
mg | Type: int,int; Default: 40,40. The maximum gap (mg) value, an important parameter for detecting collinear regions. |
pvalue | Type: float; Default: 1. Evaluates the compactness and uniqueness of collinear blocks; the range is 0-1, and values of 0-0.2 indicate better collinearity. |
repeat_number | Type: int; Default: 10. The maximum number of homologous genes kept per gene; hits beyond this number are removed. |
process | Type: int; Default: 8. The number of parallel processes. |
position | Type: order; Default: order. The gene position, taken from the corresponding column of the gff file. |
Enter the working folder and run wgdi -icl ? >> total.conf to write out the parameter file.
[collinearity]
gff1 = gff1 file
gff2 = gff2 file
lens1 = lens1 file
lens2 = lens2 file
blast = blast file
blast_reverse = false
multiple = 1
process = 8
evalue = 1e-5
score = 100
grading = 50,40,25
mg = 40,40
pvalue = 0.2
repeat_number = 10
positon = order
savefile = collinearity file
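To illustrate how the grading and mg parameters can interact in a dynamic-programming search, here is a simplified sketch. It is not WGDI's implementation: it only chains dots in the forward orientation, applies no gap penalty, and the function name best_chain is hypothetical. Each dot is a tuple (position in genome 1, position in genome 2, score), where the score is assumed to come from grading.

```python
def best_chain(points, mg=(40, 40)):
    """Return (total_score, chain) for the highest-scoring forward chain.

    points: list of (pos1, pos2, score) dots.
    mg: maximum allowed gap between consecutive dots on each axis.
    """
    pts = sorted(points)
    if not pts:
        return 0, []
    best = [p[2] for p in pts]   # best chain score ending at each dot
    prev = [None] * len(pts)     # back-pointers for the traceback
    for i, (x, y, score) in enumerate(pts):
        for j in range(i):
            px, py, _ = pts[j]
            # Dot j may precede dot i only if both gaps are positive
            # and no larger than the maximum gap.
            if 0 < x - px <= mg[0] and 0 < y - py <= mg[1]:
                if best[j] + score > best[i]:
                    best[i] = best[j] + score
                    prev[i] = j
    end = max(range(len(pts)), key=lambda k: best[k])
    total = best[end]
    chain = []
    k = end
    while k is not None:
        chain.append(pts[k][:2])
        k = prev[k]
    chain.reverse()
    return total, chain


dots = [(1, 1, 50), (2, 2, 50), (3, 3, 40), (10, 90, 25), (60, 4, 50)]
print(best_chain(dots))  # → (140, [(1, 1), (2, 2), (3, 3)])
```

In this toy input the dots at (10, 90) and (60, 4) are left out because joining them would exceed the maximum gap on one axis or run backwards on the other.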
Quick start
After the parameters are modified properly, then run wgdi -icl total.conf
Example
The original results are easily accessible at wgdi-example

Non-synonymous (Ka) and synonymous (Ks)
Non-synonymous (Ka) and synonymous (Ks) substitution rates are estimated using the Nei-Gojobori method as implemented in the YN00 program of PAML (4.9h).
Parameters
Parameters | Standards and instructions |
---|---|
cds_file | Type: file; Default: -. A cds file of one or more genomes. |
pep_file | Type: file; Default: -. A protein file of one or more genomes. |
align_software | Type: {muscle, mafft}; Default: muscle. The multiple sequence alignment software. |
pairs_file | Type: file; Default: -. A ColinearScan or MCScanX result file, or tab-separated gene pairs. |
ks_file | Type: file; Default: -. The name of the Ks output file. |
Enter the working folder and run wgdi -ks ? >> total.conf to write out the parameter file.
[ks]
cds_file = cds file
pep_file = pep file
align_software = muscle
pairs_file = gene pairs file
ks_file = ks result
Quick start
After the parameters are modified properly, then run wgdi -ks total.conf
Example
The original results are easily accessible at wgdi-example

BlockInfo
The color distinction in the dot plot, the collinearity result and the Ks result are integrated into one file. This file contains the main information to achieve the purpose of easily screening the collinearity fragments.
Parameters
Parameters | Standards and instructions |
---|---|
collinearity | Type: file; Default: -. A ColinearScan or MCScanX result file, or tab-separated gene pairs. |
score | Type: int; Default: 100. Score value in the blast result. |
evalue | Type: float; Default: 1e-5. E-value in the blast result. |
repeat_number | Type: int; Default: 10. The maximum number of homologous genes kept per gene; hits beyond this number are removed. |
position | Type: {order}; Default: order. The gene position, taken from the corresponding column of the gff file. |
ks | Type: file; Default: -. The Ks calculation results. |
ks_col | Type: str; Default: NG86. The single column of the Ks results to use, calculated by ks_YN00 or another method. |
savefile | Type: file; Default: *.csv. The result file. |
Enter the working folder and run wgdi -bi ? >> total.conf to write out the parameter file.
[blockinfo]
blast = blast file
gff1 = gff1 file
gff2 = gff2 file
lens1 = lens1 file
lens2 = lens2 file
collinearity = collinearity file
score = 100
evalue = 1e-5
repeat_number = 10
position = order
ks = ks file
ks_col = ks_NG86
savefile = block information (*.csv)
Quick start
After the parameters are modified properly, then run wgdi -bi total.conf
Example
The original results are easily accessible at wgdi-example
Columns | Information |
---|---|
id | Type: str. Unique id. |
chr1,chr2 | Type: int. The two chromosomes of the collinearity block. |
start1,end1 | Type: int. The region of chromosome 1 covered by the collinearity block. |
start2,end2 | Type: int. The region of chromosome 2 covered by the collinearity block. |
pvalue | Type: float. Evaluates the compactness and uniqueness of collinear blocks; the range is 0-1, and values of 0-0.2 indicate better collinearity. |
length | Type: int. The length of the collinearity block. |
ks_median | Type: float. The median Ks value of the collinearity block. |
ks_average | Type: float. The average Ks value of the collinearity block. |
homo1-5 | Type: float. The average score of the differently colored dots (red=1, blue=0, gray=-1) on the synteny block. |
block1,block2 | Type: str. The gene pairs in the collinear block. |
ks | Type: str. Output result of parameter ks. |
tandem_ratio | Type: float. The density of tandem repeats; in general, a synteny block with a value greater than 0.5 is unreliable. |
density1,density2 | Type: float. The density of the collinear block. |
class1,class2 | Type: str. The class of the collinear block. |
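The columns above make it straightforward to screen blocks outside WGDI as well. The sketch below is only an illustration with the standard csv module: it assumes the file has a header row using the column names from the table, the thresholds are the guideline values mentioned in this documentation, and the function name filter_blocks is hypothetical.

```python
import csv
import io


def filter_blocks(csv_text, max_pvalue=0.2, min_length=5, max_tandem_ratio=0.5):
    """Return the ids of blocks that pass all three thresholds."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        if (float(row["pvalue"]) <= max_pvalue
                and int(row["length"]) >= min_length
                and float(row["tandem_ratio"]) <= max_tandem_ratio):
            kept.append(row["id"])
    return kept


# Made-up rows with only the columns this sketch needs.
sample = """id,chr1,chr2,pvalue,length,ks_median,tandem_ratio
1,1,2,0.01,35,0.82,0.0
2,1,3,0.45,12,1.90,0.1
3,2,2,0.02,8,0.05,0.9
4,3,5,0.10,4,0.80,0.0
"""
print(filter_blocks(sample))  # → ['1']
```

Block 2 fails the pvalue threshold, block 3 is dominated by tandem repeats, and block 4 is too short.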
Correspondence
Extract event-related genomic alignment.
Parameters
Parameters | Standards and instructions |
---|---|
blockinfo | Type: file; Default: -. Output result of parameter bi. |
tandem | Type: bool; Default: false. Whether to display collinearity blocks that may be generated by tandem repeats. |
tandem_length | Type: int; Default: 200. If tandem=true, the maximum range of tandem influence. |
pvalue | Type: float; Default: 1. Evaluates the compactness and uniqueness of collinear blocks; the range is 0-1, and values of 0-0.2 indicate better collinearity. |
block_length | Type: int; Default: 5. The minimum length of a collinear block to show. |
multiple | Type: int; Default: 1. The number of best homologous genes shown as red dots. |
homo | Type: int [1-5]; Default: 1. Evaluates the ratio of the best homologous gene pairs in a collinearity block, with a range of -1 to 1. |
savefile | Type: file; Default: *.csv. The resulting file. |
Enter the working folder and run wgdi -c ? >> total.conf to write out the parameter file.
[correspondence]
blockinfo = blockinfo file(.csv)
lens1 = lens1 file
lens2 = lens2 file
tandem = (true/false)
tandem_length = 200
pvalue = 0.2
block_length = 5
multiple = 1
homo = 0,1
savefile = savefile(.csv)
Quick start
The original results are easily accessible at wgdi-example
BlockKs
BlockKs shows the Ks of blocks in a dotplot.
Parameters
Parameters | Standards and instructions |
---|---|
blockinfo | Type: file; Default: -. Output result of parameter bi. |
pvalue | Type: float; Default: 1. Evaluates the compactness and uniqueness of collinear blocks; the range is 0-1, and values of 0-0.2 indicate better collinearity. |
tandem | Type: bool; Default: false. Whether to display collinearity blocks that may be generated by tandem repeats. |
tandem_length | Type: int; Default: 200. If tandem=true, the maximum range of tandem influence. |
area | Type: str; Default: -1,3. The range of Ks to show. |
block_length | Type: int; Default: 5. The minimum length of a collinear block to show. |
position | Type: {order, start, end}; Default: order. The gene position, taken from the corresponding column of the gff file. |
markersize | Type: float; Default: 0.5. The size of the points in the plot. |
figsize | Type: int,int; Default: 10,10. Controls the proportions of the saved picture. |
savefig | Type: {*.png, *.pdf}; Default: *.png. The picture can be saved in png, pdf, or svg format. |
Enter the working folder and run wgdi -bk ? > blockks.conf to write out the parameter file.
[blockks]
lens1 = lens1 file
lens2 = lens2 file
genome1_name = Genome1 name
genome2_name = Genome2 name
blockinfo = block information
pvalue = 0.05
tandem = true
tandem_length = 200
markersize = 1
area = 0,2
block_length = minimum length
figsize = 8,8
savefig = save image
Quick start
After the parameters are modified properly, then run wgdi -bk total.conf
Example
The original results are easily accessible at wgdi-example

KsPeaks
KsPeaks is a simple way to get Ks peaks.
Parameters
Parameters | Standards and instructions |
---|---|
blockinfo | Type: file; Default: -. Output result of parameter bi. |
pvalue | Type: float; Default: 0.05. Evaluates the compactness and uniqueness of collinear blocks; the range is 0-1, and values of 0-0.2 indicate better collinearity. |
tandem | Type: str; Default: true. The criterion for tandem is that the two gene positions differ by no more than 200 genes. |
block_length | Type: int; Default: 5. The minimum length of collinear blocks. |
ks_area | Type: str; Default: 0,10. The range of Ks to show. |
multiple | Type: int; Default: 1. The number of best homologous genes shown as red dots. |
homo | Type: int [1-5]; Default: 1. Evaluates the ratio of the best homologous gene pairs in a collinearity block, with a range of -1 to 1. |
fontsize | Type: str; Default: 9. The size of the font. |
area | Type: str; Default: 0,3. The range of Ks to show. |
figsize | Type: int,int; Default: 10,10. Controls the proportions of the saved picture. |
savefig | Type: {*.png, *.pdf, *.svg}; Default: *.png. The picture can be saved in png, pdf, or svg format. |
savefile | Type: file; Default: *.csv. The resulting file. |
Enter the working folder and run wgdi -kp ? >> total.conf to write out the parameter file.
[kspeaks]
blockinfo = block information
pvalue = 0.05
tandem = true
block_length = int number
ks_area = 0,10
multiple = 1
homo = 0,1
fontsize = 9
area = 0,3
figsize = 10,6.18
savefig = saving image
savefile = ks median savefile
Quick start
After the parameters are modified properly, then run wgdi -kp total.conf
Example
The original results are easily accessible at wgdi-example

PeaksFit
PeaksFit performs Gaussian fitting of the Ks distribution.
Parameters
Parameters | Standards and instructions |
---|---|
blockinfo | Type: file; Default: -. Output result of parameter bi. |
mode | Type: {median, average, total}; Default: median. The way the Ks values of a collinearity block are processed. |
tandem_length | Type: int; Default: 200. If tandem=true, the maximum range of tandem influence. |
bins_number | Type: int; Default: 200. The number of bins in the Ks histogram. |
fontsize | Type: float; Default: 9. The font size of the labels in the plot. |
area | Type: str; Default: -1,3. The range of Ks to show. |
figsize | Type: int,int; Default: 10,10. Controls the proportions of the saved picture. |
savefig | Type: {*.png, *.pdf, *.svg}; Default: *.png. The picture can be saved in png, pdf, or svg format. |
Enter the working folder and run wgdi -pf ? >> total.conf to write out the parameter file.
[peaksfit]
blockinfo = block information
mode = median
bins_number = 200
ks_area = 0,10
fontsize = 9
area = 0,3
figsize = 10,6.18
savefig = saving image
Quick start
After the parameters are modified properly, then run wgdi -pf total.conf
Example
The original results are easily accessible at wgdi-example
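As a rough illustration of what fitting a Gaussian to a Ks histogram involves, the sketch below fits a single peak by least squares on the logarithm of the counts, which turns the Gaussian into a parabola. This is a generic textbook technique chosen here for its simplicity, not necessarily the routine WGDI uses; the function names are hypothetical, and xs and ys are assumed to be bin centers and positive bin heights. Because the fit is unweighted, noisy low-count bins in the tails can distort it on real data.

```python
import math


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def fit_gaussian(xs, ys):
    """Fit y = A * exp(-(x - mu)^2 / (2 * sigma^2)); return (A, mu, sigma).

    ln(y) = a + b*x + c*x^2 is fitted by linear least squares, with
    c = -1 / (2 * sigma^2) and b = mu / sigma^2. Raises ValueError if the
    fitted parabola does not open downwards (no peak).
    """
    pts = [(x, math.log(y)) for x, y in zip(xs, ys) if y > 0]
    s = [sum(x ** k for x, _ in pts) for k in range(5)]
    t = [sum((x ** k) * ly for x, ly in pts) for k in range(3)]
    normal = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    a, b, c = solve3(normal, t)
    sigma = math.sqrt(-1.0 / (2.0 * c))
    mu = -b / (2.0 * c)
    amplitude = math.exp(a - b * b / (4.0 * c))
    return amplitude, mu, sigma


# Synthetic, noise-free peak centered at Ks = 1.2.
xs = [0.1 * i for i in range(1, 30)]
ys = [3.0 * math.exp(-(x - 1.2) ** 2 / (2 * 0.25 ** 2)) for x in xs]
amplitude, mu, sigma = fit_gaussian(xs, ys)
print(round(amplitude, 3), round(mu, 3), round(sigma, 3))
```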

KsFigure
A simple way to draw ks distribution map.
Parameters
Parameters | Standards and instructions |
---|---|
ksfit | Type: file; Default: .csv. Output result of parameter kp, or a filtered version of it. |
labelfontsize | Type: float; Default: 15. The font size of the labels in the plot. |
legendfontsize | Type: float; Default: 15. The font size of the legend in the plot. |
xlabel | Type: str; Default: none. The x-axis label of the figure. |
ylabel | Type: str; Default: none. The y-axis label of the figure. |
title | Type: str; Default: none. The title of the figure. |
area | Type: str; Default: 0,3. The range of Ks to show. |
shadow | Type: bool; Default: true. Whether the plotted curve is shown with shadows. |
figsize | Type: int,int; Default: 10,10. Controls the proportions of the saved picture. |
savefig | Type: {*.png, *.pdf, *.svg}; Default: *.png. The picture can be saved in png, pdf, or svg format. |
Enter the working folder and run wgdi -kf ? >> total.conf to write out the parameter file.
[ksfigure]
ksfit = ksfit result(*.csv)
labelfontsize = 15
legendfontsize = 15
xlabel = none
ylabel = none
title = none
area = 0,2
figsize = 10,6.18
shadow = (true/false)
savefig = save image
Quick start
After the parameters are modified properly, then run wgdi -kf total.conf
Example
The original results are easily accessible at wgdi-example
Polyploid classification
Distinguishes among the subgenomes of a polyploid.
Parameters
Parameters | Standards and instructions |
---|---|
blockinfo | Type: file; Default: -. Output result of parameter bi. |
ancestor_left | Type: file; Default: none. The ancestral chromosome regions of the species on the left of the dotplot. |
ancestor_top | Type: file; Default: none. The ancestral chromosome regions of the species on the top of the dotplot. |
diff | Type: float; Default: 0.05. A difference greater than this value is considered significant. |
classid | Type: str; Default: class1,class2. The class names of the groups defined by the divided ancestral regions. |
savefile | Type: file; Default: *.csv. The resulting file. |
Enter the working folder and run wgdi -pc ? >> total.conf to write out the parameter file.
[polyploidy classification]
blockinfo = block information (*.csv)
ancestor_left = file
ancestor_top = file
diff = 0.05
classid = class1,class2
savefile = result file(.csv)
Quick start
The original results are easily accessible at wgdi-example
ancestral_karyotype
Generation of ancestral karyotypes from chromosomes that retain the same structure in modern species.
Parameters
Parameters | Standards and instructions |
---|---|
gff | Type: file; Default: -. Concatenate (cat) the relevant 'gff' files into one file. |
pep_file | Type: file; Default: none. Concatenate (cat) the relevant 'pep.fa' files into one file. |
ancestor | Type: file; Default: none. This file must be provided by the user. |
mark | Type: str; Default: aak. A shorthand for the ancestral karyotype. |
ancestor_gff | Type: file; Default: none. The gff file of the created ancestral karyotype. |
ancestor_lens | Type: file; Default: none. The lens file of the created ancestral karyotype. |
ancestor_pep | Type: file; Default: -. The pep file of the created ancestral karyotype. |
ancestor_file | Type: file; Default: -. The updated ancestral karyotype file. |
Enter the working folder and run wgdi -ak ? >> total.conf to write out the parameter file.
[ancestral_karyotype]
gff = gff file (cat the relevant 'gff' files into a file)
pep_file = pep file (cat the relevant 'pep.fa' files into a file)
ancestor = ancestor file (this file requires you to provide)
mark = aak
ancestor_gff = result file
ancestor_lens = result file
ancestor_pep = result file
ancestor_file = result file
Quick start
The original results are easily accessible at wgdi-example
ancestral_karyotype_repertoire
Incorporate genes from collinearity blocks into the ancestral karyotype repertoire.
Parameters
Parameters | Standards and instructions |
---|---|
blockinfo | Type: file; Default: -. Output result of parameters -bi and -c. |
blockinfo_reverse | Type: bool; Default: false. Swap the positions of the two aligned species. |
gff1 | Type: file; Default: -. Concatenate (cat) the relevant 'gff' files into one file. |
gff2 | Type: file; Default: -. Concatenate (cat) the relevant 'gff' files into one file. |
gap | Type: int; Default: 5. Minimum number of genes bounded by collinear genes that correspond to two ordered protogenes. |
ancestor | Type: file; Default: none. This file must be provided by the user. |
ancestor_new | Type: file; Default: none. The updated ancestor file of the ancestral karyotype. |
mark | Type: str; Default: aak. A shorthand for the ancestral karyotype. |
ancestor_gff | Type: file; Default: none. The gff file of the created ancestral karyotype. |
ancestor_lens | Type: file; Default: none. The lens file of the created ancestral karyotype. |
ancestor_pep | Type: file; Default: -. The pep file of the created ancestral karyotype. |
ancestor_pep_new | Type: file; Default: -. The updated pep file of the ancestral karyotype. |
Enter the working folder and run wgdi -akr ? >> total.conf to write out the parameter file.
[ancestral_karyotype_repertoire]
blockinfo = block information (*.csv)
# blockinfo: processed *.csv
blockinfo_reverse = False
gff1 = gff1 file (*.gff)
gff2 = gff2 file (*.gff)
gap = 5
mark = aak1s
ancestor = ancestor file
ancestor_new = result file
ancestor_pep = ancestor pep file
ancestor_pep_new = result file
ancestor_gff = result file
ancestor_lens = result file
Quick start
The original results are easily accessible at wgdi-example
karyotype_mapping
Mapping from the known karyotype result to this species.
Parameters
Parameters | Standards and instructions |
---|---|
blast_reverse | Type: bool; Default: false. Swap the first two columns of the blast result. |
score | Type: int; Default: 100. Score value in the blast result. |
evalue | Type: float; Default: 1e-5. E-value in the blast result. |
repeat_number | Type: int; Default: 10. The maximum number of homologous genes kept per gene; hits beyond this number are removed. |
ancestor_left | Type: file; Default: none. Ancestor location file (only one of 'left' and 'top' can be kept). |
ancestor_top | Type: file; Default: none. Ancestor location file (only one of 'left' and 'top' can be kept). |
blockinfo | Type: file; Default: -. The blockinfo.csv file filtered according to certain conditions with '-c'. |
the_other_lens | Type: file; Default: -. The lens file of the species for which you want to generate ancestor files. |
limit_length | Type: int; Default: 5. The minimum length of blocks to show. |
the_other_ancestor_file | Type: file; Default: *.csv. The resulting file. |
Enter the working folder and run wgdi -km ? >> total.conf to write out the parameter file.
[karyotype_mapping]
blast = blast file
blast_reverse = false
gff1 = gff1 file
gff2 = gff2 file
score = 100
evalue = 1e-5
repeat_number = 5
ancestor_left = ancestor location file (Only one of ('left', 'top') can be reserved)
ancestor_top = ancestor location file
the_other_lens = the other lens file
blockinfo = block information (*.csv)
blockinfo_reverse = false
limit_length = 5
the_other_ancestor_file = result file
Quick start
The original results are easily accessible at wgdi-example
Karyotype
Show genome evolution from reconstructed ancestors.
Parameters
Parameters | Standards and instructions |
---|---|
ancestor | Type: file; Default: -. The ancestral chromosome regions of the species, the result of parameter '-km'. |
width | Type: float; Default: 0.5. The width of the chromosomes in the figure. |
figsize | Type: int,int; Default: 10,10. Controls the proportions of the saved picture. |
savefile | Type: file; Default: *.csv. The resulting file. |
Enter the working folder and run wgdi -k ? >> total.conf to write out the parameter file.
[karyotype]
ancestor = ancestor chromosome file
width = 0.5
figsize = 10,6.18
savefig = save image(.png, .pdf, .svg)
Quick start
The original results are easily accessible at wgdi-example
Alignment
Alignment of hierarchical and event-related gene collinearity.
Parameters
Parameters | Standards and instructions |
---|---|
blockinfo | Type: file; Default: -. Output result of parameter bi. |
colors | Type: {color1,color2,color3,–}; Default: red,blue,green. Sets of colors assigned by grouping, separated by commas. |
position | Type: {order, start, end}; Default: order. The gene position, taken from the corresponding column of the gff file. |
blockinfo_reverse | Type: bool; Default: false. Swap the positions of the two aligned species. |
figsize | Type: int,int; Default: 10,10. Controls the proportions of the saved picture. |
savefile | Type: file; Default: *.csv. The result file. |
markersize | Type: float; Default: 0.5. The size of the points in the plot. |
savefig | Type: {*.png, *.pdf, *.svg}; Default: *.png. The picture can be saved in png, pdf, or svg format. |
Enter the working folder and run wgdi -a ? > align.conf to write out the parameter file.
[alignment]
gff1 = gff1 file
gff2 = gff2 file
lens1 = lens1 file
lens2 = lens2 file
genome1_name = Genome1 name
genome2_name = Genome2 name
markersize = 0.5
position = order
colors = red,blue,green
figsize = 10,10
savefile = savefile(.csv)
savefig= save image
blockinfo = block information file
blockinfo_reverse = false
Quick start
After the parameters are modified properly, then run wgdi -a total.conf
Example
The original results are easily accessible at wgdi-example

Alignmenttrees
Phylogenetic tree construction using alignment or collinear genes. During this process, a phylogenetic tree is constructed for the gene set in each row of the alignment file. By merging the trees with ASTRAL, you can obtain a coalescent tree of multiple species or subgenomes.
Parameters
Parameters | Standards and instructions |
---|---|
alignment | Type: file; Default: -. The merged file of multiple alignment files with the same reference. |
gff | Type: file; Default: -. The gff of the reference; if the alignment has no reference species, delete it. |
lens | Type: file; Default: -. The lens of the reference; if the alignment has no reference species, delete it. |
dir | Type: folder; Default: -. The folder for the phylogenetic trees. |
sequence_file | Type: file; Default: -. In general, protein sequences (pep); if coding sequences (cds) are given here, cds_file needs to be deleted. |
cds_file | Type: file; Default: none. Required when the tree construction method involves codons; otherwise it can be omitted. |
codon_positon | Type: str; Default: 1,2,3. 1,2 means codon positions 1 and 2; 1,2,3 means no codon position is removed. |
trees_file | Type: file; Default: -. The merged file of multiple nwk-format trees. |
align_software | Type: {muscle, mafft}; Default: muscle. The multiple sequence alignment software. |
tree_software | Type: {iqtree, fasttree}; Default: iqtree. The software for constructing phylogenetic trees. |
model | Type: str; Default: -. The model passed to the tree software (MFP in the example below). |
trimming | Type: {trimal, divvier}; Default: none. The software for removing spurious sequences. |
minimum | Type: int; Default: 4. The minimum number of genes in a gene set used to construct a phylogenetic tree. |
delete_detail | Type: bool; Default: false. Whether to delete the intermediate files produced when constructing phylogenetic trees. |
After the trees have been built, the ASTRAL command java -jar /path/astral.5.7.7.jar -i trees_file.nwk -o out.tre -t 8 produces a coalescent tree of multiple species or subgenomes.
Enter the working folder and run wgdi -at ? >> total.conf to write out the parameter file.
[alignmenttrees]
alignment = alignment file (.csv)
gff = gff file (reference genome, If alignment has no reference species, delete it)
lens = lens file (If alignment has no reference species, delete it)
dir = output folder
sequence_file = sequence file (.fa)
cds_file = cds file (.fa)
codon_positon = 1,2,3 (1,2 mean codon1&2; 1,2,3 mean no codon removed)
trees_file = trees (.nwk)
align_software = mafft
tree_software = (iqtree,fasttree)
model = MFP
trimming = trimal
minimum = 4
delete_detail = true
Quick start
After the parameters are modified properly, then run wgdi -at total.conf
Example
The original results are easily accessible at wgdi-example
Retain
Retain shows subgenomes in gene retention or genome fractionation.
Parameters
Parameters | Standards and instructions |
---|---|
alignment | Type: file; Default: -. Output result of parameter a. |
colors | Type: {color1,color2,color3,–}; Default: red,blue,green. Sets of colors assigned by grouping, separated by commas. |
refgenome | Type: str; Default: -. A shorthand for the reference species. |
figsize | Type: int,int; Default: 10,10. Controls the proportions of the saved picture. |
step | Type: int; Default: 50. The size of the sliding window. |
ylabel | Type: str; Default: none. The y-axis label of the picture. |
savefile | Type: file; Default: -. The data behind the drawing. |
savefig | Type: {*.png, *.pdf, *.svg}; Default: *.png. The picture can be saved in png, pdf, or svg format. |
Enter the working folder and run wgdi -r ? >> total.conf to write out the parameter file.
[retain]
alignment = alignment file
gff = gff file
colors = red,blue,green
refgenome = shorthand
figsize = 10,12
step = 50
ylabel = y label
savefile = retain file
savefig = result
Quick start
After the parameters are modified properly, then run wgdi -r total.conf
Example
The original results are easily accessible at wgdi-example
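The role of the step parameter can be pictured as a moving average of gene retention along the reference chromosome. The sketch below is only a conceptual illustration, not WGDI's code: it assumes one subgenome column of the alignment has already been reduced to a list of True/False flags (gene present or missing at each reference gene), and the function name retention_profile is hypothetical.

```python
def retention_profile(retained, step=50):
    """Fraction of retained genes in every window of `step` consecutive genes.

    retained: sequence of truthy/falsy flags ordered along the reference.
    Returns one value per window position, sliding one gene at a time.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    flags = [1 if flag else 0 for flag in retained]
    if len(flags) < step:
        return []
    window_sum = sum(flags[:step])
    profile = [window_sum / step]
    for i in range(step, len(flags)):
        # Slide the window: add the incoming gene, drop the outgoing one.
        window_sum += flags[i] - flags[i - step]
        profile.append(window_sum / step)
    return profile


print(retention_profile([1, 1, 0, 0, 1, 1], step=2))  # → [1.0, 0.5, 0.0, 0.5, 1.0]
```

Plotting one such profile per subgenome gives a picture of where each subgenome has kept or lost genes relative to the reference.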
P-index
The polyploidy index(P-index) is used to characterize the degree of divergence between subgenomes of a polyploidy, to find whether there has been a balanced or unbalanced gene removal from the homoeologous regions.
Parameters
Parameters | Standards and instructions |
---|---|
alignment | Type: file; Default: -. Output result of parameter a. |
gap | Type: int; Default: 50. The size of the sliding window. |
colors | Type: {color1,color2,color3,–}; Default: red,blue,green. Sets of colors assigned by grouping, separated by commas. |
retention | Type: float; Default: 0.05. Marks regions where the retention rate of the subgenome relative to the reference genome is low; 0.05 by default. |
diff | Type: float; Default: 0.05. A difference greater than this value is considered significant. |
remove_delta | Type: bool; Default: true. Whether to remove the parameter retention. |
savefile | Type: file; Default: -. The data behind the drawing. |
Enter the working folder and run wgdi -p ? >> total.conf to write out the parameter file.
[pindex]
alignment = alignment file
gff = gff file
lens =lens file
gap = 50
retention = 0.05
diff = 0.05
remove_delta = (true/false)
savefile = result file
Quick start
After the parameters are modified properly, then run wgdi -p total.conf
Example
The detailed explanation is in the published article
Circos
Circos is a simple way to run circos.
Parameters
Parameters | Standards and instructions
---|---
alignment | Type: file. Default: -. Output of the alignment subroutine (-a).
radius | Type: float. Default: 0.2. Radius, a value between 0 and 1.
angle_gap | Type: float. Default: 0.05. Gap between chromosomes.
ring_width | Type: float. Default: 0.015. The width of the ring.
colors | Type: {chr1:color1,chr2:color2,}. Default: 1:red,2:orange,3:blue,4:cyan,5:green. Colors assigned per group, separated by commas.
position | Type: {order, start, end}. Default: order. The position of a gene, corresponding to the gff file.
chr_label | Type: str. Default: -. A shorthand for chromosomes.
column_names | Type: str. Default: -. Column markers of the alignment file.
ancestor_location | Type: file. Default: none. The ancestral chromosome regions of the species.
alignment | Type: file. Default: -. Alignment of hierarchical and event-related gene collinearity.
figsize | Type: int,int. Default: 10,10. Controls the proportions of the saved picture.
savefig | Type: {*.png, *.pdf, *.svg}. Default: *.png. The picture can be saved in png, pdf, or svg format.
Enter the working folder and run wgdi -ci ? >> total.conf to write out the parameter file.
[circos]
gff = gff file
lens = lens file
radius = 0.2
angle_gap = 0.05
ring_width = 0.015
colors = 1:c,2:m,3:blue,4:gold,5:red,6:lawngreen,7:darkgreen,8:k,9:darkred,10:gray
alignment = alignment file
chr_label = chr
ancestor_location = ancestor file
ancestor = alignment file
figsize = 10,10
label_size = 9
columns_name = 1,2,3,4,5
savefig = result(.png, .pdf, .svg)
Quick start
Once the parameters are set, run: wgdi -ci total.conf
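The colors value in the [circos] section is a comma-separated list of key:color pairs. The sketch below shows one way to check such a string before running WGDI; parse_chr_colors is a hypothetical helper written for illustration and is not WGDI's own parser.

```python
def parse_chr_colors(value):
    """Turn '1:red,2:orange' into {'1': 'red', '2': 'orange'}."""
    colors = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        key, _, color = item.partition(":")
        if not color:
            raise ValueError(f"expected key:color, got {item!r}")
        colors[key.strip()] = color.strip()
    return colors


colors = parse_chr_colors("1:c,2:m,3:blue,4:gold,5:red")
print(colors)
```

An entry without a colon raises ValueError, which makes typos in a long colors line easier to spot.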
Example

Common file
conf
The conf file contains parameters required for the corresponding operation, which are read when WGDI is performed.
Use wgdi -* ? > *.conf (where -* stands for a subroutine flag such as -d) to write the configuration file into the current directory, then modify it and run.
If you don’t know which files are needed, you can check with wgdi -* ?, wgdi -* help, or wgdi -* example. These three commands are equivalent.
In the conf file, gff1, lens1, genome1_name and gff2, lens2, genome2_name refer to the files of species 1 and species 2, respectively. They are not explained again in this documentation.
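Before running a subroutine, it can help to confirm that the files named in the conf file exist. The following sketch reads a section with Python's configparser and groups the gff1/lens1/genome1_name and gff2/lens2/genome2_name keys by species. The section name dotplot, the sample values, and the helper names are assumptions made for illustration; a real section contains more keys than the six shown.

```python
import configparser
from pathlib import Path

# Hypothetical sample: only the six per-species keys named in the text.
SAMPLE_CONF = """
[dotplot]
gff1 = sp1.gff
lens1 = sp1.lens
genome1_name = species_1
gff2 = sp2.gff
lens2 = sp2.lens
genome2_name = species_2
"""


def species_inputs(conf_text, section):
    """Group the gffN, lensN and genomeN_name values by species number."""
    conf = configparser.ConfigParser()
    conf.read_string(conf_text)
    params = conf[section]
    return {
        n: {
            "gff": params[f"gff{n}"],
            "lens": params[f"lens{n}"],
            "name": params[f"genome{n}_name"],
        }
        for n in (1, 2)
    }


def missing_files(inputs, folder="."):
    """List the gff and lens files that do not exist in the given folder."""
    wanted = [inputs[n][kind] for n in sorted(inputs) for kind in ("gff", "lens")]
    return [name for name in wanted if not (Path(folder) / name).is_file()]


inputs = species_inputs(SAMPLE_CONF, "dotplot")
print(inputs[1]["name"], inputs[2]["name"])
print("missing:", missing_files(inputs))
```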
gff
Column | Information | Explanation
---|---|---
1 | Chr | Chromosome number
2 | Id | Gene name
3 | Start | The starting location of a gene
4 | End | The ending location of a gene
5 | Direction | Direction of a gene sequence
6 | Order | Order of the gene on its chromosome, starting from 1
7 | Original | Original id (not read)
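As a quick sanity check of a gff file in the seven-column layout described above, the sketch below reads the rows and reports chromosomes whose Order values are not 1, 2, 3, and so on. It assumes the file is tab-separated and has no header line; the sample rows and the helper names read_gff and check_order are made up for illustration.

```python
import csv
import io
from collections import defaultdict

GFF_COLUMNS = ["chr", "id", "start", "end", "direction", "order", "original"]

# Made-up rows in the seven-column layout (assumed tab-separated).
SAMPLE_GFF = "\n".join([
    "\t".join(["1", "g1", "100", "900", "+", "1", "orig1"]),
    "\t".join(["1", "g2", "1500", "2400", "-", "2", "orig2"]),
    "\t".join(["2", "g3", "50", "700", "+", "1", "orig3"]),
]) + "\n"


def read_gff(handle):
    """Read rows into dicts, converting start, end and order to int."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(GFF_COLUMNS, fields))
        for key in ("start", "end", "order"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


def check_order(rows):
    """Return chromosomes whose order values are not exactly 1..n."""
    orders = defaultdict(list)
    for row in rows:
        orders[row["chr"]].append(row["order"])
    return [chrom for chrom, values in orders.items()
            if sorted(values) != list(range(1, len(values) + 1))]


rows = read_gff(io.StringIO(SAMPLE_GFF))
print(len(rows), "genes; chromosomes with bad order:", check_order(rows))
```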
lens
Column | Information | Explanation
---|---|---
1 | Chr | Chromosome number
2 | Length | Length of the chromosome sequence
3 | Number | Number of genes on the chromosome
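The Number column of the lens file should agree with the gene count per chromosome in the matching gff file. The sketch below compares the two; it assumes whitespace-separated columns, and the helper names read_lens and gene_count_mismatches as well as the sample values are hypothetical.

```python
from collections import Counter


def read_lens(text):
    """Map chromosome -> (length, number of genes) from lens file text."""
    lens = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, length, number = line.split()[:3]
        lens[chrom] = (int(length), int(number))
    return lens


def gene_count_mismatches(lens, gff_chromosomes):
    """Return {chr: (number in lens, genes counted in gff)} where they differ."""
    counts = Counter(gff_chromosomes)
    return {chrom: (number, counts.get(chrom, 0))
            for chrom, (_, number) in lens.items()
            if number != counts.get(chrom, 0)}


# Made-up lens content and the chromosome column of a matching gff file.
lens = read_lens("1\t5000\t2\n2\t3000\t1\n")
print(gene_count_mismatches(lens, ["1", "1", "2"]))
```

An empty dictionary means every chromosome in the lens file has the same gene count as the gff file.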
blast
The protein-coding genes of each genome are compared against themselves and against those of other genomes using BLASTP (e-value < 1e-5, outfmt = 6) or similar protein sequence search software (MMseqs2, DIAMOND).
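If a search was run with a looser e-value, the tabular output can be filtered afterwards. The sketch below applies the e-value < 1e-5 cutoff to BLAST tabular output, assuming the twelve default outfmt 6 columns; the function name filter_hits and the sample hits are made up for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Two made-up hits: one strong, one above the cutoff.
SAMPLE_BLAST = (
    "g1\tg2\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t400\n"
    "g1\tg9\t35.0\t60\t39\t2\t5\t64\t10\t69\t0.001\t30.1\n"
)


def filter_hits(handle, max_evalue=1e-5):
    """Yield hits whose e-value is strictly below max_evalue."""
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < len(OUTFMT6):
            continue  # skip blank or malformed lines
        hit = dict(zip(OUTFMT6, fields))
        if float(hit["evalue"]) < max_evalue:
            yield hit


kept = [(hit["qseqid"], hit["sseqid"])
        for hit in filter_hits(io.StringIO(SAMPLE_BLAST))]
print(kept)
```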
ancestor file
Required documents for karyotype evolution analysis
Column | Information | Explanation
---|---|---
1 | Chr | Chromosome number
2 | Start | Start of the region in this genome that is homologous to a protochromosome
3 | End | End of the region in this genome that is homologous to a protochromosome
4 | Color | Color assigned to the protochromosome
5 | Subgenomes | Subgenome assignment according to the protochromosomes
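One check that may be useful on a hand-edited ancestor file is whether two regions on the same chromosome overlap. The sketch below assumes whitespace-separated columns in the order given above and integer Start and End values; whether overlaps are actually a problem for WGDI is not stated here, so treat this as an optional sanity check. The helper names and sample rows are hypothetical.

```python
from collections import defaultdict


def read_ancestor(text):
    """Parse rows into (chr, start, end, color, subgenome) tuples."""
    regions = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, color, subgenome = line.split()[:5]
        regions.append((chrom, int(start), int(end), color, subgenome))
    return regions


def overlapping_regions(regions):
    """Return neighbouring region pairs (sorted by start) that overlap.

    The result is empty exactly when no two regions on a chromosome overlap.
    """
    by_chr = defaultdict(list)
    for region in regions:
        by_chr[region[0]].append(region)
    overlaps = []
    for items in by_chr.values():
        items.sort(key=lambda region: region[1])
        for previous, current in zip(items, items[1:]):
            if current[1] <= previous[2]:
                overlaps.append((previous, current))
    return overlaps


# Made-up rows: two regions on chromosome 1, one on chromosome 2.
SAMPLE_ANCESTOR = "1\t1\t500\tred\t1\n1\t501\t1200\tblue\t1\n2\t1\t800\tgreen\t2\n"
regions = read_ancestor(SAMPLE_ANCESTOR)
print(overlapping_regions(regions))
```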
Tips
You can use wgdi -conf ? > total.conf to generate a total.conf file containing all parameters. When you modify the parameters and run WGDI, it reads only the part of total.conf that corresponds to your command. Once WGDI is running in a folder, it generates the results in the background, so you can leave that folder and start working in the next one.
WGDI uses the conf file in the current folder, so you can copy the conf file in bulk to the target folders and modify the parameters there.
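Copying a conf file into several folders with per-folder changes can be scripted. The following sketch uses configparser and pathlib; the helper copy_conf, the folder names run_a and run_b, and the changed gap value are hypothetical. Note that rewriting a file through configparser drops comments and lowercases key names.

```python
import configparser
import tempfile
from pathlib import Path


def copy_conf(template, folders, section, overrides=None):
    """Write a copy of template into each folder, changing selected keys.

    overrides maps a folder name to {key: value} changes for that folder.
    """
    written = []
    for folder in folders:
        conf = configparser.ConfigParser()
        conf.read(template)
        changes = (overrides or {}).get(Path(folder).name, {})
        for key, value in changes.items():
            conf[section][key] = value
        target = Path(folder) / Path(template).name
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "w") as handle:
            conf.write(handle)
        written.append(target)
    return written


# Demonstration in a temporary directory with a two-key template.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "total.conf"
    template.write_text("[pindex]\ngap = 50\nsavefile = result file\n")
    paths = copy_conf(template,
                      [Path(tmp) / "run_a", Path(tmp) / "run_b"],
                      "pindex",
                      overrides={"run_b": {"gap": "100"}})
    copied = {path.parent.name: path.read_text() for path in paths}

print(copied["run_b"])
```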
Examples
Introductory examples for WGDI will be updated soon. Some of the data I tested are also available on wgdi-example.
Changelog
0.6.1
Fixed issue with alignment (-a). Only version 0.6.0 has this bug.
0.6.0
Fixed issue with improved collinearity (-icl).
Added a parameter ‘tandem_ratio’ to blockinfo (-bi).
0.5.9
Updated the improved collinearity (-icl). Faster than before, but slower than MCScanX and JCVI.
Fixed issue with ancestral karyotype repertoire (-akr).
0.5.8
Fixed issue with gene names (-ks).
0.5.7
Fixed issue with chromosome order (-ak).
Attempted to fix the issue with gene names (-ks). The issue is not actually fixed in this version; please install the latest version.
0.5.5 and 0.5.6
Added ancestral karyotype (-ak).
Added ancestral karyotype repertoire (-akr).
0.5.4
Improved the alignmenttrees (-km) effect.
Minor changes to alignmenttrees (-at).
0.5.3
Fixed legend issue with (-kf).
Fixed calculate Ks issue with (-ks).
Improved the karyotype_mapping (-km) effect.
Improved the alignmenttrees (-at) effect.
0.5.2
Fixed some bugs.
0.5.1
Fixed the error of the command (-conf).
Improved the karyotype_mapping (-km) effect.
Added support for low-copy data sets as input to alignmenttrees (-at), for example single-copy_groups.tsv from the sonicparanoid2 software.
0.4.9
Added karyotype_mapping (-km) and karyotype (-k) display.
Changed how the pvalue is extracted in collinearity (-icl), making this parameter more sensitive. It is therefore recommended to set it to 0.2 instead of 0.05.
Changed the drawing of ksfigure (-kf) to improve its appearance.
Citing WGDI
If you use wgdi in your work, please cite:
Sun, P., Jiao, B., Yang, Y., Shan, L., Li, T., Li, X., … & Liu, J. (2022). WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Molecular Plant, 15(12), 1841-1851. https://doi.org/10.1016/j.molp.2022.10.018
Contact
If you have any questions or suggestions, send an email to Pengchuan Sun or submit changes on our GitHub.