Welcome to WGDI’s documentation!

Description

WGDI (Whole-Genome Duplication Integrated analysis) is a Python-based command-line tool that facilitates comprehensive analysis of recursive polyploidization events and cross-species genome alignments. WGDI supports three main workflows (polyploid inference, hierarchical inference of genomic homology, and ancestral chromosome karyotyping) that can improve the detection of WGD and the characterization of WGD-related events based on high-quality chromosome-level genomes. Notably, it can extract complete synteny blocks and facilitate reconstruction of detailed karyotype evolution. This toolkit is freely available on GitHub (https://github.com/SunPengChuan/wgdi).

Introduction

_images/workflow.png

The WGDI workflow consists of three main parts: (1) polyploidy inference using dotplots, collinearity extraction, and Ks distributions; (2) hierarchical inference of genomic homology resulting from recursive paleopolyploidizations; and (3) reconstruction of subgenomes and ancestral genomes, along with other evolutionary scenarios.

WGDI contains multiple subroutines. The user only needs to modify the configuration file and then pass it to the subroutine to be executed, for example wgdi -d your.conf. The subroutines of WGDI are described in detail below.

WGDI subroutine and function

Parameters | Functions | Description
-h | Help | Show help message and exit
-v | Version | Show program’s version number
-d | DotPlot | Show homologous gene dotplot
-icl | Collinearity | Improved version of ColinearScan
-ks | CalKs | Calculate Ka/Ks for homologous gene pairs by YN00
-bi | BlockInfo | Infer whole-genome duplication from collinearity and Ks
-c | Correspondence | Extract event-related genomic alignment
-bk | BlockKs | Show Ks of blocks in a dotplot
-kp | KsPeaks | A simple way to get Ks peaks
-pf | PeaksFit | Gaussian fitting of Ks distribution
-pc | polyploidy_classification | Show event-related genomic alignment in a dotplot
-km | karyotype_mapping | Mapping from the known karyotype result to this species
-k | karyotype | Show genome evolution from reconstructed ancestors
-a | Alignment | Alignment of hierarchical and event-related gene collinearity
-at | AlignmentTrees | Phylogenetic trees constructed by collinear genes
-p | P-index | Polyploidy index, which characterizes the degree of divergence among subgenomes of a polyploid
-r | Retain | Show subgenomes in gene retention or genome fractionation
-ci | Circos | A simple way to run circos
-conf | Configure | Display and modify the environment variable

Installation

WGDI is a Python package and command-line interface (CLI) for the analysis of whole-genome duplications. It can be deployed on Windows, Linux, and macOS, and can be installed via pip or conda.

Bioconda

conda install -c bioconda wgdi

Pypi

pip install wgdi

Github

git clone https://github.com/SunPengChuan/wgdi.git
cd wgdi
python setup.py install

Dependencies

Some parts of WGDI use the following third-party software:

PAML | MAFFT | MUSCLE | PAL2NAL | IQTREE

After you download and install the above packages, run wgdi -conf help > conf.ini to generate a configuration file in which the paths of the installed software can be set.

[ini]
mafft_path = C:\bio\mafft-win\mafft.bat
pal2nal_path = C:\bio\pal2nal.v14\pal2nal.pl
yn00_path = C:\bio\paml4.9j\bin\yn00.exe
muscle_path = C:\bio\muscle3.8.31_i86win32.exe

Add the paths of your software to the conf.ini file, and then run wgdi -conf conf.ini to complete the configuration.

[ini]
mafft_path = /usr/bin/mafft
pal2nal_path = /usr/local/bin/pal2nal.v14/pal2nal.pl
yn00_path = /usr/bin/yn00
muscle_path = /usr/bin/muscle
iqtree_path = /usr/bin/iqtree
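
Before running wgdi -conf conf.ini, it can help to confirm that every configured path actually exists. The following is a minimal Python sketch and not part of WGDI: it parses text in the [ini] layout shown above with the standard configparser module and lists the entries whose paths are missing. The function name missing_tools and the sample text are illustrative assumptions.

```python
import configparser
import os

# Sample text in the same layout as conf.ini above; the paths are placeholders.
CONF_TEXT = """
[ini]
mafft_path = /usr/bin/mafft
pal2nal_path = /usr/local/bin/pal2nal.v14/pal2nal.pl
yn00_path = /usr/bin/yn00
muscle_path = /usr/bin/muscle
iqtree_path = /usr/bin/iqtree
"""


def missing_tools(conf_text):
    """Return the option names in [ini] whose paths do not exist on disk."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return [name for name, path in parser["ini"].items()
            if not os.path.exists(path)]


# Report every configured tool that cannot be found on this machine.
for name in missing_tools(CONF_TEXT):
    print(f"{name}: configured path not found")
```

An empty report only means the paths exist; it does not check that the programs run.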

Uninstall

If you don’t need wgdi, you can uninstall with pip uninstall wgdi or conda remove wgdi.

Usage

WGDI can be used to complete the work of each numbered step shown in the workflow figure.

Dotplot

Dotplot draws a dotplot of homologous genes.

Parameters

Parameters

Standards and instructions

blast_reverse

Type: bool | Default: false

Swap the first two columns of the blast result.

multiple

Type: int | Default: 1

The number of best homologous genes shown as red dots.

score

Type: int | Default: 100

Score value in blast result.

evalue

Type: float | Default: 1e-5

Evalue value in blast result.

repeat_number

Type: int | Default: 10

The maximum number of homologous genes allowed for a gene; homologs beyond this number are removed.

position

Type: {order, start, end} | Default: order

The position of a gene corresponds to the gff file.

ancestor_left

Type: file | Default: none

The ancestral chromosome region of the species on the left of dotplot.

ancestor_top

Type: file | Default: none

The ancestral chromosome region of the species on the top of dotplot.

markersize

Type: float | Default: 0.5

The size of the point in the plot.

figsize

Type: int,int | Default: 10,10

Control the proportion of the size of the saved picture.

savefig

Type: {*.png, *.pdf, *.svg} | Default: *.png

Pictures can be saved in png, pdf, or svg format.

In your working folder, run wgdi -d help >> total.conf to export the parameter template.

[dotplot]
blast = blast file
blast_reverse = false
gff1 =  gff1 file
gff2 =  gff2 file
lens1 = lens1 file
lens2 = lens2 file
genome1_name =  Genome1 name
genome2_name =  Genome2 name
multiple  = 1
score = 100
evalue = 1e-5
repeat_number = 10
position = order
ancestor_left = none
ancestor_top = none
markersize = 0.5
figsize = 10,10
savefig = savefile(.png,.pdf)
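
If the same analysis is repeated for many genome pairs, the [dotplot] section can also be generated by a script. The sketch below is only an illustration that uses Python's standard configparser module with the default values listed above; the function name build_dotplot_conf and all file names are hypothetical.

```python
import configparser
import io


def build_dotplot_conf(blast, gff1, gff2, lens1, lens2, name1, name2, savefig):
    """Return the text of a [dotplot] section filled with the documented defaults."""
    parser = configparser.ConfigParser()
    parser["dotplot"] = {
        "blast": blast,
        "blast_reverse": "false",
        "gff1": gff1,
        "gff2": gff2,
        "lens1": lens1,
        "lens2": lens2,
        "genome1_name": name1,
        "genome2_name": name2,
        "multiple": "1",
        "score": "100",
        "evalue": "1e-5",
        "repeat_number": "10",
        "position": "order",
        "ancestor_left": "none",
        "ancestor_top": "none",
        "markersize": "0.5",
        "figsize": "10,10",
        "savefig": savefig,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical input and output names, for illustration only.
text = build_dotplot_conf("a_vs_a.blast.txt", "a.gff", "a.gff",
                          "a.lens", "a.lens", "genomeA", "genomeA",
                          "a_vs_a.dotplot.png")
print(text)
```

The printed text can be saved as total.conf and passed to wgdi -d.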

Quick start

After modifying the parameters, run wgdi -d total.conf

Example

The original results are easily accessible at wgdi-example

_images/vvi161s_vvi161s.dotplot.order.png _images/vvi161s_vvi161s.dotplot.end.png

Improved collinearity

Collinearity is extracted with a dynamic programming algorithm similar to those used by ColinearScan and MCScan.
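
The idea of chaining homologous gene pairs by dynamic programming can be sketched as follows. This is a simplified illustration, not WGDI's actual algorithm: it assumes each homologous pair is an anchor (position in genome 1, position in genome 2, score), considers only forward-oriented chains, and links two anchors when the gap between them does not exceed a maximum gap in either genome (compare the mg parameter below). The function name best_collinear_chain and the sample anchors are hypothetical.

```python
def best_collinear_chain(anchors, max_gap=(40, 40)):
    """Return (total_score, chain) for the highest-scoring forward chain.

    anchors: iterable of (pos1, pos2, score) tuples, where pos1 and pos2
    are gene order positions in the two genomes.
    """
    pts = sorted(anchors)
    n = len(pts)
    if n == 0:
        return 0, []
    best = [p[2] for p in pts]   # best[i]: best score of a chain ending at anchor i
    prev = [-1] * n              # prev[i]: previous anchor in that chain
    for i in range(n):
        xi, yi, si = pts[i]
        for j in range(i):
            xj, yj, _ = pts[j]
            dx, dy = xi - xj, yi - yj
            # Extend only forward chains whose gaps stay within max_gap.
            if 0 < dx <= max_gap[0] and 0 < dy <= max_gap[1]:
                if best[j] + si > best[i]:
                    best[i] = best[j] + si
                    prev[i] = j
    end = max(range(n), key=best.__getitem__)
    chain = []
    k = end
    while k != -1:               # trace the chain back to its first anchor
        chain.append(pts[k][:2])
        k = prev[k]
    chain.reverse()
    return best[end], chain


# Made-up anchors; the scores mimic the red/blue/gray grading (50/40/25).
anchors = [(1, 1, 50), (2, 2, 50), (3, 10, 25),
           (4, 3, 40), (5, 4, 50), (100, 100, 50)]
score, chain = best_collinear_chain(anchors)
print(score, chain)
```

A full implementation would also handle reverse-oriented blocks, gap penalties, and the statistical evaluation of each block, all of which are omitted here.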

Parameters

Parameters

Standards and instructions

multiple

Type: int | Default: 1

The number of best homologous genes shown as red dots.

evalue

Type: float | Default: 1e-5

Evalue in the blast result.

score

Type: int | Default: 100

Score value in the blast results.

grading

Type: int, int, int | Default: 50, 40, 25

Assign different scores based on the colors in the dotplot, with a default of 50 for red, 40 for blue, and 25 for gray.

mg

Type: int, int | Default: 40, 40

The maximum gap (mg) value is an important parameter for detecting collinear regions.

pvalue

Type: float | Default: 1

Evaluates the compactness and uniqueness of collinear blocks; the range is 0-1, and blocks with better collinearity fall within 0-0.2.

repeat_number

Type: int | Default: 10

The maximum number of homologous genes allowed for a gene; homologs beyond this number are removed.

process

Type: int | Default: 8

Number of parallel processes.

position

Type: order | Default: order

The position of a gene corresponds to the gff file.

In your working folder, run wgdi -icl ? >> total.conf to export the parameter template.

[collinearity]
gff1 = gff1 file
gff2 = gff2 file
lens1 = lens1 file
lens2 = lens2 file
blast = blast file
blast_reverse = false
multiple  = 1
process = 8
evalue = 1e-5
score = 100
grading = 50,40,25
mg = 40,40
pvalue = 0.2
repeat_number = 10
positon = order
savefile = collinearity file

Quick start

After modifying the parameters, run wgdi -icl total.conf

Example

The original results are easily accessible at wgdi-example

_images/collinearity.png

Non-synonymous (Ka) and synonymous (Ks)

Non-synonymous (Ka) and synonymous (Ks) substitution rates are estimated using the Nei-Gojobori method implemented in the YN00 program of PAML (4.9h).

Parameters

Parameters

Standards and instructions

cds_file

Type: file | Default: -

A cds file of one or more genomes.

pep_file

Type: file | Default: -

A protein file for one or more genomes.

align_software

Type: {muscle, mafft} | Default: muscle

Software for multiple sequence alignment.

pairs_file

Type: file | Default: -

ColinearScan or MCScanX result file, or tab-separated gene pairs.

ks_file

Type: file | Default: -

The output file name of ks.

In your working folder, run wgdi -ks ? >> total.conf to export the parameter template.

[ks]
cds_file = cds file
pep_file = pep file
align_software = muscle
pairs_file = gene  pairs file
ks_file = ks result

Quick start

After modifying the parameters, run wgdi -ks total.conf

Example

The original results are easily accessible at wgdi-example

_images/calks.png

BlockInfo

The dot colors from the dotplot, the collinearity result, and the Ks result are integrated into one file. This file contains the main information needed to screen collinear fragments easily.

Parameters

Parameters

Standards and instructions

collinearity

Type: file | Default: -

ColinearScan or MCScanX result file, or tab-separated gene pairs.

score

Type: int | Default: 100

Score value in the blast results.

evalue

Type: float | Default: 1e-5

Evalue value in blast result.

repeat_number

Type: int | Default: 10

The maximum number of homologous genes allowed for a gene; homologs beyond this number are removed.

position

Type: {order} | Default: order

The position of a gene corresponds to the gff file.

ks

Type: file | Default: -

Ks calculation results.

ks_col

Type: str | Default: NG86

The single column of the Ks file to use, calculated by YN00 (ks_YN00) or another method.

savefile

Type: file | Default: * .csv

Result file.

In your working folder, run wgdi -bi ? >> total.conf to export the parameter template.

[blockinfo]
blast = blast file
gff1 =  gff1 file
gff2 =  gff2 file
lens1 = lens1 file
lens2 = lens2 file
collinearity = collinearity file
score = 100
evalue = 1e-5
repeat_number = 10
position = order
ks = ks file
ks_col = ks_NG86
savefile = block information (*.csv)

Quick start

After modifying the parameters, run wgdi -bi total.conf

Example

The original results are easily accessible at wgdi-example

columns

Information

id

Type: str

Unique id.

chr1,chr2

Type: int

Two chromosomes corresponding to collinearity block.

start1,end1

Type: int

The region of chromosome 1 on collinearity block.

start2,end2

Type: int

The region of chromosome 2 on collinearity block.

pvalue

Type: float

Evaluates the compactness and uniqueness of collinear blocks; the range is 0-1, and blocks with better collinearity fall within 0-0.2.

length

Type: int

The length of collinearity block.

ks_median

Type: float

The median of ks value on collinearity block.

ks_average

Type: float

The average of ks value on collinearity block.

homo1-5

Type: float

The average of scores of different color dots (red=1, blue=0, gray=-1) on synteny blocks.

block1,block2

Type: str

Gene pairs in collinear blocks

ks

Type: str

Output result of parameter ks.

tandem_ratio

Type: float

The density of tandem repeats; in general, a synteny block with a value greater than 0.5 is unreliable.

density1,density2

Type: float

Density of collinear blocks.

class1,class2

Type: str

Class of collinear blocks.
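
Because the block information is a plain csv file, blocks can also be screened by hand. The sketch below is an assumption-laden illustration, not WGDI code: it uses Python's csv module on a small made-up table that contains only a subset of the columns listed above, and keeps blocks by pvalue, length, and tandem_ratio. The function name filter_blocks and the threshold defaults are hypothetical choices based on the guidance above.

```python
import csv
import io

# Made-up rows with a subset of the documented columns.
SAMPLE = """id,chr1,chr2,pvalue,length,ks_median,tandem_ratio
1,1,1,0.01,120,1.25,0.02
2,1,3,0.45,8,2.10,0.00
3,2,5,0.03,30,0.31,0.70
"""


def filter_blocks(handle, max_pvalue=0.2, min_length=5, max_tandem_ratio=0.5):
    """Return the rows that pass all three thresholds."""
    kept = []
    for row in csv.DictReader(handle):
        if (float(row["pvalue"]) <= max_pvalue
                and int(row["length"]) >= min_length
                and float(row["tandem_ratio"]) <= max_tandem_ratio):
            kept.append(row)
    return kept


blocks = filter_blocks(io.StringIO(SAMPLE))
for block in blocks:
    print(block["id"], block["chr1"], block["chr2"], block["ks_median"])
```

For a real file, pass an open file handle instead of the in-memory sample.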

Correspondence

Extract event-related genomic alignment.

Parameters

Parameters

Standards and instructions

blockinfo

Type: file | Default: -

Output result of parameter bi

tandem

Type: bool | Default: false

Whether to display the collinearity block that may be generated by tandem.

tandem_length

Type: int | Default: 200

If tandem=true, the maximum range of tandem influence.

pvalue

Type: float | Default: 1

Evaluates the compactness and uniqueness of collinear blocks; the range is 0-1, and blocks with better collinearity fall within 0-0.2.

block_length

Type: int | Default: 5

Show the minimum length of a collinear block.

multiple

Type: int | Default: 1

The number of best homologous genes shown as red dots.

homo

Type: int [1-5] | Default: 1

Evaluates the ratio of the best homologous gene pairs in a collinearity block, with a range of -1 to 1.

savefile

Type: file | Default: * .csv

The resulting file.

In your working folder, run wgdi -c ? >> total.conf to export the parameter template.

[correspondence]
blockinfo =  blockinfo file(.csv)
lens1 = lens1 file
lens2 = lens2 file
tandem = (true/false)
tandem_length = 200
pvalue = 0.2
block_length = 5
multiple  = 1
homo = 0,1
savefile = savefile(.csv)

Quick start

The original results are easily accessible at wgdi-example

BlockKs

BlockKs shows the Ks of blocks in a dotplot.

Parameters

Parameters

Standards and instructions

blockinfo

Type: file | Default: -

Output result of parameter bi

pvalue

Type: float | Default: 1

Evaluates the compactness and uniqueness of collinear blocks; the range is 0-1, and blocks with better collinearity fall within 0-0.2.

tandem

Type: bool | Default: false

Whether to display the collinearity block that may be generated by tandem.

tandem_length

Type: int | Default: 200

If tandem=true, the maximum range of tandem influence.

area

Type: str | Default: -1,3

Show the range of ks.

block_length

Type: int | Default: 5

Show the minimum length of a collinear block.

position

Type: {order, start, end} | Default: order

The position of a gene corresponds to the gff file.

markersize

Type: float | Default: 0.5

The size of the point in the plot.

figsize

Type: int,int | Default: 10,10

Control the proportion of the size of the saved picture.

savefig

Type: {*.png, *.pdf} | Default: *.png

Pictures can be saved in png, pdf, or svg format.

In your working folder, run wgdi -bk ? > blockks.conf to export the parameter template.

[blockks]
lens1 = lens1 file
lens2 = lens2 file
genome1_name =  Genome1 name
genome2_name =  Genome2 name
blockinfo = block information
pvalue = 0.05
tandem = true
tandem_length = 200
markersize = 1
area = 0,2
block_length =  minimum length
figsize = 8,8
savefig = save image

Quick start

After modifying the parameters, run wgdi -bk total.conf

Example

The original results are easily accessible at wgdi-example

_images/vvi161s_vvi161s.blockinfo.png

KsPeaks

KsPeaks is a simple way to get Ks peaks.

Parameters

Standards and instructions

blockinfo

Type: file | Default: -

Output result of parameter bi

pvalue

Type: float | Default: 0.05

Evaluates the compactness and uniqueness of collinear blocks; the range is 0-1, and blocks with better collinearity fall within 0-0.2.

tandem

Type: str | Default: true

The criterion for tandem is a difference of no more than 200 genes in gene position.

block_length

Type: int | Default: 5

Minimum length of collinear blocks.

ks_area

Type: str | Default: 0,10

Show the range of ks.

multiple

Type: int | Default: 1

The number of best homologous genes shown as red dots.

homo

Type: int [1-5] | Default: 1

Evaluates the ratio of the best homologous gene pairs in a collinearity block, with a range of -1 to 1.

fontsize

Type: str | Default: 9

The size of the font.

area

Type: str | Default: 0,3

Show the range of ks.

figsize

Type: int,int | Default: 10,10

Control the proportion of the size of the saved picture.

savefig

Type: {*.png, *.pdf, *.svg} | Default: *.png

Pictures can be saved in png, pdf, or svg format.

savefile

Type: file | Default: * .csv

The resulting file.

In your working folder, run wgdi -kp ? >> total.conf to export the parameter template.

[kspeaks]
blockinfo = block information
pvalue = 0.05
tandem = true
block_length = int number
ks_area = 0,10
multiple = 1
homo = 0,1
fontsize = 9
area = 0,3
figsize = 10,6.18
savefig = saving image
savefile = ks median savefile

Quick start

After modifying the parameters, run wgdi -kp total.conf

Example

The original results are easily accessible at wgdi-example

_images/vvi161s_vvi161s.kspeaks.png
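
One common way to locate peaks in a Ks distribution is to smooth the values with a Gaussian kernel density estimate and report the local maxima. The sketch below illustrates that idea in plain Python; it is an assumption for illustration and not necessarily how wgdi -kp computes its result. The function names, the bandwidth, and the sample values are hypothetical, while the area argument mirrors the area parameter above.

```python
import math


def ks_density(values, grid, bandwidth=0.05):
    """Gaussian kernel density estimate of values, evaluated on grid."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
            for g in grid]


def find_peaks(values, area=(0.0, 3.0), steps=300, bandwidth=0.05):
    """Return the grid positions inside area where the density has a local maximum."""
    low, high = area
    grid = [low + (high - low) * i / steps for i in range(steps + 1)]
    dens = ks_density(values, grid, bandwidth)
    return [grid[i] for i in range(1, steps)
            if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]]


# Made-up median Ks values of collinear blocks, clustered around two events.
ks_values = [0.28, 0.30, 0.32, 1.18, 1.20, 1.22]
print(find_peaks(ks_values))
```

The bandwidth controls how strongly nearby values are merged; too small a value splits one event into several peaks, and too large a value merges separate events.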

PeaksFit

PeaksFit performs Gaussian fitting of the Ks distribution.

Parameters

Parameters

Standards and instructions

blockinfo

Type: file | Default: -

Output result of parameter bi

mode

Type: {median, average, total} | Default: median

Different processing methods of ks value on collinearity block.

tandem_length

Type: int | Default: 200

If tandem=true, the maximum range of tandem influence.

bins_number

Type: int | Default: 200

The number of bins used for the Ks distribution.

fontsize

Type: float | Default: 9

The fontsize of label in the plot.

area

Type: str | Default: -1,3

Show the range of ks.

figsize

Type: int,int | Default: 10,10

Control the proportion of the size of the saved picture.

savefig

Type: {*.png, *.pdf, *.svg} | Default: *.png

Pictures can be saved in png, pdf, or svg format.

In your working folder, run wgdi -pf ? >> total.conf to export the parameter template.

[peaksfit]
blockinfo = block information
mode = median
bins_number = 200
ks_area = 0,10
fontsize = 9
area = 0,3
figsize = 10,6.18
savefig = saving image

Quick start

After modifying the parameters, run wgdi -pf total.conf

Example

The original results are easily accessible at wgdi-example

_images/vvi161s_vvi161s.kspeaks.fit.png

KsFigure

KsFigure is a simple way to draw a Ks distribution plot.

Parameters

Parameters

Standards and instructions

ksfit

Type: file | Default: .csv

Output result of parameter kp or filtered.

labelfontsize

Type: float | Default: 15

The fontsize of the axis labels in the plot.

legendfontsize

Type: float | Default: 15

The fontsize of legend in the plot.

xlabel

Type: str | Default: none

The xlabel of figure.

ylabel

Type: str | Default: none

The ylabel of figure.

title

Type: str | Default: none

The title of figure.

area

Type: str | Default: 0,3

Show the range of ks.

shadow

Type: bool | Default: true

Whether or not the plotted curve is shown with shadows.

figsize

Type: int,int | Default: 10,10

Control the proportion of the size of the saved picture.

savefig

Type: {*.png, *.pdf, *.svg} | Default: *.png

Pictures can be saved in png, pdf, or svg format.

In your working folder, run wgdi -kf ? >> total.conf to export the parameter template.

[ksfigure]
ksfit = ksfit result(*.csv)
labelfontsize = 15
legendfontsize = 15
xlabel = none
ylabel = none
title = none
area = 0,2
figsize = 10,6.18
shadow = (true/false)
savefig =  save image

Quick start

After modifying the parameters, run wgdi -kf total.conf

Example

The original results are easily accessible at wgdi-example

_images/all_ks.svg

Polyploid classification

Polyploid classification distinguishes among the subgenomes of a polyploid.

Parameters

Parameters

Standards and instructions

blockinfo

Type: file | Default: -

Output result of parameter bi

ancestor_left

Type: file | Default: none

The ancestral chromosome region of the species on the left of dotplot.

ancestor_top

Type: file | Default: none

The ancestral chromosome region of the species on the top of dotplot.

diff

Type: float | Default: 0.05

More than this value is considered a significant difference.

classid

Type: str | Default: class1,class2

The class name of the grouping according to the divided ancestral area.

savefile

Type: file | Default: * .csv

The resulting file.

In your working folder, run wgdi -pc ? >> total.conf to export the parameter template.

[polyploidy classification]
blockinfo = block information (*.csv)
ancestor_left = file
ancestor_top = file
diff = 0.05
classid = class1,class2
savefile = result file(.csv)

Quick start

The original results are easily accessible at wgdi-example

ancestral_karyotype

Generation of ancestral karyotypes from chromosomes that retain the same structure in modern species.

Parameters

Parameters

Standards and instructions

gff

Type: file | Default: -

Cat the relevant ‘gff’ files into a file.

pep_file

Type: file | Default: none

Cat the relevant ‘pep.fa’ files into a file.

ancestor

Type: file | Default: none

You must provide this file.

mark

Type: str | Default: aak

A shorthand for ancestral karyotype.

ancestor_gff

Type: file | Default: none

Gff file of the created ancestral karyotype.

ancestor_lens

Type: file | Default: none

Lens file of the created ancestral karyotype.

ancestor_pep

Type: file | Default: -

Pep file of the created ancestral karyotype.

ancestor_file

Type: file | Default: -

Updated the ancestral karyotype file.

In your working folder, run wgdi -ak ? >> total.conf to export the parameter template.

[ancestral_karyotype]
gff = gff file (cat the relevant 'gff' files into a file)
pep_file = pep file (cat the relevant 'pep.fa' files into a file)
ancestor = ancestor file  (this file requires you to provide)
mark = aak
ancestor_gff =  result file
ancestor_lens =  result file
ancestor_pep =  result file
ancestor_file =  result file

Quick start

The original results are easily accessible at wgdi-example

ancestral_karyotype_repertoire

Incorporate genes from collinearity blocks into the ancestral karyotype repertoire.

Parameters

Parameters

Standards and instructions

blockinfo

Type: file | Default: -

Output result of parameter -bi and -c.

blockinfo_reverse

Type: bool | Default: false

The two species aligned swap positions.

gff1

Type: file | Default: -

Cat the relevant ‘gff’ files into a file.

gff2

Type: file | Default: -

Cat the relevant ‘gff’ files into a file.

gap

Type: int | Default: 5

Minimum number of genes bounded by collinear genes corresponds to two ordered protogenes.

ancestor

Type: file | Default: none

You must provide this file.

ancestor_new

Type: file | Default: none

The updated ancestor file of the ancestral karyotype.

mark

Type: str | Default: aak

A shorthand for ancestral karyotype.

ancestor_gff

Type: file | Default: none

Gff file of the created ancestral karyotype.

ancestor_lens

Type: file | Default: none

Lens file of the created ancestral karyotype.

ancestor_pep

Type: file | Default: -

Pep file of the created ancestral karyotype.

ancestor_pep_new

Type: file | Default: -

Updated the pep file of ancestral karyotype.

In your working folder, run wgdi -akr ? >> total.conf to export the parameter template.

[ancestral_karyotype_repertoire]
blockinfo =  block information (*.csv)
# blockinfo: processed *.csv
blockinfo_reverse = False
gff1 =  gff1 file (*.gff)
gff2 =  gff2 file (*.gff)
gap = 5
mark = aak1s
ancestor = ancestor file
ancestor_new =  result file
ancestor_pep =  ancestor pep file
ancestor_pep_new =  result file
ancestor_gff =  result file
ancestor_lens =  result file

Quick start

The original results are easily accessible at wgdi-example

karyotype_mapping

Mapping from the known karyotype result to this species.

Parameters

Parameters

Standards and instructions

blast_reverse

Type: bool | Default: false

The first two columns of the blast result swap positions.

score

Type: int | Default: 100

Score value in blast result.

evalue

Type: float | Default: 1e-5

Evalue value in blast result.

repeat_number

Type: int | Default: 10

The maximum number of homologous genes allowed for a gene; homologs beyond this number are removed.

ancestor_left

Type: file | Default: none

Ancestor location file (Only one of (‘left’, ‘top’) can be reserved)

ancestor_top

Type: file | Default: none

Ancestor location file (Only one of (‘left’, ‘top’) can be reserved)

blockinfo

Type: file | Default: -

blockinfo.csv filtered according to certain conditions ‘-c’

the_other_lens

Type: file | Default: -

The lens file for the species for which you want to generate ancestor files.

limit_length

Type: int | Default: 5

Show the minimum length of blocks.

the_other_ancestor_file

Type: file | Default: * .csv

The resulting file.

In your working folder, run wgdi -km ? >> total.conf to export the parameter template.

[karyotype_mapping]
blast = blast file
blast_reverse = false
gff1 = gff1 file
gff2 = gff2 file
score = 100
evalue = 1e-5
repeat_number = 5
ancestor_left = ancestor location file (Only one of ('left', 'top') can be reserved)
ancestor_top = ancestor location file
the_other_lens = the other lens file
blockinfo = block information (*.csv)
blockinfo_reverse = false
limit_length = 5
the_other_ancestor_file =  result file

Quick start

The original results are easily accessible at wgdi-example

Karyotype

Show genome evolution from reconstructed ancestors.

Parameters

Parameters

Standards and instructions

ancestor

Type: file | Default: -

The species’ ancestral chromosome regions, the result of parameter ‘-km’.

width

Type: float | Default: 0.5

The width of the chromosomes in the figure.

figsize

Type: int,int | Default: 10,10

Control the proportion of the size of the saved picture.

savefile

Type: file | Default: * .csv

The resulting file.

In your working folder, run wgdi -k ? >> total.conf to export the parameter template.

[karyotype]
ancestor = ancestor chromosome file
width = 0.5
figsize = 10,6.18
savefig = save image(.png, .pdf, .svg)

Quick start

The original results are easily accessible at wgdi-example

_images/vvi161s.ancestor.svg

Shared_fusions

Quickly find shared fusions between species.

Parameters

Parameters

Standards and instructions

blockinfo

Type: file | Default: -

blockinfo.csv filtered according to certain conditions ‘-c’

lens1

Type: file | Default: none

Lens file and new lens file created by this program.

lens2

Type: file | Default: none

Lens file and new lens file created by this program.

ancestor_left

Type: file | Default: none

Ancestor location file.

ancestor_top

Type: file | Default: none

Ancestor location file.

classid

Type: str | Default: class1,class2

The class name of the grouping according to the divided ancestral area.

limit_length

Type: int | Default: 5

Show the minimum length of blocks.

filtered_blockinfo

Type: file | Default: * .csv

blockinfo.csv filtered by this program.

In your working folder, run wgdi -sf ? >> total.conf to export the parameter template.

[shared_fusion]
blockinfo = block information (*.csv)
lens1 = lens file, new lens file
lens2 =  lens file,  new lens file
ancestor_left = ancestor file
ancestor_top = ancestor file
classid = class1,class2
limit_length = 5
filtered_blockinfo = result blockinfo (.csv)

Quick start

The original results are easily accessible at wgdi-example

Alignment

Alignment of hierarchical and event-related gene collinearity.

Parameters

Parameters

Standards and instructions

blockinfo

Type: file | Default: -

Output result of parameter bi

colors

Type: { color1,color2,color3,– } | Default: red,blue,green

Set multiple sets of colors based on grouping, split with a comma.

position

Type: {order, start, end} | Default: order

The position of the gene corresponds to the gff file.

blockinfo_reverse

Type: bool | Default: false

Swap the positions of the two aligned species.

figsize

Type: int,int | Default: 10,10

Control the proportion of the size of the saved picture.

savefile

Type: file | Default: * .csv

Result file.

markersize

Type: float | Default: 0.5

The size of the point in the plot.

savefig

Type: {*.png, *.pdf, *.svg} | Default: *.png

Pictures can be saved in png, pdf, or svg format.

In your working folder, run wgdi -a ? > align.conf to export the parameter template.

[alignment]
gff1 =  gff1 file
gff2 =  gff2 file
lens1 = lens1 file
lens2 = lens2 file
genome1_name =  Genome1 name
genome2_name =  Genome2 name
markersize = 0.5
position = order
colors = red,blue,green
figsize = 10,10
savefile = savefile(.csv)
savefig= save image
blockinfo = block information file
blockinfo_reverse = false

Quick start

After modifying the parameters, run wgdi -a total.conf

Example

The original results are easily accessible at wgdi-example

_images/soin164s_oda165s_ono163s.alignment.png

Alignmenttrees

Phylogenetic tree construction using alignment or collinear genes. During this process, a phylogenetic tree is constructed for the gene set in each row of the alignment file. By merging the trees with ASTRAL, you can obtain a coalescent tree of multiple species or subgenomes.

Parameters

Parameters

Standards and instructions

alignment

Type: file | Default: -

The merged file of multiple alignment files with the same reference.

gff

Type: file | Default: -

Gff of the reference. If the alignment has no reference species, delete it.

lens

Type: file | Default: -

Lens of the reference. If the alignment has no reference species, delete it.

dir

Type: folder | Default: -

Folder for phylogenetic trees.

sequence_file

Type: file | Default: -

In general, these are protein sequences (pep); if they are coding sequences (cds), the cds_file parameter needs to be deleted.

cds_file

Type: file | Default: none

It is required when the method of constructing trees involves codons. Otherwise, it can be omitted.

codon_positon

Type: str | Default: 1,2,3

1,2 keeps codon positions 1 and 2; 1,2,3 removes no codon position.

trees_file

Type: file | Default: -

Merge multiple nwk-format tree files.

align_software

Type: {muscle, mafft} | Default: muscle

Software for multiple sequence alignment.

tree_software

Type: {iqtree, fasttree} | Default: iqtree

Software for constructing phylogenetic trees.

model

Type: str | Default: -

The substitution model passed to the tree software (for example, MFP).

trimming

Type: {trimal, divvier} | Default: none

Software for removing spurious sequences.

minimum

Type: int | Default: 4

Minimum number of gene sets in constructing phylogenetic tree.

delete_detail

Type: bool | Default: false

Whether or not to keep intermediate files when constructing phylogenetic tree.

In your working folder, run wgdi -at ? >> total.conf to export the parameter template. Afterwards, run ASTRAL with java -jar /path/astral.5.7.7.jar -i trees_file.nwk -o out.tre -t 8 to obtain a coalescent tree of multiple species or subgenomes.

[alignmenttrees]
alignment = alignment file (.csv)
gff = gff file (reference genome, If alignment has no reference species, delete it)
lens = lens file (If alignment has no reference species, delete it)
dir = output folder
sequence_file = sequence file (.fa)
cds_file = cds file (.fa)
codon_positon = 1,2,3  (1,2 mean codon1&2; 1,2,3 mean no codon removed)
trees_file =  trees (.nwk)
align_software = mafft
tree_software =  (iqtree,fasttree)
model = MFP
trimming =  trimal
minimum = 4
delete_detail = true
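
The effect of the codon_positon setting can be pictured with a small helper that keeps only the chosen positions of every codon in a coding sequence. This is a hedged sketch of the concept rather than WGDI's own code; the function name keep_codon_positions is hypothetical, and the input is assumed to be an in-frame sequence.

```python
def keep_codon_positions(cds, positions=(1, 2, 3)):
    """Keep only the given codon positions (1-based) of an in-frame sequence."""
    keep = {p - 1 for p in positions}
    return "".join(base for i, base in enumerate(cds) if i % 3 in keep)


# With 1,2 the third position of every codon is dropped.
print(keep_codon_positions("ATGGCCTAA", (1, 2)))
# With 1,2,3 the sequence is returned unchanged.
print(keep_codon_positions("ATGGCCTAA", (1, 2, 3)))
```

Dropping the third codon position is a common way to reduce the influence of saturated synonymous sites on tree building.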

Quick start

After modifying the parameters, run wgdi -at total.conf

Example

The original results are easily accessible at wgdi-example

Retain

Retain shows the gene retention or genome fractionation of subgenomes.

Parameters

Parameters

Standards and instructions

alignment

Type: file | Default: -

Output result of parameter a

colors

Type: { color1,color2,color3,– } | Default: red,blue,green

Set multiple sets of colors based on grouping, split with a comma.

refgenome

Type: str | Default: -

A shorthand for the reference species.

figsize

Type: int,int | Default: 10,10

Control the proportion of the size of the saved picture.

step

Type: int | Default: 50

The size of the sliding window.

ylabel

Type: str | Default: none

The y-axis label of the picture.

savefile

Type:file | Default: -

Results of the drawing.

savefig

Type: {*.png, *.pdf, *.svg} | Default: *.png

Pictures can be saved in png, pdf, or svg format.

In your working folder, run wgdi -r ? >> total.conf to export the parameter template.

[retain]
alignment = alignment file
gff = gff file
colors = red,blue,green
refgenome = shorthand
figsize = 10,12
step = 50
ylabel = y label
savefile = retain file
savefig = result

Quick start

After modifying the parameters, run wgdi -r total.conf

Example

The original results are easily accessible at wgdi-example

_images/oin164s_oda165s_ono163s.alignment.retain.svg
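
Gene retention along a reference chromosome can be summarized with a sliding window, which is the role of the step parameter above. The following sketch shows one way to compute such a curve. It is an illustration under stated assumptions, not WGDI's implementation: one column of the alignment is assumed to be a list ordered by reference gene position, and a lost gene is assumed to be marked by an empty string or a dot. The function name retention_rates is hypothetical.

```python
def retention_rates(column, step=50, missing=("", ".")):
    """Return the fraction of retained genes in each sliding window of size step."""
    present = [0 if gene in missing else 1 for gene in column]
    if len(present) < step:
        return []
    window = sum(present[:step])
    rates = [window / step]
    for i in range(step, len(present)):
        # Slide the window by one gene: add the new gene, drop the oldest.
        window += present[i] - present[i - step]
        rates.append(window / step)
    return rates


# Made-up column of a subgenome aligned to six reference genes.
column = ["g1", "g2", ".", "g4", "", "g6"]
print(retention_rates(column, step=3))
```

Plotting these rates for each subgenome column gives curves comparable in spirit to the retention figure above.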

P-index

The polyploidy index (P-index) characterizes the degree of divergence between the subgenomes of a polyploid and indicates whether gene removal from the homoeologous regions has been balanced or unbalanced.

Parameters

Parameters

Standards and instructions

alignment

Type: file | Default: -

Output result of parameter a

gap

Type: int | Default: 50

The size of the sliding window.

colors

Type: { color1,color2,color3,– } | Default: red,blue,green

Set multiple sets of colors based on grouping, split with a comma.

retention

Type: float | Default: 0.05

Threshold for low retention: regions where the retention rate of a subgenome relative to the reference genome falls below this value (0.05 by default).

diff

Type: float | Default: 0.05

Differences greater than this value are considered significant.

remove_delta

Type: bool | Default: true

Whether to remove the parameter retention.

savefile

Type: file | Default: -

Results of the drawing.

In your working folder, run wgdi -p ? >> total.conf to append the parameter template to total.conf.

[pindex]
alignment = alignment file
gff = gff file
lens = lens file
gap = 50
retention = 0.05
diff = 0.05
remove_delta = (true/false)
savefile = result file

Quick start

After setting the parameters, run wgdi -p total.conf

Example

A detailed explanation is given in the published article.

Circos

Circos provides a simple way to draw circos plots.

Parameters

Parameters

Standards and instructions

alignment

Type: file | Default: -

Output of the alignment subroutine (-a).

radius

Type: float | Default: 0.2

Radius, value between 0-1.

angle_gap

Type: float | Default: 0.05

Gap between chromosomes.

ring_width

Type: float | Default: 0.015

The width of the ring.

colors

Type: {chr1:color1,chr2:color2,...} | Default: 1:red,2:orange,3:blue,4:cyan,5:green

Comma-separated chr:color pairs, one per chromosome group.

position

Type: {order, start, end} | Default: order

Which position of a gene to use, taken from the corresponding column of the gff file.

chr_label

Type: str | Default: -

A shorthand for chromosomes.

column_names

Type: str | Default: -

Column markers of the alignment file.

ancestor_location

Type: file | Default: none

The ancestral chromosome region of the species.

alignment

Type: file | Default: -

Alignment of hierarchical and event-related gene collinearity.

figsize

Type: int,int | Default: 10,10

Controls the size and proportions of the saved figure.

savefig

Type: {*.png, *.pdf, *.svg} | Default: *.png

The figure can be saved in png, pdf, or svg format.

In your working folder, run wgdi -ci ? >> total.conf to append the parameter template to total.conf.

[circos]
gff = gff file
lens = lens file
radius = 0.2
angle_gap = 0.05
ring_width = 0.015
colors = 1:c,2:m,3:blue,4:gold,5:red,6:lawngreen,7:darkgreen,8:k,9:darkred,10:gray
alignment = alignment file
chr_label = chr
ancestor_location = ancestor file
ancestor = alignment file
figsize = 10,10
label_size = 9
columns_name = 1,2,3,4,5
savefig = result(.png, .pdf, .svg)
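The colors value in this block is a comma-separated list of chr:color pairs. The sketch below shows one way such a string could be split into a lookup table. It is an illustration only; parse_chr_colors is a hypothetical helper, not a WGDI function.

```python
def parse_chr_colors(spec):
    """Turn '1:red,2:orange' into {'1': 'red', '2': 'orange'}.

    Hypothetical helper for illustration; empty items (for example from a
    trailing comma) are skipped.
    """
    colors = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        chrom, _, color = item.partition(":")
        colors[chrom.strip()] = color.strip()
    return colors


circos_colors = parse_chr_colors("1:c,2:m,3:blue,4:gold,5:red,")
print(circos_colors)
```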

Quick start

After setting the parameters, run wgdi -ci total.conf

Example

_images/3.png

Common file

  • conf

The conf file contains the parameters required by the corresponding subroutine and is read when WGDI runs. Use wgdi -* ? > *.conf to write a configuration template to the current directory, then edit it before running. If you don’t know which files are needed, run wgdi -* ?, wgdi -* help, or wgdi -* example; these three commands are equivalent.

In the conf file, gff1, lens1, and genome1_name refer to species 1, and gff2, lens2, and genome2_name refer to species 2. These parameters are not explained again in this documentation.
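Because a conf file is organized as named sections of key = value lines, it can be inspected with Python's standard configparser module before a run. The sketch below is only a convenience for checking your own settings. How WGDI itself parses the file is not described here, and the file names in CONF_TEXT are placeholders.

```python
import configparser

# Hypothetical conf contents; the file names are placeholders, not real data.
CONF_TEXT = """
[retain]
alignment = example.alignment.txt
gff = example.gff
colors = red,blue,green
refgenome = ex
figsize = 10,12
step = 50
ylabel = Retention rate
savefile = example.retain.txt
savefig = example.retain.png

[pindex]
alignment = example.alignment.txt
gap = 50
retention = 0.05
diff = 0.05
remove_delta = true
savefile = example.pindex.txt
"""


def read_section(conf_text, section):
    """Return one section of a conf file as a plain dict of strings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return dict(parser[section])


retain = read_section(CONF_TEXT, "retain")
step = int(retain["step"])                                  # window size
figsize = tuple(int(v) for v in retain["figsize"].split(","))
colors = retain["colors"].split(",")
print(step, figsize, colors)
```

Reading one section at a time mirrors the way a single total.conf can hold the settings for several subroutines.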

  • gff

Column  Information  Explanation
1       Chr          Chromosome number
2       Id           Gene name
3       Start        The starting location of a gene
4       End          The ending location of a gene
5       Direction    Direction of a gene sequence
6       Order        Order of the gene on its chromosome, starting from 1
7       Original     Original id; not read
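As a sketch of how this seven-column layout can be read programmatically, the example below parses rows into named records. It assumes the file is tab-separated, which is not stated above. The Gene record and read_gff function are hypothetical names, and the two example rows are made up.

```python
import csv
import io
from typing import NamedTuple


class Gene(NamedTuple):
    chrom: str
    gene_id: str
    start: int
    end: int
    direction: str
    order: int
    original_id: str


def read_gff(handle):
    """Parse the gene table described above (assumed tab-separated)."""
    genes = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 6:
            continue  # skip blank or incomplete lines
        chrom, gene_id, start, end, direction, order = row[:6]
        original = row[6] if len(row) > 6 else ""  # column 7 is optional here
        genes.append(
            Gene(chrom, gene_id, int(start), int(end), direction, int(order), original)
        )
    return genes


# Made-up rows for illustration only.
EXAMPLE_GFF = (
    "1\tex1g00010\t1000\t2500\t+\t1\tgeneA.t1\n"
    "1\tex1g00020\t4000\t5200\t-\t2\tgeneB.t1\n"
)
genes = read_gff(io.StringIO(EXAMPLE_GFF))
print(genes[0])
```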

  • lens

Column  Information  Explanation
1       Chr          Chromosome number
2       Length       Length of the chromosome sequence
3       Number       Number of genes on the chromosome
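Since column 3 of the lens file holds the number of genes per chromosome and the gff Order column counts genes from 1, the two files can be cross-checked. The sketch below does this under the assumption that the lens file is whitespace-separated. read_lens and check_gene_counts are hypothetical helpers, and the values are made up.

```python
def read_lens(lines):
    """Parse lens lines into {chrom: (length, gene_number)}."""
    lens = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        chrom, length, number = fields[:3]
        lens[chrom] = (int(length), int(number))
    return lens


def check_gene_counts(lens, gene_orders):
    """Return chromosomes whose highest gff order differs from the lens count.

    gene_orders: iterable of (chrom, order) pairs taken from the gff file.
    """
    highest = {}
    for chrom, order in gene_orders:
        highest[chrom] = max(highest.get(chrom, 0), order)
    return sorted(
        chrom for chrom, (_, number) in lens.items()
        if highest.get(chrom, 0) != number
    )


# Made-up values for illustration only.
lens = read_lens(["1\t30000000\t3\n", "2\t25000000\t2\n"])
orders = [("1", 1), ("1", 2), ("1", 3), ("2", 1)]
mismatched = check_gene_counts(lens, orders)
print(mismatched)
```

A non-empty result suggests that the lens and gff files were produced from different gene sets.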

  • blast

The protein-coding genes of each genome are compared against themselves and against those of other genomes using BLASTP (e-value < 1e-5, outfmt = 6) or similar protein sequence search software (MMseqs2, DIAMOND).
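If a search was run with a looser cutoff, the tabular output can be filtered afterwards. The sketch below relies on the default BLAST outfmt 6 layout, in which the e-value is the 11th column and the bit score the 12th. filter_blast is a hypothetical helper and the two hits are made up.

```python
def filter_blast(lines, max_evalue=1e-5):
    """Keep outfmt 6 hits whose e-value is below max_evalue.

    Returns (query, subject, evalue, bitscore) tuples.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete default outfmt 6 record
        evalue = float(fields[10])
        if evalue < max_evalue:
            kept.append((fields[0], fields[1], evalue, float(fields[11])))
    return kept


# Made-up hits for illustration only.
EXAMPLE_BLAST = [
    "g1\tg2\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380\n",
    "g1\tg3\t35.0\t80\t50\t2\t10\t90\t5\t85\t0.002\t40.1\n",
]
hits = filter_blast(EXAMPLE_BLAST)
print(hits)
```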

  • ancestor file

Required file for karyotype evolution analysis.

Column  Information  Explanation
1       Chr          Chromosome number
2       Start        Start of the region homologous to a protochromosome in this genome
3       End          End of the region homologous to a protochromosome in this genome
4       Color        Color assigned to the protochromosome
5       Subgenomes   Subgenome assignment according to the protochromosomes
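The five columns can be grouped by chromosome for a quick look at how each chromosome is painted. The sketch below assumes whitespace-separated fields and integer start and end values, neither of which is stated above. Region and read_ancestor are hypothetical names and the rows are made up.

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    color: str
    subgenome: str


def read_ancestor(lines):
    """Group ancestor-file rows by chromosome (assumed whitespace-separated)."""
    regions = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        chrom, start, end, color, subgenome = fields[:5]
        regions[chrom].append(Region(chrom, int(start), int(end), color, subgenome))
    return dict(regions)


# Made-up rows for illustration only.
EXAMPLE_ANCESTOR = [
    "1\t1\t1500\tred\t1\n",
    "1\t1501\t3200\tblue\t2\n",
    "2\t1\t2800\tgreen\t1\n",
]
ancestor = read_ancestor(EXAMPLE_ANCESTOR)
print(ancestor["1"])
```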

Tips

  • Running wgdi -conf ? > total.conf generates a total.conf file containing all parameters. After you edit the parameters and run WGDI, only the section of total.conf that corresponds to your command is read.

  • Once WGDI is started in a folder, it generates the results in the background, so you can leave that folder and start working in the next one.

  • WGDI reads the conf file in the current folder, so you can copy a conf file in bulk to the target folders and adjust the parameters in each copy.
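The bulk-copy tip can be scripted. The sketch below writes one total.conf per folder from a shared template and overrides selected parameters in each copy. It is an illustration under assumptions: write_conf_copies, the folder names, and the template values are hypothetical, and configparser rewrites the file, so comments in the template would be lost.

```python
import configparser
import tempfile
from pathlib import Path

# Hypothetical template; the file names are placeholders.
TEMPLATE = """
[pindex]
alignment = example.alignment.txt
gap = 50
retention = 0.05
diff = 0.05
remove_delta = true
savefile = example.pindex.txt
"""


def write_conf_copies(template_text, section, overrides_by_folder, root):
    """Write root/<folder>/total.conf for each folder with its own overrides."""
    written = []
    for folder, overrides in overrides_by_folder.items():
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(template_text)
        for key, value in overrides.items():
            parser[section][key] = str(value)
        target = Path(root) / folder
        target.mkdir(parents=True, exist_ok=True)
        conf_path = target / "total.conf"
        with conf_path.open("w") as handle:
            parser.write(handle)
        written.append(conf_path)
    return written


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as root:
    paths = write_conf_copies(
        TEMPLATE, "pindex", {"speciesA": {"gap": 50}, "speciesB": {"gap": 100}}, root
    )
    check = configparser.ConfigParser(interpolation=None)
    check.read(paths[1])
    gap_b = check["pindex"]["gap"]
    folder_names = [p.parent.name for p in paths]
print(folder_names, gap_b)
```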

Examples

Introductory examples for WGDI will be added soon. Some test data are also available at wgdi-example.

Changelog

0.6.1

  • Fixed issue with alignment (-a). Only version 0.6.0 has this bug.

0.6.0

  • Fixed issue with improved collinearity (-icl).

  • Added a parameter ‘tandem_ratio’ to blockinfo (-bi).

0.5.9

  • Updated the improved collinearity (-icl). Faster than before, but still slower than MCScanX and JCVI.

  • Fixed issue with ancestral karyotype repertoire (-akr).

0.5.8

  • Fixed issue with gene names (-ks).

0.5.7

  • Fixed issue with chromosome order (-ak).

  • Attempted to fix the issue with gene names (-ks). The issue is not fully fixed in this version; please install the latest version.

0.5.5 and 0.5.6

  • Added ancestral karyotype (-ak).

  • Added ancestral karyotype repertoire (-akr).

0.5.4

  • Improved the alignmenttrees (-km) effect.

  • Minor changes to alignmenttrees (-at).

0.5.3

  • Fixed a legend issue with ksfigure (-kf).

  • Fixed a Ks calculation issue (-ks).

  • Improved the karyotype_mapping (-km) effect.

  • Improved the alignmenttrees (-at) effect.

0.5.2

  • Fixed some bugs.

0.5.1

  • Fixed the error of the command (-conf).

  • Improved the karyotype_mapping (-km) effect.

  • Added support for low-copy data sets in alignmenttrees (-at), for example single-copy_groups.tsv from the sonicparanoid2 software.

0.4.9

  • Added karyotype_mapping (-km) and karyotype (-k) display.

  • Changed how the pvalue is calculated when extracting collinearity (-icl), making this parameter more sensitive. It is therefore recommended to set it to 0.2 instead of 0.05.

  • Changed the drawing of ksfigure (-kf) to improve its appearance.

Citing WGDI

If you use WGDI in your work, please cite:

Sun, P., Jiao, B., Yang, Y., Shan, L., Li, T., Li, X., … & Liu, J. (2022). WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Molecular Plant, 15(12), 1841-1851. https://doi.org/10.1016/j.molp.2022.10.018

Contact

If you have any questions or suggestions, email Pengchuan Sun or submit changes on our GitHub.