
Silybum marianum (L.) Gaertner. (also known as milk thistle) is a species of the Asteraceae family that is native to the Mediterranean area. It is an annual or biennial, self-fertile plant that grows wild throughout the region (Hetz et al. 1993, Leng-Peschlow 1996). Milk thistle is a serious weed in many countries (LeRoy et al. 1997). It grows preferentially in fertile soils, but it can also grow successfully in sandy soils and heavier clay soils (Khan et al. 2009, Karkanis et al. 2011). It tends to occupy areas and eliminate other plant species through competition (Berner et al. 2002). Silybum marianum contains silymarin, which has hepatoprotective effects. Silymarin accumulates at high levels in the external cover of Silybum marianum seeds and is composed of flavonolignan isomers (silybin, isosilybin, silychristin, isosilychristin, and silydianin) (Deep et al. 2008, Valková et al. 2021). Silybin is the principal active compound (Saller et al. 2001). Silybum marianum is a troublesome weed, but it can also be cultivated as a medicinal plant because of its silymarin content.
Mitochondria are membrane-bound cell organelles that play key roles in apoptosis regulation and energy production (Susin et al. 1999). The mitogenome of plants shows high rates of gene loss, accompanying gene transfer to the nucleus, intron acquisition by cross-species horizontal transfer (Palmer et al. 2000), and genetic variability in terms of repetitive sequences, non-coding regions, large introns, frequent duplications, and intergenic alterations (Kenji et al. 1992, Unseld et al. 1997). Therefore, plant mitogenomes show considerable variations in their length, gene order, and gene content (Richardson et al. 2013). In angiosperms, the mitogenome size ranges from 200 to 750 kilobases (kb) (Gualberto et al. 2014). Animal mitogenomes, which are approximately 16.5 kb long, are smaller than those of plants, but the number of encoded genes in plant mitogenomes is smaller than that in animals. Moreover, although the mitogenome size differs between plants, the number of genes in each genome is similar (Morley & Neilsen 2017). The genes encoded by the mitogenome fall into different functional classes such as respiration, oxidative phosphorylation, rRNAs, tRNAs, ribosomal proteins, elongation factor Tu (EF-Tu), RNA maturation, protein import, maturation, and transcription (Burger et al. 2003).
The GenBank Organelle Genome Resource (https://www.ncbi.nlm.nih.gov/genome/organelle/) contains approximately 7460 and 450 reference chloroplast and mitochondrion genomes, respectively. By searching mitochondrial reference genomes, we found that the land plant subgroup accounted for approximately 337 (74%) of the deposited mitogenome sequences and that another subgroup related to green algae accounted for the remaining mitogenome sequences. The land plant subgroup is divided into bryophytes and tracheophytes. Approximately 262 (78%) of the land plant mitogenome sequences belong to the tracheophyte division. In the tracheophyte division, reference mitogenomes are available for 18 species in the Asteraceae family that are members of 8 different genera, including Ageratum (NC_053927.1, Ageratum conyzoides), Arctium (NC_058644.1, Arctium lappa; NC_058643.1, Arctium tomentosum), Bidens (NC_060635.1, Bidens bipinnata; NC_062672.1, Bidens biternata; NC_062670.1, Bidens parviflora; NC_062673.1, Bidens pilosa; NC_062671.1, Bidens tripartita), Chrysanthemum (NC_039757.1, Chrysanthemum boreale), Diplostephium (NC_034354.1, Diplostephium hartwegii), Helianthus (NC_023337.1, Helianthus annuus; NC_051989.1, Helianthus grosseserratus; NC_058584.1, Helianthus occidentalis; NC_051990.1, Helianthus strumosus; NC_058585.1, Helianthus tuberosus), Lactuca (NC_042406.1, Lactuca saligna; NC_042756.1, Lactuca sativa), and Saussurea (NC_059793.1, Saussurea costus). The complete chloroplast genome of Silybum marianum was derived from a plant of ‘SMAR20150709’ (unpublished) and deposited under Accession Number NC_028027.1, but the mitogenome has not been reported. In this study, the mitogenome features of Silybum marianum were analyzed and compared with the published reference mitogenomes of plants in the Asteraceae family.
Silybum marianum DNA was extracted from plants with an unknown genetic source (‘912036’) provided by EL&I, Co., Ltd. in Gyeonggi-do, Korea (Shim et al. 2020). The DNA was sequenced using long-read and short-read sequencing. For long-read sequencing, the Oxford Nanopore PromethION platform was used with the FLO-PRO002 flow cell type, and the libraries were prepared using the SQK-LSK110 Kit. For short-read sequencing, Illumina sequencing libraries were prepared using the TruSeq Nano DNA Kit and sequenced using the Illumina HiSeq X platform (151 base-pair [bp] paired-end reading). Long-read sequencing generated 3,063,041 reads, with a read-length N50 value of 39,844 bp and a mean read length of 25,239.3 bp, for a total of 77,308,942,612 bp. Short-read sequencing generated 195,748,964 reads containing 29,558,093,564 bp. The Oxford Nanopore long reads were used to assemble the mitogenome of Silybum marianum using NextDenovo software (version 2.3.1), and Illumina short reads were used to correct the assembled data using NextPolish software (version 1.3.1). The default parameters of the NextDenovo and NextPolish software tools were used. The assembled and corrected data comprised 705,967,878 bp in 67 contigs with an average length of 10,536,834 bp, a maximum length of 49,987,051 bp, an N50 value of 27,691,683 bp (11 contigs; n=11), an N70 value of 22,464,351 bp (n=17), and an N90 value of 12,772,082 bp (n=25). To identify the mitogenome, these contigs were compared with the reference mitogenomes of plants deposited in the National Center for Biotechnology Information (NCBI) by performing BLASTn analysis. Based on ≥99% sequence identity and ≥5 kb of identical matches, a self-looping contig was selected as the potential mitogenome of Silybum marianum.
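The N50, N70, and N90 statistics quoted for the assembly, together with their contig counts (n), follow a simple definition: contigs are sorted from longest to shortest, and Nx is the length of the contig at which the running total first reaches x% of the assembly size. The following Python sketch illustrates that definition only. The function name `nx` and the toy contig lengths are hypothetical and are not taken from the assembly reported here.

```python
def nx(lengths, fraction=0.5):
    """Return (Nx length, number of contigs needed) for a list of contig lengths.

    Contigs are sorted longest first; Nx is the length of the contig at which
    the cumulative length first reaches `fraction` of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly (total length 150), not the real contig set.
toy_contigs = [50, 40, 30, 20, 10]
n50_length, n50_count = nx(toy_contigs, 0.5)
n90_length, n90_count = nx(toy_contigs, 0.9)
```

Applied to the real contig lengths, the same function would be expected to return the reported pairs, such as 27,691,683 bp with n=11 for a fraction of 0.5.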
The potential mitogenome was initially annotated using a publicly available web-based tool (MITOFY; https://vcru.wisc.edu/cgi-bin/mitofy/mitofy.cgi) to identify genes. Subsequently, the web-based tool GeSeq (which employs tRNAscan-SE software, version 2.0.7) was used for annotation by comparison with three reference mitogenomes from NCBI GenBank (NC_058644.1, Arctium lappa; NC_058643.1, Arctium tomentosum; NC_059793.1, Saussurea costus). The three reference mitogenomes were selected based on ≥99% sequence identity and ≥20 kb of identical matches with the contig using BLASTn. The GenBank file resulting from the GeSeq annotation was edited based on the MITOFY annotation results and used to draw a circular mitogenome map with OGDRAW.
Eleven reference mitogenomes (NC_058643.1, Arctium tomentosum; NC_058644.1, Arctium lappa; NC_059793.1, Saussurea costus; NC_058584.1, Helianthus occidentalis; NC_058585.1, Helianthus tuberosus; NC_051990.1, Helianthus strumosus; NC_051989.1, Helianthus grosseserratus; NC_039757.1, Chrysanthemum boreale; NC_034354.1, Diplostephium hartwegii; NC_042756.1, Lactuca sativa; NC_042406.1, Lactuca saligna) for the Asteraceae family were selected based on ≥99% sequence identity and ≥5 kb of identical matches when the contig (i.e., the potential mitogenome) was analyzed by BLASTn. Additionally, four reference mitogenomes (NC_037949.1, Codonopsis lanceolata; NC_035958.1, Platycodon grandiflorus; NC_006581.1, Nicotiana tabacum; NC_035963.1, Solanum lycopersicum) of the outgroup were used as controls for the plant mitogenomes. The amino acid sequences of common protein-coding genes in 16 mitogenomes (the Silybum marianum mitogenome, 11 reference mitogenomes for Asteraceae members, and 4 outgroups) were used to construct a phylogenetic tree; the reference sequences were downloaded from NCBI GenBank. The amino acid sequences corresponding to each protein-coding gene of the milk thistle and reference plant mitogenomes were aligned using MAFFT (version 7.505) (Katoh & Standley 2013). TrimAl (version 1.4.rev15) was used to trim the aligned amino acid sequences and remove spurious sequences or poorly aligned regions. After alignment and trimming, the amino acid sequences were used as input data for the IQ-TREE tool (version 2.2.0). The IQ-TREE analysis generated a tree in NEWICK format, which was visualized as a phylogenetic tree using the FigTree tool (version 1.4.4).
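The text does not state how the trimmed per-gene alignments were combined before tree inference. One common approach, shown here purely as an assumption, is to concatenate them into a single supermatrix with one row per mitogenome. In the Python sketch below, the function name `concatenate`, the taxon labels, and the toy sequences are all hypothetical.

```python
def concatenate(alignments, taxa):
    """Join per-gene alignments into one supermatrix.

    alignments: dict mapping gene name -> dict mapping taxon -> aligned sequence.
    taxa: list of taxon names, which fixes the row order of the output.
    Genes are joined in alphabetical order; a taxon missing from a gene is
    padded with gap characters so that all rows keep the same length.
    """
    rows = {taxon: [] for taxon in taxa}
    for gene in sorted(alignments):
        block = alignments[gene]
        widths = {len(seq) for seq in block.values()}
        if len(widths) != 1:
            raise ValueError(f"sequences for {gene} are not aligned to one length")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(block.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Hypothetical toy input: two genes, two taxa, already aligned and trimmed.
toy = {
    "atp1": {"taxon_x": "MK-", "taxon_y": "MKL"},
    "cox1": {"taxon_x": "AA", "taxon_y": "A-"},
}
supermatrix = concatenate(toy, ["taxon_x", "taxon_y"])
```

Because only genes shared by all 16 mitogenomes were analyzed, the gap padding would not be triggered for the dataset described here; it is included only to keep the sketch safe on incomplete input.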
Simple-sequence repeats (SSRs) in the reference mitogenomes and the Silybum marianum mitogenome were identified using the MISA web server (https://webblast.ipk-gatersleben.de/misa/), with motif sizes of one to six nucleotides and minimum repeat numbers of 8, 4, 4, 3, 3, and 3, respectively. The Tandem Repeats Finder program (version 4.07b) was used with the default parameters to analyze additional tandem repeat elements (Benson 1999).
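The SSR thresholds above can be read as follows: a mononucleotide motif must be repeated at least 8 times, di- and trinucleotide motifs at least 4 times, and tetra- to hexanucleotide motifs at least 3 times. The Python sketch below is a minimal regular-expression illustration of that rule and is not a reimplementation of MISA. The function names and the toy sequence are hypothetical, and motifs that are themselves repeats of a shorter unit (for example, 'AA') are skipped as an assumption about how redundant hits should be handled.

```python
import re

# Motif length -> minimum number of repeats, as stated in the text.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def is_redundant(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples with 1-based coordinates."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_redundant(motif):
                continue
            repeats = len(match.group(0)) // unit_len
            hits.append((match.start() + 1, match.end(), motif, repeats))
    return sorted(hits)


# Hypothetical toy sequence containing an (A)8 run and an (AT)4 run.
toy_seq = "CC" + "A" * 8 + "CG" + "AT" * 4 + "C"
toy_hits = find_ssrs(toy_seq)
```

Compound SSRs and the handling of overlapping matches differ between tools, so counts from this sketch should not be expected to match MISA output exactly.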
In previous studies, nucleotide-diversity (Pi) values were used to evaluate nucleotide differences between multiple sequences (Mehmetoglu et al. 2022; Zhang et al. 2009). The DNA sequence files of the reference mitogenomes (NC_058644.1, Arctium lappa; NC_058643.1, Arctium tomentosum; NC_059793.1, Saussurea costus) that formed a cluster with the Silybum marianum mitogenome in the phylogenetic tree were downloaded from NCBI GenBank. These reference sequences and the Silybum marianum mitogenome were aligned using MAFFT (version 7.505), and the aligned sequences were used as input data. Pi values were calculated across the alignment of the reference mitogenomes and the Silybum marianum mitogenome by performing DNA-polymorphism analysis with the DnaSP software package (version 5.10.01). The positions with high and low Pi values were examined to identify the coding genes of the Silybum marianum mitogenome located there. A 100-bp sliding window with a 25-bp step size was used to summarize Pi for visualization purposes.
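Pi for a window can be understood as the number of nucleotide differences per compared site, averaged over all pairs of sequences. The Python sketch below illustrates a 100-bp window with a 25-bp step on an alignment held in memory. It is not DnaSP, and its treatment of gaps (columns with any gap or ambiguous base are skipped) is an assumption. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def window_pi(aligned, start, size):
    """Mean pairwise differences per site for one window (0-based start).

    Columns containing anything other than A, C, G, or T are skipped.
    Returns None if no usable column remains in the window.
    """
    rows = [seq[start:start + size].upper() for seq in aligned]
    columns = [col for col in zip(*rows) if all(base in "ACGT" for base in col)]
    if not columns:
        return None
    pairs = list(combinations(range(len(rows)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


def sliding_pi(aligned, window=100, step=25):
    """Return a list of (1-based window start, Pi) along the alignment."""
    if len(aligned) < 2:
        raise ValueError("at least two aligned sequences are required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    starts = range(0, max(length - window, 0) + 1, step)
    return [(start + 1, window_pi(aligned, start, window)) for start in starts]


# Hypothetical toy alignment scanned with a 4-bp window and a 2-bp step.
toy_alignment = ["AAAAAAAA", "AAAAAAAT"]
toy_profile = sliding_pi(toy_alignment, window=4, step=2)
```

With four sequences, a window in which every sequence differs from every other at every site gives Pi = 1, so the reported thresholds of >0.5 and <0.01 mark strongly divergent and nearly identical regions, respectively.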
Common protein-coding genes found in the three reference mitogenomes (NC_058644.1, Arctium lappa; NC_058643.1, Arctium tomentosum; NC_059793.1, Saussurea costus) and the Silybum marianum mitogenome were used to estimate nucleotide-substitution rates. The nucleotide-substitution rates, including the non-synonymous-substitution rate (Ka) and synonymous-substitution rate (Ks), as well as the Ka:Ks ratio of the protein-coding genes, were estimated using the KaKs Calculator (version 2.0). Pairwise Ka:Ks ratios were plotted using the pheatmap package of R software.
The mitogenomes of Asteraceae family members deposited in NCBI GenBank had an average size of 266,432.61 bp and the following average base compositions: A, 27.38%; T, 27.32%; G, 22.64%; and C, 22.66%. The assembled Silybum marianum mitogenome generated in this study had a typical circular structure with a size of 407,123 bp (Fig. 1). The overall base compositions were as follows: A, 27.41%; T, 27.33%; G, 22.72%; and C, 22.54%. Seventy-four unique genes were identified in the Silybum marianum mitogenome based on the annotation results. These genes included 27 protein-coding genes, 44 tRNA genes, and 3 rRNA genes (Table 1). The 27 protein-coding genes could be divided into seven classes: ATP synthases (atp1, atp4, atp6, atp8, and atp9), cytochrome c biogenesis (ccmB, ccmFc, and ccmFn), cytochrome c oxidases (cox1, cox2, and cox3), NADH dehydrogenases (nad1, nad2, nad3, nad4L, nad5, nad6, and nad7), large ribosomal subunits (rpl5, rpl10, and rpl16), small ribosomal subunits (rps3, rps4, rps12, rps13, and rps14), and succinate dehydrogenase (sdh4). Of the protein-coding genes, ccmFc, cox2, nad2, nad5, nad7, and rps3 contain introns: three genes (ccmFc, cox2, and rps3) harbor one intron, two genes (nad2 and nad5) harbor two introns, and one gene (nad7) harbors four introns. Of the tRNA-coding genes, trnQ-UUG and trnT-UGU each contain one intron. rrn5, nad5, nad6, and nad7 were annotated in more than one region. Three copies of the coding gene rrn5 were detected. The annotations for nad6 and nad7 revealed that two copies of these protein-coding genes were present in the mitogenome, but the annotation for nad5 revealed two regions of different sizes for this gene.
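The base-composition percentages reported above can be obtained by counting each nucleotide and dividing by the number of unambiguous bases. The Python sketch below shows that calculation on a toy string. The function name and the toy sequence are hypothetical, and excluding ambiguous bases from the denominator is an assumption.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, T, G, and C among unambiguous bases, to two decimals."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {base: round(100 * counts[base] / total, 2) for base in "ATGC"}


# Hypothetical toy sequence: 3 A, 3 T, 2 G, 2 C.
toy_composition = base_composition("AATTGGCCAT")
```

From the values reported for Silybum marianum, the G+C content is 22.72% + 22.54% = 45.26%.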
In previous studies, the nad5 gene of higher plant mitochondria required trans-splicing to produce the mature mRNA, and the coding sequence of nad5 was split into three or five exons located in distant regions of the mitochondrial genomes of wheat, maize, Oenothera, and Arabidopsis (Glanz & Kück 2009; Knoop et al. 1991; Pereira de Souza et al. 1991). In the Silybum marianum mitogenome, the coding genes for nad5 were found to have two exons at two distant genomic regions. One nad5 region was 1,501 bp long (bp 61454-62954) and had one intron of 960 bp. The other nad5 region was 2,286 bp long (bp 266361-268646) and had one intron of 834 bp. Given these annotation results, it remains to be confirmed whether nad5 requires trans-splicing in Silybum marianum.
After assembly and annotation of the Silybum marianum mitogenome, 11 reference mitogenomes of the Asteraceae family and four reference mitogenomes of the outgroup were compared to identify common protein-coding genes shared with Silybum marianum. Sixteen common protein-coding genes were identified between the reference mitogenomes and the Silybum marianum mitogenome (Fig. 2).
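Identifying the common protein-coding genes amounts to intersecting the gene lists of all mitogenomes being compared. The Python sketch below shows this with a set intersection. The function name, the genome labels, and the toy gene lists are hypothetical and do not reproduce the annotations of the 16 mitogenomes.

```python
def common_genes(gene_lists):
    """Return the sorted list of gene names present in every genome.

    gene_lists: dict mapping genome label -> iterable of annotated gene names.
    """
    gene_sets = [set(genes) for genes in gene_lists.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Hypothetical toy annotations for three genomes.
toy_annotations = {
    "genome_a": ["atp1", "cox1", "nad3", "rps4"],
    "genome_b": ["atp1", "cox1", "rps4", "sdh4"],
    "genome_c": ["cox1", "atp1", "rps4"],
}
shared = common_genes(toy_annotations)
```

In practice, gene names would first need to be normalized (for example, for case or synonyms), since annotation sources do not always use identical labels.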
Phylogenetic analysis of the amino acid sequences of 16 common protein-coding genes (atp1, atp6, atp9, ccmB, ccmFc, ccmFn, cox1, cox3, nad3, nad4L, nad6, nad7, rps3, rps4, rps12, and rps13) of the 16 mitogenomes separated the Asteraceae family group from the outgroup (Fig. 3). In the outgroup, the mitogenomes of the Campanulaceae family (NC_037949.1, Codonopsis lanceolata; NC_035958.1, Platycodon grandiflorus) and the Solanaceae family (NC_006581.1, Nicotiana tabacum; NC_035963.1, Solanum lycopersicum) separated into clusters. In the Asteraceae family group, the mitogenomes of identical genera (Arctium, Helianthus, and Lactuca) formed distinct clusters. The three mitogenomes of Arctium tomentosum (NC_058643.1), Arctium lappa (NC_058644.1), and Saussurea costus (NC_059793.1) were closely related to the mitogenome of Silybum marianum.
The Asteraceae family is characterized by a capitulum, and capitula commonly have two types of florets (ray and disc florets). Ray florets are characterized by three fused, protruding ventral petals, whereas disc florets have radial symmetry with five evenly sized petals (Figs. 4A-4C) (Zoulias et al. 2019). As described by Elomaa et al. (2018), Asteraceae flower heads are either heterogamous or homogamous. In heterogamous flower heads, the margin of the capitulum is occupied by ray flowers and the center is occupied by disc flowers. For example, sunflower plants (Helianthus) have sterile marginal ray flowers and perfect central disc flowers. In contrast, homogamous heads are formed from a single flower type. For example, the heads of lettuce (Lactuca) are composed of only ray flowers, whereas the discoid heads of thistles develop only disc flowers. Based on this morphological feature of Asteraceae flowers (Fig. 4), Silybum marianum was similar to the three closely related reference species of the Asteraceae family. These three species were the same as those whose mitogenomes were used as references for the Silybum marianum annotation.
The SSR results for four mitogenomes (Silybum marianum mitogenome used in this study; NC_058643.1, Arctium tomentosum; NC_058644.1, Arctium lappa; NC_059793.1, Saussurea costus) revealed similar numbers of repeats that were one to six nucleotides long (Fig. 5). The mitogenomes of Arctium tomentosum and Arctium lappa had identical SSR results, and Silybum marianum was the only mitogenome in which the number of di-nucleotide repeats was higher than the number of mono-nucleotide repeats. In terms of the distributions of perfect tandem repeats, Silybum marianum, Arctium tomentosum, Arctium lappa, and Saussurea costus had eight, four, four, and seven tandem repeats, respectively (Table 2). The mitogenomes of Arctium tomentosum and Arctium lappa were found to have identical tandem repeat sequences, but the positions of the repeat sequences were slightly different. The repeat sequence ‘GAAAAGGGTATGAAATAGGTTGCTTGT’ is shared between three mitogenomes (Silybum marianum, Arctium tomentosum, and Arctium lappa), and it is located in two regions of the Silybum marianum mitogenome. The two repeat sequences ‘TGAGAGATTCTATAGTTCCTGAGCT’ and ‘AGGTAAAACAGTACGCCCACT’ are shared between the Silybum marianum and Saussurea costus mitogenomes.
In this study, the Pi values of the four mitogenomes varied at different positions (Fig. 6). Eight regions had Pi values of >0.5 (nucleotide positions 34,711-34,964, 198,523-198,775, 207,021-207,400, 208,117-208,242, 209,467-209,616, 210,848-210,988, 214,087-214,217, and 216,649-216,770), and two regions had Pi values of <0.01 (nucleotide positions 322,723-333,182 and 353,080-356,650). In the Silybum marianum mitogenome, the coding genes trnS-UGA, rps4, and cox2 mapped to regions with Pi values higher than 0.5, and the coding genes trnK-UUU, nad6, trnL-GAG, trnE-UUC, trnV-CAC, and trnsec-UCA mapped to regions with Pi values less than 0.01.
Nucleotide-substitution rates (Ka:Ks ratios) are used to understand the evolutionary dynamics of protein-coding genes in closely related species (Fay & Wu 2003). The Ka:Ks ratio can be interpreted as an indicator of evolutionary selective pressure: neutral evolution when the Ka:Ks ratio is 1, positive selection when the Ka:Ks ratio is >1, and negative selection when the Ka:Ks ratio is <1 (Zhang et al. 2006). Twenty common protein-coding genes (atp1, atp4, atp6, atp8, atp9, ccmB, ccmFc, ccmFn, cox1, cox3, nad3, nad4L, nad6, nad7, rpl5, rpl10, rps3, rps4, rps12, and rps13) were selected based on sequence-size similarity among the four mitogenomes. The nucleotide-substitution rates of common protein-coding genes were estimated for the Silybum marianum mitogenome and three reference mitogenomes (Fig. 7). The Ka:Ks ratio of ccmB was >1, and those of cox1, rps13, rps12, atp4, nad4L, atp6, rpl5, nad3, rpl10, ccmFc, atp8, cox3, and rps3 were <1, suggesting that both positive and negative selection occurred during evolution. The numbers of protein-coding genes with Ka:Ks ratios of <1 were 18, 16, and 15 in the comparisons with the mitogenomes of Arctium tomentosum, Arctium lappa, and Saussurea costus, respectively. In this study, the average Ka:Ks ratios of most protein-coding genes were <1. These Ka:Ks ratios indicate that negative selection acted to conserve those genes during evolution.
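The interpretation rule for the ratio of Ka to Ks can be written as a small decision function. The Python sketch below assumes that Ka and Ks have already been estimated (for example, by the KaKs Calculator). The function name and the toy values are hypothetical, and returning "undefined" when Ks is zero is an assumption about how to handle genes without synonymous substitutions.

```python
def selection_regime(ka, ks):
    """Classify selective pressure from non-synonymous (ka) and synonymous (ks) rates."""
    if ks == 0:
        # The ratio cannot be computed without synonymous substitutions.
        return "undefined"
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "negative"
    return "neutral"


# Hypothetical toy values, not estimates from this study.
toy_rates = {"gene_1": (0.02, 0.01), "gene_2": (0.001, 0.02), "gene_3": (0.0, 0.0)}
toy_calls = {gene: selection_regime(ka, ks) for gene, (ka, ks) in toy_rates.items()}
```

Because estimated ratios carry sampling error, values close to 1 are usually interpreted with caution rather than assigned strictly to one class.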
In this study, we assembled and annotated the mitogenome of Silybum marianum and compared common protein-coding genes between the mitogenomes of Silybum marianum and reference plants (11 members of the Asteraceae family and four outgroup plants) to construct a phylogenetic tree. Phylogenetic analysis using common protein-coding genes showed that the Silybum marianum mitogenome was closely related to the mitogenomes of three reference Asteraceae family plants (Arctium tomentosum, Arctium lappa, and Saussurea costus). The four mitogenomes (Silybum marianum, Arctium tomentosum, Arctium lappa, and Saussurea costus) were then compared in terms of their repeat elements. The SSR values were similar in the four mitogenomes. In terms of perfect tandem repeats, the tandem repeat sequences of two mitogenomes (Arctium tomentosum and Arctium lappa) were identical, and the mitogenome of Silybum marianum shared one or two identical tandem repeat sequences with each of the three reference mitogenomes. These four mitogenomes were also used to evaluate nucleotide differences, which were relatively large at some nucleotide positions and small at others. When analyzing the nucleotide-substitution rates, we found that the Ka:Ks ratios between the mitogenome of Silybum marianum and the three reference mitogenomes were <1 for most of the common protein-coding genes. These values suggest that common protein-coding genes in the mitogenome of Silybum marianum were conserved during evolution.
Plant mitogenomes can be used to analyze phylogenetic relationships with the mitogenomes of other plant species. Zervas et al. (2019) compared the substitution rates of holoparasitic, hemiparasitic, and autotrophic plants by constructing a phylogenetic tree for angiosperms. In that study, the mitogenomes of Viscaceae among parasitic plants were unique with regard to their mitogenome contents and evolutionary substitution rates. In another study, Chang et al. (2013) studied the genome structures of soybean plants and gene evolution at the intercellular and phylogenetic levels. They used the soybean mitogenome and the mitogenomes of representative species with conserved genes to construct a phylogenetic tree to analyze mitogenome evolution. Their phylogenetic tree showed that intercellular transfer (loss or acquisition) occurred with genes of the soybean mitogenome and implied that gene loss of the mitogenome in seed plants may be considered a form of evolutionary compaction.
Cirsium, a genus of the Asteraceae family that comprises annual or perennial herbs, is distributed throughout northern Africa, Asia, Central and North America, and Europe (Song & Kim 2007). This genus comprises approximately 250-300 species worldwide, which can be delineated by their cypsela size, color, and surface ornamentation, as well as their pericarp and testa structures (Ghimire et al. 2018). Cirsium japonicum var. maackii (Maxim.) Matsum is known as Korean milk thistle and contains flavonoid compounds with pharmacological effects in various parts of the plant (Lee et al. 2017). The results of a study conducted by Jung et al. (2017) demonstrated that the methanol extracts and flavonoids from Cirsium japonicum var. maackii (Maxim.) Matsum protected human hepatocellular carcinoma (HepG2) cells against oxidative damage and that they are potential natural antioxidative biomarkers of oxidative stress-induced hepatotoxicity. Park et al. (2004) described the pharmacological properties of a methanol extract and hispidulin 7-O-neohesperidoside isolated from Cirsium japonicum var. ussuriense. Their results showed that the extract and compound decreased hepatic lipid peroxidation and increased hepatic levels of reduced glutathione, suggesting that the plant may mitigate alcohol toxicity by enhancing ethanol oxidation and inhibiting lipid peroxidation. Additionally, other flavonoid and transcriptomic studies of the Cirsium genus have supported its hepatoprotective efficacy (Mok et al. 2011, Yoo & Bae 2012, Park et al. 2020). The authors of those studies concluded that Cirsium is similar to Silybum marianum in terms of its hepatoprotective properties.
The results of this study demonstrate that the mitogenome of Silybum marianum is closely related to three reference mitogenomes (Arctium tomentosum, Arctium lappa, and Saussurea costus) and that the phylogenetic relationships inferred from the mitogenome are consistent with flower morphology in the Asteraceae family. Previous data suggest that Silybum marianum is morphologically and pharmacologically similar to Cirsium (Ma et al. 2016, Nam et al. 2018). Therefore, additional mitogenomes from Asteraceae plants of the Cirsium genus may be needed to specifically analyze the phylogenetic relationship of Silybum marianum within Asteraceae family plants based on their mitogenomes.
This research was supported by the Rural Development Administration of South Korea under Project Number PJ015988.