Analysis of sequence variability in the macronuclear DNA of Paramecium tetraurelia: A somatic view of the germline

Laurent Duret,1 Jean Cohen,2,3,4 Claire Jubin,5,6,7 Philippe Dessen,3,8 Jean-François Goût,1 Sylvain Mousset,1 Jean-Marc Aury,5,6,7 Olivier Jaillon,5,6,7 Benjamin Noël,5,6,7 Olivier Arnaiz,2,3,4 Mireille Bétermier,2,3,4,9,10 Patrick Wincker,5,6,7 Eric Meyer,9,10 and Linda Sperling2,3,4,11

1 Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne F-69622, France; 2 CNRS, Centre de Génétique Moléculaire, UPR 2167, Gif-sur-Yvette, F-91198, France; 3 Univ Paris-Sud, Orsay, F-91405, France; 4 Université Pierre et Marie Curie-Paris 6, Paris, F-75005, France; 5 Genoscope (CEA), 91057 Evry, France; 6 CNRS, UMR 8030, 91057 Evry, France; 7 Université d’Evry, 91057 Evry, France; 8 Laboratoire Génomes et Cancers, FRE 2939 CNRS, Institut Gustave Roussy, 94805 Villejuif Cedex, France; 9 École Normale Supérieure, Laboratoire de Génétique Moléculaire, 75005 Paris, France; 10 CNRS, UMR 8541, 75005 Paris, France

Ciliates are the only unicellular eukaryotes known to separate germinal and somatic functions. Diploid but silent micronuclei transmit the genetic information to the next sexual generation. Polyploid macronuclei express the genetic information from a streamlined version of the genome but are replaced at each sexual generation. The macronuclear genome of Paramecium tetraurelia was recently sequenced by a shotgun approach, providing access to the gene repertoire. The 72-Mb assembly represents a consensus sequence for the somatic DNA, which is produced after sexual events by reproducible rearrangements of the zygotic genome involving elimination of repeated sequences, precise excision of unique-copy internal eliminated sequences (IES), and amplification of the cellular genes to high copy number.
We report use of the shotgun sequencing data (>10⁶ reads representing 13× coverage of a completely homozygous clone) to evaluate variability in the somatic DNA produced by these developmental genome rearrangements. Although DNA amplification appears uniform, both of the DNA elimination processes produce sequence heterogeneity. The variability that arises from IES excision allowed identification of hundreds of putative new IESs, compared to 42 that were previously known, and revealed cases of erroneous excision of segments of coding sequences. We demonstrate that IESs in coding regions are under selective pressure to introduce premature termination of translation in case of excision failure.

Paramecium is a unicellular eukaryote that belongs to the ciliate clade. One peculiar feature of ciliates is that, like multicellular eukaryotes, they separate germinal and somatic functions, in the form of two kinds of nuclei. A diploid germline micronucleus (MIC) undergoes meiosis to transmit the genetic information to the next sexual generation. A polyploid somatic macronucleus (MAC) is responsible for gene expression but develops anew at each sexual generation through reproducible rearrangements of the zygotic genome (for reviews, see Prescott 1994; Bétermier 2004; see Fig. 1A for a summary of the Paramecium life cycle).

In Paramecium tetraurelia, the developmental genome rearrangements (Fig. 1B) consist of DNA amplification to a final copy number of ∼800n and DNA elimination via two pathways. The first DNA elimination pathway is responsible for the removal of some 60,000 short, unique-copy elements (IES, for Internal Eliminated Sequence) that interrupt both coding and noncoding sequences. IESs are bounded by 5′-TA-3′ dinucleotides and their perfectly precise excision is accomplished by a mechanism that produces double-stranded breaks followed by end joining (Gratias and Bétermier 2003).
One TA dinucleotide remains in the chromosome after excision of the element.

The second DNA elimination pathway removes larger regions that often contain transposable elements (TE) and other repeated sequences, by an imprecise mechanism similar to that responsible for transposon silencing in eukaryotes. This mechanism involves short noncoding RNAs that probably target heterochromatin formation through histone methylation (for review, see Meyer and Chalker 2007). The heterochromatin is lost during MAC development. This DNA elimination pathway often leads to chromosome fragmentation. The new chromosome ends are healed by the addition of telomeric repeats, consisting of 200- to 300-nt random mixtures of G₄T₂ and G₃T₃ hexamers (Baroin et al. 1987).

11 Corresponding author. E-mail sperling@cgm.cnrs-gif.fr; fax 33-1-69-82-31-81. Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.074534.107. Letter. Genome Research 18:585–596 ©2008 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/08; www.genome.org

Although these developmentally programmed genome rearrangements are highly reproducible, there is evidence that they generate some MAC chromosome heterogeneity within clonal cell populations, as shown by characterization of a few loci linked to the telomeric A and G surface antigen genes, in P. tetraurelia and Paramecium primaurelia, respectively. The elimination of repeated sequences, which usually leads to chromosome fragmentation, can also be resolved by variable internal deletions, as characterized in detail for one DNA elimination region in P. primaurelia (Fig. 1B; Le Mouël et al. 2003). Since a few rounds of endoreplication of the diploid zygotic genome precede DNA elimination, both chromosome fragmentation and variable internal deletions occur at this locus, even within a single homozygous cell.
Similarly, the use of four different telomere addition regions, separated from each other by ∼10 kb, generates variability downstream from the A surface antigen gene in P. tetraurelia (Forney and Blackburn 1988; Amar and Dubrana 2004).

Patterns of MAC rearrangements may also vary between clonal cell populations. Indeed, variant MAC rearrangement patterns can be maintained across sexual generations, in the presence of a completely wild-type MIC genome (Epstein and Forney 1984; Meyer 1992; Duharcourt et al. 1995). The non-Mendelian inheritance of the rearrangement patterns can now be explained by “genome scanning” during development: The maternal MAC DNA is compared with the MIC DNA by a homology-dependent mechanism related to RNA interference (Mochizuki and Gorovsky 2004; Nowacki et al. 2005). The comparison ensures that only sequences present in the maternal MAC will be amplified and maintained in the new zygotic MAC (for review, see Meyer and Chalker 2007).

Given the strong heritability of the patterns of MAC rearrangements, variation in these patterns can provide a target for selection. To understand the constraints that the program of developmental genome rearrangements exerts on the evolution of the genome, it is essential to study these variations. P. tetraurelia somatic DNA was recently sequenced to a depth of ∼13× and assembled to provide a 72-Mb draft of the MAC genome (Aury et al. 2006). Annotation and analysis of the gene repertoire revealed that the very large number of protein-coding genes (nearly 40,000) is the consequence of at least three successive events of whole genome duplication (WGD) in the Paramecium lineage (Aury et al. 2006). In the present study, we have taken advantage of the whole genome shotgun (WGS) sequencing data (final assembly and >10⁶ sequencing reads) to evaluate heterogeneity among the MAC chromosomes.
It is important to note that this variability cannot be the result of allelic variations, since the genome sequence was obtained from entirely homozygous cells. We analyzed the heterogeneity produced by each of the three developmental processes: DNA amplification, IES excision, and elimination of TE and other repeated sequences. DNA amplification appears to be uniform across the genome. Conversely, the processes of TE elimination and IES excision are an important source of variability. Thanks to this heterogeneity, we have been able to identify hundreds of putative new IESs (whereas only 42 IESs were previously known in P. tetraurelia). This allows us to demonstrate that IESs located within coding regions are under selective pressure to introduce premature termination codons (PTC) in case of excision failure.

Figure 1. Nuclear dimorphism and the Paramecium life cycle. (A) Life cycle. Left, vegetative cycle. During vegetative growth, paramecia divide by binary fission. The two micronuclei (MIC) undergo mitosis in the absence of nuclear envelope breakdown, while the macronucleus (MAC) elongates and divides by an amitotic process. Right, sexual cycle. P. aurelia species have two mating types, and sexually reactive cells can conjugate with a partner of the opposite mating type. In the absence of an appropriate partner, an auto-fertilization process (autogamy) occurs, illustrated here. Autogamy begins with meiosis of the two MIC to yield eight haploid products, seven of which degenerate. The eighth haploid gametic nucleus copies itself by mitosis and the two identical haploid nuclei fuse to form a completely homozygous diploid zygotic nucleus. Two post-zygotic mitotic divisions yield four diploid germline nuclei, which migrate to positions at the anterior and posterior of the cell. The two nuclei at the cell posterior differentiate into new MACs, the two nuclei at the cell anterior are the new MICs. During the first, caryonidal, cell division the macronuclear anlagen do not divide. One is distributed to each daughter cell as they continue to endoreplicate DNA to attain the final MAC copy number of ∼800n. After a second cell division in which both the MICs and the MAC divide, the sexual progeny enter the vegetative phase. Note that the progressively fragmented maternal MAC is present throughout meiosis, fertilization, and MAC differentiation and remains transcriptionally active. The fragments are lost by dilution in the course of the first cell divisions. The same events occur during conjugation; however, there is reciprocal exchange of haploid gametic MICs, so that the diploid zygotic nucleus produced by fertilization in each conjugating cell is heterozygous at all loci. Both sexual processes can be induced by standard laboratory protocols (Sonneborn 1974). (B) Genome reorganization. During MAC differentiation, DNA is amplified to a final copy number of ∼800n. Different classes of repeated sequences, such as transposable elements and minisatellite repeats, schematized here, are eliminated by an imprecise mechanism that leads to chromosome fragmentation and de novo telomere addition, but which in some cases can be resolved by variable internal deletions. Internal Eliminated Sequences (IES) are short (<1 kb) unique-copy elements that interrupt both coding and noncoding sequences. IESs are precisely excised between 5′-TA-3′ dinucleotides at each end of the IES. A single TA remains in the MAC DNA. For a detailed review, see Bétermier (2004).

Besides somatic chromosomal rearrangements, we also analyzed rearrangements that have occurred in the germline, over evolutionary time. For this purpose, we exploit information from the recent WGD that was detected in the Paramecium lineage (Aury et al. 2006) that can be analyzed by alignment at the nucleotide level.
We show that the rate of chromosomal rearrangements is remarkably low in Paramecium. Finally, the data allow us to infer a simple relationship between MAC chromosomes and the MIC chromosomes from which they derive.

Results

DNA amplification during macronuclear development

The 72-Mb draft assembly of P. tetraurelia MAC DNA consists of 697 scaffolds of which 188 are >47 kb, the minimum size observed for MAC chromosomes by pulsed field gel electrophoresis. These chromosome-sized scaffolds contain 96% of the genome assembly, and at least 60% of them represent complete MAC chromosomes since reads with telomere repeats map to both scaffold ends (Aury et al. 2006). In order to evaluate whether copy number is uniform across the macronucleus as expected if MAC-destined sequences are amplified to the same extent, we mapped the raw sequence reads (trimmed for quality, see Methods) to the assembly to determine the number of times each base of the assembly is present in the reads.

Figure 2A shows a scatter plot of average sequencing depth as a function of scaffold length for each of the 697 scaffolds. The average sequencing depth for the 188 chromosome-sized (>47 kb) scaffolds varies from 5× to 15×. Most of the small (<47 kb) scaffolds, which together represent only 4% of the assembly, have lower sequencing depth. We were able to map 46 of them to gaps in the larger scaffolds. We also remapped reads with telomere repeats (see Methods) to the small scaffolds. None of the small scaffolds resembles a complete chromosome; however, 83 have multiple remapped telomere reads and could represent chromosome ends (data not shown). We conclude that the majority of small scaffolds probably correspond to gaps or ends of the large scaffolds; they will not be taken into consideration in what follows.

Figure 2. Depth of sequencing coverage and G+C content of the MAC genome assembly. (A) Scatter plot of average sequence depth in each of the 697 scaffolds of the assembly as a function of the size in nucleotides of the scaffold. The average sequencing depth is defined as the average number of reads that cover each nucleotide of the scaffold. (B) Scatter plot of the average sequencing depth of each of the 188 chromosome-sized scaffolds (>47 kb) as a function of the G+C content of the scaffold. The points were fit by linear regression, R² = 0.82, P < 10⁻⁴. (C) Average sequencing depth and G+C content were calculated in 1-kb nonoverlapping windows for the 188 chromosome-sized scaffolds. Primary Y-axis: average of the sequencing depths for all of the 1-kb windows with the same G+C content (bins of 1%), plus or minus the SD, represented by black dots and dark-gray error bars, is plotted as a function of G+C content. Secondary Y-axis: histogram of G+C content, calculated using the 1-kb nonoverlapping windows (light-gray vertical bars).

A representation of average sequencing depth as a function of G+C content for each scaffold reveals a strong correlation between these parameters (Fig. 2B). The scaffolds vary by ∼10% in their G+C content, and the lower the G+C content, the lower the sequencing depth. To further test this correlation, we calculated average sequencing depth and G+C content in nonoverlapping 1-kb windows for the 188 chromosome-sized scaffolds (Fig. 2C). The 1-kb windows with lower G+C content than the genome’s 28% average have below average sequencing depth, while the windows with higher G+C content also have above average sequencing depth, confirming the correlation. Since Paramecium has a very A+T-rich genome, the observed correlation can probably be explained by the fact that A+T-rich inserts are unstable in Escherichia coli.
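
The windowed comparison of sequencing depth and G+C content described here can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names (gc_fraction, window_stats, linear_fit), the per-base depth list, and the toy inputs are assumptions made for the example, and ordinary least squares stands in for whatever regression software was actually used.

```python
from statistics import mean


def gc_fraction(seq):
    """Fraction of G or C bases in seq (case-insensitive); 0.0 if empty."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def window_stats(seq, depth, window=1000):
    """Return (G+C fraction, mean depth) for each full nonoverlapping window.

    seq   : scaffold sequence (hypothetical input)
    depth : per-base read depth, same length as seq (hypothetical input)
    """
    if len(seq) != len(depth):
        raise ValueError("sequence and depth must have the same length")
    stats = []
    # A trailing partial window is ignored.
    for start in range(0, len(seq) - window + 1, window):
        end = start + window
        stats.append((gc_fraction(seq[start:end]), mean(depth[start:end])))
    return stats


def linear_fit(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept, with R squared."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Toy data only: one G+C-rich window and one A+T-rich window.
    stats = window_stats("GCGCATAT", [10, 10, 10, 10, 4, 4, 4, 4], window=4)
    print(stats)
```

With real data, the G+C fractions and mean depths returned by window_stats would be passed to linear_fit (or binned by G+C content, as in Fig. 2C) to check whether depth rises with G+C content.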
The higher the A+T content of a region, the poorer its representation in the shotgun sequencing libraries. Thus all of the chromosome-sized scaffolds present approximately the same average sequencing depth, once corrected for their variation in G+C content. This implies that the amplification of the macronuclear genome is uniform, despite the existence of underamplified regions within chromosomes, for example adjacent to fragmentation/internal deletion sites. We cannot exclude the existence of underamplified chromosomes. However, independent data argue against this possibility. First, all previously characterized Paramecium genes are present in the assembly (data not shown). Second, the 78,110 ESTs generated in the course of the sequencing project (to the exclusion of ribosomal RNA and mitochondrial contaminants) all map unambiguously to the assembly (Aury et al. 2006). We conclude that the expressed portion of the Paramecium genome is amplified and maintained at uniform copy number in the macronucleus.

Low rate of chromosomal rearrangements during evolution

A striking characteristic of the P. tetraurelia genome is that 51% of the genes duplicated at the most recent WGD are still present in two copies. Alignment of each of the 12,026 pairs of paralogs revealed a distribution of amino acid identities comparable to that of mouse-human orthologs, with a peak near 95% identity. Although the synonymous substitution rates (Ks) of the paralogs are close to saturation, the nonsynonymous substitution rates (Ka) are very small, indicating that strong purifying selection maintains the amino acid sequences. Phylogenetic analysis showed that the most recent WGD occurred just before the speciation events that gave rise to the Paramecium aurelia complex of 15 sibling species (Coleman 2005; Aury et al. 2006).

We carried out an all-against-all nucleotide comparison of the 188 chromosome-sized scaffolds in order to obtain a picture of the recent WGD at the nucleotide sequence level.
Segments of >80% nucleotide identity, corresponding to ∼30% of the nucleotides in the assembly, cover most of the ORFs that are related by the recent WGD as well as some noncoding sequences that may include gene regulatory regions or noncoding RNA. The segments were grouped into syntenic blocks and the blocks were clustered using a transitive algorithm. We then drew all of the clusters (examples in Fig. 3; all of the drawings and a synteny viewer are available at http://paramecium.cgm.cnrs-gif.fr/tool/synteny). The drawings show the scaffolds (horizontal black lines) joined by segments of nucleotide alignment in blue (or pink for inverted regions), decorated by remapped telomere repeats (vertical maroon lines) and variable surface antigen genes (turquoise boxes). The drawings guided us in separation of the scaffolds into two half genomes. A dot plot comparison of the two half genomes presents a nearly continuous diagonal (Fig. 3F).

All regions of the genome are paired, and 48 of the 73 clusters consist of a single pair of scaffolds, each of which is a complete MAC chromosome as indicated by correctly oriented telomere repeats at either end. Four additional clusters also consist of a pair of scaffolds, but at least one of the four scaffold ends is not marked by telomere repeats. The Paramecium genome appears to have undergone remarkably few large-scale rearrangements since the recent WGD, which is at an evolutionary distance, in terms of synonymous substitution rates, roughly equivalent to that of the divergence of rodents and primates from their common ancestor (Aury et al. 2006). Only six simple translocations (example, Fig. 3E) and one reciprocal translocation were found, along with 76 local inversions. This appears to be in striking contrast with the much higher rate of rearrangements in other taxa, as illustrated by the comparison of mouse and human chromosomes (Waterston et al. 2002) or by the analysis of the recent WGD in Arabidopsis (Blanc et al. 2003).
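
The text says only that syntenic blocks were clustered "using a transitive algorithm". One common reading of that phrase is single-linkage clustering: if scaffold X shares a block with Y, and Y with Z, all three land in one cluster. The sketch below illustrates that reading with a union-find structure. It is an assumption about the approach, not the authors' code; the function name cluster_transitively is hypothetical, and in the example only the grouping of scaffolds 178, 163, and 68 comes from the text, while the pair (1, 2) is invented for illustration.

```python
def cluster_transitively(pairs):
    """Group items into clusters given pairwise links (single linkage).

    pairs : iterable of (a, b) tuples, e.g. scaffold IDs that share at
            least one syntenic block (hypothetical input format).
    Returns a list of sorted clusters, ordered by their smallest member.
    """
    parent = {}

    def find(x):
        # Locate the cluster representative, shortening paths as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


if __name__ == "__main__":
    # (178, 68) and (163, 68) follow the cluster described for Fig. 3C;
    # (1, 2) is a made-up pair standing in for a simple two-scaffold cluster.
    links = [(178, 68), (163, 68), (1, 2)]
    print(cluster_transitively(links))
```

Under this reading, a cluster with exactly two members corresponds to a simple pair of paralogous scaffolds, and larger clusters flag the cases (incomplete assembly, translocation, or heterogeneous fragmentation) that need a closer look.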
Heterogeneity of MAC chromosome fragmentation

Most of the sequence reads containing telomere repeats map, as expected, to the ends of assembly scaffolds. However, additional sites where telomere repeats have been mapped are located within assembly scaffolds. Interestingly, one of these sites, found on scaffold 51 (Fig. 3E), is located in a region orthologous to a locus that has been extensively analyzed in P. primaurelia: Characterization of both MAC and MIC DNA showed that there is heterogeneity in the fragmentation of MAC chromosomes at that locus (Caron 1992; Le Mouël et al. 2003). The polymorphic MAC chromosomes result from elimination of a 21-kb region, containing an inactive copy of the Tc1/mariner-like Tennessee TE and minisatellite sequences, leading to chromosome fragmentation as well as variable internal deletions during MAC development. This suggests that most telomere repeats remapped to internal sites within assembly scaffolds result from heterogeneity generated by the process of imprecise elimination of heterochromatic MIC DNA regions (as schematized in Fig. 1B).

The comparison of paralogous scaffolds resulting from the recent WGD supports this interpretation. We found 14 cases where a single scaffold is paired with two or more scaffolds. Possible explanations are (1) incomplete assembly, i.e., one chromosome is covered by two or more scaffolds that were not joined during assembly because of sequencing gaps or (2) a DNA elimination region responsible for chromosome fragmentation has appeared or disappeared since the recent WGD or (3) heterogeneity of chromosome fragmentation. The example shown in Figure 3B can be explained by sequencing gaps between scaffolds 154, 59, and 173 that together constitute a single MAC chromosome, since only the extremities of scaffolds 154 and 173 have correctly oriented, remapped telomere repeats indicative of MAC chromosome ends.
However, the cluster consisting of scaffolds 178, 163, and 68 (Fig. 3C) clearly corresponds to the third possibility. Three observations support this interpretation. First, all three scaffolds in this cluster appear to be complete MAC chromosomes capped by telomeres. Second, many remapped telomere repeats are visible at the center of scaffold 68, between the 5′ part of the scaffold that aligns with scaffold 178 and the 3′ part that aligns with scaffold 163. Third, we were able to find two BAC clones that span the gap between scaffolds 178 and 163, indicating the existence of molecules corresponding to a large chromosome as expected if the DNA elimination event can be resolved by internal deletions. Hence, scaffold 68 probably represents three molecules, a large chromosome equivalent to the assembled scaffold 68, and two smaller versions, as would arise if DNA elimination led to both chromosome fragmentation (the two smaller chromosomes) and internal deletions (the large chromosome). Three other clusters present similar topology (clusters 5, 11, 35; http://paramecium.cgm.cnrs-gif.fr/tool/synteny).

We conclude that scaffolds with telomere repeats remapped to internal sites are very likely to represent the consensus assembly for a heterogeneous set of MAC chromosomes generated by variable outcomes of MIC DNA elimination. We can unambiguously identify 10 such sites (clusters 5, 11, 35, 42, 47) and additional sites may be present in clusters 28, 33, 40, and 44. Since the reads present only 13× redundancy compared to the 800 haploid copies of the genome that are present in the MAC, this is a minimal estimate of the number of MIC regions responsible for generating MAC chromosome heterogeneity.

G+C content and chromosome length

MAC chromosomes are small and numerous. The 114 scaffolds that were validated as being complete chromosomes (see Methods) vary greatly in size, from 138 to 982 kb. These scaffolds also vary in G+C content, by ∼10% (cf. Fig. 2B). Strikingly, the scaffolds that are complete chromosomes display a significant inverse correlation between size and G+C content (R² = 0.67, P < 2 × 10⁻¹⁶; Fig. 4). Confirmation of this strong negative correlation was obtained by examination of the G+C content of noncoding DNA on the chromosomes (introns: R² = 0.565, P < 2 × 10⁻¹⁶; intergenic regions: R² = 0.418, P = 4.8 × 10⁻¹⁵) and of the third base of four-codon amino acids (R² = 0.712, P < 2 × 10⁻¹⁶). The correlation (not to mention the quality of the assembly) is all the more impressive given the lower sequencing depth of A+T-rich sequences (cf. Fig. 2C). The correlation is weaker for the group of complete scaffolds from clusters with a translocation, i.e., the group of chromosomes that may have changed size recently owing to large-scale rearrangements. Scaffolds that are chromosome fragments show no correlation at all (Fig. 4).

In order to explain the correlation, we looked for G+C variation within chromosomes. The only systematic variation we could detect was lower G+C content of the ∼30 kb at the chromosome ends (Fig. 5). This effect cannot account for the G+C variation between chromosomes: A region of fixed size with a high, rather than a low, G+C content would be required to explain the correlation. It is not surprising that subtelomeric se-

Figure 3. Pairs of chromosomes revealed by internal nucleotide comparison of the MAC genome assembly. Examples of the clusters obtained from the internal nucleotide comparison (see Methods). The majority of clusters show pairs of complete MAC chromosomes as in A and D. The cluster in B corresponds to two complete MAC chromosomes, but one of them consists of three scaffolds separated by two sequencing gaps. The cluster in C illustrates possible assemblies of three polymorphic chromosomes that cover a single region; scaffold 68 is a consensus of two shorter chromosomes and one long one, resulting from fragmentation or internal deletion upon DNA elimination, while scaffolds 178 and 163 represent chromosomes created by fragmentation. The cluster in E shows a translocation that has occurred since the recent WGD. Note that scaffold 51 contains two internal sites with remapped telomeric repeats. The leftmost site corresponds to the MIC elimination region that was sequenced in P. primaurelia (Le Mouël et al. 2003). Horizontal black lines, scaffolds; blue polygons, segments of >82% nucleotide identity; pink polygons, inverted segments of >82% nucleotide identity; vertical maroon lines are proportional in height to the number of remapped reads that contain telomere repeats (i.e., at least three repeats of CCC[CA]AA with no more than one mismatch), the repeats were masked for the alignment of the reads against the assembly; turquoise boxes, remapped surface antigen genes. (F) Dot plot internal comparison of the MAC genome assembly. The genome was divided into two arbitrary half genomes for the dot plot, using the drawings of chromosome clusters related by the recent WGD. The small staggered lines along the outside of each axis represent the individual scaffolds.
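
The telomere-read criterion quoted in the Figure 3 caption (at least three repeats of CCC[CA]AA with no more than one mismatch) can be illustrated with a short Python sketch. Several details are assumptions made for the example rather than facts from the paper: the repeats are taken to be consecutive, the single allowed mismatch is counted over the whole 18-nt stretch, only the strand as given is scanned, and the function name has_telomere_repeat is hypothetical.

```python
# Allowed bases at each position of the hexamer CCC[CA]AA.
ALLOWED = [set("C"), set("C"), set("C"), set("CA"), set("A"), set("A")]


def has_telomere_repeat(read, min_repeats=3, max_mismatches=1):
    """Return True if read contains min_repeats consecutive CCC[CA]AA
    hexamers with at most max_mismatches mismatches in total.

    Assumption: mismatches are counted across the whole stretch, and only
    the given strand is scanned (the reverse complement is not checked).
    """
    read = read.upper()
    span = 6 * min_repeats
    for start in range(len(read) - span + 1):
        mismatches = 0
        for offset in range(span):
            if read[start + offset] not in ALLOWED[offset % 6]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, try the next start
        else:
            # Inner loop finished without exceeding the mismatch limit.
            return True
    return False


if __name__ == "__main__":
    print(has_telomere_repeat("GGCCCCAACCCAAACCCCAAGG"))
    print(has_telomere_repeat("ACGTACGTACGTACGTACGT"))
```

A read carrying the complementary G-rich strand (the G₄T₂ and G₃T₃ hexamers mentioned in the introduction) would need to be reverse-complemented before scanning with this function.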