Non-canonical DNA and sequencing challenges in bird genomes
- bioRxiv
- DOI: 10.1101/2025.10.17.683159
Non-canonical (non-B) DNA motifs are sequences that can fold into structures (e.g., G-quadruplexes and Z-DNA) distinct from the canonical right-handed helix. In mammals, these structures regulate gene expression, act as mutation hotspots, and are associated with cancer, yet they remain undercharacterized in other species. Because non-B DNA motifs are difficult to sequence, many are absent from incomplete genome assemblies, limiting functional analyses. Here, we present the first comprehensive analysis of non-B DNA motifs in birds, using the telomere-to-telomere genome of zebra finch, the near-complete chicken genome, and high-quality genomes of six additional bird species. First, we show that, unlike in mammals, the non-B DNA landscape in birds differs markedly among chromosome groups: the gene-rich and extremely small dot chromosomes show the highest coverage (15.1-30.1% in zebra finch), microchromosomes show intermediate coverage (6.4-18.1%), and macrochromosomes show the lowest (5.9-6.9%). Non-B DNA motif coverage on dot chromosomes correlates negatively with PacBio sequencing depth, potentially explaining why these chromosomes are difficult to assemble. Second, we show that, as in mammals, G-quadruplexes in zebra finch are enriched at promoters and 5′ UTRs, implying regulatory roles. We experimentally validated four common G-quadruplexes and predicted others using long-read methylation data. Overall, non-B DNA distribution reflects distinct features of avian genome architecture, suggests a role in gene regulation, and informs strategies for complete bird genome sequencing.
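To make the notion of motif coverage concrete, the following is a minimal Python sketch of how the fraction of a sequence covered by putative G-quadruplex motifs could be computed. It is not the method used in this study: the regular expression is the widely used textbook pattern (four runs of at least three guanines separated by loops of 1-7 nucleotides), it scans only the given strand, and the names `G4_PATTERN`, `g4_intervals`, and `coverage_fraction` are hypothetical, chosen for illustration.

```python
import re

# Assumption: a textbook G-quadruplex pattern, i.e. four G-runs (>= 3 G each)
# separated by three loops of 1-7 nt. Not necessarily the study's definition.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def g4_intervals(seq):
    """Return half-open (start, end) intervals of non-overlapping matches
    on the given strand only (C-rich matches on the opposite strand are ignored)."""
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(seq.upper())]


def coverage_fraction(seq, intervals):
    """Fraction of bases in seq covered by at least one interval.

    Overlapping intervals are counted once, so annotations from several
    motif types can be pooled without double counting.
    """
    if not seq:
        return 0.0
    covered = 0
    last_end = 0
    for start, end in sorted(intervals):
        start = max(start, last_end)  # skip bases already counted
        if end > start:
            covered += end - start
            last_end = end
    return covered / len(seq)


# Toy 20-nt sequence containing a single 15-nt putative G-quadruplex motif.
toy = "AT" + "GGGAGGGAGGGAGGG" + "ATA"
hits = g4_intervals(toy)
print(hits, coverage_fraction(toy, hits))
```

Applied per chromosome, a calculation of this kind yields percentages comparable in form to the coverage ranges quoted above, although the exact values depend on the motif definitions and annotation tools used.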