Mastodon @Mastodon

nf-core

Pipeline release! nf-core/pairgenomealign v2.0.0 – Naga imo!

Please see the changelog: https://github.com/nf-core/pairgenomealign/releases/tag/2.0.0

#comparativegenomics #dotplot #genomics #last #pairwisealignment #synteny #wholegenomealignment #nfcore #openscience #nextflow #bioinformatics

Jennifer Leonard

Interesting opportunity! A permanent position as an Associate Professor in #Zoology is available at the #NaturalHistoryMuseum #University of Oslo. #MuseumJobs #EvolutionaryBiology, #comparativegenomics #Transcriptomics #Systematics #populationgenomics

https://www.jobbnorge.no/en/available-jobs/job/268410/associate-professor-in-zoology

Deadline: February 23rd 2025

BRC Analytics

Passing this along to our Galaxy and BRC communities: NCBI is holding a webinar for researchers in the eukaryotic pathogen and disease vector spaces on Feb 12. Use this link to get more information: https://ow.ly/GAlW50UAuR9

#NCBICGR #ComparativeGenomics

nf-core

Pipeline release! nf-core/pairgenomealign v1.1.1 – Kani nabe!

Please see the changelog: https://github.com/nf-core/pairgenomealign/releases/tag/1.1.1

#comparativegenomics #dotplot #genomics #last #pairwisealignment #synteny #wholegenomealignment #nfcore #openscience #nextflow #bioinformatics

**Axel Visel** @axelvisel.bsky.social@bsky.brid.gy · Dec 17, 2024

Microbial Genomics and Metagenomics (MGM) Workshop

Learn all about IMG/M and other @jgi.doe.gov data systems for #microbiology #metagenomics #microbiome #bioinformatics #comparativegenomics at our 5-day workshop, April 28-May 2, 2025, in Berkeley.

Registration is now open: mgm.jgi.doe.gov

Rekha Seshadri giving a talk to the participants of the 2024 Microbial Genomics & Metagenomics (MGM) Workshop at the Joint Genome Institute in Berkeley, CA.

**Boas Pucker** @boas_pucker · Oct 2, 2024

Excited to announce the discovery of a withanolide biosynthetic gene cluster! A huge achievement from our collaboration with the Franke Lab. Explore the findings here:

https://doi.org/10.1101/2024.09.27.614867

#ComparativeGenomics #Biochemistry
@tubraunschweig @PuckerLab @unihannover

Screenshot of the preprint "Phylogenomics and metabolic engineering reveal a conserved gene cluster in Solanaceae plants for withanolide biosynthesis" https://doi.org/10.1101/2024.09.27.614867

**Dong-Ha Oh** @inspirace@genomic.social · May 24, 2024

CGV paper is out. Alignments are added per request, and I guess the team would love to hear any user feedback. :) #comparativegenomics #NCBI
https://doi.org/10.1371/journal.pbio.3002405

doi.org

The NCBI Comparative Genome Viewer (CGV) is an interactive visualization tool for the analysis of whole-genome eukaryotic alignments

Commonly used genome browsers only show one genome assembly at a time and cannot show comparisons between multiple genomes. This study develops a new visualization tool called the Comparative Genome Viewer (CGV) that aids in the pairwise comparison of whole-genome eukaryotic assembly-assembly alignments.

**Joseph** @admin@josephguhlin.com · Mar 11, 2024

What is SFASTA?

Genomic and bioinformatic-adjacent sequences (RNA, Protein, Peptides) are stored as FASTA files. Sequencing reads off a machine are stored as FASTQ files, adding a quality score associated with each nucleotide. Currently, these are non-human-readable plaintext files. As sequencing output grows, the ability to process many more gigabytes and terabytes of files rapidly and with random access (currently solved by bgzip/tabix) becomes incredibly important.

SFASTA, my focus-on-random-access-speed FASTA/Q format replacement, has worked well for medium and large FASTA files, defining large as anything smaller than the NT nucleotide database (~203Gb gzip-9 compressed, but likely larger whenever you are reading this). Small files did not benefit from stream compression and crazy indices, although the time cost for small files is irrelevant. But converting NT to SFASTA took an inordinate amount of time, as did reading the index into memory. While the result is still smaller and faster than gzip-9, this does not accomplish what I want.

Why?

FASTA files are frequently compressed with an outdated, slow, inefficient compression algorithm (gzip). Modern alternatives provide better compression ratios, faster decompression, and higher throughput. The speed of reading FASTA files is quite important, with multiple tools that try to be the fastest. Clearly, this is an unsolved problem, and sticking to a text-based, non-human-readable format is a choice that only occurs due to the momentum of existing tools.

Genomics is moving to “Genomics at scale” and away from single-genome analyses. A flat file format adds unnecessary processing time to query hundreds of genomes instantly. For my own usage, I’d like to be able to query NT and fill up the GPU with random, on-the-fly examples. This is entirely achievable with modern computers but not with outdated file formats and compression.

What does SFASTA mean?

SFASTA previously stood for “Snappy FASTA” as it used the Snappy algorithm, but now it uses ZSTD. The name remains as the command remains sfa, which can be typed with one hand on the home row of a standard keyboard.

Further Speed-ups

So, it’s clear my custom-built index was a failure. Enter the B+ tree. While fighting post-COVID brain fog, I eventually managed to build a naive implementation. My benchmarks for creating a tree with 128 million key/value pairs threatened to take over 20 hours (for 20 samples, so 1 hour each). Hacking away at that, I did shrink it, but only somewhat. Then, I modified a copy to use the sorted-vec crate. Finally, while reading further on the topic, I discovered fractal trees, which merely add a buffer to each node and process it when a flush is called or the buffer size is exceeded. I can now create such a large index in under a minute. For this implementation, the fractal tree uses sorted-vecs as the key vector.
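
To make the buffering idea concrete, here is a minimal sketch of a single buffered node in Rust. It is not the SFASTA implementation: the names `BufferedNode`, `insert`, `flush`, and `get` are hypothetical, it uses a plain `Vec` instead of the sorted-vec crate, it assumes keys are unique, and it leaves out the part of a real fractal tree where buffered items are pushed down to child nodes.

```rust
/// One node with a sorted entry vector plus a small unsorted write buffer.
/// Hypothetical sketch; assumes unique keys and has no child nodes.
struct BufferedNode {
    entries: Vec<(u64, u64)>, // (key, value) pairs, kept sorted by key
    buffer: Vec<(u64, u64)>,  // pending inserts, in arrival order
    buffer_size: usize,       // flush threshold
}

impl BufferedNode {
    fn new(buffer_size: usize) -> Self {
        BufferedNode {
            entries: Vec::new(),
            buffer: Vec::with_capacity(buffer_size),
            buffer_size,
        }
    }

    /// Queue an insert; the sorting cost is only paid once the buffer is full.
    fn insert(&mut self, key: u64, value: u64) {
        self.buffer.push((key, value));
        if self.buffer.len() >= self.buffer_size {
            self.flush();
        }
    }

    /// Move all buffered pairs into the sorted entries in one batch.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        // `append` drains the buffer into `entries`, leaving the buffer empty.
        self.entries.append(&mut self.buffer);
        // Re-sorting is the simplest option for a sketch; a real
        // implementation would merge the sorted batch instead.
        self.entries.sort_by_key(|&(k, _)| k);
    }

    /// Look up a key, flushing first so buffered inserts are visible.
    fn get(&mut self, key: u64) -> Option<u64> {
        self.flush();
        self.entries
            .binary_search_by_key(&key, |&(k, _)| k)
            .ok()
            .map(|i| self.entries[i].1)
    }
}
```

The point of the buffer is that many inserts share one reorganisation of the node, which is presumably where the build-time savings come from.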

For B+ trees and fractal trees, the order of the nodes (how many children each node can have) is incredibly important. For creating trees, an order of 32 seems to be the sweet spot (this is tested on u64 as both keys and values). For fractal trees, 64 with a larger buffer seems to be the sweet spot. The figure below shows the fastest order, 64, and buffer 128. The image below is for 1 million items.

Text is difficult to read, but the number is the order, and for fractal trees, the second number is the buffer size.

The Big Tree

My NT test dataset is a bit over 128 million entries, u64 range 0 to 128_369_206, with keys and values as the xxh3 hashed integer. You can see the spread below. Here the larger buffer size (up to 256) performs the best, but many are in the less than a minute sweet spot.

Searching the Tree

Now that building a tree for NT takes under a minute, compressing and queueing the nucleotide sequences and IDs into the file will be the bottleneck for creation. Building the tree is also a one-time cost, so it is not the highest priority. The focus now is searching the tree, which will happen quite frequently depending on the final use case.

I’m just now getting to start on this, but as you can see below, where input is the order of the nodes, a larger order decreases the time to find a key. This is an even better sign for the fractal trees, as they are more efficient with larger orders. The image below shows very little difference, with sorted vec having a bit of a slowdown. I have no idea why, possibly due to a line of code that did not change as I’m simultaneously playing around with three versions. As my fractal tree implementation uses sorted-vec, these results are quite equivalent. The search code is nearly identical. This is the next step.

Here, the x-axis is node order, with tests for 16, 32, 64, and 128.

What Hasn’t Worked

2bit/4bit nucleotide encodings – did not increase throughput or decrease on-disk size. Still worth further investigation.

Immediate Next Steps

As this is a write-once file format, at least at this stage, I plan to do the following:

Smaller struct for read-only mode, i.e., buffer is no longer needed
Benchmark sorted-vec against Eytzinger order
Load only parts of the tree from disk, have efficient serialization
Possibly try a bumpalo arena for querying the on-disk tree
Batch insertion – Maybe this was all for naught
Stream VBytes storage for keys/values of tree?
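
For readers unfamiliar with Eytzinger order from the list above: it stores a sorted array in the breadth-first order of an implicit binary search tree, so a lookup walks indices `k`, `2k` or `2k + 1` rather than jumping around as binary search does. The following is a small, self-contained sketch of that layout and lookup; the function names are hypothetical and this is not code from SFASTA.

```rust
/// Recursively fill `out` (1-indexed, `out[0]` unused) by an in-order walk
/// of the implicit tree, consuming `sorted` from left to right.
fn build(sorted: &[u64], out: &mut [u64], next: &mut usize, k: usize) {
    if k < out.len() {
        build(sorted, out, next, 2 * k); // left subtree gets smaller keys
        out[k] = sorted[*next];
        *next += 1;
        build(sorted, out, next, 2 * k + 1); // right subtree gets larger keys
    }
}

/// Convert a sorted slice into Eytzinger (breadth-first) order.
fn eytzinger(sorted: &[u64]) -> Vec<u64> {
    let mut out = vec![0u64; sorted.len() + 1];
    let mut next = 0;
    build(sorted, &mut out, &mut next, 1);
    out
}

/// Return true if `key` is present in an Eytzinger-ordered array.
fn contains(tree: &[u64], key: u64) -> bool {
    let mut k = 1;
    while k < tree.len() {
        if tree[k] == key {
            return true;
        }
        // Go right (2k + 1) if the current key is too small, else left (2k).
        k = 2 * k + (tree[k] < key) as usize;
    }
    false
}
```

For example, the sorted keys 1, 3, 5, 7, 9, 11, 13 are laid out as 7, 3, 11, 1, 5, 9, 13 (after the unused slot at index 0), so the first few comparisons of every search touch the start of the array.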

Ultimate Goals

LD_PRELOAD to work with existing tools
Python library
C API

I’ve been programming in Rust for a couple of years and have experimented with many different things, including the bevy game engine. I would still argue I’m a middling Rust developer, as I’m also a population geneticist: some weeks are spent without writing a single line of code, or only writing Python for statistical analysis. So I expect much room for improvement, although I’m proud of where I’ve gotten this so far.

Plots made with criterion.

https://josephguhlin.com/sfasta-fast-index-building/

#bioinformatics #comparativeGenomics #fileFormats

**Alliance of Genome Resources** @AllianceGenome@genomic.social · Oct 25, 2023

Exciting news!
The Rat Genome Database rgd.mcw.edu
is pleased to announce the release of an updated #Cardiovascular Disease Portal featuring data and tools to support cardiovascular disease research. To learn more, https://rgd.mcw.edu/wg/updated-cvd-portal/ #comparativegenomics

**Scott Cain** @scottcain@genomic.social · Jul 11, 2023

I'm wondering if anybody has tried (considered even) #tabix indexing #PAF files. These files are used for #ComparativeGenomics, so you'd need two indexes, one on the reference and one on the query. Conceptually it makes sense to me but I've not seen it done anywhere.
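
To picture why two indexes would be needed: each PAF record carries a query interval (columns 1, 3, 4) and a target interval (columns 6, 8, 9), and a position index can only follow one of them at a time. A rough Rust sketch of pulling both intervals out of a record follows; the `Interval` type and `paf_intervals` function are made up for illustration and are not part of tabix or any PAF library.

```rust
/// A named, 0-based, half-open interval as used in PAF coordinates.
#[derive(Debug, Clone, PartialEq)]
struct Interval {
    name: String,
    start: u64,
    end: u64,
}

/// Extract the (query, target) intervals from one tab-separated PAF line.
/// Returns None if the line has fewer than the 12 mandatory columns
/// or a coordinate fails to parse.
fn paf_intervals(line: &str) -> Option<(Interval, Interval)> {
    let f: Vec<&str> = line.trim_end().split('\t').collect();
    if f.len() < 12 {
        return None;
    }
    let query = Interval {
        name: f[0].to_string(),   // column 1: query name
        start: f[2].parse().ok()?, // column 3: query start
        end: f[3].parse().ok()?,   // column 4: query end
    };
    let target = Interval {
        name: f[5].to_string(),   // column 6: target name
        start: f[7].parse().ok()?, // column 8: target start
        end: f[8].parse().ok()?,   // column 9: target end
    };
    Some((query, target))
}
```

Since a position index like this expects the file to be sorted on the indexed coordinates, indexing both sides would presumably mean keeping two differently sorted copies of the same PAF.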

**Dimitris Kontopoulos** @DGKontopoulos@ecoevo.social · Apr 28, 2023 *

Delighted to share our new Science #EvolgenPaper! https://www.science.org/doi/10.1126/science.abn3107

We introduce #TOGA, a #ComparativeGenomics method that combines the detection of orthologous genes with gene annotation. In plain words, TOGA can take advantage of a well-annotated #genome and transfer its annotations to a genome of a different species (e.g., from the human genome to that of a squirrel). 1/2

This figure highlights how TOGA can integrate annotation and orthology inference based on intronic and intergenic alignments (among other things).

Using a high-quality genome as reference, TOGA can generate gene annotations, can identify orthologs/duplicated/lost genes, can produce codon alignments, as well as assembly quality benchmarks for genomes of query species.

**Chiara Bortoluzzi** @chiara_bortoluzzi@genomic.social · Apr 17, 2023

We hope that the comparative genomics analyses made available through this study will provide a route towards the application of genomics-informed conservation programmes across the great diversity of invertebrate species. A big thank you to all the amazing people from the Darwin Tree of Life Project who made this research possible! #biodiversity #conservation #comparativegenomics #PacBio #DarwinTreeofLife #Ensembl #emblebi #sangerinstitute #wellcometrust

**Dong-Ha Oh** @inspirace@genomic.social · Apr 6, 2023 *

A bit more detailed on how to navigate genome alignments on CGV or the Comparative Genome Viewer (and you can request to add alignments for any pairs of chromosome-level assemblies if they are on NCBI) (and they are not too far apart) (and with similar ploidy levels, for now) :) #comparativegenomics #NCBI
https://youtu.be/_TA86Tu1N0c

YouTube

Analyze Evolutionary Relationships Between Two Genomes using NCBI's Comparative Genome Viewer (CGV)

By National Library of Medicine

**Dong-Ha Oh** @inspirace@genomic.social · Feb 23, 2023 *

Comparative Genomics Viewer by #NCBI - the browsing experience is quite cool, and I guess more species pairs will be added (they are accepting requests) :) #comparativegenomics
https://ncbiinsights.ncbi.nlm.nih.gov/2023/02/22/cross-species-cgv/

NCBI Insights

Now Available! More Mammalian Cross-Species Alignments in the Comparative Genome Viewer (CGV)

In response to your feedback, we’ve made more whole genome cross-species alignments available in NCBI’s Comparative Genome Viewer (CGV). You can use these alignments to explore genome rearrangements between species. You can also zoom in to analyze regions of conserved gene synteny. There are over 20 new cross-species alignments available, including human-mouse, mouse-rat, human-chimp, human-cattle, …

**Anna Dewar** @AnnaDewar@ecoevo.social · Nov 21, 2022

#Introduction

I’m a Post-doc at the University of Oxford, working on the evolution of bacterial genomes and pangenomes. I’m interested in bacterial cooperation and ecology, and how these might interact with horizontal gene transfer.

I’m on the train to Manchester for the #MicroEvo22 meeting, where I’ll be presenting a poster (no. 17). Looking forward to seeing lots of you there!

#Bacteria #SocialEvolution #Plasmids

**Tanja Slotte** @tanjaslotte@fediscience.org · Nov 21, 2022

GERP scores are frequently used to classify and assess the prevalence of deleterious mutations, but what are the limitations of this approach and how sensitive is it to what species are included in the underlying alignment? I was wondering about this and found this nice paper from 2020 that I had missed at the time, and that investigates these issues using simulations.

Here's the paper by Huber, Kim and Lohmueller:

https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008827

journals.plos.org

Population genetic models of GERP scores suggest pervasive turnover of constrained sites across mammalian evolution

Author summary: One of the most significant and challenging tasks in modern genomics is to assess the functional consequences of a particular nucleotide change in a genome. A common approach to address this challenge prioritizes sequences that share similar nucleotides across distantly related species, with the rationale that mutations at such positions were deleterious and removed from the population by purifying natural selection. Our manuscript shows that one popular measure of sequence conservation, the GERP score, performs well at identifying selected mutations if mutations at a site were under selection across all of mammalian evolution. Changes in selection at a given site dramatically reduces the power of GERP to detect selected mutations in humans. We also combine population genetic models with the distribution of GERP scores at noncoding sites across the human genome to show that the degree of selection at individual sites has changed throughout mammalian evolution. Importantly, we demonstrate that at least 80 Mb of noncoding sequence under purifying selection in humans will not have extreme GERP scores and will likely be missed by modern comparative genomic approaches. Our work argues that new approaches, potentially based on genetic variation within species, will be required to identify deleterious mutations.

#ComparativeGenomics #PopulationGenetics #SMBEfolks

**Elisabeth Richardson** @elisabeth@ecoevo.social · Nov 20, 2022

My #Introduction: I'm Beth, a #NewPI and Assistant Professor of #Genetics at Mount Royal University in Calgary. I'm interested in #Bioinformatics, #Genomics, #CellBiology, #Microbiology, #eDNA, #MolecularEcology, #ComparativeGenomics, the #AthabascaOilSands and #Protists. Outside of work, I divide my time between video games, crafting, and being the least outdoorsy person within driving distance of the Canadian Rockies.

**John Lovell** @jlo_geno@genomic.social · Nov 9, 2022

#rnaseq #comparativegenomics folks ...
So, you have two species, each with its own reference genome. You want to compare #transcript abundance between the species using RNA-seq reads. What do you do?

67% Map to each, use only single-copy orthologs
17% Map to a single genome, compare all genes
17% Other (specify in comments)

**Mafalda Ferreira** @mafalda_f@ecoevo.social · Nov 7, 2022

Hey #ScienceMastodon,

Do you work on #Evolution #Adaptation #Speciation #Genomics #Phylogenetics #PopGen #ComparativeGenomics?

I WANT TO FOLLOW YOU!

Like this so that I can find you

**Dong-Ha Oh** @inspirace@genomic.social · Nov 4, 2022

#introduction I am a new Bioinformatics data wrangler and a contractor for NCBI, trying to help them to develop the Comparative Genomics Resources. Until earlier this year, I studied mostly genomes of "extremophyte" plants that thrive under harsh environments using #comparativegenomics approaches: e.g. to identify lineage(s)-specific modifications of gene copy numbers and gene regulatory networks among the "extremophytes" and their close relatives.
