HQ Team
August 9, 2023: UK researchers have unveiled a public database, called the “Unknome,” that compiles understudied proteins encoded in the human genome.
Although the existence of these thousands of proteins is known, their functions mostly are not.
Sequencing of the human genome revealed that its genes encode about 20,000 proteins, yet the functions of many of them remain uncharacterized.
Scientific research tends to focus on well-studied proteins, raising concern that poorly understood genes are being unjustifiably neglected.
To address this, the researchers developed a customizable Unknome database that ranks proteins based on how little is known about them.
Scarce research funding
Scientists tend to overlook these proteins for several reasons, including the tendency to concentrate scarce research funds on well-known targets.
Another factor is the lack of tools, such as antibodies, for probing the function of these proteins in cells.
The risks of turning a blind eye to these proteins are significant, the authors of the study stated.
“It was likely that some, perhaps many, play important roles in critical cell processes, and may both provide insight and targets for therapeutic intervention,” they wrote in the journal PLOS Biology.
The Unknome database is the work of Matthew Freeman of the Dunn School of Pathology, University of Oxford, England, and Sean Munro of the MRC Laboratory of Molecular Biology in Cambridge, England, and their colleagues.
Model organisms
“The role of thousands of human proteins remains unclear and yet research tends to focus on those that are already well understood,” Munro said.
“To help address this we created an Unknome database that ranks proteins based on how little is known about them, and then performed functional screens on a selection of these mystery proteins to demonstrate how ignorance can drive biological discovery.”
To promote faster exploration of neglected proteins, the authors designed the database to assign every protein a “knownness” score.
The database covers proteins from model organisms as well as those from the human genome.
It is open and customizable, allowing users to assign their own weights to its different elements and thereby generate custom knownness scores to prioritize their research.
Near-zero knownness
The score reflects what the scientific literature reports about a protein's function, conservation across species, subcellular compartmentalization, and other features.
By this measure, many thousands of proteins have a knownness score near zero.
To test the database's usefulness, the authors selected 260 human genes that have counterparts in the fruit fly and that scored one or less for knownness in both species, indicating that almost nothing was known about them.
“Despite decades of detailed study, there are thousands of fly genes that remain to be understood at even the most basic level, and the same is clearly true for the human genome.
“These uncharacterized genes have not deserved their neglect,” Munro said.
“Our database provides a powerful, versatile, and efficient platform to identify and select important genes of unknown function for analysis, thereby accelerating the closure of the gap in biological knowledge that the unknome represents.”