Metabolites/chemicals are prioritized by their publication score, defined as
(1 + log10(nTC) + log10((Sum(Cit/yr)+1)/10)) * (1 + log10(nU/nT) + log10(nU/nC))
where nTC is the number of PubMed publications related to both the topic and the metabolite/chemical (TC), Sum(Cit/yr) is the sum of the annualized citation counts of the TC publications, nU is the total number of PubMed publications, nT is the number of publications on the topic of interest, and nC is the number of publications on the metabolite/chemical of interest.
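As an illustration only, the score can be computed as in the following minimal Python sketch (the platform itself is implemented in Java and R, and this is not its code). The function name, argument names, and the choice to reject non-positive counts are assumptions made for this example; the formula does not say how a zero count is handled.

```python
import math


def publication_score(n_tc, sum_cit_per_year, n_u, n_t, n_c):
    """Compute the publication score from the five quantities defined above.

    n_tc             -- publications on both the topic and the metabolite/chemical
    sum_cit_per_year -- sum of annualized citation counts of those publications
    n_u              -- number of PubMed publications
    n_t              -- publications on the topic
    n_c              -- publications on the metabolite/chemical
    """
    # Assumption: log10 is undefined for zero, so non-positive counts are rejected.
    if min(n_tc, n_u, n_t, n_c) <= 0:
        raise ValueError("all publication counts must be positive")
    if sum_cit_per_year < 0:
        raise ValueError("the citation sum must be non-negative")

    # First factor: 1 + log10(nTC) + log10((Sum(Cit/yr) + 1) / 10)
    first_factor = 1 + math.log10(n_tc) + math.log10((sum_cit_per_year + 1) / 10)
    # Second factor: 1 + log10(nU/nT) + log10(nU/nC)
    second_factor = 1 + math.log10(n_u / n_t) + math.log10(n_u / n_c)
    return first_factor * second_factor


# Made-up inputs chosen so that every logarithm is a whole number:
# first factor = 1 + 1 + 1 = 3, second factor = 1 + 3 + 4 = 8.
example = publication_score(10, 99, 1_000_000, 1_000, 100)
```

With these made-up inputs the two factors are 3 and 8, so the score is 24.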
The numbers of PubMed publications and citations are retrieved with the eSummary tools, and the platform is implemented in Java and R.
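For readers unfamiliar with eSummary, the sketch below shows how a request URL for the NCBI E-utilities ESummary endpoint might be assembled in Python. It only builds the URL and makes no network call. The helper name, the JSON return mode, and the placeholder PubMed IDs are assumptions for illustration; the exact requests issued by the platform are not described here.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESummary endpoint.
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def build_esummary_url(pmids, retmode="json"):
    """Build an ESummary request URL for a list of PubMed IDs (hypothetical helper)."""
    if not pmids:
        raise ValueError("at least one PubMed ID is required")
    params = {
        "db": "pubmed",                            # query the PubMed database
        "id": ",".join(str(p) for p in pmids),     # comma-separated ID list
        "retmode": retmode,                        # response format
    }
    # safe="," keeps the commas in the ID list readable instead of encoding them.
    return f"{ESUMMARY_URL}?{urlencode(params, safe=',')}"


# Placeholder IDs, used only to show the shape of the URL.
url = build_esummary_url([123, 456])
```

The resulting string can be passed to any HTTP client; the response would then need to be parsed for the fields of interest.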
Kun-Hsing Yu, Tsung-Lu Michael Lee, Yu-Ju Chen, Christopher Ré, S. C. Kou, Jung-Hsien Chiang, Michael Snyder, Isaac S. Kohane. A Cloud-Based Metabolite and Chemical Prioritization System for the Biology/Disease-driven Human Proteome Project. J Proteome Res. 2018 Aug 10. doi: 10.1021/acs.jproteome.8b00378. [Epub ahead of print]
A related protein prioritization tool is available at http://rebrand.ly/proteinpurpose
K.-H. Y. is a Harvard Data Science Fellow. This work was supported in part by grants from the National Human Genome Research Institute, National Institutes of Health, grant number 5P50HG007735; the National Cancer Institute, National Institutes of Health, grant number 5U24CA160036; the Defense Advanced Research Projects Agency (DARPA) Simplifying Complexity in Scientific Discovery (SIMPLEX) grant number N66001-15-C-4043 and the Data-Driven Discovery of Models contract number FA8750-17-2-0095; and the Ministry of Science and Technology Research Grant, Taiwan, grant number MOST 103-2221-E-006-254-MY2.
The authors express their appreciation to Professor Griffin Weber for his insight on citation counts, Dr. Stephen Bach and Mr. Chen-Ruei Liu for their valuable advice on literature mining and suggestions on the manuscript, Dr. Mu-Hung Tsai for pointing out literature mining resources, and Ms. Samantha Lemos for her administrative support. The authors thank the AWS Cloud Credits for Research, the Microsoft Azure Research Award, and the NVIDIA Corporation for their support of the computational infrastructure. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) Bridges Pylon at the Pittsburgh Supercomputing Center through allocation TG-BCS180016, which is supported by the National Science Foundation, grant number ACI-1548562.