Big Chemical Encyclopedia


Text Mining for Scientific Information

Automatic text mining is an important source of knowledge, with many applications in building databases from the scientific literature, covering topics such as protein-disease associations, gene expression patterns, subcellular localization, and protein-protein interactions. [Pg.384]

The NLProt system developed by Mika and Rost combines four support vector machines (SVMs), each trained for a distinct task. The first SVM is trained to recognize protein names, whereas the second learns the environment in which a protein name appears. The third SVM is trained on both protein names and their environments. The outputs of these three SVMs, together with a score from a protein dictionary, are fed into the fourth SVM, which outputs the protein whose name was identified in the text. The dictionary of protein names was generated from SwissProt and TrEMBL, whereas the Merriam-Webster Dictionary was used as a source of common words. Other terms were added to the dictionary, such as medical terms, species names, and tissue [Pg.384]
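The data flow described above (three component scores plus a dictionary score, combined by a fourth decision stage) can be illustrated with a minimal Python sketch. This is not the NLProt implementation: simple hand-written scoring functions stand in for the trained SVMs, the final stage is a fixed linear decision function rather than a trained one, and every name, word list, weight, and threshold below is a hypothetical choice made only for illustration.

```python
from typing import List, Tuple

# Hypothetical toy word lists. The source builds its dictionaries from
# SwissProt/TrEMBL and the Merriam-Webster Dictionary; none of that is used here.
PROTEIN_NAMES = {"p53", "insulin", "hemoglobin"}
COMMON_WORDS = {"the", "binds", "to", "in", "protein", "cells", "was", "found"}
CONTEXT_CUES = {"protein", "binds", "kinase", "receptor", "expression"}


def name_score(tokens: List[str], i: int) -> float:
    """Stand-in for stage 1: looks only at the candidate token itself."""
    tok = tokens[i]
    score = 0.0
    if any(ch.isdigit() for ch in tok):
        score += 0.5
    if any(ch.isupper() for ch in tok):
        score += 0.5
    return score


def context_score(tokens: List[str], i: int, window: int = 2) -> float:
    """Stand-in for stage 2: looks only at the words around the candidate."""
    lo = max(0, i - window)
    neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
    hits = sum(1 for t in neighbours if t.lower() in CONTEXT_CUES)
    return min(1.0, hits / 2)


def joint_score(tokens: List[str], i: int) -> float:
    """Stand-in for stage 3: uses both the name and its environment."""
    return name_score(tokens, i) * context_score(tokens, i)


def dictionary_score(tokens: List[str], i: int) -> float:
    """Dictionary evidence: +1 for a known protein name, -1 for a common word."""
    tok = tokens[i].lower()
    if tok in PROTEIN_NAMES:
        return 1.0
    if tok in COMMON_WORDS:
        return -1.0
    return 0.0


# Hypothetical weights and bias for the final stage. In a trained system
# these would be learned from labeled text, not set by hand.
WEIGHTS: Tuple[float, ...] = (1.0, 1.0, 1.0, 2.0)
BIAS = -1.5


def combine(features: Tuple[float, ...]) -> float:
    """Stand-in for stage 4: a linear decision function over the four scores."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def tag_proteins(sentence: str) -> List[str]:
    """Return the tokens that the combined decision accepts as protein names."""
    tokens = sentence.split()
    found = []
    for i in range(len(tokens)):
        features = (
            name_score(tokens, i),
            context_score(tokens, i),
            joint_score(tokens, i),
            dictionary_score(tokens, i),
        )
        if combine(features) > 0:
            found.append(tokens[i])
    return found


print(tag_proteins("The p53 protein binds to DNA in cells"))  # ['p53']
```

In this toy setup "DNA" is rejected even though it is capitalized and appears near a context cue, because it has no dictionary support. That illustrates why combining several weak signals in a final stage can be more robust than relying on any single one.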



© 2024 chempedia.info