Text-mining Methods

Text-mining methods employ algorithms that use similarity-based functions to obtain the k nearest neighbors for novel query objects [32]. Term weighting is performed to measure the importance of a term in representing the information contained in the document [33]. For mining literature, the two most common approaches are ML-based and rule-based, though in practice a combination of approaches works best [34]. [Pg.421]
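As a rough illustration of term weighting combined with a similarity-based nearest-neighbor lookup, here is a minimal plain-Java sketch. The excerpt does not say which weighting scheme or similarity function is meant, so the TF-IDF weight (term count times log(N/df)) and cosine similarity used here are assumptions, and the class and method names (TfIdfKnn, weigh, kNearest) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class TfIdfKnn {
    // Naive tokenizer: lowercase, split on anything that is not a letter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Inverse document frequency log(N / df) for every term in the corpus.
    static Map<String, Double> idf(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            for (String t : new HashSet<>(tokenize(doc))) df.merge(t, 1, Integer::sum);
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            weights.put(e.getKey(), Math.log((double) docs.size() / e.getValue()));
        }
        return weights;
    }

    // TF-IDF vector of one text; terms unseen in the corpus are ignored.
    static Map<String, Double> weigh(String text, Map<String, Double> weights) {
        Map<String, Double> v = new HashMap<>();
        for (String t : tokenize(text)) {
            Double w = weights.get(t);
            if (w != null) v.merge(t, w, Double::sum); // each occurrence adds one idf
        }
        return v;
    }

    // Cosine similarity of two sparse vectors; 0 if either vector is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        for (double w : b.values()) nb += w * w;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Indices of the k corpus documents most similar to the query, best first.
    static List<Integer> kNearest(List<String> docs, String query, int k) {
        Map<String, Double> weights = idf(docs);
        Map<String, Double> q = weigh(query, weights);
        double[] sim = new double[docs.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            sim[i] = cosine(q, weigh(docs.get(i), weights));
            order.add(i);
        }
        order.sort((x, y) -> Double.compare(sim[y], sim[x]));
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }
}
```

Calling TfIdfKnn.kNearest(docs, query, k) returns document indices rather than the documents themselves, so the caller can attach whatever metadata the corpus carries. A term that occurs in every document receives weight zero under this scheme, which is the intended effect of the idf factor.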

A complete list with all the entries should be created. A snippet for creating a dictionary is given below. [Pg.422]

MapDictionary<String> dictionary = new MapDictionary<String>();
dictionary.addEntry(new DictionaryEntry<String>(token, label, CHUNK_SCORE)); [Pg.422]

The tokens and the labels are represented as feature vectors, that is, n-dimensional vectors of numerical features. This is the statistical representation of the input text [40]. The dictionary file can be compiled to binary or hexadecimal formats. This makes it difficult to interpret the file without proper readers, but such compiled files facilitate faster tagging of text. For reading a dictionary and using it to tag text, here is a snippet of code. [Pg.422]

MapDictionary<String> dictionary = (MapDictionary<String>) AbstractExternalizable.readObject(modelFile); [Pg.422]
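To make the idea of dictionary-based tagging concrete without depending on any particular library, here is a minimal plain-Java sketch. It is not the chunker used in the excerpt: the class DictionaryTagger, its methods, and the greedy longest-match strategy are all assumptions chosen for illustration, and the chunk score from the snippets above is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DictionaryTagger {
    // Normalized phrase -> label (hypothetical stand-in for a compiled dictionary).
    private final Map<String, String> entries = new HashMap<>();

    // Lowercase and split on anything that is not a letter or digit.
    private static String[] tokens(String text) {
        return text.toLowerCase().trim().split("[^a-z0-9]+");
    }

    void addEntry(String phrase, String label) {
        entries.put(String.join(" ", tokens(phrase)), label);
    }

    // Scan left to right; at each position try the longest phrase first.
    // Each hit is reported as "phrase/label".
    List<String> tag(String text) {
        String[] toks = tokens(text);
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < toks.length) {
            int matched = 0;
            for (int end = toks.length; end > i; end--) {
                String phrase = String.join(" ", Arrays.copyOfRange(toks, i, end));
                String label = entries.get(phrase);
                if (label != null) {
                    hits.add(phrase + "/" + label);
                    matched = end - i;
                    break;
                }
            }
            i += Math.max(matched, 1); // skip past a match, else advance one token
        }
        return hits;
    }
}
```

With entries "breast cancer" and "cancer" labeled DISEASE and "tamoxifen" labeled DRUG, tagging "Tamoxifen is used to treat breast cancer." yields tamoxifen/DRUG and breast cancer/DISEASE: the longer phrase wins over the embedded single word. Trying every span at every position is quadratic in the sentence length, which is acceptable for a sketch but is the kind of cost a compiled dictionary is meant to avoid.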

Using text-mining methods, retrieve the drugs associated with cancer... [Pg.445]

Text mining is the bread-and-butter method used by researchers on a daily basis [31, 32]. If you ask researchers what they really want from information management, you might be surprised how often they wish for a science Google to mine for data. As beautiful and simple as this paradigm sounds... [Pg.179]

One cannot expect text mining to produce accurate, final knowledge with no need for human review. However, the current state of the art can reduce the work of the human reader tremendously. The need is indisputable: the number of patents and articles has doubled in the last 10 years, but our methods for dealing with the flood remain unchanged. [Pg.155]

A general concern in data mining is the representation of objects. Molecules, text documents, images, nucleic acid, or protein sequences all represent nonnumerical objects. However, all data mining methods require the transformation of objects into an algebraic, i.e., numerical, representation. [Pg.676]
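One simple way to turn a nonnumerical object into a numerical vector can be sketched for nucleic acid sequences by counting overlapping k-mers. The excerpt names no specific encoding, so the k-mer count representation, the class name KmerVector, and the choice to skip windows containing symbols outside A, C, G, T are assumptions made for this example.

```java
class KmerVector {
    private static final String ALPHABET = "ACGT";

    // Count vector of length 4^k. The index of a k-mer is its value when
    // read as a base-4 number with A=0, C=1, G=2, T=3.
    static int[] encode(String sequence, int k) {
        int[] counts = new int[(int) Math.pow(ALPHABET.length(), k)];
        String s = sequence.toUpperCase();
        for (int start = 0; start + k <= s.length(); start++) {
            int index = 0;
            boolean valid = true;
            for (int j = 0; j < k; j++) {
                int digit = ALPHABET.indexOf(s.charAt(start + j));
                if (digit < 0) { // ambiguous or unknown symbol: skip this window
                    valid = false;
                    break;
                }
                index = index * ALPHABET.length() + digit;
            }
            if (valid) counts[index]++;
        }
        return counts;
    }
}
```

For k = 2 every sequence maps to a 16-dimensional vector regardless of its length, which is what makes objects of different sizes comparable by the algebraic methods the passage refers to. The dimension grows as 4^k, so larger k trades a finer description for a sparser, higher-dimensional vector.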

A limiting factor is that text analytics methods are largely confined to specific compounds exemplified in the patents, which are only a small portion of the theoretically possible chemical structures represented in the Markush claim. The improved access to searchable databases of Markush structures and the development of sophisticated chemoinformatics tools to efficiently mine and enumerate the potentially billions of claimed chemical structures are the next logical steps toward capturing the vast chemical space contained in the patent corpus [127, 128]. [Pg.27]

The development of Data Mining methods for different types of data such as multimedia and text data... [Pg.94]

Liu M, Hu Y and Tang B (2013) Role of text mining in early identification of potential drug safety issues. Methods Mol Biol. 1159 227-51. [Pg.10]

Phytochemicals derived from edible plants represent a remarkable source of bioactive compounds. In a recent study, Jensen et al. [41] performed a high-throughput analysis of phytochemicals in order to uncover associations between diet and health benefits using text mining and chemoinformatic methods. The first step of that study involved the extraction of associations between the terms of plants and phytochemicals, analyzing 21 million abstracts in PubMed/MEDLINE covering the period 1998-2012. This information was merged with the Chinese Natural Product Database and the Ayurveda dataset, which was also curated by the authors. The final dataset contained almost 37,000 phytochemicals. A remarkable outcome... [Pg.106]

An approach based on clustering methods has the problem that the cluster prototype is as high dimensional as the original data set, and additional visualization methods are needed to visualize the data. While this is generally not a problem, it becomes important when the data set is large and highly dimensional, as is the case with text mining problems. [Pg.252]