Big Chemical Encyclopedia



Data mining rules

Figure 2.1 The evolution of the drug-likeness concept. Drug-likeness evolved from empirical rules such as Lipinski's rule of 5, through more sophisticated data mining algorithms, to the utilization of preclinical profiling and safety pharmacology data [3].
Because rules can be extracted in such a straightforward fashion, neurofuzzy computing is rapidly becoming accepted as a useful data mining technique. Even with relatively small amounts of data, it can yield information that suggests which direction future experiments should take. [Pg.2405]

Furthermore, there are laws in both the United States and Europe that regulate data privacy, and in addition the FDA has separate rules on data integrity and traceability. All of these issues affect the way data are collected and mined, and how the results are used. [Pg.554]

Additionally, as errors can easily occur in databases, it cannot be assumed that the data they contain are entirely correct. Even after data cleaning - a process to remove obvious errors and duplicates - there may be inherent errors or misclassification in the data being collected, particularly if there is subjectivity involved in the measurement that is used. Furthermore, in large, constantly changing databases, there must be rules in place for the data mining algorithm to capture the most current data. [Pg.554]
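The cleaning step described above - dropping duplicates and obviously invalid entries before mining - can be sketched minimally as follows. The record fields, the duplicate key, and the validity range are all illustrative assumptions, not anything specified in the text.

```python
# Minimal data-cleaning sketch: drop exact duplicates and records with
# obviously invalid values before mining. Field names ("compound", "pH")
# and the pH validity range are illustrative assumptions.
def clean(records, valid_range=(0.0, 14.0)):
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec["compound"], rec["pH"])
        if key in seen:                      # remove duplicate entries
            continue
        lo, hi = valid_range
        if not (lo <= rec["pH"] <= hi):      # remove obvious errors
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

rows = [
    {"compound": "A", "pH": 7.4},
    {"compound": "A", "pH": 7.4},   # exact duplicate
    {"compound": "B", "pH": 99.0},  # physically impossible pH
    {"compound": "C", "pH": 2.1},
]
print(clean(rows))  # keeps only the A and C records
```

Note that this only catches *obvious* problems; as the passage stresses, subjective or inherently misclassified measurements survive such rule-based cleaning.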

Data mining of glycan array data to develop rules/classifiers that govern HA-glycan interactions... [Pg.273]

The rules to predict AlPO4-5 (AFI) synthesis are preliminarily built by data mining. The resulting rules involve six attributes, including the longest atomic distance > 0.496 nm, the second-longest distance < 0.765 nm, the ratio of the number of protons acceptable by the template to the number of N atoms < 8, and the formation enthalpy < 421.41 kJ/mol. The reliability (confidence) of the constraint is 178/190 = 93.7%, and its support is 190/549 = 34.6%. [Pg.432]
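The reliability and support figures quoted above are the standard confidence and support measures for a mined association rule: of the 549 templates in the database, the rule's conditions match 190, and 178 of those 190 actually lead to AFI synthesis. A small sketch of that arithmetic:

```python
# Sketch: reliability (confidence) and support of a mined rule,
# reproducing the arithmetic quoted in the text (178/190 and 190/549).
def rule_stats(n_rule_and_outcome, n_rule, n_total):
    confidence = n_rule_and_outcome / n_rule   # how often the rule is right
    support = n_rule / n_total                 # how often the rule applies
    return confidence, support

conf, supp = rule_stats(178, 190, 549)
print(f"confidence = {conf:.1%}, support = {supp:.1%}")
# → confidence = 93.7%, support = 34.6%
```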

Kretschmann, E., W. Fleischmann, and R. Apweiler. 2001. Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT. Bioinformatics 17:920-6. [Pg.283]

Different data cleansing, normalization, and curation rules may be developed and applied to the generic data repository and the specific data repositories. Derived data, such as data-mining outcomes and frequently requested query or search results, can be written back into the specific data repositories. This enables the derived data to be used as input for further processing, or in lieu of materialized query tables in the future, and also enables the derived data to be shared among other systems in a specific region or domain. [Pg.364]

There is more about data mining in Chapter 9, but there is an important reason for bringing it up here. When we focus on prediction, such as the chance of getting Alzheimer's disease or congestive heart failure, the mind/concept maps or semantic nets expressed in similar information-theoretic terms reduce to the same inference process as described above. This would be clearer to the statistically minded if a simple small table could show all rules, triples, and so on, especially as the technique becomes more complicated. The data could also be probabilities (in which case values are multiplied, not added), which brings us very close to an alternative technique called a Bayes's net, or Bayesian net, after the bishop who published his ideas in Philosophical Transactions back in 1763. [Pg.374]
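The point that probabilities multiply where information-theoretic values add can be shown in a few lines: taking negative logs turns a product of probabilities into a sum, which is why the two formulations reduce to the same inference process. The probability values below are purely illustrative.

```python
import math

# Sketch: chaining probabilities versus adding information content.
# In probability form successive factors multiply; in log (information)
# form they add - the same inference, two equivalent bookkeepings.
p_factors = [0.9, 0.8, 0.5]   # illustrative conditional probabilities

p_chain = math.prod(p_factors)                     # multiply probabilities
info_sum = sum(-math.log2(p) for p in p_factors)   # add information (bits)

print(p_chain)            # 0.36
print(2 ** (-info_sum))   # same value recovered from the additive form
```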

The modern way to get rules into an Expert System is not by human experts, but by data mining. [Pg.571]
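As a minimal illustration of mining IF-THEN rules from data rather than eliciting them from experts, the sketch below uses the simple OneR approach: for each attribute, build a rule mapping each attribute value to its majority class, and keep the attribute whose rule makes the fewest errors. This is only in the spirit of algorithms such as C4.5 cited elsewhere on this page, not an implementation of them; the toy drug-likeness data are invented for the example.

```python
from collections import Counter, defaultdict

# OneR-style sketch: pick the single attribute whose value -> majority-class
# rule makes the fewest errors. Toy, invented data; not C4.5.
data = [
    {"logP": "low",  "MW": "low",  "druglike": "yes"},
    {"logP": "low",  "MW": "high", "druglike": "yes"},
    {"logP": "high", "MW": "low",  "druglike": "no"},
    {"logP": "high", "MW": "high", "druglike": "no"},
]

def one_r(rows, target):
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        counts = defaultdict(Counter)          # attr value -> class counts
        for r in rows:
            counts[r[attr]][r[target]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(1 for r in rows if rule[r[attr]] != r[target])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

attr, rule, errors = one_r(data, "druglike")
print(attr, rule, errors)
# → logP {'low': 'yes', 'high': 'no'} 0
```

In this toy set logP alone separates the classes perfectly, so the mined rule set is "IF logP is low THEN druglike, ELSE not druglike" with zero training errors.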

Selection of the data in the target database. The data stored in the primary source databases are often collected by different users using different automated methods and business rules. As a consequence, the quality of the data is not the same for all records in the database, and the data may be contaminated. Depending on the goal of the data mining process and method, the data in the target database should be cleaned first and obvious inconsistencies between data points should be resolved. [Pg.672]
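One way to resolve the kind of inconsistency described above - the same entity reported with different values by different sources - is to reconcile values that agree within a tolerance and flag the rest for review rather than silently averaging them. The entity names, values, and tolerance below are illustrative assumptions.

```python
from statistics import median

# Sketch: reconciling records for one entity collected by several sources.
# Values agreeing within `tol` are collapsed to their median; entities
# whose sources disagree badly are flagged (None) for manual review.
measurements = {
    "compound_A": [7.2, 7.3, 7.25],   # sources consistent
    "compound_B": [2.1, 9.8],         # sources disagree badly
}

def resolve(values, tol=0.5):
    if max(values) - min(values) > tol:
        return None                    # unresolved: needs review
    return median(values)

resolved = {k: resolve(v) for k, v in measurements.items()}
print(resolved)  # {'compound_A': 7.25, 'compound_B': None}
```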





