Data Mining Survey

Many data mining methods are available, and many of them are highly sophisticated algorithms that draw on advanced techniques from computer science and artificial intelligence. However, simple and intuitive methods often work well, with little loss in predictive ability. With small datasets, where the focus is on developing interpretable models, these simple methods may be the best first approach, perhaps as part of a deliberate effort to explore the data. In any case, it is useful to have a benchmark result against which the performance of more complex, computationally expensive and harder-to-interpret methods can be compared. The simplest naive model predicts the mean of the dataset for every case. This is in effect prediction without a model, and it serves as a useful reality check and comparator.
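As a minimal sketch of the mean-prediction benchmark, the Python snippet below compares a "no model" predictor (the training-set mean) with a straight-line fit. The synthetic data, the 40/20 train/test split and the use of RMSE as the error measure are all assumptions made for illustration. A candidate model earns its extra complexity only if its error is clearly below the baseline's.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical data: one descriptor x and a measured response y
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=60)
y = 2.0 * x + rng.normal(0.0, 1.0, size=60)
x_train, x_test = x[:40], x[40:]
y_train, y_test = y[:40], y[40:]

# Naive benchmark: predict every test value by the training-set mean (no model at all)
baseline_rmse = rmse(y_test, np.full_like(y_test, y_train.mean()))

# Candidate model: least-squares straight line fitted on the training set only
slope, intercept = np.polyfit(x_train, y_train, deg=1)
model_rmse = rmse(y_test, slope * x_test + intercept)

print(f"mean baseline RMSE = {baseline_rmse:.2f}, linear model RMSE = {model_rmse:.2f}")
```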

Simple rule induction methods can also perform well compared with more sophisticated ones, particularly if the descriptors or attributes are well chosen. A rule is simple to express and easy to interpret.
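The text does not name a specific rule inducer, so the sketch below uses Holte's 1R ("one rule") method as an illustrative stand-in: it builds one rule per attribute, mapping each attribute value to its majority class, and keeps the attribute whose rule makes the fewest training errors. The descriptor names (`logP_band`, `aromatic_ring`) and the six compounds are hypothetical, and numeric descriptors are assumed to have been discretized beforehand.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """1R: for each attribute, map each of its values to the majority class,
    then keep the attribute whose rule makes the fewest errors on the training data."""
    best = None
    for a in range(len(rows[0])):
        counts = defaultdict(Counter)  # attribute value -> class frequencies
        for row, label in zip(rows, labels):
            counts[row[a]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

# Hypothetical, already-discretized descriptors for six compounds
attribute_names = ["logP_band", "aromatic_ring"]
rows = [("high", "yes"), ("high", "no"), ("low", "yes"),
        ("low", "no"), ("high", "yes"), ("low", "yes")]
labels = ["active", "active", "inactive", "inactive", "active", "active"]

attr, rule, errors = one_r(rows, labels)
for value, cls in rule.items():
    print(f"IF {attribute_names[attr]} = {value} THEN {cls}")
print(f"training errors: {errors}/{len(rows)}")
```

The resulting rule reads directly as a sentence, which is the interpretability advantage described above.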

These simple rule induction and probability methods are straightforward to use and understand, and can often outperform more sophisticated approaches. However, they can be skewed by redundant variables, by dependencies between attributes, and by non-normal distributions of attribute values. These problems can be overcome by carefully selecting a subset of attributes and by using other estimation methods that give a more appropriate distribution for each attribute, although this is probably not necessary for most analyses.
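Assuming the "probability methods" referred to here are naive Bayes-type classifiers, the sketch below shows where the weaknesses just described come from. Each numeric attribute is modelled by an independent normal distribution within each class, and the per-attribute log-likelihoods are simply added. A redundant copy of an attribute would therefore have its evidence counted twice, and a strongly non-normal attribute is poorly summarized by its mean and variance. The data and function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Per class: prior probability plus a mean and variance for every attribute."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # Small floor keeps a constant attribute from giving zero variance
        variances = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def predict(model, row):
    scores = {}
    for label, (prior, means, variances) in model.items():
        # Independence ("naive") assumption: attribute log-likelihoods are simply summed
        scores[label] = math.log(prior) + sum(
            log_normal_pdf(x, m, v) for x, m, v in zip(row, means, variances))
    return max(scores, key=scores.get)

# Hypothetical training data: two numeric descriptors per compound
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 4.0], [3.2, 4.1], [2.9, 3.8]]
y = ["inactive"] * 3 + ["active"] * 3

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 4.0]))
```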

Kernel methods, which include support vector machines and Gaussian processes, map the data into a higher-dimensional space in which one or more hyperplanes can be constructed to separate classes or to perform regression. These methods have a more rigorous mathematical basis than neural networks and have been widely used in QSAR modeling in recent years.
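To illustrate the "higher-dimensional space" idea, the sketch below uses a degree-2 polynomial kernel, chosen here for simplicity (support vector machines and Gaussian processes often use other kernels, such as the Gaussian/RBF kernel). It checks that the kernel value equals an inner product in an explicit 3-D feature space, and shows that points inside and outside a circle, which no straight line can separate in 2-D, are separated by a plane in that space. The points are hypothetical.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit map into 3-D feature space whose inner product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# The kernel gives the feature-space inner product without ever computing phi
x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z), dot(phi(x), phi(z)))  # equal up to floating-point rounding

def side_of_plane(pt):
    """Linear decision rule in feature space: phi1 + phi3 = x1^2 + x2^2 compared with 1."""
    f = phi(pt)
    return "inside" if f[0] + f[2] < 1.0 else "outside"

# Inside vs outside the unit circle: not linearly separable in 2-D, separable in 3-D
points = [(0.2, 0.1), (-0.3, 0.4), (1.5, 0.0), (-1.0, -1.2)]
labels = [side_of_plane(p) for p in points]
print(labels)
```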

Gaussian process modelling relies on a similar functional transformation of the data and has also begun to be adopted by QSAR practitioners. It is thought to be particularly good at building models with little expert supervision because it offers statistically principled approaches to both model discovery and validation; it is therefore well suited to automated QSAR model discovery and updating.
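A minimal numpy sketch of Gaussian process regression follows, assuming a zero-mean prior, a squared-exponential (RBF) covariance and Gaussian observation noise; the function names, hyperparameter values and sine-curve data are hypothetical. It uses the standard Cholesky-based formulation. The log marginal likelihood it returns lets hyperparameters such as the length scale be compared using the training data alone, which is one reason the approach suits automated model building.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return signal_var * np.exp(-0.5 * sq / length_scale**2)

def gp_regression(X, y, X_new, length_scale=1.0, signal_var=1.0, noise_var=1e-2):
    """Zero-mean GP regression: posterior mean and variance at X_new, plus log marginal likelihood."""
    K = rbf_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    K_s = rbf_kernel(X, X_new, length_scale, signal_var)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v**2, axis=0)  # k(x*, x*) = signal_var for this kernel
    log_ml = (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
              - 0.5 * len(X) * np.log(2.0 * np.pi))
    return mean, var, log_ml

# Hypothetical 1-D data set
X = np.linspace(0.0, 5.0, 8)[:, None]
y = np.sin(X).ravel()
X_new = np.array([[2.5], [20.0]])

# Compare candidate length scales by log marginal likelihood, using only the training data
for ls in (0.3, 1.0, 3.0):
    _, _, lml = gp_regression(X, y, X_new, length_scale=ls)
    print(f"length_scale={ls}: log marginal likelihood = {lml:.2f}")

mean, var, _ = gp_regression(X, y, X_new, length_scale=1.0)
print(mean, var)  # far from the data (x = 20) the prediction reverts to the prior
```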

Matthias Schonlau is Head of the RAND Statistical Consulting Service. His research interests include computer experiments, data mining, web surveys, and data visualization.

Berkhin, P. (2002). Survey of clustering data mining techniques. Available at http://www.stats.ox.ac.uk/~mercer/documents/Transfer.pdf, accessed March 6, 2008.

Ferreira de Oliveira, M. C., and Levkowitz, H. (2003). From visual data exploration to visual data mining: A survey. IEEE Trans. Visualizat. Comput. Graphics, 9(3), 378-394.

Overall, however, the major application area for Data Mining is still Business. For example, a recent survey of Data Mining software tools (Fig. 2) showed that more than three-quarters (80%) are used in business applications, primarily in areas such as finance, insurance, marketing and market segmentation. Around half of the vendor tools surveyed were suited to Data Mining in medicine and industrial applications, whilst a significant number are most useful in scientific and engineering fields.

In this overview section, both traditional statistical methods and the more recent machine-learning methods are briefly surveyed. Only one approach is excluded here: artificial neural networks, the main tool for data analysis and data mining of catalytic materials, to which two of the remaining chapters are devoted.

The importance of a coal deposit depends on the amount that is economically recoverable by conventional mining techniques. The world's total recoverable reserves of lignitic coals were 3.28 x 10 metric tons at the end of 1990 (3), of which ca. 47% was economically recoverable as of 1994 (Table 4). These reserve estimates change as geological survey data improve and as the resources are developed.

Table II summarizes analytical data for dissolved inorganic matter in a number of natural water sources (J3, 9, J 9, 20, 21). Because rainwater interacts with soil and surface minerals, the waters in lakes, rivers and shallow wells (<50 m) differ markedly from rain and vary considerably from one location to another. Nevertheless, the table gives a useful picture of how the composition of natural water changes along the sequence rain → surface water → deep bedrock water in a granitic environment. Changes with depth may be considerable, as illustrated by the Stripa mine studies (22) and other recent surveys (23). Typical changes are an increase in pH coupled with a decrease in total carbonate, a decrease in O2 coupled with a decrease in Eh, and an increase in dissolved inorganic constituents. The total salt concentration can vary by a factor of 10-100 with depth in the same borehole because of strata containing relict sea water. Pockets of such water seem to be common in Scandinavian granite at depths greater than 100 m.

The country-wide dataset of stream sediment analyses in Austria consists of 36,136 samples analyzed for 34 chemical elements (Fig. 1; Thalmann et al. 1989). Complemented by local surveys of hydrochemistry, whole-rock geochemistry, soil chemistry and mineralogical phase analyses, these data are used to derive natural background levels for different rock units, to investigate chemical fluxes between soil, rock and groundwater, and to evaluate the emission risks of historical mine waste.
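The text does not say how the background levels are derived. As one hedged illustration, the sketch below groups samples by rock unit and computes a robust upper threshold, median + 2 × MAD (median absolute deviation, scaled by 1.4826), a rule sometimes used in exploration geochemistry. Both the rule and the concentration values are assumptions, not the method of Thalmann et al. Samples above their unit's threshold are flagged for follow-up, for example as possible mine-waste influence.

```python
import statistics
from collections import defaultdict

def background_by_unit(samples, k=2.0):
    """For each rock unit: median concentration and an upper background threshold
    median + k * MAD, with MAD scaled by 1.4826 (comparable to a standard deviation
    for normally distributed data). This thresholding rule is an illustrative assumption."""
    groups = defaultdict(list)
    for unit, value in samples:
        groups[unit].append(value)
    result = {}
    for unit, values in groups.items():
        med = statistics.median(values)
        mad = 1.4826 * statistics.median(abs(v - med) for v in values)
        result[unit] = (med, med + k * mad)
    return result

# Hypothetical concentrations (ppm) of one element in stream sediments, tagged by rock unit
samples = [("granite", 20.0), ("granite", 25.0), ("granite", 22.0), ("granite", 90.0),
           ("limestone", 5.0), ("limestone", 6.0), ("limestone", 7.0)]

background = background_by_unit(samples)
flagged = [(unit, v) for unit, v in samples if v > background[unit][1]]
print(background)
print("above background:", flagged)
```

Real geochemical data are often right-skewed, so in practice the same calculation might be applied to log-transformed values.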

Quirt, D. H. (1985). Lithogeochemistry of the Athabasca Group: Summary of sandstone data. In Summary of Investigations 1985, Saskatchewan Geological Survey, Saskatchewan Energy and Mines, Miscellaneous Report 85-4, 128-132.

According to a literature survey by Shahalam [28], the content of various chemicals in naturally mined phosphate rock varies widely with location, as shown in Table 1. For instance, mineralogical and chemical analyses of low-grade hard phosphate from the different mined beds of phosphate rock in the Rusaifa area of Jordan show that the phosphates are of three main types: carbonate, siliceous, and silicate-carbonate. Phosphate deposits in this area occur in four distinct layers, of which the two deepest, the first and second beds (about 3 m and 3.5 m thick, respectively, at depths of about 20 to 30 m), appear suitable for mining that is cost-effective at present. A summary of the chemical analyses of the ores is given in Table 2 [28].

Hemingway, B. S., Seal, R. R., and Chou, I-M. (2002). Thermodynamic Data for Modeling Acid Mine Drainage Problems: Compilation and Estimation of Data for Selected Soluble Iron-Sulfate Minerals. United States Geological Survey Open-File Report 02-161.

Very few data have been reported on the elemental analysis of whole coal, and of mine dusts in particular. Kessler, Sharkey, and Friedel analyzed trace elements in coal from mines in 10 coal seams located in Pennsylvania, West Virginia, Virginia, Colorado, and Utah (5). They determined sixty-four elements at concentrations ranging from 0.01 to 41,000 ppm by weight. Several previously published surveys reported the concentrations of minor elements in coal ashes rather than determining them directly in whole coals or mine dusts. Previous investigations include studies by Headlee and Hunter (6), Nunn, Lovell, and Wright (7), Abernethy, Peterson, and Gibson (8), and others (9, 10, 11, 12).

According to Cook (Ref 3), the first field trial of Al-sensitized slurries was made in Canada, and the first commercial slurry, Hydromex, was marketed in 1958 by Canadian Industries Ltd (CIL). Shortly thereafter, slurries were successfully tried on Mesabi taconite ores. By the end of 1960, CIL, IRECO, DuPont and Hercules were all marketing slurry explosives. Since then, the commercial use of slurry explosives (SE) and slurry blasting agents (SBA) has grown rapidly and continuously, as shown in Table 1 with data taken from the Mineral Industry Surveys of the US Bureau of Mines.

Big Chemical Encyclopedia
