Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


Data preparation

My co-citation analysis proceeds in three steps, which will be described in the course of this chapter (Chen et al. 2010): data collection, data preparation (construction and visualization of the matrix), and identification and interpretation of clusters. [Pg.16]

In the next phase, the co-citation matrix needed to be constructed and visualized. For the construction of the matrix, references were extracted either automatically (ISI) or manually (EBSCO) for each publication included in the analysis. As the entries in the reference lists included different citation styles or misspellings, all references needed to be normalized. I then excluded those references which were not part of the dataset, i.e. those which were not included in the 111 papers. By this procedure many references were excluded from the analysis, but the matrix could be constructed more efficiently. The focus of the citation analysis was on the connection of papers within the field of user innovation rather than on its epistemological foundations (cf. Raasch et al. forthcoming). Thus it was sufficient to include only papers that focus on the topic of user innovation, and not its intellectual ancestry. In the next step I constructed the citation matrix. This 111x111 matrix included all articles and indicated which other publications from the pool were cited. Publications that were not cited at all and did not co-cite others were deleted from the matrix, as they were obsolete for the analysis. This yielded a list of 100 publications. In the last step the co-citation matrix was built. I excluded papers that were not co-cited (46 publications) from the analysis. The closeness measure used in this study (CoCit score) was calculated. The CoCit score of two documents (A and B) ranges from 0 to 1 and can be calculated as follows (Gmür 2003). [Pg.16]
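The excerpt stops short of the formula itself. A common statement of the Gmür (2003) CoCit score normalizes the squared co-citation count by the minimum and the mean of the two documents' citation counts; the sketch below assumes that form and is not taken from the excerpt:

```python
def cocit_score(co_citations, cites_a, cites_b):
    """CoCit score of documents A and B, assuming the form commonly
    attributed to Gmür (2003):
        CoCit = c_ab**2 / (min(c_a, c_b) * mean(c_a, c_b))
    where c_ab is the co-citation count and c_a, c_b are the total
    citation counts of A and B within the dataset."""
    if co_citations == 0:
        return 0.0
    mean_cites = (cites_a + cites_b) / 2
    return co_citations ** 2 / (min(cites_a, cites_b) * mean_cites)
```

Under this form, two documents that are always cited together (c_ab = c_a = c_b) score 1, and documents never co-cited score 0, matching the stated 0-to-1 range.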

The final, symmetric matrix included 54 publications with corresponding CoCit scores. Appendix 1 gives an overview of the papers included in the matrix. [Pg.17]


Data mining is the core of the more comprehensive process of knowledge discovery in databases (KDD). However, the term "data mining" is often used synonymously with KDD. KDD describes the process of extracting and storing data and also includes methods for data preparation such as data cleaning, data selection, and data transformation, as well as evaluation, presentation, and visualization of the results after the data mining process. [Pg.472]

Figure 10.4-3. Work flow for virtual screening, from data preparation to finding new leads.
Preparation of the data. Preparation of the initial coordinates (adding hydrogen atoms, minimization) and assignment of initial velocities. [Pg.51]

Figure 9-31 A. Comparison of HETP for No. 2 Nutter Rings and Pall rings in a system at 24 psia and 5 psia using the FRI tubed drip pan distributor. Data prepared and used by permission of Nutter Engineering, Harsco Corp. and by special permission of Fractionation Research, Inc. all rights reserved.
Using these data, prepare these plots and determine which one is a straight line. [Pg.297]

This chapter describes the key clinical data preparation issues and the different classes of clinical data found in clinical trials. Each class of data brings with it a different set of challenges and special handling issues. Sample case report form (CRF) pages are provided with each type of data to aid you in visualizing what the data look like. The key data preparation issues presented are concepts that apply universally across the various classes of clinical trial data. [Pg.20]

The purpose of an Exposure Route and Receptor Analysis is to provide methods for estimating individual and population exposure. The results of this step combined with the output of the fate models serve as primary input to the exposure estimation step. Unlike the other analytic steps, the data prepared in this step are not necessarily pollutant-specific. The two discrete components of this analysis are (1) selection of algorithms for estimating individual intake levels of pollutants for each exposure pathway and (2) determination of the regional distribution of study area receptor populations and the temporal factors and behavioral patterns influencing this distribution. [Pg.292]

The above two objectives, data examination and preparation, are the primary focus of this section. For data examination, two major techniques are presented: the scattergram and Bartlett's test. Likewise, for data preparation (with the issues of rounding and outliers having been addressed in a previous chapter), two techniques are presented: randomization (including a test for randomness in a sample of data) and transformation. Exploratory data analysis (EDA) is presented and briefly reviewed later. This is a broad collection of techniques and approaches to probe data, that is, to both examine and to perform some initial, flexible analysis of the data. [Pg.900]
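Bartlett's test mentioned above checks homogeneity of variances across k groups. A minimal pure-Python sketch of the test statistic (chi-squared with k - 1 degrees of freedom under the null hypothesis; the sample data in the test are hypothetical):

```python
import math
import statistics

def bartlett_statistic(*groups):
    """Bartlett's test statistic for homogeneity of variances across k
    groups; under H0 it follows chi-squared with k - 1 degrees of freedom."""
    k = len(groups)
    ns = [len(g) for g in groups]
    N = sum(ns)
    variances = [statistics.variance(g) for g in groups]  # sample variances
    # Pooled variance across all groups.
    sp2 = sum((n - 1) * v for n, v in zip(ns, variances)) / (N - k)
    num = ((N - k) * math.log(sp2)
           - sum((n - 1) * math.log(v) for n, v in zip(ns, variances)))
    # Correction factor.
    c = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (N - k)) / (3 * (k - 1))
    return num / c
```

Groups with identical variances yield a statistic of zero; a large value relative to the chi-squared critical value signals heteroscedasticity, one of the conditions a transformation in the data preparation step may be meant to repair.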

To use the ECES system, activity coefficient data for FeCl2 had to be developed. A recent paper by Susarev et al (15) presented experimental results for the vapor pressure of water over ferrous chloride solutions at temperatures from 25 to 100°C and concentrations of 1 to 4.84 molal. These data were entered into the ECES system in the Data Preparation Block with a routine VAPOR designed to regress such data and develop the interaction coefficients B, C, D of our model. These results replaced an earlier entry which was based on more limited data. All other data for studying the equilibria in the FeCl2-HCl-H2O system were already contained within the ECES system. [Pg.242]
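The VAPOR routine itself is internal to ECES and not shown in the excerpt. Purely as an illustration of the regression step, a least-squares fit of hypothetical data on molality with three interaction coefficients B, C, D might look like the following (the cubic basis is an assumption for the sketch, not the actual ECES model):

```python
def fit_interaction_coefficients(molalities, values):
    """Least-squares fit of y = B*m + C*m**2 + D*m**3.
    The cubic-in-molality basis is an illustrative assumption only."""
    X = [[m, m ** 2, m ** 3] for m in molalities]
    # Normal equations: (X^T X) b = X^T y.
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(X, values)) for i in range(3)]
    A = [row[:] + [b] for row, b in zip(XtX, Xty)]  # augmented matrix
    n = 3
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        coeffs[r] = (A[r][n] - sum(A[r][c] * coeffs[c]
                                   for c in range(r + 1, n))) / A[r][r]
    return tuple(coeffs)  # (B, C, D)
```

With exact model-generated data the fit recovers the generating coefficients, which is a convenient sanity check before regressing real vapor-pressure measurements.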

The schemes described are useful for collecting experimental data, preparing crucial experimental work and giving some predictions, at least for selected groups of...

Another important aspect of data preparation for PCA is scaling. The PCA results will change if the original (mean-centered) data are taken or if the data were, for instance, autoscaled first. Figure 3.7 (left) shows mean-centered data... [Pg.79]
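The two scalings contrasted above, mean-centering versus autoscaling, can be sketched per variable as follows (column data in the test are hypothetical; stdlib only):

```python
import statistics

def mean_center(col):
    """Subtract the column mean; variances are left unchanged."""
    mu = statistics.fmean(col)
    return [x - mu for x in col]

def autoscale(col):
    """Mean-center, then divide by the sample standard deviation, so that
    every variable enters the PCA with unit variance."""
    mu = statistics.fmean(col)
    sd = statistics.stdev(col)
    return [(x - mu) / sd for x in col]
```

After mean-centering alone, a variable measured on a large scale still dominates the covariance matrix and hence the first principal component; after autoscaling, all variables contribute on an equal footing, which is why the PCA results change between the two preparations.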

Brodzinsky R, Singh HB. 1982. Volatile organic chemicals in the atmosphere: An assessment of available data. Prepared under Contract No. 68-02-3452 for U.S. Environmental Protection Agency, Environmental Sciences Research Lab, ORD, Research Triangle Park, NC, 16-18. [Pg.205]

The following sections describe the methodology for each of the three major phases of the study described above, beginning with "Issues Analysis", proceeding to aggregation and data preparation, and culminating with a discussion of the sequence comparison analyses. [Pg.91]

Data validation is very labour-intensive and expensive and is, at best, a coarse screen. A positive solution to the problem is on-line data capture, in which case both the data preparation stage and the need to recheck calculations are obviated. However, interaction by the analyst is still advised so that the data are validated prior to acceptance. [Pg.78]

Returning to the mainstream discussion of data preparation, we note that, for a 6-dB-per-octave-rolloff RC filter network in a lock-in amplifier, the continuous scan rate amounts to approximately one time constant per data point or 10 time constants per resolution element (Blass, 1976a). Some time is saved if only six data points are taken per resolution element. We have tried acquiring in this fashion, with no visible negative effects. [Pg.180]
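The timing trade-off described above, one time constant per data point and ten per resolution element versus six points per element, amounts to simple arithmetic; a sketch with a hypothetical 0.3 s time constant:

```python
def scan_time_per_element(time_constant_s, points_per_element):
    # One filter time constant of dwell per data point (Blass, 1976a).
    return time_constant_s * points_per_element

full = scan_time_per_element(0.3, 10)    # 10 points per resolution element
reduced = scan_time_per_element(0.3, 6)  # 6 points per resolution element
savings = 1 - reduced / full             # fractional scan time saved
```

Dropping from ten to six points per resolution element cuts the scan time per element by 40%, consistent with the "some time is saved" observation.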

Data preparation and quality control are a key step in applying Free-Wilson methodology to model biological data. Care must be taken to ensure the underlying data comply with the F-W additivity assumption. [Pg.107]
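The F-W additivity assumption, that activity is a constant plus independent substituent contributions, can be illustrated with a balanced, fully crossed set of hypothetical compounds, where each (centered) contribution is recovered as a group mean minus the grand mean:

```python
from itertools import product
from statistics import fmean

# Hypothetical additive substituent contributions (illustration only).
base = 5.0
r1_contrib = {"H": 0.0, "Cl": 0.4, "CH3": 0.2}
r2_contrib = {"H": 0.0, "NO2": -0.3}

# Balanced, fully crossed "data set" built to satisfy F-W additivity exactly.
data = [(r1, r2, base + r1_contrib[r1] + r2_contrib[r2])
        for r1, r2 in product(r1_contrib, r2_contrib)]

grand = fmean(a for _, _, a in data)
# Centered contribution estimates: group mean minus grand mean.
est_r1 = {s: fmean(a for r1, _, a in data if r1 == s) - grand
          for s in r1_contrib}
est_r2 = {s: fmean(a for _, r2, a in data if r2 == s) - grand
          for s in r2_contrib}
```

For data that really are additive, grand mean plus the two estimated contributions reproduces every activity exactly; systematic residuals in real data are a sign the additivity assumption is violated, which is what the quality-control step is meant to catch.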

Kessler asserted, FDA would be forced to approve new drugs using summaries of safety data prepared by drug companies. Untrue. The bill would have allowed FDA experts to depend on condensed, tabulated, or summarized data (when considered adequate) rather than reviewing the voluminous raw data from clinical trials, often running to hundreds of thousands of pages. In all cases, agency reviewers would have had access to additional materials as well, and could have obtained them by a simple request from an FDA supervisory official. [Pg.75]

Data preparation began by excluding any possibly unreliable and irrelevant data from the set of 33 elements. The heavy mineral solution could have imparted excess sodium (Na) and was thus ignored. The mortar and pestle used could have contaminated the aluminum (Al). Based on the hardness of the material and its relative contribution to the sample, this is likely not a significant problem; however, the role of Al was monitored closely during the data analysis. Previous literature on the geochemistry of specularite (10, 11) and preliminary... [Pg.467]

Data preparation, specular hematite source fingerprinting by INAA, 467-468... [Pg.560]

Fixed cost effects are included in most production network design models but scale and scope effects related to variable costs and learning curve effects lead to concave cost functions (cf. Cohen and Moon 1990, p. 274). While these can be converted into piecewise linear cost functions, model complexity increases significantly both from a data preparation perspective (see Anderson (1995) for an approach to measure the impact on manufacturing overhead costs) and the mathematical solution process. Hence, most production network design models assume linear cost functions ignoring scale and scope effects related to variable costs. [Pg.77]
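The piecewise linearization mentioned above can be sketched as follows. A full network design model would add binary or SOS2 variables per segment, which this illustration (using a hypothetical concave cost function) omits:

```python
def piecewise_points(f, breakpoints):
    """Sample the cost function at the chosen breakpoints."""
    return [(b, f(b)) for b in breakpoints]

def pl_eval(points, q):
    """Evaluate the piecewise linear approximation at quantity q by
    interpolating along the chord of the enclosing segment."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= q <= x1:
            t = (q - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("q outside breakpoint range")

def cost(q):
    # Hypothetical concave production cost exhibiting economies of scale.
    return 100 * q ** 0.7

pts = piecewise_points(cost, [0, 100, 250, 500, 1000])
# For a concave cost, every chord lies below the true curve, so the
# approximation never overestimates cost between breakpoints.
```

Adding breakpoints tightens the approximation but adds segments, each of which costs extra variables and constraints in the optimization model; this is the data preparation and complexity trade-off the excerpt refers to.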

Optimization models are capable of evaluating all possible plant-site combinations simultaneously. However, this approach would considerably increase the data preparation efforts required since for each plant-site combination both investment and operating expenditures would have to be estimated. Additionally, calculation times increase significantly with the number of alternative investment opportunities. [Pg.176]

The final pre-processing step is data preparation for the evaluation. The particular operations are reliant on the chosen evaluation algorithm. Generally, a baseline... [Pg.165]

Typesetting: Data prepared by SPi using a Springer LaTeX macro package [Pg.449]


See other pages where Data preparation is mentioned: [Pg.137]    [Pg.160]    [Pg.300]    [Pg.8]    [Pg.50]    [Pg.8]    [Pg.218]    [Pg.438]    [Pg.443]    [Pg.78]    [Pg.130]    [Pg.11]    [Pg.361]    [Pg.349]    [Pg.266]    [Pg.268]    [Pg.333]    [Pg.467]    [Pg.270]    [Pg.190]    [Pg.36]    [Pg.36]    [Pg.17]   





Biological Sample Preparation and Modes of Data Collection

Data Preparation and Univariate Aspects

Data analysis cell preparation

Data analysis lower surface preparation

Data analysis preparation

Data preparation, model specification and residual checking

Direct methods data preparation

Dopamine data preparation

Effects of sample preparation on powder diffraction data

Preparation, Spectroscopic and Structural Data

Preparing Clinical Trial Data

Preparing and Classifying Clinical Trial Data

Preparing input data

Preparing to Deconvolve a Data Set

Protein Data Preparation

Solution preparation molecular parameter data

© 2024 chempedia.info