KNIME

KNIME stands for the Konstanz Information Miner. It is a visual platform for creating and editing data evaluation pipelines and workflows from building blocks (nodes) taken from the Node Repository. It is an open-source tool for creating chemical workflows and was developed by Prof. Michael Berthold [18]. KNIME is downloadable from www.knime.org. The Chemistry Development Kit (CDK) chemistry project, written in Java, has been incorporated into KNIME. KNIME can also be integrated with other chemoinformatics software. [Pg.455]

9 Integration of Automated Workflow in Chemoinformatics for Drug Discovery [Pg.456]


Go to the Download KNIME option under the Getting Started tab [Pg.456]

Select KNIME Desktop, then choose one of the two options (with or without registration) [Pg.456]

CellProfiler and KNIME: Open Source Tools for High Content Screening... [Pg.105]

Key words: High content screening, Image processing, Statistics, Open Source, CellProfiler, KNIME,... [Pg.105]

KNIME provides a user-friendly interface for visually creating workflows, allowing a step-by-step data analysis flow. A node, the single unit of such a workflow, performs a narrowly defined analysis step configured by a set of parameters. Workflows can branch at any point, which makes it easy to implement several approaches side by side. [Pg.111]
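The node-and-branch idea can be illustrated with a small Python sketch. It is not KNIME's API: the `Node` class and `run` function are hypothetical and only show how each node bundles one confined step with its own parameter settings, and how one node's output can feed several downstream branches.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """Hypothetical workflow node: one confined step plus its parameter settings."""
    name: str
    step: Callable[[Any, Dict[str, Any]], Any]
    params: Dict[str, Any] = field(default_factory=dict)
    successors: List["Node"] = field(default_factory=list)

    def connect(self, other: "Node") -> "Node":
        # Connecting one node to several successors is what makes the workflow branch.
        self.successors.append(other)
        return other


def run(node: Node, data: Any, results: Dict[str, Any]) -> Dict[str, Any]:
    """Execute a node, store its output, then pass that output to every branch."""
    output = node.step(data, node.params)
    results[node.name] = output
    for nxt in node.successors:
        run(nxt, output, results)
    return results


# Usage: one filtering node whose output feeds two alternative analyses.
source = Node("filter", lambda rows, p: [r for r in rows if r >= p["min"]], {"min": 2})
source.connect(Node("mean", lambda rows, p: sum(rows) / len(rows)))
source.connect(Node("max", lambda rows, p: max(rows)))
results = run(source, [1, 2, 3, 4], {})
```

This sketch handles tree-shaped branching only; a workflow in which branches merge again would need a topological execution order instead of simple recursion.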

The second set of extensions is the Open Source scripting integration framework for KNIME (https://www.github.com/knime-mpicbg/knime-scripting/wiki). Even for software developers, it is often a challenge to implement complicated statistical or data-mining... [Pg.111]

To read the CSV result files exported by CellProfiler into KNIME, we first capture the paths of all files with a List Files node. The list of paths is then connected to an Iterate List of Files node, which loads the data into a KNIME workflow. The barcode, plate row, and plate column metadata contained in the CSV files are used to join a plate layout file (either a CSV file or a Microsoft Excel file) that assigns an experimental condition to each line (each line represents either an object or an image). This join can be carried out either with a Joiner node or with a dedicated Join Layout node that we developed. We have generated tables containing over 10 million lines and hundreds of columns. KNIME is able to carry out computations... [Pg.114]
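Outside KNIME, the same load-and-join logic might look like the following Python sketch. The metadata column names (`Metadata_Barcode`, `Metadata_PlateRow`, `Metadata_PlateColumn`) are assumptions and must be adapted to the actual CellProfiler export. The layout is passed in as already-parsed rows, so reading an Excel layout file is not covered.

```python
import csv
import glob
import os
from typing import Dict, Iterable, List, Tuple

# Assumed metadata column names; adjust to match your CellProfiler export.
KEY = ("Metadata_Barcode", "Metadata_PlateRow", "Metadata_PlateColumn")


def load_results(directory: str, pattern: str = "*.csv") -> List[Dict[str, str]]:
    """Collect all result file paths (like List Files), then read each one (like Iterate List of Files)."""
    rows: List[Dict[str, str]] = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def join_layout(rows: Iterable[Dict[str, str]],
                layout_rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Attach the experimental condition from a plate layout to every object or image line."""
    layout: Dict[Tuple[str, ...], Dict[str, str]] = {
        tuple(entry[k] for k in KEY): entry for entry in layout_rows
    }
    joined = []
    for row in rows:
        match = layout.get(tuple(row[k] for k in KEY))
        if match is not None:  # inner join: lines without a layout entry are dropped
            joined.append({**row, **match})
    return joined
```

Keying the layout by the (barcode, row, column) tuple keeps the join linear in the number of result lines, which matters for tables with millions of lines.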

Lastly, we have developed two KNIME nodes that select parameters based on mutual information (24). The Parameter Mutual Information node computes the mutual information matrix for all parameters. The Group Mutual Information node computes the mutual information between two reference populations for a set of selected parameters. In this way, it is possible to select parameters and discover new phenotypes in a screen. [Pg.115]
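The chapter does not state how these nodes estimate mutual information. The Python sketch below uses one common, simple approach: continuous parameters are discretized into equal-width bins, and the mutual information is computed from the resulting joint frequencies. The bin count and all function names are illustrative assumptions, not the nodes' actual implementation.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def discretize(values: Sequence[float], bins: int = 10) -> List[int]:
    """Assign each value to one of `bins` equal-width bins (an assumed binning choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for a constant parameter
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    return sum(
        (c / n) * math.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in joint.items()
    )


def parameter_mi_matrix(params: Dict[str, Sequence[float]],
                        bins: int = 10) -> Dict[Tuple[str, str], float]:
    """Pairwise mutual information between all parameters."""
    binned = {name: discretize(vals, bins) for name, vals in params.items()}
    return {(p, q): mutual_information(binned[p], binned[q]) for p in binned for q in binned}


def group_mi(values: Sequence[float], labels: Sequence[Hashable], bins: int = 10) -> float:
    """Mutual information between one parameter and membership in the reference populations."""
    return mutual_information(discretize(values, bins), labels)
```

A parameter that separates the two reference populations well yields a high `group_mi`, while two parameters with a high entry in the matrix carry largely overlapping information. Histogram-based estimates are sensitive to the number of bins, so results for different bin counts are not directly comparable.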

To eliminate parameters that are correlated with each other, we calculate their Pearson correlation coefficients (25). Linearly uncorrelated parameters have Pearson correlation coefficients close to zero and likely describe different aspects of the phenotype under study (the exception is parameters with a non-linear relationship, which Pearson's coefficient cannot capture). We have developed an R template in KNIME to calculate Pearson correlation coefficients between parameters. Redundant parameters that yield Pearson correlation coefficients above 0.4 are eliminated. It is important to visually inspect the structure of the data using scatter matrices. KNIME provides Scatter Plot and Scatter Matrix nodes that allow the controls to be color-coded for ease of viewing. [Pg.117]
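The redundancy filter described above can be sketched in Python as follows. It is not the authors' R template. The sketch makes two assumptions: the absolute value of the coefficient is compared with 0.4, so strong negative correlations also count as redundant, and parameters are filtered greedily in input order, so the first member of a correlated group is the one that is kept.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_redundant(params: Dict[str, Sequence[float]], threshold: float = 0.4) -> List[str]:
    """Keep a parameter only if |r| with every already-kept parameter is <= threshold."""
    kept: List[str] = []
    for name, values in params.items():
        if all(abs(pearson(values, params[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Usage: "b" is a scaled copy of "a" and is dropped; "c" is uncorrelated with "a".
kept = drop_redundant({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, -1, -1, 1]})
```

Because the result depends on the input order, ordering parameters by some prior preference (for example, interpretability) before filtering is one reasonable design choice.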

Lastly, it is desirable that parameters discriminate between positive and negative conditions across a variety of experimental conditions. In other words, they should be robust and reproducible. For this purpose, the Pearson correlation coefficient between all experimental repeats is calculated using control wells. Robust parameters have high Pearson correlation coefficients (above 0.7) in pairwise comparisons of experimental repeats. For this analysis, we have developed another R template in KNIME that calculates the Pearson correlation coefficient between experimental runs. [Pg.117]
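The reproducibility check can be expressed as a short Python sketch. It is not the authors' R template. The sketch assumes that each repeat is given as one parameter's values for the same control wells, listed in the same well order, and that a parameter counts as robust only if every pairwise coefficient exceeds 0.7.

```python
import math
from itertools import combinations
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def is_robust(repeats: Sequence[Sequence[float]], threshold: float = 0.7) -> bool:
    """True if every pair of experimental repeats correlates above `threshold`.

    Each inner sequence holds one parameter's values for the same control wells,
    in the same well order, measured in one experimental repeat.
    """
    return all(pearson(a, b) > threshold for a, b in combinations(repeats, 2))
```

Requiring all pairs to pass is a strict reading of "pairwise comparisons"; a more lenient variant could instead require the mean or median pairwise coefficient to exceed the threshold.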

Lastly, it is important to visualize plates to obtain a graphical overview of the screen. To this end, we have developed a Plate Viewer that creates heatmaps and an R template in KNIME that generates scatter plots of screening campaigns. These tools make it possible to visualize row and column artifacts and to compare the performance of different plate batches during a screening campaign. [Pg.118]
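The data behind a plate heatmap is simply a grid of well values. The Python sketch below arranges values keyed by well names such as `A01` into such a grid and computes row and column means, a simple numeric complement to the heatmap: a row or column whose mean departs clearly from the others hints at a positional artifact. It is a hypothetical illustration, not the Plate Viewer's implementation. It assumes single-letter row labels (enough for 96- and 384-well plates) and a fully populated plate, because missing wells stay NaN and propagate into the means.

```python
from statistics import mean
from typing import Dict, List, Tuple


def plate_matrix(wells: Dict[str, float], rows: int, cols: int) -> List[List[float]]:
    """Arrange values keyed like 'A01' or 'B3' into a rows x cols grid (the heatmap data)."""
    grid = [[float("nan")] * cols for _ in range(rows)]  # unfilled wells stay NaN
    for well, value in wells.items():
        r = ord(well[0].upper()) - ord("A")  # assumes a single-letter row label
        c = int(well[1:]) - 1
        grid[r][c] = value
    return grid


def row_col_means(grid: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-row and per-column means of the plate grid."""
    row_means = [mean(row) for row in grid]
    col_means = [mean(col) for col in zip(*grid)]
    return row_means, col_means


# Usage: a toy 2 x 3 plate in which row B reads much higher than row A.
grid = plate_matrix({"A01": 1, "A02": 2, "A03": 3, "B01": 10, "B02": 11, "B03": 12}, 2, 3)
row_means, col_means = row_col_means(grid)
```
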

Because of plate-to-plate variation between days or runs, a normalization step is necessary to make the data comparable across an entire screen. We have developed several KNIME nodes for popular normalization methods in HTS, such as percent of control (POC), normalized percentage inhibition (NPI), standard score (z-score), and B-score (26). For all nodes, the user can choose whether to use robust statistics, as well as the grouping, the negative control, and the parameters. The choice of normalization method depends on the screening results and on the normality of the data. A full discussion of this issue is beyond the scope of this chapter, and the reader is referred to excellent reviews (27, 28). [Pg.118]
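The textbook formulas behind three of these methods, plus a robust z-score variant, can be written as a short Python sketch. It is not the KNIME nodes' code. For NPI, the sketch assumes that negative controls define 0 % inhibition and positive controls define 100 % inhibition. The B-score is omitted because it additionally requires a two-way median polish over plate rows and columns.

```python
from statistics import mean, median, stdev
from typing import List, Sequence


def percent_of_control(x: float, neg_controls: Sequence[float]) -> float:
    """POC: the signal as a percentage of the mean negative-control signal."""
    return 100.0 * x / mean(neg_controls)


def npi(x: float, neg_controls: Sequence[float], pos_controls: Sequence[float]) -> float:
    """NPI, assuming negative controls = 0 % inhibition and positive controls = 100 %."""
    neg, pos = mean(neg_controls), mean(pos_controls)
    return 100.0 * (neg - x) / (neg - pos)


def z_scores(values: Sequence[float]) -> List[float]:
    """Standard score of each well relative to the mean and SD of the plate."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def robust_z_scores(values: Sequence[float]) -> List[float]:
    """Robust variant: the median and the scaled median absolute deviation replace mean and SD."""
    med = median(values)
    mad = 1.4826 * median([abs(v - med) for v in values])  # 1.4826 makes MAD consistent with SD for normal data
    return [(v - med) / mad for v in values]
```

The robust variant illustrates why a "robust statistics" option matters: a single extreme well, such as a strong hit, barely shifts the median and the MAD, whereas it inflates the mean and the standard deviation.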

Classification of hits into different phenotype classes is a debated issue in the field. Many clustering algorithms are implemented in KNIME, and a discussion of which algorithm to choose is beyond the scope of this chapter. However, a clustering approach... [Pg.118]

It is also paramount to visualize the images of the identified hits. The path and file name information of the control images generated by CellProfiler can be used for this purpose. In KNIME, all images of the hits can be opened using the Picture Chooser node. In this way, all control images with the corresponding object outlines can be quickly visualized. [Pg.119]
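The path-building step can be sketched in Python. CellProfiler's spreadsheet export typically stores an image's folder and file name in separate `PathName_<image>` and `FileName_<image>` columns. In the sketch, the image name `Outlines` is a hypothetical placeholder, and the function name is illustrative.

```python
import os
from typing import Dict, Iterable, List


def hit_image_paths(rows: Iterable[Dict[str, str]], image_name: str = "Outlines") -> List[str]:
    """Build full paths of the images belonging to hit rows.

    Assumes PathName_<image_name> and FileName_<image_name> columns;
    "Outlines" is a hypothetical image name.
    """
    paths = (
        os.path.join(row[f"PathName_{image_name}"], row[f"FileName_{image_name}"])
        for row in rows
    )
    # Object-level tables repeat the same image once per object; keep each path once, in order.
    return list(dict.fromkeys(paths))
```

The resulting list can then be handed to any image viewer; de-duplicating it matters when the hit table is at the object level rather than the image level.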

Selected hits are rescreened and, for validated hits, a dose-response relationship is generated. We have developed a Dose Response node in KNIME to plot dose-response curves and to calculate IC50 values. [Pg.119]
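The chapter does not say how the Dose Response node fits its curves. Such tools commonly fit a sigmoidal model, such as a four-parameter logistic. As a much simpler stand-in, the Python sketch below estimates IC50 by linear interpolation on log10(concentration) between the two measured points that bracket the 50 % level. It assumes increasing concentrations and a response, such as % activity, that falls as concentration rises.

```python
import math
from typing import Sequence


def ic50_interpolated(conc: Sequence[float], response: Sequence[float],
                      level: float = 50.0) -> float:
    """Estimate IC50 by linear interpolation on log10(concentration).

    Expects increasing concentrations and a decreasing response; raises
    ValueError if the response never crosses `level`.
    """
    points = list(zip(conc, response))
    for (c1, r1), (c2, r2) in zip(points, points[1:]):
        if r1 >= level >= r2 and r1 != r2:
            frac = (r1 - level) / (r1 - r2)  # fraction of the way from point 1 to point 2
            log_ic50 = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_ic50
    raise ValueError("response does not cross the requested level")


# Usage: the response crosses 50 % halfway between 10 and 100 on a log scale.
estimate = ic50_interpolated([1, 10, 100, 1000], [100, 60, 40, 0])
```

Interpolation uses only the two neighbouring points and ignores replicate noise, so it is mainly useful as a sanity check against a fitted curve.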

Information about our chemical and siRNA libraries is stored in a PostgreSQL database. We annotate our validated hits using a DatabaseReader and a Joiner node to obtain either chemical structures for chemical screens or GeneIDs for RNAi screens. KNIME has many tools for cheminformatics: it can visualize molecular structures and retrieve data from external public databases via Web queries to further annotate hits, which allows clustering either by chemical substructures or by GO terms. [Pg.120]
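The annotation step amounts to a join between the hit list and the library table. The Python sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL. The `library` table and its columns (`plate`, `well`, `smiles`, `gene_id`) are a hypothetical schema, not the authors' database.

```python
import sqlite3
from typing import Iterable, List, Tuple


def annotate_hits(conn: sqlite3.Connection,
                  hits: Iterable[Tuple[str, str]]) -> List[tuple]:
    """Join (plate, well) hits with a hypothetical `library` table to fetch annotations."""
    conn.execute("DROP TABLE IF EXISTS temp.hits")  # allow repeated calls on one connection
    conn.execute("CREATE TEMP TABLE hits (plate TEXT, well TEXT)")
    conn.executemany("INSERT INTO hits VALUES (?, ?)", hits)
    return conn.execute(
        "SELECT h.plate, h.well, l.smiles, l.gene_id "
        "FROM hits h JOIN library l ON h.plate = l.plate AND h.well = l.well "
        "ORDER BY h.plate, h.well"
    ).fetchall()
```

For a chemical screen, the `smiles` column would be populated. For an RNAi screen, `gene_id` would be. Performing the join inside the database avoids transferring the whole library table just to annotate a few hits.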

Lastly, CellProfiler and KNIME are both easily extendable Open Source software packages, and both have had highly active and fast-growing communities since the release of their first versions in 2005 and 2006, respectively, a key for further development and implementation of... [Pg.120]

Data pipelining via enterprise server applications such as Pipeline Pilot (Accelrys/Scitegic) and InforSense, or via the open source tools Talend (http://www.talend.com/) and KNIME (http://www.knime.org/), represents a powerful approach to integrating applications and developing custom functionality. As a specific application, Pipeline Pilot 7.0 includes a plate analytics collection for developing complex plate-based data analysis and visualization protocols. [Pg.246]

The recent development of pipelining tools such as Pipeline Pilot, Orange, and KNIME (see Box 23.3) has allowed QSAR developers to automate many predictive procedures. These tools give the user easy access to databases, structures, and data analysis methods. They allow complex QSAR procedures to be predefined and subsequently executed by non-experts, thus bringing predictive QSAR models directly to the medicinal chemist's workbench. [Pg.493]

A workflow is the connection of sequential steps for data management and analysis in chemistry. There are several tools for creating workflows or pipelines: Accelrys Pipeline Pilot [11], IDBS ChemSense (InforSense suite) [12], Chemistry Development Kit (CDK) Taverna [13], KNIME [14], etc. [Pg.453]

Big Chemical Encyclopedia
