Help
This web tool allows users to predict the biopharmaceutical or toxicological properties of chemical compounds.
The predicted properties are:
- Aqueous Solubility (logS): A regression model predicting the aqueous solubility (logS) of a molecule based on supervised recursive random forest techniques. It is built from a curated public dataset of aqueous solubility (12,674 molecules) [1].
- Caco-2 Permeability (logPapp_Caco2): A regression model predicting the logarithm of the apparent permeability (logPapp) in Caco-2 cells of a molecule based on supervised recursive random forest techniques. It is based on a curated public dataset of logPapp (4,913 molecules) [2].
- Oral Bioavailability (HOB_Class): A binary classification model predicting the Human Oral Bioavailability Class of molecules. Molecules with high bioavailability (≥ 50%) are classified as label 1, while those with moderate to low bioavailability are categorized as label 0. The model was trained on a curated public dataset of compounds with known human oral bioavailability data (1,159 molecules) [3].
- hERG Inhibition (hERG_Class): A binary classification model predicting the hERG inhibition class of molecules. Molecules with a high risk of inhibiting the hERG potassium channel (associated with cardiac toxicity) are classified as label 1, while those with a low risk are categorized as label 0. The model was trained on a curated public dataset of compounds with known hERG inhibition data (291,219 molecules) (article under review in Nature Scientific Reports).
Input Instructions
- Enter a chemical structure using the Sketcher tool (Submit from sketcher) or upload a CSV file with a single column of SMILES strings (Submit CSV). If the file is incorrectly formatted or contains more than 100 molecules, the following message appears: "Invalid CSV format detected. Please review the file structure and resubmit".
- Click the "See details" button to visualize the predicted properties for each molecule.
- Click the "Export CSV" button to download the results file.
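The upload file described above can be prepared with a short script. The following is a minimal sketch in Python using only the standard library. The function name write_smiles_csv is hypothetical, and the choice to write no header row is an assumption, since the instructions do not say whether the tool expects one.

```python
import csv
import io

MAX_MOLECULES = 100  # upload limit stated in the input instructions


def write_smiles_csv(smiles_list, handle):
    """Write one SMILES string per row to a single-column CSV.

    Assumption: no header row is written; the help text does not
    state whether the tool expects one.
    """
    # Drop empty entries and surrounding whitespace
    cleaned = [s.strip() for s in smiles_list if s and s.strip()]
    if not cleaned:
        raise ValueError("no SMILES strings to write")
    if len(cleaned) > MAX_MOLECULES:
        raise ValueError(
            f"{len(cleaned)} molecules exceeds the limit of {MAX_MOLECULES}"
        )
    writer = csv.writer(handle)
    for smiles in cleaned:
        writer.writerow([smiles])
    return len(cleaned)


# Usage: write to an in-memory buffer (use open("input.csv", "w", newline="")
# to write a real file instead)
buffer = io.StringIO()
count = write_smiles_csv(["CCO", "c1ccccc1"], buffer)
```

Checking the molecule count locally avoids a rejected upload, since the tool reports the same error message for a malformed file and for a file that is too long.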
Molecule Preparation
Molecules are cleaned before descriptor calculation and prediction. Cleaning keeps the largest fragment of disconnected molecules, removes stereochemical features, and generates the canonical SMILES.
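The three cleaning steps can be approximated locally to preview what the tool is likely to see. The sketch below uses the RDKit toolkit, which is an assumption: this page does not say which software the tool uses, and canonical SMILES differ between toolkits, so the output may not match the tool's string exactly. The function name clean_smiles and the choice of heavy-atom count as the measure of "largest" are also assumptions.

```python
from rdkit import Chem


def clean_smiles(smiles):
    """Return a cleaned canonical SMILES, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None or mol.GetNumAtoms() == 0:
        return None
    # Step 1: keep the largest fragment of a disconnected molecule
    # (assumption: "largest" is measured by heavy-atom count)
    fragments = Chem.GetMolFrags(mol, asMols=True)
    largest = max(fragments, key=lambda frag: frag.GetNumHeavyAtoms())
    # Step 2: remove stereo features (chiral centers and double-bond stereo)
    Chem.RemoveStereochemistry(largest)
    # Step 3: generate the canonical SMILES (RDKit's canonical form)
    return Chem.MolToSmiles(largest)


# Usage: L-alanine hydrochloride loses its counter-ion and its stereo center
cleaned = clean_smiles("C[C@H](N)C(=O)O.Cl")
```

Because stereo features are removed, stereoisomers of the same compound are reduced to one structure and receive identical predictions.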
Interpreting Results
- The tool processes the input molecules and provides the predicted properties, along with the prediction confidence and the Euclidean distance of each molecule from the model's training set. For quantitative outputs such as logS (aqueous solubility) and logPapp_Caco2 (permeability), confidence is represented by the standard deviation of the predicted value. For binary outputs such as HOB_Class (oral bioavailability class) and hERG_Class (hERG inhibition class, a cardiac toxicity risk), confidence is given as the class probability, indicating the likelihood that the molecule belongs to a specific class.
- Predictions are estimates based on computational models and should be verified with experimental data.
- Use the results to prioritize compounds for further testing.
- Invalid SMILES strings appear in the results as "C", and a property value of "-999" indicates that a prediction could not be made.
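When post-processing the exported file, rows carrying the "-999" value should be set aside before any ranking or averaging. The following is a minimal sketch in Python. The column names SMILES and logS in the example data are hypothetical, as is the function name split_results; the actual headers of the exported CSV are not listed on this page.

```python
import csv
import io

SENTINEL = -999.0  # value used in the results when no prediction could be made


def split_results(handle, property_columns):
    """Split exported rows into usable predictions and failed ones.

    A row counts as failed if any of the given property columns
    holds the sentinel value.
    """
    usable, failed = [], []
    for row in csv.DictReader(handle):
        values = [float(row[column]) for column in property_columns]
        if any(value == SENTINEL for value in values):
            failed.append(row)
        else:
            usable.append(row)
    return usable, failed


# Usage with made-up example data (hypothetical column names and values)
exported = io.StringIO("SMILES,logS\nCCO,0.5\nC,-999\n")
usable, failed = split_results(exported, ["logS"])
```

Treating "-999" as a number would distort any ranking or summary statistic, so filtering it out first keeps the remaining values meaningful.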
See Related Articles
- ADME prediction with KNIME: In silico aqueous solubility consensus model based on supervised recursive random forest approaches
- Reliable Prediction of Caco-2 Permeability by Supervised Recursive Machine Learning Approaches
- ADME Prediction with KNIME: Development and Validation of a Publicly Available Workflow for the Prediction of Human Oral Bioavailability