Structurechecker - The Command-line Tool of Structure Checker
Contents
- Structurechecker command-line tool
- Options
- Usage
- Input
- Output
- Examples
- Structure checker user guide (GUI)
- List of available checkers
Structurecheck Command-line
Structure Checker is a chemical validation tool that detects and fixes common structural errors, as well as special features that can be potential sources of problems. Structurecheck is the command-line tool of Structure Checker.
Options
StructureCheck 5.6, (C) 1999-2011 ChemAxon Ltd.
Licenses of additionally used third party programs can be found in license.html
Online version: http://www.chemaxon.com/marvin/license.html

Molecule checker.

Usage:
  structurecheck [input file(s)/string(s)] -c <config file/string> [options]

General options:
  -m, --mode <operationmode>       mode of the operation: fix or check
                                   default mode is check
      <operationmode> = [fix|check]
        check - only check is executed, does not modify molecules
        fix   - fixes molecules containing structure errors whenever possible
  -x                               fix mode (deprecated, use --mode fix)

Input options:
  -c, --config <filepath|string>   action string
                                   configuration actions separated by ".."
    valid checker actions are:
    - 3d (detect atoms with 3D coordinates)
    - abbrevgroup (detect abbreviated groups)
    - abbrevgroup:expanded=true (detect expanded abbreviated groups)
    - abbrevgroup:contracted=true (detect contracted abbreviated groups)
    - alias (detect atoms with alias)
    - aromaticity (detect aromaticity errors)
    - aromaticity:type=[basic,loose,general] (detect aromaticity errors with the given aromatization type)
    - atommap (detect atoms with map number)
    - atomqueryproperty (detect atom query properties)
    - atomqueryproperty:H=[true,false]:X=[true,false]:D=[true,false]:R=[true,false]:h=[true,false]:r=[true,false]:a=[true,false]:s=[true,false]:u=[true,false]:rb=[true,false]
      (detect hydrogen count/connection count/explicit connection count/ring count/implicit hydrogen count/smallest ring count/aromaticity/substitution count/unsaturation/ring bond count atom query properties)
    - atomvalue (detect atoms with atom value)
    - attacheddata (detect atoms with attached data)
    - bondangle (detect unpreferred bond angles in 2D)
    - bondlength (detect bonds that are too long or too short)
    - chiralflag (detect non-chiral molecules with chiral flag)
    - coordsystem (detect invalid coordination systems)
    - covalentcounterion (detect covalent counterions)
    - crosseddoublenond (detect crossed double bonds)
    - empty (detect items without atoms)
    - explicith (detect explicit hydrogens)
    - explicith:lonely=[true,false]:mapped=[true,false]:charged=[true,false]:isotopic=[true,false]:radical=[true,false]:wedged=[true,false]
      (detect lonely/charged/mapped/isotopic/radical/wedged explicit hydrogens)
    - explicitlp (detect explicit lone pairs)
    - isotope (detect isotopes)
    - metallocene (detect incorrect metallocene representations)
    - missingatommap (detect atoms without map numbers)
    - multicenter (detect multicenters)
    - multicomponent (detect molecules containing disconnected parts)
    - moleculecharge (detect non-neutral molecules)
    - ocr (detect structures that are probably not chemical structures but originated from other drawings, usually results of incorrect optical structure recognition)
    - overlappingAtoms (detect atoms that are too close to each other)
      (detect bonds that are too close to each other)
    - pseudoatom (detect pseudo atoms)
    - queryatom (detect query atoms)
    - querybond (detect query bonds)
    - racemate (detect asymmetric tetrahedral atoms without specific stereo configuration)
    - radical (detect radical atoms)
    - ratom:all=[true,false]:disconnected=[true,false]:generic=[true,false]:linker=[true,false]:nested=[true,false]
      (detect all/disconnected/generic/linker/nested R-atoms)
    - rare (detect rare elements)
    - reactionmap (detect reactions with invalid atom mapping)
    - rgroupattachmenterror (detect R-group attachment errors)
    - rgroupreferenceerror:missingratom=[true,false]:missingrgroup=[true,false]:selfreference=[true,false]
      (detect missing R-atom/missing R-group/invalid attachment/self reference errors in R-group definitions)
    - ringstrainerror (detect small rings with trans or cumulative double bonds, or triple bonds)
    - solvent (detect common solvents appearing alongside a main component)
    - staratom (detect star atoms)
    - substructure:reactionSmarts=[smarts] (detect whether the given SMARTS structure can be found as a substructure of the original molecule)
    - unbalancedreaction (detect reactions with orphan atoms)
    - valence (detect valence errors)
    - valenceproperty (detect atoms with valence properties)
    - valenceproperty:defaultvalence=true (detect atoms with default valence properties)
    - valenceproperty:nondefaultvalence=true (detect atoms with non-default valence properties)
    - wedge (detect incorrect wedge bonds)
    - wigglydoublebond (detect non-stereo double bonds with wiggly representation connected to a double bond)
    valid fixer actions are:
    - aliastoatom (remove aliases from atoms)
    - aliastogroup (convert atoms with aliases to abbreviated groups if the alias is recognized)
    - aliastocarbon (remove alias values from atoms and convert the atom to a carbon)
    - clearabsstereo (remove the chiral flag)
    - clean (calculate 2D coordinates)
    - contractgroup (contract all abbreviated groups)
    - converttoelementalform (convert isotopes into elemental atoms)
    - converttoionicform (convert covalent counterions to ionic form)
    - converttometalloceneform (convert non-standard metallocene representations into coordinated multicenter representation if it is possible)
    - crosseddoublebond (convert non-stereo double bond represented by wiggly ligand to crossed double bond representation)
    - crossedtowiggly (convert non-stereo double bond represented by crossed double bond to wiggly ligand representation)
    - dearomatize (convert aromatic rings into Kekule form)
    - expandgroup (expand all abbreviated groups)
    - fixrgroupattachment (add missing attachments to members with single location)
    - fixvalence (correct valence problems by removing hydrogens or setting charges)
    - mapmolecule (add atom maps to each atom of the molecule)
    - mapreaction (add atom maps to the reaction)
    - neutralize (remove charges from the molecule)
    - pseudotogroup (convert pseudo atoms to abbreviated groups if the pseudo label is a known abbreviated group)
    - removeexplicith (remove explicit hydrogens)
    - rearomatize (dearomatize the molecule and aromatize it again)
    - removealias (remove alias values from atoms)
    - removeatom (remove the problematic atoms from the molecule)
    - removeatommap (remove atom map numbers)
    - removeatomqueryproperty (remove atom query properties)
    - removeatomvalue (remove atom values)
    - removeattacheddata (remove data attached to atoms)
    - removebond (remove problematic bonds from the molecule)
    - removeradical (convert radicals to non-radical atoms)
    - removevalenceproperty (remove valence properties from atoms)
    - removezcoordinate (set the z-coordinates of atoms to zero)
    - ungroup (ungroup all abbreviated groups)
    - wedgeclean (recalculate the orientation of the wedge bonds in the molecule)

Output options:
  -t, --output-type <output type>      output type (default: single)
      <output type> = [single|separated|accepted|discarded]
        single    - both accepted and discarded structures are written to the <output path>
        separated - accepted structures are written to the <output path>,
                    discarded structures are written to the <discarded path>
        accepted  - only accepted structures are written to the <output path>
        discarded - only discarded structures are written to the <discarded path>
  -o, --output <output path>           output file (default: standard output)
  -d, --discarded <discarded path>     writes molecules with structure errors to a separate file
                                       (default: standard output)
  -f, --format <format>                output file format (default: smiles)
  -rf, --report-file <filepath>        writes report to a file
  -rp, --report-property <propname>    writes report to the property of the output, with the specified property name
  -l, --log <filepath>                 writes software-error log messages to file
  -ocr, --discard-scan-errors          discard incorrectly scanned molecules

Examples:
  structurecheck -c config.xml -t separated -o out.smiles -d discarded.smiles in.smiles
  structurecheck -c config.xml -m fix -t separated -d discarded.smiles in.smiles
  structurecheck -c config.xml -m fix -t discarded in.sdf
  structurecheck -c "bondLength" in.sdf
  structurecheck -c "isotope->converttoelementalform" in.sdf
  structurecheck -c "aromaticity..valence" -m fix -f sdf -o out.sdf in.sdf
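The -c action string combines checkers, optional checker parameters and fixers. The help text above shows ".." as the separator between actions, "name:key=value" for checker parameters, and the examples use "->" to attach a fixer to a checker. The Python sketch below assembles such a string from parts. It is a hypothetical helper, not part of Structure Checker, and it assumes that several parameters of one checker are chained with ":" (as the parameter listings suggest) and that a parameterized checker can also take a "->fixer" suffix.

```python
from typing import Mapping, Optional, Sequence, Union

Value = Union[str, bool]


def _format_value(value: Value) -> str:
    # Boolean parameters are written as lowercase true/false, as in the help listing.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def action(checker: str,
           params: Optional[Mapping[str, Value]] = None,
           fixer: Optional[str] = None) -> str:
    """Build one action of the form checker[:key=value...][->fixer] (hypothetical helper)."""
    text = checker
    if params:
        # Assumption: several parameters of one checker are chained with ':'.
        text += ":" + ":".join(f"{key}={_format_value(val)}" for key, val in params.items())
    if fixer:
        # '->' attaches a fixer to the checker, as in "isotope->converttoelementalform".
        text += "->" + fixer
    return text


def config_string(actions: Sequence[str]) -> str:
    """Join actions with the '..' separator accepted by -c/--config."""
    return "..".join(actions)


if __name__ == "__main__":
    cfg = config_string([
        action("isotope", fixer="converttoelementalform"),
        action("aromaticity", {"type": "basic"}),
        action("valence"),
    ])
    print(cfg)  # isotope->converttoelementalform..aromaticity:type=basic..valence
```

Building the string programmatically mainly helps avoid typos in long configurations; the checker and fixer names themselves must still be taken from the lists above.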
Usage
structurecheck -c <config file> -m [mode] [<options>] [input list]
- The command-line parameter -c or --config is mandatory. It specifies either a configuration file path or a simple action string.
- The optional parameter -m or --mode specifies the operation mode. The following operation modes are available:
- check (default): searches for errors;
- fix: fixes automatically fixable errors.
Note: When a molecule import/export error occurs, the program continues to run. The error is written to the console, and the molecule is discarded from the results (i.e., the resulting output file contains fewer molecules than the input file).
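When structurecheck is driven from a script, it is convenient to build the command line as an argument list. The sketch below is a hypothetical Python helper that uses only the options documented above (-c, -m, -t, -f, -o, -d) and places them in the same order as the examples. Passing the list directly to subprocess avoids shell quoting; shlex.join is used only to display the equivalent shell command, where an action string containing "->" must be quoted.

```python
import shlex
from typing import List, Optional, Sequence


def build_argv(config: str,
               inputs: Sequence[str] = (),
               mode: str = "check",
               output_type: Optional[str] = None,
               fmt: Optional[str] = None,
               output: Optional[str] = None,
               discarded: Optional[str] = None) -> List[str]:
    """Assemble a structurecheck command line from the documented options."""
    if mode not in ("check", "fix"):
        raise ValueError(f"unknown mode: {mode!r}")
    argv = ["structurecheck", "-c", config]  # -c/--config is mandatory
    if mode == "fix":
        argv += ["-m", "fix"]  # check is the default, so it is not passed explicitly
    if output_type:
        argv += ["-t", output_type]
    if fmt:
        argv += ["-f", fmt]
    if output:
        argv += ["-o", output]
    if discarded:
        argv += ["-d", discarded]
    argv += list(inputs)  # with no inputs, the tool reads the standard input
    return argv


if __name__ == "__main__":
    argv = build_argv("isotope->converttoelementalform", ["in.sdf"])
    # Display the equivalent shell command; '>' forces quoting of the action string.
    print(shlex.join(argv))
    # To execute it (requires structurecheck on PATH):
    # import subprocess; subprocess.run(argv, check=True)
```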
Input
Structurecheck accepts most molecular file formats as input (Marvin Documents (MRV), MDL molfile, Sdfile, RXNfile, Rdfile, SMILES, etc.). The input can be specified as:
- input file(s),
- input string(s), or
- SMILES (default).
Note: If neither the input file nor the input string is specified, the standard input (console) will be read.
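Because structurecheck falls back to the standard input when no input file or string is given, molecules can be piped in from another program. The Python sketch below is a hypothetical wrapper that writes one SMILES per line to the tool's standard input and returns what it prints. The one-SMILES-per-line layout is an assumption based on the usual SMILES file convention; the runner parameter exists only so the function can be exercised without the tool installed.

```python
import subprocess
from typing import Callable, Sequence


def check_smiles(config: str,
                 smiles: Sequence[str],
                 runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Pipe SMILES strings to structurecheck on standard input; return its standard output."""
    # No input file or input string is given, so structurecheck reads the standard input.
    argv = ["structurecheck", "-c", config]
    stdin_text = "".join(s + "\n" for s in smiles)  # assumption: one SMILES per line
    result = runner(argv, input=stdin_text, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Requires structurecheck on PATH.
    print(check_smiles("valence", ["CCO", "c1ccccc1"]))
```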
Output
Structurecheck's output contains the file(s) of the checked/fixed molecules and optionally a report of the results. The molecules are written to the output file(s). The format of the output file(s) can be specified by the -f or --format option (default format is: "smiles"). The type of output is defined by the -t or --output-type parameter. The possible values of the output type are the following:
- single (default): all molecules are written to the file defined by the --output parameter. If the --output parameter is omitted, the result is written to the standard output (console). (The --discarded parameter is ignored in this case.)
- separated: valid and invalid molecules are written to two different files. The --output parameter defines the output file of molecules with valid structures, and the --discarded parameter defines the output file of molecules with invalid structures (or, in fix mode, of those that cannot be fixed automatically).
- If --discarded parameter is omitted, molecules with invalid structures are written to standard output;
- If --output parameter is omitted, molecules with valid structures are written to standard output;
Note: At least one of the --output and --discarded parameters must be specified. If neither is defined, the program stops.
- accepted: only molecules with valid structures are written to the file defined by the --output parameter. If the --output parameter is omitted, molecules with valid structures are written to the standard output. (The --discarded parameter is ignored in this case.)
- discarded: only molecules with invalid structures are written to the file defined by the --discarded parameter. If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output. (The --output parameter is ignored in this case.)
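The routing rules for the four output types, together with the fix-mode rule that only automatically unfixable molecules are discarded, can be summarized as a small decision table. The Python sketch below models the documented behavior only; it is not Structure Checker code, and the STDOUT marker and function names are hypothetical. Raising ValueError stands in for the tool stopping when separated output has neither --output nor --discarded.

```python
from typing import Dict, Optional

STDOUT = "<standard output>"  # hypothetical marker for the console


def route(output_type: str = "single",
          output: Optional[str] = None,
          discarded: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map accepted/discarded molecules to a destination; None means not written."""
    if output_type == "single":
        dest = output or STDOUT  # --discarded is ignored
        return {"accepted": dest, "discarded": dest}
    if output_type == "separated":
        if output is None and discarded is None:
            # The tool stops in this case.
            raise ValueError("separated output needs --output or --discarded")
        return {"accepted": output or STDOUT, "discarded": discarded or STDOUT}
    if output_type == "accepted":
        return {"accepted": output or STDOUT, "discarded": None}  # --discarded ignored
    if output_type == "discarded":
        return {"accepted": None, "discarded": discarded or STDOUT}  # --output ignored
    raise ValueError(f"unknown output type: {output_type!r}")


def is_discarded(has_errors: bool, fixable: bool, mode: str = "check") -> bool:
    """Check mode discards every erroneous molecule; fix mode only the unfixable ones."""
    if not has_errors:
        return False
    return not fixable if mode == "fix" else True
```

For example, route("separated", discarded="discarded.sdf") sends accepted molecules to the console and discarded ones to discarded.sdf, which matches the behavior described for omitting --output.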
The report of the structure check can be written either to a separate file, defined by the --report-file parameter, or to the output file(s) as an additional molecule property. The name of the property can be defined by the --report-property parameter.
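When the output format is SDF (-f sdf), a report written with --report-property would be expected to appear as an ordinary SD data item ("> <NAME>" header followed by value lines) in each record. The Python sketch below reads such items back; it assumes that layout and uses the hypothetical property name CHECK_REPORT. The content of the report text is not described here, so the sketch returns it as raw text.

```python
import re
from typing import Dict, List

# An SDfile data header line looks like "> <NAME>" (optionally with extra fields).
_HEADER = re.compile(r"^>.*?<([^>]+)>")


def sd_properties(sdf_text: str) -> List[Dict[str, str]]:
    """Return the data items of every record in an SDfile, in record order."""
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue  # trailing text after the last record delimiter
        props: Dict[str, str] = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            match = _HEADER.match(lines[i])
            i += 1
            if match:
                value = []
                # The value runs until the next blank line.
                while i < len(lines) and lines[i].strip():
                    value.append(lines[i])
                    i += 1
                props[match.group(1)] = "\n".join(value)
        records.append(props)
    return records


def reports(sdf_text: str, prop: str = "CHECK_REPORT") -> List[str]:
    """Collect the report property of each record (hypothetical name CHECK_REPORT)."""
    return [rec.get(prop, "") for rec in sd_properties(sdf_text)]
```

A record without the property yields an empty string, so the result stays aligned with the molecules in the output file.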
Note: Not all molecules with structure errors are discarded. In fix mode, only molecules whose errors cannot be fixed automatically are discarded.
Usage examples
Below are short descriptions of some examples.
- structurecheck -c "metallocene"
Executes a check with configuration metallocene on the molecule(s) read from the standard input, and writes the result to the standard output (console);
- structurecheck -c "bondLength" in.sdf
Executes a check with configuration bondLength on the molecule(s) in the in.sdf file, and writes the result to the standard output (console);
- structurecheck -c "isotope->converttoelementalform" in.sdf
Executes a check with configuration isotope->converttoelementalform on the molecule(s) in the in.sdf file, and writes the result to the standard output (console);
- structurecheck -c "aromaticity..valence" -m fix -f sdf -o out.sdf in.sdf
Executes a fix with configuration aromaticity and valence on the molecule(s) in the in.sdf file, and writes the molecules with valid structures (including automatically fixed molecules) in SDF format to the out.sdf output file;
- structurecheck -c config.xml -t separated -o out.sdf -d discarded.sdf
Executes a check with the configuration contained in config.xml on the molecule(s) read from the standard input, writes the molecules with valid structures to out.sdf, and writes the molecules with invalid structures to discarded.sdf.
Note: The format of both outputs is SMILES, despite the .sdf file extensions, as --format (-f) is not defined;
- structurecheck -c config.xml -m fix -t separated -d discarded.sdf
Executes a fix with the configuration contained in config.xml on the molecule(s) read from the standard input, writes the molecules with invalid structures to discarded.sdf, and writes the molecules with valid structures (including automatically fixed molecules) to the standard output (console);
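- structurecheck -c config.xml -m fix -t discarded in.sdf
Executes a fix with the configuration contained in config.xml on the molecule(s) in the in.sdf file, writes the molecules with invalid structures (those that cannot be fixed automatically) to the standard output (console), as --discarded (-d) is not defined, and omits molecules with valid structures.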
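The SMILES note in the separated example is an easy pitfall: file extensions do not select the output format, and a single --format value applies to both the accepted and the discarded output. The Python sketch below is a hypothetical helper that derives one -f value from the output file names and refuses mismatched extensions. The extension table is an assumption: "sdf" and "smiles" are format names used on this page, while "mrv" for Marvin Documents is not confirmed here.

```python
from pathlib import Path

# Hypothetical extension-to-format table ("mrv" is an assumption).
FORMATS = {".sdf": "sdf", ".smiles": "smiles", ".smi": "smiles", ".mrv": "mrv"}


def format_for(path: str) -> str:
    """Guess the -f value from a file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"no known format for {path!r}; pass -f explicitly")
    return FORMATS[suffix]


def format_for_outputs(*paths: str) -> str:
    """One -f value applies to every output, so all output paths must agree."""
    formats = {format_for(p) for p in paths}
    if len(formats) != 1:
        raise ValueError(f"output paths imply {len(formats)} formats: {sorted(formats)}")
    return formats.pop()
```

For the separated example, format_for_outputs("out.sdf", "discarded.sdf") returns "sdf", so adding -f sdf to that command would make both files actual SDfiles.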
Copyright © 1999-2010 ChemAxon Ltd. All rights reserved.