Structurechecker - The Command-line Tool of Structure Checker
Contents
- Structurechecker command-line tool
- Options
- Usage
- Input
- Output
- Examples
- Structure checker user guide (GUI)
- List of available checkers
Structurecheck Command-line
Structure Checker is a chemical validation tool that detects and fixes common structural errors and special features that can be potential sources of problems. Structurecheck is the command-line tool of Structure Checker.
Options
StructureCheck 5.6, (C) 1999-2011 ChemAxon Ltd.
Licenses of additionally used third party programs can be found in license.html
Online version: http://www.chemaxon.com/marvin/license.html
Molecule checker.
Usage:
structurecheck [input file(s)/string(s)] -c <config file/string> [options]
General options:
-m, --mode <operationmode> mode of the operation: fix or check
default mode is check
<operationmode> = [fix|check]
check - only check is executed, does not modify molecules
fix - fixes molecules containing structure errors whenever possible
-x fix mode (deprecated, use --mode fix)
Input options:
-c, --config <filepath|string> action string configuration
actions separated by "..",
valid checker actions are:
- 3d
(detect atoms with 3D coordinates)
- abbrevgroup
(detect abbreviated groups)
- abbrevgroup:expanded=true
(detect expanded abbreviated groups)
- abbrevgroup:contracted=true
(detect contracted abbreviated groups)
- alias
(detect atoms with alias)
- aromaticity
(detect aromaticity errors)
- aromaticity:type=[basic,loose,general]
(detect aromaticity errors
with the given aromatization type)
- atommap
(detect atoms with map number)
- atomqueryproperty
(detect atom query properties)
- atomqueryproperty:H=[true,false]:
X=[true,false]:
D=[true,false]:
R=[true,false]:
h=[true,false]:
r=[true,false]:
a=[true,false]:
s=[true,false]:
u=[true,false]:
rb=[true,false]:
(detect hydrogen count/
connection count/explicit connection
count/ring count/implicit hydrogen
count/smallest ring count/aromaticity/
substitution count/unsaturation/ring
bond count atom query properties)
- atomvalue
(detect atoms with atom value)
- attacheddata
(detect atoms with attached data)
- bondangle
(detect unpreferred bond angles
in 2d)
- bondlength
(detect bonds that are too long
or too short)
- chiralflag
(detect non-chiral molecules with
chiral flag)
- coordsystem
(detect invalid coordination systems)
- covalentcounterion
(detect covalent counterions)
- crosseddoublebond
(detect crossed double bonds)
- empty
(detect items without atoms)
- explicith
(detect explicit hydrogens)
- explicith:lonely=[true,false]:
mapped=[true,false]:
charged=[true,false]:
isotopic=[true,false]:
radical=[true,false]:
wedged=[true,false]
(detect lonely/charged/mapped/isotopic/
radical/wedged explicit hydrogens)
- explicitlp
(detect explicit lone pairs)
- isotope
(detect isotopes)
- metallocene
(detect incorrect metallocene
representations)
- missingatommap
(detect atoms without map numbers)
- multicenter
(detect multicenters)
- multicomponent
(detect molecules containing
disconnected parts)
- moleculecharge
(detect non-neutral molecules)
- ocr
(detect structures that are probably
not chemical structures but originated
from other drawings, usually results of
incorrect optical structure
recognition)
- overlappingAtoms
(detect atoms that are too close to
each other)
- overlappingBonds
(detect bonds that are too close to
each other)
- pseudoatom
(detect pseudo atoms)
- queryatom
(detect query atoms)
- querybond
(detect query bonds)
- racemate
(detect asymmetric tetrahedral atoms
without specific stereo configuration)
- radical
(detect radical atoms)
- ratom:all=[true,false]:
disconnected=[true,false]:
generic=[true,false]:
linker=[true,false]:
nested=[true,false]:
(detect all/disconnected/generic/
linker/nested R-atoms)
- rare
(detect rare elements)
- reactionmap
(detect reactions with invalid
atom mapping)
- rgroupattachmenterror
(detect R-group attachment errors)
- rgroupreferenceerror:
missingratom=[true,false]:
missingrgroup=[true,false]:
selfreference=[true,false]:
(detect missing R-atom/
missing R-group/invalid attachment/
self reference errors in R-group
definitions)
- ringstrainerror
(detect small rings with
trans or cumulative double bonds,
or triple bond)
- solvent
(detect common solvents appearing
by a main component)
- staratom
(detect star atoms)
- substructure:reactionSmarts=[smarts]
(detect whether the given SMARTS
structure can be found as
a substructure of the
original molecule)
- unbalancedreaction
(detect reactions with orphan atoms)
- valence
(detect valence errors)
- valenceproperty
(detect atoms with valence properties)
- valenceproperty:defaultvalence=true
(detect atoms with default valence
properties)
- valenceproperty:nondefaultvalence=true
(detect atoms with non-default valence
properties)
- wedge
(detect incorrect wedge bonds)
- wigglydoublebond
(detect non-stereo double bonds
represented by a wiggly bond
connected to the double bond)
valid fixer actions are:
- aliastoatom
(remove aliases from atoms)
- aliastogroup
(convert atoms with aliases to
abbreviated groups if the alias
is recognized)
- aliastocarbon
(remove alias values from atoms and
convert the atom to a carbon)
- clearabsstereo
(remove the chiral flag)
- clean
(calculate 2D coordinates)
- contractgroup
(contract all abbreviated groups)
- converttoelementalform
(convert isotopes into elemental atoms)
- converttoionicform
(convert covalent counterions
to ionic form)
- converttometalloceneform
(convert non-standard metallocene
representations into coordinated
multicenter representation)
- crosseddoublebond
(convert non-stereo double bond
represented by wiggly ligand
to crossed double bond representation)
- crossedtowiggly
(convert non-stereo double bond
represented by crossed double bond
to wiggly ligand representation)
- dearomatize
(convert aromatic rings into Kekule
form)
- expandgroup
(expand all abbreviated groups)
- fixrgroupattachment
(add missing attachments to members
with single location)
- fixvalence
(correct valence problem by removing
hydrogens or setting charges)
- mapmolecule
(add atom maps to each atom
of the molecule)
- mapreaction
(add atom maps to the reaction)
- neutralize
(remove charges from the molecule)
- pseudotogroup
(convert pseudo atoms to
abbreviated groups if pseudo label
is a known abbreviated group)
- removeexplicith
(remove explicit hydrogens)
- rearomatize
(dearomatize the molecule and
aromatize it again)
- removealias
(remove alias values from atoms)
- removeatom
(remove the problematic atoms
from the molecule)
- removeatommap
(remove atom map numbers)
- removeatomqueryproperty
(remove atom query properties)
- removeatomvalue
(remove atom values)
- removeattacheddata
(remove data attached to atoms)
- removebond
(remove problematic bonds
from the molecule)
- removeradical
(convert radicals to non-radical atoms)
- removevalenceproperty
(remove valence properties from atoms)
- removezcoordinate
(set the z-coordinates of atoms to
zero)
- ungroup
(ungroup all abbreviated groups)
- wedgeclean
(recalculate the orientation of the
wedge bonds in the molecule)
Output options:
-t, --output-type <output type> output type (default: single)
<output type> = [single|separated|accepted|discarded]
single - both accepted and discarded structures are written to the
<output path>
separated - accepted structures are written to the <output path>,
discarded structures are written to the <discarded path>
accepted - only accepted structures are written to the <output path>
discarded - only discarded structures are written to the <discarded path>
-o, --output <output path> output file (default: standard output)
-d, --discarded <discarded path> writes molecules with structure
error to a separate file (default:
standard output)
-f, --format <format> output file format (default: smiles)
-rf, --report-file <filepath> writes report to a file
-rp, --report-property <propname> writes report to the property of the
output, with the specified property name
-l, --log <filepath> writes software-error log messages
to file
-ocr, --discard-scan-errors discard incorrectly scanned molecules
Examples:
structurecheck -c config.xml -t separated -o out.smiles -d discarded.smiles in.smiles
structurecheck -c config.xml -m fix -t separated -d discarded.smiles in.smiles
structurecheck -c config.xml -m fix -t discarded in.sdf
structurecheck -c "bondLength" in.sdf
structurecheck -c "isotope->converttoelementalform" in.sdf
structurecheck -c "aromaticity..valence" -m fix -f sdf -o out.sdf in.sdf
Usage
structurecheck -c <config file> -m [mode] [<options>] [input list]
- The command-line parameter -c or --config is mandatory. This parameter specifies either the configuration file path or a simple action string.
- The optional parameter -m or --mode specifies the operation mode. The following operation modes are available:
- check (default): searches for errors;
- fix: fixes automatically fixable errors.
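The help text above states that actions in an action string are separated by "..". As a minimal sketch of composing such a string in a script, the helper below joins action names with that separator. The function name join_actions is hypothetical and not part of structurecheck; the command is only printed, so the sketch runs without the tool installed.

```shell
#!/bin/sh
# join_actions ACTION...: join action names with ".." for use with -c.
# Hypothetical helper for illustration; not shipped with structurecheck.
join_actions() {
    result=""
    for action in "$@"; do
        if [ -z "$result" ]; then
            result="$action"
        else
            result="$result..$action"
        fi
    done
    printf '%s\n' "$result"
}

config=$(join_actions aromaticity valence)

# Dry run: print the command line instead of executing it.
echo "structurecheck -c \"$config\" -m fix -f sdf -o out.sdf in.sdf"
```

The printed command matches the "aromaticity..valence" example from the help output; remove the echo to run it for real.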
Note: When a molecule import/export error occurs, the program continues to run. The error is written to the console, and the molecule is discarded from the results (i.e., the resulting output file contains fewer molecules than the input file).
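Because import/export errors silently shrink the result set, it can help to compare record counts before and after a run with the single output type. The sketch below counts SD file records by their "$$$$" terminator lines. The helper name count_sdf_records and the placeholder files are assumptions made for illustration; the files only mimic the record layout and are not valid molecules.

```shell
#!/bin/sh
# count_sdf_records FILE: count SD file records by "$$$$" terminator lines.
# Hypothetical helper for illustration; not part of structurecheck.
count_sdf_records() {
    # grep -c prints 0 but exits non-zero when nothing matches; keep the count.
    grep -c '^\$\$\$\$' "$1" || true
}

# Placeholder files standing in for a real input and output.
workdir=$(mktemp -d)
printf 'mol1\n$$$$\nmol2\n$$$$\n' > "$workdir/in.sdf"
printf 'mol1\n$$$$\n' > "$workdir/out.sdf"

n_in=$(count_sdf_records "$workdir/in.sdf")
n_out=$(count_sdf_records "$workdir/out.sdf")
echo "input: $n_in, output: $n_out, missing: $((n_in - n_out))"

rm -rf "$workdir"
```

A non-zero "missing" value suggests checking the console (or the file given with --log) for import/export errors.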
Input
Structurecheck accepts most molecular file formats as input (Marvin Documents (MRV), MDL molfile, SDfile, RXNfile, RDfile, SMILES, etc.). The input can be specified as:
- input file(s),
- input string(s), or
- SMILES (default).
Note: If neither the input file nor the input string is specified, the standard input (console) will be read.
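As a small sketch of the standard-input case, the snippet below pipes two SMILES strings into a valence check. The SMILES strings and the helper name emit_smiles are illustrative assumptions; the call is guarded so the snippet does nothing harmful when structurecheck is not on the PATH.

```shell
#!/bin/sh
# emit_smiles: print a few example SMILES, one per line (illustrative input only).
emit_smiles() {
    printf '%s\n' 'c1ccccc1' 'CC(=O)O'
}

if command -v structurecheck >/dev/null 2>&1; then
    # No input file or string is given, so the tool reads standard input.
    emit_smiles | structurecheck -c "valence"
else
    echo "structurecheck not found on PATH; skipping" >&2
fi
```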
Output
Structurecheck's output contains the file(s) of the checked/fixed molecules and optionally a report of the results. The molecules are written to the output file(s). The format of the output file(s) can be specified by the -f or --format option (default format is: "smiles"). The type of output is defined by the -t or --output-type parameter. The possible values of the output type are the following:
- single (default): all molecules are written to the file defined by the --output parameter. If the --output parameter is omitted, the result is written to the standard output (console). (The --discarded parameter is ignored in this case.)
- separated: valid and invalid molecules are written to two different files. The --output parameter defines the output file of molecules with valid structures, and the --discarded parameter defines the output file of molecules with invalid structures (or, in fix mode, those that cannot be fixed automatically).
- If the --discarded parameter is omitted, molecules with invalid structures are written to standard output;
- If the --output parameter is omitted, molecules with valid structures are written to standard output.
Note: At least one of the --output and --discarded parameters must be specified. If neither is defined, the program stops.
- accepted: only molecules with valid structures are written to the file defined by the --output parameter. If the --output parameter is omitted, molecules with valid structures are written to the standard output. (The --discarded parameter is ignored in this case.)
- discarded: only molecules with invalid structures are written to the file defined by the --discarded parameter. If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output. (The --output parameter is ignored in this case.)
The report of the structure check can be written either to a separate file, defined by the --report-file parameter, or to the output file(s) as additional molecule property. The name of the property can be defined by the --report-property parameter.
Note: Not all molecules with structure errors are discarded. In fix mode, only molecules with errors that cannot be fixed automatically are discarded.
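The rules for combining --output-type with --output and --discarded can be checked before launching a long run. The following is a minimal sketch of such a pre-flight check, written only from the rules described in this section. The function name check_output_args and its return codes are hypothetical; structurecheck performs its own validation.

```shell
#!/bin/sh
# check_output_args TYPE OUTPUT_PATH DISCARDED_PATH
# Hypothetical pre-flight check mirroring the rules described above.
# Returns 0 if the combination is usable, 1 if a required path is missing,
# 2 if the output type is unknown. Pass "" for an omitted path.
check_output_args() {
    out_type=$1
    out_path=$2
    disc_path=$3
    case "$out_type" in
        single|accepted)
            if [ -n "$disc_path" ]; then
                echo "note: --discarded is ignored for type '$out_type'" >&2
            fi
            return 0 ;;
        discarded)
            if [ -n "$out_path" ]; then
                echo "note: --output is ignored for type 'discarded'" >&2
            fi
            return 0 ;;
        separated)
            if [ -z "$out_path" ] && [ -z "$disc_path" ]; then
                echo "error: 'separated' needs --output or --discarded" >&2
                return 1
            fi
            return 0 ;;
        *)
            echo "error: unknown output type '$out_type'" >&2
            return 2 ;;
    esac
}

# Example: 'separated' with only a discarded path is accepted.
if check_output_args separated "" discarded.sdf; then
    echo "ok: separated with -d discarded.sdf"
fi
```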
Usage examples
Below you can find the short descriptions of some examples.
-
structurecheck -c "metallocene"
Executes a check with configuration metallocene on the molecule(s) defined in the standard input, and writes the result to the standard output (console);
-
structurecheck -c "bondLength" in.sdf
Executes a check with configuration bondLength on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);
-
structurecheck -c "isotope->converttoelementalform" in.sdf
Executes a check with configuration isotope->converttoelementalform on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);
-
structurecheck -c "aromaticity..valence" -m fix -f sdf -o out.sdf in.sdf
Executes a fix with configuration aromaticity and valence on the molecule(s) defined in the in.sdf file, and writes the molecules with valid structures (including automatically fixed molecules) in sdf format to the out.sdf output file;
-
structurecheck -c config.xml -t separated -o out.sdf -d discarded.sdf
Executes a check with the configuration contained in config.xml, writes the molecules with valid structures to out.sdf, and writes the molecules with invalid structures to discarded.sdf.
Note: The format of both outputs is SMILES, as --format (-f) is not defined;
-
structurecheck -c config.xml -m fix -t separated -d discarded.sdf
Executes a fix with the configuration contained in config.xml, writes the molecules with invalid structures to discarded.sdf, and writes the molecules with valid structures to the standard output (console);
-
structurecheck -c config.xml -m fix -t discarded in.sdf
Executes a fix with the configuration contained in config.xml on the molecule(s) defined in the in.sdf file, writes the molecules with invalid structures to the standard output (as --discarded is not defined), and omits molecules with valid structures.
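The examples above each handle one input. As a sketch of applying the same fix run to several files, the loop below derives per-file output names and prints one command per input. The helper name build_cmd, the file names a.sdf and b.sdf, and the _ok/_discarded suffixes are all assumptions made for illustration; the commands are printed rather than executed.

```shell
#!/bin/sh
# build_cmd INPUT: print a structurecheck command for one input file.
# Hypothetical helper; output names are derived by stripping the extension.
build_cmd() {
    input=$1
    base=${input%.*}
    printf 'structurecheck -c config.xml -m fix -t separated -f sdf -o %s -d %s %s\n' \
        "${base}_ok.sdf" "${base}_discarded.sdf" "$input"
}

# Dry run over two illustrative file names.
for f in a.sdf b.sdf; do
    build_cmd "$f"
done
```

Passing -f sdf keeps the output format in line with the .sdf file names, since the default output format is SMILES.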
Copyright © 1999-2010 ChemAxon Ltd. All rights reserved.