aligns arbitrarily ordered isomers
ArbAlign is a small tool for optimally aligning two arbitrarily ordered isomers using the
Hungarian or Kuhn-Munkres algorithm. The final ordering of the two isomers should give the lowest
root mean-square distance (RMSD) between the two structures.
The repo contains the following items:
Examples
- a directory containing coordinates of different molecular systems to play around withPaper
- the manuscript and supporting information of the companion paper to this workArbAlign-driver.py
- a Python source code for the munkresRMSD script,ArbAlign.py/ ArbAlign-standard.py/
- a Python script using the faster implementation of Hungarian algorithmArbAlign-scipy.py/
- a Python script using the slower SciPy implementation of Hungarian algorithmPrinCoords.py
- a Python source code for calculating principal coordinatesgenTypes.csh
– a shell script to use OpenBabel to convert a Cartesian (XYZ) file to a SYBYL Mol2 file formatted as a conventional XYZ filegenConn.csh
– a shell script to use OpenBabel to convert a Cartesian (XYZ) file to a NMA connectivity file formatted as a conventional XYZ fileRMSD-Kabsch.py
- a Python script for calculating RMSDs by Jimmy Charnley Kromann (jimmy@charnley.dk) and Lars Bratholm’s script from https://github.com/charnley/rmsd - Accessed on Nov 10, 2015. Please see rmsd-script/LICENSE for usage rights.ArbAlign-driver.py
is a Python driver script to run Hungarian optimal RMSD matching.
ArbAlign can be used either as a command line or web tool. The command line tool has a driver script
that can take in many options or resort to sensible defaults when necessary.
Usage: ArbAlign-driver.py [-b/--by {l, t, c}] [-n/--noHydrogens] [-s/--simple] A.xyz B.xyz
-b {l,t,c}, --by {l,t,c}
Match atoms by l-label, SYBYL t-type, or NMA connectivity (-c).
The default is by atom label (-l)
-s, --simple
Perform Kuhn-Munkres assignment reordering without axes swaps and reflections.
The default is to perform axes swaps and reflections
-n, --noHydrogens
Ignore hydrogens.
The default is to include all atoms
-v, --verbose
prints detailed output. The default is to print minimal amount of information
ArbAlign.py A.xyz B.xyz
ArbAlign-driver.py A.xyz B.xyz
ArbAlign-driver.py --by c --noHydrogens A.xyz B.xyz
ArbAlign-driver.py --simple --noHydrogens A.xyz B.xyz
-b {l,t,c},
--by {l,t,c}
The default is to align the isomers by element or atom label (-l). If one wants to align the isomers
by the elements’ SYBYL atom type or NMA connectivity, the driver needs to interface with OpenBabel.
Match atoms by l-label, SYBYL t-type, or NMA connectivity (-c).
-s,
--simple
Perform Kuhn-Munkres assignment reordering without axes swaps and reflections.
The default is to perform axes swaps and reflections
-n,
--noHydrogens
Ignore hydrogens.
The default is to include all atoms
numpy
module. If you don’t have it already, you can install it using pip
pip install numpy
hungarian
module.pip install hungarian
build/lib-XXX/hungarian.so
into a location that’s included in your $PYTHONPATH
or whatever directory you are running ArbAlign from.scipy
linear assignment solver, you would need to install itpip install scipy
.OpenBabel
- We use OpenBabel to convert Cartesian coordinates (XYZ) to formats containing atmm types including connectivity and hybridization information. It is necessary to use OpenBabel to convert the Cartesian coordinates to SYBYL Mol2 (sy2) and MNA (mna) formats.genTypes.csh
- a small shell script which converts XYZ file to SYBYL Mol2 (sy2) format and recasts the atom label to contain atom type information.genConn.csh
- a small shell script which converts XYZ file to NMA (nma) format and recasts the atom label to contain atom’s bonding/connectivity information.If the pairs of structures pass a sanity test, the tool will align them optimally and provide the
following information.
For example, aligning Examples/Simple/Mol-A
and Examples/Simple/Mol-B
using ArbAlign-driver.py Mol-A.xyz Mol-B.xyz
would print
- [('H', 17), ('C', 13), ('O', 4), ('N', 3)]
- C Swap: (0, 1, 2) Refl: (1, 1, 1) RMSD: 3.6057335750670982 [1, 0, 5, 6, 4, 9, 3, 7, 8, 2, 10, 11, 12]
- H Swap: (0, 1, 2) Refl: (1, 1, 1) RMSD: 2.7427738425469443 [0, 3, 4, 1, 6, 7, 5, 2, 10, 9, 8, 12, 11, 15, 14, 16, 13]
- N Swap: (0, 1, 2) Refl: (1, 1, 1) RMSD: 2.7427738425469443 [0, 1, 2]
- O Swap: (0, 1, 2) Refl: (1, 1, 1) RMSD: 2.725809328162452 [0, 1, 3, 2]
- .
- .
- (48x)
- .
- .
- Swap Transform: (0, 1, 2)
- Reflection Transform: (-1, -1, -1)
- Initial unsorted RMSD: 3.996
- Initial sorted RMSD: 3.388
- Best RMSD: 0.746
- Best alignment of Mol-B.xyz with Mol-A.xyz is written to Mol-B-aligned_to-Mol-A.xyz
One can generate a graphic representation of the alignment using UCSF Chimera that would look something like this:
If you find this tool useful for any publishable work, please cite the companion paper:
Berhane Temelso, Joel M. Mabey, Toshiro Kubota, Nana Appiah-padi, George C. Shields. J. Chem. Info. Model.
2017, 57 (5), 1045–1054
http://doi.org/10.1021/acs.jcim.6b00546