[Open Babel] (no subject)

Discussion:

Marcos Villarreal

2017-05-22 17:56:51 UTC

Hello,

For an application we are developing, we would like to obtain atom types
that are independent of the input format.
For example, a mol2 file with all hydrogen atoms and a pdb file without
hydrogens for the same molecule (i.e. identical heavy-atom coordinates)
should get the same atom types.
The attached program is our attempt in that direction, but unfortunately
without success. How can one get rid of all the input information and
let Open Babel recalculate the atom types from scratch?

Thank you in advance.

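#include <iostream>
#include <string>
#include <openbabel/obconversion.h>
#include <openbabel/obiter.h>
#include <openbabel/mol.h>
#include <openbabel/atom.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;  // the input file name is required

    OpenBabel::OBConversion conv;
    OpenBabel::OBMol mol;
    std::string filename = argv[1];

    conv.ReadFile(&mol, filename);

    // Discard hydrogens, then re-perceive connectivity and bond orders.
    mol.DeleteHydrogens();
    mol.ConnectTheDots();
    mol.PerceiveBondOrders();

    // Print the Open Babel atom type of every atom, one per line.
    FOR_ATOMS_OF_MOL(atom, mol) {
        std::cout << atom->GetType() << std::endl;
    }
}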

--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Córdoba
Argentina.

Noel O'Boyle

2017-05-22 19:00:33 UTC

In other words, you want to assign atom types based on the structure.
The source of the structure is immaterial except in so far as it
introduces noise. For example, to read a PDB file you need to guess
various things. To read a MOL file, you don't need to guess anything.

Regarding your code, you should never throw away information and then
try to guess it. Also, I note in passing that DeleteHydrogens()
doesn't delete anything, it just suppresses any explicit hydrogens.

I'm a bit unclear why you are using the internal Open Babel atom
types. Personally, I would avoid this as the atom types may not be
suitable. Instead, just implement your own atom type function to suit
your needs. Any atom typing can be implemented as a function that
takes an OBAtom* and returns the type, perhaps as an enum.

- Noel
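
A minimal sketch of the kind of typing function suggested above. To keep it free of library dependencies it works on a hypothetical stand-in struct, `AtomInfo`, rather than on an `OBAtom*`; with Open Babel the two fields would presumably be filled from `OBAtom::GetAtomicNum()` and `OBAtom::IsAromatic()`. The `AtomType` values and the name `classifyAtom` are invented for illustration and are not Open Babel's internal types.

```cpp
// Hypothetical stand-in for the per-atom facts the typing rule needs.
// In an Open Babel program these would be read from an OBAtom*.
struct AtomInfo {
    int atomicNum;  // element, e.g. 6 for carbon
    bool aromatic;  // result of whatever aromaticity model is in use
};

// Illustrative type set, not Open Babel's internal atom types.
enum class AtomType { AromaticCarbon, AliphaticCarbon, Nitrogen, Oxygen, Other };

// Map one atom to a type using only the element and the aromatic flag.
AtomType classifyAtom(const AtomInfo& a) {
    switch (a.atomicNum) {
        case 6:  return a.aromatic ? AtomType::AromaticCarbon
                                   : AtomType::AliphaticCarbon;
        case 7:  return AtomType::Nitrogen;
        case 8:  return AtomType::Oxygen;
        default: return AtomType::Other;
    }
}
```

For example, `classifyAtom(AtomInfo{6, true})` yields `AtomType::AromaticCarbon`. The rule never looks at hydrogens directly, so its output is only as format-independent as the aromatic flag passed into it.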

_______________________________________________
OpenBabel-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Noel O'Boyle

2017-05-23 12:43:39 UTC

Maybe if you can give an example of the problem with aromaticity, we
can help? The only information that is used by that function is the
structure, so it was probably wrong at that point.

Dear Noel, thank you for your answer. Please see my comments below.

Post by Noel O'Boyle
In other words, you want to assign atom types based on the structure.

Yes, that's right.

Post by Noel O'Boyle
The source of the structure is immaterial except in so far as it
introduces noise. For example, to read a PDB file you need to guess
various things. To read a MOL file, you don't need to guess anything.

That noise is what we are trying to avoid by always calculating (guessing)
things with the same algorithm.

Post by Noel O'Boyle
Regarding your code, you should never throw away information and then
try to guess it.

Well, that depends on your faith in the quality of the information put into
the input format.
One can always set a flag to keep the input information if it is considered
accurate enough, but if you want consistency across input file formats,
I don't see any other way than to strip all the information from the input
and recalculate it.

Post by Noel O'Boyle
Also, I note in passing that DeleteHydrogens()
doesn't delete anything, it just suppresses any explicit hydrogens.
I'm a bit unclear why you are using the internal Open Babel atom
types. Personally, I would avoid this as the atom types may not be
suitable.
Instead, just implement your own atom type function to suit
your needs. Any atom typing can be implemented as a function that
takes an OBAtom* and returns the type, perhaps as an enum.

Are you referring to functions like "IsAmideNitrogen"? We used these
functions, and they worked just fine for our needs.
The problem we faced was with "IsAromatic", which we couldn't make
input-format agnostic. Our guess is that some information from the input
format always remains when it is called, even if
UnsetAromaticPerceived and the like were called beforehand.
This led us to try the route of expressing all the atom types as internal
Open Babel types and building upon them.

Post by Noel O'Boyle
- Noel

--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Cordoba

Marcos Villarreal

2017-05-23 15:10:28 UTC

Thank you, Noel, for looking into this.

So how do you suggest doing this inside the code, that is, without passing
through an intermediate file?
I remind you that our goal is to get the same atom types (say, aromatic ones)
regardless of the input format.
For now we are interested in consistency before "accuracy", which is
another subject. As a related note, we have tested several atom typing
programs (Knodle, I-interpret, Unicon and also Open Babel) and the
number of atoms perceived as aromatic typically differs by 10-20% when
analyzing 3600 structures in the PDBbind database.

When I convert the molecules as given with obabel, you're right - you
run into a bug that's been fixed on the development branch -
aromaticity is perceived differently depending on the presence/absence

obabel 3rlb_ligand.* -osmi

Cc1nc(N)c(Cn2csc(CCO)c2C)cn1 3rlb_ligand
Cc1nc(N)c(CN2CSC(=C2C)CCO)cn1 ./3rlb_ligand.pdb
If you delete the explicit Hs first, you can get the same aromaticity

obabel 3rlb_ligand.* -d -O tmp.sdf
obabel tmp.sdf -osmi

Cc1nc(N)c(CN2=CSC(=C2C)CCO)cn1 3rlb_ligand
Cc1nc(N)c(CN2CSC(=C2C)CCO)cn1 ./3rlb_ligand.pdb
If you paste these SMILES into Marvin Sketch you can see the
difference. The MOL2 file contains an extra double bond to a nitrogen.
So what's going on?...
I'm guessing that the correct structure is in the MOL2 file, but it
was read incorrectly by Open Babel and so is missing the charge on the
4-valent nitrogen. MOL2 is a horrible format but we should do a better
job. I note in passing that MarvinSketch interprets it the same as
Open Babel but that's no excuse.
The PDB file of course does not contain any bond orders and so we
guess them. We do an okay job - this is an example where we miss the
bond. If you removed these bond orders from the MOL2 file you would
get the same wrong structure too.
- Noel

Here is one example from the PDBBind refined data set.
Please find below the code and the output; the mol2 and pdb input files
are attached.
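#include <iostream>
#include <string>
#include <openbabel/obconversion.h>
#include <openbabel/obiter.h>
#include <openbabel/mol.h>
#include <openbabel/atom.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;  // the input file name is required

    OpenBabel::OBConversion conv;
    OpenBabel::OBMol mol;
    std::string filename = argv[1];
    conv.ReadFile(&mol, filename);

    // Discard hydrogens, then re-perceive connectivity and bond orders.
    mol.DeleteHydrogens();
    mol.ConnectTheDots();
    mol.PerceiveBondOrders();
    // Clear the aromaticity flag so that it is perceived again on demand.
    mol.UnsetAromaticPerceived();

    // Print one aromaticity flag (0 or 1) per atom.
    FOR_ATOMS_OF_MOL(atom, mol) {
        std::cout << atom->IsAromatic();
    }
    std::cout << std::endl;
}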
000000111110000000 (mol2)
000000000000111111 (pdb)
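
The two flag strings above differ in more than position: the mol2 run marks 5 atoms as aromatic and the pdb run marks 6. Since atom order may differ between the two files (an assumption here, not something the output proves), comparing counts is safer than comparing positions. A small helper for that, independent of Open Babel, might look like this; the name `countAromatic` is made up for the sketch.

```cpp
#include <algorithm>
#include <string>

// Count the atoms flagged aromatic in a string of '0'/'1' characters,
// such as the one printed by the program above.
int countAromatic(const std::string& flags) {
    return static_cast<int>(std::count(flags.begin(), flags.end(), '1'));
}
```

Calling it on the two outputs gives 5 for the mol2 string and 6 for the pdb string, so the two runs disagree on which ring is aromatic, not merely on atom numbering.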

--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Cordoba

Geoffrey Hutchison

2017-05-23 15:38:30 UTC

Post by Marcos Villarreal
For now we are interested in consistency before "accuracy", which is another subject. As a related note, we have tested several atom typing programs (Knodle, I-interpret, Unicon and also Open Babel) and the number of atoms perceived as aromatic typically differs by 10-20% when analyzing 3600 structures in the PDBbind database.

This is hardly surprising. For one, if I take 10 organic chemists in a room and ask them to identify aromatic rings, I’ll get at least 10-20% variation.

More specifically, there is not one uniform cheminformatics model for aromaticity - because there is no well-defined chemical definition. That’s omitting the hard cases, even given a specific aromatic model. I’d guess we get 5-10 bug reports per year on specific cases for OB aromaticity detection.

But your question is how do you get uniform atom types, regardless of the input file format. This is probably impossible. If you have data in format X with correct bond and formal charge assignments (e.g., SDF) and data in XYZ format with atoms and no bonds or formal charges, you have to assume that all the bond perception is perfect. I don’t have a good metric for OB’s implementation, but I’d guess somewhere in the ~90-95% range.

In short, please don’t throw away good data. Stick to file formats that retain as much information as possible.

-Geoff

Marcos Villarreal

2017-05-23 17:24:49 UTC

Hello Geoff, thank you for your answer. Please see my comments, which are
inline with your comments below.

Post by Marcos Villarreal
For now we are interested in consistency before "accuracy", which is
another subject. As a related note, we have tested several atom typing
programs (Knodle, I-interpret, Unicon and also Open Babel) and the
number of atoms perceived as aromatic typically differs by 10-20% when
analyzing 3600 structures in the PDBbind database.

Post by Geoffrey Hutchison
This is hardly surprising. For one, if I take 10 organic chemists in a room
and ask them to identify aromatic rings, I’ll get at least 10-20% variation.
More specifically, there is not one uniform cheminformatics model for
aromaticity - because there is no well-defined chemical definition. That’s
omitting the hard cases, even given a specific aromatic model. I’d guess we
get 5-10 bug reports per year on specific cases for OB aromaticity
detection.

That was exactly the point implied in this comment. Open Babel seems as
good as any other program at detecting aromaticity.

Post by Geoffrey Hutchison
But your question is how do you get uniform atom types, regardless of the
input file format. This is probably impossible. If you have data in format
X with correct bond and formal charge assignments (e.g., SDF) and data in
XYZ format with atoms and no bonds or formal charges, you have to assume
that all the bond perception is perfect. I don’t have a good metric for
OB’s implementation, but I’d guess somewhere in the ~90-95% range.

Well, as long as coordinates and atomic numbers are provided in a file, it
should be possible to always come up with the same atom typing, regardless
of the format. Admittedly, you will have to lose information for the sake of
consistency.
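
To illustrate the point that coordinates and atomic numbers alone determine a reproducible answer, here is a sketch of distance-based bond perception. It is a simplified stand-in, not Open Babel's implementation: the `Atom` struct, the function names, the small radius table and the 0.45 Angstrom tolerance are all assumptions of this example.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

struct Atom {
    int atomicNum;
    double x, y, z;  // coordinates in Angstrom
};

// Covalent radii in Angstrom for a few elements (illustrative subset).
// Elements missing from the table get radius 0.
double covalentRadius(int atomicNum) {
    static const std::map<int, double> radii = {
        {1, 0.31}, {6, 0.76}, {7, 0.71}, {8, 0.66}, {16, 1.05}};
    const auto it = radii.find(atomicNum);
    return it != radii.end() ? it->second : 0.0;
}

// Two atoms count as bonded when their distance does not exceed the sum of
// their covalent radii plus a tolerance (an assumed value in this sketch).
bool areBonded(const Atom& a, const Atom& b, double tolerance = 0.45) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    const double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    return dist > 0.0 &&
           dist <= covalentRadius(a.atomicNum) + covalentRadius(b.atomicNum) + tolerance;
}

// Number of bonded non-hydrogen neighbours of atom i. Hydrogens are skipped,
// so their presence or absence in the input does not change the result.
int heavyDegree(const std::vector<Atom>& atoms, std::size_t i) {
    int n = 0;
    for (std::size_t j = 0; j < atoms.size(); ++j) {
        if (j != i && atoms[j].atomicNum != 1 && areBonded(atoms[i], atoms[j])) ++n;
    }
    return n;
}
```

With two carbons 1.5 Angstrom apart, `heavyDegree` returns 1 for the first carbon whether or not a hydrogen is also attached to it. Bond orders, formal charges and aromaticity are the harder part and are not addressed by this sketch.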

Post by Geoffrey Hutchison
In short, please don’t throw away good data. Stick to file formats that
retain as much information as possible.

I agree with you in principle, but consider the following not uncommon
scenario. We are working on docking (AutoDock Vina), whose score depends on
atom typing. As you know, ligands come in different formats, usually
pdb, mol2 or sdf. We would expect to obtain the same docking result
regardless of the input format.

-Marcos.

Post by Geoffrey Hutchison
-Geoff

--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Cordoba

Dimitri Maziuk

2017-05-23 18:06:47 UTC

Post by Marcos Villarreal
I agree with you in principle, but consider the following not uncommon
scenario. We are working on docking (AutoDock Vina), whose score depends on
atom typing. As you know, ligands come in different formats, usually
pdb, mol2 or sdf. We would expect to obtain the same docking result
regardless of the input format.

Why? PDB files contain a 3D structure, complete with stereo config
(because that's how the crystal structure works). MOL/SDF doesn't have
to include 3D coordinates, nor any usable stereo flags. Unless all my
MOL/SDFs were generated from PDBs with zero information loss, I wouldn't
expect anything from them.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Miro Moman

2017-05-22 19:06:27 UTC

Quick and dirty workaround: Convert it to .xyz (removing the Hs if needed)
then compute the atom types from that file and see what happens...


Marcos Villarreal

2017-05-23 12:18:38 UTC

Thank you, Miro, for your answer. That is in the spirit of what we want to
do, but without writing an intermediate file. We think that all the
conversions can be done inside the code.

Marcos.
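
One way to picture the in-memory equivalent of the ".xyz without hydrogens" workaround is to reduce every input to its heavy atoms in a canonical order before any perception runs. The sketch below does only that and does not use Open Babel; `XyzAtom` and `heavyAtomSkeleton` are hypothetical names, and the exact comparison assumes the coordinates are numerically identical in both inputs (real files may round to different precision).

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical minimal atom record: element plus coordinates, nothing else.
struct XyzAtom {
    int atomicNum;
    double x, y, z;
};

bool operator==(const XyzAtom& a, const XyzAtom& b) {
    return std::tie(a.atomicNum, a.x, a.y, a.z) == std::tie(b.atomicNum, b.x, b.y, b.z);
}

// Drop hydrogens and sort the remaining atoms by (element, x, y, z) so that
// the result does not depend on atom order or on explicit hydrogens.
std::vector<XyzAtom> heavyAtomSkeleton(std::vector<XyzAtom> atoms) {
    atoms.erase(std::remove_if(atoms.begin(), atoms.end(),
                               [](const XyzAtom& a) { return a.atomicNum == 1; }),
                atoms.end());
    std::sort(atoms.begin(), atoms.end(), [](const XyzAtom& a, const XyzAtom& b) {
        return std::tie(a.atomicNum, a.x, a.y, a.z) <
               std::tie(b.atomicNum, b.x, b.y, b.z);
    });
    return atoms;
}
```

Two inputs that describe the same heavy atoms, one with hydrogens and a different atom order, reduce to equal vectors. Everything downstream (connectivity, bond orders, aromaticity) would then start from the same data, at the cost of discarding whatever bond information the richer format carried.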

--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Cordoba