check
========================

.. contents::
    :local:

In this task, pdbtop checks the input molecule and makes corrections where necessary.

Arguments
-------------

.. option:: -i, --in

    .. list-table::
        :stub-columns: 1

        * - Mandatory
          - Yes
        * - Argument
          - PDB filename or XYZ filename
        * - Default
          - None

    The input molecule.

.. option:: -o, --out

    .. list-table::
        :stub-columns: 1

        * - Mandatory
          - No
        * - Argument
          - A filename prefix
        * - Default
          - None

    The output filename prefix. If not given, the output will be written to ``x-check.pdb``.

Example:

.. code-block:: bash

    $ pdbtop check -i 3kab.pdb -o new # Read the molecule from a PDB file

It will output the following information:

.. code-block:: bash

    ...
    Warning: The residue name of the 1112-th atom is changed from "HIS" to "HSE".
    Warning: The atom name of the 1120-th atom is changed from "CD1" to "CD".
    Warning: The atom name of the 1128-th atom is changed from "CD1" to "CD".
    Warning: The atom name of the 1164-th atom is "OXT". It is probably a non-standard atom in C-terminus and is deleted.
    ...
    Write PDB: new.pdb

The output indicates that some residues and atoms are renamed, and some atoms are deleted. The corrected molecule is written to ``new.pdb``. Usually, pdbtop makes these corrections properly. However, if you have special requirements, you have to correct the input molecule manually. We therefore strongly **recommend** that you read all ``Warning`` messages carefully.

Theoretical Background
--------------------------

PDB files obtained from https://www.rcsb.org/ are often generated from X-ray experiments. The PDB file format is not always suitable for computations and often contains ambiguities that **CANNOT** be resolved automatically. You have to resolve some issues manually. This section explains what pdbtop does in the check task.

Long Atom Names and Location Indicators
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Below are two examples of PDB files:

.. tabs::

    .. tab:: md.pdb

        .. code-block:: bash
            :caption: md.pdb, a PDB file for molecular dynamics simulation
            :emphasize-lines: 4-6
            :linenos:

            ATOM   2225  CG  LEU A 160      37.975  11.053  20.429  1.00 32.24           C
            ATOM   2226  HG  LEU A 160      38.934  11.441  20.009  1.00  0.00      H    H
            ATOM   2227  CD1 LEU A 160      38.210   9.708  21.137  1.00 34.92           C
            ATOM   2228  HD11LEU A 160      38.574   8.946  20.415  1.00  0.00      H    H
            ATOM   2229  HD12LEU A 160      38.969   9.817  21.942  1.00  0.00      H    H
            ATOM   2230  HD13LEU A 160      37.268   9.334  21.591  1.00  0.00      H    H
            ATOM   2231  CD2 LEU A 160      36.995  10.852  19.267  1.00 33.22           C
            ATOM   2232  HD21LEU A 160      37.422  10.150  18.519  1.00  0.00      H    H
            ATOM   2233  HD22LEU A 160      36.030  10.429  19.613  1.00  0.00      H    H
            ATOM   2234  HD23LEU A 160      36.791  11.819  18.759  1.00  0.00      H    H

    .. tab:: 3kab.pdb

        .. code-block:: bash
            :caption: 3kab.pdb, a PDB file from RCSB PDB
            :emphasize-lines: 1-2, 15-16
            :linenos:

            ATOM    812  CA AARG A 119      31.582  16.256  14.775  0.50 35.04           C
            ATOM    813  CA BARG A 119      31.512  16.275  14.809  0.50 34.80           C
            ATOM    814  C   ARG A 119      32.251  16.478  16.124  1.00 35.47           C
            ATOM    815  O   ARG A 119      32.920  15.565  16.637  1.00 36.21           O
            ATOM    816  CB AARG A 119      32.441  15.211  14.037  0.50 34.55           C
            ATOM    817  CB BARG A 119      32.145  15.106  14.033  0.50 34.28           C
            ATOM    818  CG AARG A 119      31.785  14.470  12.894  0.50 34.42           C
            ATOM    819  CG BARG A 119      31.511  14.870  12.670  0.50 33.10           C
            ATOM    820  CD AARG A 119      32.831  14.137  11.844  0.50 31.29           C
            ATOM    821  CD BARG A 119      32.183  13.779  11.867  0.50 29.12           C
            ATOM    822  NE AARG A 119      32.779  15.183  10.842  0.50 30.39           N
            ATOM    823  NE BARG A 119      31.239  13.328  10.868  0.50 27.73           N
            ATOM    824  CZ AARG A 119      33.598  16.233  10.751  0.50 27.44           C
            ATOM    825  CZ BARG A 119      31.217  12.113  10.329  0.50 26.91           C
            ATOM    826  NH1AARG A 119      34.654  16.422  11.573  0.50 28.45           N
            ATOM    827  NH1BARG A 119      32.123  11.182  10.695  0.50 23.52           N
            ATOM    828  NH2AARG A 119      33.357  17.080   9.788  0.50 18.54           N
            ATOM    829  NH2BARG A 119      30.263  11.846   9.442  0.50 19.86           N

If you have ever performed molecular dynamics simulations, you may have encountered files like ``md.pdb``.
You can see 3 highlighted hydrogen atoms named ``HD11``, ``HD12``, and ``HD13``. However, this is **NOT** a standard format. In **standard** PDB format, the position occupied by the 4th character of these names (column 17 of the record) is the **location indicator**. In the example ``3kab.pdb``, you can see that residue ARG119 contains two sets of atoms with location indicators ``A`` and ``B``. The location indicator distinguishes atoms with the same name in the same residue. The conformations ``A`` and ``B`` occur in a ratio given by the occupancy factor, i.e. ``0.5``:\ ``0.5``. The 2 conformations are shown below.

.. image:: _static/figs/p1.png

Therefore, it is sometimes difficult to distinguish a location indicator from a long atom name. In the example ``md.pdb``, the atom names are actually ``HD11``, ``HD12``, and ``HD13`` without location indicators, but they can also be interpreted as atoms named ``HD1`` with location indicators ``1``, ``2``, and ``3``. You must take this into account when dealing with PDB files from X-ray experiments. pdbtop can recognize some location indicators and atom names, and gives some suggestions. For example,

.. code-block:: bash

    $ pdbtop check -i 3kab.pdb -o new # Read the molecule from a PDB file
    ...
    Warning: The atom CA in resiude ARG119 at chain A has an occupancy of 0.500. Probably, only 1 of atom CA812 and CA813 can be kept!
    Warning: The atom CA in resiude ARG119 at chain A has an occupancy of 0.500.
    Warning: The atom CB in resiude ARG119 at chain A has an occupancy of 0.500. Probably, only 1 of atom CB816 and CB817 can be kept!
    Warning: The atom CB in resiude ARG119 at chain A has an occupancy of 0.500.
    Warning: The atom CG in resiude ARG119 at chain A has an occupancy of 0.500. Probably, only 1 of atom CG818 and CG819 can be kept!
    Warning: The atom CG in resiude ARG119 at chain A has an occupancy of 0.500.
    ...

You can see that pdbtop has detected 2 sets of atoms with the same names in the same residue, such as ARG119.
However, pdbtop does **NOT** modify them. You have to do it manually. For example, for ARG119 we keep only the ``A`` set of conformations by deleting the ``B`` set:

.. code-block:: bash
    :caption: 3kab.pdb, a PDB file from RCSB PDB
    :linenos:

    ATOM    811  N   ARG A 119      31.491  17.524  14.015  1.00 34.28           N
    ATOM    812  CA  ARG A 119      31.582  16.256  14.775  0.50 35.04           C
    ATOM    814  C   ARG A 119      32.251  16.478  16.124  1.00 35.47           C
    ATOM    815  O   ARG A 119      32.920  15.565  16.637  1.00 36.21           O
    ATOM    816  CB  ARG A 119      32.441  15.211  14.037  0.50 34.55           C
    ATOM    818  CG  ARG A 119      31.785  14.470  12.894  0.50 34.42           C
    ATOM    820  CD  ARG A 119      32.831  14.137  11.844  0.50 31.29           C
    ATOM    822  NE  ARG A 119      32.779  15.183  10.842  0.50 30.39           N
    ATOM    824  CZ  ARG A 119      33.598  16.233  10.751  0.50 27.44           C
    ATOM    826  NH1 ARG A 119      34.654  16.422  11.573  0.50 28.45           N
    ATOM    828  NH2 ARG A 119      33.357  17.080   9.788  0.50 18.54           N

Note that the location indicator ``A`` is also deleted from the atoms that are kept.

Different Name for the Same Thing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some atoms and residues have slightly different names in different PDB files. For example, the residue name of the amino acid histidine can be ``HIS`` or ``HSE``; the atom name of :math:`\delta`\ C in isoleucine (``ILE``) can be ``CD1`` or ``CD``. This is not a big issue: pdbtop renames them consistently in the output file.

Element Symbols
^^^^^^^^^^^^^^^^^

In many PDB files, the element symbol is not given. In this case, pdbtop adds the element symbol to each atom. For example,

.. code-block:: bash
    :linenos:

    ATOM    181  HA  SER A  18      29.928  -4.431  43.044  1.00  0.00      H    H
    ATOM    182  C   SER A  18      31.411  -5.775  42.354  1.00 60.56      C    C
    ATOM    183  O   SER A  18      30.902  -6.728  42.936  1.00 60.65      O    O

Unfortunately, the element symbol **CANNOT** always be determined from the atom name. For example, the atom name ``CL`` can be carbon or chlorine. In this case, you have to **manually** correct the element symbol!
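
The ambiguity can be illustrated with a small Python sketch. It is **not** pdbtop's actual algorithm: the function name ``guess_element``, the residue list, and the (partial) set of two-letter symbols are assumptions made only for illustration. The function guesses an element from the atom name and residue name, and returns ``None`` whenever it cannot decide, so that such atoms can be checked manually.

.. code-block:: python

    # Residue names treated as standard amino acids in this sketch (assumption).
    PROTEIN_RESIDUES = {
        "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "HSE",
        "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR",
        "VAL",
    }

    # A partial, illustrative set of two-letter element symbols that can be
    # confused with atom names starting with a one-letter element.
    TWO_LETTER_SYMBOLS = {"CL", "CA", "NA", "MG", "FE", "ZN", "BR", "CU", "MN"}


    def guess_element(atom_name, res_name):
        """Return a guessed element symbol, or None if the name is ambiguous."""
        # Keep only the letters, e.g. "HD11" -> "HD", "1HD1" -> "HD".
        letters = "".join(ch for ch in atom_name if ch.isalpha()).upper()
        if not letters:
            return None
        if res_name.upper() in PROTEIN_RESIDUES:
            # In standard amino acids the first letter is the element.
            return letters[0] if letters[0] in "CNOSH" else None
        if letters[:2] in TWO_LETTER_SYMBOLS:
            # E.g. "CL" may be chlorine or a carbon atom: ask the user.
            return None
        return letters[0]


    for name, res in [("CA", "ARG"), ("HD11", "LEU"), ("CL", "LIG")]:
        print(name, res, guess_element(name, res))

In this sketch, ``CA`` in ``ARG`` is treated as carbon because the residue is a standard amino acid, whereas ``CL`` in the hypothetical residue ``LIG`` is left undecided.
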

Models in the Same File
^^^^^^^^^^^^^^^^^^^^^^^^

A PDB file can contain multiple models. The models are delimited by ``MODEL`` and ``ENDMDL`` records. For example,

.. code-block:: bash
    :caption: 2mz7.pdb, a PDB file from RCSB PDB
    :linenos:

    MODEL        1
    ATOM      1  N   LYS A 267      26.791  -7.054 -26.130  1.00  0.00           N
    ATOM      2  CA  LYS A 267      27.796  -6.500 -27.025  1.00  0.00           C
    ...
    ATOM    708  HD3 PRO A 312       5.473 -10.093 -12.553  1.00  0.00           H
    TER     709      PRO A 312
    ENDMDL
    MODEL        2
    ATOM      1  N   LYS A 267      21.011 -23.102 -13.354  1.00  0.00           N
    ATOM      2  CA  LYS A 267      19.831 -22.754 -12.570  1.00  0.00           C
    ...
    ATOM    706  HG3 PRO A 312       2.202  -3.813  -8.777  1.00  0.00           H
    ATOM    707  HD2 PRO A 312      -0.115  -5.325  -8.685  1.00  0.00           H
    ATOM    708  HD3 PRO A 312       1.253  -5.649  -9.765  1.00  0.00           H
    TER     709      PRO A 312
    ENDMDL

When pdbtop detects several models, it saves each model to a separate file. You can then process the model you need with pdbtop again. For example,

.. code-block:: bash

    $ pdbtop check -i 2mz7.pdb -o 2mz7-model
    Read: 2mz7.pdb
    There are 20 models in "2mz7.pdb". Each of them is saved to "2mz7-model-X.pdb".
    $ ls
    2mz7-model-1.pdb   2mz7-model-13.pdb  2mz7-model-17.pdb  2mz7-model-20.pdb  2mz7-model-6.pdb  2mz7.pdb
    2mz7-model-10.pdb  2mz7-model-14.pdb  2mz7-model-18.pdb  2mz7-model-3.pdb   2mz7-model-7.pdb
    2mz7-model-11.pdb  2mz7-model-15.pdb  2mz7-model-19.pdb  2mz7-model-4.pdb   2mz7-model-8.pdb
    2mz7-model-12.pdb  2mz7-model-16.pdb  2mz7-model-2.pdb   2mz7-model-5.pdb   2mz7-model-9.pdb
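
The idea of splitting a multi-model file can be sketched in Python as follows. This is only an illustration under simple assumptions, not pdbtop's implementation: the function names ``split_models`` and ``write_models`` are hypothetical, every model is assumed to be enclosed in a ``MODEL``/``ENDMDL`` pair with a model number, and records outside any model are ignored.

.. code-block:: python

    def split_models(lines):
        """Group PDB records by model number.

        Returns a dict that maps each model number to the list of lines
        found between its MODEL and ENDMDL records.
        """
        models = {}
        current = None
        for line in lines:
            record = line[:6].strip()  # record name occupies columns 1-6
            if record == "MODEL":
                number = int(line[6:].split()[0])
                current = models.setdefault(number, [])
            elif record == "ENDMDL":
                current = None
            elif current is not None:
                current.append(line)
        return models


    def write_models(pdb_path, prefix):
        """Write each model of pdb_path to '<prefix>-<number>.pdb'."""
        with open(pdb_path) as handle:
            models = split_models(handle.read().splitlines())
        for number, records in models.items():
            with open(f"{prefix}-{number}.pdb", "w") as out:
                out.write("\n".join(records) + "\n")
        return sorted(models)


    example = [
        "MODEL        1",
        "ATOM      1  N   LYS A 267      26.791  -7.054 -26.130  1.00  0.00           N",
        "ENDMDL",
        "MODEL        2",
        "ATOM      1  N   LYS A 267      21.011 -23.102 -13.354  1.00  0.00           N",
        "ENDMDL",
    ]
    print(sorted(split_models(example)))

Under these assumptions, calling ``write_models("2mz7.pdb", "2mz7-model")`` would produce one file per model, named in the same ``2mz7-model-X.pdb`` pattern as in the listing above.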