Navigation:

Potential Development using CamCASP

Warning

This tutorial is being updated to reflect the new methodology we have published in our recent paper in JCTC (2016) on the pyridine dimer.

See also:

  1. Potential Development using CamCASP

  2. Charge-Transfer

  3. Polarization & Polarization Models

  4. Rory's notes on the water potential

  5. Orient: Energy Calculations

Or, potential development made easier!!!

There are a few things you need to do before getting here:

  1. Optimize the molecule geometries. You will have to start from a fixed geometry before trying to include flexibility (current research).
  2. Determine the vertical ionization energies of the molecules, and, if you are using NWChem or DALTON2016, also determine the AC-SHIFT for these systems.
  3. Molecular Properties: The potential is created out of a short-range and long-range part. The short-range part is what we will do here. The long-range part is based on molecular properties calculated on the isolated molecules. These properties are:
    1. Distributed multipoles
    2. Distributed polarizabilities
    3. Distributed dispersion coefficients
  4. Interaction energy calculations for a set of pseudo-random dimer configurations. This is a time-consuming step.
  5. First-order energy calculations on a dense set of pseudo-random dimer configurations. If the above set of configurations is already fairly dense, this step may be skipped.
    1. Density overlap calculations on the dense set of dimer configurations used for the first-order energy calculations.

The potential we are after has the form: $$ V^{ab}(r_{ab}) = K \exp{[-\alpha^{ab}(r_{ab}-\rho^{ab}(\Omega^{ab}))]} + \sum_{n=6}^{12} f_{n}(\beta r_{ab}) \frac{C^{ab}_{n}}{r_{ab}^n} + \{ \mathcal{Q}^{a}_{lm}, \alpha^{ab}_{lm,l'm'} \} $$

The first term is the short-ranged term that needs to be fitted. This is what we will be concerned with here. The second is the dispersion model. In principle, all parameters in this model are theoretically derived. The last term is an abbreviated way of including the terms that depend on multipole moments (the electrostatic term) and those that depend on multipole-moments and polarizabilities (the polarization term). The electrostatic term is obtained directly from the distributed multipole moments and no fitting is required. The polarization model requires a damping function (different from the damping used for the dispersion). This needs to be fitted.

Notice that almost all terms are site-site-dependent. The exceptions are the damping parameters $\beta$ in the dispersion/polarization models. Though, in principle, these too should be dependent on the sites involved.

Initial geometries: Geometry optimization

A few pointers:

Ionization energy and AC-SHIFT

Eventually we will use LC-PBE/LC-PBE0 for SAPT(DFT) calculations, but till then, we will continue to use an asymptotically corrected hybrid functional like PBE0.

Note

Click here for an introduction to the asymptotic correction, but bear in mind that the Fermi-Amaldi correction that I talk about on that page is not the one we now recommend. We now think that the GRLB correction is best. More on this some other time.

For the asymptotic correction you need to know the shift needed to correct the XC potential. This shift is defined as $$ \Delta = {\rm IP} + \epsilon_{\rm HOMO}. $$

So you need two quantities to define the shift:

Ionization potential

First see if there is an experimental IP available from the NIST Chemistry WebbookExternal Link. Convert it to atomic units.

If you need to compute the IP use the $\Delta$DFT approach: $$ {\rm IP} = E[N-1] - E[N], $$

that is, calculate the energies of the $N$ and $N-1$ electron systems and determine the IP by subtraction. Use PBE0 or LC-PBE0 to determine these energies. Basis sets could be Sadlej-pVTZ, cc-pVDZ or cc-pVTZ. These basis sets may be augmented, but augmentation is typically needed for anions.

NWChem and DALTON2016 also need the AC-SHIFT. This is just the $\Delta$ in the equation defined above. DALTON2006 uses only the IPs and determines the shift on-the-fly. This is better but is more time-consuming. To compute the shift you need to calculate the HOMO eigenvalue //without// an asymptotic correction.

Important

For notes on using NWChem see my tutorials at:

In particular, see the following:

Of course, the one on Using Comanche only applies if you have an account there.

It may be easiest to use the Psi4 code for this calculation. Psi4 has nice tutorials on all sorts of calculations that you can find on their wedsite.

Molecular Properties

Now we need to calculate the molecular properties for the interacting molecules in their reference geometries. These properties include:

The latter is formally not in this group, but for homo-dimers is obtained as part of the localization step. For hetero-dimers you need to do another step to calculate the dispersion model.

I have already written a tutorial on Molecular Properties so use that to perform this step.

Fitting the short-range parameters

$$ V^{ab}_{\rm sr}(r_{ab}) = K \exp{[-\alpha^{ab}(r_{ab}-\rho^{ab}(\Omega^{ab}))]} $$

It is not easy to fit the sum of exponentials - in fact it is ill-conditioned. The results are typically nonsense and most non-linear fitting packages will fail. Instead of attempting (and failing) at a direct fit of these parameters, we will use the Overlap Model as an intermediate step.

First, what exactly do we mean by the short-range energy? $$ E_{\rm sr} = E^{(1)}_{\rm sr} + E^{(2-\infty)}_{\rm sr} $$

The first-order short-range energy is defined as $$ E^{(1)}_{\rm sr} = E^{(1)}_{\rm exch} + ( E^{(1)}_{\rm elst} - E_{\rm elst}({\rm DMA}) ), $$

that is, it is the sum of the first-order exchange(-repulsion) energy and the first-order penetration energy which is the difference of the first-order electrostatic energy and electrostatic energy calculated using the distributed multipoles.

We will fit the two parts of $E_{\rm sr}$ in stages.

Overlap Model: Fitting $E^{(1)}_{\rm sr}$

Details of this model is described the Atom-Atom review paper. So we will fill this up later. $$ E^{(1)}_{\rm sr} = \sum_{ab} K^{ab} \int \rho^{a}(r) \rho^{b}(r) dr $$

our job is to find the parameters $K^{ab}$.

Data files needed:

(perform the Sanity-Check below if needed)

The general procedure for fitting the overlap model parameters $K^{ab}$ is as follows:

A typical input block (excluding the molecule definitions etc) looks like:

  Begin Overlap-model
    Units kJ/mol  ! All energies in these units
    Weight Type 4 and E0 = 10
    ! Fit the first-order energies:
    Energy-Options Check-Asymp  Abs-Max-Energy = 500.0
    Energy = E1exch + E1pen  ! Definition of energy
    Emin = 0.0001
    Emax = 500.0
    Constraint = 10e-6  ! Don't change
    Condition = 10e-7   ! don't change
    Energy-file  Eint-H2O2-aTZ-2048-camcasp.dat
    Overlap-file ovr-H2O2-daTZ-tzvpp-GELSS-2048.dat
    Orient files No     ! See below
    Symmetrize          ! See below
    (Debug)
  End

The goal is to obtain a good fit by varying the parameters Emin, Emax, the type of weight (in the above example the 4 refers to the Gauss-Log weight described in the CamCASP User's Guide) and the energy E0. A good fit would be one with a weighted r.m.s. error of around 10-15 %.

If your list of dimers contains dimers with very short contacts you may face problems with the electrostatic energies calculated with the multipoles (essentially, the $1/R$ expansion diverges and causes pathalogical energies). To eliminate such terms use the Abs-max-Energy command with a suitable value for the absolute manimum energy. As a rule, choose a value not far from what you selected for Emax.

A word about the other commands:

To calculate an overlap model run CamCASP:

  $ camcasp < H2O2-ovrmodel.cks

The output is split into three sections:

It is the third section we use.

The code prints out the reference energies together with fitted energies. Plot these on a scatter plot. Once you are happy with the quality of the fit we can proceed to the next step.

Sanity-check

To make sure it all works, it is best to first fit the first-order exchange only. To do this use the file H2O2-ovrmodel-e1exch.cks. The reason for this check is that it is quite easy to make mistakes calculating the electrostatic energies with the multipole models. Also, the density-overlap model is know to work best for the exchange-repulsion only. So if this fit fails, there is probably an error in the setup.

0: Fit to exponentials

Once the ovrlap model is satisfactory we need to use the results of the model to fit each of the exponential terms in the short-range potential in turn. The way this works is as follows.

The overlap model allows us to write $$ E^{(1)}_{\rm sr} = \sum_{ab} K^{ab} \int \rho^{a}(r) \rho^{b}(r) dr = \sum_{ab} E^{(1)}_{\rm sr}(ab), $$

that is, we have effectively split the short-range energy into contributions arising from all pairs of sites (more correctly, all pairs of site-types). What we do now is to fit each of these contributions to a single exponential: $$ E^{(1)}_{\rm sr}(ab) = G \exp[-\alpha^{ab}(r_{ab} - \rho^{ab}(\Omega^{ab}))] $$

Here $G$ is an energy constant (an exponential has no units!) that sets the unit of energy. These fits are what we will do next. Before getting there, bear in mind that the overlap model is approximate!!! So we will need to correct for the errors later on.

For these fits we need to have ORIENT input files with the partitioned energies included. This can be done for us by CamCASP by repeating the overlap model calculation (the one we were happy with), but this time using Orient file YES. This tells CamCASP to write out the ORIENT input files.

Run the water example and get the following files:

  H2OA_H2OB_total.ornt  H_H.ornt  O_H.ornt  O_O.ornt

The first includes the total (first order) energy and the rest include the energies for the sites listed in the file names. We will now use the last three.

Local axis systems: We now have to decide on a set of local axes which ORIENT will use to define the axis system for each pair of sites. This is done in the file H2O.axes which contains the lines:

  Axes
    O z between H1 and H2  x from H1 to H2
    H1 z from O to H1       x from H1 to H2
    H2 z from O to H2       x from H2 to H1
  End

You should edit the ORIENT command files to include this local axis file. But a simple trick works. The default name of the local axis file is local_axes. So simply make a link as follows:

  $ ln -s H2O.axes local_axes

This will allow ORIENT to find the local axis files.

We can now run ORIENT. This is usually done like so:

  $ orient < H_H.ornt > H_H.ornt.out

but we have a script that will do this for you and extract the result in an appropriate form:

  $ RunOrient_OvrModels.bash

will search for all files of the form *.ornt and run them through ORIENT and extract the potential and places it in a file called potential.raw. Convert this potential into a form ORIENT can use using the command:

  $ raw2ORIENT.pl potential.raw >  H2O2-model0-0-1.pot

Note The potential may contain releated site-pairs: H..O is the same as O..H. You may want to edit it to keep only unique pairs. There is a way to get the CLUSTER program to do this. See the CamCASP User's Guide for details.

Relaxing the 0-fit

As mentioned above, the overlap model is approximate. This is the case for two reasons:

  1. The basis set we had used to calculate the distributed overlaps was small. This was deliberate so as to avoid linear-dependencies.
  2. The other reason is that the overlap model was meant to work for $E^{(1)}_{\rm exch}$ only. There is no reason to expect it to work for the penetration energy too.

  3. Besides, we have not even damped the electrostatic multipole model.

So we now relax the parameters we have obtained from the previous step. We do this using anchors that pin the parameter values to guesses from an earlier step.

Note This is a technique we will use quite often and probably needs to be described on a separate page!

Files needed:

Creating the anchors file

This is done using the anchors.pl script:

  $ anchors.pl 
  Usage:
  anchors.pl [-f] <potential file> [-alpha <alpha constraint>]
       [-iso <rho00 constraint>] [-aniso <anisotropy constraint>]
       [-C6 <constraint for C6> | -noC6 ]

The potential consists of the $\alpha$ parameter (this parameter is always isotropic in our potentials so there is only one such), the $\rho(\Omega)$ parameters that can be anisotropic and, optionally, a $C_6$ parameter. More on the latter later. The anchors.pl script allows us to define a set of anchor values for each catagory of parameters:

For example, we can define anchors for our potential as follows:

  $ anchors.pl -noC6 H2O2-model0-0-1.pot

This uses the default values for the anchors. The -noC6 is needed as the potential doesn't contain any $C_6$ coefficients. The output is:

  ! Anchors for H2O2-model0-0-1.pot 
  ! C_alpha  = 0.050000 
  ! C_iso    = 0.010000 
  ! C_aniso  = 0.001000 
         aHH      1.765547      0.050000 
       rHH_0      3.671346      0.010000 
       rHH_1     -0.316753      0.001000 
         aOH      1.922795      0.050000 
       rOH_0      4.966920      0.010000 
       rOH_1     -0.264044      0.001000 
       rOH_2     -0.016025      0.001000 
       rOH_4      0.032885      0.001000 
         aOO      2.154523      0.050000 
       rOO_0      5.621948      0.010000 
       rOO_1     -0.052563      0.001000 
       rOO_3     -0.019649      0.001000

We can set specific values for the various types of parameters like so:

  $ anchors.pl $ anchors.pl -noC6 -alpha 1.0 -iso 0.1 -aniso 10.0 H2O2-model0-0-1.pot
  ! Anchors for H2O2-model0-0-1.pot 
  ! C_alpha  = 1.000000 
  ! C_iso    = 0.100000 
  ! C_aniso  = 10.000000 
         aHH      1.765547      1.000000 
       rHH_0      3.671346      0.100000 
       rHH_1     -0.316753     10.000000 
         aOH      1.922795      1.000000 
       rOH_0      4.966920      0.100000 
       rOH_1     -0.264044     10.000000 
       rOH_2     -0.016025     10.000000 
       rOH_4      0.032885     10.000000 
         aOO      2.154523      1.000000 
       rOO_0      5.621948      0.100000 
       rOO_1     -0.052563     10.000000 
       rOO_3     -0.019649     10.000000

With these choices the anisotropic parameters in $\rho(\Omega)$ are effectively pinned to the values they had in the potential, i.e., they won't vary much. The $\alpha$ parameter will be allowed to vary a little, and the isotropic part of $\rho$ will have the potential to vary the most. Imagine the anchors to be the spring-constants pining the parameters to their starting values; the larger the anchor (spring constant) the less they will stray.

These anchors are needed when we have insufficient data to determine all parameters or when relaxing all parameters is not possible in a heavily ill-conditioned problem. This is the case here.

Your job is to figure out what is a good set of anchors to fit the data. This is not a well-defined problem, at least, not as yet. As a rule, use the tightest anchors possible, relaxing the anisotropy the least. Anchors should be in the range $10.0$ (for the tightest pinning) to $0.001$ for the weakest. Use the latter on the $\alpha$ and isotropc part of $\rho$.

Fitting all parameters using ORIENT

We need to modify the ORIENT command file created for us by CamCASP. The one we want contains the total energies: H2OA_H2OB_total.ornt for the water example. We are going to edit this so copy it to, say, H2OA_H2OB_total_Boltzmann.ornt. The Boltzmann refers to the Boltzmann weighting we will use.

Steps to edit this file:

  Pairs
    ! Select Tang--Toennies damping functions:
    ! Dispersion damping Factor <BETA-D>
    ! Induction  damping Factor <BETA-I>
    #include H2O2-model0-0-1.pot
  End

$$ w(e) = \exp(-e/e_0) $$

Imposing shape-function constraints

See Rory's example.

Fitting all short-range energies

At this stage we should have a reasonable set of parameters that have been obtained using $E^{(1)}_{\rm sr}$ calculated on a dense (pseudo-random) set of dimers. We now need to relax these parameters to include all short-range energies. Since calculating higher-order energies is computationally expensive, we will typically not have a very dense set of configurations for $E^{(1-\infty)}_{\rm sr}$. This makes parameter determination difficult or even impossible, which is why we perform the fit in stages:

  1. The fit to first-order energies using a dense coverage of data allowed us to fit reliable parameters, in particular, the atomic shape-functions should be reasonably well-defined.
  2. We may not have enough data at second- and higher-order to allow all these parameters to relax. So we now need to be careful and use anchors in the relaxation process. Appropriate anchor weights will need to be found by experimentation, but as a rule, the shape-functions should probably be help tightly to the values obtained from the first-order fit.

Definition of $E^{(1-\infty)}_{\rm sr}$

$$ E^{(1-\infty)}_{\rm sr} = E^{(1)}_{\rm sr} + {\rm CT}^{(2-\infty)} + \Delta_{\rm disp} $$ Here $E^{(1)}_{\rm sr}$ is the first-order short-range energy defined above, ${\rm CT}^{(2-\infty)}$ is the infinite-order charge-transfer energy defined on the Charge-Transfer and Polarization pages, and $\Delta_{\rm disp}$ is a residual error in the dispersion model that is defined as: $$ \Delta_{\rm disp} = E^{(2)}_{\rm disp} - E^{(2)}_{\rm disp}(C_{n}). $$ We expect this to be mainly a short-range residual though there isn't a complelling reason to believe that it will always be the case. Still, we wil try to absorb this typically small residual in the short-range fit.

Notice that $$ E^{(1-\infty)}_{\rm int} = E^{(1-\infty)}_{\rm sr} + E^{(1)}_{\rm elst}({\rm DMA}) + E^{(2)}_{\rm disp}(C_{n}) + {\rm Pol}^{(2-\infty)}, $$ that is, if we add the multipole-expanded electrostatic, polarization and dispersion energies to $E^{(1-\infty)}_{\rm sr}$, we get the total infinite-order interaction energy.

Data files required for fitting $E^{(1-\infty)}_{\rm sr}$

In principle, all we really need to do is define our electrostatic model (as for the first-order short-range fit), dispersion model and polarization model. As with the first-order fit, we may expect that Orient should then subtract the polarization, dispersion and electrostatic energies calculated using the multipole models and the remainder, $E^{(1-\infty)}_{\rm sr}$, would then be exactly the energy we'd want to fit to. Alas, this is not the case. For reasons I do not yet understand, while Orient will subtract off the multipole-expanded electrostatic and dispersion energies, it will not do the same for the multipole-expanded polarization energy. So we need to do this by hand and give Orient: $$ E^{(1-\infty)}_{\rm int-nopol} = E^{(1-\infty)}_{\rm int} - {\rm Pol}^{(2-\infty)} $$

Note

We could also play it safe and subtract all multipole-expanded energies from the total interaction energy. If this were done the Orient command file would not contain any multipole model. While this would make it simpler, it can also be a little cumbersome as we'd need to modify the energy file each time we changed the multipole models. If we took this approach, CamCASP will need to subtract all multipole-expanded energies from the total interaction energy. That is, in the example below we subtract not only Eind(MP), but also Eelst(MP) and Edisp(MP). And the Orient input file should not include any of the multipole models.

Of course, we do not do this by hand, but use CamCASP to calculate this for us. But to make this possible, we first need to give CamCASP the infinite-order polarization energy ${\rm Pol}^{(2-\infty)}$.

To do this, we first need to calculate ${\rm Pol}^{(2-\infty)}$ for all dimer configurations. Instructions for this can be found on the Orient: Energy Calculations and Polarization & Polarization Models pages.

Next, we insert these energies into the Energy-file. In our water example, this file contains a column headed //Eind-MP//. The infinite-order polarization energies need to go under this column. (In older versions of this file, the column may be titled //E2ind-Asymp//.)

Once the ${\rm Pol}^{(2-\infty)}$ energies are included in the energy file, we use CamCASP to create the Orient input files using the following modification of water dimer : CamCASP Overlap Model input file

Begin Overlap-model
  Units kJ/mol
  ! ==
  Only-Total-Orient No-Weights
  ! ==
  Weight None
  ! We are fitting to total energy including Delta:
  ! BUT we must exclude the induction energy as ORIENT requires it:
  Energy = E1exch + E1elst + E2exch + E2ind - E2ind(as) + E2disp + Delta
  Emin = -100.0
  Emax =  150.0
  !
  Energy-file  Eint-H2O2-aTZ-2048-camcasp.dat
  Overlap-file ovr-H2O2-daTZ-tzvpp-GELSS-2048.dat
  !
  Symmetrize
  (Debug)
End

Notice that the only multipole-expanded energies included in the energy definition are those from the induction/polarization energy. By contrast, when fitting the overlap model to first-order short-range energies we had subtracted out the multipole-expanded electrostatic energy. Both methods are permissable.

Also, we no longer need to calculate the overlap model. All we need CamCASP to do is create the correct Orient input file for us. In a way, we are using a fairly complicated code (CamCASP is over 130,000 lines long) to perform a rather simple parsing task! But that's OK as this maintains a consistency in our procedure. We also benefit from the error checking done by CamCASP.

We do not want any weights in the Orient file as we will use the Boltzmann weighting scheme.

Finally, our energy range has been expanded to include all negative energies too. Adjust it as you see fit.

Note

As a rule of thumb, if $E_0$ is the most negative interaction energy in the data set, use a range of $E_0$ to $5 |E_0|$.

Important

Errors are easily possible in this procedure. Besides the usual errors associated with the wrong axis definitions or wrong types/sites, here we have the additionaly possibility that the multipole-expanded polarization energies in the energy file may not be consistent with the polarization model defined in the Orient file! It is best to label all files and directories with the polarization model used. So when (and you will!) change the polarization model, fewer errors result.

Once CamCASP has created the Orient input file do the following (for help on creating the Orient input file see Orient: Energy Calculations):

  1. Check to see that the energies included are indeed the energies you require. Best to check a couple of cases by hand.
  2. Include the electrostatic model (the DMA moments).
  3. Do not include a polarization model. You have already subtracted out this energy.

  4. Include your dispersion model in the Pairs section. Also don't forget the dispersion damping parameters.

  5. Include the short-range potential obtained from the first-order fit. It is possible to merge the short-range and dispersion potentials into one file using the Cluster program. See the CamCASP manual for details.

  6. Check the variables in the Adjust block. These should be the ones you want to relax. Comment out variables you wish to keep fixed. You could also use very tight anchors to fix them.

  7. Include an anchors file. You can create one from the potential file using the anchors.pl script. This script has a help page.

  8. Set Boltzmann weights. This line may need to be un-commented.
  9. Something like //Weights Boltzmann $N_w \times |E_0|$ Above $E_0$// should do.

  10. Run Orient and see what you get.

  11. You will need to play with the anchor values and $N_w$ to get a decent fit.

AJMPublic/camcasp/potentials (last edited 2021-04-14 12:58:15 by apw109)