Contents

Potential Development using CamCASP

Navigation:

CamCASP

Potential Development using CamCASP

Warning

This tutorial is being updated to reflect the new methodology we have published in our recent paper in JCTC (2016) on the pyridine dimer.

Initial geometries: Geometry optimization

A few pointers:

If you wish to use optimized molecular geometries, then perform the optimization using
- PBE0. This is usually good. If your system has a small HOMO-LUMO gap then you will need to consider alternatives such as LC-PBE/LC-PBE0.
- either the cc-pVDZ or cc-pVTZ basis sets. Do not use augmentation unless you think it is essential.
If you have an experimental geometry then use it.
If you have a lot of hydrogens, consider lengthening the X-H bond lengths by 10% to take quantum effects into account. The exact elongation needs to be checked from papers by Szalewicz and Jankowski. Basically what this means is that rather than use the equilibrium geometry, $R_e$, you will be using $R_0$. Do this step only if you have understood the literature!

Ionization energy and AC-SHIFT

Eventually we will use LC-PBE/LC-PBE0 for SAPT(DFT) calculations, but till then, we will continue to use an asymptotically corrected hybrid functional like PBE0.

Note

Click here for an introduction to the asymptotic correction, but bear in mind that the Fermi-Amaldi correction that I talk about on that page is not the one we now recommend. We now think that the GRLB correction is best. More on this some other time.

For the asymptotic correction you need to know the shift needed to correct the XC potential. This shift is defined as $$ \Delta = {\rm IP} + \epsilon_{\rm HOMO}. $$

So you need two quantities to define the shift:

The vertical ionization energy, and
the energy of the HOMO eigenvalue.
NOTE: all quantities must be in atomic units!

Ionization potential

First see if there is an experimental IP available from the NIST Chemistry Webbook External Link. Convert it to atomic units.

If you need to compute the IP use the $\Delta$DFT approach: $$ {\rm IP} = E[N-1] - E[N], $$

that is, calculate the energies of the $N$ and $N-1$ electron systems and determine the IP by subtraction. Use PBE0 or LC-PBE0 to determine these energies. Basis sets could be Sadlej-pVTZ, cc-pVDZ or cc-pVTZ. These basis sets may be augmented, but augmentation is typically needed for anions.

NWChem and DALTON2016 also need the AC-SHIFT. This is just the $\Delta$ in the equation defined above. DALTON2006 uses only the IPs and determines the shift on-the-fly. This is better but is more time-consuming. To compute the shift you need to calculate the HOMO eigenvalue //without// an asymptotic correction.

Important

For notes on using NWChem see my tutorials at:

Electronic Structure Methods

In particular, see the following:

Of course, the one on Using Comanche only applies if you have an account there.

It may be easiest to use the Psi4 code for this calculation. Psi4 has nice tutorials on all sorts of calculations that you can find on their wedsite.

Molecular Properties

Now we need to calculate the molecular properties for the interacting molecules in their reference geometries. These properties include:

ISA multipole moments
ISA-Pol non-local polarizabilities
Localized ISA-Pol polarizabilities.
Dispersion model for the interacting molecules.

The latter is formally not in this group, but for homo-dimers is obtained as part of the localization step. For hetero-dimers you need to do another step to calculate the dispersion model.

I have already written a tutorial on Molecular Properties so use that to perform this step.

Fitting the short-range parameters

$$ V^{ab}_{\rm sr}(r_{ab}) = K \exp{[-\alpha^{ab}(r_{ab}-\rho^{ab}(\Omega^{ab}))]} $$

It is not easy to fit the sum of exponentials - in fact it is ill-conditioned. The results are typically nonsense and most non-linear fitting packages will fail. Instead of attempting (and failing) at a direct fit of these parameters, we will use the Overlap Model as an intermediate step.

First, what exactly do we mean by the short-range energy? $$ E_{\rm sr} = E^{(1)}_{\rm sr} + E^{(2-\infty)}_{\rm sr} $$

The first-order short-range energy is defined as $$ E^{(1)}_{\rm sr} = E^{(1)}_{\rm exch} + ( E^{(1)}_{\rm elst} - E_{\rm elst}({\rm DMA}) ), $$

that is, it is the sum of the first-order exchange(-repulsion) energy and the first-order penetration energy which is the difference of the first-order electrostatic energy and electrostatic energy calculated using the distributed multipoles.

We will fit the two parts of $E_{\rm sr}$ in stages.

Overlap Model: Fitting $E^{(1)}_{\rm sr}$

Details of this model is described the Atom-Atom review paper. So we will fill this up later. $$ E^{(1)}_{\rm sr} = \sum_{ab} K^{ab} \int \rho^{a}(r) \rho^{b}(r) dr $$

our job is to find the parameters $K^{ab}$.

Data files needed:

Distributed density overlap calculations for all dimer configurations. Water example: ovr-H2O2-daTZ-tzvpp-GELSS-2048.dat
First-order energies: All three energies needed to calculate $E^{(1)}_{\rm sr}$. Water example: Eint-H2O2-aTZ-2048-camcasp.dat
Local axis file that will be used by ORIENT. Water example: H2O.axes
CamCASP input file for the overlap model. Water example: H2O2-ovrmodel.cks

(perform the Sanity-Check below if needed)

The general procedure for fitting the overlap model parameters $K^{ab}$ is as follows:

Define the energy correctly in the CamCASP input file.
Choose the minimum and maximum energies to be included in the fit.
Choose the type of weight and the parameter $E_0$.

A typical input block (excluding the molecule definitions etc) looks like:

  Begin Overlap-model
    Units kJ/mol  ! All energies in these units
    Weight Type 4 and E0 = 10
    ! Fit the first-order energies:
    Energy-Options Check-Asymp  Abs-Max-Energy = 500.0
    Energy = E1exch + E1pen  ! Definition of energy
    Emin = 0.0001
    Emax = 500.0
    Constraint = 10e-6  ! Don't change
    Condition = 10e-7   ! don't change
    Energy-file  Eint-H2O2-aTZ-2048-camcasp.dat
    Overlap-file ovr-H2O2-daTZ-tzvpp-GELSS-2048.dat
    Orient files No     ! See below
    Symmetrize          ! See below
    (Debug)
  End

The goal is to obtain a good fit by varying the parameters Emin, Emax, the type of weight (in the above example the 4 refers to the Gauss-Log weight described in the CamCASP User's Guide) and the energy E0. A good fit would be one with a weighted r.m.s. error of around 10-15 %.

If your list of dimers contains dimers with very short contacts you may face problems with the electrostatic energies calculated with the multipoles (essentially, the $1/R$ expansion diverges and causes pathalogical energies). To eliminate such terms use the Abs-max-Energy command with a suitable value for the absolute manimum energy. As a rule, choose a value not far from what you selected for Emax.

A word about the other commands:

Orient files No: says do not create any of the files for the ORIENT code. We will set this to Yes once we are happy with the quality of the overlap model.
Symmetrize: Use this when you wish to symmetrize between atom types of the two molecules. For example, the hydrogen atoms in the two water molecules are equivalent as are the two oxygen atoms. So symmetrize between these. A reminder: TYPEs of atoms are set in the MOLECULE block in the CamCASP file.

To calculate an overlap model run CamCASP:

  $ camcasp < H2O2-ovrmodel.cks

The output is split into three sections:

First the code calculates a total overlap model with just one parameter $K$.
Then the code calculates a distributed overlap model. These parameters may not be well-defined and you may find that some of them turn out to be negative (unlikely in the water example).
Finally, the code re-calculates the distributed overlap model, this time using the value $K$ obtained in the first step as a pinning/anchor for the parameters $K^{ab}$. This makes the model well-defined in a Bayesian sense (i.e., we used a prior).

It is the third section we use.

The code prints out the reference energies together with fitted energies. Plot these on a scatter plot. Once you are happy with the quality of the fit we can proceed to the next step.

Sanity-check

To make sure it all works, it is best to first fit the first-order exchange only. To do this use the file H2O2-ovrmodel-e1exch.cks. The reason for this check is that it is quite easy to make mistakes calculating the electrostatic energies with the multipole models. Also, the density-overlap model is know to work best for the exchange-repulsion only. So if this fit fails, there is probably an error in the setup.

0: Fit to exponentials

Once the ovrlap model is satisfactory we need to use the results of the model to fit each of the exponential terms in the short-range potential in turn. The way this works is as follows.

The overlap model allows us to write $$ E^{(1)}_{\rm sr} = \sum_{ab} K^{ab} \int \rho^{a}(r) \rho^{b}(r) dr = \sum_{ab} E^{(1)}_{\rm sr}(ab), $$

that is, we have effectively split the short-range energy into contributions arising from all pairs of sites (more correctly, all pairs of site-types). What we do now is to fit each of these contributions to a single exponential: $$ E^{(1)}_{\rm sr}(ab) = G \exp[-\alpha^{ab}(r_{ab} - \rho^{ab}(\Omega^{ab}))] $$

Here $G$ is an energy constant (an exponential has no units!) that sets the unit of energy. These fits are what we will do next. Before getting there, bear in mind that the overlap model is approximate!!! So we will need to correct for the errors later on.

For these fits we need to have ORIENT input files with the partitioned energies included. This can be done for us by CamCASP by repeating the overlap model calculation (the one we were happy with), but this time using Orient file YES. This tells CamCASP to write out the ORIENT input files.

Run the water example and get the following files:

  H2OA_H2OB_total.ornt  H_H.ornt  O_H.ornt  O_O.ornt

The first includes the total (first order) energy and the rest include the energies for the sites listed in the file names. We will now use the last three.

Local axis systems: We now have to decide on a set of local axes which ORIENT will use to define the axis system for each pair of sites. This is done in the file H2O.axes which contains the lines:

  Axes
    O z between H1 and H2  x from H1 to H2
    H1 z from O to H1       x from H1 to H2
    H2 z from O to H2       x from H2 to H1
  End

You should edit the ORIENT command files to include this local axis file. But a simple trick works. The default name of the local axis file is local_axes. So simply make a link as follows:

  $ ln -s H2O.axes local_axes

This will allow ORIENT to find the local axis files.

We can now run ORIENT. This is usually done like so:

  $ orient < H_H.ornt > H_H.ornt.out

but we have a script that will do this for you and extract the result in an appropriate form:

  $ RunOrient_OvrModels.bash

will search for all files of the form *.ornt and run them through ORIENT and extract the potential and places it in a file called potential.raw. Convert this potential into a form ORIENT can use using the command:

  $ raw2ORIENT.pl potential.raw >  H2O2-model0-0-1.pot

Note The potential may contain releated site-pairs: H..O is the same as O..H. You may want to edit it to keep only unique pairs. There is a way to get the CLUSTER program to do this. See the CamCASP User's Guide for details.

Relaxing the 0-fit

As mentioned above, the overlap model is approximate. This is the case for two reasons:

The basis set we had used to calculate the distributed overlaps was small. This was deliberate so as to avoid linear-dependencies.
The other reason is that the overlap model was meant to work for $E^{(1)}_{\rm exch}$ only. There is no reason to expect it to work for the penetration energy too.
Besides, we have not even damped the electrostatic multipole model.

So we now relax the parameters we have obtained from the previous step. We do this using anchors that pin the parameter values to guesses from an earlier step.

Note This is a technique we will use quite often and probably needs to be described on a separate page!

Files needed:

ORIENT input file containing total energies (as defined in the CamCASP overlap calculation, $E^{(1)}_{\rm sr}$ in our case).
The potential from the previous step: H2O2-model0-0-1.pot
An anchors file generated from this potential (done below).

Creating the anchors file

This is done using the anchors.pl script:

  $ anchors.pl 
  Usage:
  anchors.pl [-f] <potential file> [-alpha <alpha constraint>]
       [-iso <rho00 constraint>] [-aniso <anisotropy constraint>]
       [-C6 <constraint for C6> | -noC6 ]

The potential consists of the $\alpha$ parameter (this parameter is always isotropic in our potentials so there is only one such), the $\rho(\Omega)$ parameters that can be anisotropic and, optionally, a $C_6$ parameter. More on the latter later. The anchors.pl script allows us to define a set of anchor values for each catagory of parameters:

-alpha refers to the parameter $\alpha$
-iso refers to the isotropic part of $\rho(\Omega)$
-aniso refers to the anisotropic part of $\rho(\Omega)$
-C6 or -noC6 refers to the $C_6$ parameters.

For example, we can define anchors for our potential as follows:

  $ anchors.pl -noC6 H2O2-model0-0-1.pot

This uses the default values for the anchors. The -noC6 is needed as the potential doesn't contain any $C_6$ coefficients. The output is:

  ! Anchors for H2O2-model0-0-1.pot 
  ! C_alpha  = 0.050000 
  ! C_iso    = 0.010000 
  ! C_aniso  = 0.001000 
         aHH      1.765547      0.050000 
       rHH_0      3.671346      0.010000 
       rHH_1     -0.316753      0.001000 
         aOH      1.922795      0.050000 
       rOH_0      4.966920      0.010000 
       rOH_1     -0.264044      0.001000 
       rOH_2     -0.016025      0.001000 
       rOH_4      0.032885      0.001000 
         aOO      2.154523      0.050000 
       rOO_0      5.621948      0.010000 
       rOO_1     -0.052563      0.001000 
       rOO_3     -0.019649      0.001000

We can set specific values for the various types of parameters like so:

  $ anchors.pl $ anchors.pl -noC6 -alpha 1.0 -iso 0.1 -aniso 10.0 H2O2-model0-0-1.pot
  ! Anchors for H2O2-model0-0-1.pot 
  ! C_alpha  = 1.000000 
  ! C_iso    = 0.100000 
  ! C_aniso  = 10.000000 
         aHH      1.765547      1.000000 
       rHH_0      3.671346      0.100000 
       rHH_1     -0.316753     10.000000 
         aOH      1.922795      1.000000 
       rOH_0      4.966920      0.100000 
       rOH_1     -0.264044     10.000000 
       rOH_2     -0.016025     10.000000 
       rOH_4      0.032885     10.000000 
         aOO      2.154523      1.000000 
       rOO_0      5.621948      0.100000 
       rOO_1     -0.052563     10.000000 
       rOO_3     -0.019649     10.000000

With these choices the anisotropic parameters in $\rho(\Omega)$ are effectively pinned to the values they had in the potential, i.e., they won't vary much. The $\alpha$ parameter will be allowed to vary a little, and the isotropic part of $\rho$ will have the potential to vary the most. Imagine the anchors to be the spring-constants pining the parameters to their starting values; the larger the anchor (spring constant) the less they will stray.

These anchors are needed when we have insufficient data to determine all parameters or when relaxing all parameters is not possible in a heavily ill-conditioned problem. This is the case here.

Your job is to figure out what is a good set of anchors to fit the data. This is not a well-defined problem, at least, not as yet. As a rule, use the tightest anchors possible, relaxing the anisotropy the least. Anchors should be in the range $10.0$ (for the tightest pinning) to $0.001$ for the weakest. Use the latter on the $\alpha$ and isotropc part of $\rho$.

Fitting all parameters using ORIENT

We need to modify the ORIENT command file created for us by CamCASP. The one we want contains the total energies: H2OA_H2OB_total.ornt for the water example. We are going to edit this so copy it to, say, H2OA_H2OB_total_Boltzmann.ornt. The Boltzmann refers to the Boltzmann weighting we will use.

Steps to edit this file:

Edit the local_axes file names as necessary.
The FIT block consists of geometries, the energy and weights. We will not need the weights so delete the last column and the Weight header in this block. Use vim to do this as it's quite easy to manipulate columns.
Change the Pairs block to look like this:

  Pairs
    ! Select Tang--Toennies damping functions:
    ! Dispersion damping Factor <BETA-D>
    ! Induction  damping Factor <BETA-I>
    #include H2O2-model0-0-1.pot
  End

Notice that we have used the #include command to include the potential in the ORIENT file.
Next we come to the Adjust block. This contains the parameters we want ORIENT to adjust, i.e., fit to the data.
This block contains a Variables section that should contains a list of variables. This list should be like the potential file H2O2-model0-0-1.pot. Compare these to make sure the list has no extra parameters. There can be fewer variables than in the potential file (you may want to fit some parameters only), but there can't be more!!!
Also in this block are the Anchors. We have saved the anchors to a file H2O2-model0-0-1-a.anchors (named using the potential name and a modifier -a) and included it into the ORIENT file using the #include command.
Finally we come to the weights: Weights Boltzmann 20.0 Above 0.0. This tells ORIENT to use Boltzmann weights. These are of the form:

$$ w(e) = \exp(-e/e_0) $$

Here $e$ is the energy and $e_0$ is a reference energy which you will need to vary. Values of $e_0$ will typically be related to the well-depth of the potential. Since the water dimer is bound by $20$ kJ/mol (approximately), a sensible value for $e_0$ could be a few times this, say from $40$ to $100$ kJ/mol. You will need to vary $e_0$ till you get a sensible fit.
As a rule of thumb: get a reasonably good set of anchors (for a fixed $e_0$) and then vary $e_0$. Iterate a few times till the fit looks good enough.
Finally, the Boltzmann line contains Above 0.0. This is used when we have negative energies. See the ORIENT manual for details.

Imposing shape-function constraints

See Rory's example.

Fitting all short-range energies

At this stage we should have a reasonable set of parameters that have been obtained using $E^{(1)}_{\rm sr}$ calculated on a dense (pseudo-random) set of dimers. We now need to relax these parameters to include all short-range energies. Since calculating higher-order energies is computationally expensive, we will typically not have a very dense set of configurations for $E^{(1-\infty)}_{\rm sr}$. This makes parameter determination difficult or even impossible, which is why we perform the fit in stages:

The fit to first-order energies using a dense coverage of data allowed us to fit reliable parameters, in particular, the atomic shape-functions should be reasonably well-defined.
We may not have enough data at second- and higher-order to allow all these parameters to relax. So we now need to be careful and use anchors in the relaxation process. Appropriate anchor weights will need to be found by experimentation, but as a rule, the shape-functions should probably be help tightly to the values obtained from the first-order fit.

Definition of $E^{(1-\infty)}_{\rm sr}$

$$ E^{(1-\infty)}_{\rm sr} = E^{(1)}_{\rm sr} + {\rm CT}^{(2-\infty)} + \Delta_{\rm disp} $$ Here $E^{(1)}_{\rm sr}$ is the first-order short-range energy defined above, ${\rm CT}^{(2-\infty)}$ is the infinite-order charge-transfer energy defined on the Charge-Transfer and Polarization pages, and $\Delta_{\rm disp}$ is a residual error in the dispersion model that is defined as: $$ \Delta_{\rm disp} = E^{(2)}_{\rm disp} - E^{(2)}_{\rm disp}(C_{n}). $$ We expect this to be mainly a short-range residual though there isn't a complelling reason to believe that it will always be the case. Still, we wil try to absorb this typically small residual in the short-range fit.

Notice that $$ E^{(1-\infty)}_{\rm int} = E^{(1-\infty)}_{\rm sr} + E^{(1)}_{\rm elst}({\rm DMA}) + E^{(2)}_{\rm disp}(C_{n}) + {\rm Pol}^{(2-\infty)}, $$ that is, if we add the multipole-expanded electrostatic, polarization and dispersion energies to $E^{(1-\infty)}_{\rm sr}$, we get the total infinite-order interaction energy.

Data files required for fitting $E^{(1-\infty)}_{\rm sr}$

In principle, all we really need to do is define our electrostatic model (as for the first-order short-range fit), dispersion model and polarization model. As with the first-order fit, we may expect that Orient should then subtract the polarization, dispersion and electrostatic energies calculated using the multipole models and the remainder, $E^{(1-\infty)}_{\rm sr}$, would then be exactly the energy we'd want to fit to. Alas, this is not the case. For reasons I do not yet understand, while Orient will subtract off the multipole-expanded electrostatic and dispersion energies, it will not do the same for the multipole-expanded polarization energy. So we need to do this by hand and give Orient: $$ E^{(1-\infty)}_{\rm int-nopol} = E^{(1-\infty)}_{\rm int} - {\rm Pol}^{(2-\infty)} $$

Note

We could also play it safe and subtract all multipole-expanded energies from the total interaction energy. If this were done the Orient command file would not contain any multipole model. While this would make it simpler, it can also be a little cumbersome as we'd need to modify the energy file each time we changed the multipole models. If we took this approach, CamCASP will need to subtract all multipole-expanded energies from the total interaction energy. That is, in the example below we subtract not only Eind(MP), but also Eelst(MP) and Edisp(MP). And the Orient input file should not include any of the multipole models.

Of course, we do not do this by hand, but use CamCASP to calculate this for us. But to make this possible, we first need to give CamCASP the infinite-order polarization energy ${\rm Pol}^{(2-\infty)}$.

To do this, we first need to calculate ${\rm Pol}^{(2-\infty)}$ for all dimer configurations. Instructions for this can be found on the Orient: Energy Calculations and Polarization & Polarization Models pages.

Next, we insert these energies into the Energy-file. In our water example, this file contains a column headed //Eind-MP//. The infinite-order polarization energies need to go under this column. (In older versions of this file, the column may be titled //E2ind-Asymp//.)

Once the ${\rm Pol}^{(2-\infty)}$ energies are included in the energy file, we use CamCASP to create the Orient input files using the following modification of water dimer : CamCASP Overlap Model input file

Begin Overlap-model
  Units kJ/mol
  ! ==
  Only-Total-Orient No-Weights
  ! ==
  Weight None
  ! We are fitting to total energy including Delta:
  ! BUT we must exclude the induction energy as ORIENT requires it:
  Energy = E1exch + E1elst + E2exch + E2ind - E2ind(as) + E2disp + Delta
  Emin = -100.0
  Emax =  150.0
  !
  Energy-file  Eint-H2O2-aTZ-2048-camcasp.dat
  Overlap-file ovr-H2O2-daTZ-tzvpp-GELSS-2048.dat
  !
  Symmetrize
  (Debug)
End

Notice that the only multipole-expanded energies included in the energy definition are those from the induction/polarization energy. By contrast, when fitting the overlap model to first-order short-range energies we had subtracted out the multipole-expanded electrostatic energy. Both methods are permissable.

Also, we no longer need to calculate the overlap model. All we need CamCASP to do is create the correct Orient input file for us. In a way, we are using a fairly complicated code (CamCASP is over 130,000 lines long) to perform a rather simple parsing task! But that's OK as this maintains a consistency in our procedure. We also benefit from the error checking done by CamCASP.

We do not want any weights in the Orient file as we will use the Boltzmann weighting scheme.

Finally, our energy range has been expanded to include all negative energies too. Adjust it as you see fit.

Note

As a rule of thumb, if $E_0$ is the most negative interaction energy in the data set, use a range of $E_0$ to $5 |E_0|$.

Important

Errors are easily possible in this procedure. Besides the usual errors associated with the wrong axis definitions or wrong types/sites, here we have the additionaly possibility that the multipole-expanded polarization energies in the energy file may not be consistent with the polarization model defined in the Orient file! It is best to label all files and directories with the polarization model used. So when (and you will!) change the polarization model, fewer errors result.

Once CamCASP has created the Orient input file do the following (for help on creating the Orient input file see Orient: Energy Calculations):

Check to see that the energies included are indeed the energies you require. Best to check a couple of cases by hand.
Include the electrostatic model (the DMA moments).
Do not include a polarization model. You have already subtracted out this energy.
Include your dispersion model in the Pairs section. Also don't forget the dispersion damping parameters.
Include the short-range potential obtained from the first-order fit. It is possible to merge the short-range and dispersion potentials into one file using the Cluster program. See the CamCASP manual for details.
Check the variables in the Adjust block. These should be the ones you want to relax. Comment out variables you wish to keep fixed. You could also use very tight anchors to fix them.
Include an anchors file. You can create one from the potential file using the anchors.pl script. This script has a help page.
Set Boltzmann weights. This line may need to be un-commented.
Something like //Weights Boltzmann $N_w \times |E_0|$ Above $E_0$// should do.
Run Orient and see what you get.
You will need to play with the anchor values and $N_w$ to get a decent fit.

AJMPublic/camcasp/potentials

Potential Development using CamCASP

Initial geometries: Geometry optimization

Ionization energy and AC-SHIFT

Ionization potential

Molecular Properties

Fitting the short-range parameters

Overlap Model: Fitting $E^{(1)}_{\rm sr}$

Sanity-check

0: Fit to exponentials

Relaxing the 0-fit

Creating the anchors file

Fitting all parameters using ORIENT

Imposing shape-function constraints

Fitting all short-range energies

Definition of $E^{(1-\infty)}_{\rm sr}$

Data files required for fitting $E^{(1-\infty)}_{\rm sr}$