Processing for beginners

Introduction

Either through luck, brilliance, or blood, sweat and tears, you have managed to grow crystals of your protein and, even better, found that they diffract AND that there is a homologous structure in the PDB. This document is a guide to what to do next. It does not discuss the theory of crystallography (see references) and it is by no means complete. If you think it should include more material, or it fails at some point, please let me know and I will try to fix it.

Russell Thom

Index of what is on this site

Other useful crystallography sites

More information on the programs can be obtained by typing cdoc and then more program_name.doc, or from http://www.dl.ac.uk/CCP/CCP4/dist/html/INDEX.html

Processing

XDS

DENZO

In processing you want to:

Index the image correctly
Assign profiles

To do this we use one of a number of programs:

XDS if you use the multiwire Siemens detector

DENZO

MOSFLM

XDS

XDS.DATA

This is a program to process data from the Siemens multiwire detector. For information about this program look up xds.doc by typing cdoc and then more xds.doc

XDS works via a number of steps:

XYCORR-calculates spatial corrections.

INIT-determines initial background, anode wire modulation correction factors, and the trusted region on the detector surface.

COLSPOT-collects strong diffraction spots occurring on the first nFRAME data frames.

IDXREF-finds crystal orientation from spot locations and refines all parameters.

COLPROF-collects 3-dimensional reflection profiles.

PROFIT-estimates reflection intensities from profiles.

CORRECT-corrects intensities for decay, absorption and detector surface sensitivity.

GLOREF-refines all diffraction parameters.

For XDS to work a command file must be in place. This is known as XDS.DATA and looks something like this:

XDS.DATA

XDS
./images/brs12cm1.001 DIRECT HARVARD
./images/adrians1. DIRECT HARVARD
NONE
1.5418 -1.0 0.0 0.0 0.44443 0.0 -1.0 0.0
2 200 0.0 0.2 0.0 0.0 -1.0
120.25 258.16 242.29 0.0 -1.0 0.0 0.0 0.0 1.0 0.001
4 54.9 77.5 62.9 90.0 105.5 90.0
1 1 0.5 0.6 10
-1 0 0 0 0 0 -1 0 0 -1 0 0

What it means

Line no.

1. Program we want to run, type in XDS (all the programs) or part i.e. CORRECT.
2. File name of data with calibration brass plate data.
3. Where the data frames are, this always ends with a full stop.
4. File name of reference data used for scaling.

5. (a) x-ray wavelength (Angstrom).
(b) x, y, z components of the direct beam wavevector.
(c) Fraction of polarization of direct beam in a plane specified by its normal.
(d) x, y, z components of polarization plane normal.

6. (a) The number of the first data frame collected.
(b) The number of the last data frame collected.
(c) Spindle position (degrees) at beginning of first data frame
(d) Oscillation range (degrees) covered by each data frame.
(e) x, y, z components of the rotation axis with respect to the laboratory system.

7. (a) Distance between crystal and detector (mm).
(b) Detector X-coordinate (pixels) of origin.
(c) Detector Y-coordinate (pixels) of origin.
(d) Orientation matrix of the detector with respect to the laboratory system.
(e) Fraction of intensity loss per mm due to air absorption.

8. (a) Space-group number of crystal.
(b) Unit cell parameters a, b, c (Angstrom) and alpha, beta, gamma (degrees).

9. (a) Number of minutes the program should wait before next expected data frame-file will be available.
(b) Refinement control variable - can be 0 or 1 see xds.doc.
(c) Half-angular size (degrees) of x-ray source as seen from origin of coordinate system.
(d) Half-angular size (degrees) of mosaic spread of crystal.
(e) Maximum number of frames between Friedel-pairs.

10. Integer array of 12 numbers providing a possibility of re-indexing the reflections in the final CORRECT step.
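
As a rough illustration of the line-by-line layout described above, the sketch below splits an XDS.DATA file into named fields. It is only a reading aid: the dictionary keys are invented labels, the division of the sample into ten lines is inferred from the descriptions above, and the meaning of the extra words on lines 2 and 3 (DIRECT HARVARD) is not covered here, so they are simply ignored.

```python
# Sketch: split an XDS.DATA file into the ten lines described above.
# The dictionary keys are invented labels, not names used by XDS itself.
SAMPLE = """\
XDS
./images/brs12cm1.001 DIRECT HARVARD
./images/adrians1. DIRECT HARVARD
NONE
1.5418 -1.0 0.0 0.0 0.44443 0.0 -1.0 0.0
2 200 0.0 0.2 0.0 0.0 -1.0
120.25 258.16 242.29 0.0 -1.0 0.0 0.0 0.0 1.0 0.001
4 54.9 77.5 62.9 90.0 105.5 90.0
1 1 0.5 0.6 10
-1 0 0 0 0 0 -1 0 0 -1 0 0
"""


def parse_xds_data(text):
    """Return the XDS.DATA entries as a dictionary (assumed 10-line layout)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != 10:
        raise ValueError("expected 10 non-blank lines, got %d" % len(rows))
    beam = [float(v) for v in rows[4]]
    scan = rows[5]
    detector = [float(v) for v in rows[6]]
    crystal = rows[7]
    return {
        "job": rows[0][0],
        "brass_plate_file": rows[1][0],
        "frame_prefix": rows[2][0],
        "reference_file": rows[3][0],
        "wavelength": beam[0],
        "beam_direction": beam[1:4],
        "polarization_fraction": beam[4],
        "polarization_normal": beam[5:8],
        "first_frame": int(scan[0]),
        "last_frame": int(scan[1]),
        "spindle_start": float(scan[2]),
        "oscillation_range": float(scan[3]),
        "rotation_axis": [float(v) for v in scan[4:7]],
        "distance_mm": detector[0],
        "origin_pixels": detector[1:3],
        "detector_orientation": detector[3:9],
        "air_absorption": detector[9],
        "space_group": int(crystal[0]),
        "unit_cell": [float(v) for v in crystal[1:7]],
        "control": rows[8],
        "reindex": [int(v) for v in rows[9]],
    }


params = parse_xds_data(SAMPLE)
frames = params["last_frame"] - params["first_frame"] + 1
total_rotation = frames * params["oscillation_range"]
print(frames, "frames covering", round(total_rotation, 2), "degrees")
```

A quick check like this (number of frames times oscillation range) is a handy way to confirm that lines 6 and 8 hold what you think they hold before starting a long run.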

Results of XDS can be seen by typing
more ----.LP (where ---- can be GLOREF, CORRECT, etc.; type ls after running the program to see what is available).

DENZO

Homepage of HKL suite

Denzo has a nice printed manual, the HKL manual, which describes a number of the programs used in processing; it is well worth a read! There are also a few tutorials on the web. One of the best can be found at the Department of Crystal and Structural Chemistry of the Bijvoet Center, Utrecht.

A brief introduction on how to get started is given below

Open up two windows:

window one- image window
window two- command window

To begin processing using DENZO you need a param.den file (this is an amalgamation of the site.dat and myexperiment.dat files in the manual).

Param.den

[For the BigDIP:]
[===============]

format dip 2030b
interactive
monochromator graphite
wavelength 1.54178
error density 1.000 positional 0.045
film rotation 180.0
oscillation range 2.00
distance 230.00
radial offset -0.813 angular offset 0.055
crossfire x 0.106 y 0.110 xy -0.041
X beam 149.801 Y beam 150.109
cassette rotx -0.020 roty 0.040 rotz 0.0
space group P21
unit cell 54.32 76.77 61.63 90.00 105.04 90.00
[unit cell 54.28 76.66 61.67 90.00 105.53 90.00]
crystal rotx 120.854 roty 91.008 rotz -48.684
resolution limits 80.0 3.4
mosaicity 0.50
[ignore circle 150.0 150.0 7.0]
ignore quadrilateral 145.2 143.2 299.6 151.2 299.2 164.1 149.2 160.4
spot elliptical 0.6 0.6
background elliptical 0.7 0.7
box 2.2 2.2
reject slope 150
[reject slope 50 cutoff 3.0 fraction 0.75]
weak level 5.0
profile fitting radius 50.0
title 'GST 2.0 deg osc, 24/2/98'
raw data file 'images/gst1_2_###.ipf'
film output file './gst2_###.x'
write predictions
sector 225

The idea in processing is to index your reflections and determine a unit cell for your data. Using the param.den file we begin by specifying the information we know best (wavelength, crystal to detector distance, detector type) and from this try to calculate the information we do not know: the unit cell. The other parts of the command file are variables in the experiment due to equipment and errors, and during processing we are trying to determine these accurately.
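
The raw data file and film output file entries in param.den use ### as a placeholder for the frame number. The sketch below shows one way such a template can be expanded into real file names. It assumes the number is zero-padded to the width of the run of # characters; the function name expand_template is invented for this example and is not part of denzo.

```python
import re


def expand_template(template, frame):
    """Replace the first run of '#' with the zero-padded frame number.

    Assumption: padding width equals the number of '#' characters.
    """
    match = re.search(r"#+", template)
    if match is None:
        raise ValueError("template has no '#' placeholder")
    width = len(match.group(0))
    number = str(frame).zfill(width)
    return template[:match.start()] + number + template[match.end():]


# List the image and output names for the first three frames.
for frame in (1, 2, 3):
    print(expand_template("images/gst1_2_###.ipf", frame),
          "->", expand_template("./gst2_###.x", frame))
```

Listing the expected names this way and comparing them with ls is a cheap check that the templates match the files on disk.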

To run param.den

setup denzo

denzo

(once you have set up denzo, don't keep doing it, as each setup lengthens your path; type echo $PATH to see this)

when the program appears to have stopped type

@param.den

@auto.den

auto.den
use partials position [default]

peak search file 'peaks.file'
write predictions

fit cell crystal rotx roty rotz x beam y beam
print zones

go

The program will give you X and Y chi squared values followed by a list of errors. Using the keywords to alter the parameters, you want to get your chi squared values close to one for your first image.

Corrections may have to be made to the box sizes in which the peaks are looked for; this is done by altering the spot, background and box lines in the parameter file. By looking at the image window you can see whether the box is covering the spot correctly (example shown in Figure 1).

Figure 1

If some of the spots are not covered by the predictions, try altering your mosaicity. Having autoindexed the data for your first image, put the information you have gained (refined experimental variables, unit cell, etc.) back into param.den. Using this new param.den and the file refine.den, all the images can be autoindexed by submitting process.com.

refine.den

START REFINEMENT
fit y beam x beam
go go go go go
fit angular offset radial offset
go go go go go
fit crystal rotx roty rotz
refine partiality
go go go go go
fit crossfire y x xy
go go go go go
fit cassette roty rotx
go go go go go
fit cell
go go go go
calculate go
end of pack

The number of "go" statements tells denzo how many cycles of refinement to run with the parameters fitted so far.

Now process.com is used:

process.com

setup denzo

denzo << EOF
@param.den

@refine.den
EOF

Check that the images have been processed correctly by examining the .log file and the .x files. The HKL manual gives good explanations of what the different columns etc. mean in the .log and .x files.

A little program devised by Paul

denzolog

setup paule denzolog process.log

This will allow you to print nice graphs about various statistics of the data

Scalepack

Scalepack.com

Scalepack is used when merging data together to ensure all the intensities are scaled correctly in regard to one another given slight changes in crystals or the way the data was collected.

Like denzo it is part of the HKL suite and the documentation is quite good in the printed manual.

To run scalepack

scalepack.com

setup denzo
scalepack << eof-scale
space group P21
unit cell 54.28 76.66 61.67 90.00 105.53 90.00
number of iterations 20
estimated error
0.025 0.030 0.030 0.033 0.040 0.040 0.045 0.060 0.070 0.080
0.100 0.120 0.125 0.130 0.150 0.180 0.200 0.220 0.250 0.280
resolution 100 2.8
number of zones 20
error scale factor 1.3
reference batch 1
add partials 1 to 27 32 to 36 43 to 62
postrefine 10
fit crystal a* 1 to 36
fit crystal b* 1 to 36
fit crystal c* 1 to 36
fit crystal beta* 1 to 36
fit film rotx 1 to 36
fit film roty 1 to 36
fit film mosxx 1 to 36
rejection probability 1.e-4
write rejection file
output file gst2t.sca
[@reject]
format denzo_ip
sector 1 to 27
FILE 1 '../process_adrian/gst1_###.x'
format denzo_ip
add 30
sector 28 to 33
FILE 2 '../denzo/gst2_###.x'

[xds stuff]
hkl matrix -1 0 0 0 0 -1 0 -1 0
format xds profit.hkl
add 40
sector width 2.0
frame width 0.2
profiles summed
FILE 3 '../PROFIT.HKL'
eof-scale

Information such as the space group and the unit cell size is gained from the previous processing packages such as xds or denzo.

rotx, roty and mosxx are fitted to film and not to crystal unless the crystal is cryocooled.

Again, to check whether scalepack has worked, type more scalepack.log. The important bits are:

- Mosaicity, which should not vary too much.
- Linear R factor, right at the very end of the document, which should increase exponentially with the resolution of the data.
- Chi squared, which should be close to one. If it is under 1 then we are overestimating our errors and the error scale factor should be altered; if it is over 1 then we have underestimated our errors.
- Error scale factor, which should be 1 if the errors have been estimated correctly.

Remember to put the refined cell parameters from postrefinement with scalepack into scalepack.com before the final run. This improves the merging R factors a lot.
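
To make the link between chi squared and the error scale factor concrete, here is a small sketch that computes a linear merging R factor and a chi squared from repeated measurements of the same reflections. It is a simplification and not how scalepack itself works: it uses an unweighted mean, no scale factors, and one global error scale; the function name and data layout are invented for the example.

```python
def merge_statistics(observations, error_scale=1.0):
    """Return (linear R factor, chi squared) for repeated measurements.

    observations maps an hkl tuple to a list of (intensity, sigma) pairs.
    Reflections measured only once carry no information and are skipped.
    """
    r_num = r_den = chi_sum = 0.0
    degrees_of_freedom = 0
    for measurements in observations.values():
        n = len(measurements)
        if n < 2:
            continue
        mean = sum(i for i, _ in measurements) / n
        for intensity, sigma in measurements:
            r_num += abs(intensity - mean)
            r_den += intensity
            chi_sum += ((intensity - mean) / (error_scale * sigma)) ** 2
        degrees_of_freedom += n - 1
    return r_num / r_den, chi_sum / degrees_of_freedom


data = {
    (1, 0, 0): [(100.0, 5.0), (110.0, 5.0)],
    (0, 1, 0): [(50.0, 5.0), (40.0, 5.0)],
}
r_linear, chi2 = merge_statistics(data)
print("R = %.3f  chi^2 = %.2f" % (r_linear, chi2))

# Chi squared above 1 means the sigmas are too small; scaling them up
# (a larger error scale factor) brings chi squared back towards 1.
_, chi2_rescaled = merge_statistics(data, error_scale=2 ** 0.5)
print("chi^2 after rescaling = %.2f" % chi2_rescaled)
```

In this toy data set the spread between repeats is larger than the quoted sigmas suggest, so chi squared comes out above 1 and multiplying the sigmas by a larger error scale factor pulls it back down.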

Scalepack2mtz

Scalepack2mtz.com

Once scaling of the data has been completed the formatted reflection file has to be converted to mtz format for the following programs to accept the data.

To do this a program called F2MTZ is used. Information on this program can be obtained from

http://www.dl.ac.uk/CCP/CCP4/dist/html/f2mtz.html

To run this program a file called scalepack2mtz.com was formed

scalepack2mtz.com

scalepack2mtz \
hklin gst2t.sca \
hklout GST1_2xtals.mtz \
<< eof
SYMM 4
END
eof

After conversion to mtz format (check the .log file to make sure all has gone according to plan) the data is ready to be entered into the next program, truncate.

TRUNCATE

Truncate.com

The program reads a file of averaged intensities from SCALEPACK2MTZ and writes a file containing mean amplitudes and the original intensities. It allows us to analyse the data to see if it shows sensible crystallographic behaviour. If anomalous data is present then F(+), F(-), with the anomalous difference, plus I(+) and I(-) are also written out. The amplitudes are put on an approximate absolute scale using the scale factor taken from a Wilson plot

For information on this program look at

http://www.dl.ac.uk/CCP/CCP4/dist/html/truncate.html

The command file used for Truncate looks like this

truncate.com

truncate \
HKLIN GST1_2xtals.mtz \
HKLOUT truncated.mtz \
<< eof
TITLE GST 2crystals 26/2/98 xds data and denzo
LABOUT F=F SIGF=SIGF
NRESIDUE 428
TRUNCATE no
eof

In this case the TRUNCATE no in the second last line means the program takes the square root of the intensities, setting any negative ones to zero.
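
The behaviour just described can be written out in a few lines. This is only a sketch of the conversion named in the sentence above (square root, negatives set to zero); the real program also handles sigmas, scaling and statistics, none of which is modelled here.

```python
import math


def intensities_to_amplitudes(intensities):
    """Square-root each intensity; negative values become zero."""
    return [math.sqrt(i) if i > 0.0 else 0.0 for i in intensities]


measured = [100.0, -3.2, 0.0, 2.25]
print(intensities_to_amplitudes(measured))
```

Weak reflections measured as slightly negative therefore all end up with an amplitude of exactly zero, which is worth remembering when looking at the statistics for the weakest data.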

MTZ2VARIOUS

Mtz2various.com

This produces a reflection file written to HKLOUT in a suitable form for other systems such as MULTAN, SHELX, TNT, X-PLOR or any other Fortran-formatted ASCII format, from an MTZ file on HKLIN.

For information on this program look at

http://www.dl.ac.uk/CCP/CCP4/dist/html/mtz2various.html

command file used in this case is

mtz2various.com

mtz2various \
HKLOUT gst1.hkl \
hklin truncated.mtz \
<< EOF
LABIN FP=F SIGFP=SIGF
OUTPUT SHELX
RESOLUTION 100 3.0
END

EOF

This puts the data in a form AMoRe can read, i.e. ASCII.

MTZDUMP

This dumps part or all of any mtz file as plain text. It gives header information about the range of observations etc. It can also convert the mtz file into ASCII format, which can be read by AMoRe.

Type mtzdump hklin filename.mtz then type go

AMoRe

Lsqkab.com

Pdbset.com-H atoms

Jorge Navaza's state-of-the-art molecular replacement package. In molecular replacement (MR) you are trying to solve the structure of a molecule using a molecule that is homologous in structure. In MR a preliminary model of the crystal structure is obtained by first rotating and then translating the model molecule in the crystal lattice. Once this has been accomplished we can calculate phases from the model and combine them with the observed structure factor amplitudes. The structure factors thus obtained, and the corresponding electron density map, contain a strong bias toward the starting model, but are usually sufficiently close to allow refinement.

AMoRe includes routines to run a complete molecular replacement. It reformats the data from the new crystal form, generates structure factors from the model, calculates the rotation function (tests agreement between the Patterson functions from the data and from the model at various orientations), and the translation function (correlation between observed intensities and the cross vectors between the symmetry related molecules of the model as it is moved about the unit cell). The program then applies rigid body refinement to the solutions.

For information on this program look at

http://www.dl.ac.uk/CCP/CCP4/dist/html/amore.html

and also

more aide-memoire

To set this up in the directory you wish it to be type

setup jorge

SETA
in d directory

create xyz.d file
create hkl.d file
create data.d file
in i directory
create dato.i3

To run

csh ./e/job

sub amore

Go for coffee as this takes a while!!!

Remember to look in the /o directory for the output; the .log file is for the experts, i.e. Jorge.

sort.s - describes the resolution ranges collected and the reflections used in AMoRe
mr2ic.s - gives the translation and rotation matrices used on the model input coordinates
tabl1.s - describes minimal box, minimal sphere, center of mass, rotation inertia, moments, resolution limit for interpolation, dimension of array
or1.s - gives the rotation function for our protein against the model. We want high correlation coefficients (third column in from the right-hand side of the table) and a low R factor (second column in from the right-hand side)
ot1.s - as above, but for the translation function
of2.s - statistics after rigid body refinement

Tips for Molecular replacement

Use the best start model available (obvious eh!!)

Remove any atoms in the model that are known to differ from the target structure, i.e. if side-chains differ from those in the known sequence then clip them to ALA. Differences (insertions) in the loop regions can be removed, but be careful: including too many wrong atoms increases the overall noise, while removing too many correct atoms reduces the signal you are looking for. Structure is normally conserved to a greater extent than sequence, and two pieces of sequence may have different residues but show a common fold.

Low resolution data contains information mostly about crystal packing and the arrangement of solvent and so should be omitted from the search; thus don't use data at lower resolution than 10Å (d-spacings greater than 10Å). High resolution data will differ greatly between the homologous model and the target protein, as this data describes the precise conformation of residues. Thus we want to use a resolution range of 10-4Å.
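
To see which reflections fall inside such a range, the d-spacing of each reflection can be calculated from the unit cell. The sketch below uses the standard monoclinic formula (unique axis b) with the P21 cell quoted in the scalepack example earlier; the function names and the default limits of 10 and 4 Å are choices made for this illustration.

```python
import math


def d_spacing_monoclinic(h, k, l, a, b, c, beta_deg):
    """d-spacing in Angstrom for a monoclinic cell with unique axis b.

    Not defined for h = k = l = 0 (division by zero).
    """
    beta = math.radians(beta_deg)
    sin2 = math.sin(beta) ** 2
    inv_d2 = (h * h / (a * a)
              + k * k * sin2 / (b * b)
              + l * l / (c * c)
              - 2.0 * h * l * math.cos(beta) / (a * c)) / sin2
    return 1.0 / math.sqrt(inv_d2)


def in_search_range(hkl, cell, low=10.0, high=4.0):
    """True if the reflection lies between the low and high resolution limits."""
    d = d_spacing_monoclinic(*hkl, *cell)
    return high <= d <= low


CELL = (54.28, 76.66, 61.67, 105.53)  # a, b, c, beta from the scalepack example
for hkl in [(0, 1, 0), (0, 10, 0), (0, 20, 0), (5, 3, -4)]:
    d = d_spacing_monoclinic(*hkl, *CELL)
    print(hkl, "d = %.2f A" % d, "use" if in_search_range(hkl, CELL) else "skip")
```

In practice the molecular replacement program applies the resolution cut itself; the point of the sketch is only to show what the limits mean in terms of individual reflections.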

If AMoRe is able to find a molecular replacement solution for your data, take the rotation and translation matrices from the mr2ic.s file in /o and put them in lsqkab.com.

This reads in the model pdb and applies a rotation and translation and then outputs a new pdb file.

Lsqkab.com

#!/bin/csh -f
#

lsqkab \
WORKCD d/xyz1.d \
LSQOP testn1a.pdb \
DELTAS lsq_cat_residues.deltas \
<< 'END-lsqkab'
title
ROTAT MATRIX 0.48986 -0.21954 -0.84371 -0.20192 0.91289 -0.35478 0.84810 0.34415 0.40286
TRAN 49.56 -41.85 -54.08
output XYZ
fit WATOM 1 TO 10000

end

'END-lsqkab'
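
What the command file above does to each atom can be sketched as x' = R x + t. The example below applies the rotation matrix and translation from the listing to a point and checks that distances are preserved, which is a quick way to spot a mistyped matrix. Reading the nine numbers row by row is an assumption made for this sketch; check the program documentation for the actual convention.

```python
import math

# Values copied from the ROTAT MATRIX and TRAN lines above.
# Assumption: the nine matrix numbers are read row by row.
ROTATION = [
    [0.48986, -0.21954, -0.84371],
    [-0.20192, 0.91289, -0.35478],
    [0.84810, 0.34415, 0.40286],
]
TRANSLATION = [49.56, -41.85, -54.08]


def transform(point, rotation=ROTATION, translation=TRANSLATION):
    """Return rotation * point + translation for one (x, y, z) point."""
    return [
        sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
        for row in range(3)
    ]


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


atom_1 = [1.0, 2.0, 3.0]
atom_2 = [4.0, -5.0, 6.0]
print("before:", round(distance(atom_1, atom_2), 3))
print("after: ", round(distance(transform(atom_1), transform(atom_2)), 3))
```

If the two distances differ by more than rounding error, the matrix is not a pure rotation and has probably been copied incorrectly.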

If the pdb that your model is from contains hydrogens, these can be removed by running the program below.

Pdbset.com

#
pdbset xyzin solution.pdb xyzout solutiona.pdb << eof-1
cell 52.942 75.460 106.978 90.00 100.04 90.00
spacegroup P21
select occupancy 0.1
exclude hydrogens
END
eof-1

Now you have a pdb of your target protein, check the packing of your target protein in SETOR to make sure you have no overlaps of protein molecules in the unit cell. If you see overlaps this is not a good thing, but check that they are not due to loops which will be deleted in your target protein.

Calculating a map

Pdb to map

After this we can run sfall.com, which calculates structure factors (Fcalc) and phases (Acalc) from our model. Our model isn't very good at this stage, so we use SigmaA to calculate a weight (figure of merit, FOM) for each reflection. To try and improve the phases, density modification is performed using the program DM.

Dm.com

Dmaverage.com

Density Modification

Solvent flattening can be carried out because of a few truths found for all crystal structures. Solvent fills the gaps formed by imperfect packing of the molecules. This solvent is usually disordered from cell to cell, and so in the diffraction structure the solvent electron density can be treated as flat (constant).

Also, as proteins have fairly similar proportions of atomic types throughout the cell (C, N, O), and these obey the constraints of atomic spacing, we can use this predictable atomic makeup to predict density values in the protein region (histogram matching). This works even better if non-crystallographic symmetry exists in the asymmetric unit, as averaging can be performed.
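
The idea behind solvent flattening can be shown on a toy one-dimensional "map". The sketch below marks the lowest-density fraction of grid points as solvent and replaces them with their mean value. This is a deliberate simplification: real programs define the solvent mask from smoothed local density rather than raw values, and the function name is invented for this example.

```python
def flatten_solvent(density, solvent_fraction):
    """Set the lowest-density fraction of grid points to their mean value."""
    n_solvent = int(round(solvent_fraction * len(density)))
    if n_solvent == 0:
        return list(density)
    # Indices of grid points sorted from lowest to highest density.
    order = sorted(range(len(density)), key=lambda i: density[i])
    solvent = set(order[:n_solvent])
    level = sum(density[i] for i in solvent) / n_solvent
    return [level if i in solvent else value for i, value in enumerate(density)]


noisy_map = [0.1, -0.2, 0.0, 2.0, 3.0, 2.5, 0.3, -0.1, 0.2, 2.2]
flat_map = flatten_solvent(noisy_map, solvent_fraction=0.6)
print(flat_map)
```

The protein points are left alone while the noise in the solvent region is wiped out; that change in the map is what feeds back into improved phases.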

dm.com

Humble apologies, but I have not included this yet. dmaverage is available if this is any help to you.

fft.com

fast fourier transform

fft.com

This program is used to generate a map and extend it over our model. Unlike visible light, x-rays cannot be focussed by a lens; the Fourier transform is a way for the computer to act as the lens, using phases and amplitudes (from the measured intensities) to generate electron density. As each diffracted x-ray arrives at the film it produces a reflection, which can be described by a Fourier series. The Fourier series that describes a diffracted ray is called its structure-factor equation. This program converts these structure factors to ρ(x,y,z) (electron density). A good place to learn about Fourier transforms is the book Crystallography Made Crystal Clear.
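
A one-dimensional toy example may help to show what the transform does. The sketch below builds structure factors for two point "atoms" in a 1D cell and then sums the Fourier series to recover a density profile that peaks at the atom positions. The atom positions, weights and grid size are made up for the illustration, and the density is on an arbitrary scale.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """F(h) = sum of f * exp(2*pi*i*h*x) over atoms, for h = -h_max..h_max.

    atoms is a list of (fractional coordinate x, scattering weight f).
    """
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }


def electron_density(factors, n_grid):
    """rho(x) = sum over h of F(h) * exp(-2*pi*i*h*x) on an even grid."""
    density = []
    for g in range(n_grid):
        x = g / n_grid
        total = sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items())
        density.append(total.real)  # imaginary parts cancel between +h and -h
    return density


atoms = [(0.2, 6.0), (0.7, 8.0)]
factors = structure_factors(atoms, h_max=10)
density = electron_density(factors, n_grid=100)
highest = max(range(len(density)), key=lambda g: density[g])
print("highest peak at x =", highest / 100)
```

Each F(h) here carries both an amplitude and a phase; dropping the phases (keeping only abs(F)) and repeating the sum no longer puts the peaks on the atoms, which is the phase problem in miniature.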

fft.com

#!/bin/csh -f
#goto sf
#goto fftend
##########################################################
# 1) Calculate dm phased map.
##########################################################
#
fft \
MAPOUT gst1.map \
HKLIN dmed2.mtz \
<<'END'
SCALE F1 1.0 0.0
resol 30 3.0
FFTSPACEGROUP 1
BINmapout
LABI F1=F PHI=PHIDMed W=FOMDMed
end
'END'
##########################################################
# extend map to cover protein #
##########################################################
#
#
extend \
MAPIN gst1.map \
MAPOUT gst1.ext \
XYZIN solution.pdb \
<<'END'
BORDER 10
end
'END'

If you are not averaging your map, the .ext file can be put straight into mappage if you are going to alter your model in O, or it can be imported into Quanta.

Mappage

setup o
mappage
enter .ext
enter name.brk
return for CCP4
press return a lot

When it is finished it will say toodle pip and you now have a map file to put into your O macro.

Go use O or Quanta and build your structure.

Judging Electron density

When you have drawn your map in O or Quanta you should be able to see a large difference between solvent and protein.

If the map is contoured at 1 sigma, the protein map should have connected regions clearly separated from the solvent, and these regions should all be roughly the same height.

Look out for model bias if using molecular replacement. This is seen as parts of the map having too much or too little density with respect to the residue at that site. Model bias occurs because the phases dominate the Fourier synthesis, so the map tends to show the homologous model rather than the target protein. Aromatic residues present in the target, but not in the model, should show through in the map.

Different types of maps can be used to try and overcome phase bias.

OMIT maps - In this case residues with little electron density are removed from the model, and we rely on the Fourier transform principle that every point in real space is contributed to by every point in reciprocal space. This means that the other atoms will contribute phases to give the correct image of the portion left out.

DIFFERENCE maps - Fo-Fc maps. These show unaccounted density and allow the density contributed by the model and by the target structure to be distinguished.

NCS

Non-Crystallographic Symmetry

By now you will know your protein's space group off by heart, but from looking at the crystal packing in SETOR earlier you might have noticed that you have more than one molecule in the asymmetric unit. These molecules may share NCS, or local symmetry, meaning that they show symmetry within the asymmetric unit. This can be used to produce averaged maps in which the noise will tend to cancel out and the phases will be improved.
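
Why averaging helps can be seen with a small numerical experiment. The sketch below makes four noisy copies of the same one-dimensional signal, averages them point by point, and compares the error before and after. The signal, noise level and function names are all invented for the illustration; real NCS averaging also has to superimpose the copies with the NCS operators first.

```python
import math
import random


def average_copies(copies):
    """Point-by-point mean of several equally sampled density profiles."""
    return [sum(values) / len(values) for values in zip(*copies)]


def rms_error(values, truth):
    """Root-mean-square difference between a profile and the true signal."""
    return math.sqrt(sum((v - t) ** 2 for v, t in zip(values, truth)) / len(truth))


def demo(n_copies=4, n_points=2000, noise=1.0, seed=0):
    """Return (rms error of one copy, rms error of the averaged copies)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    truth = [math.sin(2 * math.pi * i / 50) for i in range(n_points)]
    copies = [[t + rng.gauss(0.0, noise) for t in truth] for _ in range(n_copies)]
    return rms_error(copies[0], truth), rms_error(average_copies(copies), truth)


single, averaged = demo()
print("single copy: %.3f   average of 4: %.3f" % (single, averaged))
```

With independent noise the error falls roughly as one over the square root of the number of copies, so four molecules in the asymmetric unit should give about half the noise of one.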

Averaging

Averaging can be used when we have more than one molecule in the asymmetric unit.

It requires a bit more work to get your maps. To make it work we need

- A pdb for the A subunit, i.e. take the created pdb and delete all the data apart from that for chain A - this is the REFRCD.

- A pdb containing all the chains in the molecule - this is WORKCD.

We want to define matrices for

A to A
A to B
A to C
A to D

We do this using lsqkab.com which optimises the fit of a subset of atomic coordinates from one file (assigned to WORKCD) to the same subset of another file (assigned to REFRCD). The program assumes both sets of coordinates are in the protein data bank format.

lsqkab_averaging.com

#!/bin/csh -f
#
lsqkab \
REFRCD solution.pdb \
WORKCD solutiona.pdb \
DELTAS lsq_cat_residues.deltas \
<< 'END-lsqkab'
title
output deltas xyz
fit WRESIDU CA 2 TO 211 WCHAIN A
MATCH RRESIDU CA 2 TO 211 RCHAIN D
end
'END-lsqkab'

From the log file, pull out the rotation and translation matrices and enter them in the next file for ncsmask.

ncsmask

ncsmask creates a mask file which prevents overlaps between the maps of the molecules being averaged. The matrices are specified so that the program can remove any overlaps. The pdb you read in should be a monomer.

ncsmask xyzin solutiona.pdb mskout gst1a.msk << eof

GRID 106 152 212
SYMMETRY P21
AVER 4
ROTA MATRIX 1.0 0.0 0.0 -
0.0 1.0 0.0 -
0.0 0.0 1.0
TRANS 0.0 0.0 0.0
ROTA MATRIX -0.60908 -0.02148 0.79281 -
-0.01623 -0.99909 -0.03954 -
0.79294 -0.03695 0.60818
TRANS -0.22491 0.61754 0.17298
ROTA MATRIX -0.38159 -0.00270 0.92433 -
0.02259 -0.99972 0.00640 -
0.92406 0.02332 0.38154
TRANS -9.13438 0.43814 53.00378
ROTA MATRIX 0.96540 -0.02325 0.25974 -
0.00754 0.99809 0.06132 -
-0.26067 -0.05724 0.96373
TRANS -8.89051 -0.18300 52.87630
RADIUS 3.0
SMOOTH 2
OVERLAP
END
eof

Density modification

As discussed earlier density modification of averaged maps is a powerful technique to improve the phases of the target model.

dm_average.com

dm HKLIN gst1mod1.mtz \
HKLOUT dmed2.mtz \
NCSIN1 gst1a.msk \
HISTLIB $CLIB/data/hist.lib \
<< 'EOF-dm'
SOLC 0.36
MODE SOLV HIST MULT AVER
NCYCLE 10
AVER REFI
ROTA MATRIX 1.0 0.0 0.0 -
0.0 1.0 0.0 -
0.0 0.0 1.0
TRANS 0.0 0.0 0.0
AVER REFI
ROTA MATRIX -0.60908 -0.02148 0.79281 -
-0.01623 -0.99909 -0.03954 -
0.79294 -0.03695 0.60818
TRANS -0.22491 0.61754 0.17298
AVER REFI
ROTA MATRIX -0.38159 -0.00270 0.92433 -
0.02259 -0.99972 0.00640 -
0.92406 0.02332 0.38154
TRANS -9.13438 0.43814 53.00378
AVER REFI
ROTA MATRIX 0.96540 -0.02325 0.25974 -
0.00754 0.99809 0.06132 -
-0.26067 -0.05724 0.96373
TRANS -8.89051 -0.18300 52.87630
SCHEME ALL
COMBINE OMIT
LABI FP=F SIGFP=SIGF FREE=FreeR_flag PHIO=ACMR FOMO=WCMB
LABO PHIDM=PHIDMed FOMDM=FOMDMed
'EOF-dm'

The output from dm should have much better phases! It will also have improved the matrices, so take these from the log file. As usual they are in a different format, so take care when putting them into maprot!
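
Because the matrices get retyped between programs in different layouts, it is worth checking each one after copying it. The sketch below tests that a 3x3 matrix is a proper rotation (orthonormal rows, determinant +1) and reports its rotation angle, using the second NCS operator from the dm command file above. The function names and the tolerance are choices made for this example.

```python
import math


def determinant(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def is_proper_rotation(m, tol=1e-3):
    """True if the rows are orthonormal and the determinant is +1."""
    for i in range(3):
        for j in range(3):
            dot = sum(m[i][k] * m[j][k] for k in range(3))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > tol:
                return False
    return abs(determinant(m) - 1.0) <= tol


def rotation_angle_deg(m):
    """Rotation angle in degrees from the trace of the matrix."""
    cos_angle = (m[0][0] + m[1][1] + m[2][2] - 1.0) / 2.0
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


NCS_2 = [
    [-0.60908, -0.02148, 0.79281],
    [-0.01623, -0.99909, -0.03954],
    [0.79294, -0.03695, 0.60818],
]
print(is_proper_rotation(NCS_2), round(rotation_angle_deg(NCS_2), 1))
```

A matrix that fails the check has almost certainly been transposed incorrectly, had a sign dropped, or been split across lines in the wrong place. Note that a transposed rotation matrix still passes this test, so the check catches typing errors but not a row/column mix-up.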

Calculating an average map

Maprot

MAPROT is used to generate an averaged map by transforming the electron density from molecules B, C and D onto A.

Maprot

maprot mapin gst1.ext mskin gst1a.msk wrkout gst1_av.map << eof
MODE FROM
AVER 4
OMAT
1.000000 0.000000 0.000000
0.000000 1.000000 0.000000
0.000000 0.000000 1.000000
0.000000 0.000000 0.000000
OMAT
-0.609893 -0.021294 0.792196
-0.018272 -0.998999 -0.040921
0.792266 -0.039434 0.608896
-0.149770 0.699037 0.141890
OMAT
-0.382810 0.018158 0.923654
-0.005582 -0.999830 0.017337
0.923813 0.001478 0.382842
-9.120162 0.575860 52.957893
OMAT
0.964548 0.006734 -0.263830
-0.021200 0.998420 -0.052027
0.263062 0.055776 0.963166
-8.958243 -0.115023 52.941040
eof

Refinement

This explains how to perform refinement from an averaged map. If non-averaged maps have been used, input the pdb created from O or Quanta directly into refmac and skip Pdbset.com.

Above lsqkab.com was used to calculate matrices from A to B, C and D, then using the averaged map only molecule A was built into. To reverse this process for refinement a pdb containing coordinates for molecules B, C and D must be created from molecule A . To do this pdbset.com is used. The matrices are those that were gained from lsqkab when the averaged map was built.

Pdbset.com

pdbset XYZIN muta_ref1.pdb \
XYZOUT all_muta_ref1.pdb << 'END-pdbset'
remark Producing new 4 molecules in asymmetric unit from modified monomer a
cell 52.942 75.460 106.978 90.00 100.04 90.00
spacegroup P21
SYMGEN NCS
TRANSFORM \
1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
TRANSFORM \
-0.60908 -0.02148 0.79281 -0.01623 -0.99909 -0.03954 0.79294 -0.03695 0.60818 -0.22491 0.61754 0.17298
TRANSFORM \
-0.38159 -0.00270 0.92433 0.02259 -0.99972 0.00640 0.92406 0.02332 0.38154 -9.13438 0.43814 53.00378
TRANSFORM \
0.96540 -0.02325 0.25974 0.00754 0.99809 0.06132 -0.26067 -0.05724 0.96373 -8.89051 -0.18300 52.87630
CHAIN SYMMETRY 1 A A
CHAIN SYMMETRY 2 A B
CHAIN SYMMETRY 3 A C
CHAIN SYMMETRY 4 A D
END
'END-pdbset'

This will generate a new pdb file and a log file. The pdb file can be examined using SETOR to check that the transform looks correct.

Refmac

This is not the only refinement program available, but it is the only one I have used so far.

X-PLOR, another refinement program, uses energy minimisation to refine the model. Its manual is available online, but I have not used the program yet, so sorry, no more help than that at the moment.
The refmac program refines the coordinate parameters to satisfy either a maximum likelihood or a least squares residual. Geometric restraints can be used by including the program Protin, which analyses the protein geometry and produces an output file containing restraints for Refmac. Refmac also produces an HKL output file containing weighted coefficients for SigmaA-weighted difference (Fo-Fc) and 2Fo-Fc maps.

Refmac.com

#!/bin/csh -f
#
# Refmac refinement of GST1.pdb is all_muta_ref1.pdb in quanta file
#
# collected in Daresbury, station 9.5
#
#
cp GST1.pdb $CCP4_SCR/GST1_0.pdb
#
###########################################################################
# Step 1: Protin
# NCS restrains on monomers A to D
#
start:
set name = 'GST1_'
set last = 0
set cycles = 6
set count = 0
set data = '../dmed2.mtz'
#
while ($count != $cycles)
@ curr = $last + 1
#
#
protin_big \
XYZIN $CCP4_SCR/${name}${last}.pdb \
PROTOUT $CCP4_SCR/GST1_protout.dat \
PROTCOUNTS $CCP4_SCR/GST1_counts.dat \
DICTPROTN $CLIBD/protin2.dic \
<< eof
TITL GST1 apoenzyme collected Daresbury 2.78 march 30th 1998
CHNNAM ID A CHNTYP 1
CHNNAM ID B CHNTYP 1
CHNNAM ID C CHNTYP 1
CHNNAM ID D CHNTYP 1
CHNTYP 1 NTERM 1 PRO 3 CTERM 214 MET 2
CHNTYP 2 WAT
NONX 12 CHNID A B C D NSPANS 1 3 150 1
SYMMETRY P21
LIST FEW
END

eof
#
if ($status) exit
#
###########################################################################
# Step 2: Refmac
#
refmac:
refmac \
HKLIN $data \
PROTOUT $CCP4_SCR/GST1_protout.dat \
PROTCOUNTS $CCP4_SCR/GST1_counts.dat \
PROTSCR $CCP4_SCR/GST1_counts.scr \
XYZIN $CCP4_SCR/${name}${last}.pdb \
HKLOUT $CCP4_SCR/${name}${curr}.mtz \
XYZOUT $CCP4_SCR/${name}${curr}.pdb << eop
#
LABI FP=F SIGFP=SIGF FREE=FreeR_flag
LABO FC=FC PHIC=PHIC FWT=FWT PHWT=PHWT
!Refinement parameters
REFI TYPE REST
REFI RESI MLKF RESO 50.0 2.7
REFI BREF ISOT METH CGMAT
!Scaling parameters
SCAL TYPE BULK
SCAL LSSC ANISO
WEIG MATRIX 0.3
NCYC 6 ! cycles round refinement NCYC times before redoing PROTIN
MONI FEW
BINS 20
END
eop
#
if ($status) exit
#
@ last++
@ count++
end
exit
#

[included only when water is added]

Refmac will produce a log file. Look for:

a decreasing R factor and, more importantly, a decreasing free R

the correlation factor, which should increase

the number of distances it flags as wrong, which should decrease with each cycle

Eventually convergence will occur and R free etc. will not improve further.
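
For reference, the R factor being watched is the sum of the differences between observed and calculated amplitudes divided by the sum of the observed amplitudes, and the free R is the same quantity for the reflections held out of refinement. The sketch below works this out for a handful of made-up reflections. It assumes the free set is marked with flag 0, and it ignores the scaling between Fobs and Fcalc that the real program applies.

```python
def r_factor(pairs):
    """R = sum(|Fobs - Fcalc|) / sum(Fobs) for (Fobs, Fcalc) pairs."""
    numerator = sum(abs(f_obs - f_calc) for f_obs, f_calc in pairs)
    denominator = sum(f_obs for f_obs, _ in pairs)
    return numerator / denominator


def r_work_and_free(reflections, free_flag=0):
    """Split (Fobs, Fcalc, flag) records into working and free sets.

    Assumption: reflections whose flag equals free_flag form the free set.
    """
    work = [(fo, fc) for fo, fc, flag in reflections if flag != free_flag]
    free = [(fo, fc) for fo, fc, flag in reflections if flag == free_flag]
    return r_factor(work), r_factor(free)


# Made-up amplitudes, purely for illustration.
reflections = [
    (100.0, 90.0, 1),
    (50.0, 55.0, 1),
    (80.0, 60.0, 0),
    (20.0, 20.0, 0),
]
r_work, r_free = r_work_and_free(reflections)
print("R = %.3f  Rfree = %.3f" % (r_work, r_free))
```

Since the free reflections never take part in refinement, a free R that stops falling (or rises) while the working R keeps dropping is the usual sign of overfitting.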

A new mtz file and pdb file is created in the scratch space of the computer refmac was run on. Use these files in fft.com to generate new maps.

Type something like cp $CCP4_SCR/GST1_6.mtz . in the directory you wish to copy to, in the window of the computer you ran refinement in.

Refmac is able to calculate new weighted phase and observation coefficients; fft.com should be told to look for these.

fft.com

fft.com -following refinement
#!/bin/csh -f
#goto sf
#goto fftend

##########################################################
# 1) Calculate dm phased map.
##########################################################
#
fft \
MAPOUT ref1GST1.map \
HKLIN GST1_6.mtz \
<< 'END'
SCALE F1 1.0 0.0
resol 30 2.78
FFTSPACEGROUP 1
BINmapout
LABI F1=FWT PHI=PHWT
end
'END'
##########################################################
# extend map to cover protein #
###########################################################
#
extend \
MAPIN ref1GST1.map \
MAPOUT ref1gst1.ext \
XYZIN GST1_6.pdb \
<< 'END'
BORDER 10
end
'END'
#####################

This map covers the whole molecule but, as before, it would be helpful to work on an averaged map. Lsqkab_averaging.com is used to generate rotation and translation matrices as before. These matrices can then be put into ncsmask followed by maprot.

Refinement is powerful and it can move your model quite a long way. Your new map should have better phases, allowing easier building, and the map should tell you where fairly big changes need to be made.

fft_diff.com

A difference map can be calculated to determine which areas of the density are being contributed to by the observed data and which from the calculated data.

fft_diff.com

#!/bin/csh -f
#goto sf
#goto fftend

##########################################################
# 1) Calculate dm phased map.
##########################################################
#
fft \
MAPOUT ref2GST1d.map \
HKLIN GST1b_6.mtz \
<< 'END'
SCALE F1 1.0 0.0
resol 30 2.78
FFTSPACEGROUP 1
BINmapout
LABI F1=DELFWT PHI=PHDELWT
end
'END'
##########################################################
# extend map to cover protein #
##########################################################
#
#
extend \
MAPIN ref2GST1d.map \
MAPOUT ref2gst1d.ext \
XYZIN GST1b_6.pdb \
<< 'END'
BORDER 10
end
'END'

The difference between this command file and fft.com is in the LABI line, which uses the difference-map coefficients calculated in refinement. This .ext file can now be put into maprot, along with the mask created by ncsmask as before, to generate a map.

In the case of a difference map it is a good idea to stick to one colour convention; here the positive density is coloured red and the negative density is coloured green. Atoms sitting in green (negative) density probably need to move, while atoms well covered by the map and clear of negative density should stay. Red (positive) density marks places where the model is missing something.

This document is intended only as a selection of handy hints for the use of the programs mentioned within it. Any problems encountered with the programs themselves are best addressed through use of the manual or contacting the program authors (available at the web sites).

Sample O macro

Positive density (+sigma)

Negative density (-sigma)

The actual map using LABI F1=FWT PHI=PHWT in fft.com

A sample O macro is shown below. Each map must be a different map_obj

map_file gstslo1c_5a.brk
map_obj fo
map_par 20 20 20 0.1 navy_blue 0.50 0.00 1
map_act
map_dr