Modeling truncated pixel values of faint reflections in MicroED images^{1}
^{a}Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147, USA
^{*}Correspondence email: gonent@janelia.hhmi.org
The weak pixel counts surrounding the Bragg spots in a diffraction image are important for establishing a model of the background underneath the peak and estimating the reliability of the integrated intensities. Under certain circumstances, particularly with equipment not optimized for low-intensity measurements, these pixel values may be corrupted by corrections applied to the raw image. This can lead to truncation of low pixel counts, resulting in anomalies in the integrated Bragg intensities, such as systematically higher signal-to-noise ratios. A correction for this effect can be approximated by a three-parameter lognormal distribution fitted to the weakly positive-valued pixels at similar scattering angles. The procedure is validated by the improved refinement of an atomic model against amplitudes derived from corrected micro-electron diffraction (MicroED) images.
Keywords: cryo-EM; micro-electron diffraction; MicroED; X-ray free-electron lasers; XFELs.
1. Introduction
The success of diffraction data analysis, and consequently the quality of the final atomic model, hinges on accurate integration of the recorded Bragg reflections. The intensities of these reflections decrease with increasing scattering angle until the point where their peaks become indistinguishable from the surrounding background (Bourenkov & Popov, 2006). Ignoring the effects of solvent scattering and artifacts such as ice rings (Glover et al., 1991), the recorded counts of pixels between the Bragg spots follow the same general pattern; the greater the distance from the intersection point of the direct beam with the detector surface, the smaller their values. Because the background pixels around a reflection are commonly used to estimate the noise contribution to the integrated signal (Leslie, 1999), successful data reduction generally requires that all pixel values are accurately recorded, irrespective of their scattering angle and magnitude, or whether they represent Bragg spots or not.
Many detector systems used to record diffraction data apply corrections to the raw data before a rectified image is presented to the experimenter for processing. The flat-field calibration is one such correction. For CCD and CMOS-based detectors, this two-step procedure consists of dark-frame correction, where a previously recorded, unexposed image is subtracted, followed by multiplication with a gain image. Dark-frame correction removes features that arise from the small currents that flow through the sensor even when the shutter is closed. The subsequent gain correction compensates for the uneven response of individual pixels by ensuring that the calibrated readout under uniform flat-field illumination is featureless. In some cases, images are uninterpretable unless these corrections are applied.
A number of macromolecular crystal structures have recently been solved by micro-electron diffraction (MicroED) (Shi et al., 2013; Nannenga, Shi, Hattne et al., 2014; Rodriguez et al., 2015; Yonekura et al., 2015). In our laboratory, diffraction datasets have been recorded by continuous rotation (Nannenga, Shi, Leslie & Gonen, 2014) using a TVIPS TemCam-F416 CMOS camera. During data collection the crystal is slowly rotated in the electron beam and the accumulated counts are rapidly read out at regular intervals without interrupting the rotation of the sample. However, the camera's 'rolling shutter' mode (Stumpf et al., 2010) that makes these measurements possible is primarily intended to provide real-time visual feedback during data collection. The camera does apply a flat-field correction, but the storage format required to sustain the high data-transfer rates is restricted to representing pixel values as unsigned 16-bit integers. This causes problems for weak reflections, which are typically observed at high resolution. Around these reflections the raw counts on the detector may be comparable in magnitude to those in the dark frame. Owing to random fluctuations in the raw counts, dark-frame subtraction may then yield very small or even negative values, which are propagated through the subsequent gain correction. As negative counts cannot be represented in the storage format, they are truncated to zero, and information about the true, negative value is lost. Generally, the effect is not immediately apparent on visual inspection of the diffraction pattern, but becomes clear in histograms of the low pixel values, which feature a prominent peak at zero analog-to-digital units (ADU) (Fig. 1a).
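The truncation mechanism described above can be illustrated with a small simulation. This is only a sketch: the dark level, gain, signal level and noise width are hypothetical values chosen for illustration, and `flatfield_unsigned` is not the camera's actual firmware routine.

```python
import numpy as np

def flatfield_unsigned(raw, dark, gain):
    """Dark-frame subtraction, gain multiplication, then storage as uint16."""
    corrected = (raw.astype(np.float64) - dark) * gain
    # Unsigned storage cannot hold negative counts: they are clipped to zero.
    return np.clip(np.rint(corrected), 0, 65535).astype(np.uint16)

rng = np.random.default_rng(0)
shape = (512, 512)
dark = np.full(shape, 100.0)   # hypothetical dark level in ADU
gain = np.ones(shape)          # hypothetical uniform gain
# Weak signal (mean 2 ADU) with noise (s.d. 5 ADU) on top of the dark level.
raw = dark + rng.normal(loc=2.0, scale=5.0, size=shape)

stored = flatfield_unsigned(raw, dark, gain)
true_mean = (raw - dark).mean()
print("fraction of zero-valued pixels:", (stored == 0).mean())
print("mean before truncation: %.2f, after: %.2f" % (true_mean, stored.mean()))
```

In this toy setting a large share of the pixels end up at exactly zero and the mean of the stored image is biased upward, which is the kind of histogram peak at zero ADU that the text describes.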
It is conceivable that the dark frame could be offset by some constant to reduce the probability that dark subtraction yields a negative number. This is not easily achievable without altering the software used to control the camera. Modifying the camera's storage format to use signed integers is similarly impractical. Disabling the flat-field correction altogether is unattractive, since it would remove the ability to view calibrated diffraction images while they are being retrieved from the camera. The remaining option is to attempt to recover as much information as possible from the dataset. Here we present a procedure to model the values of the truncated pixels with zero counts from the histogram of the values of the remaining pixels.
2. Methods
For a sufficiently large sample of weakly positive-valued pixels, their histogram allows the distribution of the counts around zero to be modeled. For diffraction patterns, the parameters of the distribution of recorded counts across the image depend on the scattering angle (Fig. 1b). Therefore separate models are derived from pixels within a narrow interval of scattering angles. The finite range of scattering angles leads to heavy-tailed distributions, particularly at low resolution where a larger spread of scattering angles is necessary to provide an adequate sample size to model the distribution. Invalid pixels, for example pixels in the shadow of the beam stop, are not considered because they do not follow the distribution of pixels that record electrons scattered from the sample.
We use the lognormal distribution to model the behavior of the low-valued pixels. The lognormal distribution is expected where the observed counts are the result of independent multiplicative processes in the detector (Kissick et al., 2010), but in our case its use is primarily motivated by its quality of fit to the experimental data (mean r.m.s.d. 327 ADU). The probability density function f and cumulative distribution function F of the lognormal distribution are given by
\[
f(x;\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}\right], \qquad
F(x;\mu,\sigma) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left[\frac{\ln x-\mu}{\sigma\sqrt{2}}\right],
\]
where μ and σ are the location and scale parameters, respectively. A third parameter, τ, is used to arbitrarily shift the distribution, which allows the random variable it models to take any real value x > τ, rather than just positive values. Assuming the pixels in a given resolution range of a diffraction image are independent and identically distributed, the probability of observing a pixel with true integer count I, such that I > τ, can then be approximated:
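One way to write such an approximation is sketched here under two assumptions that the text does not confirm: the shift enters as $x-\tau$, and counts are rounded to the nearest integer, so that the probability of a count $I$ is the mass of a unit-width bin, which for a slowly varying density is close to the density at the bin center:

\[
P(I;\mu,\sigma,\tau) \approx F\left(I+\tfrac{1}{2}-\tau;\mu,\sigma\right) - F\left(I-\tfrac{1}{2}-\tau;\mu,\sigma\right) \approx f(I-\tau;\mu,\sigma).
\]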
The probability of observing a pixel with any value I ≤ 0 is given by
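As a sketch, assuming that the shift enters as $x-\tau$ and that counts are rounded to the nearest integer (conventions chosen for illustration, not confirmed by the text), every true value at or below zero falls into the zero bin, which suggests

\[
P(I\le 0;\mu,\sigma,\tau) = F\left(\tfrac{1}{2}-\tau;\mu,\sigma\right),
\]

with $F$ taken to be zero for non-positive arguments.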
Let H(I) denote the number of pixels with value I in the image. For any integer count I in the closed interval [0, I_{max}], H(I) defines the observed histogram (Fig. 1). We assume that any pixel with I > 0 is measured correctly; a pixel with I = 0 could represent either a true value of zero or a negative value. We seek the parameters μ, σ and τ that maximize the probability of observing H(I). This is equivalent to maximizing the likelihood, or more conveniently, the log-likelihood, which in our model is given by
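A log-likelihood of this kind can be written out as follows. This is a sketch that assumes the shift enters as $x-\tau$ and that counts are rounded to the nearest integer, so that the zero bin collects all true values at or below zero; the text confirms neither convention:

\[
\ln L(\mu,\sigma,\tau) = H(0)\,\ln F\left(\tfrac{1}{2}-\tau;\mu,\sigma\right) + \sum_{I=1}^{I_{max}} H(I)\,\ln\left[F\left(I+\tfrac{1}{2}-\tau;\mu,\sigma\right) - F\left(I-\tfrac{1}{2}-\tau;\mu,\sigma\right)\right].
\]

The first term treats the zero-valued pixels as censored observations, and the sum treats the positive counts as correctly measured.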
This can be done using standard optimization algorithms such as the BFGS implementation in the R environment (R Core Team, 2015).
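The fit can be sketched in Python with SciPy's BFGS optimizer, used here in place of the R optimizer mentioned above. The function names, the starting values, the synthetic data and the nearest-integer binning of the censored likelihood are all assumptions made for illustration, not the published implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import ndtr

def shifted_lognormal_cdf(x, mu, sigma, tau):
    """CDF of tau + exp(N(mu, sigma^2)); zero for x <= tau."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    ok = x > tau
    out[ok] = ndtr((np.log(x[ok] - tau) - mu) / sigma)
    return out

def neg_log_likelihood(params, hist):
    """Mean negative log-likelihood of a histogram whose zero bin is censored."""
    mu, log_sigma, tau = params
    upper = np.arange(len(hist)) + 0.5            # upper edges of bins 0..I_max
    cdf = shifted_lognormal_cdf(upper, mu, np.exp(log_sigma), tau)
    # Bin 0 holds everything at or below zero; other bins have unit width.
    p = np.concatenate(([cdf[0]], np.diff(cdf)))
    return -np.sum(hist * np.log(np.maximum(p, 1e-300))) / hist.sum()

def fit_shifted_lognormal(hist, start):
    """Maximize the likelihood over (mu, log sigma, tau) with BFGS."""
    res = minimize(neg_log_likelihood, start, args=(hist,), method="BFGS")
    mu, log_sigma, tau = res.x
    return mu, np.exp(log_sigma), tau

# Synthetic resolution shell with hypothetical parameters, truncated at zero.
rng = np.random.default_rng(1)
true_mu, true_sigma, true_tau = np.log(40.0), 0.25, -30.0
values = true_tau + rng.lognormal(true_mu, true_sigma, size=200_000)
stored = np.clip(np.rint(values), 0, None).astype(int)
hist = np.bincount(stored).astype(float)

# Crude data-driven starting point (an assumption, not a recommendation).
positive = stored[stored > 0]
tau0 = -2.0 * positive.std()
start = [np.log(np.median(positive) - tau0), np.log(0.5), tau0]

mu, sigma, tau = fit_shifted_lognormal(hist, start)
print("mu = %.3f, sigma = %.3f, tau = %.2f" % (mu, sigma, tau))
```

Optimizing log σ rather than σ keeps the scale parameter positive without constraints. The three parameters of a shifted lognormal are strongly correlated, so quantities such as the fitted median or the predicted zero fraction are usually better determined than the individual parameters.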
The recovered parameters define new values for the H(0) pixels that were initially zero, such that the histogram for I ≤ 0 in the corrected image conforms to the fitted distribution (Fig. 1b). Pixels with initially positive values remain unchanged (Fig. 2), and the frequency of negative values agrees with the optimized model. Only the spatial arrangement of the negative values is random. Uncorrected images that do not contain any zero-valued pixels will have H(0) = 0; the correction does not alter these images in any way.
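The reassignment of the zero-valued pixels can be sketched by inverse-CDF sampling from the part of the fitted distribution at or below zero. The function name `fill_truncated`, the `mask` argument that selects one resolution shell, the nearest-integer rounding and the parameter values in the example are hypothetical choices for illustration.

```python
import numpy as np
from scipy.special import ndtr, ndtri

def fill_truncated(image, mask, mu, sigma, tau, rng):
    """Replace zero-valued pixels inside `mask` by draws from the fitted
    shifted lognormal, restricted to values that round to zero or below."""
    out = image.astype(np.int32)
    zeros = mask & (image == 0)
    n = int(zeros.sum())
    if n == 0 or tau >= 0.5:
        return out                      # nothing to correct in this shell
    # Probability mass of the model at or below zero (after rounding).
    p_zero = ndtr((np.log(0.5 - tau) - mu) / sigma)
    # Uniform draws on [0, p_zero) mapped through the inverse CDF.
    u = rng.uniform(0.0, p_zero, size=n)
    draws = tau + np.exp(mu + sigma * ndtri(u))
    # The minimum guards against floating-point overshoot at the upper limit.
    out[zeros] = np.minimum(np.rint(draws), 0).astype(np.int32)
    return out

rng = np.random.default_rng(3)
image = rng.integers(0, 4, size=(64, 64)).astype(np.uint16)
mask = np.ones(image.shape, dtype=bool)
corrected = fill_truncated(image, mask, mu=np.log(40.0), sigma=0.25,
                           tau=-30.0, rng=rng)
print("smallest corrected value:", corrected.min())
```

Positive pixels pass through untouched, and only the positions of the negative values depend on the random number generator, in line with the description above.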
If the corrected image will be stored in a format that does not support negative counts (e.g. SMV), an offset has to be applied before the image is output. To preserve correct integration downstream, the integration software has to be made aware of this offset (e.g. ADCOFFSET in MOSFLM). Choosing the offset as the negated value of the smallest count in all resolution shells of all images after correction allows straightforward processing of the sweep.
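A minimal sketch of the offset choice described above, assuming the corrected sweep is held as a list of signed NumPy arrays. The helper name `apply_offset` and the overflow check are illustrative additions; writing the SMV file and passing the offset to the integration program are not shown.

```python
import numpy as np

def apply_offset(images):
    """Shift a sweep of signed images so that it fits unsigned 16-bit storage.

    Returns the shifted images and the single offset used for the whole sweep.
    """
    lowest = min(int(img.min()) for img in images)
    highest = max(int(img.max()) for img in images)
    offset = max(-lowest, 0)            # negated smallest count in the sweep
    if highest + offset > np.iinfo(np.uint16).max:
        raise ValueError("offset would overflow the unsigned 16-bit range")
    return [(img + offset).astype(np.uint16) for img in images], offset

sweep = [np.array([[-7, 3], [0, 12]]), np.array([[5, -2], [1, 0]])]
shifted, offset = apply_offset(sweep)
print("offset:", offset)
```

Using one offset for the entire sweep means the downstream software needs only a single value for all images.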
The procedure was validated against MicroED images collected from four crystals of proteinase K. Protein solutions from Engyodontium album (Sigma-Aldrich, St Louis, MO, USA) were prepared by combining 2 µl of protein solution (50 mg ml^{−1}) with 2 µl of precipitant solution (1.0–1.3 M ammonium sulfate, 0.1 M Tris pH 8.0). Crystals in P4_{3}2_{1}2 with a = b = 67.3, c = 101 Å appeared in hanging drops after equilibrating against the precipitant solution for three days. MicroED images were recorded on a transmission electron microscope (FEI) equipped with a field emission gun and a TVIPS TemCam-F416 CMOS camera using published protocols (Nannenga, Shi, Leslie & Gonen, 2014; Shi et al., 2016). At an acceleration voltage of 200 kV and a camera length of 1.2 m (corresponding to a virtual detector distance of 2.2 m) the detector can record reflections at resolutions up to ∼1.75 Å at the edges and ∼1.25 Å in the corners. The correction was applied to the images independently in ten concentric annuli of approximately equal area.
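One way to divide the detector into concentric annuli of approximately equal area is sketched below. The function name, the beam-center argument and the use of radius quantiles to place the boundaries are assumptions for illustration; the text does not specify how the annuli were constructed.

```python
import numpy as np

def equal_area_annuli(shape, center, n_annuli=10):
    """Label each pixel with the index of its annulus (0 = innermost).

    `center` is the (x, y) beam position in pixels, a hypothetical input.
    """
    y, x = np.indices(shape)
    r = np.hypot(x - center[0], y - center[1])
    # Boundaries at radius quantiles, so every annulus holds roughly the
    # same number of pixels and hence covers the same detector area.
    edges = np.quantile(r, np.linspace(0.0, 1.0, n_annuli + 1))
    return np.searchsorted(edges[1:-1], r, side="right")

labels = equal_area_annuli((256, 256), center=(127.3, 130.8))
print(np.bincount(labels.ravel()))
```

Each label can then serve as the mask for one independent fit. Placing the boundaries at quantiles keeps the sample sizes balanced even where the outer annuli are clipped by the detector edges.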
Corrected datasets were indexed and integrated with MOSFLM (Leslie & Powell, 2007). To ensure comparable integration for the uncorrected and corrected datasets only the missetting angles were optimized during integration. The mosaicity was refined to convergence for each crystal separately and then held constant during integration. All detector parameters were fixed, and the measurement box was set to a 13 × 13 pixel box with a 4 pixel border and an 8 pixel corner cutoff (Leslie, 1999). To allow the integration box to contain zero-valued pixels for the uncorrected data, MOSFLM's NULLPIX parameter was set to −1. The intensities calculated by summation integration were scaled and merged using AIMLESS with default parameters (Evans & Murshudov, 2013). The upper resolution limit imposed during scaling lies just inside the detector corners where the number of observations is barely large enough to permit merging statistics to be calculated. This is beyond commonly employed resolution cutoffs, but allows the effect of the correction on the weakest high-resolution reflections to be evaluated.
The merged data were phased by MOLREP (Vagin & Teplyakov, 1997) using PDB ID 4woc (Guo et al., 2015) as a search model, resulting in contrast scores of 27.57 and 32.56 for the uncorrected and corrected data, respectively. Both models were refined with phenix.refine (Afonine et al., 2012) using electron scattering factors (Colliex et al., 2006), automatic water modeling and weight optimization of the stereochemistry terms. Only reflections up to 1.75 Å were included in the refinement because the completeness of the merged dataset drops rapidly beyond the edges of the detector [see Fig. 5(a) in §3]. The simulated annealing (SA) composite omit map computed by CNS (Brunger, 2007) clearly reveals depressions or even holes in the centers of the aromatic side chains (Fig. 3).
3. Results and discussion
The correction only modifies the zero-valued pixels in an image and it can never increase their values. Because the mode of the fitted distribution tends to decrease with increasing resolution (Fig. 1b), the number and magnitude of the negative-valued pixels is expected to increase toward the edges of the detector. This behavior is seen in the integrated reflections (Fig. 4), with the exception of the low-resolution reflections, where the decreased values of the pixels surrounding the peaks lead to stronger integrated intensities after background subtraction. For higher-resolution reflections, where corrected pixels may fall within the foreground, the integrated intensities decrease as well. The magnitude of the difference between the integrated intensities before and after correction increases with resolution, and the corresponding increase in the fraction of negative intensities (Fig. 4) is consistent with this observation.
Compared to the uncorrected images, the corrected dataset merged ∼2.5× more reflections (Table 1). The vast majority of the rejections for the uncorrected images occur during integration owing to excessive background gradient (87%), indicating problems modeling the background, where low pixel counts are more abundant. Other rejections are mostly due to incompletely recorded, partial reflections and ill-fitting peaks. The smaller number of outlier rejections in the corrected dataset is reflected in an increased completeness and multiplicity (Fig. 5a and Table 1).

Except for the reflections only observed in the corners of the detector, the half-set correlation, CC_{1/2} (Karplus & Diederichs, 2012), is marginally higher for the corrected images than for the uncorrected images (Fig. 5b). Beyond the edge of the detector CC_{1/2} is dominated by noise. The merging R factors on the other hand are higher for the corrected dataset than for the uncorrected images, and this is most pronounced in the higher-resolution shells. At high resolution, individual pixel counts are more affected by noise, and their variance is governed by fluctuations around low counts. In the uncorrected dataset these fluctuations are diminished when negative pixel counts are truncated, leading to artificially homogeneous integrated intensities and underestimated standard deviations for the very weakest Bragg spots. The correction recovers some of this variance, and notably, 〈I/σ_{I}〉 in the highest-resolution shell, where reflections are not visually discernible, drops twofold (Fig. 5a and Table 1).
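The loss of variance caused by truncation can be illustrated with a toy calculation. This is not a model of MOSFLM's integration: it simply sums noisy pixel values in boxes of 169 pixels, with a hypothetical mean of 0.5 ADU and noise of 5 ADU, with and without clipping at zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n_spots, n_pixels = 5000, 169          # hypothetical 13 x 13 pixel boxes
true = rng.normal(loc=0.5, scale=5.0, size=(n_spots, n_pixels))
truncated = np.clip(true, 0.0, None)   # negative counts stored as zero

sum_true = true.sum(axis=1)
sum_trunc = truncated.sum(axis=1)
print("s.d. of summed counts, untruncated: %.1f" % sum_true.std())
print("s.d. of summed counts, truncated:   %.1f" % sum_trunc.std())
```

In this toy setting the truncated sums scatter less and are biased upward, which is the direction of the effect on the standard deviations and on 〈I/σ_{I}〉 discussed above.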
With otherwise identical protocols, the overall R_{work} and R_{free} values are lower by 1.0 and 0.7%, respectively, for the model refined against the corrected dataset compared to those for the uncorrected dataset. The correlation coefficients between the observed and calculated amplitudes are generally higher for the model refined against the corrected data than for the model refined against the uncorrected data, and the effect is more pronounced at higher resolution (Fig. 6a). Similarly, the atomic model refined against the corrected data correlates better to its density map calculated from reflections in the interval between 1.75 and 5.00 Å than the model refined against the corresponding uncorrected data (Fig. 6b). However, the atomic coordinates of the two models are very similar with an r.m.s.d. of 0.080 Å.
4. Conclusion
The systematic truncation of weak pixel values introduces subtle anomalies in the integrated Bragg intensities, which propagate to the refined model. In the present case, the artifacts are due to the data format's inability to represent negative counts. File formats restricted to unsigned integers are common in crystallography, but it is conceivable that similar problems could arise by other means. However, modeling the counts of the low-valued pixels can help to recover the true signal for the high-resolution reflections. For stronger reflections, the benefit of the correction lies mainly in a realistic appearance of the background surrounding the peak, which provides a more accurate estimate of its reliability. The end effect is that the merged reflections better represent the amplitudes of the diffracting crystal's scattering factors. This in turn improves the quality of the final atomic model. Depending on the particular implementation of the spot-finding routine, the correction can also boost autoindexing and unit-cell determination of faint diffraction datasets, where an artificially flat background otherwise yields many spurious spots.
It must be noted that the pixel values that are lost in truncation can never be truthfully recovered. Future advances could improve the quality of the procedure introduced here, but the correct negative values of the affected pixels are fundamentally irretrievable. The procedure instead models the corrupted counts, which limits the accuracy of the correction to the quality of the model and the process used to determine its parameters. While the reliance on a random number generator for the spatial distribution of negative counts is appropriate since it models the stochastic fluctuations that initially lead to the negative, truncated pixel values, it implies that the procedure is non-deterministic. Owing to the local
of the detector, initial attempts at exploiting per-pixel statistics instead for the assignment of the negative counts have not been successful. However, separately applying the correction to smaller regions can reduce the impact of the random number generator. The current implementation limits the structure of these areas to concentric annuli, but this could be extended to arbitrary shapes, which together cover the surface of the detector.
Ideally, a diffraction measurement would be conducted such that the need for the correction described here would never arise. In emerging methods such as MicroED, which often rely on hardware and software originally developed and optimized for different purposes, this is not always immediately possible. Future developments in MicroED will address these difficulties by, for example, determining how to use the camera in a different mode that allows signed integers to be recorded.
The corrected data and the model refined against them are available under PDB ID 5i9s and EMDB ID EMD-8077. The uncorrected data have been deposited with the Structural Biology Data Grid (Meyer et al., 2016) under doi 10.15785/SBGRID/262. The procedure will be included in an upcoming release of our conversion tools for MicroED diffraction images (Hattne et al., 2015).
Footnotes
^{1}This article will form part of a virtual special issue of the journal on free-electron laser software.
Acknowledgements
We would like to thank Andrew Leslie (MRC-LMB), who brought the effects of truncated pixel values and structured noise to our attention. We are grateful to Reza Ghadimi (TVIPS) for technical support, and Maya Topf (Birkbeck), Alexis Rohou, Hervé Rouault and Timothee Lionnet (Janelia Research Campus) for valuable discussions.
References
Afonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367.
Bourenkov, G. P. & Popov, A. N. (2006). Acta Cryst. D62, 58–64.
Brunger, A. T. (2007). Nat. Protoc. 2, 2728–2733.
Colliex, C. et al. (2006). International Tables for Crystallography, Vol. C, Mathematical, Physical and Chemical Tables, 1st online ed., ch. 4.3, pp. 259–429. Chester: International Union of Crystallography.
Evans, P. R. & Murshudov, G. N. (2013). Acta Cryst. D69, 1204–1214.
Glover, I. D., Harris, G. W., Helliwell, J. R. & Moss, D. S. (1991). Acta Cryst. B47, 960–968.
Guo, F., Zhou, W., Li, P., Mao, Z., Yennawar, N. H., French, J. B. & Huang, T. J. (2015). Small, 11, 2733–2737.
Hattne, J., Reyes, F. E., Nannenga, B. L., Shi, D., de la Cruz, M. J., Leslie, A. G. W. & Gonen, T. (2015). Acta Cryst. A71, 353–360.
Karplus, P. A. & Diederichs, K. (2012). Science, 336, 1030–1033.
Kissick, D. J., Muir, R. D. & Simpson, G. J. (2010). Anal. Chem. 82, 10129–10134.
Leslie, A. G. W. (1999). Acta Cryst. D55, 1696–1702.
Leslie, A. G. W. & Powell, H. R. (2007). Evolving Methods for Macromolecular Crystallography. Dordrecht: Springer Netherlands.
Meyer, P. A. et al. (2016). Nat. Commun. 7, 10882.
Nannenga, B. L., Shi, D., Hattne, J., Reyes, F. E. & Gonen, T. (2014). eLife, 3, 1–11.
Nannenga, B. L., Shi, D., Leslie, A. G. W. & Gonen, T. (2014). Nat. Methods, 11, 927–930.
R Core Team (2015). The R Project for Statistical Computing, https://www.r-project.org/.
Rodriguez, J. A. et al. (2015). Nature, 525, 486–490.
Schrödinger (2014). PyMol, http://www.pymol.org/.
Shi, D., Nannenga, B. L., de la Cruz, M. J., Liu, J., Sawtelle, S., Calero, G., Reyes, F. E., Hattne, J. & Gonen, T. (2016). Nat. Protoc. 11, 895–904.
Shi, D., Nannenga, B. L., Iadanza, M. G. & Gonen, T. (2013). eLife, 2, e01345.
Stumpf, M., Bobolas, K., Daberkow, I., Fanderl, U., Heike, T., Huber, T., Kofler, C., Maniette, Y. & Tietz, H. (2010). Microsc. Microanal. 16, 856–857.
Vagin, A. & Teplyakov, A. (1997). J. Appl. Cryst. 30, 1022–1025.
Yonekura, K., Kato, K., Ogasawara, M., Tomita, M. & Toyoshima, C. (2015). Proc. Natl Acad. Sci. USA, 112, 3368–3373.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.