Quality of the fossil record through time

Nature 403, 534-537 (2000) © Macmillan Publishers Ltd.

M. J. BENTON*, M. A. WILLS*† & R. HITCHIN*

* Department of Earth Sciences, University of Bristol, Bristol BS8 1RJ, UK
† Oxford University Museum of Natural History, Parks Road, Oxford OX1 3PW, UK

Does the fossil record present a true picture of the history of life1,2,3, or should it be viewed with caution4,5,6? Raup5 argued that plots of the diversification of life2 were an illustration of bias: the older the rocks, the less we know. The debate was partially resolved by the observation7 that different data sets gave similar patterns of rising diversity through time. Here we show that new assessment methods, in which the order of fossils in the rocks (stratigraphy) is compared with the order inherent in evolutionary trees (phylogeny), provide a more convincing analytical tool: stratigraphy and phylogeny offer independent data on history. Assessments of congruence between stratigraphy and phylogeny for a sample of 1,000 published phylogenies show no evidence of diminution of quality backwards in time. Ancient rocks clearly preserve less information, on average, than more recent rocks. However, if scaled to the stratigraphic level of the stage and the taxonomic level of the family, the past 540 million years of the fossil record provide uniformly good documentation of the life of the past.

The reduction in quality of the fossil record backwards in time seems self-evident. Fossils in ancient rocks are more likely to have been eroded, crushed, melted, subducted, not collected or misunderstood than younger fossils5. The demonstration7 that similar patterns of diversification were found from different data sets seemed to indicate that, at least when viewed on a broad scale, the fossil record did correctly document the history of life. Indeed, palaeobiologists subsequently used the fossil record as a literal source of data on the history of the diversification of life8,9 and of mass extinctions10,11 without applying any correction factors for possible time-related bias. However, the different data sets compared7 were all subject to the same geological biasing factors, and that study did not use independent data to demonstrate that older parts of the fossil record could be trusted. This lingering doubt about the long-term quality of the fossil record has resurfaced in debates about the timing of origin of major groups of organisms: molecular studies suggest that the Metazoa (animals)12 and modern bird and mammal orders13,14 apparently originated much earlier than was expected from the fossil evidence. These new results could be mistaken15,16, or they could indicate long episodes of missing fossil record12,13,14. However, there is no doubt that older parts of the fossil record cannot ever be as well known as more recent parts. The question is whether the older fossil records are adequate to recount important events in the history of life.

One solution is to compare independent sources of data. New methods for inferring phylogenies, such as cladistics applied to the morphology of fossil and living taxa, and molecular sequencing techniques, provide representations of evolutionary trees that are essentially independent of stratigraphic data3,17,18,19. Hence, it is possible to compare age data from fossils in the rocks with clade data from molecular and morphological trees using a range of congruence metrics: the stratigraphic consistency index (SCI), the relative completeness index (RCI) and the gap excess ratio (GER) (see Methods).

In our study, we compiled a database of 1,000 morphological and molecular cladograms from the literature (see Methods). The cladograms were sorted into those in which all lineages arose in the Palaeozoic, Mesozoic or Cenozoic, as well as mixed cases with Palaeozoic and post-Palaeozoic origins and mixed Mesozoic/Cenozoic origins. Sample sizes and tree sizes were relatively uniform across all five temporal divisions (Table 1). Remarkably, the relative congruence of age and clade data for all five time divisions showed no clear increase from most ancient to most recent (Table 2; Fig. 1).
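The sorting of cladograms into five time partitions can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and the era boundary ages (245 and 65 Myr) are assumed values chosen for the example rather than figures quoted in the text. Any cladogram with both Palaeozoic and later origins is placed in the mixed Pz/Mz partition, following the description above.

```python
# Era boundaries in Myr before present. These two values are assumptions
# made for illustration; the paper takes its stage dates from ref. 29.
PZ_MZ_BOUNDARY = 245.0  # assumed Palaeozoic/Mesozoic boundary
MZ_CZ_BOUNDARY = 65.0   # assumed Mesozoic/Cenozoic boundary


def era(age_myr):
    """Return the era ('Pz', 'Mz' or 'Cz') containing an origin date."""
    if age_myr > PZ_MZ_BOUNDARY:
        return "Pz"
    if age_myr > MZ_CZ_BOUNDARY:
        return "Mz"
    return "Cz"


def partition(origin_ages):
    """Assign a cladogram to one of the five time partitions,
    given the dates of origin (Myr) of its lineages."""
    eras = {era(age) for age in origin_ages}
    if eras == {"Pz"}:
        return "Pz"
    if "Pz" in eras:
        return "Pz/Mz"  # Palaeozoic plus any post-Palaeozoic origins
    if eras == {"Mz"}:
        return "Mz"
    if eras == {"Cz"}:
        return "Cz"
    return "Mz/Cz"


# Hypothetical cladogram with origins at 300 and 150 Myr
print(partition([300.0, 150.0]))  # Pz/Mz
```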

Figure 1 No change in fossil record quality through time. Mean scores of the stratigraphic consistency index (SCI; open circles), the relative completeness index (RCI; open squares) and the gap excess ratio (GER; filled circles) for five geological time partitions of the data set of 1,000 cladograms. Pz, cladograms with origins solely in the Palaeozoic; Pz/Mz, cladograms with origins spanning the Palaeozoic and Mesozoic; Mz, cladograms with origins solely in the Mesozoic; Mz/Cz, cladograms with origins spanning the Mesozoic and Cenozoic; Cz, cladograms with origins solely in the Cenozoic.

In more detail, SCI values for all time partitions of the data set are very similar, with mean values in the range 0.493-0.618, and with most lying close to the mean value for the whole dataset of 0.551. Indeed, the highest mean SCI value overall, 0.618, is for the Palaeozoic-only values, and the lowest mean values (0.493, 0.496) are for the mixed Palaeozoic/Mesozoic and Mesozoic/Cenozoic partitions (Table 2, Fig. 1). The RCI measures showed greater variation, but no clear relationship to time (Table 2; Fig. 1). Mean values range widely, from 11.362 to 62.064, around the mean for the whole dataset of 31.130. The highest value is for the Palaeozoic partitions, which is not surprising as the RCI depends on a measure of total known range (see Methods), and for clades arising in the Palaeozoic, known ranges may potentially extend to the present day. The GER measures show little evidence of time bias (Table 2; Fig. 1). Mean values range from 0.491 to 0.636, but most lie close to the whole-sample mean of 0.562. The mean value for Palaeozoic cladograms (0.529) is not significantly different from that for the Cenozoic (0.532).

The expected result, that the fossil record of stratigraphic first appearance should become worse with increasing age5, is not confirmed. These counter-intuitive results might arise from temporal biases in the data set19,20,21,22,23, such as (1) variable cladogram quality, (2) differing major taxonomic groups, (3) differing approaches to cladogram construction, (4) the quality of age dates and stratigraphy, (5) the total age span of origins, (6) the taxonomic level of terminal taxa, (7) tree size and (8) tree balance.

The first four of these possible sources of error imply highly unlikely assertions, such as that most Palaeozoic cladograms are markedly better than post-Palaeozoic (1), that the Palaeozoic groups are more amenable to cladistic analysis than the post-Palaeozoic (2), that systematists working on groups with Palaeozoic origins use better techniques than those working on post-Palaeozoic groups (3) and that Palaeozoic age dates and stratigraphy are better than post-Palaeozoic (4).

It could be argued that the first of these problems, variable cladogram quality, makes our results meaningless. The assertion would simply be that cladograms are so full of error that there is no reason to expect congruence with any stratigraphic signal in a large-scale analysis such as ours. There is no absolute measure of cladogram quality. However, cladograms tend to match stratigraphy better than predicted by null models19,20,21,24,25,26, and there is evidence that published morphological and molecular phylogenetic trees are probably generally close to the truth.

The remaining four possible biasing factors all turn out to be evenly distributed through time in the current data set. The total span of time occupied by the origins of the groups in an assessed cladogram (5) can affect all three metrics19,20. However, the situation is complex: all other factors being equal, long origin spans give high SCI values, but the exact opposite applies to the RCI (refs 20, 21). The taxonomic level of terminal taxa (6) might impose a bias: cladograms consisting largely of species and genera tend to fit stratigraphy less well than those based on higher taxa such as families and orders19,20. However, there is no significant variation in taxonomic level through time in our dataset. Tree size (7) has a biasing effect on the SCI: large trees should have lower SCI values than small trees22, but the relationship of RCI and GER values to tree size is unclear21. In the present data set, tree size is uniform across all partitions of the data (Table 1). The final potentially biasing variable is tree balance (8), the variation in overall tree shape from symmetrical, or fully balanced, to pectinate (comb-like), or fully imbalanced. Theoretically, the SCI is negatively correlated with tree imbalance, so that imbalanced trees have lower SCI values than balanced trees22. However, no significant relationship between RCI or GER and tree balance has been found in real data20,21. In any case, tree balance is uniform through all time divisions in our data set.

An alternative assertion could be that the age/clade metrics cannot in fact detect major gaps in the fossil record. Perhaps the fossil record does deteriorate seriously backwards in time, but the preserved parts of the fossil record are equally complete through time. For example, a sequence of Cretaceous fossils with ammonites might be no more complete than a Silurian sequence with graptolites, some 300 million years older. The cladogram metrics could very well give equivalent values. However, perhaps the good Silurian record represents only a tiny fraction of the total record, a much smaller proportion than is represented by the Cretaceous ammonites. In other words, the older record is substantially worse, and our metrics might fail to detect it. This would mean that the fossil record is itself a biased sample of past life. However, soft-bodied organisms are equally poorly known from rocks of all ages, for example. The age/clade metrics circumvent the problem to a certain extent, as the phylogenies include preservable and non-preservable taxa alike. Further tests of the fossil record/true record issue are required. Our results are confirmed by recent studies that consider the preservation potential of different fossil groups30.

Our finding that the fossil record does not diminish in quality with time seems at first impossible, because it is a demonstrable fact that geological activity destroys ancient rocks and ancient fossils5. We should consider the scaling of observations in terms of both stratigraphy and taxonomy. Experience shows that major changes in the dating of fossils do not occur at the level of geological systems or stages, but at the finer divisions of substages and zones. Likewise, orders and families are often relatively stable, while new discoveries constantly alter the definitions of genera and species of fossils. The stability of longer time intervals and larger taxonomic categories perhaps reflects an adequate (if incomplete) fossil record. However, global studies of diversification at species and zonal levels would generally be meaningless because the incompleteness of more ancient parts of the fossil record renders it inadequate in most cases for such studies. It is important to distinguish between 'completeness' and 'adequacy'27. Early parts of the fossil record are clearly incomplete, but they can be regarded as adequate to illustrate the broad patterns of the history of life.

Methods
Data base. The data set consisted of 1,000 cladograms, including one cladogram of 'all life', 33 cladograms of plants, nine cladograms of coelenterates, one cladogram of molluscs, 179 cladograms of arthropods, 14 cladograms of brachiopods, one cladogram each of bryozoans and graptolites, 60 cladograms of echinoderms, 34 cladograms of basal deuterostomes including calcichordates, 157 cladograms of fishes and 510 cladograms of tetrapods, including 26 of amphibians, 203 of reptiles, 8 of birds and 269 of mammals, extracted piecemeal from many sources. The Fossil Record 2 (ref. 28) was the major source of stratigraphic data for dates of origin of families and suprafamilial taxa. Some cladograms included genera and species, and their dates of origin were generally determined from data provided in the paper that presented the cladogram. Origins and extinctions of clades were assessed to the level of the stratigraphic stage (mean duration of the 79 time units used for the Phanerozoic is 6.8 Myr). Geological dates for these stages (in Myr) were taken from one source29. The data set was divided into stratigraphic divisions listed in Table 1, and the entire data set is available as Supplementary Information.

Age versus clade congruence. Three measures were used to assess age versus clade congruence (Fig. 2): the stratigraphic consistency index, SCI (ref. 24), the relative completeness index, RCI (ref. 25) and the gap excess ratio, GER (ref. 21). The SCI is the proportion of nodes in a cladogram that are stratigraphically consistent; it can range from 0 to 1.0 in a fully pectinate (unbalanced) tree, but the minimum value lies between 0 and 0.5 in balanced trees21,22. The RCI and GER depend on numerical age estimates of the branching points on a cladogram, and the calculation of 'ghost ranges'. The ghost range26 is the difference in age, or number of stratigraphic intervals, between the oldest known fossils of two sister taxa. The RCI is one minus the ratio of the sum of ghost ranges to the sum of recorded fossil ranges in a cladogram, expressed as a percentage. The GER focuses solely on the estimated dates of origin of groups, and compares the sum of actual ghost ranges in a cladogram with the theoretical minimum and maximum ghost ranges if the various branches in the cladogram are rearranged. The metrics were calculated using the software 'Ghosts 2.4', developed by M.A.W., which assesses all three metrics (SCI, RCI and GER) for individual cladograms, or for large batches of cladograms (available from http://palaeo.gly.bris.ac.uk/cladestrat/cladestrat.html).
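The ghost range and SCI calculations can be sketched in a few lines of Python. This is an illustrative sketch only and is not the 'Ghosts 2.4' code: it assumes a strictly binary cladogram written as nested tuples, terminal taxa given as strings, and a dictionary of first-appearance ages in Myr (larger values are older). The function names, the example tree and its ages are all hypothetical. A node is counted as consistent when its oldest fossil is the same age as, or younger than, the oldest fossil of its sister taxon, and the root is excluded because it has no sister.

```python
def oldest(node, fad):
    """Oldest first appearance (Myr) among the terminals below a node."""
    if isinstance(node, str):
        return fad[node]
    return max(oldest(child, fad) for child in node)


def sum_ghost_ranges(node, fad):
    """Sum of ghost ranges implied by a binary tree.

    Sister taxa must originate together, so the younger sister carries a
    ghost range equal to the age difference between their oldest fossils.
    """
    if isinstance(node, str):
        return 0.0
    left, right = node
    gap = abs(oldest(left, fad) - oldest(right, fad))
    return gap + sum_ghost_ranges(left, fad) + sum_ghost_ranges(right, fad)


def sci(tree, fad):
    """Proportion of internal nodes (root excluded) that are consistent."""
    consistent = total = 0
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        left, right = node
        for clade, sister in ((left, right), (right, left)):
            if not isinstance(clade, str):  # only internal nodes are scored
                total += 1
                if oldest(clade, fad) <= oldest(sister, fad):
                    consistent += 1
        stack.extend(node)
    if total == 0:
        raise ValueError("tree has no internal nodes other than the root")
    return consistent / total


# Hypothetical four-taxon pectinate tree and first-appearance ages (Myr)
tree = ((("A", "B"), "C"), "D")
fad = {"A": 100.0, "B": 80.0, "C": 120.0, "D": 90.0}

print(sum_ghost_ranges(tree, fad))  # 70.0
print(sci(tree, fad))               # 0.5
```

In this example the node (A, B) is consistent because its oldest fossil (100 Myr) is younger than that of its sister C (120 Myr), whereas the node ((A, B), C) is inconsistent because it is older than its sister D (90 Myr).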

Figure 2 Calculation of the three congruence metrics for age versus clade comparisons. SCI is the proportion of nodes in a cladogram that are stratigraphically consistent. RCI is $\mathrm{RCI} = \left(1 - \frac{\sum \mathrm{MIG}}{\sum \mathrm{SRL}}\right) \times 100\%$, where MIG is minimum implied gap, or ghost range, and SRL is standard range length, the known fossil record. GER is $\mathrm{GER} = 1 - \frac{\sum \mathrm{MIG} - G_{\min}}{G_{\max} - G_{\min}}$, where $G_{\min}$ is the minimum possible sum of ghost ranges and $G_{\max}$ the maximum, for any given distribution of origination dates. a, The observed tree with SCI calculated according to the distribution of ranges in b. b, The observed tree and observed distribution of stratigraphic range data, yielding an RCI of 66.0%. GER is derived from $G_{\min}$ and $G_{\max}$ values calculated in c and d. c, The stratigraphic ranges from b rearranged on a pectinate tree to yield the smallest possible MIG, or $G_{\min}$. d, The stratigraphic ranges from b rearranged on a pectinate tree to yield the largest possible MIG, or $G_{\max}$.
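As a worked illustration of the two range-based metrics, the sketch below computes RCI as (1 - sum of MIG / sum of SRL) x 100% and GER as 1 - (sum of MIG - Gmin)/(Gmax - Gmin). It is a minimal sketch under stated assumptions, not the published software: the function names and all input numbers are hypothetical. Gmin is taken as the age difference between the oldest and youngest first appearances, and Gmax as the sum of the differences between the oldest first appearance and every other one, which are the values obtained from the best and worst pectinate arrangements.

```python
def rci(sum_mig, sum_srl):
    """Relative completeness index, in per cent."""
    return (1.0 - sum_mig / sum_srl) * 100.0


def ger(sum_mig, first_appearances):
    """Gap excess ratio for a set of first-appearance ages (Myr)."""
    oldest_fad = max(first_appearances)
    # Best case: a single ghost lineage spanning youngest to oldest origin
    g_min = oldest_fad - min(first_appearances)
    # Worst case: every taxon is dragged back to the oldest origin
    g_max = sum(oldest_fad - age for age in first_appearances)
    if g_max == g_min:
        raise ValueError("GER is undefined when Gmax equals Gmin")
    return 1.0 - (sum_mig - g_min) / (g_max - g_min)


# Hypothetical inputs: summed ghost ranges of 70 Myr, summed known ranges
# of 200 Myr, and four first appearances at 100, 80, 120 and 90 Myr.
print(round(rci(70.0, 200.0), 3))                      # 65.0
print(round(ger(70.0, [100.0, 80.0, 120.0, 90.0]), 3))  # 0.4
```

Here Gmin is 120 - 80 = 40 Myr and Gmax is 20 + 40 + 0 + 30 = 90 Myr, so the observed 70 Myr of ghost range lies 30 Myr above the minimum out of a possible 50 Myr, giving a GER of 0.4.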

Supplementary information is available on Nature's World-Wide Web site (http://www.nature.com) or as paper copy from the London editorial office of Nature; and may also be viewed at http://palaeo.gly.bris.ac.uk/cladestrat/cladestrat.html.

Received 7 June; accepted 15 November 1999.

------------------

References

  1. Simpson, G. G. Tempo and Mode in Evolution (Columbia Univ. Press, New York, 1944).
  2. Valentine, J. W. Patterns of taxonomic and ecological structure of the shelf benthos during Phanerozoic time. Palaeontology 12, 684-709 (1969).
  3. Smith, A. B. Systematics and the Fossil Record (Blackwell, Oxford, 1994).
  4. Hennig, W. Phylogenetic Systematics (Univ. of Illinois Press, Urbana, 1966).
  5. Raup, D. M. Taxonomic diversity during the Phanerozoic. Science 177, 1065-1071 (1972).
  6. Patterson, C. Significance of fossils in determining evolutionary relationships. Annu. Rev. Ecol. Syst. 12, 195-223 (1981).
  7. Sepkoski, J. J. Jr, Bambach, R. K., Raup, D. M. & Valentine, J. W. Phanerozoic marine diversity and the fossil record. Nature 293, 435-437 (1981).
  8. Sepkoski, J. J. Jr A kinetic model of Phanerozoic taxonomic diversity. III. Post-Paleozoic families and mass extinctions. Paleobiology 10, 246-267 (1984).
  9. Benton, M. J. Diversification and extinction in the history of life. Science 268, 52-58 (1995).
  10. Raup, D. M. & Sepkoski, J. J. Jr Mass extinctions in the marine fossil record. Science 215, 1501-1503 (1982).
  11. Raup, D. M. & Sepkoski, J. J. Jr Periodicity of extinctions in the geologic past. Proc. Natl Acad. Sci. USA 81, 801-805 (1984).
  12. Wray, G. A., Levinton, J. S. & Shapiro, L. H. Molecular evidence for deep Precambrian divergences among metazoan phyla. Science 274, 568-573 (1996).
  13. Cooper, A. & Penny, D. Mass survival of birds across the Cretaceous-Tertiary boundary: molecular evidence. Science 275, 1109-1113 (1997).
  14. Kumar, S. & Hedges, S. B. A molecular timescale for vertebrate evolution. Nature 392, 917-920 (1998).
  15. Ayala, F. J., Rzhetsky, A. & Ayala, F. J. Origin of metazoan phyla: molecular clocks confirm paleontological estimates. Proc. Natl Acad. Sci. USA 95, 606-611 (1998).
  16. Foote, M., Hunter, J. P., Janis, C. M. & Sepkoski, J. J. Jr Evolutionary and preservational constraints on origins of biologic groups: divergence times of eutherian mammals. Science 283, 1310-1314 (1999).
  17. Forey, P. L. et al. Cladistics: A Practical Course in Systematics (Clarendon, Oxford, 1992).
  18. Hillis, D. M., Moritz, C. & Mable, B. K. Molecular Systematics 2nd edn (Sinauer, Sunderland, MA, 1996).
  19. Benton, M. J. & Hitchin, R. Testing the quality of the fossil record by groups and by major habitats. Historical Biol. 12, 111-157 (1996).
  20. Benton, M. J., Hitchin, R. & Wills, M. A. Assessing congruence between cladistic and stratigraphic data. Syst. Biol. 48, 581-596 (1999).
  21. Wills, M. A. The gap excess ratio, randomization tests, and the goodness of fit of trees to stratigraphy. Syst. Biol. 48, 559-580 (1999).
  22. Siddall, M. E. Stratigraphic consistency and the shape of things. Syst. Biol. 45, 111-115 (1996).
  23. Wagner, P. J. in The Adequacy of the Fossil Record (eds Donovan, S. K. & Paul, C. R. C.) 165-187 (Wiley, New York, 1998).
  24. Huelsenbeck, J. P. Comparing the stratigraphic record to estimates of phylogeny. Paleobiology 20, 470-483 (1994).
  25. Benton, M. J. & Storrs, G. W. Testing the quality of the fossil record: paleontological knowledge is improving. Geology 22, 111-114 (1994).
  26. Norell, M. A. in Extinction and Phylogeny (eds Novacek, M. J. & Wheeler, Q. D.) 89-118 (Columbia Univ. Press, New York, 1992).
  27. Paul, C. R. C. in The Adequacy of the Fossil Record (eds Donovan, S. K. & Paul, C. R. C.) 1-22 (Wiley, New York, 1998).
  28. Benton, M. J. The Fossil Record 2 (Chapman & Hall, London, 1993).
  29. Harland, W. B. et al. A Geologic Time Scale 1989 (Cambridge Univ. Press, Cambridge, 1993).
  30. Foote, M. & Sepkoski, J. J. Jr Absolute measures of the completeness of the fossil record. Nature 398, 415-417 (1999).

Acknowledgements. We thank the Leverhulme Trust and NERC for continued funding of our work, and E. Fara, M. Foote and P. N. Pearson for helpful comments on the manuscript.

 

Correspondence and requests for materials should be addressed to M. J. Benton (e-mail: mike.benton@bris.ac.uk).