Demography of Germany

Concepts, Data, and Methods
G. Rohwer, U. Pötter

Version 3

October 2003

Fakultät für Sozialwissenschaft
Ruhr-Universität Bochum, GB 1
44780 Bochum
goetz.rohwer@ruhr-uni-bochum.de
ulrich.poetter@ruhr-uni-bochum.de

Preface
This text is an introduction to concepts and methods of demographic description and analysis. Its substantive focus is on the demographic development of Germany, and all data refer to this country. We focus on a single country mainly because we want to show how the tools of demography can actually be used to analyze demographic problems. The text consists of two parts. Part I introduces the conceptual framework and explains basic statistical notions. This part also includes a short chapter that explains how we speak of “models” and why we do not make a sharp distinction between “describing” and “modeling” demographic processes. Part II then deals with data and methods. In the present version of the text, we almost exclusively discuss mortality and fertility data. Migration is only mentioned in Chapter 6 and briefly considered in the context of a Leslie model at the end of the text. Besides providing a general introduction to concepts of demography, the text also intends to show how to work with demographic data in practice. We therefore document all the data used extensively and explain the statistical calculations in detail. In fact, most of these calculations are quite simple. The only exception is the discussion of Leslie models in Chapters 17 and 18, which requires some knowledge of matrix algebra. Apart from these chapters, the text has been written so that it can serve as an introduction to elementary statistical methods. The basic approach is identical to that of the author's Grundzüge der sozialwissenschaftlichen Statistik (2001). Virtually no previous knowledge of statistical methods is required to understand the present text. Some notations from set theory that we use are explained in Appendix A.2. Most of the data used in this text are taken from publications of official statistics in Germany (Appendix A.1 provides a brief introduction to data sources).
We are grateful to Hans-Peter Bosse of the Statistisches Bundesamt, who provided us with some unpublished materials. We also thank Bernhard Schimpl-Neimanns of ZUMA (Mannheim), who prepared a table with birth data from the 1970 census that we have used for several analyses. In addition, we have used several data files from non-official sources. These include data from the German Life History Study (Max Planck Institut für Bildungsforschung, Berlin), the Socio-economic Panel (Deutsches Institut für Wirtschaftsforschung, Berlin), the Fertility and Family Survey (Bundesinstitut für Bevölkerungsforschung, Wiesbaden) and the DJI Family Surveys (Deutsches Jugendinstitut, München). We have also used historical data on mortality prepared by Arthur E. Imhof and his coworkers (1990). All these data sets can be obtained from the Zentralarchiv für Empirische Sozialforschung in Köln.

The extensive documentation of the data is also intended to allow readers to replicate our calculations. Many calculations can simply be done with paper and pencil. If the amount of data is somewhat larger, one might want to use a computer. Several statistical packages are publicly available. We have used the program TDA, which is available from the author's home page: www.stat.ruhr-uni-bochum/tda.html. This program was also used to create all of the figures in this text.

For helpful comments and discussions we thank, in particular, Gert Hullen (Bundesinstitut für Bevölkerungsforschung) and Bernhard Schimpl-Neimanns (ZUMA).

Bochum, March 2003
G. Rohwer, U. Pötter

Contents
1 Introduction

Part I Conceptual Framework
2 Temporal References
  2.1 Events and Temporal Locations
  2.2 Duration and Calendar Time
  2.3 Calculations with Calendar Time
  2.4 Limitations of Accuracy
3 Demographic Processes
  3.1 A Rudimentary Framework
  3.2 Representation of Processes
  3.3 Stocks, Flows, and Rates
  3.4 Age and Cohorts
4 Variables and Distributions
  4.1 Statistical Variables
  4.2 Statistical Distributions
  4.3 Remarks about Notations
5 Modal Questions and Models

Part II Data and Methods
6 Basic Demographic Data
  6.1 Data Sources
  6.2 Number of People
  6.3 Births and Deaths
  6.4 Accounting Equations
  6.5 Age and Sex Distributions
    6.5.1 Age Distributions
    6.5.2 Decomposition by Sex
    6.5.3 Male-Female Proportions
    6.5.4 Aggregating Age Values
    6.5.5 Age Distributions since 1952
7 Mortality and Life Tables
  7.1 Mortality Rates
  7.2 Mean Age at Death
  7.3 Life Tables
    7.3.1 Duration Variables
    7.3.2 Cohort and Period Life Tables
    7.3.3 Conditional Life Length
  7.4 Official Life Tables in Germany
    7.4.1 Introductory Remarks
    7.4.2 General Life Tables 1871–1988
    7.4.3 Increases in Mean Life Length
    7.4.4 Life Table Age Distributions
8 Mortality of Cohorts
  8.1 Cohort Death Rates
  8.2 Reconstruction from Period Data
  8.3 Historical Data
    8.3.1 Data Description
    8.3.2 Parent's Survivor Functions
    8.3.3 Children's Survivor Functions
    8.3.4 The Kaplan-Meier Procedure
  8.4 Mortality Data from Panel Studies
9 Parent's Length of Life
  9.1 Left Truncated Data
  9.2 Selection by Survival
    9.2.1 The Simulation Model
    9.2.2 Considering Left Truncation
    9.2.3 Using Information from Children
    9.2.4 Retrospective Surveys
  9.3 Inferences from the GLHS and SOEP Data
    9.3.1 Description of the Data
    9.3.2 Survivor Functions of Parents
    9.3.3 Visualization of Death Rates
10 Parametric Mortality Curves
11 Period and Cohort Birth Rates
  11.1 Birth Rates
  11.2 A Life Course Perspective
  11.3 Childbearing and Marriage
  11.4 Birth Rates in a Cohort View
12 Retrospective Surveys
  12.1 Introduction and Notations
  12.2 Data from the 1970 Census
    12.2.1 Sources and Limitations
    12.2.2 Age at First Childbearing
    12.2.3 Age-specific Birth Rates
    12.2.4 Number of Children
    12.2.5 Timing of Births
13 Births in the Period 1950–1970
  13.1 Age-specific Birth Rates
  13.2 Parity-specific Birth Rates
  13.3 Understanding the Baby Boom
    13.3.1 Number and Timing of Births
    13.3.2 Performing the Calculations
    13.3.3 Extending the Simulation Period
14 Data from Non-official Surveys
  14.1 German Life History Study
  14.2 Socio-economic Panel
  14.3 Fertility and Family Survey
  14.4 DJI Family Surveys
15 Birth Rates in East Germany
16 In- and Out-Migration
17 An Analytical Modeling Approach
  17.1 Conceptual Framework
  17.2 The Stable Population
  17.3 Mathematical Supplements
  17.4 Female and Male Populations
  17.5 Practical Calculations
    17.5.1 Two Calculation Methods
    17.5.2 Calculations for Germany 1999
18 Conditions of Population Growth
  18.1 Reproduction Rates
  18.2 Relationship with Growth Rates
  18.3 The Distance of Generations
  18.4 Growth Rates and Age Distributions
  18.5 Declining Importance of Death Rates
  18.6 Population Growth with Immigration
A Appendix
  A.1 Data from Official Statistics
  A.2 Sets and Functions
References
Name Index
Subject Index

Chapter 1

Introduction
1. The present text is an introduction to concepts and methods of demography, illustrated with data on the demographic development of Germany. The basic idea is to think of a society as a population [Bevölkerung], that is, a set of people. This is the common starting point of almost all demographic investigations and of many definitions of demography. As an example, we cite the following definition from a dictionary published by the United Nations (1958, p. 3):1
“Demography is the scientiﬁc study of human populations, primarily with respect to their size, their structure and their development.”

In a German adaptation of the dictionary by Winkler (1960, p. 17) this deﬁnition reads as follows:
“Demography (population science, population theory) is the science that deals, mainly from a quantitative point of view, with the study of human populations: their number (size), their classification by general characteristics (structure), and their development.” (translated)

Given this understanding, demography is concerned with human populations.

2. The focus on populations also provides a view of society. In this view a society is simply a population, a set of people living in some region, however that region is demarcated. It might be objected that such a view is very incomplete because human societies do not consist only of people. However, it would be difficult to add further characterizations to the definition of a society. All too often the result is no longer a definition but an obscure and dubious statement. The following quotation from Matras (1973, p. 57) can serve as an example:
“As a working deﬁnition, we may say that a society is a human population organized, or characterized, by patterns of social relationships for the purpose of collective survival in, and adaptation to, its environment.”

This clearly is no longer a definition but an obscure formulation of a dubious assumption. Starting from the idea that a society is a human population, one can of course describe institutional arrangements and reflect on specific purposes that such arrangements may serve. But such statements can only result from an investigation; they cannot be anticipated in a definition.

3. The demographic view of society is closely linked with a statistical approach. To a large extent, demography is the application of statistical methods to the study of the development of human populations. This idea has accompanied the history of demography from its beginning.2 Conversely, demography inspired many developments in statistics. The fundamental role played by the word ‘population’ in the statistical literature is but one indicator. This term has often been used to define statistics. For example, Maurice Kendall and Alan Stuart begin their “Advanced Theory of Statistics” (1977, p. 1) as follows:

“The fundamental notion in statistical theory is that of the group or aggregate, a concept for which statisticians use a special word – “population”. This term will be generally employed to denote any collection of objects under consideration, whether animate or inanimate; for example, we shall consider populations of men, of plants, of mistakes in reading a scale, of barometric heights on diﬀerent days, and even populations of ideas, such as that of the possible ways in which a hand of cards might be dealt. [. . .] The science of Statistics deals with the properties of populations. In considering a population of men we are not interested, statistically speaking, in whether some particular individual has brown eyes or is a forger, but rather in how many of the individuals have brown eyes or are forgers, and whether the possession of brown eyes goes with a propensity to forgery in the population. We are, so to speak, concerned with the properties of the population itself. Such a standpoint can occur in physics as well as in demographic sciences.”

In so far as demography applies a statistical view to human populations, these remarks also help us understand demography. The concern is with properties of populations, not with their individual members.

1 In this text we distinguish between single and double quotation marks. Single quotation marks are used to refer to linguistic expressions; for example, to say that we are referring to the term ‘social structure’. Double quotation marks are used either for citations or to indicate that an expression has no clear meaning or that it is used in a metaphorical way. Within citations, we try to reproduce quotation marks in their original form. If we add something inside a quotation this will be marked by square brackets.
2 For an informative overview see Lorimer (1959).

4. Populations do not have properties in an empirical sense of the word, so one also needs to understand how demographers construct such properties with statistical concepts. This will be discussed at length in subsequent chapters. Here we only mention that statistically construed properties of populations are always conceptually derived from properties of their individual members. For example, each member of a human population can be assigned a sex. The population can then be characterized by two figures that report the proportions of male and female members. This also provides a simple example of a statistical distribution: every individual property is assigned the relative frequency (proportion) of its occurrence in a population.

5. Almost always this is also what statisticians, including demographers, mean when they speak of the “structure” of a population: an account of the frequencies of some individual properties in a population. Here are some examples from the demographic literature:

“Demography is the discipline that seeks a statistical description of human populations with respect to (1) their demographic structure (the number of the population; its composition by sex, age and marital status; statistics of families, and so on) at a given date, and (2) the demographic events (births, deaths, marriages and terminations of marriages) that take place in them.” (Pressat 1972, p. 1)

“The demographic structure of a population is understood to be its breakdown by demographic characteristics.” (Feichtinger 1973, p. 26; translated)

“The structure of a given population is described by the absolute number of its units and by the distribution of the values of the characteristics of interest among the units of this population at a given point in time t.” (Mueller 1993, p. 2; translated)

Sociologists often use the word ‘structure’ in different meanings. A frequent connotation is that “structure” in some way determines conditions for the behavior of the individual members of a society. It is therefore important to note that this cannot be said of a statistical distribution.

6. To speak sensibly of conditions, one would need to think of the individual members of a society as actors whose possible actions depend in some way on a given environment. The statistical view is quite different. Statistics has no conceptual framework for referring to actors. Moreover, as the above quotation from Kendall and Stuart shows, it does not refer to individuals either. Instead, the focus is on populations. Wilhelm Lexis, for example, clearly recognized this:

“In forming masses for statistical observation, the individual as such disappears; it appears only as one unit among a number of similar members that have certain characteristics in common and whose other individual differences are abstracted from.” (Lexis 1875, p. 1; translated)

Another author expressed the same idea in the following way:

“Within demography, an individual biography is of interest only as an element of the collective history of the group to which the individual belongs.” (Feichtinger 1979, p. 13; translated)
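Paragraph 4 describes how a property of a population, such as its sex distribution, is derived from properties of its members. Each property is assigned its relative frequency. The following Python sketch illustrates that calculation. The function name and the five-member population are hypothetical illustrations, not data used in this text.

```python
from collections import Counter


def relative_frequencies(values):
    """Map every property occurring in a population to its proportion."""
    if not values:
        raise ValueError("the population must not be empty")
    counts = Counter(values)  # absolute frequencies of each property
    n = len(values)           # size of the population
    return {value: count / n for value, count in counts.items()}


# Hypothetical population of five members, each assigned a sex.
members_sex = ["female", "male", "female", "female", "male"]
print(relative_frequencies(members_sex))  # {'female': 0.6, 'male': 0.4}
```

The proportions add up to one. This is what allows the two figures to characterize the population as a whole rather than any particular member.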

7. The method of characterizing populations by statistical distributions (of individual properties) is obviously quite general. Almost all properties that can sensibly be used to characterize individuals can also be used to derive statistical distributions that characterize populations. Statistical methods are therefore used not only in demography but, more or less extensively, in almost all empirical social research. In fact, there is no clear demarcation between demography and other branches of social research. Some authors have therefore proposed distinguishing between demographic analysis in a narrow sense, also called formal demography, and a wider field, often called population studies.3 Given this distinction, the present text is concerned only with formal demography.4 The following explanation is taken from a widely known textbook by Shryock and Siegel (1976, p. 1):
“Formal demography is concerned with the size, distribution, structure, and change of populations. Size is simply the number of units (persons) in the population. Distribution refers to the arrangement of the population in space at a given time, that is, geographically or among various types of residential areas. Structure, in its narrowest sense, is the distribution of the population among its sex and age groupings. Change is the growth or decline of the total population or of one of its structural units. The components of change in total population are births, deaths, and migrations.”
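The components of change named in the quotation combine into what demographers commonly call the balancing (or accounting) equation. The population at the end of a period equals the population at its beginning, plus births, minus deaths, plus in-migrants, minus out-migrants. The Python sketch below assumes a single accounting period and uses a hypothetical function name with invented numbers, not German data.

```python
def population_at_end(start, births, deaths, in_migrants, out_migrants):
    """Balancing equation for one accounting period (hypothetical helper):
    end = start + births - deaths + in-migrants - out-migrants."""
    natural_increase = births - deaths
    net_migration = in_migrants - out_migrants
    return start + natural_increase + net_migration


# Invented numbers for illustration only.
print(population_at_end(1000, births=12, deaths=10,
                        in_migrants=5, out_migrants=3))  # 1004
```

Splitting the change into natural increase and net migration mirrors the distinction between births and deaths on the one hand and migrations on the other.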

Shryock and Siegel's explanation of formal demography is quite similar to the understanding of Bevölkerungsstatistik in the older German literature.5 It is also similar to the definition of demography cited at the beginning of this chapter.6 The quotation also shows once more that the term ‘structure’ is used synonymously with ‘statistical distribution’. On the other hand, the word ‘distribution’ is not used here to refer to a statistical distribution but to “the arrangement of the population in space”. This topic, including internal migration, will not be discussed systematically in the present text. However, demographic data provided by official statistics are always limited to bounded regions, historically defined as “nation states”. One therefore cannot avoid taking in- and out-migration into account. This is true in particular for the demographic development of Germany, which is the empirical concern of the present text.

3 See Hauser and Duncan (1959, pp. 2-3), and Shryock and Siegel (1976, p. 1). In the older German literature a similar distinction was made between Bevölkerungsstatistik and Bevölkerungslehre, see v. Bortkiewicz (1919).
4 This is not to deny the importance of many questions discussed under the heading of population studies. However, we will not try to make this a special sub-discipline of social science. Instead, we consider reflection on demographic developments to be an essential part of almost all investigations of social structure.
5 As an example, we cite L. v. Bortkiewicz (1919, p. 3): “If, however, a special scientific study of the population is to be undertaken, it cannot possibly consist of a discussion of everything that in any way concerns its well-being. Rather, the task is first to consider the population as an undifferentiated mass of people and to present its spatial distribution and the changes of its size over time. It then clarifies its composition by certain natural characteristics, above all by sex and by age. Following this, it goes back to the immediate causes of its respective state, which are, in the first place, births and deaths and, in the second place, migrations. This indicates the subject matter of population statistics (Bevölkerungsstatistik) in the traditional sense of this word.” (translated)
6 Since there is a common conceptual framework, it seems unnecessary, as proposed by the cited dictionary, to distinguish explicitly between formal demography and population statistics.

Part I

Conceptual Framework

Chapter 2

Temporal References
The chapters in Part I of this text briefly introduce the conceptual framework used to develop a demographic view of society. The main conceptual tool is the notion of a ‘demographic process’. An explicit definition will be given in the next chapter. The present chapter deals with a preliminary question: what is a suitable temporal framework? Technically, one uses a time axis that allows one to locate events in time. But how should a time axis be represented? There are two general approaches:
a) One approach treats time as a sequence of temporal locations (e.g., minutes or days) and represents time by integers with an arbitrarily fixed origin. This is called a discrete time axis.
b) Another approach treats time as a continuum (a “continuous flow of time”) and represents a time axis by the set of real numbers. This is called a continuous time axis.
A time axis provides a conceptual framework for representing phenomena that occur “in time”. The choice between the two approaches should therefore depend on the kind of phenomena that one wants to describe and analyze. In demographic and, more generally, social research, the primary phenomena are events, for example, births and deaths. Thinking in terms of a continuous time axis would require conceiving of events as “instantaneous changes”. This approach is quite widespread in the demographic literature,1 but it conflicts with the simple fact that events always need some time to occur. As we will try to show in the present chapter, this suggests representing a time axis by real numbers but thinking of temporal locations not as “time points” but as temporal intervals. Within such a framework a discrete time axis arises as a special case, namely when all intervals are assumed to have equal length. This assumption is sufficient for most practical purposes and also greatly simplifies the mathematics.
Therefore, in later chapters, we most often use a discrete time axis whose temporal locations are to be understood as temporal intervals of equal length. In the present chapter we first discuss our notion of events and how thinking in terms of events allows temporal references. We then deal with ways of quantifying temporal references.
1 Examples of textbooks that use a continuous time axis for temporal references are Keyfitz (1977) and Dinkel (1989).
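The idea that a discrete time axis arises from temporal intervals of equal length can be made concrete with a small Python sketch. It assumes that each temporal location is a half-open interval [begin, end) on the real line and that the intervals start at a freely chosen origin. This is our own convention for the illustration, not the text's. All names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TLocation:
    """A temporal location, assumed to be a half-open interval [begin, end)."""
    begin: float
    end: float


def discrete_index(t, origin=0.0, length=1.0):
    """Integer k of the interval [origin + k*length, origin + (k+1)*length) containing t."""
    if length <= 0:
        raise ValueError("interval length must be positive")
    return math.floor((t - origin) / length)


def interval_of(k, origin=0.0, length=1.0):
    """The temporal interval represented by the integer k on the discrete axis."""
    return TLocation(origin + k * length, origin + (k + 1) * length)


# Example: calendar years as intervals of length 1, with the origin at 1950.
k = discrete_index(1970.4, origin=1950.0)
print(k, interval_of(k, origin=1950.0))  # 20 TLocation(begin=1970.0, end=1971.0)
```

In this reading, the integer k does not stand for a time point. It names a whole interval, and all intervals share the same length.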


2.1

Events and Temporal Locations

1. We have all learned to make temporal references by using clocks and calendars and to think of time as a linearly ordered time axis. But leaving clocks and calendars aside for the moment, what enables us to speak about time? One possible approach begins with events. This notion is extremely general and therefore quite difficult to make precise. For the present purpose, however, it seems possible to set philosophical discussions aside and simply take a common-sense view of events.2 The following four points seem to be essential.
• The occurrence of an event always involves one or more objects.
• Each event has some finite temporal duration.
• For many events one can say that one event occurred earlier than another event.
• Events can be characterized, and classified, by using the linguistic construct of kinds of events.

2. Given these assumptions, it is first of all important to distinguish between events and kinds of events. An event is unique; it occurs exactly once. On the other hand, several events can be of the same kind, for example, marriages. Therefore, characterizing an event as being of a certain kind does not give a unique description. Furthermore, an event does not necessarily belong to only a single kind of event. Most often one can characterize an event as an instance of several different kinds of events. For example, an event that is a marriage can also be a first marriage.

3. Common language clearly distinguishes between objects and events. Still, one might well see a certain correspondence between objects and their properties on the one hand, and events and kinds of events on the other. This has led some authors (e.g., Brand 1982) to think of objects and events as being ontologically similar. Even without defending this position, we will assume that talking of events always implies a reference to objects. The idea is that it should be possible to associate with each event some objects that are involved in it. Of course, these objects need not be individuals in the sense of behavioral units.

4. On the common-sense view, it also seems obvious that events occur “in time”. The notion of event therefore provides a way to think about time. We assume that one can associate with each event a temporal location. In the following, we use the letter e to refer to an event and t(e) to denote its temporal location; t(e) will be called the t-location of the event e. A strict definition cannot be given, but it is important not to think of t-locations as “time points”. Quite to the contrary, one of the most basic facts about events is that each event has a certain temporal duration. This is obvious in standard examples of events. It also seems logically implied if we think of events in terms of change, since change always needs some amount of time. This has an important further implication. Only when an event has occurred, and has consequently become a fact belonging to past history, can we say that the event has, in fact, occurred. We cannot say this while the event is occurring.3

5. Thinking of events in terms of change is essential for the common-sense view of events that we try to follow here. Without a change nothing occurs. Fortunately, one need not be very specific about what kinds of changes occur. Also, it does not matter whether these changes occur “continuously” or “instantaneously”, as long as we require that the event has some temporal duration. The event is defined by what happened during its occurrence and must therefore be taken as a whole. Of course, one might be able to describe the event in terms of smaller sub-events, but these will then simply be different events. An event is semantically indivisible. In particular, the beginning of an event is not itself an event and consequently has no t-location.

6. Finally, it is important that one can often say of two events that one occurred earlier than the other. Of course, this cannot always be said: one event may occur while another is occurring. However, in many clear examples we have no difficulty saying that one event occurred earlier than another one. We therefore assume that the following partial order relations are available when talking about events (e and e′ are used to denote events):

$e \succeq e'$   meaning: e begins not earlier than e′
$e \gg e'$   meaning: e begins not before e′ is finished
$e \sqsubseteq e'$   meaning: e occurs while e′ occurs

We also write $e \succ e'$ if $e \succeq e'$ and not $e' \succeq e$. All relations are only partial order relations. Nevertheless, they can be used to define corresponding relations between the t-locations of events. We use the same symbols:

$t(e) \succeq t(e') \iff e \succeq e'$
$t(e) \gg t(e') \iff e \gg e'$
$t(e) \sqsubseteq t(e') \iff e \sqsubseteq e'$

We will say that a set of events is equipped with a qualitatively ordered time axis if these three relations are available.

2 For related philosophical discussion see Hacker (1982) and Lombard (1986).
3 Thinking of human actions as particular types of events, this implication has been described by Danto (1985, p. 284) as follows: “Not knowing how our actions will be seen from the vantage point of history, we to that degree lack control over the present. If there is such a thing as inevitability in history, it is not so much due to social processes moving forward under their own steam and in accordance with their own natures, as it is to the fact that by the time it is clear what we have done, it is too late to do anything about it.”

7. As an illustration consider the four events in Figure 2.1-1. Order relations hold between the pairs (e1, e2), (e1, e3), (e1, e4), (e2, e3), (e2, e4), and (e3, e4) [the relation symbols of the original list are not recoverable].

Fig. 2.1-1 Illustration of order relations between four events on a qualitatively ordered time axis. [figure not reproduced]

Fig. 2.1-2 Graph illustration of one of the order relations between the four events shown in Figure 2.1-1. [figure not reproduced; relation symbol not recoverable]

Composing Events

8. Our language is flexible enough to compose two (or more) events into larger events. As an example one can think of clock ticks as elementary events. It seems quite possible to think of two or more successive clock ticks as events as well. To capture this idea formally, one can introduce a binary operator, $\sqcup$, that allows one to create new events (linguistically). The rule is: if e and e′ are two events, then $e \sqcup e'$ is also an event. Events created by using the operator $\sqcup$ will be called composed events. When classes of events are considered, one can assume that these are closed with respect to $\sqcup$. The time order relations defined above can then be extended to composed events in the following way:

$e'' \succeq e \sqcup e' \iff e'' \succeq e \text{ or } e'' \succeq e'$
$e'' \gg e \sqcup e' \iff e'' \gg e \text{ and } e'' \gg e'$
$e \sqcup e' \sqsubseteq e'' \iff e \sqsubseteq e'' \text{ and } e' \sqsubseteq e''$

This also allows us to introduce the notion of an elementary event. One possible definition is that an event, say e, is an elementary event if there is no other event e′ such that $e' \sqsubseteq e$. Under this definition, elementary events cannot be divided into smaller events with respect to a given class of events.

9. It might seem questionable whether elementary events exist. When describing an event, it often seems possible to describe it in terms of ever smaller sub-events, without a definite limit. However, we are not concerned here with the ontological status of events. Whether or not events can be described in terms of smaller sub-events, any talk about events must assume some “universe of discourse” that provides the necessary linguistic tools. This justifies the assumption that one can single out a finite number of elementary events from any finite collection of events.

10. Interestingly, it seems impossible to define a converse operation, $\sqcap$, by interpreting $e \sqcap e'$ as occurring while both events, e and e′, are occurring. The reason is that we should be able to say that an event has, in fact, occurred as soon as the event no longer occurs. But this condition will in general not hold for $e \sqcap e'$, because one can only say that e and e′ occurred when both are over. There is, therefore, no obvious way to define an algebra of events.

e1 e 3 , e1 e 4 , e2 e 3 , e2 e 4

Of course, on a qualitatively ordered time axis, the lengths of the line segments used in Figure 2.1-1 to represent events do not have a quantitative meaning in terms of duration. This becomes clear if one represents the order relations between events by means of a directed graph. This is illustrated in Figure 2.1-2 where the arcs represent the relation between the events.
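The text deliberately avoids treating t-locations as time points. Still, to check the three order relations and the rules for composed events, it can help to give events hypothetical numeric beginnings and ends. The following Python sketch does this under that assumption. The class `Event`, the function names `not_later`, `before`, and `within` (standing for $\preceq$, $\prec$, and $\sqsubseteq$), and the coordinates of the four events are our own illustrative choices, not part of the text. Representing $e \oplus e'$ by the smallest span covering both events is likewise an assumption.

```python
from dataclasses import dataclass
from itertools import permutations, product


@dataclass(frozen=True)
class Event:
    """Hypothetical numeric stand-in for an event that begins at `start`
    and is finished at `end`; start < end, so every event has a duration."""
    name: str
    start: float
    end: float


def not_later(e: Event, f: Event) -> bool:
    """e ⪯ f: f begins not earlier than e."""
    return e.start <= f.start


def before(e: Event, f: Event) -> bool:
    """e ≺ f: f begins not before e is finished."""
    return e.end <= f.start


def within(e: Event, f: Event) -> bool:
    """e ⊑ f: e occurs while f occurs."""
    return f.start <= e.start and e.end <= f.end


def compose(e: Event, f: Event) -> Event:
    """e ⊕ f, represented here (an assumption) by the smallest span covering both."""
    return Event(f"{e.name}+{f.name}", min(e.start, f.start), max(e.end, f.end))


# Hypothetical coordinates, chosen so that the ⪯ and ≺ pairs match item 7.
e1, e2 = Event("e1", 0, 2), Event("e2", 0, 3)
e3, e4 = Event("e3", 4, 6), Event("e4", 5, 8)
events = [e1, e2, e3, e4]


def pairs(relation) -> list:
    """All ordered pairs of distinct events that stand in `relation`."""
    return [(a.name, b.name) for a, b in permutations(events, 2) if relation(a, b)]


for label, relation in [("not_later", not_later), ("before", before), ("within", within)]:
    print(label, pairs(relation))

# The rules of item 8 agree with the covering-span reading of ⊕ for all triples.
for a, b, c in product(events, repeat=3):
    ab = compose(a, b)
    assert not_later(ab, c) == (not_later(a, c) or not_later(b, c))
    assert before(ab, c) == (before(a, c) and before(b, c))
    assert within(ab, c) == (within(a, c) and within(b, c))
```

For these coordinates, the printed pairs for `not_later` and `before` coincide with the two lists in item 7, and the only pair related by `within` is (e1, e2). The final loop confirms that the three rules of item 8 hold for every triple of events under the covering-span reading of $\oplus$.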

2.2

Duration and Calendar Time

1. Having introduced the idea of a qualitatively ordered time axis, one can think about possibilities to quantify temporal relations. We begin with an elementary notion of duration. If an event, $e$, occurs while another event, $e'$, is occurring ($e \sqsubseteq e'$), one can say that the duration of $e$ is not longer than the duration of $e'$. This introduces a partial ordering of events with respect to duration and can be used as a starting point for a quantitative concept of duration.

a) In order to measure the duration of an event $e$, we count the number of pairwise non-overlapping events $e'$ such that $e' \sqsubseteq e$.⁴ The maximal number of such events can be used as a discrete measure of the duration of $e$, having t-locations as units. As an implication, all elementary events will have unit duration.

b) In the same way one can measure the duration between two events, say $e$ and $e'$: simply determine the maximal number of pairwise non-overlapping events, $e''$, such that $e \prec e'' \prec e'$. If no such event can be found, we say that $e'$ immediately follows $e$.⁵

2. These definitions make duration dependent on the number of events that can be identified in a given context. An obvious way to cope with this dependency is to enlarge the number of events that can be used to measure duration. This is done by using clocks. Defined in abstract terms, a clock is simply a device that creates sequences of (short) events. Then, if a clock is available when an event occurs, its duration can be measured by counting the clock ticks that occur while the event is occurring. Let $e$ be the event whose duration is to be measured, and let $c_n$ denote an event composed of $n$ clock ticks. One might then be able to find a number, $n$, such that

$t(c_n) \sqsubseteq t(e) \sqsubseteq t(c_{n+1})$

This allows one to say that the duration of event $e$ is between $n$ and $n + 1$ clock ticks.

3. Many different kinds of clocks have been invented,⁶ and this has led to the difficult question of how to compare different clocks with respect to accuracy. Fortunately, we are not concerned here with the problem of how to construct good clocks. We can simply use the clocks that are commonly used in daily life to characterize, and coordinate, events. We are, however, concerned with the problem of how to represent durations numerically, independent of the device actually used for measurement. Since clocks with different accuracies exist, we should find a numerical representation that is independent of any specific clock. This suggests using intervals of real numbers to represent durations. Since duration is always positive, a sensible choice is

$\mathbb{R}_{]+]} := \{\, ]a, b] \mid 0 \le a < b,\; a, b \in \mathbb{R} \,\}$

This representation is intended to capture both conceptual and empirical indeterminacy.⁷

4. Thinking of events, one needs to distinguish between t-locations and durations. The duration of an event tells us how long the event lasted, while the t-location of an event provides information about the location of the event in a set of events equipped with the partial orders $\preceq$, $\prec$, and $\sqsubseteq$. The basic tool for introducing quantitative statements about t-locations is a calendar. Calendars can be defined by specifying a base event and using the concept of duration between events. This allows one to locate every event by providing information about the (positive or negative) duration between the event and the base event of the calendar. To make this idea precise, one only needs a definition of duration between events. In principle, one can follow the approach already mentioned above: having available a clock, the duration between two events, say $e$ and $e'$, can be measured by counting the number of non-overlapping clock events having a t-location between $e$ and $e'$. However, this definition of duration between events is not fully satisfactory, because the events themselves also have a duration. This fact obviously creates some conceptual indeterminacy, and it therefore seems preferable to proceed in terms of a minimal and a maximal duration between two events, $e_1$ and $e_2$: the minimal duration only comprises what lies between the end of $e_1$ and the beginning of $e_2$, while the maximal duration reaches from the beginning of $e_1$ to the end of $e_2$.

This suggests using, again, the set of positive real intervals, $\mathbb{R}_{]+]}$, now for the numerical representation of the duration between events.

5. The main conclusion is that each event refers to time in two different ways.

a) First, events have an inherent duration. This qualitative notion can be represented numerically by positive real intervals. It will be assumed, therefore, that one can associate with each event, $e$, a positive duration

$\mathrm{dur}(e) \in \mathbb{R}_{]+]}$

Of course, the interpretation of $\mathrm{dur}(e)$ requires information about the kind of elementary events that have been used to measure duration. If all elementary events are of the same kind, as is normally the case when using clocks, one of these events (or a suitably defined composed event) provides a sensible unit of duration. In any case, it will most often be possible to assume that duration can be measured in standard units like seconds, days, months, or years.

b) Second, one can associate with each event a t-location that provides information about the place of the event in the order of time. Again, this is a purely qualitative notion defined with respect to three partial order relations between events. However, one can introduce a quantitative representation of the duration between events by using real intervals. Then, for each pair of events, $e$ and $e'$, one can use

$\mathrm{dur}(e, e') \in \mathbb{R}_{]+]}$

to represent the duration between the two events. Finally, one can introduce a calendar as a quantitative representation of t-locations. Having specified a base event, $e^\dagger$, one can represent the t-location of any other event, say $e$, by the duration between $e$ and $e^\dagger$. Then, if $e^\dagger \prec e$, $\mathrm{dur}(e^\dagger, e)$ provides a quantitative representation of the t-location of $e$ with respect to the calendar defined by $e^\dagger$.⁸ So one can finally use a single numerical representation, $\mathbb{R}_{]+]}$, both for the durations and for the t-locations of events.

4 It will be said that two events, $e$ and $e'$, do not overlap if $e \prec e'$ or $e' \prec e$.

5 In fact, we then do not have any reason to believe in a duration between $e$ and $e'$. Leibniz (1985, p. 7) made this point by saying: “A great difference between time and line: the interval between two moments between which there is nothing can in no way be determined, and it cannot be said how many things can be placed in between; [. . .] In time, therefore, the moments between which nothing happens touch each other.”

6 See, e.g., Borst (1990).

7 A fuller exposition of the idea of using intervals to represent data having both empirical and conceptual indeterminacies, including a discussion of statistical methods based on this kind of data representation, has been given elsewhere; see Rohwer and Pötter (2001, Part V).
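To make the minimal/maximal reading of the duration between events concrete, the sketch below represents each event by hypothetical integer clock ticks and returns $\mathrm{dur}(e, e')$ as a half-open interval $]lo, hi]$. The tick representation is an assumption: the text itself works with counts of non-overlapping clock events. The names `Event`, `Interval`, `dur_between`, and `t_location` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical representation: an event observed on a clock from tick
    `start` up to tick `end` (end > start, so it has a positive duration)."""
    start: int
    end: int

    def __post_init__(self) -> None:
        if self.end <= self.start:
            raise ValueError("an event must have a positive duration")


@dataclass(frozen=True)
class Interval:
    """The half-open real interval ]lo, hi] with 0 <= lo < hi (an element of R_]+])."""
    lo: int
    hi: int

    def __post_init__(self) -> None:
        if not 0 <= self.lo < self.hi:
            raise ValueError("expected 0 <= lo < hi")


def dur_between(e: Event, f: Event) -> Interval:
    """dur(e, f) for e ≺ f, from the minimal duration (end of e to the
    beginning of f) to the maximal duration (beginning of e to end of f)."""
    if e.end > f.start:
        raise ValueError("only defined here if f begins after e is finished")
    return Interval(f.start - e.end, f.end - e.start)


def t_location(base: Event, e: Event) -> Interval:
    """Calendar position of e with respect to a base event e† (with e† ≺ e)."""
    return dur_between(base, e)


day0 = Event(0, 1)  # hypothetical base event e†, one tick ("day 0") long
print(t_location(day0, Event(10, 11)))        # a one-day event on "day 10"
print(dur_between(Event(0, 3), Event(5, 7)))  # two multi-day events
```

With a one-day base event on "day 0", a one-day event on "day 10" gets the t-location ]9, 11]: at least 9 and at most 11 days lie between them, depending on whether the two events themselves are counted. Since events always have a positive duration, the upper bound always exceeds the lower one, so the result is a proper element of $\mathbb{R}_{]+]}$.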

2.3

Calculations with Calendar Time

1. Calendars, like methods of measuring time, changed considerably in the course of history. The choice of a suitable base event and the use of different clocks mark the main differences between the historical calendars. The idea that nature provides human experience with many periodic phenomena that, in some sense, should be accommodated by a calendar often provided reasons for calendar reforms.⁹ Today, in European countries, the most commonly used calendar is the Gregorian calendar, which was introduced by Pope Gregory XIII in 1582. A German encyclopedia (Brockhaus, 20th ed. 2001, vol. 11, p. 367) provides the following explanations:

“Today's civil calendar is based on the Gregorian calendar. It is accordingly a leap-year calendar with a common year of 365 days. A leap cycle of 400 calendar years has 146,097 calendar days. A mean calendar year thus has 365.2425 days and is therefore 26 s longer than the tropical year. Calendar years are counted from the birth of Christ, beginning with the year 1 after Christ (abbr. A.D. [n. Chr.]). The calendar years before year 1 are numbered into the past, beginning with 1, and are marked by the addition ‘before Christ’ (abbr. B.C. [v. Chr.]). There is no calendar year 0 (except in astronomy). A calendar year is divided into 12 months, of which January, March, May, July, August, October, and December have 31 days, April, June, September, and November have 30 days, and February has 28 days, or 29 days in a leap year. Independently of this, the calendar year is divided into calendar weeks of 7 weekdays each, of which there are 52 or 53. The first calendar week of a year is the week into which at least 4 of the first 7 days of January fall (Monday counting as the first day of the week). If this is not the case, this week counts as the last calendar week of the preceding calendar year.”

Many readers of this text will be familiar with this calendar and know how it can be used for temporal references. Some difficulties only arise in the calculation of durations over longer periods. For example, how long is the period beginning June 13, 1911, and ending February 7, 2001, in days, weeks, or months?

2. To answer this kind of question, a frequently used method consists in transforming Gregorian dates into numbers defined by an algorithm that simply counts days.¹⁰ The idea is to first fix some day in the Gregorian calendar to become day 0 of the algorithmic calendar, and then to develop an algorithm that allows one to calculate, for any other day in the Gregorian calendar, its temporal distance from day 0. As an example, we describe an algorithm proposed by Fliegel and van Flandern (1968) that uses the Gregorian date November 24, 4714 B.C., as day 0.

3. The algorithm consists of two parts. Given a Gregorian date by $d$ (day), $m$ (month), and $y$ (year), a first algorithm is used to calculate the corresponding Julian day, which we denote by $k$. In a first step, one calculates two auxiliary quantities:

$a = (m - 14)/12 \quad\text{and}\quad b = y + a + 4800$

Then the following formula provides the Julian day $k$:

$k = d - 32075 + \dfrac{1461\, b}{4} + \dfrac{367\,(m - 2 - 12a)}{12} - \dfrac{3\,\big((b + 100)/100\big)}{4}$

It should be noticed that all calculations must be done in integer arithmetic. This means that every (intermediate) result of a division must be truncated toward zero to an integer; for example, $25/9 = 2$ and, for January, $a = (1 - 14)/12 = -1$.

8 If $e \prec e^\dagger$, one can use the same approach by allowing for negative real intervals.

9 The history of calendars is described in several books; see, e.g., Borst (1990) and Richards (1998).

10 Such an algorithm is often called a Julian calendar. The name goes back to Joseph Scaliger who, in the year 1583, first proposed this kind of algorithmic calendar. In fact, it has nothing to do with the calendar, also often called the Julian calendar, that was introduced by Julius Caesar in 46 B.C. [v. Chr.].
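A minimal Python sketch of the first algorithm follows. Python's `//` operator rounds toward minus infinity, so a small helper, here called `tdiv` (our own name, as is `julian_day`), truncates toward zero as the algorithm requires; with `//`, the auxiliary quantity $a$ would be wrong for every month except February (e.g., $-2$ instead of $-1$ for January).

```python
def tdiv(num: int, den: int) -> int:
    """Integer division truncated toward zero, as the algorithm requires."""
    q = abs(num) // abs(den)
    return q if (num < 0) == (den < 0) else -q


def julian_day(d: int, m: int, y: int) -> int:
    """Julian day k of the Gregorian date d.m.y (Fliegel and van Flandern, 1968).

    Years are numbered astronomically: y = 0 is the first year B.C.
    """
    a = tdiv(m - 14, 12)  # -1 for January and February, 0 otherwise
    b = y + a + 4800
    return (d - 32075
            + tdiv(1461 * b, 4)
            + tdiv(367 * (m - 2 - 12 * a), 12)
            - tdiv(3 * tdiv(b + 100, 100), 4))


# The question from item 1: from June 13, 1911, to February 7, 2001.
days = julian_day(7, 2, 2001) - julian_day(13, 6, 1911)
weeks, rest = divmod(days, 7)
print(days, "days =", weeks, "weeks and", rest, "day(s)")
```

Applied to the question in item 1, the sketch finds that June 13, 1911, and February 7, 2001, are 32747 days apart, that is, 4678 weeks and 1 day. A corresponding statement in months would require a convention about month lengths, which a pure day count does not supply. The function also reproduces the Julian days listed in the table of examples for the reverse conversion, and the difference between January 1, 2001, and January 1, 1601, comes out as the 146,097 days of a 400-year leap cycle mentioned in the Brockhaus explanation.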

4. Conversely, a second algorithm is used to calculate the Gregorian day $d$, month $m$, and year $y$ that correspond to a given Julian day $k$. The calculations, again in integer arithmetic, consist of the following steps:

$p = k + 68569$
$q = (4p)/146097$
$r = p - (146097\,q + 3)/4$
$s = \big(4000\,(r + 1)\big)/1461001$
$t = r + 31 - (1461\,s)/4$
$u = (80\,t)/2447$
$v = u/11$
$d = t - (2447\,u)/80$
$m = u + 2 - 12\,v$
$y = 100\,(q - 49) + v + s$

The following table shows a few examples.¹¹

 d   m     y        k   |   d   m     y        k
 1   1     1  1721426   |  31  12     0  1721425
 1   1  2001  2451911   |  31  12  2000  2451910

The table also shows that the first year B.C. is given by $y = 0$, not by $y = -1$ as the explanation of the Gregorian calendar cited above might suggest.

2.4

Limitations of Accuracy

1. Depending on the purpose, temporal references use different units of time: days, weeks, months, years, and also smaller units like minutes and seconds. When recording statistical data, a suitable choice of temporal units depends on the kinds of phenomena to be captured by the data. For example, to record the age of a person one can use age in completed years, and there are rarely occasions to use a finer time scale; one exception is the analysis of the mortality of newborn children. On the other hand, years are not well suited to record the length of unemployment spells. We would like to distinguish, for example, between persons who are unemployed for less than 3 months and persons who are unemployed for longer than 6 or 12 months. This suggests measuring unemployment durations not in years, but at least in months. A finer time scale seems to introduce only irrelevant information, since most jobs end at the end of a calendar month and start at the beginning of one.

2. There are thus no natural temporal units, neither for locating events in historical time nor for measuring durations. Moreover, the precision of data recording might be limited. While an observer might be able to measure the duration of a football game in minutes, a demographer cannot determine the age of a person by using a clock. He can sometimes rely on records, like birth certificates, but most often needs to ask persons for their age and the dates of other potentially interesting events. The accuracy of demographic data then also depends on people's memory and on the temporal framework they use to locate events. While these are empirical limits to the accuracy of demographic data, there are also theoretical limits. One of these limits, already mentioned, derives from the fact that demographic events always have some intrinsic duration. Even if it were possible to provide a birth date exactly to the hour, or to measure marriage duration in days, this accuracy would be useless, because there is no theoretical argument that might justify the distinction. Why should one want to distinguish between two marriages, one of them lasting 5734 and the other one 5735 days? Even if true, it would be misleading to say that the second marriage lasted longer than the first one. As another example, suppose that one person has a job only in February while someone else has a job only in July. Then the length of employment of the first person is three days shorter than that of the second person (28 versus 31 days in a common year), but both get the same remuneration, social security insurance, etc. The point is simply that data should serve to report relevant differences, not just any differences.

11 Most statistical packages provide some means to convert between Gregorian dates and Julian days. TDA, for example, provides operators that directly use the algorithms of Fliegel and van Flandern as described above. SPSS uses a similar algorithm but a different base day (October 14, 1582).

Chapter 3

Demographic Processes

Since demography is concerned with describing and modeling the development of human populations, it deals with totalities [Gesamtheiten] embracing many individuals. Their size may vary depending on the spatial or temporal demarcation. However, in most cases their sheer size alone makes a direct and complete observation impossible. While it might be possible, at least in principle, to empirically approach each individual member of a population, the same is not true for the population as a whole. For example, we might want to talk about the totality of people who are currently living in Germany. While it is possible to empirically approach any number of individual persons, no one is able to observe the population as a whole. Put somewhat differently, the population as a whole is not an empirical object but a conceptual construction. This is not to deny that all of its members, and consequently also the population, really exist. However, it means that one needs some kind of representation of the population in order to have an object that one can think of and talk about. This chapter begins with the introduction of a rudimentary conceptual framework and some notations that allow one to make the required representations explicit.

3.1

A Rudimentary Framework

1. In order to think of a human population, one first needs a spatial and temporal context. To specify a temporal context, we assume a discrete time axis as discussed in Chapter 2. Such a time axis can be thought of as a sequence of temporal locations, which may be days, months, or years. To provide a symbolic representation we use the notation¹

$T := \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$

The elements $t \in T$ are not just numbers but represent temporal locations. For example, 0 represents a ‘day 0’, 1 represents a ‘day 1’, and so on. Since we want to develop a general conceptual framework that can serve both for descriptions and models, the duration of the temporal locations will be left unspecified. We only assume that all temporal locations have the same duration, and that there is a temporal ordering in the following sense: each temporal location $t$ is preceded by the temporal location $t - 1$ and followed by the temporal location $t + 1$ (for any $t \in T$). For the moment, we do not require any specific link with historical (calendar) time.

2. In a similar way, one can introduce a set of spatial locations, in the following denoted by the symbol $S$. The idea is that the elements of $S$ provide a spatial context for human individuals. The spatial locations can be defined in many different ways, for example, by referring to geographical or political demarcations. But like temporal locations, spatial locations only need to be specified when this is required by the specific empirical purpose. For the moment, we also do not introduce any kind of topology or metric. Furthermore, we do not make any assumptions about the number of spatial locations in $S$. In particular, we allow for the limiting case that $S$ contains only a single spatial location. We only require that our space is complete, in the sense that spatial mobility can only occur across the spatial locations given by $S$.

3. Having introduced a temporal and spatial context, one can think of people who live in this context. The symbol $\Omega_t$ will be used to represent the totality of people who live in the space $S$ during the temporal location $t \in T$.² The sets $\Omega_t$ are finite, and so one can sensibly speak of the number of people living during the temporal location $t$. The temporal index $t$ is necessary because the composition of the population sets $\Omega_t$ changes through time: in each temporal location, some people might die and others might be born. Also, if two sets, $\Omega_t$ and $\Omega_{t'}$, contain the same number of people, they need not be identical. Referring to a set of people implies that one is able to identify and distinguish its members. In addition, we assume that, for each individual $\omega \in \Omega_t$, there is exactly one spatial location $s \in S$ where $\omega$ is currently living.

4. One further question needs consideration. Regardless of their specification, temporal locations have some inherent duration. People are born, marry, or die during a temporal location. One therefore needs a convention about starting and ending times for membership in the sets $\Omega_t$. Our convention will be as follows: if a child is born in a temporal location $t$, it will be considered a member of $\Omega_t$ but not of any earlier population set; conversely, if a person dies in a temporal location $t$, she will be regarded as a member of $\Omega_t$ but not of any later population set. How this convention relates to the measurement of age will be discussed in Section 3.4.

5. This, then, is our rudimentary context: a space $S$ where people live, a time axis $T$ that allows temporal references, and population sets $\Omega_t$ that contain (fictitious) names of people living in the space $S$ during the temporal locations defined by $T$. While this context is quite abstract and certainly requires a lot of specification to become empirically useful, it already allows one to formulate the two basic demographic questions: How are the population sets $\Omega_t$ changing across time, and how do these changes depend on births, deaths, and migrations?

1 In this text we distinguish between ‘=’ and ‘:=’. Preceding the equality sign by a colon means that the expression on the left-hand side is defined by the expression on the right-hand side. In contrast, the equality sign without a colon states an equality that requires both sides to be defined beforehand.

2 More precisely, the elements of $\Omega_t$ are not human individuals but (fictitious) names. However, having understood the distinction, it should be possible to refer to the elements of $\Omega_t$ as individuals without creating confusion.

A Fictitious Illustration

6. A small fictitious example can serve to illustrate the conceptual framework. Imagine a small island with only a few inhabitants.³ Sometimes a new child is born or one of the inhabitants dies, and sometimes someone leaves the island or comes from outside as a new member of the island community. How can one get more information? This is the task of a chronicler who, more or less systematically, writes down what is happening on the island. His chronicle may contain entries for any kind of event, but here we are only interested in elementary demographic events. So we assume that the chronicle gets an entry whenever a child is born, one of the inhabitants dies, a person enters the island from outside and becomes a new inhabitant, or one of the inhabitants leaves the island.

7. Obviously, the chronicle must begin at some point in time. We assume that the records begin in 1960 and are continued until 1990. In the first year, the chronicler makes a list of all people who are currently living on the island and also records their age and sex. This list might look as follows:⁴

Name  Age  Sex
ω1     40    0
ω2     38    1
ω3      4    1
ω4     16    0
ω5     63    1
ω6     70    0
ω7     25    1
ω8      8    0
ω9     63    1
ω10    11    0

This is the stocktaking in the first year, 1960, when the chronicle begins. In the following years the chronicler adds entries whenever a demographic event occurs. The complete chronicle, up to the year 1990, might then look as shown in Table 3.1-1.

8. It is quite possible that the chronicler not only records demographic events but adds a lot more information about the life of the people on the island and their living conditions. Since, in this example, the number of people is very small, one can also imagine that the chronicler creates his chronicle not simply as a list of records, but uses some literary form and really tells a story about life on the island. It is evident, however, that this is not possible if the number of people becomes very large. But then the simple list of records also becomes larger and larger and difficult to survey, and so it becomes necessary to condense the list into comprehensible information. This is the task of statistical methods. The basic ideas will be discussed in the next chapter.

Table 3.1-1 Chronicle of our fictitious island.

Year  Name  Age  Sex  Kind of event
1961  ω4     17    0  leaves the island
1963  ω6     73    0  dies
1964  ω11    30    0  becomes new inhabitant
1966  ω12     0    1  is born
1970  ω13     0    0  is born
1971  ω9     74    1  dies
1975  ω8     23    0  leaves the island
1975  ω14    26    1  becomes new inhabitant
1980  ω15     0    0  is born
1982  ω16     0    1  is born
1985  ω5     88    1  dies

3 For example, one may think of Hallig Gröde, a small island off the west coast of Schleswig-Holstein in northern Germany. With currently 16 people living on this island, it is the smallest municipality [Gemeinde] in Germany.

4 Age is recorded, as usual, in completed years; sex is represented by numbers, 0 representing ‘male’ and 1 representing ‘female’ individuals.

3.2

Representation of Processes

1. It is often said that demography, like other social sciences, is concerned with “processes”. Taken literally, this only expresses an interest in sequences of events that are assumed to be related in some way. But how does one delineate the events that are part of the process? Observations will not provide an answer, because the possibilities to consider objects and events as being part of a process are virtually unlimited. We therefore understand ‘process’ not as an ontological category (something that exists in addition to objects and events), but as belonging to the ideas and imaginations of humans aiming at an understanding of the occurrences they are observing. Put somewhat differently, we suggest understanding processes as conceptual constructions. This is not to deny that processes can meaningfully be linked to observations of objects and events; but this will then be an indirect link: one can observe objects and events, but not processes. The fictitious chronicle of the previous section can serve as an example. The chronicle can meaningfully be understood as the characterization of a process and, as we have construed the example, it derives from observations. However, what the chronicler actually observes is not a process but the people on the island and a variety of events involving these people. The process only comes into existence by creating the chronicle. This example also illustrates the abstractions that cannot be avoided in the construction of processes. Only a small number of events can be given an explicit representation. In the example, the chronicler only records some basic demographic events and consequently abstracts from most of what is actually happening on the island.

2. In order to define processes explicitly, it seems natural to begin with events. For demographic processes, the basic events are births, deaths, and migrations. An explicit representation of these events can be avoided, however, by using the conceptual framework introduced in the previous section.⁵ This allows one to think of a demographic process simply as a sequence of population sets, $\Omega_t$. Birth and death events are then taken into account by corresponding updates of these population sets: each birth adds a person and each death removes one. This motivates the following notation to represent a demographic process without external migration:

$(S, T^*, \Omega_t)$

to be understood as a sequence of population sets, $\Omega_t$, which are defined for all temporal locations $t \in T^*$. In this formulation, $T^*$ denotes a contiguous subset of the time axis $T$ that covers the period for which the process shall be considered, and $S$ provides a representation of the spatial context.⁶

3. The assumption on $S$ introduced in the previous section implies that migration can only occur inside this space. People can move between the spatial locations defined by $S$, but such events will not change the size of the population and need not be taken into account for a general definition of a demographic process. The situation is somewhat different when a demographic process is restricted to a subset of $S$, say $S^* \subset S$, which is often the case in empirical applications, for example, when considering the demographic development in a specific country. People can then migrate between $S^*$ and $S \setminus S^*$. However, if the definition of population sets is restricted to the subspace $S^*$, such events can formally be treated like births and deaths: in-migration adds a person and out-migration removes a person. One can therefore use an analogous notation,

$(S^*, T^*, \Omega_t)$

in order to represent a demographic process with external migration. As already explained, the notation is meant to imply that $S^*$ is a proper subset of $S$ and that the population sets $\Omega_t$ are restricted to $S^*$.

4. All further concepts to be introduced in this text, including statistical variables, will be derived from the notion of a demographic process (with or without external migration). As a first step, one can simply refer to the number of people who are members of the population sets $\Omega_t$. We will use the following notations:

$n_t :=$ number of people in temporal location $t$ ($n_t = |\Omega_t|$)
$b_t :=$ number of children born in temporal location $t$
$d_t :=$ number of people dying in temporal location $t$

For a demographic process without external migration, the relation between population size and birth and death events can then be written as follows:

$n_{t+1} = n_t + b_{t+1} - d_t \qquad (3.2.1)$

5 An alternative approach that explicitly begins with events is taken, e.g., by Wunsch and Termote (1978, ch. 1).

For a demographic process with external migration we use, in addition, the notations:

$m^i_t :=$ number of people who enter $S^*$ in temporal location $t$
$m^o_t :=$ number of people who leave $S^*$ in temporal location $t$

The basic equation then becomes

$n_{t+1} = n_t + b_{t+1} - d_t + m^i_{t+1} - m^o_t \qquad (3.2.2)$
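Because (3.2.2) is a book-keeping identity, it can be checked mechanically. The Python sketch below builds the population sets $\Omega_t$ of the fictitious island from the 1960 stocktaking and Table 3.1-1, using the membership convention of Section 3.1, and then verifies the equation for every pair of successive years. The data are those of the chronicle; the event labels "in" and "out" and all variable and function names are our own illustrative choices.

```python
# Stocktaking of 1960: the ten names of the initial list.
initial = {f"ω{i}" for i in range(1, 11)}

# Table 3.1-1 as (year, name, kind); "in" and "out" are our labels for
# "becomes new inhabitant" and "leaves the island".
chronicle = [
    (1961, "ω4", "out"), (1963, "ω6", "death"), (1964, "ω11", "in"),
    (1966, "ω12", "birth"), (1970, "ω13", "birth"), (1971, "ω9", "death"),
    (1975, "ω8", "out"), (1975, "ω14", "in"), (1980, "ω15", "birth"),
    (1982, "ω16", "birth"), (1985, "ω5", "death"),
]
years = range(1960, 1991)


def names(t: int, *kinds: str) -> set:
    """Names with an event of one of the given kinds in temporal location t."""
    return {name for (year, name, kind) in chronicle if year == t and kind in kinds}


def count(t: int, kind: str) -> int:
    """Number of events of the given kind in temporal location t."""
    return len(names(t, kind))


# Population sets under the convention of Section 3.1: whoever is born or
# arrives in t belongs to Omega_t; whoever dies or leaves in t still belongs
# to Omega_t, but to no later population set.
omega = {}
current = set(initial)
for t in years:
    current |= names(t, "birth", "in")
    omega[t] = set(current)
    current -= names(t, "death", "out")

n = {t: len(omega[t]) for t in years}

# The accounting equation (3.2.2) holds for every pair of successive years.
for t in years[:-1]:
    assert n[t + 1] == (n[t] + count(t + 1, "birth") - count(t, "death")
                        + count(t + 1, "in") - count(t, "out")), t

print({t: n[t] for t in (1960, 1962, 1970, 1976, 1986, 1990)})
```

The printed population sizes (10 in 1960, 9 in 1962, 11 in 1970, 10 in 1976, and 11 in 1986 and 1990) follow from the chronicle. By the convention, ω4 still belongs to $\Omega_{1961}$, the year in which he leaves the island, and only drops out of $\Omega_{1962}$.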

These equations will be called accounting equations of a demographic process (with or without external migration). One should notice that these accounting equations are true by definition. They are simply book-keeping identities about demographic processes and do not have any causal meaning. Illustrations will be given in Chapter 6 with data on the demographic development in Germany.

5. Notwithstanding the conventions introduced in the last paragraph of the preceding section, referring to the number of people who live during a temporal location $t$ inevitably involves some conceptual indeterminacy. If temporal locations are short, e.g., days, such indeterminacies might well be ignored. On the other hand, if the temporal index $t$ refers to years, or even longer periods, one might want to distinguish the number of people who live during this period from the number of people who live at the beginning, or end, of the period. This is done, for example, in many publications of population statistics by the Statistisches Bundesamt. The distinction is between the number of people at the end of a year, defined as the last day in the year, and a mid-year population size.⁷ When analyzing data from population statistics, it is therefore necessary to distinguish between the different definitions used in the data construction. The notation $n_t$ is always meant to represent some kind of mean population size in the temporal location $t$, most often a year. In addition, we use the following notations:

$n^-_t :=$ population size at the beginning of $t$
$n^+_t :=$ population size at the end of $t$

and assume that $n^+_t = n^-_{t+1}$. The exact meaning of “beginning” and “end” will be left unspecified, because possible meanings depend on the application context and the availability of data.

6. If the beginning and end of a temporal location are explicitly distinguished, the formulation of the accounting equations has to be changed accordingly:

$n^+_t = n^-_t + b_t - d_t \qquad (3.2.3)$

for a demographic process without external migration, and

$n^+_t = n^-_t + b_t - d_t + m^i_t - m^o_t \qquad (3.2.4)$

for a demographic process with external migration.

6 A fully explicit notation would therefore be: $(S, T^*, \{\Omega_t \mid t \in T^*\})$.

7 The definitional apparatus of the STATIS database (see Appendix A.1) provides the following explanations: “The population level [Bevölkerungsstand] states the number of persons who belong to the population, recorded at different points in time. Der Bevöl-

Part II of this text when dealing with real data. In the present section we brieﬂy discuss a general idea that has motivated many of these measures and is derived from a distinction between stock and ﬂow quantities. As an example, we refer to equation (3.2.1) in the previous section. The equation connects two kinds of quantity: nt and nt+1 are stock quantities [Bestandsgr¨ßen] which record the number of people who live in the reo spective temporal locations; on the other hand, bt+1 and dt are called ﬂow quantities [Stromgr¨ßen] because they record changes (events) which oco cur during the respective temporal locations. The general idea is: a stock quantity records some state of aﬀairs that is (assumed to be) ﬁxed in a given temporal location, and a ﬂow quantity records changes that occur during some time interval. A ﬂow quantity then counts the number of events of a certain kind which occur during that time interval. 2. A further step consists in the deﬁnition of rates [Raten]. The basic idea is to relate the number of events to a number of people who can, in some sense, contribute to the occurrence of the events. A marriage rate can serve as an example: marriage rate := number of marriages in year t number of people who might become married in year t As shown by this example, a rate is always a ratio where the numerator refers to a ﬂow quantity. The only question is how to deﬁne a sensible denominator. In a strict sense, the denominator should refer to the number of people who might experience the events referred to in the numerator. This is possible, for example, when referring to death events. Since any living individual might die at any time, the denominator of a mortality rate can simply refer to all people still alive at a given temporal location. In other cases the deﬁnition of a sensible denominator is more diﬃcult. How should one deﬁne the number of people who might become married in a certain year? 
It does not suﬃce to exclude people who are already married, one should also exclude children below a certain age. 3. Actually, the term ‘rate’ is used quite loosely in the demographic literature and other areas of social statistics. While it is most often a ratio where the numerator refers to a ﬂow quantity in the sense of a number of events occurring during a time interval, the denominator might refer to any kind of stock quantity that is assumed to exhibit some sensible relation to the numerator. An example is the crude birth rate [allgemeine Geburtenziﬀer 9 ] which is deﬁned by crude birth rate := number of births in year t mean population size in year t

for a demographic process with external migration. These versions of the accounting equations will be used in Section 6.4.

3.3

Stocks, Flows, and Rates

1. Demographers have invented a large number of measures to characterize demographic processes.8 Some of these measures will be introduced in kerungsstand im Jahresdurchschnitt insgesamt ist das arithmetische Mittel aus zw¨lf o Monatswerten, die wiederum Durchschnitte aus dem Bev¨lkerungsstand am Anfang und o Ende jeden Monats sind. Zur Berechnung des durchschnittlichen Bev¨lkerungsstandes o nach Altersjahren und Geschlecht wird ein vereinfachtes Verfahren angewendet: Es werden lediglich die arithmetischen Durchschnittswerte aus dem Bev¨lkerungsstand jeder o Gruppe zum Jahresanfang und -ende gebildet und mit einem Korrekturfaktor multipliziert. Dieser Korrekturfaktor ist der Quotient aus dem durchschnittlichen Bev¨lkerungso stand insgesamt und der Summe aller vereinfacht berechneten Durchschnittswerte des Bev¨lkerungsstandes in den einzelnen Altersjahren.“ One should also note that these o deﬁnitions have exceptions and have changed through time. In den Jahren 1961, 1970 ” und 1987 wurden keine Durchschnittwerte gebildet, sondern die Ergebnisse der jeweiligen Volks- und Berufsz¨hlungen nachgewiesen.“ Bis 1953 und von 1956 bis 1960 wurde a ” zur Berechnung des Bev¨lkerungsstandes im Durchschnitt insgesamt das arithmetische o Mittel aus jeweils vier Vierteljahreswerten gebildet; dagegen wurde der Bev¨lkerungso stand von 1953 bis 1955, von 1962 bis 1969 und wird seit 1971 – wie oben beschrieben – als Durchschnitt aus Monatswerten berechnet.“
8 For a fairly complete compilation of the many measures that are used in the demographic literature see Mueller (1993 and 2000), or Esenwein-Rothe (1982).

9 This expression that avoids the term ‘rate’ is used by the Statistisches Bundesamt. In the literature one also ﬁnds other expressions, for example ‘rohe Geburtenrate’.

32 (often multiplied by 1000).
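The denominator of the crude birth rate depends on how the mean population size is constructed. The following Python sketch computes an annual mean population in the way footnote 7 describes for the STATIS data (the mean of twelve monthly values, each being the average of the stocks at the beginning and end of the month) and then the crude birth rate per 1000. The function names and all figures are hypothetical; the sketch ignores the simplified age-specific procedure and the historical exceptions mentioned in the footnote.

```python
def annual_mean_population(month_boundary_stocks):
    """Mean of twelve monthly averages.

    Expects 13 stocks: beginning of January, beginning of February, ...,
    beginning of December, end of December. Each monthly value is the
    average of the stocks at the beginning and end of that month.
    """
    if len(month_boundary_stocks) != 13:
        raise ValueError("expected 13 stocks (beginning of each month plus end of December)")
    pairs = zip(month_boundary_stocks, month_boundary_stocks[1:])
    monthly = [(begin + end) / 2 for begin, end in pairs]
    return sum(monthly) / 12


def crude_birth_rate(births, mean_population, per=1000):
    """Births in year t divided by the mean population size in year t, scaled by `per`."""
    return per * births / mean_population


# Hypothetical stocks growing by 10 persons per month, and 950 births in the year.
stocks = [100000 + 10 * k for k in range(13)]
mean_pop = annual_mean_population(stocks)
print(mean_pop)                                   # 100060.0
print(round(crude_birth_rate(950, mean_pop), 2))  # 9.49
```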


4. Another example is the notion of a rate of change [Veränderungsrate], also called a growth rate [Wachstumsrate]. This notion can be applied to any sequence of stock quantities. As an example, we refer to a sequence of population sizes, $n_t$. The rate of change, or growth rate, of the population is then defined by10

$$\rho_t := \frac{n_{t+1} - n_t}{n_t} \tag{3.3.1}$$

For a demographic process without external migration this can also be written in the form

$$\rho_t = \frac{b_{t+1} - d_t}{n_t} = \frac{b_{t+1}}{n_t} - \frac{d_t}{n_t}$$

This formulation shows that the numerator is a flow quantity and the denominator a stock quantity. However, the calculation only requires a knowledge of the stock quantities. As an example, we use some figures that refer to the demographic development in Germany for the period 1990 to 1996:11

t       n_t (in 1000)   ρ_t
1990    79365           0.0078
1991    79984           0.0076
1992    80594           0.0073
1993    81179           0.0030
1994    81422           0.0029
1995    81661           0.0029
1996    81896

Definition (3.3.1) can also be used to define a mean growth rate [durchschnittliche Wachstumsrate]. The idea is to assume a constant growth rate during the time interval from $t$ to $t + t'$. If this constant growth rate is denoted by $\rho$, repeated application of (3.3.1) gives the equation

$$n_{t+t'} = n_t (1 + \rho)^{t'}$$

The mean growth rate for a period from $t$ to $t + t'$, in this text denoted by $\rho_{t,t+t'}$, is defined as the solution of this equation and can be calculated with the following formula:

$$\rho_{t,t+t'} = \left(\frac{n_{t+t'}}{n_t}\right)^{1/t'} - 1$$

As an illustration, using the figures from the above example, one finds

$$\rho_{1990,1996} = \left(\frac{81896}{79365}\right)^{1/6} - 1 \approx 0.00525$$
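The growth rates (3.3.1) and the mean growth rate can be recomputed from the 1990 to 1996 figures in the table above. The following Python sketch does this; the function names `growth_rates` and `mean_growth_rate` are our own, and the rates are indexed by the first year, as in (3.3.1). It also checks that the product of the factors $1 + \rho_t$ telescopes to $n_{1996}/n_{1990}$.

```python
import math

# Midyear population of Germany in 1000, as quoted in the table above.
population = {
    1990: 79365, 1991: 79984, 1992: 80594, 1993: 81179,
    1994: 81422, 1995: 81661, 1996: 81896,
}


def growth_rates(sizes):
    """rho_t = (n_{t+1} - n_t) / n_t for consecutive years t, indexed by the first year."""
    years = sorted(sizes)
    return {t: (sizes[u] - sizes[t]) / sizes[t] for t, u in zip(years, years[1:])}


def mean_growth_rate(n_start, n_end, periods):
    """Constant rate rho that solves n_end = n_start * (1 + rho) ** periods."""
    return (n_end / n_start) ** (1 / periods) - 1


rho = growth_rates(population)
for t, r in rho.items():
    print(t, round(r, 4))  # e.g. 1990 0.0078

rho_mean = mean_growth_rate(population[1990], population[1996], 1996 - 1990)
print(round(rho_mean, 5))  # 0.00525

# The product of the factors (1 + rho_t) reproduces n_1996 / n_1990.
product = math.prod(1 + r for r in rho.values())
assert math.isclose(product, population[1996] / population[1990])
```

Multiplying the rates by 100 gives them in percent.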

Of course, growth rates can also be expressed in percent.

5. The definition of growth rates given above immediately implies the following equation:

$$n_{t+1} = n_t (\rho_t + 1)$$

If one considers not just two consecutive temporal locations but some longer time interval, say from $t$ to $t + t'$, one finds the more general relationship

$$n_{t+t'} = n_t (1 + \rho_t) \cdots (1 + \rho_{t+t'-1}) = n_t \prod_{\tau=0}^{t'-1} (1 + \rho_{t+\tau})$$

10 Since a growth rate conceptually relates to two temporal locations, it is an arbitrary convention to index the rate by the first temporal location. Some authors, e.g. Rinne (1996, p. 84), use the second temporal location. We mention that growth rates can also be defined differently. For example, when it seems sensible to distinguish the beginning and end of a period $t$, one might define a growth rate for this period by $(\overline{n}_t - \underline{n}_t)/\underline{n}_t$.

11 The figures refer to the midyear number of people in 1000 and are taken from Statistisches Jahrbuch 1997 für die Bundesrepublik Deutschland (p. 46).

3.4

Age and Cohorts

1. A substantial part of the demographic literature is concerned with the "structure" of the population sets $\Omega_t$, where 'structure' refers to statistical distributions of individual properties. Two of these properties are of particular importance in demography: sex and age. This is due to the fact that both are important preconditions of many demographic events.12 For example, only women, during a certain period of their lives, can give birth to children; and death events also depend in some way on age. So it is often sensible to distinguish people with respect to their sex and age. To distinguish numbers of male and female individuals we use superscripts: $n^m_t$ and $n^f_t$ will denote, respectively, the number of men and women living in temporal location $t$; of course, $n_t = n^m_t + n^f_t$.

2. Age refers to the duration between a current temporal location and the date of birth of a person. A commonly used measure is completed years. Demographers also use another measure often called exact age. The meaning of this term depends on the time axis used to provide a temporal framework. As an example, we assume a discrete time axis, $T$, with temporal locations defined as days. The exact age of a person is then simply the number of days that have passed since the person was born.

12 We speak of conditions in order to avoid causal connotations. Both sex and age are clearly not "factors" which in some way "produce" demographic events. Thinking of age as a "causal variable" has been called the "fallacy of age reification" (Riley 1986, p. 158).

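The two age measures just described can be made concrete with calendar dates. The following Python sketch uses our own conventions and hypothetical function names. Age in completed days is the number of days that have passed since the day of birth, so it is 0 on the day of birth itself. Age in completed years subtracts one if this year's birthday has not yet been reached. A person born on 29 February is treated as reaching the birthday on 1 March in non-leap years; this is only one possible convention.

```python
from datetime import date


def age_in_completed_days(birth: date, current: date) -> int:
    """Days that have passed since the day of birth (0 on the day of birth)."""
    if current < birth:
        raise ValueError("current date precedes date of birth")
    return (current - birth).days


def age_in_completed_years(birth: date, current: date) -> int:
    """Completed years: subtract one if this year's birthday has not yet been reached."""
    if current < birth:
        raise ValueError("current date precedes date of birth")
    birthday_not_yet_reached = (current.month, current.day) < (birth.month, birth.day)
    return current.year - birth.year - int(birthday_not_yet_reached)


born = date(1960, 5, 17)  # hypothetical date of birth
print(age_in_completed_days(born, date(1960, 5, 17)))   # 0
print(age_in_completed_days(born, date(1960, 5, 18)))   # 1
print(age_in_completed_years(born, date(1961, 5, 16)))  # 0
print(age_in_completed_years(born, date(1961, 5, 17)))  # 1
```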

The following graphic illustrates the connection between exact age and age in completed years:

[Figure: a time axis marking birth, the first birthday ($\tau_1$) and the second birthday ($\tau_2$); age is 0 from birth to the first birthday and 1 from the first to the second birthday.]

In this example, the exact age of a person is 0 during the day the person is born, it is 1 during the next day, and so on. We will avoid, however, the term 'exact age' and speak of age in completed days, or months, or years, whatever unit of time is used for measurement. Furthermore, we simply speak of age if the time unit is identical with the temporal locations of the time axis that provides the temporal framework.

3. Given these conventions for the measurement of age, the members of the population sets $\Omega_t$ can be distinguished by their age. We will use the notation $\Omega_{t,\tau}$ to refer to the subset of members of $\Omega_t$ being of age $\tau$ ($\tau = 0, 1, 2, \ldots$). This results in a partition

$$\Omega_t = \Omega_{t,0} \cup \Omega_{t,1} \cup \Omega_{t,2} \cup \cdots = \bigcup_{\tau=0}^{\infty} \Omega_{t,\tau}$$

Using the notation $n_{t,\tau}$ to refer to the number of members of $\Omega_{t,\tau}$, a corresponding equation is

$$n_t = n_{t,0} + n_{t,1} + n_{t,2} + \cdots = \sum_{\tau=0}^{\infty} n_{t,\tau}$$

Of course, there is an upper limit to the length of human life; summation to $\infty$ simply avoids the specification of a definite limit.

4. If age is measured in the units of a time axis $T$, the relationship between age and time is quite simple: whenever a person survives a temporal location $t \in T$ this adds one unit to the person's age.13 This relationship can be graphically illustrated by an age-period diagram, also called a Lexis diagram, as follows:14

[Figure: Lexis diagram with time on the horizontal axis (marked $t_1$, $t_2$, $t$) and age on the vertical axis; diagonal arrows labelled $C_1$ and $C_2$ depict life courses.]

The horizontal axis represents time, the vertical axis represents age. The diagonal arrows depict the life courses of individuals born in the same temporal location. For example, the arrow denoted by $C_1$ refers to individuals born in temporal location $t_1$. As time goes on they grow older and their age can be read from the vertical scale. A vertical line that begins at any temporal location, say $t$, intersects the diagonal life course lines at the corresponding ages.

13 This statement assumes that age is measured in the same units as used for the definition of $T$; this also implies that, if an individual is of age $\tau$ in temporal location $t$, then it was born in temporal location $t - \tau$.

14 Beginning with G. F. Knapp and W. Lexis, demographers have developed many variations of this basic age-period diagram; for a discussion see Esenwein-Rothe (1992, pp. 16-30).

5. We use the notation $C_t$ to refer to the set of people born in the temporal location $t$, and $C_{[t,t']}$ to refer to the set of people born in the time interval from $t$ to $t'$. Such sets are called birth cohorts.15 As shown by the Lexis diagram, a demographic process can be considered as a temporal sequence of birth cohorts, and this is one reason why this concept plays a prominent role in demography. It is also a basic concept for much of the research that deals with human life courses.16 This approach bases an understanding of long-term changes in the development of societies on a comparison of the life courses of members of successive birth cohorts.17 This research has also led to a more general definition of the term 'cohort':

"A cohort is an aggregate of individual elements, each of which experienced a significant event in its life history during the same chronological interval." (Ryder 1968, p. 546)

15 Some demographers also use the word 'generation' or use both terms synonymously. However, except when referring to relationships between parents and children, we will avoid speaking of "generations" because this word has many different and often unclear meanings; see, e.g., the discussion by Mannheim (1952), Pfeil (1967), and Kertzer (1983).

16 For an overview, see Wagner (2001).

17 An early exposition of this view was given by Ryder (1965).

A similar definition was given by Glenn (1977, p. 8):


"a cohort is defined as those people within a geographically or otherwise delineated population who experienced the same significant life event within a given period of time."

Not just birth events but also most other kinds of event can then be used to define cohorts; for example, one can speak of a marriage cohort that comprises all people who married in the same time period.

6. In the present text cohorts will always be birth cohorts, for which we have introduced the notation $C_t$. Similar to the population sets $\Omega_t$, we think of cohorts as sets of people, not as some kind of social group.18 We shall also avoid any kind of analogy with individuals. A cohort is a conceptual construction, not some object that exists apart from and independently of its members. In particular, we shall not think of cohorts as some kind of "agents" that "drive" social change.19 This is not to deny that one may find some similarities among the members of the same cohort. However, whatever similarities one may find, they result from the life courses of the individual cohort members.20 It is, therefore, conceptually senseless to think of such similarities as resulting from being members of the same cohort, that is, from being born in the same year. The argument does not become any better if one refers, not just to the fact of being born in the same year, but to events and social conditions that were experienced by members of the same cohort at the same age. Such considerations might well be used for a retrospective interpretation of similarities among members of the same cohort. It would be a mistake, however, to use such considerations for implicitly changing the meaning of the term 'cohort'. The term is defined by referring to people born in the same year, or historical period, possibly adding some spatial demarcation. So whatever happens to be the case afterwards is irrelevant for the meaning of the term 'cohort'.21 The only fact derivable from the definition is that members of the same cohort are always of (approximately) the same age during their life courses. But being of the same age cannot be used to explain facts or events. In our view, therefore, cohort is not an explanatory concept, but a conceptual tool that allows one to think in terms of life courses and their development. As we have mentioned, this also might help in retrospective interpretations. But the explanatory value of such interpretations derives from locating individual life courses within the historical periods in which they develop.

7. It should also be stressed that 'cohort' is essentially a retrospective concept. The definition given above suggests that cohorts, like $\Omega_t$, are sets of people. While this is formally correct, there is an important ontological distinction. The population sets $\Omega_t$ consist of people who live in the historical time $t$. Therefore, if we know that a person is a member of $\Omega_t$, we can infer that this person is alive in $t$. But now think of a cohort of people born in some year $t_0$. Knowing that a person is a member of the cohort $C_{t_0}$ only allows one to infer that this person lived during $t_0$. For all later periods, nothing definite can be inferred. Of course, most members of $C_{t_0}$ will live for some period following $t_0$. But life beyond $t_0$ is a property of an individual member of the cohort, not of the cohort itself. It is thus dubious how to speak of the temporal existence of a cohort. One might say that $C_{t_0}$ appears in the year $t_0$ and remains existent until the death of its last survivor. But this implies that the cohort is no longer a definite population set but part of a demographic process, formally

$$C_{t_0} \equiv (C_{t_0,0}, C_{t_0,1}, C_{t_0,2}, \ldots)$$

where $C_{t_0,\tau}$ denotes the set of members of $C_{t_0}$ still alive at age $\tau$. In the framework of a demographic process without migration one can then formally identify the sets $C_{t_0,\tau}$ with the population sets $\Omega_{t_0+\tau,\tau}$.

8. This temporal view is quite sensible and is used, for example, in the construction of cohort life tables (Chapter 8). However, it is then no longer possible to identify a cohort with a definite set of people. The problem is somewhat obscured by the fact that most empirical cohort studies actually condition on survivorship. They are concerned with people born in $t_0$ and having survived until some temporal location $t > t_0$. So one can speak of a definite set of people, formally identical with $\Omega_{t,t-t_0}$; but clearly, this set is not identical with $C_{t_0}$. Of course, there is nothing wrong in referring to a population set $\Omega_{t,t-t_0}$. It implies, however, a retrospective point of view. The population set $\Omega_{t,t-t_0}$ only comes into existence when history has passed the temporal location $t$. Therefore, thinking of cohorts as definite population sets presupposes a retrospective point of view.

18 This distinction has already been stressed by Mannheim (1952, pp. 288-9). Unfortunately, Mannheim's notions of a "generation as an actuality" and a "generation unit" eventually obscure this distinction.

19 This view has been suggested, more or less explicitly, by Ryder (1965). In another paper, he wrote: "Some reservations to this discussion are necessary to obviate the implication that cohorts are the exclusive agents of social change." (Ryder 1968, p. 548) The point, however, is not that there are also other agents, but that cohorts never are agents. Cohorts do not bring anything about.

20 This has been explicitly recognized by Mayer and Huinink (1990, p. 213): "the characteristics of a cohort are aggregated outcomes of the individual behavior of cohort members in the social context, indicated crudely by calender time."

21 An additional argument is simply that the members of a cohort experience quite different events, and live under quite different social conditions, during their life courses; see, e.g., Rosow (1978).
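The notation of this section can be mirrored in a small data structure. The following Python sketch is a hypothetical illustration: the records and names are invented, temporal locations are years, and there is no migration. It partitions a population set $\Omega_t$ by age, checks $n_t = \sum_\tau n_{t,\tau}$, uses the relation from footnote 13 that a person of age $\tau$ in $t$ was born in $t - \tau$, and recovers the surviving members of a birth cohort at age $\tau$ as $\Omega_{t_0+\tau,\tau}$.

```python
from collections import defaultdict

# Hypothetical records: name -> (year of birth, year of death, or None if still alive).
people = {
    "w1": (1950, None),
    "w2": (1950, 1953),
    "w3": (1951, None),
    "w4": (1952, 1954),
    "w5": (1952, None),
}


def omega(t):
    """Omega_t: people born in or before t who did not die before t.

    People dying in t still belong to Omega_t, matching n_{t+1} = n_t + b_{t+1} - d_t.
    """
    return {w for w, (born, died) in people.items()
            if born <= t and (died is None or died >= t)}


def partition_by_age(t):
    """Subsets Omega_{t,tau}, with age tau = t - year of birth (age 0 in the year of birth)."""
    parts = defaultdict(set)
    for w in omega(t):
        parts[t - people[w][0]].add(w)
    return dict(parts)


def cohort_alive_at_age(t0, tau):
    """C_{t0,tau}, identified with Omega_{t0+tau,tau} in a process without migration."""
    return partition_by_age(t0 + tau).get(tau, set())


parts = partition_by_age(1953)
# The subsets Omega_{t,tau} partition Omega_t, so their sizes add up to n_t.
assert sum(len(s) for s in parts.values()) == len(omega(1953))
print(sorted(cohort_alive_at_age(1950, 3)))  # ['w1', 'w2']
print(sorted(cohort_alive_at_age(1950, 4)))  # ['w1']
```

The cohort $C_{1950}$ here is not a fixed set in any single $\Omega_t$: what the code returns depends on the age $\tau$ at which one looks, which is the temporal view described in paragraphs 7 and 8.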

Chapter 4

Variables and Distributions

The last chapter introduced the notion of a demographic process, formally denoted by $(S, T^*, \Omega_t)$. This suffices to represent the population size, that is, the number of people in the population sets $\Omega_t$, and to record its development through time. Additional questions concern properties of the members of $\Omega_t$. Two such properties, sex and age, were already part of the example discussed in Section 3.1, but many other properties can also be considered. If the size of $\Omega_t$ is large, how can one sensibly represent the properties of all its members? This is the task of statistical methods, as a famous statistician, Ronald A. Fisher, expressed in the following way:

"Briefly, and in its most concrete form, the object of statistical methods is the reduction of data. A quantity of data, which usually by its mere bulk is incapable of entering the mind, is to be replaced by relatively few quantities which shall adequately represent the whole, or which, in other words, shall contain as much as possible, ideally the whole, of the relevant information contained in the original data." (Fisher 1922, p. 311)

The present chapter introduces two basic notions, statistical variables and statistical distributions (additional concepts will be added in later chapters). Definitions and notations mainly follow the author's "Grundzüge der sozialwissenschaftlichen Statistik" (2001). Some notational simplifications will be discussed in Section 4.3.

4.1

Statistical Variables

1. Unfortunately, the word 'variable' is easily misleading because it suggests something that "varies" or is a "variable quantity". In order to get an appropriate understanding it is first of all necessary to distinguish statistical from logical variables. Consider the expression '$x \le 5$'. In this expression, $x$ is a logical variable that can be replaced by a name. Obviously, without substituting a specific name, the expression '$x \le 5$' has no definite meaning and, in particular, is neither a true nor a false statement. The expression is actually no statement at all but a sentential function [Aussageform]. A statement that is true or false or meaningless only results when a name is substituted for $x$. For example, if the symbol 1 is substituted for $x$, the result is a true statement ($1 \le 5$); if the symbol 9 is substituted for $x$, the result is a false statement ($9 \le 5$); and if some name not referring to a number is substituted for $x$, the result is neither true nor false but meaningless. As the reader will remember from his or her mathematical education, such logical variables are heavily used in mathematics, often to formulate general statements, for example:

For all numbers $x$: if $0 \le x \le 5$, then $0 \le x^2 \le 25$

This example also shows that logical variables are in no way "variable quantities".1

2. Statistical variables serve a quite different purpose. They are used to represent the data for statistical calculations which refer to properties of objects. The basic idea is that one can characterize objects by properties. Since this is essentially an assignment of properties to objects, statistical variables are defined as functions:2

$$X: \Omega \longrightarrow \tilde{\mathcal{X}}$$

$X$ is the name of the function, $\Omega$ is its domain, and $\tilde{\mathcal{X}}$ is a set of possible values. To each element $\omega \in \Omega$, the statistical variable $X$ assigns exactly one element of $\tilde{\mathcal{X}}$, denoted by $X(\omega)$. In this sense, a statistical variable is simply a function.3 What distinguishes statistical variables from other functions is a specific purpose: statistical variables serve to characterize objects. Therefore, in order to call $X$ a statistical variable (and not just a function), its domain, $\Omega$, should be a set of objects and the set of its possible values, $\tilde{\mathcal{X}}$, should be a set of properties that can be meaningfully used to characterize the elements of $\Omega$. As a reminder of this purpose, the set of possible values of a statistical variable will be called its property space [Merkmalsraum] and its elements will be called property values [Merkmalswerte].4

1 This has been stressed by many logicians, see, e.g., Frege (1990, p. 142); and already in 1903, B. Russell (1996, p. 90) wrote: "Originally, no doubt, the variable was conceived dynamically as something which changed with the lapse of time, or, as is said, as something which successively assumed all values of a certain class. This view cannot be too soon dismissed."

2 Since the notion of a 'function' is used throughout the whole text we have added a section in Appendix A.2 providing basic definitions.

3 It follows that logical and statistical variables are completely different things. Moreover, the term 'variable' is misleading in both cases. For a more extensive discussion that also shows how both notions, logical and statistical variables, can be linked by using sentential functions, see Rohwer and Pötter (2002b, ch. 9). When there is no danger of confusion, we will drop the attribute 'statistical' and simply speak of variables.

4 We generally denote statistical variables by upper case letters ($A, B, C, \ldots, X, Y, Z$) and their property spaces by corresponding calligraphic letters that are marked by a tilde ($\tilde{\mathcal{A}}, \tilde{\mathcal{B}}, \tilde{\mathcal{C}}, \ldots, \tilde{\mathcal{X}}, \tilde{\mathcal{Y}}, \tilde{\mathcal{Z}}$).

3. As was mentioned in the Introduction, in the statistical literature domains of statistical variables are often called populations. This is unfortunate because a statistical variable can refer to any kind of object. We therefore use the term 'population' only if one is actually referring to sets

of people. In the general case we speak of the domain or, equivalently, of the reference set of a statistical variable.

4. As an illustration we use the chronicle introduced in Section 3.1. Referring to the year 1960, the reference set consists of the names of all individuals who, in 1960, were inhabitants of the island:

$$\Omega_{1960} := \{\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6, \omega_7, \omega_8, \omega_9, \omega_{10}\}$$

Given this reference set, one can think of characterizing its elements by properties. Two property spaces have been used in the example, age and sex. The latter property space consists of two elements, 'male' and 'female', and may be written explicitly as follows:

$$\tilde{\mathcal{S}} := \{\text{male}, \text{female}\}$$

This then allows one to define a statistical variable, say $S$, that assigns to each individual $\omega \in \Omega$ a property value $S(\omega) \in \tilde{\mathcal{S}}$. Of course, if the statistical variable is intended to represent data, the assignment should not be made arbitrarily but should reflect actual properties of the individuals. In our example, the assignment should be made in the following way:

ω       S(ω)
ω1      male
ω2      female
ω3      female
ω4      male
ω5      female
ω6      male
ω7      female
ω8      male
ω9      female
ω10     male

This example also demonstrates that, in contrast to most functions that are used in mathematics, statistical variables cannot be defined by referring to some kind of rule. There is no rule that would allow one to infer the sex, or any other property, of an individual from its name. In order to make a statistical variable explicitly known one almost always needs a tabulation of its values.

5. As the example also shows, the elements of a property space are not numbers but properties that can be used to characterize objects. However, it is general practice in statistics to represent properties by numbers. This was already done in Section 3.1 where we have represented the properties 'male' and 'female' by the numbers 0 and 1, respectively. One reason for doing so is the resulting simplification in the tabulation of statistical data. The main reason is, however, another one: numerical representations allow one to perform statistical calculations. As an example, we introduce the mean value of a statistical variable, say $X$, denoted by $M(X)$. The definition is

$$M(X) := \frac{1}{|\Omega|} \sum_{\omega \in \Omega} X(\omega)$$

The calculation consists in summing up the values of the variable for all elements in the reference set and then dividing by the number of elements. Obviously, the calculation requires a numerical representation for the values of the variable, that is, for the elements of its property space. But as soon as one has introduced a numerical representation, one can do with the values of a variable anything that can be done with numbers. To be sure, this does not guarantee a result with an immediate and sensible interpretation. This might, or might not, be the case and can never be guaranteed by a statistical calculation alone.5 However, in our example we get a sensible result. Performing the calculation of a mean value for the variable $S$ (with male = 0 and female = 1), we get the value

$$M(S) = \frac{1}{10}(0 + 1 + 1 + 0 + 1 + 0 + 1 + 0 + 1 + 0) = 0.5$$

providing the proportion of female individuals in the set of inhabitants of our fictitious island.

5 One cannot rely on any general rules but needs to consider each statistical calculation in its specific context. As an example, think of household income and rent. Subtracting rent from household income provides a meaningful result, but simply adding both quantities does not.

6. The chronicle also provides information about the age of the inhabitants of the island. Referring again to the set of people who lived on the island in 1960, denoted by $\Omega_{1960}$, we can define another statistical variable that assigns to each individual $\omega \in \Omega_{1960}$ its age in 1960. We will call this variable $A$, and denote its property space by $\tilde{\mathcal{A}}$, defined by

$$\tilde{\mathcal{A}} := \{0, 1, 2, 3, \ldots\}$$

In this case we do not need to explicitly introduce a numerical representation because age, given its usual meaning in terms of completed years, already has a numerical expression. Of course, an age of 40, say, is not identical with the number 40. It is a number which is given a specific meaning. Consequently, $\tilde{\mathcal{A}}$, while formally identical with the set of natural numbers, has an additional meaning which is not a part of the definition of natural numbers, namely that its elements are agreed upon to denote ages in completed years. Many other methods to provide information about age could equally well be used, for example, measuring age in months or days, or simply distinguishing between children and older people. These are considerations that precede the definition of a property space and, consequently, of a statistical variable. Here we follow the chronicler who has used completed years to record ages, allowing us to define a statistical variable

$$A: \Omega_{1960} \longrightarrow \tilde{\mathcal{A}}$$

that provides for each inhabitant of the island his or her age in 1960. Of course, the definition does not suffice to record ages. As in the first example one needs to distinguish between merely assuming the existence of a variable and actually knowing its values. To provide such knowledge the values of a statistical variable must be explicitly recorded in some way. In this example we can again use a simple table as follows:

ω       A(ω)
ω1      40
ω2      38
ω3      4
ω4      16
ω5      63
ω6      70
ω7      25
ω8      8
ω9      63
ω10     11

Such a table, often called a data matrix, provides the values of a variable and can be used as a starting point for further calculations. For example, we can calculate the mean value of $A$ which is

$$M(A) = \frac{1}{10}(40 + 38 + 4 + 16 + 63 + 70 + 25 + 8 + 63 + 11) = 33.8$$

In contrast to the first example, this is not a proportion but the value of the mean age of the inhabitants of the island in 1960.

7. Our second example also shows that, in general, one needs to distinguish between a property space as it is used to define a statistical variable and the set of property values that are actually realized in some given reference set $\Omega$. The former will be called a conceptual property space and the latter a realized property space.6 In our example, the realized property space is

$$A(\Omega) = \{A(\omega) \mid \omega \in \Omega\} = \{4, 8, 11, 16, 25, 38, 40, 63, 70\} \subset \tilde{\mathcal{A}}$$

and is obviously not identical with $\tilde{\mathcal{A}}$.

6 The realized property space of a statistical variable can also be called its range. This term is commonly used to denote, for an arbitrary function $f: B \longrightarrow C$, the image $f(B) \subseteq C$; see also Appendix A.2.

8. A further point is worth attention. The same property value can have several realizations in a reference set. In our example, there are two individuals, $\omega_5$ and $\omega_9$, having the same age in 1960 (and, of course, during their whole lives). Also in the first example, several people share the properties 'male' and 'female', respectively. Using terminology from Appendix A.2, one can say that statistical variables are, in general, not injective functions. Inverse functions must therefore be defined in terms of sets. For example,

$$A^{-1}(\{4\}) = \{\omega_3\}, \quad A^{-1}(\{5\}) = \emptyset, \quad A^{-1}(\{63\}) = \{\omega_5, \omega_9\}$$

The interpretation is straightforward: if $\tilde{A}$ is a subset of the property space $\tilde{\mathcal{A}}$, then $A^{-1}(\tilde{A})$ is the subset of members of $\Omega$ having a property value in $\tilde{A}$. The same notation is used with any other statistical variable. For example, $S^{-1}(\{1\}) = \{\omega_2, \omega_3, \omega_5, \omega_7, \omega_9\}$ is the set of female members of $\Omega_{1960}$.

9. A final consideration concerns the reference sets that are used as domains to define statistical variables. In both previous examples, $\Omega_{1960}$ was taken to be a set of people, the inhabitants living on our fictitious island in 1960. In general, $\Omega$ can be any set of objects. The formal notion of a statistical variable only requires that the elements of $\Omega$ can be identified and distinguished, and that values of a property space can be meaningfully assigned to them. This generality of statistical variables should be taken with some caution, however, depending on the purpose to be served. In this text, statistical variables will be used as means for demographic descriptions and models. The basic conceptual framework is a demographic process that consists of a space $S$, a time axis $T$, and population sets $\Omega_t$ comprising the people who are living in $S$ during the temporal location $t$. As our examples have shown, statistical variables can be used to represent information about the members of the sets $\Omega_t$. It also seems possible to use the space, $S$, as a domain for statistical variables which then take the form

$$L: S \longrightarrow \tilde{\mathcal{L}}$$

Such variables will be called spatial variables. They are always statistical variables. Examples would be the characterization of spatial locations by their size (e.g., in square kilometers), or by the number of people who currently live in these locations. These examples show that spatial locations, as understood in this text, are similar to objects. Both have some kind of physical existence and can sensibly be characterized by properties. This ontological status is not shared, however, by temporal locations. Temporal

44

4

VARIABLES AND DISTRIBUTIONS

4.2

STATISTICAL DISTRIBUTIONS

45

locations are in no way similar to objects and do not have a physical existence. So it seems not possible to sensibly characterize temporal locations by properties, and we therefore shall avoid to use a time axis as a domain for the deﬁnition of statistical variables in a proper sense.
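To make the age example concrete, here is a minimal Python sketch, not the authors' procedure. It assumes that the members of Ω₁₉₆₀ are encoded by the hypothetical labels "w1" to "w10" and that a statistical variable is stored as a dictionary mapping each member to its property value; the names `AGE`, `mean`, `realized_space` and `preimage` are illustrative only. The sketch reproduces the data matrix, the mean M(A) = 33.8, the realized property space A(Ω) and the inverse images A⁻¹({4}), A⁻¹({5}) and A⁻¹({63}).

```python
# A statistical variable is a function Omega -> property space; here it is
# represented (by assumption) as a dict from hypothetical member labels to values.
# Data matrix of the island example: age in completed years in 1960.
AGE = {"w1": 40, "w2": 38, "w3": 4, "w4": 16, "w5": 63,
       "w6": 70, "w7": 25, "w8": 8, "w9": 63, "w10": 11}


def mean(variable):
    """Mean value M(X): sum of all recorded values divided by |Omega|."""
    return sum(variable.values()) / len(variable)


def realized_space(variable):
    """Realized property space X(Omega) = {X(w) | w in Omega}."""
    return set(variable.values())


def preimage(variable, property_set):
    """Inverse image X^{-1}(property_set): members whose value lies in the set."""
    return {w for w, value in variable.items() if value in property_set}


if __name__ == "__main__":
    print(mean(AGE))                      # 33.8
    print(sorted(realized_space(AGE)))    # [4, 8, 11, 16, 25, 38, 40, 63, 70]
    print(preimage(AGE, {4}))             # {'w3'}
    print(preimage(AGE, {5}))             # set()
    print(sorted(preimage(AGE, {63})))    # ['w5', 'w9']
```

Because a dictionary assigns exactly one value to each member, the representation respects the definition of a variable as a function; `preimage` returns a set precisely because the variable need not be injective.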

4.2

Statistical Distributions

1. Statistical variables provide the starting point for all further statistical concepts. These concepts, directly or indirectly, always relate to the reference sets of statistical variables and not to their individual members,7 more specifically: they relate to distributions of properties in a reference set. In order to introduce this notion explicitly, consider a statistical variable

$$X: \Omega \to \tilde{\mathcal{X}}$$

If all values of this variable were known, it would be possible to use that knowledge to characterize all individual members of Ω. However, statistical concepts and methods have a quite different purpose. Statistical questions do not concern individual members of Ω but frequencies of property values in the reference set Ω. This was stated in a Declaration of Professional Ethics, published by the International Statistical Institute (1986, p. 238), as follows:8

"Statistical data are unconcerned with individual identities. They are collected to answer questions such as 'how many?' or 'what proportions?', not 'who?'. The identities and records of co-operating (or non-cooperating) subjects should therefore be kept confidential, whether or not confidentiality has been explicitly pledged."

Accordingly, the basic idea is that statistical concepts and methods are concerned, not with individuals, but with frequencies of properties.9

7 See the quotations from Lexis and Feichtinger cited in the Introduction.
8 See also Bürgin and Schnorr-Bäcker (1986).
9 It should be mentioned, however, that the complementary idea, to use statistical data for the characterization of individuals, also accompanies the history of statistics. The following quotation from one of its founders, Francis Galton (1889, pp. 35–37), provides an example. Galton begins: "We require no more than a fairly just and comprehensive method of expressing the way in which each measurable quality is distributed among the members of any group, whether the group consists of brothers or of members of any particular social, local, or other body of persons, or whether it is co-extensive with an entire nation or race." Then follows, however, a quite different reasoning: "A knowledge of the distribution of any quality enables us to ascertain the Rank that each man holds among his fellows, in respect to that quality. This is a valuable piece of knowledge in this struggling and competitive world, where success is to the foremost, and failure to the hindmost, irrespective of absolute efficiency. [. . .] When the distribution of any faculty has been ascertained, we can tell from the measurement, say of our child, how he ranks among other children in respect to that faculty, whether it be a physical gift, or one of health, or of intellect, or of morals. As the years go by, we may learn by the same means whether he is making his way towards the front, whether he just holds his place, or whether he is falling back towards the rear. Similarly as regards the position of our class, or of our nation, among other classes and other nations."

2. Following this idea, one no longer refers to individual members of Ω but to elements, or subsets, of a variable's property space. So let $\tilde{X}$ be any subset of $\tilde{\mathcal{X}}$; such subsets will be called property sets. The question concerns the proportion of members of Ω who were assigned a value in the property set $\tilde{X}$ via the function X. A general answer is provided by the frequency function of X, that is, a function

$$P[X]: \mathcal{P}(\tilde{\mathcal{X}}) \to \mathbb{R}$$

In this formulation, $\mathcal{P}(\tilde{\mathcal{X}})$ is the power set of $\tilde{\mathcal{X}}$, that is, the set of all subsets of $\tilde{\mathcal{X}}$. So the domain of the frequency function P[X] consists of all property sets that can be created from the property space $\tilde{\mathcal{X}}$. The assignment is defined by

$$P[X](\tilde{X}) := \frac{1}{|\Omega|}\,\bigl|\{\omega \in \Omega \mid X(\omega) \in \tilde{X}\}\bigr|$$

Thus, for every property set $\tilde{X} \in \mathcal{P}(\tilde{\mathcal{X}})$, $P[X](\tilde{X})$ is the relative frequency of $\tilde{X}$ in Ω.

3. Frequency functions always refer to relative frequencies (proportions). We therefore adopt the terminological convention that the word 'frequency', without a qualifying attribute, always means relative frequency. Since absolute frequencies are also often used to characterize populations, we introduce the complementary notion of an absolute frequency function, defined by

$$P^{*}[X](\tilde{X}) := \bigl|\{\omega \in \Omega \mid X(\omega) \in \tilde{X}\}\bigr| = P[X](\tilde{X})\,|\Omega|$$

4. It is fairly obvious how to derive a frequency function from the values of a statistical variable. Given a property set, say $\tilde{X}$, one simply counts the number of elements of Ω having a property value in $\tilde{X}$ and then divides the resulting count by the number of elements of Ω. Less obvious is that, from a statistical or demographic point of view, all relevant information about a statistical variable is contained in its frequency distribution. In fact, all proper statistical concepts are derived from frequency distributions. To illustrate this, we refer to the first example, variable S, discussed in the previous section. The property space is $\tilde{\mathcal{S}} = \{0, 1\}$, and so it suffices to consider the property sets {0} and {1}.10 Using the data tabulated on page 40, one immediately finds:

$$P[S](\{0\}) = P[S](\{1\}) = 0.5$$

10 The power set of a property space $\tilde{\mathcal{X}}$ also contains $\tilde{\mathcal{X}}$ itself and the empty set, ∅. However, since $P[X](\tilde{\mathcal{X}}) = 1$ and $P[X](\emptyset) = 0$ for any variable X, these property sets can be neglected.
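The frequency function and the absolute frequency function can likewise be sketched in Python. This is an illustrative sketch under stated assumptions: members are encoded by the hypothetical labels "w1" to "w10", the coding 0 = male, 1 = female follows the text, and the female members are those listed in S⁻¹({1}) = {ω₂, ω₃, ω₅, ω₇, ω₉}; the function names are not taken from the text. Property sets are passed as Python sets, so any element of the power set, including the empty set, is a valid argument.

```python
# Sex variable S of the island example (0 = male, 1 = female); the female
# members are those in S^{-1}({1}) = {w2, w3, w5, w7, w9}.
FEMALE = {2, 3, 5, 7, 9}
SEX = {f"w{i}": (1 if i in FEMALE else 0) for i in range(1, 11)}


def absolute_frequency(variable, property_set):
    """P*[X](X~) = |{w in Omega | X(w) in X~}|."""
    return sum(1 for value in variable.values() if value in property_set)


def frequency(variable, property_set):
    """P[X](X~) = P*[X](X~) / |Omega|, the relative frequency of X~."""
    return absolute_frequency(variable, property_set) / len(variable)


if __name__ == "__main__":
    print(frequency(SEX, {0}), frequency(SEX, {1}))       # 0.5 0.5
    print(absolute_frequency(SEX, {1}))                    # 5
    print(frequency(SEX, {0, 1}), frequency(SEX, set()))   # 1.0 0.0
```

The last line illustrates the remark in footnote 10: the whole property space always has frequency 1 and the empty set frequency 0.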


Now, this information also suffices to calculate the mean value of S. This is possible because, for any statistical variable, say X, its mean value can also be expressed by the formula

$$M(X) = \sum_{\tilde{x} \in \tilde{\mathcal{X}}} \tilde{x}\; P[X](\{\tilde{x}\})$$

Therefore, in our example, knowing the frequencies of {0} and {1}, one immediately finds M(S) = 0.5.

5. The fact that all relevant information about a statistical variable is contained in its distribution can be used to stress again the specific statistical abstraction that derives from the consideration of frequency distributions. In general, knowing the frequency distribution of a statistical variable, it is no longer possible to infer property values for the individual members of the reference set that was used to define the variable. In this sense, as Lexis said, "the individual as such disappears" [verschwindet das Individuum als solches] (see the quotation in the Introduction).

6. Although the domain of the frequency function of a statistical variable, say X, is defined as the power set of the variable's property space, $\tilde{\mathcal{X}}$, it is not necessary to explicitly tabulate the frequencies of all possible property sets. This is due to the fact that frequency functions are additive: if $\tilde{X}$ and $\tilde{X}'$ are any two disjoint property sets, then

$$P[X](\tilde{X} \cup \tilde{X}') = P[X](\tilde{X}) + P[X](\tilde{X}')$$

One can therefore express the frequency of any property set in the following way:

$$P[X](\tilde{X}) = \sum_{\tilde{x} \in \tilde{X}} P[X](\{\tilde{x}\})$$

This shows that it suffices to know the frequencies of the one-element property sets $\{\tilde{x}\}$, corresponding to the property values $\tilde{x} \in \tilde{\mathcal{X}}$, in order to have complete knowledge of the frequency distribution of X. This also makes clear how the consideration of frequency distributions serves the main goal of statistical methods, namely to make an often large number of values of a statistical variable comprehensible. Instead of a separate entry for each individual member of the reference set Ω, the representation of a frequency distribution only requires a separate entry for each property value in the realized property space of a statistical variable. Of course, many further statistical concepts can then be used to describe, analyze, and compare frequency distributions of one or more statistical variables. We will introduce some of these concepts in later chapters when they can help in a discussion of substantive questions.

4.3

Remarks about Notations

1. The notations introduced in the foregoing sections are somewhat more involved than those often found in introductory textbooks. This is done in order to make the logical structure of the corresponding concepts as clear as possible, in particular, two basic ideas:
• A statistical variable is not any kind of "variable quantity" but an assignment of properties to objects. So it is formally a function in the mathematical sense of this term. This also implies that statistical variables cannot sensibly be thought of as "factors" which, in any dubious sense, can "influence", or "cause", the behavior of objects.
• Statistical notions refer, not to individual objects, but to frequency distributions of properties in sets of objects. These frequency distributions, which contain all statistically relevant information, are, again, functions in the mathematical sense of the word.

2. However, having recognized these conceptual foundations, we will reduce the notational burden and introduce the following abbreviations:
a) If $X: \Omega \to \tilde{\mathcal{X}}$ is a statistical variable and $\tilde{x} \in \tilde{\mathcal{X}}$ a property value, its frequency must correctly be written as $P[X](\{\tilde{x}\})$ because the domain of the frequency function, P[X], is the power set of $\tilde{\mathcal{X}}$. However, it will save notational overhead to omit, in this case, the curly brackets around $\tilde{x}$ and simply write $P[X](\tilde{x})$. For example, to refer to the proportion of male individuals in $\Omega_{1960}$ we might simply write P[S](0).
b) One often refers to the frequency, not of a single property value $\tilde{x} \in \tilde{\mathcal{X}}$, but of a set of property values, which we have called a property set, $\tilde{X} \subset \tilde{\mathcal{X}}$. The correct formulation is then $P[X](\tilde{X})$, because the function is P[X] and its argument is $\tilde{X}$. While formally not correct, an often used alternative formulation is $P(X \in \tilde{X})$. Nevertheless, this alternative formulation is sometimes practical. For example, we might want to refer to the frequency of people aged 65 or above. The property set is then $\{\tilde{a} \in \tilde{\mathcal{A}} \mid \tilde{a} \geq 65\}$, and its frequency would need to be written as $P[A](\{\tilde{a} \in \tilde{\mathcal{A}} \mid \tilde{a} \geq 65\})$. But obviously, it saves notational overhead to simply write $P(A \geq 65)$.
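The claim that the distribution contains all statistically relevant information can be checked numerically on the age example. The following Python sketch (an illustration, not the authors' code) tabulates the frequencies of the one-element property sets, uses additivity to obtain the frequency of an arbitrary property set, and recovers M(A) from the distribution alone via M(X) = Σ x̃ P[X]({x̃}). Exact fractions from the standard library are used so that the identities hold without rounding error; the function names are hypothetical. Finally, P(A ≥ 65) is computed as the frequency of a property set; restricting that set to realized values is harmless because unrealized values have frequency zero.

```python
from collections import Counter
from fractions import Fraction

# Ages (completed years) of the ten members of Omega_1960 in the island example.
AGES = [40, 38, 4, 16, 63, 70, 25, 8, 63, 11]


def distribution(values):
    """Frequencies P[X]({x}) of the one-element property sets, as exact fractions."""
    n = len(values)
    return {x: Fraction(count, n) for x, count in Counter(values).items()}


def frequency(dist, property_set):
    """Additivity: P[X](X~) = sum of P[X]({x}) over x in X~ (zero if x is not realized)."""
    return sum((dist.get(x, Fraction(0)) for x in property_set), Fraction(0))


def mean_from_distribution(dist):
    """M(X) = sum over property values x of x * P[X]({x})."""
    return sum((x * p for x, p in dist.items()), Fraction(0))


dist = distribution(AGES)
# P(A >= 65), i.e. P[A]({a | a >= 65}); only realized values contribute.
p_65_plus = frequency(dist, {a for a in dist if a >= 65})

if __name__ == "__main__":
    print(mean_from_distribution(dist), float(mean_from_distribution(dist)))  # 169/5 33.8
    print(frequency(dist, {4, 63}))   # 3/10
    print(p_65_plus)                  # 1/10
```

The mean obtained from the distribution, 169/5 = 33.8, agrees with the value computed directly from the data matrix, although the distribution no longer records which member has which age.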


Chapter 5

Modal Questions and Models
1. We conclude the discussion of the conceptual framework with a few remarks concerning the term 'model'. This will also make clear why we do not make a sharp distinction between population statistics and the construction of demographic models. The basic idea is not to follow a widespread conception that thinks of models as being "simplified descriptions" of some part of reality. For example:

"A scientific model is an abstract and simplified description of a given phenomena." (Olkin, Gleser, and Derman 1980, p. 2)
"A model of any set of phenomena is a formal representation thereof in which certain features are abstracted while others are ignored with the intent of providing a simpler description of the salient aspects of the chosen phenomena." (Hendry and Richard 1982, p. 4)

Quite similar is the view that models are in some way "mappings" [Abbildungen] of parts of reality. For example:

"As a first approximation, we can think of models as conceptual constructs for the 'mapping' of real systems or for dealing with such systems." (Balzer 1997, p. 16)
"A model is presumably always to be understood as a mapping. The question is only what is mapped, and what the mapping function looks like." (Frey 1961, p. 89)

The main objection is that models as used in scientific discourse almost never serve the purpose of describing something. While descriptions certainly play an important role in scientific work, this role is fulfilled by documenting observations. In contrast, most models serve a quite different purpose, namely to provide a framework for thinking about modal questions. For example: What might have caused the decline of birth rates in Germany following the baby boom of the 1960s? Or: To what extent might the proportion of old people increase during the next 20 years?

2. This is a basic distinction: statements may relate to facts or to possibilities. Descriptions are intended to provide facts, but most human reasoning concerns possibilities. This is also true of most scientific reasoning.1 Reasoning concerned with possibilities will be called modal reasoning. It comes in a wide variety of forms.2 Often modal reasoning concerns future possibilities, for example, the future demographic development in Germany. However, even with respect to the past one is often not able to simply state facts. As a consequence, one needs to think about a modal question: What might have been the case? The question indicates that one can only speculate about possible facts. Of course, how this can be done depends on the available knowledge. For example, there are no reliable records of the number of births in Germany during World War II. Nevertheless, some information is available and can be used to provide reasons for the belief that birth rates declined. A researcher might then say: for these reasons it seems highly probable that birth rates declined during the years of war.

1 An opposite view was expressed, e.g., by Samuelson (1952, p. 61): "All sciences have the common task of describing and summarizing empirical reality." However, if the task really consisted in describing something, one would not need a model but could simply report observations. Therefore, given that models do not have the task of describing something, it would also be misleading to say that models can only provide "simplified", or even "distorted descriptions" (see, e.g., Baumol 1966, p. 90). After all, why should somebody be interested in "distorted descriptions" of reality?
2 For a good introduction see White (1975).

3. We propose to use the term 'model' in the following way: Models are explicitly formulated means intended to serve modal reasoning. Since modal reasoning comes in many different forms, the same is true of models. Some distinctions will be suggested below. However, one should also recognize that the construction and use of models is not automatically implied by modal reasoning. In fact, most modal reasoning goes on without employing a model. A model only comes into existence when it is explicitly formulated as a means intended to provide a conceptual framework for reasoning about specific questions. For example, using information from a weather report to support speculations about tomorrow's weather is not using a model; but possibly the people preparing the forecast have used a model. As a condition of its existence, a model needs some kind of representation independent of its actual use. This is not to require any specific conceptual tools for the formulation of a model. There is again a wide variety of possibilities. The formulation of a model can be purely verbal or, in addition, employ symbols, graphs, figures, even physical devices. But in whatever form a model is presented, it must be possible to think about the model in its own right.

4. Models do not provide answers to modal questions; they serve to think about, and evaluate, possible answers.3 The main service consists in providing a framework for explicit reasoning. One has to state explicitly the available knowledge, additional assumptions, and how both are used to draw inferences. This is particularly important with regard to assumptions because possible answers to modal questions often depend heavily on assumptions. One might also say that assumptions are required by the very nature of modal questions because, by definition, the available evidence is not sufficient for an answer. However, this statement needs a qualification. Modal reasoning does not, by itself, require assumptions. For example, being interested in tomorrow's weather, one can consult a weather forecast. In order to believe the information, one does not need the assumption that the forecaster actually knows tomorrow's weather. In fact, one can simply use the information without making any assumptions whatsoever. Assumptions are no prerequisite for reasoning, nor for the formation of beliefs. It is easily misleading to say that assumptions are required to allow reasoning in situations where the available evidence is incomplete. Assumptions are only required if one is interested in making reasoning explicit, that is, in making reasoning an object of critique and evaluation. This also is the main task of models. Their job is to show explicitly how one might, or might not, arrive at certain conclusions with regard to the modal question that motivates the reasoning.

3 We therefore do not agree with Baumol (1966, pp. 90–91), who conceived of "predictive models" in the following way: "A predictive model need require relatively little comprehension on the part of its users or even its designers. It is a machine which grinds out its forecasts more or less mechanically, and for such tasks, unreasoning, purely extrapolative techniques frequently still turn out the best results." Like oracles, such machines should not be called models because, as Baumol rightly says, they do not serve any kind of reasoning.

5. This also explains the affinity between thinking in terms of models and what might be called rule-based reasoning. The idea is: given certain assumptions, one can draw some inferences while others are ruled out because they would violate established rules of reasoning. Of course, not only possible mistakes in applying rules, but the rules themselves, can become a matter of dispute. Furthermore, human reasoning cannot be reduced to a mechanical application of given rules. As a limiting case, one can think of mathematical proofs; but mathematics is not concerned with modal questions and, consequently, not with the construction of models. At least in the dominant understanding, mathematics is basically interested in the formal implications that can be derived from assumptions according to given rules. This makes it possible to use mathematical results in many areas of rule-based reasoning.
However, when thinking about modal questions, the interest is not in the formal implications of assumptions and rules but in possible answers. It is, therefore, not only important that the reasoning is formally correct; of even greater importance is that the assumptions are reasonable. We therefore avoid speaking of 'formal models'. Irrespective of the conceptual tools used to formulate a model, which are often borrowed from mathematics, its task is not to allow formal inferences but to support modal reasoning.4

4 One should notice, however, that the word 'formal' is often used in two different meanings. As the word is used above, it refers to arguments which are true only because of their form. In a different meaning, 'formal' is often understood as the opposite of 'informal'. Its meaning then becomes similar to what we tried to express with the word 'explicit'.

6. Since there is a great variety of modal questions and conceptual tools, many different kinds of models also exist. Thinking of models used in the social sciences, the following aspects provide hints for introducing some broad distinctions: the conceptual framework that provides the model's ontology, the kind of modal question that the model is intended to serve, and the linguistic and technical tools used to formulate the model and derive possible implications.

7. We begin with the first aspect and broadly distinguish two types of model: a) statistical models, which conceptually relate to distributions of statistical variables, and b) behavioral models, which explicitly refer to the behavior of individuals. Almost all demographic models belong to the first type. Assumptions concern, for example, the development of birth and death rates, or the number and age distribution of immigrants. Despite the possibility of metaphorically linking such assumptions to the behavior of people, they conceptually relate to demographic processes formulated in terms of population sets. All further concepts used for the model formulation are derived from them and do not relate to individual behavior. Such models are therefore concerned, not with individual behavior, but with the development of, and relations between, quantities derived from statistical distributions. In this sense, all models discussed in subsequent chapters are statistical models. In contrast, we propose to speak of behavioral models only if a model explicitly refers to individuals and allows reasoning about individual behavior.

8. The second aspect concerns the modal reasoning that a model is intended to serve. We broadly distinguish three groups of models: a) representational models, whose purpose is to provide a view of something, b) analytical models, which are used to provide a conceptual framework for reasoning about relationships and rules, and c) technical and political models, whose purpose is to support people in the design and implementation of technical artefacts and institutions.

9. There is a great variety of representational models. For example, a map can be called a representational model as it is intended to provide a specific view of some area. Another example would be the model of a building that an architect plans to build.
The model is then intended to provide a view of a building that might come into existence in the future. Of course, the model is not a description because there is no "future building" which could be described. The model is rather used to support reasoning about modal questions concerning the possible features of a building that might be realized in the future. The examples discussed in the present text mainly relate to statistical distributions. One of the techniques widely used in statistics to construct representational models is smoothing. An example would be the construction of trends by smoothing a time series. As the purpose is to provide a specific view of a process, one can rightly speak of a representational model. More involved examples concern the construction of representational models in situations where the available information is incomplete. One can think, e.g., of the construction of world maps in early times when substantial parts of the earth were not known to the map makers. A similar situation often occurs in the construction of statistical models. Examples which will be discussed in later chapters concern the estimation of statistical distributions when part of the available data is incomplete. Estimation procedures are then based on assumptions which might be wrong and cannot be tested with the available data. So one is actually concerned with a modal question about an unknown distribution. A further and somewhat more complicated example of this type will be discussed in Chapter 9, where we deal with data from surveys in which respondents provide information about the birth and, possibly, death dates of their parents, and the question is whether such data can be used to construct cohort life tables. As will be seen, this requires several assumptions which should be considered explicitly. We therefore continue, in Section 9.2, with the discussion of a simulation model to find out in which way conclusions depend on alternative assumptions.

10. Since every model requires in some way a representation of its subject matter, there is no sharp distinction between representational and analytical models. The distinction is mainly a question of emphasis. Nevertheless, there is a specific concern, not normally present in the construction of representational models, that justifies a distinction. Analytical models, as we propose to understand this term, are intended to support speculations about relationships and rules. How this can be done depends on the conceptual framework.
In the construction of a behavioral model, one would need to think about relationships between individuals and the rules of their behavior. This will not be discussed further because, in the present text, we only deal with statistical models. Relationships then concern statistical distributions or quantities derived from them. Furthermore, because statistical models do not explicitly refer to individuals, it is not necessary to speculate about "behavioral rules" for the model's objects. Instead, rules only concern the argumentation used to establish the model. As an example, we discuss in Section 13.3 a rudimentary model of the baby boom that occurred in Germany in the period 1955–1965. The modal question motivating the model concerns "timing effects" on the development of the number of newborn children. The model tries to add to an understanding of the baby boom by showing that, without certain "timing effects", a quite different development might have occurred. This will not be a causal explanation. In fact, we use a statistical model without any reference to individuals who can bring about changes by their activities. Consequently, the rules used in the argumentation also do not refer to the behavior of individuals. Instead, they solely concern logical implications of hypothetical assumptions which, in turn, relate to the distribution of birth events as represented by cumulated cohort birth rates. More abstract versions of analytical models will be discussed in later chapters. The model introduced in Chapter 17 is not related to historical events, like the baby boom of the 1960s, but initially only provides a general conceptual framework for reasoning about possible demographic processes. As will be seen, this framework can nevertheless be used to gain insights into how such processes "work". Moreover, as we try to show in another chapter, the model also provides a starting point for a discussion of modal questions concerning the effects of immigration on the demographic development in Germany.

11. A third group of models will be called technical and political models. Again, the distinction is to some extent a question of emphasis. As an example, one can think of the model created by the architect mentioned above. The model can be understood as representational because its immediate purpose is to provide a view of the building planned by the architect. However, once the plan is to be realized, new modal questions arise: How can, or should, the work be done? As a consequence, the model, or variants derived from it, also has to serve reasoning about these additional questions. It must be transformed into a technical model that can actually be used as a guide to realize the initial idea. Of course, depending on the kind of artefacts, or systems in the original sense of this word, which are planned and possibly realized, there is a great variety in the details of corresponding models. An important distinction concerns whether such systems also contain human individuals who can act in their own right. If this is not the case, we propose to speak of technical models, otherwise of political models.
However, this distinction is mainly important for behavioral models. Statistical models and, in particular, the demographic models discussed in the present text are only concerned with assumptions about statistical distributions or quantities derived from them. Distinctions concerning the behavior of objects, whether they should be considered as building materials or as actors, are therefore not directly relevant. Political questions only come into play when models are also used in political discussions and decision-making. Prominent examples would be models used for population projections.

12. Finally, models can be distinguished with regard to the linguistic tools used in their formulation. While most models use symbolic, mainly mathematical, notations, this is not an essential feature. As an example, one can think of Keynes' "General Theory", which was originally developed without any use of symbolic notation. However, the same example also shows that using symbolic notations can help in the understanding of a model and its potential use for reasoning about modal questions. Furthermore, using symbolic notations often allows an easier understanding of the assumptions built into a model and supports the discussion of their implications. In particular, almost all statistical models employ symbolic notations, as these are already available from the initial introduction of the conceptual framework. A further distinction concerns the technical means used to derive implications of the assumptions put into a model. The classical way is reasoning, supported by paper and pencil and, if available, further instruments. New methods became available with the modern computer. In particular, the computer makes it possible to implement so-called simulation models. An example will be discussed in Section 9.2.1.

Part II

Data and Methods

Chapter 6

Basic Demographic Data
Based on the conceptual framework introduced in Part I, we now begin with a description and analysis of the demographic development in Germany. The present chapter provides a brief presentation of some basic figures concerning the number of people and then discusses age and sex distributions.

6.1

Data Sources

1. We begin with a few remarks about data sources.1 Most of the basic data are provided by oﬃcial statistics [amtliche Statistik], its central oﬃce in Germany being the Statistisches Bundesamt.2 In this and the following chapters we mainly rely on such data from oﬃcial statistics. Supplementary data from retrospective surveys will be used in later chapters. 2. Most demographic data published by oﬃcial statistics are based on two sources, censuses and population registers, corresponding loosely to the distinction between stock and ﬂow quantities (see Section 3.3). A census [Volksz¨hlung] is intended to provide information about the number of a people who live in a certain region at a speciﬁc date, so it is a kind of stocktaking.3 In contrast, population registers record events, in particular, births, deaths, marriages and migrations. Oﬃcial statistics uses both data sources. Since censuses only take place at greater temporal intervals, data from population registers are used to provide estimates of the population size in years between censuses. 3. Like the political history of Germany, also its history of censuses is quite irregular. A publication of the Statistisches Bundesamt that deals with the historical development of oﬃcial statistics in Germany provides the following information:
“After the territorial reorganization of the successor states of the Holy Roman Empire of the German Nation at the Congress of Vienna, a census was conducted for the first time in 1816 in Preußen, within its new boundaries. The other states of the German Confederation subsequently conducted censuses as well. However, because of differing reference dates and differing definitions of the characteristics recorded, their results are hardly comparable with one another. Only with the creation of the Norddeutscher Zollverein in 1834 was a greater uniformity of procedure achieved in the larger part of the later Deutsches Reich. From then until 1867, a census took place every three years at the beginning of December in the member states of the Zollverein. The remaining German states joined this procedure only in 1867, so that on December 3 of that year, for the first time, a count was taken in all German states on the same date. The next census followed the founding of the Reich, on December 1, 1871. From December 1, 1875, onwards, censuses were conducted every five years. The last count before World War I took place on December 1, 1910. Almost 15 years then passed before a census covering the entire territory of the Reich at that time could take place again, on June 16, 1925. An earlier count, conducted in October 1919, had only a provisional character because conditions had not yet stabilized. The five-year rhythm that the 1925 count aimed to restore could not be maintained because of the world economic crisis. The next count therefore took place only eight years later, on June 16, 1933. Six years after that, on May 19, 1939, it was followed by the last count before the outbreak of World War II. The next census was conducted on October 29, 1946, by order of the occupying powers. For the same reasons as the count of 1919, it could not meet the standards normally required, but it was of great importance for coping with the emergency situation of that time. It was the last count that took place simultaneously in the four occupation zones with a uniform survey program. It was followed on September 13, 1950, by the first census in the federal territory [Bundesgebiet]. Further censuses, at intervals of about ten years, took place on June 6, 1961, and May 27, 1970.” (Statistisches Bundesamt 1972, p. 89; translated from German)

Since then, a further census in the territory of the former FRG took place on May 25, 1987. Censuses in the territory of the former GDR were performed in 1950 (August 31), 1964 (December 31), and 1981 (December 31).4

4. A further question concerns which people are counted in a census. The document just cited provides the following information:

“The counts before December 3, 1867, did not always use a uniform concept of population. In the states linked by customs treaties, the so-called Zollabrechnungsbevölkerung [customs-accounting population] was determined between 1834 and 1867. This is essentially the permanently resident population. In 1863 this concept was made more precise: persons who had been absent for more than one year were not counted as part of the Zollabrechnungsbevölkerung. In addition, the 1867 count determined for the first time the ortsanwesende Bevölkerung [population present], i.e., all persons who were staying in the census area on the reference date of the count. This concept of population dominated in the following period. In the Kaiserreich, only the population present was reported as authoritative. The 1925 count used for the first time the concept of Wohnbevölkerung [resident population], which roughly corresponds to the concept of population used between 1834 and 1867. The resident population comprised all persons who had their permanent residence in the census area on the reference date, including those temporarily absent and excluding those temporarily present. Persons with several residences were counted as part of the population at the place where they were staying on the reference date. As an exception to this rule, lodgers (including domestic servants, pupils, and students with a second residence) were always counted as part of the resident population at their place of work or study. With only insignificant deviations, this concept of population underlies all subsequent censuses as well as the figures on population development [. . .].” (Statistisches Bundesamt 1972, p. 89; translated from German)

It might be added that the concept of population used by official statistics in Germany changed slightly once more with the introduction of a new registration law [Meldegesetz] in 1983.

5. The second main source of demographic data consists of population registers. Different kinds of such registers exist in Germany. To provide basic demographic data, the Statistisches Bundesamt mainly uses information from two of them:
a) Registers of births, deaths, and marriages, kept by offices of local authorities called Standesamt.5
b) Registers of residences, also kept by offices of local authorities, called Einwohnermeldeamt.
In addition, there is a central register for persons without German citizenship, called Ausländerzentralregister. The Statistisches Bundesamt uses data from these registers for its statistics on internal and external migration.6

1 For an extensive survey of sources of demographic data see the reports by Carola Schmid (1993 and 2000).
2 For references to publications see Appendix A.1.
3 Most often, a census not only counts people but also records some of their properties, like age, sex, marital status, and citizenship. For information about the questionnaire used in the latest census of 1987, see Würzberger, Störtzbach and Stürmer (1986).
4 For additional information see Schmid (1993, pp. 55-57).
5 These registers were introduced in 1875. For a history of corresponding laws and institutions see Schütz (1977). Registration forms as used by the Statistisches Bundesamt have been published in Fachserie 1, Reihe 1, 1990 (pp. 312-323).
6 Fachserie 1, Reihe 1, 1999 (pp. 13-14).

6.2

Number of People

1. Our first question concerns the number of people who lived in Germany during its history. Data from official statistics begin with the first census in Preußen in 1816. One difficulty is that the political boundaries of Germany have changed often during its history. The latest change occurred in October 1990 through unification with the former GDR (Deutsche Demokratische Republik). Since we are mainly interested in the development after World War II, it suffices to distinguish two territories: (a) the territory of the former FRG (Bundesrepublik Deutschland),7 and (b) the territory of the former GDR (including the eastern part of Berlin). We simply speak of Germany when referring to both territories.

2. The data in Table 6.2-1 are taken from the Statistisches Jahrbuch 2001 (p. 44) and refer to the territory of the former FRG. The yearbook provides the following hints about sources.
a) The figures for 1961, 1970, and 1987 are based on census data and relate to their target dates (June 6, 1961, May 27, 1970, and May 25, 1987). The remaining figures for the period since 1946 are estimates of the midyear population size. They are derived from register data in connection with census data and data from the Wohnungsstatistik [housing statistics].8
b) The sources of the figures for earlier periods are not explicitly documented. It can be supposed that the primary sources for the period since 1871 are the censuses of 1871, 1880, 1890, 1900, 1910, 1925, 1933, and 1939, and that the figures for the years in between are estimates, possibly also based on additional register data.
c) It might further be supposed that the figures for the period before 1871 are based on the censuses that began in 1816 in Preußen and were then continued at 3-year intervals, with some delay also in other parts of the Zollverein. It is not clear, however, how the figures were adjusted to the territory of the former FRG.

7 The Saarland became part of the former FRG only in 1957. However, many time series from official statistics include the Saarland also for the period 1950–1956.
8 Actually, the figures result from backward projections. The Statistisches Jahrbuch 2001 (p. 41) provides the following remarks: “The updated figures [Fortschreibungszahlen] reported [. . .] for the years 1950 to 1970 are back-calculated population figures based on the results of the housing statistics of 25.9.1956 (1950 to 1955), the census of 6.6.1961 (1957 to 1960), and the census of 27.5.1970 (1962 to 1969). The population figures reported for the years from 1970 up to and including 1986 are updated data starting from the results of the 1970 census. The population figures reported from 30.6.1987 onwards are based on the results of the 1987 census.” (translated from German)

Table 6.2-1 Number of people (in 1000) in the territory of the former FRG. Source: Statistisches Jahrbuch 2001 (p. 44).

   t  $n_t$ |    t  $n_t$ |    t  $n_t$ |    t  $n_t$
1816  13720 | 1925  39017 | 1954  51180 | 1977  61419
1819  14150 | 1926  39351 | 1955  52382 | 1978  61350
1822  14580 | 1927  39592 | 1956  53008 | 1979  61382
1825  15130 | 1928  39861 | 1957  53656 | 1980  61538
1828  15270 | 1929  40107 | 1958  54292 | 1981  61663
1831  15860 | 1930  40334 | 1959  54876 | 1982  61596
1834  16170 | 1931  40527 | 1960  55433 | 1983  61383
1837  16570 | 1932  40737 | 1961  56175 | 1984  61126
1840  17010 | 1933  40956 | 1962  56837 | 1985  60975
1843  17440 | 1934  41168 | 1963  57389 | 1986  61010
1846  17780 | 1935  41457 | 1964  57971 | 1987  61077
1849  17970 | 1936  41781 | 1965  58619 | 1988  61450
1852  18230 | 1937  42118 | 1966  59148 | 1989  62063
1855  18230 | 1938  42576 | 1967  59286 | 1990  63254
1858  18600 | 1939  43008 | 1968  59500 | 1991  64074
1861  19050 | 1946  46190 | 1969  60067 | 1992  64865
1864  19600 | 1947  46992 | 1970  60651 | 1993  65534
1867  19950 | 1948  48251 | 1971  61280 | 1994  65858
1871  20410 | 1949  49198 | 1972  61697 | 1995  66156
1880  22820 | 1950  49989 | 1973  61987 | 1996  66444
1890  25433 | 1951  50528 | 1974  62071 | 1997  66647
1900  29838 | 1952  50859 | 1975  61847 | 1998  66697
1910  35590 | 1953  51350 | 1976  61574 | 1999  66834

Fig. 6.2-1 Graphical presentation of the data from Table 6.2-1. The scale of the ordinate is in millions.

Table 6.2-2 Number of people (in 1000) in the territory of the former FRG ($n^a_t$) and the former GDR ($n^b_t$). Source: Statistisches Jahrbuch 2001 (p. 44).

   t  $n^a_t$  $n^b_t$ |    t  $n^a_t$  $n^b_t$ |    t  $n^a_t$  $n^b_t$
1950    49989    18388 | 1967    59286    17082 | 1984    61126    16671
1951    50528        – | 1968    59500    17084 | 1985    60975    16644
1952    50859        – | 1969    60067    17076 | 1986    61010    16624
1953    51350    18178 | 1970    60651    17058 | 1987    61077    16641
1954    51180    18059 | 1971    61280    17061 | 1988    61450    16666
1955    52382    17944 | 1972    61697    17043 | 1989    62063    16614
1956    53008    17716 | 1973    61987    16980 | 1990    63254    16111
1957    53656    17517 | 1974    62071    16925 | 1991    64074    15910
1958    54292    17355 | 1975    61847    16850 | 1992    64865    15730
1959    54876    17298 | 1976    61574    16786 | 1993    65534    15645
1960    55433    17241 | 1977    61419    16765 | 1994    65858    15564
1961    56175    17125 | 1978    61350    16756 | 1995    66156    15505
1962    56837    17102 | 1979    61382    16745 | 1996    66444    15451
1963    57389    17155 | 1980    61538    16737 | 1997    66647    15405
1964    57971    16992 | 1981    61663    16736 | 1998    66697    15332
1965    58619    17028 | 1982    61596    16697 | 1999    66834    15253
1966    59148    17066 | 1983    61383    16699

Fig. 6.2-2 Graphical presentation of the data from Table 6.2-2 (curves for $n_t$, $n^a_t$, and $n^b_t$). The scale of the ordinate is in millions.

Table 6.3-1 Births ($b_t$) and deaths ($d_t$) in the territory of the former FRG (index a, columns 2 to 4) and in the territory of the former GDR (index b, columns 5 to 7); all counts in 1000. Source: Fachserie 1, Reihe 1, 1999 (pp. 43-44).

   t  $b^a_t$  $d^a_t$  $b^a_t-d^a_t$ | $b^b_t$  $d^b_t$  $b^b_t-d^b_t$
1950    812.8    528.7          284.1 |   303.9    219.6           84.3
1951    795.6    543.9          251.7 |   310.8    208.8          102.0
1952    799.1    546.0          253.1 |   306.0    221.7           84.3
1953    796.1    578.0          218.1 |   298.9    212.6           86.3
1954    816.0    555.5          260.6 |   293.7    219.8           73.9
1955    820.1    581.9          238.3 |   293.3    214.1           79.2
1956    855.9    599.4          256.5 |   281.3    212.7           68.6
1957    892.2    615.0          277.2 |   273.3    225.2           48.1
1958    904.5    597.3          307.2 |   271.4    221.1           50.3
1959    951.9    605.5          346.4 |   292.0    229.9           62.1
1960    968.6    643.0          325.7 |   293.0    233.8           59.2
1961   1012.7    627.6          385.1 |   300.8    222.7           78.1
1962   1018.6    644.8          373.7 |   298.0    234.0           64.0
1963   1054.1    673.1          381.1 |   301.5    222.0           79.5
1964   1065.4    644.1          421.3 |   291.9    226.2           65.7
1965   1044.3    677.6          366.7 |   281.1    230.3           50.8
1966   1050.3    686.3          364.0 |   268.0    225.7           42.3
1967   1019.5    687.3          332.1 |   252.8    227.1           25.7
1968    969.8    734.0          235.8 |   245.1    242.5            2.7
1969    903.5    744.4          159.1 |   238.9    243.7           -4.8
1970    810.8    734.8           76.0 |   236.9    240.8           -3.9
1971    778.5    730.7           47.9 |   234.9    235.0           -0.1
1972    701.2    731.3          -30.0 |   200.4    234.4          -34.0
1973    635.6    731.0          -95.4 |   180.3    232.0          -51.6
1974    626.4    727.5         -101.1 |   179.1    229.1          -49.9
1975    600.5    749.3         -148.7 |   181.8    240.4          -58.6
1976    602.9    733.1         -130.3 |   195.5    233.7          -38.2
1977    582.3    704.9         -122.6 |   223.2    226.2           -3.1
1978    576.5    723.2         -146.8 |   232.2    232.3           -0.2
1979    582.0    711.7         -129.7 |   235.2    232.7            2.5
1980    620.7    714.1          -93.5 |   245.1    238.3            6.9
1981    624.6    722.2          -97.6 |   237.5    232.2            5.3
1982    621.2    715.9          -94.7 |   240.1    228.0           12.1
1983    594.2    718.3         -124.2 |   233.8    222.7           11.1
1984    584.2    696.1         -112.0 |   228.1    221.2            7.0
1985    586.2    704.3         -118.1 |   227.6    225.4            2.3
1986    626.0    701.9          -75.9 |   222.3    223.5           -1.3
1987    642.0    687.4          -45.4 |   226.0    213.9           12.1
1988    677.3    687.5          -10.3 |   215.7    213.1            2.6
1989    681.5    697.7          -16.2 |   198.9    205.7           -6.8
1990    727.2    713.3           13.9 |   178.5    208.1          -29.6
1991    722.2    708.8           13.4 |   107.8    202.4          -94.7
1992    720.8    695.3           25.5 |    88.3    190.2         -101.9
1993    717.9    711.6            6.3 |    80.5    185.6         -105.1
1994    690.9    703.3          -12.4 |    78.7    181.4         -102.7
1995    681.4    706.5          -25.1 |    83.8    178.1          -94.2
1996    702.7    708.3           -5.6 |    93.3    174.5          -81.2
1997    711.9    692.8           19.1 |   100.3    167.5          -67.3
1998    682.2    688.1           -5.9 |   102.9    164.3          -61.4
1999    664.0    685.0          -21.0 |   106.7    161.3          -54.6

3. To get a first impression of the long-term development of the number of people in Germany, the data of Table 6.2-1 are plotted in Figure 6.2-1. Since the data are unevenly spaced, we have represented each number from Table 6.2-1 by a separate dot. Of course, this is just a first impression. A more careful analysis must consider the components that contributed to the overall picture: births, deaths, and migrations. Here we only add some information about the development in the two parts of Germany after World War II. Table 6.2-2 presents the basic figures, again taken from the Statistisches Jahrbuch 2001 (p. 44), and Figure 6.2-2 gives a graphical presentation. The number of people in both territories, $n_t$, is calculated by adding $n^a_t$ and $n^b_t$. Notice that the demarcation of the two territories has not been changed after October 1990; the eastern part of Berlin is considered part of the territory of the former GDR.
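A minimal Python sketch of this summation, using a few years from Table 6.2-2. The dictionary names are illustrative choices of ours and do not come from any source.

```python
# Sketch: number of people in Germany as the sum of both territories,
# n_t = n^a_t + n^b_t. Midyear figures in 1000, selected years of Table 6.2-2.

former_frg = {1990: 63254, 1995: 66156, 1999: 66834}  # n^a_t
former_gdr = {1990: 16111, 1995: 15505, 1999: 15253}  # n^b_t

# Only years available for both territories can be added up.
germany = {
    year: former_frg[year] + former_gdr[year]
    for year in sorted(former_frg.keys() & former_gdr.keys())
}

for year, total in germany.items():
    print(year, total)
```

For 1999 the sum is 82,087 thousand. Up to rounding, this agrees with the total of 82,086.6 thousand given for Germany in Table 6.5-1. The sketch deliberately restricts the sum to years present in both dictionaries, because Table 6.2-2 does not report a value of $n^b_t$ for every year.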

6.3

Births and Deaths

1. The number of people living in some territory changes with births, deaths, and migration. We should therefore consider these components to get a better understanding of the demographic development in Germany. We begin with births and deaths in the post-World War II period. Table 6.3-1 shows the basic figures as published by the Statistisches Bundesamt. Following our general convention, we denote the numbers of births and deaths that occurred during a year $t$ by $b_t$ and $d_t$, respectively. Additional indices are used to distinguish between (a) the territory of the former FRG and (b) the territory of the former GDR. Figure 6.3-1 provides a graphical view of these data.

Fig. 6.3-1 Births (solid line) and deaths (dotted line) in the territories of the former FRG (upper part) and the former GDR (lower part). The scale of the ordinate is in 1000. The data are taken from Table 6.3-1.

2. Comparing Figures 6.2-2 and 6.3-1, one observes that the turning points in the development of the number of people roughly correspond to periods where the number of births is above, or below, the number of deaths (differences are due to migration). Most remarkable is the large variation in the number of births. In the territory of the former FRG, the “baby boom” of the sixties is followed by a substantial decline in the number of births. In the territory of the former GDR, the number of births declined drastically in the years following unification in 1990.

3. A closer analysis will be given in later chapters. Simple time series data do not take into account changes in the age distribution of the population, so the conclusions they permit are quite limited. For example, the time series shown in Figure 6.3-1 do not allow any safe conclusions about changes in the length of life and conditions of mortality. It might well be that a growing number of deaths, as shown in this figure for the first two decades, is accompanied by an increase in the mean length of life. This will be further discussed in Chapter 7.

4. The development of birth and death rates in recent German history should also be related to a broader historical and international context. Since we cannot provide an adequate discussion in the present text, we only present some data that roughly indicate long-term changes.9

9 There are several studies which provide extensive discussions of the long-term demographic development in Germany. For an introduction, see Marschalck (1984). A thorough discussion of the fertility decline in the period 1871–1939 was given by Knodel (1974).

Table 6.3-2 Crude birth rates (CBR) and crude death rates (CDR) in Germany. Data for the period 1841–1943 refer to the territory of the Deutsches Reich with varying boundaries (see footnote 11); data for the period 1946–1999 refer to the territory of the former FRG. Sources: Statistisches Bundesamt, Bevölkerung und Wirtschaft 1872–1972 (pp. 101-103), and Fachserie 1, Reihe 1, 1999 (p. 50).

Year   CBR   CDR | Year   CBR   CDR | Year   CBR   CDR | Year   CBR   CDR
1841  36.4  26.2 | 1881  37.0  25.5 | 1921  25.3  13.9 | 1961  18.0  11.2
1842  37.6  27.1 | 1882  37.2  25.7 | 1922  23.0  14.4 | 1962  17.9  11.3
1843  36.0  26.9 | 1883  36.6  25.9 | 1923  21.1  13.9 | 1963  18.3  11.7
1844  35.9  24.5 | 1884  37.2  26.0 | 1924  20.5  12.3 | 1964  18.2  11.0
1845  37.3  25.3 | 1885  37.0  25.7 | 1925  20.7  11.9 | 1965  17.7  11.5
1846  36.0  27.1 | 1886  37.1  26.2 | 1926  19.5  11.7 | 1966  17.6  11.5
1847  33.3  28.3 | 1887  36.9  24.2 | 1927  18.4  12.0 | 1967  17.0  11.5
1848  33.3  29.0 | 1888  36.6  23.7 | 1928  18.6  11.6 | 1968  16.1  12.2
1849  38.1  27.1 | 1889  36.4  23.7 | 1929  17.9  12.6 | 1969  14.8  12.2
1850  37.2  25.6 | 1890  35.7  24.4 | 1930  17.5  11.1 | 1970  13.4  12.1
1851  36.7  25.0 | 1891  37.0  23.4 | 1931  16.0  11.2 | 1971  12.7  11.9
1852  35.5  28.4 | 1892  35.7  24.1 | 1932  15.1  10.8 | 1972  11.3  11.8
1853  34.6  27.2 | 1893  36.8  24.6 | 1933  14.7  11.2 | 1973  10.3  11.8
1854  34.0  27.0 | 1894  35.9  22.3 | 1934  18.0  10.9 | 1974  10.1  11.7
1855  32.2  28.1 | 1895  36.1  22.1 | 1935  18.9  11.8 | 1975   9.7  12.1
1856  33.3  25.2 | 1896  36.3  20.8 | 1936  19.0  11.8 | 1976   9.8  11.9
1857  36.0  27.2 | 1897  36.1  21.3 | 1937  18.8  11.7 | 1977   9.5  11.5
1858  36.8  26.8 | 1898  36.1  20.5 | 1938  19.6  11.6 | 1978   9.4  11.8
1859  37.5  25.7 | 1899  35.9  21.5 | 1939  20.4  12.3 | 1979   9.5  11.6
1860  36.3  23.2 | 1900  35.6  22.1 | 1940  20.0  12.7 | 1980  10.1  11.6
1861  35.7  25.6 | 1901  35.7  20.7 | 1941  18.6  12.0 | 1981  10.1  11.7
1862  35.4  24.6 | 1902  35.1  19.4 | 1942  14.9  12.0 | 1982  10.1  11.6
1863  37.5  25.7 | 1903  33.8  20.0 | 1943  16.0  12.1 | 1983   9.7  11.7
1864  37.8  26.2 | 1904  34.0  19.6 | 1944     –     – | 1984   9.5  11.4
1865  37.6  27.6 | 1905  33.0  19.8 | 1945     –     – | 1985   9.6  11.6
1866  37.8  30.6 | 1906  33.1  18.2 | 1946  16.1  13.0 | 1986  10.3  11.5
1867  36.8  26.1 | 1907  32.3  18.0 | 1947  16.4  12.1 | 1987  10.5  11.3
1868  36.8  27.6 | 1908  32.1  18.1 | 1948  16.5  10.5 | 1988  11.0  11.2
1869  37.8  26.9 | 1909  31.0  17.2 | 1949  16.8  10.4 | 1989  11.0  11.2
1870  38.5  27.4 | 1910  29.8  16.2 | 1950  16.2  10.5 | 1990  11.5  11.3
1871  34.5  24.6 | 1911  28.6  17.3 | 1951  15.7  10.8 | 1991  11.3  11.1
1872  39.5  29.0 | 1912  28.3  15.6 | 1952  15.7  10.7 | 1992  11.1  10.7
1873  39.7  28.3 | 1913  27.5  15.0 | 1953  15.5  11.3 | 1993  11.0  10.9
1874  40.1  26.7 | 1914  26.8  19.0 | 1954  15.7  10.7 | 1994  10.5  10.7
1875  40.6  27.6 | 1915  20.4  21.4 | 1955  15.7  11.1 | 1995  10.3  10.7
1876  40.9  26.3 | 1916  15.2  19.2 | 1956  16.1  11.3 | 1996  10.5  10.6
1877  40.0  26.4 | 1917  13.9  20.6 | 1957  16.6  11.5 | 1997  10.7  10.4
1878  38.9  26.2 | 1918  14.3  24.8 | 1958  16.7  11.0 | 1998  10.2  10.3
1879  38.9  25.6 | 1919  20.0  15.6 | 1959  17.3  11.0 | 1999   9.9  10.3
1880  37.6  26.0 | 1920  25.9  15.1 | 1960  17.4  11.6
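The crude rates in Table 6.3-2 count events per 1000 of the midyear population. The following Python sketch is a rough consistency check, not a reconstruction of the Statistisches Bundesamt's own procedure. It recomputes the 1998 rates for the territory of the former FRG from the births and deaths in Table 6.3-1 and the midyear population in Table 6.2-1. The helper `crude_rate` is a hypothetical name introduced here.

```python
# Sketch: crude birth and death rates for the territory of the former FRG, 1998.
# Counts in 1000: births and deaths from Table 6.3-1, midyear population from
# Table 6.2-1. The units cancel, so the result is events per 1000 inhabitants.

def crude_rate(events: float, midyear_population: float) -> float:
    """Events per 1000 of the midyear population."""
    return 1000.0 * events / midyear_population

births = 682.2      # b_t
deaths = 688.1      # d_t
population = 66697  # n_t, midyear estimate

cbr = crude_rate(births, population)
cdr = crude_rate(deaths, population)
print(f"CBR = {cbr:.1f}, CDR = {cdr:.1f}")  # CBR = 10.2, CDR = 10.3
```

Both values agree with the 1998 entries of Table 6.3-2. For some other years the recomputed value differs from the published one in the last digit. For 1999, for example, one gets a CDR of 10.2 instead of the published 10.3, possibly because the published rates are based on unrounded counts.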

We simply use crude birth and death rates, which the Statistisches Bundesamt has published for most years since 1841.10 Table 6.3-2 shows the data.11 The crude birth rates and the crude death rates are calculated per 1000 of the midyear population. Figure 6.3-2 provides a visual impression. The plot clearly shows the long-term decline of both the crude birth rates and the crude death rates. It also shows that, until about 1970, birth rates were in most years considerably higher than death rates; the only dramatic exception is the period of World War I.

Fig. 6.3-2 Crude birth rates (solid line) and crude death rates (dotted line) for the period 1841–1999 in Germany, based on the data shown in Table 6.3-2.

10 The crude birth and death rates are defined, respectively, as $b_t/n_t$ and $d_t/n_t$, multiplied by 1000.
11 For the period 1841–1943, the source uses the term ‘Reichsgebiet’. For the years 1938 to 1943, it adds the remark “Gebietsstand 31.12.1937” [territorial status as of December 31, 1937] (Statistisches Bundesamt 1972, p. 103). For the years 1871–1918, the data refer to the territory of the Deutsches Reich; this presumably also holds for previous years (see Statistisches Jahrbuch für das Deutsche Reich 1919, p. 2). Notice that, for the years before 1871, other sources sometimes provide different figures which refer to the territory of the Deutscher Zollverein.

6.4

Accounting Equations

1. Changes in the size of a population are a result of births, deaths, and migration. The basic relationships can be expressed by accounting equations. As was discussed in Section 3.3, there are two variants. The Statistisches Bundesamt has also published population size data which refer to the end of each year, so we can use the second variant:

$$ n_{t+1} = n'_t = n_t + b_t - d_t + m^i_t - m^o_t \tag{6.4.1} $$

In this equation, $t$ refers to calendar years, and $n_t$ and $n'_t$ denote, respectively, the population size at the beginning and at the end of year $t$. These are the stock quantities. The other symbols represent flow quantities, that is, numbers of events which occurred during the year: births ($b_t$), deaths ($d_t$), in-migration ($m^i_t$), and out-migration ($m^o_t$).

2. Values for $n_t$ are shown in the second column of Tables 6.4-1 and 6.4-2. The first table refers to the territory of the former FRG, the second to the territory of the former GDR.12 Both tables also show the numbers of births ($b_t$) and deaths ($d_t$), which are identical with the entries in Table 6.3-1. These data already suffice to calculate the balance of migration. As implied by the accounting equation (6.4.1), one gets

$$ m^i_t - m^o_t = (n'_t - n_t) - (b_t - d_t) \tag{6.4.2} $$

This equation has been used to calculate the entries in the last column of Tables 6.4-1 and 6.4-2.

12 These data are published in Fachserie 1, Reihe 1, 1999 (p. 30).

Table 6.4-1 Population changes in the territory of the former FRG. All counts in 1000. Source: Fachserie 1, Reihe 1, 1999 (p. 30 and p. 43).

   t    $n_t$   $b_t$   $d_t$  $b_t-d_t$  $m^i_t-m^o_t$
1951  50336.1   795.6   543.9      251.7          138.2
1952  50726.0   799.1   546.0      253.1           72.8
1953  51051.9   796.1   578.0      218.1          369.6
1954  51639.6   816.0   555.5      260.6          226.6
1955  52126.8   820.1   581.9      238.3          333.2
1956  52698.3   855.9   599.4      256.5          364.0
1957  53318.8   892.2   615.0      277.2          397.8
1958  53993.8   904.5   597.3      307.2          305.0
1959  54606.0   951.9   605.5      346.4          171.0
1960  55123.4   968.6   643.0      325.7          335.7
1961  55784.8  1012.7   627.6      385.1          419.2
1962  56589.1  1018.6   644.8      373.7          284.4
1963  57247.2  1054.1   673.1      381.1          236.2
1964  57864.5  1065.4   644.1      421.3          301.7
1965  58587.5  1044.3   677.6      366.7          342.4
1966  59296.6  1050.3   686.3      364.0          132.3
1967  59792.9  1019.5   687.3      332.1         -176.5
1968  59948.5   969.8   734.0      235.8          278.7
1969  60463.0   903.5   744.4      159.1          572.5
1970  61194.6   810.8   734.8       76.0         -269.4
1971  61001.2   778.5   730.7       47.9          453.4
1972  61502.5   701.2   731.3      -30.0          336.9
1973  61809.4   635.6   731.0      -95.4          387.4
1974  62101.4   626.4   727.5     -101.1           -8.8
1975  61991.5   600.5   749.3     -148.7         -198.2
1976  61644.6   602.9   733.1     -130.3          -72.3
1977  61442.0   582.3   704.9     -122.6           33.3
1978  61352.7   576.5   723.2     -146.8          115.8
1979  61321.7   582.0   711.7     -129.7          247.3
1980  61439.3   620.7   714.1      -93.5          312.1
1981  61657.9   624.6   722.2      -97.6          152.4
1982  61712.7   621.2   715.9      -94.7          -71.9
1983  61546.1   594.2   718.3     -124.2         -115.2
1984  61306.7   584.2   696.1     -112.0         -145.4
1985  61049.3   586.2   704.3     -118.1           89.3
1986  61020.5   626.0   701.9      -75.9          195.9
1987  61140.5   642.0   687.4      -45.4          143.0
1988  61238.1   677.3   687.5      -10.3          487.3
1989  61715.1   681.5   697.7      -16.2          980.1
1990  62679.0   727.2   713.3       13.9         1032.8
1991  63725.7   722.2   708.8       13.4          745.7
1992  64484.8   720.8   695.3       25.5          778.9
1993  65289.2   717.9   711.6        6.3          444.2
1994  65739.7   690.9   703.3      -12.4          279.9
1995  66007.2   681.4   706.5      -25.1          359.9
1996  66342.0   702.7   708.3       -5.6          247.0
1997  66583.4   711.9   692.8       19.1           85.5
1998  66688.0   682.2   688.1       -5.9           65.2
1999  66747.3   664.0   685.0      -21.0          219.9

Table 6.4-2 Population changes in the territory of the former GDR. All counts in 1000. Source: Fachserie 1, Reihe 1, 1999 (p. 30 and p. 44).

   t    $n_t$   $b_t$   $d_t$  $b_t-d_t$  $m^i_t-m^o_t$
1951  18388.2   310.8   208.8      102.0         -140.1
1952  18350.1   306.0   221.7       84.3         -134.3
1953  18300.1   298.9   212.6       86.3         -274.3
1954  18112.1   293.7   219.8       73.9         -184.5
1955  18001.5   293.3   214.1       79.2         -248.5
1956  17832.2   281.3   212.7       68.6         -297.2
1957  17603.6   273.3   225.2       48.1         -241.0
1958  17410.7   271.4   221.1       50.3         -149.3
1959  17311.7   292.0   229.9       62.1          -87.9
1960  17285.9   293.0   233.8       59.2         -156.6
1961  17188.5   300.8   222.7       78.1         -187.3
1962  17079.3   298.0   234.0       64.0           -7.4
1963  17135.9   301.5   222.0       79.5          -34.3
1964  17181.1   291.9   226.2       65.7         -243.2
1965  17003.6   281.1   230.3       50.8          -14.7
1966  17039.7   268.0   225.7       42.3          -10.6
1967  17071.4   252.8   227.1       25.7           -7.2
1968  17089.9   245.1   242.5        2.7           -5.4
1969  17087.2   238.9   243.7       -4.8           -7.9
1970  17074.5   236.9   240.8       -3.9           -2.3
1971  17068.3   234.9   235.0       -0.1          -14.5
1972  17053.7   200.4   234.4      -34.0           -8.4
1973  17011.3   180.3   232.0      -51.6           -8.4
1974  16951.3   179.1   229.1      -49.9          -10.6
1975  16890.8   181.8   240.4      -58.6          -12.0
1976  16820.2   195.5   233.7      -38.2          -14.9
1977  16767.0   223.2   226.2       -3.1           -6.0
1978  16757.9   232.2   232.3       -0.2           -6.3
1979  16751.4   235.2   232.7        2.5          -13.6
1980  16740.3   245.1   238.3        6.9           -7.7
1981  16739.5   237.5   232.2        5.3          -39.2
1982  16705.6   240.1   228.0       12.1          -15.4
1983  16702.3   233.8   222.7       11.1          -11.9
1984  16701.5   228.1   221.2        7.0          -48.5
1985  16660.0   227.6   225.4        2.3          -22.2
1986  16640.1   222.3   223.5       -1.3            1.1
1987  16639.9   226.0   213.9       12.1            9.4
1988  16661.4   215.7   213.1        2.6           10.6
1989  16674.6   198.9   205.7       -6.8         -234.0
1990  16433.8   178.5   208.1      -29.6         -376.6
1991  16027.6   107.8   202.4      -94.7         -143.1
1992  15789.8    88.3   190.2     -101.9           -2.5
1993  15685.4    80.5   185.6     -105.1           18.1
1994  15598.4    78.7   181.4     -102.7           35.7
1995  15531.4    83.8   178.1      -94.2           38.3
1996  15475.5    93.3   174.5      -81.2           34.4
1997  15428.7   100.3   167.5      -67.3            8.0
1998  15369.4   102.9   164.3      -61.4          -18.3
1999  15289.7   106.7   161.3      -54.6          -17.8

Fig. 6.4-1 Balance of migration in the territory of the former FRG (solid line) and the territory of the former GDR (dotted line). The scale of the ordinate is in 1000. Data are taken from Tables 6.4-1 and 6.4-2.

3. Figure 6.4-1 shows the development of the balance of migration as given in the last column of Tables 6.4-1 and 6.4-2, respectively. For the territory of the former FRG, in-migration exceeded out-migration in most years. Until 1961, when the GDR closed its borders, the excess of in-migration mainly resulted from people who came from the GDR into the FRG. A similar movement took place in the years around unification (1989–1991). One should note that our data refer separately to the territories of the former FRG and GDR. They therefore include migrations between the two territories also after unification in 1990.

4. Table 6.4-1 shows that, beginning in 1972, the number of deaths exceeded the number of births in most of the following years. Thus, without an excess of in-migration, the population size would have declined in this period. It is difficult, however, to answer the modal question: Which development of population sizes would have taken place in the absence of migration? Simply subtracting $(m^i_t - m^o_t)$ from $n_{t+1}$ will not give a convincing answer. In the absence of migration, the development of births and deaths would also have been different. We therefore postpone a discussion of such modal questions to a later chapter.
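Equation (6.4.2) can be applied mechanically. The following Python sketch is a minimal illustration under our own choice of data layout, one dictionary per column. It recomputes the balance of migration for 1951 to 1953 in the territory of the former FRG from the columns $n_t$, $b_t$, and $d_t$ of Table 6.4-1, using $n'_t = n_{t+1}$ as in (6.4.1). The function name `migration_balance` is hypothetical.

```python
# Sketch of the accounting identity (6.4.2):
#   (m^i_t - m^o_t) = (n'_t - n_t) - (b_t - d_t),
# where n_t is the population at the beginning of year t and n'_t = n_{t+1}
# is the population at its end. Figures in 1000, former FRG, Table 6.4-1.

population_start = {1951: 50336.1, 1952: 50726.0, 1953: 51051.9, 1954: 51639.6}
births = {1951: 795.6, 1952: 799.1, 1953: 796.1}
deaths = {1951: 543.9, 1952: 546.0, 1953: 578.0}

def migration_balance(year: int) -> float:
    """Net migration in `year`, derived from (6.4.2) and rounded to 0.1."""
    change = population_start[year + 1] - population_start[year]  # n'_t - n_t
    natural_increase = births[year] - deaths[year]                 # b_t - d_t
    return round(change - natural_increase, 1)

balances = {year: migration_balance(year) for year in births}
print(balances)  # {1951: 138.2, 1952: 72.8, 1953: 369.6}
```

The results reproduce the published entries 138.2, 72.8, and 369.6. The published counts are rounded to hundreds of persons, so recomputed balances can differ from the published ones in the last digit. For 1954, for example, one gets 226.7 instead of the published 226.6.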

6.5

Age and Sex Distributions

From a demographic point of view, the two most important characteristics of people are their age and their sex. The remaining sections of this chapter therefore discuss how to construct statistical distributions of these characteristics in a population, and how to present them graphically.

6.5.1

Age Distributions

1. Both age and sex can be represented by statistical variables. We begin with age. Referring to a population $\Omega_t$, one can define a statistical variable

$$ A_t : \Omega_t \longrightarrow \tilde{A} := \{0, 1, 2, 3, \dots\} $$

that provides, for each person $\omega \in \Omega_t$, a value $A_t(\omega)$ which is the age of the person at the temporal location $t$. We will assume that age is measured in completed years, so the elements of the property space $\tilde{A}$ are to be interpreted as completed years. Given this conceptual approach, corresponding data could be provided in a table as follows:

$$
\begin{array}{ll}
\omega & A_t(\omega) \\
\hline
\omega_1 & A_t(\omega_1) \\
\vdots & \vdots \\
\omega_{n_t} & A_t(\omega_{n_t})
\end{array}
\tag{6.5.1}
$$

where $n_t$ denotes the number of persons in $\Omega_t$. Each person is identified by a (fictitious) name, often some arbitrary identification number, given in the first column. The second column contains the corresponding value of the variable $A_t$, in this example, the person's age.

2. Since the number of persons is often very large, it would be senseless to print the full table. The next step is therefore to extract relevant information. The statistical approach, as discussed in Section 4.2, always begins with a calculation of frequencies. The first step is to find the realized property space

$$ \tilde{A}^*_t := A_t(\Omega_t) = \{ A_t(\omega) \mid \omega \in \Omega_t \} $$

that is, the set of all elements of $\tilde{A}$ which occur at least once in the population $\Omega_t$. This already provides a first piece of information about the range of the variable. In a second step, one can calculate for each element of $\tilde{A}^*_t$ the frequency of its occurrence in the population $\Omega_t$. Since we are concerned here with ages, we will refer to the elements of $\tilde{A}^*_t$ by the letter $\tau$, following our convention to generally denote ages by $\tau$. For the calculation of frequencies, one needs to distinguish between absolute and relative frequencies. The absolute frequency of some age value $\tau \in \tilde{A}^*_t$ is simply the number of persons in $\Omega_t$ of age $\tau$. Using the notation introduced in Section 4.2, the absolute frequencies can be written as

$$ P^*[A_t](\tau) = \left| \{ \omega \in \Omega_t \mid A_t(\omega) = \tau \} \right| $$

The corresponding relative frequencies are

$$ P[A_t](\tau) = \frac{P^*[A_t](\tau)}{n_t} $$

Of course, it will always be true by definition that

$$ \sum_{\tau \in \tilde{A}^*_t} P^*[A_t](\tau) = n_t \qquad \text{and} \qquad \sum_{\tau \in \tilde{A}^*_t} P[A_t](\tau) = 1 $$

Having performed the calculations for all values in $\tilde{A}^*_t$, one gets a statistical distribution, in this example an age distribution for the population $\Omega_t$. This distribution provides the required statistical information and can be tabulated or presented graphically.

3. Actually, most data from official statistics are already in the form of statistical distributions. To continue with our example, the latest data currently available for the age distribution in Germany are published by the Statistisches Bundesamt in Fachserie 1, Reihe 1, 1999 (pp. 64-65).13 These data refer to the midyear population size in the year 1999 and

13 After this section had been written, an update was published in the STATIS database of the Statistisches Bundesamt. Some of these data will be used in a later chapter for the construction of life tables.

72

6

BASIC DEMOGRAPHIC DATA

6.5

AGE AND SEX DISTRIBUTIONS

73

Table 6.5-1 Midyear population size (in 1000) in the year 1999 in Germany by age and sex; age in completed years. Source: Fachserie 1, Reihe 1, 1999 (pp. 64-65). τ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 nt,τ 777.9 800.2 805.7 785.1 776.6 797.6 823.0 848.6 908.6 947.0 954.9 957.9 939.3 915.4 898.5 901.6 919.5 933.2 938.6 924.6 903.8 902.7 902.2 893.1 897.5 918.5 977.7 1085.3 1172.2 1248.5 1328.6 1380.5 1419.5 1445.6 1464.1 1473.5 1447.0 1414.9 1386.3 1347.1 1293.6 1250.0 1225.0 1194.7 1170.0 1145.7 nm t,τ 399.6 410.8 413.8 403.1 398.8 409.8 422.1 435.2 466.4 486.0 490.1 492.5 482.6 469.6 461.2 463.2 472.8 479.9 481.4 473.3 462.2 461.0 460.3 456.3 458.8 469.2 500.6 557.1 603.4 644.1 686.0 712.4 732.7 748.1 758.2 761.7 746.6 727.4 711.2 690.9 663.6 641.1 627.2 610.0 594.4 578.8 nf t,τ 378.3 389.4 391.9 382.0 377.7 387.9 400.9 413.4 442.1 461.0 464.8 465.4 456.7 445.8 437.3 438.4 446.7 453.2 457.2 451.3 441.6 441.7 442.0 436.7 438.6 449.3 477.1 528.2 568.8 604.4 642.6 668.1 686.8 697.5 705.9 711.7 700.4 687.5 675.1 656.2 630.1 608.9 597.8 584.7 575.7 566.9 τ 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90∗ Total nt,τ 1132.0 1124.4 1120.0 1107.1 1047.1 977.2 892.9 790.2 867.9 1000.4 997.3 1089.7 1228.5 1252.0 1199.8 1121.1 1069.6 1037.5 983.4 854.9 760.9 764.5 789.4 793.7 773.4 736.0 694.2 674.7 635.6 593.0 583.6 588.2 571.0 472.7 317.8 230.7 221.5 247.9 292.7 297.7 265.3 224.2 186.3 155.8 481.4 82086.6 nm t,τ 570.3 566.1 563.9 558.6 529.9 494.2 450.6 397.3 434.9 502.2 500.9 545.7 612.5 621.7 593.3 551.2 522.7 502.8 472.3 409.8 361.2 359.3 365.7 362.4 348.0 319.3 283.8 258.7 228.3 202.3 196.3 193.2 180.1 144.3 96.6 69.0 65.1 70.4 79.5 78.1 67.9 55.5 44.3 35.8 106.6 40048.0 nf t,τ 561.7 558.4 556.0 548.5 517.2 483.0 442.2 393.0 433.0 498.2 496.4 544.1 616.0 630.3 606.6 569.9 546.9 534.7 511.1 445.1 
399.7 405.2 423.6 431.2 425.4 416.7 410.4 416.0 407.4 390.7 387.4 395.0 391.0 328.4 221.2 161.8 156.4 177.5 213.2 219.6 197.4 168.7 142.0 120.0 374.8 42038.6

record age in completed years. We will use the following abbreviations: nt,τ := number of persons in t being of age τ nm := number of men in t being of age τ t,τ nf := number of women in t being of age τ t,τ We also deﬁne
∞ ∞

nm := t τ =0

nm t,τ

and nf := t τ =0

nf t,τ

to denote the total number of men and women, respectively. The total population size is then given by nt = nm + nf . t t 4. Table 6.5-1 shows values (in 1000) for the year t = 1999. This table is actually a cross-tabulation with respect to age and sex, but for the moment we are only interested in a simple classiﬁcation by age, that is, in the values of nt,τ . These are absolute frequencies, and the relationship with our previous notation is therefore given by nt,τ = P∗ [At ](τ ). Of course, one immediately also gets relative frequencies: P[At ](τ ) = nt,τ nt

The last age category in Table 6.5-1, denoted by 90∗ , comprises age 90 and all higher ages. The frequency for this age category is therefore not directly comparable with the other frequencies. However, one can safely assume that
∞

nt,90∗ = τ =90

nt,τ

since this equation directly follows from the additivity of frequencies. 5. Summing up the values for nt,τ shows that nt = 82086600 which is the midyear number of people living in Germany in 1999. Compared with an original raw data ﬁle, Table 6.5-1 is obviously much smaller. However, even the condensed frequency table is diﬃcult to survey. How can one extract the information in the table in a more comprehensible form? One possibility is to aggregate the property space; this will be discussed in Section 6.5.4. Here we use a graphical display, called frequency curves, which does not require aggregation. The basic idea is simple: One employs a twodimensional coordinate system and uses the horizontal axis (abscissa) to represent the elements of the property space and the vertical axis (ordinate) to represent the (absolute or relative) frequencies. As an example, Figure 6.5-1 shows the age distribution, in terms of absolute frequencies

$n_{t,\tau}$, in the form of a frequency curve. This plot may then be used as a starting point for further interpretation.

Fig. 6.5-1 Plot of a frequency curve that represents the age distribution in Germany in the year 1999. Data are taken from Table 6.5-1, omitting category 90∗. The scale of the ordinate is in 1000.

6. Note that the age distribution in a given year is the result of demographic events which occurred over a long period of time, beginning about 100 years ago when the oldest persons still living were born. In this sense an age distribution is to be viewed as a transitory result of a long-term demographic process. In general, the number of people of age $\tau$ in year $t$ results from the number born $\tau$ years before $t$ and having survived until age $\tau$.14 This relates the age categories to historically earlier periods. For example, the low frequency of people of age 53 can be traced back to the year 1946 and related to the low number of births in that year.

14 Of course, people may have been born anywhere and become members of a population set $\Omega_t$ by immigration.

6.5.2

Decomposition by Sex

1. Instead of age, one can use any other property space to construct a statistical distribution. As a second example, and in order to explain the idea of a two-dimensional distribution, we refer to people's sex. So we set up another statistical variable

$$S_t : \Omega_t \longrightarrow \tilde{S} := \{0, 1\}$$

which assigns to each member $\omega \in \Omega_t$ a value $S_t(\omega) \in \tilde{S}$ which is 0 for men and 1 for women. Assuming that data are given in the form of a data matrix

    ω          S_t(ω)
    ω_1        S_t(ω_1)
    ...        ...
    ω_{n_t}    S_t(ω_{n_t})            (6.5.2)

one can calculate absolute and relative frequencies. Using the data from Table 6.5-1, and the abbreviations introduced in the previous section, one finds for $t = 1999$ the values (absolute frequencies in 1000)

$$P^*[S_t](0) = n^m_t = 40048.0, \qquad P[S_t](0) = \frac{n^m_t}{n_t} = 0.488$$
$$P^*[S_t](1) = n^f_t = 42038.6, \qquad P[S_t](1) = \frac{n^f_t}{n_t} = 0.512$$

which show that there are somewhat more female than male persons currently living in Germany.

2. We now have two frequency distributions, one for age and another one for sex, but they do not allow us, for example, to compare the age distributions of men and women. Even knowing the data in the form of (6.5.1) and (6.5.2) will not suffice, because one cannot know whether the individual members of $\Omega_t$ have been given the same names (identification numbers) in both tables. So one needs a different starting point, namely a statistical variable that provides, for each person in $\Omega_t$, simultaneously both an age and a sex. This is formally expressed by a two-dimensional variable

$$(A,S)_t : \Omega_t \longrightarrow \tilde{A} \times \tilde{S}$$

This variable is called two-dimensional because it consists of two components and correspondingly has a two-dimensional property space, the Cartesian product of $\tilde{A}$ and $\tilde{S}$. Written explicitly:

$$\tilde{A} \times \tilde{S} = \{(0,0), (0,1), (1,0), (1,1), (2,0), (2,1), \ldots\}$$

Each element of the property space is now a pair of values where the first value indicates the age and the second value indicates the sex. For example, $(A,S)_t(\omega) = (27, 1)$ would mean that $\omega$ is a female individual of age 27. Of course, given a two-dimensional variable one can derive two one-dimensional variables, in our example, one variable for age and another one for sex. The important point, however, is that given two one-dimensional variables, it will normally not be possible to reconstruct a two-dimensional variable.

3. This also means that the values of a two-dimensional variable must be tabulated simultaneously. Instead of two separate forms, like (6.5.1) and

(6.5.2), one needs a table that provides, for each individual $\omega \in \Omega_t$, a value of the two-dimensional variable $(A,S)_t$. Of course, the organization of the table is not important as long as one can identify for each person both age and sex. So we might formally identify the two-dimensional variable $(A,S)_t$ with a pair of two one-dimensional variables, $(A_t, S_t)$, and organize the table as follows:

    ω          A_t(ω)          S_t(ω)
    ω_1        A_t(ω_1)        S_t(ω_1)
    ...        ...             ...
    ω_{n_t}    A_t(ω_{n_t})    S_t(ω_{n_t})        (6.5.3)

This is often called a cross-tabulation of the variables $A_t$ and $S_t$. However, the table should not be viewed as providing values for two variables separately. The important point is that we want to construct a two-dimensional distribution which provides a frequency for each element in the combined property space $\tilde{A} \times \tilde{S}$. Given any element $(\tau, s) \in \tilde{A} \times \tilde{S}$, we want to calculate the number of individuals in $\Omega_t$ who are of age $\tau$ and sex $s$, that is, the frequency

$$P^*[A_t,S_t](\tau,s) := |\{\omega \in \Omega_t \mid (A,S)_t(\omega) = (\tau,s)\}|$$

The corresponding relative frequencies are then given by

$$P[A_t,S_t](\tau,s) := \frac{P^*[A_t,S_t](\tau,s)}{n_t}$$

These frequencies can finally be documented in a frequency table. For our example, such a table might be organized as follows:

    τ      s      P*[A_t,S_t](τ,s)      P[A_t,S_t](τ,s)
    0      0      P*[A_t,S_t](0,0)      P[A_t,S_t](0,0)
    0      1      P*[A_t,S_t](0,1)      P[A_t,S_t](0,1)
    1      0      P*[A_t,S_t](1,0)      P[A_t,S_t](1,0)
    1      1      P*[A_t,S_t](1,1)      P[A_t,S_t](1,1)
    ...    ...    ...                   ...                   (6.5.4)

Each row shows the absolute and relative frequencies for one specific element in the combined property space $\tilde{A} \times \tilde{S}$. Notice that this way of organizing a frequency table can also be used for distributions having three or more dimensions. For two-dimensional distributions, another possibility would be to associate the elements of the two components of the property space with, respectively, the rows and columns of a frequency table. In our example, the table would then look as follows:

    τ      s = 0                 s = 1
    0      P*[A_t,S_t](0,0)      P*[A_t,S_t](0,1)
    1      P*[A_t,S_t](1,0)      P*[A_t,S_t](1,1)
    2      P*[A_t,S_t](2,0)      P*[A_t,S_t](2,1)
    ...    ...                   ...                   (6.5.5)

In this example we have used absolute frequencies; it should be obvious, however, that the same kind of table can also be used to represent relative frequencies.

4. It is now easily seen that Table 6.5-1, which was used in Section 6.5.1 to report the data as published by the Statistisches Bundesamt, is actually a combination of two frequency tables. The first one, already discussed in the previous section, consists of the first two columns. This part of the table refers to a one-dimensional age distribution, that is, it tabulates the values of a function

$$\tau \longmapsto n_{t,\tau} = P^*[A_t](\tau)$$

In addition, the first, third, and fourth columns document a two-dimensional frequency distribution which refers simultaneously to age and sex. Obviously, this part of the table is organized in the same way as shown schematically in (6.5.5) and may be explicitly written as a function in the following way:

$$(\tau, s) \longmapsto P^*[A_t,S_t](\tau,s) = \begin{cases} n^m_{t,\tau} & \text{if } s = 0 \\ n^f_{t,\tau} & \text{if } s = 1 \end{cases}$$

5. Again, the question arises how to represent the data in a more accessible way. A general approach is to consider conditional distributions: for a two-dimensional distribution, one considers the distribution of one of its components in sub-populations defined by specific values of the other component. To illustrate, in our example one can use the elements of $\tilde{S}$ to define two sub-populations

$$\Omega^m_t := \{\omega \in \Omega_t \mid S_t(\omega) = 0\} \qquad\text{and}\qquad \Omega^f_t := \{\omega \in \Omega_t \mid S_t(\omega) = 1\}$$

the first one comprising all male individuals and the second one all female individuals in $\Omega_t$. This then allows us to define separate distributions of $A_t$ for the two sub-populations:

$$P[A_t \mid S_t = s](\tau) := \frac{P[A_t,S_t](\tau,s)}{P[S_t](s)}$$

Specifically, in our example, $P[A_t \mid S_t = 0](\tau) = n^m_{t,\tau}/n^m_t$ is the age distribution in sub-population $\Omega^m_t$, and $P[A_t \mid S_t = 1](\tau) = n^f_{t,\tau}/n^f_t$ is the age distribution in sub-population $\Omega^f_t$.

6. Conditional distributions are most often expressed in terms of relative frequencies. It is easy, however, to derive corresponding absolute frequencies. One simply has to multiply $P[A_t \mid S_t = s](\tau)$ by the number of people in the sub-population defined by $S_t = s$, explicitly written:

$$P^*[A_t \mid S_t = s](\tau) := P[A_t \mid S_t = s](\tau)\; P^*[S_t](s)$$

The third column of Table 6.5-1 provides values for $P^*[A_t \mid S_t = 0](\tau) = n^m_{t,\tau}$ and the fourth column provides values for $P^*[A_t \mid S_t = 1](\tau) = n^f_{t,\tau}$.

7. In the same way as was done in the previous section, one can use frequency curves to get a visual impression of the distributions. In order to allow for a comparison, one can plot both curves in the same coordinate system as shown by Figure 6.5-2.15 There are obviously some remarkable differences in the age distribution of men and women, especially in the older ages.

15 In much of the demographic literature, one often finds a slightly different graphical presentation, called a population pyramid, which results from drawing the age distributions of men and women in opposite directions. To allow for an easy comparison, we prefer to plot the frequency curves in the same coordinate system.

Fig. 6.5-2 Plot of two frequency curves that represent the age distribution in Germany in 1999 (midyear) for men (solid line) and women (dotted line). Data are taken from Table 6.5-1, omitting category 90∗. The scale of the ordinate is in 1000.

6.5.3

Male-Female Proportions

1. Figure 6.5-2 shows that the proportion of male and female individuals in a society depends on age. As a natural starting point one can consider the number of male births per 100 female births. This is called the sex ratio at birth. For an arbitrary age $\tau$ the definition is:

$$\text{Sex ratio at age } \tau := \frac{\text{number of men at age } \tau}{\text{number of women at age } \tau} \quad \text{(multiplied by 100)}$$

It has often been found that the sex ratio at birth is about 105 or 106. For example, in 1999 in Germany the number of male births was 396296 and the number of female births was 374448,16 so one finds a sex ratio of 105.8.

2. Another possibility is to use proportions. We will use the notations

$$\sigma_{t,m} := \text{proportion of male births in year } t$$
$$\sigma_{t,f} := \text{proportion of female births in year } t$$

Referring again to the births in Germany in 1999, one finds $\sigma_{1999,m} = 0.514$ and $\sigma_{1999,f} = 0.486$. Given the sex ratio at birth, here written $SR_t$, one can calculate the proportions as

$$\sigma_{t,m} = \frac{SR_t}{100 + SR_t} \qquad\text{and}\qquad \sigma_{t,f} = \frac{100}{100 + SR_t}$$

3. Male and female proportions also depend on age. This can be seen by comparing age distributions for men and women as in Figure 6.5-2. Alternatively, one can calculate sex ratios or, equivalently, proportions as functions of age. Based on the data in Table 6.5-1, this is shown in Figure 6.5-3. The changes in higher ages mainly result from different mortality rates of men and women; this will be further discussed in the next chapter. In Germany, an additional source of variation is due to in-migration.

16 Fachserie 1, Reihe 1, 1999 (p. 42).

Fig. 6.5-3 Dependency of sex ratios, and male-female proportions (in percent), on age in Germany, 1999; calculated from Table 6.5-1.
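The calculations of Section 6.5.3 are easy to check by machine. The following Python sketch is only an illustration; the function names `sex_ratio` and `proportions` are our own, hypothetical choices. It reproduces the sex ratio at birth and the proportions $\sigma_{1999,m}$ and $\sigma_{1999,f}$ from the 1999 birth counts quoted above, and computes sex ratios for a few ages from the values (in 1000) of Table 6.5-1. Applying the same function to every row of that table would give the data behind Figure 6.5-3.

```python
def sex_ratio(n_men: float, n_women: float) -> float:
    """Number of men per 100 women."""
    return 100.0 * n_men / n_women


def proportions(ratio: float) -> tuple[float, float]:
    """Male and female proportions implied by a sex ratio:
    ratio / (100 + ratio) and 100 / (100 + ratio)."""
    return ratio / (100.0 + ratio), 100.0 / (100.0 + ratio)


# Births in Germany, 1999 (Section 6.5.3, item 1)
male_births, female_births = 396296, 374448
sr_birth = sex_ratio(male_births, female_births)
sigma_m, sigma_f = proportions(sr_birth)
print(f"sex ratio at birth: {sr_birth:.1f}")                 # 105.8
print(f"sigma_m = {sigma_m:.3f}, sigma_f = {sigma_f:.3f}")   # 0.514, 0.486

# A few rows of Table 6.5-1 (in 1000): age -> (men, women)
table_rows = {0: (399.6, 378.3), 30: (686.0, 642.6), 60: (593.3, 606.6), 80: (96.6, 221.2)}
for age, (men, women) in table_rows.items():
    print(f"age {age:2d}: sex ratio {sex_ratio(men, women):6.1f}")
```

The loop prints sex ratios of 105.6, 106.8, 97.8 and 43.7 for the ages 0, 30, 60 and 80; the ratio falls below 100 at older ages, in line with the excess of women visible in Figure 6.5-2. Note that `proportions(sex_ratio(m, f))` is simply `m / (m + f)` and `f / (m + f)`, so the proportions can equally be computed from the counts directly.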

6.5.4

Aggregating Age Values

1. A statistical distribution shows, for each value in a property space, its frequency in a population. The problem of a comprehensible representation of statistical distributions will therefore depend on the number of elements of a property space. If the number is small, as in the case of sex, one can simply report the frequencies in the form of a small table. If, on the other hand, the number of values is large as, for example, in the case of age, frequency tables are not easily surveyed and one needs some additional means for the presentation of a distribution. In the foregoing sections we have used frequency curves to provide graphical displays. In this section we discuss an approach that relies on the aggregation of the elements in a property space.

2. The problems which arise from a large number of different elements of a property space are solved by merging several of them into classes of properties. Formally, this means that a property space is partitioned into classes and these classes are then considered as elements of a new property space. To illustrate, we use the property space $\tilde{A}$ for age in completed years as introduced in Section 6.5.1. Its values can be partitioned into classes, for example:

$$\tilde{a}^*_1 := \{0, \ldots, 5\}, \qquad \tilde{a}^*_2 := \{6, \ldots, 18\}, \qquad \tilde{a}^*_3 := \{19, \ldots, 30\}$$
$$\tilde{a}^*_4 := \{31, \ldots, 64\}, \qquad \tilde{a}^*_5 := \{65, \ldots, 79\}, \qquad \tilde{a}^*_6 := \{80, \ldots\}$$

These age classes can then be considered as elements of a new property space

$$\tilde{A}^* := \{\tilde{a}^*_1, \tilde{a}^*_2, \tilde{a}^*_3, \tilde{a}^*_4, \tilde{a}^*_5, \tilde{a}^*_6\}$$

which, in turn, can be used to define a new variable $A^*_t : \Omega_t \longrightarrow \tilde{A}^*$. It should be obvious how its values are derived from the values of the original variable $A_t$: for any $\omega \in \Omega_t$, if $A_t(\omega) = \tau$ and $\tau \in \tilde{a}^*_j$, then $A^*_t(\omega) = \tilde{a}^*_j$. Less formally, if an individual is of age $\tau$, it is assigned the age class that contains $\tau$.

3. The distribution of $A^*_t$ is easily derived from the distribution of $A_t$ because frequencies are additive. For each $\tilde{a}^*_j \in \tilde{A}^*$, its absolute and relative frequencies are given by

$$P^*[A^*_t](\tilde{a}^*_j) = \sum_{\tau \in \tilde{a}^*_j} P^*[A_t](\tau) \qquad\text{and}\qquad P[A^*_t](\tilde{a}^*_j) = \sum_{\tau \in \tilde{a}^*_j} P[A_t](\tau)$$

respectively. To illustrate, using the data from Table 6.5-1, we find the following frequencies (absolute frequencies in 1000):

    ã*        Ages              P*[A*_t](ã*)    P[A*_t](ã*)
    ã*_1      {0, ..., 5}             4743.1         0.0578
    ã*_2      {6, ..., 18}           11886.1         0.1448
    ã*_3      {19, ..., 30}          12154.7         0.1481
    ã*_4      {31, ..., 64}          40095.6         0.4885
    ã*_5      {65, ..., 79}          10285.8         0.1253
    ã*_6      {80, ...}               2921.3         0.0356      (6.5.6)

This will be called an aggregated frequency table. Of course, the expression ‘aggregated’ is to be understood as referring to the original frequency table from which the aggregated table is derived.

6.5.5

Age Distributions since 1952

1. Any age distribution refers to a certain historical time $t$. How do age distributions change with time? We base our description of the German development over the last 50 years on data available from the STATIS database of the Statistisches Bundesamt (see Appendix A.1). The data set refers to the territory of the former FRG and covers the years 1952 to 1998.17 For each of these years, there is an age distribution (in completed years) both for men and women. Using previously introduced notations, the absolute frequencies are given as values for $n^m_{t,\tau}$ and $n^f_{t,\tau}$. In order to simplify the presentation, we disregard sex and use $n_{t,\tau} = n^m_{t,\tau} + n^f_{t,\tau}$. Schematically, the data file then looks as follows:

    τ      1952            ···    1998
    0      n_{1952,0}      ···    n_{1998,0}
    1      n_{1952,1}      ···    n_{1998,1}
    2      n_{1952,2}      ···    n_{1998,2}
    ...    ...             ···    ...
    90*    n_{1952,90*}    ···    n_{1998,90*}

The property space for age is the same as in Table 6.5-1; the last value, $90^*$, represents an open-ended class that comprises ages of 90 and above.

17 Segment 36, as updated in June 2000.

2. To begin with, we compare the age distribution in the years 1952 and 1998 with the help of frequency curves. The result is shown in Figure 6.5-4. Since population size has changed during this period,18 the figure provides curves both for absolute frequencies (in the upper plot) and relative frequencies (in the lower plot). Obviously, there is a remarkable increase in the number of people in higher ages, both in terms of absolute and relative frequencies. The comparison also shows that the marked irregularities in the age distributions are mainly due to the development of the demographic process since World War I.

18 Based on this data file, the population size (in 1000) is 51620 in 1952 and 66697 in 1998.

Fig. 6.5-4 Age distribution in the territory of the former FRG in 1998 (solid line) and 1952 (dotted line). In the upper plot, the frequency curves refer to absolute frequencies (in 1000), in the lower plot they refer to relative frequencies.

3. The question remains how to get an impression of the year-to-year changes in the age distribution for the whole period from 1952 to 1998. It is obviously not sensible to provide frequency curves separately for each of these years. The general question is how to describe a time series of frequency distributions. A simple approach would be to characterize each distribution by its mean value and then to plot the development of these means. This would show how the mean age of the population has changed over the period. In our application we find that this mean age continuously increased from 38 years in 1952 to 44 years in 1998.19

19 For these calculations we have assumed an age of 90 for all people in the age class $90^*$. Other assumptions would result in slightly higher mean age values.

4. However, just reporting the development of mean ages provides only limited information. To provide additional information, we use the method of aggregation discussed in the previous section. To illustrate, the same partition into 6 age classes is used. Calculating relative frequencies for these age classes results in proportions

$$P[A^*_t](\tilde{a}^*_1), \ldots, P[A^*_t](\tilde{a}^*_6) \qquad (t = 1952, \ldots, 1998)$$

For each year, these proportions sum to unity and can be displayed in a plot of proportions as shown in Figure 6.5-5. The plot impressively shows the rise of the proportion of elderly people during the period from 1952 to 1998.

Fig. 6.5-5 Development of the age distribution in the territory of the former FRG, from 1952 to 1998. Shown are proportions in the six age classes 0-5, 6-18, 19-30, 31-64, 65-79, and 80+.
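The aggregation of Section 6.5.4 and the mean-age summary of Section 6.5.5 can both be computed directly from a single-year age distribution. The Python sketch below uses the column $n_{t,\tau}$ of Table 6.5-1 (Germany 1999, in 1000), because the 1952-1998 FRG data file is not reproduced here; the helper names `aggregate`, `relative` and `mean_age` are hypothetical. As in footnote 19, everybody in the open class $90^*$ is assigned the age 90, which is an assumption rather than an observed value.

```python
from typing import Optional

# Column n_{t,tau} of Table 6.5-1 (Germany, midyear 1999, in 1000):
# entries 0..89 are single years of age, the last entry is the open class 90*.
N_1999 = [
    777.9, 800.2, 805.7, 785.1, 776.6, 797.6, 823.0, 848.6, 908.6, 947.0,
    954.9, 957.9, 939.3, 915.4, 898.5, 901.6, 919.5, 933.2, 938.6, 924.6,
    903.8, 902.7, 902.2, 893.1, 897.5, 918.5, 977.7, 1085.3, 1172.2, 1248.5,
    1328.6, 1380.5, 1419.5, 1445.6, 1464.1, 1473.5, 1447.0, 1414.9, 1386.3, 1347.1,
    1293.6, 1250.0, 1225.0, 1194.7, 1170.0, 1145.7, 1132.0, 1124.4, 1120.0, 1107.1,
    1047.1, 977.2, 892.9, 790.2, 867.9, 1000.4, 997.3, 1089.7, 1228.5, 1252.0,
    1199.8, 1121.1, 1069.6, 1037.5, 983.4, 854.9, 760.9, 764.5, 789.4, 793.7,
    773.4, 736.0, 694.2, 674.7, 635.6, 593.0, 583.6, 588.2, 571.0, 472.7,
    317.8, 230.7, 221.5, 247.9, 292.7, 297.7, 265.3, 224.2, 186.3, 155.8,
    481.4,
]

# Age classes of Section 6.5.4 as (lowest, highest) age; None marks the open class.
CLASSES: list[tuple[int, Optional[int]]] = [
    (0, 5), (6, 18), (19, 30), (31, 64), (65, 79), (80, None),
]


def aggregate(counts: list[float], classes: list[tuple[int, Optional[int]]]) -> list[float]:
    """Absolute frequencies of A*_t. Frequencies are additive, so each class
    frequency is the sum over its ages; index i of `counts` is age i."""
    result = []
    for low, high in classes:
        stop = len(counts) if high is None else high + 1
        result.append(sum(counts[low:stop]))
    return result


def relative(counts: list[float]) -> list[float]:
    """Relative frequencies: each absolute frequency divided by the total."""
    total = sum(counts)
    return [c / total for c in counts]


def mean_age(counts: list[float], open_class_age: int = 90) -> float:
    """Mean age, assigning `open_class_age` to everybody in the open class
    (the assumption of footnote 19, which tends to understate the mean)."""
    ages = list(range(len(counts) - 1)) + [open_class_age]
    return sum(a * c for a, c in zip(ages, counts)) / sum(counts)


abs_freq = aggregate(N_1999, CLASSES)
rel_freq = relative(abs_freq)
for (low, high), a, p in zip(CLASSES, abs_freq, rel_freq):
    label = f"{low}+" if high is None else f"{low}-{high}"
    print(f"{label:>6}  {a:8.1f}  {p:.4f}")
print(f"mean age: {mean_age(N_1999):.1f}")
```

The six class totals and proportions agree with the aggregated frequency table (6.5.6). Applied year by year to the 1952-1998 data file, `relative(aggregate(...))` would give one column of the proportions plotted in Figure 6.5-5, and `mean_age` one point of the mean-age series described in item 3.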

Chapter 7

Mortality and Life Tables
The development of a demographic process primarily depends on birth and death events. In this chapter, we discuss methods that have been proposed to quantify mortality and illustrate these methods with data from German official statistics. We begin with death events because birth events require more complicated methods. The reason is simply that each person must eventually die, and will die only once, while a woman can give birth to several children.

7.1

Mortality Rates

1. A simple way to quantify mortality is to count the number of deaths which occur in a year and relate this to the midyear size of the population in that year. This is called a crude death rate or crude mortality rate.1 Using previously introduced notations, the definition is

$$\text{Crude death rate} := \frac{d_t}{n_t} \quad \text{(multiplied by 1000)}$$

where the temporal index $t$ most often refers to calendar years. How these crude death rates have developed in Germany has already been shown in Section 6.3.

1 In publications of the Statistisches Bundesamt this is called allgemeine Sterbeziffer. Some authors also use the term ‘rohe Sterblichkeitsrate’ or ‘rohe Mortalitätsrate’.

2. An obvious problem with crude death rates is that they do not take into account changes in the age distribution of a population, or differences between the age distributions of two or more populations that are to be compared with respect to mortality. Since mortality is highly dependent on age, a better starting point is to calculate age-specific death rates, which will be denoted by

$$\delta_{t,\tau} := \frac{d_{t,\tau}}{n_{t,\tau}}$$

In this definition, $d_{t,\tau}$ denotes the number of people who died in year $t$ at the age of $\tau$, and $n_{t,\tau}$ is the midyear estimate of the number of people in year $t$ of age $\tau$. Furthermore, since mortality also differs between men and women, we use the notations

$$\delta^m_{t,\tau} := \frac{d^m_{t,\tau}}{n^m_{t,\tau}} \qquad\text{and}\qquad \delta^f_{t,\tau} := \frac{d^f_{t,\tau}}{n^f_{t,\tau}}$$

to refer to age-specific death rates for men and women, respectively. Like crude death rates, age-specific death rates are often multiplied by 1000; this is marked by adding a tilde:

$$\tilde{\delta}_{t,\tau} := 1000\,\delta_{t,\tau}, \qquad \tilde{\delta}^m_{t,\tau} := 1000\,\delta^m_{t,\tau}, \qquad \tilde{\delta}^f_{t,\tau} := 1000\,\delta^f_{t,\tau}$$

Table 7.1-1 Midyear population and number of deaths in Germany 1999, subdivided by age (τ); 95∗ comprises all ages τ ≥ 95. Source: Segments 685 and 1124-26 of the STATIS database of the Statistisches Bundesamt.

τ     n^m_{t,τ} d^m_{t,τ} n^f_{t,τ} d^f_{t,τ}     τ     n^m_{t,τ} d^m_{t,τ} n^f_{t,τ} d^f_{t,τ}
0        399633      1979    378251      1517     48       563942      2422    556038      1222
1        410782       173    389437       137     49       558611      2555    548502      1320
2        413836       126    391872        83     50       529900      2745    517217      1354
3        403107        90    381972        63     51       494176      2691    482985      1414
4        398813        79    377741        54     52       450618      2873    442241      1421
5        409761        52    387869        41     53       397251      2432    392979      1287
6        422128        70    400905        43     54       434885      3182    432973      1618
7        435168        66    413392        52     55       502194      4012    498201      2023
8        466447        76    442107        44     56       500889      4244    496426      2010
9        485976        56    461036        54     57       545671      5189    544075      2493
10       490076        67    464833        42     58       612495      5906    615985      2831
11       492537        71    465361        42     59       621650      6970    630333      3345
12       482637        76    456705        54     60       593275      7303    606556      3514
13       469636        79    445784        50     61       551237      7359    569910      3497
14       461216       114    437317        65     62       522738      7819    546881      3692
15       463159       145    438441        84     63       502833      8422    534670      3985
16       472798       194    446710       108     64       472336      8835    511058      4451
17       479914       314    453245       148     65       409837      8207    445060      4173
18       481413       487    457174       159     66       361225      7990    399667      4074
19       473334       454    451301       168     67       359314      8984    405178      4766
20       462189       435    441633       137     68       365748     10288    423645      5572
21       460967       468    441747       156     69       362427     11122    431229      6279
22       460272       412    441953       125     70       347956     11690    425406      6858
23       456346       401    436715       118     71       319299     11439    416670      7489
24       458828       441    438622       135     72       283797     10934    410392      8366
25       469223       377    449325       155     73       258704     11017    416006      9404
26       500605       443    477112       143     74       228253     10847    407353     10147
27       557051       473    528212       169     75       202335     10404    390714     11172
28       603444       533    568797       200     76       196282     11027    387353     12516
29       644108       528    604417       214     77       193182     12292    395011     14616
30       686013       614    642576       235     78       180053     12610    390951     16287
31       712393       627    668147       263     79       144258     12452    328396     17041
32       732651       666    686810       308     80        96563      7484    221190     10678
33       748125       740    697478       345     81        68960      6478    161774      9951
34       758218       809    705906       365     82        65068      6763    156443     11125
35       761731       872    711736       431     83        70399      7840    177521     13397
36       746559      1066    700436       433     84        79463     10704    213228     20226
37       727433      1076    687469       539     85        78125     11001    219593     22124
38       711239      1126    675052       559     86        67852     10565    197433     22470
39       690941      1300    656171       629     87        55469      9517    168730     21892
40       663561      1334    630073       667     88        44330      8191    141995     20816
41       641087      1418    608883       717     89        35770      7471    120032     19758
42       627205      1572    597762       786     90        27775      6466     97204     18053
43       610005      1719    584657       858     91        20817      5174     75582     15700
44       594357      1788    575662       892     92        15772      4072     58386     13384
45       578806      1983    566937       980     93        11724      3170     42726     10696
46       570259      2055    561714      1097     94         8658      2446     30973      8332
47       566062      2223    558353      1136     95*       21807      4871     69931     20949

Fig. 7.1-1 Age-specific death rates (per 1000) for men (solid line) and women (dotted line) in Germany 1999, calculated from Table 7.1-1. The plot is restricted to ages less than 95.

3. To illustrate the calculations, we use data for the year 1999 shown in Table 7.1-1.2 For an age of 60, one finds

$$\tilde{\delta}^m_{1999,60} = \frac{7303}{593.275} = 12.31 \qquad\text{and}\qquad \tilde{\delta}^f_{1999,60} = \frac{3514}{606.556} = 5.79$$

Performing these calculations for all ages, the resulting death rates can be visualized as shown in Figure 7.1-1. The figure clearly shows how mortality varies with age, and also shows that, in older ages, death rates of men are higher than death rates of women. So the figure also confirms that, when investigating mortality, one should take into account age and sex.

4. The higher mortality of male individuals already begins at an early age. This is hidden in Figure 7.1-1 because death rates are generally very low until an age of about 50. The rates for the range of ages from 0 to 50 are shown in Figure 7.1-2. The figure shows that marked differences in the death rates already begin at an age of about 15. It seems plausible that at least some part of the higher mortality of male individuals is also due to different behavior and/or socio-economic conditions.

2 Values for the midyear population correspond to those given in Table 6.5-1; of course, the seemingly exact values in Table 7.1-1 result from population projections and should be understood as estimates.
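The age-specific death rates of item 3 can be computed for any row of Table 7.1-1 in the same way. The Python sketch below uses an excerpt of five rows copied from that table and a hypothetical helper `death_rate_per_1000`; it computes $\tilde{\delta}^m_{1999,\tau}$ and $\tilde{\delta}^f_{1999,\tau}$ for ages 58 to 62 together with the ratio of the male to the female rate. Looping over all rows of the table would give the curves of Figures 7.1-1 and 7.1-2.

```python
# Midyear population and deaths in Germany, 1999, ages 58-62
# (copied from Table 7.1-1): age -> (n_m, d_m, n_f, d_f)
TABLE_7_1_1_EXCERPT = {
    58: (612495, 5906, 615985, 2831),
    59: (621650, 6970, 630333, 3345),
    60: (593275, 7303, 606556, 3514),
    61: (551237, 7359, 569910, 3497),
    62: (522738, 7819, 546881, 3692),
}


def death_rate_per_1000(deaths: int, population: int) -> float:
    """Age-specific death rate multiplied by 1000 (the tilde-delta of Section 7.1)."""
    return 1000.0 * deaths / population


for age, (n_m, d_m, n_f, d_f) in sorted(TABLE_7_1_1_EXCERPT.items()):
    rate_m = death_rate_per_1000(d_m, n_m)
    rate_f = death_rate_per_1000(d_f, n_f)
    print(f"age {age}: men {rate_m:5.2f}, women {rate_f:5.2f}, ratio {rate_m / rate_f:.2f}")
```

For age 60 the sketch reproduces the values 12.31 and 5.79 computed above; at each of these ages the male rate is a little more than twice the female rate.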

88
6 5 4 3 2 1 0 0 10 20

7

MORTALITY AND LIFE TABLES

7.1

MORTALITY RATES

89

Table 7.1-2 Death rates for speciﬁed age groups in Germany. For all years, the ﬁgures refer to the territories of both the former FRG and the former GDR. Source: Fachserie 1, Reihe 1, 1999 (p. 230). Age Men 1 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 – – – – – – – – – – – – – – – – – – – 0 4 9 14 19 24 29 34 39 44 49 54 59 64 69 74 79 84 89 1952 59.661 2.242 0.818 0.686 1.266 1.883 1.862 2.062 2.701 3.785 5.951 10.068 15.428 23.805 36.701 58.846 96.879 158.387 242.165 352.626 1952 47.075 1.773 0.533 0.398 0.720 1.095 1.282 1.550 2.153 2.864 4.225 6.311 9.494 15.436 27.302 48.909 86.029 142.401 220.005 328.131 1970 25.242 1.058 0.600 0.493 1.395 1.731 1.616 1.814 2.443 3.725 5.736 9.155 15.240 26.353 44.226 69.332 102.786 153.799 232.039 341.384 1970 19.089 0.836 0.389 0.299 0.561 0.616 0.703 0.903 1.425 2.237 3.606 5.328 7.869 13.050 23.033 41.437 73.536 126.783 204.513 317.971 1990 8.223 0.469 0.243 0.243 0.819 1.097 1.133 1.467 2.026 2.816 4.797 7.539 12.526 19.486 30.838 47.465 80.038 129.312 197.655 309.298 1990 6.209 0.377 0.196 0.165 0.325 0.405 0.434 0.626 1.035 1.515 2.440 3.497 5.697 9.196 15.301 25.386 48.149 87.257 152.491 268.712 1999 4.952 0.288 0.144 0.170 0.672 0.938 0.848 0.950 1.495 2.497 3.960 6.036 9.458 15.038 25.068 38.892 64.168 103.216 166.030 245.878 1999 4.011 0.219 0.111 0.111 0.297 0.305 0.335 0.446 0.755 1.308 2.062 3.127 4.561 6.912 11.813 20.360 37.852 70.286 126.282 232.427

30

40

50

Fig. 7.1-2 Age-speciﬁc death rates (per 1000) for men (solid line) and women (dotted line) in Germany 1999, restricted to ages from 0 to 50. Calculated from Table 7.1-1.

5. A consideration of age-speciﬁc death rates also shows why crude death rates may suggest misleading conclusions. While the crude death rates, as shown in Figure 6.3-1, have increased until about 1970, the age-speciﬁc death rates actually declined during this period. This is shown by the ﬁgures in Table 7.1-2. For example, for men, the crude death rate was 12.0 in 1952 and 13.1 in 1970, but in almost all age groups the age-speciﬁc death rates were lower in 1970 than in 1952. 6. The crude death rate can be written as a weighted mean of age-speciﬁc death rates. Denoting by at,τ := nt,τ /nt the proportion of persons of age τ , in year t, the relationship is as follows: dt = nt τ dt,τ

Age Women 1 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 – – – – – – – – – – – – – – – – – – – 0 4 9 14 19 24 29 34 39 44 49 54 59 64 69 74 79 84 89

nt

=

τ

dt,τ = nt

τ

dt,τ nt,τ = nt,τ nt

τ

δt,τ at,τ

The equation shows that the crude death rate depends on both, the agespeciﬁc death rates δt,τ , and the age distribution given by at,τ . This suggests that, in order to compare death rates for diﬀerent years, or among diﬀerent territories, one can use standardized death rates which refer to a common age distribution. As an example, Table 7.1-3 shows crude and standardized death rates for Germany, as calculated by the Statistisches Bundesamt. The standardization is based on the age distribution in 1995; the standardized death rates shown in this table are therefore calculated with the following formula: Standardized death rate for year t = τ δt,τ a1995,τ

In contrast to the crude death rates, the standardized death rates declined over the whole period from 1952 to 1999.³ These rates, therefore, summarize the development of the age-specific death rates.

Table 7.1-3 Crude and standardized death rates (per 1000) in Germany. The standardized death rates are based on the age distribution in 1995. Source: Fachserie 1, Reihe 1, 1999 (p. 55).

           Crude death rates        Standardized death rates
Year      Male   Female   Both      Male   Female   Both
1952      12.0    10.1    11.1      14.8    20.9    17.9
1960      13.2    11.0    12.0      15.2    20.0    17.6
1970      13.1    12.0    12.6      14.9    18.2    16.6
1980      12.2    12.1    12.1      13.3    15.1    14.2
1990      11.1    12.1    11.6      11.5    12.7    12.1
1999       9.8    10.8    10.3       9.2    10.3     9.8

7.2

Mean Age at Death

1. Another approach to summarizing information about mortality uses either mean life length or mean age at death. In order to define these concepts one needs to refer to a population. Mean age at death refers to all people who died in a specific year, while mean life length refers to birth cohorts, that is, to sets of people born in the same year. Actually, many calculations of “life expectancy” follow neither the first nor the second of these two approaches. In fact, they do not refer to any population at all but construct a fictitious distribution for the length of life with the help of a period life table. This will be discussed in Section 7.3. In the present section we briefly discuss the calculation of mean age at death. The discussion of mean life length will be postponed because one then most often needs to take into account incomplete observations.

2. We will denote the set of people who died in year $t$ by $\Omega_t^\dagger$. Implied by the general framework introduced in Chapter 3, this is a subset of $\Omega_t$. We can then formally define a variable

$$A_t^\dagger : \Omega_t^\dagger \longrightarrow \tilde{A} := \{0, 1, 2, \ldots\}$$

which provides, for each person $\omega \in \Omega_t^\dagger$, an age at death, $A_t^\dagger(\omega)$. This then implies a statistical distribution for the variable $A_t^\dagger$, and its mean value,

$$M(A_t^\dagger) = \sum_{\omega \in \Omega_t^\dagger} A_t^\dagger(\omega) \,\big/\, |\Omega_t^\dagger|$$

is the mean life length of the people in $\Omega_t^\dagger$. In the following, we use this conceptual framework but additionally distinguish men and women: $\Omega_t^\dagger = \Omega_t^{\dagger,m} \cup \Omega_t^{\dagger,f}$. The corresponding variables will be denoted by $A_t^{\dagger,m}$ and $A_t^{\dagger,f}$, respectively.

3. Table 7.1-1 provides information on $\Omega_{1999}^{\dagger,m}$ and $\Omega_{1999}^{\dagger,f}$, the sets of, respectively, men and women who died in Germany in 1999. Absolute frequencies for the variables $A_t^{\dagger,m}$ and $A_t^{\dagger,f}$ are given by the entries $d^m_{t,\tau}$ and $d^f_{t,\tau}$. Summing up these values, one finds

$$d^m_t := |\Omega_t^{\dagger,m}| = 390742 \quad\text{and}\quad d^f_t := |\Omega_t^{\dagger,f}| = 455588$$

and, by dividing the entries by these numbers, one immediately also finds the relative frequencies:

$$P[A_t^{\dagger,m}](\tau) = \frac{d^m_{t,\tau}}{d^m_t} \quad\text{and}\quad P[A_t^{\dagger,f}](\tau) = \frac{d^f_{t,\tau}}{d^f_t}$$

4. In a next step, one can visualize the distributions of $A_t^{\dagger,m}$ and $A_t^{\dagger,f}$ by frequency curves. Using absolute frequencies, the curves are shown in Figure 7.2-1. It is seen that the curves are remarkably different for men and women and also depend on the age distribution in 1999.⁴ Since age is recorded in completed years, it seems sensible to use the formula

$$\sum_{\tau=0}^{\infty} (\tau + 0.5)\, P[A_t^\dagger](\tau) = M(A_t^\dagger) + 0.5$$

suitably modified for men and women, to calculate the mean age at death. However, an obvious problem concerns the open-ended age class beginning at age 95 in Table 7.1-1. One needs some estimate, say $a$, for the mean age of the people who died at an age equal to, or greater than, 95. This would then allow one to rewrite the formula as

$$\sum_{\tau=0}^{94} (\tau + 0.5)\, P[A_t^\dagger](\tau) + a\, P[A_t^\dagger](95^*)$$

Using $a = 95.5$, the smallest value consistent with this convention, would result in a minimal mean age at death. It might be more reasonable to use a somewhat higher estimate. To see the dependency on $a$, we calculate the first term, which is 69.46 for men and 74.48 for women. Using the proportions of men and women in the 95* age class, one gets the formulas

$$69.46 + 0.0125\,a \quad\text{and}\quad 74.48 + 0.0460\,a$$

Assuming $a = 95.5$, the mean age at death would be 70.7 for men and 78.9 for women. However, even a much higher value of $a$ would only slightly increase the estimates (for example, $a = 100$ would give about 70.7 and 79.1). Thus it seems safe to conclude that the mean age at death in Germany in 1999 was about 71 years for men and 80 years for women.

Fig. 7.2-1 Frequency curves showing the distribution of age at death for men (solid line) and women (dotted line) who died in Germany in 1999; curves are restricted to ages less than 95. Calculated from Table 7.1-1.

3 One exception is male mortality in the period following World War II.
4 This is true, in particular, for ages around 80; see the age distributions in Figure 6.5-2.

5. We briefly mention another way to summarize a distribution by a single value, called its median, which refers to the distribution function of a statistical variable. We first introduce the notion of a distribution function. If $X : \Omega \longrightarrow \tilde{X}$ is any statistical variable, its distribution function is defined as a function $F[X] : \mathbb{R} \longrightarrow \mathbb{R}$ which associates with each number $x \in \mathbb{R}$ the proportion of elements of $\Omega$ whose value of the variable $X$ is less than, or equal to, $x$. In a formal notation:

$$F[X](x) := \frac{|\{\omega \in \Omega \mid X(\omega) \le x\}|}{|\Omega|}$$

which also shows that the values of a distribution function are always between 0 and 1. If the number of elements in the property space $\tilde{X}$ is finite (which is always the case in the examples of this text), there is a simple relationship between a distribution function and relative frequencies:

$$F[X](x) = \sum_{\tilde{x} \le x} P[X](\tilde{x})$$

Therefore, a distribution function is sometimes also called a cumulated frequency function.

6. Using the data from Table 7.1-1, distribution functions of the variables $A_t^{\dagger,m}$ and $A_t^{\dagger,f}$ are plotted in Figure 7.2-2. As seen from this figure, distribution functions are step functions with jumps at the elements of a variable's realized property space.⁵ Of course, if the realized property space contains many values which are close together, it might be sensible, for better visibility, to connect the function values by linear line segments.

Fig. 7.2-2 Distribution functions for the variables $A_t^{\dagger,m}$ and $A_t^{\dagger,f}$ which record the age at death in Germany 1999. The upper curve refers to men, the lower curve to women; both curves are restricted to ages less than 95. Calculated from Table 7.1-1.

Fig. 7.2-3 Distribution functions for the variables $A_t^{\dagger,m}$ (solid line) and $A_t^{\dagger,f}$ (dotted line) which record the age at death in Germany 1999; both curves are restricted to ages less than 95. Calculated from Table 7.1-1.

5 The figure shows the distribution function only for age values less than 95 since, from the data in Table 7.1-1, we do not know where the function approaches unity. By the definition of a distribution function, one only knows that $F(\infty) = 1$.


For our example, this is shown in Figure 7.2-3. This figure also illustrates the notion of a median: If $F[X]$ denotes the distribution function of a statistical variable $X$, its median is defined as a number, say $m_x$, such that $F[X](m_x) \approx 0.5$.⁶ Using this definition, one finds from Figure 7.2-3 that the median life length is about 72 years for men and 82 years for women. These values are somewhat higher than the mean values calculated above because the distributions are skewed to the left: most deaths occur at high ages, and the comparatively few deaths at young ages pull the mean down.

7. The median of a distribution can be interpreted in the following way: about half of the population has property values below that number and the other half has property values above it. In our example, about half of the men who died in 1999 died at ages below 72 years. One might notice that the calculation of a median does not require complete knowledge about a distribution. In contrast to the calculation of mean values discussed above, the median life length is largely independent of the form of the distribution function below and above its median. In particular, in our example, one does not need any assumptions about the mean age of the people who died at ages of 95 years or more.
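The calculations of this section can be sketched in a few lines of Python. The sketch assumes death counts by single years of completed age, with the last entry holding the open-ended age class (95* in Table 7.1-1); the five-class data set below is hypothetical. For the median we adopt one common convention, the smallest age at which the distribution function reaches 0.5; footnote 6 mentions another.

```python
def mean_age_at_death(deaths, a):
    """Mean age at death from death counts by completed age.

    deaths[tau] is the number of deaths at completed age tau; the last entry
    holds the open-ended age class. Closed ages are represented by tau + 0.5,
    the open class by the assumed mean age a.
    """
    total = sum(deaths)
    closed = sum((tau + 0.5) * d for tau, d in enumerate(deaths[:-1]))
    return (closed + a * deaths[-1]) / total


def median_age(deaths):
    """Smallest completed age tau with F(tau) >= 0.5 (one common convention)."""
    total = sum(deaths)
    if total == 0:
        raise ValueError("no deaths recorded")
    cumulated = 0
    for tau, d in enumerate(deaths):
        cumulated += d
        if cumulated / total >= 0.5:
            return tau


# Hypothetical counts for completed ages 0, 1, 2, 3 and an open class "4 and over".
deaths = [1, 0, 2, 3, 4]
for a in (4.5, 6.0):
    print(a, mean_age_at_death(deaths, a))   # the mean shifts with the assumption a
print(median_age(deaths))                    # 3, whatever value is assumed for a
```

As in the text, the mean depends on the open-class assumption (here through the term 0.4 a, since 40 percent of the hypothetical deaths fall into the open class), whereas the median does not use that assumption at all.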

6 Since distribution functions are step functions, there normally is no unique number $m_x$ such that $F[X](m_x)$ exactly equals 0.5. For practical computations, an often used approach is to sort the values of a variable in ascending order and then to choose the middle value, if the number of data is odd, or otherwise the mean of the two middle values.

7.3

Life Tables

Mean age at death refers to people who died in a specific year. Another approach is to think in terms of the life length of people born in the same year or period. This leads to the idea of life tables. As will be discussed later, one has to distinguish cohort and period life tables. In order to prepare this discussion we first introduce the notion of duration variables.

7.3.1

Duration Variables

1. Life length is just one example of duration data. In general, duration data can refer to almost any kind of duration, for example, job durations and marriage durations. In this section, before continuing with a discussion of mortality, we introduce definitions and notations which are helpful to deal not only with life length but with other kinds of duration data as well. The starting point is a general duration variable

$$T : \Omega \longrightarrow \tilde{T} := \{0, 1, 2, 3, \ldots\}$$

which is defined for some population $\Omega$. For each individual $\omega \in \Omega$, the variable $T$ records a duration $T(\omega) \in \tilde{T}$. As mentioned, $T$ can refer to life length, job duration, marriage duration, or any other kind of duration. In any case, $\tilde{T}$ will be considered as a discrete time axis representing temporal locations 0, 1, 2, ..., which might be days, months, or years. Therefore, if $T(\omega) = t$, this means that the event terminating $\omega$'s duration occurs somewhere in the temporal location $t$, and the duration amounts to $t$ completed time units.

2. Since $T$ is a statistical variable, it has a statistical distribution defined by a frequency function

$$P[T](t) = \frac{|\{\omega \in \Omega \mid T(\omega) = t\}|}{|\Omega|}$$

For each $t \in \tilde{T}$, $P[T](t)$ is the proportion of individuals in $\Omega$ for whom the variable $T$ has the value $t$. For example, if $T$ refers to life length, $P[T](t)$ would be the proportion of individuals whose life length is $t$.

3. As already discussed in Section 7.2, the distribution of a statistical variable can also be described by a distribution function. Applied to the duration variable $T$, values of the distribution function are given by

$$F[T](t) = \frac{|\{\omega \in \Omega \mid T(\omega) \le t\}|}{|\Omega|}$$

where now $t \in \mathbb{R}$ is any real number. One may notice that both functions, $P[T]$ and $F[T]$, provide the same information because each can be derived from the other. If the frequency function is given, then

$$F[T](t) = \sum_{t' \le t} P[T](t')$$

On the other hand, if $t$ is any value in $\tilde{T}$, then $P[T](t) = F[T](t) - F[T](t-1)$ if $t > 0$, and $P[T](0) = F[T](0)$.

4. A further concept often used in a discussion of duration variables is called a survivor function and denoted by $G[T]$. We will use the following definition:

$$G[T](t) := \frac{|\{\omega \in \Omega \mid T(\omega) \ge t\}|}{|\Omega|}$$

where $t$ can be any real number.⁷

7 In the literature one also finds a slightly different definition:
$$G[T](t) := \frac{|\{\omega \in \Omega \mid T(\omega) > t\}|}{|\Omega|} = 1 - F[T](t)$$
This definition was used, for example, by Rohwer and Pötter (2001, p. 198). The definition given above is preferred for the present text because it better suits a discrete time axis.

Again, $F[T]$ and $G[T]$ provide the same


information. $F[T](t)$ is the proportion of individuals whose duration is less than, or equal to, $t$; and $G[T](t)$ is the proportion of individuals whose duration is greater than, or equal to, $t$. For example, if $T$ refers to life length, $G[T](70)$ would be the proportion of people still alive at age 70.

5. Finally, one can characterize the distribution of a duration variable by a rate function. A rate function $r[T] : \tilde{T} \longrightarrow \mathbb{R}$ associates with each duration $t \in \tilde{T}$ a number

$$r[T](t) := \frac{|\{\omega \in \Omega \mid T(\omega) = t\}|}{|\{\omega \in \Omega \mid T(\omega) \ge t\}|}$$

The numerator is the number of individuals in $\Omega$ whose duration is $t$, and the denominator is the number of individuals with a duration not less than $t$. For example, assuming that $T$ refers to life length, if the number of individuals still alive at age 90 is 1000 and, of these people, 100 die at age 90, then the rate for $t = 90$ would be $r[T](90) = 100/1000 = 0.1$.

6. Another way to interpret rates is in terms of events, in this example, in terms of death events. One can define a risk set $R(t) := \{\omega \in \Omega \mid T(\omega) \ge t\}$ containing all individuals who still might experience the event (which, in turn, defines the duration) in $t$; and also an event set $E(t) := \{\omega \in \Omega \mid T(\omega) = t\}$ containing the members of $R(t)$ who actually experienced the event in $t$. The definition of a rate as given above is then equivalent to

$$r[T](t) = \frac{|E(t)|}{|R(t)|}$$
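The relationships between the frequency, distribution, survivor, and rate functions can be checked on a small example. The following Python sketch, a hypothetical illustration rather than part of the original text, computes all four functions from a list of durations using exact fractions, and verifies the product formula (7.3.1) for the survivor function that the text derives in item 7.

```python
from collections import Counter
from fractions import Fraction


def duration_functions(durations):
    """Frequency, distribution, survivor, and rate functions of a duration variable.

    durations is a list of values T(omega) in {0, 1, 2, ...}. The results are
    dicts over t = 0, ..., max(durations), holding exact fractions.
    """
    n = len(durations)
    counts = Counter(durations)
    P, F, G, r = {}, {}, {}, {}
    at_risk = n                             # |R(0)| = |Omega|
    cumulated = 0
    for t in range(max(durations) + 1):
        events = counts[t]                  # |E(t)|, 0 if no duration equals t
        cumulated += events
        P[t] = Fraction(events, n)
        F[t] = Fraction(cumulated, n)       # proportion with T <= t
        G[t] = Fraction(at_risk, n)         # proportion with T >= t
        r[t] = Fraction(events, at_risk)    # |E(t)| / |R(t)|
        at_risk -= events
    return P, F, G, r


def survivor_from_rates(r, t):
    """Equation (7.3.1): G[T](t) = product over j = 0..t-1 of (1 - r[T](j))."""
    g = Fraction(1)
    for j in range(t):
        g *= 1 - r[j]
    return g


durations = [0, 2, 2, 3, 5]   # hypothetical durations
P, F, G, r = duration_functions(durations)
print(r[2])                                                  # 1/2
print(all(survivor_from_rates(r, t) == G[t] for t in G))     # True
```

Exact fractions are used only so that the identities hold without rounding error; with floating-point numbers the same checks would need a tolerance.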

7. We mention that a rate function provides the same information about the distribution of $T$ as the frequency function $P[T]$, the distribution function $F[T]$, and the survivor function $G[T]$. First, since

$$|E(t)| = P[T](t)\,|\Omega| \quad\text{and}\quad |R(t)| = G[T](t)\,|\Omega|$$

one directly finds that

$$r[T](t) = \frac{P[T](t)}{G[T](t)}$$

On the other hand, assume that the rate function is given. Since always $G[T](0) = 1$, the survivor function may be written in the form

$$G[T](t) = \frac{G[T](t)}{G[T](t-1)} \cdot \frac{G[T](t-1)}{G[T](t-2)} \cdots \frac{G[T](1)}{G[T](0)}$$

However, since the factors can also be written as

$$\frac{G[T](t)}{G[T](t-1)} = \frac{G[T](t-1) - P[T](t-1)}{G[T](t-1)} = 1 - r[T](t-1)$$

it follows that

$$G[T](t) = (1 - r[T](t-1))\,(1 - r[T](t-2)) \cdots (1 - r[T](0)) = \prod_{j=0}^{t-1} (1 - r[T](j)) \qquad (7.3.1)$$

Therefore, given the rate function, one can derive the survivor function, and consequently also the frequency and distribution functions.

7.3.2

Cohort and Period Life Tables

1. An often used method to record mortality data is the construction of a life table [Sterbetafel]. There are two variants:
a) A cohort life table records the mortality of a birth cohort and refers to the historical period during which members of the birth cohort lived.
b) A period life table is derived from the age-specific mortality rates of one or more consecutive years and, consequently, reflects the mortality conditions of these years.

2. The construction of a cohort life table refers to a birth cohort, say $C_{t_0}$, whose members are born in the year $t_0$. One can think, then, of a duration variable

$$T_{t_0} : C_{t_0} \longrightarrow \tilde{T} = \{0, 1, 2, 3, \ldots\}$$

that records, for each individual $\omega \in C_{t_0}$, its life length $T_{t_0}(\omega)$. A life table is then simply a table that describes the distribution of $T_{t_0}$, most often in terms of a survivor function or a rate function.

3. Actually, most life tables, and in particular life tables published by official statistics, are period life tables. One reason is that period life tables are better suited to keep track of mortality conditions as they change from year to year. In contrast, a cohort life table would refer to a relatively long historical period. For example, a life table for persons


born in 1900 would reflect all changes in mortality conditions that occurred during the whole twentieth century. A second reason is that it is more difficult to find suitable data for cohort life tables. In the remainder of the present section we therefore concentrate on period life tables. Some approaches to constructing cohort life tables will be discussed in Chapter 8.

4. A period life table refers to a population of people who live during a period $t$. For the moment, we will assume that $t$ refers to a specific year and denote the population by $\Omega_t$. Most of the members of $\Omega_t$ will still be alive in the next year, $t + 1$, but some will die during the year $t$. This can be represented by a two-dimensional statistical variable

$$(A_t, D_t) : \Omega_t \longrightarrow \tilde{A} \times \tilde{D}$$

$\tilde{A}$ is a property space for age in completed years, so $A_t(\omega)$ is the age of $\omega$ in the year $t$, measured in completed years; and $\tilde{D} := \{0, 1\}$ is the property space for the variable $D_t$, which is used to record whether a person dies during the year $t$ or survives to the next year:

$$D_t(\omega) := \begin{cases} 1 & \text{if } \omega \text{ dies during the year } t \\ 0 & \text{otherwise} \end{cases}$$

For example, $(A_t, D_t)(\omega) = (50, 1)$ would mean that $\omega$ died at age 50 during the year $t$; and $(A_t, D_t)(\omega) = (50, 0)$ would mean that $\omega$ is of age 50 in year $t$ but survived to the following year. Given this two-dimensional variable, one can define age-specific death rates. If

$$n_{t,\tau} = |\{\omega \in \Omega_t \mid A_t(\omega) = \tau\}|$$

is the number of persons in $\Omega_t$ who are of age $\tau$ in the year $t$, and

$$d_{t,\tau} = |\{\omega \in \Omega_t \mid A_t(\omega) = \tau,\ D_t(\omega) = 1\}|$$

is the number of persons in $\Omega_t$ who died during the year $t$ at the age $\tau$, the age-specific death rates are given by

$$\delta_{t,\tau} = \frac{d_{t,\tau}}{n_{t,\tau}}$$

Obviously, this is identical with the definition of age-specific death rates given in Section 7.1.⁸

8 These age-specific death rates are often called “death probabilities” [Sterbewahrscheinlichkeiten]. This is misleading because these rates refer to frequencies, not to probabilities. Unfortunately, there is a general tendency in the statistical literature to confuse probabilities and frequencies. For a discussion, and critique, see Rohwer and Pötter (2002b).

5. These age-specific mortality rates can now be used to construct a kind of fictitious distribution. To motivate the construction, authors often refer to a fictitious cohort in the following way: Think of a set of $l_{t,0}$ people, all born at the same time, day 0. Then assume that, for each year $\tau$, counted from day 0, the proportion of people dying during the year $\tau$ is given by $\delta_{t,\tau}$. This implies:

$$l_{t,1} = l_{t,0}\,(1 - \delta_{t,0}), \qquad l_{t,2} = l_{t,1}\,(1 - \delta_{t,1}), \qquad l_{t,3} = l_{t,2}\,(1 - \delta_{t,2})$$

and, in general,

$$l_{t,\tau} = l_{t,\tau-1}\,(1 - \delta_{t,\tau-1}) = l_{t,0} \prod_{j=0}^{\tau-1} (1 - \delta_{t,j})$$

until, eventually, all members of the fictitious cohort are dead.⁹ The construction of a period life table basically consists in performing these calculations and presenting the results in a table where the essential columns are: the age $\tau$, the age-specific death rates $\delta_{t,\tau}$, and the number of people still alive at age $\tau$.

6. Alternatively, one can think in terms of a fictitious duration variable, $T_t$, that has a distribution defined by the rate function

$$r[T_t](\tau) := \delta_{t,\tau}$$

This rate function implies a survivor function

$$G[T_t](\tau) = \prod_{j=0}^{\tau-1} (1 - r[T_t](j)) = \prod_{j=0}^{\tau-1} (1 - \delta_{t,j})$$

and it follows that $G[T_t](\tau) = l_{t,\tau}/l_{t,0}$. The sequence $l_{t,0}, l_{t,1}, l_{t,2}, \ldots$ can therefore be interpreted as the values of a survivor function for the fictitious duration variable $T_t$.

7. To illustrate the calculations, we use data for Germany in 1999 as shown in Table 7.1-1. The result of the calculations, separately for men and women, is shown in Table 7.3-1. The initial size of the fictitious cohorts is $l^m_{t,0} = 100000$ and $l^f_{t,0} = 100000$. Further values of $l^m_{t,\tau}$ and $l^f_{t,\tau}$ can then be calculated recursively as described above. For example,

$$l^m_{t,1} = l^m_{t,0}\,(1 - \delta^m_{t,0}) = 100000 \cdot \left(1 - \frac{4.95}{1000}\right) = 99505$$

From the 100000 men assumed to be alive at the beginning, 99505 survive their first birthday. Figure 7.3-1 shows the corresponding survivor functions for men and women. These functions are only shown up to an age of 94 because our data group all higher ages into a single age class (95*). For the same reason, Table 7.3-1 does not provide death rates for $\tau = 95$. If one referred to the age class 95*, the death rate would simply be 1 since every person in this age class must eventually die.

9 Obviously, in order to provide sensible results, it is required that all death rates are strictly less than 1 until, at the maximal age (or open-ended age class), the death rate gets the value 1.

Table 7.3-1 Period life table for Germany in 1999, calculated from the data in Table 7.1-1. Columns: age τ; male death rate δ̃m (per 1000); male survivors lm; female death rate δ̃f (per 1000); female survivors lf.

 τ     δ̃m      lm     δ̃f      lf   |   τ     δ̃m      lm      δ̃f      lf
 0    4.95  100000   4.01  100000   |  48    4.29   94574    2.20   97143
 1    0.42   99505   0.35   99599   |  49    4.57   94168    2.41   96929
 2    0.30   99463   0.21   99564   |  50    5.18   93737    2.62   96696
 3    0.22   99433   0.16   99543   |  51    5.45   93252    2.93   96443
 4    0.20   99410   0.14   99526   |  52    6.38   92744    3.21   96160
 5    0.13   99391   0.11   99512   |  53    6.12   92153    3.27   95851
 6    0.17   99378   0.11   99502   |  54    7.32   91588    3.74   95537
 7    0.15   99362   0.13   99491   |  55    7.99   90918    4.06   95180
 8    0.16   99347   0.10   99478   |  56    8.47   90192    4.05   94794
 9    0.12   99330   0.12   99469   |  57    9.51   89428    4.58   94410
10    0.14   99319   0.09   99457   |  58    9.64   88577    4.60   93978
11    0.14   99305   0.09   99448   |  59   11.21   87723    5.31   93546
12    0.16   99291   0.12   99439   |  60   12.31   86740    5.79   93049
13    0.17   99275   0.11   99427   |  61   13.35   85672    6.14   92510
14    0.25   99259   0.15   99416   |  62   14.96   84528    6.75   91942
15    0.31   99234   0.19   99401   |  63   16.75   83264    7.45   91322
16    0.41   99203   0.24   99382   |  64   18.70   81869    8.71   90641
17    0.65   99162   0.33   99358   |  65   20.03   80338    9.38   89852
18    1.01   99098   0.35   99326   |  66   22.12   78729   10.19   89009
19    0.96   98997   0.37   99291   |  67   25.00   76988   11.76   88102
20    0.94   98902   0.31   99254   |  68   28.13   75063   13.15   87066
21    1.02   98809   0.35   99223   |  69   30.69   72951   14.56   85920
22    0.90   98709   0.28   99188   |  70   33.60   70713   16.12   84669
23    0.88   98621   0.27   99160   |  71   35.83   68337   17.97   83304
24    0.96   98534   0.31   99134   |  72   38.53   65889   20.39   81807
25    0.80   98439   0.34   99103   |  73   42.59   63350   22.61   80139
26    0.88   98360   0.30   99069   |  74   47.52   60652   24.91   78328
27    0.85   98273   0.32   99039   |  75   51.42   57770   28.59   76377
28    0.88   98190   0.35   99008   |  76   56.18   54800   32.31   74193
29    0.82   98103   0.35   98973   |  77   63.63   51721   37.00   71796
30    0.90   98022   0.37   98938   |  78   70.03   48430   41.66   69139
31    0.88   97935   0.39   98901   |  79   86.32   45038   51.89   66259
32    0.91   97849   0.45   98863   |  80   77.50   41151   48.28   62820
33    0.99   97760   0.49   98818   |  81   93.94   37961   61.51   59788
34    1.07   97663   0.52   98769   |  82  103.94   34395   71.11   56110
35    1.14   97559   0.61   98718   |  83  111.37   30820   75.47   52120
36    1.43   97447   0.62   98658   |  84  134.70   27388   94.86   48187
37    1.48   97308   0.78   98597   |  85  140.81   23699  100.75   43616
38    1.58   97164   0.83   98520   |  86  155.71   20362  113.81   39222
39    1.88   97010   0.96   98439   |  87  171.57   17191  129.75   34758
40    2.01   96828   1.06   98344   |  88  184.77   14242  146.60   30248
41    2.21   96633   1.18   98240   |  89  208.86   11610  164.61   25814
42    2.51   96419   1.31   98124   |  90  232.80    9185  185.72   21565
43    2.82   96178   1.47   97995   |  91  248.55    7047  207.72   17560
44    3.01   95906   1.55   97852   |  92  258.18    5295  229.23   13912
45    3.43   95618   1.73   97700   |  93  270.39    3928  250.34   10723
46    3.60   95290   1.95   97531   |  94  282.51    2866  269.01    8039
47    3.93   94947   2.03   97341   |  95       –    2056       –    5876

Fig. 7.3-1 Plot of the survivor functions calculated in Table 7.3-1. For men: $l^m_{t,\tau}/100000$ (solid line), for women: $l^f_{t,\tau}/100000$ (dotted line).

8. We mention that the survivor functions shown in Figure 7.3-1 are different from the survivor functions that correspond to the distribution functions shown in Figure 7.2-2. While these distribution functions, and the corresponding survivor functions, refer to a definite population, namely all people who died in Germany in 1999, the survivor functions shown in Figure 7.3-1 do not refer to any identifiable population but are conceptual constructions derived from the age-specific death rates in 1999. The difference also becomes visible when calculating median life lengths. Based on Figure 7.3-1, one finds about 77.5 years for men and 83.5 years for women. This is considerably higher than the median life length of those men and women who actually died in 1999, as calculated in Section 7.2, namely 72 years for men and 82 years for women. Of course, the latter values are lower because they reflect the mortality conditions during the life courses of these people, and not just in 1999.
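The recursion $l_{t,\tau} = l_{t,\tau-1}(1 - \delta_{t,\tau-1})$ behind Table 7.3-1 is easily programmed. The following Python sketch takes death rates per 1000, as in the table, and returns the survivors of a fictitious cohort of 100000; the function names are hypothetical. Because the published rates are rounded to two decimals, values recomputed from them can deviate slightly from the tabulated $l$ values at higher ages.

```python
def period_life_table(rates_per_1000, radix=100000):
    """Survivors l_{t,tau} of a fictitious cohort from age-specific death rates.

    rates_per_1000[tau] is delta_{t,tau} per 1000 for tau = 0, 1, ...; the
    result has one more entry than the rates (survivors at each exact age).
    """
    l = [float(radix)]
    for rate in rates_per_1000:
        l.append(l[-1] * (1 - rate / 1000))   # l_{tau+1} = l_tau * (1 - delta_tau)
    return l


def survivor_function(l):
    """G[T_t](tau) = l_{t,tau} / l_{t,0}."""
    return [x / l[0] for x in l]


# Male death rates for ages 0, 1, 2 taken from Table 7.3-1.
l_m = period_life_table([4.95, 0.42, 0.30])
print([round(x) for x in l_m])   # [100000, 99505, 99463, 99433]
```

Keeping the unrounded survivor numbers inside the recursion and rounding only for display avoids accumulating rounding errors over many ages.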

7.3.3

Conditional Life Length

1. A period life table can be thought of as the representation of the distribution of a fictitious duration variable $T_t$. The corresponding mean value of $T_t$, $M(T_t)$, might be interpreted as the mean life length corresponding to the mortality conditions in $t$. In a further step, one can condition the calculation on the assumption that people have already reached a certain age, say $\tau_0$. One might then ask for the mean life length of these people.

2. The formal framework is provided by the notion of a conditional mean value. We first introduce this notion for a general duration variable $T : \Omega \longrightarrow \tilde{T}$. Given any value $t_0 \in \tilde{T}$, the risk set $R(t_0) := \{\omega \in \Omega \mid T(\omega) \ge t_0\}$ consists of those people in $\Omega$ whose values of $T$ are not less than $t_0$. The conditional mean value of $T$, given $T \ge t_0$, is then simply the mean value of $T$ in the subpopulation $R(t_0)$. We use the following notation:

$$M[T \mid T \ge t_0] := \frac{\sum_{\omega \in R(t_0)} T(\omega)}{|R(t_0)|}$$

Since $T$ can only assume non-negative values, the unconditional mean value is a special case: $M(T) = M[T \mid T \ge 0]$. It is also easy to see that if $t_0 \le t_1$, then

$$M[T \mid T \ge t_0] \le M[T \mid T \ge t_1]$$

In any case, the calculation of conditional mean values only requires knowledge of the distribution of $T$ beginning at $t_0$, as shown by the following equation:

$$M[T \mid T \ge t_0] = \frac{\sum_{t=t_0}^{\infty} t\, P[T](t)}{\sum_{t=t_0}^{\infty} P[T](t)} = \frac{\sum_{t=t_0}^{\infty} t\, P^*[T](t)}{\sum_{t=t_0}^{\infty} P^*[T](t)}$$

Fig. 7.3-2 Conditional life length (female and male) in Germany, 1999, derived from the data in Table 7.3-1.
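A minimal Python sketch of conditional mean values. The first function works with any frequency function and illustrates that a constant factor cancels, so absolute frequencies can be used as well as relative ones. The second anticipates the period-life-table version given in the text as equation (7.3.2): it takes survivors $l_{t,\tau}$, weights the deaths at completed age $\tau$ as in the text, and needs an assumed mean age for the open-ended class, which is the assumption discussed in item 4. All data and function names are hypothetical.

```python
from fractions import Fraction


def conditional_mean(freq, t0):
    """M[T | T >= t0] from a frequency function given as a dict t -> P[T](t).

    Only values t >= t0 enter, and any constant factor in freq cancels.
    """
    num = sum(t * p for t, p in freq.items() if t >= t0)
    den = sum(p for t, p in freq.items() if t >= t0)
    return num / den


def life_table_conditional_mean(l, open_mean, tau0):
    """Conditional life length from survivors l[0..K] of a period life table.

    Deaths at the closed ages tau < K are l[tau] - l[tau+1] = l[tau] * delta[tau]
    and are weighted by tau, as in the text; the l[K] persons who reach the
    open class K are assigned the assumed mean age open_mean. Requires tau0 <= K.
    """
    K = len(l) - 1
    num = open_mean * l[K]
    den = l[K]
    for tau in range(tau0, K):
        deaths = l[tau] - l[tau + 1]
        num += tau * deaths
        den += deaths
    return num / den


freq = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}
counts = {t: 10 * p for t, p in freq.items()}                   # absolute frequencies
print(conditional_mean(freq, 2), conditional_mean(counts, 2))  # 18/7 18/7

l = [100, 80, 50, 20]   # hypothetical survivors; age 3 is the open class
residual = life_table_conditional_mean(l, Fraction(7, 2), tau0=2) - 2
print(residual)         # mean residual life at age 2: 3/5
```

Subtracting $\tau_0$ from the conditional mean, as in the last lines, gives the mean residual life function discussed in the text.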

3. The notion of a conditional mean can also be applied to a fictitious duration variable $T_t$ defined by a period life table for the period $t$. Using notations from the previous section, and omitting indices which distinguish male and female quantities, one may write:

$$P[T_t](\tau) = \frac{l_{t,\tau} - l_{t,\tau+1}}{100000} = \frac{l_{t,\tau}\,\delta_{t,\tau}}{100000}$$

This then allows, for any age $\tau_0$, to calculate a conditional mean value by

$$M[T_t \mid T_t \ge \tau_0] = \frac{\sum_{\tau=\tau_0}^{\infty} \tau\, l_{t,\tau}\,\delta_{t,\tau}}{\sum_{\tau=\tau_0}^{\infty} l_{t,\tau}\,\delta_{t,\tau}} \qquad (7.3.2)$$

4. To illustrate the calculations we use the data from Table 7.3-1. The only difficulty concerns the age class 95*. As was already discussed in Section 7.2, one needs an assumption about the mean age at death in this age class, that is, about the conditional life length for $\tau_0 = 95$. Assuming without further justification

$$M[T_t^m \mid T_t^m \ge 95] = 97 \quad\text{and}\quad M[T_t^f \mid T_t^f \ge 95] = 99$$

one can use (7.3.2) to calculate conditional life lengths for all $\tau_0 \ge 0$. The result is shown in Figure 7.3-2, where the abscissa refers to age values $\tau_0$ and the ordinate records the conditional life length. The unconditional mean values, corresponding to $\tau_0 = 0$, are about 74.5 years for men and 80.5 years for women. Obviously, if $\tau_0$ increases, the conditional life length also increases. One can also derive a mean residual life function [fernere Lebenserwartung] defined by

$$M[T_t \mid T_t \ge \tau_0] - \tau_0$$

For example, given that people have already reached an age of 70, our period life table would estimate a mean residual life length of about 11.5 years for men and 14.5 years for women.

7.4

Official Life Tables in Germany

In the present section we discuss life tables produced by official statistics in Germany. We begin with a brief overview and then consider changes in mortality as reflected by a series of period life tables beginning with the period 1871–81.

7.4.1

Introductory Remarks

1. In Section 7.3 we used age-speciﬁc death rates of the year 1999 in order to construct a life table for that year. This rather straightforward method is often modiﬁed.10 Diﬀerences mainly concern the following points:
10 Much

of the discussion of diﬀerent methods to construct life tables occurred already during the 19th century. For a fairly complete report see v. Bortkiewicz (1911).

104

7

MORTALITY AND LIFE TABLES

7.4

OFFICIAL LIFE TABLES IN GERMANY

105

a) How to calculate age-speciﬁc death rates? While the numerator simply counts the number of deaths that occurred in a given age and year, there are several possible choices for the deﬁnition of the denominator. Instead of using the midyear population, nt,τ , as we have done in Section 7.3, one might want to take into account also the temporal distribution of death events during the year.11 b) Another point concerns the calculation of age-speciﬁc death rates for very old ages. In the example presented in Section 7.3 this was not done because the data did not provide any information about ages greater than, or equal to, 90. If further data would be available one might be able to estimate age-speciﬁc death rates also for these higher ages. Alternatively, one might apply some interpolation procedure.12 c) A further point concerns the use of data for a single year. An alternative would be to combine the data of several years. The latter approach is often used as a kind of smoothing procedure. For the same reason, one might apply analytical smoothing procedures to single-year data. We will not here discuss the many diﬀerent methods that have been proposed for the construction of life tables. Instead, we describe the methods used by the Statistisches Bundesamt. 2. Oﬃcial statistics in Germany distinguishes between general life tables [Allgemeine Sterbetafeln] and abridged life tables [abgek¨rzte Steru betafeln]. General life tables refer to periods which are centered around the year of a census. The most recent general life table is based on the census in 1987 and refers to the three-year period 1986 – 88. The methods to calculate general life tables changed several times.13 For the last two tables (1970 – 72 and 1986 – 88), calculations begin with age-speciﬁc death rates which, in our standard notation, are deﬁned as δt,τ =
11 A

where τ refers to age in completed years and t refers to a calendar year. These rates are then modiﬁed to qt,τ := dt,τ nt,τ + dt,τ 2

The reason behind this modiﬁcation is that about half of the people who die at age τ during the year t are not counted in nt,τ . Therefore, in order to get an estimate of the number of people who are actually at risk of dying in year t, dt,τ /2 is added to nt,τ .14 These modiﬁed age-speciﬁc death rates are then calculated for a three-year period as follows: q(t),τ := dt−1,τ + dt,τ + dt+1,τ nt−1,τ + nt,τ + nt+1,τ + dt−1,τ +dt,τ +dt+1,τ 2

where t = 1987 for the life table which refers to the period 1986 – 88. 3. Abridged life tables, like general life tables, are based on three-year intervals and use the same method of calculating modiﬁed age-speciﬁc death rates.15 The main diﬀerences are as follows: a) For the construction of general life tables, additional calculations are performed to provide more detailed information about death rates during the ﬁrst year after birth. The abridged life tables simply use q(t),0 without further subdivisions. b) While the calculation of abridged life tables is directly derived from the death rates q(t),τ , these rates are smoothed before they are used in the calculation of general life tables.16
discussion of alternative methods can be found, e.g., in Namboodiri and Suchindran (1987, pp. 12–19), or Flaskämper (1962, pp. 351–366).

12 Namboodiri and Suchindran (1987, pp. 19–20) write: "Population and death data at the very old ages, when they are available, are generally disregarded in computing a life table, mainly because they are considered inaccurate. It has therefore been a common practice to use arbitrary methods for computing q_x [the death rate for age x] at the very old ages (usually 85 and above). For practical purposes, any reasonable method is satisfactory, since the arbitrariness involved in the method has only a small effect on the life table as a whole. The major requirement that is usually kept in mind when choosing a procedure in this connection is that the procedure should produce a smooth junction with the q_x values already computed and a smooth upward progression of q_x with advancing age." The authors then briefly describe four different methods.

13 Detailed explanations are available in Fachserie 1, Reihe 1 S.2, Allgemeine Sterbetafel für die Bundesrepublik Deutschland 1986/88.

14 In the literature, these modified rates, $q_{t,\tau}$, are often called "age-specific death probabilities". However, for reasons already mentioned, we avoid the term 'probability' and simply speak of modified age-specific death rates.

15 Abridged life tables have been calculated regularly for each year since 1957; results are published in Fachserie 1, Reihe 1.

16 Fachserie 1, Reihe 1, S.2 (p. 13) provides the following reasons: "In order to obtain a course of the death probabilities as a function of age x that is as realistic as possible, it is necessary to smooth the crude death probabilities q̄_x, that is, to clean them of random fluctuations and of those systematic jumps that are tied to particular birth cohorts. The smoothing procedure must therefore meet the following requirements:
– The course of the smoothed death probabilities q_x as a function of age x should be as 'smooth' as possible, which here means having as little curvature as possible and showing no discontinuities and no kinks.
– Random fluctuations should be smoothed out.
– Typical age-specific features of the mortality pattern should be preserved, for example the relative maximum among 20-year-olds.
– Features of the mortality pattern that are tied to particular birth cohorts (cohort effects), for example the relatively high death probability of the 'war cohorts' of the First World War, must be eliminated."

106

7

MORTALITY AND LIFE TABLES

7.4

OFFICIAL LIFE TABLES IN GERMANY

107

One should notice that the term 'abridged life table' is used differently in the demographic literature. In contrast to 'abgekürzte Sterbetafel', an abridged life table most often refers to a calculation based on 5-year or 10-year age intervals.¹⁷

17 See, e.g., Namboodiri and Suchindran (1987, pp. 21–26).

7.4.2

General Life Tables 1871 – 1988

1. In Germany, general life tables have been constructed by official statistics for the following periods:

Period        Publication
1871 – 1880   Statistik des Deutschen Reichs, Vol. 246 (pp. 14*–17*).
1881 – 1890   Statistik des Deutschen Reichs, Vol. 246 (pp. 14*–17*).
1891 – 1900   Statistik des Deutschen Reichs, Vol. 246 (pp. 14*–17*).
1901 – 1910   Statistik des Deutschen Reichs, Vol. 246 (pp. 14*–17*).
1910 – 1911   Statistik des Deutschen Reichs, Vol. 275. Statistisches Jahrbuch für das Deutsche Reich 1919 (pp. 50–51).
1924 – 1926   Statistik des Deutschen Reichs, Vol. 360 and 401. Statistisches Jahrbuch für das Deutsche Reich 1928 (pp. 38–39).
1932 – 1934   Statistik des Deutschen Reichs, Vol. 495 (pp. 86–87). Statistisches Jahrbuch für das Deutsche Reich 1936 (pp. 45–46).
1949 – 1951   Statistik der Bundesrepublik Deutschland, Vol. 75 and 173. Statistisches Jahrbuch für die Bundesrepublik Deutschland 1954 (pp. 62–63).
1960 – 1962   Statistisches Jahrbuch für die Bundesrepublik Deutschland 1965 (pp. 67–68). See also: Schwarz (1964).
1970 – 1972   Fachserie 1, Reihe 2, Sonderheft 1. Allgemeine Sterbetafel für die Bundesrepublik Deutschland 1970/72. See also: Meyer and Rückert (1974).
1986 – 1988   Fachserie 1, Reihe 1, Sonderheft 2. Allgemeine Sterbetafel für die Bundesrepublik Deutschland 1986/88. See also: Meyer and Paul (1991).

All tables are period life tables. Until 1932 – 34, they refer to the territory of the former Deutsches Reich; all other tables refer to the territory of the former FRG. As mentioned in Section 7.4.1, methods of table construction have changed slightly over the years.

2. Separately for men and women, the survivor functions of all life tables are reproduced in Tables 7.4-1 to 7.4-4.

Table 7.4-1 Male survivor functions in German life tables from official statistics, ages 0 – 50. Entries are values of $l^m_\tau$. Source: see text.

     1871/  1881/  1891/  1901/  1910/  1924/  1932/  1949/  1960/  1970/  1986/
  τ   1881   1890   1900   1910   1911   1926   1934   1951   1962   1972   1988
  0 100000 100000 100000 100000 100000 100000 100000 100000 100000 100000 100000
  1  74727  75831  76614  79766  81855  88462  91465  93823  96467  97400  99075
  2  69876  70998  72631  76585  79211  87030  90618  93433  96244  97249  99005
  3  67557  68729  70999  75442  78255  86477  90211  93203  96109  97152  98956
  4  65997  67212  69945  74727  77662  86127  89901  93022  96013  97067  98921
  5  64871  66127  69194  74211  77213  85855  89654  92880  95929  96989  98891
  6  64028  65330  68641  73820  76873  85647  89446  92768  95852  96918  98862
  7  63369  64711  68214  73506  76596  85477  89255  92673  95782  96854  98835
  8  62849  64221  67874  73244  76361  85330  89081  92586  95721  96795  98809
  9  62431  63836  67599  73023  76161  85197  88927  92513  95667  96741  98786
 10  62089  63526  67369  72827  75984  85070  88793  92444  95620  96692  98764
 11  61800  63265  67167  72650  75818  84950  88675  92379  95577  96647  98744
 12  61547  63036  66983  72487  75662  84837  88567  92315  95536  96604  98724
 13  61320  62830  66811  72334  75517  84726  88464  92250  95493  96561  98704
 14  61108  62636  66641  72179  75365  84607  88360  92178  95445  96515  98681
 15  60892  62441  66462  72007  75189  84469  88244  92097  95388  96459  98652
 16  60657  62226  66259  71808  74986  84306  88105  92001  95316  96383  98612
 17  60383  61972  66017  71573  74746  84110  87939  91892  95225  96273  98557
 18  60063  61675  65731  71300  74470  83874  87746  91767  95112  96118  98483
 19  59696  61340  65405  70989  74165  83592  87531  91625  94973  95927  98389
 20  59287  60970  65049  70647  73832  83268  87298  91466  94812  95732  98284
 21  58843  60572  64674  70291  73488  82912  87051  91294  94637  95541  98175
 22  58369  60156  64292  69935  73143  82539  86795  91113  94457  95357  98068
 23  57871  59734  63912  69582  72800  82162  86539  90924  94280  95182  97964
 24  57378  59315  63539  69232  72466  81792  86285  90730  94110  95016  97862
 25  56892  58897  63168  68881  72130  81429  86032  90531  93948  94858  97763
 26  56410  58474  62796  68528  71789  81072  85777  90329  93789  94705  97664
 27  55927  58047  62420  68173  71446  80721  85516  90125  93633  94555  97567
 28  55442  57613  62043  67817  71105  80380  85251  89922  93478  94405  97468
 29  54951  57169  61663  67458  70768  80049  84984  89720  93323  94253  97367
 30  54454  56713  61274  67092  70425  79726  84715  89518  93166  94097  97262
 31  53949  56243  60873  66719  70070  79404  84440  89314  93008  93937  97153
 32  53434  55755  60459  66338  69705  79080  84157  89104  92846  93773  97039
 33  52908  55245  60030  65946  69332  78758  83863  88887  92679  93604  96920
 34  52369  54715  59581  65536  68948  78436  83555  88662  92505  93429  96794
 35  51815  54168  59111  65104  68545  78111  83234  88428  92322  93245  96661
 36  51244  53599  58618  64650  68125  77779  82905  88184  92129  93049  96519
 37  50656  53009  58099  64175  67693  77433  82571  87930  91924  92838  96367
 38  50049  52406  57557  63676  67233  77073  82224  87666  91705  92610  96203
 39  49422  51788  56992  63149  66741  76701  81860  87391  91470  92361  96026
 40  48775  51148  56402  62598  66227  76313  81481  87102  91218  92089  95834
 41  48110  50486  55785  62021  65682  75905  81088  86795  90949  91794  95624
 42  47428  49806  55142  61413  65113  75473  80676  86468  90662  91475  95394
 43  46729  49112  54470  60773  64518  75016  80240  86120  90354  91131  95141
 44  46010  48402  53768  60105  63894  74536  79776  85746  90021  90761  94863
 45  45272  47668  53037  59405  63238  74032  79285  85342  89659  90363  94555
 46  44511  46910  52282  58666  62542  73496  78763  84902  89262  89934  94216
 47  43728  46135  51507  57892  61810  72927  78207  84417  88825  89468  93841
 48  42919  45347  50708  57084  61036  72326  77617  83883  88344  88958  93428
 49  42086  44534  49875  56233  60215  71688  76990  83294  87814  88398  92973
 50  41228  43684  49002  55340  59349  71006  76322  82648  87230  87781  92471

Table 7.4-2 Male survivor functions in German life tables from official statistics, ages 51 – 100. Entries are values of $l^m_\tau$. Source: see text.

     1871/  1881/  1891/  1901/  1910/  1924/  1932/  1949/  1960/  1970/  1986/
  τ   1881   1890   1900   1910   1911   1926   1934   1951   1962   1972   1988
 51  40343  42800  48092  54403  58435  70274  75605  81945  86585  87104  91917
 52  39433  41890  47150  53419  57473  69497  74834  81186  85871  86369  91305
 53  38497  40956  46179  52388  56457  68670  74004  80371  85078  85574  90630
 54  37534  39990  45176  51312  55395  67780  73109  79497  84197  84717  89887
 55  36544  38989  44133  50186  54290  66818  72147  78562  83221  83789  89071
 56  35524  37949  43047  49003  53114  65784  71124  77560  82142  82779  88177
 57  34474  36872  41922  47772  51869  64678  70043  76490  80952  81673  87204
 58  33392  35774  40760  46500  50563  63495  68889  75352  79644  80460  86146
 59  32276  34643  39558  45180  49177  62232  67640  74141  78212  79130  85002
 60  31124  33456  38308  43807  47736  60883  66293  72852  76652  77675  83767
 61  29935  32221  37008  42379  46246  59444  64853  71474  74963  76087  82439
 62  28708  30954  35657  40892  44663  57914  63321  70003  73144  74357  81014
 63  27442  29658  34255  39343  43013  56285  61695  68437  71198  72477  79486
 64  26139  28322  32799  37737  41312  54553  59962  66772  69128  70440  77851
 65  24802  26940  31294  36079  39527  52715  58106  64999  66941  68242  76106
 66  23433  25520  29743  34381  37695  50769  56128  63110  64643  65882  74245
 67  22037  24076  28155  32637  35842  48705  54033  61104  62240  63361  72262
 68  20620  22622  26531  30838  33933  46527  51822  58985  59739  60685  70150
 69  19189  21154  24877  28998  31946  44256  49495  56751  57145  57864  67901
 70  17750  19665  23195  27136  29905  41906  47059  54394  54461  54909  65508
 71  16310  18160  21494  25254  27850  39472  44517  51903  51691  51838  62966
 72  14880  16649  19784  23345  25741  36948  41872  49278  48835  48673  60270
 73  13468  15145  18080  21416  23587  34348  39138  46529  45894  45438  57419
 74  12085  13655  16391  19490  21450  31697  36341  43666  42873  42161  54417
 75  10743  12188  14730  17586  19328  28998  33479  40700  39784  38872  51273
 76   9454  10761  13109  15715  17216  26275  30553  37644  36647  35601  48000
 77   8228   9404  11543  13902  15184  23589  27609  34524  33487  32373  44620
 78   7077   8130  10049  12169  13278  20989  24703  31372  30334  29212  41157
 79   6010   6934   8640  10525  11440  18479  21863  28222  27215  26137  37645
 80   5035   5833   7330   8987   9711  16066  19122  25106  24156  23167  34119
 81   4156   4837   6129   7568   8152  13785  16509  22059  21186  20321  30618
 82   3378   3944   5044   6275   6708  11664  14038  19118  18337  17619  27183
 83   2700   3158   4075   5116   5396   9712  11725  16324  15644  15083  23856
 84   2120   2481   3225   4094   4253   7941   9607  13715  13142  12735  20678
 85   1635   1909   2497   3212   3297   6371   7732  11321  10861  10595  17687
 86   1236   1437   1893   2468   2519   5015   6126   9168   8819   8678  14914
 87    917   1057   1405   1856   1882   3872   4765   7274   7026   6990  12385
 88    666    758   1018   1364   1374   2930   3623   5655   5479   5529  10119
 89    474    530    718    978    982   2182   2698   4294   4171   4287   8126
 90    330    360    492    683    679   1599   1966   3175   3092   3251   6406
 91    225    238    327    464    457   1144   1400   2278   2229   2407   4952
 92    150    152    211    307    299    801    974   1589   1565   1735   3750
 93     97     94    132    197    190    549    662   1082   1070   1215   2778
 94     61     57     80    123    117    368    438    719    713    824   2011
 95     38     33     46     74     70    241    283    466    463    539   1421
 96     23     18     27     44     40    154    178    294    293    339    979
 97     13     10     14     25     23     97    109    181    181    204    656
 98      7      5      7     14     12     59     65    108    110    117    428
 99      4      3      4      7      6     35     37     63     65     64    271
100      2      1      2      4      3     20     21     36     38     33    167

Table 7.4-3 Female survivor functions in German life tables from official statistics, ages 0 – 50. Entries are values of $l^f_\tau$. Source: see text.

     1871/  1881/  1891/  1901/  1910/  1924/  1932/  1949/  1960/  1970/  1986/
  τ   1881   1890   1900   1910   1911   1926   1934   1951   1962   1972   1988
  0 100000 100000 100000 100000 100000 100000 100000 100000 100000 100000 100000
  1  78260  79311  80138  82952  84695  90608  93161  95091  97222  98016  99298
  2  73280  74404  76137  79761  82070  89255  92394  94749  97027  97888  99241
  3  70892  72073  74482  78594  81126  88743  92026  94545  96922  97810  99201
  4  69295  70514  73406  77867  80523  88422  91761  94390  96845  97745  99174
  5  68126  69377  72623  77334  80077  88169  91535  94270  96782  97690  99153
  6  67249  68537  72038  76924  79730  87975  91338  94177  96728  97641  99136
  7  66572  67881  71577  76587  79445  87817  91160  94100  96682  97597  99119
  8  66035  67358  71206  76301  79206  87683  91003  94041  96643  97558  99103
  9  65599  66942  70903  76058  79001  87563  90870  93986  96609  97523  99088
 10  65237  66601  70646  75845  78816  87452  90753  93937  96579  97492  99073
 11  64926  66309  70420  75651  78642  87347  90650  93893  96552  97465  99058
 12  64649  66049  70210  75467  78476  87243  90557  93850  96525  97439  99044
 13  64390  65801  70003  75285  78311  87134  90467  93805  96498  97413  99029
 14  64136  65555  69789  75094  78131  87013  90373  93756  96468  97384  99013
 15  63878  65306  69562  74887  77930  86877  90270  93701  96434  97349  98995
 16  63609  65045  69319  74661  77710  86719  90152  93637  96395  97305  98974
 17  63322  64764  69060  74411  77470  86534  90016  93564  96351  97251  98947
 18  63013  64468  68787  74143  77216  86319  89858  93484  96301  97189  98916
 19  62681  64160  68500  73861  76945  86075  89680  93394  96246  97124  98881
 20  62324  63838  68201  73564  76659  85808  89490  93295  96188  97059  98843
 21  61941  63500  67888  73254  76362  85523  89287  93188  96128  96996  98806
 22  61534  63142  67559  72929  76052  85226  89072  93073  96068  96934  98768
 23  61102  62762  67212  72586  75730  84920  88849  92955  96008  96874  98731
 24  60648  62360  66848  72225  75397  84602  88622  92834  95948  96815  98694
 25  60174  61937  66467  71849  75043  84275  88390  92711  95884  96755  98657
 26  59680  61497  66072  71463  74668  83943  88151  92586  95814  96694  98619
 27  59170  61042  65666  71070  74283  83610  87904  92457  95739  96632  98579
 28  58647  60570  65249  70669  73896  83274  87653  92324  95660  96567  98538
 29  58111  60082  64822  70261  73513  82937  87397  92185  95575  96499  98493
 30  57566  59584  64385  69848  73115  82597  87139  92039  95485  96429  98446
 31  57010  59076  63937  69432  72703  82254  86876  91887  95390  96355  98395
 32  56445  58554  63479  69008  72291  81909  86607  91729  95290  96276  98340
 33  55869  58018  63010  68575  71876  81559  86329  91565  95184  96190  98280
 34  55282  57473  62533  68132  71457  81205  86044  91396  95071  96098  98216
 35  54685  56921  62047  67679  71020  80847  85754  91221  94949  95997  98146
 36  54078  56360  61549  67215  70554  80482  85455  91039  94818  95886  98071
 37  53462  55789  61041  66744  70080  80105  85145  90850  94676  95764  97988
 38  52837  55215  60524  66266  69610  79720  84819  90651  94524  95632  97896
 39  52207  54638  59998  65779  69139  79324  84481  90443  94360  95488  97796
 40  51576  54054  59467  65283  68659  78917  84135  90225  94184  95331  97685
 41  50946  53467  58931  64779  68172  78498  83779  89995  93995  95161  97564
 42  50320  52880  58391  64269  67689  78068  83410  89749  93792  94975  97431
 43  49701  52297  57848  63754  67194  77627  83027  89486  93573  94773  97286
 44  49090  51720  57302  63238  66692  77175  82630  89204  93337  94551  97127
 45  48481  51146  56751  62717  66187  76704  82211  88901  93081  94308  96954
 46  47870  50569  56195  62181  65661  76210  81763  88574  92803  94042  96766
 47  47248  49983  55628  61628  65105  75688  81282  88221  92500  93750  96562
 48  46605  49385  55040  61053  64510  75136  80767  87841  92173  93427  96341
 49  45939  48765  54423  60449  63883  74557  80213  87432  91821  93072  96102
 50  45245  48110  53768  59812  63231  73943  79620  86991  91442  92683  95842

Table 7.4-4 Female survivor functions in German life tables from official statistics, ages 51 – 100. Entries are values of $l^f_\tau$. Source: see text.

     1871/  1881/  1891/  1901/  1910/  1924/  1932/  1949/  1960/  1970/  1986/
  τ   1881   1890   1900   1910   1911   1926   1934   1951   1962   1972   1988
 51  44521  47418  53078  59138  62547  73289  78990  86516  91035  92260  95559
 52  43767  46692  52354  58418  61827  72592  78322  86003  90597  91806  95252
 53  42981  45934  51594  57648  61048  71854  77613  85451  90125  91323  94918
 54  42162  45136  50791  56837  60219  71071  76855  84860  89615  90813  94553
 55  41308  44293  49938  55984  59350  70236  76038  84225  89063  90272  94156
 56  40414  43396  49032  55077  58441  69342  75162  83540  88464  89696  93723
 57  39472  42448  48072  54106  57468  68383  74225  82796  87814  89078  93252
 58  38476  41462  47054  53067  56398  67357  73221  81989  87105  88411  92738
 59  37418  40415  45971  51959  55245  66257  72142  81115  86331  87689  92179
 60  36293  39287  44814  50780  54016  65076  70984  80166  85484  86903  91569
 61  35101  38087  43582  49524  52713  63809  69745  79131  84556  86044  90903
 62  33843  36823  42272  48176  51320  62448  68409  77994  83538  85101  90178
 63  32521  35497  40880  46725  49816  60973  66960  76744  82420  84062  89387
 64  31140  34102  39398  45178  48199  59377  65396  75374  81191  82915  88526
 65  29703  32628  37828  43540  46484  57671  63712  73875  79839  81647  87587
 66  28217  31088  36179  41816  44693  55852  61895  72232  78352  80250  86565
 67  26686  29506  34460  40007  42782  53901  59933  70428  76720  78713  85451
 68  25118  27897  32675  38111  40773  51813  57822  68455  74932  77027  84236
 69  23521  26252  30826  36129  38663  49597  55568  66312  72976  75179  82909
 70  21901  24546  28917  34078  36448  47255  53184  63994  70840  73157  81459
 71  20265  22786  26956  31963  34191  44799  50652  61491  68513  70948  79869
 72  18617  21000  24957  29777  31830  42248  47951  58794  65981  68539  78124
 73  16960  19204  22938  27535  29379  39609  45118  55905  63235  65920  76206
 74  15307  17416  20914  25273  26933  36869  42182  52837  60267  63084  74096
 75  13677  15645  18900  23006  24517  34024  39132  49605  57076  60033  71775
 76  12090  13892  16919  20745  22106  31126  35989  46226  53674  56774  69230
 77  10569  12219  15000  18526  19673  28217  32820  42721  50082  53323  66447
 78   9131  10661  13163  16372  17336  25335  29670  39118  46331  49702  63419
 79   7795   9192  11417  14299  15112  22487  26559  35457  42458  45934  60148
 80   6570   7815   9773  12348  12981  19711  23500  31787  38507  42046  56640
 81   5464   6550   8252  10539  11016  17075  20527  28163  34529  38076  52912
 82   4479   5408   6869   8864   9184  14624  17691  24642  30579  34071  48992
 83   3614   4394   5626   7329   7499  12353  15026  21282  26717  30091  44916
 84   2867   3511   4524   5955   6030  10262  12561  18132  23004  26204  40734
 85   2232   2756   3568   4752   4794   8372  10323  15225  19500  22478  36501
 86   1705   2124   2764   3719   3746   6712   8324  12582  16258  18974  32282
 87   1276   1605   2104   2850   2856   5290   6567  10213  13319  15744  28146
 88    935   1189   1571   2138   2140   4101   5075   8132  10705  12826  24160
 89    671    862   1149   1571   1574   3128   3857   6335   8147  10245  20393
 90    471    612    821   1131   1126   2356   2868   4815   6480   8016  16903
 91    323    424    573    797    786   1736   2083   3567   4872   6139  13738
 92    217    288    390    549    534   1256   1476   2571   3580   4597  10935
 93    142    191    260    370    354    891   1019   1814   2571   3362   8511
 94     90    123    169    244    228    620    683   1253   1805   2409   6468
 95     56     78    107    157    142    423    445    846   1240   1671   4792
 96     34     48     66     99     87    283    281    559    834   1134   3457
 97     20     29     40     61     51    185    172    361    550    750   2425
 98     11     17     24     38     29    119    101    227    356    483   1651
 99      6     10     14     22     16     74     58    140    227    303   1090
100      3      6      8     13      9     45     31     84    142    185    697

Following general conventions in the presentation of life tables, and beginning with initial values $l^m_0 = 100000$ men and $l^f_0 = 100000$ women, the subsequent values show how many of these men and women have survived until age τ (completed age in years). Therefore, $l^m_\tau/100000$ and $l^f_\tau/100000$ directly provide values of the life table survivor functions. All further quantities commonly presented in publications of life tables can be derived from them:

a) Omitting the superscript indicating sex, the number of individuals (per 100000) who died at age τ is

$$d_\tau := l_\tau - l_{\tau+1}$$

For example, referring to the life table for 1910 – 11, $d^m_{10} = 166$ and $d^f_{10} = 174$. Of course, calculating these quantities for the last age class, τ = 100, requires additional assumptions.

b) Conditional death frequencies¹⁸ can be calculated by

$$q_\tau := \frac{d_\tau}{l_\tau} = \frac{l_\tau - l_{\tau+1}}{l_\tau}$$

For example, referring again to the life table for 1910 – 11, one can calculate $q^m_{10} = 0.00218$ and $q^f_{10} = 0.00221$.

c) Calculation of conditional mean life lengths requires an assumption about mortality in the last age class, τ = 100. Assuming that 100 is the oldest possible age (so that $d_{100} = l_{100}$), the calculation can be done with the formula

$$e_\tau := \frac{\sum_{j=\tau}^{100} (j + 0.5)\, d_j}{\sum_{j=\tau}^{100} d_j}$$

This is the mean life duration of individuals who reached age τ. For example, referring again to the life table for 1910 – 11, one finds $e^m_{10} = 62.08$ and $e^f_{10} = 63.99$. We mention that, in the presentation of life tables, one also finds figures for $e_\tau - \tau$, often called mean residual life length [fernere Lebenserwartung].
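The derivations a) to c) can be checked with a few lines of code. The following Python sketch is an illustration only, not the procedure used by official statistics; it assumes, as in c), that 100 is the oldest possible age, so the last age class is closed with $l_{101} = 0$ and hence $d_{100} = l_{100}$. The survivor values are the 1910/1911 male column copied from Tables 7.4-1 and 7.4-2; the names `deaths`, `death_frequency`, `mean_life_length` and `L_MALE_1910_11` are our own, hypothetical choices.

```python
# Life-table quantities d_tau, q_tau and e_tau (Section 7.4.2, items a to c).
# Assumption: l[tau] holds l_tau for tau = 0, ..., 100, and 100 is the oldest
# possible age, so everybody still alive at 100 dies in that age class.

# Male survivor function, period life table 1910/1911 (Tables 7.4-1 and 7.4-2).
L_MALE_1910_11 = [
    100000, 81855, 79211, 78255, 77662, 77213, 76873, 76596, 76361, 76161,  # 0-9
    75984, 75818, 75662, 75517, 75365, 75189, 74986, 74746, 74470, 74165,  # 10-19
    73832, 73488, 73143, 72800, 72466, 72130, 71789, 71446, 71105, 70768,  # 20-29
    70425, 70070, 69705, 69332, 68948, 68545, 68125, 67693, 67233, 66741,  # 30-39
    66227, 65682, 65113, 64518, 63894, 63238, 62542, 61810, 61036, 60215,  # 40-49
    59349, 58435, 57473, 56457, 55395, 54290, 53114, 51869, 50563, 49177,  # 50-59
    47736, 46246, 44663, 43013, 41312, 39527, 37695, 35842, 33933, 31946,  # 60-69
    29905, 27850, 25741, 23587, 21450, 19328, 17216, 15184, 13278, 11440,  # 70-79
    9711, 8152, 6708, 5396, 4253, 3297, 2519, 1882, 1374, 982,  # 80-89
    679, 457, 299, 190, 117, 70, 40, 23, 12, 6,  # 90-99
    3,  # 100
]


def deaths(l):
    """d_tau = l_tau - l_{tau+1}, closing the last age class with l_101 = 0."""
    return [l[t] - (l[t + 1] if t + 1 < len(l) else 0) for t in range(len(l))]


def death_frequency(l, tau):
    """Conditional death frequency q_tau = d_tau / l_tau."""
    return deaths(l)[tau] / l[tau]


def mean_life_length(l, tau):
    """e_tau = sum_{j>=tau} (j + 0.5) d_j / sum_{j>=tau} d_j."""
    d = deaths(l)
    ages = range(tau, len(l))
    return sum((j + 0.5) * d[j] for j in ages) / sum(d[j] for j in ages)


print(deaths(L_MALE_1910_11)[10])                      # d^m_10, text: 166
print(round(death_frequency(L_MALE_1910_11, 10), 5))   # q^m_10, text: 0.00218
print(round(mean_life_length(L_MALE_1910_11, 10), 2))  # e^m_10, text: 62.08
```

Calling `mean_life_length` with τ = 0 on any column gives the corresponding mean life length of newborn children; like all values of $e_\tau$ at high ages, it depends to some extent on the closing assumption for τ = 100.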

7.4.3

Increases in Mean Life Length

1. The data in Tables 7.4-1 to 7.4-4 can be used to investigate changes in (fictitious) mean life length.¹⁹ We begin by directly plotting the survivor functions as given in the tables. This is shown in Figure 7.4-1. For women, and also for younger men, the functions follow the chronological order from bottom to top. For example, in the period 1871 – 81, about 41 % of the men and about 38 % of the women died before age 20, while in the period 1986 – 88 these proportions had declined to about 1 – 2 %. A particularly substantial decline in mortality occurred for newborn children. This can also be calculated directly from Tables 7.4-1 and 7.4-3. The following table shows the proportion (in %) of male and female babies who died during their first year of life:

Period         Male  Female
1871 – 1881    25.3    21.7
1881 – 1890    24.2    20.7
1891 – 1900    23.4    19.9
1901 – 1910    20.2    17.0
1910 – 1911    18.1    15.3
1924 – 1926    11.5     9.4
1932 – 1934     8.5     6.8
1949 – 1951     6.2     4.9
1960 – 1962     3.5     2.8
1970 – 1972     2.6     2.0
1986 – 1988     0.9     0.7

18 Often called "death probabilities" [Sterbewahrscheinlichkeiten]. However, since the quantities refer to frequencies, we avoid speaking of "probabilities".
19 See also Proebsting (1984), who has discussed all these data sets except the one for 1986 – 88.
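The infant mortality percentages are simply $100\,(l_0 - l_1)/l_0$, using the entries for age 1 in Tables 7.4-1 and 7.4-3; the proportions dying before age 20 use $l_{20}$ in the same way. A minimal Python sketch (variable and function names are illustrative) that recomputes them; the results agree with the published figures up to rounding:

```python
# Proportion (in %) of a period life table's births that died before age tau:
# 100 * (l_0 - l_tau) / l_0, with l_0 = 100000 (Tables 7.4-1 and 7.4-3).
PERIODS = ["1871-1881", "1881-1890", "1891-1900", "1901-1910", "1910-1911",
           "1924-1926", "1932-1934", "1949-1951", "1960-1962", "1970-1972",
           "1986-1988"]
L1_MALE = [74727, 75831, 76614, 79766, 81855, 88462, 91465, 93823, 96467, 97400, 99075]
L1_FEMALE = [78260, 79311, 80138, 82952, 84695, 90608, 93161, 95091, 97222, 98016, 99298]


def percent_died_before(l_tau, l_0=100000):
    """Percentage of the l_0 newborns who did not survive until age tau."""
    return 100 * (l_0 - l_tau) / l_0


# Infant mortality by period (men, women).
for period, lm, lf in zip(PERIODS, L1_MALE, L1_FEMALE):
    print(f"{period}  {percent_died_before(lm):5.1f}  {percent_died_before(lf):5.1f}")

# Dying before age 20, using l_20 from the same tables.
print(percent_died_before(59287), percent_died_before(62324))  # 1871-81: men, women
print(percent_died_before(98284), percent_died_before(98843))  # 1986-88: men, women
```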

[Figure 7.4-1, panel "Male": survivor functions (ordinate 0 to 1) by age τ = 0 – 100, one curve for each period from 1871 – 1881 to 1986 – 1988.]

2. Instead of directly comparing survivor functions, one can compare age-dependent mean life lengths, $e_\tau$, as defined in the previous section. They are shown in Figure 7.4-2. Again, the graphs follow the chronological order from bottom to top. It is seen that the greatest increases in mean life length occurred at young ages. To keep the plot easy to survey, the graphs begin at age 1. However, changes in the mean life length of newborn children can be calculated directly from the data. Comparing the periods 1871 – 81 and 1986 – 88, these mean life lengths have increased from about 36 to 72 years for male children, and from about 38 to 79 years for female children.

3. These changes can also be visualized in historical time by locating the values roughly at the center years of the life table periods. This is done in Figure 7.4-3, which shows the historical changes in the values of $e_\tau$ for selected ages (τ = 0, 10, 20, 30, 40, 50, 60, 70). One sees, again, that the most significant increases in mean life length occurred at younger ages, in particular for newborn children.

[Figure 7.4-1, panel "Female": survivor functions (ordinate 0 to 1) by age τ = 0 – 100, one curve for each period from 1871 – 1881 to 1986 – 1988.]

Fig. 7.4-1 Male and female survivor functions in Germany, 1871 – 1988. At an age of 10 years the functions are in chronological order.

[Figure 7.4-2, panels "Male" and "Female": mean life durations $e_\tau$ (in years) by age τ = 0 – 100, one curve for each period from 1871 – 1881 to 1986 – 1988.]

Fig. 7.4-2 Male and female mean life durations $e_\tau$ in Germany, 1871 – 1988, conditional on age τ as specified on the abscissa.

[Figure 7.4-3, panels "Male" and "Female": $e_\tau$ for τ = 0, 10, 20, 30, 40, 50, 60, 70, plotted against historical time, 1870 – 1990.]

Fig. 7.4-3 Changes in male and female mean life durations $e_\tau$ in Germany, 1871 – 1988, conditional on ages τ = 0, 10, 20, 30, 40, 50, 60, 70.

7.4.4

Life Table Age Distributions

1. In general, changes in the age distribution of a population depend not only on death rates but also on the development of births and migration, so it is difficult to isolate the contribution of changes in mortality. Nevertheless, some ideas can be gained from a hypothetical consideration. Assume that for a longer period, say 100 years, the same number of children is born each year and that they survive according to a given period life table. This then implies a stable age distribution completely determined by the given life table. In fact, this age distribution is simply proportional to the life table's survivor function. Of course, since death rates differ between men and women, the corresponding age distributions also differ. In our hypothetical population, generated by a constant number of 100000 births per year and constant mortality conditions given by some period life table, the number of women of age τ in a given year is just $l^f_\tau$. Thus, the total number of women alive in that year is $\sum_\tau l^f_\tau$, and the relative frequency of women of age τ is

$$l^f_\tau \Big/ \sum_{j=0}^{100} l^f_j$$

and analogously for men. Figure 7.4-4 directly compares the sex-specific age frequency curves implied by the 1871 – 81 and 1986 – 88 period life tables.

[Figure 7.4-4: age frequencies (ordinate 0 to 0.02) by age 0 – 100, for the period life tables 1871 – 81 and 1986 – 88.]

Fig. 7.4-4 Hypothetical male (solid line) and female (dotted line) age frequencies calculated from period life tables 1871 – 81 and 1986 – 88.

2. Another possibility is to aggregate ages into age classes and then to calculate frequencies for each age class. This makes it possible to visualize how the hypothetical age distributions associated with each period life table have changed in historical time. Using 10-year age classes, this is shown, separately for men and women, in Figure 7.4-5.

[Figure 7.4-5, panels "Men" and "Women": percentages in the age classes 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69 and ≥70, plotted against historical time, 1870 – 1990.]

Fig. 7.4-5 Development of hypothetical age distributions of men and women, calculated from the period life tables from 1871 – 81 to 1986 – 88. The ordinate is in percent for specified age classes.
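The calculation behind the hypothetical age distributions of Section 7.4.4 can be sketched in Python as follows. The function names are hypothetical and, to keep the block short, the survivor function is a made-up smooth curve, not one of the official tables; substituting any column of Tables 7.4-1 to 7.4-4 (ages 0 to 100) should give the kind of quantities plotted in Figures 7.4-4 and 7.4-5.

```python
# Hypothetical stationary population of Section 7.4.4: constant births and
# constant period mortality, so the number alive at age tau is proportional
# to l_tau.


def age_frequencies(l):
    """Relative frequency of age tau: l_tau / sum_j l_j (j = 0, ..., 100)."""
    total = sum(l)
    return [x / total for x in l]


def age_class_percentages(l, width=10, open_from=70):
    """Percentages for the classes 0-9, 10-19, ..., 60-69 and >=70."""
    freqs = age_frequencies(l)
    shares = {}
    for start in range(0, open_from, width):
        shares[f"{start}-{start + width - 1}"] = 100 * sum(freqs[start:start + width])
    shares[f">={open_from}"] = 100 * sum(freqs[open_from:])
    return shares


# Illustrative survivor function only (NOT an official life table):
# l_0 = 100000, declining smoothly over ages 0..100.
l_illustrative = [round(100000 * (1 - (tau / 101) ** 4)) for tau in range(101)]

for age_class, pct in age_class_percentages(l_illustrative).items():
    print(f"{age_class:>6}: {pct:5.1f} %")
```

Repeating the class calculation for each period's column, separately for men and women, and plotting the percentages against the center years of the periods would reproduce the layout of Figure 7.4-5.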

8.2

RECONSTRUCTION FROM PERIOD DATA

119

Chapter 8

Mortality of Cohorts
While period life tables reflect the mortality conditions of a given period, cohort life tables try to reconstruct mortality conditions as they developed during the lifetime of birth cohorts. The latter are much more difficult to produce, mainly due to insufficient data. There are basically two approaches: one can either try to reconstruct cohort life tables from period data, or one can try to actually follow the members of a birth cohort through their life.¹ In the present chapter, we begin with a discussion of the first approach, an attempted reconstruction from period data from official statistics.² The second approach is more difficult. Assuming that one wants to construct a cohort life table for a birth cohort $C_{t_0}$, one would need to actually follow all of its members beginning in the year $t_0$. Such historical data are very scarce; one example will be discussed in Section 8.3. An alternative is to follow the members of $C_{t_0}$ only through part of their life histories, say, beginning in some year $t > t_0$. This is possible with surveys organized as panels, that is, surveys repeated year by year and targeted at the same people. Such data make it possible to construct incomplete life tables that condition on survival until age $\tau = t - t_0$. Based on data from the German Socio-Economic Panel, this approach will be discussed in Section 8.4.

1 A further possibility is based on data from surveys in which respondents are asked to provide information about dates of birth and, possibly, also dates of death of their parents. Such data will be discussed in Chapter 9.

8.1

Cohort Death Rates

1. We begin with a few definitions. As already introduced in Section 3.4, the symbol $C_{t_0}$ will be used to denote a birth cohort, that is, a set of people all born during the year $t_0$. Correspondingly, $C^m_{t_0}$ and $C^f_{t_0}$ denote, respectively, the male and female members of $C_{t_0}$. Furthermore, $C_{t_0,\tau}$ is the set of members of $C_{t_0}$ being of age τ, that is, who survived at least until age τ. Again, we use $C^m_{t_0,\tau}$ and $C^f_{t_0,\tau}$ to distinguish male and female members.

2. A cohort view of mortality can be depicted in the following way:

$$C_{t_0} = C_{t_0,0} \supseteq C_{t_0,1} \supseteq C_{t_0,2} \supseteq \cdots$$

In general, $C_{t_0,\tau} \setminus C_{t_0,\tau+1}$ is the set of members of $C_{t_0}$ who died at age τ (equivalently, in the year $t_0 + \tau$). So we can introduce age-specific cohort death rates referring to the proportion of members of $C_{t_0,\tau}$ who died at age τ. We will use the notation

$$\eta_{t_0,\tau} := \frac{|C_{t_0,\tau} \setminus C_{t_0,\tau+1}|}{|C_{t_0,\tau}|}$$

The numerator refers to the number of members of $C_{t_0}$ who died at age τ, and the denominator refers to the number of members of $C_{t_0}$ who survived age τ − 1 and might die at age τ. In order to distinguish male and female mortality we also use the notations

$$\eta^m_{t_0,\tau} := \frac{|C^m_{t_0,\tau} \setminus C^m_{t_0,\tau+1}|}{|C^m_{t_0,\tau}|} \quad\text{and}\quad \eta^f_{t_0,\tau} := \frac{|C^f_{t_0,\tau} \setminus C^f_{t_0,\tau+1}|}{|C^f_{t_0,\tau}|}$$

3. One can also think in terms of a duration variable

$$T_{t_0}: C_{t_0} \longrightarrow \tilde{\mathcal{T}} := \{0, 1, 2, 3, \ldots\}$$

that records the life length of the members of $C_{t_0}$. As introduced in Section 7.3, a cohort life table is simply a description of the distribution of $T_{t_0}$. This can be done in terms of a frequency function $P[T_{t_0}]$, a distribution function $F[T_{t_0}]$, a survivor function $G[T_{t_0}]$, or a rate function $r[T_{t_0}]$. All descriptions are equivalent in that each one can be derived from any other one. Particularly useful is the rate function that records the age-specific cohort death rates:

$$\tau \longmapsto r[T_{t_0}](\tau) = \eta_{t_0,\tau}$$
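To illustrate the statement that the frequency, distribution, survivor, and rate functions of $T_{t_0}$ can be derived from one another, here is a small Python sketch. The cohort sizes $|C_{t_0,\tau}|$ are invented numbers for a hypothetical cohort whose last members die at age 4, and the conventions used, $G(\tau)$ = proportion surviving at least until age τ and $F(\tau)$ = proportion dying at or before age τ, are our assumptions about the definitions of Section 7.3. All function names are hypothetical.

```python
from itertools import accumulate

# Equivalent descriptions of a cohort's life-length distribution (Section 8.1).
# Assumptions: sizes[tau] = |C_{t0,tau}| for tau = 0, ..., omega, all positive,
# and every member still alive at the last age omega dies at that age.


def cohort_death_rates(sizes):
    """eta_tau = (|C_tau| - |C_{tau+1}|) / |C_tau|, i.e. the rate function r."""
    rates = []
    for tau, n in enumerate(sizes):
        n_next = sizes[tau + 1] if tau + 1 < len(sizes) else 0
        rates.append((n - n_next) / n)
    return rates


def survivor_from_rates(rates):
    """G(tau) = prod_{j < tau} (1 - r(j)): proportion reaching age tau."""
    g = [1.0]
    for r in rates[:-1]:
        g.append(g[-1] * (1 - r))
    return g


def frequency_from_survivor(g):
    """P(tau) = G(tau) - G(tau + 1): proportion dying at age tau."""
    return [g[t] - (g[t + 1] if t + 1 < len(g) else 0.0) for t in range(len(g))]


def distribution_from_frequency(p):
    """F(tau) = P(0) + ... + P(tau): proportion dying at or before age tau."""
    return list(accumulate(p))


# Invented cohort sizes, for illustration only.
sizes = [1000, 820, 790, 640, 250]
eta = cohort_death_rates(sizes)
G = survivor_from_rates(eta)
P = frequency_from_survivor(G)
F = distribution_from_frequency(P)
print(eta, G, P, F, sep="\n")
```

Going from the rates back to $G$ only requires the product formula, which is why the rate function is a convenient starting point when, as in Section 8.2, death rates rather than cohort sizes are available.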

8.2

Reconstruction from Period Data

1. The idea is to reconstruct cohort life tables from age-specific death rates of consecutive years. As an example we consider the birth cohort $t_0 = 1910$. Referring to the territory of the former Deutsches Reich in 1910, this birth cohort had 1924778 members.³ Of course, nobody knows the true age-specific cohort death rates

$$\eta_{1910,\tau} = \frac{\text{members of } C_{1910} \text{ who died at age } \tau}{\text{members of } C_{1910} \text{ who survived age } \tau - 1}$$

Data from official statistics can be used, however, to calculate age-specific period death rates:

$$\delta_{1910,0},\ \delta_{1911,1},\ \delta_{1912,2},\ \delta_{1913,3},\ \ldots$$
2 Historically, this was the main approach to the construction of cohort life tables; see the historical survey by Young (1978). For an early example see Merrell (1947).

3 Statistisches Jahrbuch für das Deutsche Reich 1919 (p. 41).

120

8

MORTALITY OF COHORTS


If we ignore in- and out-migration, and if we also ignore changing political borders, we might assume that these death rates provide sensible estimates for the cohort death rates η1910,τ . The reconstruction of a cohort life table then simply consists of using the death rates δ1910+τ,τ instead of the rates η1910,τ .4 2. Unfortunately, the data available in the STATIS data base of the Statistisches Bundesamt only allow to calculate the death rates beginning at age 42, corresponding to the year 1952. We therefore calculate a conditional survivor function. Let T denote the statistical variable that would record the life length as implied by a complete knowledge of the death rates δ1910+τ,τ . We can then deﬁne, for each age τ0 , a conditional survivor function τ −1

Table 8.2-1 Age-specific midyear population (in 1000), number of deaths, and age-specific death rates (per 1000) in Germany, 1952–2000. Source: STATIS data base and Fachserie 1, Reihe 1 (see text).

   t   τ  n^m_{t,τ}  d^m_{t,τ}  δ̃^m_{t,τ}  n^f_{t,τ}  d^f_{t,τ}  δ̃^f_{t,τ}
1952  42     477.7      1775       3.72      633.5      1776       2.80
1953  43     476.7      1797       3.77      632.2      1806       2.86
1954  44     486.9      1896       3.89      636.3      1830       2.88
1955  45     473.9      2189       4.62      629.3      2024       3.22
1956  46     465.2      2390       5.14      622.7      2117       3.40
1957  47     467.6      2650       5.67      623.9      2275       3.65
1958  48     465.8      2671       5.73      622.6      2407       3.87
1959  49     463.8      3036       6.55      621.1      2552       4.11
1960  50     461.0      3577       7.76      618.6      2771       4.48
1961  51     456.2      3640       7.98      613.9      2955       4.81
1962  52     453.2      3974       8.77      612.6      3239       5.29
1963  53     449.9      4518      10.04      609.8      3517       5.77
1964  54     444.8      4918      11.06      604.8      3575       5.91
1965  55     440.4      5522      12.54      601.4      3948       6.57
1966  56     435.2      5907      13.57      597.9      4281       7.16
1967  57     428.2      6302      14.72      593.6      4507       7.59
1968  58     420.9      7154      17.00      589.0      5175       8.79
1969  59     414.0      8013      19.36      584.1      5549       9.50
1970  60     404.9      8497      20.99      574.9      5889      10.24
1971  61     396.5      8999      22.69      570.2      6428      11.27
1972  62     387.3      9622      24.85      564.1      6983      12.38
1973  63     377.7     10048      26.60      557.3      7391      13.26
1974  64     367.2     10792      29.39      549.9      8031      14.60
1975  65     355.9     11663      32.77      541.8      8674      16.01
1976  66     343.8     12042      35.03      532.7      9453      17.75
1977  67     331.7     12254      36.95      523.4      9758      18.64
1978  68     319.0     13266      41.59      513.7     10458      20.36
1979  69     305.6     13727      44.92      502.8     11507      22.88
1980  70     291.9     14433      49.44      491.3     12510      25.46
1981  71     277.3     14920      53.81      478.6     13443      28.09
1982  72     262.2     15171      57.87      464.9     14252      30.66
1983  73     247.0     15460      62.60      450.1     15460      34.35
1984  74     231.4     15237      65.84      434.3     16296      37.52
1985  75     215.6     16083      74.58      417.5     17400      41.68
1986  76     199.7     15833      79.29      399.6     18398      46.04
1987  77     185.4     15433      83.23      378.3     18922      50.01
1988  78     170.0     15488      91.09      360.2     19881      55.19
1989  79     154.6     14997      97.00      339.9     20881      61.44
1990  80     139.7     15290     109.47      318.9     22230      69.71
1991  81     125.1     14452     115.49      297.2     22433      75.49
1992  82     111.0     13570     122.21      274.9     22154      80.59
1993  83      97.9     13152     134.34      252.6     22977      90.96
1994  84      85.2     12123     142.35      229.8     22895      99.63
1995  85      73.5     11248     153.12      206.8     23085     111.63
1996  86      62.5     10489     167.71      184.1     22810     123.92
1997  87      52.7      9183     174.40      162.0     21713     134.03
1998  88      43.7      8371     191.35      140.6     20896     148.59
1999  89      35.8      7471     208.69      120.0     19758     164.65
2000  90      28.4      6406     225.56      100.7     18054     179.29

$$G[T \mid T \ge \tau_0](\tau) := \prod_{j=\tau_0}^{\tau-1} \bigl(1 - \delta_{1910+j,\,j}\bigr)$$

defined for all τ > τ0. As a convention, we also define $G[T \mid T \ge \tau_0](\tau_0) = 1$. If τ0 = 0, one gets the unconditional survivor function $G[T \mid T \ge 0](\tau) = G[T](\tau)$. In general, the relationship is

$$G[T](\tau) = G[T](\tau_0)\, G[T \mid T \ge \tau_0](\tau)$$

As a special case, if τ0 = 42, we get the formulation

$$G[T](\tau) = G[T](42)\, G[T \mid T \ge 42](\tau) \qquad (\text{for } \tau \ge 42) \qquad (8.2.1)$$

Given our data, we can only calculate the second term on the right-hand side. But, since the first term on the right-hand side is a constant, the second term provides a function proportional to the survivor function for all ages τ ≥ 42.

3. Table 8.2-1 provides the respective data, which refer to the territory of Germany since 1990: values for the midyear population size, $n^m_{t,\tau}$ and $n^f_{t,\tau}$, are taken from Segment 685 of the STATIS data base; values of the number of deaths, $d^m_{t,\tau}$ and $d^f_{t,\tau}$, are taken from Fachserie 1, Reihe 1.S.3 (Gestorbene nach Alters- und Geburtsjahren sowie Familienstand [deaths by age, year of birth and marital status], 1948–1989) and from Segments 1124–26 of the same data base.5 Death rates (per 1000) are calculated as

$$\tilde\delta^m_{t,\tau} = 1000\,\frac{d^m_{t,\tau}}{n^m_{t,\tau}} \qquad\text{and}\qquad \tilde\delta^f_{t,\tau} = 1000\,\frac{d^f_{t,\tau}}{n^f_{t,\tau}}$$

where the population sizes enter as numbers of persons (Table 8.2-1 reports them in 1000).
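As a quick check of the rate definition, the following Python sketch recomputes a few entries of Table 8.2-1 from the tabulated counts. The function name `death_rate_per_1000` is ours and only illustrative. Since the population sizes are published rounded to 0.1 thousand, recomputed rates may occasionally differ from the tabulated ones in the last digits.

```python
def death_rate_per_1000(deaths, population_in_thousands):
    """Death rate per 1000 persons: 1000 * d / n, with n tabulated in 1000 persons."""
    population = population_in_thousands * 1000.0   # convert to persons
    return 1000.0 * deaths / population


# (t, tau, n^m in 1000, d^m, n^f in 1000, d^f) for three rows of Table 8.2-1
rows = [
    (1952, 42, 477.7, 1775, 633.5, 1776),
    (1953, 43, 476.7, 1797, 632.2, 1806),
    (2000, 90, 28.4, 6406, 100.7, 18054),
]
for t, tau, n_m, d_m, n_f, d_f in rows:
    rate_m = death_rate_per_1000(d_m, n_m)
    rate_f = death_rate_per_1000(d_f, n_f)
    print(t, tau, round(rate_m, 2), round(rate_f, 2))
```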

4 There are several proposals that do not start with age-specific death rates but try to concatenate information from period life tables; see Höhn (1984), Dinkel (1984). Dinkel (1992) has also suggested combining both methods. For further discussion, see also the contributions in Dinkel, Höhn and Scholz (1996). 5 For the year 1990, we have used additional data from Fachserie 1, Reihe 1, 1990 (p. 136).


Death rates for the reconstruction of a cohort life table can be derived by defining $\delta^m_\tau := \tilde\delta^m_{t,\tau}/1000$ and $\delta^f_\tau := \tilde\delta^f_{t,\tau}/1000$, with t = 1910 + τ. So one can finally calculate conditional survivor functions

$$G[T^m \mid T^m \ge 42](\tau) = \prod_{j=42}^{\tau-1} \bigl(1 - \delta^m_j\bigr)$$

$$G[T^f \mid T^f \ge 42](\tau) = \prod_{j=42}^{\tau-1} \bigl(1 - \delta^f_j\bigr)$$

4. Results of this calculation are shown as solid lines in Figure 8.2-1. In order to provide a context for an interpretation, we have added conditional survivor functions from period life tables.6 It can be seen that period life tables systematically underestimate the reductions in death rates that occurred in historical time. A remarkable exception is the 1949–51 life table for men: the conditional survivor function from this table seems to be almost identical with the conditional survivor function of the 1910 quasi-cohort.7

[Figure: two panels, "Male survivor functions" and "Female survivor functions"; horizontal axis: age (40–100); vertical axis: 0–1; period curves labeled 1910-11, 1949-51, 1986-88.]

Fig. 8.2-1 Conditional survivor functions (τ ≥ 42) for the members of the birth cohort 1910 (solid line), compared with conditional survivor functions from period life tables (dotted lines).

6 The data are taken from Tables 7.4-1 to 7.4-4 in Section 7.4.2.

7 It is known, however, that the 1949–51 period life table, in particular for men, is based on underestimated death rates; see Dinkel and Meinl (1991, p. 117).
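The product formula for the conditional survivor functions can be sketched in a few lines of Python. This is a minimal illustration, assuming the death rates per 1000 are listed in order of age starting at τ0 = 42 along the cohort diagonal t = 1910 + τ; the function name is hypothetical, and only the first five ages of the 1910 cohort from Table 8.2-1 are used.

```python
def cohort_conditional_survivor(rates_per_1000, start_age):
    """Conditional survivor function G[T | T >= start_age](tau) of a quasi-cohort.

    rates_per_1000[i] is the death rate (per 1000) at age start_age + i, taken
    along the cohort diagonal of Table 8.2-1 (calendar year t = 1910 + age).
    """
    survivor = {start_age: 1.0}  # convention: G[T | T >= tau0](tau0) = 1
    g = 1.0
    for i, rate in enumerate(rates_per_1000):
        g *= 1.0 - rate / 1000.0   # delta_j = tilde-delta_{t,j} / 1000
        survivor[start_age + i + 1] = g
    return survivor


male_rates = [3.72, 3.77, 3.89, 4.62, 5.14]     # ages 42-46, Table 8.2-1
female_rates = [2.80, 2.86, 2.88, 3.22, 3.40]
G_m = cohort_conditional_survivor(male_rates, 42)
G_f = cohort_conditional_survivor(female_rates, 42)
for age in sorted(G_m):
    print(age, round(G_m[age], 4), round(G_f[age], 4))
```

Multiplying the conditional values by an (unknown) constant G[T](42) would give the unconditional survivor function, as in (8.2.1).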


8.3

Historical Data

As an alternative to the reconstruction of cohort life tables from period data, one can try to actually follow the life courses of people born in the past. As an example, we discuss a data set that was made available by Arthur E. Imhof and his co-workers (see Imhof et al. 1990).


8.3.1

Data Description

1. The data set is available from the Zentralarchiv für empirische Sozialforschung (Köln). A basic description can be found in Imhof et al. (1990). The data result from so-called “Ortssippenbücher” (see Imhof et al. 1990, pp. 57–66, also Knodel 1975) and refer to several different local areas. Here we only use the data file from Ostfriesland (a region between Aurich and Leer). This data file contains information about 24971 persons belonging to 3882 families. Each family includes a married woman; in 3756 cases information about the husband is also available; the remaining 17333 persons are children. The following table shows the distribution of family sizes.

Number of children    0    1    2    3    4    5    6    7
Families            321  472  374  422  429  434  411  364

Number of children    8    9   10   11   12   13   14   15
Families            280  173  114   61   17    6    3    1
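The totals stated in the text can be checked against the family-size distribution with a short Python sketch (variable names are illustrative; the number of husbands is taken from the text):

```python
# Number of children -> number of families (family-size table above)
families_by_size = {0: 321, 1: 472, 2: 374, 3: 422, 4: 429, 5: 434, 6: 411, 7: 364,
                    8: 280, 9: 173, 10: 114, 11: 61, 12: 17, 13: 6, 14: 3, 15: 1}

n_families = sum(families_by_size.values())                    # one mother per family
n_children = sum(k * f for k, f in families_by_size.items())
n_husbands = 3756                                               # stated in the text
n_persons = n_families + n_husbands + n_children

print(n_families, n_children, n_persons)
print("mean number of children per family:", round(n_children / n_families, 2))
```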

For 584 persons no valid information about the birth year is available; we therefore consider only the remaining 24387 persons, 7145 parents and 17242 children. The birth years of parents range from 1616 until 1835,8 and the birth years of children range from 1636 to 1871. Figure 8.3-1 shows the frequency distributions on a historical time axis.

[Figure: absolute frequencies (0–150) of birth years on a historical time axis (1600–1900), shown separately for parents and children.]

Fig. 8.3-1 Absolute frequencies of birth years of 7145 parents and 17242 children in the Ostfriesland data set.

2. In many cases additional information about death years is available. In order to use this information for the estimation of survivor functions, it is important to distinguish between parents and children. For parents we need to take into account that they have already survived until the age of marriage and/or of giving birth to children. We therefore proceed in two steps: we begin by calculating survivor functions for parents, which is easy because there is complete information about their death dates; we then try to estimate survivor functions for the children. Finally, we compare the life lengths of parents and children.

8.3.2

Parent’s Survivor Functions

1. Since the birth years range over a very long period (Fig. 8.3-1), we distinguish four broad birth cohorts (all parents were born before 1850):

Birth years   1616–1699   1700–1749   1750–1799   1800–1849
Mothers             260        1111        1524         729
Fathers             322        1176        1455         567
8 This is due to the selection of families: “Data were collected only on children from those marriages in which the date of death of both parents (in the case of illegitimate children, that of the mother) was known and the death occurred within the study area. In addition, the marriage had to have been contracted before 1850, or the first illegitimate birth had to have occurred before 1860.” (Imhof et al. 1990, pp. 62–63, translated from German)

The total number of mothers is 3624 and the total number of fathers is 3520. These are the cases where the birth year is known. Fortunately, the death year is also known for all these cases.9 So one can directly calculate all life lengths, as shown in Table 8.3-1.

2. The data in Table 8.3-1 provide values of statistical variables $\hat T^{f_1}_c$ and $\hat T^{m_1}_c$ that record, respectively, the life lengths of mothers and fathers belonging to birth cohort c.10 The only problem concerns the fact that mothers
9 As already mentioned, this is implied by the selection of families for the data set.

10 The superscripts are meant to indicate female ($f_1$) and male ($m_1$) persons of the first generation.


Table 8.3-1 Number of mothers and fathers in the Ostfriesland data set who died at the specified age.
Mothers Age 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1616/ 1699 1700/ 1749 2 1 1 1 1 3 8 11 10 11 3 5 14 9 12 9 14 8 24 9 12 20 16 18 11 10 8 27 13 16 30 10 17 17 18 12 23 14 23 22 18 21 19 23 30 30 30 25 1750/ 1799 1800/ 1849 1616/ 1699 Fathers 1700/ 1749 1750/ 1799 1800/ 1849

Table 8.3-1 (continued) Number of mothers and fathers in the Ostfriesland data set who died at the specified age.
Mothers Age 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 Total 1616/ 1699 8 4 7 7 7 5 6 11 6 9 5 8 10 5 5 1 2 5 5 7 2 4 1 1 1 1 2 1 1 1 1 1 260 1111 1524 729 1 1 322 1700/ 1749 21 23 21 23 35 24 46 36 22 33 25 25 22 22 31 22 26 20 18 15 18 14 9 5 12 3 6 3 1 1750/ 1799 28 25 31 35 39 41 41 45 42 45 29 38 46 58 49 39 30 40 31 16 15 16 18 10 10 6 1 8 2 1 1 1800/ 1849 16 15 21 13 8 19 17 21 18 27 19 19 15 16 14 20 15 18 15 10 10 7 4 3 3 3 1 3 2 2 1616/ 1699 10 11 13 6 13 12 9 12 4 5 7 10 8 8 15 4 7 3 8 4 5 3 5 5 1 1 1 1 Fathers 1700/ 1749 21 37 34 19 36 32 37 38 41 39 25 35 28 24 46 26 35 21 20 15 16 15 7 4 9 2 5 3 2 1 1 1 1 1176 1455 567 1750/ 1799 32 32 32 37 56 36 35 40 51 44 55 38 33 39 23 22 23 27 26 22 15 9 15 5 5 5 2 1 1800/ 1849 9 11 7 14 14 11 14 18 9 19 17 19 5 12 18 19 11 17 18 13 7 4 5 3 4 2 2 3 1 1 1

1 1 1 1 1 2 2 4 2 4 1 7 3 6 2 2 5 6 2 2 2 1 1 2 4 4 3 5 4 3 3 3 4 6 3 2 2 1 8 4 2

4 2 2 3 4 10 8 5 13 5 3 5 11 9 7 8 7 11 5 11 11 8 15 20 14 12 3 9 10 7 17 10 11 9 11 15 16 18 17 20 29 18 18 24 25 27

3 2 4 3 4 5 8 5 10 5 11 12 5 10 7 8 6 6 9 7 6 10 16 13 7 7 8 7 4 5 4 7 6 8 8 8 14 7 11 13 17 8 16 14

2 1 1 1 1 2 2 2 3 1 1 2 2 2 1 3 2 4 5 2 2 5 6 7 3 3 7 6 3 5 5 5 1 6 3 5 18 2 3 1 3 5 6 2 6 12 8 5 8 3 7 9 16 11 10 12 9 11 10 12 10 14 16 8 15 14 20 16 26 16 19 21 26 15 26 21 20 26 4 4 2 3 5 6 9 10 8 9 15 14 11 7 13 17 17 20 11 11 9 17 19 12 15 14 23 26 25 26 26 26 26 31 30 39 30 30 43 30 1 1 4 2 3 2 4 3 2 6 5 8 5 2 3 6 5 8 4 10 4 5 6 7 6 7 9 9 9 12 10 9 10 10 12 10 10 9 7 14

2

and fathers already survived until some age. We therefore calculate conditional survivor functions. For mothers we begin at age 25, and for fathers at age 30. So we use the formula

$$G[\hat T^{f_1}_c \mid \hat T^{f_1}_c \ge 25](\tau) = \frac{\sum_{k=\tau}^{\infty} d^{f_1}_{c,k}}{\sum_{k=25}^{\infty} d^{f_1}_{c,k}}$$


for mothers, and the formula

$$G[\hat T^{m_1}_c \mid \hat T^{m_1}_c \ge 30](\tau) = \frac{\sum_{k=\tau}^{\infty} d^{m_1}_{c,k}}{\sum_{k=30}^{\infty} d^{m_1}_{c,k}}$$

for fathers; $d^{f_1}_{c,\tau}$ and $d^{m_1}_{c,\tau}$ are, respectively, the numbers of mothers and fathers belonging to birth cohort c who died at the age of τ (see Table 8.3-1). Figures 8.3-2 and 8.3-3 show these conditional survivor functions. Interestingly, there are only small variations across the different birth cohorts. As seen in Figure 8.3-4, these survivor functions are also very similar for mothers and fathers. Of course, one needs to recognize that we have used conditional survivor functions which only refer to parents.

[Figure: one curve for each birth cohort (1616–1699, 1700–1749, 1750–1799, 1800–1849); horizontal axis: age; vertical axis: 0–1.]

Fig. 8.3-2 Conditional survivor functions $G[\hat T^{f_1}_c \mid \hat T^{f_1}_c \ge 25]$ for mothers in the Ostfriesland data set.
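A minimal Python sketch of these formulas, assuming the death counts of one birth cohort are stored as a dictionary from age to number of deaths. The function name and the counts in the usage example are hypothetical and only show the mechanics; they are not taken from Table 8.3-1. Deaths before the starting age are simply ignored, which is what the conditioning amounts to.

```python
def conditional_survivor_from_deaths(deaths_by_age, start_age):
    """Conditional survivor function G[T | T >= start_age](tau) from death counts.

    deaths_by_age -- {age: number of persons who died at that age}
    Returns {tau: sum_{k >= tau} d_k / sum_{k >= start_age} d_k} for
    tau = start_age, ..., highest age + 1.
    """
    at_risk = sum(d for age, d in deaths_by_age.items() if age >= start_age)
    highest = max(deaths_by_age)
    return {
        tau: sum(d for age, d in deaths_by_age.items() if age >= tau) / at_risk
        for tau in range(start_age, highest + 2)
    }


# Hypothetical counts for illustration only (not taken from Table 8.3-1)
toy_deaths = {20: 3, 25: 2, 40: 4, 60: 2, 80: 2}
G_mothers = conditional_survivor_from_deaths(toy_deaths, 25)
print(G_mothers[25], G_mothers[26], G_mothers[41], G_mothers[81])
```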

8.3.3

Children’s Survivor Functions


1. The calculation of survivor functions for children is more complicated because the observations are incomplete. In about 2% of all cases we do not know the child’s sex; since we want to distinguish female and male children, these cases cannot be used. Furthermore, the data set does not provide a valid birth year for all of the remaining 8295 female and 8723 male children. However, as shown in the following table, this only concerns the first birth cohort.

                  Female children              Male children
Birth years     Total   (a)    (b)    (c)    Total   (a)    (b)    (c)
1616–1699         154    46    108     59      141    36    105     68
1700–1749        1382     0   1382    946     1588     0   1588   1141
1750–1799        2932     0   2932   2029     3117     0   3117   2304
1800–1849        3362     0   3362   1939     3421     0   3421   2174
1850–1881         465     0    465    245      456     0    456    242
Total            8295    46   8249   5218     8723    36   8687   5929

Columns labeled (a) show the number of cases without a valid birth year; columns (b) and (c) show, respectively, the number of cases with a valid birth year and the number of cases with both a valid birth year and a valid death year. So the question is how to use this incomplete information in order to estimate survivor functions.

[Figure: one curve for each birth cohort (1616–1699, 1700–1749, 1750–1799, 1800–1849); horizontal axis: age (30–100); vertical axis: 0–1.]

Fig. 8.3-3 Conditional survivor functions $G[\hat T^{m_1}_c \mid \hat T^{m_1}_c \ge 30]$ for fathers in the Ostfriesland data set.

[Figure: curves for mothers and fathers; horizontal axis: age (30–100); vertical axis: 0–1.]

Fig. 8.3-4 Comparison of conditional survivor functions $G[\hat T^{f_1}_c \mid \hat T^{f_1}_c \ge 30]$ and $G[\hat T^{m_1}_c \mid \hat T^{m_1}_c \ge 30]$ for birth cohort 1750–1799.

2. As an example we consider male children born in the years 1750–1799. A first possibility would be to use only the 2304 complete observations. It would then be possible to calculate a survivor function immediately, in the same way as was done in the previous section for parents. However, would the result be trustworthy? As long as we have no idea about the selection process that created the incomplete observations, no answer can be given. Nevertheless, we can at least calculate lower and upper bounds for the range of possible survivor functions. In order to calculate a lower bound, we can simply assume that all children with an unknown death year died at age τ = 0; in order to calculate an upper bound, we can assume that all these children survived the highest observed age, which is 99 in this example. Figure 8.3-5 shows these bounds and also a survivor function calculated from only the complete observations. Obviously, there is a broad range of possible survivor functions.

[Figure: area between the curves labeled "upper bound" and "lower bound"; horizontal axis: age (0–100); vertical axis: 0–1.]

Fig. 8.3-5 Lower and upper bounds for the survivor function of male children born between 1750 and 1799 in the Ostfriesland data set. The dotted line shows a survivor function calculated from only the complete observations.

3. The question therefore arises whether one can find additional information that can be used to get narrower bounds. For this purpose, any kind of information can be used that allows one to conclude that a person has survived some known age. For example, there might be information about a date of marriage or of child-bearing (see the discussion in Imhof et al. 1990, p. 68 and p. 71). For some of the persons without a valid death year, such information about a latest observation is provided in the data set; in most cases this is the marriage year. The following table shows the availability of this information: columns (a) give the number of children with a valid birth year but without a valid death year, and columns (b) give how many of them have a date of latest observation.

                Female children     Male children
Birth years       (a)     (b)       (a)     (b)
1616–1699          49      11        37       7
1700–1749         436     142       447     119
1750–1799         903     393       813     246
1800–1849        1423    1021      1247     696
1850–1881         220     124       214      80

In our example, there are 813 male children without a valid death year, but in 246 cases we know a date of latest observation and can use this additional information to get better bounds for the survivor function. This is shown in Figure 8.3-6. Compared with Figure 8.3-5, the bounds are somewhat narrower.

[Figure: area between the curves labeled "upper bound" and "lower bound"; horizontal axis: age (0–100); vertical axis: 0–1.]

Fig. 8.3-6 Lower and upper bounds for the survivor function of male children born between 1750 and 1799 in the Ostfriesland data set calculated by using additional information about latest observation. The dotted line shows a survivor function calculated from only the complete observations.

4. Without the introduction of additional assumptions, the calculation of bounds that include the unknown survivor function is the best one can do. Of course, depending on the proportion of incomplete observations and the possibilities to use additional information, the range of possible survivor functions might become very broad and then loses almost all informational content. An alternative approach, which is often followed in statistical practice, is to make assumptions about the process that leads to incomplete observations. The simplest assumption would be that the durations which are incompletely observed “randomly” result from the same distribution as the completely observed durations. Of course, this assumption might be wrong, and there are almost no possibilities for checking it with the given data set. The most often used estimation method for survivor functions that is based on this assumption is the Kaplan-Meier procedure (Kaplan and Meier 1958), which will be discussed in the next section.

8.3.4

The Kaplan-Meier Procedure

1. In order to explain the Kaplan-Meier procedure, we refer to a general duration variable

$$\hat T: \Omega \longrightarrow \tilde{\mathcal T} := \{0, 1, 2, 3, \ldots\}$$

which is defined for some population Ω. For each individual ω ∈ Ω, the variable $\hat T$ records a duration $\hat T(\omega) \in \tilde{\mathcal T}$ (see Section 7.3.1). This is the variable of theoretical interest. Observations are given by a two-dimensional variable

$$(T, D): \Omega \longrightarrow \tilde{\mathcal T} \times \{0, 1\}$$

If D(ω) = 1, the observation is complete and we can conclude that $\hat T(\omega) = T(\omega)$. On the other hand, if D(ω) = 0, the observation is right censored and we can only conclude that $\hat T(\omega) \ge T(\omega)$.11 The question then is how to estimate the distribution of $\hat T$ by using the information provided by (T, D).

2. The Kaplan-Meier procedure is intended to provide one kind of answer. One way to explain this method is by referring to rates. Let $r[\hat T]$ denote the rate function corresponding to the distribution of $\hat T$. As was shown in Section 7.3.1, the survivor function of $\hat T$ can then be calculated as follows:

$$G[\hat T](t) = \prod_{j=0}^{t-1} \bigl(1 - r[\hat T](j)\bigr)$$

Of course, with partially censored data we do not know the rate function $r[\hat T]$, but we can use the observations provided by (T, D) to get estimates and then use the above formula to calculate an estimate of the survivor function. Estimates of values of the rate function can be calculated in the following way:

$$r[\hat T](t) \approx_e r^*(t) := \frac{|\{\omega \in \Omega \mid T(\omega) = t,\ D(\omega) = 1\}|}{|\{\omega \in \Omega \mid T(\omega) \ge t\}|}$$

$r^*(t)$ might be called the observed rate at t, as derived from the observations which might end, and those which actually did end, their duration at this temporal location.12 Of course, whether this observed rate is approximately equal to the value of the rate function $r[\hat T]$ at t is not known and, as already remarked at the end of the previous section, cannot be checked with incomplete data. We therefore use the notation ‘$\approx_e$’ to indicate that the right-hand side is assumed to be a reasonable estimate of the left-hand quantity. Given this assumption, one immediately derives an estimate of the survivor function, namely

$$G[\hat T](t) \approx_e G^*(t) := \prod_{j=0}^{t-1} \bigl(1 - r^*(j)\bigr)$$

$G^*$ is then called the Kaplan-Meier estimate of the unknown survivor function $G[\hat T]$.

3. As an illustration, we continue with the example from the previous section and consider the male children born in the years 1750 to 1799. The variable (T, D) is defined as follows:

a) If ω refers to a male child in this birth cohort and we know the death year, then D(ω) = 1 and T(ω) records the life length of ω.
b) If we do not know the death year but have information about a latest observation, then D(ω) = 0 and T(ω) is the age at latest observation.
c) If we know neither a death year nor a year of latest observation, then D(ω) = 0 and T(ω) = 0, that is, the observation is right censored already at the beginning.

Table 8.3-2 shows the data for male and female children born between 1750 and 1799. The numbers of male and female children who died at age τ are denoted, respectively, by $d^{m_2}_\tau$ and $d^{f_2}_\tau$, and the numbers of censored observations by $c^{m_2}_\tau$ and $c^{f_2}_\tau$.

4. Table 8.3-3 illustrates the calculations. The column labeled $n^{m_2}_\tau$ shows the number of cases in the “risk set”, that is, the number of persons who are still at risk of dying at age τ. Then follow the number of persons who actually died ($d^{m_2}_\tau$) and the number of censored observations ($c^{m_2}_\tau$) at the current age. This allows one to calculate the observed rate $r^*(\tau)$ and to update the survivor function $G^*(\tau)$. The resulting survivor function is shown in Figure 8.3-7. Also shown is a survivor function calculated from only the complete observations; it lies below the Kaplan-Meier survivor function. This follows from the fact that, in the calculation of observed rates, the Kaplan-Meier procedure also takes into account the censored observations. However, this does not make the Kaplan-Meier estimate always superior to the one that only uses complete observations. As shown in the figure, both survivor functions lie within the range indicated by the lower and upper bounds.

5. Using the data from Table 8.3-2, one can compare survivor functions for male and female children born between 1750 and 1799. The Kaplan-Meier estimates shown in Figure 8.3-8 suggest somewhat higher death rates for male children.

6. Survivor functions of parents and children cannot be compared directly because parents already survived until some age. But we can compare conditional survivor functions. As was done in Figure 8.3-4, we condition on having survived age 30. Conditional survivor functions for children can directly be derived from the Kaplan-Meier estimates: if $G^*$ is an estimate

11 Also a strict inequality sign might be used here. However, with broadly defined units of the time axis, it is often plausible that an episode might end in the same temporal location where the observation ends.

12 The set referred to in the denominator is sometimes called the observed risk set, and the set referred to in the numerator is called the observed event set.


Table 8.3-2 Information about male and female children belonging to birth cohort 1750 – 1799 in the Ostfriesland data set. τ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 d m2 τ 431 161 110 70 52 41 38 28 19 16 12 12 6 12 10 5 8 11 12 5 11 13 12 16 11 22 8 9 7 14 6 16 16 12 17 11 17 11 13 9 10 13 10 19 10 10 9 12 18 14 9 c m2 τ 567 df 2 τ 403 134 95 54 46 42 32 23 9 13 9 5 8 8 7 4 7 11 7 10 4 8 10 15 12 8 13 13 6 8 13 11 7 11 12 7 17 9 15 11 16 12 7 6 9 16 14 6 15 6 13 c f2 τ 510 τ 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 d m2 τ 17 24 20 19 21 17 17 20 23 19 22 10 24 30 22 27 24 26 27 29 38 28 30 28 30 34 28 23 25 16 15 19 24 19 14 11 8 11 8 3 6 2 c m2 τ 2 4 2 2 1 4 2 1 1 2 df 2 τ 12 13 8 13 11 15 16 13 18 12 13 19 22 24 20 17 18 22 20 23 30 36 35 28 28 23 31 28 26 31 28 19 22 13 8 10 16 9 3 5 5 5 1 1 1 c f2 τ 5 2 3

Table 8.3-3 Application of the Kaplan-Meier procedure to the data for male children in Table 8.3-2.

  τ  n^{m2}_τ  d^{m2}_τ  c^{m2}_τ   r*(τ)   G*(τ)  |   τ  n^{m2}_τ  d^{m2}_τ  c^{m2}_τ   r*(τ)   G*(τ)
  0    3117      431      567    0.1383  1.0000  |  50     899        9        3    0.0100  0.4404
  1    2119      161        0    0.0760  0.8617  |  51     887       17        2    0.0192  0.4360
  2    1958      110        0    0.0562  0.7963  |  52     868       24        4    0.0276  0.4276
  3    1848       70        0    0.0379  0.7515  |  53     840       20        2    0.0238  0.4158
  4    1778       52        0    0.0292  0.7231  |  54     818       19        2    0.0232  0.4059
  5    1726       41        0    0.0238  0.7019  |  55     797       21        1    0.0263  0.3965
  6    1685       38        0    0.0226  0.6852  |  56     775       17        4    0.0219  0.3860
  7    1647       28        0    0.0170  0.6698  |  57     754       17        2    0.0225  0.3776
  8    1619       19        0    0.0117  0.6584  |  58     735       20        1    0.0272  0.3691
  9    1600       16        0    0.0100  0.6507  |  59     714       23        1    0.0322  0.3590
 10    1584       12        0    0.0076  0.6442  |  60     690       19        0    0.0275  0.3475
 11    1572       12        0    0.0076  0.6393  |  61     671       22        2    0.0328  0.3379
 12    1560        6        0    0.0038  0.6344  |  62     647       10        0    0.0155  0.3268
 13    1554       12        0    0.0077  0.6320  |  63     637       24        0    0.0377  0.3218
 14    1542       10        0    0.0065  0.6271  |  64     613       30        2    0.0489  0.3096
 15    1532        5        0    0.0033  0.6230  |  65     581       22        0    0.0379  0.2945
 16    1527        8        0    0.0052  0.6210  |  66     559       27        0    0.0483  0.2833
 17    1519       11        0    0.0072  0.6177  |  67     532       24        0    0.0451  0.2696
 18    1508       12        0    0.0080  0.6133  |  68     508       26        1    0.0512  0.2575
 19    1496        5        1    0.0033  0.6084  |  69     481       27        1    0.0561  0.2443
 20    1490       11        1    0.0074  0.6063  |  70     453       29        0    0.0640  0.2306
 21    1478       13        3    0.0088  0.6019  |  71     424       38        0    0.0896  0.2158
 22    1462       12        1    0.0082  0.5966  |  72     386       28        0    0.0725  0.1965
 23    1449       16        1    0.0110  0.5917  |  73     358       30        1    0.0838  0.1822
 24    1432       11        6    0.0077  0.5851  |  74     327       28        0    0.0856  0.1670
 25    1415       22        8    0.0155  0.5806  |  75     299       30        0    0.1003  0.1527
 26    1385        8       10    0.0058  0.5716  |  76     269       34        0    0.1264  0.1373
 27    1367        9        9    0.0066  0.5683  |  77     235       28        0    0.1191  0.1200
 28    1349        7       12    0.0052  0.5646  |  78     207       23        0    0.1111  0.1057
 29    1330       14        9    0.0105  0.5616  |  79     184       25        1    0.1359  0.0939
 30    1307        6       14    0.0046  0.5557  |  80     158       16        0    0.1013  0.0812
 31    1287       16       11    0.0124  0.5532  |  81     142       15        0    0.1056  0.0730
 32    1260       16        5    0.0127  0.5463  |  82     127       19        0    0.1496  0.0653
 33    1239       12        4    0.0097  0.5394  |  83     108       24        0    0.2222  0.0555
 34    1223       17       10    0.0139  0.5341  |  84      84       19        0    0.2262  0.0432
 35    1196       11        8    0.0092  0.5267  |  85      65       14        0    0.2154  0.0334
 36    1177       17        9    0.0144  0.5219  |  86      51       11        0    0.2157  0.0262
 37    1151       11        2    0.0096  0.5143  |  87      40        8        0    0.2000  0.0206
 38    1138       13       10    0.0114  0.5094  |  88      32       11        0    0.3438  0.0164
 39    1115        9       11    0.0081  0.5036  |  89      21        8        0    0.3810  0.0108
 40    1095       10       10    0.0091  0.4995  |  90      13        3        0    0.2308  0.0067
 41    1075       13        8    0.0121  0.4950  |  91      10        6        0    0.6000  0.0051
 42    1054       10        9    0.0095  0.4890  |  92       4        2        0    0.5000  0.0021
 43    1035       19        8    0.0184  0.4843  |  93       2        0        0    0.0000  0.0010
 44    1008       10        8    0.0099  0.4755  |  94       2        0        0    0.0000  0.0010
 45     990       10        6    0.0101  0.4707  |  95       2        0        0    0.0000  0.0010
 46     974        9        4    0.0092  0.4660  |  96       2        1        0    0.5000  0.0010
 47     961       12        6    0.0125  0.4617  |  97       1        0        0    0.0000  0.0005
 48     943       18        5    0.0191  0.4559  |  98       1        0        0    0.0000  0.0005
 49     920       14        7    0.0152  0.4472  |  99       1        1        0    1.0000  0.0005

1

1 1

2

1

1 1 3 1 1 6 8 10 9 12 9 14 11 5 4 10 8 9 2 10 11 10 8 9 8 8 6 4 6 5 7 3

2 4 6 9 12 23 21 19 30 20 19 20 11 12 8 21 13 10 15 15 12 12 11 9 14 6 12 4 4 1 1

1 1

1

1

1

2

1

1 1
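The calculations of Table 8.3-3 can be reproduced with a short Python sketch. It assumes grouped data, one count of deaths and one of censored observations per year of age, and follows the convention of the table that cases censored at age τ still belong to the risk set at τ. The function names are hypothetical; `survivor_bounds` illustrates the lower and upper bounds of Section 8.3.3 on made-up data, and `conditional_survivor` the conditioning on survival to some age t0 that is used when comparing parents and children.

```python
def kaplan_meier(n0, deaths, censored):
    """Kaplan-Meier estimate from grouped data (one entry per age 0, 1, 2, ...).

    n0       -- number of observations, i.e. the risk set at age 0
    deaths   -- deaths[t]: complete observations ending at age t (D = 1)
    censored -- censored[t]: right-censored observations at age t (D = 0)
    Returns (risk, rates, survivor) with survivor[t] = G*(t) and G*(0) = 1.
    """
    risk, rates, survivor = [], [], [1.0]
    n = n0
    for d, c in zip(deaths, censored):
        if n == 0:
            break  # nobody left at risk: no further rates can be observed
        risk.append(n)
        r = d / n                      # observed rate r*(t)
        rates.append(r)
        survivor.append(survivor[-1] * (1.0 - r))
        # cases censored at t belong to the risk set at t (T >= t), not at t + 1
        n -= d + c
    return risk, rates, survivor


def conditional_survivor(survivor, t0):
    """G*(t) / G*(t0) for t >= t0: conditioning on survival to age t0."""
    return {t: survivor[t] / survivor[t0] for t in range(t0, len(survivor))}


def survivor_bounds(complete_ages, latest_obs_ages, tau):
    """Lower and upper bound for G(tau) = P(T >= tau) with missing death ages.

    complete_ages   -- ages at death of the complete observations
    latest_obs_ages -- one entry per incomplete case: age at latest
                       observation, or None if nothing is known
    The upper bound assumes that every incomplete case survived the highest
    observed age, so it is meaningful only up to that age.
    """
    total = len(complete_ages) + len(latest_obs_ages)
    alive = sum(1 for a in complete_ages if a >= tau)
    # lower bound: incomplete cases die at their latest observation (or at 0)
    lower = alive + sum(1 for a in latest_obs_ages if (a or 0) >= tau)
    upper = alive + len(latest_obs_ages)
    return lower / total, upper / total


# Male children 1750-1799, ages 0-9 of Table 8.3-3
deaths = [431, 161, 110, 70, 52, 41, 38, 28, 19, 16]
censored = [567, 0, 0, 0, 0, 0, 0, 0, 0, 0]
risk, rates, surv = kaplan_meier(3117, deaths, censored)
for t in range(len(rates)):
    print(t, risk[t], round(rates[t], 4), round(surv[t], 4))
```

The printed values should agree with the first rows of Table 8.3-3; extending the two input lists to age 99 would reproduce the whole table.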


[Figure: Kaplan-Meier survivor function together with the dotted line and the grey-shaded bounds from Figure 8.3-6; horizontal axis: age; vertical axis: 0–1.]

Fig. 8.3-7 Kaplan-Meier survivor function for male children born between 1750 and 1799 in the Ostfriesland data set calculated in Table 8.3-3 (solid line). The dotted line and the grey-shaded bounds are taken from Figure 8.3-6.

[Figure: curves for female children and male children; horizontal axis: age; vertical axis: 0–1.]

Fig. 8.3-8 Kaplan-Meier survivor functions for female and male children born between 1750 and 1799 in the Ostfriesland data set (Table 8.3-2).

[Figure: curves for mothers and female children; horizontal axis: age; vertical axis: 0–1.]

Fig. 8.3-9 Kaplan-Meier survivor function for mothers and female children born between 1750 and 1799 in the Ostfriesland data set.

[Figure: curves for fathers and male children; horizontal axis: age; vertical axis: 0–1.]

Fig. 8.3-10 Kaplan-Meier survivor function for fathers and male children born between 1750 and 1799 in the Ostfriesland data set.

of $G[\hat T]$, then

$$G[\hat T \mid \hat T \ge t_0](t) \approx_e \frac{G^*(t)}{G^*(t_0)} = \prod_{j=t_0}^{t-1} \bigl(1 - r^*(j)\bigr)$$

with $t_0 = 30$ in the present comparison. Figure 8.3-9 compares mothers and female children, both born between 1750 and 1799; Figure 8.3-10 provides the corresponding curves for fathers and male children. In both cases, the conditional survivor functions are more or less similar, providing some confidence in the Kaplan-Meier estimates of the survivor functions for children, at least for higher ages.

8.4

Mortality Data from Panel Studies

This section is not finished.


Chapter 9

Parent’s Length of Life
Additional information about the survival of people in historical time is available from the German Life History Study (GLHS) and the Socioeconomic Panel (SOEP). In both surveys, respondents were asked to provide information about their parents, in particular about their parents’ birth years, whether they were still alive at the interview date, and, if not, about their respective years of death. We can try to use this information to extend our knowledge about mortality conditions in earlier periods.1 However, we first need to consider the specific features of the data-generating process because, in this case, the information about the parents results from a sample of their children. Information is therefore only available for persons who became a parent of at least one child, and this information also depends on the child’s survival to the interview date. We first introduce the notion of left truncated data, and then use a simulation model to study possible complications. The insights gained by this study will finally be used to draw some inferences from the GLHS and SOEP data.

1 For previous analyses of the SOEP data about the life lengths of parents, see Schepers and Wagner (1989) and Klein (1993).

9.1

Left Truncated Data

1. The first problem obviously concerns the fact that the available data contain information only about those persons who became mother or father of at least one child. One possibility would be to restrict any inferences to those persons. This would allow us to directly apply the standard Kaplan-Meier procedure to estimate survivor functions with partially censored data (see Section 8.3.4). On the other hand, one may also assume that mortality is independent of whether or not persons became parents of a child. This assumption would open the possibility to draw at least some inferences about the whole population. Of course, it will not be possible to estimate complete survivor functions because no information is available about death events occurring at early ages. But given the independence assumption, it might be possible to estimate survivor functions conditional on having survived to the age at which children are born.

2. In order to discuss this question, we consider a simple model where we are given a population set Ω and a two-dimensional variable

$$(T, C): \Omega \longrightarrow \tilde{\mathcal T} \times \bigl(\tilde{\mathcal T} \cup \{-1\}\bigr)$$

$\tilde{\mathcal T} := \{0, 1, 2, \ldots\}$ is a property space for age. For each ω ∈ Ω, T(ω) is ω’s life length, and C(ω) records the age at which ω became, for the first time, mother of a child, or is −1 if this did not happen during ω’s lifetime.2 The problem can now be stated as follows: the available data only refer to a subset of Ω, namely

$$\Omega^* := \{\omega \in \Omega \mid C(\omega) \ge 0\}$$

consisting of women who became a mother of at least one child. We start from the assumption that information from all children is available, ignoring their mortality up to the interview date. The question then is how, and to what extent, these data can be used to assess the distribution of T in Ω.

3. We follow the basic idea of the Kaplan-Meier procedure to assess the distribution of T via rates (see Section 8.3.4). Assume complete observations. It would then be possible to create a risk set $\Omega_\tau := \{\omega \in \Omega \mid T(\omega) \ge \tau\}$ containing all members of Ω who might die at age τ, and an event set $\{\omega \in \Omega_\tau \mid T(\omega) = \tau\}$ containing those members of $\Omega_\tau$ who actually died at age τ. From these sets one can calculate rates

$$r(\tau) := \frac{|\{\omega \in \Omega_\tau \mid T(\omega) = \tau\}|}{|\Omega_\tau|}$$

which can be used to find the survivor function

$$G[T](\tau) = \prod_{j=0}^{\tau-1} \bigl(1 - r(j)\bigr)$$

Now, since our data only refer to Ω*, we cannot create these sets and consequently cannot calculate the rates r(τ). One can only try to estimate these rates, but this will then require an assumption. Our assumption will be that mortality does not depend on whether, and when, people became mothers and fathers. In terms of the model, the assumption is3

$$r(\tau) \approx_e \tilde r^*(\tau) := \frac{|\{\omega \in \Omega^*_\tau \mid T(\omega) = \tau\}|}{|\Omega^*_\tau|}$$

2 For the present discussion we assume that Ω refers to women only. The same reasoning, however, applies to men with minor modifications.

3 As in Section 8.3.4, we use the notation ‘$\approx_e$’ to indicate that the right-hand side is assumed to be a reasonable estimate of the left-hand quantity.


where the risk set on the right-hand side is now defined by

$$\Omega^*_\tau := \{\omega \in \Omega^* \mid T(\omega) \ge \tau,\ 0 \le C(\omega) \le \tau\}$$

Since both this risk set and the corresponding event set can be calculated from data restricted to $\Omega^*$, one gets estimates of the rates $r(\tau)$. Of course, this will be possible only for ages $\tau \ge a^+ := \min\{C(\omega) \mid \omega \in \Omega^*\}$, which implies that only the conditional survivor function $G[T \mid T \ge a^+]$ can be estimated:

$$G[T \mid T \ge a^+](\tau) \approx_e \prod_{j=a^+}^{\tau-1} \bigl(1 - \tilde r^*(j)\bigr) \tag{9.1.1}$$

Notice also that, in general, $\Omega^*_\tau \neq \{\omega \in \Omega^* \mid T(\omega) \ge \tau\}$, because a woman in $\Omega^*$ might get her first child later than $\tau$. In order to create suitable risk sets $\Omega^*_\tau$, one has to apply the same conditioning as used for the event sets, to meet the assumption that mortality does not depend on whether, and when, women become mothers.

4. An example can serve to illustrate the reasoning. We assume that $\Omega$ contains 1000 women and consider, in turn, five age classes:

   0  In the age class $\tau = 0$, all 1000 women are at risk of dying, and we assume that 100 women actually die.

   1  There remain 900 women who might die in the age class $\tau = 1$. We assume that 100 of these women actually die. However, some of these women will also become mothers of children. We assume that this is true of 200 women. Implied by the assumption that mortality does not depend on becoming a mother, about $(100/900) \cdot 200 \approx 22$ of them will also die; of course, they belong to the 100 persons who die in this age class.

   2  There remain 800 women who might die in the age class $\tau = 2$. We assume that 200 of these women actually die, so that the death rate is $200/800 = 1/4$. Furthermore, we assume that 300 women become mothers of a child. The assumption of equal mortality implies that about $300/4 = 75$ of these women also die. In addition, there are $200 - 22 = 178$ women who became mothers in age class $\tau = 1$, and of these about $178/4 \approx 45$ will die.

   3  The remaining number of women is $800 - 200 = 600$, and we assume that 200 of these women die in the age class $\tau = 3$. Furthermore, we assume that again 200 women become mothers of a child. Consequently, about $200/3 \approx 67$ of these women will also die. Furthermore, there are $(178 - 45) + (300 - 75) = 358$ women who became mothers before $\tau = 3$, and of these about $358/3 \approx 119$ will die.

   4  Finally, there remain 400 women, and all of them die because $\tau = 4$ is the last and open-ended age class.

5. Given this situation, we can first assume that complete data are available. This would allow us to calculate the survivor function in the following way:

 τ   |Ω_τ|   |{ω ∈ Ω_τ | T(ω) = τ}|   r(τ)   G[T](τ)
 0    1000            100             1/10    1.00
 1     900            100             1/9     0.90
 2     800            200             1/4     0.80
 3     600            200             1/3     0.60
 4     400            400             1       0.40

Obviously, the survivor function is simply proportional to the number of persons in the risk set. As a next step, we assume that data are only available for $\Omega^*$, that is, for women who gave birth to at least one child. In our example, there are altogether $200 + 300 + 200 = 700$ such women. We now perform the same calculations for these women, using the risk and event sets as defined above. This can be summarized in the following table:

 τ   |Ω*_τ|   |{ω ∈ Ω*_τ | T(ω) = τ}|   r̃*(τ)    G*(τ)
 0      0              –                  –       1
 1    200             22                0.110     g
 2    478            120                0.251     g · 0.890
 3    558            186                0.333     g · 0.667
 4    372            372                1.000     g · 0.445

For $\tau = 0$, the risk set is empty and we cannot calculate a death rate. Consequently, we also cannot estimate the value of the survivor function for $\tau = 1$, which, in the table, is replaced by the unknown value $g$. For $\tau > 0$ it is possible, however, to create risk and event sets and calculate corresponding rates $\tilde r^*(\tau)$. These rates can finally be used to derive the values of the conditional survivor function

$$G[T \mid T \ge a^+](\tau) = \frac{G[T](\tau)}{G[T](a^+)} \approx_e \frac{G^*(\tau)}{g}$$

where $a^+ = 1$ in the present example.
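The arithmetic of this example can be checked with a short Python sketch. It takes the counts stated above (women entering $\Omega^*$ in each age class, and deaths among women who are already mothers), rebuilds the risk sets $\Omega^*_\tau$, and compares the conditional survivor function of (9.1.1) with $G[T](\tau)/G[T](a^+)$ computed from the complete data. The variable and function names are our own; small deviations from the table (for example 0.444 instead of 0.445) only reflect that the table multiplies rounded rates.

```python
# Complete data for Omega (1000 women), as stated in the example.
risk_complete = [1000, 900, 800, 600, 400]
deaths_complete = [100, 100, 200, 200, 400]

# Data restricted to Omega*: women having their first child in each age
# class, and deaths among women who are already mothers.
entries = [0, 200, 300, 200, 0]
deaths_mothers = [0, 22, 120, 186, 372]


def survivor(rates, start=0):
    """G(start + k) = product over j = start .. start + k - 1 of (1 - r(j))."""
    g = [1.0]
    for rate in rates[start:]:
        g.append(g[-1] * (1 - rate))
    return g  # g[k] is the value at age start + k


# Unconditional rates r(tau) and survivor function G[T] from complete data.
r = [d / n for d, n in zip(deaths_complete, risk_complete)]
G = survivor(r)

# Risk sets Omega*_tau: new mothers enter at tau, mothers dying at tau leave.
risk_mothers = []
size = 0
for e, d in zip(entries, deaths_mothers):
    size += e
    risk_mothers.append(size)
    size -= d

a_plus = next(tau for tau, n in enumerate(risk_mothers) if n > 0)
r_tilde = [d / n if n > 0 else None for d, n in zip(deaths_mothers, risk_mothers)]
G_cond = survivor(r_tilde, start=a_plus)  # formula (9.1.1)

for k, value in enumerate(G_cond[:-1]):
    tau = a_plus + k
    print(tau, round(value, 3), round(G[tau] / G[a_plus], 3))
```

Both printed columns agree up to rounding, which is the point of the example: rates computed from left truncated risk sets recover the shape of $G[T]$ from the age $a^+$ onward.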


6. The important point is to recognize the difference between the unconditional rates

$$r^*(\tau) := \frac{|\{\omega \in \Omega^* \mid T(\omega) = \tau\}|}{|\{\omega \in \Omega^* \mid T(\omega) \ge \tau\}|}$$

and the rates $\tilde r^*(\tau)$ defined above. Using the rates $r^*(\tau)$ would result in a survivor function for the variable $T^*$ defined for the reference set $\Omega^*$. But this survivor function will not, in general, be proportional to a conditional survivor function for the variable of interest, $T$, which is defined for the reference set $\Omega$. In order to calculate a conditional survivor function for $T$, one needs the rates $\tilde r^*(\tau)$; see formula (9.1.1). The risk sets $\Omega^*_\tau$ from which the rates $\tilde r^*(\tau)$ are derived take into account the temporal nature of becoming a member of $\Omega^*$. In our example, a woman becomes a member of $\Omega^*$ after the birth of her first child. Corresponding observations are therefore called left truncated, in this example, left truncated at the age of first child-bearing. Since our observations of death events only relate to members of $\Omega^*$, the risk of an observed death at the age $\tau$ only relates to persons who had become members of $\Omega^*$ by the age $\tau$. This argument will again be used in Section 9.2.2.

Box 9.2-1 Skeleton of the simulation model.

    For each ω ∈ Ω do:
        n(ω) := 0;                        # counter for ω's children
        For (τ = 0, ..., 100) {
            Get a random number;
            If (this number ≤ β_{τ,n(ω)}) add one child to n(ω), create a new
                entry in Ω^c, and record the mother's identification number and age;
            Get another random number;
            If (this number ≤ δ^f_τ) goto L1;
        }
        L1: Record that ω died at age τ and has given birth to n(ω) children;
            also record for all children the mother's age at death;
    For each ω ∈ Ω^c do:
        For (τ = 0, ..., 100) {
            Get a random number;
            If (this number ≤ δ^c_τ) goto L2;
        }
        L2: Record that ω died at age τ;
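As an illustration only, the skeleton in Box 9.2-1 can be written as a small Python function. The rate functions `beta`, `delta_f`, and `delta_c` below are hypothetical placeholders, not the census-based birth rates or the 1891–1900 and 1960–1962 life tables used in the text, so the resulting counts will differ from the 4776 mothers and 11407 children reported there. Comparisons use `<` instead of `≤`, which makes no difference for uniformly distributed random numbers.

```python
import random

MAX_AGE = 100  # last age class, as in Box 9.2-1


def simulate(n_women, beta, delta_f, delta_c, seed=1):
    """Follow the skeleton of Box 9.2-1.

    beta(tau, k): birth rate for women aged tau with k children;
    delta_f(tau): death rate of women; delta_c(tau): death rate of children.
    Returns women as (id, age at death, number of children) and children
    as dicts.
    """
    rng = random.Random(seed)
    women, children = [], []
    for w in range(n_women):
        n = 0  # counter for the woman's children
        for tau in range(MAX_AGE + 1):
            if rng.random() < beta(tau, n):
                n += 1
                children.append({"mother": w, "mother_age_at_birth": tau})
            if rng.random() < delta_f(tau):
                break  # "goto L1": the woman dies at age tau
        women.append((w, tau, n))
        for child in children[len(children) - n:]:  # her n children
            child["mother_age_at_death"] = tau
    for child in children:
        for tau in range(MAX_AGE + 1):
            if rng.random() < delta_c(tau):
                break  # "goto L2": the child dies at age tau
        child["age_at_death"] = tau
    return women, children


def beta(tau, k):
    # hypothetical: births only at ages 18-45, declining with parity k
    return 0.15 / (1 + k) if 18 <= tau <= 45 else 0.0


def delta_f(tau):
    # hypothetical Gompertz-type death rate, set to 1 in the last age class
    return 1.0 if tau == MAX_AGE else min(1.0, 0.004 * 1.07 ** tau)


def delta_c(tau):
    # hypothetical, lower mortality for the children's generation
    return 1.0 if tau == MAX_AGE else min(1.0, 0.001 * 1.09 ** tau)


women, children = simulate(10_000, beta, delta_f, delta_c)
mothers = sum(1 for _, _, n in women if n > 0)
print(len(women), mothers, len(children))
```

Since survival and births are drawn independently, the sketch shares the model's built-in assumption that women's survival does not depend on their giving birth to children.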

9.2

Selection by Survival

Before applying the method discussed in the previous section to the GLHS and SOEP data, we need to discuss the additional complications that result from a retrospective survey of children who are asked about their parents. In order to understand some of the problems that might result from this specific data generating process, we use a simulation model.

9.2.1

The Simulation Model

1. The basic idea is to simulate data for a set of women according to a known survivor function and then to compare this known function with estimates based on information from the women's children who survived until some fixed interview date. In the first version of the model we refer to a set of $N = 10000$ women, all born in the year $t_0 := 1900$; this set will be denoted by $\Omega$. We assume that these women survive according to the 1891–1900 period life table for Germany (see Table 7.4-3 in Section 7.4.2); the corresponding age-specific death rates will be denoted by $\delta^f_\tau$.⁴ Additional assumptions concern the birth of children. We assume age- and parity-specific birth rates

$$\beta_{\tau,k} := \frac{\text{number of women giving birth to a further child at age } \tau}{\text{number of women aged } \tau \text{ and having } k \text{ children}}$$

In order to arrive at a simulation model that roughly corresponds to the historical situation, these rates are calculated from a subsample of the census that took place in Germany in the year 1970 (see Section 12.2.1). For each woman who survived until 1970, this sample contains information about the birth years of up to 12 children. For the calculation of age- and parity-specific birth rates we have used all of these women who were born between 1870 and 1925. These rates are only used to set up our simulation model; we therefore do not pay attention to historical accuracy.

2. We can thus think of a second reference set, $\Omega^c$, containing identification numbers of all children born of the women in $\Omega$. Of course, the number of members of $\Omega^c$ is not known in advance but depends on the death rates $\delta^f_\tau$ and the birth rates $\beta_{\tau,k}$. But given these rates, we can finally create two lists: one list containing, for each woman in $\Omega$, her identification number, her death year, and her number of children; and another list that contains, for each child in $\Omega^c$, an identification number, the birth year, and the identification number of the mother. In addition, in order to simulate a retrospective survey, we assume that the children survive according to the 1960–1962 period life table for women (see Table 7.4-3 in Section 7.4.2); the corresponding age-specific death rates will be denoted by $\delta^c_\tau$. So we can also add a death year for each child in the second list.

3. Box 9.2-1 depicts the algorithm that we have used to generate the data for the simulation model. In this description, each "random number" refers to a draw from random numbers which are uniformly distributed in the interval from 0 to 1. Using this algorithm we get the first list with $N = 10000$ entries that record the identification numbers of the women in $\Omega$, their age at death, and their number of children. Of these women, 4776 have at least one child.⁵ We also get the second list which, in our implementation of the model, contains entries for 11407 children. Figure 9.2-1 shows a frequency distribution of the years in which the women in $\Omega$ died on a historical time axis. Also shown are frequency distributions of the birth and death years of the children. Note that the algorithm is based on the assumption that women's survival is independent of their giving birth to children. Problems that might result from a violation of this assumption can therefore not be checked within this model.

⁴ Women, as well as men, can become parents in different ways. In the model we only consider women who might become mothers by giving birth to children.
⁵ One should note that, based on the 1891–1900 period life table, only 68 % of the women survived age 20, and only 60 % survived age 40.

Fig. 9.2-1 Frequency distributions of birth and death years in the simulated data set. [Plot: calendar years 1900–2050 on the horizontal axis, frequencies from 0 to 800 on the vertical axis; series: death years of women/mothers, birth years of children, death years of children.]

9.2.2

Considering Left Truncation

1. Before using the model to discuss the question whether we might be able to recover the survivor function of the members of $\Omega$ based on information resulting from a retrospective survey of their children, we illustrate the importance of correctly taking into account left truncated observations. What we want to recover is some part of the distribution of the variable

$$T : \Omega \longrightarrow \tilde{\mathcal T} := \{0, 1, 2, 3, \dots\}$$

that records the life length of the members of $\Omega$. Of course, our observations refer at best to a subset of $\Omega$ that consists of those members of $\Omega$ who gave birth to at least one child. This subset will be denoted by $\Omega^*$. We can now again define a variable

$$T^* : \Omega^* \longrightarrow \tilde{\mathcal T} := \{0, 1, 2, 3, \dots\}$$

that records the life length of the members of $\Omega^*$. However, as already mentioned in Section 9.1, it is important to recognize that the distribution of $T^*$ will not, in general, be identical with a conditional distribution of $T$.

2. Referring to the simulation model of the previous section, we have defined the distribution of $T$ by the death rates $\delta^f_\tau$. The survivor function of $T$ is therefore given by

$$G[T](\tau) = \prod_{j=0}^{\tau-1} \bigl(1 - \delta^f_j\bigr)$$

Obviously, in order to recover (some part of) this survivor function we need estimates of the death rates $\delta^f_\tau$. However, these death rates are systematically different from the death rates

$$r^*(\tau) := \frac{|\{\omega \in \Omega^* \mid T^*(\omega) = \tau\}|}{|\{\omega \in \Omega^* \mid T^*(\omega) \ge \tau\}|}$$

which correspond to the variable $T^*$ and might be used to calculate its survivor function. In order to find estimates of $\delta^f_\tau$, we need to take into account that women only become members of $\Omega^*$ when they have given birth to a first child. We therefore consider a two-dimensional variable

$$(T^*, C^*) : \Omega^* \longrightarrow \tilde{\mathcal T} \times \tilde{\mathcal T}$$

where $T^*$ is defined as before and $C^*$ records the age at which members of $\Omega^*$ gave birth to a child for the first time. This then allows us to define a rate function

$$\tilde r^*(\tau) := \frac{|\{\omega \in \Omega^* \mid T^*(\omega) = \tau,\ C^*(\omega) \le \tau\}|}{|\{\omega \in \Omega^* \mid T^*(\omega) \ge \tau,\ C^*(\omega) \le \tau\}|}$$

The denominator counts the number of members of the risk set at $\tau$, defined as

$$\Omega^*_\tau := \{\omega \in \Omega^* \mid T^*(\omega) \ge \tau,\ C^*(\omega) \le \tau\}$$

that is, members of $\Omega^*$ who actually have given birth to a first child not later than $\tau$; and the numerator counts the number of members of the risk set who actually died at the age $\tau$. Obviously, $\tilde r^*(\tau) \neq r^*(\tau)$, but $\tilde r^*(\tau)$ is the death rate of women who actually are members of $\Omega^*$ at the age of $\tau$ and, as we have construed the model, is a reasonable estimate of $\delta^f_\tau$. We therefore should use $\tilde r^*(\tau)$ to estimate a conditional version of the survivor function $G[T]$.

3. In principle, it would be possible to obtain estimates of $\tilde r^*$ from the age at the first birth onward. Since this rate is zero up to the age of the first observed death in $\Omega^*$, one might as well start at this age, say $a^*$,⁶ so that the conditional survivor function is then

$$\tilde G^{**}_a(\tau) := \prod_{j=a^*}^{\tau-1} \bigl(1 - \tilde r^*(j)\bigr)$$

It might be taken as an estimate of the conditional survivor function $G[T \mid T \ge a^*]$. To illustrate, we use the simulated data set from our model. Assuming complete knowledge about all women in $\Omega^*$, we find that the earliest death occurs at age 19. However, this occurs only once, and at $\tau = 20$ there is no death at all. We therefore define $a^* := 21$ and, given complete knowledge, can directly calculate $\tilde G^{**}_a$. This is shown in Figure 9.2-2 as a dotted line. Also shown as a solid line is $G[T \mid T \ge a^*]$ calculated from the 1891–1900 period life table for women. Obviously, both curves agree quite well. On the other hand, if we had not taken into account the fact that women become members of $\Omega^*$ only after having given birth to a first child, but had estimated the survivor function $G[T^* \mid T^* \ge a^*]$, the result would be systematically biased as a consequence of the inequality $r^*(\tau) \le \tilde r^*(\tau)$. (Both rates have the same numerator, since every member of $\Omega^*$ dies after her first birth, but the risk set of $r^*(\tau)$ also contains women who become mothers only after $\tau$.) This is illustrated by Figure 9.2-3, where the solid line shows $\tilde G^{**}_a$ and the dotted line shows $G[T^* \mid T^* \ge a^*]$.

⁶ Of course, due to the small number of cases in a sample of observations, $\tilde r^*(a^*)$ might not be a good estimate of $\delta^f(a^*)$, and one should condition on some later age. In fact, it might happen that $\tilde r^*(a^*) = 1$, so that one cannot find a reasonable estimate of a survivor function beginning at $a^*$.

4. The fact that women become members of $\Omega^*$ only after the birth of a child is formally equivalent to treating the observations as left truncated at the age at first birth. Of course, nothing is wrong with estimating the survivor function of $T^*$ instead of $\tilde G^{**}_a$. The argument has only shown that one should use the latter if the interest is in recovering part of the distribution of $T$. One might also notice that, while $G[T^*]$ refers to a well-defined statistical variable, this cannot be said of $\tilde G^{**}_a$. This function actually results from a mixture of rate functions. This is seen by a partition of $\Omega^*$ into subsets

$$\Omega^*_{[a]} := \{\omega \in \Omega^* \mid C^*(\omega) = a\}$$

consisting of those members of $\Omega^*$ who had a first birth at the age $a$. Defining rate functions for these subsets by

$$\tilde r^*_a(\tau) := \frac{|\{\omega \in \Omega^*_{[a]} \mid T^*(\omega) = \tau\}|}{|\{\omega \in \Omega^*_{[a]} \mid T^*(\omega) \ge \tau\}|}$$

one can express $\tilde r^*(\tau)$ as a mixture

$$\tilde r^*(\tau) = \sum_{a \le \tau} \tilde r^*_a(\tau)\, w_a(\tau)$$

where the weights, defined as

$$w_a(\tau) := \frac{|\{\omega \in \Omega^*_{[a]} \mid T^*(\omega) \ge \tau\}|}{\sum_{a' \le \tau} |\{\omega \in \Omega^*_{[a']} \mid T^*(\omega) \ge \tau\}|}$$

reflect the composition of the risk set at $\tau$.

Fig. 9.2-2 Survivor functions, conditional on $\tau \ge 21$, from the 1891–1900 period life table (solid line) and from women with at least one child in the simulated data set (dotted line). [Plot: age 0–100 on the horizontal axis, survivor function from 0 to 1 on the vertical axis.]

Fig. 9.2-3 Conditional survivor functions $\tilde G^{**}_a$ (solid line) and $G[T^* \mid T^* \ge a^*]$ (dotted line) calculated from the simulated data set with $a^* = 21$. [Plot: age 0–100 on the horizontal axis, survivor function from 0 to 1 on the vertical axis.]
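The difference between $r^*(\tau)$ and $\tilde r^*(\tau)$, and the mixture representation, can be illustrated with a small Python sketch on hypothetical data: women die with a constant annual probability of 0.02, and 80 % of them would have a first child at an age drawn uniformly from 16 to 40 (both assumptions are ours, chosen only for illustration). Since the true death rate is 0.02 at every age, $\tilde r^*(\tau)$ should fluctuate around 0.02, while $r^*(\tau)$ is too small at ages where many future mothers are still counted in its risk set.

```python
import random

MAX_AGE = 100
rng = random.Random(7)

# Hypothetical data: records (T*, C*) for members of Omega*, i.e. age at
# death and age at first birth.  Constant annual death probability 0.02.
records = []
for _ in range(20000):
    t = 0
    while t < MAX_AGE and rng.random() >= 0.02:
        t += 1
    if rng.random() < 0.8:
        c = rng.randint(16, 40)
        if c <= t:  # only women alive at their first birth enter Omega*
            records.append((t, c))


def rates(records, tau):
    """Return (r*(tau), r~*(tau)); None if the respective risk set is empty."""
    deaths = sum(1 for t, c in records if t == tau)  # C* <= T* holds for all
    naive_risk = sum(1 for t, c in records if t >= tau)
    risk = sum(1 for t, c in records if t >= tau and c <= tau)
    naive = deaths / naive_risk if naive_risk else None
    truncated = deaths / risk if risk else None
    return naive, truncated


def mixture(records, tau):
    """Recompute r~*(tau) as the sum over a <= tau of r~*_a(tau) * w_a(tau)."""
    groups = {}  # first-birth age a -> [size of risk set, deaths]
    for t, c in records:
        if c <= tau and t >= tau:
            group = groups.setdefault(c, [0, 0])
            group[0] += 1
            group[1] += int(t == tau)
    total = sum(n for n, _ in groups.values())
    return sum((d / n) * (n / total) for n, d in groups.values())


for tau in (20, 30, 40, 60):
    naive, truncated = rates(records, tau)
    print(tau, round(naive, 4), round(truncated, 4), round(mixture(records, tau), 4))
```

The mixture merely regroups the same counts by age at first birth, so it reproduces $\tilde r^*(\tau)$ exactly; the weights $w_a(\tau)$ change with $\tau$ as the composition of the risk set changes.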

9.2.3

Using Information from Children

1. We now turn to the question of how to estimate conditional survivor functions for the members of $\Omega$ when we only have information from the children, that is, members of $\Omega^c$. So we need to take into account the relationship between $\Omega^c$ and $\Omega^*$. To make this explicit, we introduce a variable (function)

$$m : \Omega^c \longrightarrow \Omega^*$$

such that for each child $\omega \in \Omega^c$, $m(\omega)$ refers to the mother of $\omega$ in $\Omega^*$. Conversely, for each woman $\omega \in \Omega^*$, $m^{-1}(\{\omega\})$ is the set of her children in $\Omega^c$. Now let $\bar\Omega^c$ denote a simple random sample from $\Omega^c$. This induces a random sample from $\Omega^*$, namely

$$\bar\Omega^* := \{\omega \in \Omega^* \mid \text{there is an } \omega' \in \bar\Omega^c \text{ with } m(\omega') = \omega\}$$

But $\bar\Omega^*$ is not a simple random sample from $\Omega^*$, because women with more children are more frequent in $\bar\Omega^*$ than in $\Omega^*$. This should be taken into account when estimating $\tilde r^*(\tau)$ from information provided by the children in the sample $\bar\Omega^c$.

2. A further problem concerns the temporal nature of the membership of women in $\Omega^*$. As has been discussed in the previous section, given the data generating process assumed in our simulation model, a woman belongs to $\Omega^*$ as soon as she has given birth to her first child. The definition of the rates $\tilde r^*$ makes the condition explicit by including the variable $C^*$ referring to the age at the first birth. Therefore, if $\omega$ is any member of the sample $\bar\Omega^c$, one should not condition on the mother's age when giving birth to $\omega$, but on the age of her first child-bearing. To illustrate, we use the data from the simulation model and compare two fictitious samples: $\bar\Omega^c_1$ contains all first-born children from $\Omega^c$, and $\bar\Omega^c_2$ contains all last-born children from $\Omega^c$. Of course, both samples provide the same information about the life length of women in $\Omega^*$. But there are now different ways to select truncation times. If we condition on the age of the mothers when giving birth to the children in the samples, we get the results shown in Figure 9.2-4. Obviously, conditioning on the mother's age when giving birth to her last child would result in an extremely biased estimate.

Fig. 9.2-4 Comparison of conditional survivor functions calculated from two different samples from the simulated data set. [Plot: age 0–100 on the horizontal axis, survivor function from 0 to 1 on the vertical axis; curves: first child, last child, period 1891–1900.]

Fig. 9.2-5 Comparison of conditional survivor functions estimated with, and without, weights from the simulated data set. [Plot: age 0–100 on the horizontal axis, survivor function from 0 to 1 on the vertical axis; curves: weighted, not weighted, period 1891–1900.]

3. In order to avoid this mistake, we should, ideally, have values of the following variable:

$$(T^*_c, C^*_c, N^*_c) : \Omega^c \longrightarrow \tilde{\mathcal T} \times \tilde{\mathcal T} \times \{1, 2, 3, \dots\}$$

where $T^*_c(\omega)$ provides information about the (possibly censored) life length of $\omega$'s mother, $C^*_c(\omega)$ provides information about the mother's age at first child-bearing, and $N^*_c(\omega)$ counts the mother's number of children. Since $N^*_c$ will be used to provide weights for the observations in the sample $\bar\Omega^c$,

this should be the number of children surviving up to the time when the sample is drawn. Now, assuming that this information is available from a simple random sample $\bar\Omega^c$, the rates $\tilde r^*$ can be estimated in the following way:⁷

$$\tilde r^*(\tau) \approx_e \tilde r^*_w(\tau) := \frac{\sum_{\omega \in \bar\Omega^c} \frac{1}{N^*_c(\omega)}\, I[T^*_c = \tau,\ C^*_c \le \tau](\omega)}{\sum_{\omega \in \bar\Omega^c} \frac{1}{N^*_c(\omega)}\, I[T^*_c \ge \tau,\ C^*_c \le \tau](\omega)}$$

To illustrate, we again use data from the simulation model. Figure 9.2-5 compares conditional survivor functions calculated from estimated rates $\tilde r^*_w(\tau)$ and from analogously defined rates where the weights are dropped.⁸ The figure clearly indicates that one should use the weights $1/N^*_c$ if this information is available.

⁷ The notation uses indicator variables. If $X$ is any statistical variable with a possible value $\tilde x$, then $I[X = \tilde x](\omega) := 1$ if $X(\omega) = \tilde x$, and $I[X = \tilde x](\omega) := 0$ otherwise.
⁸ In the calculation we have used all observations from $\Omega^c$, but basically the same differences would result from a simple random sample from $\Omega^c$.

4. However, this information might not be available, and it is important, therefore, that there is also another and simpler way to arrive at reasonable estimates. In order to explain this possibility, consider the risk set

$$\Omega^*_\tau = \{\omega \in \Omega^* \mid T^*(\omega) \ge \tau,\ C^*(\omega) \le \tau\}$$

at $\tau$. The death rates to be estimated can then be written as

$$\tilde r^*(\tau) = \frac{|\{\omega \in \Omega^*_\tau \mid T^*(\omega) = \tau\}|}{|\Omega^*_\tau|}$$

By assumption, these rates do not depend on the number of children born of members of $\Omega^*_\tau$ until $\tau$, and also do not depend on the children's birth dates. To make this explicit, we may partition the risk sets into subsets according to the number of children born until $\tau$. Let $K^*_\tau(\omega)$ denote the number of children born of $\omega$ until $\tau$. Each risk set $\Omega^*_\tau$ may then be written as a union of subsets

$$\Omega^*_{\tau,k} := \{\omega \in \Omega^*_\tau \mid K^*_\tau(\omega) = k\}$$

taken over all possible values of $k$. Furthermore, we can define death rates for these subsets,

$$\tilde r^*_k(\tau) := \frac{|\{\omega \in \Omega^*_{\tau,k} \mid T^*(\omega) = \tau\}|}{|\Omega^*_{\tau,k}|}$$

However, by assumption these rates are all (approximately) identical with the death rate $\tilde r^*(\tau)$. Consequently, we do not need weights when we only use information from children born until $\tau$. Instead, we can directly refer to the sets of children born of women in $\Omega^*_{\tau,k}$, which can be defined by

$$\Omega^c_{\tau,k} := m^{-1}(\Omega^*_{\tau,k})$$

The death rates $\tilde r^*_k(\tau)$ may then be written as

$$\tilde r^*_k(\tau) \approx_e \frac{|\{\omega \in \Omega^c_{\tau,k} \mid T^*_c(\omega) = \tau\}|}{|\Omega^c_{\tau,k}|}$$

and, since these rates are approximately identical across the subsets, we might finally write

$$\tilde r^*(\tau) \approx_e \tilde r^*_c(\tau) := \frac{|\{\omega \in \Omega^c \mid T^*_c(\omega) = \tau,\ S^*_c(\omega) \le \tau\}|}{|\{\omega \in \Omega^c \mid T^*_c(\omega) \ge \tau,\ S^*_c(\omega) \le \tau\}|} \tag{9.2.1}$$

where now $S^*_c(\omega)$ is the age of $\omega$'s mother at the birth of $\omega$. Notice that this approach does not require any weights and also requires no information about the mother's age at her first child-bearing.

Fig. 9.2-6 Conditional survivor function estimated from the rates $\tilde r^*_c(\tau)$, compared with a conditional survivor function from the 1891–1900 period life table. [Plot: age 0–100 on the horizontal axis, survivor function from 0 to 1 on the vertical axis.]
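A Python sketch can make the estimators of this section concrete. It uses a hypothetical model of our own (constant annual death probability 0.02 for women, annual birth probability 0.15 at ages 18–42, no child mortality) and, as in footnote 8, all children instead of a sample. In this setting the weighted rates $\tilde r^*_w$ coincide with the rates computed from the mothers themselves, the unweighted rates come out too small at fertile ages (women who survive longer have more children and are therefore over-represented), and the rates $\tilde r^*_c$ of (9.2.1) need neither weights nor the age at first birth.

```python
import random

MAX_AGE = 100
DEATH_P = 0.02           # hypothetical constant death rate of women
BIRTH_P = 0.15           # hypothetical annual birth probability ...
FERTILE = range(18, 43)  # ... at ages 18-42, independent of parity
rng = random.Random(11)

mothers = []  # (T*, C*, ages at births) for women with at least one child
for _ in range(20000):
    births = []
    for tau in range(MAX_AGE + 1):
        if tau in FERTILE and rng.random() < BIRTH_P:
            births.append(tau)
        if tau == MAX_AGE or rng.random() < DEATH_P:
            break  # the woman dies at age tau
    if births:
        mothers.append((tau, births[0], births))

# One record per child (all children used): (T*_c, C*_c, N*_c, S*_c)
children = [(t, c, len(b), s) for t, c, b in mothers for s in b]


def rate_mothers(tau):
    """r~*(tau) computed directly from the mothers (the target)."""
    risk = [t for t, c, _ in mothers if t >= tau and c <= tau]
    return sum(t == tau for t in risk) / len(risk)


def rate_weighted(tau, weighted=True):
    """r~*_w(tau) with weights 1/N*_c; weighted=False drops the weights."""
    num = den = 0.0
    for t, c, n, _ in children:
        if t >= tau and c <= tau:
            w = 1.0 / n if weighted else 1.0
            den += w
            num += w * (t == tau)
    return num / den


def rate_children(tau):
    """r~*_c(tau) as in (9.2.1): only children born not later than tau."""
    risk = [t for t, _, _, s in children if t >= tau and s <= tau]
    return sum(t == tau for t in risk) / len(risk)


for tau in (25, 30, 35, 40):
    print(tau, round(rate_mothers(tau), 4), round(rate_weighted(tau), 4),
          round(rate_weighted(tau, weighted=False), 4), round(rate_children(tau), 4))
```

With weights $1/N^*_c$, each mother contributes exactly one unit to numerator and denominator, which is why the weighted and the mother-based rates agree when all children are used.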

5. To illustrate the argument, we again use data from the simulation model. We take into account all children in $\Omega^c$ but, for the calculation of the rates $\tilde r^*_c(\tau)$, only use information from children born not later than $\tau$. Of course, this simply means to use all information from $\Omega^c$ and, for each $\omega \in \Omega^c$, to treat the observation about $\omega$'s mother as left truncated at $S^*_c(\omega)$.⁹ Figure 9.2-6 shows the conditional survivor function calculated from the rates $\tilde r^*_c(\tau)$. This function obviously agrees quite well with the 1891–1900 period life table that was used to generate the data. Of course, the result would be basically the same if we had used a simple random sample from $\Omega^c$.

⁹ One can therefore use any standard Kaplan-Meier procedure that allows for left truncated data. We have used TDA's dple procedure.
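Footnote 9 points out that the rates $\tilde r^*_c(\tau)$ amount to a Kaplan-Meier (product-limit) estimate with left truncated data. A minimal Python version for integer ages might look as follows; it is our own sketch, not the TDA procedure. Each record is a triple (entry age, exit age, died); for information from children, the entry age would be $S^*_c(\omega)$ and the exit age $T^*_c(\omega)$. The four records in the usage example are hypothetical.

```python
def left_truncated_km(records, start, stop):
    """Product-limit estimate of G(tau | T >= start) for tau = start..stop.

    records: (entry, exit, died) with integer ages.  A record is at risk at
    age tau if entry <= tau <= exit, i.e. it is left truncated at `entry`
    and, if died is False, right censored at `exit`.
    """
    records = list(records)
    g = {start: 1.0}
    surv = 1.0
    for tau in range(start, stop):
        at_risk = sum(1 for e, x, _ in records if e <= tau <= x)
        if at_risk == 0:
            raise ValueError(f"empty risk set at age {tau}; start later")
        deaths = sum(1 for e, x, d in records if d and e <= tau and x == tau)
        surv *= 1 - deaths / at_risk
        g[tau + 1] = surv
    return g


# Hypothetical records: (entry age, exit age, died)
records = [(20, 25, True), (22, 30, False), (25, 25, True), (18, 40, True)]
G = left_truncated_km(records, start=20, stop=41)
print(G[25], G[26], G[40], G[41])  # 1.0 0.5 0.5 0.0
```

Raising an error for an empty risk set reflects the point of footnote 6: an estimate can only start at an age where the risk set is not empty, and preferably not very small.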

9.2.4

Retrospective Surveys

1. In the previous section we assumed that we have data from a simple random sample from the complete set of children, $\Omega^c$. However, our data actually result from a retrospective survey performed in some specific year, say $t$, and we therefore have to take into account that not all members of $\Omega^c$ survive until $t$. Fortunately, the approach to estimate $\delta^f_\tau$ via the rates $\tilde r^*_c(\tau)$ that was discussed in the previous section can also be applied to a retrospective sample if we make the additional assumption that children's life lengths are independent of their mother's life length.¹⁰ To explain the argument, let $T^c$ denote the life length of children in the reference set $\Omega^c$. On a historical time axis, if mothers are born in the year $t_0$, each child $\omega \in \Omega^c$ survives until $t_0 + S^*_c(\omega) + T^c(\omega)$ (as already introduced, $S^*_c(\omega)$ is the age of the mother when $\omega$ was born). The set of children who survive at least until the year $t$ is therefore given by

$$\Omega^c[t] := \{\omega \in \Omega^c \mid t_0 + S^*_c(\omega) + T^c(\omega) \ge t\}$$

In the simulation model introduced in Section 9.2.1 we assumed $t_0 = 1900$. Based on this assumption, Figure 9.2-1 shows the survival of children in historical time.

¹⁰ This assumption, already built into the simulation model in Section 9.2.1, is probably not completely true. However, for the moment we will base our argument on this assumption.

2. Now assume a retrospective survey performed in the year $t$. The sample is then drawn from the reference set $\Omega^c[t]$. Following the approach discussed in the previous section, we can calculate rates

$$\tilde r^*_{c,t}(\tau) := \frac{|\{\omega \in \Omega^c[t] \mid T^*_c(\omega) = \tau,\ S^*_c(\omega) \le \tau\}|}{|\{\omega \in \Omega^c[t] \mid T^*_c(\omega) \ge \tau,\ S^*_c(\omega) \le \tau\}|}$$

which are defined analogously to the rates $\tilde r^*_c(\tau)$ introduced in (9.2.1). In order to see that the rates $\tilde r^*_{c,t}(\tau)$ are reasonable estimates of the rates $\tilde r^*_c(\tau)$, their definition might be written in the following way:

$$\tilde r^*_{c,t}(\tau) = \frac{|\{\omega \in \Omega^c \mid T^*_c(\omega) = \tau,\ S^*_c(\omega) \le \tau,\ S^*_c(\omega) + T^c(\omega) \ge t - t_0\}|}{|\{\omega \in \Omega^c \mid T^*_c(\omega) \ge \tau,\ S^*_c(\omega) \le \tau,\ S^*_c(\omega) + T^c(\omega) \ge t - t_0\}|}$$

The further argument proceeds in terms of conditional frequencies. Using an abbreviated notation, we may write:

$$\begin{aligned}
\tilde r^*_{c,t}(\tau) &= \frac{P(T^*_c = \tau,\ S^*_c \le \tau,\ S^*_c + T^c \ge t - t_0)}{P(T^*_c \ge \tau,\ S^*_c \le \tau,\ S^*_c + T^c \ge t - t_0)} \\
&= \frac{P(S^*_c + T^c \ge t - t_0 \mid T^*_c = \tau,\ S^*_c \le \tau)\, P(T^*_c = \tau,\ S^*_c \le \tau)}{P(S^*_c + T^c \ge t - t_0 \mid T^*_c \ge \tau,\ S^*_c \le \tau)\, P(T^*_c \ge \tau,\ S^*_c \le \tau)} \\
&= \frac{P(S^*_c + T^c \ge t - t_0 \mid T^*_c = \tau,\ S^*_c \le \tau)}{P(S^*_c + T^c \ge t - t_0 \mid T^*_c \ge \tau,\ S^*_c \le \tau)}\; \tilde r^*_c(\tau)
\end{aligned}$$

Now, given the assumption mentioned at the beginning that, conditional on $S^*_c \le \tau$, the survival of children does not depend on the survival of their mothers, the ratio in the last line becomes approximately

$$\frac{P(S^*_c + T^c \ge t - t_0 \mid S^*_c \le \tau)}{P(S^*_c + T^c \ge t - t_0 \mid S^*_c \le \tau)} = 1$$

and may be omitted.

Fig. 9.2-7 Sizes of the risk sets $\Omega^c_\tau[t]$, depending on $\tau$, calculated from four retrospective surveys of the simulated data set in the years $t$ = 2000, 2010, 2015, and 2020. [Plot: age $\tau$ up to 100 on the horizontal axis, risk-set size up to 7000 on the vertical axis; one curve for each survey year.]

3. There is, however, a further difficulty resulting from retrospective surveys. The later the year $t$ in which the survey is performed, the smaller is the number of children who might participate in the survey, and consequently the risk set to be used for the estimation of the death rates $\tilde r^*_{c,t}$ also becomes smaller. This is shown in Figure 9.2-7, which is based on the data from our simulation model. Shown are the functions

$$\tau \longmapsto |\Omega^c_\tau[t]|, \qquad \Omega^c_\tau[t] := \{\omega \in \Omega^c[t] \mid T^*_c(\omega) \ge \tau,\ S^*_c(\omega) \le \tau\}$$

as they result from four fictitious retrospective surveys performed in the

years $t$ = 2000, 2010, 2015, and 2020. The possible problem concerns estimation with left truncated data. Contrary to the standard Kaplan-Meier procedure with right censored data only, the risk set is very small at the beginning and may not allow reliable estimates of the death rates. Due to the cumulative nature of the calculation of survivor functions from these rates, any imprecision introduced at the beginning will then propagate to values of the survivor function at later ages. To illustrate, we use the simulated data set and perform a retrospective survey in the year $t = 2010$. We assume that all children who survive until this year, that is, about 20 % of the 11407 children in $\Omega^c$, participate in the survey and provide information about their mothers. Nevertheless, we can only begin to estimate a conditional survivor function at $a^* = 25$, as shown in Figure 9.2-8.

Fig. 9.2-8 Conditional survivor functions, estimated from $\Omega^c[2010]$ (dotted line) and calculated from the 1891–1900 period life table (solid line), both beginning at $a^* = 25$. [Plot: age 0–100 on the horizontal axis, survivor function from 0 to 1 on the vertical axis.]

9.3

Inferences from the GLHS and SOEP Data

We now use the methods discussed in the previous sections to draw some inferences from the GLHS and SOEP data. We begin with a brief data description, then estimate survivor functions, and finally show plots of the death rates.

9.3.1

Description of the Data

1. We briefly describe the available data. The basic figures are shown in Table 9.3-1. From the GLHS we use all studies which are currently available in the Zentralarchiv für empirische Sozialforschung (see Section 14.1).

a) The first study was LV I. The 2171 respondents were born in the periods 1929–31, 1939–41, and 1949–51; the interviews were conducted in the years 1981–83. In 2120 cases respondents were able to provide a valid birth year of their mother. Of these mothers, 732 died before the interview date, 1386 were still alive, and for two mothers we have no information. Complete information is therefore available for 2118 mothers. In 8 cases this information is inconsistent or implausible, for example, the birth year of the respondent is greater than the death year of the mother.¹¹ If we exclude these cases, there finally remain 2110 cases in which we know the birth year of the mother, whether she died before the interview date, and, if she died, also the death year. Similarly, we get valid information for 2044 fathers.

b) The second study was LV II and involved respondents born in the years 1919–21. This study was conducted in two parts: LV IIA with interviews during 1985–86, and LV IIT with interviews during 1987–88. In the same way as explained for LV I, we get valid information about the lifetimes of 387 + 956 = 1343 mothers and 382 + 943 = 1325 fathers.

c) The third study was LV III and involved respondents born in the periods 1954–56 and 1959–61. From this study we get valid information about 1954 mothers and 1911 fathers.

¹¹ In addition to inconsistent cases we also exclude cases with a life length which is greater than 105 years. For women we also require that the age at which the woman gave birth to her child (the respondent) is not greater than 51 years.

2. Comparable information is available from the third wave of the SOEP conducted in 1986. All members of subsample A of the SOEP were asked to provide information about birth years of their parents, whether parents died before the interview date and, if they died, about death years. In order to get data comparable with the GLHS, we selected only persons with German citizenship. As shown in Table 9.3-1, there are 8021 persons from whom we get valid information about 7746 mothers and 7614 fathers. Taking the GLHS and SOEP data together, we finally have valid information about 13153 mothers and 12894 fathers.

3. We prepared two data files for further analysis, one for mothers and the other one for fathers. Both files contain values of four variables:

   $B^f$ := birth year of the mother
   $P^f$ := birth year of the child (respondent)
   $E^f$ := 1 if the mother died before the interview date, 0 otherwise
   $D^f$ := mother's death year, or the year of the interview, depending on the value of $E^f$

Variables in the data file for fathers are defined accordingly and will be denoted by $B^m$, $P^m$, $E^m$, and $D^m$.
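The step from these four variables to left truncated, right censored observations can be sketched in Python. The sketch converts one record into an entry age (the mother's age at the respondent's birth), an exit age, and an event flag, and applies the plausibility limits of footnote 11. The records are hypothetical, and the inconsistency check only covers the example given in the text (a respondent born after the mother's death); the actual selection may involve further checks.

```python
from typing import NamedTuple, Optional, Tuple


class Mother(NamedTuple):
    B: int  # birth year of the mother
    P: int  # birth year of the child (respondent)
    E: int  # 1 if the mother died before the interview date, 0 otherwise
    D: int  # death year, or interview year if E == 0


def to_spell(m: Mother) -> Optional[Tuple[int, int, bool]]:
    """Return (entry age, exit age, died), or None for dismissed cases."""
    entry = m.P - m.B  # mother's age at the respondent's birth
    exit_ = m.D - m.B  # known (E == 1) or censored (E == 0) life length
    if m.P > m.D:  # respondent born after the mother's death: inconsistent
        return None
    if exit_ > 105 or entry > 51:  # plausibility limits of footnote 11
        return None
    return entry, exit_, m.E == 1


# Hypothetical records
records = [
    Mother(B=1902, P=1930, E=1, D=1975),  # died at 73, entered at 28
    Mother(B=1905, P=1940, E=0, D=1986),  # alive at interview: censored at 81
    Mother(B=1910, P=1950, E=1, D=1945),  # inconsistent: dismissed
    Mother(B=1900, P=1955, E=0, D=1986),  # age 55 at birth: dismissed
]
spells = [s for s in map(to_spell, records) if s is not None]
print(spells)  # [(28, 73, True), (35, 81, False)]
```

The resulting triples have exactly the (entry, exit, died) form used by a product-limit estimator with left truncation.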

156

9

PARENT’S LENGTH OF LIFE

9.3

INFERENCES FROM THE GLHS AND SOEP DATA

157

Table 9.3-1 Information about lifetimes of mothers and fathers which is available in the GLHS and SOEP data sets.
LV I Interview dates Respondents Mothers - valid birth year - still alive - known death year - no information - complete information - dismissed - remaining cases - still alive - died Fathers - valid birth year - still alive - known death year - no information - complete information - dismissed - remaining cases - still alive - died 1981-83 2171 2120 1386 732 2 2118 8 2110 1385 725 2062 909 1150 3 2059 15 2044 909 1135 LV IIA 1985-86 407 390 24 366 0 390 3 387 24 363 386 1 384 1 385 3 382 1 381 LV IIT 1987-88 1005 962 43 919 0 962 6 956 43 913 955 8 945 2 953 10 943 8 935 LV III 1989 2008 1954 1766 188 0 1954 0 1954 1766 188 1916 1460 451 5 1911 0 1911 1460 451 SOEP 1986 8021 7819 4872 2911 36 7783 37 7746 4854 2892 7699 3586 4053 60 7639 25 7614 3577 4037

Table 9.3-2 Deﬁnition of birth cohorts used in the estimation of survivor functions.
Mothers Cohort C1 C2 C3 C4 C5 C6 C7 Birth years 1870 1880 1890 1900 1910 1920 1930 – – – – – – – 1879 1889 1899 1909 1919 1929 1939 died 271 1064 1698 1123 438 272 123 alive 0 10 170 954 1456 2467 2219 total 271 1074 1868 2077 1894 2739 2342 died 528 1393 1591 1685 907 464 196 Fathers alive 0 12 101 600 1011 2035 1773 total 528 1405 1692 2285 1918 2499 1969

9.3.2

Survivor Functions of Parents

1. We now apply the method discussed in the previous section to the data introduced in Section 9.3.1. Since we already know that mortality conditions have changed substantially during the last 100 years, we consider birth cohorts as defined in Table 9.3-2.12 To develop the argument we consider variables $\hat T^f_c$ and $\hat T^m_c$ representing the life length of women and men who belong to a birth cohort indexed by $c$. Derived from the variables introduced at the end of Section 9.3.1, the available data are given by variables

$$C^f_c := P^f_c - B^f_c \quad\text{and}\quad C^m_c := P^m_c - B^m_c$$

which record the ages at which persons belonging to birth cohort $c$ became mothers or fathers, and variables

$$T^f_c := D^f_c - B^f_c \quad\text{and}\quad T^m_c := D^m_c - B^m_c$$

which record the knowledge about the life length. If $E^f_c(\omega) = 1$, then $T^f_c(\omega) = \hat T^f_c(\omega)$ is the known life length of ω; otherwise, the information is censored and we only know that $\hat T^f_c(\omega) \ge T^f_c(\omega)$. For variables pertaining to men the interpretation is analogous.

2. To illustrate the calculations we refer to women belonging to birth cohort C4. The data are shown in Table 9.3-3. The column labeled (a) shows the risk sets. As discussed in the previous section, the risk set at age τ contains all women who did not die before τ and became a mother not later than τ.13 In this example, the youngest age at which we know of a child is 15; risk sets can therefore be calculated only for ages τ ≥ τ* = 15. The next column, labeled (b), shows the number of death events. Then follows column (c), providing the number of censored cases, which are required to update the risk sets. As shown by the definition

$$R^*(\tau) := \{\omega \mid T^f_c(\omega) \ge \tau,\ C^f_c(\omega) \le \tau\}$$

women belong to a risk set only until the maximal value of $T^f_c$, that is, until a death event occurs or until the interview date (of their children).

3. The information in Table 9.3-3 suffices to calculate death rates. For example, $r^*(20) = 1/100$ and $r^*(80) = 33/507$. These rates can then be used to estimate the survivor function

$$G^*(\tau) = g_{\tau^*} \prod_{j=\tau^*}^{\tau-1} \bigl(1 - r^*(j)\bigr)$$

Of course, we do not know $g_{\tau^*}$, that is, the proportion of women who survived age 14. So we can only estimate a conditional survivor function

$$\hat G[\hat T^f_c \mid \hat T^f_c \ge \tau^*](\tau) \approx \prod_{j=\tau^*}^{\tau-1} \bigl(1 - r^*(j)\bigr)$$

12 Compared with the figures in Table 9.3-1, the total number of cases is slightly smaller because persons born before 1870 or after 1939 have been omitted.

13 Of course, from our data we do not know when women actually gave birth to a first child. Whether this has implications for the quality of the estimates will be discussed in a later section.
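The risk-set definition $R^*(\tau)$ can be sketched in Python as follows. This is a minimal illustration, not the procedure that produced Table 9.3-3: the `Record` type and the function `risk_table` are hypothetical names, and each record is assumed to carry the age at which the woman is first known to be a mother (C), the observed age (T), and the event indicator (E), all in completed years.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    """One mother in a birth cohort (hypothetical record layout)."""
    onset: int     # C: age at which she is first known to be a mother
    observed: int  # T: age at death if event == 1, else age when censored
    event: int     # E: 1 = death observed, 0 = censored (still alive)


def risk_table(records: List[Record], first_age: int,
               last_age: int) -> List[Tuple[int, int, int, int]]:
    """Return rows (age, risk set size, deaths, censored) for each age.

    A woman belongs to the risk set R*(tau) when C <= tau <= T, i.e. she is
    already a mother and has neither died nor been censored before tau.
    """
    rows = []
    for tau in range(first_age, last_age + 1):
        at_risk = [r for r in records if r.onset <= tau <= r.observed]
        deaths = sum(1 for r in at_risk if r.observed == tau and r.event == 1)
        censored = sum(1 for r in at_risk if r.observed == tau and r.event == 0)
        rows.append((tau, len(at_risk), deaths, censored))
    return rows


if __name__ == "__main__":
    # Three hypothetical mothers, for illustration only.
    sample = [Record(20, 70, 1), Record(25, 80, 0), Record(30, 70, 1)]
    for age, n, d, c in risk_table(sample, 68, 72):
        print(age, n, d, c)
```

Because a woman leaves the risk set after the age at which she dies or is censored, the size at age τ + 1 equals the size at τ minus deaths and censored cases plus new mothers, which is the updating rule visible in the columns of Table 9.3-3.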


9

PARENT’S LENGTH OF LIFE

9.3

INFERENCES FROM THE GLHS AND SOEP DATA


Table 9.3-3 Mortality data for mothers belonging to birth cohort C4 in the merged GLHS and SOEP data set.
(a) Size of risk set at age τ. (b) Number of deaths at age τ. (c) Number of censored cases at age τ. (d) Values of the conditional survivor function at age τ.

  τ    (a)  (b)  (c)    (d)       τ    (a)  (b)  (c)    (d)
 15      1    0    0  1.000      52   1901    4    0  0.878
 16      2    0    0  1.000      53   1897   15    0  0.876
 17      3    0    0  1.000      54   1882   13    0  0.869
 18     18    0    0  1.000      55   1869   21    0  0.863
 19     45    0    0  1.000      56   1848    7    0  0.854
 20    100    1    0  1.000      57   1841   17    0  0.850
 21    183    1    0  0.990      58   1824   14    0  0.843
 22    266    2    0  0.985      59   1810   18    0  0.836
 23    347    0    0  0.977      60   1792   20    0  0.828
 24    435    2    0  0.977      61   1772   19    0  0.818
 25    556    1    0  0.973      62   1753   15    0  0.810
 26    690    3    0  0.971      63   1738   23    0  0.803
 27    781    4    0  0.967      64   1715   18    0  0.792
 28    928    1    0  0.962      65   1697   33    0  0.784
 29   1075    3    0  0.961      66   1664   23    0  0.769
 30   1217    3    0  0.958      67   1641   36    0  0.758
 31   1327    2    0  0.956      68   1605   32    0  0.741
 32   1427    5    0  0.954      69   1573   24    0  0.727
 33   1512    9    0  0.951      70   1549   41    0  0.715
 34   1597    5    0  0.945      71   1508   39    0  0.697
 35   1677    6    0  0.942      72   1469   54   32  0.679
 36   1740    5    0  0.939      73   1383   53   53  0.654
 37   1800    5    0  0.936      74   1277   56   57  0.629
 38   1864    9    0  0.934      75   1164   53   43  0.601
 39   1903   13    0  0.929      76   1068   55   30  0.574
 40   1929   12    0  0.923      77    983   42  125  0.544
 41   1945    2    0  0.917      78    816   57   96  0.521
 42   1952    5    0  0.916      79    663   42  114  0.484
 43   1957    7    0  0.914      80    507   33   85  0.454
 44   1957    5    0  0.910      81    389   20   78  0.424
 45   1956   13    0  0.908      82    291   12   70  0.402
 46   1947    7    0  0.902      83    209   21   48  0.386
 47   1943   10    0  0.899      84    140   12   34  0.347
 48   1935    8    0  0.894      85     94    5   41  0.317
 49   1928    8    0  0.891      86     48    0   39  0.300
 50   1920    7    0  0.887      87      9    0    5  0.300
 51   1913   12    0  0.884      88      4    0    4  0.300

[Figure 9.3-1: survivor functions (0 to 1) plotted against age (0 to 90).]
Fig. 9.3-1 Female survivor function of the German period life table 1901/10 (dotted line) and conditional survivor function from Table 9.3-3.

which is shown in the last column of Table 9.3-3, labeled (d).

4. Since this approach to estimating a conditional survivor function depends on a previous estimation of rates, one should also ask whether these rates can be reliably estimated. Formally, one can begin at age τ*, which is 15 in our example. However, due to the small number of cases in the risk sets at ages under 20, one might question the reliability of these estimates. In fact, formally following the estimation procedure yields estimated death rates of zero for ages 15 to 19; given our knowledge about mortality from life tables and other sources, these estimates are clearly wrong. Moreover, the reliability of estimated death rates depends not only on the size of the risk sets but also on the number of death events that can be observed. Regarding the data in Table 9.3-3, it might therefore be sensible to begin interpreting estimated death rates only at some later age, for example, at age 26 or even later.

5. Conditional survivor functions can be represented graphically in two ways: the function can be plotted beginning at some age τ with an arbitrary value $g_\tau$; or one can try to find some estimate of $g_\tau$ and then plot the conditional survivor function as part of a complete survivor function. In either case one needs to decide where to start the plot. For our example we begin at age 26 and estimate $g_{26} = 0.71463$ from the female survivor function of the German period life table for 1901–10 (see Table 7.4-3 in Section 7.4.2). Since column (d) of Table 9.3-3 has the value 0.971 at age 26, we multiply all values of column (d), beginning at age 26, by the factor

$$\frac{g_{26}}{0.971} = \frac{0.71463}{0.971} \approx 0.736$$
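As a minimal sketch, the following Python code recomputes column (d) of Table 9.3-3 from columns (a) and (b) and then applies the rescaling of point 5. The names `conditional_survivor` and `anchor_at` are hypothetical; the life-table value $g_{26} = 0.71463$ is the one quoted above. The code divides by the unrounded value of the conditional survivor function at age 26 rather than by 0.971, so the rescaled curve passes exactly through 0.71463.

```python
# Columns (a) and (b) of Table 9.3-3: mothers of birth cohort C4, ages 15..88.
RISK_SET = [
    1, 2, 3, 18, 45, 100, 183, 266, 347, 435,
    556, 690, 781, 928, 1075, 1217, 1327, 1427, 1512, 1597,
    1677, 1740, 1800, 1864, 1903, 1929, 1945, 1952, 1957, 1957,
    1956, 1947, 1943, 1935, 1928, 1920, 1913,
    1901, 1897, 1882, 1869, 1848, 1841, 1824, 1810, 1792, 1772,
    1753, 1738, 1715, 1697, 1664, 1641, 1605, 1573, 1549, 1508,
    1469, 1383, 1277, 1164, 1068, 983, 816, 663, 507, 389,
    291, 209, 140, 94, 48, 9, 4,
]
DEATHS = [
    0, 0, 0, 0, 0, 1, 1, 2, 0, 2,
    1, 3, 4, 1, 3, 3, 2, 5, 9, 5,
    6, 5, 5, 9, 13, 12, 2, 5, 7, 5,
    13, 7, 10, 8, 8, 7, 12,
    4, 15, 13, 21, 7, 17, 14, 18, 20, 19,
    15, 23, 18, 33, 23, 36, 32, 24, 41, 39,
    54, 53, 56, 53, 55, 42, 57, 42, 33, 20,
    12, 21, 12, 5, 0, 0, 0,
]
FIRST_AGE = 15  # tau* in the text


def conditional_survivor(risk_set, deaths, first_age):
    """Return {age: G(age)} with G(first_age) = 1 and
    G(age + 1) = G(age) * (1 - r*(age)), where r* = deaths / risk set."""
    surv = {first_age: 1.0}
    g = 1.0
    for offset, (n, d) in enumerate(zip(risk_set, deaths)):
        if n > 0:  # an empty risk set carries no information
            g *= 1.0 - d / n
        surv[first_age + offset + 1] = g
    return surv


def anchor_at(surv, age, value):
    """Rescale surv so that it equals `value` at `age`; drop younger ages."""
    factor = value / surv[age]
    return {tau: g * factor for tau, g in surv.items() if tau >= age}


G = conditional_survivor(RISK_SET, DEATHS, FIRST_AGE)
scaled = anchor_at(G, 26, 0.71463)  # g_26 from the 1901-10 period life table
for age in (26, 40, 60, 80):
    print(age, round(G[age], 3), round(scaled[age], 3))
```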


The result is shown in Figure 9.3-1. The dotted line represents the female survivor function from the 1901–10 period life table; the solid line shows the adjusted conditional survivor function from Table 9.3-3. By definition, both curves have the same value at age 26. The different development of the two curves reflects the reduction of death rates that occurred from about 1930 until the end of the century. We might therefore use the latest period life table, for 1986–88, for a further comparison. As can be calculated from Table 9.3-3, the death rate at age 80 is about 33/507 ≈ 0.065. A corresponding estimate from the 1986–88 period life table is 0.066.14 One should note, however, that rates calculated from sample data for single years of age often fluctuate strongly; it might therefore be better to use mean values for larger age classes.

6. In the same way as has been discussed for women belonging to birth cohort C4 (1900–1909), one can estimate conditional survivor functions for all birth cohorts distinguished in Table 9.3-2. Results are shown in Figure 9.3-2. To allow for a comparison, all survivor functions are drawn conditional on τ* = 30. They were placed onto a historical time axis by using the centers of the birth cohort intervals. For example, the value of the conditional survivor function for birth cohort C1 at age 30 is shown in the year 1875 + 30 = 1905. The changing shapes of the survivor functions reflect not only a general tendency of decreasing death rates, for both men and women; also clearly seen are period effects, especially the substantial increase of male death rates during the years of World War II. This seems not to be the case for female death rates. An interpretation should consider, however, that the occurrence of death events might not be independent for mothers and their children, in particular during wartime. The death events of mothers might therefore be substantially underrepresented in our data set.

[Figure 9.3-2: survivor functions (0 to 1) on a calendar-time axis (1900–1990); curves labeled by birth cohort, 1870-79 to 1930-39.]
Fig. 9.3-2 Conditional survivor functions, beginning at age 30, for men (solid lines) and women (dotted lines) belonging to specified birth cohorts.

14 Calculated from the data in Table 7.4-4 in Section 7.4.2.

9.3.3

Visualization of Death Rates

1. In order to investigate period effects it is often preferable to plot directly the rates from which (conditional) survivor functions are derived. The only drawback is that rates calculated from small samples often fluctuate strongly. As an example we refer to death rates of men belonging to birth cohort C5 (1910–1919). The solid line in Figure 9.3-3 shows the death rates as directly calculated from the data, that is, for each year of age, the number of deaths divided by the number of persons in the risk set. There obviously are big fluctuations. One should therefore apply some kind of smoothing procedure to provide a better view of the general shape of the rate function.

2. Many such smoothing procedures have been proposed in the literature. In the present context, smoothing will only serve to visualize rate functions. It might therefore suffice to simply use moving averages. Given a series of values $r_\tau$, for τ = τ1, ..., τn, each value is then replaced by the mean of itself and its neighboring values. If k denotes the number of neighbors on each side, the smoothed values are calculated as

$$r^{(k)}_\tau := \frac{1}{2k+1} \sum_{j=\tau-k}^{\tau+k} r_j$$

At both ends of the series only the actually available values are taken into account.15 Choosing k = 2, this procedure was used to calculate the values for the dotted line in Figure 9.3-3. It is seen that the smoothing removes the fluctuations but preserves the global shape of the rate function.

3. We now compare the death rates of men belonging to birth cohorts C1, ..., C6. The rate functions are shown in Figure 9.3-4, placed onto a historical time axis. To improve visibility, the rate functions are smoothed with the procedure just described (again, k = 2). Compared with the survivor functions shown in Figure 9.3-2, the rate functions provide a much better view of the impact of World War II.

[Figure 9.3-3: death rates (0 to 0.06) plotted against age (20 to 80).]
Fig. 9.3-3 Raw values (solid line) and smoothed values (dotted line) of death rates of men belonging to birth cohort C5 (1910–1919).

[Figure 9.3-4: smoothed death rates (0 to 0.1) on a calendar-time axis (1910–1985); curves labeled by birth cohort, 1870-79 to 1920-29.]
Fig. 9.3-4 Smoothed death rates of men belonging to the indicated birth cohorts. (Moving averages with k = 2.)

15 The complete formula may then be written as follows:

$$r^{(k)}_\tau := \frac{1}{\min\{\tau_n, \tau+k\} - \max\{\tau_1, \tau-k\} + 1} \sum_{j=\max\{\tau_1, \tau-k\}}^{\min\{\tau_n, \tau+k\}} r_j$$

where τ1 and τn refer, respectively, to the first and last element of the series.
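The moving average of Section 9.3.3, including the truncated windows at both ends described in footnote 15, can be sketched in a few lines of Python. Since the data behind Figure 9.3-3 (men of cohort C5) are not printed, the usage below applies the function, hypothetically named `moving_average`, to the raw death rates of C4 mothers at ages 60 to 80 taken from Table 9.3-3.

```python
def moving_average(rates, k):
    """Centred moving average of half-width k (window of up to 2k + 1 values).

    Near the ends of the series only the values that actually exist are
    averaged, as in footnote 15: the window for position i runs from
    max(0, i - k) to min(n - 1, i + k).
    """
    n = len(rates)
    smoothed = []
    for i in range(n):
        lo = max(0, i - k)
        hi = min(n - 1, i + k)
        window = rates[lo:hi + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Risk sets and deaths of C4 mothers, ages 60..80 (Table 9.3-3).
RISK = [1792, 1772, 1753, 1738, 1715, 1697, 1664, 1641, 1605, 1573, 1549,
        1508, 1469, 1383, 1277, 1164, 1068, 983, 816, 663, 507]
DEATHS = [20, 19, 15, 23, 18, 33, 23, 36, 32, 24, 41,
          39, 54, 53, 56, 53, 55, 42, 57, 42, 33]
raw = [d / n for d, n in zip(DEATHS, RISK)]
smooth = moving_average(raw, k=2)
for age, r, s in zip(range(60, 81), raw, smooth):
    print(age, round(r, 4), round(s, 4))
```

Using shorter windows at the ends keeps the smoothed series the same length as the raw one, at the price of slightly noisier values at the first and last k ages.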

Chapter 10

Parametric Mortality Curves
This chapter is not finished yet.

Chapter 11

Period and Cohort Birth Rates
We now leave the topic of mortality and turn to the complementary one: the birth of children. In this chapter, we begin with the standard approach that records the development of births in terms of rates. We then turn to a life course perspective, which suggests viewing birth events in the context of women's life courses.

11.1

Birth Rates

1. Demographers have invented many measures for recording the fertility of a population statistically.1 An elementary measure parallels the crude mortality rate and is called the crude birth rate [allgemeine Geburtenziffer].2 It is defined as

$$\text{Crude birth rate} := \frac{b_t}{n_t} \quad \text{(multiplied by 1000)}$$

The numerator records the total number of births that occurred during the year t, and the denominator refers to the midyear population size in the same year. To calculate crude birth rates one can use the data from Tables 6.2-2 and 6.3-1 in Chapter 6. For example, referring to the territory of the former FRG, the crude birth rate in 1950 is 1000 · 812.8/49989 = 16.26. Figure 11.1-1 compares the development, until 1999, in the territories of the former FRG and the former GDR. The impression is that developments were quite similar until about 1973. Then, in the western part of Germany, the crude birth rate stabilized around a value of 10, while in the eastern part a temporary increase in fertility ended in a sharp decline that began, roughly, at the time of German unification.

[Figure 11.1-1: crude birth rates (0 to 20) by calendar year, 1950–2000.]
Fig. 11.1-1 Crude birth rates in the territory of the former FRG (solid line) and the territory of the former GDR (dotted line); calculated from Tables 6.2-2 and 6.3-1 in Chapter 6.

2. Like crude mortality rates, crude birth rates neglect the age and sex composition of a population. Demographers therefore often calculate a general birth rate [allgemeine Geburtenrate]3, also called a general fertility rate, in which the number of births is related only to the number of women in childbearing ages. We will use the notation

$$\text{General birth rate} := \frac{b_t}{n^{f*}_t} \quad \text{(multiplied by 1000)}$$

where the index f* refers to women in the reproductive period, often assumed to be 15 to 45 years of age. However, there is no general agreement; in publications of demographic data one also finds the periods 15–44, or 15–49, etc. We will use τa to denote the beginning and τb to denote the end of the reproductive period. As shown by the definition, the difference between a crude and a general birth rate depends on the sex ratio and the proportion of women in childbearing ages. How these proportions have changed over the years in the territory of the former FRG is shown in Figure 11.1-2. They obviously cannot explain the big changes that are visible in Figure 11.1-1.

[Figure 11.1-2: percentages (0 to 60) by calendar year, 1950–2000.]
Fig. 11.1-2 Proportion (in percent) of women in the midyear population (dotted line) and of women aged 15 to 45 in all women (solid line). Calculated from data in Segment 36 of the STATIS data base of the Statistisches Bundesamt.

3. To get an impression of long-term changes in childbearing, both crude and general birth rates can be used. The long-term development of crude birth rates has been shown in Figure 6.3-2 in Section 6.3. A similar plot based on data on the general birth rate is shown in Figure 11.1-3. Both figures show that a long-term trend of declining fertility began in Germany roughly at the end of the nineteenth century.

[Figure 11.1-3: general birth rate (0 to 150) by calendar year, 1870–2000.]
Fig. 11.1-3 Development of the general birth rate in Germany since 1871. Data for the post-World War II period refer to the territory of the former FRG. Available data for the period before 1939 are indicated by dots. Source: Statistisches Bundesamt, Bevölkerung und Wirtschaft 1872–1972 (p. 109), and Fachserie 1, Reihe 1.

4. A further concept is the age-specific birth rate [altersspezifische Geburtenziffer], which refers to women of a specific age. We will use the following definition:

$$\beta_{t,\tau} := \frac{b_{t,\tau}}{n^f_{t,\tau}}$$

The denominator refers to the midyear number of women in year t aged τ (in completed years), and the numerator refers to the number of children born to these women during the year t. Notice that in publications from official statistics age-specific birth rates are often multiplied by 1000; we will then use the notation $\tilde\beta_{t,\tau} := 1000\,\beta_{t,\tau}$. Table 11.1-1 illustrates the calculation of these rates for the year 1999. We also mention that the Statistisches Bundesamt uses a slightly different definition and calculates women's age as the difference between the birth year of the child and the birth year of the woman. This leads to slightly different birth rates, as is also shown in Table 11.1-1: $n^{f+}_{1999,\tau}$ is the number of women who were born in the year 1999 − τ.4 The differences are illustrated in Figure 11.1-4. However, both curves clearly show how birth rates depend on women's age.

Table 11.1-1 Number of children born in Germany 1999 ($b_{1999,\tau}$) and number of women ($n^f_{1999,\tau}$ and $n^{f+}_{1999,\tau}$) classified according to women's age (τ); also shown are age-specific birth rates ($\tilde\beta_{1999,\tau}$ and $\tilde\beta^+_{1999,\tau}$). Source: Values of $b_{1999,\tau}$: Statistisches Jahrbuch 2001 (p. 71) and Segment 2070 in the STATIS data base; $n^f_{1999,\tau}$: Fachserie 1, Reihe 1, 1999 (pp. 64–65); values of $n^{f+}_{1999,\tau}$: unpublished material provided by the Statistisches Bundesamt. Numbers of women are given in thousands.

   τ       b     n^f      β̃        n^f+      β̃+
 ≤14      80
  15     341   438.4    0.78    436.782    0.78
  16    1234   446.7    2.76    441.006    2.80
  17    3085   453.2    6.81    452.610    6.82
  18    6332   457.2   13.85    454.730   13.92
  19   11158   451.3   24.72    460.706   24.22
  20   15558   441.6   35.23    442.599   35.15
  21   19693   441.7   44.58    440.781   44.68
  22   24009   442.0   54.32    443.065   54.19
  23   27326   436.7   62.57    440.361   62.05
  24   30436   438.6   69.39    432.779   70.33
  25   35493   449.3   79.00    444.718   79.81
  26   39850   477.1   83.53    454.341   87.71
  27   45348   528.2   85.85    500.610   90.59
  28   52632   568.8   92.53    555.333   94.78
  29   56566   604.4   93.59    582.220   97.16
  30   60007   642.6   93.38    626.937   95.71
  31   60093   668.1   89.95    657.849   91.35
  32   56767   686.8   82.65    677.296   83.81
  33   50623   697.5   72.58    696.136   72.72
  34   43428   705.9   61.52    699.210   62.11
  35   36185   711.7   50.84    713.016   50.75
  36   28680   700.4   40.95    710.250   40.38
  37   21055   687.5   30.63    690.981   30.47
  38   15398   675.1   22.81    684.141   22.51
  39   11165   656.2   17.01    666.236   16.76
  40    7540   630.1   11.97    646.050   11.67
  41    4627   608.9    7.60    614.752    7.53
  42    2963   597.8    4.96    603.257    4.91
  43    1619   584.7    2.77    592.163    2.73
  44     789   575.7    1.37    577.973    1.37
  45     342   566.9    0.60    573.468    0.60
  46     163   561.7    0.29    560.591    0.29
  47      58   558.4    0.10    563.369    0.10
  48      48   556.0    0.09    553.593    0.09
  49      25   548.5    0.05    558.612    0.04
  50      12   517.2    0.02
 ≥51      16

[Figure 11.1-4: age-specific birth rates (0 to 100) plotted against age (10 to 50).]
Fig. 11.1-4 Age-specific birth rates in Germany, 1999, restricted to ages in the range from 15 to 49 years. Data are taken from columns $\tilde\beta_{1999,\tau}$ (solid line) and $\tilde\beta^+_{1999,\tau}$ (dotted line) in Table 11.1-1.

5. The general birth rate can then be viewed as a weighted mean of age-specific birth rates. We mention that demographers also calculate an unweighted sum of these rates, which is called the total birth rate [zusammengefasste Geburtenziffer].5 The definition is

$$\text{Total birth rate} := \sum_{\tau=\tau_a}^{\tau_b} \beta_{t,\tau} \quad \text{(multiplied by 1000)}$$

where the range of summation depends on assumptions about the childbearing ages of women. For example, the calculation of total birth rates in Fachserie 1, Reihe 1 (1999, p. 50) is based on an age range from 15 to 49 years. The value for 1999, calculated for the territory of the former FRG, is 1405.8. However, while formally an unweighted aggregate of rates, this figure does not relate to any well-defined population and is therefore difficult to interpret. It is not possible, for example, to infer that the mean number of children per woman (which women?) is 1.4. However, the total birth rate can also be viewed as a standardized version of the general birth rate and, with this understanding, used as a measure for comparing birth frequencies in a sequence of calendar years. This will be illustrated in a later section where we compare total birth rates with a similar measure relating to cohorts.

1 We mention that in the German demographic literature, and in publications of statistical offices, the literal translation of 'fertility' ['Fruchtbarkeit'] is considered obsolete; instead, one refers to birth events [Geburten] or newborn children [Geborene]. One should also notice that the terms 'fertility' and 'fecundity' are used somewhat differently in the literature. English texts most often use the term 'fertility' to refer to realized births, and the term 'fecundity' to refer to women's ability to bear children (see, e.g., Pressat 1972, p. 172, and Newell 1988, p. 35); some other authors use these words in the opposite sense (see, e.g., Mueller 1993, p. 154).
2 One also often finds the term 'crude fertility rate'.
3 Also called 'allgemeine Fruchtbarkeitsziffer' in the older German literature, see, e.g., Statistisches Bundesamt 1985, p. 18.
4 We are grateful to Hans-Peter Bosse, who made available these figures, which are not normally published by the Statistisches Bundesamt.
5 One also often finds the term 'total fertility rate'.
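The rates of Section 11.1 can be reproduced with a short Python sketch. It uses the 1950 figures quoted in point 1 (assuming, as the arithmetic suggests, that both births and population are given in thousands) and the columns $b_{1999,\tau}$ and $n^f_{1999,\tau}$ of Table 11.1-1 for ages 15 to 49. All function names are hypothetical. The general birth rate here is restricted to births to women aged 15 to 49, so that the weighted-mean identity of point 5 holds exactly; the total birth rate it computes refers to Germany as a whole and is not the 1405.8 quoted for the former FRG.

```python
# Births b_{1999,tau} and women n^f_{1999,tau} (thousands), ages 15..49,
# from Table 11.1-1.
BIRTHS = [341, 1234, 3085, 6332, 11158, 15558, 19693, 24009, 27326, 30436,
          35493, 39850, 45348, 52632, 56566, 60007, 60093, 56767, 50623, 43428,
          36185, 28680, 21055, 15398, 11165, 7540, 4627, 2963, 1619, 789,
          342, 163, 58, 48, 25]
WOMEN_1000 = [438.4, 446.7, 453.2, 457.2, 451.3, 441.6, 441.7, 442.0, 436.7,
              438.6, 449.3, 477.1, 528.2, 568.8, 604.4, 642.6, 668.1, 686.8,
              697.5, 705.9, 711.7, 700.4, 687.5, 675.1, 656.2, 630.1, 608.9,
              597.8, 584.7, 575.7, 566.9, 561.7, 558.4, 556.0, 548.5]


def crude_birth_rate(births, population):
    """Births per 1000 inhabitants; both arguments in the same unit."""
    return 1000 * births / population


def age_specific_rates(births, women_thousands):
    """beta-tilde: births per 1000 women of each age (women in thousands)."""
    return [b / n for b, n in zip(births, women_thousands)]


def total_birth_rate(rates_per_1000):
    """Unweighted sum of age-specific rates over the chosen age range."""
    return sum(rates_per_1000)


def general_birth_rate(births, women_thousands):
    """Births per 1000 women of the chosen reproductive ages."""
    return sum(births) / sum(women_thousands)


cbr_1950 = crude_birth_rate(812.8, 49989)  # former FRG, 1950
beta = age_specific_rates(BIRTHS, WOMEN_1000)
tbr = total_birth_rate(beta)
gbr = general_birth_rate(BIRTHS, WOMEN_1000)
# The general birth rate equals the n-weighted mean of the age-specific rates.
weighted = sum(r * n for r, n in zip(beta, WOMEN_1000)) / sum(WOMEN_1000)
print(round(cbr_1950, 2), round(tbr, 1), round(gbr, 1), round(weighted, 1))
```

The last line makes the contrast of point 5 concrete: the weighted mean reproduces the general birth rate, whereas the total birth rate simply adds the rates and so ignores how many women there are at each age.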

11.2

A Life Course Perspective

1. In order to develop a conceptual framework for recording birth events it seems sensible to refer to the life courses of women who might, or might not, give birth to children. Beginning at some age, it becomes possible for most women to bear children; but whether they will do so is basically contingent on their life courses. With the general availability of contraceptives, women can also influence the occurrence of birth events. Therefore, whether, and when, a woman gives birth to children is always a personal decision. A statistical approach cannot claim to reconstruct such individual histories in any serious sense. Nevertheless, also from a statistical point of view, one can try to relate birth events to women's life courses and their social conditions.

2. This is most often done by using a cohort approach. In the present context this means that we begin with a reference to birth cohorts. Using previously introduced notation, we will denote by $C^f_{t_0}$ a set of women all born in the same year, $t_0$. Life courses of the members of $C^f_{t_0}$ then run parallel on a calendar time axis, as shown in the following graphic.

[Graphic: life lines of the cohort members start at $t_0$ on the historical time axis; the age axis is marked 0, τa ("first child?") and τb ("more children?").]

All members of $C^f_{t_0}$ begin their life course in the same year, $t_0$, at age τ = 0, and they can be compared with respect to their childbearing histories.

3. How can one record birth events of the members of $C^f_{t_0}$ in terms of statistical variables? We can begin by defining a variable

$$\bar B_{t_0} : C^f_{t_0} \longrightarrow \{0, 1, 2, 3, \ldots\}$$

which simply counts the number of children, possibly zero, born to members of $C^f_{t_0}$. For each woman $\omega \in C^f_{t_0}$, $\bar B_{t_0}(\omega)$ is the number of children born to ω. Of course, this number can only be known at the end of the reproductive period of the women in $C^f_{t_0}$, that is, when they have reached an age τ > τb. In a temporal view, this means that only at the end of the year $t_0 + \tau_b$ does the variable $\bar B_{t_0}$ get an empirically definite meaning. Nevertheless, with this reservation, one can use $\bar B_{t_0}$ to define

$$\frac{\sum_{\omega \in C^f_{t_0}} \bar B_{t_0}(\omega)}{|C^f_{t_0}|}$$

This might be called a cohort birth rate: the denominator records the number of women in $C^f_{t_0}$, and the numerator refers to the total number of children born to these women.

4. The cohort birth rate obviously does not provide any information about ages of childbearing but refers globally to the total number of children born during the reproductive period. To incorporate age information one might define variables

$$B_{t_0,\tau} : C^f_{t_0} \longrightarrow \{0, 1, 2, 3, \ldots\}$$

recording the number of children born to members of $C^f_{t_0}$ at the age of τ. Values of these variables can be cumulated:

$$\bar B_{t_0,\tau}(\omega) := \sum_{j=\tau_a}^{\tau} B_{t_0,j}(\omega)$$

In particular, one finds the simple relationship $\bar B_{t_0}(\omega) = \bar B_{t_0,\tau_b}(\omega)$.

5. It is also helpful to introduce age-specific cohort birth rates. We will use the following definition:

$$\gamma_{t_0,\tau} := \frac{\sum_{\omega \in C^f_{t_0,\tau}} B_{t_0,\tau}(\omega)}{|C^f_{t_0,\tau}|}$$

The denominator refers to the number of members of $C^f_{t_0}$ who survived age τ − 1 and therefore might give birth to children at age τ. The numerator refers to the number of children born to members of $C^f_{t_0,\tau}$ at age τ.

6. The rates $\gamma_{t_0,\tau}$ can be used to investigate the distribution of births during the reproductive period of women belonging to the same birth cohort. As will be illustrated later, this can be done by plotting values of $\gamma_{t_0,\tau}$ against τ. Alternatively, one can plot cumulated cohort birth rates

$$\bar\gamma_{t_0,\tau} := \sum_{j=\tau_a}^{\tau} \gamma_{t_0,j}$$

However, $\bar\gamma_{t_0,\tau_b}$ should not be confused with the mean number of children born to members of $C^f_{t_0}$ until the end of the reproductive period. In order to relate age-specific cohort birth rates to the total number of children born to members of a birth cohort, one has to take into account the women who died before the end of the reproductive period. The total number of children born to members of $C^f_{t_0}$ is

$$\sum_{\tau=\tau_a}^{\tau_b} \gamma_{t_0,\tau}\, |C^f_{t_0,\tau}|$$

Dividing this quantity by the number of women belonging to $C^f_{t_0}$ would provide the mean number of children per woman. To see explicitly the dependence on women's age-specific death rates, one can use the relationship

$$\frac{|C^f_{t_0,\tau}|}{|C^f_{t_0}|} = \prod_{j=0}^{\tau-1} \bigl(1 - \eta^f_{t_0,j}\bigr)$$

derived in Section 8.1. Using this relationship, one finds

$$\text{Mean number of children per woman} = \sum_{\tau=\tau_a}^{\tau_b} \gamma_{t_0,\tau}\, \frac{|C^f_{t_0,\tau}|}{|C^f_{t_0}|} = \sum_{\tau=\tau_a}^{\tau_b} \gamma_{t_0,\tau} \prod_{j=0}^{\tau-1} \bigl(1 - \eta^f_{t_0,j}\bigr)$$

This discussion will be continued in Section 18.1 where we deal with reproduction rates. Here we only mention that, although cumulated cohort birth rates do not allow inferences about the mean number of children born to women belonging to the same birth cohort, they can be used as a measure of "cohort fertility". In particular, one can use $\bar\gamma_{t_0,\tau_b}$, commonly called a completed cohort birth rate. This rate would equal the cohort birth rate if all women survived the end of the reproductive period.

11.3

Childbearing and Marriage

1. Due to an unfortunate focus on marital births, official birth statistics are inadequate when dealing with questions of parity and of the number and timing of births. The problem is aggravated by the fact that generally not even divorces are taken into account: counting of children starts anew with every marriage, disregarding all previous births.6

2. The confounding of childbearing and marriage behavior has a long tradition in demography. Many demographers assume that a statistical analysis of marriages and divorces should be considered an essential part of demography. The following quotation from a textbook can serve as an example:7

“Marriage and divorce have been of long concern in population studies because of their recognized relationship to population composition, on the one hand, and to fertility, on the other. Next to age and sex, no characteristic is more basic to a population than its composition by marital status: its absolute and relative numbers of single, married, widowed, and divorced persons of each sex and at each age. Although children may be born outside of marriage, in every society childbearing is intimately associated with marriage and generally is viewed both as the object and as a more or less immediate consequence of marriage and conjugal relations.” (Matras 1973, p. 258)

However, for several reasons we shall not adopt this view. The most important one is that a substantial proportion of women who give birth to children are not married. Table 11.3-1 provides figures that show the number of non-marital births (per 1000 births) in Germany since 1872. Figure 11.3-1 provides a graphical illustration. It is seen that until about 1940 the proportion of non-marital births was already about 10 percent.8 Then, after an initial decline following World War II, the proportion has been rising continually since the mid-sixties. This trend is particularly strong in the territory of the former GDR, where the proportion of non-marital births has reached almost 50 percent.9 On the other hand, there are also married women who, for whatever reasons, remain childless. In short, being married is neither a necessary nor a sufficient condition for childbearing, nor is there any kind of causal relationship.

3. This is not to deny that living arrangements may play an important role in women's decisions to give birth to children. But living conditions and marriage are different concepts. This is often obscured by an unclear usage of terms. To cite Matras again:

“Basically, a family consists of an adult male and female living in a common residence, maintaining a socially approved sexual relationship, and sharing the residence with their oﬀspring and sometimes with other persons united with them in some biologically based relationship. Marriage is the establishment of this residence and socially approved sexual relationship between the adult male and female.” (Matras 1973, p. 260)

Not only does this definition of 'family' ignore the widely different forms of household types which have emerged in human history. More important for our present argument is Matras's unspecific use of the term 'marriage'. Given his definition, a marriage takes place when two people of opposite sex decide to start a common household (residence). However, this obscures the fact that, in modern societies, 'marriage' does not refer to some kind of household formation but is a juridical term that gets its meaning from laws and a corresponding juridical practice. In fact, two people cannot simply decide to become married but need the approval of an official institution. In particular, only then will they be counted as married in official statistics. In contrast to terms that refer to living conditions, the term 'marriage' should therefore be considered, not as a sociological category, but as belonging to the realm of administrative regulations. Of course, this does not preclude a sociological analysis of practices of marriage and divorce.

4. A further argument can illustrate the difference. While it seems plausible that women's decisions to give birth to children depend on their actual and expected living arrangements, this can mostly not be said of marriages. Women do not bear children because they are married; but they might want to become married because they want a legally secured framework for their children.

6 The latter defect has been avoided in a 10 % subsample of the 1970 census, where women were asked to report the birth dates of all their marital children, regardless of their current marital status. These data will be discussed in Chapter 12.2.
7 As an example from the German literature see Bolte, Kappe and Schmid (1980, pp. 13–14).
8 Actually, at least in some parts of Germany, percentages of non-marital births were even higher in earlier periods. For example, Lindner (1900, p. 217) reports about 20 % non-marital births during the period 1825–1868 for the Königreich Bayern. For an interpretation of some of the changes that occurred during the 19th century see Kottmann (1987).
9 For a discussion see Huinink (1998).

Table 11.3-1 Proportion of non-marital births (per 1000 births) in Germany and the territories of the former FRG and GDR. Sources: Statistisches Bundesamt, Bevölkerung und Wirtschaft 1872–1972 (pp. 107–108) for the period 1872–1938; Fachserie 1, Reihe 1, 1999 (pp. 50–51) for the period 1946–1999.
Territory of the former Germany (Reichsgebiet) Year Year 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 87.8 91.3 85.7 85.6 85.8 85.7 87.6 89.7 91.9 91.3 94.2 93.6 93.8 93.4 90.5 92.7 89.8 92.7 91.3 90.3 88.8 86.3 84.8 83.9 82.4 83.1 84.3 84.1 86.0 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 87.7 89.2 89.6 90.8 94.4 96.0 96.9 110.7 109.5 114.1 129.6 110.3 112.2 105.6 106.3 103.1 104.1 118.2 123.7 122.8 122.1 120.7 120.0 117.5 116.3 106.7 85.3 77.7 77.0 76.6 76.0 FRG Year 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 163.8 118.5 102.3 93.1 97.3 96.4 90.3 86.7 84.2 78.6 74.7 71.9 68.5 66.9 63.3 59.5 55.6 52.3 49.9 46.9 45.6 46.1 47.6 50.4 54.6 58.1 60.5 192.5 151.1 126.9 118.9 127.9 131.5 130.0 130.3 132.5 130.0 131.9 131.8 123.7 120.1 116.0 111.3 100.8 93.4 94.2 98.1 99.9 107.0 114.9 124.1 133.0 151.2 162.0 GDR Year 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 62.7 62.7 61.2 63.5 64.7 69.6 71.3 75.6 79.0 84.9 88.3 90.7 94.0 95.5 97.1 100.3 102.2 104.9 111.1 115.9 118.7 124.3 128.9 136.8 142.7 159.2 176.7 156.4 162.9 161.4 162.1 157.7 173.4 195.9 228.4 255.8 292.9 320.4 335.5 338.1 344.3 328.0 334.4 336.4 349.9 417.2 418.2 410.9 414.4 417.7 423.9 441.0 471.5 499.4 FRG GDR

[Figure 11.3-1: percentages (0 to 50) by calendar year, 1870–2000.]
Fig. 11.3-1 Proportion of non-marital births (in percent) in Germany and in the territory of the former FRG (solid line) and in the territory of the former GDR (dotted line); calculated from the data in Table 11.3-1.

11.4

Birth Rates in a Cohort View

1. In order to record age-specific birth rates of a cohort $C^f_{t_0}$, it would be necessary to follow the cohort members from birth until the end of the reproductive period. The difficulties are the same as in the construction of cohort life tables (see Chapter 8). Mainly three surrogate methods seem possible:

a) One can approximate age-specific cohort birth rates with age-specific period data;
b) one can use data from retrospective surveys, which implies that one has to ignore cohort mortality; and
c) one can use data from panel studies, which allow one to follow the members of a birth cohort for a sequence of years during their life courses.


Table 11.4-1 Age-specific birth rates (per 1000 women) of women belonging to birth cohorts 1930, ..., 1970. Source: Fachserie 1, Reihe 1, 1999 (pp. 198-200).

      Birth year
Age   1930   1935   1940   1945   1950   1955   1960   1965   1970
15     0.3    0.2    0.4    0.8    0.9    1.2    1.0    0.7    0.6
16     2.1    2.2    2.3    5.0    5.5    7.8    5.0    3.1    2.2
17    10.0    9.8   10.7   18.9   21.8   26.8   13.8    8.1    6.5
18    28.9   26.8   28.0   46.6   53.8   43.7   26.0   14.4   14.2
19    52.7   52.2   56.9   82.4   90.5   58.6   40.1   23.6   25.7
20    74.6   77.3   85.9  113.1  109.8   67.1   55.9   32.4   37.7
21    96.6  104.2  120.0  141.0  115.5   78.9   67.1   43.0   47.8
22   114.2  130.1  143.3  159.8  109.9   86.1   77.3   55.1   55.8
23   125.3  145.8  163.3  155.9  105.9   93.6   83.5   68.1   61.9
24   134.9  161.6  173.2  138.6  110.3   99.5   89.2   79.6   67.6
25   139.4  167.5  171.7  125.3  110.3  111.1   97.4   94.9   75.0
26   145.9  170.0  169.0  118.9  110.9  112.9  109.0  101.2   86.9
27   149.1  161.7  156.0  102.5  105.0  110.0  112.8  104.3   95.7
28   141.8  155.1  138.0   88.5   98.0  101.2  114.7  107.4   96.8
29   136.5  143.2  116.9   80.9   91.3   93.5  108.0  103.5   99.3
30   123.9  127.6   94.1   72.8   85.8   86.4  104.1   99.7
31   113.6  112.6   78.2   63.3   74.8   81.7   91.8   97.1
32    98.9   95.6   61.0   53.1   63.3   72.7   80.4   91.3
33    89.5   78.7   46.8   45.1   50.8   63.6   68.5   78.7
34    78.7   65.3   38.8   37.6   41.5   52.6   56.5   68.1
35    65.6   50.6   30.5   32.6   35.1   45.8   47.7
36    56.4   40.4   24.2   26.0   29.0   35.6   40.3
37    45.0   29.8   18.4   19.9   23.3   27.5   33.1
38    36.1   21.2   13.5   14.6   18.4   20.4   24.9
39    27.6   15.5   10.2   10.6   12.9   15.1   18.8
40    19.7   10.7    7.5    7.6   10.2   10.6
41    14.3    7.3    5.2    5.2    6.9    7.4
42     8.5    4.4    3.3    3.7    4.3    5.0
43     5.1    2.6    1.9    2.2    2.6    2.8
44     2.7    1.3    1.0    1.3    1.4    1.5
45     1.3    0.8    0.6    0.8    0.7
46     0.6    0.4    0.3    0.3    0.3
47     0.3    0.2    0.2    0.2    0.2
48     0.1    0.1    0.1    0.1    0.1
49     0.1    0.0    0.1    0.0    0.1
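Each column of Table 11.4-1 is a quasi-cohort: for birth cohort $t_0$, the entry at age $\tau$ is the period rate $\beta_{t_0+\tau,\tau}$, and the cumulated rates used in this section are $\bar{\gamma}^*_{t_0,\tau} = \sum_{j=15}^{\tau} \beta_{t_0+j,j}$. The following Python sketch shows only this bookkeeping. It assumes period rates stored in a dictionary keyed by (calendar year, age); the names `quasi_cohort` and `cumulated` are hypothetical helpers, and the dictionary is filled solely with the 1930 column of the table (per 1000 women), so it illustrates the indexing rather than reproducing the official calculation.

```python
from typing import Dict, List, Tuple

# Hypothetical container: period birth rates beta_{t,tau} per 1000 women,
# keyed by (calendar year t, age tau).
PeriodRates = Dict[Tuple[int, int], float]

AGES = range(15, 50)  # ages 15..49, as in Table 11.4-1

# Column "1930" of Table 11.4-1 (ages 15..49, per 1000 women).
COHORT_1930 = [0.3, 2.1, 10.0, 28.9, 52.7, 74.6, 96.6, 114.2, 125.3, 134.9,
               139.4, 145.9, 149.1, 141.8, 136.5, 123.9, 113.6, 98.9, 89.5,
               78.7, 65.6, 56.4, 45.0, 36.1, 27.6, 19.7, 14.3, 8.5, 5.1, 2.7,
               1.3, 0.6, 0.3, 0.1, 0.1]


def quasi_cohort(beta: PeriodRates, t0: int, ages: range) -> List[float]:
    """gamma_{t0,tau} = beta_{t0+tau,tau}: read period rates along the
    diagonal of birth cohort t0 (migration is ignored)."""
    return [beta[(t0 + tau, tau)] for tau in ages]


def cumulated(rates: List[float], ages: range, tau: int) -> float:
    """Cumulated rate: sum of the rates for all listed ages up to tau."""
    return sum(r for age, r in zip(ages, rates) if age <= tau)


# Place the 1930 column on its period diagonal: age tau falls in year 1930+tau.
beta: PeriodRates = {(1930 + age, age): r for age, r in zip(AGES, COHORT_1930)}

gamma_1930 = quasi_cohort(beta, 1930, AGES)
print(round(cumulated(gamma_1930, AGES, 40), 1))  # 2107.3
print(round(cumulated(gamma_1930, AGES, 49), 1))  # 2140.3
```

A real application would fill `beta` from the full period table, so that the same lookup yields every quasi-cohort at once.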


Fig. 11.4-1 Age-speciﬁc birth rates of quasi-cohorts 1930, . . . , 1960. Data are taken from Table 11.4-1.

In the present section we discuss the first approach.

2. The basic idea is quite simple. If there were no in- and out-migration, one could identify age-specific cohort birth rates and period birth rates in the following way:

$$\gamma_{t_0,\tau} = \beta_{t_0+\tau,\tau}$$

It would be possible, then, to reconstruct the age-specific birth rates of a cohort $C^f_{t_0}$ from the sequence of period birth rates

$$\beta_{t_0+\tau_a,\tau_a}, \ldots, \beta_{t_0+\tau_b,\tau_b}$$

Of course, in reality migration takes place, and the approach therefore essentially consists in the construction of age-specific birth rates for quasi-cohorts.

3. Age-specific birth rates for the territory of the former FRG have been published by the Statistisches Bundesamt in Fachserie 1, Reihe 1, 1999 (pp. 198-200) for cohorts beginning with the birth year 1930 and for ages 15 to 49. For a selection of birth cohorts the data are shown in Table 11.4-1. Figure 11.4-1 shows a plot of these age-specific quasi-cohort birth rates on a calendar time axis. It can be seen how the births of women of successive quasi-cohorts changed through historical time, both in shape and size.

4. Next, we consider the cumulated birth rates

$$\bar{\gamma}^*_{t_0,\tau} := \sum_{j=15}^{\tau} \beta_{t_0+j,j}$$

which are helpful to compare distributions of birth rates during the reproductive period, as shown in Figure 11.4-2. The plot exhibits substantial changes in the timing of births. Using the 1930 birth cohort as an arbitrary reference, the plot suggests that the mean age of childbearing declined until the birth cohorts born between 1945 and 1950, and then began to increase. This is also seen in Figure 11.4-3, which shows a level plot of the cumulated rates in an age-period diagram. The mapping is

$$(t, \tau) \longmapsto \bar{\gamma}^*_{t-\tau,\tau}$$

where $t$ refers to calendar years. Accordingly, each diagonal line refers to a 1-year birth cohort, with birth years ranging from 1930 to 1978. The contour lines connect cumulated birth rates (per 1000 women) having approximately the same value. The maximal value of 2240 is reached by women belonging to birth cohort 1934 at age 49.

Fig. 11.4-2 Plot of cumulated age-specific birth rates for birth cohorts 1930, ..., 1960, based on the data in Table 11.4-1.

5. Figure 11.4-2 also suggests that completed cohort birth rates declined, beginning with birth years around 1935. This is also seen from the dotted line in Figure 11.4-4, which shows $\bar{\gamma}^*_{t_0,40}$ for $t_0 = 1930, \ldots, 1959$.10 The figure also shows the development of total birth rates that, for comparability, have been calculated from age-specific birth rates as

$$\sum_{\tau=15}^{40} \beta_{t,\tau}$$

for $t = 1950, \ldots, 1999$. Since completed cohort birth rates do not refer to single calendar years, the two time series cannot be compared directly. Nevertheless, the figure clearly suggests that variability in completed cohort birth rates is much smaller than in total birth rates. This can be explained by the fact that total birth rates also depend on the timing of births: women might give birth to more children in one year and to fewer children in another year without necessarily affecting the completed cohort birth rates. This idea will be taken up in Chapter 13.3, where we show that a substantial part of the "baby boom" that occurred in the period 1955–65 can be attributed to "timing effects".

10 Age 40 was chosen to allow the calculation of cumulated birth rates up to birth cohort 1959, given that data are only available until calendar year 1999.

Fig. 11.4-3 Level plot of cumulated age-specific quasi-cohort birth rates in the period 1945–99, based on data from Fachserie 1, Reihe 1, 1999 (pp. 198-200).

Fig. 11.4-4 Comparison of total birth rates (solid line) and cumulated quasi-cohort birth rates (dotted line), both calculated for ages 15–40.

Chapter 12

Retrospective Surveys

Even though cohort birth rates are quite informative, they do not allow one to recover (a) the distribution of ages at first childbearing, (b) the proportion of women who remain childless, and (c) the distribution of the number of births. As was mentioned in the previous chapter, due to an unfortunate focus on marital births, official statistics in Germany provide only limited information on these quantities. Most investigations are therefore based on retrospective surveys in which women are asked to report the birth dates of their children. In the present chapter we briefly discuss the conceptual framework and then use data from the 1970 census. Additional data from non-official surveys will be considered in Chapter 14.

12.1

Introduction and Notations

1. To focus the discussion, we consider the question whether, and at which age, women give birth to a first child. To allow for an investigation of changes among successive cohorts, our conceptual framework refers to birth cohorts. Using $C^f_{t_0}$ to denote the birth cohort of women born in the year $t_0$, we might begin with a duration variable, $\hat{T}_{t_0}$, that records the age when members of $C^f_{t_0}$ get their first child (the corresponding property space, $\tilde{\mathcal{T}} := \{0, 1, 2, \ldots\}$, being understood as a representation of ages in completed years). Obviously, there are two complications. First, not all women will give birth to a child, and in such cases there is also no duration until the birth of a first child. Second, some women will die before the end of the reproductive period.1 It is therefore necessary to introduce a second variable that records which of these possibilities actually takes place. This second variable will be denoted by $\hat{D}_{t_0}$ and defined as follows:

$$\hat{D}_{t_0}(\omega) := \begin{cases} 1 & \text{if } \omega \in C^f_{t_0} \text{ has given birth to at least one child} \\ 0 & \text{otherwise} \end{cases}$$

We are concerned, then, with a two-dimensional variable

$$(\hat{T}_{t_0}, \hat{D}_{t_0}) : C^f_{t_0} \longrightarrow \tilde{\mathcal{T}} \times \tilde{\mathcal{D}}$$

where the meaning of the duration variable, $\hat{T}_{t_0}$, depends on the value of $\hat{D}_{t_0}$. If $\hat{D}_{t_0}(\omega) = 1$, then $\hat{T}_{t_0}(\omega) \le \tau_b$ records the age at which $\omega$ has given birth to a first child.2 If, on the other hand, $\hat{D}_{t_0}(\omega) = 0$, $\hat{T}_{t_0}(\omega)$ will represent the age at which $\omega$ dies, or $\tau_b$, whichever occurs first.

2. In order to recover the distribution of $(\hat{T}_{t_0}, \hat{D}_{t_0})$, it is necessary to follow the members of $C^f_{t_0}$ at least until the end of the reproductive period. However, if only for reasons of practicability, the standard approach is to perform a retrospective survey. The following graphic provides an illustration:

[Graphic: life lines of two persons, ω1 and ω2, both starting at t0 and drawn along historical time; ω2 ends before the interview date t∗, while ω1 is still alive at t∗.]

At some date in historical time, $t^*$, which will be called the interview date, people are asked about their previous life courses. Of course, this can only be done with persons born before the interview date. There are, however, two further implications.

a) One can only interview people still alive at the interview date. In the picture, one might ask $\omega_1$, but not $\omega_2$, who died before $t^*$. So it is normally not possible, with a retrospective survey, to get complete information about all members of a birth cohort.3 Whether this is a serious problem depends on the purpose of the survey. It might be a serious problem if one intends to interview people at very old ages. On the other hand, in a historical situation in which only few women die during the reproductive period, it might well be possible to ignore the problem of mortality when performing a survey to record information about childbearing histories.

b) A second implication is that information about life histories is always right censored at the date of the interview. In the above picture this is shown by the person called $\omega_1$. This person is still alive at the interview date and therefore can report about his or her life course until $t^*$,4 but cannot report about what might happen in the future. If, for example, the members of $C^f_{t_0}$ have reached the end of the reproductive period before the interview date, it is possible to get complete records of the childbearing histories; but otherwise the recorded histories will be more or less incomplete.

1 We use $\tau_a$ and $\tau_b$ to denote, respectively, the beginning and end of the reproductive period of women.

2 Of course, twins or triplets might also be born. For the moment we ignore this possibility and simply speak of a first child.

3 To indicate this fact, one might speak of retrospective cohorts. In contrast to proper birth cohorts, they are defined by conditioning on survival until the interview date and on living in the region where the survey is conducted.

4 This, of course, also depends on memory. It is quite possible that details of a life course have been forgotten or become confused after some time.

3. An immediate implication of the censoring problem is that the amount of information that can be gathered with a retrospective survey depends on the birth year of the interviewed persons. If one selects, for the survey, only persons belonging to the same birth cohort, say $C^f_{t_0}$, then the lifespan until the interview date is also approximately the same for all interviewed persons. But often one is interested in differences in the life courses of persons who belong to different birth cohorts. For example, we might want to compare childbearing histories of women born in the years 1950, 1960, and 1970, and the interviews are performed in the year 2000. The childbearing histories of the women born in 1950 will then be complete, but for the two younger birth cohorts they will be censored at an age of 40 or 30, respectively.

4. We finally need to relate the information which can be gained by a retrospective survey to the conceptual framework introduced at the beginning. We therefore think of a survey in which members of $C^f_{t_0}$ who survived the interview date $t^*$ are asked whether they already gave birth to a first child and, if so, at which age the birth event occurred. The data can be represented by a two-dimensional variable denoted by $(T_{t_0}, D_{t_0})$. The property spaces are again $\tilde{\mathcal{T}}$ and $\tilde{\mathcal{D}}$, respectively, but the meaning of the variables is different from $(\hat{T}_{t_0}, \hat{D}_{t_0})$: $D_{t_0}$ now records whether a woman has given birth to a first child before the interview date. The relation is therefore as follows:

a) If $D_{t_0}(\omega) = 1$, $\omega$ has borne a child before the interview date, and in this case $T_{t_0}(\omega)$ records the age of the woman in the year of her first birth. So one can conclude that $\hat{D}_{t_0}(\omega) = 1$ and $\hat{T}_{t_0}(\omega) = T_{t_0}(\omega)$.

b) If, on the other hand, there was no first birth until the interview date, then $D_{t_0}(\omega) = 0$ and $T_{t_0}(\omega)$ records the age of the woman at the interview date. Given the definition of $\hat{T}_{t_0}$, one can conclude that $\hat{T}_{t_0}(\omega) \ge T_{t_0}(\omega)$, but the conclusion about $\hat{D}_{t_0}$ depends on the woman's age. If $T_{t_0}(\omega) > \tau_b$, one can conclude that $\hat{D}_{t_0}(\omega) = 0$; but otherwise no definite conclusion about the value of $\hat{D}_{t_0}(\omega)$ can be drawn.

Consequently, data from a retrospective survey in which not all interviewed women have already reached the end of the reproductive period are necessarily to some extent incomplete; and so the question arises how to use the data for an assessment of the distribution of $(\hat{T}_{t_0}, \hat{D}_{t_0})$.

5. In any case, the available data only allow inferences for those members of $C^f_{t_0}$ who survived the interview date $t^*$, or, equivalently, who survived age $\tau^* := t^* - t_0$. Using notation introduced in Section 3.4 (see also Section 8.1), this is the subset $C^f_{t_0,\tau^*}$. One therefore can only consider a variable

$$(\hat{T}_{t_0,\tau^*}, \hat{D}_{t_0,\tau^*}) : C^f_{t_0,\tau^*} \longrightarrow \tilde{\mathcal{T}} \times \tilde{\mathcal{D}}$$

that is restricted to the members of $C^f_{t_0,\tau^*}$. For these members, the values are identical:

$$\hat{T}_{t_0,\tau^*}(\omega) = \hat{T}_{t_0}(\omega) \quad\text{and}\quad \hat{D}_{t_0,\tau^*}(\omega) = \hat{D}_{t_0}(\omega)$$

Moreover, it is also evident that the available data do not allow inferences for periods beyond $t^*$. Therefore, one can only consider the distribution of $(\hat{T}_{t_0,\tau^*}, \hat{D}_{t_0,\tau^*})$ conditional on $\hat{T}_{t_0,\tau^*} \le \tau^*$. However, this implies the formal identity

$$\mathrm{P}[\hat{T}_{t_0,\tau^*}, \hat{D}_{t_0,\tau^*} \mid \hat{T}_{t_0,\tau^*} \le \tau^*] = \mathrm{P}[T_{t_0}, D_{t_0} \mid T_{t_0} \le \tau^*]$$

for all $\omega \in C^f_{t_0,\tau^*}$. One therefore does not need any specific estimation procedure but can directly use the observed values of $T_{t_0}$ and $D_{t_0}$.
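The relation between a woman's first-birth age and the survey variables $(T_{t_0}, D_{t_0})$, and the point that no estimation is needed when all women are censored at the same age, can be sketched in a few lines of Python. For contrast, the sketch also includes a Kaplan-Meier product-limit estimate, one option when censoring ages differ. The function names and the toy records are hypothetical; ages are in completed years, and the Kaplan-Meier step assumes the common convention that a woman censored at age a is still counted as at risk at a. This is an illustration under these assumptions, not the procedure of Section 8.3.4 as the authors implemented it.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, int]  # (T, D): age in completed years, event indicator


def observed_pair(first_birth_age: Optional[int], interview_age: int) -> Record:
    """Survey variables (T, D) for one woman alive at the interview.

    D = 1 and T = age at first birth if that birth happened up to the
    interview; otherwise D = 0 and T = age at the interview (censored)."""
    if first_birth_age is not None and first_birth_age <= interview_age:
        return first_birth_age, 1
    return interview_age, 0


def empirical_cdf(records: List[Record], tau: int) -> float:
    """Share of women with a first birth at an age <= tau. Needs no
    estimation if all women are censored at the same age and tau does
    not exceed that age."""
    return sum(1 for t, d in records if d == 1 and t <= tau) / len(records)


def kaplan_meier(records: List[Record]) -> List[Tuple[int, float]]:
    """Product-limit estimate of S(a) = P(T > a) at every age with at least
    one observed first birth; women censored at age a count as at risk at a."""
    survival, result = 1.0, []
    for age in sorted({t for t, d in records if d == 1}):
        at_risk = sum(1 for t, _ in records if t >= age)
        births = sum(1 for t, d in records if t == age and d == 1)
        survival *= 1.0 - births / at_risk
        result.append((age, survival))
    return result


# Toy data (hypothetical): four women of one cohort, all interviewed at age 25.
common = [observed_pair(a, 25) for a in (20, 22, None, 27)]
print(common)                                 # [(20, 1), (22, 1), (25, 0), (25, 0)]
print(empirical_cdf(common, 22))              # 0.5
print(round(kaplan_meier(common)[-1][1], 6))  # 0.5
```

With a common censoring age the product-limit estimate reproduces $1 - F$ exactly, which is why the simpler direct calculation suffices in that case.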

6. This result is due to the fact that censoring occurs at the same time for all members of $C^f_{t_0,\tau^*}$. The situation is slightly more complicated when interview dates extend over a longer period of time and/or cohorts are defined by comprising several birth years. As a consequence, the censoring times also extend over several years and there is no longer a definite period for reliable conclusions. If one is not willing to restrict inferences to ages up to the minimal age of censored observations, that is, $\min\{T(\omega) \mid D(\omega) = 0\}$, one needs some method of estimation. One possibility is to use the Kaplan-Meier procedure introduced in Section 8.3.4. Examples will be discussed in Chapter 14.

12.2

Data from the 1970 Census

As was mentioned in Section 11.3, information available from official birth statistics in Germany is severely limited by the fact that the parity of births [Ordnungsnummer der Geburten] is only recorded for marital births in current marriages. Somewhat better information is available from the 1970 census, in which 10 % of the women were asked to report the dates of all marital births, regardless of their current marital status. In the following sections we discuss a subsample of this data set that is available for scientific research.

12.2.1

Sources and Limitations

1. The census of 1970 was conducted on May 27 of that year in the territory of the former FRG.5 As part of this census, a subsample of 10 % of the population was asked to provide additional information; in particular, all women with a German citizenship who participated in the 10 % subsample were asked for dates of marriage and birth dates of all their marital children, regardless of their current marital status. Some results from these additional questions were published, albeit in highly aggregated form, by the Statistisches Bundesamt in Fachserie A.6 Fortunately, some years ago official statistics in Germany agreed to make anonymised subsamples of many main surveys, including the 1970 census, available for scientific research.7

2. The data set to be used in the present chapter consists of a 10 % subsample of the 10 % part of the 1970 census.8 So it is a 1 % subsample of all women who lived in May 1970 in the territory of the former FRG and had a German citizenship. The number of cases is 314993; multiplied by 100, this is roughly the number of women with a German citizenship who lived in the former FRG in May 1970.

3. For each person in our subsample we have the following information: (a) the birth year, and (b) the birth years of all (up to 12) marital children. So we are able to reconstruct marital childbearing histories. The limitation is, of course, that we have no information about non-marital births. As shown in Table 11.3-1 in Section 11.3, for the period until about 1970 these amount to about 10 % of all births. Actually, however, the birth coverage of the sample is somewhat higher than 90 % because a substantial portion of non-marital births has been "legitimized" by a subsequent marriage. To provide an example, the total number of births during the year 1969 in the territory of the former FRG was 903456. In 852783 cases the mother had a German citizenship, and of these cases 810002 were marital births.9 On the other hand, the number of births in 1969 reported by women in our sample is 8215, which is 821500 when multiplied by 100. So one can estimate that about (821500 − 810002)/(852783 − 810002) = 11498/42781 ≈ 27 % of non-marital births have been "legitimized" by a subsequent marriage. Nevertheless, it is clearly important to be aware of the fact that our sample does not cover all births.

5 For a detailed description, including a presentation of the questionnaire, see Schubnell and Herberger (1970).

6 Fachserie A. Bevölkerung und Kultur. Volkszählung vom 27. Mai 1970. Heft 7, Geburten. See also Schwarz (1974).

7 More information on these data sets is available from the Zentrum für Umfragen, Methoden und Analysen (ZUMA, Mannheim), Abteilung für Mikrodaten; see: www.gesis.org/Dauerbeobachtung/Mikrodaten.

8 We are grateful to Bernhard Schimpl-Neimanns (ZUMA), who prepared the tables which we have used. The tables are based on the data set: Ergebnisse der Volks- und Berufszählung 1970 mit den Ergänzungsfragen (1 % Stichprobe der Wohnbevölkerung); see Bach, Handl and Müller (1980), Schimpl-Neimanns and Frenzel (1995).

9 Fachserie 1, Reihe 1, 1999 (p. 211).

4. Further limitations are due to the fact that our data set results from a retrospective survey, as discussed in Section 12.1. Only women who survived until 1970 could have been asked about previous childbearing. This is illustrated by the distribution of birth years shown in Figure 12.2-1.10 However, part of the problem can be circumvented by separating birth cohorts and reconstructing childbearing histories for each birth cohort separately. This then only requires the assumption that differential mortality is not strongly correlated with childbearing. In the following sections we make this assumption and consider all birth cohorts with birth years from 1905 to 1945. Of course, for the younger birth cohorts, beginning about 1930, childbearing histories are not completed by 1970.

Fig. 12.2-1 Number of women born between 1870 and 1969 in the 1 % subsample of the 1970 census.

5. Considering and comparing birth cohorts alone is insufficient if one intends to reconstruct the historical development of births. It then becomes necessary to locate the birth cohorts in historical time and, in particular, to take into account changes in cohort size. One aspect of this problem concerns the absolute number of children born to women who survived until 1970. This is shown by the solid line in Figure 12.2-2. For comparison, the dotted line that begins in 1946 shows the number of births as recorded in the territory of the former FRG by official statistics. The difference is mainly due to the fact that the dotted line refers to all births, while our sample only reports marital births of women with a German citizenship. The important point is that both curves are nearly proportional, so that it seems justified to use our sample for a reconstruction of changes in the development of birth rates in the post-war period. The earlier periods are more problematic. Since political boundaries have changed and no valid data are available for the years from 1939 to 1945, it is already difficult to assess the birth coverage of the sample. The first part of the dotted line in Figure 12.2-2, which ends in 1938, shows the total number of births
in the territory of the former Deutsches Reich. Obviously, at least until about 1930, there is no correspondence between the two curves, because many women who gave birth to children before 1930 died before 1970. It might be possible, however, to use the sample data also for some conclusions about the development of births since the beginning of the 1930s.

Fig. 12.2-2 Number of children (in 1000) born during the years 1920–1969 in Germany. The solid line refers to the number of children reported by women in the 1 % subsample of the 1970 census. Dotted lines are based on data taken from Statistisches Bundesamt, Bevölkerung und Wirtschaft 1872–1972 (pp. 107-9). Until 1938 these data refer to the territory of the former Deutsches Reich; beginning in 1946 they refer to the territory of the former FRG.

10 This can also be viewed as an age distribution of the female population in 1970; see, for comparison, the age distributions which were shown in Chapter 6.5.

12.2.2

Age at First Childbearing

1. We begin with an investigation of the distribution of ages at first marital childbearing. This will be done separately for each 1-year birth cohort, C5, ..., C45. The numbers refer to birth years; for example, C5 denotes the birth cohort of women born in 1905. For some of these birth cohorts the data are shown in Table 12.2-1. Referring to birth cohort C10 as an example, there are 5 women who reported that their first marital birth occurred in the year 1926, that is, at age 16. As can be seen from the table, all births occurred at ages between 16 and 48. Altogether, 3251 women reported a birth year for the first marital child. In addition, 1166 women had no marital children until the interview date in 1970, corresponding to an age of 60 years. These are called censored cases in the table. However, since the age at censoring is after the end of the reproductive period, one can safely assume that these women will remain without a marital child. This allows one to calculate the proportion of women in birth cohort C10 who finally remain without a marital child as 1166/4417 ≈ 26 %.

Table 12.2-1 Number of women in the 1 % subsample of the 1970 census who reported a first marital birth at the specified age, classified according to 1-year birth cohorts. Also shown is the number of women with no marital birth until the interview date in 1970.

Cohort   Total   Censored   Censored at age   Cohort size
C5        2676       1250                65          3926
C10       3251       1166                60          4417
C15       2529        874                55          3403
C20       3470       1158                50          4628
C25       3178        984                45          4162
C30       3055        790                40          3845
C35       3528        762                35          4290
C40       3767       1223                30          4990
C45       1488       1297                25          2785

Number of women by age τ at first marital birth (τ = 15, ..., 50), listed in order of increasing age; ages without any reported birth are not listed:
C5:  3 8 34 75 117 166 177 204 220 211 182 154 166 189 159 118 97 81 71 59 46 32 33 27 14 12 8 6 2 2 2
C10: 5 15 58 82 138 167 167 227 274 302 307 265 255 231 190 121 78 78 67 41 28 43 29 26 19 14 6 8 3 4 2 1
C15: 1 18 34 80 128 169 169 228 269 272 233 145 170 134 66 73 61 65 53 52 25 25 17 9 9 9 4 3 4 3
C20: 5 29 82 119 197 245 261 297 356 201 233 216 203 224 179 150 125 78 66 56 39 35 21 17 17 7 4 3 3
C25: 1 7 12 33 73 114 177 259 311 305 292 251 241 207 200 137 121 109 74 52 52 36 30 33 19 8 9 7 3 3 2
C30: 1 10 43 136 208 259 289 280 270 270 269 228 181 136 120 90 47 68 41 35 22 20 18 12 2
C35: 10 30 70 152 237 292 365 390 373 320 285 265 183 153 118 107 82 52 32 12
C40: 7 57 103 218 342 401 452 439 407 409 286 252 201 150 43
C45: 9 39 112 176 230 238 225 229 172 58

2. In the same way one can calculate the proportion of women without a marital child for each birth cohort. The result is shown in Figure 12.2-3. Obviously, at least until the birth cohorts born around 1930, the proportion of childless women declined. For younger birth cohorts our data set does not allow any safe conclusions because these cohorts had not reached the end of the reproductive period in 1970. However, we will see in Chapter 14 that the trend of declining proportions of childless women continued until the birth cohorts of women born around 1945.

Fig. 12.2-3 Percentage of women without marital children in the 1 % subsample of the 1970 census. The broken part of the curve cannot be reliably estimated from the data.

3. Additional information can be gained by a consideration of the distribution of ages at first childbearing. This is easy because, as shown in Table 12.2-1, censored cases only occur in 1970. So one does not need the Kaplan-Meier procedure that was discussed in Section 8.3.4 but can directly calculate distribution and survivor functions. For example, referring again to C10, 298 out of 4417 women had a first child at an age of at most 20. The distribution function therefore has the value $F(20) = 298/4417 \approx 0.067$; that is, about 7 % of women born in 1910 had a first marital child by age 20. For selected cohorts, corresponding survivor functions are shown in Figure 12.2-4. It is clearly seen that younger birth cohorts began childbearing at earlier ages. As discussed above, this was associated with a declining proportion of finally childless women. There is no reason, however, to believe this correlation to be stable through time. In fact, as will be shown later, a decline of the age at first childbearing can also be associated with an increase in the proportion of finally childless women.
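The C10 figures used here (the share of women finally without a marital child and the values of the distribution function) are plain cumulative sums over Table 12.2-1. The following Python sketch assumes the C10 counts for ages 16 to 28, the cohort size 4417, and the 1166 censored cases from that table; the helper name `distribution_function` is hypothetical.

```python
# Cohort C10 (born 1910), from Table 12.2-1: number of women whose first
# marital birth occurred at ages 16..28 (only the ages needed here).
C10_FIRST_BIRTHS = {16: 5, 17: 15, 18: 58, 19: 82, 20: 138, 21: 167,
                    22: 167, 23: 227, 24: 274, 25: 302, 26: 307, 27: 265,
                    28: 255}
C10_SIZE = 4417       # cohort size in the 1 % subsample
C10_CENSORED = 1166   # no marital child by the interview in 1970 (age 60)


def distribution_function(counts: dict, size: int) -> dict:
    """F(tau) = (women with a first birth at an age <= tau) / cohort size.
    No Kaplan-Meier step is needed because all censoring happens in 1970."""
    F, running = {}, 0
    for age in sorted(counts):
        running += counts[age]
        F[age] = running / size
    return F


F = distribution_function(C10_FIRST_BIRTHS, C10_SIZE)
childless_share = C10_CENSORED / C10_SIZE

print(round(F[20], 3), round(F[27], 3), round(F[28], 3))  # 0.067 0.454 0.512
print(round(childless_share, 2))                          # 0.26
```

The values F(27) and F(28) computed here are the ones used for the median calculation that follows.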


4. While a plot of survivor functions, as shown in Figure 12.2-4, is well suited to compare a small number of cohorts, it becomes impractical for long time series. An alternative way to investigate changes in a series of distribution or survivor functions is based on the calculation of quantiles. An example is the median, which was introduced in Section 7.2. Referring to a distribution function $F$, the median is a number, say $m$, such that $F(m) \approx 0.5$. By generalization, the q-quantile is defined as a number, say $m_q$, such that $F(m_q) \approx q$, with the understanding that $q$ is some number strictly between 0 and 1. One possibility to calculate quantiles is by linear interpolation.11 To illustrate, we calculate the median age at first childbearing for the C10 cohort. Using the data from Table 12.2-1, one finds $F(27) = 0.454$ and $F(28) = 0.512$. Therefore, by linear interpolation:

$$\frac{m - 27}{28 - 27} = \frac{0.5 - 0.454}{0.512 - 0.454}$$

from which one derives the median $m = 27.8$. In the same way, one can calculate quantiles for any value of $q$ between 0 and 1. We have done this for the values q = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9. The result is shown in Table 12.2-2 for all birth cohorts born between 1905 and 1945. It is seen, for example, that the median age at first marital childbearing declined from 29.4, for birth cohort C5, to 23.8, for birth cohort C45. These quantiles can finally be presented graphically as shown in Figure 12.2-5.

5. Interpretations should keep in mind that our data set only records marital births. Results would be different if one were able to include all first births, regardless of whether the mother is married or not. To provide an impression of the differences, we use data from the German Life History Study (GLHS) for three birth cohorts, C20, C30, and C40.12 Figure 12.2-6 compares the distributions; the solid lines refer to the 1 % subsample of the 1970 census, the dotted lines refer to the GLHS data set. Obviously, the proportion of childless women is much smaller than the proportion of women without a marital child.
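The interpolation step for quantiles can be written as a small Python function. This is a sketch of one quantile definition only (Hyndman and Fan, 1996, survey several others); the function name is hypothetical, and the tabulated values of F are assumed to be non-decreasing. The first call uses the rounded values F(27) = 0.454 and F(28) = 0.512 quoted in the text; the second uses F(20) = 298/4417 and F(21) = 465/4417, cumulated from the C10 counts in Table 12.2-1. Its result, 20.9, coincides with the 1910 entry in the column headed 0.9 of Table 12.2-2.

```python
from bisect import bisect_left
from typing import Sequence


def quantile(ages: Sequence[float], cdf: Sequence[float], q: float) -> float:
    """Age m_q with F(m_q) = q, by linear interpolation between the two
    tabulated ages whose F-values bracket q (cdf must be non-decreasing)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    k = bisect_left(cdf, q)  # first index with cdf[k] >= q
    if k == 0 or k == len(cdf):
        raise ValueError("q lies outside the tabulated values of F")
    a0, a1 = ages[k - 1], ages[k]
    f0, f1 = cdf[k - 1], cdf[k]
    return a0 + (a1 - a0) * (q - f0) / (f1 - f0)


# Median of C10 from the rounded values quoted in the text.
print(round(quantile([27, 28], [0.454, 0.512], 0.5), 1))  # 27.8

# Age by which 10 % of C10 had a first marital birth.
print(round(quantile([20, 21], [298 / 4417, 465 / 4417], 0.1), 1))  # 20.9
```

Raising an error outside the tabulated range mirrors the empty cells in Table 12.2-2: a quantile is only reported when the cohort's distribution function actually reaches that level.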
11 Since already the definition of quantiles relies on approximation, there exist several different methods to calculate quantiles. One should also note that statistical packages often use different formulas. For an overview see Hyndman and Fan (1996).

12 The GLHS, and how we have done the calculations, will be discussed in Section 14.1.

Fig. 12.2-4 Survivor functions for the age at first marital birth, calculated from the 1 % subsample of the 1970 census.

Age

Table 12.2-2 Quantiles of the distribution of ages at ﬁrst marital childbearing, calculated from the 1 % subsample of the 1970 census.
Quantiles

35

Cohort

0.9 20.9 21.0 20.8 20.8 20.8 20.9 20.7 21.0 20.9 20.5 20.5 20.7 20.4 20.0 20.1 20.1 20.3 20.5 20.6 20.8 21.0 20.7 20.5 20.2 20.1 19.9 19.7 19.5 19.4 19.6 19.7 19.7 19.5 19.5 19.6 19.3 19.3 19.2 19.1 19.0 18.7

0.8 23.0 23.1 22.8 23.0 22.9 23.1 22.8 22.8 22.6 22.4 22.4 22.2 21.9 21.5 21.5 22.0 22.1 22.1 22.6 22.5 22.5 22.2 22.0 21.6 21.5 21.4 21.2 21.0 21.0 21.2 21.2 21.2 20.9 21.0 20.9 20.7 20.8 20.6 20.5 20.4 20.0

0.7 24.8 24.9 24.7 25.0 24.8 24.6 24.4 24.2 24.2 23.8 23.7 23.5 23.1 22.8 23.2 23.4 24.0 24.0 24.1 23.9 23.9 23.7 23.3 22.9 22.9 22.7 22.5 22.4 22.4 22.4 22.3 22.4 22.1 22.1 22.0 21.8 21.9 21.8 21.7 21.6 21.1

0.6 27.1 27.1 26.7 26.5 26.4 26.1 25.9 25.7 25.5 25.1 25.0 24.9 24.7 24.4 24.7 25.3 25.7 25.6 25.6 25.3 25.3 25.2 24.8 24.4 24.3 24.2 23.8 23.6 23.7 23.6 23.5 23.6 23.2 23.2 23.0 22.9 23.0 22.9 22.8 22.7 22.4

0.5 29.4 28.9 28.6 28.4 28.3 27.8 27.5 27.3 26.9 26.6 26.7 26.6 26.7 25.9 26.8 27.4 27.5 27.3 27.3 27.1 27.0 26.8 26.3 26.0 25.7 25.6 25.2 25.0 24.8 24.8 24.7 24.7 24.4 24.4 24.2 24.2 24.2 24.2 24.1 24.0 23.8

0.4 32.9 31.8 30.9 31.0 30.6 29.8 29.4 29.4 29.1 28.8 28.9 29.4 30.1 29.0 29.3 29.6 29.8 29.4 29.6 29.4 29.1 28.9 28.2 27.8 27.4 27.2 26.8 26.4 26.2 26.2 26.2 26.2 25.8 25.8 25.6 25.6 25.6 25.7 25.8

0.3 42.1 36.0 37.5 36.2 35.9 34.3 34.5 34.3 34.1 34.3 34.7 34.7 33.1 33.3 33.6 34.4 33.5 33.6 33.0 32.9 32.2 31.0 30.7 30.1 29.9 29.3 28.6 28.1 28.1 28.2 28.1 27.7 27.6 27.4 27.6 27.8

0.2

70 %

30

60 % 50 %

25

40 % 30 % 20 %

20

10 %

15 1900

Birth cohort 1910 1920 1930 1940 1950

Fig. 12.2-5 Graphical presentation of quantiles of the distribution of ages at ﬁrst marital childbearing, calculated for 1-year cohorts with birth years between 1905 and 1945 from the 1 % subsample of the 1970 census.

12.2.3

Age-speciﬁc Birth Rates

1. Our next question concerns the number of children born. As was discussed in Chapter 11, some useful information can already be gained from age-specific cohort birth rates, defined as

$$\gamma_{t_0,\tau} := \frac{\text{number of children born of women at age } \tau}{\text{number of women at age } \tau}$$

with the understanding that both the numerator and the denominator refer to women born in the year $t_0$. If we assume that death rates do not depend on women's parity, such rates, restricted to marital births, can be calculated from the 1 % subsample of the 1970 census.13 For a selection of birth cohorts, the required data are shown in Table 12.2-3. Each entry shows how many children were born of members of a specified birth cohort at a specified age. For example, the 3845 women belonging to birth cohort C30 reported 505 marital children born at the age of 25. Thus, given the above-mentioned assumption, one gets the approximation

$$\gamma_{1930,25} \approx \frac{505}{3845} = 0.1313$$

13 Actually, one also has to ignore migration. Therefore, in a strict sense, the 1 % subsample of the 1970 census also only allows one to calculate quasi-cohort birth rates.

1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945

35.2 34.0 33.0 32.4 32.0 32.1 31.5
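A small Python sketch of this arithmetic follows. It divides the C30 column of Table 12.2-3 by the cohort size and so yields the whole profile $\gamma_{1930,\tau}$. The assumption that this column covers ages 16 to 40 is our own reading of the flattened table: its entries add up to the 7350 children in the Total row, and the entry for age 25 is the 505 quoted above. The value for age 40 presumably covers only the part of 1970 before the census. The function name `cohort_birth_rates` is hypothetical.

```python
# Marital births to women of cohort C30 (born 1930) by age, read from
# Table 12.2-3 on the assumption that the column covers ages 16 to 40.
C30_BIRTHS = [1, 10, 44, 140, 230, 324, 402, 445, 488, 505,
              541, 555, 496, 485, 437, 424, 339, 338, 269, 225,
              218, 156, 139, 111, 28]
FIRST_AGE = 16
COHORT_SIZE_1930 = 3845  # women born in 1930 in the 1 % subsample


def cohort_birth_rates(births, first_age, cohort_size):
    """Map each age tau to gamma_{t0,tau} = births at age tau / cohort size."""
    return {first_age + i: b / cohort_size for i, b in enumerate(births)}


gamma_1930 = cohort_birth_rates(C30_BIRTHS, FIRST_AGE, COHORT_SIZE_1930)
print(f"gamma_1930,25 = {gamma_1930[25]:.4f}")  # prints 0.1313
```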


12

RETROSPECTIVE SURVEYS

12.2

DATA FROM THE 1970 CENSUS


This figure can also be compared with period data for the year 1955. As reported in Fachserie 1, Reihe 1, 1999 (p. 198), the birth rate of women at age 25 in 1955 was

$$\beta_{1955,25} = 0.1394$$

The difference of about 6 % can be attributed to the fact that the period data include all births and are not restricted to women with German citizenship.

2. Again for birth cohort C30, Figure 12.2-7 compares the distributions of these rates. The solid and dotted lines show, respectively, the quantities

$$\frac{\gamma_{1930,\tau}}{\sum_{j=15}^{39} \gamma_{1930,j}} \qquad\text{and}\qquad \frac{\beta_{1930,\tau}}{\sum_{j=15}^{39} \beta_{1930,j}}$$

The differences between the curves at younger ages might indicate that non-marital births are more frequent at these ages.

3. The question remains how to present graphically the age-specific cohort birth rates for a whole sequence of birth cohorts. One possibility is to first calculate cumulated cohort birth rates, $\bar\gamma_{t_0,\tau}$, and then to plot their values for specified ages. The required data are shown in Table 12.2-4. From these data one can calculate the cumulated cohort birth rates

$$\bar\gamma_{t_0,25},\quad \bar\gamma_{t_0,30},\quad \bar\gamma_{t_0,35},\quad \bar\gamma_{t_0,40},\quad \bar\gamma_{t_0,45}$$

for birth cohorts $t_0$ = C5, ..., C45. These rates can finally be presented graphically as shown in Figure 12.2-8.
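The three calculations in this passage can be reproduced with a few lines of Python: the relative difference between 0.1313 and 0.1394, the cohort side of the normalization behind Figure 12.2-7, and the cumulated rates plotted in Figure 12.2-8. This is a sketch under stated assumptions: for the normalization it reads the C30 column of Table 12.2-3 as covering ages 16 to 39 (there is no C30 entry for age 15), and for the cumulated rates it uses the 1905 row of Table 12.2-4. All variable names are ours.

```python
# 1. Relative difference between the census-based cohort rate and the
#    period rate for 1955, measured relative to the period rate.
gamma_1930_25 = 505 / 3845
beta_1955_25 = 0.1394
rel_diff = (beta_1955_25 - gamma_1930_25) / beta_1955_25
print(f"relative difference: {rel_diff:.1%}")

# 2. Normalized distribution gamma_{1930,tau} / sum_j gamma_{1930,j}.
#    The cohort size cancels, so the raw counts of Table 12.2-3 suffice.
C30_COUNTS = [1, 10, 44, 140, 230, 324, 402, 445, 488, 505, 541, 555,
              496, 485, 437, 424, 339, 338, 269, 225, 218, 156, 139, 111]
births_by_age = {16 + i: n for i, n in enumerate(C30_COUNTS)}  # ages 16..39
total = sum(births_by_age.values())
shares = {age: n / total for age, n in births_by_age.items()}
print(f"share at age 25: {shares[25]:.3f}")

# 3. Cumulated cohort birth rates for cohort 1905 (Table 12.2-4):
#    children born until a given age divided by the cohort size.
SIZE_1905 = 3926
CHILDREN_UNTIL_AGE_1905 = {25: 1789, 30: 3692, 35: 5548, 40: 6491, 45: 6701}
gamma_bar_1905 = {age: c / SIZE_1905 for age, c in CHILDREN_UNTIL_AGE_1905.items()}
print({age: round(g, 3) for age, g in gamma_bar_1905.items()})
```

The same division by cohort size, applied to every row of Table 12.2-4, gives the points of the curves in Figure 12.2-8.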

[Figure 12.2-6: three panels, C20, C30, and C40; abscissa: age, 15 to 45; ordinate: 0 to 1.]

Fig. 12.2-6 Comparison of distributions of the age at first (marital) birth, calculated from the 1 % subsample of the 1970 census (solid lines) and from the GLHS data (dotted lines).

12.2.4 Number of Children

1. Cumulated cohort birth rates refer to all children born of all members of a birth cohort until some specified age and therefore do not provide information about the distribution of the number of children among the cohort members. This distribution requires calculating, separately for each birth cohort, the proportion of women with each number of children. This has been done in Table 12.2-5. For example, there are altogether 3926 women in the 1 % subsample of the 1970 census who were born in the year 1905. Of these, 1250 have no marital child, 828 have one marital child, 797 have two marital children, and so on. The total number of children born of these women is 6713 (equal to the number of children shown in Table 12.2-3). One should note that these figures refer to the interview date in 1970. For birth cohorts born after about 1930, both the absolute numbers and the proportions will probably change until the end of the reproductive period.

2. The figures in Table 12.2-5 can be used to calculate the mean number


Table 12.2-3 Number of marital children born of women belonging to the specified birth cohort at the specified age; calculated from the 1 % subsample of the 1970 census.

τ               C5   C10   C15   C20   C25   C30   C35   C40   C45
15                                        1
16               3     5     1     5     7     1    10     7     9
17               8    16    19    29    12    10    32    57    42
18              36    61    36    88    35    44    76   118   116
19              79    97    89   133    76   140   169   260   209
20             141   173   150   243   126   230   274   416   305
21             198   215   210   326   213   324   380   553   374
22             261   242   257   336   332   402   507   683   379
23             310   338   358   422   429   445   595   768   414
24             375   437   451   524   444   488   646   801   358
25             378   503   484   368   473   505   657   834   130
26             354   568   476   423   474   541   688   730
27             338   568   331   452   488   555   700   743
28             353   596   389   482   467   496   628   634
29             432   618   356   522   474   485   597   558
30             426   573   223   463   459   437   503   193
31             393   510   247   428   383   424   474
32             386   383   237   408   389   339   408
33             374   368   262   372   367   338   339
34             382   341   256   315   319   269   269
35             321   193   221   277   289   225    86
36             273   183   187   232   233   218
37             207   193   155   217   187   156
38             200   167   112   186   156   139
39             160   159   104   142   142   111
40             103   118    90    99    91    28
41              66    95    71    79    64
42              54    61    60    48    51
43              50    40    30    22    23
44              28    29    25    27    17
45              12    15    13     9     7
46               5     8     5     4
47               3     2     2     2
48               2     3     2     3
49               1     4     1     2
50               1     1
51                     1
Total         6713  7884  5910  7688  7228  7350  8038  7355  2336
Cohort size   3926  4417  3403  4628  4162  3845  4290  4990  2785

[Figure 12.2-7: solid and dotted curves; abscissa: age, 15 to 40; ordinate: 0 to 0.08.]

Fig. 12.2-7 Age-specific quasi-cohort birth rates for birth cohort C30, calculated from the 1 % subsample of the 1970 census (solid line) and from Table 11.4-1 (dotted line). The ordinate shows proportions as explained in the text.

[Figure 12.2-8: curves labelled 25, 30, 35, 40, and 45; abscissa: birth cohort, 1900 to 1950; ordinate: 0 to 2.]

Fig. 12.2-8 Cumulated cohort birth rates until specified ages, based on the data in Table 12.2-4.

of marital children per woman. If one takes into account only women with at least one marital child, this mean value varies between 2.2 and 2.5 but does not show any substantial trend. Changes become visible, however, if one investigates the distribution of the number of children. The proportions can easily be calculated from the data in Table 12.2-5; their development is presented graphically in Figure 12.2-9.
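The proportions and means can be computed directly from Table 12.2-5. The Python sketch below does this for the rows 1905 and 1930. The function `summarize` and its output keys are hypothetical names, and the mean over mothers simply treats every woman with at least one marital child as a mother.

```python
# Rows of Table 12.2-5: cohort size, women with 0, 1, 2, 3, 4, 5+ marital
# children, and the total number of marital children born to the cohort.
TABLE_12_2_5 = {
    1905: (3926, [1250, 828, 797, 505, 266, 280], 6713),
    1930: (3845, [789, 885, 1057, 603, 278, 233], 7350),
}
LABELS = ["0", "1", "2", "3", "4", "5+"]


def summarize(size, women, children):
    """Proportions by number of children, and two mean values."""
    if sum(women) != size:
        raise ValueError("row does not add up to the cohort size")
    proportions = {label: n / size for label, n in zip(LABELS, women)}
    mothers = size - women[0]  # women with at least one marital child
    return {
        "proportions": proportions,
        "mean_all": children / size,
        "mean_mothers": children / mothers,
    }


results = {year: summarize(*row) for year, row in TABLE_12_2_5.items()}
for year, r in results.items():
    print(year, f"childless: {r['proportions']['0']:.1%}",
          f"mean (mothers only): {r['mean_mothers']:.2f}")
```

Applied to all 41 rows, the same function yields the series shown in Figure 12.2-9.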


Table 12.2-4 Number of women belonging to specified birth cohorts, and number of children born of these women until specified age, calculated from the 1 % subsample of the 1970 census.

                       Children until age
Birth year  Cohort size    25    30    35    40    45
1905               3926  1789  3692  5548  6491  6701
1906               4045  1794  3852  5768  6668  6926
1907               4208  1923  4281  6422  7378  7624
1908               4430  1947  4510  6603  7444  7695
1909               4395  1969  4647  6735  7556  7796
1910               4417  2087  5010  6805  7625  7865
1911               4349  2095  4971  6592  7485  7721
1912               4511  2297  5139  6721  7710  7944
1913               4377  2304  5067  6514  7416  7652
1914               4301  2509  5073  6472  7266  7488
1915               3403  2055  3830  5053  5701  5900
1916               2578  1559  2846  3734  4235  4354
1917               2428  1527  2599  3509  4009  4128
1918               2465  1599  2800  3704  4124  4248
1919               3595  2184  3904  5245  5927  6082
1920               4628  2474  4816  6616  7492  7677
1921               4680  2336  4947  6796  7720  7962
1922               4442  2223  4764  6545  7431  7652
1923               4265  2091  4520  6345  7242  7462
1924               4058  2091  4354  6139  7015  7199
1925               4162  2147  4509  6256  7065
1926               3998  2129  4441  6113  6861
1927               3863  2261  4682  6362  7159
1928               3923  2472  4859  6543  7243
1929               3762  2443  4904  6525  7168
1930               3845  2589  5103  6698
1931               3535  2518  4928  6450
1932               3444  2568  4992  6483
1933               3280  2531  4961  6347
1934               4036  3197  6151  7746
1935               4290  3346  6462
1936               4348  3482  6598
1937               4302  3668  6762
1938               4715  4020  7342
1939               5044  4402  7921
1940               4990  4497
1941               4562  4082
1942               3771  3370
1943               3842  3406
1944               3820  3387
1945               2785  2336

[Figure 12.2-9: curves labelled 0, 1, 2, 3, and 4+; abscissa: birth cohort, 1905 to 1930; ordinate: 10 to 30 (%).]

Fig. 12.2-9 Proportion (in %) of women born in specified years with 0, 1, 2, 3, and 4 or more children; calculated from the data in Table 12.2-5.

12.2.5 Timing of Births

1. A final question concerns the timing of childbearing in women's life courses. One aspect of this question, the age at first childbearing, has been discussed in Section 12.2.2. We now discuss two further aspects. One of them concerns the temporal distance between the births of several children, often called the spacing of childbearing. The other concerns the idea that there might be a relationship between age at first childbearing and the total number of children born until the end of the reproductive period.

2. As in the preceding sections, calculations are based on the 1 % subsample of the 1970 census. For all women with at least two children (excluding twins) we can calculate the temporal distance between the first and the second birth. Similarly, for all women with at least three children one can calculate the temporal distances between the first and the third and between the second and the third birth. Additional temporal intervals can be calculated for women with at least four children. Results of these calculations are shown in Table 12.2-6. As can be seen, at least for the birth cohorts C5, ..., C30, there are virtually no changes in the spacing of childbearing. No conclusions can be derived, however, for younger birth cohorts.14

Table 12.2-5 Number of women in the 1 % subsample of the 1970 census, and number of children born of these women, classified according to women's birth cohort.

                       Number of children
Birth year  Cohort size     0     1     2     3     4    5+  Total
1905               3926  1250   828   797   505   266   280   6713
1906               4045  1202   903   883   513   268   276   6945
1907               4208  1128   914  1002   599   258   307   7640
1908               4430  1230   985  1041   626   272   276   7711
1909               4395  1200   973  1037   607   299   279   7812
1910               4417  1166   991  1043   627   317   273   7884
1911               4349  1055  1039  1082   648   263   262   7739
1912               4511  1133  1044  1145   649   282   258   7955
1913               4377  1102  1013  1117   622   264   259   7664
1914               4301  1066  1023  1136   552   294   230   7506
1915               3403   874   788   854   467   224   196   5910
1916               2578   666   591   685   371   154   111   4358
1917               2428   613   592   618   321   160   124   4134
1918               2465   603   592   634   349   174   113   4256
1919               3595   885   926   892   495   222   175   6093
1920               4628  1158  1155  1248   603   270   194   7688
1921               4680  1209  1109  1222   616   279   245   7978
1922               4442  1110  1085  1121   595   305   226   7665
1923               4265  1050   997  1108   629   249   232   7467
1924               4058   966   966  1083   563   265   215   7203
1925               4162   984  1008  1144   567   252   207   7228
1926               3998   933   980  1083   536   265   201   6981
1927               3863   855   849  1068   590   250   251   7283
1928               3923   810   933  1093   599   267   221   7313
1929               3762   777   841  1065   571   253   255   7199
1930               3845   789   885  1057   603   278   233   7350
1931               3535   656   785  1067   544   264   219   6923
1932               3444   617   727  1050   579   250   221   6852
1933               3280   557   714   990   553   278   188   6552
1934               4036   694   878  1290   695   267   212   7824
1935               4290   762   989  1345   735   276   183   8038
1936               4348   802  1020  1428   677   247   174   7886
1937               4302   807  1036  1376   700   239   144   7653
1938               4715   939  1172  1570   678   253   103   7927
1939               5044  1060  1323  1682   676   205    98   8063
1940               4990  1223  1377  1535   624   163    68   7355
1941               4562  1291  1273  1383   453   113    49   6119
1942               3771  1199  1147  1010   314    78    23   4543
1943               3842  1427  1169   899   283    43    21   4102
1944               3820  1613  1179   764   225    32     7   3549
1945               2785  1297   841   489   123    27     8   2336

Table 12.2-6 Temporal distance in years between the i-th and the j-th birth, for birth cohorts C5, ..., C30; calculated from the 1 % subsample of the 1970 census.

Birth year   1–2   2–3   3–4   1–3   1–4
1905         5.1   5.3   4.8   8.7  10.9
1906         5.2   5.1   4.7   8.3  11.2
1907         5.1   5.0   4.7   8.4  10.7
1908         5.0   5.0   4.4   8.5  10.7
1909         5.0   4.8   4.2   8.2  10.4
1910         4.9   4.9   4.6   8.1  10.5
1911         5.0   4.8   4.3   8.1  10.4
1912         4.9   4.9   4.8   8.1  10.7
1913         5.0   5.1   4.9   8.4  11.1
1914         4.7   4.9   4.9   8.0  10.6
1915         4.9   5.1   4.9   8.2  10.6
1916         4.9   5.0   4.5   8.3  10.4
1917         4.9   4.8   4.8   8.2  10.7
1918         4.9   4.8   4.7   8.3  10.9
1919         4.8   5.1   4.9   8.6  11.0
1920         4.9   4.9   4.8   8.5  11.1
1921         5.0   5.2   4.7   8.5  10.6
1922         4.8   5.0   5.1   8.3  11.6
1923         4.8   5.1   4.7   8.4  10.8
1924         5.0   5.1   4.6   8.4  10.6
1925         4.9   5.1   4.8   8.2  10.7
1926         4.8   4.9   5.0   8.1  10.9
1927         5.0   5.1   4.5   8.3  10.3
1928         4.8   4.9   4.5   8.1  10.6
1929         4.9   4.8   4.3   8.0  10.2
1930         5.0   4.8   4.1   8.0   9.9

3. Also based on data from the 10 % subsample of the 1970 census, similar calculations have been performed by Rückert (1975). Instead of birth cohorts, Rückert considers marriage cohorts of women in their first marriage. He therefore finds slightly shorter distances between successive births. For example, he finds a mean duration of 4 years between the births of the first and second child for women who married between 1940 and 1949. However, Rückert's figures also show that there have been virtually no changes in the mean durations between successive births, at least for marriage cohorts of women who married between 1920 and 1949.

4. Rückert also investigated possible relationships between the spacing of childbearing and the final number of children born of women in their first
14 Performing the same calculations for younger birth cohorts would result in a substantial selection bias. For example, the temporal distance between the first and second child for birth cohort C45 is just 2.5 years. But this is most probably due to the fact that members of this cohort are only observed up to an age of 25 years.

[Figure 12.2-10: one curve per birth cohort (C5, C10, C15, C20, C25, C30); abscissa: age at first marital childbearing, 15 to 40; ordinate: mean number of marital children, 0 to 5.]

Fig. 12.2-10 Mean number of marital children (ordinate) by age at first marital childbearing (abscissa), for birth cohorts C5, C10, C15, C20, C25, and C30, calculated from the 1 % subsample of the 1970 census.

Table 12.2-7 Mean age at childbearing ($\bar\tau_{t_0}$) and cohort birth rates cumulated until age 40 ($\bar\beta_{t_0}$, per 1000) for birth cohorts $t_0$. Calculated from age-specific birth rates in Fachserie 1, Reihe 1, 1999 (pp. 198–200).

t0     τ̄      β̄          t0     τ̄      β̄          t0     τ̄      β̄
1930  27.7  2107.3       1940  26.1  1958.8       1950  26.1  1684.5
1931  27.7  2133.7       1941  26.0  1891.2       1951  26.2  1642.6
1932  27.6  2173.4       1942  25.9  1837.5       1952  26.4  1630.8
1933  27.4  2201.4       1943  25.7  1797.2       1953  26.6  1612.9
1934  27.1  2220.7       1944  25.6  1765.1       1954  26.8  1589.0
1935  27.1  2155.7       1945  25.5  1761.4       1955  26.9  1604.0
1936  26.9  2120.3       1946  25.5  1765.3       1956  27.1  1599.7
1937  26.7  2095.1       1947  25.6  1738.0       1957  27.3  1582.5
1938  26.5  2056.6       1948  25.7  1714.2       1958  27.5  1585.2
1939  26.3  2012.1       1949  25.9  1700.0       1959  27.6  1581.4

marriage. He found that the mean interval length between successive births decreases with increasing completed marital fertility.15 It is, however, questionable how to interpret this result. There is obviously no reason to assume a causal relationship: how many children a woman will eventually bear does not depend on the temporal distance between her first and second child. On the other hand, given a limited period of childbearing and conditioning on the number of children eventually born, it is not surprising to find relatively shorter distances between successive births for women who finally give birth to more children.

5. A similar problem occurs when one tries to find relationships between the age at first childbearing and the final number of children born. Figure 12.2-10 provides an illustration. The relationship is quite similar for all birth cohorts C5, ..., C30: women who began childbearing at younger ages finally gave birth to relatively more children. But again, except for cases where reaching the end of the reproductive period creates definite limits to childbearing, there is no obvious causal relationship. Moreover, the relationship is actually not as stable as Figure 12.2-10 suggests. While this cannot be demonstrated with the data from the 1 % subsample of the 1970 census, some additional information can be gained from period data of official statistics. Given age-specific birth rates, $\beta_{t,\tau}$, one can calculate the mean age at childbearing for quasi-cohorts in the following way:

$$\bar\tau_{t_0} := \frac{\sum_{\tau=\tau_a}^{\tau_b} \tau\, \beta_{t_0+\tau,\tau}}{\sum_{\tau=\tau_a}^{\tau_b} \beta_{t_0+\tau,\tau}}$$

For each cohort $t_0$, one can also calculate the cumulated cohort birth rate

$$\bar\beta_{t_0} := \sum_{\tau=\tau_a}^{\tau_b} \beta_{t_0+\tau,\tau}$$

so that it becomes possible to investigate changes in the relationship between the two quantities across cohorts. The age-specific birth rates published by the Statistisches Bundesamt allow one to calculate these quantities for $t_0$ = 1930, ..., 1959, assuming $\tau_a = 15$ and $\tau_b = 40$.16 Results of the calculation are shown in Table 12.2-7. The graphical view of these data in Figure 12.2-11 clearly shows that there is no simple relationship between mean age at childbearing and cumulated birth rates.

15 “It evidently holds in general that the average interval between births is the shorter, the larger the number of children in marriages after family formation has been completed.” (Rückert 1975, p. 87, translated from German)

[Figure 12.2-11: points labelled by birth cohort (C30, C34, C40, C46, C50, C55, C59); abscissa: mean age at childbearing, 25 to 28; ordinate: 1500 to 2300 (per 1000).]

Fig. 12.2-11 Plot of the data in Table 12.2-7. The abscissa refers to mean age at childbearing, the ordinate to age-specific birth rates cumulated until age 40 (per 1000).
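The two formulas can be sketched in Python as follows. The function reads the period rates along the cohort diagonal $(t_0+\tau, \tau)$. The dictionary layout, the name `quasi_cohort_summary`, and the choice to treat missing cells as zero are assumptions of this sketch, and the toy rates are invented for illustration, not taken from Fachserie 1. To compare the result with Table 12.2-7, $\bar\beta_{t_0}$ would have to be multiplied by 1000.

```python
def quasi_cohort_summary(beta, t0, tau_a=15, tau_b=40):
    """Return (tau_bar, beta_bar) for the quasi-cohort born in year t0.

    `beta` maps (year, age) to an age-specific period birth rate beta_{t,tau}.
    The quasi-cohort is followed along the diagonal (t0 + tau, tau); in this
    sketch, cells missing from `beta` are treated as zero.
    """
    diagonal = [(tau, beta.get((t0 + tau, tau), 0.0))
                for tau in range(tau_a, tau_b + 1)]
    beta_bar = sum(rate for _, rate in diagonal)
    if beta_bar == 0:
        raise ValueError(f"no birth rates found for quasi-cohort {t0}")
    tau_bar = sum(tau * rate for tau, rate in diagonal) / beta_bar
    return tau_bar, beta_bar


# Hypothetical rates (not official data): cohort 1930 has rate 0.1 at age 20
# (year 1950) and at age 30 (year 1960), so its mean age should be 25.
toy_beta = {(1950, 20): 0.1, (1960, 30): 0.1}
tau_bar, beta_bar = quasi_cohort_summary(toy_beta, 1930)
print(f"mean age {tau_bar:.1f}, cumulated rate {beta_bar:.2f}")
```

Keying the rates by (year, age) keeps the period layout of the published tables, so moving from period to quasi-cohort view is only a matter of which cells are read.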

Chapter 13

Births in the Period 1950–1970
In the previous chapter, the presentation of data from a 1 % subsample of the 1970 census focused on birth cohorts. This is useful for an understanding of historical changes but has limitations. The subjects of historical change are individuals, not birth cohorts. Birth cohorts are just analytical tools for the presentation of data related to life courses of individuals. These life courses are not, however, determined by a specific birth year but depend on the changing historical contexts in which they develop. The cohort approach therefore has to incorporate historical periods. There is, however, no direct connection between cohorts and periods, for both a technical and a substantive reason. The technical reason is that cohorts contribute to the whole range of historical periods during which their members live. The more substantive reason is that the starting point for an understanding of individual behavior is a historical period from which, possibly, substantial differences between successive birth cohorts result.1 In order to understand the relationship between cohorts and periods, we consider, in the present chapter, the development of births in the period 1950–70. As in the previous chapter, the data source is the 1 % subsample of the 1970 census.

13.1 Age-specific Birth Rates

1. In the territory of the former FRG, a substantial increase in the number of births began in the mid-fifties. This increase, sometimes called the “baby boom”, lasted until about the mid-sixties and was followed by a long-term decline in the number of births. In the present chapter we try to reconstruct this development, for the period 1950–70, with the data of the 1 % subsample of the 1970 census. As was shown in Figure 12.2.12 in Section 12.2.1, most of the births that occurred in this period were contributed by women represented in our data set.

2. We begin with an investigation of age-specific birth rates. Using our standard notation, the age-specific birth rate for age $\tau$ in year $t$ is defined as $\beta_{t,\tau} = b_{t,\tau}/n^f_{t,\tau}$. The denominator refers to the number of women who are of age $\tau$ in the year $t$, and the numerator refers to the number of children
1 We therefore do not follow Ryder's (1964) idea of a “process of demographic translation”. From a technical point of view, this simply means deriving period measures from cohort measures of people's reproductive behavior. But setting up the problem in this way confuses the order of things: the facts from which cohort measures are statistically derived are brought about by people's behavior, which always takes place in specific and changing historical periods.

16 Data are taken from Fachserie 1, Reihe 1, 1999 (pp. 198–200).


Table 13.1-1 Number of marital children born of mothers of specified age in specified year (1 % subsample of the 1970 census).
τ 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
11 81 171 297 440 451 459 470 358 526 503 481 541 558 575 435 335 288 269 210 144 137 116 111 75 59 49 18 17 15

born of these women during year $t$. We assume that these rates, when restricted to marital births, can sensibly be approximated by the data in the 1 % subsample of the 1970 census for the years $t$ = 1950, ..., 1969.2 Values for the numerators are shown in Table 13.1-1.3 For example, 230 children were born in the year 1950 of women in our data set who were at age 20 in that year. In order to estimate age-specific birth rates we also need to know the number of women. These numbers can be taken from Table 12.2-4 in Section 12.2.3. For example, there are 3845 women in our data set who were born in 1930 and therefore of age 20 in 1950. So we get the estimate

$$\beta_{1950,20} \approx \frac{230}{3845} = 0.0598$$
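The estimate can be written as a small Python function. It only makes explicit that women aged $\tau$ in year $t$ belong to birth cohort $t-\tau$, whose size is taken from the census as in the text (which ignores deaths and migration between $t$ and 1970). The name `period_birth_rate` and the one-entry dictionary are illustrative assumptions.

```python
# Women aged tau in year t were born in year t - tau, so the denominator
# of beta_{t,tau} is the size of birth cohort t - tau (Table 12.2-4).
COHORT_SIZES = {1930: 3845}  # only the entry needed for the example


def period_birth_rate(births, t, tau, cohort_sizes):
    """Approximate beta_{t,tau} = b_{t,tau} / n^f_{t,tau} via cohort sizes."""
    return births / cohort_sizes[t - tau]


beta_1950_20 = period_birth_rate(230, 1950, 20, COHORT_SIZES)
print(f"beta_1950,20 ~ {beta_1950_20:.4f}")  # prints 0.0598
```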

15 60 128 209 342 437 508 712 801 822 778 687 664 597 525 363 331 337 269 249 205 207 162 142 110 91 50 45 27 13

15 60 176 254 305 425 508 575 736 834 802 784 625 544 503 425 316 295 233 225 193 174 157 123 91 74 47 42 22 9

12 67 174 266 398 374 520 598 584 769 730 791 672 568 506 474 363 260 251 207 218 158 120 117 77 64 59 44 19 19

7 75 173 295 409 466 379 578 564 604 684 743 721 559 506 467 408 316 221 215 174 156 120 105 86 59 51 18 21 13

9 63 197 297 420 462 474 414 564 553 492 637 634 630 529 415 417 339 281 226 185 148 139 97 96 64 35 23 18 17

16 42 115 221 329 485 683 805 738 703 670 700 600 463 425 410 339 306 304 266 206 187 166 137 102 81 48 28 11 5

In the same way one can estimate age-specific birth rates for all ages τ = 16, ..., 45 and all years in the period 1947–69. Given this period and range of ages, the birth years range from 1902 to 1953.4

9 61 116 254 357 413 613 768 859 773 708 732 628 587 400 399 335 338 294 290 230 180 156 153 108 97 75 22 16 7

7 33 103 186 284 380 462 409 461 479 541 531 487 494 427 383 410 376 328 305 232 178 94 83 59 71 50 46 31 12

3. In order to visualize age-specific birth rates, one can use level plots as introduced in Section 11.4. Such a plot is shown in Figure 13.1-1. The darkness of the grey scale corresponds to the values of the age-specific birth rates. The maximum value of 170.3 marital children per 1000 women is reached in 1963 at an age of 24 years. We have selected five levels (40, 70, 100, 130, and 150) for contour lines. High values of age-specific birth rates are concentrated in the period 1958–66 at ages between 23 and 27 years. In the same way one can visualize cumulated age-specific birth rates, as shown in Figure 13.1-2. The highest value of 2406 children

5 57 107 236 338 406 507 585 439 490 502 555 504 500 460 406 389 367 319 318 237 217 128 58 78 45 60 36 26 10

4 36 118 235 344 443 486 595 592 516 554 508 496 490 415 440 347 367 319 322 229 253 186 108 67 52 31 30 24 11

10 38 118 260 355 495 544 616 646 648 530 509 499 485 420 408 395 360 319 293 235 221 175 142 93 47 25 20 25 18

10 43 110 238 416 541 588 645 656 657 623 522 480 441 437 406 362 315 293 289 243 229 168 157 99 54 32 20 17 13

9 53 115 205 370 553 642 705 677 732 688 619 515 456 460 424 366 320 264 266 233 204 188 166 102 79 44 27 17 6

2 One cannot reliably approximate these rates for the year 1970 because the census was already conducted in May of that year.

3 31 101 180 274 349 366 447 460 505 516 482 457 457 459 396 383 336 318 277 175 76 79 82 90 94 60 40 19 15

3 Since only very few children were born of women at ages under 16 or above 45, we restrict all calculations to ages τ = 16, . . . , 45.

6 29 76 181 246 290 369 445 441 477 498 483 467 433 466 416 375 372 228 130 131 101 112 119 106 81 55 40 29 15

7 25 88 169 251 288 345 426 488 509 503 512 476 474 435 425 383 379 315 213 125 129 103 104 114 93 72 54 29 12

4 In order to approximate age-specific birth rates for the whole period, we therefore need, in addition to the numbers from Table 12.2-4, the following cohort sizes:

5 32 91 144 270 306 402 404 464 480 469 488 471 455 488 472 408 285 164 143 156 155 140 150 135 96 61 41 29 20

Birth year 1902 1903 1904 1946 1947 1948 1949 1950 1951 1952 1953

Cohort size 3579 3661 3971 3371 3573 3872 4170 4185 3970 4053 3803

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45

0 10 51 121 171 254 332 390 434 468 523 452 353 223 209 188 237 294 286 245 200 193 167 135 138 94 54 50 21 14

3 18 44 121 216 278 367 429 451 499 494 518 482 329 228 210 201 262 286 261 242 212 167 146 129 99 76 50 26 20

6 18 68 140 219 308 344 358 444 482 505 539 530 522 395 211 216 173 256 274 280 274 209 159 130 98 66 49 28 15

6 31 76 149 230 314 406 444 418 473 446 503 540 530 463 333 216 174 178 221 231 208 207 140 118 93 70 35 26 12

10 32 86 168 243 324 374 431 411 451 474 478 500 480 510 428 282 183 167 148 187 190 158 131 132 95 65 39 26 13

Like Table 12.2-4, these figures refer to the number of women in the 1 % subsample of the 1970 census.

[Figure 13.1-1: level plot over age (15 to 45) and year (1950 to 1969), with contour lines at 40, 70, 100, 130, and 150.]

Fig. 13.1-1 Level plot of age-specific birth rates in the period 1950–69, calculated from the 1 % subsample of the 1970 census.

[Figure 13.1-2: level plot over age (15 to 45) and year (1950 to 1969), with contour lines at 500, 1000, 1500, 2000, and 2300.]

Fig. 13.1-2 Level plot of cumulated age-specific birth rates in the period 1950–69, calculated from the 1 % subsample of the 1970 census.

Table 13.1-2 Actual and hypothetical number of marital children born in the period 1950–69 of women in the 1 % subsample of the 1970 census.

     Actual development                 Hypothetical development
Year    b_t  b_t − 7291  cumulated     b*_t  b*_t − 7291  cumulated
1950   7291           0          0     7291            0          0
1951   7216         -75        -75     7251          -40        -40
1952   7424         133         58     7479          188        148
1953   7217         -74        -16     7290           -1        147
1954   7546         255        239     7655          364        511
1955   7527         236        475     7586          295        806
1956   7942         651       1126     7972          681       1488
1957   8385        1094       2220     8333         1042       2529
1958   8618        1327       3547     8502         1211       3741
1959   8949        1658       5205     8685         1394       5134
1960   9104        1813       7018     8741         1450       6585
1961   9505        2214       9232     9082         1791       8376
1962   9591        2300      11532     9075         1784      10160
1963   9978        2687      14219     9414         2123      12283
1964   9886        2595      16814     9377         2086      14369
1965   9572        2281      19095     9134         1843      16212
1966   9479        2188      21283     9140         1849      18061
1967   9193        1902      23185     8902         1611      19672
1968   8875        1584      24769     8643         1352      21023
1969   8200         909      25678     8023          732      21755

[Figure 13.1-3: one curve per age group (19–21, 24–26, 29–31, 34–36, 39–41); ordinate 0 to 150; abscissa 1950 to 1970.]

Fig. 13.1-3 Age-specific birth rates in the period 1950–69. Mean values for the specified age groups calculated from our 1 % subsample of the 1970 census.

per 1000 women occurs again in the year 1963. This figure also illustrates that, until the second half of the 1960s, women tended to have children at younger ages.

4. Additional information can be gained by focusing on birth rates for specific age groups. We selected five age groups and, for each group, calculated birth rates as unweighted mean values of the age-specific birth rates of the contributing ages. The result is presented in Figure 13.1-3. It is seen that mainly young women, up to an age of about 30, contributed to the rising number of births until the mid-sixties. Furthermore, with the exception of very young women, birth rates had already begun to decline by about 1965.

5. The data presented so far suggest that the increase in the number of births until the mid-sixties is mainly due to increasing birth rates. However, the number of children actually born also depends on the number and age distribution of potential mothers. One might therefore ask what part of the rising number of children can be attributed to changes in cohort sizes and age distribution, and what part can be attributed to changes in age-specific birth rates. One way to approach this question is to perform a hypothetical calculation based on the assumption that the number and age distribution of women remained, for the whole period from 1950 to 1969, the same as in 1950. The actual number of births, $b_t$, can then be compared with a hypothetical number of births, calculated as

$$b^*_t := \sum_{\tau=16}^{45} \beta_{t,\tau}\, n^f_{1950,\tau}$$

Table 13.1-2 shows the result of this calculation; Figure 13.1-4 compares the development of $b_t$ and $b^*_t$. It is seen that only a small part of the increasing number of marital children born in the period 1950–69 can be attributed to changes in the number and age distribution of women. So we conclude that the main part of the “baby boom” resulted from increasing birth rates, that is, women, in particular younger women, gave birth to more children.

6. One might try to quantify the contribution to the rising number of children that can be attributed to changes in the number and age distribution of women. A simple measure can be derived from Table 13.1-2. Column $b_t - 7291$ shows the surplus of marital children compared with 1950; the next column shows the cumulated values. The same calculations are then applied to the hypothetical development. For example, until the year 1963, the cumulated surplus of marital children amounted to 14219. Under the assumption that the number and age distribution of women had not changed since 1950, this figure would be 12283. One can therefore attribute about 14 % of the cumulated surplus until 1963, namely (14219 − 12283)/14219 ≈ 0.136, to changes in the number and age distribution of women.

7. An additional consideration concerns “timing effects”: women can give birth to children anywhere during the reproductive period. In fact, as already mentioned several times, until about the mid-sixties, women began


Table 13.2-1 Number of marital children, classified with respect to parity, born by women of age 16–45 who are members of the 1 % subsample of the 1970 census.
$t$  1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
$b_t$  6307 6864 7316 7291 7216 7424 7217 7546 7527 7942 8385 8618 8949 9104 9505 9591 9978 9886 9572 9479 9193 8875 8200
$b_t^{(1)}$  $b_t^{(2)}$  $b_t^{(3)}$  $b_t^{(4)}$  $b_t^{(5+)}$

[Figure 13.1-4: two curves, actual and hypothetical; ordinate 5000 to 10000; abscissa 1950 to 1970.]

Fig. 13.1-4 Actual (solid line) and hypothetical (dotted line) number of marital children born in the period 1950–69. Data are taken from columns $b_t$ and $b^*_t$ in Table 13.1-2.

childbearing at younger ages. Several authors have therefore suggested that at least some part of the baby boom is due to such “timing effects” (see, e.g., Dinkel, 1983). However, investigating this question requires more complicated considerations and will be postponed until Section 13.3.
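Both steps of the attribution argument can be sketched in Python. The first is a direct implementation of $b^*_t$, shown only on made-up numbers because the age-specific rates are not reproduced here. The second is the cumulated-surplus measure applied to the columns $b_t$ and $b^*_t$ of Table 13.1-2 for 1950 to 1963. Because the printed $b^*_t$ values are rounded to whole children, recomputing the hypothetical cumulated surplus from them gives 12282 rather than the published 12283; the share is about 14 % either way. The function names are hypothetical.

```python
def hypothetical_births(beta_t, women_1950):
    """b*_t = sum over ages tau of beta_{t,tau} * n^f_{1950,tau}.

    Both arguments map age -> value; every age in `beta_t` must also
    appear in `women_1950` (a KeyError is raised otherwise).
    """
    return sum(rate * women_1950[age] for age, rate in beta_t.items())


# Toy illustration with made-up numbers for two ages only.
toy_b_star = hypothetical_births({16: 0.01, 17: 0.02}, {16: 1000, 17: 500})

# Columns b_t and b*_t of Table 13.1-2 for the years 1950..1963.
B_ACTUAL = [7291, 7216, 7424, 7217, 7546, 7527, 7942,
            8385, 8618, 8949, 9104, 9505, 9591, 9978]
B_HYPOTHETICAL = [7291, 7251, 7479, 7290, 7655, 7586, 7972,
                  8333, 8502, 8685, 8741, 9082, 9075, 9414]
BASE = 7291  # marital births in 1950


def cumulated_surplus(series, base=BASE):
    """Sum of (b_t - base) over all years in `series`."""
    return sum(b - base for b in series)


cum_actual = cumulated_surplus(B_ACTUAL)              # 14219
cum_hypothetical = cumulated_surplus(B_HYPOTHETICAL)  # 12282 (table: 12283)
share = (cum_actual - cum_hypothetical) / cum_actual
print(f"share due to numbers and ages of women: {share:.1%}")
```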

13.2 Parity-specific Birth Rates

2963 3209 3395 3530 3415 3399 3260 3358 3364 3503 3675 3744 3781 3924 4061 3989 4042 3872 3905 3662 3617 3457 3345

1768 1949 2182 2086 2124 2245 2233 2345 2266 2332 2487 2591 2768 2716 2832 2946 3141 3103 3007 3063 2965 2864 2617

808 940 970 932 897 1020 992 1050 1045 1183 1279 1237 1310 1316 1379 1431 1509 1525 1457 1543 1471 1435 1247

385 381 415 360 405 423 405 448 494 518 518 565 582 633 655 663 656 725 617 624 608 587 532

383 385 354 383 375 337 327 345 358 406 426 481 508 515 578 562 630 661 586 587 532 532 459

1. One can get additional information by distinguishing births with respect to parity, that is, for each women, the ﬁrst child, the second child, and so on. We will use the following notation: bt,τ := number of children of parity p born in year t of women at age τ Of course, women might give birth to several children at the same time (twins, triplets, . . . ), and in these cases parities are arbitrarily assigned. In a ﬁrst step we ignore the dependence on age and simply consider τb (p)

2. Figure 13.2-1 provides a graphical display of the data in Table 13.2-1. It is seen that children of all parities contributed to the general increase until about 1963. It is also seen that, depending on parity, the decline in the number of births began in diﬀerent years. 3. In a next step we can consider parity-speciﬁc birth rates. The standard deﬁnition is βt,τ :=
(p)

bt,τ

(p)

nf t,τ

bt

(p)

:= τ =τa

bt,τ

(p)

called parity-speciﬁc number of children. Values can be calculated from the 1 % subsample of the 1970 census as shown in Table 13.2-1. For consistency with earlier calculations, we have taken into account, for each year t = 1947, . . . , 1969, only women who were of age 16 – 45 in the respective year. Once again, the ﬁgures in Table 13.2-1 only refer to marital children.

The denominator refers to the number of women aged τ in the year t, and the numerator refers to the number of children of parity p who are born of these women during the year t. The deﬁnition has, however, a drawback in not taking into account that, except in cases of multiple births, children of parity p can only be born of women who have already given birth to p − 1 children. It is therefore preferable to calculate parity progression rates. 5
5 Also called parity progression ratios; see, e.g., Newell (1988, p. 58).


13

BIRTHS IN THE PERIOD 1950 –1970

13.2

PARITY-SPECIFIC BIRTH RATES


[Figure 13.2-1: curves for parities 1, 2, 3, 4, and 5+; vertical axis 0 to 4000, horizontal axis 1950 to 1970.]

Fig. 13.2-1 Parity-specific number of children in the period 1950–69, corresponding to the values in Table 13.2-1.

We use the following definition:

$$\beta_{t,\tau}^{(p)} := \frac{b_{t,\tau}^{(p)}}{n^{f,(p-1)}_{t,\tau}}$$

In this definition, the denominator only refers to women at age τ in year t who have already given birth to p − 1 children. For example, $\beta_{t,\tau}^{(1)}$ is the proportion of women at age τ in year t who gave birth to their first child during that year. Similarly, $\beta_{t,\tau}^{(2)}$ is the proportion of women who already had a first child and gave birth to a second child during the year t.6

6 Birg, Filip and Flöthmann (1990, p. 11) call these rates "bedingte Geburtenwahrscheinlichkeiten" (conditional birth probabilities). We avoid this wording because parity progression rates are simply proportions, not probabilities.

4. We will try to calculate parity progression rates from the data in the 1 % subsample of the 1970 census. We begin with a simplified approach and ignore age, that is, we relate the number of children of parity p born during a year t to all women who might give birth to a child of this parity during the year t. The formal definition is

$$\beta_t^{(p)} := \frac{b_t^{(p)}}{n^{f,(p-1)}_t}$$

Table 13.2-2 shows values for the denominator. For example, there are 117800 women born during the years 1902–31 and therefore aged between 16 and 45 in the year t = 1947. Of these women, 64498 had no child by the end of 1946 and might get a first child during the year 1947; 22644 women had exactly one child by the end of 1946 and might get a second child during the year 1947; and so on. These numbers are used to calculate $\beta_t^{(1)}$, $\beta_t^{(2)}$, $\beta_t^{(3)}$, and $\beta_t^{(4)}$.

Table 13.2-2 Number of women born in the years specified in the first column with p children before year t, calculated from the 1 % subsample of the 1970 census.

Birth years  t     n_t^{f,(0)}  n_t^{f,(1)}  n_t^{f,(2)}  n_t^{f,(3)}  n_t^{f,(4+)}  n_t^f
1902–1931    1947  64498        22644        16692        7801         6165          117800
1903–1932    1948  63745        23087        16948        7815         6070          117665
1904–1933    1949  62616        23566        17228        7922         5952          117284
1905–1934    1950  62028        23939        17577        8006         5799          117349
1906–1935    1951  61535        24556        17936        8072         5614          117713
1907–1936    1952  61266        24941        18282        8046         5481          118016
1908–1937    1953  61038        25180        18507        8044         5341          118110
1909–1938    1954  61260        25222        18705        8010         5198          118395
1910–1939    1955  61744        25261        18964        8004         5071          119044
1911–1940    1956  62201        25368        19141        7930         4977          119617
1912–1941    1957  62202        25500        19207        7947         4974          119830
1913–1942    1958  61162        25644        19272        8058         4954          119090
1914–1943    1959  60158        25783        19505        8111         4998          118555
1915–1944    1960  59127        25775        19826        8286         5060          118074
1916–1945    1961  57113        26194        20370        8505         5274          117456
1917–1946    1962  55757        26831        21139        8856         5666          118249
1918–1947    1963  54726        27284        22036        9303         6045          119394
1919–1948    1964  53949        27595        23035        9806         6416          120801
1920–1949    1965  53359        27438        23722        10113        6744          121376
1921–1950    1966  52479        27181        24024        10348        6901          120933
1922–1951    1967  51575        26672        24319        10654        7003          120223
1923–1952    1968  50899        26241        24690        10922        7082          119834
1924–1953    1969  50195        25837        25009        11143        7188          119372

The required numerators can be found in Table 13.2-1. For example,

$$\beta_{1947}^{(1)} = \frac{2963}{64498} = 0.0459$$

that is, about 4.6 % of women who might get a first marital child during 1947 actually realized this possibility.

5. In the same way, parity progression rates for p = 1, ..., 4 can be calculated for all years. The result is shown in Figure 13.2-2. It suggests that the decline of parity progression rates began somewhat earlier for parities 3 and 4 than for parities 1 and 2. Since there is only a very short time lag, however, it does not seem warranted to make this a substantial point.
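The simplified (age-free) parity progression rate only needs a numerator from Table 13.2-1 and a denominator from Table 13.2-2. The following minimal Python sketch holds the denominators of Table 13.2-2, checks that each row's parity columns add up to the total $n_t^f$, and reproduces the worked example $\beta_{1947}^{(1)} = 2963/64498$. The names `N_PRIOR`, `denominator`, and `progression_rate` are our own illustrative choices. Only the single numerator quoted in the text (2963 first births in 1947) is used; the other numerators would have to be taken from Table 13.2-1.

```python
# Table 13.2-2: women aged 16-45 in year t, by number of (marital) children
# born before year t. Key "k" holds n_t^{f,(k)}; "4+" holds n_t^{f,(4+)}.
YEARS = list(range(1947, 1970))
N_PRIOR = {
    "0": [64498, 63745, 62616, 62028, 61535, 61266, 61038, 61260, 61744,
          62201, 62202, 61162, 60158, 59127, 57113, 55757, 54726, 53949,
          53359, 52479, 51575, 50899, 50195],
    "1": [22644, 23087, 23566, 23939, 24556, 24941, 25180, 25222, 25261,
          25368, 25500, 25644, 25783, 25775, 26194, 26831, 27284, 27595,
          27438, 27181, 26672, 26241, 25837],
    "2": [16692, 16948, 17228, 17577, 17936, 18282, 18507, 18705, 18964,
          19141, 19207, 19272, 19505, 19826, 20370, 21139, 22036, 23035,
          23722, 24024, 24319, 24690, 25009],
    "3": [7801, 7815, 7922, 8006, 8072, 8046, 8044, 8010, 8004, 7930, 7947,
          8058, 8111, 8286, 8505, 8856, 9303, 9806, 10113, 10348, 10654,
          10922, 11143],
    "4+": [6165, 6070, 5952, 5799, 5614, 5481, 5341, 5198, 5071, 4977, 4974,
           4954, 4998, 5060, 5274, 5666, 6045, 6416, 6744, 6901, 7003, 7082,
           7188],
}
N_TOTAL = [117800, 117665, 117284, 117349, 117713, 118016, 118110, 118395,
           119044, 119617, 119830, 119090, 118555, 118074, 117456, 118249,
           119394, 120801, 121376, 120933, 120223, 119834, 119372]


def denominator(year, p):
    """n_t^{f,(p-1)}: women at risk of a parity-p birth in `year` (p = 1..4)."""
    return N_PRIOR[str(p - 1)][YEARS.index(year)]


def progression_rate(births_of_parity_p, year, p):
    """beta_t^(p) = b_t^(p) / n_t^{f,(p-1)}, ignoring age."""
    return births_of_parity_p / denominator(year, p)


def rows_add_up():
    """Check that the parity columns of every row sum to the total n_t^f."""
    return all(
        sum(col[i] for col in N_PRIOR.values()) == N_TOTAL[i]
        for i in range(len(YEARS))
    )


if __name__ == "__main__":
    print(rows_add_up())                              # True
    print(round(progression_rate(2963, 1947, 1), 4))  # 0.0459
```

The row check is a cheap safeguard when tables are transcribed by hand. Every woman of age 16–45 falls into exactly one parity class, so any mismatch would point to a copying error.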


[Figure 13.2-2: parity progression rates for p = 1, 2, 3, and 4; vertical axis 0 to 10, horizontal axis 1950 to 1970.]

Fig. 13.2-2 Parity progression rates of marital children in the period 1950–69, calculated from the data in Tables 13.2-1 and 13.2-2.

13.3

Understanding the Baby Boom

An instructive example of the fact that population growth depends not only on the number of newborn children but also on the timing of births and, especially, on women's age at childbearing, is the baby boom in West Germany during the period 1955–1965. It has been argued (e.g., by Dinkel, 1983) that this baby boom was mainly a consequence of the fact that women began childbearing at younger ages. The argument implies a comparison between the actual population growth and a hypothetical one that might have occurred if women had behaved differently. One therefore needs some kind of analytical model to make the argument fully explicit.

13.3.1

Number and Timing of Births

1. In order to develop a conceptual framework, we refer to birth cohorts of women denoted by $C^f_{t_0}$, $t_0$ being the birth year. This allows us to define age-specific cohort birth rates7

$$\gamma_{t_0,\tau} := \frac{b_{t_0+\tau,\tau}}{|C^f_{t_0,\tau}|}$$

where the denominator refers to the number of women belonging to the birth cohort $C^f_{t_0}$ at age τ, and the numerator records the number of children born to members of $C^f_{t_0}$ at age τ (in the year $t = t_0 + \tau$). Denoting the beginning and end of the reproductive period by $\tau_a$ and $\tau_b$, respectively, one can also define cumulated cohort birth rates

$$\bar\gamma_{t_0,\tau} := \sum_{j=\tau_a}^{\tau} \gamma_{t_0,j}$$
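A cumulated cohort birth rate is just a running sum of the age-specific rates. The minimal Python sketch below applies this to the rates of cohorts 1910 and 1920 as listed in Table 13.3.2-2, assuming $\tau_a = 16$ and $\tau_b = 45$. The values at ages 40 and 45 can be compared with columns (a) and (b) of Table 13.3.2-1. The names `GAMMA`, `cumulated`, and `CUM` are illustrative.

```python
from itertools import accumulate

AGES = list(range(16, 46))  # tau_a = 16, tau_b = 45

# Age-specific cohort birth rates gamma_{t0,tau}, ages 16..45, from
# Table 13.3.2-2 (marital births, 1 % subsample of the 1970 census).
GAMMA = {
    1910: [0.00113, 0.00362, 0.01381, 0.02196, 0.03917, 0.04868, 0.05479,
           0.07652, 0.09894, 0.11388, 0.12859, 0.12859, 0.13493, 0.13991,
           0.12973, 0.11546, 0.08671, 0.08331, 0.07720, 0.04369, 0.04143,
           0.04369, 0.03781, 0.03600, 0.02671, 0.02151, 0.01381, 0.00906,
           0.00657, 0.00340],
    1920: [0.00108, 0.00627, 0.01901, 0.02874, 0.05251, 0.07044, 0.07260,
           0.09118, 0.11322, 0.07952, 0.09140, 0.09767, 0.10415, 0.11279,
           0.10004, 0.09248, 0.08816, 0.08038, 0.06806, 0.05985, 0.05013,
           0.04689, 0.04019, 0.03068, 0.02139, 0.01707, 0.01037, 0.00475,
           0.00583, 0.00194],
}


def cumulated(rates):
    """Return {tau: gamma_bar_{t0,tau}}, the running sum over ages tau_a..tau."""
    return dict(zip(AGES, accumulate(rates)))


CUM = {t0: cumulated(rates) for t0, rates in GAMMA.items()}

if __name__ == "__main__":
    for t0, cum in CUM.items():
        # age 40 -> column (a), age 45 -> column (b) of Table 13.3.2-1
        print(t0, f"{cum[40]:.4f}", f"{cum[45]:.4f}")
    # 1910 1.7263 1.7806
    # 1920 1.6188 1.6588
```

The completed rates 1.7806 and 1.6588 are the values of $\bar\gamma_{1910,45}$ and $\bar\gamma_{1920,45}$ used in the text.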

2. These concepts can be used to compare childbearing among birth cohorts of women and, in these comparisons, to distinguish between the number of children born and the timing of childbearing. The first aspect is captured by the completed cohort birth rate, $\bar\gamma_{t_0,\tau_b}$; the second aspect is captured by the shape of the function $\tau \longrightarrow \bar\gamma_{t_0,\tau}$.

As an example, we use data from the 1 % subsample of the 1970 census discussed in Chapter 12.2. Figure 13.3.1-1 compares birth rates of the cohorts $t_0 = 1910$ and $t_0 = 1920$. The topmost plot (a) compares the cumulated cohort birth rates $\bar\gamma_{1910,\tau}$ (solid line) and $\bar\gamma_{1920,\tau}$ (dotted line). Assuming $\tau_b = 45$, it is seen that cohort C20 has a somewhat lower completed cohort birth rate than cohort C10. Values can be calculated from Table 12.2.3-1 in Section 12.2.3: $\bar\gamma_{1910,45} = 1.7806$ and $\bar\gamma_{1920,45} = 1.6588$. Furthermore, there is also a somewhat different timing of births. Compared with C10, relatively more women belonging to C20 gave birth to children at ages under 25.

7 These notions have been introduced in Section 11.2.

[Figure 13.3.1-1: (a) cumulated cohort birth rates of C10 and C20; (b) actual and hypothetical cumulated cohort birth rates of C20; (c) actual and hypothetical age-specific birth rates of C20. Horizontal axes: age 15 to 50.]

Fig. 13.3.1-1 Comparison of age-specific cohort birth rates; see the text for explanation.

3. Of course, it would be strange to say that these women hastened to realize births that they anticipated having anyway.8 Nevertheless, in order to distinguish conceptually between the number and the timing of births, one cannot avoid adopting a retrospective view and taking completed cohort birth rates as given. In our example, this allows us to construct, for birth cohort C20, hypothetical cumulated cohort birth rates which have the same shape as the cumulated cohort birth rates of C10 but keep the original completed cohort birth rate of C20. The following definition shows the construction:

$$\bar\gamma^*_{1920,\tau} := \bar\gamma_{1910,\tau}\,\frac{\bar\gamma_{1920,45}}{\bar\gamma_{1910,45}}$$

Part (b) of Figure 13.3.1-1 compares $\bar\gamma_{1920,\tau}$ (solid line) and $\bar\gamma^*_{1920,\tau}$ (dotted line). Without changing the completed cohort birth rate, part of the births are "shifted" into higher ages. This is also seen in part (c) of the figure, where the solid line refers to the age-specific birth rates $\gamma_{1920,\tau}$ and the dotted line refers to the corresponding hypothetical birth rates

$$\gamma^*_{1920,\tau} := \gamma_{1910,\tau}\,\frac{\bar\gamma_{1920,45}}{\bar\gamma_{1910,45}} \tag{13.3.1}$$

4. Finally, one can compare the actual with a hypothetical development of births. The actual development is given by the equation

$$b_t = \sum_{\tau=\tau_a}^{\tau_b} b_{t,\tau} = \sum_{\tau=\tau_a}^{\tau_b} \gamma_{t-\tau,\tau}\,|C^f_{t-\tau,\tau}|$$

which shows how the number of births in year t derives from the surviving cohort members, $|C^f_{t-\tau,\tau}|$, and the cohort birth rates, $\gamma_{t-\tau,\tau}$, of all birth cohorts $t - \tau$ ($\tau = \tau_a, \ldots, \tau_b$). The idea now is to compare this actual development of births with a hypothetical development defined by

$$b^*_t := \sum_{\tau=\tau_a}^{\tau_b} \gamma^*_{t-\tau,\tau}\,|C^f_{t-\tau,\tau}| \tag{13.3.2}$$

$b^*_t$ would be the number of children born in year t if the childbearing of the women who might contribute to these births followed the modified birth rates $\gamma^*_{t-\tau,\tau}$ instead of the actually realized birth rates $\gamma_{t-\tau,\tau}$. Of course, in order to define the modified birth rates one needs to refer to one birth cohort whose timing of births provides a reference. In the example above, the cohort of women born in 1910 was used to define the reference. It is quite possible, however, that the results of a comparison between $b_t$ and $b^*_t$ also depend on the choice of the reference cohort.

8 Actually, the whole argument is in statistical terms and does not relate to the behavior of individual women; and, as was discussed in Section 3.4, one also cannot sensibly speak of the behavior of a cohort.
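Formulas (13.3.1) and (13.3.2) can be sketched in a few lines of Python. The sketch generalizes (13.3.1) in the obvious way: every cohort $t_0$ borrows the age profile of one reference cohort while keeping its own completed rate. It also assumes, as a simplification in the spirit of Section 13.3.2, that a single size per cohort stands in for $|C^f_{t-\tau,\tau}|$ at every age. The function names are our own, and the numbers in `REF`, `COMPLETED`, and `SIZES` are invented toy values, not data from this text. In the actual simulation the reference rates would come from Table 13.3.2-2, and the completed rates and cohort sizes from Table 13.3.2-1.

```python
def hypothetical_rates(ref_rates, target_completed):
    """Formula (13.3.1), written for an arbitrary cohort t0:
    gamma*_{t0,tau} = gamma_{ref,tau} * gamma_bar_{t0,tau_b} / gamma_bar_{ref,tau_b}.

    ref_rates        : {tau: gamma_{ref,tau}} for tau = tau_a..tau_b
    target_completed : completed cohort birth rate gamma_bar_{t0,tau_b}
    """
    ref_completed = sum(ref_rates.values())  # gamma_bar_{ref,tau_b}
    factor = target_completed / ref_completed
    return {tau: g * factor for tau, g in ref_rates.items()}


def hypothetical_births(t, ref_rates, completed, sizes):
    """Formula (13.3.2): b*_t = sum over tau of gamma*_{t-tau,tau} * |C^f_{t-tau,tau}|.

    completed : {t0: completed cohort birth rate of cohort t0}
    sizes     : {t0: cohort size}; one size per cohort, i.e. mortality
                during the reproductive period is ignored (simplification)
    """
    ref_completed = sum(ref_rates.values())
    return sum(
        g_ref * completed[t - tau] / ref_completed * sizes[t - tau]
        for tau, g_ref in ref_rates.items()
    )


# Hypothetical toy input: reproductive period 20..22, three cohorts.
REF = {20: 0.2, 21: 0.5, 22: 0.3}
COMPLETED = {1980: 2.0, 1981: 1.0, 1982: 1.5}
SIZES = {1980: 100, 1981: 200, 1982: 100}

if __name__ == "__main__":
    print(round(hypothetical_births(2002, REF, COMPLETED, SIZES), 6))  # 190.0
```

For t = 2002, the three contributing cohorts are 1982, 1981, and 1980, so $b^*_{2002} = 0.2 \cdot 1.5 \cdot 100 + 0.5 \cdot 1.0 \cdot 200 + 0.3 \cdot 2.0 \cdot 100 = 190$. By construction, each cohort's hypothetical rates still sum to its own completed rate; only the timing is borrowed from the reference cohort.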

Table 13.3.2-1 Calculation of completed cohort birth rates for birth cohorts 1905–1954. See the text for explanations.

Birth year  Cohort size  (a)     (b)     (c)     (d)     (e)
1905        3926         1.6533  1.7068  –       –       1.8775
1906        4045         1.6485  1.7122  –       –       1.8834
1907        4208         1.7533  1.8118  –       –       1.9930
1908        4430         1.6804  1.7370  –       –       1.9107
1909        4395         1.7192  1.7738  –       –       1.9512
1910        4417         1.7263  1.7806  –       –       1.9587
1911        4349         1.7211  1.7754  –       –       1.9529
1912        4511         1.7092  1.7610  –       –       1.9371
1913        4377         1.6943  1.7482  –       –       1.9230
1914        4301         1.6894  1.7410  –       –       1.9151
1915        3403         1.6753  1.7338  –       –       1.9072
1916        2578         1.6427  1.6889  –       –       1.8578
1917        2428         1.6512  1.7002  –       –       1.8702
1918        2465         1.6730  1.7233  –       –       1.8956
1919        3595         1.6487  1.6918  –       –       1.8610
1920        4628         1.6188  1.6588  –       –       1.8247
1921        4680         1.6496  1.7013  –       –       1.8714
1922        4442         1.6729  1.7226  –       –       1.8949
1923        4265         1.6980  1.7496  –       –       1.9246
1924        4058         1.7287  1.7740  –       –       1.9514
1925        4162         1.6975  –       1.7416  –       1.9158
1926        3998         1.7161  –       1.7607  –       1.9368
1927        3863         1.8532  –       1.9014  –       2.0915
1928        3923         1.8463  –       1.8943  –       2.0837
1929        3762         1.9054  –       1.9549  –       2.1504
1930        3845         1.9116  –       1.9613  2.1395  –
1931        3535         –       –       –       2.1623  –
1932        3444         –       –       –       2.1993  –
1933        3280         –       –       –       2.2244  –
1934        4036         –       –       –       2.2395  –
1935        4290         –       –       –       2.1721  –
1936        4348         –       –       –       2.1347  –
1937        4302         –       –       –       2.1079  –
1938        4715         –       –       –       2.0695  –
1939        5044         –       –       –       2.0243  –
1940        4990         –       –       –       1.9708  –
1941        4562         –       –       –       1.9025  –
1942        3771         –       –       –       1.8490  –
1943        3842         –       –       –       1.8089  –
1944        3820         –       –       –       1.7771  –
1945        2785         –       –       –       1.7746  –
1946        3371         –       –       –       1.7791  –
1947        3573         –       –       –       1.7513  –
1948        3872         –       –       –       1.7286  –
1949        4170         –       –       –       1.7145  –
1950        4185         –       –       –       1.7003  –
1951        3970         –       –       –       1.6578  –
1952        4053         –       –       –       1.6464  –
1953        3803         –       –       –       1.6287  –
1954        4012         –       –       –       1.6057  –
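Item 2 of Section 13.3.2 explains how columns (c) and (e) of Table 13.3.2-1 are constructed: column (c) is column (a) times 1.026, and column (e) is column (b) or (c) times 1.1. The minimal Python sketch below checks this arithmetic for a few rows, assuming that a value "agrees" if it rounds to the printed four-decimal entry. The row selection and the names `ROWS`, `agrees`, and `check_row` are our own.

```python
import math

# Selected rows of Table 13.3.2-1 (birth year -> tabulated columns).
ROWS = {
    1905: {"a": 1.6533, "b": 1.7068, "e": 1.8775},
    1920: {"a": 1.6188, "b": 1.6588, "e": 1.8247},
    1925: {"a": 1.6975, "c": 1.7416, "e": 1.9158},
    1929: {"a": 1.9054, "c": 1.9549, "e": 2.1504},
}
AGE40_TO_45 = 1.026   # column (c) = column (a) * 1.026 (cohorts 1925-30)
MARITAL_TO_ALL = 1.1  # column (e) = column (b) or (c) * 1.1


def agrees(computed, tabulated):
    """True if `computed` rounds to the 4-decimal value printed in the table."""
    return math.isclose(computed, tabulated, rel_tol=0.0, abs_tol=5e-5)


def check_row(row):
    checks = []
    if "c" in row:  # completed rate extrapolated from the age-40 value
        checks.append(agrees(row["a"] * AGE40_TO_45, row["c"]))
    marital = row["b"] if "b" in row else row["c"]
    checks.append(agrees(marital * MARITAL_TO_ALL, row["e"]))
    return all(checks)


if __name__ == "__main__":
    for year, row in ROWS.items():
        print(year, check_row(row))  # True for every row listed
```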

13.3.2

Performing the Calculations

1. We now try to compare $b_t$ and (different versions of) $b^*_t$ in the period 1950–1970 in the territory of the former FRG. Values of $b_t$ are available from official statistics (see Table 6.3-1 in Section 6.3). In order to find values of $b^*_t$, one needs to refer to all birth cohorts of women who contributed to the births in the period 1950–1970. Assuming a reproductive period from age 16 to age 45, cohort birth years range from 1905 to 1954. For each birth cohort we need values for the cohort size in 1950 and the completed cohort birth rate. Since appropriate data are not directly available from official period statistics, we try to find approximately valid quantities from the 1 % subsample of the 1970 census discussed in Chapter 12.2.

2. We assume that cohort sizes in 1950 are approximately proportional to the number of women, born in the years from 1905 to 1954, who were still alive at the census date in 1970. These numbers, taken from the 1 % subsample of the census, are shown in the second column of Table 13.3.2-1. Completed cohort birth rates are more difficult to approximate. Cumulated cohort birth rates that can be calculated from the subsample of the 1970 census only refer to marital births. Moreover, assuming $\tau_b = 45$, completed marital cohort birth rates can only be calculated for cohorts 1905–1924. They are shown in column (b) of Table 13.3.2-1. In order to extend the period, one can use the fact that cumulated cohort birth rates at age 45 are only slightly larger than at age 40. This is seen in Table 13.3.2-1 by comparing column (b) with column (a), which shows the cumulated cohort birth rates up to an age of 40. The entries for birth cohorts 1925–30, shown in column (c) of the table, have been calculated by simply multiplying the entries in column (a) by 1.026. Beginning with birth cohort 1930, official period statistics allow us to calculate completed quasi-cohort birth rates. Still assuming that $\tau_b = 45$, they are shown in column (d) of Table 13.3.2-1.9 Finally, since these values refer to all births, one needs an adjustment of the completed cohort birth rates calculated from the 1 % subsample of the 1970 census, which only refer to marital births. Assuming a proportion of about 10 % non-marital births, we have simply multiplied the entries in columns (b) and (c) by the factor 1.1 in order to get the entries in column (e). The values in columns (d) and (e) will then be used in the following simulations.

9 Data are taken from Fachserie 1, Reihe 1, 1999 (pp. 198–200). See also the discussion of these data in Section 11.4.

Table 13.3.2-2 Age-specific cohort birth rates for birth cohorts 1910, 1920, and 1930.

τ      1910     1920     1930
16     0.00113  0.00108  0.0021
17     0.00362  0.00627  0.0100
18     0.01381  0.01901  0.0289
19     0.02196  0.02874  0.0527
20     0.03917  0.05251  0.0748
21     0.04868  0.07044  0.0968
22     0.05479  0.07260  0.1142
23     0.07652  0.09118  0.1253
24     0.09894  0.11322  0.1349
25     0.11388  0.07952  0.1394
26     0.12859  0.09140  0.1459
27     0.12859  0.09767  0.1491
28     0.13493  0.10415  0.1418
29     0.13991  0.11279  0.1365
30     0.12973  0.10004  0.1239
31     0.11546  0.09248  0.1136
32     0.08671  0.08816  0.0989
33     0.08331  0.08038  0.0895
34     0.07720  0.06806  0.0787
35     0.04369  0.05985  0.0656
36     0.04143  0.05013  0.0564
37     0.04369  0.04689  0.0450
38     0.03781  0.04019  0.0361
39     0.03600  0.03068  0.0276
40     0.02671  0.02139  0.0197
41     0.02151  0.01707  0.0143
42     0.01381  0.01037  0.0085
43     0.00906  0.00475  0.0051
44     0.00657  0.00583  0.0027
45     0.00340  0.00194  0.0013
Total  1.7806   1.6588   2.1395

[Figure 13.3.2-1: actual births and hypothetical developments labelled C1910, C1920, and C1930; vertical axis 0 to 1000, horizontal axis 1950 to 1970.]

Fig. 13.3.2-1 Number of births (in 1000) in the territory of the former FRG (solid line) and hypothetical developments (dotted lines) which assume a timing of births according to birth cohorts 1910, 1920, and 1930, respectively.

3. A further question concerns the birth cohort to be used as a reference for the assessment of timing effects. Since simulation results might well depend on the choice of a reference cohort, we perform the calculations separately for three reference cohorts with birth years 1910, 1920, and 1930, respectively. The age-specific birth rates for these cohorts that we have used for the simulations are shown in Table 13.3.2-2. For birth cohorts 1910 and 1920, the rates refer to marital births and are calculated from the 1 % subsample of the 1970 census. For birth cohort 1930 they are taken from official period statistics (Fachserie 1, Reihe 1, 1999, p. 198) and refer to all births. This difference may be neglected, however, because in the simulation the age-specific rates are only used to provide a standard shape for the timing of childbearing. In order to calculate hypothetical birth rates with formula (13.3.1), one only needs to use the appropriate completed cohort birth rates as shown in the last row of Table 13.3.2-2.

4. So we finally have at least some approximations for all values required to calculate hypothetical developments of births with formula (13.3.2). The result is shown in Figure 13.3.2-1. The solid line shows the actual number of births in the period 1950–1970.10 The dotted lines show corresponding hypothetical developments. Since the calculation is based on a 1 % subsample of the 1970 census, the simulated figures have been multiplied by 100 in order to make the hypothetical developments roughly comparable with the actual development. However, regardless of the exact level, it is clearly seen that the development of births would have been quite different without the changes in the timing of births which actually occurred. In fact, the plot suggests that the baby boom that occurred in the period 1955–65 can mainly be attributed to changes in the timing of childbearing. It is also remarkable that the hypothetical developments, on the whole, do not depend on the birth cohort that is used to provide a shape for the timing of births. This is consistent with the fact, discussed in Section 11.4, that substantial shifts of births towards younger ages occurred in cohorts with birth years roughly between 1930 and 1945.

10 Data are taken from Table 6.3-1 in Section 6.3.

13.3.3

Extending the Simulation Period

This section is not finished yet.


Chapter 14

Data from Non-official Surveys

As was mentioned in Section 11.3, if one wants to investigate the timing and distribution of birth events, data from official statistics are of only limited use. A closer investigation requires data that allow birth events to be related to women's life courses. Such data can be gathered with retrospective surveys in which women are asked about the birth dates of their children. One example, a subsample of the 1970 census, has been discussed in the two preceding chapters. In addition, several non-official surveys are available that provide data on childbearing histories.1 In the present chapter we consider data from the following non-official surveys that, in particular, provide information about the number and birth dates of children:
• the German Life History Study (GLHS),
• the Socio-economic Panel (SOEP),
• the Fertility and Family Survey (FFS), and
• the DJI Family Survey (DJIFS).
The main questions to be discussed in the present chapter concern age at first childbearing, the proportion of childless women, and the distribution of the number of children. We also calculate cumulated cohort birth rates to allow comparisons with data from official statistics.

1 We speak of non-official surveys in order to signify that these surveys are conducted, not by official statistics, but by a variety of institutions of social research. Additional differences depend on circumstances. Most often the sample size of non-official surveys is much smaller than the sample size of official surveys. Furthermore, while some official surveys (e.g., the Mikrozensus) are based on an obligation to give information, participation in non-official surveys is always a matter of free decision. Consequently, there is often a substantial proportion of non-respondents in non-official surveys; see, e.g., Porst (1996).

14.1

German Life History Study

1. The German Life History Study (GLHS) is a long-term project conducted by the Max Planck Institute for Human Development (Berlin). The main data source of this project is a series of retrospective surveys in which members of selected birth cohorts were asked to provide detailed information about their life courses. Part of these data are available to the general scientific public:2
a) Data from the first survey (LV I) were sampled during the years 1981–83 and included 2171 members of the birth cohorts 1929–31, 1939–41, and 1949–51.
b) Data from a second survey (LV II) were sampled in two parts, both relating to persons born in the years 1919–21; a first part was conducted in 1985–86 and included 407 persons (LV IIA), a second part was conducted in 1987–88 and included 1005 persons (LV IIT).
c) Data from a third survey (LV III) were sampled in 1989 and included 2008 members of the birth cohorts 1954–56 and 1959–61.
All surveys were conducted in the territory of the former FRG. For our present study we take into account all female respondents from the surveys LV I, LV IIT, and LV III (only cohort 1959–61). The case numbers and their distribution over the five cohorts are shown in the following table:3

Birth cohort  Birth years  Male  Female  Interview date
C20           1919–21      373   632     1987–88
C30           1929–31      349   359     1981–83
C40           1939–41      375   355     1981–83
C50           1949–51      365   368     1981–83
C60           1959–61      512   489     1989

We also mention that all members of our subsample have German citizenship. In the remainder of this section we use this data set to investigate changes in the distribution of ages at first childbearing and in the number of children across the five birth cohorts.4

Age at First Childbearing

2. Denoting our subsample of the GLHS by Ω, we can define a three-dimensional variable

$$(\tilde C, \tilde T, \tilde D) : \Omega \longrightarrow \mathcal{C} \times \mathcal{T} \times \mathcal{D}$$

2 For an overview, see Wagner (1996). The data are available from the Zentralarchiv für empirische Sozialforschung (Köln). We thank Karl Ulrich Mayer, the director of the GLHS, for the permission to use the data sets.
3 Of the 632 women of birth cohort C20, three did not give valid birth years for their children; they are excluded from further calculations.
4 We mention that the GLHS data have already been used in quite a large number of earlier studies. Concerning the questions of the present section, see, in particular, Huinink (1987, 1988, 1989), Blossfeld and Huinink (1989), and Tuma and Huinink (1990).


Table 14.1-1 Age at ﬁrst childbearing in our GLHS subsample.
C20 τ 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 50 51 52 53 66 67 68 69 Total d=1 d=0 1 2 11 21 28 37 60 60 68 41 35 28 28 22 15 15 7 10 7 5 7 5 3 1 3 C30 d=1 d=0 C40 d=1 d=0 C50 d=1 d=0 3 4 10 30 34 36 26 22 16 13 20 22 14 12 13 3 3 1 C60 d=1 d=0 1 3 6 10 16 14 22 24 17 21 27 26 35 27 8 1

1 5 16 23 23 29 22 21 32 32 31 16 17 16 7 11 5 8 3 1 1

5 15 21 21 25 36 40 37 21 23 13 21 6 10 5 8 3 2 2 2

21 22 35 8
12 85 80 54
7 11 13 8
1
9 16 7 6 4 47 34 24 520 109 321 38 316 39 282 86 258 231

Here $\tilde C$, with property space $\mathcal{C} := \{C20, \ldots, C60\}$, records the birth cohort; $\tilde D$, with property space $\mathcal{D} := \{0, 1\}$, records whether a woman has given birth to at least one child;5 and $\tilde T$, with property space $\mathcal{T} := \{0, 1, 2, \ldots\}$, records the age of the woman which, depending on the value of $\tilde D$, is the age of first childbearing (if $\tilde D(\omega) = 1$) or the age in the interview year (if $\tilde D(\omega) = 0$). The distribution of this three-dimensional variable, in terms of absolute frequencies, is shown in Table 14.1-1.6 For example, there are 68 women in birth cohort C20 who gave birth to a first child at age 24, 41 at age 25, and so on. In total, 520 women of this birth cohort had at least one child, and 109 remained childless.

3. The data from Table 14.1-1 can be used to estimate distributions of the age at first childbearing. We use the formal framework introduced in Chapter 12 and refer to a duration variable

$$\hat T_c : \Omega_c \longrightarrow \mathcal{T} := \{0, 1, 2, 3, \ldots\}$$

where the index c specifies one of the birth cohorts in our sample. Since each birth cohort comprises three birth years, and the interviews extend over up to three years, the censoring times also extend over several ages. However, as seen from Table 14.1-1, for birth cohorts C20, C30, and C40, censoring only occurs after the last observed event (first childbearing). For these birth cohorts, the data can therefore be used directly to calculate a frequency distribution of $\hat T_c$:

$$\hat P[\hat T_c](\tau) = \frac{|\{\omega \in \Omega_c \mid \hat T_c(\omega) = \tau\}|}{|\Omega_c|}$$

For example, referring to birth cohort C20, one immediately finds

$$\hat P[\hat T_{C20}](25) = \frac{41}{629} = 0.065$$

that is, 6.5 % of the members of C20 gave birth to a first child at age 25. These values can then be used for the calculation of distribution functions, survivor functions, and rate functions.

4. The situation is slightly different for birth cohorts C50 and C60, where event times and censoring times overlap in some years. To illustrate, we refer to birth cohort C50. Obviously, for ages under 30, one can calculate frequencies directly. For example, for τ = 25, one gets

$$\hat P[\hat T_{C50}](25) = \frac{13}{368} = 0.035$$

5 The GLHS allows one to distinguish women's own children, stepchildren, and adoptive children. For the present investigation we only take into account women's own children.

However, this direct calculation is no longer possible for ages τ ≥ 30. We therefore use the Kaplan-Meier procedure introduced in Section 8.3.4. Table 14.1-2 illustrates the calculations for birth cohort C50. Notice that, until age 29, results are identical with those from a direct calculation of frequencies.
6 Note that the ages are not contiguous because the table refers to the realized property spaces. Note also that birth cohort C20 only contains 629 members because we have excluded three cases with unknown birth years of children.


Table 14.1-2 Kaplan-Meier procedure to calculate the survivor function for the age at first childbearing. Data refer to cohort C50 in Table 14.1-1.

τ    at risk  events  censored  rate    1 – rate  survivor function
16   368      3       0         0.0082  0.9918    1.0000
17   365      4       0         0.0110  0.9890    0.9918
18   361      10      0         0.0277  0.9723    0.9809
19   351      30      0         0.0855  0.9145    0.9537
20   321      34      0         0.1059  0.8941    0.8722
21   287      36      0         0.1254  0.8746    0.7798
22   251      26      0         0.1036  0.8964    0.6820
23   225      22      0         0.0978  0.9022    0.6114
24   203      16      0         0.0788  0.9212    0.5516
25   187      13      0         0.0695  0.9305    0.5081
26   174      20      0         0.1149  0.8851    0.4728
27   154      22      0         0.1429  0.8571    0.4185
28   132      14      0         0.1061  0.8939    0.3587
29   118      12      0         0.1017  0.8983    0.3206
30   106      13      21        0.1226  0.8774    0.2880
31   72       3       22        0.0417  0.9583    0.2527
32   47       3       35        0.0638  0.9362    0.2422
33   9        1       8         0.1111  0.8889    0.2267
34                                                0.2015
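The calculation in Table 14.1-2 can be reproduced with a short loop. This is a minimal Python sketch that follows the convention visible in the table: women censored at age τ still count as at risk at τ and leave the risk set afterwards, and the survivor value listed at age τ is the product of (1 − rate) over all earlier ages. The function name `kaplan_meier` and its parameters are our own illustrative choices.

```python
def kaplan_meier(n_start, first_age, events, censored):
    """Survivor function for the age at first childbearing.

    n_start  : women at risk at first_age
    events   : first births at ages first_age, first_age + 1, ...
    censored : childless women whose observation ends at each age; they are
               still counted as at risk at that age (as in Table 14.1-2)
    Returns {age: S(age)}, the proportion still childless at the start of age.
    """
    surv, at_risk = 1.0, n_start
    result = {}
    for i, (d, c) in enumerate(zip(events, censored)):
        result[first_age + i] = surv
        surv *= 1.0 - d / at_risk  # multiply by (1 - rate)
        at_risk -= d + c           # events and censored cases leave the risk set
    result[first_age + len(events)] = surv
    return result


# Cohort C50, ages 16..33 (Table 14.1-2)
EVENTS = [3, 4, 10, 30, 34, 36, 26, 22, 16, 13, 20, 22, 14, 12, 13, 3, 3, 1]
CENSORED = [0] * 14 + [21, 22, 35, 8]

S = kaplan_meier(368, 16, EVENTS, CENSORED)

if __name__ == "__main__":
    print(f"{S[30]:.4f}")  # 0.2880
    print(f"{S[34]:.4f}")  # 0.2015
```

Before age 30 no censoring occurs, so the product telescopes to a plain frequency. For example, $S(25) \cdot 13/187 = (187/368)(13/187) = 13/368$, which is the direct value $\hat P[\hat T_{C50}](25)$ given in the text.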

5. The survivor functions for all five birth cohorts are shown in Figure 14.1-1. Several points are remarkable.
a) Until an age of about 27, the distribution for cohort C30 is quite similar to the distribution for cohort C20. After this age, that is, beginning at the end of the nineteen-fifties, a substantially greater proportion of the women belonging to cohort C30 gave birth to a child. Eventually, the proportion of childless women is considerably smaller in C30 than in C20.
b) Compared with C30, members of birth cohort C40 begin childbearing at younger ages, but overall, both distributions are quite similar. In particular, in both cohorts a high proportion of women, about 90 %, have at least one child.
c) Like the members of C40, the members of C50 also begin childbearing at younger ages. However, beginning in the mid-sixties, birth rates begin to decline, and it might be supposed that the proportion of women who eventually remain childless will be substantially greater than it was in the two preceding cohorts.
d) Finally, members of birth cohort C60 delay the birth of a first child, and although the data do not allow definite conclusions, it seems quite possible that the proportion of finally childless women will again be greater than in the preceding cohorts.

[Figure 14.1-1: survivor functions for cohorts C20, C30, C40, C50, and C60; vertical axis 0 to 1, horizontal axis age 15 to 45.]

Fig. 14.1-1 Distribution of age at first childbearing described by survivor functions, calculated from the data in Table 14.1-1.



Table 14.1-3 Number of children in the GLHS subsample, classiﬁed with respect to mother’s birth cohort and age (τ ). τ 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 Total C20 1 2 11 25 34 44 72 80 105 77 68 76 67 54 57 64 41 41 42 36 39 29 19 18 10 10 3 2 3 1 1 1132 C30 C40 C50 3 4 11 30 40 48 40 44 32 37 42 50 41 35 37 15 9 1 C60 1 3 6 11 17 18 26 33 27 34 56 44 67 46 13 2

[Figure 14.1-2: cumulated cohort birth rates for cohorts C20, C30, C40, C50, and C60; vertical axis 0 to 2, horizontal axis age 15 to 50.]

Fig. 14.1-2 Cumulated cohort birth rates calculated from the data in Table 14.1-3.

The results can be compared with the distribution of age at first marital childbearing. This was already done in Section 12.2.2.

Number of Children

6. The next step is to investigate the number of children born to women in the GLHS subsample. We begin with the calculation of cumulated cohort birth rates. Table 14.1-3 shows the data. For example, 44 women belonging to birth cohort C20 gave birth to a child at age 21. The following table shows the cumulated cohort birth rates, denoted by CCBR(τ), up to the age τ specified in the second column:
Cohort  τ   CCBR(τ)  CCBR*(τ)
C20     45  1.80     –
C30     43  2.19     2.15
C40     40  1.99     1.96
C50     31  1.38     1.39
C60     29  0.82     0.99

1 5 16 26 26 44 37 44 49 53 66 48 62 63 35 43 35 33 26 22 17 15 7 2 7 5 1 1

5 16 22 34 33 48 60 67 64 56 56 54 35 39 23 23 22 13 14 11 5 1 4 2 2

789

709

519

404
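A cumulated cohort birth rate is the number of births up to and including age τ, divided by the number of women in the cohort. The following minimal Python sketch recomputes two entries of the CCBR(τ) table from the C20 and C50 columns of Table 14.1-3. The age alignment is our own reading of the flattened table and should be treated as an assumption: both columns are taken to start at age 16, running to 46 for C20 and to 33 for C50. This reading reproduces the 44 births at age 21 quoted for C20 and the tabulated CCBR(31) = 1.38 for C50. The cohort sizes are 629 for C20 (after excluding three cases) and 368 for C50. The names `BIRTHS`, `WOMEN`, and `ccbr` are illustrative.

```python
# Births by mother's age in cohorts C20 and C50, read from Table 14.1-3.
# ASSUMPTION: both columns start at age 16 (our reading of the flattened table).
BIRTHS = {
    "C20": (16, [1, 2, 11, 25, 34, 44, 72, 80, 105, 77, 68, 76, 67, 54, 57,
                 64, 41, 41, 42, 36, 39, 29, 19, 18, 10, 10, 3, 2, 3, 1, 1]),
    "C50": (16, [3, 4, 11, 30, 40, 48, 40, 44, 32, 37, 42, 50, 41, 35, 37,
                 15, 9, 1]),
}
# Number of women per cohort (C20 without the three excluded cases).
WOMEN = {"C20": 629, "C50": 368}


def ccbr(cohort, tau):
    """Cumulated cohort birth rate: births up to and including age tau,
    divided by the number of women in the cohort."""
    first_age, counts = BIRTHS[cohort]
    return sum(counts[: tau - first_age + 1]) / WOMEN[cohort]


if __name__ == "__main__":
    print(f"{ccbr('C20', 45):.2f}")  # 1.80
    print(f"{ccbr('C50', 31):.2f}")  # 1.38
```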

The final column, labeled CCBR*(τ), shows corresponding cumulated cohort birth rates calculated from official statistics.7 Except for the youngest cohort, the rates are surprisingly similar. The difference for the youngest cohort is possibly due to the fact that the official statistics also include births of immigrants.

7. Figure 14.1-2 presents a graphical view of the cumulated cohort birth rates. It is remarkable that we do not find a simple relationship between age at first childbearing and completed cohort birth rates. This can be seen, for example, by comparing cohorts C30 and C40. Although members of C40 begin childbearing at younger ages than members of C30 (see Figure 14.1-1), the completed cohort birth rate is lower for C40 than for C30. Of course, a delay of childbearing might be accompanied by a decline in the total number of births; this will probably be true for cohort C60. However, a decline of birth rates cannot be explained by simply referring to changes in the distribution of ages at first childbearing.8

8. Cumulated and completed cohort birth rates provide information about the total number of children born, but not about the distribution of the number of children. So we should finally also look at the number of births per woman. The data are shown in Table 14.1-4. Since members of birth cohorts C50 and C60 had not reached the end of the reproductive period by the time the interviews were conducted, an interpretation should be confined to the cohorts C20, C30, and C40.
a) Compared with C20, more women of C30 gave birth to at least one child. Moreover, the proportion of women with only one child declined, resulting in an increase of the mean number of children per mother, from 2.2 in C20 to 2.5 in C30. The substantial increase in the completed cohort birth rate is therefore a result of both the decline in the proportion of childless women and the increase in the mean number of children per mother.
b) The proportion of childless women in C40 remains roughly the same as it was in C30. There is, however, a tendency to reduce the number of children per mother. In particular, the proportion of women with four or more children declines while the proportion of women with two children increases. The result is a decline of the mean number of children per mother, from 2.5 in C30 to 2.3 in C40, and consequently also a decline in the completed cohort birth rate.
It remains to be investigated how these tendencies continued in younger birth cohorts. We already know from official statistics that the completed cohort birth rates continued to decline at least until birth cohort C60 (see Section 11.4). However, the data in our GLHS subsample do not allow us to identify the changes in the distribution of children from which this tendency results.

7 These are mean values of the year-specific rates published in Fachserie 1, Reihe 1 (1999, pp. 198–200). No official data are available for C20; for C30, the mean value refers to the years 1930 and 1931.
8 See also the discussion in Section 12.2.5.

Table 14.1-4 Number of women with 0, 1, 2, 3, 4, and 5 or more children, calculated from the data in the GLHS subsample. Percentage values relate to all women in each of the cohorts who have at least one child. Percentage values in brackets provide the proportion of finally childless women.

           C20           C30           C40           C50           C60
Children   N     %       N     %       N     %       N     %       N     %
0          109   (17)    38    (11)    39    (11)    86    –       231   –
1          185   35.6    75    23.4    78    24.7    106   37.6    145   56.2
2          168   32.3    126   39.3    139   44.0    134   47.5    86    33.3
3          104   20.0    61    19.0    64    20.3    30    10.6    21    8.1
4          40    7.7     36    11.2    23    7.3     7     2.5     6     2.3
≥5         23    4.4     23    7.2     12    3.8     5     1.8     –     –

14.2

Socio-economic Panel

1. Our second data source is the Socio-economic Panel (SOEP), already introduced in Section 8.4. In the present section we discuss data from the second wave (1985), in which participants were asked about their children and the children's birth dates.9 Our data set will be confined to women who belong to subsample A of the SOEP, which mainly consists of persons with German citizenship.10 In total, 4353 women with birth years from 1892 to 1968 participated in this subsample. For the data set to be used in the present section we take into account all of these women who were born not earlier than 1908 and not later than 1957. The resulting 3203 women are partitioned into 5-year birth cohorts as shown in the following table:

Birth cohort  Birth years  Number of women
C10           1908–12      209
C15           1913–17      189
C20           1918–22      263
C25           1923–27      322
C30           1928–32      325
C35           1933–37      338
C40           1938–42      439
C45           1943–47      339
C50           1948–52      395
C55           1953–57      384

2. As was done in the previous section, we begin with an investigation of the distribution of the age at first childbearing. Data are shown in Tables 14.2-1a and 14.2-1b. As in Table 14.1-1, columns labeled d = 1 provide numbers of women who have given birth to a first child at the corresponding age, and columns labeled d = 0 provide numbers of women who remained childless until the interview date. The survivor functions that can be calculated from these data are shown in Figure 14.2-1.11

9 For an earlier analysis of these data see Klein (1989).
10 This is done in order to make our subsample comparable with the other surveys to be discussed in this chapter. This selection also allows us to ignore sampling weights.
11 Since the distributions for C25 and C30 are very similar, we have omitted C25.

3. Before any interpretations, the results should be compared with those

234

14

DATA FROM NON-OFFICIAL SURVEYS

14.2

SOCIO-ECONOMIC PANEL

235

Table 14.2-1a Age at ﬁrst childbearing in our SOEP subsample. τ 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 Total C10 d=1 d=0 1 2 4 2 3 11 7 10 16 8 13 14 16 14 11 7 3 4 4 3 1 1 3 1 1 C15 d=1 d=0 C20 d=1 d=0 C25 d=1 d=0 C30 d=1 d=0 1 1 3 5 8 15 9 12 19 11 12 11 15 6 5 6 2 2 9 2 1 1 3 4 8 9 17 18 16 20 15 18 16 10 12 14 10 7 4 5 2 3 3 5 1 1 2 4 7 9 18 17 32 27 21 23 23 21 16 13 12 7 2 3 2 3 4 1 1 4 2 1 1 4 7 13 21 25 23 30 22 25 15 20 18 12 5 6 8 6 3 5 2 1 2 1 4 14 11 12 8 9 11 12 7 8 8 11 12 9 2 3 7 6 10 8 10 9 8 14 8 160 49 155 34 221 42 275 47 276 49

Table 14.2-1b Age at ﬁrst childbearing in our SOEP subsample.
C35 τ 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 Total d=1 d=0 2 9 18 24 23 24 26 35 24 21 18 16 14 16 7 7 3 2 1 5 1 1 1 5 9 21 29 33 34 37 45 41 32 26 16 11 17 8 6 5 5 4 4 2 1 1 2 C40 d=1 d=0 C45 d=1 d=0 4 12 19 28 39 39 32 24 25 12 23 7 13 6 6 4 5 2 2 4 1 2 5 6 5 8 6 1 8 20 32 35 33 29 25 22 22 21 21 21 9 14 6 7 2 2 20 11 15 11 8 C50 d=1 d=0 C55 d=1 d=0 2 7 8 19 20 22 20 27 21 27 33 20 12 8 4 5

27 33 30 24 15

10 9 10 7 8 8 7 13 5 8 297 41 395 44 309 30 330 65 255 129

from the GLHS data discussed in the previous section. This can be done for cohorts C20, C30, C40, and C50, as shown in Figure 14.2-2. It comes as no surprise that the survivor functions describing the distribution of ages at first childbearing are not identical. Since the data result from surveys and the sampled cohort sizes are small, one might even have expected greater differences. In particular, for cohorts C20 and C40 both data sets provide essentially the same estimates of the proportion of finally childless women, about 16 % for C20 and 10 % for C40. An exception is the cohort C30. As will be seen below, based on a comparison of cumulated cohort birth rates, the data from the GLHS are probably more reliable than the SOEP data for this cohort.

Fig. 14.2-2 Comparison of survivor functions for the age at first childbearing estimated, respectively, with SOEP data (solid line) and GLHS data (dotted line). Panels: C20, C30, C40, C50; horizontal axis: age 15–45.

4. Given that the data sets provide comparable results, Figure 14.2-1 can be used to supplement some conclusions already drawn from Figure 14.1-1. The most remarkable point is that the tendency to begin childbearing at younger ages already began with the birth cohort following C10: compared with this cohort, women belonging to C15 already had their first child at younger ages. It is also seen that this tendency holds at least until birth cohort C45, roughly corresponding to the end of the baby boom in the mid-sixties.

5. Further information can also be gained about the proportion of finally childless women. The following table summarizes the results (in %) from the GLHS and SOEP data:

Birth cohort   C10   C15   C20   C25   C30   C35   C40   C45
GLHS             –     –    17     –    11     –    11     –
SOEP            23    18    16    15    15    12    10   ≤ 9

Fig. 14.2-1 Distribution of age at first childbearing described by survivor functions, based on the data in Tables 14.2-1a and 14.2-1b.
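The SOEP row of the table in paragraph 5 can be reproduced from the column totals of Tables 14.2-1a and 14.2-1b: the share of finally childless women is estimated as the number of women with d = 0 divided by all women of the cohort. The following is a minimal Python sketch; the helper name is ours, and it assumes that childless women are only censored after childbearing has ended, so that for a cohort still in the reproductive period (C45) the result is only an upper bound.

```python
# Minimal sketch: share of finally childless women per SOEP birth cohort.
# Input: column totals (d = 1, d = 0) from Tables 14.2-1a and 14.2-1b.
# Assumption: women with d = 0 are censored only after childbearing has ended,
# so for cohorts still in the reproductive period (C45) the share is an upper bound.
totals = {
    "C10": (160, 49), "C15": (155, 34), "C20": (221, 42), "C25": (275, 47),
    "C30": (276, 49), "C35": (297, 41), "C40": (395, 44), "C45": (309, 30),
}


def childless_percent(mothers: int, childless: int) -> float:
    """Percentage of women without a first birth by the interview date."""
    return 100.0 * childless / (mothers + childless)


shares = {cohort: childless_percent(m, k) for cohort, (m, k) in totals.items()}
for cohort, pct in shares.items():
    print(cohort, round(pct))
```

Rounded, the sketch gives 23, 18, 16, 15, 15, 12, 10, and 9 per cent; the last value is reported as ≤ 9 in the table because C45 was still in the reproductive period in 1985.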

Leaving aside the SOEP result for C30, the ﬁgures indicate a long-term


Table 14.2-2 Number of children in the SOEP subsample, classiﬁed with respect to mother’s birth cohort and age (τ ). τ 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 Total C10 1 2 4 2 5 15 7 16 24 20 26 27 30 38 22 26 17 16 19 15 12 13 11 10 7 5 2 4 1 1 1 C15 C20 C25 C30 1 1 3 5 9 20 18 22 29 24 21 21 24 21 17 17 12 15 15 14 7 9 9 5 11 4 2 3 6 8 11 21 24 27 32 23 35 34 33 35 27 29 21 20 24 9 19 14 12 10 7 7 2 3 1 1 1 2 4 7 11 23 22 38 40 42 43 51 43 46 49 44 48 27 30 23 21 20 10 11 18 6 6 4 1 2 1 4 7 16 24 34 36 47 44 48 40 39 46 39 35 20 27 22 26 23 11 6 11 12 1 3 3 1 1 2 9 18 28 33 40 42 63 57 50 43 49 50 64 27 26 36 27 18 16 10 5 8 3 2 2 1 1 6 11 23 42 52 54 61 75 79 75 68 60 52 53 41 20 25 17 15 10 13 8 4 5 3 1 4 13 24 38 47 53 57 52 61 48 46 19 35 19 17 20 14 13 9 14 11 7 4 1 1 8 23 35 44 40 45 44 41 41 52 45 55 29 40 26 29 15 6 8 3 1 2 7 8 20 22 26 31 40 38 51 65 43 41 24 14 12 1 C35 C40 C45 C50 C55

1 399 356 499 692 628 729 874 626 631 445

decrease in the proportion of finally childless women at least until birth cohort C45. Of course, in interpreting these figures one has to consider that the data result from retrospective surveys and consequently only provide information about women who were still alive at the interview dates in the 1980s. The proportions of finally childless women would presumably be higher if related to all women of the respective birth cohorts.

6. The next step is to investigate the number of children born to women in our SOEP subsample. We begin with the calculation of cumulated cohort birth rates. Table 14.2-2 shows the data and is organized in the same way as Table 14.1-2. With the exception of cohort C30, plotting the cumulated cohort birth rates would show essentially the same cross-cohort changes as are visible in Figure 14.1-2. We therefore only compare the cumulated cohort birth rates up to some higher ages, as shown in Table 14.2-3, which also shows comparable rates calculated from official statistics.12 The comparison suggests that the SOEP data for birth cohort C30 are, in fact, somewhat exceptional and that, for this cohort, the GLHS data might be more reliable. More interesting, however, is the additional information that can be gained for birth cohorts born before 1930. Since official statistics only allow one to calculate completed cohort birth rates beginning with birth year 1930, one might easily get the impression of a long-term decline of these rates that began with birth cohorts following C35 (see Section 11.4). Quite to the contrary, our survey data suggest that the birth rates of cohorts with birth years roughly between 1925 and 1935 were exceptionally high.

Table 14.2-3 Cumulated cohort birth rates up to an age of τ, calculated from SOEP data (CCBRs(τ)), from GLHS data (CCBRg(τ)), and from official statistics (CCBR∗(τ)). Fachserie 1, Reihe 1, 1999 (pp. 198-200).

Birth cohort   C10   C15   C20   C25   C30   C35   C40   C45   C50   C55   C60
τ               45    45    45    45    43    43    40    38    31    29    29
CCBRs(τ)      1.90  1.88  1.89  2.15  1.93  2.15  1.98  1.83  1.44  1.09     –
CCBRg(τ)         –     –  1.80     –  2.19     –  1.99     –  1.38     –  0.82
CCBR∗(τ)         –     –     –     –  2.15  2.18  1.96  1.75  1.39  1.09  0.99

12 These rates are calculated as mean values for 3-year periods in the same way as was explained in the previous section.

7. Finally, we can distinguish women with regard to parity. Results from the SOEP subsample are shown in Table 14.2-4. As in Table 14.1-4 in the previous section, the lower panel of Table 14.2-4 shows the distribution of parities in the subset of women having at least one child. This allows us to separate the parity distribution from effects that result from a changing proportion of finally childless women. As an example, we consider the proportion of women having four or more children. How this proportion developed is shown graphically in Figure 14.2-3. The solid line connects figures calculated from the SOEP data, the dotted line connects comparable figures from the GLHS. Again, the value for the C30 cohort in the SOEP data should be considered exceptional. However, the remarkable result is that we do not find a continuous long-term decline in the proportion of women with four or more children. To the contrary, an initial decline was superseded by rising proportions in birth cohorts with birth years roughly between 1920 and 1930. A renewed decline only began roughly at the time when the baby boom ended in the second half of the 1960s.

Table 14.2-4 Upper panel: Number of women with 0, 1, 2, 3, 4, and 5 or more children, calculated from the data in the SOEP subsample. Lower panel: Percentage values relating to all women in each of the cohorts who have at least one child.

Upper panel
Children   C10   C15   C20   C25   C30   C35   C40   C45   C50   C55
0           49    34    42    47    49    41    44    30    65   129
1           46    41    79    74    80    59   101    89    99   106
2           55    66    71    95   106   129   185   146   173   115
3           30    23    41    52    47    62    62    57    48    27
4           14    16    15    25    28    26    31    12     8     7
5+          15     9    15    29    15    21    16     5     2     0

Lower panel
Children   C10   C15   C20   C25   C30   C35   C40   C45
1         28.8  26.5  35.7  26.9  29.0  19.9  25.6  28.8
2         34.4  42.6  32.1  34.5  38.4  43.4  46.8  47.2
3         18.8  14.8  18.6  18.9  17.0  20.9  15.7  18.4
4          8.8  10.3   6.8   9.1  10.1   8.8   7.8   3.9
5+         9.4   5.8   6.8  10.5   5.4   7.1   4.1   1.6

Fig. 14.2-3 Percentages of women with four or more children belonging to birth cohorts C10, . . . , C45; calculated from SOEP data (solid line) and from GLHS data (dotted line). Horizontal axis: birth cohort, 1900–1950.

14.3

Fertility and Family Survey

1. Even if surveys refer to the same region and historical period, they are likely to provide somewhat different data. So it is always a good idea to consider all potentially informative data sources and to compare the information. In the present section we use data from the German part of the Fertility and Family Survey (FFS). The FFS project was initiated by the Population Activities Unit (PAU) of the United Nations Economic Commission for Europe (UNECE) in order to conduct comparable Fertility and Family Surveys in about 20 ECE member countries.13 The German FFS was conducted by the Bundesinstitut für Bevölkerungsforschung (BiB, Wiesbaden) in 1992.14 Several studies using these data have already been published,15 and the data set is now generally available for scientific research.16

2. The sampling design intended to obtain data from 10000 persons, 5000 in the territory of the former FRG ("West") and 5000 in the territory of the former GDR ("East"). In both territories, 3000 women and 2000 men aged 20 to 39 with German citizenship were to be included.17 The field work was done from May to September 1992, using a random route method to select persons for the survey. The final sample includes data from interviews with 10012 persons. The numbers of male and female sample members in both regions of Germany are shown in the left part of the following table:

              All sample members     With valid birth year
Region          Male     Female        Male     Female
West            2024       3012        2016       3005
East            1992       2984        1982       2971

Since for 38 persons neither a valid birth year nor a valid age at the time of the interview is known, the number of cases reduces as shown in the right part of the table. All remaining persons are born between 1952 and

13 See www.unece.org/ead/pau/ffs/. Festy and Prioux (2002) provide an overview and evaluation.

14 The basic data documentation is by Pohl (1995). For additional information see the homepage of the BiB: www.bib-demographie.de.
15 Hullen (1998), Roloff and Dorbritz (1999).
16 We thank Gert Hullen (BiB), who provided us with a copy of the data. The data set is also available from the Zentralarchiv für empirische Sozialforschung (Köln).
17 For more details on the sampling design see Pohl (1995, pp. 7-8).


Table 14.3-1 Age at ﬁrst childbearing in the FFS subsample.
West τ 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 Total C55 C60 C65 d=1 d=0 d=1 d=0 d=1 d=0 1 6 12 32 28 49 38 30 34 40 39 36 30 30 28 33 20 13 12 6 8 3 1 529 2 5 6 13 22 21 32 35 31 39 50 32 47 39 33 14 9 4 1 47 37 35 34 46 199 435 287 266 500 3 4 13 13 28 25 19 53 23 32 26 17 8 2 91 46 67 42 41 East C55 C60 C65 d=1 d=0 d=1 d=0 d=1 d=0 3 3 7 35 59 67 94 94 68 54 42 25 17 11 13 7 7 4 5 2 2 2 1 6 14 36 69 106 108 90 62 52 48 33 17 15 6 8 3 2 19 10 11 12 30 82 676 91 565 184 1 5 9 23 57 88 95 91 76 49 30 30 5 6 26 24 15 9 17


141 103 101 96 59

52 45 42 19 26


621

1972. Since our interest concerns births, we only consider female sample members. In order to allow comparisons with the GLHS and SOEP, we only consider women who belong to one of the birth cohorts shown in the following table:

Birth cohort       C55       C60       C65
Birth years    1953–57   1958–62   1963–67
West               728       723       768
East               704       767       751

Fig. 14.3-1 Comparison of survivor functions for the age at first childbearing; panels: C55 (West), SOEP vs. FFS, and C60 (West), GLHS vs. FFS. FFS survivor functions are calculated from the data in Table 14.3-1. The SOEP and GLHS survivor functions are taken from Figures 14.2-1 and 14.1-1, respectively.

3. As in the previous sections, we begin with an investigation of the distribution of ages at first childbearing. Table 14.3-1, organized in the same way as Tables 14.1-1 and 14.2-1, shows the data and can be used to calculate survivor functions.18 For the cohorts C55 (West) and C60 (West), they can be compared with corresponding survivor functions from the SOEP and the GLHS, respectively. As can be seen in Figure 14.3-1, the curves agree quite well. So we can turn to a comparison of all six age distributions that can be calculated with the data in Table 14.3-1. The result is shown in Figure 14.3-2. Quite remarkable is the difference between the distributions in the two territories. In the former GDR, women began childbearing at substantially younger ages, and also the proportion

18 Like the GLHS, the FFS allows one to distinguish women's own children from stepchildren and adopted children. For creating the data in Table 14.3-1 we have only considered women's own children. One should note, however, that in a few cases no valid birth year for the first child is available; the number of women referred to in Table 14.3-1 is therefore slightly smaller than in the table in paragraph 2.


Table 14.3-2 Number of children in the FFS subsample, classiﬁed with respect to mother’s birth cohort and age (τ ).
West East C65 4 6 13 13 38 29 31 69 48 60 47 32 13 9 C55 3 4 8 37 62 79 113 128 116 105 113 86 74 53 41 47 28 26 23 8 8 6 5 1 1174 1213 C60 1 7 14 39 75 119 143 142 127 112 111 110 81 63 36 29 19 8 3 1 C65 1 5 10 26 61 100 114 138 118 108 69 71 30 13 1


τ 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 Total Total∗

C55 1 7 12 36 31 61 50 51 49 70 71 62 61 64 62 69 52 40 38 26 25 11 1 4 1 955 981

C60 2 5 6 17 23 27 42 58 55 60 75 62 85 84 70 56 38 17 13 2


797 826

412 429

1240 1253

865 887


of childless women was much smaller than in the former FRG. Furthermore, the distribution is quite similar for all three cohorts. In contrast, the tendency to delay childbearing to older ages continues in the western part of Germany. Of course, at least for the birth cohorts C60 and C65, the data do not allow a reliable estimate of the proportion of eventually childless women.

4. We now turn to the number of children and begin with cumulated cohort birth rates. The data are shown in Table 14.3-2. As in Table 14.3-1, we have only considered women's own children. We also note that the FFS questionnaire only asked for the birth years of up to four children. However, the number of women with more than four children is quite small (seven women have five, and four women have six children). More important is the number of cases where, for one or more children, there is no valid birth year. This is documented in the last two rows of Table 14.3-2. The row labeled Total∗ has been calculated from women's reports on the total number of their own children, so that the difference between both rows amounts to the number of children without a valid birth year. However, the impact of these missing values on cumulated cohort birth rates is quite limited, and so the data can nevertheless be used for further investigation. Figure 14.3-3 shows these cumulated rates for the cohorts in the western part of Germany and, for cohorts C55 and C60, also provides a comparison with the results from the SOEP and the GLHS data, respectively. Figure 14.3-4 compares the rates between both territories.

Fig. 14.3-2 Distribution of age at first childbearing described by survivor functions, calculated from the data in Table 14.3-1. Curves: C55, C60, C65, each for West and East; horizontal axis: age 15–45.

Fig. 14.3-3 Cumulated cohort birth rates calculated from Table 14.3-2 for three cohorts in the western part of Germany (solid lines). The dotted lines show corresponding rates calculated from the SOEP (C55) and the GLHS (C60). Horizontal axis: calendar year, 1970–2000.

Fig. 14.3-4 Cumulated cohort birth rates calculated from the data in Table 14.3-2 for the western part (solid lines) and the eastern part (dotted lines) of Germany. Horizontal axis: calendar year, 1970–2000.

14.4

DJI Family Surveys

1. A further source of information about childbearing histories in Germany is a series of surveys conducted by the Deutsches Jugendinstitut (DJI, München). Data sets are available from the Zentralarchiv für empirische Sozialforschung (Köln). In the present section we use data from a survey conducted in the territory of the former FRG in 1988. The sample refers to persons with German citizenship who, at the interview date in 1988, lived in private households and were between 18 and 55 years old.19 The final sample size is 10043: 4554 men and 5489 women. The following table shows the distribution of birth years of the female participants:

Birth year   1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945
Number        130  121  129  153  143  139  149  142  123  129  148  137   96
Birth year   1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958
Number         96  128  165  143  158  173  172  153  166  185  157  182  180
Birth year   1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970
Number        161  169  158  158  165  179  145  147  112  118  114   66

For compatibility with the data discussed in previous sections we consider the following birth cohorts:

Birth cohort           C35        C40        C45        C50        C55        C60
Birth years      1933–1937  1938–1942  1943–1947  1948–1952  1953–1957  1958–1962
Number of women        676        682        605        811        843        826
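The cohort table above can be checked against the birth-year distribution of the female participants. The following Python sketch assumes, as the cohort tables in this chapter suggest, that a 5-year cohort is named after its middle birth year (C55 covers 1953–1957); `cohort_label` is a hypothetical helper, not part of any data documentation.

```python
# Female DJI respondents by birth year, 1933-1970 (table in paragraph 1 of Section 14.4).
counts = [130, 121, 129, 153, 143, 139, 149, 142, 123, 129, 148, 137, 96,
          96, 128, 165, 143, 158, 173, 172, 153, 166, 185, 157, 182, 180,
          161, 169, 158, 158, 165, 179, 145, 147, 112, 118, 114, 66]
women_by_year = dict(zip(range(1933, 1971), counts))


def cohort_label(year: int) -> str:
    """Hypothetical helper: name a 5-year cohort after its middle year (1953-57 -> 'C55')."""
    middle = (year + 2) // 5 * 5
    return f"C{middle % 100:02d}"


cohorts = {}
for year, n in women_by_year.items():
    if year > 1962:  # too young in 1988 for reliable childbearing histories
        continue
    label = cohort_label(year)
    cohorts[label] = cohorts.get(label, 0) + n

print(sum(counts))  # all female participants
print(cohorts)      # women per birth cohort C35, ..., C60
```

The totals agree with the text: 5489 women in all, and 676, 682, 605, 811, 843, and 826 women in cohorts C35 to C60.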

Women born later than 1962 will not be considered because their age in 1988 does not allow any reliable conclusions about childbearing histories. 2. As was done in the previous sections, we begin with an investigation of ages at ﬁrst childbearing. This is easy because the data set already contains
19 For a description of the sampling design see Alt (1991).


Table 14.4-1 Age at ﬁrst childbearing in our DJI subsample.
C35 τ 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 Total d=1 d=0 1 4 10 15 37 46 35 69 54 58 65 55 39 20 32 9 15 6 9 6 1 4 1 1 C40 d=1 d=0 2 1 10 12 25 51 53 65 68 59 46 50 31 29 28 14 18 7 5 4 6 3 2 1 1 3 1 1 1 10 15 15 28 18 11 22 14 15 18 594 80 595 86 526 79 711 99 671 172 521 305 C45 d=1 d=0 C50 d=1 d=0 C55 d=1 d=0 3 6 7 38 42 41 39 53 65 51 57 63 59 39 39 26 21 10 6 4 2 21 32 22 11 13 C60 d=1 d=0


5 22 48 41 55 56 51 40 33 28 26 23 26 17 13 11 9 5 8 4 1

3 15 32 50 56 49 56 52 65 55 48 46 45 38 17 23 18 14 9 7 7 3 1 2 14 16 8 22 19

4 13 19 31 40 49 55 39 54 69 54 44 32 11 7 44 31 36 39 22


83 66 50 56 50


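Fig. 14.4-1 Comparison of survivor functions for the age at first childbearing for birth cohorts C35, C40, C45, C50, C55, and C60. Each panel compares the DJI estimate with SOEP (C35, C45), GLHS and SOEP (C40, C50), SOEP and FFS (C55), or GLHS and FFS (C60) estimates.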

3 1

a variable providing the age of women at first childbearing.20 Table 14.4-1 shows, separately for birth cohorts, how many women of a specified age have given birth to their first child (d = 1) or are censored at the interview date (d = 0).21 These data can be used to estimate survivor functions as in the previous sections. Figure 14.4-1 compares the survivor functions with estimates based on the GLHS, SOEP, and FFS data. For birth cohorts C35, C40, C45, and C50, the results are quite similar. Substantial differences only occur for the two younger cohorts, C55 and C60.

20 We have used the SPSS file fall88.sav. The variable providing age at first childbearing is F275 ALT.
21 Notice that for some birth cohorts the totals are slightly smaller than the number of cases tabulated in the preceding paragraph because we have dropped cases with a reported age at first childbearing below 15.
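The survivor functions in this chapter are estimated from tables of the form used in Tables 14.2-1, 14.3-1, and 14.4-1. The following Python sketch uses a product-limit (Kaplan–Meier type) estimator; it assumes that a woman censored at age τ still counts as being at risk at τ, a common convention that may differ in detail from the procedure described in earlier chapters. The function name and the example counts are hypothetical.

```python
from typing import Dict, List, Tuple


def survivor_function(d1: Dict[int, int], d0: Dict[int, int]) -> List[Tuple[int, float]]:
    """Product-limit estimate of S(tau), the share of women still childless after age tau.

    d1[tau]: women whose first child was born at age tau (d = 1)
    d0[tau]: women still childless at the interview, aged tau (d = 0)
    """
    at_risk = sum(d1.values()) + sum(d0.values())  # all women start childless
    s = 1.0
    result = []
    for tau in sorted(set(d1) | set(d0)):
        births, censored = d1.get(tau, 0), d0.get(tau, 0)
        if at_risk > 0:
            # Women censored at age tau still count as being at risk at tau.
            s *= 1.0 - births / at_risk
        result.append((tau, s))
        at_risk -= births + censored  # both groups leave the risk set after tau
    return result


# Hypothetical counts, for illustration only.
for tau, s in survivor_function({20: 2, 21: 3, 22: 1}, {21: 1, 24: 3}):
    print(tau, s)
```

If no woman is censored before the last observed first birth, the final value of S equals the number of women with d = 0 divided by all women, which is how the percentages of finally childless women in Section 14.2 relate to the survivor functions.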

Chapter 15

Birth Rates in East Germany
This chapter is not finished yet.

Chapter 16

In- and Out-Migration
This chapter is not finished yet.


Chapter 17

An Analytical Modeling Approach
In the present chapter we discuss an analytical model that can support modal reasoning about demographic processes. We begin with a version of the model that takes into account births and deaths but ignores migration. How to extend the model in order to include migration will be discussed in Section 18.6.

17.1

Conceptual Framework

1. To introduce a conceptual framework for the model, we refer to a demographic process, $(S, T^*, \Omega_t)$, as discussed in Section 3.2. $S$ provides the spatial context, $T^*$ is the time axis, and $\Omega_t$ represents the population living in the space $S$ in the temporal location $t \in T^*$. The numbers of men and women in $\Omega_t$ aged $\tau$ will be denoted by $n^m_{t,\tau}$ and $n^f_{t,\tau}$, respectively; the total number of persons aged $\tau$ will be denoted by $n_{t,\tau} := n^m_{t,\tau} + n^f_{t,\tau}$. To simplify notation we assume that age is measured in the same time units that are used in the definition of $T^*$. For example, if $T^*$ refers to calendar years, it is assumed that age is measured in completed years. We also assume a maximal age, which will be denoted by $\tau_m$.1

2. To formulate the model it is helpful to use matrix notation.2 Classified by age, the male and female populations will be represented, respectively, by the vectors

$$
\mathbf{n}^m_t := \begin{pmatrix} n^m_{t,1} \\ \vdots \\ n^m_{t,\tau_m} \end{pmatrix}
\quad\text{and}\quad
\mathbf{n}^f_t := \begin{pmatrix} n^f_{t,1} \\ \vdots \\ n^f_{t,\tau_m} \end{pmatrix}
\tag{17.1.1}
$$

In addition, we represent the total population by the vector $\mathbf{n}_t := \mathbf{n}^m_t + \mathbf{n}^f_t$. Notice that the count of vector elements begins with 1, not with 0, so that only persons who have reached an age of one time unit are given an explicit representation.

3. The purpose of a demographic model is to provide a conceptual framework for thinking about possible developments of a population,

$$ \mathbf{n}_0 \longrightarrow \mathbf{n}_1 \longrightarrow \mathbf{n}_2 \longrightarrow \cdots $$

that begin in some arbitrary temporal location with an initial population $\Omega_0$, here represented by the vector $\mathbf{n}_0$. This requires the introduction of rules that can be used to derive $\mathbf{n}_1$ from $\mathbf{n}_0$, $\mathbf{n}_2$ from $\mathbf{n}_1$, and so on. Since we ignore migration (think of $S$ as a closed region), it suffices to take into account birth and death events. However, only women can give birth to children, and so it is necessary to represent the process in the following way:

$$
\begin{array}{ccccccc}
\mathbf{n}^m_0 & \longrightarrow & \mathbf{n}^m_1 & \longrightarrow & \mathbf{n}^m_2 & \longrightarrow & \cdots \\
 & \nearrow & & \nearrow & & \nearrow & \\
\mathbf{n}^f_0 & \longrightarrow & \mathbf{n}^f_1 & \longrightarrow & \mathbf{n}^f_2 & \longrightarrow & \cdots
\end{array}
$$

4. In order to formulate rules we use age-specific birth and death rates. Death rates for men and women at age $\tau$ in temporal location $t$ will be denoted, respectively, by

$$ \delta^m_{t,\tau} \quad\text{and}\quad \delta^f_{t,\tau} $$

Given these rates, the numbers of men and women dying in $t$ at age $\tau$ are $\delta^m_{t,\tau}\, n^m_{t,\tau}$ and $\delta^f_{t,\tau}\, n^f_{t,\tau}$, respectively. Notice that the assumption of a maximal age $\tau_m$ implies that $\delta^m_{t,\tau_m} = \delta^f_{t,\tau_m} = 1$.

5. Age-specific birth rates will be denoted by $\beta^*_{t,\tau}$.3 In order to simplify the formulation of the model these rates will be interpreted as follows: $\beta^*_{t,\tau}\, n^f_{t,\tau}$ is the number of children, born to women aged $\tau$ in temporal location $t$, who survive the first time unit and are consequently members of $\Omega_{t+1}$. Of course, since only women can bear children, these birth rates need not be indexed with respect to sex. However, one has to take into account differences in the percentages of male and female births. We use $\sigma_m$ and $\sigma_f$ to denote these proportions ($\sigma_m + \sigma_f = 1$). Therefore, if $n_{t+1,1}$ is the total number of children born in $t$ (and surviving the first time unit), the number of male children is $n^m_{t+1,1} = \sigma_m\, n_{t+1,1}$ and the number of female children is $n^f_{t+1,1} = \sigma_f\, n_{t+1,1}$. To ease notation, we assume that the sex ratio at birth is independent of the mother's age and constant over time.

6. Since we only consider children who survive the first time unit, we do not explicitly model the death rates of children during the temporal location in which they are born. There is, however, a simple relationship between $\beta^*_{t,\tau}$ and the birth rates $\beta_{t,\tau}$ introduced in Section 11.1:

$$ \beta^*_{t,\tau} = \beta_{t,\tau}\,(1 - \delta_{t,0}) $$

In this formulation, $\delta_{t,0} = \sigma_m\, \delta^m_{t,0} + \sigma_f\, \delta^f_{t,0}$ is a weighted mean of the death rates of male and female children during their first year of life.

7. Assuming that birth and death rates are given, one can derive some elementary rules for the development of the population. First, the total number of children born in temporal location $t$ and still alive in $t+1$ can be derived from $\mathbf{n}^f_t$ and the age-specific birth rates as follows:

$$ n_{t+1,1} = \sum_{\tau=1}^{\tau_m} \beta^*_{t,\tau}\, n^f_{t,\tau} $$

Secondly, the relation between the numbers of men and women at ages $\tau \ge 1$ in two successive temporal locations can be derived from death rates:

$$ n^m_{t+1,\tau+1} = (1 - \delta^m_{t,\tau})\, n^m_{t,\tau} \quad\text{and}\quad n^f_{t+1,\tau+1} = (1 - \delta^f_{t,\tau})\, n^f_{t,\tau} $$

Together, the three equations allow one to derive $\mathbf{n}^m_t$ and $\mathbf{n}^f_t$ from $\mathbf{n}^m_0$ and $\mathbf{n}^f_0$ for all $t > 0$. Of course, this requires one to think of the birth and death rates, and also the proportions of male and female births, as given and known parameters of the demographic process.

8. We now proceed with matrix notation. First, we define $(\tau_m, \tau_m)$ matrices

$$
\mathbf{B}_t := \begin{pmatrix}
\beta^*_{t,1} & \beta^*_{t,2} & \cdots & \beta^*_{t,\tau_m} \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix}
$$

which comprise the age-specific birth rates. The numbers of male and female children in $t+1$ are then given, respectively, by

$$
\begin{pmatrix} n^m_{t+1,1} \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \sigma_m \mathbf{B}_t\, \mathbf{n}^f_t
\quad\text{and}\quad
\begin{pmatrix} n^f_{t+1,1} \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \sigma_f \mathbf{B}_t\, \mathbf{n}^f_t
$$

Secondly, we define $(\tau_m, \tau_m)$ matrices $\mathbf{D}_{m,t}$ and $\mathbf{D}_{f,t}$ which comprise the death rates of men and women:

$$
\mathbf{D}_{m,t} := \begin{pmatrix}
0 & 0 & \cdots & 0 & 0 \\
1 - \delta^m_{t,1} & 0 & \cdots & 0 & 0 \\
0 & 1 - \delta^m_{t,2} & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 - \delta^m_{t,\tau_m - 1} & 0
\end{pmatrix}
$$

$\mathbf{D}_{f,t}$ is of the same form but has $\delta^f_{t,\tau}$ instead of $\delta^m_{t,\tau}$. Using these matrices, the three equations derived in the previous paragraph can be written as

$$
\begin{aligned}
\mathbf{n}^m_{t+1} &= \mathbf{D}_{m,t}\, \mathbf{n}^m_t + \sigma_m \mathbf{B}_t\, \mathbf{n}^f_t \\
\mathbf{n}^f_{t+1} &= \mathbf{D}_{f,t}\, \mathbf{n}^f_t + \sigma_f \mathbf{B}_t\, \mathbf{n}^f_t
\end{aligned}
\tag{17.1.2}
$$

1 This is not a serious limitation because $\tau_m$ can be given an arbitrarily high value; also, in practical applications, $\tau_m$ can be taken to be an open-ended age class.
2 For a brief introduction to matrix notation and elementary rules see Rohwer and Pötter (2002a, Appendix A).
3 We assume that these birth rates are defined for all ages and have a value of zero at ages outside the reproductive period of women.

17.2

The Stable Population

1. The model framework introduced in the previous section can be used to speculate about possible population developments. This, of course, requires additional assumptions about birth and death rates and how they change over time. The simplest assumption is that the rates are constant over time. This assumption leads to the idea of a stable population, an idea first developed by Alfred J. Lotka (1907, 1922). In the present section we illustrate the idea by an example; some mathematical details will be discussed in the next section.

2. To begin with, we distinguish between the size of a population and its age distribution. Changes of size can be described by growth rates. Denoting the sizes of the male and female populations by $n^m_t := \sum_\tau n^m_{t,\tau}$ and $n^f_t := \sum_\tau n^f_{t,\tau}$, respectively, the growth rates are

$$ \rho_{m,t} := \frac{n^m_{t+1} - n^m_t}{n^m_t} \quad\text{and}\quad \rho_{f,t} := \frac{n^f_{t+1} - n^f_t}{n^f_t} $$

The growth rate of the whole population, $n_t := n^m_t + n^f_t$, is then a weighted mean, namely

$$ \rho_t = \frac{\rho_{m,t}\, n^m_t + \rho_{f,t}\, n^f_t}{n^m_t + n^f_t} $$

3. We now assume that birth and death rates are constant over time. This implies that the matrices $\mathbf{B}_t$, $\mathbf{D}_{m,t}$, and $\mathbf{D}_{f,t}$ are also independent of time and may simply be denoted by $\mathbf{B}$, $\mathbf{D}_m$, and $\mathbf{D}_f$. Using the definition $\mathbf{F} := \mathbf{D}_f + \sigma_f \mathbf{B}$, we get

$$ \mathbf{n}^f_{t+1} = \mathbf{F}\, \mathbf{n}^f_t \tag{17.2.1} $$

Using this equation and starting with an initial female population $\mathbf{n}^f_0$, we may write

$$ \mathbf{n}^f_1 = \mathbf{F}\mathbf{n}^f_0, \quad \mathbf{n}^f_2 = \mathbf{F}\mathbf{n}^f_1 = \mathbf{F}^2\mathbf{n}^f_0, \quad \mathbf{n}^f_3 = \mathbf{F}\mathbf{n}^f_2 = \mathbf{F}^3\mathbf{n}^f_0 $$

and so on. This leads to the general equation

$$ \mathbf{n}^f_t = \mathbf{F}^t\, \mathbf{n}^f_0 \tag{17.2.2} $$

which allows one to calculate $\mathbf{n}^f_t$ from a knowledge of the initial population $\mathbf{n}^f_0$ and the matrix $\mathbf{F}$.

4. To investigate the development of $\mathbf{n}^f_t$ if $\mathbf{n}^f_0$ and $\mathbf{F}$ are given, we begin with a small example. We assume that there are only four age groups ($\tau_m = 4$), that birth rates are given by

$$ \beta^*_1 = 0, \quad \beta^*_2 = 2, \quad \beta^*_3 = 1.2, \quad \beta^*_4 = 0 $$

and that female death rates are given by

$$ \delta^f_1 = 0.2, \quad \delta^f_2 = 0.3, \quad \delta^f_3 = 0.4, \quad \delta^f_4 = 1 $$

Table 17.2-1 shows the resulting development of $\mathbf{n}^f_t$, starting from $\mathbf{n}^f_0 = (1, 1, 1, 1)'$; the first rows of the table imply $\sigma_f = 0.5$. A first result is seen in the last column: the growth rate $\rho_{f,t}$ converges to a fixed value, approximately 0.0573.

Tab. 17.2-1 Development of $\mathbf{n}^f_t$ in our example.

 t   n^f_t1  n^f_t2  n^f_t3  n^f_t4   n^{f,p}_t1  n^{f,p}_t2  n^{f,p}_t3  n^{f,p}_t4   n^f_t    ρ_f,t
 0    1.00    1.00    1.00    1.00        0.25        0.25        0.25        0.25     4.00   -0.0750
 1    1.60    0.80    0.70    0.60        0.43        0.22        0.19        0.16     3.70   -0.0595
 2    1.22    1.28    0.56    0.42        0.35        0.37        0.16        0.12     3.48    0.0989
 3    1.62    0.98    0.90    0.34        0.42        0.25        0.23        0.09     3.82    0.0531
 4    1.51    1.29    0.68    0.54        0.38        0.32        0.17        0.13     4.03    0.0500
 5    1.70    1.21    0.91    0.41        0.40        0.29        0.21        0.10     4.23    0.0658
 6    1.75    1.36    0.85    0.54        0.39        0.30        0.19        0.12     4.51    0.0509
 7    1.87    1.40    0.95    0.51        0.39        0.30        0.20        0.11     4.74    0.0613
 8    1.98    1.50    0.98    0.57        0.39        0.30        0.20        0.11     5.03    0.0551
 9    2.09    1.58    1.05    0.59        0.39        0.30        0.20        0.11     5.30    0.0583
10    2.21    1.67    1.11    0.63        0.39        0.30        0.20        0.11     5.61    0.0568
11    2.33    1.77    1.17    0.66        0.39        0.30        0.20        0.11     5.93    0.0574
12    2.47    1.87    1.24    0.70        0.39        0.30        0.20        0.11     6.27    0.0573
13    2.61    1.97    1.31    0.74        0.39        0.30        0.20        0.11     6.63    0.0572
14    2.76    2.09    1.38    0.78        0.39        0.30        0.20        0.11     7.01    0.0573
15    2.92    2.21    1.46    0.83        0.39        0.30        0.20        0.11     7.41    0.0572
16    3.08    2.33    1.54    0.88        0.39        0.30        0.20        0.11     7.84    0.0573
17    3.26    2.47    1.63    0.93        0.39        0.30        0.20        0.11     8.28    0.0573
18    3.45    2.61    1.73    0.98        0.39        0.30        0.20        0.11     8.76    0.0573
19    3.64    2.76    1.83    1.04        0.39        0.30        0.20        0.11     9.26    0.0573
20    3.85    2.91    1.93    1.10        0.39        0.30        0.20        0.11     9.79

5. A second result concerns the age distribution. This is seen if we explicitly distinguish between the population size and the age distribution. Since the size of the population is given by $n^f_t$, the age distribution can be represented by the vector

$$ \mathbf{n}^{f,p}_t := \frac{1}{n^f_t}\, \mathbf{n}^f_t $$

whose components show the relative frequencies of persons in the age groups. As seen in Table 17.2-1, these frequencies also converge to some fixed values, in our example:

$$ \mathbf{n}^{f,p}_t \longrightarrow \mathbf{n}^{f,p} \approx (0.39,\ 0.30,\ 0.20,\ 0.11)' $$

6. To summarize the findings from this example, the demographic process eventually reaches some kind of equilibrium which is fully described by a time-independent growth rate, $\rho^*_f$, and a time-independent age distribution, $\mathbf{n}^{f,p}$. If this equilibrium is approximately reached in some temporal location $t$, then

$$ \mathbf{n}^f_{t+k} \approx (1 + \rho^*_f)^k\, n^f_t\, \mathbf{n}^{f,p} \qquad (k = 1, 2, 3, \ldots) $$

$\rho^*_f$ is then called the intrinsic growth rate of the demographic process, and $\mathbf{n}^{f,p}$ is called its stable (female) age distribution.

17.3

Mathematical Supplements

We now discuss under which conditions intrinsic growth rates and stable age distributions exist, and whether they depend on the initial population vector $\mathbf{n}^f_0$ or only on the matrix $\mathbf{F}$.

Existence of a Stable Population

1. We begin with the first question, whether one can construct an intrinsic growth rate and a stable age distribution for some given matrix $\mathbf{F}$. This depends on the coefficients of $\mathbf{F}$. As introduced in the previous section, $\mathbf{F}$ has the following structure:4

$$
\mathbf{F} = \begin{pmatrix}
\sigma_f \beta^*_1 & \sigma_f \beta^*_2 & \cdots & \sigma_f \beta^*_{\tau_m - 1} & \sigma_f \beta^*_{\tau_m} \\
1 - \delta^f_1 & 0 & \cdots & 0 & 0 \\
0 & 1 - \delta^f_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 - \delta^f_{\tau_m - 1} & 0
\end{pmatrix}
$$
4 Matrices having this structure are often called Leslie matrices to remind of P. H. Leslie who has ﬁrst provided an extensive discussion with demographic applications, see Leslie (1945).

Now, assuming arbitrarily some initial female population nf = (1, 1, 1, 1) , 0 one can use equation (17.2.2) to calculate nf for all subsequent temporal t locations t > 0. Table 17.2-1 shows the result of the calculation for t = 1, . . . , 20. The total size of the female population is seen in the column labeled nf and its growth rate in the last column. Obviously, the growth t rates converge to a ﬁxed value, ρ∗ ≈ 5.73 %, in this example. This is the f ﬁrst remarkable result.

Furthermore, it will be assumed that the proportion of female births is σf = 0.5. From these assumptions one can calculate the matrix   0 1 0.6 0  0.8 0 0 0   F =   0 0.7 0 0  0 0.6 0 0
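To make the construction of $F$ and the projection (17.2.2) concrete, here is a minimal Python sketch that uses plain lists instead of a matrix library. The function names `leslie_matrix`, `mat_vec` and `project` are illustrative only and do not correspond to any TDA command. With the rates of the example in Section 17.2 it reproduces the totals, growth rates and age distributions of Table 17.2-1.

```python
def leslie_matrix(sigma_f, beta_star, delta_f):
    """Build the (tau_m, tau_m) matrix F: first row sigma_f * beta*_tau,
    subdiagonal 1 - delta^f_tau, zeros elsewhere."""
    m = len(beta_star)
    F = [[0.0] * m for _ in range(m)]
    for j in range(m):
        F[0][j] = sigma_f * beta_star[j]
    for i in range(1, m):
        # share of age group i (1-based) surviving into age group i + 1
        F[i][i - 1] = 1.0 - delta_f[i - 1]
    return F


def mat_vec(F, n):
    """Matrix-vector product F n."""
    return [sum(f * x for f, x in zip(row, n)) for row in F]


def project(F, n0, steps):
    """Return [n_0, n_1, ..., n_steps] with n_{t+1} = F n_t, i.e. n_t = F^t n_0."""
    path = [[float(x) for x in n0]]
    for _ in range(steps):
        path.append(mat_vec(F, path[-1]))
    return path


# Example of Section 17.2: four age groups, sigma_f = 0.5
F = leslie_matrix(0.5, [0, 2, 1.2, 0], [0.2, 0.3, 0.4, 1.0])
path = project(F, [1, 1, 1, 1], 20)
sizes = [sum(n) for n in path]                              # total female population
growth = [sizes[t + 1] / sizes[t] - 1 for t in range(20)]   # rho_{f,t}
shares = [[x / sum(n) for x in n] for n in path]            # n^{f,p}_t

print([round(x, 2) for x in sizes[:4]])    # [4.0, 3.7, 3.48, 3.82]
print(round(growth[-1], 4))                # 0.0573
print([round(x, 2) for x in shares[-1]])   # [0.39, 0.3, 0.2, 0.11]
```

Replacing `[1, 1, 1, 1]` by another non-negative starting vector changes the first rows of the table but, in this example, not the limiting growth rate or age distribution; that independence is the question taken up in the rest of this section.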

One can be sure that $F \ge 0$, meaning that all coefficients of $F$ are non-negative. One can also safely assume that $0 < \delta^f_\tau < 1$ for $\tau = 1, \ldots, \tau_m - 1$; consequently, all entries in the subdiagonal of $F$ are greater than zero. A question, however, concerns the birth rates $\beta^*_\tau$. Since the reproductive period of women is limited and, in general, $\tau_b < \tau_m$, we can assume that $\beta^*_{\tau_b} > 0$ but need to observe that $\beta^*_\tau = 0$ for $\tau > \tau_b$, implying that $F$ has less than full rank.

2. We can proceed, however, in two steps. In a first step we consider only the first $\tau_b$ rows and columns of $F$, that is, the matrix

$$\tilde F := \begin{pmatrix} \sigma_f\beta^*_1 & \sigma_f\beta^*_2 & \cdots & \sigma_f\beta^*_{\tau_b-1} & \sigma_f\beta^*_{\tau_b} \\ 1-\delta^f_1 & 0 & \cdots & 0 & 0 \\ 0 & 1-\delta^f_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1-\delta^f_{\tau_b-1} & 0 \end{pmatrix}$$

This is a non-negative matrix which has full rank.5 Furthermore, $\tilde F$ is an irreducible matrix.6 This allows us to apply a famous mathematical theorem by G. Frobenius.7 The theorem guarantees that $\tilde F$ has a real positive eigenvalue, say $\lambda^*$, also called the dominant eigenvalue of $\tilde F$, with a corresponding eigenvector, say $v^* = (v^*_1, \ldots, v^*_{\tau_b})'$, whose coefficients are all real and positive. So we can write the equation

$$\tilde F v^* = \lambda^* v^* \tag{17.3.1}$$

A further implication of the theorem, which will be used below in the discussion of our second question, is that all eigenvalues of $\tilde F$ have an absolute value (modulus) which is less than, or equal to, $\lambda^*$.

3. We can now derive a stable age distribution and an intrinsic growth rate. The intrinsic growth rate can simply be defined by $\rho^*_f := \lambda^* - 1$. The stable age distribution is derived in two steps. In a first step we define the components of a vector $n^{f,*}$ by

$$n^{f,*}_\tau := \begin{cases} v^*_\tau & \text{for } \tau = 1, \ldots, \tau_b \\[4pt] \dfrac{1 - \delta^f_{\tau-1}}{\lambda^*}\, n^{f,*}_{\tau-1} & \text{for } \tau = \tau_b + 1, \ldots, \tau_m \end{cases}$$

From equation (17.3.1) and the structure of $F$ it then follows that

$$F\, n^{f,*} = \lambda^* n^{f,*} = (1 + \rho^*_f)\, n^{f,*} \tag{17.3.2}$$

showing that the age distribution represented by $n^{f,*}$ does not change when multiplied by $F$; all components of $n^{f,*}$ grow, or shrink, at the same rate, $\rho^*_f$. Therefore, to get the stable age distribution one only has to transform $n^{f,*}$ into proportions:

$$n^{f,p}_\tau := \frac{n^{f,*}_\tau}{\sum_{j=1}^{\tau_m} n^{f,*}_j}$$

4. To illustrate the argument we use the example of the previous section. In this example the matrix $\tilde F$ is given by

$$\tilde F = \begin{pmatrix} 0 & 1 & 0.6 \\ 0.8 & 0 & 0 \\ 0 & 0.7 & 0 \end{pmatrix}$$

Calculating eigenvalues and eigenvectors can be done with the following TDA script:8

mdef(F,3,3) = 0.0,1.0,0.6, 0.8,0.0,0.0, 0.0,0.7,0.0;
mev(F,ER,EI,EVR,EVI);
mpr(ER); mpr(EI); mpr(EVR); mpr(EVI);

One finds that the dominant eigenvalue is $\lambda^* = 1.0573$ and the corresponding eigenvector is $v^* = (0.7405, 0.5603, 0.3710)'$. The eigenvalue provides the intrinsic growth rate, $\rho^*_f = 0.0573$, which is identical with the value found in the previous section. The eigenvector can be used to calculate the components of $n^{f,p}$: $n^{f,*}_1 = 0.7405$, $n^{f,*}_2 = 0.5603$, $n^{f,*}_3 = 0.3710$, and

$$n^{f,*}_4 = \frac{0.6}{1.0573}\; 0.3710 = 0.2105$$

5 This is seen from the determinant of $\tilde F$, which is $\det(\tilde F) = \pm\, \sigma_f \beta^*_{\tau_b} \prod_{\tau=1}^{\tau_b-1} (1 - \delta^f_\tau) \neq 0$. The sign depends on whether $\tau_b$ is even or odd.

6 By this is meant that, for any two different indices $i$ and $j$ ($1 \le i, j \le \tau_b$), one can find further indices, say $k_1, \ldots, k_m$, such that $a_{ik_1} a_{k_1k_2} \cdots a_{k_mj} > 0$, where $a_{ij}$ denotes the elements of $\tilde F$.

7 We refer to Gantmacher (1971, ch. xxiii).

8 More detailed explanations of the practical calculations will be given in Section 17.5.1.
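The text obtains $\lambda^*$ and $v^*$ with TDA's mev command. As a hedged alternative that needs no matrix library, the following Python sketch uses power iteration, which is a different technique from a full eigen-decomposition: it only finds the dominant eigenvalue and relies on $\lambda^*$ being strictly larger in modulus than all other eigenvalues, which holds for this $\tilde F$. It then extends the eigenvector to $n^{f,*}$ with the recursion of item 3 and normalizes it. The function names are hypothetical.

```python
def dominant_eigen(A, iterations=500):
    """Power iteration for a non-negative matrix with a strictly dominant eigenvalue.

    Returns (lambda*, v*) with v* scaled so that its components sum to 1.
    """
    n = len(A)
    v = [1.0 / n] * n                      # positive start vector with sum 1
    lam = 0.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        lam = sum(w)                       # growth factor, because sum(v) == 1
        v = [x / lam for x in w]           # rescale so that the sum is 1 again
    return lam, v


def stable_age_distribution(lam, v, delta_f, tau_m):
    """Extend v* (age groups 1..tau_b) to n^{f,*} with tau_m groups and normalize."""
    n_star = list(v)
    for i in range(len(v), tau_m):         # 0-based index i is age group i + 1
        n_star.append((1.0 - delta_f[i - 1]) / lam * n_star[-1])
    total = sum(n_star)
    return [x / total for x in n_star]


F_tilde = [[0.0, 1.0, 0.6],
           [0.8, 0.0, 0.0],
           [0.0, 0.7, 0.0]]
lam, v = dominant_eigen(F_tilde)
length = sum(x * x for x in v) ** 0.5      # rescale to unit length, like TDA's output
print(round(lam, 4))                       # 1.0573, i.e. rho*_f = 0.0573
print([round(x / length, 4) for x in v])   # approx. (0.7405, 0.5603, 0.3710)
p = stable_age_distribution(lam, v, [0.2, 0.3, 0.4, 1.0], 4)
print([round(x, 2) for x in p])            # [0.39, 0.3, 0.2, 0.11]
```

Because an eigenvector is only determined up to a factor, the sketch normalizes to sum 1 internally and rescales to unit length only for comparison with the printed TDA eigenvector.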

Of course, equation (17.3.2) does not change if $n^{f,*}$ is multiplied by an arbitrary scalar value. So we can rescale $n^{f,*}$ to get a frequency distribution with components adding to unity. The result is

$$n^{f,p} = (0.39,\ 0.30,\ 0.20,\ 0.11)$$

and equals the age distribution found in the previous section.

5. It would suffice to calculate the dominant eigenvalue of $\tilde F$, because the corresponding eigenvector, and consequently the stable age distribution, can be derived from the death rates. Let the dominant eigenvalue, $\lambda^*$, be given. Since the corresponding eigenvector, $v^*$, is determined only up to an arbitrary multiplicative factor, we can set $v^*_1 = 1$. All further elements of $v^*$ can be calculated recursively with the formula

$$v^*_\tau = \frac{1 - \delta^f_{\tau-1}}{\lambda^*}\, v^*_{\tau-1} \qquad (\text{for } \tau = 2, \ldots, \tau_m)$$

The argument also shows that, once $\lambda^*$ is known, the stable age distribution depends only on the death rates, not on the birth rates. But, of course, $\lambda^*$ itself depends on the birth rates.

Convergence to a Stable Age Distribution

6. We now turn to the second question: whether, beginning with an arbitrary initial female population $n^f_0$, the sequence $n^f_t = F^t n^f_0$ finally converges to an equilibrium defined by the intrinsic growth rate, $\rho^*_f$, and the stable age distribution, $n^{f,p}$.9 As will be shown, the answer is positive under quite general conditions. To develop the argument, we first consider the sub-matrix $\tilde F$, which consists of the first $\tau_b$ rows and columns of $F$. Correspondingly, we refer to the first $\tau_b$ elements of $n^f_t$ by the vector $n^{f,a}_t$. Since the upper right block of $F$ (rows $1, \ldots, \tau_b$, columns $\tau_b + 1, \ldots, \tau_m$) contains only zeros, $n^{f,a}_{t+1}$ depends only on $n^{f,a}_t$, and it follows that

$$n^{f,a}_t = \tilde F^t\, n^{f,a}_0 \tag{17.3.3}$$

We now show that, given an additional assumption to be explained below, $n^{f,a}_t$ eventually becomes proportional to $v^*$, that is, to the eigenvector corresponding to the dominant eigenvalue of $\tilde F$.

9 It will be assumed that there is at least one woman of an age under, or equal to, $\tau_b$.

7. This requires us to refer to all eigenvalues of $\tilde F$, which will be denoted by $\lambda_j$, with corresponding eigenvectors $v_j$, for $j = 1, \ldots, \tau_b$. One of these eigenvalues, say $\lambda_{j^*} = \lambda^*$, is the dominant one and has the corresponding eigenvector $v_{j^*} = v^*$. So we can write the equations

$$\tilde F v_j = \lambda_j v_j \qquad (\text{for } j = 1, \ldots, \tau_b)$$

which, by defining $\Lambda := \mathrm{diag}(\lambda_1, \ldots, \lambda_{\tau_b})$ and $V := (v_1, \ldots, v_{\tau_b})$, may also be written as a single matrix equation

$$\tilde F V = V \Lambda$$

Assuming, as is generically the case, that the eigenvalues of $\tilde F$ are all different, its eigenvectors are linearly independent (full rank alone would not guarantee this). This implies that $V$ is an invertible matrix, and we may write $\tilde F = V \Lambda V^{-1}$, from which it follows that

$$\tilde F^t = V \Lambda^t V^{-1}$$

This then allows us to write

$$n^{f,a}_t = \tilde F^t n^{f,a}_0 = V \Lambda^t V^{-1} n^{f,a}_0 = V \Lambda^t u$$

where, for the last equation, we have used the abbreviation $u := V^{-1} n^{f,a}_0$. In a next step this equation can be written in the following way:

$$n^{f,a}_t = (v_1, \ldots, v_{\tau_b}) \begin{pmatrix} \lambda_1^t u_1 \\ \vdots \\ \lambda_{\tau_b}^t u_{\tau_b} \end{pmatrix} = \sum_{j=1}^{\tau_b} (\lambda_j^t u_j)\, v_j$$

which shows that $n^{f,a}_t$ is a weighted sum of the eigenvectors of $\tilde F$. Finally, dividing by $\lambda_{j^*}^t$, we get

$$\frac{1}{\lambda_{j^*}^t}\, n^{f,a}_t = \sum_{j=1}^{\tau_b} \frac{\lambda_j^t}{\lambda_{j^*}^t}\, u_j v_j = u_{j^*} v_{j^*} + \sum_{j \neq j^*} \left( \frac{\lambda_j}{\lambda_{j^*}} \right)^t u_j v_j \tag{17.3.4}$$

8. This equation can be used to think about the convergence problem. From the theorem of Frobenius we already know that $\lambda_{j^*} \ge |\lambda_j|$ for all $j = 1, \ldots, \tau_b$. We now introduce a further assumption, to be discussed below, that $\lambda_{j^*} > |\lambda_j|$ for all $j \neq j^*$. Given this assumption, the second term on the right-hand side of equation (17.3.4) converges to zero, and this, in turn, implies the convergence

$$\frac{1}{\lambda_{j^*}^t}\, n^{f,a}_t \longrightarrow u_{j^*} v_{j^*}$$

This shows that, for sufficiently large $t$, $n^{f,a}_{t+1} \approx \lambda_{j^*} n^{f,a}_t$, and $n^{f,a}_t$ will be approximately proportional to the eigenvector $v^*$. Moreover, the remaining components of $n^f_t$ will also converge to a stable age distribution. This is seen from the fact that these remaining components only depend on the growth of the female population at age $\tau_b$ and the

death rates at ages greater than, or equal to, $\tau_b$. Therefore, if eventually the number of women at age $\tau_b$ grows, or shrinks, at a constant (intrinsic) rate, this will propagate to all higher ages. The stable age distribution for all ages may then be calculated as shown in the first part of this section.

9. It remains to discuss the assumption that the dominant eigenvalue of $\tilde F$ is greater, in magnitude, than all other eigenvalues. This is not necessarily the case. For example, the matrix

$$\tilde F := \begin{pmatrix} 0 & 1 \\ 0.8 & 0 \end{pmatrix}$$

has two real eigenvalues, $0.8944$ and $-0.8944$, having the same magnitude. In this example, as shown by equation (17.3.4), $n^{f,a}_t$ will not converge to a unique stable age distribution but oscillate between two different distributions. Such cases are, however, exceptional. A sufficient condition for the existence of a dominant eigenvalue which is greater, in magnitude, than all other eigenvalues is that there are at least two successive ages with a positive birth rate.10 Therefore, cyclical solutions will only occur if one uses a highly aggregated Leslie matrix, for instance, a matrix that only distinguishes three age groups: below $\tau_a$, between $\tau_a$ and $\tau_b$, and above $\tau_b$. If one distinguishes at least two age groups in the reproductive period, one can safely assume the existence of a stable age distribution.

10 This is mentioned by Anton and Rorres (1991, p. 654), where one can also find a good introduction to much of the mathematics behind the model. For a statement, and proof, of sufficient and necessary conditions see Demetrius (1971).

17.4

Female and Male Populations

1. So far we have only considered the development of a female population. Assuming time-constant birth and death rates, it was shown that the development of a female population eventually reaches an equilibrium which is characterized by a constant growth rate, $\rho^*$, and a stable age distribution, $n^{f,p}$. So the question remains how a corresponding male population will develop. To find an answer one can begin with equations (17.1.2), which have been derived at the end of Section 17.1. Assuming time-constant birth and death rates, they can be written as follows:

$$n^m_{t+1} = D_m\, n^m_t + \sigma_m B\, n^f_t \qquad \text{and} \qquad n^f_{t+1} = D_f\, n^f_t + \sigma_f B\, n^f_t$$

If $\sigma_m = \sigma_f$ and the age-specific death rates were identical for men and women, both the male and the female population would eventually reach the same stable age distribution. However, as we have seen in Part II, neither assumption is valid. Instead, most often $\sigma_m > \sigma_f$, and in most of the age groups death rates are higher for men than for women.

2. Since only women can bear children, it is easy, however, to derive the development of the male population from the development of the female population. The argument goes in two steps. The first step concerns newborn male children. As shown in Section 17.1, their number is given by

$$n^m_{t+1,1} = \sigma_m \sum_{\tau=\tau_a}^{\tau_b} \beta^*_\tau\, n^f_{t,\tau}$$

and therefore only depends on the number and age distribution of women in the reproductive period. Consequently, if the female population eventually has a stable age distribution and grows, or shrinks, at a constant rate $\rho^*$, the number of newborn male children will also grow, or shrink, at the same rate, that is, we can write

$$n^m_{t+1,1} = (1 + \rho^*)\, n^m_{t,1}$$

But this will then propagate to all further ages, and the age distribution of the male population will only depend on male death rates. For example, $n^m_{t+2,2} = n^m_{t+1,1} (1 - \delta^m_1)$, and

$$n^m_{t+3,3} = n^m_{t+2,2} (1 - \delta^m_2) = n^m_{t+1,1} (1 - \delta^m_1)(1 - \delta^m_2)$$

So we may write for all ages $\tau > 1$ the equation

$$n^m_{t+\tau,\tau} = n^m_{t+1,1} \prod_{j=1}^{\tau-1} (1 - \delta^m_j)$$

Therefore, if $n^m_{t+1,1}$ grows, or shrinks, at a constant rate $\rho^*$, the same will be true for the number of men at all ages. Consequently, the male population will also eventually reach a stable age distribution, which can be derived from male death rates in the same way as was shown in the previous section for females. To repeat the method of calculation, one begins with an arbitrary value for the number of males in age 1, say $v^*_1 = 1$. Then one can calculate recursively

$$v^*_\tau = \frac{1 - \delta^m_{\tau-1}}{1 + \rho^*}\, v^*_{\tau-1} \tag{17.4.1}$$

for $\tau = 2, \ldots, \tau_m$. The frequencies in the stable age distribution are then simply $n^{m,p}_\tau = v^*_\tau / \sum_j v^*_j$.
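Recursion (17.4.1) needs only the growth factor $1 + \rho^*$ and the death rates, so it is easy to code. The following Python sketch (function names hypothetical) first applies it with $1 + \rho^* = 1.0573$ and the female death rates of the example of Section 17.2, which returns the stable female distribution found there, in line with item 5 of Section 17.3. The male death rates used afterwards are invented for illustration only; they are not taken from the text.

```python
def stable_vector(growth_factor, delta):
    """Unnormalized stable vector by recursion (17.4.1).

    growth_factor is 1 + rho*; delta[k] is the death rate of age group k + 1.
    v*_1 = 1 and v*_tau = (1 - delta_{tau-1}) / (1 + rho*) * v*_{tau-1}.
    """
    v = [1.0]
    for d in delta[:-1]:        # the death rate of the last age group is not needed
        v.append((1.0 - d) / growth_factor * v[-1])
    return v


def normalize(v):
    """Turn a vector into proportions adding to unity."""
    total = sum(v)
    return [x / total for x in v]


female = normalize(stable_vector(1.0573, [0.2, 0.3, 0.4, 1.0]))
print([round(x, 2) for x in female])    # [0.39, 0.3, 0.2, 0.11]

# Hypothetical male death rates, higher than the female ones in every age group
male = normalize(stable_vector(1.0573, [0.25, 0.35, 0.45, 1.0]))
print([round(x, 2) for x in male])
```

With the higher (hypothetical) male death rates the male stable distribution puts more weight on the youngest group and less on the oldest, as the text leads one to expect.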

17.5

Practical Calculations

In the present section we discuss how one can practically calculate intrinsic growth rates and stable age distributions with real data.

17.5.1

Two Calculation Methods

1. For the calculations we use matrix commands available in the computer program TDA.11 There are two possible approaches. The first one relies on a direct calculation of the eigenvalues (and eigenvectors) of the matrix $\tilde F$ introduced in Section 17.3. For an illustration we use the example from Section 17.2. The TDA script is shown in Box 17.5-1. The mdef command is used to define the matrix

$$\tilde F = \begin{pmatrix} 0 & 1 & 0.6 \\ 0.8 & 0 & 0 \\ 0 & 0.7 & 0 \end{pmatrix}$$

called F in the script. Then the mev command is used to calculate eigenvalues and eigenvectors of this matrix. The command gets F as input and creates two vectors (ER and EI) and two matrices (EVR and EVI) as output. ER and EI contain, respectively, the real and imaginary parts of the eigenvalues, and EVR and EVI contain, respectively, the real and imaginary parts of the eigenvectors. Their contents are shown in the lower part of Box 17.5-1. Most important is the dominant eigenvalue, which is 1.0573 in this example. As shown in Section 17.3, one can immediately derive from it the intrinsic growth rate and the stable age distribution.

Box 17.5-1 TDA script to create the matrix $\tilde F$ and calculate its eigenvalues and eigenvectors.

silent = -1;                  # echo commands
mfmt = 7.4;                   # set the print format
mdef(F,3,3) = 0.0,1.0,0.6,    # define the matrix F
              0.8,0.0,0.0,
              0.0,0.7,0.0;
mev(F,ER,EI,EVR,EVI);         # calculate eigenvalues and eigenvectors of F
mpr(ER);                      # print real part of eigenvalues
mpr(EI);                      # print imaginary part of eigenvalues
mpr(EVR);                     # print real part of eigenvectors
mpr(EVI);                     # print imaginary part of eigenvectors

ER
------
-0.5286
-0.5286
 1.0573

EI
------
 0.1958
-0.1958
 0.0000

EVR
----------------------
 0.4305  0.4305  0.7405
-0.7552 -0.7552  0.5603
 1.0000  1.0000  0.3710

EVI
----------------------
-0.3697  0.3697  0.0000
 0.2798 -0.2798  0.0000
-0.0000  0.0000  0.0000

2. An alternative calculation method relies on the fact that, beginning with an arbitrary female population vector $n^f_0$, one can iteratively calculate new population vectors $n^f_t = F^t n^f_0$, whose age distributions finally converge to the stable distribution. Compared with the first method, there are two advantages. One does not need to use the reduced matrix $\tilde F$ but can directly work with the complete Leslie matrix $F$. And one gets, in addition, information about the number of iterations required to approximately reach the stable distribution.

3. To ease the application of this method, TDA provides the mpit command. As input, the command requires information about the matrix $F$, the initial population vector $n^f_0$, and the number of iterations to be performed, say $t_n$. The command has the following syntax:

mpit(A,N,T,R)

T is a scalar that provides the number of iterations, $t_n$. N is a column vector with $\tau_m$ components (equal to the number of rows of the Leslie matrix $F$) and contains the initial population vector. A is a matrix with $\tau_m$ rows and two columns; the first column contains the age-specific birth rates and the second column contains the age-specific survivor rates. Using notations introduced in Section 17.1, the matrix A and the vector N are assumed to be defined as follows:

$$A = \begin{pmatrix} \sigma_f\beta^*_1 & 1-\delta^f_1 \\ \vdots & \vdots \\ \sigma_f\beta^*_{\tau_m} & 1-\delta^f_{\tau_m} \end{pmatrix} \qquad \text{and} \qquad N = \begin{pmatrix} n^f_{0,1} \\ \vdots \\ n^f_{0,\tau_m} \end{pmatrix}$$

As output, the command creates the matrix R with $t_n + 1$ rows and $\tau_m$ columns. Row $t$ (counting $t = 0, \ldots, t_n$) contains the elements of the vector $n^f_t$.

4. The TDA script in Box 17.5-2 illustrates the mpit command with the same example used above. In order to replicate the values shown in Table 17.2-1 in Section 17.2, the initial population vector N has all components set to 1. The mpit command then performs 20 iterations and saves the result in the matrix R. By summing each row of R one gets the vector NT containing the population sizes, which can then be used to calculate the age distributions (in D) and the growth rates (in RT). Of course, the value in the last component of RT is not a valid growth rate.

11 This program is freely available via www.stat.ruhr-uni-bochum.de/tda.html.
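For readers without TDA, the behaviour of mpit as described above can be imitated in a few lines of Python. This is a hypothetical stand-in written from the description in the text, not TDA's implementation: it takes A as a list of (birth rate, survivor rate) pairs and returns R as a list of rows $n^f_0, \ldots, n^f_T$. With the input used in Box 17.5-2 it reproduces Table 17.2-1.

```python
def mpit_like(A, N, T):
    """Iterate n_{t+1} = F n_t for T steps, F being the Leslie matrix encoded by A.

    A[k] = (sigma_f * beta*_{k+1}, 1 - delta^f_{k+1}); N is the initial population.
    Returns T + 1 rows n_0, ..., n_T.  The survivor rate in the last row of A
    is not used, because nobody moves beyond the last age group.
    """
    births = [a[0] for a in A]
    survival = [a[1] for a in A]
    R = [[float(x) for x in N]]
    for _ in range(T):
        n = R[-1]
        newborn = sum(b * x for b, x in zip(births, n))
        survivors = [survival[k] * n[k] for k in range(len(n) - 1)]
        R.append([newborn] + survivors)
    return R


A = [(0.0, 0.8), (1.0, 0.7), (0.6, 0.6), (0.0, 0.0)]       # input of Box 17.5-2
R = mpit_like(A, [1, 1, 1, 1], 20)
NT = [sum(row) for row in R]                               # population sizes
D = [[x / nt for x in row] for row, nt in zip(R, NT)]      # age distributions
RT = [NT[t + 1] / NT[t] - 1 for t in range(len(NT) - 1)]   # growth rates
print([round(x, 2) for x in R[20]])                        # [3.85, 2.91, 1.93, 1.1]
```

Unlike the vector RT of the TDA script, this list simply has one element fewer than NT, so it contains no invalid last component.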

Box 17.5-2 TDA script to illustrate the mpit command.

silent = -1;                       # echo commands
mfmt = 5.2;                        # print format
mdef(A,4,2) = 0,   0.8,            # define matrix A containing
              1,   0.7,            # birth rates in the first column
              0.6, 0.6,            # and survivor rates in the second column
              0,   0.0;
mdefc(4,1,1,N);                    # define unit vector N (initial population)
mpit(A,N,20,R);                    # perform 20 iterations, save result in R
mpr(R);                            # print the resulting matrix R
mmul(R,N,NT);                      # sum rows of R, save result in vector NT
mpr(NT);                           # print NT
mexpr(R/NT,D);                     # calculate age distributions in D
mpr(D);                            # print age distributions
mexpr((lag(NT,1) - NT) / NT,RT);   # calculate growth rates in RT
mfmt = 7.4;                        # new print format
mpr(RT);                           # print growth rates

Printed results (R, NT, D and RT shown side by side):

R                         NT      D                         RT
----------------------    ----    ----------------------    -------
1.00  1.00  1.00  1.00    4.00    0.25  0.25  0.25  0.25    -0.0750
1.60  0.80  0.70  0.60    3.70    0.43  0.22  0.19  0.16    -0.0595
1.22  1.28  0.56  0.42    3.48    0.35  0.37  0.16  0.12     0.0989
1.62  0.98  0.90  0.34    3.82    0.42  0.26  0.23  0.09     0.0531
1.51  1.29  0.68  0.54    4.03    0.38  0.32  0.17  0.13     0.0500
1.70  1.21  0.90  0.41    4.23    0.40  0.29  0.21  0.10     0.0658
1.75  1.36  0.85  0.54    4.51    0.39  0.30  0.19  0.12     0.0509
1.87  1.40  0.95  0.51    4.74    0.40  0.30  0.20  0.11     0.0613
1.98  1.50  0.98  0.57    5.03    0.39  0.30  0.20  0.11     0.0551
2.09  1.58  1.05  0.59    5.30    0.39  0.30  0.20  0.11     0.0583
2.21  1.67  1.11  0.63    5.61    0.39  0.30  0.20  0.11     0.0568
2.33  1.77  1.17  0.66    5.93    0.39  0.30  0.20  0.11     0.0574
2.47  1.87  1.24  0.70    6.27    0.39  0.30  0.20  0.11     0.0573
2.61  1.97  1.31  0.74    6.63    0.39  0.30  0.20  0.11     0.0572
2.76  2.09  1.38  0.78    7.01    0.39  0.30  0.20  0.11     0.0573
2.92  2.21  1.46  0.83    7.41    0.39  0.30  0.20  0.11     0.0572
3.08  2.33  1.54  0.88    7.84    0.39  0.30  0.20  0.11     0.0573
3.26  2.47  1.63  0.93    8.28    0.39  0.30  0.20  0.11     0.0573
3.45  2.61  1.73  0.98    8.76    0.39  0.30  0.20  0.11     0.0573
3.64  2.76  1.83  1.04    9.26    0.39  0.30  0.20  0.11     0.0573
3.85  2.91  1.93  1.10    9.79    0.39  0.30  0.20  0.11    -1.0000

17.5.2

Calculations for Germany 1999

1. The intrinsic growth rate and stable female and male age distributions pertaining to Germany in the year 1999 can be calculated from Tables 7.1-1 and 11.1-1. In order to prepare the required data, shown in Table 17.5-1, we assumed that the reproductive age of women begins at $\tau_a = 14$ and ends at $\tau_b = 51$.12 Since in 1999 the numbers of male and female births were 396292 and 374448,13 one gets $\sigma_m = 0.514$ and $\sigma_f = 0.486$. The survivor rate during the first year of life can then be calculated as

$$1 - (0.514 \cdot 0.004952 + 0.486 \cdot 0.004010) = 0.9955$$

and this can be used to calculate the birth rates $\beta^*_\tau$, which are used for our model, from the birth rates $\beta_\tau$ shown in Table 17.5-1:

$$\beta^*_\tau = 0.9955\, \beta_\tau$$

12 The number of 80 births at age 14 or below has been related to the midyear number of women at age 14, which was 437300 in 1999, and the number of 16 births at age 51 or above has been related to the midyear number of women at age 51, which was 483000 in 1999.

13 Fachserie 1, Reihe 1, 1999 (p. 42).

2. Beginning with the first of the two calculation methods discussed in the previous section, the next step is to create the matrix $\tilde F$, which, in the current application, has 51 rows and columns, and to calculate its dominant eigenvalue. We have done this with the TDA script shown in Box 17.5-3. Input is a data file, spm1.dat, that contains the data shown in Table 17.5-1.14 The dominant eigenvalue is approximately $\lambda^* = 0.985$, corresponding to a negative intrinsic growth rate of $\rho^* = -1.5\,\%$. The interpretation is: if the birth and death rates of 1999 remained constant in the future, and if there were no migration, the population would eventually decline at a rate of 1.5 % per year.

14 In addition, the data file contains two more columns containing, respectively, age-specific numbers of men and women in 1999 in Germany, taken from Table 7.1-1 in Section 7.1. The female population vector will be used below to illustrate the second calculation method.

3. In order to calculate the stable age distribution we use the method described at the end of Section 17.4 for the male population, which, of course, can also be applied to find the stable female age distribution. We begin with $v^*_1 := 1$ and then recursively apply formula (17.4.1). For example,

$$v^*_2 = \frac{1 - 0.000421}{0.985}\, v^*_1 = 1.0148, \qquad v^*_3 = \frac{1 - 0.000304}{0.985}\, v^*_2 = 1.0299$$

and so on, which results in a vector that is proportional to the stable age distribution of men. Using the female death rates instead will produce a vector

that is proportional to the stable age distribution of women. Finally, one only needs to normalize these vectors in order to get distributions, i.e., proportions adding to unity. The resulting stable age distributions are shown in Figures 17.5-1 and 17.5-2 and compared with the actual age distributions of men and women in Germany 1999.15 It is seen that a prolongation of the current birth and death rates would result in a substantial increase in the proportion of older people.

15 The data are taken from Table 7.1-1 in Section 7.1.

4. We now use TDA's mpit command to perform the calculations. The script is shown in Box 17.5-4. The input data are again taken from the data file spm1.dat. Survivor rates and adjusted birth rates are created as explained above. In addition, we use column 6 of the data file to get the female population in 1999, classified by age. The script then creates the matrix A and the vector N to be used as input for the mpit command. The vector U is used to get the row sums of R. The result, the vector NT, contains the female population size at each of the 200 iterations. This vector is finally used to calculate the growth rates. Investigating the output, one finds that a stable growth rate of about -1.5 % is reached in about 100 iterations.

5. How long it takes to approximately reach an equilibrium depends on the extent to which the initial (current) and the final (stable) age distributions differ. As shown by Figure 17.5-2, the differences are quite substantial, and it therefore requires many iterations to reach, at least approximately, the stable distribution. In our application one would need about 50 to 100 iterations (years). This is illustrated in Figure 17.5-3, which shows the first 100 elements of the vector RT calculated by the script in Box 17.5-4. Correspondingly, one might calculate how the population size would decrease.

Table 17.5-1 Birth and death rates in Germany in 1999, calculated from Tables 7.1-1 and 11.1-1.

  τ   δ^m_τ     δ^f_τ     β_τ           τ   δ^m_τ     δ^f_τ     β_τ
  0   0.004952  0.004010  0            46   0.003603  0.001953  0.000290
  1   0.000421  0.000352  0            47   0.003927  0.002034  0.000104
  2   0.000304  0.000212  0            48   0.004295  0.002198  0.000086
  3   0.000223  0.000165  0            49   0.004574  0.002407  0.000046
  4   0.000198  0.000143  0            50   0.005180  0.002618  0.000023
  5   0.000127  0.000106  0            51   0.005445  0.002928  0.000033
  6   0.000166  0.000107  0            52   0.006376  0.003213  0
  7   0.000152  0.000126  0            53   0.006121  0.003275  0
  8   0.000163  0.000100  0            54   0.007317  0.003737  0
  9   0.000115  0.000117  0            55   0.007989  0.004061  0
 10   0.000137  0.000090  0            56   0.008473  0.004049  0
 11   0.000144  0.000090  0            57   0.009509  0.004582  0
 12   0.000157  0.000118  0            58   0.009642  0.004596  0
 13   0.000168  0.000112  0            59   0.011211  0.005307  0
 14   0.000247  0.000149  0.000183     60   0.012309  0.005793  0
 15   0.000313  0.000192  0.000778     61   0.013351  0.006136  0
 16   0.000410  0.000242  0.002762     62   0.014959  0.006751  0
 17   0.000654  0.000327  0.006807     63   0.016750  0.007453  0
 18   0.001012  0.000348  0.013850     64   0.018706  0.008709  0
 19   0.000959  0.000372  0.024724     65   0.020027  0.009375  0
 20   0.000941  0.000310  0.035231     66   0.022121  0.010193  0
 21   0.001015  0.000353  0.044585     67   0.025004  0.011762  0
 22   0.000895  0.000283  0.054319     68   0.028132  0.013154  0
 23   0.000879  0.000270  0.062574     69   0.030690  0.014562  0
 24   0.000961  0.000308  0.069394     70   0.033592  0.016121  0
 25   0.000803  0.000345  0.078996     71   0.035825  0.017972  0
 26   0.000885  0.000300  0.083525     72   0.038527  0.020385  0
 27   0.000849  0.000320  0.085854     73   0.042586  0.022606  0
 28   0.000883  0.000352  0.092532     74   0.047512  0.024907  0
 29   0.000820  0.000354  0.093590     75   0.051429  0.028595  0
 30   0.000895  0.000366  0.093382     76   0.056174  0.032308  0
 31   0.000880  0.000394  0.089946     77   0.063623  0.037003  0
 32   0.000909  0.000448  0.082654     78   0.070017  0.041655  0
 33   0.000989  0.000495  0.072578     79   0.086292  0.051891  0
 34   0.001067  0.000517  0.061521     80   0.077474  0.048273  0
 35   0.001145  0.000606  0.050843     81   0.093884  0.061502  0
 36   0.001428  0.000618  0.040948     82   0.103886  0.071132  0
 37   0.001479  0.000784  0.030625     83   0.111364  0.075476  0
 38   0.001583  0.000828  0.022808     84   0.134642  0.094869  0
 39   0.001882  0.000959  0.017015     85   0.140858  0.100747  0
 40   0.002010  0.001059  0.011966     86   0.155596  0.113830  0
 41   0.002212  0.001178  0.007599     87   0.171477  0.129769  0
 42   0.002506  0.001315  0.004957     88   0.184898  0.146592  0
 43   0.002818  0.001467  0.002769     89   0.208687  0.164650  0
 44   0.003008  0.001549  0.001371     90   1.000000  1.000000  0
 45   0.003426  0.001729  0.000603

Box 17.5-3 TDA script to calculate the intrinsic growth rate corresponding to data for Germany 1999.

silent = -1;                       # echo commands
mfmt = 7.4;                        # set print format
nvar(                              # read the data file spm1.dat
  dfile = spm1.dat,
  AGE [2.0] = c1,
  DF [8.6] = 1 - c3,
  B1 [8.6] = c4,
  BF [8.6] = 0.486 * 0.9955 * B1,
);
tsel = AGE[1,,51];                 # select ages
mdef(BRF) = BF;                    # birth rates of female children
tsel = AGE[1,,50];                 # select ages
mdef(DRF) = DF;                    # survivor rates of women
mdiag(DRF,A);                      # create diagonal matrix
mdefc(50,1,0,N);                   # create a null vector
mcath(A,N,A);                      # concatenate with A
mtransp(BRF,BRFT);                 # make BRF a row vector
mcatv(BRFT,A,F);                   # concatenate with A to get F
mev(F,ER,EI,EVR,EVI);              # calculate eigenvalues and eigenvectors
mpr(ER);                           # print real part of eigenvalues
mpr(EI);                           # print imaginary part of eigenvalues

Box 17.5-4 TDA script to calculate the intrinsic growth rate and stable female age distribution corresponding to data for Germany 1999.

silent = -1;                       # echo commands
mfmt = 8.4;                        # set print format
nvar(                              # read the data file spm1.dat
  dfile = spm1.dat,
  AGE [2.0] = c1,
  DF [8.6] = 1 - c3,
  B1 [8.6] = c4,
  BF [8.6] = 0.486 * 0.9955 * B1,
  NF [5.1] = c6,                   # female population in 1999
);
tsel = AGE[1,,90];                 # select ages
mdef(A) = BF,DF;                   # create the A matrix
mdef(N) = NF;                      # create population vector
mpit(A,N,200,R);                   # perform iterations
mpr(R);                            # show result
mdefc(90,1,1,U);                   # create a unit vector
mmul(R,U,NT);                      # calculate population size
mpr(NT);                           # print NT
mexpr((lag(NT,1) - NT) / NT,RT);   # calculate growth rates
mpr(RT);                           # print growth rates

Fig. 17.5-1 Frequency curves (restricted to ages less than 90) representing the age distribution of men in Germany 1999 (solid line) and the corresponding stable age distribution (dotted line).

Fig. 17.5-2 Frequency curves (restricted to ages less than 90) representing the age distribution of women in Germany 1999 (solid line) and the corresponding stable age distribution (dotted line).

Fig. 17.5-3 Year-to-year growth rates of the female population in Germany resulting from 100 iterations of the current female age distribution, based on birth and death rates in 1999.
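The preparatory arithmetic of Section 17.5.2 (item 1) and the first steps of the recursion in item 3 can be retraced with a few lines of Python. The inputs are the figures quoted in the text and in Table 17.5-1; $\lambda^* = 0.985$ is the approximate value reported there, so the recursion values are only as precise as that input. Variable names are illustrative.

```python
male_births, female_births = 396292, 374448            # Germany 1999
sigma_m = male_births / (male_births + female_births)
sigma_f = 1.0 - sigma_m

# survivor rate in the first year of life, weighted by the sex ratio at birth
survivor_0 = 1.0 - (sigma_m * 0.004952 + sigma_f * 0.004010)

# factor applied to the birth-rate column in Boxes 17.5-3 and 17.5-4
female_birth_factor = round(sigma_f, 3) * round(survivor_0, 4)

# recursion (17.4.1) for men with 1 + rho* = lambda* = 0.985
v = [1.0]
for d in [0.000421, 0.000304]:                         # male death rates at ages 1 and 2
    v.append((1.0 - d) / 0.985 * v[-1])

print(round(sigma_m, 3), round(sigma_f, 3))            # 0.514 0.486
print(round(survivor_0, 4))                            # 0.9955
print(round(female_birth_factor, 4))                   # 0.4838
print([round(x, 4) for x in v])                        # [1.0, 1.0148, 1.0299]
```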


Of course, the results of these calculations should not be mistaken for a population projection. They simply serve to investigate the implications of the current birth and death rates under the ﬁctitious assumption that they will not change and that neither in- nor out-migration will take place.

Chapter 18

Conditions of Population Growth
The previous chapter introduced a general framework for analytical models and, as one application, discussed how the population of Germany would develop if current birth and death rates did not change and no migration took place. Of course, the question is hypothetical, and so is the answer. In the present chapter we continue with this kind of hypothetical question but try to gain a closer understanding of how the intrinsic growth rate depends on birth and death rates. In Section 18.6 we also take migration into account.

18.1

Reproduction Rates

1. We begin with a discussion of reproduction rates. The total birth rate [zusammengefasste Geburtenziffer] in the year $t$, introduced in Section 11.1, is defined as¹

$$\mathrm{TBR}_t := \sum_{\tau=\tau_a}^{\tau_b} \beta_{t,\tau} \qquad \text{(multiplied by 1000)}$$

where the age-specific birth rates are denoted by $\beta_{t,\tau}$. It is simply the sum of the age-specific birth rates and shows how many children would be born to 1000 women if their childbearing conformed to the current birth rates and none of them died before the end of the reproductive period. Table 18.1-1 shows values for both territories of Germany; Figure 18.1-1 provides a graphical illustration.² Obviously, since about 1970, the number of births has been below the replacement level, which would require a total birth rate of about 2000.

2. Reproduction rates are modifications of the total birth rate that refer only to female births and, in the case of the net reproduction rate, also take into account the mortality of women until the end of the reproductive period. The first variant, called gross reproduction rate [Bruttoreproduktionsrate], is defined as

$$\mathrm{GRR}_t := \sigma_{t,f}\,\mathrm{TBR}_t$$

where $\sigma_{t,f}$ is the proportion of female births in year $t$. The idea behind this definition is that only female births can contribute to further population growth. However, since the proportion of female births is close to 0.5 without much variation, the development of the gross reproduction rate is usually quite similar to the development of the total birth rate.

3. The next step is to take into account the mortality of women until the end of the reproductive period. The idea is that the age-specific birth rate $\beta_{t,\tau}$ only refers to women who are still alive at age $\tau$. To formally introduce the definition, we use $G^f_{t,\tau}$ to denote the proportion of women who reach at least age $\tau$. These proportions can be derived from period life tables or directly from the female death rates in the year $t$. While the Statistisches Bundesamt uses data from life tables,³ we prefer to use the female death rates, $\delta^f_{t,\tau}$.⁴ The proportion of women still alive at age $\tau$ is then calculated as

$$G^f_{t,\tau} = \prod_{j=0}^{\tau-1} \bigl(1 - \delta^f_{t,j}\bigr)$$

This leads to the definition of a net reproduction rate [Nettoreproduktionsrate]:

$$\mathrm{NRR}_t := \sigma_{t,f} \sum_{\tau=\tau_a}^{\tau_b} \beta_{t,\tau}\, G^f_{t,\tau} = \sigma_{t,f} \sum_{\tau=\tau_a}^{\tau_b} \beta_{t,\tau} \prod_{j=0}^{\tau-1} \bigl(1 - \delta^f_{t,j}\bigr)$$

Its value provides the mean number of female births per woman, assuming that the current birth and death rates apply until the end of the reproductive period.

4. Based on the general life table 1986/88 and assuming a reproductive period from 15 to 50 years, the Statistisches Bundesamt has calculated a value of 0.651 for the net reproduction rate in Germany in the year 1999.⁵ Since, in Germany, female mortality until the end of the childbearing period is very low, the net reproduction rate is only slightly lower than the gross reproduction rate:

$$\mathrm{GRR}_{1999} = \sigma_{1999,f}\,\mathrm{TBR}_{1999}/1000 = 0.486 \cdot 1360.9/1000 = 0.661$$

In fact, a plot of the net reproduction rates would look very similar to the plot of the total birth rates in Figure 18.1-1.

¹ In the literature, the total birth rate is also termed 'total fertility rate' and accordingly abbreviated as TFR.
² Calculation of total birth rates for the territory of the former FRG is based on a reproductive period from 15 to 49 years. For the territory of the former GDR, the age range is 15 to 45 until 1988, 15 to 44 in 1989, and 15 to 40 since 1990.
³ See, e.g., Fachserie 1, Reihe 1, 1999 (p. 53).
⁴ As will be shown in the next section, this makes it easy to connect the calculations with the modeling framework introduced in Chapter 17.
⁵ Fachserie 1, Reihe 1, 1999 (p. 53). Using the definition given above, one can derive a value of 0.645 from the data in Table 17.5-1 in Section 17.5.2.

Table 18.1-1 Total birth rates in the territory of the former FRG (TBR a) and in the territory of the former GDR (TBR b). Source: Fachserie 1, Reihe 1, 1999 (pp. 50-51).

   t    TBR a    TBR b        t    TBR a    TBR b        t    TBR a    TBR b
1950   2100.2      n/a     1967   2489.6   2337.9     1984   1290.6   1735.4
1951   2067.7      n/a     1968   2382.1   2296.8     1985   1280.8   1734.2
1952   2078.8   2398.5     1969   2214.0   2235.7     1986   1345.3   1699.9
1953   2053.5   2369.8     1970   2016.3   2192.5     1987   1368.0   1739.9
1954   2101.8   2350.3     1971   1920.8   2131.0     1988   1412.5   1670.2
1955   2108.4   2346.7     1972   1712.9   1786.0     1989   1395.4   1572.3
1956   2204.3   2262.3     1973   1543.5   1576.8     1990   1450.1   1517.7
1957   2300.9   2208.2     1974   1512.5   1539.7     1991   1421.8    977.2
1958   2290.1   2205.4     1975   1451.3   1541.7     1992   1401.6    830.4
1959   2368.1   2346.9     1976   1454.8   1636.8     1993   1392.6    774.9
1960   2365.7   2328.3     1977   1404.6   1850.6     1994   1347.2    772.2
1961   2456.8   2397.0     1978   1380.7   1899.0     1995   1339.3    838.2
1962   2440.7   2415.1     1979   1379.1   1894.6     1996   1395.9    947.7
1963   2518.4   2469.5     1980   1444.9   1941.8     1997   1440.6   1039.0
1964   2542.5   2507.6     1981   1435.2   1853.9     1998   1413.1   1086.7
1965   2507.5   2483.4     1982   1407.2   1858.2     1999   1405.8   1148.4
1966   2534.6   2424.4     1983   1330.9   1789.8

[Figure 18.1-1: plot of total birth rates by year, 1950 to 2000; ordinate 0 to 2500]
Fig. 18.1-1 Total birth rates in the territory of the former FRG (solid line) and in the territory of the former GDR (dotted line). Data are taken from Table 18.1-1.

18.2

Relationship with Growth Rates

1. Reproduction rates are hypothetical constructs. Their interpretation is based on the assumption that the current birth and death rates prevail for an indefinite period of time. This is similar to the model introduced in Chapter 17 and, in fact, there is a close relationship between the net reproduction rate and the intrinsic growth rate that derives from this model. In order to discuss this relationship we refer to the matrix $\mathbf{F} = \mathbf{D}^f + \sigma_f \mathbf{B}$ that


was defined in Section 17.2. The first row of this matrix contains adjusted age-specific birth rates, $\beta^*_\tau$, and the subdiagonal contains the age-specific female survivor rates $(1 - \delta^f_\tau)$.⁶ The intrinsic growth rate, $\rho^*$, depends on these rates. In fact, as shown in Section 17.3, it suffices to consider the submatrix $\tilde{\mathbf{F}}$, which consists of the first $\tau_b$ rows and columns of $\mathbf{F}$ and has the following structure:

$$\tilde{\mathbf{F}} = \begin{pmatrix}
\sigma_f\beta^*_1 & \sigma_f\beta^*_2 & \cdots & \sigma_f\beta^*_{\tau_b-1} & \sigma_f\beta^*_{\tau_b} \\
1-\delta^f_1 & 0 & \cdots & 0 & 0 \\
0 & 1-\delta^f_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1-\delta^f_{\tau_b-1} & 0
\end{pmatrix}$$

The intrinsic growth rate is $\rho^* = \lambda^* - 1$, $\lambda^*$ being the dominant eigenvalue of $\tilde{\mathbf{F}}$. So we have to investigate how $\lambda^*$ depends on the elements of $\tilde{\mathbf{F}}$.

2. We first mention that the elements of $\tilde{\mathbf{F}}$ can be used to calculate the net reproduction rate. As shown in Section 17.1, the relationship between the rates $\beta^*_\tau$, which are used for the model formulation, and the age-specific birth rates $\beta_{t,\tau}$ is given by

$$\beta^*_\tau := \beta^*_{t,\tau} = \beta_{t,\tau}\,\bigl(1 - \delta^f_{t,0}\bigr)$$

So we get the net reproduction rate in the following way:

$$\mathrm{NRR}_t = \sigma_{t,f} \sum_{\tau=\tau_a}^{\tau_b} \beta_{t,\tau}\, G^f_{t,\tau} = \sigma_{t,f} \sum_{\tau=\tau_a}^{\tau_b} \beta^*_{t,\tau}\, \frac{G^f_{t,\tau}}{1-\delta^f_{t,0}} = \sigma_{t,f} \sum_{\tau=\tau_a}^{\tau_b} \beta^*_{t,\tau} \prod_{j=1}^{\tau-1} \bigl(1 - \delta^f_{t,j}\bigr)$$

Using the fact that $\beta^*_\tau = 0$ for $\tau < \tau_a$, and omitting the period index $t$, one arrives at the formulation⁷

$$\mathrm{NRR} = \sigma_f \sum_{\tau=\tau_a}^{\tau_b} \beta^*_\tau \prod_{j=1}^{\tau-1} \bigl(1 - \delta^f_j\bigr)$$

This then shows how the net reproduction rate is related to the elements of the matrix $\tilde{\mathbf{F}}$.

3. For the next step we need a mathematical fact which will be stated without proof: For any $(n, n)$ matrix $\mathbf{A}$, its eigenvalues are the roots of the so-called characteristic equation

$$\det(\lambda\mathbf{I} - \mathbf{A}) = 0$$

In this formulation, $\mathbf{I}$ is an identity matrix and $\det(\lambda\mathbf{I} - \mathbf{A})$ is the determinant of $(\lambda\mathbf{I} - \mathbf{A})$ considered as a polynomial in $\lambda$, also called the characteristic polynomial of $\mathbf{A}$. We further state without proof that

$$\det(\lambda\mathbf{I} - \tilde{\mathbf{F}}) = \lambda^{\tau_b} - \sigma_f \sum_{\tau=\tau_a}^{\tau_b} \beta^*_\tau\, \lambda^{\tau_b-\tau} \prod_{j=1}^{\tau-1} \bigl(1 - \delta^f_j\bigr)$$

We can find, therefore, the eigenvalues of $\tilde{\mathbf{F}}$ as the solutions of the equation

$$\lambda^{\tau_b} - \sigma_f \sum_{\tau=\tau_a}^{\tau_b} \beta^*_\tau\, \lambda^{\tau_b-\tau} \prod_{j=1}^{\tau-1} \bigl(1 - \delta^f_j\bigr) = 0$$

Writing this equation in the form

$$\sigma_f \sum_{\tau=\tau_a}^{\tau_b} \beta^*_\tau\, \lambda^{\tau_b-\tau} \prod_{j=1}^{\tau-1} \bigl(1 - \delta^f_j\bigr) = \lambda^{\tau_b}$$

and dividing both sides by $\lambda^{\tau_b}$, we get, for $\lambda \neq 0$,

$$g(\lambda) := \sigma_f \sum_{\tau=\tau_a}^{\tau_b} \beta^*_\tau\, \lambda^{-\tau} \prod_{j=1}^{\tau-1} \bigl(1 - \delta^f_j\bigr) = 1 \qquad (18.2.1)$$

4. In general, the equation $g(\lambda) = 1$ has at most $\tau_b$, possibly complex, roots. However, we are only interested in the dominant eigenvalue of $\tilde{\mathbf{F}}$, which is real and positive. Its existence is guaranteed by the theorem of Frobenius that was invoked in Section 17.3, but it can also be shown directly.⁸ Because $\beta^*_\tau \ge 0$ and also $(1 - \delta^f_\tau) \ge 0$, $g(\lambda)$ is a monotonically decreasing, continuous function for all $\lambda > 0$. A possible graph of $g$ is shown below:

[Graph of g(λ) against λ: a decreasing curve that takes the value 1 at λ = λ*]

Furthermore, $g(\lambda) \to \infty$ if $\lambda \to 0$ and $g(\lambda) \to 0$ if $\lambda \to \infty$. It follows that there is a unique real and positive value, $\lambda^*$, where $g(\lambda^*) = 1$ and, since no larger positive root exists, this is the dominant eigenvalue of $\tilde{\mathbf{F}}$.

⁶ As in the previous chapter, in order to simplify notation we omit the period index $t$.
⁷ For $\tau = 1$ the product term is assumed to be 1.
⁸ See also Anton and Rorres (1991, p. 653).


5. Equation (18.2.1) also shows how the dominant eigenvalue depends on the birth and death rates: if one or more of the birth rates increase, or one or more of the death rates decrease, $g(\lambda)$ becomes larger for every $\lambda > 0$, so the point where $g(\lambda) = 1$ moves to the right; the dominant eigenvalue, and consequently the intrinsic growth rate, increases. A special case occurs if the net reproduction rate equals 1. Since $g(1) = \mathrm{NRR}$, as a comparison with the formula for the net reproduction rate shows, this implies that the dominant eigenvalue also has the value 1, and the intrinsic growth rate will be zero. More generally, because $g$ is decreasing, $\lambda^* > 1$ if and only if $\mathrm{NRR} > 1$. Of course, this argument concerns the intrinsic growth rate. The actual growth rate also depends on the current age distribution, so a population may well grow for some time although the net reproduction rate is already less than 1. However, if the net reproduction rate is below 1 and there is no immigration, the population eventually declines.
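The chain of arguments in this section can be checked numerically. The following Python sketch is a minimal illustration, assuming NumPy is available; the helper names are hypothetical, and the birth and death rates are invented values for four age classes, not the 1999 data. It builds $\tilde{\mathbf{F}}$, evaluates $g(1)$, which equals the net reproduction rate, and solves $g(\lambda) = 1$ by bisection (possible because $g$ is continuous and decreasing for $\lambda > 0$); the root is then compared with the largest eigenvalue returned by `numpy.linalg.eigvals`.

```python
import numpy as np


def leslie_submatrix(sigma_f, beta_star, delta_f):
    """Build F~: first row sigma_f * beta*_tau, subdiagonal 1 - delta^f_tau.

    beta_star[k] is the adjusted birth rate of age class k + 1 (k = 0, ..., tau_b - 1);
    delta_f[k] is the female death rate of age class k + 1 (needs tau_b - 1 entries).
    """
    tau_b = len(beta_star)
    F = np.zeros((tau_b, tau_b))
    F[0, :] = sigma_f * np.asarray(beta_star, dtype=float)
    for k in range(tau_b - 1):
        F[k + 1, k] = 1.0 - delta_f[k]
    return F


def g(lam, sigma_f, beta_star, delta_f):
    """g(lambda) = sigma_f * sum_tau beta*_tau * lambda**(-tau) * prod_{j=1}^{tau-1} (1 - delta^f_j)."""
    total = 0.0
    survival = 1.0  # the product term; equals 1 for tau = 1
    for tau, b in enumerate(beta_star, start=1):
        total += b * lam ** (-tau) * survival
        if tau <= len(delta_f):
            survival *= 1.0 - delta_f[tau - 1]
    return sigma_f * total


def dominant_root(sigma_f, beta_star, delta_f, lo=1e-3, hi=10.0, tol=1e-12):
    """Solve g(lambda) = 1 by bisection; assumes g(lo) > 1 > g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, sigma_f, beta_star, delta_f) > 1.0:
            lo = mid  # g is decreasing, so the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical rates for four age classes (illustrative only).
sigma_f = 0.486
beta_star = [0.0, 0.8, 1.0, 0.5]
delta_f = [0.01, 0.02, 0.03]

F = leslie_submatrix(sigma_f, beta_star, delta_f)
nrr = g(1.0, sigma_f, beta_star, delta_f)          # g(1) is the net reproduction rate
lam_root = dominant_root(sigma_f, beta_star, delta_f)
lam_eig = np.linalg.eigvals(F).real.max()          # dominant eigenvalue of F~
print(f"NRR = {nrr:.4f}, lambda* (bisection) = {lam_root:.6f}, lambda* (eigvals) = {lam_eig:.6f}")
```

Since the hypothetical NRR here exceeds 1, both methods return a dominant eigenvalue above 1, in line with the argument of paragraph 5.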

18.3

The Distance of Generations

1. In general, there is no simple and direct relationship between the net reproduction rate and the intrinsic growth rate. An exception is the case when the NRR has a value of 1. The intrinsic growth rate is then zero and also independent of the distribution of the age-specific birth rates. Except for this special case, the growth rate also depends on the timing of births. In particular, if the net reproduction rate is greater than 1, the growth rate will decline if the mean age at childbearing increases and, conversely, it will increase if the mean age at childbearing decreases.

2. To illustrate this argument we consider the two matrices

$$\tilde{\mathbf{F}}_a := \begin{pmatrix} 0 & 0.5 & 0.7 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \quad\text{and}\quad \tilde{\mathbf{F}}_b := \begin{pmatrix} 0 & 0.7 & 0.5 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

where, for simplicity, we assume zero death rates. In both cases the net reproduction rate is 1.2; the difference is in the timing of births. In case (a) more children are born at an older age, in case (b) more children are born at a younger age of their mothers. Calculating the dominant eigenvalues, we find $\lambda^*_a = 1.0734$ and $\lambda^*_b = 1.0787$, which shows that the intrinsic growth rate is higher in the second case.

3. One should notice, however, that this depends on whether the net reproduction rate is above or below 1. If it is less than 1, the relationship is reversed, as shown by the following example:

$$\tilde{\mathbf{F}}_c := \begin{pmatrix} 0 & 0.5 & 0.3 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \quad\text{and}\quad \tilde{\mathbf{F}}_d := \begin{pmatrix} 0 & 0.3 & 0.5 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

In both cases the net reproduction rate now has a value of 0.8, implying a negative intrinsic growth rate. Calculating the dominant eigenvalues, we find $\lambda^*_c = 0.9107$ and $\lambda^*_d = 0.9188$. So case (d) actually has a higher growth rate, $\rho^*_d = -8.12\,\%$, compared with $\rho^*_c = -8.93\,\%$ in case (c). This can be explained by referring to the stable age distribution. If the population growth is positive, as in cases (a) and (b), there will be relatively more women in younger age classes, and a shift of birth rates to these younger age classes will increase the growth rate. On the other hand, if the population growth is negative, there will be relatively more women in older age classes, and a shift of birth rates to these older age classes will increase the growth rate.

4. The argument can also be formulated in terms of a mean generational distance, which is formally identical to the mean childbearing age of women, restricted to women who give birth to at least one child, and can be defined as

$$\frac{\sum_{\tau=\tau_a}^{\tau_b} \tau\, \beta_\tau\, G^f_\tau}{\sum_{\tau=\tau_a}^{\tau_b} \beta_\tau\, G^f_\tau}$$

It is often argued that, if this mean generational distance increases, the population growth rate will decrease. But this is actually only true if the net reproduction rate is greater than 1. Otherwise, if the population growth is negative, an increase in the mean generational distance will result in a less negative growth rate.

18.4

Growth Rates and Age Distributions

1. The argument in the previous section has shown that age distributions play a significant role in the analysis of population growth. On the other hand, the age distribution also depends on population growth. This is most easily shown by referring to the stable female age distribution. As has been discussed in Section 17.3, this age distribution is proportional to the eigenvector, $v^*$, that corresponds to the dominant eigenvalue $\lambda^*$ and, if $\lambda^*$ is known, can easily be computed from the age-specific death rates: one begins with an arbitrary positive value for $v^*_1$ and then recursively applies the formula

$$v^*_\tau = \frac{1 - \delta^f_{\tau-1}}{\lambda^*}\, v^*_{\tau-1} \qquad (\text{for } \tau = 2, \ldots, \tau_m)$$
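The recursion can be sketched in a few lines of Python, assuming NumPy; the helper names are hypothetical. The sketch reuses the four toy matrices with zero death rates from Section 18.3, computes their dominant eigenvalues, and applies the recursion to obtain stable age distributions scaled to sum to 1. With $\lambda^* > 1$ (cases a and b) the frequencies fall with age; with $\lambda^* < 1$ (cases c and d) they rise.

```python
import numpy as np


def dominant_eigenvalue(F):
    """Largest real part among the eigenvalues; for a non-negative Leslie matrix this is lambda*."""
    return float(np.linalg.eigvals(F).real.max())


def stable_distribution(lam, delta_f, n_classes, v1=1.0):
    """Apply v*_tau = (1 - delta^f_{tau-1}) / lam * v*_{tau-1} for tau = 2, ..., n_classes.

    delta_f[k] is the death rate of age class k + 1; the result is scaled to sum to 1.
    """
    v = [v1]
    for tau in range(2, n_classes + 1):
        v.append((1.0 - delta_f[tau - 2]) / lam * v[-1])
    v = np.array(v)
    return v / v.sum()


def toy_matrix(first_row):
    """4 x 4 matrix as in Section 18.3: given first row, ones on the subdiagonal (zero death rates)."""
    F = np.zeros((4, 4))
    F[0, :] = first_row
    F[1, 0] = F[2, 1] = F[3, 2] = 1.0
    return F


first_rows = {
    "a": [0, 0.5, 0.7, 0],  # NRR 1.2, more births at the older age
    "b": [0, 0.7, 0.5, 0],  # NRR 1.2, more births at the younger age
    "c": [0, 0.5, 0.3, 0],  # NRR 0.8, more births at the younger age
    "d": [0, 0.3, 0.5, 0],  # NRR 0.8, more births at the older age
}
no_deaths = [0.0, 0.0, 0.0]

lams = {case: dominant_eigenvalue(toy_matrix(row)) for case, row in first_rows.items()}
for case, lam in lams.items():
    v = stable_distribution(lam, no_deaths, 4)
    print(case, round(lam, 4), np.round(v, 3))
# Dominant eigenvalues: a 1.0734, b 1.0787, c 0.9107, d 0.9188
```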

If the net reproduction rate is 1 ($\lambda^* = 1$), the formula shows that the age distribution only depends on the age-specific death rates.⁹ But if population growth is positive, or negative, this is no longer the case. To show this we rewrite the formula in the following way (assuming that $v^*_1 = 1 - \delta^f_0$):

$$v^*_\tau = v^*_1 \left(\frac{1}{\lambda^*}\right)^{\tau-1} \prod_{j=1}^{\tau-1} \bigl(1 - \delta^f_j\bigr) = \left(\frac{1}{\lambda^*}\right)^{\tau-1} G^f_\tau$$

This shows that, if $\lambda^* > 1$, the frequencies of the higher age classes are multiplied by a factor that decreases with age and consequently become relatively smaller. Conversely, if $\lambda^* < 1$, the multiplicative factor increases with age, and this implies that the frequencies of the higher age classes become relatively larger.

⁹ In this case the age distribution would equal the life table age distribution discussed in Section 7.4.4.

2. As an illustration we consider again the stable female age distribution that was calculated in Section 17.5.2 for Germany in 1999. Two of the three frequency curves shown in Figure 18.4-1 are identical with the curves shown in Figure 17.5-2. The solid line depicts the actual female age distribution in 1999; the dotted curve (a) is the stable age distribution calculated from the birth and death rates in 1999. Since these rates imply a net reproduction rate which is far below 1, there is a huge shift towards the older age classes. The dotted curve (b) is calculated from the assumption that the female death rates have their actual values but the birth rates have values that ensure a net reproduction rate of 1. The age distribution is then solely determined by the current death rates.

[Figure 18.4-1: relative frequencies (0 to 0.02) by age 0 to 100; dotted curves labelled (a) and (b)]
Fig. 18.4-1 Solid line: Female age distribution in Germany 1999. Dotted lines: (a) Stable age distribution calculated from current birth and death rates. (b) Stable age distribution if the net reproduction rate were 1.

18.5

Declining Importance of Death Rates

1. In general, the intrinsic growth rate depends on both birth and death rates. However, death rates are only important until the end of the reproductive period. Furthermore, in modern societies these death rates are already very low. For example, referring to the period life table for the year 1999 (see Table 7.3-1 in Section 7.3.2), out of 1000 women only 23 died before reaching age 45. One can expect, therefore, that further progress in reducing death rates will not have any substantial consequences for the intrinsic growth rate.

2. To illustrate the argument we refer again to the year 1999. As was shown in Section 13.3.2, the birth and death rates of that year imply an intrinsic growth rate of -1.51 %. We now assume that death rates were zero until the end of the reproductive period. The corresponding intrinsic growth rate would then be -1.48 %.

18.6

Population Growth with Immigration

1. A further question concerns the effects of immigration on population growth. To provide a brief discussion we extend the female population model introduced in Chapter 17 to include female net immigration. Recall the original model formulation: $\mathbf{n}^f_{t+1} = \mathbf{F}\,\mathbf{n}^f_t$, where $\mathbf{n}^f_t$ is a female population vector for the year $t$, and $\mathbf{F}$ is the Leslie matrix, assumed to be time-independent. We now consider an additional vector

$$\mathbf{m}^f_t := \begin{pmatrix} m^f_{t,1} \\ \vdots \\ m^f_{t,\tau_m} \end{pmatrix}$$

where $m^f_{t,\tau}$ is the net immigration of women aged $\tau$ in the year $t$. Of course, components might be negative if out-migration exceeds in-migration. Using this vector, an extended model can be written as follows:

$$\mathbf{n}^f_{t+1} = \mathbf{F}\,\mathbf{n}^f_t + \mathbf{m}^f_t \qquad (18.6.1)$$

The formulation assumes that there is a single Leslie matrix $\mathbf{F}$ that provides the birth and death rates both for native and immigrant women.¹⁰

¹⁰ For a similar approach to include migration into a Leslie model see Lilienbecker (1991); further possibilities have been discussed by Sivamurthy (1982).

2. A simple solution is possible if we assume a time-constant immigration vector $\mathbf{m}^f \ge 0$. Beginning with a base year $t = 0$, we find:

$$\mathbf{n}^f_1 = \mathbf{F}\,\mathbf{n}^f_0 + \mathbf{m}^f$$

$$\mathbf{n}^f_2 = \mathbf{F}\,\mathbf{n}^f_1 + \mathbf{m}^f = \mathbf{F}^2\,\mathbf{n}^f_0 + \mathbf{F}\,\mathbf{m}^f + \mathbf{m}^f$$


Table 18.6-1 Female in-migration ($m^{f,i}_{t,\tau}$), out-migration ($m^{f,o}_{t,\tau}$), and net immigration ($m^f_{t,\tau}$), classified according to age $\tau$, in the year $t = 1999$ in Germany. Source: Fachserie 1, Reihe 1, 1999 (pp. 116-117).

  τ     in    out    net       τ     in    out    net
  0   2942   1131   1811      38   5024   3983   1041
  1   5123   2823   2300      39   5121   4006   1115
  2   4609   3061   1548      40   4652   3583   1069
  3   4428   3117   1311      41   4290   3150   1140
  4   4342   2978   1364      42   3932   2825   1107
  5   4266   2814   1452      43   3768   2832    936
  6   4181   3184    997      44   3641   2646    995
  7   4296   3245   1051      45   3416   2410   1006
  8   4083   2680   1403      46   3103   2274    829
  9   3975   2441   1534      47   2966   2180    786
 10   3841   2287   1554      48   2805   2021    784
 11   3826   2269   1557      49   2739   2041    698
 12   3853   2058   1795      50   2686   1924    762
 13   3785   2033   1752      51   2490   1877    613
 14   3796   1937   1859      52   2260   1779    481
 15   4015   1971   2044      53   1828   1551    277
 16   4661   2166   2495      54   1466   1415     51
 17   5231   2570   2661      55   1497   1466     31
 18   7577   3263   4314      56   1419   1387     32
 19  12043   4957   7086      57   1576   1399    177
 20  15172   7095   8077      58   1787   1458    329
 21  16383   9280   7103      59   1899   1422    477
 22  16788   9766   7022      60   1915   1668    247
 23  15885   9796   6089      61   1843   1548    295
 24  14811   9393   5418      62   1863   1385    478
 25  13025   8615   4410      63   1610   1301    309
 26  11514   7953   3561      64   1410   1072    338
 27  10679   7416   3263      65   1262   1070    192
 28   9600   7119   2481      66   1036    898    138
 29   9199   7023   2176      67   1057    814    243
 30   8495   6842   1653      68    920    712    208
 31   7724   6360   1364      69   1071    754    317
 32   7035   6102    933      70    918    626    292
 33   6518   5674    844      71    960    626    334
 34   6381   5470    911      72    851    513    338
 35   6101   4923   1178      73    726    514    212
 36   5710   4650   1060      74    740    447    293
 37   5559   4348   1211     75*   5050   3721   1329

[Figure 18.6-1: plot by age 0 to 90, ordinate 0 to 15; legend "In-migration", "Out-migration", "Net immigration"]

Fig. 18.6-1 Age distribution of female immigrants and emigrants in Germany 1999. Data are taken from Table 18.6-1.

and so on, in general:¹¹

$$\mathbf{n}^f_t = \mathbf{F}^t\,\mathbf{n}^f_0 + \sum_{j=0}^{t-1} \mathbf{F}^j\,\mathbf{m}^f$$
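The general expression for $\mathbf{n}^f_t$ can be checked against direct iteration of model (18.6.1). The Python sketch below assumes NumPy and uses a hypothetical three-class Leslie matrix (its net reproduction rate is 0.6365, so its dominant eigenvalue is below 1) together with an arbitrary constant immigration vector; the helper names are illustrative. It also computes the limit $(\mathbf{I}-\mathbf{F})^{-1}\mathbf{m}^f$ that the text discusses next, using `numpy.linalg.solve` instead of forming an explicit inverse.

```python
import numpy as np


def project(F, n0, m, t):
    """Iterate n_{s+1} = F n_s + m for t steps (model 18.6.1 with constant m)."""
    n = np.asarray(n0, dtype=float)
    for _ in range(t):
        n = F @ n + m
    return n


def closed_form(F, n0, m, t):
    """n_t = F^t n_0 + sum_{j=0}^{t-1} F^j m, with F^0 = I."""
    total = np.linalg.matrix_power(F, t) @ n0
    Fj = np.eye(F.shape[0])
    for _ in range(t):
        total = total + Fj @ m
        Fj = Fj @ F
    return total


# Hypothetical example: first row read as sigma_f * beta*, subdiagonal as survivor rates.
F = np.array([[0.0, 0.4, 0.3],
              [0.95, 0.0, 0.0],
              [0.0, 0.9, 0.0]])
n0 = np.array([100.0, 100.0, 100.0])
m = np.array([10.0, 20.0, 5.0])

n_bar = np.linalg.solve(np.eye(3) - F, m)  # equilibrium (I - F)^{-1} m
print(project(F, n0, m, 25), closed_form(F, n0, m, 25))
print(project(F, n0, m, 500), n_bar)
```

Because the equilibrium is the solution of a linear system, doubling `m` doubles `n_bar`, which is the linearity the text points out in paragraph 4.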

This equation can be used to think about equilibrium conditions. A sufficient condition is that the intrinsic growth rate implied by $\mathbf{F}$ is negative. Then, as $t$ becomes larger, $\mathbf{F}^t\,\mathbf{n}^f_0$ converges to zero and the sum $\sum_{j=0}^{t-1}\mathbf{F}^j$ converges to $(\mathbf{I} - \mathbf{F})^{-1}$, so the population vector $\mathbf{n}^f_t$ converges to

$$\bar{\mathbf{n}}^f := (\mathbf{I} - \mathbf{F})^{-1}\,\mathbf{m}^f$$

This also implies that, in the long run, the growth rate becomes zero and the time-constant population $\bar{\mathbf{n}}^f$ only depends on the net immigration and the parameters of the Leslie matrix $\mathbf{F}$.

¹¹ By convention, $\mathbf{F}^0$ equals the identity matrix $\mathbf{I}$.

3. For an illustration we continue with the data used in Section 17.5.2, providing the Leslie matrix $\mathbf{F}$ and the initial female population vector $\mathbf{n}^f_0$ for the year 1999. In addition, we use the data shown in Table 18.6-1 on female immigration and emigration in Germany in the same year. The age class 75* is open-ended and also covers all higher ages. Altogether, 369049 women immigrated and 248108 women emigrated during the year 1999, resulting in a net immigration of 120941 women. As shown in Figure 18.6-1, both in- and out-migration mainly take place at younger ages. For the net immigration vector $\mathbf{m}^f$ we therefore only use the figures from Table 18.6-1 up to age 74 ($m^f_\tau = 0$ for $\tau \ge 75$), in total about 121000 persons. According to the model (18.6.1), we assume that the same female net immigration also takes place in all years following 1999. Then, after 51 iterations of the model, one finds the projected female population vector for the year 2050. The total female population would then be about 41.3 million, instead of 34.4 million as projected by a model without immigration. Since mainly young women immigrate, the age distribution would also be quite different. This is illustrated in Figure 18.6-2. The solid line shows the female age distribution in 1999; the two other lines show, respectively, the age distribution of the projected female population in the year 2050 with and without immigration. A simple summary measure is the female old age dependency ratio [Altenquotient], defined as¹²

$$\frac{\text{number of women aged 65 and over}}{\text{number of women aged 20 to 64}}$$

In our example, we can calculate this measure with a numerator that refers to women aged 65 to 89. Figure 18.6-3 compares the development of this ratio in models with and without immigration.

¹² We mention that there is no general convention on where "old age" begins.

[Figure 18.6-2: absolute frequencies (0 to 700) by age 0 to 90; legend "with immigration", "without immigration"]
Fig. 18.6-2 Age distribution of the female population in 1999 (solid line) and of the projections for the year 2050 with and without immigration (dotted lines). The ordinate refers to absolute frequencies.

[Figure 18.6-3: ratios (0 to 70) by year, 1990 to 2050; legend "with immigration", "without immigration"]
Fig. 18.6-3 Projected development of female old age dependency ratios until the year 2050, with and without immigration.

4. Since the Leslie matrix for Germany in 1999 implies a negative intrinsic growth rate of about -1.5 %, without immigration the population would vanish in the long run. On the other hand, a constant net immigration would not only slow down the population shrinkage but eventually stabilize the population at a constant level. In our model, this long-term level is given by the population vector $\bar{\mathbf{n}}^f$ and can easily be calculated from the Leslie matrix $\mathbf{F}$ and the net immigration vector $\mathbf{m}^f$. Using the data for Germany in 1999, the total number of female persons aged 1 to 89 would eventually stabilize at about 18.7 million. One should also note that the equation $\bar{\mathbf{n}} = (\mathbf{I} - \mathbf{F})^{-1}\mathbf{m}$ is linear; a proportional increase, or decrease, of the immigration vector would result in the same proportional increase, or decrease, of the final population size.

5. The long-run equilibrium also implies a stable age distribution. Figure 18.6-4 compares this stable age distribution with the actual age distribution of the female population in 1999. The female old age dependency ratio would be 41 % compared with 31 % in 1999. However, to put these figures into perspective one should compare the models with and without immigration. This is done in Figure 18.6-5, which compares the stable age distributions from models with and without immigration. The stable age distribution of the model without immigration would correspond to a female old age dependency ratio of 62 %. Remarkably, only a small part of the much lower ratio of 41 % in the model with immigration is due to the fact that mainly young women immigrate. If, instead of the figures in Table 18.6-1, we assume ages of immigrants equally distributed between 1 and 50 years, the final old age dependency ratio would only slightly increase to 42.4 %.

[Figure 18.6-4: absolute frequencies (0 to 700) by age 0 to 90]
Fig. 18.6-4 Age distribution, in absolute frequencies, of the female population in Germany 1999 (solid line) and stable age distribution derived from the 1999 Leslie matrix and a constant net immigration according to Table 18.6-1 (dotted line).

[Figure 18.6-5: relative frequencies (0 to 0.02) by age 0 to 90]
Fig. 18.6-5 Stable age distributions (relative frequencies) implied by the Leslie models with immigration (solid line) and without immigration (dotted line).

Appendix A

Appendix

This appendix has two sections. Section A.1 provides some hints about how to find data and additional information from official statistics in Germany. Section A.2 briefly summarizes some notation from set theory that is used in the main text.

A.1

Data from Official Statistics

This section is not finished yet.


A.2

Sets and Functions

Throughout the text we have stressed that statistical variables are to be regarded as functions and that statistical distributions are functions on sets of sets. Sets and functions thus play a fundamental role in all statistical constructs. The following two sections summarize the basic notation for sets and functions.

Notations from Set Theory

1. The basic idea is that people are able to collect arbitrary objects into a set [Menge]. Georg Cantor (1845–1918), the originator of set theory, gave the following explanation:

“Unter einer „Menge“ verstehen wir jede Zusammenfassung M von bestimmten wohlunterschiedenen Objekten unsrer Anschauung oder unseres Denkens (welche die „Elemente“ von M genannt werden) zu einem Ganzen.” (Cantor 1962, p. 282)
[By a "set" we understand any collection M into a whole of definite, well-distinguished objects of our intuition or our thought (which are called the "elements" of M).]

In accordance with this explanation, the construction of a set is a mental operation without any specific implication for the ontological status of the resulting set. Furthermore, there is no restriction on the kinds of objects that can be considered to be elements of a set.

2. We generally use capital letters to denote sets. The elements of the set, i.e. the entities belonging to the set, are written in small letters.¹ Thus, A := {a1, a2, a3} defines the set A to be the collection made up of the elements a1, a2 and a3.

3. Most of the sets that appear in this text have a finite number of elements. For a set A with a finite number of elements we use the abbreviation |A| for 'the number of elements of A'. If A := {a1, a2, a3}, then |A| = 3.

4. We use the symbol ∈ as an abbreviation for "belongs to". Thus we write a ∈ A. Similarly, we use the symbol ∉ as an abbreviation for "does not belong to". Two sets are equal if both sets have the same elements. In other words, for two sets A and B, A = B if each element of A is also an element of B and each element of B is also an element of A. Sets are therefore completely determined when their elements are given, while the order in which the elements are given is irrelevant: {a1, a2, a3} = {a2, a3, a1}.

5. When the order of elements in a collection is of importance we write (a1, a2, a3). In this case, the order of the three elements makes a difference, i.e. (a1, a2, a3) ≠ (a2, a1, a3). We call such an ordered collection a pair if it has two elements. If the collection contains three elements, we call it a triple. Generally, we call an ordered collection of n elements (a1, ..., an) an n-tuple.

6. When a set, say B, has been defined, one can build a new set by the construction

C := {b ∈ B | b has the property ...}

Here, C is the name of the set consisting of all those elements of B which have the property given after the vertical line. The new set is a subset of the set B. We write C ⊆ B if C is a subset of B, i.e. if each element of C is also an element of B. Consequently, B ⊆ B is always true. We write C ⊂ B if C ⊆ B and there are elements of B that do not belong to C.

7. Given two sets A and B, one can define new sets by the operations of union and intersection: The union A ∪ B is the set of elements belonging to at least one of the sets A and B. The intersection A ∩ B is the set of elements belonging both to A and B. It might happen that the intersection contains no elements at all. If this happens we call the two sets mutually exclusive or disjoint. We call a set with no elements empty. But according to our definition of the equality of sets there is only one empty set. We call it the empty set and denote it by ∅.

8. As a direct consequence of the definitions, union and intersection are commutative

A ∪ B = B ∪ A    A ∩ B = B ∩ A

and distributive

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)    A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

9. If B is a subset of A, then B^c := {a ∈ A | a ∉ B} is the complement of B in A. In general, if A and B are sets, A \ B := {a ∈ A | a ∉ B} is the complement of A ∩ B in A.

10. Given a set A, a partition of A is a set of subsets of A with elements A1, ..., Am, such that the union of all these sets is equal to A (A1 ∪ ... ∪ Am = A) and such that all distinct pairs Ai, Aj are mutually exclusive (Ai ∩ Aj = ∅ for all i, j ∈ {1, ..., m} provided that i ≠ j). For example, if A := {a1, a2, a3}, then {{a1}, {a2, a3}} is a partition of A.

¹ We try to follow this convention throughout the text. But occasionally we will have to refer to sets whose members are themselves sets.
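Python's built-in `frozenset` type can serve as a model of finite sets. The minimal sketch below uses a hypothetical helper `is_partition` that follows item 10 literally (subsets of A, pairwise mutually exclusive, union equal to A); it checks the partition example and also illustrates the complement and one distributive law.

```python
from itertools import combinations


def is_partition(blocks, A):
    """Item 10: blocks are subsets of A, pairwise mutually exclusive, with union equal to A."""
    blocks = [frozenset(b) for b in blocks]
    if not all(b <= A for b in blocks):
        return False
    if any(b1 & b2 for b1, b2 in combinations(blocks, 2)):
        return False  # two blocks share an element
    return frozenset().union(*blocks) == A


A = frozenset({"a1", "a2", "a3"})
B = frozenset({"a1", "a2"})
C = frozenset({"a2", "a3"})

print(is_partition([{"a1"}, {"a2", "a3"}], A))  # True
print(is_partition([B, C], A))                  # False: B and C share a2
print(sorted(A - B))                            # ['a3'], the complement of B in A
print(A & (B | C) == (A & B) | (A & C))         # True (distributive law)
```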


11. The power set of a set A is the set of all its subsets. We use the symbol P(A). Both the empty set ∅ and the set A itself are elements of the power set. Using once again A := {a1, a2, a3}, we have

P(A) = {∅, {a1}, {a2}, {a3}, {a1, a2}, {a1, a3}, {a2, a3}, {a1, a2, a3}}

The number of elements belonging to a power set is |P(A)| = 2^|A|.

12. Another elementary notion is that of a Cartesian product of two or more sets. Given two sets A and B, the Cartesian product A × B is the set of all ordered pairs (a, b) with a ∈ A and b ∈ B. As an example, if A := {1, 2} and B := {3, 4, 5}, then

A × B = {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}

The Cartesian product of three and more sets is constructed similarly. For example, if C := {6}, then

A × B × C = {(1, 3, 6), (1, 4, 6), (1, 5, 6), (2, 3, 6), (2, 4, 6), (2, 5, 6)}

One might also construct the Cartesian product of a set with itself:

A × A × A = {(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 2, 2)}

In this case we use the abbreviation A^n := A × ··· × A (n times).

13. The Cartesian product distributes over unions and intersections:

A × (B ∪ C) = (A × B) ∪ (A × C)    A × (B ∩ C) = (A × B) ∩ (A × C)

In particular, A × ∅ = ∅ × A = ∅. But the Cartesian product is not, in general, commutative:

B × A = {(3, 1), (4, 1), (5, 1), (3, 2), (4, 2), (5, 2)} ≠ A × B
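Items 11 to 13 can be reproduced with Python's standard library: `itertools.combinations` enumerates subsets of each size and `itertools.product` builds Cartesian products as tuples. A small sketch, with `power_set` a hypothetical helper name:

```python
from itertools import chain, combinations, product


def power_set(A):
    """All subsets of A (item 11), each as a frozenset."""
    items = list(A)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))}


A, B, C = {1, 2}, {3, 4, 5}, {6}

print(len(power_set({"a1", "a2", "a3"})))        # 8 = 2**3
print(sorted(product(A, B)))                     # [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]
print(len(set(product(A, repeat=3))))            # 8 = |A|**3
print(set(product(B, A)) == set(product(A, B)))  # False: not commutative
# Distributivity over unions (item 13):
print(set(product(A, B | C)) == set(product(A, B)) | set(product(A, C)))  # True
```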

The Notion of Function

1. The notion of function is fundamental to statistics. We use the word in the same sense as it is now used in mathematics. Given two sets A and B, a function relates to each element of A a unique element of B. We write

f : A → B

where f is the name of the function, A is called the domain of the function, and B is the counterdomain of the function. If a ∈ A is an element of the domain of the function f, we write f(a) for the unique element of B which is related to a through the function f. We call f(a) the value of the function evaluated at the argument a.

2. Given the two sets A := {1, 2} and B := {3, 4, 5}, we might define a function f : A → B by defining f(1) = 3, f(2) = 4, that is, by giving its values for all the arguments. Next we must say when two functions are to be regarded as equal. We will say that two functions f : A → B and g : C → D are equal if A = C, B = D, and f(a) = g(a) for all a ∈ A. For example, if g : {1, 2} → {3, 4} with g(1) = 3 and g(2) = 4, then f ≠ g, because the counterdomains of f and g differ.

3. With a function f : A → B we can associate a further function, called a set function, that takes subsets of the domain as its argument. In a slight abuse of notation we will denote that function by the same symbol f. Thus we write

f : P(A) → P(B)

where the set function relates a subset C ⊆ A to the unique subset

f(C) := {b ∈ B | there is an a ∈ C such that f(a) = b}

of B. In shorter notation, f(C) = {f(a) | a ∈ C}. We call f(C) the image of C under f. In particular, the image of A is called the range of the function. Obviously, f(A) ⊆ B; but, as in the example above, we may have f(A) ≠ B.

4. The set function associated with a function f : A → B always has an inverse set function defined by

f⁻¹ : P(B) → P(A)


which to each subset of the counterdomain of f relates a unique subset according to

f⁻¹(C) := {a ∈ A | f(a) ∈ C}

where C is an arbitrary element of P(B). We call f⁻¹(C) the preimage of C with respect to f. For example, if f : {1, 2} → {3, 4, 5} is defined by f(1) = 3 and f(2) = 4, then

f⁻¹({3}) = {1}, f⁻¹({4}) = {2}, f⁻¹({5}) = ∅, f⁻¹({3, 4}) = {1, 2}, f⁻¹({3, 5}) = {1}, f⁻¹({4, 5}) = {2}, f⁻¹({3, 4, 5}) = {1, 2}, f⁻¹(∅) = ∅

The union and intersection operations are preserved under inverse set functions:

f⁻¹(C ∪ D) = f⁻¹(C) ∪ f⁻¹(D)    f⁻¹(C ∩ D) = f⁻¹(C) ∩ f⁻¹(D)

where C and D are arbitrary subsets of B.

5. It should be clear that the mathematical notion of a function is fundamentally different from the use of the word in connection with purposes and aims. Even if this is fairly obvious from the definitions, we should stress, first, that functions are created by the human mind. It is the scientist who conceptualizes sets, and the scientist who constructs relations and functions between sets. Neither functions nor sets are empirical facts. Secondly, however, there is a difference between the uses of the concepts in mathematics and in statistics. In mathematics, one might create sets and functions without regard to empirical facts. In contrast, statistical methods are constructed in order to support reflections on empirical facts. Thus, in statistics, the usefulness of sets and functions will depend not only on their formal properties as such but much more on the intended meaning of the sets and functions.
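Representing a function with a finite domain as a Python `dict` (an illustrative choice, not a convention of the text), the image and preimage set functions of items 3 and 4 take one line each. The sketch checks some of the preimages listed above and verifies, for all pairs of subsets of B, that preimages preserve unions and intersections.

```python
from itertools import chain, combinations


def image(f, C):
    """f(C) = {f(a) | a in C}."""
    return {f[a] for a in C}


def preimage(f, C):
    """f^{-1}(C) = {a in A | f(a) in C}, where A is the domain (the dict keys)."""
    return {a for a in f if f[a] in C}


f = {1: 3, 2: 4}   # f: {1, 2} -> {3, 4, 5} from the example
B = {3, 4, 5}

print(preimage(f, {5}))       # set()
print(preimage(f, {3, 5}))    # {1}
print(image(f, {1, 2}) == B)  # False: the range {3, 4} is a proper subset of B

subsets = [set(c) for c in chain.from_iterable(combinations(sorted(B), r) for r in range(len(B) + 1))]
print(all(preimage(f, C | D) == preimage(f, C) | preimage(f, D)
          and preimage(f, C & D) == preimage(f, C) & preimage(f, D)
          for C in subsets for D in subsets))  # True
```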

References
Note: Sources from Official Statistics cited in the main text are not contained in this list of references.
Alt, C. 1991. Stichprobe und Repräsentativität. In: H. Bertram (Hg.), Die Familie in Westdeutschland, 497–531. Opladen: Leske + Budrich.
Anderson, R. N. 1999. Method for Constructing Complete Annual U.S. Life Tables. National Center for Health Statistics. Vital and Health Statistics, Series 2, No. 129.
Anton, H., Rorres, C. 1991. Elementary Linear Algebra. Applications Version. New York: Wiley.
Bach, W., Handl, J., Müller, W. 1980. Volks- und Berufszählung 1970. Codebuch und Grundauszählung. Mannheim: VASMA-Projekt.
Balzer, W. 1997. Die Wissenschaft und ihre Methoden. Grundsätze der Wissenschaftstheorie. München: Alber.
Baumol, W. J. 1966. Economic Models and Mathematics. In: S. R. Krupp (ed.), The Structure of Economic Science, 88–101. Englewood Cliffs: Prentice-Hall. [A German translation appeared in: H. Albert (ed.), Theorie und Realität, 153–168. Tübingen: Mohr 1972.]
Birg, H., Filip, D., Flöthmann, E.-J. 1990. Paritätsspezifische Kohortenanalyse des generativen Verhaltens in der Bundesrepublik Deutschland nach dem 2. Weltkrieg. IBS-Materialien Nr. 30. Institut für Bevölkerungsforschung und Sozialpolitik (IBS) an der Universität Bielefeld.
Blossfeld, H.-P., Huinink, J. 1989. Die Verbesserung der Bildungs- und Berufschancen von Frauen und ihr Einfluß auf den Prozeß der Familienbildung. Zeitschrift für Bevölkerungswissenschaft 15, 383–404.
Bolte, K. M., Kappe, D., Schmid, J. 1980. Bevölkerung. Statistik, Theorie, Geschichte und Politik des Bevölkerungsprozesses. Opladen: Leske + Budrich.
Borst, A. 1990. Computus. Zeit und Zahl in der Geschichte Europas. Berlin: Wagenbach. [English translation: The Ordering of Time. From the Ancient Computus to the Modern Computer. Cambridge: Polity Press 1993.]
Bortkiewicz, L. v. 1911. Sterblichkeit und Sterblichkeitstafeln. In: J. Conrad, L. Elster, W. Lexis, E. Loening (Hg.), Handwörterbuch der Staatswissenschaften, Bd. 7, 930–944. Jena: Gustav Fischer.
Bortkiewicz, L. v. 1919. Bevölkerungswesen. Leipzig: Teubner.
Brand, M. 1982. Physical Objects and Events. In: W. Leinfellner, E. Kraemer, J. Schank (eds.), Language and Ontology, 106–116. Wien: Hölder-Pichler-Tempsky.
Bürgin, G., Schnorr-Bäcker, S. 1986. ISI-“Declaration on Professional Ethics” – Internationaler Berufskodex für Statistiker aus der Sicht der Bundesstatistik. Wirtschaft und Statistik 34, 573–581.
Cantor, G. 1962. Gesammelte Abhandlungen mathematischen und philosophischen Inhalts. Hrsg. von E. Zermelo. Hildesheim: Georg Olms.
Coleman, J. S. 1968. The Mathematical Study of Change. In: H. M. Blalock, A. B. Blalock (eds.), Methodology in Social Research, 428–478. New York: McGraw-Hill.
Danto, A. C. 1985. Narration and Knowledge (including: Analytical Philosophy of History). New York: Columbia University Press. [There is a German translation of the previously written Analytical Philosophy of History: Analytische Philosophie der Geschichte. Frankfurt: Suhrkamp 1980.]
Demetrius, L. 1971. Primitivity Conditions for Growth Matrices. Mathematical Biosciences 12, 53–58.
Dinkel, R. 1983. Analyse und Prognose der Fruchtbarkeit am Beispiel der Bundesrepublik Deutschland. Zeitschrift für Bevölkerungswissenschaft 9, 47–72.
Dinkel, R. 1984. Sterblichkeit in Perioden- und Kohortenbetrachtung. Zeitschrift für Bevölkerungswissenschaft 10, 477–500.
Dinkel, R. H. 1989. Demographie. Bd. 1: Bevölkerungsdynamik. München: Verlag Franz Vahlen.
Dinkel, R. H. 1992. Kohortensterbetafeln für die Geburtsjahrgänge ab 1900 bis 1962 in den beiden Teilen Deutschlands. Zeitschrift für Bevölkerungswissenschaft 18, 96–116.
Dinkel, R. H., Meinl, E. 1991. Die Komponenten der Bevölkerungsentwicklung in der Bundesrepublik Deutschland und der DDR zwischen 1950 und 1987. Zeitschrift für Bevölkerungswissenschaft 17, 115–134.
Dinkel, R. H., Höhn, C., Scholz, R. (eds.) 1996. Sterblichkeitsentwicklung – unter besonderer Berücksichtigung des Kohortenansatzes. München: Harald Boldt Verlag.
Esenwein-Rothe, I. 1982. Einführung in die Demographie. Bevölkerungsstruktur und Bevölkerungsprozeß aus der Sicht der Statistik. Wiesbaden: Franz Steiner Verlag.
Esenwein-Rothe, I. 1992. Wilhelm Lexis. Demograph und Nationalökonom. Frankfurt: Haag + Herchen.
Feichtinger, G. 1973. Bevölkerungsstatistik. Berlin: de Gruyter.
Feichtinger, G. 1979. Demographische Analyse und populationsdynamische Modelle. Wien: Springer-Verlag.
Festy, P., Prioux, F. 2002. An Evaluation of the Fertility and Family Surveys Project. New York and Geneva: United Nations.
Fisher, R. A. 1922. On the Mathematical Foundations of Theoretical Statistics. Philosophical Transactions of the Royal Society of London. Series A, Vol. 222, 309–368.
Flaskämper, P. 1962. Bevölkerungsstatistik. Hamburg: Verlag Richard Meiner.
Fliegel, H. F., Flandern, T. C. van 1968. A Machine Algorithm for Processing Calendar Dates. Communications of the ACM 11, 657.
Frege, G. 1990. Schriften zur Logik und Sprachphilosophie. 3. Aufl., hrsg. von G. Gabriel. Hamburg: Felix Meiner.
Frey, G. 1961. Symbolische und Ikonische Modelle. In: H. Freudenthal (ed.), The Concept and the Role of the Model in Mathematics and Natural and Social Sciences, 89–97. Dordrecht: Reidel.
Fürst, G. 1972. Wandlungen im Programm und in den Aufgaben der amtlichen Statistik in den letzten 100 Jahren. In: Statistisches Bundesamt, Bevölkerung und Wirtschaft 1872–1972, 11–83. Wiesbaden: Kohlhammer.
Galton, F. 1889. Natural Inheritance. London: Macmillan.
Gantmacher, F. R. 1971. Matrizenrechnung (Teil II). Berlin: Deutscher Verlag der Wissenschaften.
Glenn, N. D. 1977. Cohort Analysis. Beverly Hills: Sage.
Hacker, P. M. S. 1982. Events and Objects in Space and Time. Mind 91, 1–19.
Hauser, P. M., Duncan, O. D. 1959. Overview and Conclusions. In: P. M. Hauser, O. D. Duncan (eds.), The Study of Population, 1–26. Chicago: University of Chicago Press.
Hendry, D. F., Richard, J.-F. 1982. On the Formulation of Empirical Models in Dynamic Econometrics. Journal of Econometrics 20, 3–33.
Höhn, C. 1984. Generationensterbetafeln versus Periodensterbetafeln. In: Neuere Aspekte der Sterblichkeitsentwicklung. Dokumentation der Jahrestagung 1983 der Deutschen Gesellschaft für Bevölkerungswissenschaft e.V., 117–143. Wiesbaden: Selbstverlag der Deutschen Gesellschaft für Bevölkerungswissenschaft e.V.
Huinink, J. 1987. Soziale Herkunft, Bildung und das Alter bei der Geburt des ersten Kindes. Zeitschrift für Soziologie 16, 367–384.
Huinink, J. 1988. Die demographische Analyse der Geburtenentwicklung mit Lebensverlaufsdaten. Allgemeines Statistisches Archiv 72, 359–377.
Huinink, J. 1989. Das zweite Kind. Sind wir auf dem Weg zur Ein-Kind-Familie? Zeitschrift für Soziologie 18, 192–207.
Huinink, J. 1998. Ledige Elternschaft junger Frauen und Männer in Ost und West. In: R. Metze, K. Mühler, K.-D. Opp (eds.), Der Transformationsprozess. Analysen und Befunde aus dem Leipziger Institut für Soziologie, 301–320. Leipzig: Leipziger Universitätsverlag.
Hullen, G. 1998. Lebensverläufe in West- und Ostdeutschland. Längsschnittanalysen des deutschen FFS. Opladen: Leske + Budrich.
Hyndman, R. J., Fan, Y. 1996. Sample Quantiles in Statistical Packages. The American Statistician 50, 361–365.
Imhof, A. E., Gehrmann, R., Kloke, I. E., Roycroft, M., Wintrich, H. 1990. Lebenserwartungen in Deutschland vom 17. bis 19. Jahrhundert (Life Expectancies in Germany from the 17th to the 19th Century). Weinheim: VCH – Acta humaniora.
International Statistical Institute 1986. Declaration on Professional Ethics. International Statistical Review 54, 227–242.
Kaplan, E. L., Meier, P. 1958. Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association 53, 457–481.
Kendall, M., Stuart, A. 1977. The Advanced Theory of Statistics, vol. 1 (4th ed.). London: Charles Griffin & Comp.
Kertzer, D. I. 1983. Generation as a Sociological Problem. Annual Review of Sociology 9, 125–149.
Keyfitz, N. 1977. Applied Mathematical Demography. New York: Wiley.
Klein, T. 1988. Mortalitätsveränderungen und Sterbetafelverzerrungen. Zeitschrift für Bevölkerungswissenschaft 14, 49–67.
Klein, T. 1989. Bildungsexpansion und Geburtenrückgang. Kölner Zeitschrift für Soziologie und Sozialpsychologie 41, 483–503.
Klein, T. 1993. Soziale Determinanten der Lebenserwartung. Kölner Zeitschrift für Soziologie und Sozialpsychologie 45, 712–730.
Knodel, J. E. 1974. The Decline of Fertility in Germany, 1871–1939. Princeton: Princeton University Press.
Knodel, J. 1975. Ortssippenbücher als Quelle für die Historische Demographie. Geschichte und Gesellschaft 1, 288–324.
Kottmann, P. 1987. Verrechtlichung und Bevölkerungsweisen im industriellen Deutschland. Historical Social Research, No. 41, 28–39.
Leibniz, G. W. 1985. Kleine Schriften zur Metaphysik. Philosophische Schriften, Band 1, hrsg. von H. H. Holz. Darmstadt: Wissenschaftliche Buchgesellschaft.
Leslie, P. H. 1945. On the Use of Matrices in Certain Population Mathematics. Biometrika 33, 183–212.
Lexis, W. 1875. Einleitung in die Theorie der Bevölkerungsstatistik. Strassburg: Trübner.
Lilienbecker, T. 1991. Konstante Migrationsströme im Modell der stabilen Bevölkerung. In: G. Buttler, H.-J. Hoffmann-Nowotny, G. Schmitt-Rink (eds.), Acta Demographica 1991, 63–80. Heidelberg: Physica-Verlag.
Lindner, F. 1900. Die unehelichen Geburten als Sozialphänomen. Ein Beitrag zur Statistik der Bevölkerungsbewegung im Königreich Bayern. Leipzig: Deichert’sche Verlagsbuchhandlung.
Lombard, L. B. 1986. Events. A Metaphysical Study. London: Routledge.
Lorimer, F. 1959. The Development of Demography. In: P. M. Hauser, O. D. Duncan (eds.), The Study of Population, 124–179. Chicago: University of Chicago Press.
Lotka, A. J. 1907. Relation Between Birth Rates and Death Rates. Science N.S. 26, 21–22.
Lotka, A. J. 1922. The Stability of the Normal Age Distribution. Proceedings of the National Academy of Sciences of the USA 8, 339–345.
Mannheim, K. 1952. The Problem of Generations. In: Essays on the Sociology of Knowledge, 276–320. London: Routledge & Kegan Paul. [This essay first appeared in German: Das Problem der Generationen. Kölner Vierteljahreshefte für Soziologie 7 (1928), 157–185, 309–330.]
Marschalck, P. 1984. Bevölkerungsgeschichte Deutschlands im 19. und 20. Jahrhundert. Frankfurt: Suhrkamp.
Matras, J. 1973. Population and Societies. Englewood Cliffs: Prentice-Hall.
Mayer, K. U., Huinink, J. 1990. Age, Period, and Cohort in the Study of the Life Course: A Comparison of Classical A-P-C-Analysis with Event History Analysis. In: D. Magnusson, L. R. Bergman (eds.), Data Quality in Longitudinal Research, 211–232. Cambridge: Cambridge University Press. [A German version of this paper appeared in: K. U. Mayer (Hg.), Lebensverläufe und sozialer Wandel, 442–459. Opladen: Westdeutscher Verlag 1990.]
Merrell, M. 1947. Time-specific Life Tables Contrasted with Observed Survivorship. Biometrics Bulletin 3, 129–136. Reprinted in: P. M. Hauser, O. D. Duncan (eds.), The Study of Population, 108–114. Chicago: University of Chicago Press 1956.
Meyer, K., Rückert, G. R. 1974. Allgemeine Sterbetafel 1970/72. Wirtschaft und Statistik, Heft 7, 465–475, 392∗–395∗.
Meyer, K., Paul, C. 1991. Allgemeine Sterbetafel 1986/88. Wirtschaft und Statistik, Heft 6, 371–381, 234∗–241∗.
Mueller, U. 1993. Bevölkerungsstatistik und Bevölkerungsdynamik. Berlin: de Gruyter.
Mueller, U. 2000. Die Maßzahlen der Bevölkerungsstatistik. In: U. Mueller, B. Nauck, A. Diekmann (Hg.), Handbuch der Demographie, Bd. 1, 1–91. Berlin: Springer-Verlag.
Mueller, U., Nauck, B., Diekmann, A. (Hg.) 2000. Handbuch der Demographie. Berlin: Springer-Verlag.
Namboodiri, K., Suchindran, C. M. 1987. Life Table Techniques and their Applications. New York: Academic Press.
Newell, C. 1988. Methods and Models in Demography. New York: Guilford Press.
Olkin, I., Gleser, L. J., Derman, C. 1980. Probability Models and Applications. New York: Macmillan Publ.
Pfeil, E. 1967. Der Kohortenansatz in der Soziologie. Ein Zugang zum Generationsproblem? Kölner Zeitschrift für Soziologie und Sozialpsychologie 19, 645–657.
Pohl, K. 1995. Design und Struktur des deutschen FFS. Materialien zur Bevölkerungswissenschaft, Heft 82a. Wiesbaden: Bundesinstitut für Bevölkerungsforschung.
Porst, R. 1996. Ausschöpfungen bei sozialwissenschaftlichen Umfragen. Die Sicht der Institute. ZUMA-Arbeitsbericht 96/07. Mannheim: Zentrum für Umfragen, Methoden und Analysen.
Pressat, R. 1972. Demographic Analysis. Methods, Results, Applications. Transl. from French by J. Matras. Foreword by N. Keyfitz. Chicago: Aldine & Atherton.
Proebsting, H. 1984. Entwicklung der Sterblichkeit. Wirtschaft und Statistik, Heft 1, 13–24, 438∗–440∗.
Richards, E. G. 1998. Mapping Time. The Calendar and its History. Oxford: Oxford University Press.
Riley, M. W. 1986. Overview and Highlights of a Sociological Perspective. In: A. B. Sørensen, F. E. Weinert, L. R. Sherrod (eds.), Human Development and the Life Course: Multidisciplinary Perspectives, 153–175. Hillsdale: Lawrence Erlbaum Ass.
Rinne, H. 1996. Wirtschafts- und Bevölkerungsstatistik. 2. Aufl. München: Oldenbourg.
Rives, N. W., Serow, W. J. 1984. Introduction to Applied Demography. London: Sage.
Rohwer, G., Pötter, U. 2001. Grundzüge der sozialwissenschaftlichen Statistik. Weinheim: Juventa.
Rohwer, G., Pötter, U. 2002a. Methoden sozialwissenschaftlicher Datenkonstruktion. Weinheim: Juventa.
Rohwer, G., Pötter, U. 2002b. Wahrscheinlichkeit. Begriff und Rhetorik in der Sozialforschung. Weinheim: Juventa.
Roloff, J., Dorbritz, J. (Hg.) 1999. Familienbildung in Deutschland Anfang der 90er Jahre. Ergebnisse des deutschen Family and Fertility Survey. Opladen: Leske + Budrich.
Rosow, I. 1978. What Is a Cohort and Why? Human Development 21, 65–75.
Rückert, G.-R. 1975. Zur Bedeutung der Veränderungen der Geburtenabstände in der Bundesrepublik Deutschland. Zeitschrift für Bevölkerungswissenschaft 1, 85–93.
Russell, B. 1996. The Principles of Mathematics (first edition 1903). London: Norton & Comp.
Ryder, N. B. 1964. The Process of Demographic Translation. Demography 1, 74–82.
Ryder, N. B. 1965. The Cohort as a Concept in the Study of Social Change. American Sociological Review 30, 843–861.
Ryder, N. B. 1968. Cohort Analysis. International Encyclopedia of the Social Sciences. Vol. 2, 546–550.
Samuelson, P. A. 1952. Economic Theory and Mathematics – An Appraisal (with Discussion). American Economic Review, Papers and Proceedings, 56–73.
Schepers, J., Wagner, G. 1989. Soziale Differenzen der Lebenserwartung in der Bundesrepublik Deutschland – Neue empirische Analysen. Zeitschrift für Sozialreform 35, 670–682.
Schimpl-Neimanns, B., Frenzel, H. 1995. 1-Prozent-Stichprobe der Volks- und Berufszählung 1970 – Datei mit Haushalts- und Familiennummern und revidierter Teilstichprobe für West-Berlin. Dokumentation der Datenaufbereitung. Mannheim: ZUMA-Technischer Bericht T95/06.
Schmid, C. 1993. Der Zugang zu den Daten der Demographie. ZUMA-Arbeitsbericht 93/07. Mannheim: Zentrum für Umfragen, Methoden und Analysen.
Schmid, C. 2000. Zugang zu den Daten der Demographie. In: U. Mueller, B. Nauck, A. Diekmann (Hg.), Handbuch der Demographie, Band 1, 476–523. Berlin: Springer-Verlag.
Schubnell, H. 1973. Der Geburtenrückgang in der Bundesrepublik Deutschland. Schriftenreihe des Bundesministers für Jugend, Familie und Gesundheit, Band 6. Bonn-Bad Godesberg: Bundesminister für Jugend, Familie und Gesundheit.
Schubnell, H., Herberger, L. 1970. Die Volkszählung am 27. Mai 1970. Wirtschaft und Statistik 22, Heft 4, 179–185.
Schütz, W. 1977. 100 Jahre Standesämter in Deutschland. Kleine Geschichte der bürgerlichen Eheschließung und der Buchführung des Personenstandes. Frankfurt: Verlag für Standesamtswesen.
Schwarz, K. 1964. Allgemeine Sterbetafel für die Bundesrepublik Deutschland 1960/62. Wirtschaft und Statistik, Heft 7.
Schwarz, K. 1973. Veränderung der Geburtenabstände und Auswirkungen auf die Geburtenentwicklung. Wirtschaft und Statistik, Heft 11, 638–641.
Schwarz, K. 1974. Die Frauen nach der Kinderzahl. Ergebnis der Volkszählung am 27. Mai 1970. Wirtschaft und Statistik 26, Heft 6, 404–410.
Shryock, H. S., Siegel, J. S. 1976. The Methods and Materials of Demography. Condensed Edition by E. G. Stockwell. New York: Academic Press.
Sivamurthy, M. 1982. Growth and Structure of Human Population in the Presence of Migration. New York: Academic Press.
Statistisches Bundesamt 1972. Bevölkerung und Wirtschaft 1872–1972. Herausgegeben anläßlich des 100jährigen Bestehens der zentralen amtlichen Statistik. Wiesbaden: Kohlhammer.
Statistisches Bundesamt 1985. Bevölkerung gestern, heute und morgen. Bearbeitet von Helmut Proebsting. Wiesbaden: Kohlhammer.
Tuma, N. B., Huinink, J. 1990. Postwar Fertility Patterns in the Federal Republic of Germany. In: K. U. Mayer, N. B. Tuma (eds.), Event History Analysis in Life Course Research, 146–169. Madison: University of Wisconsin Press.
United Nations 1958. Multilingual Demographic Dictionary. Prepared by the Demographic Dictionary Committee of the International Union for the Scientific Study of Population. English Section. New York: United Nations, Department of Economic and Social Affairs.
Wagner, M. 1996. Lebensverläufe und gesellschaftlicher Wandel: Die westdeutschen Teilstudien. ZA-Informationen 38, 20–27.
Wagner, M. 2001. Kohortenstudien in Deutschland. In: Kommission zur Verbesserung der informationellen Infrastruktur zwischen Wissenschaft und Statistik (Hg.), Wege zu einer besseren informationellen Infrastruktur. Baden-Baden: Nomos.
White, A. R. 1975. Modal Thinking. Oxford: Basil Blackwell.
Winkler, W. 1960. Mehrsprachiges demographisches Wörterbuch. Deutschsprachige Fassung bearbeitet auf der Grundlage der von einer Wörterbuchkommission der Union Internationale pour L’Etude Scientifique de la Population erstellten und von den Vereinten Nationen veröffentlichten französischen, englischen und spanischen Ausgaben. Universität Hamburg: Deutsche Akademie für Bevölkerungswissenschaft.
Würzberger, P., Störtzbach, B., Stürmer, B. 1986. Volkszählung 1987. Rechtliche Grundlagen nach dem Urteil des Bundesverfassungsgerichts vom 15. Dezember 1983. Wirtschaft und Statistik, Heft 12, 927–957.
Wunsch, G. J., Termote, M. G. 1978. Introduction to Demographic Analysis. Principles and Methods. New York: Plenum Press.
Young, C. M. 1978. Cohort Analysis of Mortality – An Historical Survey of the Literature. Working Papers in Demography No. 10. Department of Demography, Research School of Social Sciences, Australian National University, Canberra.

Name Index
Alt, C., 247
Anton, H., 262, 277
Bach, W., 185
Balzer, W., 48
Baumol, W. J., 48, 49
Birg, H., 214
Blossfeld, H.-P., 225
Bolte, K. M., 173
Borst, A., 18, 20
Bortkiewicz, L. v., 10, 103
Bosse, H. P., 169
Brand, M., 14
Bürgin, G., 44
Cantor, G., 288
Danto, A., 15
Demetrius, L., 262
Derman, C., 48
Dinkel, R. H., 13, 120, 123, 212, 217
Dorbritz, J., 241
Duncan, O. D., 10
Esenwein-Rothe, I., 30, 34
Fan, Y., 191
Feichtinger, G., 9
Festy, P., 241
Filip, D., 214
Flandern, T. C. van, 21
Flaskämper, P., 104
Fliegel, H. F., 21
Flöthmann, E.-J., 214
Frege, G., 39
Frenzel, H., 185
Frey, G., 48
Frobenius, G., 258
Galton, F., 44
Gantmacher, F. R., 258
Glenn, N. D., 35, 36
Gleser, L. J., 48
Hacker, P. M. S., 14
Handl, J., 185
Hauser, P. M., 10
Hendry, D. F., 48
Herberger, L., 184
Höhn, C., 120
Huinink, J., 36, 173, 225
Hullen, G., 241
Hyndman, R. J., 191
Imhof, A. E., 124
Kaplan, E. L., 131
Kappe, D., 173
Kendall, M., 8
Kertzer, D. I., 35
Keyfitz, N., 13
Klein, T., 138, 233
Knapp, G. F., 34
Knodel, A. E., 124
Knodel, J. E., 64
Kottmann, P., 173
Leibniz, G. W., 18
Leslie, P. H., 257
Lexis, W., 9, 34
Lilienbecker, T., 281
Lindner, F., 173
Lombard, L. B., 14
Lorimer, F., 8
Lotka, A. J., 255
Mannheim, K., 35
Marschalck, P., 64
Matras, J., 7, 173
Mayer, K. U., 36, 225
Meier, P., 131
Meinl, E., 123
Merrell, M., 118
Mueller, U., 9, 30, 165
Müller, W., 185
Namboodiri, K., 104, 106
Newell, C., 165, 213
Olkin, I., 48
Pfeil, E., 35
Pohl, K., 241
Porst, R., 224
Pötter, U., 19, 39, 98, 252
Pressat, R., 9, 165
Prioux, F., 241
Proebsting, H., 111
Richard, J.-F., 48
Richards, E. G., 20
Riley, M. W., 33
Rinne, H., 32
Rohwer, G., 19, 39, 98, 252
Roloff, J., 241
Rorres, C., 262, 277
Rosow, I., 36
Rückert, G.-R., 201
Russell, B., 39
Ryder, N. B., 35, 36, 205
Samuelson, P. A., 48
Scaliger, J., 21
Schepers, J., 138
Schimpl-Neimanns, B., 185
Schmid, C., 57, 58
Schmid, J., 173
Schnorr-Bäcker, S., 44
Scholz, R. D., 120
Schubnell, H., 184
Schütz, W., 59
Schwarz, K., 185
Shryock, H. S., 10
Siegel, J. S., 10
Sivamurthy, M., 281
Störtzbach, B., 57
Stuart, A., 8
Stürmer, B., 57
Suchindran, C. M., 104, 106
Termote, M. G., 28
Tuma, N. B., 225
Wagner, G., 138
Wagner, M., 35, 225
White, A. R., 48
Winkler, W., 7
Wunsch, G. J., 28
Würzberger, P., 57
Young, C. M., 118

Subject Index
Accounting equation, 29, 66
Age distribution, 71
  stable, 257
Age, measures of, 33
Age-period diagram, 34
Birth cohort, 35
Birth rate
  age-specific, 167, 253
  age-specific cohort, 171
  crude, 165
  cumulated cohort, 171
  general, 165
  total, 169
Calendar, 19
Cartesian product, 290
Censored observations, 132
Census, 57
Cohort, 35
  retrospective, 182
Cohort birth rate
  completed, 172
  cumulated, 171
Cross-tabulation, 76
Data matrix, 42
Death rate
  age-specific, 85, 253
  age-specific cohort, 119
  crude, 85
  standardized, 88
Demographic process, 28
Demography
  definition, 7
  formal, 9
Distribution
  conditional, 77
  statistical, 8
Distribution function, 92, 95
Duration variable, 94, 131
Event set, 96, 132
Flow quantity, 31
Frequency curve, 73
Frequency function, 45
Function
  range, 42
  counterdomain, 291
  domain, 291
  image, 291
  range, 291
  set function, 291
Gross reproduction rate, 273
Growth rate, 32, 255
  intrinsic, 257
Indicator variable, 150
Kaplan-Meier procedure, 131, 139, 227
Left truncation, 142
Lexis diagram, 34
Life table, 97
  cohort, 97
  period, 97
Mean generational distance, 279
Mean growth rate, 33
Mean life length, 90
Mean residual life, 111
Mean value, 41
  conditional, 102
Median, 94
Midyear population size, 29
Migration, 281
Model
  general notion, 49
  statistical, 51
Net reproduction rate, 274
Old age dependency ratio, 284
Parity progression rate, 213
Partition, 289
Population pyramid, 78
Power set, 290
Property set, 45
Property space, 39
  conceptual, 42
  realized, 42
  two-dimensional, 75
Quantile, 191
Rate, 31
Rate function, 96
Reference set, 40
Reproductive period, 166
Retrospective survey, 182
Risk set, 96, 132
Set function, 291
Sex ratio, 79
Society, 7
Stable population, 255
Stock quantity, 31
Structure
  statistical definition, 8
Survivor function, 95
  conditional, 120
Time axis, 13
Total birth rate, 273
Variable
  duration, 94
  logical, 38
  spatial, 43
  statistical, 39
  two-dimensional, 75