Chinese Fertility, Sex-Ratios and the 1-child policy

In this lab we will look at results from the 1990 census. A 1% sample of the census has been made available by the IPUMS International project at the University of Minnesota. We’ve created a 10% sample of this 1% sample (a 1 in 1,000 sample) and prepared it for you. The data set is quite large, covering about 200,000 households.

We saw that, at the aggregate, the 1-child policy did not cause major changes in the level of fertility. However, we may be able to detect more subtle changes. Chinese parents may have reacted to the 1-CP not by reducing the number of children but by trying to influence the sex of their children either by sex-selective abortion, hiding girls when the census take came, or trying for another child when they only had girls (taking advantage of exceptions to the 1-CP).

In this lab we look at the sex-ratio at birth and how parents decisions to have more children may have depended on the sex composition of the children they already have. Using the 1990 census, we will also try to see if Chinese couples became more strategic regarding the sex of their children after the introduction of the 1-child policy in 1979.

Our goals in this analysis are

  1. To look for signs of sex-preference based on the birth order and sex composition of the children that have been born

  2. To see if the 1-child policy may have amplified son-preference.

We will look at the following indicators:

(Parity is the number of births a woman has had. So a woman of parity 3 has had 3 births. The parity progression ratio is defined as PPRi, probability of having a child of order i+1 among women of parity i. So, if PPR3=.2 it means that 20 percent of women who have 3 children “progress” to having a 4th birth.)

Part 0. Load the data

# leave this chunk as-is, but do press the green button to load quiz info
tot = 0
answer.key = "eJytVctu2zAQvOcrFulBCeDKtWMUbYEiyAtoLkXRFOgxoKWVREQiFZKK4n59Z0nZceAeegiQyCYp7s7OzK7zRzN0a3Yf8gV9JTyP8sI6x0XIlfHj9uD4+vgob7QJ0/JXoz3hT1HBxg+ebEXFgHsmkGOvS3zBC4ZCw9TYwXNj2xIxtPHBDUXQ1vgp1k/uW1VwfPX7BY06NLSxg6MEgE44r/MZZReZBHwcbGB/ilD8jHtGSagp0qXF1ewqw82Ssmt8OqYCmbQPgiyGljSlCiqnyyHQmgsFdLKLp7wv5yhhaEOsSr3An2qdxdAofksU/UZEa7JAqgiDatsNPRg7AsUmgLOa1NoiFZYEgKioaHRbgivkUoE6hX0XP4wN1KgnBiyc8hPeXVtnclS702kZdVoe6rSMFOx0isvbiioWEmvtWk/4xjEinSCj4966wOXpjEYBMtoBRTaq7wWapV7fV+cHmi3fTLMYKbvc53Knxz7qE8ctroAWgFrbjT+doB7ljssBMATpK5LOIklnhyTJwfHNjqS4vJ3UE7Ee5Bmt7XXXt5xHP2FpdME7dDoQd32jvP7DfuIRQs9oPUgYsJzBPAZGgH2QKBmp5MKx8jyLOqN1WlZOdlttsAmiemdjKGtAQKeQ/oD+szejP0bKbjKxb6EMeeaEPTV27ewo5q1Vj7rDyJy6+QCjpFj4EJtuif+1dqFBBLEb1RxENYAz9b7zowlVFbAnQRdXP1I/yqJTiJiIBqbOR9lTsdokCv9JlrAsQBKAnC7QcJhEKvZxg4ZEHIjBrncc1fbn+55ZRc+sDj2zet1YqzQAOc4WsSGwdRgXGnYhQQedqLKpsEnd/xR39Wbirg57CwTvht3r6T2LuTw/k5PrU1mq9ZF3bap2YFMwKt28mB31YNyhK8D0NzvKsIKHA4wv7k7eHpvNVjbMX+lgOYMMKQp6o5Z5aF0pky75JrV2qatK9jYCK6c7O6OeHWaT36HnJ/mZKaLseOm95xZVShLM28gz7sHTTQi9/zKfs8lH/aB7LrXKravnsprf7d+83958dwWtVVRx8fnTR2rVKIu1LkspfS8dshz/BYXWdSM="
library(quizify)
source.coded.txt(answer.key)
df <- read.csv("/data175/china_1990_first_three_kids_by_age_and_sex.csv",
                na.string = "")


## display
head(df)
##   hhserial age1 age2 age3 sex1 sex2 sex3    educmom    educpop    agric
## 1    19000   17   NA   NA    f <NA> <NA> university university nonagric
## 2    26000    5   NA   NA    f <NA> <NA>  secondary university nonagric
## 3    50000   19   16   NA    m    f <NA>  secondary university nonagric
## 4    52000    2   NA   NA    m <NA> <NA> university university nonagric
## 5    70000   18   NA   NA    m <NA> <NA>    primary  secondary nonagric
## 6    71000    7   NA   NA    f <NA> <NA>  secondary  secondary nonagric
##   nkids
## 1     1
## 2     1
## 3     2
## 4     1
## 5     1
## 6     1
## for convenience we assign the contents of this data.frame to
## individual vectors.
sex1 <- df$sex1
sex2 <- df$sex2
sex3 <- df$sex3
age1 <- df$age1
age2 <- df$age2
age3 <- df$age3
nkids <- df$nkids

Note: We ignore educmom, educpop, and agric for this lab.

An “NA” means “not applicable” or “not available”. Here an “NA” in age2 means there was only 1 child, and an “NA” in age3, following a number for age2 means there were only 2 children. (The same applies to NAs in sex2 and sex3).

We have listed only the first 3 births (omitting all higher order births). We have also dropped all of the households with zero children.

Some additional considerations are

Young children (particularly those younger than school age) are thought to be underreported in the Chinese census. This underreporting is thought in part to be due to trying to avoid the fines associated with the 1-child policy.

We are looking at the sexes of surviving children. We don’t see those who were born and then who died.

Q0.1 Household number 50000 has

A. 19 boys and 16 girls

B. A 19 year old boy and a 16 year old girl ever born.

C. A 19 year old girl and a 16 year old boy ever born.

D. A 19 year old boy and a 16 year old girl reported in the census as living in their household.

##  "Replace the NA with your answer (e.g., 'A' in quotes)"
answer0.1 = 'D'
quiz.check(answer0.1)
## Your  answer0.1 : D
## Correct.
## Explanation:  Both 'C' and 'D' are consistent with the data. But because these are the results of a household census, 'D' is correct. We don't actually know anything about any other children that may or may not have been ever born.
## Write your own code
## df[df$hhserial == somenumber,]
df[df$hhserial == 50000,] # looks up that particular household serial
##   hhserial age1 age2 age3 sex1 sex2 sex3   educmom    educpop    agric
## 3    50000   19   16   NA    m    f <NA> secondary university nonagric
##   nkids
## 3     2

Part 1. The proportion female (by age and birth order)

We begin by looking at the proportion female πf by age in 1990 and by birth order.

Our hypotheses is that the 1-child policy might have caused the proportion of female births to decline. It may have also encouraged underreporitng of surviving girls. Differential mortality and infanticide are not thought to be major causes of sex differences in surviving children.

Let’s first look at the proportion female πf by age in 1990

get.prop.female <- function(sex)
{
    ## this function takes a vector of sexes ("m" and "f")
    ## and computes the proportion female
    round(prop.table(table(sex[!is.na(sex)]))["f"],4)
    ## Technical note: this is the original version of this function, which
    ## should now work fine since all the empty strings "" in the
    ## 'sex' variable have been read-in as "NA".
}

## usage example:

pi.1 <- get.prop.female(sex = sex1)     # returns proportion oldest
print(pi.1)                             # children (1st births that
##      f 
## 0.4751
                                        # are girls


age.vec = 0:15                          # this is a vector of ages in
                                        # 1990 that we are interested
                                        # in. We loop through these
                                        # values.

pi.1.by.age.vec <- NULL             # initialize a vector for storing
                                        # proportion female (pi) of
                                        # 1st births by age of oldest
                                        # child (1st birth)

## now we loop through the ages and get proportion female by age in 1990

for (i in 1:length(age.vec))
{
    this.age = age.vec[i]
    pi.1.by.age.vec[i] <- get.prop.female(sex1[age1 == this.age])
}

print(pi.1.by.age.vec)
##  [1] 0.4940 0.4801 0.4989 0.4849 0.4922 0.4925 0.4726 0.4760 0.4712 0.4776
## [11] 0.4638 0.4720 0.4704 0.4681 0.4645 0.4678

And plot results

plot(age.vec, pi.1.by.age.vec,
     type = "l",
     ylim = c(.4, .6))
abline(v = 10, lty = 2) # approximate begining of 1-CP (1990 - 10 =
                        # 1980 = about 9 months after 1-CP introduction
                        # in 1979
abline(h = .5, lty = 2)

Interesting. It looks like the proportion of 1st births that are girls is increasing as we go forward in time (The figure shows decreases with age in 1990, but the older children were born earlier.) This is the opposite of what we expected.

Q1.1 We were expecting …

A. πf to increase with age, because fewer girls would be born (or reported) after the 1-CP.

B. πf to decrease with age, because fewer girls would be born (or reported) after the 1-CP.

C. No change in πf because π-day was last week and π always equals 3.14…

##  "Replace the NA with your answer (e.g., 'A' in quotes)"
answer0.2 = 'B'
quiz.check(answer0.2)
## Your  answer0.2 : B
## Correct.
## Explanation:  'B' is correct because fewer girls (relative to boys) would 
## .reduce pi_f.

Let’s see if the same suprising trend holds true for 2nd births.

We have copied and pasted the relevant code above to calculate “pi.2.by.age.vec” and plot this on the same graph using the following commands. Insert your name using the text function in the upper left. For the graded questions you will submit a copy of this graph.

pi.2.by.age.vec <- NULL             # initialize a vector for storing
                                        # proportion female (pi) of
                                        # 1st births by age of oldest
                                        # child (1st birth)

## now we loop through the ages and get proportion female by age in 1990
for (i in 1:length(age.vec))
{
    this.age = age.vec[i]
    pi.2.by.age.vec[i] <- get.prop.female(sex2[age2 == this.age])
}
print(pi.2.by.age.vec)
##  [1] 0.4512 0.4523 0.4559 0.4715 0.4629 0.4449 0.4432 0.4423 0.4477 0.4686
## [11] 0.4568 0.4524 0.4641 0.4668 0.4475 0.4665
plot(age.vec, pi.1.by.age.vec,
     type = "l",
     ylim = c(.4, .6))
lines(age.vec, pi.2.by.age.vec,
      type = "l", col = "red")
abline(v = 10, lty = 2) # approximate begining of 1-CP
text(2, .58, "graph by \n d. suryakusuma")
 text(x = 5, y = .42, "2nd births", col = "red")
 text(x = 5, y = .5, "1st births", col = "black")
 abline(h = .45, lty = 3, col = "red")

Wow. Now we see that 2nd births are less likely than 1st births to be girls (and more likely to be boys). This effect is strongest for the younger children, born after the 1-CP.

Q1.2 The 1-CP appears to have …

A. Reduced the proportion female of 1st births.

B. Reduced the proportion female of 2nd births.

C. Reduced the proportion female of 2nd children reported in the census

D. Done nothing. There is no trend in π2 by age.

E. It’s complicated. In order to see something, we need to consider the difference in proportion female between 1st and 2nd births, before and after the 1 child policy.

##  "Replace the NA with your answer (e.g., 'A' in quotes)"
answer0.3 = 'E'
quiz.check(answer0.3)
## Your  answer0.3 : E
## Correct.
## Explanation:  'E' We can see there is a growing gap between the proportion female in 1st and 2nd births as we get to younger children born after the 1CP. But the main cause seems to be the increase in proportion female of 1st births. Any ideas of how to interpret this?

Sex ratio by birth order (and age)

To finish up this part of the analysis let’s look to see

  1. If the proportion female declines with birth order (without accounting for age)

  2. If the proportion female declines with birth order and with the 1-CP (accounting for age)

Without age:

print("sex1")
## [1] "sex1"
pi.1 = get.prop.female(df$sex1)
print(pi.1)
##      f 
## 0.4751
print("sex2")
## [1] "sex2"
pi.2 = get.prop.female(df$sex2)
print(pi.2)
##      f 
## 0.4567
print("sex3")
## [1] "sex3"
pi.3 = get.prop.female(df$sex3)
print(pi.3)
##      f 
## 0.4337
## insert command for pi.3 = in next line

# print(pi.3) # un-comment this line to print

Q1.3 What happens to the proportion female by birth order.

A. It falls consistently, indicating more sex-selection in favor of sons for higher-order births.

B. It falls consistently, indicating more sex-selection (or under-reporting) in favor of sons for higher-order births.

C. There is no clear pattern

D. It falls from 1st to 2nd births, but rises for 3rd births.

##  "Replace the NA with your answer (e.g., 'A' in quotes)"
answer0.4 = 'B'
quiz.check(answer0.4)
## Your  answer0.4 : B
## Correct.
## Explanation:  'B' is correct. Because this is a census, the sex ratio could also be influenced by reporting practices. However, it's not clear why the incentive not to report higher order births would differ by sex. So, perhaps this is evidence of sex-selective abortion. See https://en.wikipedia.org/wiki/Sex-selective_abortion#China for 1986 law forbidding sex-selection.

With age:

Here we cut-and-paste the code above to create “pi.2.by.age.vec” and “pi.3.by.age.vec”

# initialize a vector for storing proportion female (pi) of 1st births by age of oldest child (1st birth)
pi.1.by.age.vec <- NULL
pi.2.by.age.vec <- NULL
pi.3.by.age.vec <- NULL

## now we loop through the ages and get proportion female by age in 1990
for (i in 1:length(age.vec))
{
    this.age = age.vec[i]
    pi.1.by.age.vec[i] <- get.prop.female(sex1[age1 == this.age])
    pi.2.by.age.vec[i] <- get.prop.female(sex2[age2 == this.age])
    pi.3.by.age.vec[i] <- get.prop.female(sex2[age3 == this.age])
}
print(pi.2.by.age.vec)
##  [1] 0.4512 0.4523 0.4559 0.4715 0.4629 0.4449 0.4432 0.4423 0.4477 0.4686
## [11] 0.4568 0.4524 0.4641 0.4668 0.4475 0.4665
print(pi.3.by.age.vec)
##  [1] 0.5620 0.5722 0.5371 0.5470 0.5172 0.5050 0.5555 0.5276 0.5387 0.5402
## [11] 0.5481 0.5239 0.5199 0.5083 0.5158 0.5297
## insert code here for plotting pi.1.age.vec, pi.2.by.age.vec, and
## pi.3.age.vec all on same graph (with pi.3.age.vec in "blue")

plot(age.vec, pi.1.by.age.vec,
     type = "l",
     ylim = c(.4, .6))
lines(age.vec, pi.2.by.age.vec,
      type = "l", col = "red")
lines(age.vec, pi.3.by.age.vec,
      type = "l", col = "blue")
abline(v = 10, lty = 2) # approximate begining of 1-CP
text(2, .59, "graph by \n d. suryakusuma")
 text(x = 5, y = .55, "3rd births", col = "blue")
 text(x = 5, y = .42, "2nd births", col = "red")
 text(x = 5, y = .48, "1st births", col = "black")
 abline(h = .45, lty = 3, col = "red")

Q1.4 What does adding 3rd births do to the story?

A. Not much. The pattern is too noisy.

B. It re-inforces what we’ve already found, that there is sex-selection for higher births that might have increased after 1-CP

C. It contradicts what we’ve already found.

D. Nothing. pi is always 3.14 ….

For discussion (No auto-correct answers available.)

Part 2. A closer look. How the decision to have another child

depends on sex of children already born.

The evidence we have looked at above is fairly indirect, averaging across all families. We saw that the higher order births were more likely to be boys than girls.

We can go further and look at variation between households. We can take advantage of the fact that the first child is sometimes a boy and sometimes a girl and see if those who have a girl are more likely to “try again” for a boy.

Parity progression to 2, based on sex of birth 1.

We now calculate PPR1 (the progression from parity 1 to a second birth). We do this twice, once for those whose first child was a boy and once for those whose first child was a girl. We include age of the 1st child so that we take into account that it takes time to have a conceive and give birth to a 2nd child.

We show how to do this in full for 2nd births. For your assignment you will look at 3rd births.

get.ppr.by.parity <- function(nkids, parity)
{
    num = sum(nkids > parity)          # number of hh in nkids vec that
                                        # have another birth

    denom = length(nkids)               # number of hh in nkids vector
    ppr = num/denom
    return(ppr)
}


ppr1.sex1f.vec = NULL               # ppr for families with sex1 = "f"
ppr1.sex1m.vec = NULL               # ppr for families with sex1 = "m"

for (i in 1:length(age.vec))
{
    this.age = age.vec[i]
    ppr1.sex1f.vec[i] <- get.ppr.by.parity(nkids[sex1 == "f" &
                                                 age1 == this.age],
                                           parity = 1)
    ppr1.sex1m.vec[i] <- get.ppr.by.parity(nkids[sex1 == "m" &
                                                 age1 == this.age],
                                           parity = 1)
}

Now let’s plot

plot(age.vec, ppr1.sex1f.vec, #female
     col = "pink", lwd = 3, type = "l",
     ylab = "ppr1",
     xlab = "age1",
     main = "Parity progression from 1 to 2, by sex of 1st child")
lines(age.vec, ppr1.sex1m.vec, col = "blue", lwd = 3, type = "l") #male
abline(v = 10, lty = 2)