Child Gender Ratios

By | 2012-10-01

There’s a country where everybody wants to have a son. Therefore each couple keeps having children until they have a boy; then they stop. What fraction of the population is female? (You may assume the question is asked as an expectation of course, since any particular country can be anything in principle — 100% girls is possible, just not likely)

It’s (reportedly) asked as one of the Google interview questions.

The non-mathematician can’t tell you the answer, but believes that the above policy results in more boys. They’re wrong (well, wrong-ish; for practical purposes that verdict is good enough). The following answer is (reportedly) exactly what Google expects:

Assuming a random arrival pattern of boys and girls:
– half of all couples will have a boy as their first child and that is the end of that.
– if the other half, who’ve had a girl, try again, half of them will go on to have a boy and half will go on to have another girl.

So out of 100 couples, we end up with:
– 50 having one boy = 50 boys
– 25 having one girl and one boy = 25 girls and 25 boys
– 25 having two girls = 50 girls

… and so on.

The logic is good (and it is the logic I had remembered for a long time). It basically says: “any given birth has a 50% chance of being a girl”, therefore the expected number of girls in the country will be number of births * 0.5; hence the fraction of the population that is female is 50%, regardless of the stopping criteria.
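Spelled out for a single family, under the same assumption of independent 50/50 births, the argument amounts to the following short calculation. The use of $G$ and $B$ for one family's girls and boys is shorthand introduced here; a family with $n$ girls followed by a boy occurs with probability $1/2^{n+1}$.

$$E[B] = 1, \qquad E[G] = \sum_{n=0}^{\infty} \frac{n}{2^{n+1}} = 1, \qquad \frac{E[G]}{E[G+B]} = \frac{1}{1+1} = \frac{1}{2}$$

With more families both expectations scale by the same factor, so this ratio stays at one half.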

It took repeated reads of Steve Landsburg’s blog to convince me that even this cleverer answer is not correct.

The faulty assumption in the above analysis is shown by this fact:

$$\frac{E[G]}{E[G+B]} \neq E\left[\frac{G}{G+B}\right]$$

($E[\cdot]$ being the expectation operator.) That is to say that the expectation of a ratio is not necessarily equal to the ratio of the expectations.

The error is made by calculating the expectation over all arrangements of individuals instead of over all arrangements of countries.
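The difference between the two ways of averaging can be seen in a small Monte Carlo sketch. Everything here is an assumption for illustration: the function name `simulate`, its parameters, the fixed seed, and the choice of three families per country. It pools all children across all simulated countries (the "individuals" view) and, separately, averages each country's own girl fraction (the "countries" view).

```python
import random


def simulate(num_countries, families_per_country, seed=0):
    """Return (pooled girl fraction, mean per-country girl fraction).

    Hypothetical helper: every birth is an independent 50/50 draw and
    every family stops at its first boy.
    """
    rng = random.Random(seed)
    total_girls = 0
    total_children = 0
    sum_of_fractions = 0.0
    for _ in range(num_countries):
        girls = 0
        boys = 0
        for _ in range(families_per_country):
            # Each girl is followed by another attempt; the first boy stops the family.
            while rng.random() < 0.5:
                girls += 1
            boys += 1
        total_girls += girls
        total_children += girls + boys
        sum_of_fractions += girls / (girls + boys)
    pooled = total_girls / total_children
    per_country = sum_of_fractions / num_countries
    return pooled, per_country


if __name__ == "__main__":
    pooled, per_country = simulate(num_countries=50_000, families_per_country=3)
    print(f"pooled over all children: {pooled:.4f}")
    print(f"average of country ratios: {per_country:.4f}")
```

Under these assumptions the pooled figure should sit close to one half, while the average of the per-country fractions should come out noticeably lower.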

We’ll first answer a simpler question by considering a country with only one family. We calculate the expectation over all possible one-family countries. The possible arrangements of children in that one-family country are:

N  children   % girls     likelihood
-----------------------------------------------------------
0  B             0%        0.5
1  GB           50%        0.5 * 0.5
2  GGB          67%        0.5 * 0.5 * 0.5
3  GGGB         75%        0.5 * 0.5 * 0.5 * 0.5
4  GGGGB        80%        0.5 * 0.5 * 0.5 * 0.5 * 0.5
5  GGGGGB       83%        0.5 * 0.5 * 0.5 * 0.5 * 0.5 * 0.5
... etc ...
n  nG+B        n/(n+1)     0.5^(n+1)

Remember that the expected value is the sum of each value multiplied by the probability of that value. So for the specific case of the one-family country, we are simply summing up the product of columns three and four in the above table:

$$E_1\left[\frac{G}{G+B}\right] = \sum_{n=0}^{\infty} \frac{1}{2^{n+1}} \cdot \frac{n}{n+1}$$

Fortunately this is a convergent series, so has a real answer (which the mathoverflow link tells me is):

$$1 - \ln(2) \approx 30.69\%$$
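As a sanity check, the one-family sum can be added up numerically and compared with 1 − ln(2). This is only a sketch: the function name `one_family_expectation` and the cutoff of 200 terms are arbitrary choices made here, relying on the terms shrinking like 1/2^(n+1).

```python
import math


def one_family_expectation(terms=200):
    """Partial sum of (1 / 2^(n+1)) * n / (n + 1) for n = 0 .. terms - 1."""
    return sum(0.5 ** (n + 1) * n / (n + 1) for n in range(terms))


if __name__ == "__main__":
    print("partial sum:", one_family_expectation())
    print("1 - ln(2):  ", 1 - math.log(2))
```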

This should already be sufficient to convince you of the difference between the expectation of a ratio, and the ratio of expectations.

We can do the same for countries with two families; although it gets horrible looking pretty quickly:

children       % girls     likelihood
------------------------------------------------------------
B / B            0%        (0.5) * (0.5)
B / GB          33%        (0.5) * (0.5 * 0.5)
B / GGB         50%        (0.5) * (0.5 * 0.5 * 0.5)
B / GGGB        60%        (0.5) * (0.5 * 0.5 * 0.5 * 0.5)
B / GGGGB       67%        (0.5) * (0.5 * 0.5 * 0.5 * 0.5 * 0.5)
B / GGGGGB      71%        (0.5) * (0.5 * 0.5 * 0.5 * 0.5 * 0.5 * 0.5)
 ... etc ...
GB / B          33%        (0.5 * 0.5) * (0.5)
GB / GB         50%        (0.5 * 0.5) * (0.5 * 0.5)
GB / GGB        60%        (0.5 * 0.5) * (0.5 * 0.5 * 0.5)
GB / GGGB       67%        (0.5 * 0.5) * (0.5 * 0.5 * 0.5 * 0.5)
GB / GGGGB      71%        (0.5 * 0.5) * (0.5 * 0.5 * 0.5 * 0.5 * 0.5)
GB / GGGGGB     75%        (0.5 * 0.5) * (0.5 * 0.5 * 0.5 * 0.5 * 0.5 * 0.5)
 ... etc ...

Yuck. Regardless of how nasty this is getting, the expectation for two families is:

$$E_2\left[\frac{G}{G+B}\right] = \sum_{n=0}^{\infty} \frac{n+1}{2^{n+2}} \cdot \frac{n}{n+2}$$
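The step from the two-family table to a single sum can be checked numerically. The sketch below adds up the table row by row (one loop per family) and compares that with the collapsed sum, in which a total of n girls can be split between the two families in n + 1 ways. The function names and truncation limits are assumptions made here. So is the comparison value 2 ln(2) − 1 ≈ 38.63%, which comes from splitting the summand and is not a figure quoted in the post.

```python
import math


def two_family_by_rows(max_girls=200):
    """Sum (% girls * likelihood) over table rows, one loop per family."""
    total = 0.0
    for a in range(max_girls):      # girls in the first family
        for b in range(max_girls):  # girls in the second family
            children = a + b + 2    # all the girls plus one boy per family
            total += 0.5 ** children * (a + b) / children
    return total


def two_family_by_formula(terms=400):
    """Collapsed sum: n girls in total can be arranged in n + 1 ways."""
    return sum((n + 1) / 2 ** (n + 2) * n / (n + 2) for n in range(terms))


if __name__ == "__main__":
    print("row by row:   ", two_family_by_rows())
    print("single sum:   ", two_family_by_formula())
    print("2 ln(2) - 1:  ", 2 * math.log(2) - 1)
```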

I’m afraid that the mathoverflow article loses me at that point, as I have never heard of the “digamma function” it talks about. I can see what it’s doing though: it’s simply a way of converting the infinite sum expression into a direct equation.

Leaving that nightmare to the real mathematicians, for our purposes all we care about is that

$$E_k\left[\frac{G}{G+B}\right] \approx \frac{1}{2} - \frac{1}{4k} \to 50\%$$

as $k \to \infty$. Importantly though: it never quite reaches it.
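The trend can be checked without the digamma function by summing the series directly for a few values of k and comparing with 1/2 − 1/(4k). This is a sketch under stated assumptions: the function name, the cutoff of 2000 terms, and the sample values of k are choices made here. The weight `comb(n + k - 1, k - 1) / 2**(n + k)` is the probability that k families have n girls between them; for k = 1 and k = 2 it reduces to the weights in the two sums above.

```python
import math


def expected_girl_fraction(k, terms=2000):
    """Truncated sum for the expected girl fraction with k stop-on-boy families."""
    total = 0.0
    for n in range(terms):
        # Number of ways to distribute n girls across k families.
        ways = math.comb(n + k - 1, k - 1)
        # Probability of n girls in total, times the resulting girl fraction.
        total += ways / 2 ** (n + k) * n / (n + k)
    return total


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 100):
        exact = expected_girl_fraction(k)
        approx = 0.5 - 1 / (4 * k)
        print(f"k={k:>3}  series={exact:.6f}  1/2 - 1/(4k)={approx:.6f}")
```

The approximation is rough for a single family and should tighten as k grows, with the series value staying below one half throughout.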

Executive summary: The expected percentage of girls in a country of $k$ families, operating the “stop on boy” policy, is less than 50%.


There is actually an awful lot more to this problem. A fascinating follow-up guest post at Steve Landsburg’s is worth reading.
