Example: Trends in Gender

We are now equipped with enough coding skills to examine features and trends in subgroups of the U.S. population. In this example, we will look at the distribution of males and females across age groups. We will continue using the us_pop table from the previous section, but now with all of the years 2010 through 2015.

us_pop

SEX	AGE	2010	2011	2012	2013	2014	2015
0	0	3951330	3963087	3926540	3931141	3949775	3978038
0	1	3957888	3966551	3977939	3942872	3949776	3968564
0	2	4090862	3971565	3980095	3992720	3959664	3966583
0	3	4111920	4102470	3983157	3992734	4007079	3974061
0	4	4077551	4122294	4112849	3994449	4005716	4020035
0	5	4064653	4087709	4132242	4123626	4006900	4018158
0	6	4073013	4074993	4097605	4142916	4135930	4019207
0	7	4043046	4083225	4084913	4108349	4155326	4148360
0	8	4025604	4053203	4093177	4095711	4120903	4167887
0	9	4125415	4035710	4063152	4104072	4108349	4133564

... (296 rows omitted)

As we know from having examined this dataset earlier, a description of the table appears online. Here is a reminder of what the table contains.

Each row represents an age group. The SEX column contains numeric codes: 0 stands for the total, 1 for male, and 2 for female. The AGE column contains ages in completed years, but the special value 999 represents the entire population regardless of age. The rest of the columns contain estimates of the US population.
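
The coding scheme just described can be captured in a few lines of plain Python. This is only an illustrative sketch: the helper name describe_row and the label wording are assumptions made here and are not part of the us_pop table or the datascience library.

```python
# Code meanings taken from the table description above.
# The helper name describe_row is hypothetical, chosen for illustration.
SEX_LABELS = {0: "total", 1: "male", 2: "female"}
ALL_AGES = 999  # special AGE value: the entire population regardless of age

def describe_row(sex, age):
    """Return a human-readable label for a (SEX, AGE) pair."""
    age_label = "all ages" if age == ALL_AGES else f"age {age}"
    return f"{SEX_LABELS[sex]}, {age_label}"

print(describe_row(0, 999))  # total, all ages
print(describe_row(2, 0))    # female, age 0
```

Keeping the special value 999 in a named constant makes it harder to mistake for a real age when filtering rows.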

Understanding AGE = 100

As a preliminary, let’s interpret data in the final age category in the table, where AGE is 100. The code below extracts the rows for the combined group of men and women (SEX code 0) for the highest ages.

us_pop.where('SEX', are.equal_to(0)).where('AGE', are.between(97, 101))

SEX	AGE	2010	2011	2012	2013	2014	2015
0	97	68893	73274	77156	79953	83089	92377
0	98	47037	50670	54509	57015	59726	61991
0	99	32178	33636	36779	39271	41468	43641
0	100	54410	57702	61821	66189	71626	76974

Not surprisingly, the numbers of people are smaller at higher ages – for example, there are fewer 99-year-olds than 98-year-olds.

It does come as a surprise, though, that the numbers for AGE 100 are quite a bit larger than those for age 99. A closer examination of the documentation shows that it’s because the Census Bureau used 100 as the code for everyone aged 100 or more.

The row with AGE 100 doesn’t just represent 100-year-olds – it also includes those who are older than 100. That is why the numbers in that row are larger than in the row for the 99-year-olds.
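
As a quick check of this reading, the sketch below uses only the 2014 totals displayed above. It is plain Python rather than a Table operation, and the dictionary name count_2014 is chosen here for illustration.

```python
# 2014 totals (SEX code 0) copied from the table above.
count_2014 = {97: 83089, 98: 59726, 99: 41468, 100: 71626}

# Counts shrink from one single year of age to the next ...
shrinking = count_2014[97] > count_2014[98] > count_2014[99]

# ... but the AGE 100 row is larger than the AGE 99 row, because it pools
# everyone aged 100 or more instead of covering a single year of age.
jump = count_2014[100] / count_2014[99]

print(shrinking)
print(round(jump, 2))
```

The AGE 100 row holds roughly 1.7 times as many people as the AGE 99 row, which would be implausible for a single year of age but is unsurprising for a pooled "100 or more" category.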

Overall Proportions of Males and Females

We will now begin looking at gender ratios in a single year; the interactive examples below let you select the year, with 2014 as the default. First, let’s look at all the age groups together. Remember that this means looking at the rows where the “age” is coded 999. The function all_ages returns a table containing this information. There are three rows: one for the total of both genders (SEX code 0), one for males (SEX code 1), and one for females (SEX code 2).

def all_ages(year="2014"):
    us_pop_year = us_pop.select(list(us_pop.labels[:2])+[year])
    return us_pop_year.where('AGE', are.equal_to(999))
interact(all_ages, year=list(us_pop.labels[2:]));


Row 0 of all_ages contains the total U.S. population in the selected year. The United States had a population of just under 319 million in 2014.

Row 1 contains the counts for males and Row 2 for females. Compare these two rows to see that in 2014, there were more females than males in the United States.

The population counts in Row 1 and Row 2 add up to the total population in Row 0.

For comparability with other quantities, we will need to convert these counts to percents out of the total population. Let’s access the total for a particular year and name it pop_base. Then, we’ll show a population table with a Proportion column. Consistent with our earlier observation that there were more females than males, about 50.8% of the population in 2014 was female and about 49.2% was male.

def all_ages_proportion(year="2014"):
    totals = all_ages(year)
    # Row 0 is the SEX code 0 row, i.e. the total population.
    pop_base = totals.column(year).item(0)
    proportions = totals.column(year) / pop_base
    return totals.with_column('Proportion', proportions).set_format('Proportion', PercentFormatter)
interact(all_ages_proportion, year=list(us_pop.labels[2:]));
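
The arithmetic behind the Proportion column is just division by the total. The sketch below shows it in plain Python; since the all-ages counts are not reproduced in the text, it borrows the 2014 counts for age 98 that are displayed elsewhere in this section. The function name proportions is an assumption made for illustration.

```python
def proportions(counts):
    """Divide every count by the first entry (the SEX code 0 total)."""
    base = counts[0]
    return [count / base for count in counts]

# 2014 counts at age 98 from this section: total, males, females.
age_98 = [59726, 13518, 46208]
shares = proportions(age_98)

print([f"{share:.1%}" for share in shares])  # ['100.0%', '22.6%', '77.4%']
```

The first entry is always 100% because the total is divided by itself, and the male and female shares add up to it.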


Proportions of Boys and Girls among Infants

When we look at infants, however, the opposite is true. Let’s define infants to be babies who have not yet completed one year, represented in the row corresponding to AGE 0. Here are their numbers in the population. You can see that male infants outnumbered female infants.

def infants(year="2014"):
    us_pop_year = us_pop.select(list(us_pop.labels[:2])+[year])
    return us_pop_year.where('AGE', are.equal_to(0))
interact(infants, year=list(us_pop.labels[2:]));


As before, we can convert these counts to percents out of the total number of infants. The resulting table shows that in 2014, just over 51% of infants in the U.S. were male.

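def infants_proportion(year="2014"):
    infants_table = infants(year)
    # Row 0 is the SEX code 0 row, i.e. all infants.
    infants_total = infants_table.column(year).item(0)
    proportions = infants_table.column(year) / infants_total
    return infants_table.with_column('Proportion', proportions).set_format('Proportion', PercentFormatter)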
interact(infants_proportion, year=list(us_pop.labels[2:]));


In fact, it has long been observed that the proportion of boys among newborns is slightly more than 1/2. The reason for this is not thoroughly understood, and scientists are still working on it.

Female:Male Gender Ratio at Each Age

We have seen that while there are more baby boys than baby girls, there are more females than males overall. So it’s clear that the split between genders must vary across age groups.

To study this variation, we will separate out the data for the females and the males, and eliminate the row where all the ages are aggregated and AGE is coded as 999.

The tables females and males contain the data for each of the two genders.

def females(year="2014"):
    us_pop_year = us_pop.select(list(us_pop.labels[:2])+[year])
    return us_pop_year.where('SEX', are.equal_to(2)).where('AGE', are.not_equal_to(999))
interact(females, year=list(us_pop.labels[2:]));


def males(year="2014"):
    us_pop_year = us_pop.select(list(us_pop.labels[:2])+[year])
    return us_pop_year.where('SEX', are.equal_to(1)).where('AGE', are.not_equal_to(999))
interact(males, year=list(us_pop.labels[2:]));


The plan now is to compare the number of women and the number of men at each age in the selected year. Array and Table methods give us straightforward ways to do this. Both of these tables have one row for each age.

males('2014').column('AGE')

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100])

females('2014').column('AGE')

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100])

For any given age, we can get the Female:Male gender ratio by dividing the number of females by the number of males. To do this in one step, we can use column to extract the array of female counts and the corresponding array of male counts, and then simply divide one array by the other. Elementwise division will create an array of gender ratios for all the ages.

def ratios(year="2014"):
    return Table().with_columns(
        'AGE', females(year).column('AGE'),
        year + ' F:M RATIO', females(year).column(year) / males(year).column(year)
    )
interact(ratios, year=list(us_pop.labels[2:]));


You can see from the display that the ratios are all around 0.96 for children aged nine or younger. When the Female:Male ratio is less than 1, there are fewer females than males. Thus what we are seeing is that there were fewer girls than boys in each of the age groups 0, 1, 2, and so on through 9. Moreover, in each of these age groups, there were about 96 girls for every 100 boys.
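
A Female:Male ratio can also be turned into the proportion of females in an age group, which ties this observation back to the infant proportions seen earlier. The sketch below is plain Python, and the function name female_share is chosen here for illustration.

```python
def female_share(ratio):
    """Convert a Female:Male ratio into the proportion of females."""
    # With `ratio` females for every 1 male, females make up
    # ratio / (ratio + 1) of the group.
    return ratio / (ratio + 1)

print(round(female_share(0.96), 3))  # 0.49
```

A ratio of 0.96 corresponds to about 49% girls and therefore about 51% boys, in line with the infant proportions above. By the same formula, a ratio of 2 means that two thirds of the group is female.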

So how can the overall proportion of females in the population be higher than that of males?

Something extraordinary happens when we examine the other end of the age range. Here are the Female:Male ratios for people aged more than 75.

ratios("2014").where('AGE', are.above(75)).show()

AGE	2014 F:M RATIO
76	1.23487
77	1.25797
78	1.28244
79	1.31627
80	1.34138
81	1.37967
82	1.41932
83	1.46552
84	1.52048
85	1.5756
86	1.65096
87	1.72172
88	1.81223
89	1.91837
90	2.01263
91	2.09488
92	2.2299
93	2.33359
94	2.52285
95	2.67253
96	2.87998
97	3.09104
98	3.41826
99	3.63278
100	4.25966

Not only are all of these ratios greater than 1, signifying more women than men in all of these age groups, many of them are considerably greater than 1.

At ages 89 and 90 the ratios are close to 2, meaning that there were about twice as many women as men at those ages in 2014.
At ages 98 and 99, there were about 3.4 to 3.6 times as many women as men.

If you are wondering how many people there were at these advanced ages, you can use Python to find out:

males("2014").where('AGE', are.between(98, 100))

SEX	AGE	2014
1	98	13518
1	99	8951

females("2014").where('AGE', are.between(98, 100))

SEX	AGE	2014
2	98	46208
2	99	32517
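
These counts can be used to recompute the ratios by hand and to confirm that they are consistent with the SEX code 0 totals shown earlier for ages 98 and 99. The sketch below uses plain Python lists in place of Table columns, and the variable names are chosen here for illustration.

```python
# 2014 counts copied from the tables in this section.
ages = [98, 99]
male_counts = [13518, 8951]
female_counts = [46208, 32517]
total_counts = [59726, 41468]  # SEX code 0 rows for ages 98 and 99

# Pair up the counts age by age and divide, mirroring elementwise division.
fm_ratios = [f / m for f, m in zip(female_counts, male_counts)]

# Males plus females should reproduce the SEX code 0 totals.
sums_match = [m + f == t
              for m, f, t in zip(male_counts, female_counts, total_counts)]

for age, ratio in zip(ages, fm_ratios):
    print(age, round(ratio, 5))
print(sums_match)  # [True, True]
```

The recomputed ratios agree with the 2014 F:M RATIO values listed for ages 98 and 99.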

The graph below shows the gender ratios plotted against age. The blue curve shows the 2014 ratio by age.

The ratios are almost 1 (signifying close to equal numbers of males and females) for ages 0 through 60, but they start shooting up dramatically (more females than males) starting at about age 65.

That females outnumber males in the U.S. is partly due to the marked gender imbalance in favor of women among senior citizens.

def ratios_plot(year="2014"):
    return ratios(year).plot('AGE')
interact(ratios_plot, year=list(us_pop.labels[2:]));


import numpy as np
import nbinteract as nbi
def normal(mean, sd):
    '''Returns 1000 points drawn at random from N(mean, sd)'''
    return np.random.normal(mean, sd, 1000)
# Pass in the `normal` function and let user change mean and sd.
# Whenever the user interacts with the sliders, the `normal` function
# is called and the returned data are plotted.
nbi.hist(normal, mean=(0, 10), sd=(0.1, 2.0))

