We're looking at how Kansas City, Missouri, and Kansas City, Kansas, changed between the 2000 and 2010 censuses. We decided to look at change among people who identify as 'White', 'African American', 'Asian', and 'American Indian', and especially wanted to compare white and nonwhite populations. A couple of things to note: first, the block groups do not change much between those periods, so we believe that it is valid to draw comparisons directly. Second, we're only looking at the 15 counties that make up the Kansas City limits.
This barplot compares the population of each county of Kansas City between the two years. In general, population increases across the counties, especially in Johnson County, although a few counties show little or no change over the ten years. We chose the barplot because it provides the clearest way to visually compare multiple counties across the two time frames.
We selected choropleths to represent the population distribution of each race in Kansas City over time, since the data was more interesting to display geographically than in, say, a density plot or a bar plot. For every block group, we took the population in each racial category and divided it by the total population of that block group, giving a percentage. These percentages were then divided into deciles. The choropleth shows which block groups have higher percentages of each race: block groups in red fall in the highest decile for a given race, while block groups in off-white fall in the lowest (tenth from the top) decile for that race.
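The percentage and decile steps described above can be sketched as follows. This is a minimal illustration, not the code used for the maps: the function names and the block group counts are made up, and it assumes deciles are assigned by rank, so tied values may land in neighboring deciles.

```python
def race_share(group_pop, total_pop):
    """Percentage of a block group's population in one racial category."""
    if total_pop == 0:
        return None  # an empty block group has no defined share
    return 100.0 * group_pop / total_pop


def decile_bins(values):
    """Assign each value a decile from 1 (lowest) to 10 (highest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * 10 // len(values) + 1
    return bins


# Hypothetical (category population, total population) pairs for ten block groups.
block_groups = [(5, 100), (10, 100), (15, 100), (20, 100), (25, 100),
                (30, 100), (35, 100), (40, 100), (45, 100), (90, 100)]
shares = [race_share(g, t) for g, t in block_groups]
deciles = decile_bins(shares)
```

In this sketch the block group with a 90% share lands in decile 10 (red on the map) and the one with a 5% share lands in decile 1 (off-white).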
Choropleths have an advantage over other types of graphs because they have a geographic dimension, which lets you identify clusters with certain characteristics. They also allow you to look at every block group separately. If we were to make a bar plot depicting the population change from 2000 to 2010 for each block group, we would have a very long chart; choropleths make this comparison compact.
However, in using choropleths, we must bin the data into general categories. A bar plot or similar chart would allow more accurate readings, giving us exact populations rather than ranges of population percentages. Also, if we had data for several years (instead of just two years, ten years apart), we would be able to make density plots over time, which would show trends that are not visible when comparing several choropleths side by side.
In general, we see that white presence increases in some block groups and decreases in others. However, it is important to note that white presence generally decreases in the suburbs and increases in the city center. The neighborhood where whites had a relatively low presence in 2000 was mostly African American at that time (see the African American choropleth below). As a group, African Americans are becoming more prosperous and moving out of downtown and into the suburbs, which is part of the reason for this shift.
In general, we can see that African Americans are increasing in almost every area. As mentioned earlier, this has been credited by some people to growing prosperity: as more African Americans become wealthier, they are moving out of the city center and into the suburbs.
In 2000, we see a concentration of American Indians in the northern part of the city center. However, in 2010, the population seems to have spread out, being evenly distributed across the Kansas City center.
Compared to other races, there hasn't been much of a change in the population density of American Indians in Kansas City. In the city center, we can see that there has been some migration (perhaps) from east to west: while population in the east has dropped, that in the western block groups has increased. In the outskirts of the city, the population has remained quite stable with an exception in the north, where the percentage of American Indians has increased.
From the general map, we can see a definite increase in the Asian population from 2000 to 2010. In 2000, Asians were largely concentrated in two parts of the city center: a small area just north of the center of the map, and the south-western quadrant of the zoomed-in map. In 2010, the first cluster seems to have dispersed toward the south, giving rise to Kansas City's “Chinatown,” according to tripadvisor.com and similar sites.
This could indicate that Asians have become more populous and more prosperous in Kansas City from 2000 to 2010. It is interesting to note that the Asian populations in both the suburbs and downtown have increased greatly, suggesting that this is not merely a migration of Asians from downtown to the suburbs (or vice versa) but that both young professionals and middle-aged families are moving to Kansas City, to downtown and the suburbs, respectively.
This suggests that Kansas City is a growing city with promising opportunities for people of all ages and all family sizes.
Before we started to look for relationships between income, age, and race in Kansas City, we decided to take a look at the distribution of income alone. We did not look at the distribution of age on its own as we did for income, because there was not much spread in the average ages between the block groups (i.e. the data proved not as interesting). We did, however, look at age in relation to income and race.
We decided the best way to display this was through a geographical plot using choropleths, with income split into groups in increments of $20,000. The choropleths are advantageous in that they allow you to look at every block group geographically, as part of a bigger picture, and to see at a glance which parts of the city have higher or lower incomes. A disadvantage of choropleths is that one cannot get a very accurate reading of the income. Choropleths are terrific for looking at the big picture of the income distribution across the city, but when it comes to details, it can be more difficult to tell immediately what range of income each block group has.
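The $20,000 binning can be sketched as below. This is only an illustration under assumptions: the function names are hypothetical, and it assumes bins start at $0 and include their lower edge, which the text does not specify.

```python
def income_bin(income, width=20000):
    """Return the (lower, upper) edges of the bin containing income."""
    lower = int(income // width) * width
    return lower, lower + width


def bin_label(income, width=20000):
    """Human-readable legend label for the bin containing income."""
    lower, upper = income_bin(income, width)
    return f"${lower:,} to ${upper - 1:,}"


# Hypothetical average income for one block group.
example_label = bin_label(35000)
```

Here a block group with an average income of $35,000 is shaded as "$20,000 to $39,999", which shows the trade-off noted above: the map conveys the range, not the exact value.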
Most of Kansas City and its surroundings have an income below six figures, with the exception of some blocks in the very middle of the city and some of the suburban outskirts immediately outside that area.
Density plots give a clear indication of the distribution of average income by block group for the different races. The density plot is based on block groups in which the majority of people are of a single race; for instance, the line for whites plots the average income density for block groups that are majority white. The bandwidth used for all races except American Indian was 2481, the average bandwidth, as it gives a distribution for most of the races that is smoothed enough to cut out noise while still showing the important features of the distribution, such as mode locations and spread. The bandwidth used for American Indians was 6000, the bandwidth that gave the same features as 2481 did for the other races. Because of the small sample size, a bandwidth of 2481 gave a distribution with three very large modes at $5000, $30000 and $60000 and zero density for a large space between them, which seemed unlikely.
Given a correct choice of bandwidth, the density plot, as mentioned before, is smoothed enough to cut out noise while still showing the important features of the distribution. The density plot has other advantages. Unlike histograms, which would also display much more noise, the lines of a density plot can all sit within one graph, whereas histograms would require several different plots on top of each other. Methods like strip charts could also be effective, using different colors to denote the different races on the same plot, but such a chart would be much busier and likely much harder to read.
The density plot does, however, have its disadvantages. The plot is dependent on choosing the correct bandwidth in order to best display the data, and if the choice of bandwidth is incorrect, it can distort the data by over- or under-smoothing.
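To illustrate how bandwidth changes the picture for a small sample, here is a minimal sketch of a Gaussian kernel density estimate. The kernel choice and the six incomes are assumptions made for illustration; only the bandwidths 2481 and 6000 come from the discussion above.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical average incomes for a handful of block groups (small sample).
incomes = [5000, 5500, 30000, 31000, 60000, 61000]
narrow = gaussian_kde(incomes, 2481)  # bandwidth used for most races
wide = gaussian_kde(incomes, 6000)    # bandwidth used for American Indians

# Estimated density in the gap between the clusters near $30000 and $60000.
gap_narrow = narrow(45000)
gap_wide = wide(45000)
```

With the narrower bandwidth the estimated density in the gap is essentially zero, while the wider bandwidth fills it in, which mirrors the isolated-modes problem described for the small American Indian sample.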
It seems that, on average, whites and Asians make more money than blacks and American Indians, with generally multimodal distributions indicating different levels of employment (e.g. jobs requiring a college education vs. jobs that don't).
The black line, indicating the average income density for all Kansas City block groups, follows the red line, indicating the average income density for whites, very closely. This is most likely because of the high white population in comparison to other races, as is indicated in the choropleths. Whites are the only race with a single mode for income, which occurs at about $35000 per year, and there is a significant right skew; whites also have the largest range of income, going up to $140000 per year. Asians have two modes, at $35000 and $50000, with the mode at $35000 being larger. The range for Asians is much smaller than that for whites, however, appearing to cap off at about $70000 per year. Blacks have two modes, both lower than those for whites and Asians, occurring at about $20000 and $30000 per year, although the range goes to $115000 per year, the second largest. American Indians have three modes for income, at about $5000, $25000, and $60000 per year, with a range extending to $70000 per year. The largest of the modes for American Indians occurs at $25000.
Bean plots are a good way to compare continuous data across categories like gender. These two bean plots were created from 2011 American Community Survey data giving each block group's average income and age for both males and females. The income plot has a logarithmic scale due to the skewness of the data.
Bean plots have the advantage of, unlike box plots, giving the distribution of data as well as descriptive statistics such as the mean. While both violin plots and box-percentile plots do the same, they lack one of the other useful features of the bean plot, which is that the bean plot adds the individual data points to the beans, stacking them to indicate density.
The disadvantage of bean plots compared to box plots, violin plots and box-percentile plots is that they lack information about the spread of the data – the other three types of plot mark the IQR, and box plots also mark the location of outliers. However, we find that the plotting of the data points is more useful than the marking of the IQR, and from the marking of the data points the location of the IQR can be inferred.
The average income by block group for men in Kansas City seems to be generally higher than that for women. The mean for men occurs around $40,000 per year and the mean for women around $30,000 per year, a sizeable difference. The range for men is also larger, reaching past $200,000 per year. This indicates that the average income per block group is probably influenced by gender balance.
In this plot, we look at the relationship between the average age of all people and their respective average wealth. With our linear model, there is a slight positive relationship, which suggests that higher age corresponds with higher wealth. However, our non-parametric models show flatter relationships, and given the higher accuracy of such non-parametric models, it may be best to write off any proposed relationship between age and wealth. We chose to display this data in a scatterplot because it is the easiest way to compare two quantitative variables.
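The contrast between a linear fit and a non-parametric fit can be sketched as below. This is an assumption-laden illustration: the text does not say which non-parametric models were used, so a Gaussian kernel (Nadaraya-Watson) smoother stands in for them here, and the age and income pairs are invented.

```python
import math


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kernel_smooth(xs, ys, x0, bandwidth):
    """Kernel-weighted average of ys near x0 (Nadaraya-Watson estimate)."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Hypothetical (average age, average income) pairs for six block groups.
ages = [25, 30, 35, 40, 45, 50]
incomes = [30000, 32000, 34000, 36000, 38000, 40000]

slope, intercept = least_squares(ages, incomes)
smoothed_mid = kernel_smooth(ages, incomes, 37.5, bandwidth=5.0)
```

Because the linear model commits to a single slope for the whole age range while the smoother only averages nearby points, the two can disagree on real data, which is the pattern described above.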
In this plot, we want to look at the same relationship as before, but with the added component of race. From the scatter plot, we notice that non-whites tend to be clustered a bit lower than whites, and this is confirmed with our average lines, which show that non-whites are on average a bit poorer and a bit younger than whites. We also notice that whites have a larger spread than the nonwhites, whose variance seems to be more stable.
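The average lines and the comparison of spread can be sketched with per-group summaries. The rows below are hypothetical and the function name is an assumption; the sketch only shows how such group means and variances might be computed.

```python
from statistics import mean, pvariance


def group_summary(rows):
    """Map each group to (mean age, mean income, income variance)."""
    groups = {}
    for group, age, income in rows:
        groups.setdefault(group, []).append((age, income))
    summary = {}
    for group, pairs in groups.items():
        group_ages = [a for a, _ in pairs]
        group_incomes = [i for _, i in pairs]
        summary[group] = (
            mean(group_ages),
            mean(group_incomes),
            pvariance(group_incomes),
        )
    return summary


# Hypothetical (majority group, average age, average income) rows.
rows = [
    ("white", 40, 50000),
    ("white", 44, 70000),
    ("nonwhite", 34, 30000),
    ("nonwhite", 36, 34000),
]
summary = group_summary(rows)
```

In this made-up data the white group has both the higher means and the larger income variance, matching the qualitative pattern read off the scatter plot.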