Our research project covers the first seven generations of Pokemon and uses a dataset with the following variables:
Two research questions we will be tackling are:
Research Question 1: For the first research question, we start with the distribution of Pokemon by type and by the generation in which they were added, to help visualize how the two relate. A stacked bar plot is the most direct way to show this.
From this stacked bar graph, we can determine a few things about the
distribution of data in our Pokemon dataset. First, there are noticeably
more water types than any other type, and ice types are the least
common. A larger share of poison-type Pokemon also comes from
Generation 1 than from any other generation (almost half of poison
types are Generation 1).
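The counts behind a stacked bar plot like this one can be sketched as a simple cross-tabulation. The example below is a minimal illustration in Python using toy (type, generation) pairs that are made up for the purpose; it is not the code or data used for the plot above, and the variable names are our own.

```python
from collections import Counter

# Toy (primary type, generation) pairs; illustrative values only,
# NOT taken from the real dataset.
pokemon = [
    ("water", 1), ("water", 1), ("water", 3),
    ("poison", 1), ("poison", 1), ("poison", 4),
    ("ice", 2),
]

# One count per (type, generation) cell: the height of each stacked segment.
cell_counts = Counter(pokemon)

# Total bar height per type is the sum of that type's segments.
type_totals = Counter(ptype for ptype, _ in pokemon)

# Share of each type that was introduced in Generation 1.
gen1_share = {
    ptype: cell_counts[(ptype, 1)] / total
    for ptype, total in type_totals.items()
}

for ptype, total in type_totals.most_common():
    print(f"{ptype}: total={total}, gen 1 share={gen1_share[ptype]:.2f}")
```

Reading the shares alongside the totals is what lets us make statements such as "almost half of poison types are Generation 1" rather than comparing raw segment heights.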
There are many Pokemon that are pet-sized and modeled after things in the real world, but there are also some huge ones out there. Since we have data on the heights and weights of the Pokemon in this dataset, why not investigate whether a particular type boasts the largest and/or heaviest Pokemon? We use the median instead of the mean as a measure of center here because there are some notable outliers among these Pokemon, like Wailord.
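To illustrate why the median is the safer choice here, the sketch below groups toy weights by type and compares the mean with the median. The records are made-up numbers chosen so that one group contains a single very heavy outlier; they are not values from our dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Toy (type, weight in kg) records; made-up values for illustration only.
records = [
    ("water", 10.0), ("water", 20.0), ("water", 400.0),  # one heavy outlier
    ("bug", 3.0), ("bug", 5.0), ("bug", 7.0),
]

# Group the weights by type.
weights_by_type = defaultdict(list)
for ptype, weight in records:
    weights_by_type[ptype].append(weight)

# Compare both measures of center for each type.
summary = {
    ptype: {"mean": mean(ws), "median": median(ws)}
    for ptype, ws in weights_by_type.items()
}

for ptype, stats in summary.items():
    print(f"{ptype}: mean={stats['mean']:.1f} kg, median={stats['median']:.1f} kg")
```

In the toy water group the single outlier pulls the mean well above every other member of the group, while the median stays at a typical value.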
From this, we’ve gathered that bug, grass, and fairy types have the
lowest median heights and weights among all Pokemon. We’ve also
determined that dragon types have a higher median weight and a higher
median height than all other types of Pokemon. For those familiar with
Pokemon, this may be unsurprising given that there is a disproportionate
number of dragon types among Pokemon with higher base stats (which may
suggest some relationship between biometrics and base_stats).
Research Question 2: This brings us to our second research question: what makes a Pokemon strong? Though our data limits us to using base stats as a metric for strength, we can still ask: which stats are high in a strong Pokemon? Do dragon types just have “higher stats”? Are Pokemon added later stronger than those added earlier?
Following on from our first research question, we consider whether
weight and height are related to these base stats. To reduce the
influence of outliers in our data, we plot the heights and weights on a
log scale.
From these plots we observe two main clusters in each. In the plot of
weights and base stat totals, we notice one cluster at lower weights and
lower base stat totals and another at higher weights and higher base
stat totals. The plot of heights and base stat totals shows a similar
pair of clusters: one with lower heights and lower base stat totals and
one with higher heights and higher base stat totals. This may suggest
that base stat total is indeed correlated with both height and weight.
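One way to put a number on this kind of relationship is a Pearson correlation computed on the log-scaled values. The sketch below is a minimal example under our own assumptions: the weights and base stat totals are made-up numbers spanning several orders of magnitude, and `pearson` is a small helper written for this illustration, not part of the original analysis.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Toy data (made up): weights in kg and base stat totals.
weights_kg = [1.0, 10.0, 100.0, 1000.0]
base_totals = [200, 300, 400, 500]

# Log-scaling compresses the heaviest values so they no longer dominate.
log_weights = [math.log10(w) for w in weights_kg]

r_raw = pearson(weights_kg, base_totals)
r_log = pearson(log_weights, base_totals)
print(f"raw weights: r={r_raw:.3f}, log10 weights: r={r_log:.3f}")
```

In this toy data the relationship is linear in log weight, so the correlation on the log scale is higher than on the raw scale. Whether that holds for the real data is something the plots above only suggest.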
Now let’s investigate if a Pokemon’s generation has any impact on its
base stats.
At first glance, we also seem to observe a positive correlation between
a Pokemon’s generation and its base stat total. However, upon closer
inspection, we notice that this does not hold for all types.
Interestingly, a linear model fitted for each type suggests that dragon
types have “fallen off” in more recent generations. Furthermore, the
estimated slope coefficient from our linear model is the largest for
fairy types, indicating the steepest rate of change across generations.
This suggests that the relationship between biometrics, typing, and
base stats is much more nuanced than we originally thought.
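Comparing slopes across types amounts to fitting a separate least-squares line of base stat total against generation for each type. The sketch below shows one way to do that; the rows are made-up numbers chosen only to produce one negative and one positive slope, and `ols_slope` is a helper introduced for this illustration rather than the model code used above.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy rows (type, generation, base stat total); made-up values only.
rows = [
    ("dragon", 1, 600), ("dragon", 2, 580), ("dragon", 3, 560),
    ("fairy", 1, 300), ("fairy", 2, 350), ("fairy", 3, 400),
]

# Collect generations and totals separately for each type.
by_type = defaultdict(lambda: ([], []))
for ptype, generation, total in rows:
    by_type[ptype][0].append(generation)
    by_type[ptype][1].append(total)

# One slope per type: change in base stat total per generation.
slopes = {ptype: ols_slope(gens, totals) for ptype, (gens, totals) in by_type.items()}

for ptype, slope in sorted(slopes.items(), key=lambda item: item[1]):
    print(f"{ptype}: {slope:+.1f} base stat points per generation")
```

A negative slope corresponds to a type whose newer members tend to have lower totals, and the type with the largest slope is the one gaining the most per generation.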
Since legendary Pokemon tend to boast the highest base stats and are
often considered among the strongest Pokemon, we continue with a
correlation plot comparing the stat correlations of legendary and
non-legendary Pokemon:
Some of the relationships between stats are unsurprising, e.g. speed being negatively or only weakly correlated with hp, defense, and sp_def (faster Pokemon being less bulky is a common theme among Pokemon). But the correlation plot also shows some interesting things: most notably, for non-legendaries, higher HP usually comes with better attack, which is not the case for legendaries. In fact, hp doesn’t seem to be correlated with any stat when it comes to legendaries. The only pair of stats that is more strongly correlated for legendary Pokemon than for non-legendary Pokemon is attack and Special Attack.
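The comparison above boils down to computing the same pairwise correlations separately for each group. The sketch below shows that split on a handful of made-up rows; the field names (`hp`, `attack`, `speed`, `legendary`) and the `pearson` helper are assumptions made for this example and may not match the dataset's actual column names.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Toy rows with made-up stats; not taken from the real dataset.
rows = [
    {"hp": 40, "attack": 45, "speed": 90, "legendary": False},
    {"hp": 60, "attack": 65, "speed": 70, "legendary": False},
    {"hp": 80, "attack": 85, "speed": 50, "legendary": False},
    {"hp": 90, "attack": 120, "speed": 90, "legendary": True},
    {"hp": 100, "attack": 100, "speed": 100, "legendary": True},
    {"hp": 110, "attack": 120, "speed": 95, "legendary": True},
]

stats = ("hp", "attack", "speed")

# corr[is_legendary][(stat_a, stat_b)] -> correlation within that group.
corr = {}
for is_legendary in (False, True):
    group = [row for row in rows if row["legendary"] == is_legendary]
    corr[is_legendary] = {
        (a, b): pearson([row[a] for row in group], [row[b] for row in group])
        for a, b in combinations(stats, 2)
    }

for pair in combinations(stats, 2):
    print(pair, f"non-legendary={corr[False][pair]:+.2f}", f"legendary={corr[True][pair]:+.2f}")
```

Printing the two groups side by side for each pair makes it easy to spot pairs where the sign or strength of the correlation differs between legendaries and non-legendaries.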
With these analyses and our two research questions, we have discovered quite a few relationships between type, stat total, height, weight, and generation, some of which may be coincidental. To further confirm these relationships we could also conduct t-tests and explore this data again without outliers like Wailord. As noted earlier, the strength of a Pokemon is not determined by base stats alone, yet we used them because they were the simplest metric to consider and represent. Further analysis of a Pokemon’s strength requires some data from this dataset that we have omitted, namely the Pokemon’s ability and its type matchups. However, this is very difficult to do given that the definition of strength varies across contexts (battle, anime, etc.); even in a fixed context, many assumptions must be made to judge the strength of a Pokemon (perhaps its utility in a battle), such as the user’s skill and the Pokemon’s contribution to its team of 6.
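As a rough idea of what such a t-test could look like, the sketch below computes Welch's t statistic and its approximate degrees of freedom for the base stat totals of two groups. This is only an illustration under our own assumptions: the two samples are made-up numbers, the choice of Welch's unequal-variance test is ours, and a p-value would still need a t distribution from a statistics library.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variance of each group divided by its size.
    var_a = variance(sample_a) / n_a
    var_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t_stat, dof

# Toy base stat totals for two hypothetical groups (made-up values).
group_a = [600, 580, 620, 600]
group_b = [400, 420, 380, 400]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

The same function could be rerun with outliers removed to check how much a conclusion depends on Pokemon like Wailord.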