You are a data scientist working with Citi Bike NYC. You have just received notice that funding is available to build two additional bike stations in New York City.
Your boss wants to position the new bike stations in a way that will increase ridership. He wants to make bike access easier for both subscribing members (so that they continue to be returning customers) and one-off riders (in hopes of convincing them to subscribe and use the system more frequently). Additionally, your boss is particularly concerned that female ridership is low, and wants to position the new bike stations in locations that may increase the number of female riders.
Given the dataset provided below and any external information about New York City, where do you recommend placing the two new bike stations, and why?
The dataset you will work with comes from the Citi Bike NYC system data. Specifically, you only have access to a subset of the full dataset from July 8th, 2015. As such, your presentations will be judged based on your use of this portion of the Citi Bike NYC dataset only. It will not help your cause to use any other Citi Bike NYC datasets for the purposes of this competition.
The dataset contains 15 variables and 35,047 observations, and can be accessed here. The variables are described below:
First and foremost, you must use the data provided as the primary source when justifying your answer.
You are welcome to use supplementary information (e.g., maps or information about important locations in NYC) to justify your answer, but you should NOT use any additional data from Citi Bike NYC.
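As one illustration of treating the provided data as the primary source, a first pass might tabulate, for each start station, the number of trips, the share of trips by female riders, and the share by non-subscribers. The sketch below is only a starting point under stated assumptions: the column names ("start station name", "usertype", "gender"), the gender code "2" for female, and the label "Subscriber" follow the publicly documented Citi Bike trip format and are not confirmed by this brief, so check them against the variable descriptions supplied with the dataset. The function name station_summary and the toy rows are hypothetical.

```python
from collections import defaultdict

# Assumed field names and codes (not stated in this brief); adjust as needed.
STATION_KEY = "start station name"
USERTYPE_KEY = "usertype"
GENDER_KEY = "gender"
FEMALE_CODE = "2"          # assumed coding: 0 = unknown, 1 = male, 2 = female
SUBSCRIBER = "Subscriber"  # assumed label for subscribing members


def station_summary(rows):
    """Summarize trips per start station.

    `rows` is any iterable of dict-like records, e.g. csv.DictReader output.
    Returns a list of dicts sorted by trip count, busiest station first.
    """
    counts = defaultdict(lambda: {"trips": 0, "female": 0, "non_subscriber": 0})
    for row in rows:
        c = counts[row[STATION_KEY]]
        c["trips"] += 1
        # Booleans add as 0 or 1, so these lines count matching trips.
        c["female"] += str(row[GENDER_KEY]) == FEMALE_CODE
        c["non_subscriber"] += row[USERTYPE_KEY] != SUBSCRIBER

    summary = [
        {
            "station": station,
            "trips": c["trips"],
            "female_share": c["female"] / c["trips"],
            "non_subscriber_share": c["non_subscriber"] / c["trips"],
        }
        for station, c in counts.items()
    ]
    return sorted(summary, key=lambda s: s["trips"], reverse=True)


# Toy rows purely for illustration; these are NOT real observations.
toy_rows = [
    {STATION_KEY: "A", USERTYPE_KEY: "Subscriber", GENDER_KEY: "1"},
    {STATION_KEY: "A", USERTYPE_KEY: "Customer", GENDER_KEY: "0"},
    {STATION_KEY: "A", USERTYPE_KEY: "Subscriber", GENDER_KEY: "2"},
    {STATION_KEY: "B", USERTYPE_KEY: "Subscriber", GENDER_KEY: "2"},
]

summary = station_summary(toy_rows)
for entry in summary:
    print(entry)
```

With the real file, the rows could come from csv.DictReader. Stations with heavy traffic but a low female share, or a high non-subscriber share, would then be natural places to look more closely alongside a map.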
Exactly how you justify your answer is up to you. That said, we suggest the following:
Versions of this dataset have been analyzed before in other competitions. You are welcome to look into these for ideas and other information. That said, your final submission must be your own. Any team suspected of copying or plagiarism will be disqualified at the discretion of the organizers and judges.