Tartan Data Science Cup
Episode II: Analytics Strikes Back!
Understanding if a customer is likely to buy a particular product is an important factor that drives decisions at Kroger.
Using the customer purchase history data provided here, your team is asked to predict which customers will purchase eggs in the following week.
Specifically, your team should answer the following questions in your report:
- What factors drive an increase or decrease in the likelihood of purchasing eggs?
- What actions can be taken by 84.51° to increase the probability that a household buys eggs in the next week?
Additional information on the problem can be found here.
Please create a two-column .csv file that contains the household_key and probability of purchasing eggs in the next week for each of the 967 households in the training dataset. A template for your submission can be found here.
IMPORTANT: When submitting your predictions, your file must match the exact format of the template file:
- All predictions are in a single .csv file
- Your file has the same number of rows as the template
- Your file has the same column names as the template
- The name of your file is your team name
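The formatting requirements above can be checked while writing the file. The following is a minimal sketch in Python, not an official tool: the function name `write_submission`, the `probability` column header, and the sorted row order are assumptions made for illustration. Copy the real header and row order from the template file.

```python
import csv
from pathlib import Path

# Assumed column names: "household_key" comes from the task description, but
# "probability" is a placeholder. Copy the real header from the template.
COLUMNS = ("household_key", "probability")


def write_submission(predictions, team_name, out_dir=".", expected_rows=None):
    """Write {household_key: probability} to <team_name>.csv and return the path."""
    # Optionally check the row count against the template (e.g. 967 households).
    if expected_rows is not None and len(predictions) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(predictions)}")
    # Every prediction must be a valid probability.
    for key, prob in predictions.items():
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability for household {key} is outside [0, 1]")

    path = Path(out_dir) / f"{team_name}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        # Rows are sorted by household_key here; this ordering is an assumption.
        for key in sorted(predictions):
            writer.writerow([key, predictions[key]])
    return path
```

For example, `write_submission(preds, "my_team", expected_rows=967)` would write `my_team.csv` to the current directory, where `preds` is a hypothetical dictionary mapping each household_key to its predicted probability.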
You may not use any data sources other than the dataset provided above. Exactly how you justify your answer is up to you. That said, we suggest the following:
- Use graphics / data visualization
- When appropriate, incorporate the results of statistical models/tests
- Provide detailed descriptions of the methodology used, but be concise
Submissions: Each team should submit a single .zip file that contains:
- a single .csv file with your predictions, named [team_name].csv, following the template available here
- a 1-2 page report describing the key results and methods used to analyze the data (submitted as a .pdf file)
- up to 3 slides for a 5-minute research presentation (submitted as a .pdf file)
- all (well-documented!) code used to analyze the data, obtain results, create graphics, etc. (any programming language or software is acceptable)
Submission constitutes permission to post (anonymized) winning team entries online.
Finalists: Eight teams will make the finals, based on the following criteria:
- The top 25 teams based on prediction accuracy, using the Brier Score of their predicted probabilities, will be identified.
- From the top 25 teams, eight teams will be chosen based on a combination of their prediction accuracy and the quality of their reports, slides, and/or code.
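For teams that want to evaluate their own models before submitting, the Brier Score mentioned above can be sketched as follows. This uses the standard definition for binary outcomes, the mean squared difference between each predicted probability and the observed 0/1 outcome, where lower is better. The function name `brier_score` is illustrative, and the exact scoring procedure used by the organizers is not specified here.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    if not probabilities:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Perfect predictions score 0.0; always predicting 0.5 scores 0.25.
perfect = brier_score([1.0, 0.0], [1, 0])
uninformed = brier_score([0.5, 0.5], [1, 0])
```

Because a constant prediction of 0.5 always scores 0.25, a model is only useful under this metric if it scores below that on held-out weeks.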