Baseline Bias or Net Gain? 🎾
Tennis Court Position Impact on Player Performance
Introduction
Many tennis coaches use phrases like “recover to the middle,” “get back to the center,” or “always cover the middle.” This leads us to an important question: how do different areas of the court impact each player’s point-winning efficiency and play style? Understanding how court position affects a player’s scoring ability and play style can help improve their performance and strategy in a match or tournament. In this project, we use two models with relevant outputs and visualizations to determine the impacts of both ball and player placement.
Data
Data Description
The data for this project come from the Kaggle dataset Tennis ATP Tour Queens Doubles 2019 by Robert Seidl. The dataset contains information on the 2019 ATP Tour Queen’s Club Championships doubles match between Andy Murray/Feliciano Lopez and Juan Cabal/Robert Farah. At the time of the match, Juan Cabal and Robert Farah were the top-seeded doubles team in the tournament and ranked 10th in the world, while Andy Murray and Feliciano Lopez were predominantly singles players, and this tournament was their first time teaming up to play doubles.
The data comprise five datasets: ball_bounces, events, points, rallies, and serves. The ball_bounces dataset captures the (x, y) location of the ball each time it bounced on the court (161 observations); the events dataset contains information on each event where a player hit the ball, including the player who hit the ball, the stroke type, the (x, y) positions of the players, etc. (416 observations); the points dataset lists all points played in the match, including server, receiver, number of strokes, time of rally, and score (116 observations); the rallies dataset lists all of the rallies in the match with data on the server, receiver, length, and ball position (160 observations); and the serves dataset records the position of the serve in the box for each successful serve in the match (110 observations).
Data Set-Up
When creating our visualizations, we omitted all NAs from the datasets and filtered out the rows with an undefined player in the winner column, as well as the rows with an undefined player in the server column. To create our Method 1 models and visualizations, we left-joined rallies and ball_bounces. To create our EDA visualization of shot types by player, we left-joined points and events.
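The set-up steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code used in the project: the join key name `rally_id`, the helper names `left_join` and `drop_incomplete`, and the toy rows are all assumptions made for the example.

```python
def left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts on `key`, keeping every left row."""
    # Index the right table by key; one key may map to several rows.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left_rows:
        matches = index.get(row[key])
        if not matches:
            # No match: keep the left row; right-hand columns stay absent.
            joined.append(dict(row))
        else:
            for match in matches:
                # Left values win if a column name appears in both tables.
                joined.append({**match, **row})
    return joined


def drop_incomplete(rows, required):
    """Drop rows where any required column is missing or None."""
    return [r for r in rows if all(r.get(c) is not None for c in required)]


# Hypothetical toy rows; "rally_id" is an assumed key name.
rallies = [
    {"rally_id": 1, "server": "Murray", "winner": "Murray"},
    {"rally_id": 2, "server": "Cabal", "winner": None},
    {"rally_id": 3, "server": "Farah", "winner": "Lopez"},
]
ball_bounces = [
    {"rally_id": 1, "x": 2.1, "y": 10.4},
    {"rally_id": 1, "x": 3.0, "y": 1.2},
    {"rally_id": 2, "x": 5.5, "y": 7.7},
]

joined = left_join(rallies, ball_bounces, "rally_id")
clean = drop_incomplete(joined, ["server", "winner", "x", "y"])
```

In this toy example, the rally with an undefined winner and the rally with no recorded bounce are both dropped after the join, which mirrors the filtering described above.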
Key variables are defined as follows:
- x: horizontal position on court where ball landed
- y: vertical position on court where ball landed
- x_murray, y_murray, x_cabal, y_cabal, etc.: the (x, y) position on court of the players during the point
- server: player who is serving
- winner: player who won the point
- hitter: player who hit the ball, in events
- reason: reason for winning the point (ace, net [net ball], out, winner)
- type: type of hit/stroke made by player (points, topspin, volley, lob, return, slice, smash)
- strokes: the number of strokes in the rally
- totaltime: the total time of the rally (in seconds)
Exploratory Data Analysis
In tennis, holding serve is crucial, as losing serve can cost players or doubles teams a set. It is easier for players to win a game while on serve, as the server holds initial control of the point. In Figure 1, we can see that Robert Farah had the highest proportion of wins earned on his serve, followed by Feliciano Lopez, then Andy Murray, and lastly Juan Cabal. For Cabal, this suggests a potential weakness in serve and play on the baseline; in turn, Farah compensates for Cabal’s style of play by being a stronger server and baseline player.
After examining the players’ serve patterns, we analyzed player shot preferences. The “points” column in Figure 2 refers to standard groundstrokes, like forehands and backhands; we can see that the three main shots used were groundstrokes, topspin shots, and volleys. Volleys are common in doubles, as they accelerate points and force difficult returns, while topspin makes the ball dip quickly, which makes it harder to attack. Murray and Lopez hit the most groundstrokes, consistent with their being singles players with stronger baseline play than the doubles specialists (Cabal and Farah). Cabal has the greatest shot variation, while Farah recorded the most smashes.
Another key aspect of play style is rally length, and Figure 3 shows rally duration by the number of shots. Rallies with 1–2 shots were the shortest, while 3–4 and 5–6 shot rallies had similar durations. This reflects the fast pace of doubles. Rallies with 7–8 shots were longer, and those with 9+ shots had the longest durations.
Methods
Method 1: Server’s Posterior Win Probability by Court Zone
Model Breakdown
For our first method, we used a Bayesian hierarchical logistic regression model to predict the probability that a server wins a point, based on factors like the court zone where the last ball in the rally bounced, the number of shots in a rally, and the server’s identity.
Our model draws samples from the full posterior distribution of all parameters, using both fixed effects (court zones and rally length) and random effects (player-specific intercepts). So, while these random effects allow each server to have their own baseline skill level, as shown in the results below, the posterior win probabilities shown in the court heatmap reflect the average server’s expected performance in each zone. We control for individual differences in the modeling process, but we take a population-level look at court zone effectiveness.
Response Variable: \(y_i \sim \text{Bernoulli}(\text{logit}^{-1}(\mu_i))\) where
\(y_i =\) whether the server won point \(i\) (1 = win, 0 = loss)
\(\mu_i =\) linear predictor for point \(i\)
Composite Model:
\[\mu_i = \alpha + \sum_{z} \beta_z \space \cdot \text{zone}_{i, z} + \beta_{\text{rally}} \space \cdot \text{rallylength}_i + u_{\text{server}[i]}\]
\(\alpha =\) baseline intercept (absorbed into zone reference level)
\(\beta_z =\) fixed effect for court zone \(z\)
\(\text{zone}_{i, z} =\) 1 if point \(i\) happened in zone \(z\), 0 otherwise (dummy variables)
\(\beta_{\text{rally}} =\) fixed effect for rally length
\(u_{\text{server}[i]} =\) random intercept for the player serving during point \(i\)
Fixed Effects: \(\beta_z\) and \(\beta_{\text{rally}}\)
Random Effects: \(u_j \sim N(0, \sigma_u)\) for each server \(j\)
Priors: \(\beta \sim N(0, 2.5)\) and \(\sigma_u \sim Exp(1)\)
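To make the composite model concrete, the sketch below evaluates the linear predictor \(\mu_i\) for a single point and maps it to a win probability with the inverse logit. It is an illustration only: the function names, zone labels, and coefficient values are hypothetical and are not the fitted estimates. Setting `u_server` to 0 corresponds to an average server.

```python
import math


def inv_logit(mu):
    """Inverse logit: map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-mu))


def server_win_probability(alpha, beta_zone, zone, beta_rally,
                           rally_length, u_server=0.0):
    """Evaluate mu = alpha + beta_zone[zone] + beta_rally * rally_length + u_server
    and return the implied probability that the server wins the point."""
    mu = alpha + beta_zone[zone] + beta_rally * rally_length + u_server
    return inv_logit(mu)


# Hypothetical coefficients on the log-odds scale (not the fitted values).
beta_zone = {"zone_a": 1.0, "zone_b": 0.0}
beta_rally = -0.3

p_a = server_win_probability(0.0, beta_zone, "zone_a", beta_rally, rally_length=2)
p_b = server_win_probability(0.0, beta_zone, "zone_b", beta_rally, rally_length=2)
```

Because the zone indicators are dummy variables, only the coefficient of the zone where the point ended enters the sum, which is why the sketch looks up a single entry of `beta_zone`.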
Model Assumptions
- Independence across rallies: Each point is treated as conditionally independent given the court zone, rally length, and server.
- Multilevel structure: Servers have different baseline skill levels, thus assigning a random intercept to each server. Our model also assumes that player skill differences are drawn from a common distribution \(u \sim N(0, \sigma_u)\), introducing partial pooling that shrinks players with fewer observations toward the overall average.
- Weakly informative priors: \(\beta \sim N(0, 2.5)\) allows us to learn meaningful effects without encouraging extreme values while \(\sigma_u \sim Exp(1)\) allows server effects to vary but assumes that most players are close to the average (unless very strong data evidence shows otherwise).
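The shrinkage produced by partial pooling can be illustrated with a simplified analog. The sketch below uses the closed-form normal-normal case with known variances, which is not the logistic model fitted here; it only shows why a server with few observations is pulled more strongly toward the overall average. All names and numbers are hypothetical.

```python
def pooled_estimate(group_mean, n_obs, overall_mean, sigma_within, sigma_between):
    """Partial-pooling estimate for one group in a normal-normal model.

    The weight on the group's own mean grows with its number of observations,
    so sparse groups are shrunk more strongly toward the overall mean.
    """
    weight = n_obs / (n_obs + (sigma_within ** 2) / (sigma_between ** 2))
    return weight * group_mean + (1.0 - weight) * overall_mean


# Same raw group mean (0.8), different amounts of data.
few_obs = pooled_estimate(0.8, n_obs=5, overall_mean=0.5,
                          sigma_within=1.0, sigma_between=0.5)
many_obs = pooled_estimate(0.8, n_obs=500, overall_mean=0.5,
                           sigma_within=1.0, sigma_between=0.5)
```

With 5 observations the estimate sits roughly halfway between the overall mean and the group's raw mean, while with 500 observations it stays close to the raw mean.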
Model Justifications
We believe that our techniques are appropriate for our research question because logistic regression handles binary outcomes (point_won: 1 = server won the point, 0 = server lost) and works with mixed continuous and categorical predictors, which matches our data. Bayesian methods allow direct probability statements about parameters and predictions, while hierarchical modeling accounts for different servers having different abilities; not accounting for this would bias the fixed effects for the court zones.
In addition, because our dataset is relatively small (just over 100 rallies), reporting the posterior win probabilities as smoothed estimates of zone-level win probabilities accounts for both the data and potential uncertainty, which is more reliable than raw win percentages.
Lastly, using weakly informative priors on fixed effects allows for regularization but is still flexible for lower sample sizes.
Model Comparison
We considered an alternative specification that excluded rally length as a predictor. However, while court zone information is central to our research question, we found that rally length has a meaningful influence on point outcomes. Omitting rally length led to wider posterior credible intervals and greater uncertainty in court zone effects, as well as worse MCMC convergence diagnostics due to higher autocorrelation. Thus, we kept rally length in the final model to better account for rally dynamics and improve predictive stability.
Model Evaluation
Looking first at our trace plots in Figure 4, these help us determine whether the chains mixed well and fully explored the parameter space. We are looking for all the chains to bounce around a stable mean with no wild drifts or separations. In our trace plots, we see rapid, jittery fluctuations across iterations. Rapid fluctuation around a stable mean is generally what well-mixed chains look like, but trace plots alone cannot rule out poor mixing, so we also check autocorrelation.
We can look at additional diagnostics in Figure 5; because the autocorrelation drops off to near 0 by lag 5, it is another sign that our chains are mixing well and that we are getting nearly independent samples.
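The autocorrelation diagnostic can be sketched as follows. The chain here is simulated from an AR(1) process purely for illustration, and the function name `autocorrelation` is our own; it is not the routine that produced Figure 5.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of a single chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    denom = sum((v - mean) ** 2 for v in chain)
    if denom == 0:
        raise ValueError("chain is constant; autocorrelation is undefined")
    if lag == 0:
        return 1.0
    num = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Simulated AR(1) chain with moderate dependence (illustrative only).
rng = random.Random(0)
chain = [0.0]
for _ in range(4999):
    chain.append(0.5 * chain[-1] + rng.gauss(0.0, 1.0))

acf = [autocorrelation(chain, lag) for lag in range(6)]
```

For a chain like this one, the values in `acf` start at 1 for lag 0 and decay toward 0 as the lag grows, which is the pattern described above.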
Uncertainty Quantification
To quantify the uncertainty of our estimates, we can look at the posterior density plots of the court zone effects with 95% credible intervals.
Each ridge represents the posterior distribution of the effect of a court zone on the log-odds of the server winning the point, relative to the reference zone. Each peak marks the posterior mode, which sits close to the posterior mean when the distribution is roughly symmetric. For example, we can see that the Receiver Backcourt’s 95% CI is almost entirely above 0, suggesting a beneficial effect even though the interval still narrowly includes 0.
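A 95% credible interval of this kind can be read directly off the posterior draws. The sketch below computes an equal-tailed interval and the share of draws above 0 from simulated draws; the draws, the helper names, and the choice of an equal-tailed interval are assumptions for illustration, not the project's actual output.

```python
import random


def quantile(sorted_values, q):
    """Linear-interpolation quantile of pre-sorted values, with 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


def prob_positive(draws):
    """Posterior probability that the effect is greater than 0."""
    return sum(d > 0 for d in draws) / len(draws)


# Simulated draws standing in for one zone coefficient (hypothetical values).
rng = random.Random(1)
draws = [rng.gauss(1.0, 0.5) for _ in range(20000)]

lower, upper = credible_interval(draws)
share_above_zero = prob_positive(draws)
```

Reporting `share_above_zero` alongside the interval gives a direct probability statement about the sign of an effect, which is useful when an interval only narrowly includes 0.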
Method 2: Bayesian Analysis of Court Positioning & Rally Duration
Model Breakdown
For our second method, we used a Bayesian multilevel model. The aim of this methodology was to see whether there are universal favorable court positionings during rallies or whether a unique favorability existed for each player. Thus, we used a multilevel model because it can account for player-specific and court-specific variation. Further, we believed rally duration (in seconds) was an adequate response variable because it can capture the dynamic outcome of how the point was won, reflecting player-specific style and how certain court positions contribute to prolonging or shortening rallies.
In particular, we decided to make a crossed effects model with varying slopes and varying intercepts. A crossed effects model is appropriate because each rally is influenced by several players rather than by a single, predetermined player. We included both varying slopes and varying intercepts because random slopes capture player-specific variation in the x- and y-coordinate spaces, while random intercepts capture player-specific baseline behavior in the x- and y-coordinate spaces.
Response Variable: \(y_{psxyi} \sim N(\mu_{psxyi}, \sigma)\) where
\(p\) indexes the player winner of rally \(i\)
\(s\) indexes the number of strokes in the rally \(i\), including the serve
\(x\) indexes the horizontal position on the court where the final stroke of rally \(i\) landed
\(y\) indexes the vertical position on the court where the final stroke of rally \(i\) landed
\(i\) indexes the rally played
Composite Model:
\[ \mu_{psxyi} = (\alpha_0 + \beta_{p} \cdot p_{i} + \beta_{x} \cdot x_{i} + \beta_{y} \cdot y_{i} + \beta_{s} \cdot s_{i}) + (v_{p_i, x} \cdot x_{i} + v_{p_i, y} \cdot y_{i}) + (u_{p_i, x} + u_{p_i, y}) \]
Fixed Effects:
\[ (\alpha_0 + \beta_{p} \cdot p_{i} + \beta_{x} \cdot x_{i} + \beta_{y} \cdot y_{i} + \beta_{s} \cdot s_{i}) \]
Random Effects:
\[ (v_{p_i, x} \cdot x_{i} + v_{p_i, y} \cdot y_{i}) + (u_{p_i, x} + u_{p_i, y}) \]
\(u_{p_i, x} =\) random effect for player winner \(p\) intercept associated with horizontal court position
\(v_{p_i, x} =\) random effect for player winner \(p\) slope in horizontal court positioning
\(u_{p_i, y} =\) random effect for player winner \(p\) intercept associated with vertical court position
\(v_{p_i, y} =\) random effect for player winner \(p\) slope in vertical court positioning
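The composite model above can be written out as a small function that adds the fixed and random parts for one rally. This is a sketch under assumptions: the player term \(\beta_{p} \cdot p_{i}\) is treated as one coefficient per player, and every number below is hypothetical rather than a fitted estimate.

```python
def expected_rally_duration(player, x, y, strokes, fixed, random_effects):
    """Mean rally duration (seconds) implied by fixed plus random effects."""
    re = random_effects[player]
    fixed_part = (fixed["alpha_0"]
                  + fixed["beta_p"][player]      # player-specific fixed effect
                  + fixed["beta_x"] * x
                  + fixed["beta_y"] * y
                  + fixed["beta_s"] * strokes)
    random_slopes = re["v_x"] * x + re["v_y"] * y
    random_intercepts = re["u_x"] + re["u_y"]
    return fixed_part + random_slopes + random_intercepts


# Hypothetical parameter values for two players.
fixed = {"alpha_0": 2.0, "beta_p": {"Cabal": 0.1, "Lopez": -0.1},
         "beta_x": 0.2, "beta_y": 0.1, "beta_s": 0.8}
random_effects = {
    "Cabal": {"u_x": 0.05, "u_y": -0.05, "v_x": 0.3, "v_y": -0.2},
    "Lopez": {"u_x": 0.0, "u_y": 0.0, "v_x": -0.3, "v_y": 0.2},
}

mu_cabal = expected_rally_duration("Cabal", 1.0, 2.0, 4, fixed, random_effects)
```

A player's total slope in x is the sum of the shared slope and that player's random slope, so in this toy setting the first player's predicted duration rises with x while the second player's falls.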
Model Assumptions & Justifications
These effects were implemented in a Bayesian framework using Stan, allowing us to naturally quantify uncertainty and partial pooling across players. That is, since Bayesian estimation provides full posterior distributions, it improved our ability to measure uncertainty compared to frequentist models. We applied weakly informative priors to ensure reasonable regularization. \(N(0,1)\) priors were used for the fixed effects and random intercepts/slopes for player x- and y-coordinates, reflecting moderate deviations from zero. For the observation noise \(\sigma\), we used a half-Cauchy prior \(\text{Cauchy}(0,5)\), which accommodates potential heavy-tailed distributions in rally durations.
Model Comparison
Considering alternative specifications, we compared two approaches: one allowing both varying intercepts and varying slopes by player, and one with only varying intercepts. The model including random slopes provided a better fit, capturing player-specific variations in court positioning effects more accurately. Thus, we selected the random intercepts and random slopes model.
Model Evaluation
To evaluate the convergence of our model, we will analyze the R-hat ratio. If the chains are mixing well and converging to the true posterior distribution, then the R-hat ratio should be around 1.
Parameters | R-hat Ratio |
---|---|
alpha_0 | 1.0002898 |
Cabal | 0.9998986 |
Farah | 0.9997767 |
Lopez | 0.9997165 |
Murray | 0.9998379 |
stroke | 1.0001453 |
x | 1.0013648 |
y | 1.0011949 |
Random Effect Int Cabal, x | 0.9998473 |
Random Effect Int Farah, x | 0.9997950 |
Random Effect Int Lopez, x | 1.0000728 |
Random Effect Int Murray, x | 0.9998845 |
Random Effect Int Cabal, y | 0.9997617 |
Random Effect Int Farah, y | 0.9997534 |
Random Effect Int Lopez, y | 1.0002314 |
Random Effect Int Murray, y | 1.0000616 |
Random Effect Slope Cabal, x | 0.9998037 |
Random Effect Slope Farah, x | 1.0010175 |
Random Effect Slope Lopez, x | 1.0011660 |
Random Effect Slope Murray, x | 1.0013130 |
Random Effect Slope Cabal, y | 1.0014039 |
Random Effect Slope Farah, y | 1.0012631 |
Random Effect Slope Lopez, y | 1.0010657 |
Random Effect Slope Murray, y | 1.0011601 |
sigma | 1.0011525 |
As noted in Figure 7, all parameters above have R-hat ratios of approximately 1, which indicates convergence and suggests that our model’s posterior estimates are stable.
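For readers unfamiliar with the statistic, the sketch below computes a basic split R-hat from several chains of draws for one parameter. It is a simplified illustration on simulated chains, not a reproduction of Stan's exact computation, and the function name `split_rhat` is our own. Because of the finite-sample factor in the formula, values slightly below 1 are possible.

```python
import math
import random
import statistics


def split_rhat(chains):
    """Basic split R-hat for one parameter.

    Each chain is cut in half, then the between-half and within-half
    variances are compared; values near 1 suggest the chains agree.
    """
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])
    between = n * statistics.variance(means)
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


# Simulated chains: four that agree, then the same set with one chain shifted.
rng = random.Random(2)
good = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
bad = [list(c) for c in good]
bad[0] = [v + 3.0 for v in bad[0]]

rhat_good = split_rhat(good)
rhat_bad = split_rhat(bad)
```

When one chain sits in a different region, the between-chain variance inflates the ratio well above 1, which is the failure mode this diagnostic is meant to flag.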
To evaluate model performance, we compared the composite model that predicts full rally duration against isolated player-specific x-coordinate effects and isolated player-specific y-coordinate effects. This allowed us to examine predictive performance visually, ensuring model predictions aligned with observed patterns in rally duration.
Results
Method 1: Server’s Posterior Win Probability by Court Zone
The key takeaway is that the server has a 72% average win probability when the last bounce occurred in the Receiver Backcourt, the highest of all zones. So, when the server pushes the ball deep into the court, they are more likely to win the point. On the other hand, the Service Ad (53%) and Service Deuce (45%) zones showed mid-range win probabilities, reflecting rallies where the receiver may have more time to recover, which leads to more balanced results. Similarly, the Receiver Deuce zone gives the server a 55% win probability, while the Receiver Ad zone (50%) is essentially neutral.
These observations are consistent with common tennis strategy: deeper shots can force more defensive play or errors for receivers, while hitting into middle or near-net zones can give the server less strategic advantage.
Method 2: Bayesian Analysis of Court Positioning & Rally Duration
Firstly, it is important to note that this model was fit only to rallies with an identified winner. Overall, we note that, compared with the predicted full rally duration, the player-specific x-coordinate effect contributes only around 0-1 seconds. Cabal appears to have the most positive slope for his player-specific x-coordinate effect, which means that rally duration tends to increase as Cabal goes toward the sidelines of the court, suggesting that he might be less aggressive from wide positions. In contrast, Lopez shows a decrease in rally duration as he moves to the sidelines, indicating that he may be more aggressive from wide positions. Meanwhile, Farah and Murray exhibit more balanced behaviors, with their rally durations showing no strongly positive trend, implying a more consistent ability to win rallies across wide positioning on court.
When considering the player-specific depth effect (i.e., player-specific y-coordinate effect), we observe that as Cabal moves farther back on the court, rally duration decreases, suggesting that he is more aggressive than expected from deep court positions. In contrast, Lopez shows an increase in rally duration as he moves deeper, indicating that he is more consistent in baseline rallies, preferring to rally longer rather than hitting early winners. Meanwhile, Farah and Murray display moderate slopes, implying a more consistent rallying ability and a preference for baseline play.
The results from the player-specific x- and y-coordinate effects are particularly compelling, as they align with common perceptions of each player’s style. However, our model offers a more nuanced view, revealing how positioning on court influences each player’s effectiveness and preferred patterns of play.
Discussion
To answer our research question, “How do different areas of the court impact each player’s point-winning efficiency and play style?”, we built two models: one for the server’s win probability and one for rally duration. Using a Bayesian hierarchical logistic regression model in Stan, we found that when the last ball of a rally lands in the receiver’s backcourt, the server has a 72% average probability of winning that point. A Bayesian multilevel model further revealed player-specific court efficiencies: Cabal is more aggressive at the net and baseline but less so when playing wide; Farah is consistent across positions; Lopez is aggressive from wider positions with strong baseline play; Murray, similarly, shows balanced baseline strength with a mix of offense and defense.
We recognize that there were some limitations in our project. The data we used consisted of rallies from a single match at one tournament, which limits our ability to generalize our results. We could not see the individual player effects in different settings, like court materials, weather conditions, tournament style, etc. These factors could affect how long rallies last and influence other factors like ball bounces and rally dynamics. A player’s fatigue over the course of a tournament could also have a big impact on that player’s efficiency, which our model could not capture. Additionally, Farah and Cabal often play together, while Murray often plays singles. This is a limitation because our models might overestimate individual player effects: a player’s strategy, such as positioning or decision-making, might differ between singles and doubles.
For future work, if we had more data from different tournaments, we could compare models for when these players compete in singles or doubles matches, particularly for doubles teams that usually play together. When a player who usually plays singles plays doubles, they might change their strategy because they know that some areas are covered by their partner. Additionally, it would be interesting to analyze whether players change their strategy depending on whether they are paired with their usual partner. We could also look into whether players cover different zones based on match context; for example, knowing whether they cover different zones when they are losing a rally could reveal how they change their strategy under pressure.
Appendix
Additional Model Fit Diagnostics for Method 1
Inference for Stan model: anon_model.
4 chains, each with iter=10000; warmup=5000; thin=1;
post-warmup draws per chain=5000, total post-warmup draws=20000.
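Parameter | mean | se_mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | n_eff | Rhat |
---|---|---|---|---|---|---|---|---|---|---|
beta[1] | 1.02 | 0.01 | 0.54 | -0.09 | 0.67 | 1.03 | 1.38 | 2.04 | 4418 | 1 |
beta[2] | 0.23 | 0.01 | 0.66 | -1.14 | -0.20 | 0.25 | 0.68 | 1.46 | 4170 | 1 |
beta[3] | 0.10 | 0.01 | 0.62 | -1.18 | -0.31 | 0.11 | 0.52 | 1.27 | 5403 | 1 |
beta[4] | 0.11 | 0.01 | 0.56 | -1.05 | -0.25 | 0.13 | 0.49 | 1.13 | 3743 | 1 |
beta[5] | -0.22 | 0.01 | 0.59 | -1.44 | -0.60 | -0.20 | 0.17 | 0.88 | 4675 | 1 |
beta[6] | -0.31 | 0.00 | 0.09 | -0.50 | -0.37 | -0.31 | -0.25 | -0.15 | 4871 | 1 |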
Samples were drawn using NUTS(diag_e) at Mon Apr 28 12:02:33 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Taking a look at our model fit, we see that Rhat is 1 and n_eff is consistently greater than 1000 across all betas. This tells us our MCMC chains are well-mixed and converging to the same distribution, and that our model has many effective samples for each parameter, meaning our posterior summaries are precise.
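The link between n_eff and precision can be checked with a short calculation: the Monte Carlo standard error of a posterior mean is approximately the posterior standard deviation divided by the square root of the effective sample size. The sketch below applies this to the sd and n_eff values reported in the summary above; because those inputs are already rounded, the results are only expected to agree with the se_mean column after rounding. The function name `mcse` is our own.

```python
import math


def mcse(sd, n_eff):
    """Approximate Monte Carlo standard error of a posterior mean."""
    return sd / math.sqrt(n_eff)


# (sd, n_eff) pairs copied from the Stan summary above.
summary = {
    "beta[1]": (0.54, 4418),
    "beta[2]": (0.66, 4170),
    "beta[3]": (0.62, 5403),
    "beta[4]": (0.56, 3743),
    "beta[5]": (0.59, 4675),
    "beta[6]": (0.09, 4871),
}

se_mean = {name: round(mcse(sd, n_eff), 2) for name, (sd, n_eff) in summary.items()}
```

The rounded values come out at 0.01 for beta[1] through beta[5] and 0.00 for beta[6], matching the se_mean column of the summary.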