Baseline Bias or Net Gain? 🎾

Tennis Court Position Impact on Player Performance

Authors

Jessica Qiu, Arshriya Koul, Audrey Soetanto, Katie Rock

Introduction

Many tennis coaches use phrases like “recover to the middle,” “get back to the center,” or “always cover the middle.” This leads us to an important question: how do different areas of the court impact each player’s point-winning efficiency and play style? Understanding how court position affects a player’s scoring ability and play style can help improve performance and strategy in a match or tournament. In this project, we use two models, with supporting outputs and visualizations, to estimate the impacts of both ball and player placement.

Data

Data Description

The data for this project come from the Kaggle dataset Tennis ATP Tour Queens Doubles 2019 by Robert Seidl. The dataset contains information on the 2019 ATP Tour Queen’s Club Championships doubles match between Andy Murray/Feliciano Lopez and Juan Cabal/Robert Farah. At the time of the match, Juan Cabal and Robert Farah were the top seeded doubles team in the tournament and ranked 10th in the world, while Andy Murray and Feliciano Lopez were predominantly singles players, and this tournament was their first time teaming up to play doubles.

The data comprise five datasets: ball_bounces, events, points, rallies, and serves. The ball_bounces dataset records the (x, y) location of the ball each time it bounced on the court (161 observations); the events dataset records each time a player hit the ball, including the player who hit it, the stroke type, the (x, y) positions of the players, etc. (416 observations); the points dataset lists all points played in the match, including server, receiver, number of strokes, time of rally, and score (116 observations); the rallies dataset lists all of the rallies in the match with data on the server, receiver, length, and ball position (160 observations); and the serves dataset records the position of the serve in the box for each successful serve in the match (110 observations).

Data Set-Up

When creating our visualizations, we omitted all NA values from the datasets and removed rows with an undefined player in the winner column or in the server column. To create our Method 1 models and visualizations, we left-joined rallies and ball_bounces. To create our EDA visualization of shot types by player, we left-joined points and events.
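The set-up steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the project's actual code: the join key `rallyid`, the placeholder value `__undefined__`, and the toy rows are all assumptions made for the example.

```python
# Toy stand-ins for the rallies and ball_bounces tables (hypothetical rows).
# The key name "rallyid" and the marker "__undefined__" are assumptions.
rallies = [
    {"rallyid": 1, "server": "Murray", "winner": "Lopez", "strokes": 3},
    {"rallyid": 2, "server": "__undefined__", "winner": "Cabal", "strokes": 1},
    {"rallyid": 3, "server": "Farah", "winner": None, "strokes": 5},
    {"rallyid": 4, "server": "Cabal", "winner": "Farah", "strokes": 2},
]
ball_bounces = [
    {"rallyid": 1, "x": 2.1, "y": 20.5},
    {"rallyid": 4, "x": 6.4, "y": 3.2},
    {"rallyid": 4, "x": 5.0, "y": 22.0},
]


def left_join(left, right, key):
    """Keep every left row; attach each matching right row, or None-filled columns."""
    right_cols = {col for row in right for col in row if col != key}
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        # A left row with no match still appears once, with missing right columns.
        matches = by_key.get(lrow[key], [dict.fromkeys(right_cols)])
        for rrow in matches:
            merged = dict(lrow)
            merged.update({col: rrow.get(col) for col in right_cols})
            joined.append(merged)
    return joined


def is_complete(row, undefined="__undefined__"):
    """True when no value is missing and both server and winner are real players."""
    if any(value is None for value in row.values()):
        return False
    return row["server"] != undefined and row["winner"] != undefined


joined = left_join(rallies, ball_bounces, "rallyid")
clean = [row for row in joined if is_complete(row)]
```

In this toy data, rallies 2 and 3 are dropped (undefined server, missing winner and bounce), while rally 4 contributes one row per recorded bounce.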

Key variables are defined as follows:

x: horizontal position on court where ball landed
y: vertical position on court where ball landed
x_murray, y_murray, x_cabal, y_cabal, etc.: the (x, y) position on court of the players during the point
server: player who is serving
winner: player who won the point
hitter: player who hit the ball, in events
reason: reason for winning the point (ace, net [net ball], out, winner)
type: type of hit/stroke made by player (points, topspin, volley, lob, return, slice, smash)
strokes: the number of strokes in the rally
totaltime: the total time of the rally (in seconds)

Exploratory Data Analysis

Figure 1: Stacked bar chart of proportion of points won by each player while serving

In tennis, holding serve is crucial, as losing serve can cost players or doubles teams a set. It is easier to win a game while on serve because the server controls the start of the point. In Figure 1, we can see that Robert Farah won the highest proportion of points on his serve, followed by Feliciano Lopez, then Andy Murray, and lastly Juan Cabal. For Cabal, this may indicate a weakness in serve and baseline play; in turn, Farah may compensate for Cabal’s style of play by being a stronger server and baseline player.

Figure 2: Distribution of shot types (groundstrokes, topspin, volleys) used by each player

After examining the players’ serve patterns, we analyzed player shot preferences. The “points” column in Figure 2 refers to standard groundstrokes, like forehands and backhands; we can see that the three main shots used were groundstrokes, topspin shots, and volleys. Volleys are common in doubles, as they accelerate points and force difficult returns, while topspin makes the ball dip sharply, which makes it harder to attack. Murray and Lopez hit the most groundstrokes, consistent with their background as singles players, who tend to have stronger baseline play than doubles specialists (Cabal and Farah). Cabal showed the most shot variation, while Farah recorded the most smashes.

Figure 3: Boxplots of rally duration (in seconds) categorized by number of rally shots

Another key aspect of play style is rally length, and Figure 3 shows rally duration by the number of shots. Rallies with 1–2 shots were the shortest, while 3–4 and 5–6 shot rallies had similar durations. This reflects the fast pace of doubles. Rallies with 7–8 shots were longer, and those with 9+ shots had the longest durations.

Methods

Method 1: Server’s Posterior Win Probability by Court Zone

Model Breakdown

For our first method, we used a Bayesian hierarchical logistic regression model to predict the probability that a server wins a point, based on factors like the court zone where the last ball in the rally bounced, the number of shots in a rally, and the server’s identity.

Our model draws samples from the full posterior distribution of all parameters, using both fixed effects (court zones and rally length) and random effects (player-specific intercepts). So, while these random effects allow each server to have their own baseline skill level, as shown in the results below, the posterior win probabilities shown in the court heatmap reflect the average server’s expected performance in each zone. We control for individual differences in the modeling process, but we take a population-level look at court zone effectiveness.

Response Variable: \(y_i \sim \text{Bernoulli}(\text{logit}^{-1}(\mu_i))\) where

\(y_i =\) whether the server won point \(i\) (1 = win, 0 = loss)

\(\mu_i =\) linear predictor for point \(i\)

Composite Model:

\[\mu_i = \alpha + \sum_{z} \beta_z \cdot \text{zone}_{i, z} + \beta_{\text{rally}} \cdot \text{rallylength}_i + u_{\text{server}[i]}\]

\(\alpha =\) baseline intercept (absorbed into zone reference level)

\(\beta_z =\) fixed effect for court zone \(z\)

\(\text{zone}_{i, z} =\) 1 if point \(i\) happened in zone \(z\), 0 otherwise (dummy variables)

\(\beta_{\text{rally}} =\) fixed effect for rally length

\(u_{\text{server}[i]} =\) random intercept for player serving during point \(i\)

Fixed Effects: \(\beta_z\) and \(\beta_{\text{rally}}\)

Random Effects: \(u_j \sim N(0, \sigma_u)\) for each server \(j\)

Priors: \(\beta \sim N(0, 2.5)\) and \(\sigma_u \sim \text{Exp}(1)\)
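To make the composite model concrete, the sketch below evaluates the linear predictor \(\mu_i\) and the resulting win probability for a single point. It is an illustration only: the zone keys and all coefficient values are hypothetical placeholders, not fitted estimates, and \(\alpha\) defaults to 0 on the assumption that it is absorbed into the zone effects as described above.

```python
import math


def inv_logit(mu):
    """Numerically stable inverse logit: maps log-odds to a probability."""
    if mu >= 0:
        return 1.0 / (1.0 + math.exp(-mu))
    e = math.exp(mu)
    return e / (1.0 + e)


def server_win_probability(zone, rally_length, beta_zone, beta_rally,
                           alpha=0.0, u_server=0.0):
    """P(server wins) for one point under the composite model.

    With dummy coding, sum_z beta_z * zone_{i,z} reduces to the single
    coefficient of the zone where the last ball bounced.
    """
    mu = alpha + beta_zone[zone] + beta_rally * rally_length + u_server
    return inv_logit(mu)


# Hypothetical coefficients, chosen only to exercise the function.
example_beta_zone = {
    "receiver_backcourt": 1.0,
    "receiver_ad": 0.1,
    "service_deuce": -0.2,
}
example_beta_rally = -0.3

# Population-level prediction: u_server = 0 corresponds to an average server.
p_average = server_win_probability(
    "receiver_backcourt", 2, example_beta_zone, example_beta_rally
)
# Server-specific prediction: a positive random intercept raises the probability.
p_strong = server_win_probability(
    "receiver_backcourt", 2, example_beta_zone, example_beta_rally, u_server=0.5
)
```

Setting `u_server=0.0` mirrors the population-level view used for the court heatmap, while a nonzero value shows how a server-specific intercept shifts the same zone's probability.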

Model Assumptions

Independence across rallies: Each point is treated as conditionally independent given the court zone, rally length, and server.
Multilevel structure: Servers have different baseline skill levels, so we assign a random intercept to each server. Our model also assumes that player skill differences are drawn from a common distribution \(u \sim N(0, \sigma_u)\), introducing partial pooling that shrinks players with fewer observations toward the overall average.
Weakly informative priors: \(\beta \sim N(0, 2.5)\) allows us to learn meaningful effects without encouraging extreme values while \(\sigma_u \sim Exp(1)\) allows server effects to vary but assumes that most players are close to the average (unless very strong data evidence shows otherwise).
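
Taken together, the likelihood, random effects, and priors listed above imply the following joint posterior, up to a normalizing constant. This is a restatement under the assumption that the \(N(0, 2.5)\) prior applies independently to each fixed effect \(\beta_k\) (the zone effects and \(\beta_{\text{rally}}\)); no separate prior for \(\alpha\) is stated, so none appears here.

\[p(\beta, u, \sigma_u \mid y) \propto \prod_{i} \text{Bernoulli}\left(y_i \mid \text{logit}^{-1}(\mu_i)\right) \cdot \prod_{j} N(u_j \mid 0, \sigma_u) \cdot \prod_{k} N(\beta_k \mid 0, 2.5) \cdot \text{Exp}(\sigma_u \mid 1)\]

The second product is what produces partial pooling: each \(u_j\) is pulled toward 0 by an amount governed by \(\sigma_u\), which is itself learned from the data.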

Model Justifications

We believe that our techniques are appropriate for our research question because logistic regression handles binary outcomes (point_won: 1 = server won the point, 0 = server lost) and works with a mix of continuous and categorical predictors, which matches our data.
Bayesian methods allow direct probability statements about parameters and predictions, while hierarchical modeling accounts for different servers having different abilities. Ignoring these differences would bias the fixed effects for the court zones.
In addition, because our dataset is relatively small (just over 100 rallies), we report posterior win probabilities as smoothed estimates of zone-level win probabilities. These reflect both the data and the remaining uncertainty, which makes them more reliable than raw win percentages.
Lastly, using weakly informative priors on fixed effects allows for regularization but is still flexible for lower sample sizes.

Model Comparison

We considered an alternative specification that excluded rally length as a predictor. However, while court zone information is central to our research question, we found that rally length has a meaningful influence on point outcomes. Omitting rally length led to wider posterior credible intervals and greater uncertainty in court zone effects, as well as worse MCMC convergence diagnostics due to higher autocorrelation. Thus, we kept rally length in the final model to better account for rally dynamics and improve predictive stability.

Model Evaluation

Figure 4: Trace plots for posterior draws of court zone effects

Looking first at our trace plots in Figure 4, these help us determine whether or not the chains mixed well and fully explored the parameter space. We’re looking for all the chains bouncing around a stable mean with no wild drifts or separations. However, with our trace plots, we do see some jittery fluctuations across iterations, which could be a sign of poor mixing, but could also be caused by high-frequency noise.

Figure 5: Autocorrelation plots of beta parameters

Figure 5 provides an additional diagnostic: the autocorrelation drops to near 0 by lag 5, which is another sign that our chains are mixing well and that draws a few iterations apart are nearly independent.
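
For readers unfamiliar with the diagnostic, the quantity plotted in an autocorrelation plot can be sketched as follows. This is a generic sample-autocorrelation estimate written for illustration, not the plotting code used for Figure 5, and the example chain is synthetic.

```python
def autocorrelation(draws, lag):
    """Sample autocorrelation of a single chain at the given lag."""
    n = len(draws)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(draws)")
    mean = sum(draws) / n
    total_var = sum((d - mean) ** 2 for d in draws)
    if total_var == 0:
        raise ValueError("autocorrelation is undefined for a constant chain")
    # Covariance between the chain and a copy of itself shifted by `lag`.
    cov = sum((draws[t] - mean) * (draws[t + lag] - mean) for t in range(n - lag))
    return cov / total_var


# Synthetic chain that flips sign every step: strongly negative at lag 1.
alternating_chain = [1.0, -1.0] * 50
lag1 = autocorrelation(alternating_chain, 1)
lag2 = autocorrelation(alternating_chain, 2)
```

A well-mixing chain would show values near 0 after a few lags; the alternating chain here is deliberately extreme so the sign pattern is easy to see.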

Uncertainty Quantification

To quantify uncertainty of our estimates, we can look at the posterior density plots with 95% credible intervals of court zone effects.

Figure 6: Posterior density distributions with 95% credible intervals for each court zone’s effect on server win probability

Each ridge represents the posterior distribution of the effect of a court zone on the log-odds of the server winning the point, relative to the reference zone, and the peak of each ridge marks the most probable value (the posterior mode). For example, we can see that the Receiver Backcourt’s 95% CI is almost entirely above 0, which is evidence of a beneficial effect, although the interval does not fully exclude 0.
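
The interval and the "how much of the posterior is above 0" reading can both be computed directly from posterior draws. The sketch below shows one common way to do so (an equal-tailed interval with linear interpolation between order statistics); the draws are simulated stand-ins with a hypothetical mean and spread, not the model's actual output.

```python
import random


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0

    def quantile(q):
        # Linear interpolation between neighboring order statistics.
        pos = q * (n - 1)
        lower = int(pos)
        upper = min(lower + 1, n - 1)
        frac = pos - lower
        return ordered[lower] * (1.0 - frac) + ordered[upper] * frac

    return quantile(tail), quantile(1.0 - tail)


def prob_positive(draws):
    """Posterior probability that the effect is greater than 0."""
    return sum(d > 0 for d in draws) / len(draws)


# Simulated stand-in for one zone effect (hypothetical mean and sd).
rng = random.Random(0)
fake_draws = [rng.gauss(1.0, 0.5) for _ in range(4000)]
low, high = credible_interval(fake_draws)
share_above_zero = prob_positive(fake_draws)
```

Reporting `share_above_zero` alongside the interval gives a direct probability statement about the sign of the effect, which is often easier to interpret than checking whether the interval crosses 0.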

Method 2: Bayesian Analysis of Court Positioning & Rally Duration

Model Breakdown

For our second method, we used a Bayesian multilevel model. The aim of this methodology was to see whether there are universal favorable court positionings during rallies or whether a unique favorability existed for each player. Thus, we used a multilevel model because it can account for player-specific and court-specific variation. Further, we believed rally duration (in seconds) was an adequate response variable because it can capture the dynamic outcome of how the point was won, reflecting player-specific style and how certain court positions contribute to prolonging or shortening rallies.

In particular, we decided to make a crossed effects model with varying slopes and varying intercepts. A crossed effects model is the best choice because rallies are influenced by many players and not predetermined players. We made the decision to have both varying slopes and varying intercepts because random slopes will capture player-specific variation in the x- and y-coordinate spaces and random intercepts will capture player-specific baseline behavior in the x- and y-coordinate spaces.

Response Variable: \(y_{psxyi} \sim N(\mu_{psxyi}, \sigma)\) where

\(p\) indexes the player winner of rally \(i\)

\(s\) indexes the number of strokes in the rally \(i\), including the serve

\(x\) indexes the horizontal position on the court where the final stroke landed of rally \(i\)

\(y\) indexes the vertical position on the court where the final stroke landed of rally \(i\)

\(i\) indexes the rally played

Composite Model:

\[ \mu_{psxyi} = (\alpha_0 + \beta_{p} \cdot p_{i} + \beta_{x} \cdot x_{i} + \beta_{y} \cdot y_{i} + \beta_{s} \cdot s_{i}) + (v_{p_i, x} \cdot x_{i} + v_{p_i, y} \cdot y_{i}) + (u_{p_i, x} + u_{p_i, y}) \]

Fixed Effects:

\[ (\alpha_0 + \beta_{p} \cdot p_{i} + \beta_{x} \cdot x_{i} + \beta_{y} \cdot y_{i} + \beta_{s} \cdot s_{i}) \]

Random Effects:

\[ (v_{p_i, x} \cdot x_{i} + v_{p_i, y} \cdot y_{i}) + (u_{p_i, x} + u_{p_i, y}) \]

\(u_{p_i, x} =\) random effect for player winner \(p\) intercept associated with horizontal court position

\(v_{p_i, x} =\) random effect for player winner \(p\) slope in horizontal court positioning

\(u_{p_i, y} =\) random effect for player winner \(p\) intercept associated with vertical court position

\(v_{p_i, y} =\) random effect for player winner \(p\) slope in vertical court positioning
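
As a concrete reading of the composite model, the sketch below adds up the fixed effects, the player-specific slopes, and the player-specific intercepts for one rally. It assumes \(\beta_p \cdot p_i\) acts as a per-player coefficient selected by the rally winner; every parameter value shown is a hypothetical placeholder, not a fitted estimate.

```python
def expected_rally_duration(player, x, y, strokes, params):
    """Mean rally duration (seconds) implied by the composite model."""
    fixed = (
        params["alpha_0"]
        + params["beta_p"][player]      # player indicator effect
        + params["beta_x"] * x
        + params["beta_y"] * y
        + params["beta_s"] * strokes
    )
    # Random slopes: player-specific deviations in the x and y effects.
    slopes = params["v_x"][player] * x + params["v_y"][player] * y
    # Random intercepts: player-specific baseline shifts.
    intercepts = params["u_x"][player] + params["u_y"][player]
    return fixed + slopes + intercepts


# Hypothetical parameter values for two players, for illustration only.
example_params = {
    "alpha_0": 1.0,
    "beta_p": {"Cabal": 0.5, "Lopez": 0.2},
    "beta_x": 0.1,
    "beta_y": 0.2,
    "beta_s": 1.0,
    "v_x": {"Cabal": 0.05, "Lopez": -0.05},
    "v_y": {"Cabal": -0.1, "Lopez": 0.1},
    "u_x": {"Cabal": 0.3, "Lopez": 0.0},
    "u_y": {"Cabal": -0.2, "Lopez": 0.1},
}

cabal_mu = expected_rally_duration("Cabal", x=2.0, y=10.0, strokes=4,
                                   params=example_params)
lopez_mu = expected_rally_duration("Lopez", x=2.0, y=10.0, strokes=4,
                                   params=example_params)
```

Holding x, y, and strokes fixed while switching the player isolates the player-specific terms, which is the comparison the player-specific slope figures are meant to convey.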

Model Assumptions & Justifications

These effects were implemented in a Bayesian framework using Stan, allowing us to naturally quantify uncertainty and partial pooling across players. That is, since Bayesian estimation provides full posterior distributions, it improved our ability to measure uncertainty compared to frequentist models. We applied weakly informative priors to ensure reasonable regularization. \(N(0,1)\) priors were used for the fixed effects and random intercepts/slopes for player x- and y-coordinates, reflecting moderate deviations from zero. For the observation noise \(\sigma\), we used a half-Cauchy prior \(\text{Cauchy}(0,5)\), which accommodates potential heavy-tailed distributions in rally durations.

Model Comparison

Considering alternative specifications, we compared two approaches: one allowing both varying intercepts and varying slopes by player, and one with only varying intercepts. The model including random slopes provided a better fit, capturing player-specific variations in court positioning effects more accurately. Thus, we selected the random intercepts and random slopes model.

Model Evaluation

To evaluate the convergence of our model, we will analyze the R-hat ratio. If the chains are mixing well and converging to the true posterior distribution, then the R-hat ratio should be around 1.

Figure 7: Assessing Simulation Stability by R-hat Ratio
Parameters	R-hat Ratio
alpha_0	1.0002898
Cabal	0.9998986
Farah	0.9997767
Lopez	0.9997165
Murray	0.9998379
stroke	1.0001453
x	1.0013648
y	1.0011949
Random Effect Int Cabal, x	0.9998473
Random Effect Int Farah, x	0.9997950
Random Effect Int Lopez, x	1.0000728
Random Effect Int Murray, x	0.9998845
Random Effect Int Cabal, y	0.9997617
Random Effect Int Farah, y	0.9997534
Random Effect Int Lopez, y	1.0002314
Random Effect Int Murray, y	1.0000616
Random Effect Slope Cabal, x	0.9998037
Random Effect Slope Farah, x	1.0010175
Random Effect Slope Lopez, x	1.0011660
Random Effect Slope Murray, x	1.0013130
Random Effect Slope Cabal, y	1.0014039
Random Effect Slope Farah, y	1.0012631
Random Effect Slope Lopez, y	1.0010657
Random Effect Slope Murray, y	1.0011601
sigma	1.0011525

As shown in Figure 7, all parameters have R-hat ratios of approximately 1, which indicates convergence and suggests that our model’s posterior estimates are stable.
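
To show what the R-hat ratio measures, here is a minimal sketch of the classic split-chain potential scale reduction factor, which compares between-chain and within-chain variance. It is an illustrative implementation under the assumption of equal-length chains, not the exact routine Stan uses, and the example chains are synthetic.

```python
import math


def split_rhat(chains):
    """Classic split-chain R-hat for a list of equal-length chains."""
    # Split each chain in half so within-chain drift also inflates R-hat.
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    m = len(halves)
    n = len(halves[0])
    means = [sum(h) / n for h in halves]
    grand_mean = sum(means) / m
    # Between-chain variance B and average within-chain variance W.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    within = sum(
        sum((v - mu) ** 2 for v in h) / (n - 1) for h, mu in zip(halves, means)
    ) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Two synthetic chains exploring the same region vs. two that never overlap.
agreeing = split_rhat([[0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]])
separated = split_rhat([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
```

Because the estimate is a ratio of two noisy variance estimates, it can land slightly below 1, as several entries in Figure 7 do; values well above 1 signal that chains disagree.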

To evaluate model performance, we compared the composite model that predicts full rally duration against isolated player-specific x-coordinate effects and isolated player-specific y-coordinate effects. This allowed us to examine predictive performance visually, ensuring model predictions aligned with observed patterns in rally duration.

Results

Method 1: Server’s Posterior Win Probability by Court Zone

Figure 8: Server’s posterior win probability across different court zones

A key takeaway is that the server has a 72% average win probability when the last bounce occurred in the Receiver Backcourt, the highest of all zones. So, when the server pushes the ball deep into the court, they are more likely to win the point. On the other hand, the Service Ad (53%) and Service Deuce (45%) zones showed mid-range win probabilities, reflecting rallies where the receiver may have more time to recover, which leads to more balanced results. Similarly, the Receiver Deuce zone gives the server a 55% win probability, while the Receiver Ad zone (50%) is essentially neutral.

These observations are consistent with common tennis strategy: deeper shots can force more defensive play or errors for receivers, while hitting into middle or near-net zones can give the server less strategic advantage.

Method 2: Bayesian Analysis of Court Positioning & Rally Duration

Figure 9: Player-specific slopes for x-coordinate effects on rally duration

Firstly, it is important to note that this model is fit only to rallies with a recorded winner. Overall, we note that in comparison to predicted full rally duration times, the player-specific x-coordinate effect contributes around 0 to 1 seconds. Cabal appears to have the most positive slope for his player-specific x-coordinate effect, which means that rally duration tends to increase as Cabal goes toward the sidelines of the court, suggesting that he might be less aggressive from wide positions. In contrast, Lopez shows a decrease in rally duration as he moves to the sidelines, indicating that he may be more aggressive from wide positions. Meanwhile, Farah and Murray exhibit more balanced behaviors, with their rally durations showing no strongly positive trend, implying a more consistent ability to win rallies from wide positions on court.

Figure 10: Player-specific slopes for y-coordinate effects on rally duration

When considering the player-specific depth effect (i.e., player-specific y-coordinate effect), we observe that as Cabal moves farther back on the court, rally duration decreases, suggesting that he is more aggressive than expected from deep court positions. In contrast, Lopez shows an increase in rally duration as he moves deeper, indicating that he is more consistent in baseline rallies, preferring to rally longer rather than hitting early winners. Meanwhile, Farah and Murray display moderate slopes, implying a more consistent rallying ability and a preference for baseline play.

The results from the player-specific x- and y-coordinate effects are particularly compelling, as they align with common perceptions of each player’s style. However, our model offers a more nuanced view, revealing how positioning on court influences each player’s effectiveness and preferred patterns of play.

Discussion

To answer our research question, “How do different areas of the court impact each player’s point-winning efficiency and play style?”, we fit two models: one for the server’s win probability and one for rally duration. Using a Bayesian hierarchical logistic regression model in Stan, we found that when servers force rallies into the receiver’s back court, they have a 72% chance of winning that point. A Bayesian multilevel model further revealed player-specific court efficiencies: Cabal is more aggressive at the net and baseline but less so when playing wide; Farah is consistent across positions; Lopez is aggressive from wider positions with strong baseline play; Murray, similarly, shows balanced baseline strength with a mix of offense and defense.

We recognize several limitations in our project. The data consisted of rallies from a single match in one tournament, which limits our ability to generalize our results. We could not observe individual player effects in different settings, such as court surface, weather conditions, or tournament format. These factors could affect how long rallies last, as well as ball bounces and rally dynamics. A player’s fatigue over the course of a tournament could also have a large impact on efficiency, which our model could not capture. Additionally, Farah and Cabal often play together, while Murray and Lopez predominantly play singles. This is a limitation because our models might overestimate individual player effects: a player’s strategy, such as positioning or decision-making, might differ between singles and doubles.

For future work, with more data from different tournaments, we could compare models for these players in singles and doubles matches, focusing on doubles teams who usually play together. When a player who usually plays singles plays doubles, they might change their strategy because they know that some areas are covered by their partner. Additionally, if players are paired with a partner they usually play with, it would be interesting to analyze whether they change their strategy based on the different partner dynamics. We could also examine whether players cover different zones depending on match context; for example, whether they cover different zones when they are losing a rally could reveal how their strategy changes under pressure.

Appendix

Additional Model Fit Diagnostics for Method 1

Inference for Stan model: anon_model.
4 chains, each with iter=10000; warmup=5000; thin=1; 
post-warmup draws per chain=5000, total post-warmup draws=20000.

         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta[1]  1.02    0.01 0.54 -0.09  0.67  1.03  1.38  2.04  4418    1
beta[2]  0.23    0.01 0.66 -1.14 -0.20  0.25  0.68  1.46  4170    1
beta[3]  0.10    0.01 0.62 -1.18 -0.31  0.11  0.52  1.27  5403    1
beta[4]  0.11    0.01 0.56 -1.05 -0.25  0.13  0.49  1.13  3743    1
beta[5] -0.22    0.01 0.59 -1.44 -0.60 -0.20  0.17  0.88  4675    1
beta[6] -0.31    0.00 0.09 -0.50 -0.37 -0.31 -0.25 -0.15  4871    1

Samples were drawn using NUTS(diag_e) at Mon Apr 28 12:02:33 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Looking at our model fit, we see that Rhat rounds to 1 and n_eff is greater than 3,700 for every beta (the smallest is 3,743). This suggests that our MCMC chains are well mixed and converging to the same distribution, and that each parameter has enough effective samples for the Monte Carlo error in our posterior summaries to be small.
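
As a quick consistency check on the table above, the Monte Carlo standard error of a posterior mean is approximately the posterior standard deviation divided by the square root of n_eff. The sketch below applies that relationship to the sd and n_eff columns as printed; because those inputs are already rounded, the comparison with se_mean is only meaningful to two decimal places.

```python
import math


def monte_carlo_se(posterior_sd, n_eff):
    """Approximate Monte Carlo standard error of a posterior mean."""
    return posterior_sd / math.sqrt(n_eff)


# (sd, n_eff, reported se_mean) copied from the Stan summary above.
stan_summary = {
    "beta[1]": (0.54, 4418, 0.01),
    "beta[2]": (0.66, 4170, 0.01),
    "beta[3]": (0.62, 5403, 0.01),
    "beta[4]": (0.56, 3743, 0.01),
    "beta[5]": (0.59, 4675, 0.01),
    "beta[6]": (0.09, 4871, 0.00),
}

recomputed = {
    name: round(monte_carlo_se(sd, n_eff), 2)
    for name, (sd, n_eff, _) in stan_summary.items()
}
```

The recomputed values agree with the se_mean column at the printed precision, which illustrates why a large n_eff translates into small simulation error relative to the posterior spread.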