Cards, Credit, and Churn - Discovering Behavioral and Demographic Factors Across the Credit Cards

Authors

Annie Chen, Helen Hao, Jiabao Ye

Introduction

Understanding the behaviors and demographic features of credit card users is crucial for banks aiming to lower their customer churn rates. By examining patterns across categorical and quantitative data, we uncover insights into customer profiles and investigate the factors affecting attrition. Our analysis will guide the bank in refining services, developing targeted marketing strategies, and ultimately improving customer retention.

Data Description

The data come from Kaggle's Credit Card Customers dataset. It describes the behavior of credit card customers and covers just over 10,000 customers, with their age, income, marital status, credit card limit, transaction amounts, etc. There are 21 features:

Clientnum: Client number. Unique identifier for the customer holding the account
Attrition_Flag: If the account is closed then 1 else 0
Customer_Age: Customer’s Age in Years
Gender: M=Male, F=Female
Dependent_count: Number of dependents
Education_Level: Educational Qualification of the account holder (example: high school, college graduate, etc.)
Marital_Status: Married, Single, Divorced, Unknown
Income_Category: Annual Income Category of the account holder (< $40K, $40K - $60K, $60K - $80K, $80K - $120K, > $120K, Unknown)
Card_Category: Type of Card (Blue, Silver, Gold, Platinum)
Months_on_book: Period of relationship with bank
Total_Relationship_Count: Total number of products held by the customer
Months_Inactive_12_mon: Number of months inactive in the last 12 months
Contacts_Count_12_mon: Number of Contacts in the last 12 months
Credit_Limit: Credit Limit on the Credit Card
Total_Revolving_Bal: Total Revolving Balance on the Credit Card
Avg_Open_To_Buy: Open-to-Buy Credit Line (Average of last 12 months)
Total_Amt_Chng_Q4_Q1: Change in Transaction Amount (Q4 over Q1)
Total_Trans_Amt: Total Transaction Amount (Last 12 months)
Total_Trans_Ct: Total Transaction Count (Last 12 months)
Total_Ct_Chng_Q4_Q1: Change in Transaction Count (Q4 over Q1)
Avg_Utilization_Ratio: Average Card Utilization Ratio

Our study focuses on the following variables:

Demographics variables: Gender, Marital_Status, and Income_Category

Financial variables: Total_Revolving_Bal, Card_Category, Credit_Limit, Avg_Utilization_Ratio, Total_Trans_Amt, Total_Trans_Ct, and Attrition_Flag

Research Questions:

Throughout the report, we investigate three research questions:

1. What demographic and behavioral factors are associated with different card categories?

2. How do behavioral patterns influence credit card limits?

3. Can customer behavioral profiles be clustered to reveal distinct attrition risk segments?

Question 1: What demographic and behavioral factors are associated with different card categories?

We are curious about how demographic and behavioral factors, such as total revolving balance and income category, relate to credit card category. We use a box plot of Total_Revolving_Bal grouped by Income_Category and Card_Category, which clearly shows the distribution, median, and variability of balances across groups. Visualizing all three variables at once enables a quick comparison of balance behavior across card types within each income bracket.

1.1 Boxplot

This plot highlights how credit behavior (as measured by revolving balance) varies not just by income but also by the type of card held, which often corresponds to customer value or loyalty tier. Gold and Platinum cardholders tend to have higher median revolving balances across several income levels, especially in the $60K - $80K and > $120K brackets. Customers with higher incomes also tend to carry higher revolving balances, most visibly among Platinum cardholders, whose boxes show narrower IQRs and higher medians of around $1,600. Interestingly, Blue and Silver cardholders maintain lower or more varied balances regardless of income, suggesting that card tier is not determined by income alone but may also reflect spending habits or credit utilization behavior.
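The medians and IQRs read off a grouped boxplot can also be computed directly. The following is a minimal base R sketch, not the report's code: the data frame `toy` is simulated (hypothetical values) and only borrows the dataset's column names, so the numbers it prints are not the report's results.

```r
# Simulated stand-in for the credit card data (hypothetical values, not the real dataset).
set.seed(1)
n <- 400
toy <- data.frame(
  Card_Category = factor(
    sample(c("Blue", "Silver", "Gold", "Platinum"), n, replace = TRUE),
    levels = c("Blue", "Silver", "Gold", "Platinum")
  ),
  Income_Category = factor(
    sample(c("< $40K", "$40K - $60K", "> $120K"), n, replace = TRUE)
  ),
  Total_Revolving_Bal = round(runif(n, 0, 2500))
)

# Median and IQR of revolving balance for each income x card combination.
# In a boxplot these are the centre line and the height of each box.
box_stats <- aggregate(
  Total_Revolving_Bal ~ Card_Category + Income_Category,
  data = toy,
  FUN = function(x) c(median = median(x), iqr = IQR(x))
)
print(box_stats)
```

With the real data frame in place of `toy`, calling `boxplot()` with the same formula would draw one box per card category within each income bracket.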

1.2 Mosaic Plot

Beyond total revolving balance and income category, we want to investigate how marital status is associated with card category. Here we use a mosaic plot, since it shows the proportion of individuals in each group and highlights deviations from the counts expected under independence. The color shading indicates whether a cell holds more or fewer customers than expected, which makes patterns easy to identify.

This mosaic plot shows that single customers are overrepresented in the Silver card category, as indicated by the strong blue shading: more singles hold Silver cards than would be expected if marital status and card category were independent. Conversely, married customers are underrepresented in the Silver category, as shown by the red shading. One possible explanation is that singles are often earlier in their financial journey, with moderate income and spending needs that align with mid-tier cards, while married customers typically have higher combined incomes and larger financial responsibilities that draw them toward other tiers. This explanation is speculative, since the other card categories (Blue, Gold, and Platinum) do not show strong shading for any marital status, meaning their distributions follow the expected values more closely. The plot also shows that the divorced and unknown marital status groups are relatively small across all card types.
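The shading in a mosaic plot is typically driven by Pearson residuals from a test of independence. The sketch below shows that calculation in base R. It assumes the plot was drawn with something like `mosaicplot(..., shade = TRUE)` (the report does not say which function was used), and the counts are made up for illustration rather than taken from the real table.

```r
# Hypothetical counts (NOT the report's data): rows = marital status, cols = card category.
tab <- matrix(
  c(450, 20, 8, 2,    # Married
    380, 45, 9, 3,    # Single
     70,  5, 2, 1),   # Divorced
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Marital_Status = c("Married", "Single", "Divorced"),
    Card_Category  = c("Blue", "Silver", "Gold", "Platinum")
  )
)

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residual per cell: positive = more customers than expected, negative = fewer.
pearson <- (tab - expected) / sqrt(expected)
print(round(pearson, 2))

# Cells with |residual| > 2 are the ones a shaded mosaic plot colours.
print(which(abs(pearson) > 2, arr.ind = TRUE))
```

In this toy table only the Single/Silver cell (positive) and the Married/Silver cell (negative) pass the cutoff of 2, which mirrors the kind of pattern described above.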

Based on the graphs above, we conclude that higher median revolving balances are associated with higher-tier card categories. Also, singles are overrepresented in the Silver card category, while married individuals are underrepresented. To better understand how customers with different card categories behave financially, we next examine how behavioral patterns relate to credit card limits.

Question 2: How do behavioral patterns influence credit card limits?

In this part, we focus on credit limits and their relationship to several key variables: card category (Blue, Silver, Gold, Platinum), average utilization ratio (0-1 scale), and gender. This allows us to examine relationships between credit access, product offerings, and customer behavior.

2.1 Histogram

The histogram of credit limit shows a heavily right-skewed distribution, with the majority of customers having low credit limits. The highest frequency occurs just above $2,000, where approximately 2,200 customers are represented. The second highest frequency is around $4,000, with about 1,500 customers. As credit limits increase, the number of customers decreases substantially, creating a long tail to the right. Only a small number of customers have credit limits between $15,000 and $30,000. There is also a noticeable spike at around $34,000, indicating a small but distinct group of customers with very high credit limits compared to the rest of the population. This type of distribution is common in financial data, where most customers have modest credit limits while a smaller segment qualifies for much higher limits based on factors like income, credit history, and financial stability.

2.2 Ridgeline Plot

This ridgeline plot shows the credit limit distribution for the four card types, ordered Blue, Silver, Gold, and Platinum from bottom to top. Each card category has a distinct pattern. Blue cards are concentrated at very low credit limits, around $3,000, with a strongly right-skewed distribution and a tiny bump near the $35,000 mark. Silver cards are bimodal, with a substantial peak around $13,000, a dip, and then their highest peak at approximately $35,000. Gold cards are multimodal with three main peaks: a small one around $16,000, a smaller one around $23,000, and the largest at approximately $35,000. Platinum cards have two peaks, similar to Gold cards: one around $16,000 and a much larger one at approximately $35,000. Overall, the higher-tier categories (Silver, Gold, Platinum) concentrate at higher credit limits, particularly around $35,000, while the Blue category is predominantly associated with much lower limits. This suggests a clear stratification of credit limits by card category and is consistent with card category acting as a proxy for creditworthiness, with issuers offering higher limits to premium cardholders.

2.3 Scatterplot

The scatterplot shows a negative relationship between credit limit and average utilization ratio (0-1 scale), where the utilization ratio is the revolving balance divided by the total credit available. Each dot represents a customer, and the points are densely clustered at low utilization ratios (left side of the plot), suggesting that most customers use less than 50% of their credit. Customers with very high utilization (near 1.0) are rare and have markedly lower credit limits. The regression lines show that credit limits fall as utilization rises, and more steeply for male customers (the blue line). The gender gap is most striking near zero utilization, where the fitted limit is around $17,000 for men and around $7,000 for women. At high utilization the men's trend line drops below zero while the women's line stays comparatively flat; since a credit limit cannot be negative, that part of the men's line is an artifact of fitting a straight line rather than an actual limit. Part of the negative relationship is also mechanical, because the credit limit is the denominator of the utilization ratio. The pattern is consistent with both behavioral risk and a possible gender difference in credit allocation, although the data are observational and this comparison does not adjust for other factors such as income. Overall, the plot highlights how financial behavior relates to credit access, and how that relationship differs between men and women.


Call:
lm(formula = Credit_Limit ~ Avg_Utilization_Ratio * Gender, data = credit_card)

Residuals:
     Min       1Q   Median       3Q      Max 
-16208.0  -3865.7   -779.6   1642.8  27684.9 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     7321.6      148.7   49.23   <2e-16 ***
Avg_Utilization_Ratio          -6719.5      330.1  -20.36   <2e-16 ***
GenderM                        10324.7      201.0   51.36   <2e-16 ***
Avg_Utilization_Ratio:GenderM -18139.9      551.3  -32.91   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7086 on 10123 degrees of freedom
Multiple R-squared:  0.3923,    Adjusted R-squared:  0.3921 
F-statistic:  2178 on 3 and 10123 DF,  p-value: < 2.2e-16

To back up the scatterplot, we fit a regression model that tests the interaction between average utilization ratio and gender on credit limit. Female customers with zero utilization have a fitted credit limit of $7,321.60, and each one-unit increase in utilization ratio (that is, from 0 to 1) is associated with a $6,719.50 lower limit, or $671.95 per 0.1 increase. Male customers start $10,324.70 higher than female customers at zero utilization, at $17,646.30. Their limits also fall by an additional $18,139.90 per unit of utilization, for a total slope of -$24,859.40, so a 0.1 increase in male utilization is associated with a $2,485.94 lower limit. All coefficients have p-values < 2e-16, so each is statistically significant, and the interaction term confirms that the relationship between utilization and credit limit differs by gender. The model explains 39.23% of the variance in credit limit (R² = 0.3923), indicating that utilization and gender are substantial but not exclusive correlates of credit access.
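The arithmetic in this paragraph can be reproduced from the four coefficients alone. The sketch below hard-codes the estimates from the `lm()` summary; the helper `fitted_limit()` and the variable names are hypothetical and introduced only for illustration.

```r
# Coefficients copied from the lm() summary above.
b0      <- 7321.6     # (Intercept): female limit at zero utilization
b_util  <- -6719.5    # Avg_Utilization_Ratio: female slope
b_male  <- 10324.7    # GenderM: male shift in the intercept
b_inter <- -18139.9   # Avg_Utilization_Ratio:GenderM: extra male slope

# Fitted credit limit for a utilization ratio (0-1) and gender ("F" or "M").
fitted_limit <- function(util, gender) {
  male <- as.numeric(gender == "M")
  b0 + b_util * util + b_male * male + b_inter * util * male
}

# Total slope for male customers = female slope + interaction term.
male_slope <- b_util + b_inter

print(fitted_limit(0, "F"))   # female baseline: 7321.6
print(fitted_limit(0, "M"))   # male baseline: 17646.3
print(b_util * 0.1)           # female change per 0.1 utilization: -671.95
print(male_slope * 0.1)       # male change per 0.1 utilization: -2485.94
print(fitted_limit(1, "M"))   # male fitted value at utilization 1: -7213.1

# Utilization at which the two fitted lines cross (about 0.57).
crossover <- -b_male / b_inter
print(round(crossover, 3))
```

The negative fitted value at utilization 1 is the straight line extended to the edge of the data, not a real credit limit, which is why the male trend line in the scatterplot dips below zero.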

The analysis reveals three key insights about credit limits. First, most customers receive modest limits (under $15,000), and the number of customers falls off sharply as limits increase. Second, card categories effectively segment customers by creditworthiness, with premium cards receiving higher limits. Third, utilization behavior is strongly associated with limits, with notable gender-based differences. These findings suggest that credit issuers rely on a combination of product segmentation, financial behavior, and demographic factors when determining limits. These patterns may also shape the broader customer retention risks explored in our third research question.

Question 3: Can customer behavioral profiles be clustered to reveal distinct attrition risk segments?

While exploring the dataset, we became particularly interested in customer attrition patterns and whether behavioral profiles could be used to cluster customers into distinct risk segments, motivating our third research question.

3.1 Bar Chart

The bar chart shows that about 16.1% of customers in our dataset have attrited, while 83.9% remained. Although the attrited group is the minority, its financial impact can still be significant because every lost customer carries both a hard cost (acquisition investment) and a lost future revenue stream. In high-volume retail banking, small shifts in churn can translate into millions of dollars in lost value annually.

3.2 Heatmap

To justify the use of PCA, we generated a correlation heatmap of the standardized numeric behavioral variables. The heatmap reveals several strong correlations: for example, the correlation between Total_Trans_Amt and Total_Trans_Ct is 0.81, and the correlation between Credit_Limit and Avg_Open_To_Buy is 1.00 (to two decimal places). These relationships indicate redundancy in the dataset: highly correlated variables share variance, which PCA can compress into fewer orthogonal dimensions. A near-perfect correlation such as that between Credit_Limit and Avg_Open_To_Buy does not break PCA, but it means the two variables carry essentially the same information, so keeping both would count that information twice and leave the correlation matrix close to singular. We therefore removed Avg_Open_To_Buy. Overall, the heatmap supports the use of PCA as a dimensionality reduction technique.
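Redundant pairs like these can be listed programmatically from the correlation matrix that a heatmap displays. The following base R sketch uses simulated data (hypothetical values), and it assumes for illustration that open-to-buy equals the credit limit minus the revolving balance, which the report does not state. The 0.8 cutoff is likewise an arbitrary choice.

```r
# Simulated stand-in data (hypothetical values, not the real dataset).
set.seed(42)
n <- 500
Credit_Limit        <- runif(n, 1500, 35000)
Total_Revolving_Bal <- runif(n, 0, 2500)
Total_Trans_Ct      <- rpois(n, 65)
toy <- data.frame(
  Credit_Limit,
  Total_Revolving_Bal,
  # Assumption for illustration: open-to-buy = limit minus revolving balance.
  Avg_Open_To_Buy = Credit_Limit - Total_Revolving_Bal,
  Total_Trans_Ct,
  Total_Trans_Amt = 60 * Total_Trans_Ct + rnorm(n, sd = 250)
)

# Pearson correlation matrix: the numbers shown in a correlation heatmap.
cors <- cor(toy)

# List each variable pair once (upper triangle) whose |r| exceeds the cutoff.
cutoff <- 0.8   # hypothetical threshold for "redundant"
idx <- which(abs(cors) > cutoff & upper.tri(cors), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cors)[idx[, "row"]],
  var2 = colnames(cors)[idx[, "col"]],
  r    = round(cors[idx], 2)
)
print(high_pairs)
```

Under that assumption the limit and open-to-buy columns correlate at nearly 1 because the revolving balance is small relative to the limit, which is one plausible reading of the 1.00 seen in the heatmap.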

3.3 Scree Plot

Guided by the correlation heatmap, we included the following variables in the PCA: Total_Trans_Amt, Total_Trans_Ct, Total_Amt_Chng_Q4_Q1, Total_Ct_Chng_Q4_Q1, Credit_Limit, Contacts_Count_12_mon, Months_Inactive_12_mon, and Total_Relationship_Count. These variables were chosen for their behavioral relevance and moderate-to-strong correlations, which make them suitable for dimensionality reduction. Avg_Open_To_Buy, Dependent_count, and Customer_Age were excluded, the first because of its correlation of 1.00 with Credit_Limit and the other two because they are less relevant to recent engagement behavior. The scree plot shows that the first two principal components explain approximately 43.5% of the total variance. Their loadings suggest interpretable behavioral factors: the first component reflects general activity level, while the second captures recent changes in behavior. The sharp drop in explained variance after the second component supports a 2D PCA representation, with the caveat that more than half of the variance lies outside these two components.
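The quantities behind a scree plot can be computed with base R's `prcomp()`. The sketch below is illustrative only: it runs on simulated data (hypothetical values) built from two made-up latent factors, uses a subset of the variable names, and will not reproduce the 43.5% figure.

```r
# Simulated stand-in for the behavioural variables (hypothetical values, not the real dataset).
set.seed(7)
n <- 300
activity <- rnorm(n)   # latent "activity level" (illustrative assumption)
trend    <- rnorm(n)   # latent "recent change" (illustrative assumption)
toy <- data.frame(
  Total_Trans_Amt      = 4000 + 2500 * activity + rnorm(n, sd = 1000),
  Total_Trans_Ct       = 65 + 20 * activity + rnorm(n, sd = 8),
  Total_Amt_Chng_Q4_Q1 = 0.75 + 0.20 * trend + rnorm(n, sd = 0.10),
  Total_Ct_Chng_Q4_Q1  = 0.70 + 0.20 * trend + rnorm(n, sd = 0.10),
  Credit_Limit         = runif(n, 1500, 35000)
)

# scale. = TRUE standardizes each column so no variable dominates because of its units.
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component (the bar heights in a scree plot).
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
print(round(cumsum(var_explained), 3))

# Loadings show which original variables drive PC1 and PC2.
print(round(pca$rotation[, 1:2], 2))
```

The cumulative proportion after the second component is the number to compare against the 43.5% reported above, and the loadings are what support labels such as "activity level" or "recent change".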

3.4 Biplot

We then create a PCA biplot, in which each point is a customer projected onto PC1 and PC2 and the arrows indicate how the original features contribute to each component. The biplot shows a clear separation between attrited and existing customers. Attrited customers cluster on the negative side of PC1 and PC2, which corresponds to low transaction activity, low momentum, and higher inactivity. Existing customers cluster on the positive side of both dimensions, in the direction of high transaction count, high transaction amount, and increasing behavioral trends. This visual evidence suggests that the two components encode meaningful behavioral differences and are associated with attrition risk.

The PCA served well for dimensionality reduction and exploratory visualization, and overlaying the known churn labels demonstrated the separation. However, PCA is not a clustering algorithm, and the separation shown relies on labels we already had. To formally segment customers, future work would apply K-means to the PC scores and evaluate cluster validity and attrition purity.
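As a rough illustration of that proposed next step, the sketch below runs base R's `kmeans()` on two-dimensional scores and then computes the attrition rate within each cluster. The scores and churn labels are simulated (hypothetical values), and the choice of two clusters is an assumption for the example rather than a result of the study.

```r
set.seed(123)
n <- 300
# Hypothetical PC scores for two behavioural groups (simulated, not the report's scores).
scores <- rbind(
  cbind(PC1 = rnorm(n / 2, mean =  1.5), PC2 = rnorm(n / 2, mean =  1.0)),
  cbind(PC1 = rnorm(n / 2, mean = -1.5), PC2 = rnorm(n / 2, mean = -1.0))
)
# Hypothetical churn labels: the second group is given a higher attrition rate.
attrited <- c(rbinom(n / 2, 1, 0.05), rbinom(n / 2, 1, 0.40))

# K-means on the PC scores; nstart reruns the algorithm from several random starts.
km <- kmeans(scores, centers = 2, nstart = 25)

# Attrition rate within each cluster ("attrition purity").
purity <- tapply(attrited, km$cluster, mean)
print(round(purity, 3))

# Total within-cluster sum of squares for k = 1..6, for choosing k with an elbow plot.
wss <- sapply(1:6, function(k) kmeans(scores, centers = k, nstart = 25)$tot.withinss)
print(round(wss, 1))
```

The churn labels are used only after clustering, to check whether the unsupervised segments differ in attrition rate. That is the distinction between this step and colouring a biplot by known labels.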

Conclusion

This study explores how demographic factors, behavioral patterns, and financial engagement relate to credit card usage and customer attrition. We find that premium cardholders, particularly those holding Gold and Platinum cards, tend to have higher revolving balances and are often concentrated within higher income brackets, though spending behavior also plays an important role independent of income. Marital status is associated with card category, with single individuals overrepresented among Silver cardholders and married customers underrepresented.

Our analysis of credit limits reveals a heavily right-skewed distribution, where most customers are assigned modest limits under $15,000, while a smaller group enjoys substantially higher limits near $35,000. The card categories effectively segment customers by creditworthiness, with premium cards receiving higher limits. We also identify notable gender differences: male customers generally have higher baseline credit limits, but their limits fall more steeply as utilization ratios rise.

Finally, we examined customer attrition: 16.1% of customers had left, a minority but one with real financial impact. After noticing substantial redundancy among behavioral variables, we ran a PCA that revealed two key dimensions explaining 43.5% of behavioral variance: overall activity level and recent engagement trends. These findings suggest that demographic factors, financial behaviors, and product segmentation jointly relate not only to customer experience but also to long-term retention outcomes.

Recommendations

Based on our research questions and findings, we recommend three key strategic actions for the bank:

1. Optimize Card Category Offerings

Based on our findings about the demographic and behavioral factors associated with different card categories, the bank should refine its card offerings to serve the identified customer segments. Premium cards should be marketed more aggressively to high-income brackets where revolving balances tend to be higher, while Blue and Silver card benefits should be redesigned to encourage greater engagement from these more varied customer segments. Creating clearer value propositions for each card tier would strengthen customer identification with their product and reduce the likelihood of seeking alternatives elsewhere.

2. Reform Credit Limit Allocation Strategies

Our credit limit analysis revealed concerning patterns in limit allocation, particularly regarding gender disparities and utilization penalties. The bank should implement a more transparent and equitable credit limit system that reduces the observed gap between male and female customers at lower utilization levels. Additionally, the significant right-skewed distribution of credit limits suggests an opportunity to create more intermediate tiers between the most common limits and the highest limits, potentially increasing engagement among customers who feel their current limit is inadequate but don’t qualify for the highest tier.

3. Develop a Predictive Attrition Intervention System

Our PCA revealed a clear separation between existing and attrited customers, which provides a strong foundation for a predictive intervention system. The bank should implement automated monitoring for the key behavioral indicators identified in our study: declining transaction frequency, increasing inactivity periods, and reduced transaction growth. When these warning signs appear, targeted retention initiatives should be triggered before the customer decides to leave. We believe this approach would allow the bank to focus retention resources on the customers most likely to leave while also addressing the specific pain points that typically lead to customer departure.

Future Work

While our study identifies clear associations between customer profiles, card usage patterns, and attrition risks, several opportunities remain for extending this research. Future studies should expand the range of variables considered, both by using demographic fields already in the dataset that we did not analyze, such as Education_Level and Dependent_count, and by incorporating additional information such as occupation. This would provide a more detailed understanding of customer segments and better capture factors influencing financial behavior.

Moreover, including financial metrics such as credit scores or loan history could improve predictive modeling and allow for a more comprehensive analysis of creditworthiness and retention risks. Expanding both the depth of variables and the sophistication of modeling techniques would enhance the bank’s ability to design more personalized, effective customer management strategies. Finally, we would apply machine learning clustering algorithms like K-Means to the PCA scores to formally segment customers based on their behavioral profiles.