Before you scroll

Key terms in this case study

Telco and machine-learning write-ups assume readers know churn metrics and model jargon. Skim these once — the same definitions apply in Sections 01–04.

Churn
A customer cancels or stops using paid service within the observation window. In the IBM dataset, churn is a yes/no label per account — the modelling target we try to predict early.
AUC-ROC
Summarises how well the model ranks who churns vs who stays (0.5 = random, 1.0 = perfect). It does not depend on one decision threshold — useful when marketing chooses different cut-offs for “who to call.”
Precision & recall
Precision: of everyone the model flags as “will churn,” how many actually churn — contact budget efficiency. Recall: of everyone who truly churned, how many we caught — coverage. Raising one often lowers the other; the business picks a balance.
DSL vs fibre
DSL is slower copper-line internet; fibre (fiber optic) is higher speed. The dataset encodes product type — churn differs by line quality, pricing, and support expectations, not by speed alone.
Tenure
Months the customer has been subscribed. Low tenure usually means weaker switching costs and higher churn — a behavioural signal, not a demographic label.
RF & logistic regression
Logistic regression is a simple, interpretable baseline. Random Forest (RF) combines many decision trees to capture non-linear patterns (e.g. contract × charges). We compare both on the same test set.
Feature importance
For Random Forest, each input variable gets an importance score (roughly: how much splits on that variable reduce prediction error). High importance means the model leans on that field when scoring churn risk — useful for explaining why a customer is flagged.
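The metric definitions above are easiest to see on a toy example. A minimal sketch with made-up labels and scores (not the case-study data) showing how the same scores give different precision/recall at different thresholds, while AUC stays threshold-free:

```python
# Toy illustration of precision, recall, and AUC-ROC (made-up numbers,
# not the case-study data)
from sklearn.metrics import precision_score, recall_score, roc_auc_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]        # 4 churners, 6 stayers
scores = [0.9, 0.8, 0.6, 0.3, 0.55, 0.5, 0.45, 0.4, 0.1, 0.05]

for t in (0.5, 0.25):
    pred = [int(s >= t) for s in scores]
    print(t, precision_score(y_true, pred), recall_score(y_true, pred))
# t=0.50 -> precision 0.60, recall 0.75
# t=0.25 -> precision 0.50, recall 1.00  (more coverage, more wasted calls)

# AUC depends only on the ranking, not on any one threshold
print(round(roc_auc_score(y_true, scores), 2))  # 0.83
```

Lowering the threshold flags more customers, so recall rises while precision typically falls — exactly the trade-off the business has to price.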
01
Exploratory Data Analysis

Who is at risk of leaving?

Context — what we measured

The IBM Telco sample has 7,043 customer accounts, each with contract type, internet product, tenure, charges, and add-on services. We computed the share of customers who churned within each segment — not yet using a predictive model, just descriptive rates to find where risk concentrates.

Business insight

Contract structure and internet product type split the base into very different churn populations. If those splits are wide, retention budget should not be spent evenly — it should follow the segments with the highest churn rate and the highest margin.

Evidence — churn rate by segment · 7,043 customers · IBM Telco dataset

How to read this: each bar is the percentage of customers in that group who churned. The vertical axis stops at 50% so differences between contract and product types stay visible. Three panel groups are shown: contract length, internet technology, and tenure band.

Contract type:     Month-to-Month 42.7% · One Year 11.3% · Two Year 2.8%
Internet service:  Fiber Optic 41.9% · DSL 18.9% · No Internet 7.4%
Tenure band:       0 to 12m 47.7% · 48m plus 6.6%
Evidence — headline comparisons
15×
Contract churn multiplier
Month-to-month customers churn 15 times more often than two-year contract holders. The gap is structural, not behavioural.
41.9%
Fiber optic churn rate
Fiber subscribers churn at 2.2x the DSL rate. A service-expectation gap that rewards better onboarding communication.
47.7%
First-year churn rate
Customers in months 0 to 12 are the highest-risk cohort. Retention offers in month 3 to 6 carry the highest expected ROI.
Strategic insight

Contract structure is the single largest churn lever in this dataset. Month-to-month plans maximise flexibility for customers but minimise commitment for the telco — the data says that trade-off shows up directly in churn. Longer contracts are not just “legal length”; they are behavioural lock-in devices.

Recommendation — marketing & pricing

Make long-term contracts visibly more valuable at onboarding: bundle discounts, install waivers, or speed tiers that only attach to 12- or 24-month terms. Pair with clear fibre onboarding comms — fibre’s churn rate suggests an expectation gap, not only a network issue.

Technical — churn rate by segment (Python)
# Churn rate by key segments (Python / Pandas)
import pandas as pd

df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
df['Churn_Binary'] = (df['Churn'] == 'Yes').astype(int)
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')  # blank strings become NaN

# Contract type
contract_churn = df.groupby('Contract')['Churn_Binary'].mean()
# Month-to-month    0.427  (15x higher than two-year)
# One year          0.113
# Two year          0.028

# Internet service
internet_churn = df.groupby('InternetService')['Churn_Binary'].mean()
# DSL              0.189
# Fiber optic      0.419  (2.2x the DSL rate)
# No               0.074

# Tenure bucket analysis
df['tenure_band'] = pd.cut(df['tenure'],
    bins=[0,12,24,48,72], include_lowest=True,   # keep tenure==0 accounts in the first band
    labels=['0-12m','12-24m','24-48m','48m+'])
tenure_churn = df.groupby('tenure_band', observed=True)['Churn_Binary'].mean()
# 0-12m    0.477  (highest-risk onboarding window)
# 12-24m   0.312
# 24-48m   0.154
# 48m+     0.066
02
Feature importance

What is really driving the decision to leave?

Context — what feature importance means

A Random Forest model learns patterns from all input fields (tenure, monthly charges, contract type, add-ons, etc.). Feature importance scores show which inputs the ensemble relied on most when predicting churn. This is not causal proof — but it tells product and retention teams which levers the model treats as informative.

Business insight

Tenure and billing fields dominate; age-style demographics fade. That implies the operational and pricing data you already hold outperform segment marketing based on who someone “is.” Bundle depth (num_services) shows up strongly — sticky products reduce churn before tenure compounds.

49.9%
of total RF importance
Explained by tenure plus charges alone. Behavioural, not demographic.
Rank 6
Engineered: num_services
Bundle depth (0 to 8 services) creates switching costs. Each added product is associated with lower churn.
3.1%
Senior Citizen importance
The weakest feature in the model. Demographic targeting wastes budget when behavioural data is available.
Evidence — top 8 Random Forest feature importances

How to read this: longer bars mean the model split on that variable more often when deciding churn risk. Values sum to 1.0 across all features; the horizontal axis is importance weight (0–0.20 on this chart). Model: n_estimators=200, max_depth=10.

tenure             0.187
monthly_charges    0.164
total_charges      0.148
contract_type      0.089
internet_service   0.071
num_services *     0.065
online_security    0.062
tech_support       0.058
* engineered feature · top 3 behavioural features = 49.9% of total importance
Recommendation — product & onboarding

At onboarding, offer a bundle incentive to add two or three add-on services (security, backup, TV). That raises num_services early, increases switching cost, and aligns with what the model already weights — without waiting for tenure to accumulate.

Technical — feature engineering & RF importance (Python)
# Feature engineering and Random Forest importance (scikit-learn)
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder
from sklearn.model_selection import train_test_split
import pandas as pd

# Engineer service-bundle depth (0 to 8); df carries over from Section 01
# (Churn_Binary already added there)
svc_cols = ['PhoneService','MultipleLines','OnlineSecurity',
            'OnlineBackup','DeviceProtection','TechSupport',
            'StreamingTV','StreamingMovies']
df['num_services'] = df[svc_cols].apply(
    lambda r: (r == 'Yes').sum(), axis=1)

# Ordinal-encode categoricals
cat_cols = df.select_dtypes('object').columns.drop('Churn')
enc = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
df[cat_cols] = enc.fit_transform(df[cat_cols])

X = df.drop(columns=['customerID','Churn','Churn_Binary'])
y = df['Churn_Binary']
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

rf = RandomForestClassifier(
    n_estimators=200, max_depth=10,
    min_samples_leaf=20, class_weight='balanced',
    random_state=42, n_jobs=-1)
rf.fit(X_tr, y_tr)

imp = pd.Series(rf.feature_importances_, index=X.columns)
print(imp.nlargest(8).round(3).to_string())
# tenure             0.187
# MonthlyCharges     0.164
# TotalCharges       0.148
# Contract           0.089
# InternetService    0.071
# num_services       0.065  (engineered)
# OnlineSecurity     0.062
# TechSupport        0.058
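Impurity-based importances like the ones above can overstate continuous or high-cardinality fields, so a common cross-check is permutation importance on the held-out set: shuffle one column at a time and measure the drop in test AUC. A self-contained sketch on synthetic data (in the case study you would pass the `rf`, `X_te`, `y_te` from the pipeline above):

```python
# Cross-check sketch: permutation importance on a held-out set
# (synthetic data; 'churn' here is driven by tenure alone, by construction)
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame({
    'tenure':  rng.integers(0, 72, 1000),
    'charges': rng.uniform(20, 120, 1000),
    'noise':   rng.normal(size=1000),          # uninformative by design
})
y = (X['tenure'] < 12).astype(int)             # churn driven by tenure only

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

perm = permutation_importance(rf, X_te, y_te, n_repeats=10,
                              scoring='roc_auc', random_state=0)
print(pd.Series(perm.importances_mean, index=X.columns)
        .sort_values(ascending=False).round(3))
# 'tenure' should dominate; 'noise' should sit near zero
```

If the permutation ranking broadly matches the impurity ranking, the story told in the cards above is on firmer ground.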
03
Model evaluation

How accurately can we predict who will leave?

Context — what we compared

We hold out a random 20% test set (stratified by churn) and train two models on the rest: logistic regression (linear, easy to explain) and Random Forest (non-linear). We report AUC-ROC to compare ranking quality, plus accuracy, precision, and recall at the model’s default threshold — then tune a probability threshold (0.35) for the forest model to favour catching more churners for outbound campaigns.

Business insight

Random Forest lifts AUC to 0.867 vs 0.844 for logistic regression — better ranking of who to contact first. At a 0.35 probability cut-off, roughly 72 of every 100 flagged customers are true churners (precision 72.3%), which matters when call-centre capacity is limited: fewer wasted conversations.

How to read the cards below: Accuracy is overall correctness. Precision is “when we say churn, how often we’re right” (save spend). Recall is “of all churners, how many we catch” (coverage). The Random Forest card’s footnote reports precision at the tuned 0.35 threshold, not at the default 0.5.

Logistic Regression
Interpretable baseline · StandardScaler + C=0.1
0.844
AUC-ROC
Accuracy 80.9%
Precision 65.6%
Recall 55.4%
Value: interpretable coefficients map directly to retention rule design.
Best
Random Forest
n_estimators=200 · max_depth=10 · threshold=0.35
0.867
AUC-ROC
Accuracy 80.1%
Precision 63.5%
Recall 49.3%
At probability threshold 0.35 (below default 0.5): 72.3% precision — 72 of every 100 outbound contacts are genuine churners.
Strategic insight

The business does not need “the highest accuracy” — it needs ranked lists and efficient outreach. AUC answers ranking; precision at an agreed threshold answers “how many calls are wasted.” Retention should own the threshold choice jointly with analytics.

Recommendation — analytics & CRM

Ship the Random Forest scores to the CRM as a weekly field; default campaigns to the top decile by risk, then expand until contact capacity binds. Revisit the probability threshold each quarter using cost-per-save and capacity — not only static accuracy tables.
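The “top decile by risk” step can be as simple as a `pd.qcut` over the weekly scores. A sketch with placeholder data (`customerID`, `churn_score`, and `risk_decile` are illustrative names, not the case-study CRM schema; the scores are random stand-ins for `rf.predict_proba` output):

```python
# Sketch: turn weekly churn scores into CRM risk deciles
# (random scores stand in for rf.predict_proba; field names illustrative)
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
crm = pd.DataFrame({
    'customerID':  [f'C{i:04d}' for i in range(1000)],
    'churn_score': rng.uniform(size=1000),
})

# Decile 1 = highest risk, 10 = lowest; campaigns start at decile 1
crm['risk_decile'] = pd.qcut(crm['churn_score'], 10,
                             labels=list(range(10, 0, -1))).astype(int)

campaign = crm[crm['risk_decile'] == 1]        # top 10% by predicted risk
print(len(campaign))                           # 100 of 1,000 accounts
```

Expanding the campaign is then just lowering the decile cut (`risk_decile <= 2`, `<= 3`, …) until contact capacity binds.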

Technical — LR vs RF evaluation pipeline (Python)
# Full evaluation pipeline: LR vs Random Forest (scikit-learn)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score, precision_score, recall_score, accuracy_score

# Logistic Regression (scaled, class-balanced)
lr_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf',    LogisticRegression(C=0.1, max_iter=1000, class_weight='balanced'))
])
lr_pipe.fit(X_tr, y_tr)
lr_pred  = lr_pipe.predict(X_te)
lr_proba = lr_pipe.predict_proba(X_te)[:,1]

# Random Forest
rf = RandomForestClassifier(
    n_estimators=200, max_depth=10, min_samples_leaf=20,
    class_weight='balanced', random_state=42, n_jobs=-1)
rf.fit(X_tr, y_tr)
rf_pred  = rf.predict(X_te)
rf_proba = rf.predict_proba(X_te)[:,1]

for name, pred, proba in [('LR',lr_pred,lr_proba),('RF',rf_pred,rf_proba)]:
    print(f"{name}  ACC={accuracy_score(y_te,pred):.3f}"
          f"  PREC={precision_score(y_te,pred):.3f}"
          f"  REC={recall_score(y_te,pred):.3f}"
          f"  AUC={roc_auc_score(y_te,proba):.3f}")
# LR  ACC=0.809  PREC=0.656  REC=0.554  AUC=0.844
# RF  ACC=0.801  PREC=0.635  REC=0.493  AUC=0.867

# Threshold tuning: lower the score cut-off to 0.35 for outbound campaigns
rf_pred_t = (rf_proba >= 0.35).astype(int)
# PREC=0.723  REC=0.681  (72 of 100 flagged contacts are genuine churners)
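One way to put the threshold choice in retention’s hands is to print the precision/recall trade-off at a few candidate cut-offs. A self-contained sketch with synthetic labels and scores standing in for `y_te` and `rf_proba` from the pipeline above:

```python
# Sketch: precision/recall at candidate thresholds so retention can
# pick the cut-off (synthetic stand-ins for y_te / rf_proba)
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(7)
y_demo     = rng.integers(0, 2, 2000)
proba_demo = y_demo * 0.4 + rng.uniform(size=2000) * 0.6   # scores in [0, 1]

for t in (0.30, 0.35, 0.40, 0.50):
    pred = (proba_demo >= t).astype(int)
    print(f"t={t:.2f}  PREC={precision_score(y_demo, pred):.3f}"
          f"  REC={recall_score(y_demo, pred):.3f}")
# As the threshold falls, recall rises and precision falls: more churners
# caught, more wasted calls. The business owns where to stop.
```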
04
Retention strategy

How to act on this

Context — from analysis to operations

Sections 01–02 showed where churn concentrates (contracts, fibre, tenure) and which fields the model trusts. Section 03 showed how well scores rank customers. Here we connect that to who does what: data engineering ships weekly scores; retention runs prioritised outreach; product tests offers tied to those scores — not blanket discounts.

The RF pipeline produces a weekly churn probability per active account. Retention works down the ranked list — highest risk first — instead of uniform “save everyone” campaigns.

Strategic insight

Predicting churn requires behavioural and product signals (tenure, charges, bundle depth), not age-based segments alone — those fields are known at onboarding and refresh every bill cycle.

Recommendation

Tie save offers to contract upgrade + bundle expansion (Sections 01–02), and measure uplift by model decile — prove ROI before widening spend.
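“Uplift by model decile” reduces to comparing churn between treated and control accounts within each decile. A sketch with simulated pilot outcomes (`risk_decile`, `treated`, `churned` are illustrative column names, not the case-study schema):

```python
# Sketch: uplift by model decile in an A/B pilot (simulated outcomes;
# column names are illustrative, not the case-study CRM schema)
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 10_000
pilot = pd.DataFrame({
    'risk_decile': rng.integers(1, 11, n),              # 1 = highest risk
    'treated':     rng.integers(0, 2, n).astype(bool),  # got the save offer?
})
base = 0.45 - 0.04 * (pilot['risk_decile'] - 1)          # riskier deciles churn more
pilot['churned'] = rng.uniform(size=n) < (base - 0.05 * pilot['treated'])

uplift = pilot.groupby(['risk_decile', 'treated'])['churned'].mean().unstack()
uplift['saved_pp'] = (uplift[False] - uplift[True]) * 100
print(uplift.round(3))
# 'saved_pp' = churn percentage points avoided per decile; multiply by
# margin per account to get the ROI figure before widening spend
```

Deciles where `saved_pp` is flat or negative are where offer spend should be cut first.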

30d
Instrument and score
Deploy the RF pipeline. Score all active customers weekly. Surface churn probability in the CRM alongside contract type, tenure, and renewal date.
60d
Pilot retention offers
Run A/B campaigns on the top 1,000 predicted churners. Test contract upgrade incentives for month-to-month customers and bundle additions for low num_services accounts.
90d
Embed and iterate
Retrain on pilot outcomes. Shift threshold to match team capacity. Extend to time-to-churn prediction for high-CLV accounts at risk of early exit.