Before you scroll
Key terms in this case study
Telco and machine-learning write-ups assume readers know churn metrics and model jargon. Skim these once — the same definitions apply in Sections 01–04.
- Churn
- A customer cancels or stops using paid service within the observation window. In the IBM dataset, churn is a yes/no label per account — the modelling target we try to predict early.
- AUC-ROC
- Summarises how well the model ranks who churns vs who stays (0.5 = random, 1.0 = perfect). It does not depend on one decision threshold — useful when marketing chooses different cut-offs for “who to call.”
- Precision & recall
- Precision: of everyone the model flags as “will churn,” how many actually churn — contact budget efficiency. Recall: of everyone who truly churned, how many we caught — coverage. Raising one often lowers the other; the business picks a balance.
- DSL vs fibre
- DSL is slower copper-line internet; fibre (fiber optic) is higher speed. The dataset encodes product type — churn differs by line quality, pricing, and support expectations, not by speed alone.
- Tenure
- Months the customer has been subscribed. Low tenure usually means weaker switching costs and higher churn — a behavioural signal, not a demographic label.
- RF & logistic regression
- Logistic regression is a simple, interpretable baseline. Random Forest (RF) combines many decision trees to capture non-linear patterns (e.g. contract × charges). We compare both on the same test set.
- Feature importance
- For Random Forest, each input variable gets an importance score (roughly: how much splits on that variable reduce prediction error). High importance means the model leans on that field when scoring churn risk — useful for explaining why a customer is flagged.
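The metric definitions above are easy to verify by hand. A minimal sketch with a toy set of eight hypothetical scores (not the IBM dataset): precision and recall at a 0.5 cut-off, and AUC-ROC computed as the probability that a random churner outranks a random non-churner.

```python
# Toy illustration of precision, recall, and AUC-ROC (hypothetical scores)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # 1 = churned
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]    # model churn probabilities

threshold = 0.5
y_pred = [1 if s >= threshold else 0 for s in scores]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)   # of flagged customers, share who truly churn -> 0.75
recall    = tp / (tp + fn)   # of true churners, share we caught           -> 0.75

# AUC-ROC = P(random churner's score > random non-churner's score), ties = 0.5
pos = [s for s, t in zip(scores, y_true) if t == 1]
neg = [s for s, t in zip(scores, y_true) if t == 0]
auc = sum(1.0 if p > n else 0.5 if p == n else 0.0
          for p in pos for n in neg) / (len(pos) * len(neg))
# auc -> 0.6875: better than random (0.5), far from perfect (1.0)
```

Note how the AUC never touches the 0.5 threshold — it only compares rankings, which is why it stays meaningful when marketing later moves the cut-off.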
Who is at risk of leaving?
The IBM Telco sample has 7,043 customer accounts, each with contract type, internet product, tenure, charges, and add-on services. We computed the share of customers who churned within each segment — not yet using a predictive model, just descriptive rates to find where risk concentrates.
Contract structure and internet product type split the base into very different churn populations. If those splits are wide, retention budget should not be spent evenly — it should follow the segments with the highest churn rate and the highest margin.
How to read this: each bar is the percentage of customers in that group who churned. The vertical axis stops at 50% so differences between contract and product types stay visible. Three panel groups are shown: contract length, internet technology, and tenure band.
Contract structure is the single largest churn lever in this dataset. Month-to-month plans maximise flexibility for customers but minimise commitment for the telco — the data says that trade-off shows up directly in churn. Longer contracts are not just “legal length”; they are behavioural lock-in devices.
Make long-term contracts visibly more valuable at onboarding: bundle discounts, install waivers, or speed tiers that only attach to 12- or 24-month terms. Pair with clear fibre onboarding comms — fibre’s churn rate suggests an expectation gap, not only a network issue.
Technical — churn rate by segment (Python)
# Churn rate by key segments (Python / Pandas)
import pandas as pd

df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
df['Churn_Binary'] = (df['Churn'] == 'Yes').astype(int)
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

# Contract type
contract_churn = df.groupby('Contract')['Churn_Binary'].mean()
# Month-to-month  0.427  (15x higher than two-year)
# One year        0.113
# Two year        0.028

# Internet service
internet_churn = df.groupby('InternetService')['Churn_Binary'].mean()
# DSL          0.189
# Fiber optic  0.419  (2.2x the DSL rate)
# No           0.074

# Tenure bucket analysis
df['tenure_band'] = pd.cut(df['tenure'], bins=[0, 12, 24, 48, 72],
                           labels=['0-12m', '12-24m', '24-48m', '48m+'])
tenure_churn = df.groupby('tenure_band')['Churn_Binary'].mean()
# 0-12m   0.477  (highest-risk onboarding window)
# 12-24m  0.312
# 24-48m  0.154
# 48m+    0.066
What is really driving the decision to leave?
A Random Forest model learns patterns from all input fields (tenure, monthly charges, contract type, add-ons, etc.). Feature importance scores show which inputs the ensemble relied on most when predicting churn. This is not causal proof — but it tells product and retention teams which levers the model treats as informative.
Tenure and billing fields dominate; age-style demographics fade. That implies the operational and pricing data you already hold outperform segment marketing based on who someone "is." Bundle depth (num_services) also scores highly — sticky products reduce churn before tenure compounds.
How to read this: longer bars mean the model split on that variable more often when deciding churn risk. Values sum to 1.0 across all features; the horizontal axis is importance weight (0–0.20 on this chart). Model: n_estimators=200, max_depth=10.
At onboarding, offer a bundle incentive to add two or three add-on services (security, backup, TV). That raises num_services early, increases switching cost, and aligns with what the model already weights — without waiting for tenure to accumulate.
Technical — feature engineering & RF importance (Python)
# Feature engineering and Random Forest importance (scikit-learn)
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder
from sklearn.model_selection import train_test_split
import pandas as pd

# Engineer service-bundle depth (0 to 8)
svc_cols = ['PhoneService', 'MultipleLines', 'OnlineSecurity',
            'OnlineBackup', 'DeviceProtection', 'TechSupport',
            'StreamingTV', 'StreamingMovies']
df['num_services'] = df[svc_cols].apply(
    lambda r: (r == 'Yes').sum(), axis=1)

# Ordinal-encode categoricals
cat_cols = df.select_dtypes('object').columns.drop('Churn')
enc = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
df[cat_cols] = enc.fit_transform(df[cat_cols])

X = df.drop(columns=['customerID', 'Churn', 'Churn_Binary'])
y = df['Churn_Binary']
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

rf = RandomForestClassifier(
    n_estimators=200, max_depth=10, min_samples_leaf=20,
    class_weight='balanced', random_state=42, n_jobs=-1)
rf.fit(X_tr, y_tr)

imp = pd.Series(rf.feature_importances_, index=X.columns)
print(imp.nlargest(8).round(3).to_string())
# tenure           0.187
# MonthlyCharges   0.164
# TotalCharges     0.148
# Contract         0.089
# InternetService  0.071
# num_services     0.065  (engineered)
# OnlineSecurity   0.062
# TechSupport      0.058
How accurately can we predict who will leave?
We hold out a random 20% test set (stratified by churn) and train two models on the rest: logistic regression (linear, easy to explain) and Random Forest (non-linear). We report AUC-ROC to compare ranking quality, plus accuracy, precision, and recall at the model’s default threshold — then tune a probability threshold (0.35) for the forest model to favour catching more churners for outbound campaigns.
Random Forest lifts AUC to 0.867 vs 0.844 for logistic regression — better ranking of who to contact first. At a 0.35 probability cut-off, roughly 72 of every 100 flagged customers are true churners (precision 72.3%), which matters when call-centre capacity is limited: fewer wasted conversations.
How to read the cards below: Accuracy is overall correctness. Precision is “when we say churn, how often we’re right” (save spend). Recall is “of all churners, how many we catch” (coverage). The dark card’s footnote explains precision at the tuned threshold, not at default 0.5.
The business does not need “the highest accuracy” — it needs ranked lists and efficient outreach. AUC answers ranking; precision at an agreed threshold answers “how many calls are wasted.” Retention should own the threshold choice jointly with analytics.
Ship the Random Forest scores to the CRM as a weekly field; default campaigns to the top decile by risk, then expand until contact capacity binds. Revisit the probability threshold each quarter using cost-per-save and capacity — not only static accuracy tables.
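Picking the contact list from capacity rather than from a fixed probability cut-off can be sketched directly: rank every scored account, take as many as the call centre can handle, and read off the implied threshold and precision. The data below is synthetic (hypothetical scores, not the trained model's output) purely to show the mechanics.

```python
# Sketch: derive the outreach cut-off from call capacity (synthetic data)
import numpy as np

rng = np.random.default_rng(42)
n = 1000
y_true = rng.random(n) < 0.27            # ~27% base churn rate, as in the dataset
# Hypothetical risk scores: churners tend to score higher than non-churners
scores = np.clip(rng.normal(0.55, 0.2, n) * y_true
                 + rng.normal(0.30, 0.2, n) * ~y_true, 0, 1)

capacity = 150                           # outbound calls the team can make weekly
order = np.argsort(-scores)              # rank accounts by risk, highest first
called = order[:capacity]                # contact list = top `capacity` by score
threshold = scores[order[capacity - 1]]  # implied probability cut-off

# Share of calls that reach true churners at this capacity (precision)
precision_at_capacity = y_true[called].mean()
```

Reporting `threshold` back to analytics each quarter keeps the cut-off a business decision (capacity, cost-per-save) rather than a frozen model constant.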
Technical — LR vs RF evaluation pipeline (Python)
# Full evaluation pipeline: LR vs Random Forest (scikit-learn)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (roc_auc_score, precision_score,
                             recall_score, accuracy_score)

# Logistic Regression (scaled, class-balanced)
lr_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(C=0.1, max_iter=1000,
                               class_weight='balanced'))
])
lr_pipe.fit(X_tr, y_tr)
lr_pred = lr_pipe.predict(X_te)
lr_proba = lr_pipe.predict_proba(X_te)[:, 1]

# Random Forest
rf = RandomForestClassifier(
    n_estimators=200, max_depth=10, min_samples_leaf=20,
    class_weight='balanced', random_state=42, n_jobs=-1)
rf.fit(X_tr, y_tr)
rf_pred = rf.predict(X_te)
rf_proba = rf.predict_proba(X_te)[:, 1]

for name, pred, proba in [('LR', lr_pred, lr_proba),
                          ('RF', rf_pred, rf_proba)]:
    print(f"{name} ACC={accuracy_score(y_te, pred):.3f}"
          f" PREC={precision_score(y_te, pred):.3f}"
          f" REC={recall_score(y_te, pred):.3f}"
          f" AUC={roc_auc_score(y_te, proba):.3f}")
# LR  ACC=0.809  PREC=0.656  REC=0.554  AUC=0.844
# RF  ACC=0.801  PREC=0.635  REC=0.493  AUC=0.867

# Lower the threshold from the default 0.5 to 0.35 to favour recall
rf_pred_t = (rf_proba >= 0.35).astype(int)
# PREC=0.723  REC=0.681  (72 of 100 flagged contacts are genuine churners)
Sections 01–02 showed where churn concentrates (contracts, fibre, tenure) and which fields the model trusts. Section 03 showed how well scores rank customers. Here we connect that to who does what: data engineering ships weekly scores; retention runs prioritised outreach; product tests offers tied to those scores — not blanket discounts.
How to act on this
The RF pipeline produces a weekly churn probability per active account. Retention works down the ranked list — highest risk first — instead of uniform “save everyone” campaigns.
Predicting churn requires behavioural and product signals (tenure, charges, bundle depth), not age-based segments alone — those fields are known at onboarding and refresh every bill cycle.
Tie save offers to contract upgrade + bundle expansion (Sections 01–02), and measure uplift by model decile — prove ROI before widening spend.
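The decile view above can be sketched as a simple lift table: bucket held-out accounts into ten equal groups by predicted risk, then compare each group's churn rate to the base rate. The scores below are synthetic stand-ins for the model's held-out probabilities; a true campaign-uplift read additionally needs a no-contact control group, which this sketch does not model.

```python
# Sketch: churn rate by model-risk decile (synthetic held-out scores)
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
churned = rng.random(n) < 0.27
# Hypothetical scores: churners drawn from a higher-skewing distribution
scores = np.where(churned, rng.beta(4, 3, n), rng.beta(2, 5, n))

eval_df = pd.DataFrame({'score': scores, 'churned': churned})
eval_df['decile'] = pd.qcut(eval_df['score'], 10, labels=False)  # 9 = highest risk

# Lift = decile churn rate / overall churn rate; > 1 beats blanket outreach
lift = (eval_df.groupby('decile')['churned'].mean()
        / eval_df['churned'].mean()).sort_index(ascending=False)
print(lift.round(2).to_string())
```

If the top decile's lift is, say, 2.5x, contacting it first saves the same customers with a fraction of blanket-campaign spend — the number to track before widening the budget.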