Who graduates, who struggles, and who disappears?
Every row is one student enrolled in an Open University module. Their final result is recorded as one of four outcomes: Pass, Distinction, Fail, or Withdrawn (they left before the module ended). The charts below answer: how big is each bucket, and do withdrawal rates differ by module (course design) rather than only by individual student?
Roughly one in five enrolments ends in withdrawal, not in a fail grade. That is a retention and revenue problem for any institution charging per module, and a student-success problem for public missions. If withdrawal were random, rates would look similar across modules — they do not, which points to fixable course-structure levers (assessment timing, workload peaks), not only “weaker students.”
How to read this: the bar is 100% of students. Each coloured segment is the share ending in that result. Withdrawn (red) is students who exited early; Fail (amber) is students who completed but did not pass.
How to read this: each bar is one module code (AAA–GGG). Taller bars mean more students dropped out of that module. The dashed line is the overall average — bars above it are “harder to retain” from a design perspective.
Module design drives withdrawal more than a generic “weak student” story. AAA, CCC, DDD, EEE, and FFF all exceed the platform withdrawal average, sharing a pattern: assessment workload clusters in weeks 5–10. Module BBB — lowest withdrawal — spaces assessments more evenly. That is a curriculum and scheduling insight, not an argument to screen applicants harder.
Run a module-by-module assessment calendar review with faculty leads: compare BBB’s timeline to CCC/AAA. Pilot “spaced” deadlines or lighter mid-semester bundles on the highest-withdrawal modules first; track withdrawal rate and student survey (workload) as joint KPIs. Pair with the engagement alerts in Section 02 so interventions fire before week 6 — when behaviour still diverges.
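One way to make the calendar review concrete is to quantify "clustered vs spaced" deadlines directly. A minimal sketch, assuming an OULAD-style `assessments` table with `code_module` and a day-of-course `date` column; the `deadline_gaps` helper and the toy numbers are illustrative, not figures from the charts above.

```python
# Sketch: how tightly does each module cluster its assessment deadlines?
# Assumes an OULAD-style frame with 'code_module' and 'date' (day of course).
import pandas as pd

def deadline_gaps(assessments: pd.DataFrame) -> pd.DataFrame:
    """Min/median gap in days between consecutive deadlines, per module."""
    s = assessments.sort_values(['code_module', 'date'])
    gaps = s.assign(gap=s.groupby('code_module')['date'].diff())
    gaps = gaps.dropna(subset=['gap'])  # first deadline in each module has no gap
    out = gaps.groupby('code_module')['gap'].agg(['min', 'median'])
    return out.rename(columns={'min': 'min_gap_days', 'median': 'median_gap_days'})

# Toy example: BBB spaces deadlines evenly, CCC bunches them mid-term.
demo = pd.DataFrame({
    'code_module': ['BBB'] * 4 + ['CCC'] * 4,
    'date':        [30, 60, 90, 120,   35, 42, 49, 140],
})
print(deadline_gaps(demo))
```

A small `min_gap_days` flags the mid-semester bundles the recommendation targets; sorting modules by it gives the pilot order.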
Technical — cohort outcome aggregation (Python)
# Final outcome counts and withdrawal rate by module (pandas · OULAD)
import pandas as pd

info = pd.read_csv('studentInfo.csv')  # 32,593 rows

outcome_mix = info['final_result'].value_counts(normalize=True).mul(100).round(1)
# Pass 44.9 | Distinction 13.8 | Fail 19.3 | Withdrawn 22.0

by_mod = (
    info.groupby('code_module')
        .apply(lambda g: (g['final_result'] == 'Withdrawn').mean())
        .mul(100)
        .round(1)
        .sort_values(ascending=False)
)
# CCC 26.0 | AAA/EEE 25.0 | DDD/FFF 24.0 | GGG 22.0 | BBB 12.0
When does disengagement become irreversible?
The VLE is the student’s online course site (readings, videos, forums, uploads). OULAD stores click-stream logs: each time a student opens a resource, a row is added. We sum those into average clicks per week by final outcome group. This is not “time on task,” but it is a consistent, institution-wide behavioural trace — ideal for early warning dashboards wired to the LMS.
Students who eventually withdraw look similar to pass students in week 1 — then their VLE activity collapses between weeks 3 and 6. That means retention teams do not need a complex model on day one: a simple weekly click threshold catches most at-risk students while the window to help is still open. Waiting until the first assessment is often too late for the lowest-engagement group.
How to read this: each line is the average clicks per week for students who ended in that result. Vertical markers are assessment due weeks. The red line (withdrawn) collapses early — those students stop using the VLE long before they formally leave.
Configure the LMS / VLE to flag any student below 15 cumulative clicks by end of week 6 and route the list to tutors weekly. Pair the alert with a one-click “check-in” email template (not a generic newsletter). Measure lift on week-8 active usage, not just email opens — behaviour change is the success metric.
Technical — weekly VLE aggregation (Python)
# Weekly VLE click trends by final outcome group (Python / Pandas · OULAD)
import pandas as pd

student_vle = pd.read_csv('studentVle.csv')    # 10.6M click-event rows
student_info = pd.read_csv('studentInfo.csv')  # 32,593 student records

# Convert OULAD day-of-course to a 1-based week bucket (days 1-7 → week 1)
student_vle['week'] = ((student_vle['date'].clip(lower=1) - 1) // 7) + 1

# Attach outcome label
merged = student_vle.merge(
    student_info[['id_student', 'final_result']],
    on='id_student', how='left'
)

# Avg weekly clicks per student, by outcome group
weekly_avg = (
    merged
    .groupby(['final_result', 'id_student', 'week'])['sum_click'].sum()
    .reset_index()
    .groupby(['final_result', 'week'])['sum_click'].mean()
    .unstack(level=0)
)

# Early-weeks threshold as withdrawal predictor
early_clicks = (
    merged[merged['week'].between(1, 6)]
    .groupby('id_student')['sum_click'].sum()
    .reset_index(name='early_clicks')
    .merge(student_info[['id_student', 'final_result']], on='id_student')
)
early_clicks['low_engage'] = early_clicks['early_clicks'] < 15

withdrawal_prob = (
    early_clicks.groupby('low_engage')
    .apply(lambda g: (g['final_result'] == 'Withdrawn').mean())
)
# low_engage=True  → P(Withdrawn) = 0.71
# low_engage=False → P(Withdrawn) = 0.09
When and how you submit predicts whether you finish.
OULAD links each student to every assessment attempt: submission date, deadline date, and score. “Early” vs “late” is measured as days before or after the published deadline. “No submit” means no row exists for that assessment — the student never uploaded work. That distinction matters: a missing row is a stronger signal than a low score.
Submission timing beats demographics. Age, postcode deprivation (IMD), and prior education help explain some variance — but whether someone engages with the first deadline is cheaper to observe in real time and lines up with tutor outreach. Student services should prioritise deadline proximity over profile-based risk lists.
How to read this: each bar is the share of students in that timing band who ultimately withdrew from the module. “No submit” is students who never filed work for that assessment.
How to read this: horizontal bars show what % of students in each score band on their first marked assessment went on to pass or earn a distinction in the module overall.
The first assessment is the highest-leverage checkpoint. A student who submits early and scores above 50% has a high chance of completing with a pass or distinction. A student who never submits faces a 92% withdrawal probability. For the institution, that means deadline-day workflows (automated nudges, tutor triage) outperform annual demographic profiling.
Treat Assessment 1 as a formal retention milestone: auto-email at T−7 days and T−1 day to non-starters; escalate to a personal call if the VLE shows zero submission intent (no draft upload) 48 hours before deadline. Report a single KPI to leadership: % of cohort submitting A1 on time, split by module.
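That single leadership KPI can be computed from the same three OULAD tables. A minimal sketch: `a1_on_time_rate` is a hypothetical helper, "first assessment" is approximated as the earliest deadline per module (a real run should also key on `code_presentation`), and the toy frames below are invented to show the shape.

```python
# Sketch: % of cohort submitting Assessment 1 on time, split by module.
# Assumes OULAD-style columns; earliest-deadline-per-module is a simplification.
import pandas as pd

def a1_on_time_rate(assessments, submissions, roster):
    # First assessment per module = the earliest published deadline
    a1 = assessments.sort_values('date').groupby('code_module').head(1)
    a1 = a1[['id_assessment', 'code_module', 'date']].rename(columns={'date': 'deadline'})
    sub = submissions.merge(a1, on='id_assessment')
    sub['on_time'] = sub['date_submitted'] <= sub['deadline']
    # Denominator is the full enrolled cohort, not just submitters
    cohort = roster.groupby('code_module')['id_student'].nunique()
    on_time = sub[sub['on_time']].groupby('code_module')['id_student'].nunique()
    return (on_time.reindex(cohort.index, fill_value=0) / cohort).mul(100).round(1)

# Toy frames (invented) showing the expected shape
assessments = pd.DataFrame({'id_assessment': [10, 11, 20],
                            'code_module': ['AAA', 'AAA', 'BBB'],
                            'date': [19, 54, 23]})
submissions = pd.DataFrame({'id_student': [1, 2, 3],
                            'id_assessment': [10, 10, 20],
                            'date_submitted': [18, 25, 20]})
roster = pd.DataFrame({'id_student': [1, 2, 3, 4, 5],
                       'code_module': ['AAA', 'AAA', 'BBB', 'BBB', 'BBB']})
print(a1_on_time_rate(assessments, submissions, roster))
```

Using the full roster as the denominator is deliberate: a module where non-starters never appear in the submissions table would otherwise look healthier than it is.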
Technical — assessment timing & score bands (Python)
# Assessment submission timing vs final outcome (Python / Pandas · OULAD)
import pandas as pd

assessments = pd.read_csv('assessments.csv')           # 206 assessments
student_assess = pd.read_csv('studentAssessment.csv')  # 173,912 submission rows
student_info = pd.read_csv('studentInfo.csv')

# Days submitted relative to deadline (negative = submitted early)
sa = student_assess.merge(
    assessments[['id_assessment', 'date']], on='id_assessment'
)
sa['days_relative'] = sa['date_submitted'] - sa['date']
sa['timing_band'] = pd.cut(
    sa['days_relative'],
    bins=[-999, -7, 0, 999],
    labels=['Early', 'On-time', 'Late']
)

sa = sa.merge(student_info[['id_student', 'final_result']], on='id_student')
sa['withdrew'] = (sa['final_result'] == 'Withdrawn').astype(int)
print(sa.groupby('timing_band')['withdrew'].mean())
# timing_band
# Early      0.080   ← 8% withdrawal rate
# On-time    0.170   ← 17%
# Late       0.270   ← 27%

# Non-submission withdrawal rate
submitted_ids = student_assess['id_student'].unique()
no_submit_mask = ~student_info['id_student'].isin(submitted_ids)
no_sub_rate = (student_info.loc[no_submit_mask, 'final_result'] == 'Withdrawn').mean()
# no_sub_rate = 0.920 ← 92% of non-submitters withdraw

# First assessment score vs pass/distinction rate
first_assm = (
    student_assess
    .sort_values('date_submitted')
    .groupby('id_student')
    .first()
    .reset_index()[['id_student', 'score']]
    .merge(student_info[['id_student', 'final_result']], on='id_student')
)
first_assm['score_band'] = pd.cut(
    first_assm['score'],
    bins=[0, 29, 49, 69, 89, 100],
    labels=['0-29', '30-49', '50-69', '70-89', '90-100'],
    include_lowest=True  # keep score 0 inside the lowest band
)
first_assm['passed'] = first_assm['final_result'].isin(['Pass', 'Distinction']).astype(int)
print(first_assm.groupby('score_band')['passed'].mean())
# 0-29     0.090
# 30-49    0.310
# 50-69    0.580
# 70-89    0.740
# 90-100   0.890
Sections 01–03 established where students leave (module mix), how they signal risk early (VLE clicks), and which deadlines matter (Assessment 1). This section translates those findings into thresholds, owners, and a rollout plan — so product, student services, and faculty share one playbook. Numbers below are illustrative targets for a pilot; calibrate against your own LMS export and term dates.
Turning behavioural signal into timely action
Three behavioural thresholds — all detectable within the first six weeks from VLE and assessment logs — map to escalating interventions. Each tier is cheap at small scale and automatable at large scale via the LMS.
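The tier mapping itself needs no model — it is a rule table. A sketch using the illustrative pilot thresholds from this section (15 clicks by week 6, Assessment 1 submission); the tier labels and the `intervention_tier` function are hypothetical, not fields in OULAD or any LMS.

```python
# Sketch: map a student's early behavioural signals to an escalating tier.
# Thresholds mirror the illustrative pilot targets; labels are invented.
def intervention_tier(week6_clicks: int, submitted_a1: bool) -> str:
    if not submitted_a1 and week6_clicks < 15:
        return 'tier-3: personal tutor call'       # both signals fire
    if not submitted_a1:
        return 'tier-2: tutor-triaged outreach'    # deadline signal only
    if week6_clicks < 15:
        return 'tier-1: automated check-in email'  # engagement signal only
    return 'no action'

print(intervention_tier(4, False))   # both signals → strongest escalation
```

Keeping the rules this explicit is the point: tutors can read, audit, and override the table, which a black-box risk score does not allow.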
Early behavioural signals in the VLE — not demographic background, prior qualifications, or age alone — are the strongest predictors of withdrawal. That is good news: logs are already collected for compliance and support; you do not need a new survey instrument to run this playbook.
Engagement curves diverge by weeks 3–6; after that, many withdrawn students have already stopped using the system. Pilot interventions in that window first — you maximise reachable students per staff hour, before the ~22% withdrawal cohort has silently disengaged.