Before you scroll
Key terms in this case study
E-commerce and data tooling use shorthand that business readers may not share. These definitions match how the analysis uses each idea.
- Olist & marketplace
- Olist is a Brazilian e-commerce dataset of orders, sellers, products, and reviews — used here as a stand-in for a multi-seller marketplace. A seller is a merchant fulfilling orders; the platform tracks their dispatch timing and customer feedback.
- Dispatch & carrier handoff
- Dispatch delay is days from order approval to when the package reaches the carrier. Late handoff means the seller missed an internal window (e.g. >3 days to carrier). That is upstream of last-mile delivery — it is a seller-operations metric.
- SLA
- A formal or informal commitment on how fast sellers must ship or how fast orders must arrive. Here, SLAs are used as operating rules (e.g. two-day dispatch for high-risk sellers), not legal text.
- Freight-to-price ratio
- Shipping cost ÷ item price for each order line. When this ratio is high, shipping eats margin — the “loss zone” flags lines where freight is ≥80% of the item price.
- Snowflake & Power BI
- Snowflake is a cloud data warehouse where SQL runs on large tables. Power BI is the reporting layer: dashboards and KPIs for executives. The case study flows warehouse SQL → metrics → Power BI narrative.
- Seller risk tiers (High / Standard / Premium)
- Clusters derived from dispatch and late-handoff behaviour (see Scikit-Learn in the hero). High Risk sellers systematically miss timing; Premium meet expectations. Tiers drive differentiated actions — not one-size-fits-all penalties.
A 60-hour end-to-end analysis on Olist marketplace tables: 99,441 orders and 3,095 sellers. The work connects warehouse SQL to a decision story: where sellers break delivery promises, where freight erodes margin, and how lateness hits review scores — then packages that in Power BI for leadership.
Operational leakage and customer dissatisfaction show up in structured data (timestamps, prices, ratings) before they show up in board decks. The executive summary KPIs below are the “so what” before the deep dives: risk concentration, satisfaction gap, margin exposure, and service recovery opportunity.
Evidence — headline KPIs
Executive summary
I built a 60-hour, end-to-end analysis on the Olist marketplace tables (99,441 orders, 3,095 sellers) to reduce operational leakage, protect profit margins, and improve delivery outcomes, then translated SQL and statistics into a decision-ready Power BI narrative combining dispatch KPIs, seller risk tiers, freight economics ratios, and delivery-delay intelligence.
- 21.4% Sellers Are High Risk: 287 of 1,343 consistently break the delivery promise
- 46.2% 1-Star Rate When Late: vs 6.6% for on-time, a 7× satisfaction gap
- 6.1% Items in Freight Loss Zone: shipping cost ≥ 80% of item price, margin at risk
- +1.72★ Score Gap (Late vs On-Time): 2.57 → 4.29 avg review score, fixable with a dispatch SLA
Which sellers are breaking the delivery promise?
Seller risk tiers
Sellers with at least 10 delivered orders were scored on dispatch delay and late handoff to the carrier, then grouped into High Risk, Standard, and Premium tiers so operations can prioritise enforcement and incentives per tier rather than rely on a single global average.
Roughly one in five qualified sellers sits in the High Risk bucket. That concentration means marketplace policy (SLAs, fees, visibility) can target a known subset instead of punishing the whole seller base.
Out of 1,343 qualified sellers, 287 (21.4%) fall into a High Risk pattern: they consistently dispatch too slowly or miss carrier handoff windows. This group takes on average 3× longer to ship than Premium sellers and is responsible for a disproportionate share of late deliveries and low review scores. The remaining sellers are either performing acceptably (Standard, 56.1%) or exceeding expectations (Premium, 22.6%).
Evidence — tier mix (qualified sellers)
How to read this: the bar chart shows the percentage of sellers in each tier. The SQL below shows the Snowflake logic that builds the dispatch metrics per seller.
-- Seller dispatch risk scorecard (Snowflake · OLIST.RAW)
WITH order_dispatch AS (
    SELECT DISTINCT                -- one row per (seller, order), prevents
        oi.seller_id,              -- multi-item orders double-counting delay
        o.order_id,
        DATEDIFF('day', o.order_approved_at, o.order_delivered_carrier_date) AS dispatch_delay_days,
        CASE WHEN o.order_delivered_carrier_date   -- late if seller took >3 days
                  > DATEADD('day', 3, o.order_approved_at)
             THEN 1 ELSE 0 END AS late_handoff
    FROM ORDERS o
    JOIN ORDER_ITEMS oi USING (order_id)
    WHERE o.order_status = 'delivered'
      AND o.order_approved_at IS NOT NULL
      AND o.order_delivered_carrier_date IS NOT NULL
)
SELECT
    seller_id,
    COUNT(*) AS orders_count,
    ROUND(AVG(dispatch_delay_days), 2) AS avg_dispatch_days,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY dispatch_delay_days) AS p50_delay,
    PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY dispatch_delay_days) AS p90_delay,
    ROUND(AVG(late_handoff), 3) AS late_handoff_rate
FROM order_dispatch
GROUP BY seller_id
HAVING COUNT(*) >= 10              -- use expression, not alias
ORDER BY p90_delay DESC;
-- 1,343 sellers qualified | platform p90 = 6.0 days
-- Top risk seller: p90 = 53.2 days, avg_score = 2.5
Seller risk scoring needs both dispatch metrics and customer voice; NLP on reviews adds early warning that SLA-only scores miss. Tiering turns that into actionable governance, with different rules per segment (a tier-assignment sketch follows the list below):
- High Risk: enforce a 2-day dispatch SLA; escalate or suspend repeaters.
- Standard: flag if the late rate exceeds 20%.
- Premium: protect and incentivise.
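The production tiers come from Scikit-Learn clustering on the scorecard above, not fixed cutoffs. As a minimal sketch of how tier assignment could live in the warehouse, the threshold values and the materialised scorecard name below are assumptions for illustration, not the fitted model:

-- Tier-assignment sketch (illustrative thresholds, not the fitted clusters)
WITH scorecard AS (
    SELECT seller_id, avg_dispatch_days, p90_delay, late_handoff_rate
    FROM SELLER_DISPATCH_SCORECARD   -- assumed: the scorecard query above, materialised
)
SELECT
    seller_id,
    CASE
        WHEN late_handoff_rate >= 0.40 OR p90_delay >= 10 THEN 'High Risk'
        WHEN late_handoff_rate <= 0.10 AND p90_delay <= 4  THEN 'Premium'
        ELSE 'Standard'
    END AS risk_tier
FROM scorecard;

Whatever the exact boundaries, the point is that a tier becomes a joinable seller attribute, so SLA enforcement and incentives can be driven from the same warehouse tables the scorecard uses.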
What is draining margin?
Cost cutting
For every delivered order line, we computed freight value ÷ item price (the freight-to-price ratio). That shows where shipping consumes margin regardless of category marketing; the histogram buckets lines from cheap-to-ship to "freight dominates price."
Most lines are healthy; damage sits in the long tail — a small share of items with extreme ratios. Electronics and phone accessories concentrate exposure, so shipping rules can be piloted by category instead of a blunt platform-wide fee change.
Freight policy is a conversion and margin lever at once: narrow pilots let you measure take-rate and margin impact before changing headline subsidies or default carriers — avoiding a shock to seller economics or checkout abandonment.
Evidence — freight-to-price distribution
How to read this: each bar is the share of order items in that ratio bucket. The shaded "loss zone" is freight ≥80% of item price. The SQL below breaks down freight leakage by category.
-- Freight-to-price leakage by category (Snowflake · OLIST.RAW)
SELECT
    p.product_category_name,
    COUNT(*) AS item_count,
    ROUND(AVG(oi.freight_value / NULLIF(oi.price, 0)), 3) AS avg_freight_ratio,
    ROUND(AVG(oi.freight_value), 2) AS avg_freight_brl,
    ROUND(AVG(oi.price), 2) AS avg_price_brl,
    SUM(CASE WHEN oi.freight_value > oi.price THEN 1 ELSE 0 END) AS items_freight_over_price
FROM ORDER_ITEMS oi
JOIN PRODUCTS p USING (product_id)
JOIN ORDERS o USING (order_id)              -- filter to delivered orders only
WHERE o.order_status = 'delivered'
  AND p.product_category_name IS NOT NULL   -- exclude uncategorised products
GROUP BY p.product_category_name
ORDER BY avg_freight_ratio DESC;
-- Top: eletronicos 68.4%, telefonia 50.6%, utilidades_dom 40.3%
-- Platform: 4,124 items (3.7%) where freight > item price
-- Avg ratio = 32.1% | P90 ratio = 64.3%
I reviewed shipping cost versus item price across 112,650 order items to find where margin leaks. Most shipments are fine, but a small slice is the problem: 6.1% of items have freight costs that are 80% or more of the item price. Electronics (68.4%) and phone accessories (50.6%) drive most of that exposure, exactly where better tiered shipping rules can recover margin.
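The distribution chart and the 6.1% loss-zone share come from bucketing each line's ratio rather than from the category rollup above. A minimal sketch of that bucketing, with 0.10-wide bucket edges assumed for illustration:

-- Freight-to-price ratio distribution (sketch; bucket width assumed at 0.10)
WITH line_ratios AS (
    SELECT oi.freight_value / NULLIF(oi.price, 0) AS freight_ratio
    FROM ORDER_ITEMS oi
    JOIN ORDERS o USING (order_id)
    WHERE o.order_status = 'delivered'
      AND oi.price > 0
)
SELECT
    WIDTH_BUCKET(freight_ratio, 0, 1.2, 12) AS ratio_bucket,  -- bucket 13 catches ratios > 120%
    COUNT(*) AS item_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_of_items,
    MIN(CASE WHEN freight_ratio >= 0.8 THEN 1 ELSE 0 END) AS in_loss_zone
FROM line_ratios
GROUP BY ratio_bucket
ORDER BY ratio_bucket;
-- loss zone = buckets from 0.8 upward; their pct_of_items should sum to ~6.1%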
Pilot category-specific shipping bands (minimum order value, freight caps, or carrier tier) on electronics and telefonia first; measure gross margin per order line before rolling out platform-wide.
Why is customer satisfaction dropping?
Late delivery impact
Using delivered orders only, we classified each order as late if the customer received it after the estimated delivery date, and as on-time otherwise. We then joined order_reviews to see how star ratings distribute in each bucket: same cohort, apples-to-apples.
Lateness does not just lower the average score — it concentrates damage at 1★: nearly half of late orders land there, while on-time orders rarely do. Fixing ETA accuracy and carrier handoff has outsized impact on public review mix, not only on means.
I compared promised delivery dates versus what actually happened, then checked how that affects both ratings and review text. When orders are late, customers rate them much worse: 2.57 stars on average vs 4.29 stars for on-time orders. And 46.2% of late orders get 1-star reviews.
Review text and stars are a lagging signal for logistics. Investing in ETA reliability and proactive comms (delay notices) often beats reactive review moderation for marketplace trust.
Tie carrier incentives to on-time delivery (not only dispatch); add buffer to estimates in high-variance lanes; surface late-risk orders to ops before the customer leaves a 1★ review.
Evidence — review score vs on-time / late
How to read this: each pair of bars is the share of orders earning that star rating, split by late (orange) vs on-time (green). The left spike shows late orders piling into 1★. The SQL below computes the correlation and cohort counts.
-- Delivery delay vs review score (Snowflake · OLIST.RAW)
WITH delays AS (
    SELECT
        DATEDIFF('day', o.order_estimated_delivery_date, o.order_delivered_customer_date) AS delivery_delay_days,
        r.review_score
    FROM ORDERS o
    JOIN ORDER_REVIEWS r USING (order_id)
    WHERE o.order_status = 'delivered'
      AND o.order_delivered_customer_date IS NOT NULL
      AND o.order_estimated_delivery_date IS NOT NULL   -- avoid NULL in DATEDIFF
)
SELECT
    ROUND(CORR(delivery_delay_days, review_score), 3) AS pearson_r,
    ROUND(AVG(CASE WHEN delivery_delay_days > 0 THEN review_score END), 2) AS avg_score_late,
    ROUND(AVG(CASE WHEN delivery_delay_days <= 0 THEN review_score END), 2) AS avg_score_ontime,
    SUM(CASE WHEN delivery_delay_days > 0 THEN 1 ELSE 0 END) AS late_order_count,
    COUNT(*) AS total_orders
FROM delays;
-- pearson_r = -0.266 | avg_score_late = 2.57 | avg_score_ontime = 4.29
-- 7,661 late orders (8.0%) | 46.2% receive 1-star ratings
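The paired-bar chart needs per-star shares within each cohort, not just the averages above. A minimal sketch over the same delivered-order cohort:

-- Star-rating mix, late vs on-time (sketch; same cohort filters as above)
WITH rated AS (
    SELECT
        CASE WHEN o.order_delivered_customer_date > o.order_estimated_delivery_date
             THEN 'late' ELSE 'on_time' END AS delivery_bucket,
        r.review_score
    FROM ORDERS o
    JOIN ORDER_REVIEWS r USING (order_id)
    WHERE o.order_status = 'delivered'
      AND o.order_delivered_customer_date IS NOT NULL
      AND o.order_estimated_delivery_date IS NOT NULL
)
SELECT
    delivery_bucket,
    review_score,
    COUNT(*) AS orders,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY delivery_bucket), 1) AS pct_within_bucket
FROM rated
GROUP BY delivery_bucket, review_score
ORDER BY delivery_bucket, review_score;
-- the late / review_score = 1 row should land near the 46.2% headline figure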
The analysis rolls up into a single executive-ready view: BI dashboards connect seller risk, freight leakage, and delivery/review proof so leadership can prioritise SLA enforcement, logistics pilots, and customer trust in one place.
How do I act on this?
Seller risk tiers, freight leakage signals, and delivery-vs-review proof are aggregated in Power BI as an executive decision layer, with a phased 30 / 60 / 90-day plan to instrument, pilot, and scale.
A static PDF ages quickly; a published dashboard (or embed) keeps KPIs tied to refreshed warehouse data so product and ops iterate on the same numbers — especially as carrier mix and seller base change seasonally.