Before you scroll
Key terms in this case study
E-commerce and data tooling use shorthand that business readers may not share. These definitions match how the analysis uses each idea.
- Olist & marketplace
- Olist is a Brazilian e-commerce dataset of orders, sellers, products, and reviews — used here as a stand-in for a multi-seller marketplace. A seller is a merchant fulfilling orders; the platform tracks their dispatch timing and customer feedback.
- Dispatch & carrier handoff
- Dispatch delay is days from order approval to when the package reaches the carrier. Late handoff means the seller missed an internal window (e.g. >3 days to carrier). That is upstream of last-mile delivery — it is a seller-operations metric.
- SLA
- A formal or informal commitment on how fast sellers must ship or how fast orders must arrive. Here, SLAs are used as operating rules (e.g. two-day dispatch for high-risk sellers), not legal text.
- Freight-to-price ratio
- Shipping cost ÷ item price for each order line. When this ratio is high, shipping eats margin — the “loss zone” flags lines where freight is ≥80% of the item price.
- Snowflake & Power BI
- Snowflake is a cloud data warehouse where SQL runs on large tables. Power BI is the reporting layer: dashboards and KPIs for executives. The case study flows warehouse SQL → metrics → Power BI narrative.
- Seller risk tiers (High / Standard / Premium)
- Clusters derived from dispatch and late-handoff behaviour (see Scikit-Learn in the hero). High Risk sellers systematically miss timing; Premium meet expectations. Tiers drive differentiated actions — not one-size-fits-all penalties.
A 60-hour end-to-end analysis on Olist marketplace tables: 99,441 orders and 3,095 sellers. The work connects warehouse SQL to a decision story: where sellers break delivery promises, where freight erodes margin, and how lateness hits review scores — then packages that in Power BI for leadership.
Operational leakage and customer dissatisfaction show up in structured data (timestamps, prices, ratings) before they show up in board decks. The executive summary KPIs below are the “so what” before the deep dives: risk concentration, satisfaction gap, margin exposure, and service recovery opportunity.
Evidence — headline KPIs
Executive summary
I built a 60-hour, end-to-end analysis on the Olist marketplace tables (99,441 orders, 3,095 sellers) to reduce operational leakage, protect profit margins, and improve delivery outcomes, then translated SQL and statistics into a decision-ready Power BI narrative combining dispatch KPIs, seller risk tiers, freight economics ratios, and delivery-delay intelligence.
- 21.4% Sellers Are High Risk: 287 of 1,343 consistently break the delivery promise
- 46.2% 1-Star Rate When Late: vs 6.6% for on-time, a 7× satisfaction gap
- 6.1% Items in Freight Loss Zone: shipping cost ≥ 80% of item price, margin at risk
- +1.72★ Score Gap (Late vs On-Time): 2.57 → 4.29 avg review score, fixable with a dispatch SLA
Which sellers are breaking the delivery promise?
Seller risk tiers
Sellers with at least 10 delivered orders were scored on dispatch delay and late handoff to the carrier, then grouped into High Risk, Standard, and Premium tiers so operations can prioritise enforcement and incentives per tier rather than rely on a single global average.
Roughly one in five qualified sellers sits in the High Risk bucket. That concentration means marketplace policy (SLAs, fees, visibility) can target a known subset instead of punishing the whole seller base.
Out of 1,343 qualified sellers, 287 (21.4%) fall into a High Risk pattern: they consistently dispatch too slowly or miss carrier handoff windows. This group takes on average 3× longer to ship than Premium sellers and is responsible for a disproportionate share of late deliveries and low review scores. The remaining sellers are either performing acceptably (Standard, 56.1%) or exceeding expectations (Premium, 22.6%).
Evidence — tier mix (qualified sellers)
How to read this: the bar chart shows the percentage of sellers in each tier. The SQL below shows the Snowflake logic that builds the dispatch metrics per seller.
-- Seller dispatch risk scorecard (Snowflake · OLIST.RAW)
WITH order_dispatch AS (
    SELECT DISTINCT                -- one row per (seller, order), prevents
        oi.seller_id,              -- multi-item orders double-counting delay
        o.order_id,
        DATEDIFF('day', o.order_approved_at, o.order_delivered_carrier_date) AS dispatch_delay_days,
        CASE WHEN o.order_delivered_carrier_date   -- late if seller took >3 days
                  > DATEADD('day', 3, o.order_approved_at)
             THEN 1 ELSE 0 END AS late_handoff
    FROM ORDERS o
    JOIN ORDER_ITEMS oi USING (order_id)
    WHERE o.order_status = 'delivered'
      AND o.order_approved_at IS NOT NULL
      AND o.order_delivered_carrier_date IS NOT NULL
)
SELECT
    seller_id,
    COUNT(*) AS orders_count,
    ROUND(AVG(dispatch_delay_days), 2) AS avg_dispatch_days,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY dispatch_delay_days) AS p50_delay,
    PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY dispatch_delay_days) AS p90_delay,
    ROUND(AVG(late_handoff), 3) AS late_handoff_rate
FROM order_dispatch
GROUP BY seller_id
HAVING COUNT(*) >= 10              -- use expression, not alias
ORDER BY p90_delay DESC;
-- 1,343 sellers qualified | platform p90 = 6.0 days
-- Top risk seller: p90 = 53.2 days, avg_score = 2.5
Seller risk scoring needs both dispatch metrics and customer voice; NLP on reviews adds early warning that SLA-only scores miss. Tiering turns that into actionable governance, with different rules per segment (a tier-assignment sketch follows the list below):
- High Risk: enforce a 2-day dispatch SLA; escalate or suspend repeaters.
- Standard: flag if the late rate exceeds 20%.
- Premium: protect and incentivise.
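The production tiers come from Scikit-Learn clustering on the scorecard above, not fixed cutoffs. As a minimal sketch of how tier assignment could live in the warehouse, the threshold values and the materialised scorecard name below are assumptions for illustration, not the fitted model:

-- Tier-assignment sketch (illustrative thresholds, not the fitted clusters)
WITH scorecard AS (
    SELECT seller_id, avg_dispatch_days, p90_delay, late_handoff_rate
    FROM SELLER_DISPATCH_SCORECARD   -- assumed: the scorecard query above, materialised
)
SELECT
    seller_id,
    CASE
        WHEN late_handoff_rate >= 0.40 OR p90_delay >= 10 THEN 'High Risk'
        WHEN late_handoff_rate <= 0.10 AND p90_delay <= 4  THEN 'Premium'
        ELSE 'Standard'
    END AS risk_tier
FROM scorecard;

Whatever the exact boundaries, the point is that a tier becomes a joinable seller attribute, so SLA enforcement and incentives can be driven from the same warehouse tables the scorecard uses.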
What is draining margin?
Cost cutting
For every delivered order line, we computed freight value ÷ item price (the freight-to-price ratio). That shows where shipping consumes margin regardless of category marketing; the histogram buckets lines from cheap-to-ship to "freight dominates price."
Most lines are healthy; damage sits in the long tail — a small share of items with extreme ratios. Electronics and phone accessories concentrate exposure, so shipping rules can be piloted by category instead of a blunt platform-wide fee change.
Freight policy is a conversion and margin lever at once: narrow pilots let you measure take-rate and margin impact before changing headline subsidies or default carriers — avoiding a shock to seller economics or checkout abandonment.
Evidence — freight-to-price distribution
How to read this: each bar is the share of order items in that ratio bucket. The shaded "loss zone" is freight ≥80% of item price. The SQL below breaks down freight leakage by category.
-- Freight-to-price leakage by category (Snowflake · OLIST.RAW)
SELECT
    p.product_category_name,
    COUNT(*) AS item_count,
    ROUND(AVG(oi.freight_value / NULLIF(oi.price, 0)), 3) AS avg_freight_ratio,
    ROUND(AVG(oi.freight_value), 2) AS avg_freight_brl,
    ROUND(AVG(oi.price), 2) AS avg_price_brl,
    SUM(CASE WHEN oi.freight_value > oi.price THEN 1 ELSE 0 END) AS items_freight_over_price
FROM ORDER_ITEMS oi
JOIN PRODUCTS p USING (product_id)
JOIN ORDERS o USING (order_id)              -- filter to delivered orders only
WHERE o.order_status = 'delivered'
  AND p.product_category_name IS NOT NULL   -- exclude uncategorised products
GROUP BY p.product_category_name
ORDER BY avg_freight_ratio DESC;
-- Top: eletronicos 68.4%, telefonia 50.6%, utilidades_dom 40.3%
-- Platform: 4,124 items (3.7%) where freight > item price
-- Avg ratio = 32.1% | P90 ratio = 64.3%
I reviewed shipping cost versus item price across 112,650 order items to find where margin leaks. Most shipments are fine, but a small slice is the problem: 6.1% of items have freight costs that are 80% or more of the item price. Electronics (68.4%) and phone accessories (50.6%) drive most of that exposure, exactly where better tiered shipping rules can recover margin.
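The distribution chart and the 6.1% loss-zone share come from bucketing each line's ratio rather than from the category rollup above. A minimal sketch of that bucketing, with 0.10-wide bucket edges assumed for illustration:

-- Freight-to-price ratio distribution (sketch; bucket width assumed at 0.10)
WITH line_ratios AS (
    SELECT oi.freight_value / NULLIF(oi.price, 0) AS freight_ratio
    FROM ORDER_ITEMS oi
    JOIN ORDERS o USING (order_id)
    WHERE o.order_status = 'delivered'
      AND oi.price > 0
)
SELECT
    WIDTH_BUCKET(freight_ratio, 0, 1.2, 12) AS ratio_bucket,  -- bucket 13 catches ratios > 120%
    COUNT(*) AS item_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_of_items,
    MIN(CASE WHEN freight_ratio >= 0.8 THEN 1 ELSE 0 END) AS in_loss_zone
FROM line_ratios
GROUP BY ratio_bucket
ORDER BY ratio_bucket;
-- loss zone = buckets from 0.8 upward; their pct_of_items should sum to ~6.1%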
Pilot category-specific shipping bands (minimum order value, freight caps, or carrier tier) on electronics and telefonia first; measure gross margin per order line before rolling out platform-wide.
Why is customer satisfaction dropping?
Late delivery impact
Using delivered orders only, we classified each order as late if the customer received it after the estimated delivery date, and as on-time otherwise. We then joined order_reviews to see how star ratings distribute in each bucket: same cohort, apples-to-apples.
Lateness does not just lower the average score — it concentrates damage at 1★: nearly half of late orders land there, while on-time orders rarely do. Fixing ETA accuracy and carrier handoff has outsized impact on public review mix, not only on means.
I compared promised delivery dates versus what actually happened, then checked how that affects both ratings and review text. When orders are late, customers rate them much worse: 2.57 stars on average vs 4.29 stars for on-time orders. And 46.2% of late orders get 1-star reviews.
Review text and stars are a lagging signal for logistics. Investing in ETA reliability and proactive comms (delay notices) often beats reactive review moderation for marketplace trust.
Tie carrier incentives to on-time delivery (not only dispatch); add buffer to estimates in high-variance lanes; surface late-risk orders to ops before the customer leaves a 1★ review.
Evidence — review score vs on-time / late
How to read this: each pair of bars is the share of orders earning that star rating, split by late (orange) vs on-time (green). The left spike shows late orders piling into 1★. The SQL below computes the correlation and cohort counts.
-- Delivery delay vs review score (Snowflake · OLIST.RAW)
WITH delays AS (
    SELECT
        DATEDIFF('day', o.order_estimated_delivery_date, o.order_delivered_customer_date) AS delivery_delay_days,
        r.review_score
    FROM ORDERS o
    JOIN ORDER_REVIEWS r USING (order_id)
    WHERE o.order_status = 'delivered'
      AND o.order_delivered_customer_date IS NOT NULL
      AND o.order_estimated_delivery_date IS NOT NULL   -- avoid NULL in DATEDIFF
)
SELECT
    ROUND(CORR(delivery_delay_days, review_score), 3) AS pearson_r,
    ROUND(AVG(CASE WHEN delivery_delay_days > 0 THEN review_score END), 2) AS avg_score_late,
    ROUND(AVG(CASE WHEN delivery_delay_days <= 0 THEN review_score END), 2) AS avg_score_ontime,
    SUM(CASE WHEN delivery_delay_days > 0 THEN 1 ELSE 0 END) AS late_order_count,
    COUNT(*) AS total_orders
FROM delays;
-- pearson_r = -0.266 | avg_score_late = 2.57 | avg_score_ontime = 4.29
-- 7,661 late orders (8.0%) | 46.2% receive 1-star ratings
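The paired-bar chart needs per-star shares within each cohort, not just the averages above. A minimal sketch over the same delivered-order cohort:

-- Star-rating mix, late vs on-time (sketch; same cohort filters as above)
WITH rated AS (
    SELECT
        CASE WHEN o.order_delivered_customer_date > o.order_estimated_delivery_date
             THEN 'late' ELSE 'on_time' END AS delivery_bucket,
        r.review_score
    FROM ORDERS o
    JOIN ORDER_REVIEWS r USING (order_id)
    WHERE o.order_status = 'delivered'
      AND o.order_delivered_customer_date IS NOT NULL
      AND o.order_estimated_delivery_date IS NOT NULL
)
SELECT
    delivery_bucket,
    review_score,
    COUNT(*) AS orders,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY delivery_bucket), 1) AS pct_within_bucket
FROM rated
GROUP BY delivery_bucket, review_score
ORDER BY delivery_bucket, review_score;
-- the late / review_score = 1 row should land near the 46.2% headline figure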
The analysis rolls up into a single executive-ready view: BI dashboards connect seller risk, freight leakage, and delivery/review proof so leadership can prioritise SLA enforcement, logistics pilots, and customer trust in one place.
How do I act on this?
Seller risk tiers, freight leakage signals, and delivery-vs-review proof are aggregated in Power BI as an executive decision layer, with a phased 30 / 60 / 90-day plan to instrument, pilot, and scale.
A static PDF ages quickly; a published dashboard (or embed) keeps KPIs tied to refreshed warehouse data so product and ops iterate on the same numbers — especially as carrier mix and seller base change seasonally.