We're rolling out Procedures gradually to protect quality, which means we need a way to measure the impact of a Procedure on resolution success before expanding its scope. Today the only way to control which conversations a Procedure runs against is to gate it on a customer attribute (e.g. user segment, plan type). That introduces cohort bias: the two groups experience the product differently to begin with, so we can't isolate the effect of the Procedure itself. The comparison isn't apples-to-apples.
Request:
A native A/B test mode on individual Procedures, where:
- Fin randomly assigns eligible conversations to treatment (Procedure runs) or control (Procedure doesn't run) at a configurable split (see the assignment sketch after this list).
- Eligibility is defined by the Procedure's existing trigger conditions, so randomisation happens *inside* the qualifying topic space rather than across unrelated cohorts.
- Reporting shows a 1:1 comparison of outcomes between the two arms: resolution rate, CSAT, and handover rate (see the comparison sketch below).
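A minimal sketch of the assignment step, assuming a deterministic hash-based bucketing scheme; all names here are hypothetical, not Fin's actual API. Hashing the conversation ID with a per-experiment salt gives a stable, uniform split at any configured share, and it would only run on conversations that already matched the Procedure's trigger conditions, so randomisation stays inside the qualifying topic space:

```python
import hashlib

def assign_arm(conversation_id: str, experiment_salt: str,
               treatment_share: float = 0.5) -> str:
    """Deterministically bucket an eligible conversation into an arm."""
    # Hash the ID with a per-experiment salt: stable, uniform, stateless.
    digest = hashlib.sha256(f"{experiment_salt}:{conversation_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    # Conversations below the configured share run the Procedure (treatment);
    # the rest skip it (control). The same ID always lands in the same arm.
    return "treatment" if bucket < treatment_share else "control"

# Example: a 30/70 treatment/control split for one Procedure's experiment.
print(assign_arm("conv_84321", experiment_salt="proc_refunds_v1", treatment_share=0.3))
```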
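And a sketch of the arm comparison the reporting could surface, using a standard two-proportion z-test on resolution rate so the report can distinguish real lift from noise. The counts below are made up purely for illustration:

```python
from math import sqrt
from statistics import NormalDist

def compare_resolution_rates(resolved_t: int, n_t: int,
                             resolved_c: int, n_c: int) -> tuple[float, float]:
    """Two-proportion z-test on resolution rate between the two arms."""
    p_t, p_c = resolved_t / n_t, resolved_c / n_c
    pooled = (resolved_t + resolved_c) / (n_t + n_c)        # pooled rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))  # standard error
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))            # two-sided
    return p_t - p_c, p_value

# Made-up counts: treatment resolves 620 of 1,000, control 570 of 1,000.
lift, p = compare_resolution_rates(620, 1000, 570, 1000)
print(f"lift: {lift:+.1%}, p-value: {p:.3f}")  # ~ +5.0%, p ~ 0.023
```

The same comparison would apply per metric (CSAT, handover rate), with each arm's denominator being the conversations it was assigned.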