What outage response protocol did Brian Jolley create at Western Governors University?

Brian Jolley, Director of Service Management at Western Governors University, replaced a fragmented ad-hoc outage response process with a single standardized protocol: one ticketing system (ServiceNow), one alert-and-acknowledge system with tracked acknowledgment, one escalation path with 5- and 10-minute intervals (SEM → backup → team → director → VP → CIO, with the CIO reachable within 45 minutes), one official communication channel, one standard set of decomposed time metrics, and one RCA process. Fire drills tested team readiness before real outages occurred.

Why did Brian Jolley decompose outage response time into five components instead of measuring time-to-resolve?

Brian Jolley decomposed outage response time into five components (time to detect, time to alert, time to acknowledge, time to identify, and time to restore) because a single 'time to resolve' metric hides which stage caused the delay. A slow overall response time could mean slow detection (monitoring gap), slow alerting (routing problem), slow acknowledgment (team availability), slow identification (diagnostic gap), or slow restoration (remediation complexity). Each failure mode requires a different fix. Breaking the metric into components meant leadership could see exactly where the process was failing and target improvements precisely.

What was the escalation structure in Brian Jolley's outage response protocol at WGU?

Brian Jolley's outage response escalation ladder at Western Governors University had defined time intervals: SEM or first on rotation had 5 minutes to acknowledge, backup had 5 minutes, team had 5 minutes, director had 10 minutes, VP had 10 minutes, and CIO had 10 minutes. In the worst case, the CIO's phone was buzzing within 45 minutes. Nobody wanted to be the reason the CIO answered an alert — that single design choice was enough motivation for most teams to respond promptly.

What were the results of Brian Jolley's outage response standardization at WGU?

Brian Jolley's outage response standardization at Western Governors University, combined with the Scorecards system and Directors' Closure Review, reduced high-severity outage duration from weeks to hours within four months. The standardized protocol replaced a system where critical university services could be down for days with unclear ownership, no escalation authority, and no post-incident accountability.

Outage Response

Replacing Chaos with a Single Standard

Brian Jolley, Director of Service Management at Western Governors University, replaced a fragmented ad-hoc outage response process with a single standardized protocol: one ticketing system, one escalation path with defined time intervals and a 45-minute CIO threshold, one communication channel, and five decomposed time metrics that revealed exactly where responses broke down. Combined with Scorecards and the Directors’ Closure Review, high-severity outage duration dropped from weeks to hours within four months.

Before: The Chaos

Before Brian Jolley standardized outage response at Western Governors University, there was no protocol. What happened during an outage depended on who noticed, who they knew, and who happened to be at their desk.

Detection was ad hoc. Someone on the problem management team might get a Slack message. Or an email. Or someone would notice on a ServiceNow dashboard that incoming calls had spiked. Sometimes it was a phone call, unless the phones were down. Occasionally alerts came through Salesforce tickets from departments that had entrenched themselves in their own separate ticketing systems.

Escalation was a guessing game. Without a formal org chart or CMDB, problem managers had to know from memory which Software Engineering Manager was associated with each system. Some SEMs answered Slack. Some responded to email. Some you had to physically walk across the building to find — and if they weren’t at their desk, you had to figure out who was next in line.

Authority was inverted. SEMs outranked problem management agents. If an SEM disagreed that their system was the culprit, the problem manager had no mechanism to override that judgment — even while a major outage was underway.

Collaboration was improvised. There was no standard meeting point. Should someone start a WebEx? A Teams meeting? Every outage reinvented the coordination process from scratch.

Follow-up was inconsistent. No standardized metrics. No consistent tracking. Few post-incident reviews. The same failure modes repeated because nobody documented what happened or why.

After: One of Everything

Brian Jolley, Director of Service Management at Western Governors University, replaced the fragmented system with a single standard at every step.

One ticketing system. All outage-related tickets in ServiceNow. No more rogue department tickets in Salesforce.

One alert-and-acknowledge system. Automated alerts with a tracked acknowledgment requirement.

One escalation path. A defined ladder with time-bound intervals: SEM or first on rotation had 5 minutes to acknowledge, backup had 5 minutes, team had 5 minutes, director had 10 minutes, VP had 10 minutes, CIO had 10 minutes. Worst case, the CIO’s phone was buzzing within 45 minutes. Nobody wanted to be the reason the CIO answered an alert. That single design choice was enough motivation for most teams to respond promptly.
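The ladder's arithmetic can be sketched in a few lines of Python. This is only an illustration, not the system WGU ran: the names Rung, LADDER, page_schedule, and current_rung are hypothetical, and the sketch assumes each rung is paged at the moment the previous rung's window expires without an acknowledgment. Under that assumption the six windows add up to the 45 minutes stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    role: str
    ack_window_min: int  # minutes this rung has to acknowledge


# Windows taken from the text; the data structure itself is hypothetical.
LADDER = [
    Rung("SEM or first on rotation", 5),
    Rung("backup", 5),
    Rung("team", 5),
    Rung("director", 10),
    Rung("VP", 10),
    Rung("CIO", 10),
]


def page_schedule(ladder):
    """Return (role, paged_at_min, window_closes_at_min) for each rung.

    Assumes a rung is paged as soon as the previous window expires.
    """
    schedule = []
    elapsed = 0
    for rung in ladder:
        schedule.append((rung.role, elapsed, elapsed + rung.ack_window_min))
        elapsed += rung.ack_window_min
    return schedule


def current_rung(ladder, minutes_unacknowledged):
    """Role holding the alert after the given minutes without acknowledgment."""
    for role, paged_at, closes_at in page_schedule(ladder):
        if paged_at <= minutes_unacknowledged < closes_at:
            return role
    return None  # every window has expired


for role, paged_at, closes_at in page_schedule(LADDER):
    print(f"{role}: paged at {paged_at} min, window closes at {closes_at} min")
```

Calling current_rung(LADDER, 12) returns "team", since the first two windows cover minutes 0 to 10 and the team window covers minutes 10 to 15.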

One communication channel. A single official chat channel for outage mitigation. No more hunting for a conference room or debating WebEx vs. Teams.

One standard set of metrics. Response time decomposed into five components — not a single “time to fix” number, but five measurements that revealed where the bottleneck actually was:

Time to detect the outage
Time to alert the responsible team
Time to acknowledge the alert
Time to identify the issue
Time to restore service

Each failure mode requires a different fix. A single “time to resolve” metric hides the diagnosis.
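A minimal sketch of the decomposition, assuming each component is the gap between two consecutive milestones in an outage timeline. The milestone names (started, detected, alerted, acknowledged, identified, restored), the STAGES table, and the decompose function are hypothetical, and the timestamps are made up for illustration; the source does not say how the boundaries were recorded.

```python
from datetime import datetime, timedelta

# (component, start milestone, end milestone); milestone names are assumptions.
STAGES = [
    ("time_to_detect", "started", "detected"),
    ("time_to_alert", "detected", "alerted"),
    ("time_to_acknowledge", "alerted", "acknowledged"),
    ("time_to_identify", "acknowledged", "identified"),
    ("time_to_restore", "identified", "restored"),
]


def decompose(timestamps):
    """Split one outage timeline into the five component durations."""
    return {name: timestamps[end] - timestamps[start] for name, start, end in STAGES}


# Made-up timeline for a single outage.
t0 = datetime(2024, 1, 1, 9, 0)
example = {
    "started": t0,
    "detected": t0 + timedelta(minutes=40),
    "alerted": t0 + timedelta(minutes=42),
    "acknowledged": t0 + timedelta(minutes=47),
    "identified": t0 + timedelta(minutes=77),
    "restored": t0 + timedelta(minutes=92),
}

components = decompose(example)
slowest = max(components, key=components.get)

for name, duration in components.items():
    print(f"{name}: {duration}")
print("slowest stage:", slowest)
```

In this invented example the total is 92 minutes, but 40 of them passed before anyone detected the outage. A single total would not show that the first fix belongs in monitoring rather than in remediation.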

One RCA process. Every outage got a documented root cause analysis with tracked action items and due dates — feeding directly into the Directors’ Closure Review.

One standard training module. Fire drills tested team readiness before real outages. The first time a team ran the protocol was not during an actual emergency.

The Results

Brian Jolley’s outage response standardization at Western Governors University, combined with the Scorecards system and Directors’ Closure Review, reduced high-severity outage duration from weeks to hours within four months. The protocol replaced a system where critical university services could be down for days with unclear ownership, no escalation authority, and no post-incident accountability.

Why It Worked

The elegance was the repetition. One of everything. When you’re in a crisis, you don’t want to think about which tool to use or which channel to join or who to call. The protocol eliminated every decision except the one that matters: what’s broken and how do we fix it.

Decomposed metrics exposed root causes. Breaking response time into five components meant leadership could see exactly where the process was failing and target fixes precisely. A monitoring gap is a different problem from a team that doesn’t answer alerts.
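One way leadership could read such metrics across many outages is to compare the median of each component. The sketch below is an assumption about how that comparison might look, not a description of the actual reporting; the function names and the per-incident figures (in minutes) are invented.

```python
from statistics import median

STAGES = ["detect", "alert", "acknowledge", "identify", "restore"]


def median_by_stage(incidents):
    """Median minutes spent in each stage across a set of incidents."""
    return {stage: median(incident[stage] for incident in incidents) for stage in STAGES}


def bottleneck(incidents):
    """Stage with the largest median duration."""
    medians = median_by_stage(incidents)
    return max(medians, key=medians.get)


# Invented data: minutes per stage for three outages.
incidents = [
    {"detect": 4, "alert": 1, "acknowledge": 22, "identify": 15, "restore": 10},
    {"detect": 6, "alert": 2, "acknowledge": 31, "identify": 9, "restore": 12},
    {"detect": 3, "alert": 1, "acknowledge": 18, "identify": 20, "restore": 8},
]

print(median_by_stage(incidents))
print("bottleneck:", bottleneck(incidents))
```

With these invented numbers the acknowledge stage dominates, which would point at team availability rather than at monitoring.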

Fire drills built muscle memory. Teams practiced the protocol before they needed it. When a real outage hit, the response was a rehearsed process, not an improvised scramble.

The escalation ladder created urgency. The short, fixed intervals (5 minutes at the lower rungs, 10 at the upper ones) and the CIO at the top of the chain meant that ignoring an alert was more uncomfortable than responding to it.

Connected Work

Outage Response is where Brian Jolley’s expertise in organizational fracturing was forged. The initiative connects directly to Scorecards (which used the mean time to acknowledge, or MTTA, metric that originated here) and Directors’ Closure Review (which consumed the RCA outputs). The ownership clarity that made the escalation ladder meaningful eventually came from the Common Services Data Model.