Mr wrote on Jun 30th, 2014 at 10:43am:
How could they not complete in the massive window of time that they gave themselves?
You really have no clue...
Just to give you an idea...
$TELCO calls at 3 AM: a 4x10 Gigabit card went down on a backbone router... at 1 AM. It needs to be replaced. ( Notice that some customers have already been down for two hours. )
I ask for all the relevant information ( site location, how to get into the site, and so on ) and then call $SUB-CONTRACTOR ( $SUB from here on ) to get an on-site tech to go and replace the card. $SUB calls back 30 minutes later asking me for more details ( site access and location... dummy, I sent them to you when I opened the ticket ), so I repeat the relevant information... 15 minutes later $SUB gives me the name and mobile number of the tech who will change the card. ( Yeah, another 45 minutes lost. )
The site is in the backyard of nowhere in Froggy Land. The tech from $SUB will take one hour to reach his office and grab cards ( notice the plural ), and then he will have to drive for 2 hours... ( So basically, when he reaches the site [ note that this doesn't mean when he replaces the card, just when he parks his car near the site ], the card has been down for 5h45... )
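For anyone who wants to check the 5h45 figure, here is a quick sketch that just adds up the delays quoted above. The step labels and the `elapsed_at_arrival` helper are made up for illustration; only the durations come from the story.

```python
from datetime import timedelta

# Delays quoted in the story, in order, up to the tech parking near the site.
# The labels are illustrative; only the durations come from the post.
delays = [
    ("failure at 1 AM to $TELCO's call at 3 AM", timedelta(hours=2)),
    ("$SUB calls back asking for the site details again", timedelta(minutes=30)),
    ("$SUB gives the tech's name and number", timedelta(minutes=15)),
    ("tech reaches his office and grabs the cards", timedelta(hours=1)),
    ("drive to the site", timedelta(hours=2)),
]


def elapsed_at_arrival(steps):
    """Sum the step durations (hypothetical helper, not a real tool)."""
    return sum((duration for _, duration in steps), timedelta())


total = elapsed_at_arrival(delays)
hours, remainder = divmod(int(total.total_seconds()), 3600)
arrival_label = f"{hours}h{remainder // 60:02d}"
print(arrival_label)
```

Note that this only covers the time until the tech arrives; the card swap, the DOA card and the walk back to the car all come on top of it.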
Once there, site access was good ( he got lucky, it's not uncommon that they can't get inside ), so he replaces the card ( add 15 to 30 more minutes ). The new card is DOA ( Dead on Arrival, about 10% are like that... in some cases it's up to 20% ), so he needs to go back to the car to fetch the other card ( he got lucky that he could bring two cards ) [ 15 more minutes ]... The second card is good...
So in the end, for the people connected to that router ( bad luck, the card was carrying the backbone ports, so lots of people were isolated ), the downtime was... almost 6 hours...
And that's a lucky case...
I can give you one recent example where the time between being called ( 1 hour after the failure ) and the new cards working was more than 12 hours.
So since the Monday Maintenance is usually datacenter stuff, it can mean so many things ( moving servers to another rack cabinet, replacing all the switches, upgrading the SAN, new firmware on stuff... and Windows patches ) that Massive Maintenance Windows are not that massive.
I could also tell you about a system update for a French $TELCO that spanned from Friday 6 PM to Monday 7 AM... and on Sunday at 5 PM we noticed that we had forgotten half the data in the migration... It was a happy night after that ( Happy Coding, Happy Data Extracting and Happy Keep-the-customer-busy-so-that-he-won't-notice-that-we-have-a-Problem ).
Yes, we managed to avoid having to extend the maintenance window, but it had a cost: nobody with any knowledge of that system was at work on Monday, to the despair of the Project Mangler.