That's a good question.
I think the basic answer is that no single LNS stands out as "the LNS that has had the most lockups".
These numbers are made up/approximate, but I'm using them to try to illustrate the situation:
Let's say we have 10 live LNS running, and about every 30 days one of them hangs. That means there is a hang approximately every 300 "LNS days", "LNS days" being a measure a bit like "man hours". The blips are still a fairly rare event (though obviously not rare enough), which means waiting a considerable amount of time to know for sure whether a fix has worked. The fastest way to make progress on a fix is to deploy it as widely as possible, so that we accumulate "LNS days" in the fewest "day days".
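To put rough numbers on that, here is a minimal sketch of the arithmetic in Python. It uses the same made-up figures (10 LNS, one hang about every 30 days). The function names, the target of 900 "LNS days", and the assumption that hangs arrive at random at a constant rate (a Poisson model) are mine, added purely for illustration; they are not a description of how we actually measure this.

```python
import math


def lns_days_per_hang(num_lns: int, days_between_hangs: float) -> float:
    """Mean number of "LNS days" between hangs across the whole fleet."""
    return num_lns * days_between_hangs


def calendar_days_needed(target_lns_days: float, lns_running_fix: int) -> float:
    """Calendar days ("day days") to accumulate the target number of LNS days."""
    return target_lns_days / lns_running_fix


def prob_no_hang_by_chance(observed_lns_days: float, mean_lns_days_per_hang: float) -> float:
    """Chance of seeing zero hangs even if the fix did nothing.

    Assumes hangs are independent and occur at a constant rate (Poisson),
    which is an illustrative assumption rather than a measured fact.
    """
    return math.exp(-observed_lns_days / mean_lns_days_per_hang)


if __name__ == "__main__":
    mean_interval = lns_days_per_hang(num_lns=10, days_between_hangs=30)
    target = 3 * mean_interval  # hypothetical target: three times the mean interval

    print("LNS days per hang:", mean_interval)
    print("Day days with the fix on 2 LNS:", calendar_days_needed(target, 2))
    print("Day days with the fix on all 10 LNS:", calendar_days_needed(target, 10))
    print("Chance of a clean run by luck:", round(prob_no_hang_by_chance(target, mean_interval), 3))
```

With these made-up figures, collecting 900 "LNS days" takes 450 calendar days if the fix runs on two LNS, but only 90 if it runs on all ten. Even then, under the assumed model, there is still roughly a 5% chance of seeing no hangs purely by luck.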
This does go back to my point about being a hardware developer as well as an ISP, and how these two differing objectives can occasionally collide head on.
Your point is well received, though, that every kind of outage is bad, and that of course we do have to shuffle customers around (which can be seen as a "drop") to do the upgrades themselves.
Just to clarify though: when we were testing the new software, it obviously ran on our test rig at our offices first, and was then loaded onto the live LNS one at a time. The plan was to update them all one by one over a week or two. A hang occurred after we had done, I think, only two live LNS, so the plan was halted, and then the rollback decision was made.
Alex.
---
Bloor
GM, A&A.