Standard User RobertoS
(elder) Mon 29-May-17 00:02:27
Re: BA IT system down. How? Why?
[re: Banger]

:)
I know of a few on the forums, and no doubt there are others.

Heads should, and no doubt will, roll, but they will probably be those of people too junior to have affected policy. In my opinion the first reply was spot on.

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63018/13016Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User Banger
(eat-sleep-adslguide) Mon 29-May-17 00:11:52
Re: BA IT system down. How? Why?
[re: Oliver341]

Sky News were reporting it had been outsourced to India. No detail was given, but one insider had reportedly said the system had been "flakey and unstable" since its inception last year.

Tim
www.uno.net.uk & freenetname
Asus DSL-N55U and TP-Link WD9970 on 80 Meg LLU Fibre
http://www.thinkbroadband.com/speedtest/results.html...

Current Sync: 70634/18326
Standard User Oliver341
(eat-sleep-adslguide) Mon 29-May-17 00:13:47
Re: BA IT system down. How? Why?
[re: Banger]

Whatever they saved on outsourcing IT to India they'll probably pay ten times over in compensation and loss of reputation.

Oliver.


Standard User RobertoS
(elder) Mon 29-May-17 00:23:42
Re: BA IT system down. How? Why?
[re: Oliver341]

Yes, but that is about staffing and where the maintenance and modernisation is done.

That was in mid-2016. But the hardware is the problem. As I said in my OP, fifteen years ago major companies had moved on from the danger of single-point failure and started to cater for it. For the whole global passenger and baggage control systems of BA to simply stop because of any kind of failure at Heathrow is unbelievable.

The technology to prevent it is available, and it is far better and cheaper than the less capable hardware and switching software of those days.

Even worse, it will take years for them to upgrade the infrastructure to stop it happening. It isn't simple, but should have been in place a decade or so ago.

I've just googled "Distributed databases", that being the term used over fifteen years ago when the necessary software was first being developed. I can see from this result that things have progressed far beyond my knowledge of the subject. But it shows the sort of resilient architecture they should have had in place. One of the "Pros" it gives - "Single-site failure does not affect performance of system".

Heathrow would appear not to be "a single site", but the single hub. As I quote in my OP.
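
To make the "single-site failure does not affect the system" idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Site` and `ReplicatedStore` classes, the site names and the flight key are hypothetical and say nothing about how BA's systems are actually built. It only shows the principle that every write goes to more than one site, so reads can carry on from a surviving site when one goes down.

```python
class SiteDown(Exception):
    """Raised when a (simulated) site cannot be reached."""


class Site:
    """One data centre holding a full copy of the data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise SiteDown(self.name)
        self.data[key] = value

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]


class ReplicatedStore:
    """Writes go to every reachable site; reads fall back to the next site."""

    def __init__(self, sites):
        self.sites = list(sites)

    def write(self, key, value):
        written = 0
        for site in self.sites:
            try:
                site.write(key, value)
                written += 1
            except SiteDown:
                continue  # a real system would queue this for a later resync
        if written == 0:
            raise SiteDown("all sites unavailable")
        return written  # number of sites that accepted the write

    def read(self, key):
        for site in self.sites:
            try:
                return site.read(key)
            except SiteDown:
                continue  # try the next site instead of failing outright
        raise SiteDown("all sites unavailable")


# Hypothetical usage: two sites, then the first one fails.
heathrow = Site("heathrow")
second_site = Site("second-site")
store = ReplicatedStore([heathrow, second_site])

store.write("BA123", "boarding")
heathrow.up = False  # simulate the hub going down
status_after_failure = store.read("BA123")  # served by the surviving site
```

The sketch deliberately leaves out the hard parts (resynchronising a site that comes back, conflicting writes, network partitions), which is where real distributed databases earn their keep.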

Standard User micksharpe
(legend) Mon 29-May-17 00:29:12
Re: BA IT system down. How? Why?
[re: RobertoS]

Interesting post on PPRuNe from someone who claims to be involved:

As both SLF (derogatory as that title is) and a highly experienced IT leader (biased towards Infrastructure) and someone who spent nearly 10 hours yesterday in T5, I feel I have something to contribute.

First off, let's not confuse DR with BCP, although both failed yesterday.

For example, while IT, wherever they were, toiled over bringing systems back online, the CW/CE/First queues, which stretched right out of the terminal, were being "organised" by 2 women who were effectively herding cats. They were on a hiding to nothing, as people were joining any one and then losing it when the staff came back round again 20 minutes later telling them to go and join the mega queue at WT. Not enough staff and definitely no sign of Management at all. This got better during the afternoon, but still no sign of any Senior Staff at all. Even this morning they were trying to get us on a flight, as my wife received a text to say it was cancelled but nothing showed on their system. Where were the managers? Nowhere to be seen, as they "were in meetings". Maybe those meetings should have been through the night so everyone could be briefed for 0430.
On a more serious note, we were told by staff they couldn't find any megaphones to replace the non-working PA. I would suggest that these should be easy to find in case of a real emergency.

As for IT, outsourcing is not something I would advocate, but when it has crossed my path, I would never allow a system to go live without:
Rigorous functional testing of system
Rigorous DR Testing
Sign off of all infrastructure designs from someone qualified to do so and counter sign it myself.

The outsourcer should not have unrestricted responsibility for design of something thousands of miles away that isn't theirs. This also makes it easy to swap supplier should they prove to be sub par, which they will.
I guarantee someone within BA has signed that design off as suitable, and that's where heads should roll initially. Then look at your "partner".

Also all the previous posts regarding bean counters are a given as well. Scourge of IT!
On a personal note I'm not actually buying the power excuse but as we don't like to speculate within these halls I'll keep my opinion to myself. I will say however all the systems affected were internet facing.

Anyway, got all that off my chest, and resigned to go back to work on Tuesday instead of enjoying a few cold ones on the Greek coastline!
Standard User Oliver341
(eat-sleep-adslguide) Mon 29-May-17 00:29:53
Re: BA IT system down. How? Why?
[re: RobertoS]

I agree, it's unbelievable. Maybe Tata convinced BA they could cut their IT costs by decommissioning "unnecessary" redundant systems, because clearly their redundant systems, or the management of them, was totally inadequate.

We'll never know what the scope of this cost-cutting exercise was and what corners were cut, because BA will never admit to it.

Oliver.
Standard User ukhardy07
(knowledge is power) Mon 29-May-17 01:14:20
Re: BA IT system down. How? Why?
[re: RobertoS]

In reply to a post by RobertoS:
I assume my OP is too complicated for you.
Not so, just interested given who some of my day-to-day clients are. Bearing in mind I work for an IT security firm and I do internal / external penetration testing, I also work with an IT Security Audit team who often audit the same clients I do technical testing for. All I'm saying is that when it comes to some of these larger organisations I certainly can attest to their network topology / OS / security strategy / legacy IT etc.

What you claim to have experienced in your day and what I see today seem worlds apart. When you take a large organisation, often the risk rankings of systems are not fully understood, & nor is the risk appetite of the organisation. It is very common to accept a risk, formally record it, & hand the management of IT to a third party who maintains this "risk register." It is also very common for agreements with third parties to have no clear remediation timelines for vulnerabilities etc., and hence OSes sit unpatched for long periods of time. I'm not saying it is ideal, but we commonly see IT being outsourced and the vendor governance being poor. Often upper management cannot even attest to the line between their own IT's responsibilities and the third party's, & it is a cat and mouse game of who is responsible for basic things.

When you account for all of this mismanagement & political madness, the last thing on anyone's mind is getting every system prioritised and the critical ones fully backed up / replicated across sites etc. Even if they try, they'll figure out a tonne of legacy infrastructure exists which needs upgrading to proceed with the project, that'll take a year to figure out, then the business will change their requirements mid-project and the cycle continues of nothing getting done for years on end.

If it wasn't like this in your days, it certainly is now.

Edited by ukhardy07 (Mon 29-May-17 01:22:10)

Standard User RobertoS
(elder) Mon 29-May-17 01:28:07
Re: BA IT system down. How? Why?
[re: micksharpe]

Thanks for that link Mick. A lot of good stuff that I've been reading for the last 10-15 minutes. Very much along the lines of my opening post here. I'll possibly read more later today. Sleep calls now :).

Standard User RobertoS
(elder) Mon 29-May-17 01:29:32
Re: BA IT system down. How? Why?
[re: Oliver341]

In reply to a post by Oliver341:
I agree, it's unbelievable. Maybe Tata convinced BA they could cut their IT costs by decommissioning "unnecessary" redundant systems, because clearly their redundant systems, or the management of them, was totally inadequate.

We'll never know what the scope of this cost-cutting exercise was and what corners were cut, because BA will never admit to it.
You make a couple of good points there Oliver.

Standard User RobertoS
(elder) Mon 29-May-17 02:15:12
Re: BA IT system down. How? Why?
[re: ukhardy07]

All very true, but also just confirms how unacceptable this BA fiasco is.

The principles were the same in my thirty-odd years, but the discipline wrt software was that anybody/any team can write a system to do the required task and demonstrate it working with test data. Only the good and well-trained treated that as 10% of the work needed before sending it live.

The other 90% was programming to ensure that all foreseeable errors or failures, and any unexpected events, were handled gracefully. That was written into the system and programming specs, and coded in as each routine was written.

Similarly at least two thirds of the testing was of these.
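
As a rough illustration of that 10%/90% split, here is a small Python sketch. The function name `parse_bag_weight`, the `MAX_BAG_KG` limit and the error messages are all hypothetical, invented for this example and not taken from any real airline system. The "happy path" is the single `float()` conversion; everything else in the routine exists to turn bad input into a clear error instead of a crash.

```python
import math

MAX_BAG_KG = 32  # hypothetical limit, assumed purely for illustration


def parse_bag_weight(raw):
    """Return (weight_kg, error); exactly one of the two is None."""
    if raw is None:
        return None, "missing weight"
    text = str(raw).strip()
    if not text:
        return None, "empty weight"
    try:
        weight = float(text)  # the "10%": the conversion that does the job
    except ValueError:
        return None, f"not a number: {text!r}"
    if not math.isfinite(weight):
        return None, "weight is not finite"  # catches "nan" and "inf"
    if weight <= 0:
        return None, "weight must be positive"
    if weight > MAX_BAG_KG:
        return None, f"weight exceeds {MAX_BAG_KG} kg limit"
    return weight, None


# One good input, several bad ones: most of the checking is of failures.
samples = ["23.5", None, "", "abc", "nan", "-3", "40"]
results = [parse_bag_weight(s) for s in samples]
```

Note that only one of the seven samples exercises the working case, which matches the point about where most of the testing effort went.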

Senior company management were cutting out the budget for that vital 90% by the end of my first ten years. It has been one of the major cost savings that were so easily made.

The security aspects that you deal with of course didn't exist in the early days, so we didn't need to think about them. Those came later, first with mainframe dumb terminals and later PCs with floppy disk drives. They added increasing complexity, as we all know, now that we have rampant hacking, phishing and the rest. Keeping you in employment :).

This BA failure though is, we are told, down to what is apparently the one and only central datacentre going down. Whether it was a power failure, a piece of kit causing it, or we are being lied to and it was a software bug or some other cause is not really the point. There had to be redundant mirror hardware or similar at one or more other sites, and regular systematic testing of hardware system, sub-system and comms failures.

As per my link to distributed databases.

BA has (or had until the LSE opens on Tuesday!) a market capitalisation of £12.9bn and €22.5bn revenues in 2016. (Sorry for mixing currencies, in bed holding an iPad up in one hand and typing with one finger.)

It can fund resilient systems.
