Standard User RobertoS
(elder) Sat 27-May-17 22:34:25
BA IT system down. How? Why?
 
It's fifteen years since I was seriously working in IT, but even then such a complete breakdown would have been almost inconceivable in a major network. Since then comms facilities, mass storage and processor power have all risen exponentially, with the size of the equipment falling dramatically.

BA are saying a major power failure has caused all passenger check-ins and baggage handling systems globally to crash. One!!?? "BA said a problem within the hub of their system, based near Heathrow, had led to a power outage".

Surely something that could affect BA worldwide should have at least two if not more mirrored systems/hubs geographically far apart? With dedicated links with minimal latency. It was possible in my day, so what and why and how can this have happened?

Is there anyone here dealing with large mission-critical systems able to suggest how BA have messed up so badly other than through penny-pinching and complete incompetence of their IT management?

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63018/13016Kbps @ 600m. BQMs - IPv4 & IPv6

Edited by RobertoS (Sat 27-May-17 22:35:25)

Standard User deleted
(deleted) Sun 28-May-17 07:09:46
Re: BA IT system down. How? Why?
[re: RobertoS]
 
In reply to a post by RobertoS:
Is there anyone here dealing with large mission-critical systems able to suggest how BA have messed up so badly other than through penny-pinching and complete incompetence of their IT management?

I think you pretty much nailed it there :)
Standard User deleted
(deleted) Sun 28-May-17 09:02:17
Re: BA IT system down. How? Why?
[re: RobertoS]
 
Don't blame the IT team....

I'm sure they just forgot their own advice and missed the 2nd bit of turning it back on :D

Or, on a more serious note: IT ask for X, bean counters say no.
Standard User deleted
(deleted) Sun 28-May-17 10:31:17
Re: BA IT system down. How? Why?
[re: RobertoS]
 
It seems to me that a bank holiday/half term weekend is just the time for mischief.

One week NHS, another British Airways

Poor security, bad software design all make it too easy

Major wake up call to industry generally
Standard User billford
(elder) Sun 28-May-17 10:45:07
Re: BA IT system down. How? Why?
[re: deleted]
 
In reply to a post by 961a:
One week NHS, another British Airways
There are differences... the NHS (amongst others) was clearly the victim of an attack, there doesn't appear to be any evidence, or reason to assume, the same for BA.

There's an old saying- don't automatically assume conspiracy when simple incompetence provides an adequate explanation.
Major wake up call to industry generally
True, but to look at resilience as well as security.

Bill
A level playing field is level in both directions.

_______________________________________Planes and Boats and ... ______________BQMs: IPv4 IPv6
Standard User ukhardy07
(knowledge is power) Sun 28-May-17 11:03:50
Re: BA IT system down. How? Why?
[re: RobertoS]
 
Does every organisation fully understand the underlying assets that support their services? A common one is a switch going down taking a service offline, hard to mitigate against with zero downtime.

Also do you think every organisation has up-to-date patching for all software pieces and OS? There is a tonne of legacy out there, often with denial of service vulnerabilities associated with it.
Standard User derekdel
(member) Sun 28-May-17 12:14:35
Re: BA IT system down. How? Why?
[re: RobertoS]
 
Probably still running Windows Me....
Standard User ukhardy07
(knowledge is power) Sun 28-May-17 13:34:58
Re: BA IT system down. How? Why?
[re: derekdel]
 
It's a great OS, it doesn't get any of those annoying Windows updates anymore. Perfect. ;)
Standard User deleted
(deleted) Sun 28-May-17 15:33:18
Re: BA IT system down. How? Why?
[re: RobertoS]
 
I remember reading somewhere that BA had laid off many experienced IT staff and outsourced some work to India in 2016.

That always goes well, doesn't it?
Standard User RobertoS
(elder) Sun 28-May-17 15:48:09
Re: BA IT system down. How? Why?
[re: deleted]
 
Yes but.

2016 was too late as well. The infrastructure I believe was not fit for purpose long before that. It can't have been removed since that happened.

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63018/13016Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User ukhardy07
(knowledge is power) Sun 28-May-17 18:49:40
Re: BA IT system down. How? Why?
[re: RobertoS]
 
How are you arriving at these judgements just out of interest?

Is there something you know?
Standard User deleted
(deleted) Sun 28-May-17 18:55:10
Re: BA IT system down. How? Why?
[re: RobertoS]
 
In reply to a post by RobertoS:
The infrastructure I believe was not fit for purpose long before that. It can't have been removed since that happened.


Well it was working ok until...
BA says that: "The root cause was a power supply issue which affected our IT systems - we continue to investigate this."


Or are they saying someone pulled the plug on their broadband connection ;)

The problem ALL older companies face is that they run old systems and keep cobbling them on to new systems, because it is simply too risky and massively expensive to invest in a totally new system to do it all. That will again be out of date in a few years' time.
You only have to look at the costing and supply of government IT systems and the issues that causes. Then look at a company like BA doing the same....

Edited by deleted (Sun 28-May-17 18:59:26)

Standard User deleted
(deleted) Sun 28-May-17 19:24:18
Re: BA IT system down. How? Why?
[re: deleted]
 
Perhaps their accounts dept system forgot to pay the leccy bill and...
Standard User Banger
(eat-sleep-adslguide) Sun 28-May-17 19:31:07
Re: BA IT system down. How? Why?
[re: deleted]
 
My PSU fan is making a noise, maybe it was a similar incident? :D

Tim
www.uno.net.uk & freenetname
Asus DSL-N55U and TP-Link WD9970 on 80 Meg LLU Fibre
http://www.thinkbroadband.com/speedtest/results.html...

Current Sync: 70634/18326
Standard User Andrue
(eat-sleep-adslguide) Sun 28-May-17 22:18:41
Re: BA IT system down. How? Why?
[re: deleted]
 
In reply to a post by JohnR:
Problem ALL older companies face is that they run old systems and keep cobbling them to new systems, because it is simply too risky and massively expensive to invest in a totally new system to do it all.
It's not just companies that face that dilemma. So do countries. Think of the UK's telecoms network ;)

But in this case I think we can blame BA for penny pinching.

---
Andrue Cope
Brackley, UK
Standard User Andrue
(eat-sleep-adslguide) Sun 28-May-17 22:30:06
Re: BA IT system down. How? Why?
[re: billford]
 
In reply to a post by billford:
the NHS (amongst others) was clearly the victim of an attack
I disagree about that. The word 'attack' implies targeted action, for which there is no evidence and which, given the different organisations that suffered, seems unlikely. The affected organisations had similar vulnerabilities which resulted in similar experiences. I don't believe that the person or people that released that malware had any specific intention or expectation that the NHS would fall victim. In fact they probably would have preferred that such a high profile institution was not the target. Blackmailers prefer to operate in the dark where few people know about them.

---
Andrue Cope
Brackley, UK
Standard User RobertoS
(elder) Sun 28-May-17 23:20:15
Re: BA IT system down. How? Why?
[re: ukhardy07]
 
I assume my OP is too complicated for you.

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63018/13016Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User RobertoS
(elder) Sun 28-May-17 23:26:19
Re: BA IT system down. How? Why?
[re: deleted]
 
In reply to a post by JohnR:
In reply to a post by RobertoS:
The infrastructure I believe was not fit for purpose long before that. It can't have been removed since that happened.


Well it was working ok until ...
In reply to a post by RobertoS:
I assume my OP is too complicated for you.
Twice so in your case.

You clearly are not "dealing with large mission-critical systems [and] able to suggest how BA have messed up so badly other than through penny-pinching and complete incompetence of their IT management"

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63018/13016Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User Banger
(eat-sleep-adslguide) Sun 28-May-17 23:52:46
Re: BA IT system down. How? Why?
[re: RobertoS]
 
I doubt you will get a reply from a large scale IT Admin on here as it could be their next job. :D

Tim
www.uno.net.uk & freenetname
Asus DSL-N55U and TP-Link WD9970 on 80 Meg LLU Fibre
http://www.thinkbroadband.com/speedtest/results.html...

Current Sync: 70634/18326
Standard User Oliver341
(eat-sleep-adslguide) Sun 28-May-17 23:58:04
Re: BA IT system down. How? Why?
[re: RobertoS]
 
From what I gather, a lot of their IT systems have been outsourced to Tata Consultancy Services, an Indian company. Their UK staff are Indians on visas.

http://www.thisismoney.co.uk/money/news/article-3643...

Edit: it was mentioned earlier by someone, but here's the link.

Oliver.

Edited by Oliver341 (Mon 29-May-17 00:01:19)

Standard User RobertoS
(elder) Mon 29-May-17 00:02:27
Re: BA IT system down. How? Why?
[re: Banger]
 
:)
I know of a few on the forums, and no doubt there are others.

Heads should, and no doubt will, roll, but they will probably be those of the people too junior to have affected policy. In my opinion the first reply was spot on.

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63018/13016Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User Banger
(eat-sleep-adslguide) Mon 29-May-17 00:11:52
Re: BA IT system down. How? Why?
[re: Oliver341]
 
Sky News were reporting it had been outsourced to India. No detail was given, but one insider had reportedly said the system had been "flakey and unstable" since its inception last year.

Tim
www.uno.net.uk & freenetname
Asus DSL-N55U and TP-Link WD9970 on 80 Meg LLU Fibre
http://www.thinkbroadband.com/speedtest/results.html...

Current Sync: 70634/18326
Standard User Oliver341
(eat-sleep-adslguide) Mon 29-May-17 00:13:47
Re: BA IT system down. How? Why?
[re: Banger]
 
Whatever they saved on outsourcing IT to India they'll probably pay ten times over in compensation and loss of reputation.

Oliver.
Standard User RobertoS
(elder) Mon 29-May-17 00:23:42
Re: BA IT system down. How? Why?
[re: Oliver341]
 
Yes, but that is about staffing and where the maintenance and modernisation is done.

That was in mid-2016. But the hardware is the problem. As I said in my OP, fifteen years ago major companies had moved on from the danger of single-point failure and started to cater for it. For the whole global passenger and baggage control systems of BA to simply stop because of any kind of failure at Heathrow is unbelievable.

The technology is available to prevent it, far better and cheaper than less capable hardware and switching software was in those days.

Even worse, it will take years for them to upgrade the infrastructure to stop it happening. It isn't simple, but should have been in place a decade or so ago.

I've just googled "Distributed databases", that being the term used over fifteen years ago when the necessary software was first being developed. I can see from this result that things have progressed far beyond my knowledge of the subject. But it shows the sort of resilient architecture they should have had in place. One of the "Pros" it gives - "Single-site failure does not affect performance of system".

Heathrow would appear not to be "a single site", but the single hub. As I quote in my OP.

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63018/13016Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User micksharpe
(legend) Mon 29-May-17 00:29:12
Re: BA IT system down. How? Why?
[re: RobertoS]
 
Interesting post on PPRuNe from someone who claims to be involved:

As both SLF (derogatory as that title is) and a highly experienced IT leader (biased towards Infrastructure) and someone who spent nearly 10 hours yesterday in T5, I feel I have something to contribute.

First off, let's not confuse DR with BCP, although both failed yesterday.

For example, while IT, wherever they were, were toiling over bringing systems back online, the CW/CE/First queues, which stretched right out of the terminal, were being "organised" by 2 women who were effectively herding cats. They were on a hiding to nothing, as people were joining any queue and then losing it when the staff came back round again 20 minutes later telling them to go and join the mega queue at WT. Not enough staff and definitely no sign of Management at all. This got better during the afternoon, but still no sign of any Senior Staff at all. Even this morning they were trying to get us on a flight, as my wife received a text to say it was cancelled but nothing showed on their system. Where were the managers? Nowhere to be seen, as they "were in meetings". Maybe those meetings should have been through the night so everyone could be briefed for 0430.
On a more serious note we were told by staff they couldn't find any megaphones to replace the non working PA. I would suggest that these should be easy to find in case of a real emergency.

As for IT, outsourcing is not something I would advocate, but when it has crossed my path, I would never allow a system to go live without:
Rigorous functional testing of system
Rigorous DR Testing
Sign off of all infrastructure designs from someone qualified to do so and counter sign it myself.

The outsourcer should not have unrestricted responsibility for design of something thousands of miles away that isn't theirs. This also makes it easy to swap supplier should they prove to be sub par, which they will.
I guarantee someone within BA has signed that design off as suitable, and that's where heads should roll initially. Then look at your "partner"

Also all the previous posts regarding bean counters are a given as well. Scourge of IT !
On a personal note I'm not actually buying the power excuse but as we don't like to speculate within these halls I'll keep my opinion to myself. I will say however all the systems affected were internet facing.

Anyway, got all that off my chest, and resigned to go back to work on Tuesday instead of enjoying a few cold ones on the Greek coastline !
Standard User Oliver341
(eat-sleep-adslguide) Mon 29-May-17 00:29:53
Re: BA IT system down. How? Why?
[re: RobertoS]
 
I agree, it's unbelievable. Maybe Tata convinced BA they could cut their IT costs by decommissioning "unnecessary" redundant systems, because clearly their redundant systems, or the management of them, was totally inadequate.

We'll never know what the scope of this cost-cutting exercise was and what corners were cut, because BA will never admit to it.

Oliver.
Standard User ukhardy07
(knowledge is power) Mon 29-May-17 01:14:20
Re: BA IT system down. How? Why?
[re: RobertoS]
 
In reply to a post by RobertoS:
I assume my OP is too complicated for you.
Not so, just interested given who some of my day to day clients are. Bearing in mind I work for an IT security firm and I do internal / external penetration testing, I also work with an IT Security Audit team who often audit the same clients I do technical testing for. All I'm saying is when it comes to some of these larger organisations I certainly can attest to their network topology / OS / security strategy / legacy IT etc.

What you claim to have experienced in your day and what I see today seem worlds apart. When you take a large organisation, often the risk rankings of systems are not fully understood, & nor is the risk appetite of the organisation. It is very common to accept a risk, formally record it, & hand the management of IT to a third party who maintains this "risk register." It is also very common for agreements with third parties to have no clear remediation timelines for vulnerabilities etc. and hence OSes sit unpatched for long periods of time. I'm not saying it is ideal, but we commonly see IT being outsourced and the vendor governance being poor. Often upper management cannot even attest to the line between their own IT's responsibilities and the third party's, & it is a cat and mouse game of who is responsible for basic things.

When you account for all of this mismanagement & political madness, the last thing on anyone's mind is getting every system prioritised and the critical ones fully backed up / replicated sites etc. Even if they try, they'll figure out a tonne of legacy infrastructure exists which needs upgrading to proceed with the project, that'll take a year to figure out, then the business will change their requirements mid project and the cycle continues of nothing getting done for years on end.

If it wasn't like this in your days, it certainly is now.

Edited by ukhardy07 (Mon 29-May-17 01:22:10)

Standard User RobertoS
(elder) Mon 29-May-17 01:28:07
Re: BA IT system down. How? Why?
[re: micksharpe]
 
Thanks for that link Mick. A lot of good stuff that I've been reading for the last 10-15 minutes. Very much along the lines of my opening post here. I'll possibly read more later today. Sleep calls now :).

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63018/13016Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User RobertoS
(elder) Mon 29-May-17 01:29:32
Re: BA IT system down. How? Why?
[re: Oliver341]
 
In reply to a post by Oliver341:
I agree, it's unbelievable. Maybe Tata convinced BA they could cut their IT costs by decommissioning "unnecessary" redundant systems, because clearly their redundant systems, or the management of them, was totally inadequate.

We'll never know what the scope of this cost-cutting exercise was and what corners were cut, because BA will never admit to it.
You make a couple of good points there Oliver.

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63018/13016Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User RobertoS
(elder) Mon 29-May-17 02:15:12
Re: BA IT system down. How? Why?
[re: ukhardy07]
 
All very true, but also just confirms how unacceptable this BA fiasco is.

The principles were the same in my thirty-odd years, but the discipline wrt software was that anybody/any team can write a system to do the required task and demonstrate it working with test data. Only the good and well-trained treated that as 10% of the work needed before sending it live.

The other 90% was programming to ensure that all forecastable errors or failures, and any unexpected events, were handled gracefully. That was written into the system and programming specs, and coded in as each routine was written.

Similarly at least two thirds of the testing was of these.

Senior company management were cutting out the budget for that vital 90% by the end of my first ten years. It has been one of the major cost savings that were so easily made.

The security aspects that you deal with of course didn't exist in the early days, so we didn't need to think about them. Those came later, first with mainframe dumb terminals and later PCs with floppy disk drives. They added an increasing complexity, as we all know now we have rampant hacking, phishing and the rest. Keeping you in employment :).

This BA failure though is, we are told, down to what is apparently the one and only central datacentre going down. Whether it was a power failure, a piece of kit causing it, or we are being lied to and it was a software bug or some other cause is not really the point. There had to be redundant mirror hardware or similar at at least one other site, and regular systematic testing of hardware systems, sub-systems or comms failure.

As per my link to distributed databases.

BA has, (or had until the LSE opens on Tuesday!), a market capitalisation of £12.9bn and €22.5bn revenues in 2016. (Sorry for mixing currencies, in bed holding an iPad up in one hand and typing with one finger).

It can fund resilient systems.

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63018/13016Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User deleted
(deleted) Mon 29-May-17 09:32:08
Re: BA IT system down. How? Why?
[re: RobertoS]
 
Morning Bob

Agreed generally about the differences regarding writing a program, then testing it.

As well as trying to forecast problems, quite a bit of my time was spent writing programs to provide data, etc, for testing the main program.

Some of those test programs could prove so useful, that I would end up producing a general user version, with its own trials before releasing to the end-user.
Standard User deleted
(deleted) Mon 29-May-17 10:22:32
Re: BA IT system down. How? Why?
[re: RobertoS]
 
In reply to a post by RobertoS:
Whether it was a power failure, a piece of kit causing it, or we are being lied to and it was a software bug or some other cause is not really the point. There had to be redundant mirror hardware or similar at at least one other site, and regular systematic testing of hardware systems, sub-systems or comms failure.


This is the problem in a nutshell, and is the reason why no one from BA management will give an interview

The creation of mirror hardware etc will cost mega, against all the current financial decisions. It will also take time and the hiring of specialists many of whom have been dumped by BA over the years and may be most unlikely to join again

Anyone seen or heard from Willie Walsh lately?

"Deputy heads must roll!" as the saying goes
Standard User caffn8me
(eat-sleep-adslguide) Mon 29-May-17 14:41:30
Re: BA IT system down. How? Why?
[re: deleted]
 
In reply to a post by 961a:
Perhaps their accounts dept system forgot to pay the leccy bill and...
It happens.

Quite a few years ago, a certain unnamed county ambulance service had transferred from being under the wing of the local area health authority to standing on its own.

One day, their radio systems went down across much of the county. A bit of poking around showed that their main repeater station at a large mast site wasn't working due to a power cut.

Their communications officer raced up there in their support Land Rover, towing a generator to restore supplies, and arrived to discover that the electricity board had disconnected the supply because the bill hadn't been paid. It seems that nobody had transferred the account.

Having said that, I don't really believe the power supply story in BA's case.

Sarah

--
If I can't drink my bowl of coffee three times daily, then in my torment, I will shrivel up like a piece of roast goat

Spiders on coffee - Badass spiders on drugs
Standard User 69bertie
(member) Mon 29-May-17 15:24:12
Re: BA IT system down. How? Why?
[re: caffn8me]
 
In reply to a post by Oliver341:
Whatever they saved on outsourcing IT to India they'll probably pay ten times over in compensation and loss of reputation.


Doubtful, probably insured for such events, so minimal loss.

As to running Windows ME, the company I work for still runs equipment on Windows 95, OS/2 and even well before that. True, no updates, no internet. Heaven, very rarely goes wrong! One reason the equipment still exists. And when it does go wrong, it is usually hardware related. Runs 24 hours a day. Only problem is getting the said hardware repaired.

Standard User Chrysalis
(legend) Mon 29-May-17 16:22:24
Re: BA IT system down. How? Why?
[re: 69bertie]
 
I can only speculate as most people are.

But I wouldn't be surprised if this could have been avoided but wasn't due to accounting.

In a business, IT security and IT redundancy are generally seen as wasted expenditure, because these decisions get made during good times when there is no public breach and the redundancy isn't needed. After events such as this, companies may then react and spend money, but then revert the decision some time later, thinking it's an easy cutback to help profit.

I cannot name the company, but I have personally done work for a company that had a marketing budget of 200 million but was refusing requests for 500 euros to add redundancy to the server setup. It's all about their priorities: in their eyes, marketing adds customers and as such adds value to the company, whilst money on infrastructure does not (unless of course everything goes down, then they change their view).

Sky Fibre Pro BQM - IPv4 BQM - IPv6
Standard User MHC
(sensei) Mon 29-May-17 16:34:29
Re: BA IT system down. How? Why?
[re: 69bertie]
 
In reply to a post by 69bertie:
Doubtful, probably insured for such events, so minimal loss.


Airlines often "self insure" for a lot of their risks - they know that there will be incidents and that the cost could be high, but as they know the frequency of them they can accept the risk. This did at one time, and may still, include the insuring of an airframe.

BT at one time self-insuring parts of their vehicle fleet is another example.

In reply to a post by 69bertie:
As to running Windows ME, the company I work for still runs equipment on Windows 95, OS2 and even well before that. True, no updates,


I will not say where, but there are still some instances of Win3.11 installed and operational on aircraft.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

M H C


taurus excreta cerebrum vincit
Standard User Banger
(eat-sleep-adslguide) Tue 30-May-17 21:03:59
Re: BA IT system down. How? Why?
[re: RobertoS]
 
http://www.telegraph.co.uk/business/2017/05/30/exclu...

Tim
www.uno.net.uk & freenetname
Asus DSL-N55U and TP-Link WD9970 on 80 Meg LLU Fibre
http://www.thinkbroadband.com/speedtest/results.html...

Current Sync: 69892/17901
Standard User RobertoS
(elder) Tue 30-May-17 21:47:18
Re: BA IT system down. How? Why?
[re: Banger]
 
Thanks Tim :). Interesting but!

I'm not sure it excuses it.

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63679/13080Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User deleted
(deleted) Tue 30-May-17 22:17:51
Re: BA IT system down. How? Why?
[re: RobertoS]
 
OK, I confess. In the somewhat distant past, I worked for BA as a contract programmer, on and off for several years.
BA's own management was of poor quality and had little understanding of IT; the whole place was run by a few 'Old Techies' who had been there for yonks and knew the systems.
They would be retired now.
We had a few contractors sourced from India (before Tata); they were very good at organising meetings and long discussions. As for doing the work..., nah..

I have seen reports from 'Power Back Up' specialists and they don't recognise the BA reported scenario.
The truth will out, eventually....!
Standard User Michael_Chare
(fountain of knowledge) Wed 31-May-17 00:00:33
Re: BA IT system down. How? Why?
[re: Banger]
 
Given the importance of these systems to BA's operations, I would have thought that they should have been able to fall back to a duplicate system at another location. No suggestion in the DT article that this is possible.

Michael Chare
Standard User ian72
(eat-sleep-adslguide) Wed 31-May-17 08:30:19
Re: BA IT system down. How? Why?
[re: Michael_Chare]
 
The article does state there is a secondary data centre that took up "some of the slack". Given BA's reliance on IT I would have thought the secondary data centre should be scaled to take up all of the operations in the event of a failure of the primary - that does not appear to be the case if the article is correct.

Also, if it isn't capable of taking all of the load then prioritising systems that would keep core airline operations up and running should have been key - it appears that everything died when they should have been able to keep up a percentage of key services.
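
The prioritisation point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the service names, priority tiers, load figures and the capacity of 110 units are all made up, and nothing here reflects how BA's systems are actually organised. It simply fills a smaller secondary site in priority order and sheds whatever does not fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    priority: int  # 1 = most critical (hypothetical tiers)
    load: int      # capacity units needed (made-up numbers)


def plan_failover(services, capacity):
    """Fill the secondary site in priority order; return (kept, shed)."""
    kept, shed = [], []
    remaining = capacity
    # Most critical first; within a tier, smallest load first.
    for svc in sorted(services, key=lambda s: (s.priority, s.load)):
        if svc.load <= remaining:
            kept.append(svc.name)
            remaining -= svc.load
        else:
            shed.append(svc.name)
    return kept, shed


# Hypothetical service catalogue, for illustration only.
SERVICES = [
    Service("check-in", 1, 40),
    Service("baggage", 1, 30),
    Service("flight-ops", 1, 20),
    Service("crew-rostering", 2, 20),
    Service("loyalty-website", 3, 25),
    Service("marketing-email", 3, 15),
]

kept, shed = plan_failover(SERVICES, capacity=110)
print("kept:", kept)
print("shed:", shed)
```

With these invented numbers the three tier-1 services and crew rostering fit, and the two tier-3 services are shed. A real plan would also have to respect dependencies between services, which this sketch ignores.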

Edited by ian72 (Wed 31-May-17 08:36:47)

Standard User richi
(regular) Thu 01-Jun-17 11:04:17
Re: BA IT system down. How? Why?
[re: ian72]
 
In reply to a post by ian72:
The article does state there is a secondary data centre that took up "some of the slack"
The whispers I've heard said that, when the failover data center came up, they discovered corrupt data, because the replication hadn't been working properly. So they couldn't use the secondary (known as Comet House).

In essence, all this talk about power surges is a smokescreen to cover up the fact that BA's DR strategy failed.

3 km line on THTG: 18/1.2 Mb/s with Sky
Previously: BT ISDN, Nildram, Plusnet, 186k, EFH, Be*, Plusnet (again), Pulse8
Standard User deleted
(deleted) Thu 01-Jun-17 11:39:55
Re: BA IT system down. How? Why?
[re: richi]
 
Perhaps Macrium Reflect would help them? It backs up my system daily and sends me an e-mail if the backup is not successful.
Standard User deleted
(deleted) Thu 01-Jun-17 11:48:30
Re: BA IT system down. How? Why?
[re: richi]
 
Presumably they didn't test the replication was working due to cost constraints?
Standard User ian72
(eat-sleep-adslguide) Thu 01-Jun-17 11:50:11
Re: BA IT system down. How? Why?
[re: deleted]
 
They should be doing periodic DR tests - but if the data corruption happened after the last test then it may not have been a simple thing to spot. I keep telling people that it doesn't matter how much redundancy you put in you still have to account for the fact the IT may be unavailable for a protracted period of time and so have to have business continuity plans in place to know what to do if that happened.
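
As a rough illustration of what a periodic replication check might look like, here is a minimal Python sketch. It is not a description of any real DR tooling: the record keys and values are invented, and the two dictionaries stand in for whatever the primary and secondary sites actually hold. The idea is just to fingerprint both copies and list the records that differ, so corruption is noticed during a test rather than at failover time.

```python
import hashlib


def digest(records):
    """Order-independent SHA-256 fingerprint of a {key: value} table."""
    h = hashlib.sha256()
    for key, value in sorted(records.items()):
        h.update(f"{key}={value}\n".encode("utf-8"))
    return h.hexdigest()


def replica_drift(primary, replica):
    """Return the keys that are missing or different on either side."""
    keys = set(primary) | set(replica)
    return sorted(k for k in keys if primary.get(k) != replica.get(k))


# Made-up data: the replica has one corrupted row and one missing row.
primary = {"booking-1": "seat 12A", "booking-2": "seat 14C", "booking-3": "seat 2F"}
replica = {"booking-1": "seat 12A", "booking-2": "seat 1\x00C"}

in_sync = digest(primary) == digest(replica)
drift = replica_drift(primary, replica)
print("in sync:", in_sync)
print("drift:", drift)
```

Comparing one digest per table is cheap enough to run often; the per-key comparison is only needed once the digests disagree. As the post says, a check like this only covers the period up to the last run, so it complements a business continuity plan rather than replacing it.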
Standard User RobertoS
(elder) Thu 01-Jun-17 12:27:26
Re: BA IT system down. How? Why?
[re: ian72]
 
At this level it should not be about replicating files and then bringing the remote server in to replace the other after the world-wide system has gone down. It should simply be an automatic re-routing.

In principle similar to hot-swapping drives at a local level, but obviously with a great deal more complexity.

The failing of the Heathrow hub should not have been visible to the outside world at all.

My broadband basic info/help site - www.robertos.me.uk. Domains, site and mail hosting - Tsohost.
Connection - AAISP Home::1 80/20. Sync 63679/13080Kbps @ 600m. BQMs - IPv4 & IPv6
Standard User ian72
(eat-sleep-adslguide) Thu 01-Jun-17 13:53:38

Re: BA IT system down. How? Why?


[re: RobertoS] [link to this post]
 
Yes, it should be a replica of the hardware with near real time data sync between the file systems (and those file systems are likely to be on an enterprise grade SAN or similar). A failure at the main site would then automatically fail over to the secondary site with a delay measured in milliseconds. However, if the data is corrupt at the secondary site for some unknown reason then the systems could fail catastrophically - this is what is currently being posited by some people.
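
The failover decision described above could be sketched roughly as follows, assuming each site reports whether it has power and whether its copy of the data has passed verification. The `Site` class and `choose_active_site` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    powered: bool        # is the site up at all?
    data_verified: bool  # has its replica passed an integrity check?


def choose_active_site(sites: List[Site]) -> Optional[Site]:
    """Return the first site, in preference order, that is safe to serve from."""
    for site in sites:
        if site.powered and site.data_verified:
            return site
    return None  # nothing usable: fall back to manual procedures


# Primary has lost power; secondary is up with a good replica.
healthy = choose_active_site([
    Site("primary", powered=False, data_verified=True),
    Site("secondary", powered=True, data_verified=True),
])

# Same outage, but the secondary's replica turns out to be corrupt.
corrupt = choose_active_site([
    Site("primary", powered=False, data_verified=True),
    Site("secondary", powered=True, data_verified=False),
])
```

If no site qualifies the function returns `None`, which is the point at which the business continuity plan has to take over.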

It seems to have been a chain of failures (none of which should have happened) that has resulted in an outage that the resilience should have ensured didn't happen.

Whilst this shouldn't happen it seems I am seeing it more and more at the moment - the root causes are different but the result is the same, loss of critical services for days.
Standard User RobertoS
(elder) Thu 01-Jun-17 14:53:29

Re: BA IT system down. How? Why?


[re: ian72] [link to this post]
 
For the replication to be duff means the system was not fit for purpose. Plus as per my opening post, before even this level of information was released:
Surely something that could affect BA worldwide should have at least two if not more mirrored systems/hubs geographically far apart? With dedicated links with minimal latency. It was possible in my day, so what and why and how can this have happened?
I agree with
In reply to a post by ian72:
The article does state there is a secondary data centre that took up "some of the slack". Given BA's reliance on IT I would have thought the secondary data centre should be scaled to take up all of the operations in the event of a failure of the primary - that does not appear to be the case if the article is correct.

Also, if it isn't capable of taking all of the load then prioritising systems that would keep core airline operations up and running should have been key - it appears that everything died when they should have been able to keep up a percentage of key services.
in some ways, but not in others. In particular, from the article:
Under normal circumstances, power would have been returned to the servers in Boadicea House slowly, allowing the airline's other Heathrow data centre, at Comet House, to take up some of the slack.

But, on Saturday morning, just minutes after the UPS went down, power was resumed in what one source described as "uncontrolled fashion." "It should have been gradual," the source went on.

This caused "catastrophic physical damage" to BA's servers, which contain everything from customer and crew information to operational details and flight paths. No data is however understood to have been lost or compromised as a result of the incident.

BA's technology team spent the weekend rebuilding the servers, allowing the airline to return to normal operations as of today.

Sources close to the airline indicated that had the power been restored more gradually, BA would have been able to cope with the outage, and return services far more quickly than was the case.
There is no suggestion there that the backup system was in a position to take over any functions instantaneously. It could well have been purely running remote disc mirroring.

As for the Heathrow system coming straight back up so wrecking everything. What?

The point is that there should have been no downtime at all to the international online systems. We aren't talking about the Sainsbury's national network, which could legitimately work with a central failure, with tills and stock control running happily on the instore systems.

Standard User Banger
(eat-sleep-adslguide) Fri 02-Jun-17 20:41:53

Re: BA IT system down. How? Why?


[re: RobertoS] [link to this post]
 
Latest headline from the Independent and Telegraph is that an "IT worker switched off the power supply to the server". laugh

Tim
www.uno.net.uk & freenetname
Asus DSL-N55U and TP-Link WD9970 on 80 Meg LLU Fibre
http://www.thinkbroadband.com/speedtest/results.html...

Current Sync: 69892/17901
Standard User oldswan
(learned) Fri 02-Jun-17 21:57:15

Re: BA IT system down. How? Why?


[re: Banger] [link to this post]
 
Sounds like a Specsavers advert doesn't it? I wonder if they will have the cheek to make one about it?
Standard User RobertoS
(elder) Fri 02-Jun-17 22:36:03

Re: BA IT system down. How? Why?


[re: Banger] [link to this post]
 
Yes, it was in The Times today.

Quite ridiculous. The contractor must have stopped the feed from the UPS system to the processor and storage modules. That should simply not be possible.

Turning off the mains supply to the UPS system would not have had any effect for a considerable time, during which alarms would have been going off all over the place.

Or should have been.

Edited by RobertoS (Fri 02-Jun-17 22:36:48)

Standard User deleted
(deleted) Mon 05-Jun-17 20:06:36

Re: BA IT system down. How? Why?


[re: RobertoS] [link to this post]
 
an electrical engineer disconnected the uninterruptible power supply which shut down BA's data centre.
This resulted in the total immediate loss of power to the facility, bypassing the backup generators and batteries... After a few minutes of this shutdown, it was turned back on in an unplanned and uncontrolled fashion, which created physical damage to the systems and significantly exacerbated the problem."


So you can have the best systems in the world.....

But you can not legislate for stupid humans smile
Standard User MC31
(member) Mon 05-Jun-17 20:33:22

Re: BA IT system down. How? Why?


[re: deleted] [link to this post]
 
But was it the best system in the world? Doesn't sound like it to me.

These comments are my own and in no way represent any company that I may or may not be linked to.
Standard User deleted
(deleted) Mon 05-Jun-17 20:43:54

Re: BA IT system down. How? Why?


[re: MC31] [link to this post]
 
In reply to a post by MC31:
But was it the best system in the world? Doesn't sound like it to me.


Didn't say it was.... Only said you CAN have... But nothing can stop someone who should know better.
Standard User MC31
(member) Mon 05-Jun-17 20:50:53

Re: BA IT system down. How? Why?


[re: deleted] [link to this post]
 
Fair point..

Standard User billford
(elder) Mon 05-Jun-17 20:58:32

Re: BA IT system down. How? Why?


[re: deleted] [link to this post]
 
In reply to a post by JohnR:
But nothing can stop someone who should know better.
Back in my early days as a design engineer I was told that it's not too hard to make something foolproof... if you try really hard you might even be able to make it idiot-proof.

But you can't make it proof against a determined idiot crazy

Bill
A level playing field is level in both directions.

_______________________________________Planes and Boats and ... ______________BQMs: IPv4 IPv6
Standard User RobertoS
(elder) Mon 05-Jun-17 21:41:52

Re: BA IT system down. How? Why?


[re: deleted] [link to this post]
 
Confirming my suggestion smile.

Edited by RobertoS (Mon 05-Jun-17 21:49:30)

Standard User RobertoS
(elder) Mon 05-Jun-17 21:51:55

Re: BA IT system down. How? Why?


[re: MC31] [link to this post]
 
Well it certainly should not have caused a crash at another data centre or at any connected locations.

That resilience is available. It just wasn't built into the system.

Standard User cheshire_man
(eat-sleep-adslguide) Mon 05-Jun-17 21:52:19

Re: BA IT system down. How? Why?


[re: billford] [link to this post]
 
In reply to a post by billford:
In reply to a post by JohnR:
But nothing can stop someone who should know better.
Back in my early days as a design engineer I was told that it's not too hard to make something foolproof... if you try really hard you might even be able to make it idiot-proof.

But you can't make it proof against a determined idiot crazy
And then there are salesmen...

Tony
Happily running Windows 10 Pro on both desktop and laptop
We have more and more laws, and less and less enforcement
Standard User billford
(elder) Mon 05-Jun-17 22:01:09

Re: BA IT system down. How? Why?


[re: cheshire_man] [link to this post]
 
laugh

Standard User MHC
(sensei) Mon 05-Jun-17 23:05:05

Re: BA IT system down. How? Why?


[re: deleted] [link to this post]
 
I wonder how many heads will roll for lying ... start at the top and work their way down.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

M H C


taurus excreta cerebrum vincit
Standard User deleted
(deleted) Tue 06-Jun-17 09:33:17

Re: BA IT system down. How? Why?


[re: MHC] [link to this post]
 
Unfortunately I think it will be start at the bottom and work up until it approaches you.