User comments on ISPs
>> AAISP


perlen
(newbie) Wed 10-Jan-24 22:27:11

Poor uptime and reliability


Is anyone else getting bored of the constant blips and glitches with FTTP AAISP?

Lines on the z.witless LNS have dropped at least 8 or 9 times in the last two months since I joined:
https://aastatus.net/recent.cgi

I had fewer interruptions in two years of el cheapo FTTC with TalkTalk.

Unfortunately I am in month 2 of a 12 month contract... less than ideal, and certainly not business class broadband.

Edited by perlen (Wed 10-Jan-24 23:17:51)

E300
(committed) Thu 11-Jan-24 09:25:10

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Yes, it's not gone unnoticed here either; I think I managed 35 days of uptime until work to update the LNS routers started again this year. The issues are well explained on their service status page, which goes some way towards making me more accepting of the problems.

The drops overnight don't worry me as I'm usually in bed and luckily everything comes back up by itself without issue. However, there were a couple of random crashes during the day last year which were a bit more disruptive, and I think that is the problem they are still trying to fix.

I notice they've reported another random crash on the new firmware caused by a different bug, so presumably we will be having more drops and shuffles to different LNSs for further updates in the coming weeks. On top of that, BT wanting people moved off a connection caused more drops. So it's been a bit of a week of ups and downs; it's not usually as bad as this. Looking at my BQM, the drop overnight barely registered; I've had more packet loss with other ISPs just because they run congested.

I understand what you mean by expecting something better, it has been a bit flaky of late.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Thu 11-Jan-24 09:33:48)

njh
(newbie) Thu 11-Jan-24 09:36:12

Re: Poor uptime and reliability

[re: perlen] [link to this post]

I am on a TalkTalk Wholesale line and my line was down for 4 hours last night:
https://aastatus.net/42599

Not the biggest deal, as I was fast asleep, but I did have various things complain about lack of the internet overnight. I do wonder if someone / something should have noticed faster than 4 hours...

Rhynchelma
(member) Thu 11-Jan-24 09:52:11

Re: Poor uptime and reliability

[re: njh] [link to this post]

TalkTalk have said that this was them.

AAISP do not have 24/7 staffing as I recall.

But the other stuff is irritating.

Not what you would expect.

Edited by Rhynchelma (Thu 11-Jan-24 09:56:10)

perlen
(newbie) Thu 11-Jan-24 10:42:15

Re: Poor uptime and reliability

[re: Rhynchelma] [link to this post]

Another LOS this morning:

Z.Witless
Jan 11, 09:20 AM

The Z.Witless LNS restarted (again) this morning at 09:20, causing PPP drops for customers connected to it. Investigations underway.
Due to the various different problems we've had with Z.Witless we will take this unit out of service and replace it.

jimbof
(committed) Thu 11-Jan-24 21:14:43

Re: Poor uptime and reliability

[re: perlen] [link to this post]

That's a shame. I was lucky I guess, I moved off about 4 months ago. The as-yet unreleased status of the LNS devices used for the high speed FTTP services did make me a bit nervous when I signed up, but it never proved to be an issue in the time I was with AAISP. It does seem like they've had an unusually large number of gremlins in recent time. Fingers crossed for swift resolution.

perlen
(newbie) Thu 11-Jan-24 22:00:18

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

Issues...

Edited by perlen (Thu 11-Jan-24 22:02:44)

XGS_Is_On
(committed) Fri 12-Jan-24 07:59:53

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

They're released now, though the odds of anyone other than A&A using them as an LNS seem low regardless of stability issues. Very much made with them in mind.

Edited by XGS_Is_On (Fri 12-Jan-24 08:04:26)

jimbof
(committed) Fri 12-Jan-24 09:00:20

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

Ah ok, must be relatively recent. I thought they were still unreleased when I migrated off.

The spec always seemed a bit odd to me (only 2x 10G ports) given that Openreach are now selling >Gbit services.

XGS_Is_On
(committed) Fri 12-Jan-24 13:00:51

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

In reply to a post by jimbof:
The spec always seemed a bit odd to me (only 2x 10G ports) given that Openreach are now selling >Gbit services.

The two 10G ports seem to share a 10G backplane, just FYI: they can't do 10G in and out; there are two for resilience.

List price £7,500 + VAT.

E300
(committed) Fri 12-Jan-24 13:18:53

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Looks like another drop, just cut off a video call. Not good.


jimbof
(committed) Fri 12-Jan-24 13:21:52

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

Even worse than I thought...

perlen
(newbie) Fri 12-Jan-24 17:35:37

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

Z.Witless is now in service and on new hardware - I was moved back on to it at 16:33 today.

Unfortunately base latency has increased by 2ms (was 9ms, now 11ms).

perlen
(newbie) Fri 12-Jan-24 17:43:41

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Y.Witless crashed today also...

INITIAL
4¼ hours ago by Andrew
At 13:16 Customers on Y.Witless dropped and reconnected a few minutes later. The cause is being investigated as a matter of high priority.

UPDATE
55¾ minutes ago by Andrew
Y.Witless crashed again at 16:30.

RESOLUTION
28 minutes ago by Andrew
We've updated the main post regarding these drops: https://aastatus.net/42577

------------------

UPDATE
28¾ minutes ago by Andrew
An update of where we are (Friday 12th January).

Some customers have had interruption to their service this week as we have seen a number of crashes on both Z.Witless and Y.Witless.

Today we replaced the hardware of Z.Witless.

Our developers have been working on investigating each crash we have. We have been saying in recent updates that progress had been made on the crashes we have seen, and this week we applied the software update to two of our three 'Witless' LNSs. In our test lab we have never seen this updated software crash during 3 weeks of testing. However, we have had crashes this week since applying the updated software.

Usually with a crash, our developers are sent a crashlog with details specifying exactly where in the code the crash happened. However, the crashes that have been affecting us are different in that the hardware locks up and restarts - with this type of crash we have less forensic data to work with, which is making getting to the bottom of the problem that much harder.

We are still working hard to resolve this. We have various avenues of investigation to take, and during the next week we will be planning more overnight work as well as datacentre trips.

We know how disruptive this has been for those customers affected, and we are doing all we can to work towards a stable service for everyone.

jpm
(fountain of knowledge) Fri 12-Jan-24 18:11:33

Re: Poor uptime and reliability

[re: perlen] [link to this post]

In reply to a post by perlen:
Z.Witless is now in service and on new hardware - I was moved back on to it at 16:33 today.

Unfortunately base latency (was 9ms now 11ms) has increased by 2ms

Have they explained why they are doing this sort of work at 5pm instead of scheduling overnight changes?

E300
(committed) Fri 12-Jan-24 18:14:35

Re: Poor uptime and reliability

[re: perlen] [link to this post]

I see the same thing: a 1.5 to 2ms increase in latency if I connect to Z.Witless. I think this is because it is in a different data-centre to X and Y, so the routing changes.

Also I've found on BT Wholesale that latency can vary by plus or minus 2ms for me, as there is some variation in the back-haul routes, but a few drops of PPP will usually see it come back up on the shorter route.

Sometimes when we get these overnight drops, I can see a 4 or 5ms increase in latency the next morning because I've landed on Z.Witless and also got a longer route on the BT backhaul. It just doesn't feel right going in the wrong direction, even though it doesn't make any noticeable difference.

I've seen the latency via BT Wholesale differ by a couple of ms with my previous ISPs as well, so it's just one of those things, probably some sort of load balancing.


E300
(committed) Fri 12-Jan-24 18:19:11

Re: Poor uptime and reliability

[re: jpm] [link to this post]

In reply to a post by jpm:
Have they explained why they are doing this sort of work at 5pm instead of scheduling overnight changes?

This was another crash of an LNS and so connections fell over to Z.Witless. It's easy to spot these on the blip graph https://aastatus.net/index.cgi#blip. If it was a controlled move over, we would see a red blip below the green blip, but because the LNS has just crashed, it doesn't update the graph with any disconnections; only re-connections show up.
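As a rough illustration of that rule of thumb for reading the blip graph, here is a minimal Python sketch. The function name and the idea of feeding it per-interval counts are assumptions made for the example; this is not how A&A publish or process the graph data.

```python
def classify_blip(disconnects: int, reconnects: int) -> str:
    """Guess what happened on an LNS during one graph interval.

    disconnects: sessions logged as dropping (the "red" blip), hypothetical input
    reconnects:  sessions logged as coming back (the "green" blip), hypothetical input
    """
    if disconnects == 0 and reconnects == 0:
        return "quiet"
    if disconnects == 0:
        # Reconnections with no recorded disconnections: the LNS went away
        # before it could log the sessions dropping.
        return "likely crash"
    # Both colours present: the LNS was alive to record the drops.
    return "controlled move or ordinary drops"


if __name__ == "__main__":
    print(classify_blip(0, 500))
    print(classify_blip(480, 500))
```

It only encodes the heuristic described above, so a real crash that happened to coincide with a few unrelated drops would be misread.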

So that's two crashes today a few hours apart.

At least they are open about the issues and we know what's going on, and so as customers we aren't thinking is it our kit or wasting time rebooting our own routers etc.


Edited by E300 (Fri 12-Jan-24 18:32:58)

perlen
(newbie) Fri 26-Jan-24 08:18:50

Re: Poor uptime and reliability

[re: E300] [link to this post]

And another:

Jan 26, 07:00 AM
https://aastatus.net/42612

6AM: Z.Witless LNS had a hardware lock-up, causing lines on it to drop and reconnect

E300
(committed) Fri 26-Jan-24 09:31:44

Re: Poor uptime and reliability

[re: perlen] [link to this post]

In reply to a post by perlen:
And another:

Jan 26, 07:00 AM
https://aastatus.net/42612

6AM: Z.Witless LNS had a hardware lock-up, causing lines on it to drop and reconnect

I had a drop overnight due to a "Lost Carrier", which suggests it was BT work, as I'm on X.Witless. I think Z.Witless is on new hardware now with extra debug logging, so having that crash on Z.Witless might be good news in a way, as they may find out what is causing it.


serichards
(regular) Fri 26-Jan-24 10:13:13

Re: Poor uptime and reliability

[re: E300] [link to this post]

I had a lost carrier just before 1am. Couple of minutes outage.

I'm on gormless then aimless judging by the traceroute.

I do like their naming scheme. I wonder if they have a feckless?!

perlen
(newbie) Fri 26-Jan-24 22:46:38

Re: Poor uptime and reliability

[re: serichards] [link to this post]

Second time today:

AFFECTING
Z.Witless
STARTED
Jan 26, 10:30 PM
https://aastatus.net/42613

Z.Witless hardware locked up at 10:30 this evening, causing the lines on it to drop and reconnect.

I am getting a bit sick of all these drops, could I get out of my remaining 10 months contract do you think?

E300
(committed) Sat 27-Jan-24 09:52:11

Re: Poor uptime and reliability

[re: perlen] [link to this post]

It is unfortunate you seem to be on Z.Witless which appears to be affected more so than the other two LNS's, although that could just be a quirk of randomness and then us seeing a pattern that gets disproved over time. It's also bad luck you've joined just when these problems started and so have not known anything better.

I would suggest getting in touch with their technical support and raising a ticket, nothing will happen otherwise. I'm not sure 'legally' this would be enough to get you out of the contract, as all services can have problems and time needs to be allowed for companies to rectify them plus the issue is a blip rather than a long lasting outage, and these services come with no SLAs or guarantees of up time. Whether as a goodwill gesture they would let you leave early is of course only something they can tell you.

It has been a bit disappointing of late, but A&A are probably more disappointed than we are, and at least we know what is happening and are kept up to date.


perlen
(newbie) Sat 03-Feb-24 15:32:38

Re: Poor uptime and reliability

[re: E300] [link to this post]

Just had another unexpected outage, I had to reboot my router.
My connection now terminates with u.gormless.thn.aa.net.uk which unfortunately seems very laggy.

perlen
(newbie) Sat 03-Feb-24 17:01:27

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Just had another outage, when it was back up I logged into the control pages.

A pin has been added: "LNS Kill requested by andrew"
The status page https://aastatus.net/42617 explains:

INITIAL
3¼ hours ago by Andrew
At 14:00 the Y.Witless LNS locked up, causing customers connected to it to drop and reconnect a few minutes later.

UPDATE
31 minutes ago by Andrew
Most lines reconnected by 14:11

UPDATE
29¼ minutes ago by Andrew
Some customers had re-connected to the "U.Gormless" LNS - which doesn't have as much throughput capacity as the Witless LNSs - in order to ease congestion we will manually force some customers to move off U.Gormless by way of a PPP kill - this will force the customer's router to reconnect causing a short outage (typically less than a minute)

RESOLUTION
6½ minutes ago by Andrew
The lock-up of Y.Witless was unfortunate as it did cause a disruption to some of our customers this afternoon, and we had hoped that the work done a week ago to Y.Witless would have helped prevent this hang. However, Y.Witless is out of service and in its locked state, where our developers can connect to its CPUs and see if they can gain more information.

I am now back on z.witless.thn.aa.net.uk

Edited by perlen (Sat 03-Feb-24 17:29:51)

jpm
(fountain of knowledge) Sat 03-Feb-24 21:56:27

Re: Poor uptime and reliability

[re: perlen] [link to this post]

I'd probably be off at this point. A nice idea to run your own hardware but it seems like it's not working as planned.

perlen
(newbie) Mon 05-Feb-24 17:37:24

Re: Poor uptime and reliability

[re: jpm] [link to this post]

And it goes down again!
Right in the middle of a Teams meeting the Mrs was in (WFH)

REFERENCE
42618 / AA42618
PERMALINK
https://aastatus.net/42618
INFORMATION
At around 17:20 the Z.Witless LNS hung, causing customers on it to drop and reconnect.

Edited by perlen (Mon 05-Feb-24 17:37:44)

andrewhearn
(isp) Mon 05-Feb-24 18:03:35

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Sorry Perlen, and others affected by this.

The post https://aastatus.net/42608 has more information about the problem.

Due to the work we did last week we have been able to gain more information about today's (and Saturday's) lockup than we have been able to in the past. This low-level data comes from the CPU, memory and other hardware on the system whilst it's in the 'hung' state. This is being analysed and is providing clues, but work analysing this is still ongoing.

Andrew Hearn
GM, A&A
aa.net.uk 033 33 400 999

The above post has been made by an ISP REPRESENTATIVE (although not necessarily the ISP being discussed in the post).

candlerb
(knowledge is power) Mon 05-Feb-24 20:16:13

Re: Poor uptime and reliability

[re: andrewhearn] [link to this post]

"We are not only an Internet Service Provider. We also design and build our own routers under the FireBrick brand."

I do wish you success in fixing this. Although I'm not an AAISP customer, it does sound like the brand risks becoming badly tainted by this saga.

I wonder if the older generation Firebricks are still around? Could these be used for live customer traffic, whilst the newer generation are used with an opt-in pool of beta testers?

E300
(committed) Tue 06-Feb-24 08:42:32

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

In reply to a post by candlerb:
I wonder if the older generation Firebricks are still around? Could these be used for live customer traffic, whilst the newer generation are used with an opt-in pool of beta testers?

As I understand it the older generation Firebricks are still in use but for slower customers (<80Meg). When I first joined I found myself connecting to the older LNS's (Gormless) even though I was a 1000/100 customer. How did I know? Speeds were only up to around 300Meg. This was resolved very quickly by contacting them and a reconnect saw me on the new LNS's and speeds as expected. So it seems to me the older kit struggles with faster connections and so can't be used as a fallback for faster services.

They are now upgrading some of the older LNS's to the newer ones, the idea, they said, being that a drop of one would not affect as many customers. I'm not sure that logic works though: if the LNS's are all equally prone to locking up, then it doesn't matter how many of them there are, lockups will affect the same number of customers overall, just in smaller batches spread over more bits of kit. Perhaps they need the extra capacity because locked-up boxes remain out of use while they are being debugged.

I'm sure it will all be sorted out soon now they have some debug data from the locked up boxes. Be interesting to find out the cause.


Edited by E300 (Tue 06-Feb-24 08:44:09)

bellerby
(newbie) Tue 06-Feb-24 09:56:31

Re: Poor uptime and reliability

[re: E300] [link to this post]

The older generation Firebricks have a 1Gb backplane and core connection, hence are no good for the faster connections, whereas the latest Firebrick 9000 has a 10Gb backplane and core connection. What I find puzzling is that of the 3 original Witless 9000s (X, Y, Z), whilst Y and Z have locked up fairly regularly, X (which I happen to be connected to) has an uptime of around 82 days. Assuming they are all running the same software/firmware then does this imply some very low down hardware/component/pcb build issue?

E300
(committed) Tue 06-Feb-24 10:27:29

Re: Poor uptime and reliability

[re: bellerby] [link to this post]

In reply to a post by bellerby:
The older generation Firebricks have a 1Gb backplane and core connection, hence are no good for the faster connections, whereas the latest Firebrick 9000 has a 10Gb backplane and core connection. What I find puzzling is that of the 3 original Witless 9000s (X, Y, Z), whilst Y and Z have locked up fairly regularly, X (which I happen to be connected to) has an uptime of around 82 days. Assuming they are all running the same software/firmware then does this imply some very low down hardware/component/pcb build issue?

I'm now on X as well, which also gets me the lowest latency; hopefully there I remain.

Yes I would think if all else is the same it seems to suggest some low level hardware issue, which could be anything of course. I'm sure they've compared the differences between X and the others to try and pinpoint what might be up. X might well have the same issue, but differences in component tolerances might just be enough to keep it far enough away from whatever 'cliff edge' the other hardware falls over.

I'm glad I'm not the person having to sort it out!


j0hn83
(knowledge is power) Tue 06-Feb-24 11:36:25

Re: Poor uptime and reliability

[re: E300] [link to this post]

They have recently moved customers off of gormless a, b & c with the intention of replacing these with more FB9000's tomorrow.

https://aastatus.net/42616

j0hn83
(knowledge is power) Tue 06-Feb-24 11:41:32

Re: Poor uptime and reliability

[re: bellerby] [link to this post]

In reply to a post by bellerby:
Assuming they are all running the same software/firmware then does this imply some very low down hardware/component/pcb build issue?

As someone who manufactures Pcb's on a daily basis, I think it's unlikely to be a pcb build issue.
The Firebricks are manufactured on a very small scale so they won't do small test samples from batches. The pcb manufacturer will fully test each board including AOI (automatic optical inspection) and a whole suite of electrical testing (including things like flying probe testing).
Most hardware issues come from components added to the finished pcb.

The Firebrick PCB is nowhere near as complex as some of the other circuit boards made today (smaller tracks, tighter spacing, many more layers, thousands more tiny via's between layers).

I have no idea where Firebrick have their PCB's made but I know they are made in the UK.
Pretty much all UK-based PCB manufacturing is low volume, high precision, quick turnaround work. Anything with high volume gets made overseas.

I would put my money on it being a software issue causing them to hang or an issue with a component on the board (chip/ram etc).

candlerb
(knowledge is power) Tue 06-Feb-24 13:02:31

Re: Poor uptime and reliability

[re: j0hn83] [link to this post]

In reply to a post by j0hn83:
They have recently moved customers off of gormless a, b & c with the intention of replacing these with more FB9000's tomorrow.

https://aastatus.net/42616

Let me check I understand this. They're moving all remaining customers off the original stable and reliable LNSes, onto the new ones which have a crash rate of 2 in 3? Before they've fully diagnosed the problem, which they can't reproduce in the lab, but only happens when live customers are put on them?

j0hn83
(knowledge is power) Tue 06-Feb-24 13:11:24

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

In reply to a post by candlerb:
In reply to a post by j0hn83:
They have recently moved customers off of gormless a, b & c with the intention of replacing these with more FB9000's tomorrow.

https://aastatus.net/42616

Let me check I understand this. They're moving all remaining customers off the original stable and reliable LNSes, onto the new ones which have a crash rate of 2 in 3? Before they've fully diagnosed the problem, which they can't reproduce in the lab, but only happens when live customers are put on them?

That's certainly how I'm reading it.

Beta testing is usually voluntary but I guess not always.

Edit: don't think it's all customers, but certainly 3 Firebricks worth

On the status page linked further up by Andrew they write

The enlarged pool of LNSs will also reduce the number of customers affected if there is a lock-up of one LNS.

That only makes sense if they deploy more FB9000's to the current customers on those devices. Throwing in more customers from the older, stable Firebricks will just mean those customers are now more likely to be affected.

Why not add the extra FB9000's, spreading out the witless LNS load, but leave the stable gormless LNS's as they are? Probably down to the cost of rack space.

I don't intend on lecturing A&A on their network 😂 they are smarter than me. It just seems a bit backwards. Credit for the transparency though.

Edited by j0hn83 (Tue 06-Feb-24 13:24:39)

E300
(committed) Tue 06-Feb-24 13:17:00

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

In reply to a post by candlerb:
Let me check I understand this. They're moving all remaining customers off the original stable and reliable LNSes, onto the new ones which have a crash rate of 2 in 3? Before they've fully diagnosed the problem, which they can't reproduce in the lab, but only happens when live customers are put on them?

As I understand it, they are decommissioning several Gormless LNSs (on the stable hardware) and moving the customers over to other Gormless LNSs still on stable hardware, as they said they have plenty of capacity for slower customers (<80Meg).

I suspect they are doing this to free up rack space at the datacentre to then install a few more newer LNSs. Only customers already needing to connect to the newer LNSs (>80Meg) will connect to these upgraded ones. So no extra customers are using the troublesome new LNSs than they were before. The idea is if an LNS drops, then as there are fewer customers on each LNS then not so many people are affected by a single LNS crash.

I guess you could ask whether it is a good idea to add more problematic kit into the mix. Also, if the new LNSs each have the same probability of a crash, you aren't having fewer customers affected, just smaller batches of customers dropping more often.
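To put rough numbers on that argument, here is a small Python sketch. It assumes every LNS has the same independent chance of crashing in a given period and that customers are spread evenly; the customer count and crash probability below are made up purely for illustration and are not A&A figures.

```python
from typing import Tuple


def expected_impact(customers: int, lns_count: int, crash_prob: float) -> Tuple[float, float, float]:
    """Return (customers per LNS, expected crashes, expected customer drops).

    crash_prob is the assumed chance that any one LNS crashes during the
    period, taken to be identical and independent for every LNS.
    """
    per_lns = customers / lns_count              # batch size hit by one crash
    expected_crashes = lns_count * crash_prob    # more boxes, more crashes
    expected_customer_drops = expected_crashes * per_lns
    return per_lns, expected_crashes, expected_customer_drops


if __name__ == "__main__":
    # Hypothetical: 3000 fast customers, 25% crash chance per LNS per period.
    for n in (3, 6):
        print(n, expected_impact(3000, n, 0.25))
```

Under those assumptions, doubling the pool halves the batch size and doubles the expected number of crashes, so the expected total of customer drops stays the same. The picture only changes if the per-box crash probability itself depends on load or on which unit you happen to be on.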


Edited by E300 (Tue 06-Feb-24 13:20:26)

bellerby
(newbie) Tue 06-Feb-24 15:42:24

Re: Poor uptime and reliability

[re: E300] [link to this post]

I guess another possibility is that the issue could be load related. The FB9000 was first introduced into the AA network in early 2022. I don't have any recollection of these lock ups being an issue during the first 12 months or so. Since early 2022 I would assume that a lot of the installed base would have migrated from ADSL/VDSL to FTTP. This is probably why A&A are now oversubscribed with the older FBs, and no doubt the load being placed on the 9000s will be inexorably increasing, both in number of connections and total throughput. Possibly the move to expand the pool of 9000s is to rule out this possibility.

cjn
(learned) Tue 06-Feb-24 17:07:59

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Please excuse my intrusion into others' grief, but I have a simple question that may be completely unrelated. I have Fibre BB from another supplier and have no intention of moving it to A&A. However I am considering moving just my current DV line to A&A. Is any of the current trouble likely to have any impact on my phone connection?

candlerb
(knowledge is power) Tue 06-Feb-24 18:24:21

Re: Poor uptime and reliability

[re: cjn] [link to this post]

No, it will be completely affected. The equipment being discussed, known as LNS or BNG, is what terminates broadband connections coming into their network. The traffic between their voice servers and the Internet won't go through these.

andrewhearn
(isp) Tue 06-Feb-24 20:52:40

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
As I understand it, they are decommissioning several Gormless LNSs (on the stable hardware) and moving the customers over to other Gormless LNSs still on stable hardware, as they said they have plenty of capacity for slower customers (<80Meg).

This is correct.

We have 20 or so of the FireBrick FB6000 LNSs which have more than enough capacity for the lower speed customers we have.

Once we've replaced a, b, c, d Gormless with FireBrick FB9000s they will then be put in to service for the faster-speed customers.

This isn't because the existing LNSs are overloaded (we don't believe load is the cause of the lock-ups), but to spread the load so fewer customers are affected.

Happy to answer more questions if there are any!

Andrew Hearn
GM, A&A
aa.net.uk 033 33 400 999


andrewhearn
(isp) Tue 06-Feb-24 21:20:29

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

In reply to a post by candlerb:
No, it will be completely affected. The equipment being discussed, known as LNS or BNG, is what terminates broadband connections coming into their network. The traffic between their voice servers and the Internet won't go through these.

I think you mean: No, it will be completely unaffected

Andrew Hearn
GM, A&A
aa.net.uk 033 33 400 999


bellerby
(newbie) Tue 06-Feb-24 21:32:21

Re: Poor uptime and reliability

[re: andrewhearn] [link to this post]

Andrew, just to clarify, I wasn’t suggesting the 9000s were overloaded - there would have been obvious performance issues had that been the case. My point was simply that a presumably increasing load may have exposed the weakness. Good luck anyway - I’m sure the issue will be resolved.

cjn
(learned) Tue 06-Feb-24 21:36:35

Re: Poor uptime and reliability

[re: andrewhearn] [link to this post]

Many thanks for your reassurance. OK, back to your regular grievances.

E300
(committed) Wed 07-Feb-24 09:19:55

Re: Poor uptime and reliability

[re: andrewhearn] [link to this post]

In reply to a post by andrewhearn:
This is correct.

We have 20 or so of the FireBrick FB6000 LNSs which have more than enough capacity for the lower speed customers we have.

Once we've replaced a, b, c, d Gormless with FireBrick FB9000s they will then be put in to service for the faster-speed customers.

This isn't because the existing LNSs are overloaded (we don't believe load is the cause of the lock-ups), but to spread the load so fewer customers are affected.

Happy to answer more questions if there are any!

Thank you for the update. With regards to X.Witless which appears to be pretty much rock solid, is that different somehow to the other Firebricks and if so has that been able to narrow down the potential issue?


candlerb
(knowledge is power) Wed 07-Feb-24 10:32:49

Re: Poor uptime and reliability

[re: andrewhearn] [link to this post]

In reply to a post by andrewhearn:
I think you mean: No, it will be completely unaffected

Err, yes of course! Too late to edit post now though...

farnz
(member) Wed 07-Feb-24 10:35:38

Re: Poor uptime and reliability

[re: j0hn83] [link to this post]

In reply to a post by j0hn83:
In reply to a post by bellerby:
Assuming they are all running the same software/firmware then does this imply some very low down hardware/component/pcb build issue?

...snip...
I would put my money on it being a software issue causing them to hang or an issue with a component on the board (chip/ram etc).

Given the symptoms described in the status post, I'd be inclined to suspect something wrong with the pullup/pulldowns on the CONFIG pins in the M.2 socket for an NVMe SSD.

If the CPU reads all 4 CONFIG pins as 1s, it should ignore the rest of the pins in the socket; however, if one of them is marginal, and sometimes drops to a 0, the SoC should generate an interrupt and expect the CPU to change pinmux settings to match the CONFIG pins. I can envisage a few tight timing cases where the pinmux setting change causes the CONFIG pins to change again, and the resulting timings tickle bugs that "can't happen" if your design complies with PCIe spec (presence pair are the last two pins to connect, presence must be debounced, delay between presence connecting and hotplug being asserted), or M.2 (no hotplug allowed).
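For anyone unfamiliar with what "debounced" means in that context, below is a generic Python sketch of debouncing a sampled pin. It is purely illustrative: it is not FireBrick firmware, and the sample values, the `stable_count` threshold and the function name are all assumptions for the example.

```python
from typing import Iterable, List


def debounce(samples: Iterable[int], initial: int = 1, stable_count: int = 3) -> List[int]:
    """Return the debounced level after each raw sample.

    A change of level is only accepted once the new level has been seen
    for stable_count consecutive samples, so brief glitches are ignored.
    """
    state = initial       # last accepted level
    candidate = initial   # level currently being observed
    run = 0               # consecutive samples seen at the candidate level
    out: List[int] = []
    for s in samples:
        if s == candidate:
            run += 1
        else:
            candidate = s
            run = 1
        if candidate != state and run >= stable_count:
            state = candidate
        out.append(state)
    return out


if __name__ == "__main__":
    # A marginal pin that should read 1 but briefly glitches to 0,
    # followed by a genuine, sustained change to 0.
    raw = [1, 1, 0, 1, 1, 0, 0, 0, 1]
    print(debounce(raw))
```

The single-sample glitch is filtered out and only the sustained run of 0s is accepted as a real change. Without that filtering, each glitch on a marginal pin would look like a genuine change of state to whatever reacts to it.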

devonkev
(newbie) Thu 08-Feb-24 10:51:37

Re: Poor uptime and reliability

[re: perlen] [link to this post]

For an ISP to be this open and transparent is unheard of.

Most other ISPs would just stay silent, in the hope that you go away, making you think BT or TalkTalk is the culprit. In the meantime, trying to fix things in the background.

The benefit here is that it’s all in house (or they work very closely with Firebrick) and from what has been put out, it seems they are competent and confident in what they do.

Although I’m sure it is annoying, at least you can fall over to another LNS.

My advice is to stick with a company who clearly know what they are doing. All companies have a rough patch, and this is one of them, I’m sure with your support they’ll be able to ride through it.

I’ve used their L2TP service in the past with Starlink and found it great.

Rhynchelma
(member) Thu 08-Feb-24 12:36:32

Re: Poor uptime and reliability

[re: devonkev] [link to this post]

Excellent point.

perlen
(newbie) Tue 27-Feb-24 12:35:21

Re: Poor uptime and reliability

[re: Rhynchelma] [link to this post]

Just lost my SSH session to work...

https://aastatus.net/42629
Lines on the X.Witless LNS were affected, sessions are recovering.

Edited by perlen (Tue 27-Feb-24 12:35:37)

bellerby
(newbie) Tue 27-Feb-24 13:02:37

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Not only that - my long standing connection to x.witless was abruptly ended just before 02:00 this morning after 2 drops in quick succession. These are clearly shown on the blip graph but have not yet been commented upon. Now on c.gormless.

E300
(committed) Tue 27-Feb-24 13:21:34

Re: Poor uptime and reliability

[re: bellerby] [link to this post]

I had my connection drop due to a local power problem. Up to then I had spent tens of days on the more reliable x.witless, but I connected back up to one of the new gormless LNS's, which then crashed a few days later. I was a bit annoyed not being on x.witless anymore as that seemed pretty stable; alas, it seems it just had a good run.

There don't seem to have been any updates for a while about the debug logs they've captured or whether they are nearer to a fix; perhaps no news means they are no nearer.

I wonder if they have a plan B, for example buying in some alternative hardware?

They say on the service status page that x.witless likely crashed due to not having the newer software and an NVMe drive fitted, yet ironically it's stayed up for a very long time, and LNS's that have already been upgraded with an NVMe drive fitted have reached nowhere near that length of uptime before crashing. So fitting an NVMe drive doesn't appear to have improved stability that I can see.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Tue 27-Feb-24 13:27:15)

qazwsxedc
(newbie) Tue 27-Feb-24 14:07:35

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
I wonder if they have a plan B, for example buying in some alternative hardware?

There's plenty of COTS hardware capable of doing this job, but switching to some other make of core router would be the worst possible advertising for the Firebricks. That may be why they're still persisting with them.

candlerb
(knowledge is power) Tue 27-Feb-24 15:47:49

Re: Poor uptime and reliability

[re: qazwsxedc] [link to this post]

A dual-vendor strategy could be wise. Along with an opt-in beta testers group.

E300
(committed) Thu 29-Feb-24 12:43:13

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Some overnight work tonight (early hours of 1st March), so some drops and shuffling about again. They are also separating out CityFibre and BT/TalkTalk customers so we are on separate LNS's and don't mix :)

They've not said if they've found a problem and the software update is a fix for the drops, still it's not easy trying to fix something you can't replicate at will.

https://aastatus.net/42630

AAISP BQM - IPv6 BQM - IPv4

jalzoo
(learned) Wed 06-Mar-24 10:35:15

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Was with AAISP for 3 months via City Fibre and I can honestly say it was the most unreliable, expensive, overrated ISP I have ever been with. I changed over to IDNet and I have had absolutely no problems at all, plus I'm saving money. Happy days.

XGS_Is_On
(committed) Wed 06-Mar-24 12:46:48

Re: Poor uptime and reliability

[re: jalzoo] [link to this post]

In reply to a post by jalzoo:
Was with AAISP for 3 months via City Fibre and I can honestly say it was the most unreliable, expensive, overrated ISP I have ever been with. I changed over to IDNet and I have had absolutely no problems at all, plus I'm saving money. Happy days.

It might be cheaper and more stable but you don't have Continuous Quality Monitoring anymore.

jalzoo
(learned) Wed 06-Mar-24 15:10:31

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

In reply to a post by XGS_Is_On:
It might be cheaper and more stable but you don't have Continuous Quality Monitoring anymore.

I'd take a £13 a month reduction and a stable connection over quality monitoring any day of the week. If I were really that interested in monitoring the quality of my connection I would just use the free ones out there.

Edited by jalzoo (Wed 06-Mar-24 15:11:13)

perlen
(newbie) Sat 09-Mar-24 19:07:14

Re: Poor uptime and reliability

[re: jalzoo] [link to this post]

Still not fixed...

https://aastatus.net/42636
Customers on the X.Witless LNS dropped and reconnected at 11:35 today.

bellerby
(newbie) Sun 10-Mar-24 07:11:23

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Indeed not. However, they have split off City Fibre connections from the rest. I'm on BTW and currently connected to g.gormless. So possibly BTW/TTB connections are on the 4 gormless LNS with City Fibre on the 3 witless LNS. Further conjecture is the thought that maybe the issue is triggered by CF. Looking at the history, the issue does appear to have started after taking on CF. Just a thought.

E300
(committed) Sun 10-Mar-24 08:40:47

Re: Poor uptime and reliability

[re: bellerby] [link to this post]

In reply to a post by bellerby:
Indeed not. However, they have split off City Fibre connections from the rest. I'm on BTW and currently connected to g.gormless. So possibly BTW/TTB connections are on the 4 gormless LNS with City Fibre on the 3 witless LNS. Further conjecture is the thought that maybe the issue is triggered by CF. Looking at the history, the issue does appear to have started after taking on CF. Just a thought.

I did wonder as well if they thought the City Fibre traffic was somehow causing a bug, hence the separation of traffic. One thing seems clear: X.Witless had the longest uptime of all of them without an NVMe drive, yet with an NVMe drive it has crashed after a very short uptime, and the others appear no more stable with an NVMe drive either. That suggests the drive either isn't playing a part in the stability, or fixes a bug they've seen with artificial load testing that isn't the same bug causing the issues on live.

AAISP BQM - IPv6 BQM - IPv4

qazwsxedc
(newbie) Sun 10-Mar-24 09:54:30

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
In reply to a post by bellerby:
So possibly BTW/TTB connections are on the 4 gormless LNS with City Fibre on the 3 witless lns.

I did wonder as well if they thought the City Fibre traffic was somehow causing a bug, hence the separation of traffic. One thing seems clear, given X.Witless had the longest uptime out of all [...]

Counterexample: Openreach FTTP here, on x.witless until yesterday, then reconnected to z.witless.
Crash messes up BQM, too, with history from preceding midnight to crash time becoming inaccessible. Reliability and BQM are two unique selling points of AA, and both are negatively affected by these crashes. I'm sure they are aware of this, but IMNSHO the steps taken to fix this so far haven't worked, therefore different steps would now be advisable.

E300
(committed) Sun 10-Mar-24 11:11:57

Re: Poor uptime and reliability

[re: qazwsxedc] [link to this post]

In reply to a post by qazwsxedc:
Counterexample: Openreach FTTP here, on x.witless until yesterday, then reconnected to z.witless.
Crash messes up BQM, too, with history from preceding midnight to crash time becoming inaccessible. Reliability and BQM are two unique selling points of AA, and both are negatively affected by these crashes. I'm sure they are aware of this, but IMNSHO the steps taken to fix this so far haven't worked, therefore different steps would now be advisable.

So in that case, that blows the theory that City Fibre connections might somehow be triggering a bug, if that was what they were thinking, as X.Witless appears to be an Openreach LNS.

Seems to be a lack of updates currently, presumably because they are no further forward.

AAISP BQM - IPv6 BQM - IPv4

perlen
(newbie) Fri 15-Mar-24 08:44:35

Re: Poor uptime and reliability

[re: E300] [link to this post]

Goodbye x.witless, thank you for your service:

Mar 15, 07:40 AM
https://aastatus.net/42643

1½ hours ago by Andrew
At 7:30AM, the X.Witless restarted causing customers on it to drop and reconnect.

UPDATE
1 hour ago by Andrew
Lines reconnected by 7:33

RESOLUTION
1 hour ago by Andrew
This was related to https://aastatus.net/42608 This LNS is now out of service and will be analysed by our developers.

E300
(committed) Fri 15-Mar-24 10:25:07

Re: Poor uptime and reliability

[re: perlen] [link to this post]

The irony is X.Witless was the most reliable LNS. They then put an NVMe drive in it like the others, as they suspected the slot left empty was causing some instability, and now X has been the least reliable just recently.

We've had no updates either for a while so that kind of points to them not being any further forward. I really want to be hearing about plan B's now as this has gone on for months.

AAISP BQM - IPv6 BQM - IPv4

E300
(committed) Tue 26-Mar-24 10:21:03

Re: Poor uptime and reliability

[re: E300] [link to this post]

Seems it was a rough night as I awoke to a string of emails of link up and down, which turns out to be due to BT testing some links: https://aastatus.net/42648 Seems pretty poor of BT to do that kind of testing with no notice. Found myself on t.gormless with latency up to 9.5ms from 7.5ms the day before, which was g.gormless I think. I did a drop and reconnect this morning and am back on x.witless with latency at 6.5ms, which feels a lot nicer :) For some reason only x.witless gets my latency that low.

It also seems at some point this week new firmware is going up to the LNS boxes that may contain a fix for the lockups, currently the LNS boxes are running on older but stable firmware hence we've not had any crashes recently, that's according to the latest update at https://aastatus.net/42647 The status note isn't clear but it seems to imply they are all being updated, I would hope they are just updating one or two first in case it makes things worse and not better. Fingers crossed it fixes things.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Tue 26-Mar-24 10:22:04)

jaydub
(fountain of knowledge) Tue 26-Mar-24 18:11:25

Re: Poor uptime and reliability

[re: E300] [link to this post]

I think they were doing LNS full upgrades last night according to the updates on https://aastatus.net/42647

The last two updates were:

UPDATE
1 day ago by Andrew
These upgrades are now scheduled to happen this week (Tues 26th through to Thursday 28th) from 3AM.

UPDATE
1¾ hours ago by Andrew
Due to BT Planned work on one of our hostlinks, https://aastatus.net/42640 we will not be scheduling any upgrades for the early hours of Wednesday 27th March.

So I think they are only avoiding doing the upgrades this evening, but were scheduled to do them last night and there may be further to come tomorrow night.

I guess this might have been disrupted by the BT link issue, but we won't know for certain until there is a further update on: https://aastatus.net/42647

E300
(committed) Mon 08-Apr-24 09:00:42

Re: Poor uptime and reliability

[re: jaydub] [link to this post]

Well we had drops yesterday (Sunday) which don't seem to get flagged up now on the service status page, that put me onto a gormless LNS due for an update which then meant I got knocked off again overnight! Currently sitting on y.witless.

Not sure what the current status is for the latest fixes as the information seems a bit fragmented on the status page, but I think I read it as they've had to roll back to the last firmware that was stable, so haven't yet found the cause.

AAISP BQM - IPv6 BQM - IPv4

perlen
(newbie) Mon 08-Apr-24 09:07:47

Re: Poor uptime and reliability

[re: E300] [link to this post]

Yes, 10am Sunday I lost connection too.
No explanation from AAISP:

https://aastatus.net/recent.cgi

E300
(committed) Thu 11-Apr-24 08:32:57

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Another drop last night before 1am. Not sure what the cause was as nothing on the service status page. It seems drops are becoming the norm and so don't get a mention anymore on the service status pages.

It's been about 5 months since the problems started and no closer to a fix it seems. We keep going through cycles of upgrades then downgrades all of which cause more drops on top of the random crashes.

I would suggest they move to just having one live LNS they test the fix on for the crashes and keep the rest on the stable firmware, and allow their customers to choose to be BETA testers and connect to this test LNS. In order to get a decent number on it for a real world test, as an incentive give people a discount to become a BETA tester. That way they could also update this LNS more frequently and during more sociable hours, as I bet the person having to do these updates and downgrades is fed up with doing it in the dead of night.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Thu 11-Apr-24 08:33:57)

Sun4Lw5LIQy
(newbie) Thu 11-Apr-24 08:40:32

Re: Poor uptime and reliability

[re: E300] [link to this post]

For a company that tries to pride itself on transparency, the ball really has been dropped recently. The FireBrick platform has some good features but the trade off is an unstable internet connection at a premium cost. I’m rooting for the team to get things fixed but they might have to go back to the drawing board and admit the current platform isn’t going to work for customers. Noticeable speed drops, connection drops and lack of transparency are making me heavily reconsider who I go with next. It won’t be A&A.

E300
(committed) Thu 11-Apr-24 16:46:27

Re: Poor uptime and reliability

[re: Sun4Lw5LIQy] [link to this post]

In reply to a post by Sun4Lw5LIQy:
For a company that tries to pride itself on transparency, the ball really has been dropped recently. The FireBrick platform has some good features but the trade off is an unstable internet connection at a premium cost. I’m rooting for the team to get things fixed but they might have to go back to the drawing board and admit the current platform isn’t going to work for customers. Noticeable speed drops, connection drops and lack of transparency are making me heavily reconsider who I go with next. It won’t be A&A.

Yes I agree and I'm out of contract and wondering how long I stick with it. It is starting to irk me that I'm paying a premium but getting that less and less reflected in the product and service. This last drop like the one before has gone completely unacknowledged and no real updates on this issue in a while now.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Thu 11-Apr-24 18:03:07)

bellerby
(newbie) Thu 11-Apr-24 18:01:41

Re: Poor uptime and reliability

[re: E300] [link to this post]

Couldn't agree more. Unfortunately I'm still under contract but things will have to improve for me to stay. For all we know the latest incident could have happened on the "more stable" software. The lack of any update is very concerning.

candlerb
(knowledge is power) Fri 12-Apr-24 09:28:50

Re: Poor uptime and reliability

[re: bellerby] [link to this post]

I'm not an AAISP customer, and I'm very unlikely to consider them in future after this.

It sounds to me like AAISP now have a tough decision to make. They have to decide whether they are primarily a router hardware vendor, with an ISP on the side to act as a large group of (paying) beta testers; or primarily an ISP, whose job is to provide top-rate Internet connectivity.

If they want to be the latter, they have to acknowledge that Firebrick isn't currently "best of breed" when it comes to LNS/BRAS, and they need a second vendor to provide actual service while they sort out their problems, or at least to roll back to the older hardware.

The decision to remove the older, slower but reliable LNSes (which were providing service to the customers on lower speed connections), and replace them with a model which is faster but known to be unstable, was madness IMO.

jpm
(fountain of knowledge) Fri 12-Apr-24 11:53:28

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

It's at least time to put out a blog post with an update as to where they are currently and what the plans for resolving this look like. I think the Firebrick is an ARM appliance so they can't even run the software on something else while they debug the hardware.

j0hn83
(knowledge is power) Fri 12-Apr-24 13:56:21

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

In reply to a post by candlerb:
The decision to remove the older, slower but reliable LNSes (which were providing service to the customers on lower speed connections), and replace them with a model which is faster but known to be unstable, was madness IMO.

Those on the slower, more reliable FB6000 LNS's (nicknamed gormless I believe) were moved to other FB6000's. The FB6000's were swapped with the newer troublesome FB9000's (nicknamed witless).
So nobody was moved from an FB6000 to an FB9000, but it added additional FB9000's to spread the witless LNS load.

So that particular move was sensible in my opinion, though the whole saga seems a bit of a s*** show.
Essentially everyone on a package above 80Mb is a beta tester.

E300
(committed) Fri 12-Apr-24 14:30:52

Re: Poor uptime and reliability

[re: j0hn83] [link to this post]

In reply to a post by j0hn83:
but it added additional FB9000's to spread the witless LNS load.

I wonder if they needed more of these LNSs because the firmware they are calling factory stable may be a very early one? From memory of status updates over the last 16 months or so, the early firmwares were not optimised for the new processors (something to do with only running on one core or not being multi-threaded enough), but were stable.

So if they had to go back to this non-optimised but stable firmware, then it would explain why they have had to throw more boxes at it to make up for the lower performance, as they presumably have more customers on the faster packages now and also have CityFibre with symmetrical connections they didn't have before.

That isn't what they told us at the time. They suggested more boxes would mean fewer people would be affected by a lock up, but more boxes each with the same probability of crashing would, over time, see the exact same number of customer drops in total: each crash hits fewer people, but there are proportionally more crashes. So I couldn't see the logic in that.
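To put rough numbers on that reasoning, here is a minimal sketch in Python. The customer count and crash rate are made-up figures, not anything A&A has published, and it assumes every box crashes independently at the same rate regardless of how many customers it carries.

```python
def customers_hit_per_crash(total_customers, boxes):
    """Customers dropped by a single crash, with customers spread evenly."""
    return total_customers / boxes


def expected_customer_drops(total_customers, boxes, crashes_per_box_per_year):
    """Expected customer drops per year summed over the whole LNS pool."""
    per_crash = customers_hit_per_crash(total_customers, boxes)
    return boxes * crashes_per_box_per_year * per_crash


# Hypothetical figures for illustration only.
TOTAL_CUSTOMERS = 12000
CRASHES_PER_BOX_PER_YEAR = 4.0

for boxes in (3, 6):
    per_crash = customers_hit_per_crash(TOTAL_CUSTOMERS, boxes)
    total = expected_customer_drops(TOTAL_CUSTOMERS, boxes, CRASHES_PER_BOX_PER_YEAR)
    print(f"{boxes} boxes: {per_crash:.0f} customers per crash, "
          f"{total:.0f} customer drops per year")
```

Under those assumptions, doubling the number of boxes halves the number of customers hit by each crash but doubles the number of crashes, so the yearly total is unchanged. The picture would only improve if the per-box crash rate itself fell with lighter load, which is not something the status posts have claimed.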

They've now had an L2TP router lock up (https://aastatus.net/42655) and drop a lot of customers. They blamed that on an early FB9000 prototype and are replacing (or have replaced) it with a new box, but I'm sure they said that about the LNS's and replaced the hardware with production kit, which we know didn't fix the issue. It would seem these new Firebricks are just not stable full stop, and I really hope they have a plan B.

Of course with the lack of any up to date information we are all reading things into what is going on and perhaps not coming to the correct conclusions.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Fri 12-Apr-24 14:48:23)

candlerb
(knowledge is power) Fri 12-Apr-24 15:11:04

Re: Poor uptime and reliability

[re: j0hn83] [link to this post]

In reply to a post by j0hn83:
Those on the slower, more reliable FB6000 LNS's (nicknamed gormless I believe) were moved to other FB6000's. The FB6000's were swapped with the newer troublesome FB9000's (nicknamed witless).
So nobody was moved from an FB6000 to an FB9000, but it added additional FB9000's to spread the witless LNS load.

Ah yes, thank you: re-reading the thread it was made clear earlier on. There is a slightly smaller FB6000 pool as a result, and slightly less headroom/redundancy.

E300
(committed) Fri 12-Apr-24 15:13:23

Re: Poor uptime and reliability

[re: E300] [link to this post]

Just to add, there is some more information here https://social.aa.net.uk/public/local covering the issues and troubleshooting. It would be good if they added this link to the service status pages; it's a bit more chatty and verbose in the information it provides.

AAISP BQM - IPv6 BQM - IPv4

aabloor
(newbie) Fri 12-Apr-24 16:40:50

Re: Poor uptime and reliability

[re: E300] [link to this post]

Hi all,

Thanks for all the feedback given in this thread.

We do appreciate it, and we know that our recent reliability for some customers has been unacceptable. I wanted to set out a bit more of the story, mainly for transparency rather than because we expect it to be "mitigation" in most people's minds.

This post refers to and updates a status post originally made at :

https://aastatus.net/42608

This is where our two roles, that of an ISP with broadband customers and that of a hardware manufacturer, meet each other head-on and, unfortunately and uncomfortably, collide.

To be abundantly clear, we are very sorry for the outages some customers have suffered. This falls below the standards we set ourselves. We are not happy about it, and a lot of effort is going into sorting it.

The story since
---------------

Several plausible causes have been found, fixed and tested in our testing process (before deploying live). Many of these will have fixed genuine problems, but not solved what appears to be the "main" issue.

Almost all of these have been at the meeting point between hardware and software. The problem with a hardware hang is that far less diagnostic information is available to assist with debugging.

On several go-arounds now, we have genuinely believed that the issue had been found and fixed, tested in our test-rig offline, and therefore we were keen to place the firmware in active use; the thought being that the sooner it was rolled out, the sooner the unreliability would disappear.

But then, some time after being put live, an FB9000 would suffer another hang. The nature of the hang has been unpredictable (i.e. when it would happen); sometimes taking days or weeks to surface. Meanwhile, until it did hang, we still believed the problem had been solved.

"Why not Cisco?"
----------------

Some customers have quite reasonably asked why we do not employ (even temporarily) a 3rd party hardware vendor as our LNS supplier, such as Cisco. This is an option, but the costs of implementation (in time and money) we still feel would be better spent on active R&D to resolve this problem.

We do still believe strongly that the FB9000, when stable, offers us features that distinguish our service from the service of almost all others. Simply, we want bonding, CQM graphs, low power consumption, etc.

It is part of what makes our ISP offering different and better; our USP.

Other issues
------------

Within this same time frame, we have had multiple instances of BT Wholesale doing planned work which they had not told us about in advance (and apparently had not told other ISPs about, either). We could have zeroed the impact of their planned work, had they told us they were doing it beforehand.

Multiple times we have raised this with our account manager and at higher levels, and we still have not had a satisfactory response. Of course, no wholesale network is 100% reliable; we are not unreasonable about this, but the combined appearance, especially to customers not following matters closely, is that it's "another LNS blip". Unlucky timing, which would be bad any time, but happens to be far worse just now.

A change of plan
----------------

Historically, our October "Factory" firmware has been stable. The hangs we have seen have all occurred in releases prior to that one, or since that one. That release did have at least one major fix in it, addressing a hardware hang (the PCI/NVMe issue).

Our immediate decision is to therefore put all "live" production FB9000 hardware back onto the October "Factory" release, except for our test LNS. To this end, we have already rolled back almost all live LNSs.

Assistance requested if you're willing
--------------------------------------

We invite and encourage customers who do want to assist with the process of fixing this to prepend "test-" onto their login, which will steer them to the test LNS, and help the effort to fix the problem. Of course this may be less stable than our regular LNS. Email support for more details.

Rounding up
-----------

Hopefully this post shows we are listening, that there is a vast amount of work going on, and that we've taken a different approach, recognising that this state of affairs has remained too long and cannot be carried on.

I recognise that this level of openness is uncommon, but the situation we are in is uncommon; I doubt any other ISP develops its own core equipment.

I politely request that this post is taken for what it is; a genuine offer to :

* explain in more depth
* announce a change of direction
- and -
* apologise for the outages

... and not as an invitation to simply slag off everything we do.

Nothing we do happens by accident or because of a lack of thought, or a lack of awareness, or a cavalier approach to customer well-being. Decisions sometimes do prove to be wrong, but decisions *are* made, and made with the best of intentions.

There are human beings writing the code.
There are human beings in our Ops and Support teams.
And there are human beings managing the business.

Nobody takes this in any other way than "extremely seriously".

Thanks for taking the time to read this, and we are happy to answer any questions, of course.

--- B

---
Bloor
GM, A&A.

Edited by aabloor (Fri 12-Apr-24 16:47:38)

perlen
(newbie) Fri 12-Apr-24 17:10:50

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

Hi Alex, regarding:
"To this end, we have already rolled back almost all live LNSs."

Why wasn't this announced/warned about on A&A Status Page?
A few of us have seen unplanned stuff affecting multiple users without knowing the reason.
Thanks.

PCJM40
(committed) Fri 12-Apr-24 17:20:06

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

I am not an A&A customer but I have to commend you on your unexpectedly transparent and complete response. It's what sets you apart from the masses, and I hope you get to the bottom of this blip.

E300
(committed) Fri 12-Apr-24 17:38:02

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

Thanks for the update.

One question: why update all the LNS's when you think you have fixed it? I would suggest picking the LNS that seems to have had the most lockups and just updating that single one would be the prudent thing to do, then deciding on a period of time it must run crash free before you say it's fixed, and only when it's proved itself update the others. The overnight drops for firmware upgrades/downgrades only add to the perception of things being even more broken.
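The staged approach suggested above could be expressed as a simple gate. This is only a sketch in Python; the LNS name, the 30-day soak period and the `CanaryStatus` fields are all made up for illustration and say nothing about how A&A actually manage upgrades.

```python
from dataclasses import dataclass


@dataclass
class CanaryStatus:
    """State of the single LNS running the candidate firmware (hypothetical)."""
    name: str
    days_on_new_firmware: float
    crashes: int


def ready_for_wider_rollout(canary, soak_days=30.0):
    """Allow the other LNSs to be upgraded only after a crash-free soak period."""
    return canary.crashes == 0 and canary.days_on_new_firmware >= soak_days


# Illustrative values only.
too_early = CanaryStatus(name="test-lns", days_on_new_firmware=12.0, crashes=0)
crashed = CanaryStatus(name="test-lns", days_on_new_firmware=45.0, crashes=1)
proven = CanaryStatus(name="test-lns", days_on_new_firmware=45.0, crashes=0)

for status in (too_early, crashed, proven):
    print(status, "->", ready_for_wider_rollout(status))
```

The soak period would need to be longer than the longest gap previously seen between hangs, otherwise a quiet spell could be mistaken for a fix.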

I will log into the test LNS later today for the weekend, but during the week due to working from home I'll switch back.

I doubt any other ISP develops its own core equipment.

I think you are finding out why they don't :)

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Fri 12-Apr-24 17:38:49)

Rhynchelma
(member) Fri 12-Apr-24 17:46:39

Re: Poor uptime and reliability

[re: E300] [link to this post]

Are all of these dropouts the fault of A&A? Are any of them Openreach or whoever?

E300
(committed) Fri 12-Apr-24 17:58:57

Re: Poor uptime and reliability

[re: Rhynchelma] [link to this post]

In reply to a post by Rhynchelma:
Are all of these dropouts the fault of A&A? Are any of them Openreach or whoever?

Some recent drops were a problem created by suppliers, they typically happen overnight.

The problem we are discussing here has been an issue since November last year, where randomly and often during the day the connection has dropped for a quantity of customers because of a hardware problem with A&A's Firebrick they design themselves.

Apart from the recent BT issue that caused a lot of drops over a single night, I don't recall any other drop that wasn't caused by A&A and this issue, or due to A&A having to upgrade or downgrade firmware in order to try and fix it. To put things into perspective, the drops were not very often, I perhaps had one every few weeks, but less so now as they've reverted to stable firmware.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Fri 12-Apr-24 17:59:48)

Pipexer
(eat-sleep-adslguide) Fri 12-Apr-24 21:59:36

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

Obviously your situation is highly specialist and unique and so I suspect might have some bearing on the following thought, but, this is getting somewhat high profile and I am wondering why there has been no mention or consideration of getting specialist 3rd parties in to put a 2nd pair of eyes on the situation? I could imagine there would be no value in help with the software, but couldn't the hardware have an issue which a specialist could possibly identify?

I know this is potentially highly costly and often does not yield anything productive, however I know based on situations at work where I have been dealing with an outage or problem, if I couldn't make progress on something so high impacting after a certain period of time, I'd be exploring means of getting a 3rd party in even as means of simply reassuring customers and end users.

FWIW, I've not been affected whatsoever by these problems (either it's because I'm on <80Mbps or I just haven't noticed), however, I have been following it out of curiosity and the lack of mention of getting external help is something that to me has been noticeably missing or at least not addressed.

Andrews & Arnold Home ::1 on Draytek 2862ac - Why settle for inferior?

Rhynchelma
(member) Fri 12-Apr-24 22:48:37

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
In reply to a post by Rhynchelma:
Are all of these dropouts the fault of A&A? Are any of them Openreach or whoever?

Some recent drops were a problem created by suppliers, they typically happen overnight.

The problem we are discussing here has been an issue since November last year, where randomly and often during the day the connection has dropped for a quantity of customers because of a hardware problem with A&A's Firebrick they design themselves.

Apart from the recent BT issue that caused a lot of drops over a single night, I don't recall any other drop that wasn't caused by A&A and this issue, or due to A&A having to upgrade or downgrade firmware in order to try and fix it. To put things into perspective, the drops were not very often, I perhaps had one every few weeks, but less so now as they've reverted to stable firmware.

Thanks for the clarification.

xela
(fountain of knowledge) Sat 13-Apr-24 15:41:36

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

Thank you

Those with long memories will know that this isn’t the first time AAISP have had the “why don’t you just use Cisco like everyone else?” thing thrown at them and there do seem to be some good reasons.

jimbof
(committed) Sat 13-Apr-24 19:44:22

Re: Poor uptime and reliability

[re: xela] [link to this post]

Pretty limited utility to the "average" punter, though. The bandwidth accounting features are useful to AAISP with their commercial model, but not really to the customer, other ISPs simply don't charge for bandwidth. Much of the functionality from the CQM side can be achieved using 3rd party services (arguably even more useful, as it takes into account issues in the ISP's transit). And the bonding services must be dying a death now for residential, with rubbish ADSL lines rapidly being something folk don't need to contend with, though maybe more useful to businesses for failover (I think central bonding is a bad idea for that use case, though, better using distinct network providers with local failover).

I raised the question of the stability of the (then beta) LNS HW around a couple of years ago when I joined AAISP, I was a little concerned it could be an issue. As it would happen, over the period I was a customer the LNS behaved impeccably; I left because no matter the quality of service, the anxiety over the quota system and trying to juggle changing quotas to use up my allowance at minimum cost just annoyed me too much.

I'm with Unchained now, who use Cisco for their LNS; performance is great (900Mbps line rate single threads to Clouvider iperf services in UK and often EU) - and no quotas at lower cost than the least expensive AAISP service. Small outfit, great personal service. I certainly don't miss any features from the FB9000 I'm (no longer) connected to. I think it is a tenuous benefit for many, and in hindsight my concern was probably founded in common sense (and I got lucky over my tenure).

E300
(committed) Sun 14-Apr-24 09:35:06

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

In reply to a post by jimbof:
Pretty limited utility to the "average" punter, though. The bandwidth accounting features are useful to AAISP with their commercial model, but not really to the customer; other ISPs simply don't charge for bandwidth. Much of the functionality from the CQM side can be achieved using 3rd party services (arguably even more useful, as it takes into account issues in the ISP's transit). And the bonding services must be dying a death now for residential, with rubbish ADSL lines rapidly becoming something folk don't need to contend with, though maybe more useful to businesses for failover (I think central bonding is a bad idea for that use case, though; better to use distinct network providers with local failover).

I raised the question of the stability of the (then beta) LNS HW around a couple of years ago when I joined AAISP, I was a little concerned it could be an issue. As it would happen, over the period I was a customer the LNS behaved impeccably; I left because no matter the quality of service, the anxiety over the quota system and trying to juggle changing quotas to use up my allowance at minimum cost just annoyed me too much.

I'm with Unchained now, who use Cisco for their LNS; performance is great (900Mbps line rate single threads to Clouvider iperf services in UK and often EU) - and no quotas at lower cost than the least expensive AAISP service. Small outfit, great personal service. I certainly don't miss any features from the FB9000 I'm (no longer) connected to. I think it is a tenuous benefit for many, and in hindsight my concern was probably founded in common sense (and I got lucky over my tenure).

Yes there was a decent spell without issues. I agree that with FTTP there is less of a need for the bespoke monitoring they have which I've never needed. Perhaps they should move to tried and already tested third party kit for their FTTP customers as it seems a bit cheeky to use their customers as unpaid beta testers. Their unique selling point, ironically, is the reason I'm now looking at other options.

Their usage caps, well I'm borderline between the two options so just opt for the higher tier so I don't find myself checking usage towards the end of the month and rationing it. Given the issues recently I don't find myself quite as okay with paying over the odds, yet I don't want to go back to metering my connection by moving to their lower tier. Another reason to look elsewhere.

Unchained might be a contender. I did see on their website mention of line bonding and "Firebrick" was mentioned, so I did wonder if they were using Firebricks throughout. Is it definitely Cisco they use? It would be very annoying if after a few months they upgraded their Firebricks or the firmware and they get the same problems.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Sun 14-Apr-24 09:37:21)

XGS_Is_On
(committed) Sun 14-Apr-24 12:33:27

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

In reply to a post by aabloor:
We do still believe strongly that the FB9000, when stable, offers us features that distinguish our service from the service of almost all others. Simply, we want bonding, CQM graphs, low power consumption, etc.

It is part of what makes our ISP offering different and better; our USP.

Hey Alex,

Appreciate the rest and it's admirable, you make great points and I hope they are accepted with grace and good manners, but a couple of questions for my own interest on the quoted section and another little bit.

Bonding: you folks are only putting customers on 300 Mbit+ onto the FB9000, indicating they are on FTTP. Is there much of a market for bonding FTTP? The only use case I can think of is a very high end residential/SME using you guys across 2 different physical FTTP networks, so needing CityFibre and Openreach FTTP available to them. Using Openreach FTTP via BT Wholesale and TalkTalk Business is a use case; however, the protection is substantially reduced, to the point where an active-backup solution may as well be used.

For capacity aside from racking up multiple Openreach services for higher upload due to their gross asymmetry I can't see another use case. I'm thinking there won't be that many desiring such capacity using your services as, as I recall, your usage per subscriber is very much towards the lower end of the market.

Regardless bonding can be done without PPP on carrier and not so carrier kit as I'm sure you folks are aware. Commodity servers are amazing at LNS duty, FPGA even better, ASIC even better still.

CQM had enormous value during copper times. The only folks on the 9000 are on FTTP. Issues with performance across your carriers are very few and far between now. Utilisation can be measured at the BNG, latency and loss rely on preferential treatment of LCP and without it may be measured out of band with similar accuracy so what customer benefit do you see going forward?

Another question related is are you folks going to use PPP indefinitely given a major feature relies on it? Openreach deliver Ethernet, CityFibre deliver Ethernet, at some point BT Wholesale will deliver Ethernet, they wanted to move away from PPP when GEA started, TTW etc will follow. Are you folks going to be having the CPE putting customer data in PPP purely for the LCP echoes to keep CQM running?

Lower power consumption is super important but per subscriber how are the numbers?

I note according to the website the FB9000 has a pair of 10GbE ports, the only two on each appliance, that seem to share a single 10 gigabit backplane to the CPU. If you folks start selling 1.8 and 2.3 Gbit services, the symmetrical ones especially are going to be problematic: a customer uploading 2.3 Gbit/s is taking 2.3 Gbit/s of downstream capacity from the backplane and, as we know, saturating downstream for any length of time is hard; saturating upstream, which you folks do not meter, isn't. With relatively skinny pipes and relatively high burst compared to the pipes, your LNS end up looking less like LNS and more like transport links with a high burst to sustained ratio. You can mitigate this with racks and racks of LNS, as you did with the smaller models, leaving fewer subscribers per chassis, but where does that leave power consumption per subscriber, the metric that matters?
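To put rough numbers on that, here is a back-of-envelope sketch in Python. It takes the single shared 10 Gbit/s backplane as an assumption (it is only what the website seems to suggest, not a confirmed spec) and treats upload and download as drawing on the same pool. The function names are made up for the example.

```python
# Assumption: one 10 Gbit/s backplane shared by traffic in both directions.
BACKPLANE_MBIT = 10_000


def remaining_capacity(active_flows_mbit, backplane_mbit=BACKPLANE_MBIT):
    """Capacity left after subtracting every active flow, upload or download."""
    return backplane_mbit - sum(active_flows_mbit)


def max_full_rate_bursts(service_mbit, backplane_mbit=BACKPLANE_MBIT):
    """How many customers can burst at full service rate at the same time."""
    return backplane_mbit // service_mbit


# One customer uploading flat out on a 2.3 Gbit/s symmetric service.
print(remaining_capacity([2_300]))
# Simultaneous full-rate bursts the backplane could carry per service tier.
print(max_full_rate_bursts(2_300))
print(max_full_rate_bursts(1_800))
```

Under those assumptions only a handful of subscribers bursting together would fill a chassis, which is the subscribers-per-appliance concern raised above.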

With the advent of DTT going IPTV the base load on networks is going to increase substantially and with higher burst products to continue to never be the bottleneck you're potentially going to have to provision a lot of kit and have relatively few subscribers on each chassis. As I mentioned above I remember your usage announcements per subscriber being at numbers many ISPs would envy as they were running at twice or more those numbers. DTT and other services going all-IP will close that gap a ton and your usage will jump reducing customers per appliance even more.

The FB9000 is brand new, can't be purchased yet, what kind of lifespan do you folks see for it?

I think there's a big asymmetry inherent in this part:

In reply to a post by aabloor:
This is where our two roles; that of both an ISP with broadband customers, and also that of a hardware manufacturer meet each other head-on and, unfortunately and uncomfortably, collide.

Without using Firebrick the ISP can still be bloody good. IMHO without CQM excellent reactive and proactive support is super important: I don't use you folks because FTTP doesn't really break often so high level support isn't really needed, I have an excellent altnet as my primary service, a personal friend provides my backup and regardless I have my own monitoring due to my job, else you guys would be my backup. Without the ISP using Firebrick, Firebrick is probably not so healthy, putting a fair amount of pressure on the ISP to continue using Firebrick regardless and ensure the business remains viable.

On another note thank you again for my Ignis: he remains in the background of every work Zoom and Teams call as he's just the cutest <3

Edited by XGS_Is_On (Sun 14-Apr-24 12:37:26)

jpm
(fountain of knowledge) Sun 14-Apr-24 13:45:56

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

A selling point of A&A is the L2TP service that you can use if your primary service goes down. I don't know how easy it would be to provide that sort of functionality if you remove the "session" element of PPPoE and resort to DHCP. Maybe the easiest way to achieve it would be to have the L2TP and FTTP service allocate different IP addresses and relying on the end user to run a routing protocol to use their subnet on whatever connection they decided they wanted it on.

XGS_Is_On
(committed) Sun 14-Apr-24 16:11:19

Re: Poor uptime and reliability

[re: jpm] [link to this post]

In reply to a post by jpm:
A selling point of A&A is the L2TP service that you can use if your primary service goes down. I don't know how easy it would be to provide that sort of functionality if you remove the "session" element of PPPoE and resort to DHCP. Maybe the easiest way to achieve it would be to have the L2TP and FTTP service allocate different IP addresses and relying on the end user to run a routing protocol to use their subnet on whatever connection they decided they wanted it on.

VPN.

E300
(committed) Sun 14-Apr-24 16:43:07

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

In reply to a post by XGS_Is_On:
Another question related is are you folks going to use PPP indefinitely given a major feature relies on it? Openreach deliver Ethernet, CityFibre deliver Ethernet, at some point BT Wholesale will deliver Ethernet, they wanted to move away from PPP when GEA started, TTW etc will follow. Are you folks going to be having the CPE putting customer data in PPP purely for the LCP echoes to keep CQM running?

Lower power consumption is super important but per subscriber how are the numbers?

That is one of the things I was hoping A&A might have had, the option to use IPoE to avoid needing an over powered pfSense or OPNsense box to get the 1Gig throughput, due to PPP having to be done in software. It's the sort of thing they should be doing to keep the more technical customer happy.

AAISP BQM - IPv6 BQM - IPv4

candlerb
(knowledge is power) Sun 14-Apr-24 20:09:04

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
That is one of the things I was hoping A&A might have had, the option to use IPoE to avoid needing an over powered pfSense or OPNsense box to get the 1Gig throughput, due to PPP having to be done in software. It's the sort of thing they should be doing to keep the more technical customer happy

The more technical customers (and those prepared to pay A&A rates) will also likely use decent hardware and/or software.

PPPoE at 1Gbps is not hard to achieve in software. It's only an 8-byte extra header; similar work to VLAN tagging (4 bytes).
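As a rough illustration of that overhead, here is a minimal Python sketch. It assumes a standard 1500-byte Ethernet payload with no baby-jumbo frames, and counts the usual 38 bytes of per-frame Ethernet overhead on the wire. The function names are invented for the example.

```python
# Per-frame Ethernet cost on the wire: 14-byte header, 4-byte FCS,
# 8-byte preamble/SFD and 12-byte inter-frame gap.
ETH_WIRE_OVERHEAD = 14 + 4 + 8 + 12

PPPOE_OVERHEAD = 6 + 2  # 6-byte PPPoE header plus 2-byte PPP protocol ID
VLAN_TAG = 4            # 802.1Q tag


def ip_mtu_over_pppoe(eth_payload=1500):
    """Largest IP packet that fits once PPPoE takes its 8 bytes."""
    return eth_payload - PPPOE_OVERHEAD


def ip_rate_mbps(line_rate_mbps, ip_bytes, encap_bytes):
    """IP-level throughput for full-size packets at a given line rate."""
    wire_bytes = ip_bytes + encap_bytes + ETH_WIRE_OVERHEAD
    return line_rate_mbps * ip_bytes / wire_bytes


print(ip_mtu_over_pppoe())                              # 1492
print(round(ip_rate_mbps(1000, 1500, 0), 1))            # plain Ethernet
print(round(ip_rate_mbps(1000, 1500, VLAN_TAG), 1))     # VLAN tagged
print(round(ip_rate_mbps(1000, 1492, PPPOE_OVERHEAD), 1))  # PPPoE
```

This only covers the bytes on the wire. It says nothing about the CPU cost of encapsulation and decapsulation, which is the part that depends on the router software.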

XGS_Is_On
(committed) Sun 14-Apr-24 20:15:26

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
That is one of the things I was hoping A&A might have had, the option to use IPoE to avoid needing an over powered pfSense or OPNsense box to get the 1Gig throughput, due to PPP having to be done in software. It's the sort of thing they should be doing to keep the more technical customer happy

Think hardware accelerated PPPoE has been in SoCs for a while now, the weaksauce consumer routers ISPs give need it to hit the throughput target. Shouldn't need too much to get to gigabit throughput through software though unless using the software you described which uses a kernel with bad PPP functionality: single core decapsulation only, no multithreading.

Whether an ISP should be doing something as major as arranging the connectivity and installing BNGs to handle a subset of customers with broken software is a tricky one. PPP at worst doubles the cost of handling a packet but on the x86 kit only really an issue with the software you mentioned. The user always has the option to use different software with the same hardware, for free, rather than expecting the ISP to change their network to fit.

No PPP would break the current implementation of CQM, too, which I imagine is a major issue given how attached these folks clearly are to it.

Giving customers the option will probably have to wait until one of their wholesale suppliers announces removal of support for PPP and they've no choice but to use something else: I think the network, hardware and software is very much engineered around PPP at the moment.

XGS_Is_On
(committed) Sun 14-Apr-24 20:26:27

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

In reply to a post by candlerb:
The more technical customers (and those prepared to pay A&A rates) will also likely use decent hardware and/or software.

PPPoE at 1Gbps is not hard to achieve in software. It's only an 8-byte extra header; similar work to VLAN tagging (4 bytes).

Remember how much work can be offloaded to a half-decent NIC relative to all-software implementations. NICs can take some of the work handling tagged frames off the CPUs.

The software in the distributions mentioned in the post you're replying to handles PPP on a single thread, while it splits handling of TCP/IP and UDP/IP across all cores, which I guess is the discrepancy and why PPP requires much meatier hardware on those specific software packages.

jimbof
(committed) Sun 14-Apr-24 20:37:23

Re: Poor uptime and reliability

[re: E300] [link to this post]

The FTTP900 LNS definitely are Cisco. Unchained have some older Firebrick gear, and do do minimal ping based CQM using a Firebrick to my gateway, but I understand they went with Cisco in order to offer fast FTTP services. If there is something in particular you need to know I'd just reach out to them, they're very responsive.

I was also between quota levels on AAISP, so I would spend one month on the high quota and would then eke out the rollover on the low quota for 2 months. It mostly annoyed me it was so inflexible and blunt and I felt dutybound to not pay for more than I needed if it was being monitored... I don't actually need the £20 it was saving me...

perlen
(newbie) Sun 14-Apr-24 21:58:02

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

No jimbof mate, as Alex (aabloor) said, the Firebrick FB9000 service the FTTP 900Mbit lines - not Cisco.

Pipexer
(eat-sleep-adslguide) Sun 14-Apr-24 23:40:18

Re: Poor uptime and reliability

[re: perlen] [link to this post]

He’s not talking about AAISP

Andrews & Arnold Home ::1 on Draytek 2862ac - Why settle for inferior?

jimbof
(committed) Mon 15-Apr-24 08:04:29

Re: Poor uptime and reliability

[re: perlen] [link to this post]

I was responding to E300's enquiry about Unchained ISP's use of Cisco LNS when they also mention Firebricks. I'm aware AAISP use the FB9000s (I was on their FTTP900 service for a little over a year).

jimbof
(committed) Mon 15-Apr-24 08:15:56

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

There are a couple of holdouts where companies are not doing HW accelerated PPP on hardware that is barely capable of doing it at line rate; Ubiquiti being one obvious example. I'd welcome a DHCP / Ethernet based service as I do use Ubiquiti gear and have put an OPNsense box in front of it to do PPP...

This seems like a moot discussion though; the only companies that have done DHCP based FTTx in the UK are those who have their own equipment in the exchanges - so Sky and TalkTalk retail I believe. All the BTW backhaul based consumer services like AAISP are PPP because that is how that service works; the login details used steer the connection to the target ISP's LNS. BTW would have to make significant changes to their network to change this; it's not something an ISP can implement on their own.

E300
(committed) Mon 15-Apr-24 08:44:57

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

In reply to a post by XGS_Is_On:
Think hardware accelerated PPPoE has been in SoCs for a while now, the weaksauce consumer routers ISPs give need it to hit the throughput target. Shouldn't need too much to get to gigabit throughput through software though unless using the software you described which uses a kernel with bad PPP functionality: single core decapsulation only, no multithreading.

Works fine in software but I needed to upgrade from a PC Engines APU2E4 on moving to 1Gig, it would have been fine for the throughput otherwise. Granted in this case the issue is with the software.

It also isn't just FreeBSD-based x64 routers that suffer; all routers will max out at a lower throughput when using PPPoE, even with acceleration in the SoC. Whether that becomes a problem depends on what speed packages a person can get or upgrades to, and how long they want to keep their kit. In many cases the need for faster network ports means an upgrade, but you have routers coming with 2.5 or 10Gbps ports that can't manage more than 1.5 or 2.0Gbps when using PPPoE (UDM-Pro seems to be one example), but will do significantly more otherwise. Also there is a price we are paying for this acceleration in the kit we buy, and I believe most of the world has moved or is moving away from PPPoE.

A&A do all sorts of techy and niche things, it is their USP after all, so given it's their own kit and their own software they are showcasing, I would have thought it was right up their street.

Edit:

BTW would have to make significant change to their network to change this, it's not something an ISP can implement on their own.

Guess that explains it then. Maybe they have the option with CityFibre in the future?

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Mon 15-Apr-24 08:51:05)

XGS_Is_On
(committed) Mon 15-Apr-24 09:53:45

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

In reply to a post by jimbof:
There are a couple of holdouts where companies are not doing HW accelerated PPP on hardware that is barely capable of doing it at line rate; Ubiquiti being one obvious example. I'd welcome a DHCP / Ethernet based service as I do use Ubiquiti gear and have put an OPNsense box in front of it to do PPP...

This seems like a moot discussion though; the only companies that have done DHCP based FTTx in the UK are those who have their own equipment in the exchanges - so Sky and TalkTalk retail I believe. All the BTW backhaul based consumer services like AAISP are PPP because that is how that service works; the login details used steer the connection to the target ISPs LNS. BTW would have to make significant change to their network to change this, it's not something an ISP can implement on their own.

That comes down to your decision to use a vendor selling gateways rather than dedicated routers at your WAN edge, with a ton of other stuff sapping the CPU and, hence, throughput. IDS/IDP, managing APs, etc, all take a toll. Enthusiast kit with big interfaces doesn't guarantee line rate anyway in my experience: I believe enabling IDS/IDP really harms throughput too.

A&A also use CityFibre and could use IP directly rather than PPP, however it was more of a forward thinking question on my part. Very aware their primary supplier uses PPP and mentioned that they didn't want to for GEA. At some point it'll go.

No requirement to use PPP to steer traffic to BTW customers: it's not dialup and they know which port on the DSLAM/OLT the connection came in on which is all they need. Map that to the ISP, repackage in different VLANs, EVPN, off it goes.

Edited by XGS_Is_On (Mon 15-Apr-24 10:10:08)

XGS_Is_On
(committed) Mon 15-Apr-24 10:07:46

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
Guess that explains it then. Maybe they have the option with CityFibre in the future?

They do, but until BTW start changing it, is it worth it because a small subset of users have marginal hardware running single threaded PPP? All expense for zero practical gain. Need more of a business case, especially when your network is engineered around PPP.

My question to them was purely on the basis that one of the first things mentioned as a case to continue using Firebricks was CQM and that relies on PPP, as does their bonding code, so is the plan to overlay PPP even when the supplier networks are all IP? Also, are they still all in on CQM despite the fact that, in the FTTP world with wholesale supplier performance much better, there's far less of a need for it to detect issues?

When customers blip you get a bunch of notifications from the wholesale supplier and your own BNG as they lose and reestablish. Usage is easily monitored at the BNG.

E300
(committed) Mon 15-Apr-24 10:35:00

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

In reply to a post by XGS_Is_On:
They do, but until BTW start changing it is it worth it because a small subset of users have marginal hardware running single threaded PPP? All expense for zero practical gain. Need more of a business case, especially when your network is engineered around PPP.

You keep using the argument of cost or small number of users would benefit, but this ignores the whole ethos of A&A and why people pay extra to join them. I'm sure some made the same arguments back in 2002 when they started implementing IPv6, as in why go to the trouble, no one else uses it, limited websites to connect to, who benefits, what is the business case etc. A&A could turn IPv6 off tomorrow and the Internet would still work so it really wasn't necessary, but they went ahead anyway and were one of the first to do it.

A&A do try to do things differently, try to be ahead of the game, cater for the techies, work with industry to push things along, they advertise as being tech nerds themselves. If they've become like any other ISP, doing nothing unless forced to because it costs money with no practical short term gain, don't cater for the "tech nerds" by offering anything different or innovative, then what is their selling point given the premium for their services and having their capped usage model that no other ISP has these days? Their "we fix any line" and all the monitoring as a selling point is disappearing as fast as FTTP is appearing.

As I said, this was something I would have expected to have been right up their street, I understand the arguments against doing it for 99.99% of all other ISPs.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Mon 15-Apr-24 10:38:14)

jimbof
(committed) Mon 15-Apr-24 10:41:20

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

No requirement to use PPP to steer traffic to BTW customers: it's not dialup and they know which port on the DSLAM/OLT the connection came in on which is all they need. Map that to the ISP, repackage in different VLANs, EVPN, off it goes.

When you say "no requirement" - you're talking about a hypothetical network setup BT could deploy, right, not something that is available now but ISPs aren't choosing to use?

BTW/Openreach appear to be doing the bare minimum of anything with the FTTP implementation, and there looks to be no steering in place with the BTW ISPs I've used.

I have had 2 BTW connections at the same property on the same PON, and BTW don't even police that a certain ISP PPPoE login is originating from a certain ONT. I could happily connect to either BTW ISP (in this case, AAISP or Unchained) via either of the ONTs that I had, and the connection ended up at the ISP whose login details you used, with no obvious error if you used the wrong ONT. In the case of AAISP, it totally broke their CQM setup as their CQM setup looks for the incoming ID it is expecting to tie up the data with your account, yet they still allow the PPPoE connection.
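That behaviour is what you would expect if steering is keyed purely on the login. A minimal Python sketch, under the assumption that the wholesale side looks only at the realm after the @ in the PPPoE username. The realm names, LNS labels and function are invented for illustration and are not BTW's actual configuration.

```python
# Hypothetical realm -> ISP LNS table; real realms and LNS names will differ.
REALM_TO_LNS = {
    "isp-a.example": "lns.isp-a.example",
    "isp-b.example": "lns.isp-b.example",
}


def steer_session(username, ont_id):
    """Pick the ISP LNS from the login realm alone.

    ont_id is accepted but deliberately ignored, mirroring the observation
    that the same login works from either ONT.
    """
    _, sep, realm = username.rpartition("@")
    if not sep or realm not in REALM_TO_LNS:
        raise ValueError(f"no ISP known for login {username!r}")
    return REALM_TO_LNS[realm]


# Same login, two different ONTs: both sessions land on the same ISP.
print(steer_session("alice@isp-a.example", ont_id="ONT-1"))
print(steer_session("alice@isp-a.example", ont_id="ONT-2"))
```

An ISP that also checks the incoming line ID against the account, as the CQM setup apparently does, would be doing so as a separate step after the session has already been steered.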

candlerb
(knowledge is power) Mon 15-Apr-24 12:02:06

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
A&A do try to do things differently, try to be ahead of the game

Enabling IPv6 is being ahead of the game; making their network more complex to support poor quality client routers in limited scenarios is not.

It's not as if anyone is proposing phasing out of PPP. As others have said, PPP has major advantage for broadband delivery: bonding is one, being able to use L2TP for backup connections is another.

Talktalk only did IPoE because they needed multicast for their old TV service. It's not because they want to make a service for techno-nerds. After all, they're *not* deploying IPv6 either.

jimbof
(committed) Mon 15-Apr-24 12:28:06

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

Talktalk only did IPoE because they needed multicast for their old TV service. It's not because they want to make a service for techno-nerds. After all, they're *not* deploying IPv6 either.

They didn't have to do it that way though, did they? BT retail also have a multicast IPTV offering while using PPPoE for the client. They could have mirrored that setup I believe.

XGS_Is_On
(committed) Mon 15-Apr-24 14:06:33

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

In reply to a post by jimbof:
No requirement to use PPP to steer traffic to BTW customers: it's not dialup and they know which port on the DSLAM/OLT the connection came in on which is all they need. Map that to the ISP, repackage in different VLANs, EVPN, off it goes.

When you say "no requirement" - you're talking about a hypothetical network setup BT could deploy, right, not something that is available now but ISPs aren't choosing to use?

BTW/Openreach appear to be doing the bare minimum of anything with the FTTP implementation, and there looks to be no steering in place with the BTW ISPs I've used.

I have had 2 BTW connections at the same property on the same PON, and BTW don't even police that a certain ISP PPPoE login is originating from a certain ONT. I could happily connect to either BTW ISP (in this case, AAISP or Unchained) via either of the ONTs that I had, and the connection ended up at the ISP whose login details you used, with no obvious error if you used the wrong ONT. In the case of AAISP, it totally broke their CQM setup as their CQM setup looks for the incoming ID it is expecting to tie up the data with your account, yet they still allow the PPPoE connection.

Openreach wouldn't care about the connections; they'd just send both to BTW. They are more than capable of putting the connections into different VLANs on the same 4 port kit, though. I've had, briefly, a 4 port ONT sending 2 to BTW and 1 to Zen GEA.

Yes, I was talking about future capabilities, not current. I can't recall exactly what I was responding to so apologies if out of context. My point was there's no need for PPP to achieve what BTW need to.

jimbof
(committed) Mon 15-Apr-24 15:29:30

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

I get that there's no reason for it to have to work this way - but it's all in BTW's realm, and there's no indication that will ever happen (witness: 20 years of FTTC working this way, using PPPoE). I think the main point is folk are thinking that by having custom kit or being a niche provider AAISP could do something different and offer an alternative configuration with FTTP via BTW, but the overarching point is they can't because PPPoE is the way BTW make it work for everyone.

Chrysalis
(legend) Mon 15-Apr-24 17:55:14

Re: Poor uptime and reliability

[re: Sun4Lw5LIQy] [link to this post]

In reply to a post by Sun4Lw5LIQy:
For a company that tries to pride itself on transparency the ball really has dropped recently. The FireBrick platform has some good features but the trade-off is an unstable internet connection at a premium cost. I’m rooting for the team to get things fixed but they might have to go back to the drawing board and admit the current platform isn’t going to work for customers. Noticeable speed drops, connection drops and lack of transparency is making me heavily reconsider who I go with next. It won’t be A&A.

Is speed not stable then? Some peeps I know who joined AAISP in 2022 were saying it was great, but that's bad news if speeds aren't sustained anymore on AAISP FTTP.

VM Gig1 - AAISP L2TP

Chrysalis
(legend) Mon 15-Apr-24 18:27:36

Re: Poor uptime and reliability

[re: jpm] [link to this post]

In reply to a post by jpm:
A selling point of A&A is the L2TP service that you can use if your primary service goes down. I don't know how easy it would be to provide that sort of functionality if you remove the "session" element of PPPoE and resort to DHCP. Maybe the easiest way to achieve it would be to have the L2TP and FTTP service allocate different IP addresses and relying on the end user to run a routing protocol to use their subnet on whatever connection they decided they wanted it on.

You don't need PPP auth for that; I use personal VPN setups authenticating with a certificate, but it can be done in other ways as well. I do echo XGS's questions on the PPP, and with the contents of this thread I have some doubts about my FTTP order with AAISP as well. The thing saving the order right now is that AAISP is only a 1 month commit; if it was the normal 18+ months I think I would have pulled the plug.

VM Gig1 - AAISP L2TP

jimbof
(committed) Mon 15-Apr-24 19:33:55

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

Is that with CityFibre then? Openreach is still showing as 12 months on their website.
https://www.aa.net.uk/broadband/home1/

Chrysalis
(legend) Mon 15-Apr-24 19:38:17

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

Yes, it's a CF order; it states 1 month on AAISP's website. I will probably get confirmation from sales, but I am pretty sure they already told me it's 1 month as well.

VM Gig1 - AAISP L2TP

jimbof
(committed) Mon 15-Apr-24 19:54:22

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

Yes, it's shown as only 1 month for CF, I'm sure you are right in that case.

Sun4Lw5LIQy
(newbie) Tue 16-Apr-24 10:38:40

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

I keep getting hopped over different LNS devices and it’s a bit of a lottery. I am paying for 1Gbps internet and I’m lucky if I get 330Mbps. Some days it’s 200. I had the same router and ONT with wholesale TalkTalk and I was getting 800Mbps. It feels like a downgrade.

Chrysalis
(legend) Tue 16-Apr-24 19:34:15

Re: Poor uptime and reliability

[re: E300] [link to this post]

Apparently Yayzi don't use PPPoE, which has got my attention, so it seems there is at least one ISP out there that isn't a Sky, VM or TT that uses IPoE.

VM Gig1 - AAISP L2TP

Chrysalis
(legend) Tue 16-Apr-24 19:35:24

Re: Poor uptime and reliability

[re: Sun4Lw5LIQy] [link to this post]

In reply to a post by Sun4Lw5LIQy:
I keep getting hopped over different LNS devices and it’s a bit of a lottery. I am paying for 1Gbps internet and I’m lucky if I get 330Mbps. Some days it’s 200. I had the same router and ONT with wholesale TalkTalk and I was getting 800Mbps. It feels like a downgrade.

OK, I think you have a different issue; if those kinds of speeds were commonplace we would be hearing about it. It seems like a potential provisioning issue to me, so I would contact AAISP support about it.

VM Gig1 - AAISP L2TP

Sun4Lw5LIQy
(newbie) Wed 17-Apr-24 12:44:22

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

Just for clarity, I have also experienced the LNS stalling issues. I was told they are looking into it and that they had added NVMe drives to some of the LNS systems to get things resolved.

I have contacted support about the speed issues but have been told it’s my WiFi. Again, same unit as with my previous ISP. I suspect it’s due to BT Openreach being my carrier instead of TalkTalk Business, which had LLU’d the local exchange. I suspect that’s how I’ve been able to get better connectivity in the past. I was originally provisioned to the incorrect LNS by A&A; they did fix it but I still get speed drops.

Like I said when it comes to contract renewal I’ll be looking elsewhere.

jcre
(newbie) Wed 17-Apr-24 13:06:43

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

Whilst there's technical reasons to keep PPP in the BTW and TTB networks that would be harder if it was a pure layer 2 delivery (at least in my opinion), they both do offer VLAN based handover on their EoFTTx products - which attract a fair premium, and they probably want to keep EoFTTx as their premium offering to maintain that extra profit margin, which would incentivise keeping PPP in place for their less premium offerings.

FTTx is already handed over on the L2S in the exchange as a VLAN anyway.

E300
(committed) Wed 17-Apr-24 13:07:08

Re: Poor uptime and reliability

[re: Sun4Lw5LIQy] [link to this post]

In reply to a post by Sun4Lw5LIQy:
I have contacted support about the speed issues but have been told it’s my WiFi. Again same unit as my previous ISP. I suspect it’s due to BT Openreach being my carrier instead of TalkTalk Business which had LLU’d the local exchange. I suspect that’s how I’ve been able to get better connectivity in the past.

You really need to test using a wired connection. Wi-Fi is fairly fluid; it can work great one day and not so well the next, depending on interference or whether it has decided to switch channels etc.

I was originally provisioned to the incorrect LNS by A&A they did fix it but I still get speed drops.

I had the same problem 2 years ago and was provisioned on to the wrong LNS. I'd have thought they would have got this fixed by now; obviously not.

You could try connecting to the test LNS and see if that gives you better speed: just prefix your username with test- and reconnect. I see better single thread speeds when on that one, presumably because it isn't loaded up with many customers. If it's better, you can assume the slowdowns are not due to BT Wholesale and backhaul.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Wed 17-Apr-24 13:09:39)

Chrysalis
(legend) Wed 17-Apr-24 13:55:33

Re: Poor uptime and reliability

[re: Sun4Lw5LIQy] [link to this post]

In reply to a post by Sun4Lw5LIQy:
Just for clarity, I have also experienced the LNS stalling issues. I was told they are looking into it and that they had added NVMe drives to some of the LNS systems to get things resolved.

I have contacted support about the speed issues but have been told it’s my WiFi. Again same unit as my previous ISP. I suspect it’s due to BT Openreach being my carrier instead of TalkTalk Business which had LLU’d the local exchange. I suspect that’s how I’ve been able to get better connectivity in the past. I was originally provisioned to the incorrect LNS by A&A they did fix it but I still get speed drops.

Like I said when it comes to contract renewal I’ll be looking elsewhere.

If it was load/congestion related your max would still be line rate, as congestion wouldn't be around the clock.

Do you actually use the net over Wi-Fi or Ethernet? Expecting gigabit throughput over Wi-Fi isn't realistic, and you need good equipment to even get halfway there.

Your 300 does smack of a provisioning problem, as conveniently that's around the figure people report when OR get it wrong, and it's also not far off what people reported when on the old AAISP LNS. But this is moot until you actually test over Ethernet.

VM Gig1 - AAISP L2TP

Edited by Chrysalis (Wed 17-Apr-24 13:57:58)

Chrysalis
(legend) Wed 17-Apr-24 14:01:10

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
In reply to a post by Sun4Lw5LIQy:
I have contacted support about the speed issues but have been told it’s my WiFi. Again same unit as my previous ISP. I suspect it’s due to BT Openreach being my carrier instead of TalkTalk Business which had LLU’d the local exchange. I suspect that’s how I’ve been able to get better connectivity in the past.

You really need to test using a wired connection. Wi-Fi is fairly fluid; it can work great one day and not so well the next, depending on interference or whether it has decided to switch channels etc.

I was originally provisioned to the incorrect LNS by A&A they did fix it but I still get speed drops.

I had the same problem 2 years ago and was provisioned on to the wrong LNS. I'd have thought they would have got this fixed by now; obviously not.

You could try connecting to the test LNS and see if that gives you better speed: just prefix your username with test- and reconnect. I see better single thread speeds when on that one, presumably because it isn't loaded up with many customers. If it's better, you can assume the slowdowns are not due to BT Wholesale and backhaul.

What's your single-threaded speed like on the production LNS? I am used to max single-threaded speed around the clock, so would be disappointed if it's like Zen.

VM Gig1 - AAISP L2TP

E300
(committed) Wed 17-Apr-24 14:39:10

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

This is the current test, a bit better than it has been on the single thread test. I can replicate similar numbers by doing a single thread test at Speedtest.net.

IPv4, consistently around half the speed on a single thread.
https://www.thinkbroadband.com/speedtest/17133601033...

IPv6, which typically used to give lower figures due to overheads, is now better than IPv4, so something is a bit odd somewhere just recently.
https://www.thinkbroadband.com/speedtest/17133599514...

The test LNS did a lot better on the single thread tests, 700+ from memory, but I can't disconnect and try it again at the minute.

It doesn't really matter now as I'm migrating away, the issues have gone on too long and the premium I'm paying isn't justified in the service received anymore. It is clear we are beta testing their new kit and paying them a premium to do it, I've stuck around and given them the benefit of the doubt but 5 or 6 months now and still not fixed.

Okay, we are running back on stable firmware, but it still means another round of upgrades to come and potentially the same problem, then rolling back again in the near future, then it all starts again a few weeks later. I think that if they can't fix it after all this time, even when they have a stable firmware to compare against and see what code changes have taken place, it suggests this problem may not be fixable anytime soon and is quite a low level hardware issue.

AAISP BQM - IPv6 BQM - IPv4

Chrysalis
(legend) Wed 17-Apr-24 15:15:00

Re: Poor uptime and reliability

[re: E300] [link to this post]

Thank you. This, combined with you seeing it perform better on the test LNS, does indicate something has not been right.

I am letting my order progress, but I won't lie, my eyes have already wandered. Hence the post I made a few posts back.

I agree also on the hardware. Over the years I have done my own fair share of diagnosing kernel panics and the like, and it is good AAISP have been open. But when I read about CPU deadlocks, and it happening on all of the units, I can't look past either some kind of hardware issue (VRM or whatever) or a BIOS programming issue. The fact the factory firmware seems to be stable, but at the cost of not fully utilising the hardware, gives hope that a BIOS or P-state/C-state type adjustment could stabilise the entire kit, and if I was AAISP that is what I would be pursuing; I have managed to make unstable kit stable via those kinds of tweaks. But of course that's down to AAISP to figure out. I was surprised at the resistance to utilising cisco as a stop gap, and no compensation got mentioned; the path they ended up taking only served to hurt both brands, I think. It is good at least that they have now pulled back to the factory firmware and left the test device as an optional one for customers to utilise. They at least got to that point in the end.

Curious, if you dont mind saying, who are you migrating to?

VM Gig1 - AAISP L2TP

Edited by Chrysalis (Wed 17-Apr-24 15:19:05)

E300
(committed) Wed 17-Apr-24 15:51:34

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

In reply to a post by Chrysalis:
I was surprised at the resistance to utilising cisco as a stop gap, and no compensation got mentioned

Exactly, I've not heard of a plan B except they haven't got one, they are just blinkered into Firebrick and that's it, and I just got to the point of thinking it is too much money a month to be a beta tester. I agree they should have offered some rebate or reduced monthly premium whilst these issues continued to those affected.

After some research and some info from another user here I'm moving to Unchained. So far a couple of questions I've had have been answered in mere seconds, it seemed, and it's great to support a smaller business and get that more personal touch. It's cheaper than A&A as well; mind you, most ISPs are!

A&A say they are transparent and happy to engage, but I think the last update on the problems only came about because of more noise in this thread, and despite saying they are happy to answer questions, the questions asked of them here haven't had any response that I've noticed! They talk the talk...

I'm sure A&A will get it sorted out in the end one way or another, and maybe I will return as a customer in future, who knows. It was great for many months, couldn't fault it, but I can't justify to myself the premium for their product whilst these issues are ongoing with no end in sight.

AAISP BQM - IPv6 BQM - IPv4

Edited by E300 (Wed 17-Apr-24 16:02:05)

Chrysalis
(legend) Wed 17-Apr-24 16:02:31

Re: Poor uptime and reliability

[re: E300] [link to this post]

Thank you, will see how my experience goes when I rejoin, but I am preparing for the worst if it happens, and I appreciate you sharing.

VM Gig1 - AAISP L2TP

perlen
(newbie) Wed 17-Apr-24 17:56:51

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

Good point E300, aabloor the General Manager of A&A registered here on Fri 12-Apr-24 10:24:36 to post his message:
https://forums.thinkbroadband.com/aaisp/t/4755788-re...

he finishes that message with:
"Thanks for taking the time to read this, and we are happy to answer any questions, of course."

But no response has been made to ANY of the questions that have followed... and he still has:
Total Posts 1

XGS_Is_On
(committed) Wed 17-Apr-24 20:29:38

Re: Poor uptime and reliability

[re: jcre] [link to this post]

In reply to a post by jcre:
Whilst there's technical reasons to keep PPP in the BTW and TTB networks that would be harder if it was a pure layer 2 delivery (at least in my opinion), they both do offer VLAN based handover on their EoFTTx products - which attract a fair premium, and they probably want to keep EoFTTx as their premium offering to maintain that extra profit margin, which would incentivise keeping PPP in place for their less premium offerings.

FTTx is already handed over on the L2S in the exchange as a VLAN anyway.

I don't understand: why would it make a difference how premium something is whether it's delivered via PPP or Ethernet? Surely it's the SLA and QoS behind the service that makes it premium, not whether it's delivered as Ethernet or PPP?

Ethernet over FTTC is FTTC with a service level agreement - https://daisycomms.co.uk/resource/what-is-ethernet-o... - but is presented to the customer as Ethernet. It could happily be a PPP service with the NTE stripping the PPP to give an Ethernet presentation. The presence of PPP costs a little throughput but beyond that it makes no difference to the service's reliability or performance.
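To put a rough number on "a little throughput", here is a minimal sketch of the arithmetic. It assumes a 1000 Mbit/s Ethernet line rate, full-size 1500-byte frames, TCP over IPv4 with no options, and a PPPoE session running at the classic 1492-byte MTU (no baby jumbo frames). The function name and constants are illustrative only, not taken from any ISP's tooling.

```python
# Bytes on the wire around each Ethernet payload:
# preamble + SFD (8) + Ethernet header (14) + FCS (4) + inter-frame gap (12)
ETH_OVERHEAD = 8 + 14 + 4 + 12
ETH_PAYLOAD = 1500            # standard Ethernet payload size
PPPOE_OVERHEAD = 8            # 6-byte PPPoE header + 2-byte PPP protocol ID
TCP_IPV4_HEADERS = 40         # 20 bytes IPv4 + 20 bytes TCP (assumes no options)


def tcp_goodput_mbps(line_rate_mbps: float, pppoe: bool) -> float:
    """Best-case TCP payload rate for full-size frames at the given line rate."""
    ip_mtu = ETH_PAYLOAD - (PPPOE_OVERHEAD if pppoe else 0)
    tcp_payload = ip_mtu - TCP_IPV4_HEADERS
    wire_bytes = ETH_PAYLOAD + ETH_OVERHEAD
    return line_rate_mbps * tcp_payload / wire_bytes


ipoe = tcp_goodput_mbps(1000, pppoe=False)
ppp = tcp_goodput_mbps(1000, pppoe=True)
print(f"IPoE:  {ipoe:.1f} Mbit/s")
print(f"PPPoE: {ppp:.1f} Mbit/s")
print(f"Cost of PPPoE: {ipoe - ppp:.1f} Mbit/s ({100 * (ipoe - ppp) / ipoe:.2f}%)")
```

Under those assumptions the PPPoE cost works out at roughly half a percent of best-case goodput, which is small next to the gap between a 300 and an 800 result.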

On it being harder not really, just different. CityFibre provide Ethernet handoff to everyone that wants it with PPP as an option. Broadband Network Gateways built to handle Ethernet presentation and handoff may eliminate the need for PPP, allowing providers to use EVPN technologies. Still encapsulating but a whole lot more you can do with the traffic than PPP over L2TP tunnels.

Edited by XGS_Is_On (Wed 17-Apr-24 20:34:22)

XGS_Is_On
(committed) Wed 17-Apr-24 20:41:58

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
In reply to a post by Chrysalis:
I was surprised at the resistance to utilising cisco as a stop gap, and no compensation got mentioned

Exactly, I've not heard of a plan B except they haven't got one, they are just blinkered into Firebrick and that's it, and I just got to the point of thinking it is too much money a month to be a beta tester. I agree they should have offered some rebate or reduced monthly premium whilst these issues continued to those affected.

Mentioned this in my earlier post to Alex: the ISP arm can't drop the Firebrick hardware as they're the largest customer by far and if the ISP that's part of the same group as the hardware company are using someone else it's not a great sales story.

With that in mind no surprise at all that AAISP have no intention of using anything other than Firebrick. The hardware was built with their requirements in mind, customised to them and relies on them for sales. These folks work out of the same offices and are mates and colleagues so why would A&A do anything other than keep the faith unless the issue were catastrophic?

I'm not being mean with that, I'm just, well, surprised at Chrys being surprised. I would've thought this would be obvious.

Sun4Lw5LIQy
(newbie) Wed 17-Apr-24 21:10:53

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

The speeds I got previously were 800MB on the 5GHz signal with the very same make and model router and the same ONT. The only thing that changed was the backhaul and the ISP.

I’ll give the other instructions a go but what I will say is my setup hasn’t changed much apart from a new router supplied with A&A and we have channel hopped multiple times to no avail. I’ve been told I’m lucky I’m on 300MB and that’s it. For me it’s not enough for the premium I’m paying.

Pipexer
(eat-sleep-adslguide) Wed 17-Apr-24 21:34:42

Re: Poor uptime and reliability

[re: perlen] [link to this post]

In reply to a post by perlen:
Good point E300, aabloor the General Manager of A&A registered here on Fri 12-Apr-24 10:24:36 to post his message:
https://forums.thinkbroadband.com/aaisp/t/4755788-re...

he finishes that message with:
"Thanks for taking the time to read this, and we are happy to answer any questions, of course."

But no response has been made to ANY of the questions that have followed... and he still has:
Total Posts 1

Indeed - no point saying something like that and then not responding to any questions posed.

Andrews & Arnold Home ::1 on Draytek 2862ac - Why settle for inferior?

Chrysalis
(legend) Thu 18-Apr-24 12:38:20

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

Well, I didn't say that; I was surprised it wasn't considered for temporary use. Maybe a naive thought.

Edited by Chrysalis (Thu 18-Apr-24 13:43:51)

XGS_Is_On
(committed) Thu 18-Apr-24 14:12:16

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

In reply to a post by Chrysalis:
Well, I didn't say that; I was surprised it wasn't considered for temporary use. Maybe a naive thought.

Understood. Think even temporary use would not be possible: no CQM, etc and they are huge fans of CQM. A&A continue to install these devices into datacentres so seems like they're committed.

njh
(learned) Fri 19-Apr-24 11:32:16

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

In reply to a post by aabloor:
Within this same time frame, we have had multiple instances of BT Wholesale doing planned work which they had not told us about in advance (and apparently not told other ISPs, too). We could have zeroed the impact of their planned work, had they told us they were doing it beforehand.

Multiple times we have raised this with our account manager and at higher levels, and we still have not had a satisfactory response. Of course, no wholesale network is 100% reliable; we are not unreasonable about this, but the combined appearance, especially to customers not following matters closely, is that it's "another LNS blip".

Hi Bloor!

Is there any way you could enhance your Continuous Quality Monitoring to provide a better indication of where in the network a fault is occurring? I believe the major areas are:
- a problem with the DSL line / Modem
- a problem with the BT/TT wholesale network from the DSLAM to the LNS
- a problem with the AAISP LNS
- [a problem getting from the AAISP network to the Internet]

At the moment it is very hard to tell when there is a problem, where the problem is.

Is it possible for you to monitor the DSLAM, to compare with the DSL line? Or are you prevented from that kind of access?

Thanks,

njh.

Sun4Lw5LIQy
(newbie) Tue 23-Apr-24 14:03:02

Re: Poor uptime and reliability

[re: njh] [link to this post]

More speed drops. Speed went to 100MB; after a reboot it's now sitting at 200MB. Paying for 800MB. Again, before anyone jumps to conclusions, I used the very same router and ONT with my previous provider and the speeds were around 800MB. It was business broadband via wholesale TalkTalk previously. Now the backhaul is over BT Openreach and not using the LLU kit at the exchange.

Called up A&A. Put through to the sales line. Been told it will cost me £160 to cancel the service. Really don’t like being a beta tester for a product. Silence is golden from A&A on the forum says everything I need to know as a customer. I use the line as a fully remote worker so I depend on stable speedy internet and I’m very disappointed in the service.

Edited by Sun4Lw5LIQy (Tue 23-Apr-24 14:05:30)

andrewhearn
(isp) Tue 23-Apr-24 15:30:20

Re: Poor uptime and reliability

[re: Sun4Lw5LIQy] [link to this post]

In reply to a post by Sun4Lw5LIQy:
More speed drops. Speed went to 100MB; after a reboot it's now sitting at 200MB. Paying for 800MB. Again, before anyone jumps to conclusions, I used the very same router and ONT with my previous provider and the speeds were around 800MB. It was business broadband via wholesale TalkTalk previously. Now the backhaul is over BT Openreach and not using the LLU kit at the exchange.

Called up A&A. Put through to the sales line. Been told it will cost me £160 to cancel the service. Really don’t like being a beta tester for a product. Silence is golden from A&A on the forum says everything I need to know as a customer. I use the line as a fully remote worker so I depend on stable speedy internet and I’m very disappointed in the service.

Hi, I'm not aware of your problems personally, but Shaun here has been looking and is getting in touch with you.

Andrew Hearn
GM, A&A
aa.net.uk [email protected] 033 33 400 999

The above post has been made by an ISP REPRESENTATIVE (although not necessarily the ISP being discussed in the post).

aabloor
(newbie) Tue 23-Apr-24 15:34:03

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Hi Peter,

There have been a number of announcements of LNS firmware related work, not always with a vast amount of notice, I admit. But I think where we've been rolling back the firmware we've notified of this; at least that is the intention. Where we've been unable to give a lot of notice, the decision to do this has been made on the basis that a little notice but with sooner stability is still better than a lot of notice, but a delay to stability. I think that makes sense! Hopefully?

Alex.

---
Bloor
GM, A&A.

aabloor
(newbie) Tue 23-Apr-24 15:34:37

Re: Poor uptime and reliability

[re: PCJM40] [link to this post]

Many thanks indeed for the support!

Alex

---
Bloor
GM, A&A.

aabloor
(newbie) Tue 23-Apr-24 15:52:14

Re: Poor uptime and reliability

[re: E300] [link to this post]

That's a good question.

I think the basic answer is that there does not seem to be an "LNS that seems to have had the most lockups".

These numbers are made up/approximate but I'm using them to hopefully try and illustrate the situation :

Let's say we have 10 live LNS running, and about every 30 days one of them hangs. That means there is a hang approximately every 300 "LNS days", "LNS days" being a measure a bit like "man hours". The blips are still a fairly rare event (though obviously not rare enough). This means waiting quite a considerable amount of time to know for sure if a fix has worked. The way to progress fixing it at maximum speed is to deploy it as widely as possible, to ramp up the speed of acquisition of "LNS days" in the fewest number of "day days".
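As a rough illustration of that arithmetic, here is a minimal sketch using the made-up numbers above. It adds one assumption that is not in the post: hangs are treated as independent random events (a Poisson process), so the chance of seeing no hang in T "LNS days" at the old rate is exp(-T / 300). The function names and the 95% threshold are hypothetical choices for the example.

```python
import math


def fleet_hang_interval_days(lns_count: int, lns_days_per_hang: float) -> float:
    """Calendar days between hangs somewhere in the fleet."""
    return lns_days_per_hang / lns_count


def calendar_days_for_confidence(lns_count: int,
                                 lns_days_per_hang: float,
                                 confidence: float = 0.95) -> float:
    """Hang-free calendar days needed before "no hang seen" would be
    unlikely (probability 1 - confidence) if the old hang rate still applied.

    Assumes hangs follow a Poisson process, which is an illustrative
    simplification rather than anything measured.
    """
    lns_days_needed = -lns_days_per_hang * math.log(1 - confidence)
    return lns_days_needed / lns_count


print(fleet_hang_interval_days(10, 300))                 # one hang per 30 days across 10 LNS
print(round(calendar_days_for_confidence(10, 300), 1))   # fix deployed on all 10 LNS
print(round(calendar_days_for_confidence(2, 300), 1))    # fix deployed on only 2 LNS
```

With these illustrative figures, a fix running on all 10 LNS needs about three hang-free months to reach that level of confidence, while the same fix on only 2 LNS needs well over a year, which is the case for deploying widely.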

This does go back to my point about being a hardware developer as well as an ISP, and how occasionally these two differing objectives can collide head on.

Your point is well received though that every kind of outage is bad, and that of course we do have to shuffle customers around (which can be seen as a "drop") to do the upgrades themselves.

Just to clarify though: when we were testing new software, obviously it ran on our test rig at our offices first, then loaded one at a time onto live LNS. The plan was to update them all one by one over a week or two. A hang occurred after doing I think only two live LNS, so the plan was halted, and then the rollback decision was made.

Alex.

---
Bloor
GM, A&A.

aabloor
(newbie) Tue 23-Apr-24 16:03:59

Re: Poor uptime and reliability

[re: Pipexer] [link to this post]

Hi there,

Just replying to this point:

"Why there has been no mention or consideration of getting specialist 3rd parties in to put a 2nd pair of eyes on the situation?"

There are already several people working on this internally. Without exaggeration there really probably aren't too many people out there who can be asked. SoC manufacturers tend to assume you will be running something like Linux on their chips. Certainly our approach of an entire ground up OS, hardware drivers, networking etc is rare to the extent that some initially have trouble believing it to be true. But it really is.

That being said, yes, contact has been made with both TI and ARM on this, suggestions made, ideas explored.

It wasn't mentioned because this is all a pretty standard part of hardware development work, I think, in situations such as this one.

Cheers-

Alex.

---
Bloor
GM, A&A.

aabloor
(newbie) Tue 23-Apr-24 16:05:02

Re: Poor uptime and reliability

[re: xela] [link to this post]

There really honestly are good reasons.

And yes you are right! We have somewhat been here before.

-A

---
Bloor
GM, A&A.

aabloor
(newbie) Tue 23-Apr-24 16:07:14

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

Inevitably as time has gone by alternative solutions have come up.

And yes, not all end users can read and understand CQM graphs.

But as a business, I think we would be far far less capable of offering the services and support that we do without the FireBrick.

Not to take away from anything you say, though.

Cheers,
Alex.

---
Bloor
GM, A&A.

aabloor
(newbie) Tue 23-Apr-24 16:20:41

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

Replying hopefully in order:

"Is there much of a market for bonding FTTP?"

Not zero but not a huge amount. As you correctly identify, some of that small amount of demand is from those who wish for higher upload speeds, rather than downstream speeds.

"Utilisation can be measured at the BNG, latency and loss rely on preferential treatment of LCP and without it may be measured out of band with similar accuracy so what customer benefit do you see going forward?"

You are right, CQM was an absolute assassin for problems on copper based RF services, and fibre services are far more reliable and not susceptible to things like water, REIN etc. But undoubtedly it has allowed us to diagnose a surprising range of issues with fibre circuits; backhaul network stuff can still be sniffed out, and even things like problematic GPONs etc. So it definitely still has a use.

"Are you folks going to be having the CPE putting customer data in PPP purely for the LCP echoes to keep CQM running?"

At present we feel that it's still the best option to run all "broadband" circuits using PPP. Ethernet is a different matter, although I think we may have some doing PPP over ethernet for specialist reasons.

"Lower power consumption is super important but per subscriber how are the numbers?"

Unquestionably the FB9000 is lower consumption per Mbit/sec or per customer or pretty much however you want to measure it than FB6000. So it is "greener". Really and truly, compared to any PC based or (ex) Cisco solution it is minuscule. Absolutely tiny. At present we have no plans to launch higher than 1Gbit services via BTW as (I think) they are currently not offering it. We may offer slightly higher services via CityFibre, and blended in with other customers; we don't envisage it being too much of a problem in reality.

"The FB9000 is brand new, can't be purchased yet, what kind of lifespan do you folks see for it?"

The FB6000 lasted something like fifteen years, in active live use. I do not see any reason why the FB9000 wouldn't have a useful life of perhaps close to ten.

Thanks for your comments. Hopefully I've managed to answer vaguely OK most of them! Yours has been one of the most interesting to write a reply to so far!

Cheers-
Alex.

---
Bloor
GM, A&A.

aabloor
(newbie) Tue 23-Apr-24 16:27:01

Re: Poor uptime and reliability

[re: perlen] [link to this post]

This has been addressed now.

I wanted to wait until there were a few questions, before answering them all in one go; particularly as some questions do repeat, I figured this was the best way.

Sorry if it looks like there's been a gap in contact.

Alex.

---
Bloor
GM, A&A.

gorebrush
(regular) Tue 23-Apr-24 16:29:00

Re: Poor uptime and reliability

[re: Sun4Lw5LIQy] [link to this post]

I've been on AAISP for the last week or so now and I haven't experienced any speed drops.

I can consistently hit a 940 speedtest wired or wireless (6E, iPhone 15 Pro Max, admittedly sat close to the AP in the living room)

aabloor
(newbie) Tue 23-Apr-24 16:33:11

Re: Poor uptime and reliability

[re: njh] [link to this post]

Hi njh,

Thanks for the question.

CQM is rather a blunt animal. It is an echo and response at the PPP layer, which we then graph the timing of. Not much more or less than that. So with that methodology, it's rather hard to get inbuilt diagnostics of the sort you are suggesting.
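To make "an echo and response, which we then graph the timing of" concrete, here is a minimal sketch of how one graph interval could be summarised from echo round-trip times. It is an assumption-laden illustration, not how CQM is actually implemented: the `IntervalSummary` type, the `summarise_interval` function, and the use of `None` for an unanswered echo are all hypothetical, and the sample period and interval length are left open.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class IntervalSummary:
    sent: int                  # echoes sent in the interval
    lost: int                  # echoes that never got a reply
    min_ms: Optional[float]    # best round-trip time seen
    avg_ms: Optional[float]    # mean round-trip time of the replies
    max_ms: Optional[float]    # worst round-trip time seen


def summarise_interval(rtts_ms: Sequence[Optional[float]]) -> IntervalSummary:
    """Summarise one interval; each entry is an RTT in ms, or None if no reply came back."""
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    if not replies:
        # Total loss: there is no latency to report for this interval.
        return IntervalSummary(len(rtts_ms), lost, None, None, None)
    return IntervalSummary(
        sent=len(rtts_ms),
        lost=lost,
        min_ms=min(replies),
        avg_ms=sum(replies) / len(replies),
        max_ms=max(replies),
    )


sample = [8.0, 9.0, None, 10.0, 25.0]   # made-up RTTs; one echo unanswered
summary = summarise_interval(sample)
print(summary)
```

A summary like this only says how the whole path between the two PPP endpoints behaved in that interval. It carries nothing that identifies which hop added the delay or dropped the echo, which is why locating the fault has to come from interpreting patterns in the graphs.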

We have built up a lot of knowhow over the years of how to interpret and read the graphs though as is described in the following article : https://support.aa.net.uk/CQM_Graphs

Over the years these have been able to narrow down a surprising variety of faults.

It would be theoretically possible for us to have realtime stats from things like intermediate hardware, if our carriers allowed it. But it wouldn't be "part of CQM". It could potentially be correlated against CQM data, though. Honestly I doubt this is likely to happen though, as carriers are famously defensive about giving anyone detailed stats about their networks and equipment (and understandably so).

If you've not seen the wiki page above before, it might give you a new appreciation of what sorts of stuff we can infer by looking at graphs at least!

Cheers,
Alex.

---
Bloor
GM, A&A.

aabloor
(newbie) Tue 23-Apr-24 16:35:54

Re: Poor uptime and reliability

[re: andrewhearn] [link to this post]

Adding to Andrew's point... just for clarity (and others have already said similar to this) ...

I am 100% certain this specific customer's problem is nothing whatsoever to do with the LNS troubles that are the subject of this thread. At no point has there been any impact on throughput. This has been an issue with stability, not throughput in live use.

We will obviously work with the customer to resolve it, just as we would with any other.

Alex.

---
Bloor
GM, A&A.

Edited by aabloor (Tue 23-Apr-24 16:37:38)

aabloor
(newbie) Tue 23-Apr-24 16:44:26

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

Tangential thought :

One possible use of AI would be to make a stab at interpretation of graph results.

We probably won't do it, but it feels like it would be a perfect application.

Alex.

---
Bloor
GM, A&A.

XGS_Is_On
(committed) Wed 24-Apr-24 00:05:02

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

In reply to a post by aabloor:
Tangential thought :

One possible use of AI would be to make a stab at interpretation of graph results.

We probably won't do it, but it feels like it would be a perfect application.

Alex.

It's what my day job and others do: use AI to interpret telemetry and both assist with reporting and take action where appropriate.

DFScale
(regular) Wed 24-Apr-24 09:55:55

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

In reply to a post by aabloor:
SoC manufacturers tend to assume you will be running something like Linux on their chips. Certainly our approach of an entire ground up OS, hardware drivers, networking etc is rare to the extent that some initially have trouble believing it to be true. But it really is.

Hmm. As a business model that sounds over ambitious for a relatively small company. Is your core expertise developing embedded systems or is it running an ISP network? If you are not 2 separate divisions with an arm's length commercial relationship, such that the ISP is free to source its embedded systems elsewhere, I would not look at you as an ISP.

As for developing without Linux, you are again requiring 2 different core expertises, OS development and embedded network system development. There is a dichotomy on something like this between starting with nothing and building upwards or starting with Linux and cutting it down. The first course seems more suited to a large player, where the OS and the application development operate at arms length and you are forced to maintain abstraction between layers rather than taking [road to hell] advantage of the potential to abrogate abstraction when it gets in the way of an easy solution. As an outsider, I don't have the insight to what is going on with A&A, but the troubles reported here look like what I expect some time after strict layering of software has been abrogated.

Of course I could be wrong about the situation ...

aabloor
(newbie) Wed 24-Apr-24 11:28:42

Re: Poor uptime and reliability

[re: DFScale] [link to this post]

It is an ambitious business model.

But this is (quite extremely) not our first product.

In the last 20 years, we have done hardware and firmware on the FireBrick Plus and Soho, FireBrick 105, FireBrick 2500, FireBrick 2700, FireBrick 6000, FireBrick 2900, and now FireBrick 9000. We have many thousands of units out there in the wild, not just with A&A customers. We did change architecture after the FB105.

In summary, we do have considerable experience in this field.

We are a company of between 20 and 25 people.

Some of them work on the ISP business (only).

Some of them have a split role.

And some are essentially FireBrick only.

We have people whose only job is FireBrick software development and nothing else. This is not a case of some people who know devops "having a crack" at low level firmware development, by a very very long chalk.

The two business activities are somewhat symbiotic. It would not really make any difference, other than adding admin overhead to try and split them up. A&A The ISP absolutely is free to buy hardware from wherever. Splitting them up really wouldn't change that.

I feel your reply may possibly (entirely reasonably) assume we have far less experience and track record than we do. We have been doing this for years. It is hard but historically we have punched considerably above our small weight.

We have a website that has some in-depth blog posts which should also highlight broad aspects of our approach www.firebrick.co.uk if not perhaps in this subject area. We cover mistakes we've made in the past, and how we've resolved them. We fully intend to document this current situation, once we are comfortable it has been resolved.

Example of such a blog : https://www.firebrick.co.uk/about/news/fb2700-psu-tr...

And a bit of general history : https://www.firebrick.co.uk/about/history/

Alex.

---
Bloor
GM, A&A.

Edited by aabloor (Wed 24-Apr-24 11:39:57)

DFScale
(regular) Wed 24-Apr-24 12:50:12

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

In reply to a post by aabloor:
It is an ambitious business model ...

I am not doubting A&A's experience - it is needed to get away with what I consider actually to be a low level of issues for the ambition you have. Obviously you are a long way down this road, so I wouldn't suggest a switch to Linux, but if I am questioning anything about your development it is the decision not to start with a Linux core and strip that down.

And I don't think that it is people who know devops "having a crack" at low level firmware development. One of the hardest things with some development is holding it all in one head, which can lead to breaches of abstraction. When the layers are owned by different people, the abstraction between them forces a conversation in which neither side knows too much about the other, and a conversation between two people is far more effective at splitting a problem than trying to resolve it all in one mind. Yes, your size makes you nimble, but it can also place too much of a load on someone who understands both sides of a problem.

PCJM40
(committed) Wed 24-Apr-24 15:47:02

Re: Poor uptime and reliability

[re: aabloor] [link to this post]

I think it's worth re-evaluating your business model from time to time, as what may have worked successfully for 20 years may not work for the next 20 years.

perlen
(newbie) Fri 03-May-24 20:48:56

Re: Poor uptime and reliability

[re: PCJM40] [link to this post]

No more LNS drops for me since 7th April 2024 - now that the FB9000 firmware has been reverted.

I am so glad to no longer be a beta tester, and once again have the reliable connection that I pay for!

- Perlen

Chrysalis
(legend) Sat 04-May-24 01:40:43

Re: Poor uptime and reliability

[re: perlen] [link to this post]

I have had no issues since FTTP circuit activated. (week 3 April)

Edited by Chrysalis (Sat 04-May-24 01:48:17)

jimbof
(committed) Sun 05-May-24 17:37:24

Re: Poor uptime and reliability

[re: perlen] [link to this post]

In reply to a post by perlen:
I am so glad to no longer be a beta tester, and once again have the reliable connection that I pay for!

Maybe they could consider some kind of discount to encourage use of beta LNS instead of foisting the updates on the unsuspecting.

Rhynchelma
(member) Sun 05-May-24 19:58:10

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

They publish how to use the "test" device. No discount mentioned.

jimbof
(committed) Sun 05-May-24 20:10:39

Re: Poor uptime and reliability

[re: Rhynchelma] [link to this post]

I am aware, but it seems they've said part of their issue may have been not having enough folk on the test LNS before it went to live; there isn't any incentive at present to be on the test server (particularly given the premium pricing). Some kind of discount might encourage more to sign up for the test LNS.

Chrysalis
(legend) Sun 05-May-24 22:41:44

Re: Poor uptime and reliability

[re: jimbof] [link to this post]

In reply to a post by jimbof:
I am aware, but it seems they've said part of their issue may have been not having enough folk on the test LNS before it went to live; there isn't any incentive at present to be on the test server (particularly given the premium pricing). Some kind of discount might encourage more to sign up for the test LNS.

Something sensible might be to divide, say, 20-30% of the monthly sub by 30 and then add an automatic compensation scheme that credits the account by that amount for every 24h spent on a test LNS. This would also apply during periods when an open incident within their control is affecting service. If AAISP ever decide they don't need people testing, or that the testing isn't as valuable as they say it is, they could avoid paying this compensation by capping max connections on the test LNS or taking it down.

gorebrush
(regular) Thu 23-May-24 20:16:09

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

Week 2 April here and it's been 100% other than when I've been faffing internally.

Chrysalis
(legend) Fri 24-May-24 16:34:42

Re: Poor uptime and reliability

[re: gorebrush] [link to this post]

In reply to a post by gorebrush:
Week 2 April here and it's been 100% other than when I've been faffing internally.

Yeah, I expect my CQM doesn't look great since the circuit was installed, but that is down to lots of work on my equipment, and some outages from national grid work as well (I have moved the ONT to UPS power since earlier this week). Not noticed any outages ISP side yet.

Edited by Chrysalis (Fri 24-May-24 16:35:30)

candlerb
(knowledge is power) Fri 24-May-24 17:12:50

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

Update on the status page:

Our LNSs remain stable running the 'Factory' software. Work continues in our test lab and on non-customer affecting parts of our network to track down the problem with the alpha software.

For the past month or so our service has had no incidents of LNS hangs; most of our FB9000 LNSs have an uptime approaching 100 days. We are confident that the Factory software is stable.

Interesting that it was "alpha" software that was causing the problems. But good that the released software is stable.

Pipexer
(eat-sleep-adslguide) Fri 24-May-24 21:43:25

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

In reply to a post by candlerb:
Update on the status page:

Our LNSs remain stable running the 'Factory' software. Work continues in our test lab and on non-customer affecting parts of our network to track down the problem with the alpha software.

For the past month or so our service has had no incidents of LNS hangs; most of our FB9000 LNSs have an uptime approaching 100 days. We are confident that the Factory software is stable.

Interesting that it was "alpha" software that was causing the problems. But good that the released software is stable.

Seeing as they develop the software themselves I wouldn't read too much into their nomenclature of calling it "alpha". It wouldn't have been in their interests to knowingly put buggy software into production. It was probably thought of as stable and about to be released as such, but then the problems happened, so it never officially got out of the "alpha" stage.

Andrews & Arnold Home ::1 on Draytek 2862ac - Why settle for inferior?

candlerb
(knowledge is power) Sat 25-May-24 09:00:32

Re: Poor uptime and reliability

[re: Pipexer] [link to this post]

I do wonder though, what shiny new features were so important in this new software that (a) they put it straight into production, and (b) they persisted with trying to fix it *in production* for months, rather than rolling back to the stable software straight away.

The positive from this is that the existence of software which is stable under the same load strongly suggests that it's *not* flaky hardware after all. Not 100%, but very likely.

E300
(committed) Sat 25-May-24 09:44:10

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

In reply to a post by candlerb:
I do wonder though, what shiny new features were so important in this new software that (a) they put it straight into production, and (b) they persisted with trying to fix it *in production* for months, rather than rolling back to the stable software straight away.

The positive from this is that the existence of software which is stable under the same load strongly suggests that it's *not* flaky hardware after all. Not 100%, but very likely.

I'm assuming they've gone back to the original Firebrick OS that only runs on 2 cores and the OS they need to get working is the complete rewrite that allows the new Firebrick to run using all the cores. This would explain why they can't just compare the code differences between the working one and the one that crashes, as they can't be compared. This might also explain why they needed to throw up more of the new Firebricks as they would be under-performing.

All conjecture on my part of course, but if you read this blog https://www.firebrick.co.uk/about/news/version-20/ the dates of the new version 2.0 OS going live coincided with all the problems starting, and prior to the OS 2.0, the new Firebricks were stable when running on the original older software utilising only 2 cores.

Edited by E300 (Sat 25-May-24 09:50:00)

jaydub
(fountain of knowledge) Fri 31-May-24 10:26:21

Re: Poor uptime and reliability

[re: candlerb] [link to this post]

We had a very short outage at 23:40 last night. It's the first time since they reverted to the stable firmware. Anyone else have any similar issues?

bellerby
(newbie) Fri 31-May-24 11:03:03

Re: Poor uptime and reliability

[re: jaydub] [link to this post]

I also had an outage last night between 23:41 and 23:49, during which time my router actually received a WAN address in the 10.93 range - and I'm now with Zen via BTW! I put it down to a BTW issue.

candlerb
(knowledge is power) Fri 31-May-24 11:52:44

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
I'm assuming they've gone back to the original Firebrick OS that only runs on 2 cores and the OS they need to get working is the complete rewrite that allows the new Firebrick to run using all the cores.

Ha - "complete rewrite". MikroTik did that for RouterOS v6 -> v7, and some three years later the "stable" train is still in complete flux and there's no "long-term" version at all.

Chrysalis
(legend) Fri 31-May-24 23:21:48

Re: Poor uptime and reliability

[re: bellerby] [link to this post]

Yeah, it looks like it wasn't AAISP wide, no outage here (via cityfibre).

Sun4Lw5LIQy
(newbie) Mon 03-Jun-24 10:10:55

Re: Poor uptime and reliability

[re: jaydub] [link to this post]

In reply to a post by jaydub:
We had a very short outage at 23:40 last night. It's the first time since they reverted to the stable firmware. Anyone else have any similar issues?

I'm also on the BTW network and observed the same outage, for 11 minutes. Internet came back online shortly after. There was no network status page for this fault; I did check Zen Internet’s pages, as they sometimes include publicly viewable downtime that BTW has planned on exchanges.

DFScale
(member) Mon 03-Jun-24 10:52:07

Re: Poor uptime and reliability

[re: jaydub] [link to this post]

In reply to a post by jaydub:
We had a very short outage at 23:40 last night. It's the first time since they reverted to the stable firmware. Anyone else have any similar issues?

Interesting I had an outage May/30/2024 23:41:46 for 3 seconds. Aquiss, Openreach fibre, City Fibre backhaul.

The fact that this has occurred across such a wide range of ISPs tends to suggest there might be a single point of failure out there for quite substantial disruption.

gorebrush
(regular) Wed 05-Jun-24 09:21:22

Re: Poor uptime and reliability

[re: bellerby] [link to this post]

10.93.x.x?

I smell CGNAT :)
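
For what it's worth, a quick way to check which reserved block an address sits in is Python's standard `ipaddress` module. The sketch below uses a made-up address, 10.93.0.1, because the post only gives the 10.93 prefix. 10.0.0.0/8 is RFC 1918 private space, while the block formally set aside for carrier-grade NAT is 100.64.0.0/10 (RFC 6598). Operators can still hand out 10.x addresses behind CGNAT or from a holding pool, so this only narrows things down and does not prove either way.

```python
import ipaddress

# RFC 1918 private ranges and the RFC 6598 shared address space used for CGN.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
CGN_SHARED = ipaddress.ip_network("100.64.0.0/10")

def classify(addr: str) -> str:
    """Say which reserved block (if any) an address belongs to."""
    ip = ipaddress.ip_address(addr)
    if ip in CGN_SHARED:
        return "RFC 6598 shared address space (CGNAT)"
    if any(ip in net for net in RFC1918):
        return "RFC 1918 private"
    return "other"

# 10.93.0.1 is a hypothetical address inside the 10.93 range mentioned above.
for example in ("10.93.0.1", "100.64.0.1", "8.8.8.8"):
    print(example, "->", classify(example))
```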

gorebrush
(regular) Wed 05-Jun-24 09:23:05

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

I have 100% uptime since 25 April, and even then that was because I was doing something internally.

Chrysalis
(legend) Sun 25-Aug-24 15:16:54

Re: Poor uptime and reliability

[re: gorebrush] [link to this post]

Currently at 75 days, since the day I plugged ONT into my UPS.

perlen
(newbie) Thu 24-Oct-24 08:26:38

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

It seems like the dramas just go on and on:
https://aastatus.net/recent.cgi

Tuesday night: disconnections of broadband lines, 3x LNSs lost power.
Wednesday: shuffling people back around to the right LNS.
Wednesday afternoon: routing and traffic loss problems for two hours.
This morning, Thursday: massive packetloss and disruption for 45 minutes.

Edited by perlen (Thu 24-Oct-24 08:26:55)

behuk
(regular) Thu 24-Oct-24 08:38:01

Re: Poor uptime and reliability

[re: perlen] [link to this post]

In reply to a post by perlen:
Tuesday night: disconnections of broadband lines, 3x LNSs lost power.
Wednesday: shuffling people back around to the right LNS.
Wednesday afternoon: routing and traffic loss problems for two hours.
This morning, Thursday: massive packetloss and disruption for 45 minutes.

I'm not excusing the disruption, but I suspect a number of providers would simply not report these sort of issues. I think it's fair to say that it's quite unusual for a residential broadband provider to (e.g.) invite their customers to email support for a RFO.

E300
(committed) Thu 24-Oct-24 09:01:29

Re: Poor uptime and reliability

[re: behuk] [link to this post]

In reply to a post by behuk:
I'm not excusing the disruption, but I suspect a number of providers would simply not report these sort of issues. I think it's fair to say that it's quite unusual for a residential broadband provider to (e.g.) invite their customers to email support for a RFO.

Well given how transparent they say they are, they are now managing the bad PR by hiding the blip chart on their status page and not publicly saying what the issues are, but you can email in, perhaps sign an NDA before they tell you :)

We do apologise to customers affected by the problems this afternoon. Please feel free to email [email protected] for a 'Reason for Outage'

Blip chart is accessible still here: https://control.aa.net.uk/blip.cgi

Perhaps this is a new issue with their Firebrick 9000? Did they ever fix the random crashing issue or are they still using the last firmware that was stable?

Haven't they also replaced all their older but stable LNSs with the 9000s now?

Thankfully I'm an ex-customer, having already been put off by the last lot of troubles, and am currently on 173 days of uptime without a single issue, counted from the day of switchover from AA to Unchained.

Edited by E300 (Thu 24-Oct-24 09:10:39)

behuk
(regular) Thu 24-Oct-24 09:42:08

Re: Poor uptime and reliability

[re: E300] [link to this post]

In reply to a post by E300:
In reply to a post by behuk:
I'm not excusing the disruption, but I suspect a number of providers would simply not report these sort of issues. I think it's fair to say that it's quite unusual for a residential broadband provider to (e.g.) invite their customers to email support for a RFO.

Well given how transparent they say they are, they are now managing the bad PR by hiding the blip chart on their status page and not publicly saying what the issues are, but you can email in, perhaps sign an NDA before they tell you

We do apologise to customers affected by the problems this afternoon. Please feel free to email [email protected] for a 'Reason for Outage'

Blip chart is accessible still here: https://control.aa.net.uk/blip.cgi

Perhaps this is a new issue with their Firebrick 9000? Did they ever fix the random crashing issue or are they still using the last firmware that was stable?

Haven't they also replaced all their older but stable LNSs with the 9000s now?

Thankfully I'm an ex-customer, having already been put off by the last lot of troubles, and am currently on 173 days of uptime without a single issue, counted from the day of switchover from AA to Unchained.

I'm not sure how effectively they've "hidden" the blip chart given it's still available via Google and their wiki. I'm struggling to find the Unchained blip chart though? ;)

E300
(committed) Thu 24-Oct-24 10:23:52

Re: Poor uptime and reliability

[re: behuk] [link to this post]

In reply to a post by behuk:
I'm not sure how effectively they've "hidden" the blip chart given it's still available via Google and their wiki. I'm struggling to find the Unchained blip chart though?

I did say they had hidden it on their service status page, the obvious place their customers would look if they are having issues, it was always my first stop when I had problems so I knew it wasn't just me and I didn't need to start messing with my own kit. Why remove it now? Why ask people to email in for an explanation of the outage? I can only assume they are managing the PR situation.

Their openness was one of their unique selling points, if that is vanishing along with reliability, then what do they offer over and above any other average ISP?

My current ISP never claimed to have a blip chart, no other ISPs do as far as I know, still I've had no outages with my current ISP that needed me to check any sort of blip chart.

clnsp2
(newbie) Thu 24-Oct-24 11:43:17

Re: Poor uptime and reliability

[re: E300] [link to this post]

I contacted them regarding it and received an open and detailed summary, as usual.

E300
(committed) Thu 24-Oct-24 12:53:24

Re: Poor uptime and reliability

[re: clnsp2] [link to this post]

In reply to a post by clnsp2:
I contacted them regarding it and received an open and detailed summary, as usual.

Congrats on your first post and welcome to the forum.

Chrysalis
(legend) Thu 24-Oct-24 13:16:32

Re: Poor uptime and reliability

[re: perlen] [link to this post]

The LNS power loss was a data centre power feed issue, and the rebalance after was a domino effect from that.

It has been very stable since I signed up, hopefully this packet loss issue gets resolved soon.

Most other ISPs would be telling users to reboot their equipment without even disclosing a problem.

Also, there is still a link to the blip page on the customer control panel page; it's there to help customers see quickly whether a problem is on their own side or not.

Edited by Chrysalis (Thu 24-Oct-24 13:20:34)

E300
(committed) Thu 24-Oct-24 13:26:46

Re: Poor uptime and reliability

[re: behuk] [link to this post]

In reply to a post by behuk:
I'm not sure how effectively they've "hidden" the blip chart given it's still available via Google and their wiki. I'm struggling to find the Unchained blip chart though?

They've hidden it more effectively now as it's been taken down completely, perhaps it will reappear when things are working better. ;)

Edit: looks like only available if you log into the control panel, so not for public consumption anymore.

Edited by E300 (Thu 24-Oct-24 13:29:07)

perlen
(newbie) Thu 24-Oct-24 17:50:56

Re: Poor uptime and reliability

[re: E300] [link to this post]

Yeah it seems to be hidden now as you say, and also no longer updating. Less than ideal.
Still no update posted regarding this morning's issues...

perlen
(newbie) Thu 24-Oct-24 21:48:02

Re: Poor uptime and reliability

[re: perlen] [link to this post]

My graph from this morning:

https://ibb.co/bvmrkck

Edited by perlen (Thu 24-Oct-24 21:48:36)

XGS_Is_On
(experienced) Fri 25-Oct-24 20:04:28

Re: Poor uptime and reliability

[re: perlen] [link to this post]

In reply to a post by perlen:
My graph from this morning:

https://ibb.co/bvmrkck

Yeah, that tracks.

Pipexer
(eat-sleep-adslguide) Sun 27-Oct-24 21:27:14

Re: Poor uptime and reliability

[re: perlen] [link to this post]

I would suspect given the lack of detail regarding the outage on Thursday, as well as the removal of the blip graph, it might have something to do with an ongoing DDoS. This is just speculation though and I have no evidence for this.

The lack of updates on the FB9000 bug fix is disappointing though, as I thought the idea was that we were going to be kept in the loop on that.

Andrews & Arnold Home ::1 on Draytek 2862ac - Why settle for inferior?

perlen
(learned) Sun 27-Oct-24 21:56:52

Re: Poor uptime and reliability

[re: Pipexer] [link to this post]

Maybe we need to take precedent from "Please feel free to email [email protected] for a Reason for Outage" and move on to "Please feel free to email [email protected] for Progress with FireBrick 9000 firmware".

:)

XGS_Is_On
(experienced) Mon 28-Oct-24 00:48:08

Re: Poor uptime and reliability

[re: Pipexer] [link to this post]

In reply to a post by Pipexer:
I would suspect given the lack of detail regarding the outage on Thursday, as well as the removal of the blip graph, it might have something to do with an ongoing DDoS. This is just speculation though and I have no evidence for this.

There's a person on the right track.

Chrysalis
(legend) Mon 28-Oct-24 14:42:25

Re: Poor uptime and reliability

[re: perlen] [link to this post]

From what I can see from the last update made, there is no new firmware; they rolled back to the stable factory build and are running on that for the foreseeable future. It's doing the job and they have deployed enough of them to handle the load in that configuration.

That's my interpretation of what they have disclosed on it.

perlen
(learned) Mon 18-Nov-24 14:51:23

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

Another DDOS just now?

https://ibb.co/Dk5VbNf

Pheasant
(eat-sleep-adslguide) Mon 18-Nov-24 15:00:52

Re: Poor uptime and reliability

[re: perlen] [link to this post]

In reply to a post by perlen:
Another DDOS just now?

https://ibb.co/Dk5VbNf

I've seen exactly the same BQM today on two other very separate connections of mine, so I don't believe this is down to A&A...

Rhynchelma
(member) Mon 18-Nov-24 15:23:48

Re: Poor uptime and reliability

[re: perlen] [link to this post]

Got the same pattern on my A&A connection. Same time.

But, as Pheasant says…

Pheasant
(eat-sleep-adslguide) Mon 18-Nov-24 15:40:17

Re: Poor uptime and reliability

[re: perlen] [link to this post]

In reply to a post by perlen:
Another DDOS just now?

https://ibb.co/Dk5VbNf

Asking for a friend...

BuckleZ
(knowledge is power) Mon 18-Nov-24 17:13:05

Re: Poor uptime and reliability

[re: perlen] [link to this post]

I'm on BT in the north of Ireland and got the same spike...

https://www.thinkbroadband.com/broadband/monitoring/...

BT Full Fibre 900 via ASUS RT-AX88U (Asuswrt Merlin)
Speedtest.net
IPv4 BQM

Chrysalis
(legend) Tue 19-Nov-24 11:06:55

Re: Poor uptime and reliability

[re: Pheasant] [link to this post]

I didn't notice until I saw this thread, but yeah, I have it.

Given it's also on Zen and IDNet, it looks like it is not specific to AAISP, as you said.

perlen
(learned) Tue 19-Nov-24 12:07:25

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

Oh no! More FB firmware updates are on the way:

https://aastatus.net/42728

The work outlined below will start from Saturday 23rd November.

Background:
Our FireBrick team has been working on the 'hang' problem that we faced with the LNSs earlier in the year.

The nature of the problem has made investigating the problem very time consuming as it is extremely difficult to reproduce. However, we do believe that a plausible cause has been identified, and code changes have been made to mitigate the problem.

We have been testing this new code, both in our test lab and on a few select A&A routers, for over two months. During this time the new code has not caused the hardware to hang, where older versions of the code did.

Our next step is to run the new code on our LNSs, the ones our customers connect to for their broadband connections.

We plan to do this slowly, out of hours and in a couple of phases.

We believe the cause of the hang is related to how memory is initially allocated for the tasks the FireBrick will be performing. This means that if the hardware is going to hang then this will most likely happen over the first couple of days (or first couple of hours).

Stage one:

We plan to upgrade only one of our LNSs at first. We will move broadband connections on to it in the early hours of the morning and then move them back off a few hours later. This means that during the day, customers will be on the normal set of LNSs.

Then, each night, over the course of two weeks, the LNS will be power cycled and we will move an increasing number of connections over, until it is at the point of taking twice the amount of connections that we'd normally run on an LNS. (We normally run LNSs at around 40% capacity, so twice the number of connections is not a problem.)

Stage two:

Once we have confirmed that the hang is not happening, the second phase would be to run customer connections on the upgraded LNS for a few days at a time.

We will go through a cycle of: move connections off, reboot the LNS, move connections on, wait a few days. Repeat. We will do this with an increasing number of connections until it's at the point of taking a normal amount of connections.

More information and to opt out:

So as to minimise impact to customers, the work of moving connections off and on will happen overnight between 1AM and 5AM.

As mentioned, this phase of upgrading involves only one LNS being upgraded. This will be the one named 'i.gormless'. The connections that will be moved on to 'i.gormless' will be those currently on the LNS named 'h.gormless'. If you are currently on 'h.gormless' (as seen on the top/left of your line quality graph) and want to opt out, then please email support.

Once this phase has been completed, we will review and plan the next stages.
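
To put numbers on the stage one capacity reasoning in the notice, here is a small Python sketch. The 40% normal load, the "twice normal" target and the two-week duration come from the notice; the even, linear ramp from night to night and the function name are assumptions for illustration only, as the notice does not say how quickly the numbers will be increased.

```python
def nightly_load_plan(nights=14, normal_load=0.40, target_multiple=2.0):
    """Return the planned load (as a fraction of LNS capacity) for each night.

    normal_load and target_multiple are taken from the status notice.
    The even, linear ramp across the nights is an assumption.
    """
    target = normal_load * target_multiple
    if target >= 1.0:
        raise ValueError("target load would meet or exceed LNS capacity")
    return [target * night / nights for night in range(1, nights + 1)]

plan = nightly_load_plan()
for night, load in enumerate(plan, start=1):
    print(f"night {night:2d}: {load:.0%} of capacity")
```

The point of the check is the one the notice makes in passing: doubling a 40% load gives 80%, which still leaves headroom on a single LNS.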

Michael_Chare
(knowledge is power) Tue 19-Nov-24 12:26:21

Re: Poor uptime and reliability

[re: Pheasant] [link to this post]

+1
Gigaclear in Kent which is connected to Vodafone and Plusnet VDSL in Anglesey.

Michael Chare

candlerb
(knowledge is power) Tue 19-Nov-24 14:26:43

Re: Poor uptime and reliability

[re: perlen] [link to this post]

In reply to a post by perlen:
However, we do believe that a plausible cause has been identified

Sure I've heard that before somewhere :)

jaydub
(fountain of knowledge) Tue 19-Nov-24 21:00:20

Re: Poor uptime and reliability

[re: perlen] [link to this post]

We had several blips in the early hours of Saturday:

My Broadband Ping.

Since Monday our TBB monitor has looked like this one from today (and we were out of the house from approx 7am to 7pm today):

My Broadband Ping

Chrysalis
(legend) Wed 20-Nov-24 09:13:25

Re: Poor uptime and reliability

[re: jaydub] [link to this post]

Did you ask them about Saturday? All I can say is that my Saturday graph is clean. The 2nd graph looks like something automated running, like a speedtest.

Sun4Lw5LIQy
(newbie) Sun 26-Jan-25 11:55:46

Re: Poor uptime and reliability

[re: Chrysalis] [link to this post]

I just wanted to write in and say that uptime has improved considerably. AAISP did a rollback on their LNS firmware, they figured out what was causing the crashes and they rolled out new firmware in the end that’s stable. I’ve had a solid connection for most of December/January and I’ve remained a customer of AA. Just wanted to chime in and say things have improved.

perlen
(learned) Sat 22-Feb-25 22:03:58

Re: Poor uptime and reliability

[re: Sun4Lw5LIQy] [link to this post]

I have now left Andrews and Arnold, and I am glad that I have done so.
My AAISP FTTP connection was the worst experience I have ever had with home broadband - the reliability/uptime was worse than even my old ADSL connections!
I am now with EE and have 5ms pings to major sites as opposed to 9ms pings with AAISP, and 1600/120 for £15 less than AAISP with 900/110.
I have had zero outages with EE for the last 3 months.
AAISP treating its broadband customers as beta testers for firebrick firmware is unforgivable.

Rhynchelma
(member) Mon 24-Feb-25 00:43:19

Re: Poor uptime and reliability

[re: perlen] [link to this post]

That's the wonder of the Internet, your experience does not echo others. Sure they had a blip but it was dealt with transparently and has been sorted.

Meanwhile their tech support is excellent.

Still worth the premium price to me and others.

jaydub
(fountain of knowledge) Mon 24-Feb-25 11:17:08

Re: Poor uptime and reliability

[re: Rhynchelma] [link to this post]

Perlen according to his post has been with EE for three months, so it is a bit of a dated comment really.

I too have had no issues for the last three months and TBH my only temptation to move away from A&A is the Aquiss half price for 6 months offer.

I agree with you: they had a blip which took some time to resolve, but that is history now.

XGS_Is_On
(experienced) Mon 24-Feb-25 11:44:38

Re: Poor uptime and reliability

[re: jaydub] [link to this post]

Folks pay their money and make their choice. FTTP has eroded some of the A&A USP but there are plenty of other attractions that appeal to their customers. I can't say I understand posting after months of silence to announce a departure, or posting at all if it's not relevant to others, but that's life.

Chrysalis
(legend) Mon 24-Feb-25 18:09:27

Re: Poor uptime and reliability

[re: perlen] [link to this post]

As XGS said, there was no need to announce it 3 months later; people move between ISPs all the time. You were clearly not at ease though, so for your own peace of mind you moved.

But in terms of your latency, the biggest thing affecting that will likely be the wholesale backhaul your connection is going over and your geographical location. For reference, here is my typical ping to a UK location on AAISP using cityfibre national backhaul. As an example, I know I get higher pings over BT wholesale backhaul, as their routing isn't optimal for my city.

Pinging cloudflare.com [104.16.132.229] with 32 bytes of data:
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59

Ping statistics for 104.16.132.229:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 4ms, Maximum = 4ms, Average = 4ms
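
If anyone wants to compare latency figures like these across ISPs, a small script can pull the numbers out of the ping output instead of eyeballing them. This is only a sketch: the regular expression and the function name are my own choices, and it only handles the English-language Windows `ping` format shown above.

```python
import re
from statistics import mean

# Reply lines copied from the ping output above.
SAMPLE = """\
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
"""

def round_trip_times(ping_output: str) -> list[int]:
    """Extract per-reply round trip times in ms (matches time=4ms and time<1ms)."""
    return [int(ms) for ms in re.findall(r"time[=<](\d+)ms", ping_output)]

times = round_trip_times(SAMPLE)
print(f"replies={len(times)} min={min(times)}ms max={max(times)}ms avg={mean(times)}ms")
```

Running the same script against saved output from two different connections gives a like-for-like comparison, bearing in mind that backhaul and location matter more than the ISP name.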

GoWest
(regular) Mon 24-Feb-25 19:52:33

Re: Poor uptime and reliability

[re: jaydub] [link to this post]

Delete 6 months: insert 3 months.

Realalemadrid
(experienced) Tue 25-Feb-25 09:51:42

Re: Poor uptime and reliability

[re: GoWest] [link to this post]

You obviously haven't seen the post on the 21st of Feb in this thread.

6 months half price

jpm
(fountain of knowledge) Tue 25-Feb-25 11:07:20

Re: Poor uptime and reliability

[re: XGS_Is_On] [link to this post]

I took out their business L2TP service last week and within 45 minutes of opening a support ticket I had the routed /29 that they say is available, and everything is working great. I'm happy to keep recommending them to people who have the budget.

GoWest
(regular) Tue 25-Feb-25 11:11:58

Re: Poor uptime and reliability

[re: Realalemadrid] [link to this post]

I stand corrected. That change didn’t last long!

jaydub
(fountain of knowledge) Tue 25-Feb-25 21:37:37

Re: Poor uptime and reliability

[re: GoWest] [link to this post]

In reply to a post by GoWest:
I stand corrected. That change didn’t last long!

Exactly! Temptation renewed as a result.
