|
|
Is anyone else getting bored of the constant blips and glitches with FTTP AAISP?
Lines on the z.witless LNS have dropped at least 8 or 9 times in the last two months since I joined:
https://aastatus.net/recent.cgi
I had less interruption in two years of el cheapo FTTC with TalkTalk.
Unfortunately I am in month 2 of a 12 month contract... less than ideal, and certainly not business class broadband.
Edited by perlen (Wed 10-Jan-24 23:17:51)
|
|
|
Yes, it's not gone unnoticed here either; I think I managed 35 days' uptime until work to update the LNS routers started again this year. The issues are well explained on their service status page, which goes some way to me being more accepting of the problems.
The drops overnight don't worry me as I'm usually in bed and luckily everything comes back up without issue by itself. However, there were a couple of random crashes last year during the day which were a bit more disruptive, and I think that is the problem they are still trying to fix.
I notice they've reported another random crash on the new firmware caused by a different bug, so presumably we will be having the drops and shuffles to different LNSs for more updates in the coming weeks. On top of that, BT wanting people moved from a connection caused more drops. So it's been a bit of a week of ups and downs; it's not usually as bad as this. Looking at my BQM the drop overnight barely registered; I've had more packet loss with other ISPs just because they run congested.
I understand what you mean by expecting something better, it has been a bit flaky of late.
Edited by E300 (Thu 11-Jan-24 09:33:48)
|
|
|
I am on a TalkTalk Wholesale line and my line was down for 4 hours last night:
https://aastatus.net/42599
Not the biggest deal, as I was fast asleep, but I did have various things complain about lack of the internet overnight. I do wonder if someone / something should have noticed faster than 4 hours...
|
|
|
|
|
TalkTalk have said that this was them.
AAISP do not have 24/7 staffing as I recall.
But the other stuff is irritating.
Not what you would expect.
Edited by Rhynchelma (Thu 11-Jan-24 09:56:10)
|
|
|
|
Another LOS this morning:
Z.Witless
Jan 11, 09:20 AM
The Z.Witless LNS restarted (again) this morning at 09:20 causing PPP drops for customers connected to it. Investigations underway.
Due to the various different problems we've had with Z.Witless we will take this unit out of service and replace it.
|
|
|
|
That's a shame. I was lucky I guess, I moved off about 4 months ago. The as-yet unreleased status of the LNS devices used for the high speed FTTP services did make me a bit nervous when I signed up, but it never proved to be an issue in the time I was with AAISP. It does seem like they've had an unusually large number of gremlins in recent times. Fingers crossed for swift resolution.
|
|
|
Issues...
Edited by perlen (Thu 11-Jan-24 22:02:44)
|
|
|
They're released now, though the odds of anyone other than A&A using them as an LNS seem low regardless of stability issues. Very much made with them in mind.
Edited by XGS_Is_On (Fri 12-Jan-24 08:04:26)
|
|
|
|
Ah ok, must be relatively recent. I thought they were still unreleased when I migrated off.
The spec always seemed a bit odd to me (only 2x 10G ports) given that Openreach are now selling >Gbit services.
|
|
|
The spec always seemed a bit odd to me (only 2x 10G ports) given that Openreach are now selling >Gbit services.
Just FYI, the two 10G ports seem to share a 10G backplane: they can't do 10G in and 10G out; there are two for resilience.
List price £7,500 + VAT.
|
|
|
Looks like another drop, just cut off a video call. Not good.
|
|
|
|
Even worse than I thought...
|
|
|
Z.Witless is now in service and on new hardware - I was moved back on to it at 16:33 today.
Unfortunately base latency (was 9ms now 11ms) has increased by 2ms
|
|
|
Y.Witless crashed today also...
INITIAL
4¼ hours ago by Andrew
At 13:16 Customers on Y.Witless dropped and reconnected a few minutes later. The cause is being investigated as a matter of high priority.
UPDATE
55¾ minutes ago by Andrew
Y.Witless crashed again at 16:30.
RESOLUTION
28 minutes ago by Andrew
We've updated the main post regarding these drops: https://aastatus.net/42577
------------------
UPDATE
28¾ minutes ago by Andrew
An update of where we are (Friday 12th January).
Some customers have had interruption to their service this week as we have seen a number of crashes on both Z.Witless and Y.Witless.
Today we replaced the hardware of Z.Witless.
Our developers have been working on investigating each crash we have. We have been saying in recent updates that progress had been made on the crashes we have seen, and this week we applied the software update to two of our three 'Witless' LNSs. In our test lab we have never seen this updated software crash during 3 weeks of testing. However, we have had crashes this week since applying the updated software.
Usually with a crash, our developers are sent a crashlog with details specifying exactly where in the code the crash happened. However, the crashes that have been affecting us are different in that the hardware locks up and restarts - with this type of crash we have less forensic to work with which is making getting to the bottom of the problem that much harder.
We are still working hard to resolve this. We have various avenues of investigation to take, and during the next week we will be planning more overnight work as well as datacentre trips.
We know how disruptive this has been for those customers affected, and we are doing all we can to work towards a stable service for everyone.
|
|
|
Z.Witless is now in service and on new hardware - I was moved back on to it at 16:33 today.
Unfortunately base latency (was 9ms now 11ms) has increased by 2ms 
Have they explained why they are doing this sort of work at 5pm instead of scheduling overnight changes?
|
|
|
I see the same thing: a 1.5 to 2ms increase in latency if I connect to Z.Witless. I think this is because it is in a different data-centre to X and Y, so the routing changes.
Also I've found on BT Wholesale that latency can vary by plus or minus 2ms for me, as there is some variation in the back-haul routes, but a few drops of PPP will usually see it come back up on the shorter route.
Sometimes when we get these overnight drops, I can see a 4 or 5ms increase in latency the next morning because I've landed on Z.Witless plus got a longer routing on BT backhaul, which just doesn't feel right going in the wrong direction even though it doesn't make any noticeable difference.
I've seen the differing latency of a couple of ms via BT Wholesale with my previous ISPs as well, so it's just one of those things, some sort of load balancing.
|
|
|
Have they explained why they are doing this sort of work at 5pm instead of scheduling overnight changes?
This was another crash of an LNS and so connections fell over to Z.Witless. It's easy to spot these on the blip graph: https://aastatus.net/index.cgi#blip If it had been a controlled move over, we would see a red blip below the green blip, but because the LNS has just crashed, it doesn't update the graph with any disconnections; only re-connections show up.
So that's two crashes today a few hours apart.
At least they are open about the issues and we know what's going on, and so as customers we aren't thinking is it our kit or wasting time rebooting our own routers etc.
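The blip-graph rule of thumb above (a crash shows re-connections with no matching disconnections, while a controlled move shows both) can be written down as a tiny check. This is only a sketch: the function name and the idea of feeding it two counts are assumptions for illustration, not anything A&A publish as an API.

```python
def classify_blip(disconnects: int, reconnects: int) -> str:
    """Guess what kind of event a blip was from two hypothetical counts.

    disconnects: lines logged as dropping (the red blip on the graph)
    reconnects:  lines logged as coming back (the green blip on the graph)
    """
    if reconnects > 0 and disconnects == 0:
        # A crashed LNS never gets to record the drops, so only
        # the re-connections appear.
        return "crash"
    if reconnects > 0 and disconnects > 0:
        # A planned move records both sides of the event.
        return "controlled move"
    return "no event"


print(classify_blip(disconnects=0, reconnects=1200))    # crash
print(classify_blip(disconnects=1200, reconnects=1200)) # controlled move
```

The counts in the usage lines are made up; the point is only the shape of the rule.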
Edited by E300 (Fri 12-Jan-24 18:32:58)
|
|
|
And another:
Jan 26, 07:00 AM
https://aastatus.net/42612
6AM: Z.Witless LNS had a hardware lock-up, causing lines on it to drop and reconnect
|
|
|
And another:
Jan 26, 07:00 AM
https://aastatus.net/42612
6AM: Z.Witless LNS had a hardware lock-up, causing lines on it to drop and reconnect
I had a drop overnight due to a "Lost Carrier", which suggests it was BT work as I'm on X.Witless. Z.Witless I think is new hardware now with extra debug logging, so having that crash on Z.Witless might be good news in a way, as they may find out what is causing it.
|
|
|
|
I had a lost carrier just before 1am. Couple of minutes outage.
I'm on gormless then aimless judging by the traceroute.
I do like their naming scheme. I wonder if they have a feckless?!
|
|
|
Second time today:
AFFECTING
Z.Witless
STARTED
Jan 26, 10:30 PM
https://aastatus.net/42613
Z.Witless hardware locked-up at 10:30 this evening, causing those lines on it to drop and reconnect.
I am getting a bit sick of all these drops, could I get out of my remaining 10 months contract do you think?
|
|
|
It is unfortunate you seem to be on Z.Witless which appears to be affected more so than the other two LNS's, although that could just be a quirk of randomness and then us seeing a pattern that gets disproved over time. It's also bad luck you've joined just when these problems started and so have not known anything better.
I would suggest getting in touch with their technical support and raising a ticket, nothing will happen otherwise. I'm not sure 'legally' this would be enough to get you out of the contract, as all services can have problems and time needs to be allowed for companies to rectify them plus the issue is a blip rather than a long lasting outage, and these services come with no SLAs or guarantees of up time. Whether as a goodwill gesture they would let you leave early is of course only something they can tell you.
It has been a bit disappointing of late, but A&A are probably more disappointed than we are, and at least we know what is happening and are kept up to date.
|
|
|
|
Just had another unexpected outage, I had to reboot my router.
My connection now terminates with u.gormless.thn.aa.net.uk which unfortunately seems very laggy.
|
|
|
Just had another outage, when it was back up I logged into the control pages.
A pin has been added: "LNS Kill requested by andrew"
The status page https://aastatus.net/42617 explains:
INITIAL
3¼ hours ago by Andrew
At 14:00 The Y.Witless LNS locked up, causing customers connected to it to drop and reconnect a few minutes later.
UPDATE
31 minutes ago by Andrew
Most lines reconnected by 14:11
UPDATE
29¼ minutes ago by Andrew
Some customers had re-connected to the "U.Gormless" LNS - which doesn't have as much throughput capacity as the Witless LNSs - in order to ease congestion we will manually force some customers to move off U.Gormless by way of a PPP kill - this will force the customer's router to reconnect causing a short outage (typically less than a minute)
RESOLUTION
6½ minutes ago by Andrew
The lock-up of Y.Witless was unfortunate as it did cause a disruption to some of our customers this afternoon and we had hoped that the work done a week ago to Y.Witless would have helped prevent this hang. However, Y.Witless is out of service and in its locked state, where our developers can connect to its CPUs and see if they can gain more information.
I am now back on z.witless.thn.aa.net.uk
Edited by perlen (Sat 03-Feb-24 17:29:51)
|
|
|
|
I'd probably be off at this point. A nice idea to run your own hardware but it seems like it's not working as planned.
|
|
|
And it goes down again!
Right in the middle of a Teams meeting the Mrs was in (WFH)
REFERENCE
42618 / AA42618
PERMALINK
https://aastatus.net/42618
INFORMATION
At around 17:20 the Z.Witless LNS hung, causing customers on it to drop and reconnect.
Edited by perlen (Mon 05-Feb-24 17:37:44)
|
|
|
Sorry Perlen, and others affected by this.
The post https://aastatus.net/42608 has more information about the problem.
Due to the work we did last week we have been able to gain more information about today's (and Saturday's) lockup than we have been able to in the past. This low-level data comes from the CPU, memory and other hardware on the system whilst it's in the 'hung' state. This is being analysed and is providing clues, but work analysing this is still ongoing.
|
|
The above post has been made by an ISP REPRESENTATIVE (although not necessarily the ISP being discussed in the post).
|
|
|
|
"We are not only an Internet Service Provider. We also design and build our own routers under the FireBrick brand."
I do wish you success in fixing this. Although I'm not an AAISP customer, it does sound like the brand risks becoming badly tainted by this saga.
I wonder if the older generation Firebricks are still around? Could these be used for live customer traffic, whilst the newer generation are used with an opt-in pool of beta testers?
|
|
|
I wonder if the older generation Firebricks are still around? Could these be used for live customer traffic, whilst the newer generation are used with an opt-in pool of beta testers?
As I understand it the older generation Firebricks are still in use but for slower customers (<80Meg). When I first joined I found myself connecting to the older LNS's (Gormless) even though I was a 1000/100 customer. How did I know? Speeds were only up to around 300Meg. This was resolved very quickly by contacting them and a reconnect saw me on the new LNS's and speeds as expected. So it seems to me the older kit struggles with faster connections and so can't be used as a fallback for faster services.
They are now upgrading some of the older LNS's to the newer ones, they said the idea being a drop of one would not affect as many customers. I'm not sure that logic works though as if the LNS's are all equally as prone to locking up, then it doesn't matter how many of them there are, lockups will affect the same number of customers, just in smaller batches over more bits of kit. Perhaps they need the extra capacity as while they debug locked up boxes they remain out of use.
I'm sure it will all be sorted out soon now they have some debug data from the locked up boxes. Be interesting to find out the cause.
Edited by E300 (Tue 06-Feb-24 08:44:09)
|
|
|
|
The older generation firebricks have a 1Gb backplane and core connection, hence no good for the faster connections, whereas the latest firebrick 9000 has a 10Gb backplane and core connection. What I find puzzling is that of the 3 original witless 9000s (X,Y,Z), whilst Y and Z have locked up fairly regularly, X (which I happen to be connected to) has an uptime of around 82 days. Assuming they are all running the same software/firmware then does this imply some very low down hardware/component/pcb build issue?
|
|
|
The older generation firebricks have a 1Gb backplane and core connection, hence no good for the faster connections, whereas the latest firebrick 9000 has a 10Gb backplane and core connection. What I find puzzling is that of the 3 original witless 9000s (X,Y,Z), whilst Y and Z have locked up fairly regularly, X (which I happen to be connected to) has an uptime of around 82 days. Assuming they are all running the same software/firmware then does this imply some very low down hardware/component/pcb build issue?
I'm now on X as well which also gets me the lowest latency, hopefully there I remain
Yes I would think if all else is the same it seems to suggest some low level hardware issue, which could be anything of course. I'm sure they've compared the differences between X and the others to try and pinpoint what might be up. X might well have the same issue, but differences in component tolerances might just be enough to keep it far enough away from whatever 'cliff edge' the other hardware falls over.
I'm glad I'm not the person having to sort it out!
|
|
|
They have recently moved customers off of gormless a, b & c with the intention of replacing these with more FB9000's tomorrow.
https://aastatus.net/42616
|
|
|
Assuming they are all running the same software/firmware then does this imply some very low down hardware/component/pcb build issue?
As someone who manufactures Pcb's on a daily basis, I think it's unlikely to be a pcb build issue.
The Firebricks are manufactured on a very small scale so they won't do small test samples from batches. The pcb manufacturer will fully test each board including AOI (automatic optical inspection) and a whole suite of electrical testing (including things like flying probe testing).
Most hardware issues come from components added to the finished pcb.
The Firebrick PCB is nowhere near as complex as some of the other circuit boards made today (smaller tracks, tighter spacing, many more layers, thousands more tiny via's between layers).
I have no idea where Firebrick have their PCB's made but I know they are made in the UK.
Pretty much all UK based PCB manufacturing is low volume, high precision, quick turnaround work. Anything with high volume gets made overseas.
I would put my money on it being a software issue causing them to hang or an issue with a component on the board (chip/ram etc).
|
|
|
They have recently moved customers off of gormless a, b & c with the intention of replacing these with more FB9000's tomorrow.
https://aastatus.net/42616
Let me check I understand this. They're moving all remaining customers off the original stable and reliable LNSes, onto the new ones which have a crash rate of 2 in 3? Before they've fully diagnosed the problem, which they can't reproduce in the lab, but only happens when live customers are put on them?
|
|
|
They have recently moved customers off of gormless a, b & c with the intention of replacing these with more FB9000's tomorrow.
https://aastatus.net/42616
Let me check I understand this. They're moving all remaining customers off the original stable and reliable LNSes, onto the new ones which have a crash rate of 2 in 3? Before they've fully diagnosed the problem, which they can't reproduce in the lab, but only happens when live customers are put on them?
That's certainly how I'm reading it.
Beta testing is usually voluntary but I guess not always.
Edit: don't think it's all customers, but certainly 3 Firebricks worth
On the status page linked further up by Andrew they write
The enlarged pool of LNSs will also reduce the number of customers affected if there is a lock-up of one LNS.
That only makes sense if they deploy more FB9000's to the current customers on those devices. Throwing in more customers from the older, stable Firebricks will just mean those customers are now more likely to be affected.
Why not add the extra FB9000's, spreading out the witless LNS load, but leave the stable gormless LNS's as they are? Probably down to the cost of rack space.
I don't intend on lecturing A&A on their network 😂 they are smarter than me. It just seems a bit backwards. Credit for the transparency though.
Edited by j0hn83 (Tue 06-Feb-24 13:24:39)
|
|
|
Let me check I understand this. They're moving all remaining customers off the original stable and reliable LNSes, onto the new ones which have a crash rate of 2 in 3? Before they've fully diagnosed the problem, which they can't reproduce in the lab, but only happens when live customers are put on them?
As I understand it, they are decommissioning several Gormless LNSs (on the stable hardware) and moving the customers over to other Gormless LNSs still on stable hardware, as they said they have plenty of capacity for slower customers (<80Meg).
I suspect they are doing this to free up rack space at the datacentre to then install a few more newer LNSs. Only customers already needing to connect to the newer LNSs (>80Meg) will connect to these upgraded ones. So no extra customers are using the troublesome new LNSs than they were before. The idea is if an LNS drops, then as there are fewer customers on each LNS then not so many people are affected by a single LNS crash.
I guess you could argue over whether it is a good idea to add more problematic kit into the mix. Also, if the new LNSs have the same probability of a crash, you aren't having fewer customers affected, just smaller batches of customers dropping more often.
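To put rough numbers on that last point, here is a minimal sketch. It assumes every LNS of the new type crashes at the same rate and that customers are spread evenly; the customer count and crash rate are invented purely for illustration, not A&A figures.

```python
def expected_impact(total_customers, n_lns, crashes_per_lns_per_month):
    """Return (customers hit per crash, crashes per month, customer-drops per month).

    All inputs are hypothetical; the model assumes an even spread of
    customers and an identical, independent crash rate on every LNS.
    """
    customers_per_lns = total_customers / n_lns
    crashes_per_month = n_lns * crashes_per_lns_per_month
    customer_drops = crashes_per_month * customers_per_lns
    return customers_per_lns, crashes_per_month, customer_drops


# Same 9000 customers, same per-box crash rate, pool doubled from 3 to 6.
print(expected_impact(9000, 3, 2))  # (3000.0, 6, 18000.0)
print(expected_impact(9000, 6, 2))  # (1500.0, 12, 18000.0)
```

Under these assumptions, doubling the pool halves the size of each drop and doubles how often a drop happens somewhere, so the total number of customer-drops per month is unchanged. Any one customer sees the same number of drops either way.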
Edited by E300 (Tue 06-Feb-24 13:20:26)
|
|
|
|
I guess another possibility is that the issue could be load related. The FB9000 was first introduced into the AA network in early 2022. I don't have any recollection of these lock ups being an issue during the first 12 months or so. Since early 2022 I would assume that a lot of the installed base would have migrated from ADSL/VDSL to FTTP. This is probably why A&A are now oversubscribed with the older FBs, and no doubt the load being placed on the 9000s will be inexorably increasing, both in number of connections and total throughput. Possibly the move to expand the pool of 9000s is to rule out this possibility.
|
|
|
|
Please excuse my intrusion into others' grief, but I have a simple question that may be completely unrelated. I have Fibre BB from another supplier and have no intention of moving it to A&A. However I am considering moving just my current DV line to A&A. Is any of the current trouble likely to have any impact on my phone connection?
|
|
|
|
No, it will be completely affected. The equipment being discussed, known as LNS or BNG, is what terminates broadband connections coming into their network. The traffic between their voice servers and the Internet won't go through these.
|
|
|
As I understand it, they are decommissioning several Gormless LNSs (on the stable hardware) and moving the customers over to other Gormless LNSs still on stable hardware, as they said they have plenty of capacity for slower customers (<80Meg).
This is correct.
We have 20 or so of the FireBrick FB6000 LNSs which have more than enough capacity for the lower speed customers we have.
Once we've replaced a, b, c, d Gormless with FireBrick FB9000s they will then be put in to service for the faster-speed customers.
This isn't because the existing LNSs are overloaded (we don't believe load is the cause of the lock-ups), but to spread the load so fewer customers are affected.
Happy to answer more questions if there are any!
|
|
The above post has been made by an ISP REPRESENTATIVE (although not necessarily the ISP being discussed in the post).
|
|
|
No, it will be completely affected. The equipment being discussed, known as LNS or BNG, is what terminates broadband connections coming into their network. The traffic between their voice servers and the Internet won't go through these.
I think you mean: No, it will be completely unaffected
|
|
The above post has been made by an ISP REPRESENTATIVE (although not necessarily the ISP being discussed in the post).
|
|
|
|
Andrew, just to clarify, I wasn’t suggesting the 9000s were overloaded - there would have been obvious performance issues had that been the case. My point was simply that a presumably increasing load may have exposed the weakness. Good luck anyway - I’m sure the issue will be resolved.
|
|
|
Many thanks for your reassurance. OK, back to your regular grievances.
|
|
|
This is correct.
We have 20 or so of the FireBrick FB6000 LNSs which have more than enough capacity for the lower speed customers we have.
Once we've replaced a, b, c, d Gormless with FireBrick FB9000s they will then be put in to service for the faster-speed customers.
This isn't because the existing LNSs are overloaded (we don't believe load is the cause of the lock-ups), but to spread the load so fewer customers are affected.
Happy to answer more questions if there are any!
Thank you for the update. With regards to X.Witless which appears to be pretty much rock solid, is that different somehow to the other Firebricks and if so has that been able to narrow down the potential issue?
|
|
|
I think you mean: No, it will be completely unaffected 
Err, yes of course! Too late to edit post now though...
|
|
|
Assuming they are all running the same software/firmware then does this imply some very low down hardware/component/pcb build issue?
...snip...
I would put my money on it being a software issue causing them to hang or an issue with a component on the board (chip/ram etc).
Given the symptoms described in the status post, I'd be inclined to suspect something wrong with the pullup/pulldowns on the CONFIG pins in the M.2 socket for an NVMe SSD.
If the CPU reads all 4 CONFIG pins as 1s, it should ignore the rest of the pins in the socket; however, if one of them is marginal, and sometimes drops to a 0, the SoC should generate an interrupt and expect the CPU to change pinmux settings to match the CONFIG pins. I can envisage a few tight timing cases where the pinmux setting change causes the CONFIG pins to change again, and the resulting timings tickle bugs that "can't happen" if your design complies with PCIe spec (presence pair are the last two pins to connect, presence must be debounced, delay between presence connecting and hotplug being asserted), or M.2 (no hotplug allowed).
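For anyone unfamiliar with "debounced" in the paragraph above, here is a generic sketch of the idea: a change on a pin is only accepted once it has been seen for several consecutive samples, so a marginal pin that briefly flickers to 0 is ignored. This is an illustration of debouncing in general, not FireBrick's firmware or anything taken from the PCIe or M.2 specifications; the sample values and the threshold of 3 are assumptions.

```python
def debounce(samples, stable_count):
    """Return the accepted (debounced) state after each raw sample.

    samples:      sequence of raw 0/1 readings from a single pin
    stable_count: how many consecutive differing readings are needed
                  before the new level is accepted (assumed value)
    """
    state = samples[0]  # start from the first reading
    run = 0             # consecutive samples disagreeing with `state`
    out = []
    for s in samples:
        if s == state:
            run = 0     # glitch ended, forget it
        else:
            run += 1
            if run >= stable_count:
                state = s  # change has persisted long enough, accept it
                run = 0
        out.append(state)
    return out


# A single-sample glitch to 0 is ignored; a sustained change is accepted.
print(debounce([1, 1, 0, 1, 1, 0, 0, 0, 1], stable_count=3))
# [1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Without a filter like this, every flicker would look like a real change of state to whatever is watching the pin.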
|
|
|
|
For an ISP to be this open and transparent is unheard of.
Most other ISPs would just stay silent, in the hope that you go away, making you think BT or TalkTalk is the culprit. In the meantime, trying to fix things in the background.
The benefit here is that it’s all in house (or they work very closely with Firebrick) and from what has been put out, it seems they are competent and confident in what they do.
Although I’m sure it is annoying, at least you can fall over to another LNS.
My advice is to stick with a company who clearly know what they are doing. All companies have a rough patch, and this is one of them, I’m sure with your support they’ll be able to ride through it.
I’ve used their L2TP service in the past with Starlink and found it great.
|
|
|
|
Excellent point.
|
|
|
Just lost my SSH session to work...
https://aastatus.net/42629
Lines on the X.Witless LNS were affected, sessions are recovering.
Edited by perlen (Tue 27-Feb-24 12:35:37)
|
|
|
|
Not only that - my long standing connection to x.witless was abruptly ended just before 02:00 this morning after 2 drops in quick succession. These are clearly shown on the blip graph but have not yet been commented upon. Now on c.gormless.
|
|
|
I had my connection drop due to a local power problem; up to then I had spent tens of days on the more reliable x.witless, but I connected back up to one of the new gormless LNS's, which then crashed a few days later. I was a bit annoyed not being on x.witless anymore as that seemed pretty stable; alas, it seems it just had a good run.
There don't seem to have been any updates for a while about the debug logs they've captured and whether they are nearer to a fix; perhaps no news means they are no nearer.
I wonder if they have a plan B, for example buying in some alternative hardware?
They say on the service status page that x.witless likely crashed due to not having the newer software and an NVMe drive fitted, yet ironically it stayed up for a very long time, and LNSs already upgraded with an NVMe drive fitted have reached nowhere near that length of uptime before crashing. So fitting an NVMe drive doesn't appear to have improved stability that I can see.
Edited by E300 (Tue 27-Feb-24 13:27:15)
|
|
|
I wonder if they have a plan B, for example buying in some alternative hardware?
There's plenty of COTS hardware capable of doing this job, but switching to some other make of core router would be the worst possible advertising for the Firebricks. That may be why they're still persisting with them.
|
|
|
|
A dual-vendor strategy could be wise. Along with an opt-in beta testers group.
|
|
|
Some overnight work tonight (early hours of 1st March) so some drops and shuffling about again, they are also separating out CityFibre and BT/TalkTalk customers so we are on separate LNS's and don't mix
They've not said if they've found a problem and the software update is a fix for the drops, still it's not easy trying to fix something you can't replicate at will.
https://aastatus.net/42630
|
|
|
|
Was with AAISP for 3 months via City Fibre and I can honestly say it was the most unreliable, expensive, overrated ISP I have ever been with. I changed over to IDNet and I have had absolutely no problems at all, plus I'm saving money. Happy days.
|
|
|
Was with AAISP for 3 months via City Fibre and I can honestly say it was the most unreliable, expensive, overrated ISP I have ever been with. I changed over to IDNet and I have had absolutely no problems at all, plus I'm saving money. Happy days.
It might be cheaper and more stable but you don't have Continuous Quality Monitoring anymore.
|
|
|
It might be cheaper and more stable but you don't have Continuous Quality Monitoring anymore.
I'd take a £13 a month reduction & a stable connection over quality monitoring any day of the week. If I'm really that interested in monitoring the quality of my connection I would just use the free ones out there..
Edited by jalzoo (Wed 06-Mar-24 15:11:13)
|
|
|
Still not fixed...
https://aastatus.net/42636
Customers on the X.Witless LNS dropped and reconnected at 11:35 today.
|
|
|
|
Indeed not. However they have split off City Fibre connections from the rest. I'm on BTW and currently connected to g.gormless. So possibly BTW/TTB connections are on the 4 gormless LNS with City Fibre on the 3 witless lns. Further conjecture is the thought that maybe the issue is triggered by CF. Looking at the history, the issue does appear to have started after taking on CF. Just a thought.
|
|
|
Indeed not. However they have split off City Fibre connections from the rest. I'm on BTW and currently connected to g.gormless. So possibly BTW/TTB connections are on the 4 gormless LNS with City Fibre on the 3 witless lns. Further conjecture is the thought that maybe the issue is triggered by CF. Looking at the history, the issue does appear to have started after taking on CF. Just a thought.
I did wonder as well if they thought the City Fibre traffic was somehow causing a bug, hence the separation of traffic. One thing seems clear, given X.Witless had the longest uptime out of all of them without an NVMe drive, and with an NVMe drive it has crashed with a very short uptime and all the others appear no more stable with an NVMe drive either, suggests that isn't playing a part in the stability, or, it fixes a bug they've seen with artificial load testing, but that isn't the same bug causing the issues on live.
|
|
|
So possibly BTW/TTB connections are on the 4 gormless LNS with City Fibre on the 3 witless lns.
I did wonder as well if they thought the City Fibre traffic was somehow causing a bug, hence the separation of traffic. One thing seems clear, given X.Witless had the longest uptime out of all [...]
Counterexample: Openreach FTTP here, on x.witless until yesterday, then reconnected to z.witless.
Crash messes up BQM, too, with history from preceding midnight to crash time becoming inaccessible. Reliability and BQM are two unique selling points of AA, and both are negatively affected by these crashes. I'm sure they are aware of this, but IMNSHO the steps taken to fix this so far haven't worked, therefore different steps would now be advisable.
|
|
|
Counterexample: Openreach FTTP here, on x.witless until yesterday, then reconnected to z.witless.
Crash messes up BQM, too, with history from preceding midnight to crash time becoming inaccessible. Reliability and BQM are two unique selling points of AA, and both are negatively affected by these crashes. I'm sure they are aware of this, but IMNSHO the steps taken to fix this so far haven't worked, therefore different steps would now be advisable.
So in that case, that blows the theory that City Fibre connections might somehow be triggering a bug, if that was what they were thinking, as X.Witless appears to be an Openreach LNS.
Seems to be a lack of updates currently, presumably because they are no further forward.
|
|
|
Goodbye x.witless, thank you for your service:
Mar 15, 07:40 AM
https://aastatus.net/42643
1½ hours ago by Andrew
At 7:30AM, the X.Witless restarted causing customers on it to drop and reconnect.
UPDATE
1 hour ago by Andrew
Lines reconnected by 7:33
RESOLUTION
1 hour ago by Andrew
This was related to https://aastatus.net/42608 This LNS is now out of service and will be analysed by our developers.
|
|
|
The irony is X.Witless was the most reliable LNS, they then put an NVMe drive in it like the others as they suspected the slot left empty was causing some instability, now X has been the least reliable just recently.
We've had no updates either for a while so that kind of points to them not being any further forward. I really want to be hearing about plan B's now as this has gone on for months.
|
|
|
Seems it was a rough night, as I awoke to a string of emails of link up and down; turns out it was due to BT testing some links: https://aastatus.net/42648 Seems pretty poor of BT to do that kind of testing with no notice. Found myself on t.gormless with latency up to 9.5ms from 7.5ms the day before, which was on g.gormless I think. I did a drop and reconnect this morning and am back on x.witless with latency at 6.5ms, which feels a lot nicer. For some reason only x.witless gets my latency that low.
It also seems that at some point this week new firmware is going onto the LNS boxes that may contain a fix for the lockups. Currently the LNS boxes are running older but stable firmware, hence we've not had any crashes recently; that's according to the latest update at https://aastatus.net/42647 The status note isn't clear but it seems to imply they are all being updated. I would hope they are just updating one or two first in case it makes things worse and not better. Fingers crossed it fixes things.
Edited by E300 (Tue 26-Mar-24 10:22:04)
|
|
|
I think they were doing LNS full upgrades last night according to the updates on https://aastatus.net/42647
The last two updates were:
UPDATE
1 day ago by Andrew
These upgrades are now scheduled to happen this week (Tues 26th through to Thursday 28th) from 3AM.
UPDATE
1¾ hours ago by Andrew
Due to BT Planned work on one of our hostlinks, https://aastatus.net/42640 we will not be scheduling any upgrades for the early hours of Wednesday 27th March.
So I think they are only avoiding doing the upgrades this evening, but were scheduled to do them last night and there may be further to come tomorrow night.
I guess this might have been disrupted by the BT link issue, but we won't know for certain until there is a further update on: https://aastatus.net/42647
|
|
|
Well we had drops yesterday (Sunday) which don't seem to get flagged up now on the service status page, that put me onto a gormless LNS due for an update which then meant I got knocked off again overnight! Currently sitting on y.witless.
Not sure what the current status is for the latest fixes as the information seems a bit fragmented on the status page, but I think I read it as they've had to roll back to the last firmware that was stable, so haven't yet found the cause.
|
|
|
Yes, 10am Sunday I lost connection too.
No explanation from AAISP:
https://aastatus.net/recent.cgi
|
|
|
Another drop last night before 1am. Not sure what the cause was as nothing on the service status page. It seems drops are becoming the norm and so don't get a mention anymore on the service status pages.
It's been about 5 months since the problems started and no closer to a fix it seems. We keep going through cycles of upgrades then downgrades all of which cause more drops on top of the random crashes.
I would suggest they move to just having one live LNS they test the fix on for the crashes and keep the rest on the stable firmware, and allow their customers to choose to be BETA testers and connect to this test LNS. In order to get a decent number on it for a real world test, as an incentive give people a discount to become a BETA tester. That way they could also update this LNS more frequently and during more sociable hours, as I bet the person having to do these updates and downgrades is fed up with doing it in the dead of night.
Edited by E300 (Thu 11-Apr-24 08:33:57)
|
|
|
|
For a company that tries to pride itself on transparency the ball really has been dropped recently. The FireBrick platform has some good features but the trade off is an unstable internet connection at a premium cost. I’m rooting for the team to get things fixed but they might have to go back to the drawing board and admit the current platform isn’t going to work for customers. Noticeable speed drops, connection drops and lack of transparency are making me heavily reconsider who I go with next. It won’t be A&A.
|
|
|
For a company that tries to pride its self on transparency the ball really has dropped recently. The FireBrick platform has some good features but the trade off is an unstable internet connection at a premium cost. I’m rooting for the team to get things fixed but they might have to go back to the drawing board and admit the current platform isn’t going to work for customers. Noticeable speed drops, connection drops and lack of transparency is making me heavily reconsider who I go with next. It won’t be A&A.
Yes I agree and I'm out of contract and wondering how long I stick with it. It is starting to irk me that I'm paying a premium but getting that less and less reflected in the product and service. This last drop like the one before has gone completely unacknowledged and no real updates on this issue in a while now.
Edited by E300 (Thu 11-Apr-24 18:03:07)
|
|
|
|
Couldn't agree more. Unfortunately I'm still under contract but things will have to improve for me to stay. For all we know the latest incident could have happened on the "more stable" software. The lack of any update is very concerning.
|
|
|
|
I'm not an AAISP customer, and I'm very unlikely to consider them in future after this.
It sounds to me like AAISP now have a tough decision to make. They have to decide whether they are primarily a router hardware vendor, with an ISP on the side to act as a large group of (paying) beta testers; or primarily an ISP, whose job is to provide top-rate Internet connectivity.
If they want to be the latter, they have to acknowledge that Firebrick isn't currently "best of breed" when it comes to LNS/BRAS, and they need a second vendor to provide actual service while they sort out their problems, or at least to roll back to the older hardware.
The decision to remove the older, slower but reliable LNSes (which were providing service to the customers on lower speed connections), and replace them with a model which is faster but known to be unstable, was madness IMO.
|
|
|
|
It's at least time to put out a blog post with an update as to where they are currently and what the plans for resolving this look like. I think the Firebrick is an ARM appliance so they can't even run the software on something else while they debug the hardware.
|
|
|
The decision to remove the older, slower but reliable LNSes (which were providing service to the customers on lower speed connections), and replace them with a model which is faster but known to be unstable, was madness IMO.
Those on the slower, more reliable FB6000 LNS's (nicknamed gormless I believe) were moved to other FB6000's. The FB6000's were swapped with the newer troublesome FB9000's (nicknamed witless).
So nobody was moved from an FB6000 to an FB9000, but it added additional FB9000's to spread the witless LNS load.
So that particular move was sensible in my opinion, though the whole saga seems a bit of a s*** show.
Essentially everyone on a package above 80Mb is a beta tester.
|
|
|
but it added additional FB9000's to spread the witless LNS load.
I wonder if they needed more of these LNSs because the firmware they are calling factory stable may be a very early one? From memory of status updates over the last 16 months or so, the early firmwares were not optimised for the new processors (something to do with only running on one core or not being multi-threaded enough), but were stable.
So if they had to go back to this non-optimised but stable firmware, then it would explain why they have had to throw more boxes at it to make up for the lower performance, as they presumably have more customers on the faster packages now and also have CityFibre as well, with symmetrical connections they didn't have before.
That isn't what they told us at the time; they suggested more boxes would mean fewer people would be affected by a lock up. But more boxes with the same probability of crashing would, over time, see the exact same number of customers affected, so I couldn't see the logic in that.
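The arithmetic behind that objection can be sketched as below. This is only a toy model: it assumes customers are spread evenly across the boxes and that every box hangs independently at the same rate. The figures (12,000 customers, 0.5 hangs per box per month) and the function name are made up for illustration and are not A&A numbers.

```python
def expected_drops_per_month(total_customers, boxes, hangs_per_box_per_month):
    """Expected customer disconnections per month caused by LNS hangs.

    Assumes an even spread of customers and an identical, independent
    hang rate per box (illustrative assumptions, not A&A data).
    """
    customers_per_box = total_customers / boxes
    hangs_per_month = boxes * hangs_per_box_per_month
    return hangs_per_month * customers_per_box


for boxes in (4, 8, 16):
    per_hang = 12_000 / boxes
    total = expected_drops_per_month(12_000, boxes, 0.5)
    print(f"{boxes:2d} boxes: {per_hang:6.0f} customers per hang, "
          f"{total:.0f} customer drops per month")
```

Under these assumptions each individual hang hits fewer people as boxes are added, but hangs happen proportionally more often, so the monthly total does not change. Extra boxes only help if the hang rate per box falls with load, which the thread has no data on.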
They've now had an L2TP router lock up (https://aastatus.net/42655) and drop a lot of customers. They blamed that on an early FB9000 prototype and are replacing (or have replaced) it with a new box, but I'm sure they said the same about the LNSs and replaced that hardware with production kit, which we know didn't fix the issue. It would seem these new FireBricks are just not stable, full stop, and I really hope they have a plan B.
Of course with the lack of any up to date information we are all reading things into what is going on and perhaps not coming to the correct conclusions.
Edited by E300 (Fri 12-Apr-24 14:48:23)
|
|
|
Those on the slower, more reliable FB6000 LNS's (nicknamed gormless I believe) were moved to other FB6000's. The FB6000's were swapped with the newer troublesome FB9000's (nicknamed witless).
So nobody was moved from an FB6000 to an FB9000, but it added additional FB9000's to spread the witless LNS load.
Ah yes, thank you: re-reading the thread it was made clear earlier on. There is a slightly smaller FB6000 pool as a result, and slightly less headroom/redundancy.
|
|
|
Just to add, there is some more information here https://social.aa.net.uk/public/local covering the issues and troubleshooting. It would be good if they added this link to the service status pages; it's a bit more chatty and verbose in the information it provides.
|
|
|
Hi all,
Thanks for all the feedback given in this thread.
We do appreciate it, and we know that our recent reliability for some customers has been unacceptable. I wanted to set out a bit more of the story, mainly for transparency rather than because we expect it to be "mitigation" in most people's minds.
This post refers to and updates a status post originally made at:
https://aastatus.net/42608
This is where our two roles; that of both an ISP with broadband customers, and also that of a hardware manufacturer meet each other head-on and, unfortunately and uncomfortably, collide.
To be abundantly clear, we are very sorry for the outages some customers have suffered. This falls below the standards we set ourselves. We are not happy about it, and a lot of effort is going into sorting it.
The story since
---------------
Several plausible causes have been found, fixed and tested in our testing process (before deploying live). Many of these will have fixed genuine problems, but not solved what appears to be the "main" issue.
Almost all of these have been at the meeting point between hardware and software. The problem with a hardware hang is that far less diagnostic information is available to assist with debugging.
On several go-arounds now, we have genuinely believed that the issue had been found and fixed, tested in our test-rig offline, and therefore we were keen to place the firmware in active use; the thought being that the sooner it was rolled out, the sooner the unreliability would disappear.
But then, some time after being put live, an FB9000 would suffer another hang. The nature of the hang has been unpredictable (i.e. when it would happen); sometimes taking days or weeks to surface. Meanwhile, until it did hang, we still believed the problem had been solved.
"Why not Cisco?"
----------------
Some customers have quite reasonably asked why we do not employ (even temporarily) a 3rd party hardware vendor as our LNS supplier, such as Cisco. This is an option, but the costs of implementation (in time and money) we still feel would be better spent on active R&D to resolve this problem.
We do still believe strongly that the FB9000, when stable, offers us features that distinguish our service from the service of almost all others. Simply, we want bonding, CQM graphs, low power consumption, etc.
It is part of what makes our ISP offering different and better; our USP.
Other issues
------------
Within this same time frame, we have had multiple instances of BT Wholesale doing planned work which they had not told us about in advance (and apparently not told other ISPs, too). We could have zeroed the impact of their planned work, had they told us they were doing it beforehand.
Multiple times we have raised this with our account manager and at higher levels, and we still have not had a satisfactory response. Of course, no wholesale network is 100% reliable; we are not unreasonable about this, but the combined appearance, especially to customers not following matters closely, is that it's "another LNS blip". Unlucky timing, which would be bad any time, but happens to be far worse just now.
A change of plan
----------------
Historically, our October "Factory" firmware has been stable. The hangs we have seen have all occurred in releases prior to that one, or since that one. That release did have at least one major fix in it, addressing a hardware hang (the PCI/NVMe issue).
Our immediate decision is to therefore put all "live" production FB9000 hardware back onto the October "Factory" release, except for our test LNS. To this end, we have already rolled back almost all live LNSs.
Assistance requested if you're willing
--------------------------------------
We invite and encourage customers who do want to assist with the process of fixing this to prepend "test-" onto their login, which will steer them to the test LNS, and help the effort to fix the problem. Of course this may be less stable than our regular LNS. Email support for more details.
Rounding up
-----------
Hopefully this post shows we are listening, that there is a vast amount of work going on, and that we've taken a different approach, recognising that this state of affairs has remained too long and cannot be carried on.
I recognise that this level of openness is uncommon, but the situation we are in is uncommon; I doubt any other ISP develops its own core equipment.
I politely request that this post is taken for what it is; a genuine offer to :
* explain in more depth
* announce a change of direction
- and -
* apologise for the outages
... and not as an invitation to simply slag off everything we do.
Nothing we do happens by accident or because of a lack of thought, or a lack of awareness, or a cavalier approach to customer well-being. Decisions sometimes do prove to be wrong, but decisions *are* made, and made with the best of intentions.
There are human beings writing the code.
There are human beings in our Ops and Support teams.
And there are human beings managing the business.
Nobody takes this in any other way than "extremely seriously".
Thanks for taking the time to read this, and we are happy to answer any questions, of course.
--- B
---
Bloor
GM, A&A.
Edited by aabloor (Fri 12-Apr-24 16:47:38)
|
|
|
|
Hi Alex, regarding:
"To this end, we have already rolled back almost all live LNSs."
Why wasn't this announced/warned about on A&A Status Page?
A few of us have seen unplanned stuff affecting multiple users without knowing the reason.
Thanks.
|
|
|
|
I am not an A&A customer but I have to commend you on your unexpectedly transparent and complete response. It's what sets you apart from the masses and I hope you get to the bottom of this blip.
|
|
|
Thanks for the update.
One question: why update all the LNSs when you think you have fixed it? I would suggest picking the LNS that seems to have had the most lockups and updating just that single one would be the prudent thing to do, then deciding on a period of time it must run crash free before you say it's fixed, and only when it has proved itself updating the others. The overnight drops for firmware upgrades/downgrades only add to the perception of things being even more broken.
I will log into the test LNS later today for the weekend, but during the week due to working from home I'll switch back.
I doubt any other ISP develops its own core equipment.
I think you are finding out why they don't
Edited by E300 (Fri 12-Apr-24 17:38:49)
|
|
|
|
Are all of these dropouts the fault of A&A? Are any of them Openreach or whoever?
|
|
|
Are all of these dropouts the fault of A&A? Are any of them Openreach or who ever?
Some recent drops were a problem created by suppliers, they typically happen overnight.
The problem we are discussing here has been an issue since November last year, where randomly and often during the day the connection has dropped for a quantity of customers because of a hardware problem with A&A's Firebrick they design themselves.
Apart from the recent BT issue that caused a lot of drops over a single night, I don't recall any other drop that wasn't caused by A&A and this issue, or due to A&A having to upgrade or downgrade firmware in order to try and fix it. To put things into perspective, the drops were not very often, I perhaps had one every few weeks, but less so now as they've reverted to stable firmware.
Edited by E300 (Fri 12-Apr-24 17:59:48)
|
|
|
Obviously your situation is highly specialist and unique and so I suspect might have some bearing on the following thought, but, this is getting somewhat high profile and I am wondering why there has been no mention or consideration of getting specialist 3rd parties in to put a 2nd pair of eyes on the situation? I could imagine there would be no value in help with the software, but couldn't the hardware have an issue which a specialist could possibly identify?
I know this is potentially highly costly and often does not yield anything productive, however I know based on situations at work where I have been dealing with an outage or problem, if I couldn't make progress on something so high impacting after a certain period of time, I'd be exploring means of getting a 3rd party in even as means of simply reassuring customers and end users.
FWIW, I've not been affected whatsoever by these problems (either it's because I'm on <80Mbps or I just haven't noticed), however, I have been following it out of curiosity and the lack of mention of getting external help is something that to me has been noticeably missing or at least not addressed.
Andrews & Arnold Home ::1 on Draytek 2862ac - Why settle for inferior?
|
|
|
Are all of these dropouts the fault of A&A? Are any of them Openreach or who ever?
Some recent drops were a problem created by suppliers, they typically happen overnight.
The problem we are discussing here has been an issue since November last year, where randomly and often during the day the connection has dropped for a quantity of customers because of a hardware problem with A&A's Firebrick they design themselves.
Apart from the recent BT issue that caused a lot of drops over a single night, I don't recall any other drop that wasn't caused by A&A and this issue, or due to A&A having to upgrade or downgrade firmware in order to try and fix it. To put things into perspective, the drops were not very often, I perhaps had one every few weeks, but less so now as they've reverted to stable firmware.
Thanks for the clarification.
|
|
|
Thank you
Those with long memories will know that this isn’t the first time AAISP have had the “why don’t you just use Cisco like everyone else?” thing thrown at them and there do seem to be some good reasons.
|
|
|
|
Pretty limited utility to the "average" punter, though. The bandwidth accounting features are useful to AAISP with their commercial model, but not really to the customer; other ISPs simply don't charge for bandwidth. Much of the functionality from the CQM side can be achieved using 3rd party services (arguably even more useful, as it takes into account issues in the ISP's transit). And the bonding services must be dying a death now for residential, with rubbish ADSL lines rapidly being something folk don't need to contend with, though maybe more useful to businesses for failover (I think central bonding is a bad idea for that use case, though, better using distinct network providers with local failover).
I raised the question of the stability of the (then beta) LNS HW around a couple of years ago when I joined AAISP, I was a little concerned it could be an issue. As it would happen, over the period I was a customer the LNS behaved impeccably; I left because no matter the quality of service, the anxiety over the quota system and trying to juggle changing quotas to use up my allowance at minimum cost just annoyed me too much.
I'm with Unchained now, who use Cisco for their LNS; performance is great (900Mbps line rate single threads to Clouvider iperf services in UK and often EU) - and no quotas at lower cost than the least expensive AAISP service. Small outfit, great personal service. I certainly don't miss any features from the FB9000 I'm (no longer) connected to. I think it is a tenuous benefit for many, and in hindsight my concern was probably founded in common sense (and I got lucky over my tenure).
|
|
|
Pretty limited utility to the "average" punter, though. The bandwidth accounting features are useful to AAISP with their commercial model, but not really to the customer, other ISPs simply don't charge for bandwidth. Much of the functionality from the CQM side can be achieved using 3rd party services (arguably even more useful, as it takes into account issues in the ISP's transit).And the bonding services must be dying a death now for residential, with rubbish ADSL lines rapidly being something folk don't need to contend with, though maybe more useful to businesses for failover (I think central bonding is a bad idea for that use case, though, better using distinct network providers with local failover).
I raised the question of the stability of the (then beta) LNS HW around a couple of years ago when I joined AAISP, I was a little concerned it could be an issue. As it would happen, over the period I was a customer the LNS behaved impeccably; I left because no matter the quality of service, the anxiety over the quota system and trying to juggle changing quotas to use up my allowance at minimum cost just annoyed me too much.
I'm with Unchained now, who use Cisco for their LNS; performance is great (900Mbps line rate single threads to Cloudvider iperf services in UK and often EU) - and no quotas at lower cost than the least expensive AAISP service. Small outfit, great personal service. I certainly don't miss any features from the FB9000 I'm (no longer) connected to. I think it is a tenuous benefit for many, and in hindsight my concern was probably founded in common sense (and I got lucky over my tenure).
Yes there was a decent spell without issues. I agree that with FTTP there is less of a need for the bespoke monitoring they have which I've never needed. Perhaps they should move to tried and already tested third party kit for their FTTP customers as it seems a bit cheeky to use their customers as unpaid beta testers. Their unique selling point, ironically, is the reason I'm now looking at other options.
Their usage caps, well I'm borderline between the two options so just opt for the higher tier so I don't find myself checking usage towards the end of the month and rationing it. Given the issues recently I don't find myself quite as okay with paying over the odds, yet I don't want to go back to metering my connection by moving to their lower tier. Another reason to look elsewhere.
Unchained might be a contender. I did see on their website mention of line bonding and "Firebrick" was mentioned, so I did wonder if they were using Firebricks throughout. Is it definitely Cisco they use? It would be very annoying if after a few months they upgraded their Firebricks or the firmware and they get the same problems.
Edited by E300 (Sun 14-Apr-24 09:37:21)
|
|
|
We do still believe strongly that the FB9000, when stable, offers us features that distinguish our service from the service of almost all others. Simply, we want bonding, CQM graphs, low power consumption, etc.
It is part of what makes our ISP offering different and better; our USP.
Hey Alex,
Appreciate the rest and it's admirable, you make great points and I hope they are accepted with grace and good manners, but a couple of questions for my own interest on the quoted section and another little bit.
Bonding: you folks are only putting customers on 300 Mbit+ onto the FB9000, indicating they are on FTTP. Is there much of a market for bonding FTTP? The only use case I can think of is a very high end residential/SME using you guys across 2 different physical FTTP networks, so needing CityFibre and Openreach FTTP available to them. Using Openreach FTTP via BT Wholesale and TalkTalk Business is a use case however the protection is substantially reduced to the point where an active-backup solution may as well be used.
For capacity aside from racking up multiple Openreach services for higher upload due to their gross asymmetry I can't see another use case. I'm thinking there won't be that many desiring such capacity using your services as, as I recall, your usage per subscriber is very much towards the lower end of the market.
Regardless bonding can be done without PPP on carrier and not so carrier kit as I'm sure you folks are aware. Commodity servers are amazing at LNS duty, FPGA even better, ASIC even better still.
CQM had enormous value during copper times. The only folks on the 9000 are on FTTP. Issues with performance across your carriers are very few and far between now. Utilisation can be measured at the BNG, latency and loss rely on preferential treatment of LCP and without it may be measured out of band with similar accuracy so what customer benefit do you see going forward?
Another related question: are you folks going to use PPP indefinitely, given a major feature relies on it? Openreach deliver Ethernet, CityFibre deliver Ethernet, at some point BT Wholesale will deliver Ethernet (they wanted to move away from PPP when GEA started), and TTW etc will follow. Are you folks going to be having the CPE putting customer data in PPP purely for the LCP echoes to keep CQM running?
Lower power consumption is super important but per subscriber how are the numbers?
I note that, according to the website, the FB9000 has a pair of 10GbE ports, the only two on each appliance, that seem to share a single 10 gigabit backplane to the CPU. When you folks start selling 1.8 and 2.3 Gbit services, the symmetrical ones especially are going to be problematic: a customer uploading 2.3 Gbit/s is taking 2.3 Gbit/s of downstream capacity from the backplane and, as we know, saturating downstream for any length of time is hard; saturating upstream, which you folks do not meter, isn't. With relatively skinny pipes and relatively high burst compared to the pipes, your LNSs end up looking less like LNSs and more like transport links with a high burst-to-sustained ratio. You can mitigate this with racks and racks of LNSs, as you did with the smaller models, leaving fewer subscribers per chassis, but where does that leave power consumption per subscriber, the metric that matters?
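To put rough numbers on the backplane point, here is a small sketch. It assumes, as the website description seems to suggest but does not confirm, that upload and download traffic both draw on one shared 10 Gbit/s budget. The function name and the traffic figures are hypothetical.

```python
def shared_headroom_gbps(backplane_gbps, upload_gbps, download_gbps):
    """Capacity left over if all traffic shares a single budget.

    The shared-budget model is an assumption about the FB9000, not a
    documented fact.
    """
    return backplane_gbps - (upload_gbps + download_gbps)


# Three symmetric 2.3 Gbit/s customers uploading flat out, plus
# 2.0 Gbit/s of ordinary aggregate download from everyone else.
headroom = shared_headroom_gbps(10.0, 3 * 2.3, 2.0)
print(f"Headroom left for everyone else: {headroom:.1f} Gbit/s")
```

In this model a handful of sustained uploads is enough to use up most of the box, which is why the number of subscribers per chassis (and hence power per subscriber) becomes the interesting figure.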
With the advent of DTT going IPTV the base load on networks is going to increase substantially and with higher burst products to continue to never be the bottleneck you're potentially going to have to provision a lot of kit and have relatively few subscribers on each chassis. As I mentioned above I remember your usage announcements per subscriber being at numbers many ISPs would envy as they were running at twice or more those numbers. DTT and other services going all-IP will close that gap a ton and your usage will jump reducing customers per appliance even more.
The FB9000 is brand new, can't be purchased yet, what kind of lifespan do you folks see for it?
I think there's a big asymmetry inherent in this part:
This is where our two roles; that of both an ISP with broadband customers, and also that of a hardware manufacturer meet each other head-on and, unfortunately and uncomfortably, collide.
Without using Firebrick the ISP can still be bloody good. IMHO, without CQM, excellent reactive and proactive support is super important: I don't use you folks because FTTP doesn't really break often so high level support isn't really needed, I have an excellent altnet as my primary service, a personal friend provides my backup and regardless I have my own monitoring due to my job, else you guys would be my backup. Without the ISP using Firebrick, Firebrick is probably not so healthy, putting a fair amount of pressure on the ISP to continue using Firebrick regardless and ensure the business remains viable.
On another note thank you again for my Ignis: he remains in the background of every work Zoom and Teams call as he's just the cutest <3
Edited by XGS_Is_On (Sun 14-Apr-24 12:37:26)
|
|
|
|
A selling point of A&A is the L2TP service that you can use if your primary service goes down. I don't know how easy it would be to provide that sort of functionality if you remove the "session" element of PPPoE and resort to DHCP. Maybe the easiest way to achieve it would be to have the L2TP and FTTP service allocate different IP addresses and relying on the end user to run a routing protocol to use their subnet on whatever connection they decided they wanted it on.
|
|
|
A selling point of A&A is the L2TP service that you can use if your primary service goes down. I don't know how easy it would be to provide that sort of functionality if you remove the "session" element of PPPoE and resort to DHCP. Maybe the easiest way to achieve it would be to have the L2TP and FTTP service allocate different IP addresses and relying on the end user to run a routing protocol to use their subnet on whatever connection they decided they wanted it on.
VPN.
|
|
|
Another question related is are you folks going to use PPP indefinitely given a major feature relies on it? Openreach deliver Ethernet, CityFibre deliver Ethernet, at some point BT Wholesale will deliver Ethernet, they wanted to move away from PPP when GEA started, TTW etc will follow. Are you folks going to be having the CPE putting customer data in PPP purely for the LCP echoes to keep CQM running?
Lower power consumption is super important but per subscriber how are the numbers?
That is one of the things I was hoping A&A might have had: the option to use IPoE to avoid needing an overpowered pfSense or OPNsense box to get the 1Gig throughput, due to PPP having to be done in software. It's the sort of thing they should be doing to keep the more technical customer happy.
|
|
|
That is one of the things I was hoping A&A might have had, the option to use IPoE to avoid needing an over powered pfSense or OPNsense box to get the 1Gig throughput, due to PPP having to be done in software. It's the sort of thing they should be doing to keep the more technical customer happy 
The more technical customers (and those prepared to pay A&A rates) will also likely use decent hardware and/or software.
PPPoE at 1Gbps is not hard to achieve in software. It's only an 8-byte extra header; similar work to VLAN tagging (4 bytes).
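For a sense of scale, the header sizes mentioned above can be turned into a rough throughput estimate. This sketch assumes full-size 1500-byte Ethernet payloads and standard Ethernet framing. It only covers bandwidth overhead, not the CPU cost of processing PPP, which is what the rest of this discussion is about. The function and constant names are made up for illustration.

```python
# Per-frame Ethernet cost on the wire, in bytes:
# 14 header + 4 FCS + 8 preamble + 12 inter-frame gap.
ETH_WIRE_OVERHEAD = 14 + 4 + 8 + 12
ETH_PAYLOAD = 1500          # standard Ethernet payload size
PPPOE_OVERHEAD = 6 + 2      # PPPoE header + PPP protocol field


def ip_throughput_mbps(line_rate_mbps, encap_overhead):
    """IP-level throughput with full-size frames.

    encap_overhead is the number of payload bytes consumed by
    encapsulation (0 for plain Ethernet, 8 for PPPoE).
    """
    wire_bytes = ETH_PAYLOAD + ETH_WIRE_OVERHEAD
    ip_bytes = ETH_PAYLOAD - encap_overhead
    return line_rate_mbps * ip_bytes / wire_bytes


plain = ip_throughput_mbps(1000, 0)
pppoe = ip_throughput_mbps(1000, PPPOE_OVERHEAD)
print(f"Plain Ethernet: {plain:.1f} Mbit/s, PPPoE: {pppoe:.1f} Mbit/s")
```

The 8 bytes cost roughly half a percent of a gigabit line with large packets, so any need for beefier hardware comes from how the software handles PPP rather than from the header itself.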
|
|
|
That is one of the things I was hoping A&A might have had, the option to use IPoE to avoid needing an over powered pfSense or OPNsense box to get the 1Gig throughput, due to PPP having to be done in software. It's the sort of thing they should be doing to keep the more technical customer happy 
Think hardware accelerated PPPoE has been in SoCs for a while now, the weaksauce consumer routers ISPs give need it to hit the throughput target. Shouldn't need too much to get to gigabit throughput through software though unless using the software you described which uses a kernel with bad PPP functionality: single core decapsulation only, no multithreading.
Whether an ISP should be doing something as major as arranging the connectivity and installing BNGs to handle a subset of customers with broken software is a tricky one. PPP at worst doubles the cost of handling a packet but on the x86 kit only really an issue with the software you mentioned. The user always has the option to use different software with the same hardware, for free, rather than expecting the ISP to change their network to fit.
No PPP would break the current implementation of CQM, too, which I imagine is a major issue given how attached these folks clearly are to it.
Giving customers the option will probably have to wait until one of their wholesale suppliers announces removal of support for PPP and they've no choice but to use something else: I think the network, hardware and software is very much engineered around PPP at the moment.
|
|
|
The more technical customers (and those prepared to pay A&A rates) will also likely use decent hardware and/or software.
PPPoE at 1Gbps is not hard to achieve in software. It's only an 8-byte extra header; similar work to VLAN tagging (4 bytes).
Remember how much work can be offloaded to a half-decent NIC relative to all-software implementations. NICs can take some of the work handling tagged frames off the CPUs.
The software in the distributions mentioned in the post you're replying to handles PPP on a single thread, while it splits handling of TCP/IP and UDP/IP across all cores. That, I guess, is the discrepancy, and why PPP requires much meatier hardware with those specific software packages.
|
|
|
|
The FTTP900 LNS definitely are Cisco. Unchained have some older Firebrick gear, and do do minimal ping based CQM using a Firebrick to my gateway, but I understand they went with Cisco in order to offer fast FTTP services. If there is something in particular you need to know I'd just reach out to them, they're very responsive.
I was also between quota levels on AAISP, so I would spend one month on the high quota and would then eke out the rollover on the low quota for 2 months. It mostly annoyed me it was so inflexible and blunt and I felt dutybound to not pay for more than I needed if it was being monitored... I don't actually need the £20 it was saving me...
|
|
|
|
No jimbof mate, as Alex (aabloor) said, the Firebrick FB9000 service the FTTP 900Mbit lines - not Cisco.
|
|
|
He’s not talking about AAISP
Andrews & Arnold Home ::1 on Draytek 2862ac - Why settle for inferior?
|
|
|
|
I was responding to E300's enquiry about Unchained ISP's use of Cisco LNS when they also mention Firebricks. I'm aware AAISP use the FB9000s (I was on their FTTP900 service for a little over a year).
|
|
|
|
There are a couple of holdouts where companies are not doing HW accelerated PPP on hardware that is barely capable of doing it at line rate; Ubiquiti being one obvious example. I'd welcome a DHCP / Ethernet based service as I do use Ubiquiti gear and have put an OPNsense box in front of it to do PPP...
This seems like a moot discussion though; the only companies that have done DHCP based FTTx in the UK are those who have their own equipment in the exchanges - so Sky and TalkTalk retail I believe. All the BTW backhaul based consumer services like AAISP are PPP because that is how that service works; the login details used steer the connection to the target ISPs LNS. BTW would have to make significant change to their network to change this, it's not something an ISP can implement on their own.
|
|
|
Think hardware accelerated PPPoE has been in SoCs for a while now, the weaksauce consumer routers ISPs give need it to hit the throughput target. Shouldn't need too much to get to gigabit throughput through software though unless using the software you described which uses a kernel with bad PPP functionality: single core decapsulation only, no multithreading.
Works fine in software but I needed to upgrade from a PC Engines APU2E4 on moving to 1Gig, it would have been fine for the throughput otherwise. Granted in this case the issue is with the software.
It also isn't just FreeBSD-based x64 routers that suffer; all routers will max out at a lower throughput when using PPPoE, even with acceleration in the SoC. Whether that becomes a problem depends on what speed packages a person can get or upgrade to, and how long they want to keep their kit. In many cases the need for faster network ports means an upgrade, but you have routers coming with 2.5 or 10Gbps ports that can't manage more than 1.5 or 2.0Gbps when using PPPoE (the UDM-Pro seems to be one example), but will do significantly more otherwise. Also, there is a price we are paying for this acceleration in the kit we buy, and I believe most of the world is moving away from PPPoE.
A&A do all sorts of techy and niche things, it is their USP after all, so given it's their own kit and their own software they are showcasing, I would have thought it was right up their street.
Edit:
BTW would have to make significant change to their network to change this, it's not something an ISP can implement on their own.
Guess that explains it then. Maybe they have the option with CityFibre in the future?
Edited by E300 (Mon 15-Apr-24 08:51:05)
|
|
|
There are a couple of holdouts where companies are not doing HW accelerated PPP on hardware that is barely capable of doing it at line rate; Ubiquiti being one obvious example. I'd welcome a DHCP / Ethernet based service as I do use Ubiquiti gear and have put an OPNsense box in front of it to do PPP...
This seems like a moot discussion though; the only companies that have done DHCP based FTTx in the UK are those who have their own equipment in the exchanges - so Sky and TalkTalk retail I believe. All the BTW backhaul based consumer services like AAISP are PPP because that is how that service works; the login details used steer the connection to the target ISPs LNS. BTW would have to make significant change to their network to change this, it's not something an ISP can implement on their own.
Your decision to use a vendor selling gateways rather than dedicated routers at your WAN edge with a ton of other stuff sapping the CPU and, hence, throughput. IDS/IDP, managing APs, etc, all takes a toll. Enthusiast kit with big interfaces doesn't guarantee line rate anyways in my experience: believe enabling IDS/IDP really harms throughput too.
A&A also use CityFibre and could use IP directly rather than PPP, however it was more of a forward thinking question on my part. Very aware their primary supplier uses PPP and mentioned that they didn't want to for GEA. At some point it'll go.
No requirement to use PPP to steer traffic to BTW customers: it's not dialup and they know which port on the DSLAM/OLT the connection came in on which is all they need. Map that to the ISP, repackage in different VLANs, EVPN, off it goes.
Edited by XGS_Is_On (Mon 15-Apr-24 10:10:08)
|
|
|
Guess that explains it then. Maybe they have the option with CityFibre in the future?
They do, but until BTW start changing it, is it worth it because a small subset of users have marginal hardware running single threaded PPP? All expense for zero practical gain. Need more of a business case, especially when your network is engineered around PPP.
My question to them was purely on the basis that one of the first things mentioned as a case to continue using Firebricks was CQM and that relies on PPP, as does their bonding code, so is the plan to overlay PPP even when the supplier networks are all IP. Also are they still all in on CQM despite that, in the FTTP world with wholesale supplier performance much better, there's far less of a need for it to detect issues.
Customers blip you get a bunch of notifications when they lose and reestablish from the wholesale supplier and your own BNG. Usage is easily monitored at the BNG.
|
|
|
They do, but until BTW start changing it, is it worth it because a small subset of users have marginal hardware running single threaded PPP? All expense for zero practical gain. Need more of a business case, especially when your network is engineered around PPP.
You keep using the argument of cost or small number of users would benefit, but this ignores the whole ethos of A&A and why people pay extra to join them. I'm sure some made the same arguments back in 2002 when they started implementing IPv6, as in why go to the trouble, no one else uses it, limited websites to connect to, who benefits, what is the business case etc. A&A could turn IPv6 off tomorrow and the Internet would still work so it really wasn't necessary, but they went ahead anyway and were one of the first to do it.
A&A do try to do things differently, try to be ahead of the game, cater for the techies, work with industry to push things along; they advertise as being tech nerds themselves. If they've become like any other ISP, doing nothing unless forced to because it costs money with no practical short term gain, and don't cater for the "tech nerds" by offering anything different or innovative, then what is their selling point, given the premium for their services and their capped usage model that no other ISP has these days? Their "we fix any line" and all the monitoring as a selling point is disappearing as fast as FTTP is appearing.
As I said, this was something I would have expected to have been right up their street, I understand the arguments against doing it for 99.99% of all other ISPs.
Edited by E300 (Mon 15-Apr-24 10:38:14)
|
|
|
No requirement to use PPP to steer traffic to BTW customers: it's not dialup and they know which port on the DSLAM/OLT the connection came in on which is all they need. Map that to the ISP, repackage in different VLANs, EVPN, off it goes.
When you say "no requirement" - you're talking about a hypothetical network setup BT could deploy, right, not something that is available now but ISPs aren't choosing to use?
BTW/Openreach appear to be doing the bare minimum of anything with the FTTP implementation, and there looks to be no steering in place with the BTW ISPs I've used.
I have had 2 BTW connections at the same property on the same PON, and BTW don't even police that a certain ISP PPPoE login is originating from a certain ONT. I could happily connect to either BTW ISP (in this case, AAISP or Unchained) via either of the ONTs that I had, and the connection ended up at the ISP whose login details you used, with no obvious error if you used the wrong ONT. In the case of AAISP, it totally broke their CQM setup, as it looks for the incoming ID it is expecting in order to tie up the data with your account, yet they still allow the PPPoE connection.
|
|
|
A&A do try to do things differently, try to be ahead of the game
Enabling IPv6 is being ahead of the game; making their network more complex to support poor quality client routers in limited scenarios is not.
It's not as if anyone is proposing phasing out of PPP. As others have said, PPP has major advantage for broadband delivery: bonding is one, being able to use L2TP for backup connections is another.
Talktalk only did IPoE because they needed multicast for their old TV service. It's not because they want to make a service for techno-nerds. After all, they're *not* deploying IPv6 either.
|
|
|
Talktalk only did IPoE because they needed multicast for their old TV service. It's not because they want to make a service for techno-nerds. After all, they're *not* deploying IPv6 either.
They didn't have to do it that way though, did they? BT retail also have a multicast IPTV offering while using PPPoE for the client. They could have mirrored that setup I believe.
|
|
|
No requirement to use PPP to steer traffic to BTW customers: it's not dialup and they know which port on the DSLAM/OLT the connection came in on which is all they need. Map that to the ISP, repackage in different VLANs, EVPN, off it goes.
When you say "no requirement" - you're talking about a hypothetical network setup BT could deploy, right, not something that is available now but ISPs aren't choosing to use?
BTW/Openreach appear to be doing the bare minimum of anything with the FTTP implementation, and there looks to be no steering in place with the BTW ISPs I've used.
I have had 2 BTW connections at the same property on the same PON, and BTW don't even police that a certain ISP PPPoE login is originating from a certain ONT. I could happily connect to either BTW ISP (in this case, AAISP or Unchained) via either of the ONTs that I had, and the connection ended up at the ISP whose login details you used, with no obvious error if you used the wrong ONT. In the case of AAISP, it totally broke their CQM setup, as it looks for the incoming ID it is expecting in order to tie up the data with your account, yet they still allow the PPPoE connection.
Openreach wouldn't care about the connections they'd just send both to BTW. They are more than capable of putting the connections into different VLANs on the same 4 port kit, though. I've had, briefly, a 4 port ONT sending 2 to BTW and 1 to Zen GEA.
Yes, I was talking about future capabilities, not current. I can't recall exactly what I was responding to so apologies if out of context. My point was there's no need for PPP to achieve what BTW need to.
|
|
|
|
I get that there's no reason for it to have to work this way, but it's all in BTW's realm, and there's no indication that will ever happen (witness: 20 years of FTTC working this way, using PPPoE). I think the main point is that folk are thinking that by having custom kit or being a niche provider AAISP could do something different and offer an alternative configuration with FTTP via BTW, but the overarching point is they can't, because PPPoE is the way BTW make it work for everyone.
|
|
|
For a company that tries to pride its self on transparency the ball really has dropped recently. The FireBrick platform has some good features but the trade off is an unstable internet connection at a premium cost. I’m rooting for the team to get things fixed but they might have to go back to the drawing board and admit the current platform isn’t going to work for customers. Noticeable speed drops, connection drops and lack of transparency is making me heavily reconsider who I go with next. It won’t be A&A.
Is speed not stable then? Some peeps I know who joined AAISP in 2022 were saying it was great, but that's bad news if speeds aren't sustained anymore on AAISP FTTP.
|
|
|
A selling point of A&A is the L2TP service that you can use if your primary service goes down. I don't know how easy it would be to provide that sort of functionality if you remove the "session" element of PPPoE and resort to DHCP. Maybe the easiest way to achieve it would be to have the L2TP and FTTP service allocate different IP addresses and relying on the end user to run a routing protocol to use their subnet on whatever connection they decided they wanted it on.
You don't need PPP auth for that; I use personal VPN setups authenticating with a certificate, and it can be done in other ways as well. I do echo XGS's questions on the PPP, and with the contents of this thread I have some doubts about my FTTP order with AAISP as well. The thing saving the order right now is that AAISP is only a 1 month commit; if it was the normal 18+ months I think I would have pulled the plug.
|
|
|
Is that with CityFibre then? Openreach is still showing as 12 months on their website.
https://www.aa.net.uk/broadband/home1/
|
|
|
Yes, it's a CF order and it states 1 month on AAISP's website. I will probably get confirmation from sales, but I am pretty sure they already told me it's 1 month as well.
|
|
|
|
Yes, it's shown as only 1 month for CF, I'm sure you are right in that case.
|
|
|
|
I keep getting hopped over different LNS devices and it’s a bit of a lottery. I am paying for 1GB internet and I’m lucky if I get 330MB. Some days it’s 200. I had the same router and ONT with wholesale TalkTalk and I was getting 800MBs. It feels like a downgrade.
|
|
|
Apparently Yayzi dont use PPPoE which has got my attention, so seems there is at least one ISP out there that isnt a Sky, VM or TT that uses IPoE.
|
|
|
I keep getting hopped over different LNS devices and it’s a bit of a lottery. I am paying for 1GB internet and I’m lucky if I get 330MB. Some days it’s 200. I had the same router and ONT with wholesale TalkTalk and I was getting 800MBs. It feels like a downgrade.
Ok I think you have a different issue, if those kind of speeds were common place we would be hearing about it, seems like a potential provisioning issue to me, I would contact AAISP support about it.
|
|
|
|
Just for clarity I also have experienced the LNS stalling issues I was told they are looking into it and they had added NVMe drives to some of the LNS systems to get things resolved.
I have contacted support about the speed issues but have been told it’s my WiFi. Again same unit as my previous ISP. I suspect it’s due to BT Openreach being my carrier instead of TalkTalk Business which had LLU’d the local exchange. I suspect that’s how I’ve been able to get better connectivity in the past. I was originally provisioned to the incorrect LNS by A&A they did fix it but I still get speed drops.
Like I said when it comes to contract renewal I’ll be looking elsewhere.
|
|
|
|
Whilst there's technical reasons to keep PPP in the BTW and TTB networks that would be harder if it was a pure layer 2 delivery (at least in my opinion), they both do offer VLAN based handover on their EoFTTx products - which attract a fair premium, and they probably want to keep EoFTTx as their premium offering to maintain that extra profit margin, which would incentivise keeping PPP in place for their less premium offerings.
FTTx is already handed over on the L2S in the exchange as a VLAN anyway.
|
|
|
I have contacted support about the speed issues but have been told it’s my WiFi. Again same unit as my previous ISP. I suspect it’s due to BT Openreach being my carrier instead of TalkTalk Business which had LLU’d the local exchange. I suspect that’s how I’ve been able to get better connectivity in the past.
You really need to test using a wired connection. Wi-Fi is fairly fluid; it can work great one day and not so well the next, depending on interference or whether it has decided to switch channels etc.
I was originally provisioned to the incorrect LNS by A&A they did fix it but I still get speed drops.
I had the same problem 2 years ago and was provisioned on to the wrong LNS. I'd have thought they had got this fixed by now, obviously not.
You could try connecting to the test LNS and see if that gives you better speed; just prefix your username with test- and reconnect. I see better single thread speeds when on that one, presumably because it isn't loaded up with many customers. If it's better you can assume the slowdowns are not due to BT Wholesale and backhaul.
Edited by E300 (Wed 17-Apr-24 13:09:39)
|
|
|
Just for clarity I also have experienced the LNS stalling issues I was told they are looking into it and they had added NVMe drives to some of the LNS systems to get things resolved.
I have contacted support about the speed issues but have been told it’s my WiFi. Again same unit as my previous ISP. I suspect it’s due to BT Openreach being my carrier instead of TalkTalk Business which had LLU’d the local exchange. I suspect that’s how I’ve been able to get better connectivity in the past. I was originally provisioned to the incorrect LNS by A&A they did fix it but I still get speed drops.
Like I said when it comes to contract renewal I’ll be looking elsewhere.
If it was load/congestion related your max would still be line rate, as congestion wouldn't be around the clock.
Do you actually use the net over wifi or ethernet? Expecting gigabit throughput over wifi isn't realistic, and you need good equipment to even get half way there.
Your 300 does smack of a provisioning problem, as conveniently that's around the figure people report when OR get it wrong, and it's also not far off what people reported when on the old AAISP LNS. But this is moot until you actually test over ethernet.
Edited by Chrysalis (Wed 17-Apr-24 13:57:58)
|
|
|
I have contacted support about the speed issues but have been told it’s my WiFi. Again same unit as my previous ISP. I suspect it’s due to BT Openreach being my carrier instead of TalkTalk Business which had LLU’d the local exchange. I suspect that’s how I’ve been able to get better connectivity in the past.
You really need to test using a wired connection. Wi-Fi is fairly fluid; it can work great one day and not so well the next, depending on interference or whether it has decided to switch channels etc.
I was originally provisioned to the incorrect LNS by A&A they did fix it but I still get speed drops.
I had the same problem 2 years ago and was provisioned on to the wrong LNS. I'd have thought they had got this fixed by now, obviously not.
You could try connecting to the test LNS and see if that gives you better speed; just prefix your username with test- and reconnect. I see better single thread speeds when on that one, presumably because it isn't loaded up with many customers. If it's better you can assume the slowdowns are not due to BT Wholesale and backhaul.
Whats your single threaded like on the production LNS? I am used to max single threaded around the clock, so would be disappointed if its like Zen.
|
|
|
This is the current test, a bit better than it has been on the single thread test. I can replicate similar numbers by doing a single thread test at Speedtest.net.
IPv4, consistently around half the speed on a single thread.
https://www.thinkbroadband.com/speedtest/17133601033...
IPv6, which typically used to give lower figures due to overheads, is now better than IPv4, so something is a bit odd somewhere just recently.
https://www.thinkbroadband.com/speedtest/17133599514...
The test LNS did a lot better on the single thread tests, 700+ from memory, but I can't disconnect and try it again at the minute.
It doesn't really matter now as I'm migrating away, the issues have gone on too long and the premium I'm paying isn't justified in the service received anymore. It is clear we are beta testing their new kit and paying them a premium to do it, I've stuck around and given them the benefit of the doubt but 5 or 6 months now and still not fixed.
Okay, we are back running on stable firmware, but it still means another round of upgrades to come and potentially the same problem, then rolling back again in the near future, then it all starts again a few weeks later. I think that if they can't fix it after all this time, even when they have a stable firmware to compare against and can see what code changes have taken place, it suggests this problem may not be fixable anytime soon and is quite a low-level hardware issue.
|
|
|
Thank you. This, combined with you seeing it perform better on the test LNS, does indicate something has not been right.
I am letting my order progress, but I won't lie, my eyes have already wandered. Hence the post I made a few posts back.
I agree also on the hardware. Over the years I have done my own fair share of diagnosing kernel panics and the like, and it is good AAISP have been open, but when I read about CPU deadlocks, and it happening on all of the units, I can't look past either some kind of hardware issue (VRM or whatever) or a BIOS programming issue. The fact that the factory firmware seems to be stable, but at the cost of not fully utilising the hardware, gives hope that a BIOS or p-state/c-state type adjustment could stabilise the entire kit, and if I was AAISP that is what I would be pursuing; I have managed to make unstable kit stable via those kinds of tweaks. But of course that's down to AAISP to figure out. I was surprised at the resistance to utilising cisco as a stop gap, and no compensation got mentioned; the path they ended up taking only served to hurt both brands I think. It is good at least that they have now pulled back to the factory firmware and left the test device as an optional one for customers to utilise. They at least got to that point in the end.
Curious, if you dont mind saying, who are you migrating to?
Edited by Chrysalis (Wed 17-Apr-24 15:19:05)
|
|
|
I was surprised at the resistance to utilising cisco as a stop gap, and no compensation got mentioned
Exactly, I've not heard of a plan B except they haven't got one, they are just blinkered into Firebrick and that's it, and I just got to the point of thinking it is too much money a month to be a beta tester. I agree they should have offered some rebate or reduced monthly premium whilst these issues continued to those affected.
After some research and some info from another user here I'm moving to Unchained. So far a couple of questions I've had have been answered in mere seconds it seemed, and its great to support a smaller business and get that more personal touch. It's cheaper than A&A as well, mind you most ISPs are!
A&A say they are transparent and happy to engage, but I think the last update on the problems only came about because of more noise in this thread, and despite them saying they are happy to answer questions, the questions asked of them here haven't had any response that I've noticed! They talk the talk...
I'm sure A&A will get it sorted out in the end one way or another, and maybe I will return as a customer in future, who knows. It was great for many months, couldn't fault it, but I can't justify to myself the premium for their product whilst these issues are ongoing with no end in sight.
Edited by E300 (Wed 17-Apr-24 16:02:05)
|
|
|
Thank you, will see how my experience goes when I rejoin, but I am preparing for the worst if it happens, and I appreciate you sharing.
|
|
|
Good point E300, aabloor the General Manager of A&A registered here on Fri 12-Apr-24 10:24:36 to post his message:
https://forums.thinkbroadband.com/aaisp/t/4755788-re...
he finishes that message with:
"Thanks for taking the time to read this, and we are happy to answer any questions, of course."
But no response has been made to ANY of the questions that have followed... and he still has:
Total Posts 1
|
|
|
Whilst there's technical reasons to keep PPP in the BTW and TTB networks that would be harder if it was a pure layer 2 delivery (at least in my opinion), they both do offer VLAN based handover on their EoFTTx products - which attract a fair premium, and they probably want to keep EoFTTx as their premium offering to maintain that extra profit margin, which would incentivise keeping PPP in place for their less premium offerings.
FTTx is already handed over on the L2S in the exchange as a VLAN anyway.
I don't understand: why would it make a difference how premium something is whether it's delivered via PPP or Ethernet? Surely it's the SLA and QoS behind the service that makes it premium, not whether it's delivered as Ethernet or PPP?
Ethernet over FTTC is FTTC with a service level agreement - https://daisycomms.co.uk/resource/what-is-ethernet-o... - but is presented to the customer as Ethernet. It could happily be a PPP service with the NTE stripping the PPP to give an Ethernet presentation. The presence of PPP costs a little throughput but beyond that it makes no difference to the service's reliability or performance.
On it being harder not really, just different. CityFibre provide Ethernet handoff to everyone that wants it with PPP as an option. Broadband Network Gateways built to handle Ethernet presentation and handoff may eliminate the need for PPP, allowing providers to use EVPN technologies. Still encapsulating but a whole lot more you can do with the traffic than PPP over L2TP tunnels.
Edited by XGS_Is_On (Wed 17-Apr-24 20:34:22)
|
|
|
I was surprised at the resistance to utilising cisco as a stop gap, and no compensation got mentioned
Exactly, I've not heard of a plan B except they haven't got one, they are just blinkered into Firebrick and that's it, and I just got to the point of thinking it is too much money a month to be a beta tester. I agree they should have offered some rebate or reduced monthly premium whilst these issues continued to those affected.
Mentioned this in my earlier post to Alex: the ISP arm can't drop the Firebrick hardware as they're the largest customer by far and if the ISP that's part of the same group as the hardware company are using someone else it's not a great sales story.
With that in mind no surprise at all that AAISP have no intention of using anything other than Firebrick. The hardware was built with their requirements in mind, customised to them and relies on them for sales. These folks work out of the same offices and are mates and colleagues so why would A&A do anything other than keep the faith unless the issue were catastrophic?
I'm not being mean with that, I'm just, well, surprised at Chrys being surprised. I would've thought this would be obvious.
|
|
|
|
The speeds I got previously were 800MB on the 5GHz signal with the very same make and model of router and the same ONT. The only thing that changed was the backhaul and the ISP.
I’ll give the other instructions a go but what I will say is my setup hasn’t changed much apart from a new router supplied with A&A and we have channel hopped multiple times to no avail. I’ve been told I’m lucky I’m on 300MB and that’s it. For me it’s not enough for the premium I’m paying.
|
|
|
Good point E300, aabloor the General Manager of A&A registered here on Fri 12-Apr-24 10:24:36 to post his message:
https://forums.thinkbroadband.com/aaisp/t/4755788-re...
he finishes that message with:
"Thanks for taking the time to read this, and we are happy to answer any questions, of course."
But no response has been made to ANY of the questions that have followed... and he still has:
Total Posts 1
Indeed - no point saying something like that and then not responding to any questions posed.
Andrews & Arnold Home ::1 on Draytek 2862ac - Why settle for inferior?
|
|
|
Well I didnt say that, was surprised it wasnt considered for temporary use. Maybe a naive thought.
Edited by Chrysalis (Thu 18-Apr-24 13:43:51)
|
|
|
Well I didnt say that, was surprised it wasnt considered for temporary use. Maybe a naive thought.
Understood. Think even temporary use would not be possible: no CQM, etc and they are huge fans of CQM. A&A continue to install these devices into datacentres so seems like they're committed.
|
|
|
Within this same time frame, we have had multiple instances of BT Wholesale doing planned work which they had not told us about in advance (and apparently not told other ISPs, too). We could have zeroed the impact of their planned work, had they told us they were doing it beforehand.
Multiple times we have raised this with our account manager and at higher levels, and we still have not had a satisfactory response. Of course, no wholesale network is 100% reliable; we are not unreasonable about this, but the combined appearance, especially to customers not following matters closely, is that it's "another LNS blip".
Hi Bloor!
Is there any way you could enhance your Continuous Quality Monitoring to provide a better indication of where in the network a fault is occurring. I believe the major areas are:
- a problem with the DSL line / Modem
- a problem with the BT/TT wholesale network from the DSLAM to the LNS
- a problem with the AAISP LNS
- [a problem getting from the AAISP network to the Internet]
At the moment it is very hard to tell when there is a problem, where the problem is.
Is it possible for you to monitor the DSLAM, to compare with the DSL line? Or are you prevented from that kind of access?
Thanks,
njh.
|
|
|
More speed drops. Speed went to 100MB, after reboot it’s now sitting at 200MB. Paying for 800MB. Again, before anyone jumps to conclusions, I used the very same router and ONT with my previous provider and the speeds were around 800MB. It was business broadband over wholesale TalkTalk previously. Now the backhaul is over BT Openreach and not using the LLU kit at the exchange.
Called up A&A. Put through to the sales line. Been told it will cost me £160 to cancel the service. Really don’t like being a beta tester for a product. Silence is golden from A&A on the forum says everything I need to know as a customer. I use the line as a fully remote worker so I depend on stable speedy internet and I’m very disappointed in the service.
Edited by Sun4Lw5LIQy (Tue 23-Apr-24 14:05:30)
|
|
|
More speed drops. Speed went to 100MB, after reboot it’s now sitting at 200MB. Paying for 800MB. Again, before anyone jumps to conclusions, I used the very same router and ONT with my previous provider and the speeds were around 800MB. It was business broadband over wholesale TalkTalk previously. Now the backhaul is over BT Openreach and not using the LLU kit at the exchange.
Called up A&A. Put through to the sales line. Been told it will cost me £160 to cancel the service. Really don’t like being a beta tester for a product. Silence is golden from A&A on the forum says everything I need to know as a customer. I use the line as a fully remote worker so I depend on stable speedy internet and I’m very disappointed in the service.
Hi, I'm not aware of your problems personally, but Shaun here has been looking and is getting in touch with you.
|
|
The above post has been made by an ISP REPRESENTATIVE (although not necessarily the ISP being discussed in the post).
|
|
|
Hi Peter,
There have been a number of announcements of LNS firmware related work, not always with a vast amount of notice, I admit. But I think where we've been rolling back the firmware we've notified of this; at least that is the intention. Where we've been unable to give a lot of notice, the decision to do this has been made on the basis that a little notice but with sooner stability is still better than a lot of notice, but a delay to stability. I think that makes sense! Hopefully?
Alex.
---
Bloor
GM, A&A.
|
|
|
Many thanks indeed for the support!
Alex
---
Bloor
GM, A&A.
|
|
|
That's a good question.
I think the basic answer is that there does not seem to be an "LNS that seems to have had the most lockups".
These numbers are made up/approximate but I'm using them to hopefully try and illustrate the situation :
Let's say we have 10 live LNS running, and about every 30 days one hangs. That means there is a hang approximately every 300 "LNS days" ("LNS days" being a measure a bit like "man hours"). The blips are still a fairly rare event (though obviously not rare enough). This means waiting a considerable amount of time to know for sure if a fix has worked. The way to progress fixing it at maximum speed is to deploy it as widely as possible, to try and ramp up the speed of acquisition of "LNS days", in the fewest number of "day days".
This does go back to my point about being a hardware developer as well as an ISP, and how occasionally these two differing objectives can collide head on.
Your point is well received though that every kind of outage is bad, and that of course we do have to shuffle customers around (which can be seen as a "drop") to do the upgrades themselves.
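To make the "LNS days" arithmetic concrete, here is a minimal sketch in Python. It uses only the made-up figures from the post (10 units, one hang roughly every 30 days); the function names are hypothetical and nothing here reflects real A&A data.

```python
def lns_days_per_hang(fleet_size: int, days_between_hangs: float) -> float:
    """Fleet-wide exposure (in LNS-days) accumulated per observed hang."""
    return fleet_size * days_between_hangs


def calendar_days_needed(target_lns_days: float, units_on_new_firmware: int) -> float:
    """Calendar days needed to accumulate the target exposure on the
    units that are actually running the candidate fix."""
    if units_on_new_firmware <= 0:
        raise ValueError("need at least one unit running the new firmware")
    return target_lns_days / units_on_new_firmware


# Illustrative numbers from the post: 10 LNS, one hang about every 30 days.
exposure = lns_days_per_hang(fleet_size=10, days_between_hangs=30)

# The wider the deployment, the fewer calendar days it takes to gather
# one "expected hang" worth of evidence about the fix.
for units in (1, 2, 10):
    print(units, "unit(s):", calendar_days_needed(exposure, units), "calendar days")
```

With a single unit on the new code, one hang-interval of evidence takes about 300 calendar days; with all 10 units it takes about 30, which is the trade-off between testing speed and customer exposure described above.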
Just to clarify though: when we were testing new software, obviously it ran on our test rig at our offices first, then loaded one at a time onto live LNS. The plan was to update them all one by one over a week or two. A hang occurred after doing I think only two live LNS, so the plan was halted, and then the rollback decision was made.
Alex.
---
Bloor
GM, A&A.
|
|
|
Hi there,
Just replying this point:
"Why there has been no mention or consideration of getting specialist 3rd parties in to put a 2nd pair of eyes on the situation?"
There are already several people working on this internally. Without exaggeration, there really aren't too many people out there who can be asked. SoC manufacturers tend to assume you will be running something like Linux on their chips. Certainly our approach of an entire ground-up OS, hardware drivers, networking etc is rare to the extent that some initially have trouble believing it to be true. But it really is.
That being said, yes, contact has been made with both TI and ARM on this, suggestions made, ideas explored.
It wasn't mentioned because this is all a pretty standard part of hardware development work, I think, in situations such as this one.
Cheers-
Alex.
---
Bloor
GM, A&A.
|
|
|
There really honestly are good reasons.
And yes you are right! We have somewhat been here before.
-A
---
Bloor
GM, A&A.
|
|
|
Inevitably as time has gone by alternative solutions have come up.
And yes, not all end users can read and understand CQM graphs.
But as a business, I think we would be far far less capable of offering the services and support that we do without the FireBrick.
Not to take away from anything you say, though.
Cheers,
Alex.
---
Bloor
GM, A&A.
|
|
|
Replying hopefully in order:
"Is there much of a market for bonding FTTP?"
Not zero but not a huge amount. As you correctly identify, some of that small amount of demand is from those who wish for higher upload speeds, rather than downstream speeds.
"Utilisation can be measured at the BNG, latency and loss rely on preferential treatment of LCP and without it may be measured out of band with similar accuracy so what customer benefit do you see going forward?"
You are right, CQM was an absolute assassin for problems on copper based RF services, and fibre services are far more reliable and not susceptible to things like water, REIN etc. But undoubtedly it has allowed us to diagnose a surprising range of issues with fibre circuits: backhaul network stuff can still be sniffed out, and even things like problematic GPONs etc. So it definitely still has a use.
"Are you folks going to be having the CPE putting customer data in PPP purely for the LCP echoes to keep CQM running?"
At present we feel that it's still the best option to run all "broadband" circuits using PPP. Ethernet is a different matter, although I think we may have some doing PPP over ethernet for specialist reasons.
"Lower power consumption is super important but per subscriber how are the numbers?"
Unquestionably the FB9000 is lower consumption per Mbit/sec or per customer or pretty much however you want to measure it than FB6000. So it is "greener". Really and truly, compared to any PC based or (e.g.) Cisco solution it is minuscule. Absolutely tiny. At present we have no plans to launch higher than 1Gbit services via BTW as (I think) they are currently not offering it. We may offer slightly higher services via CityFibre, and blended in with other customers; we don't envisage it being too much of a problem in reality.
"The FB9000 is brand new, can't be purchased yet, what kind of lifespan do you folks see for it?"
The FB6000 lasted something like fifteen years, in active live use. I do not see any reason why the FB9000 wouldn't have a useful life of perhaps close to ten.
Thanks for your comments. Hopefully I've managed to answer vaguely OK most of them! Yours has been one of the most interesting to write a reply to so far!
Cheers-
Alex.
---
Bloor
GM, A&A.
|
|
|
This has been addressed now.
I wanted to wait until there were a few questions, before answering them all in one go; particularly as some questions do repeat, I figured this was the best way.
Sorry if it looks like there's been a gap in contact.
Alex.
---
Bloor
GM, A&A.
|
|
|
|
I've been on AAISP for the last week or so now and I haven't experienced any speed drops.
I can consistently hit a 940 speedtest wired or wireless (6E, iPhone 15 Pro Max, admittedly sat close to the AP in the living room)
|
|
|
Hi njh,
Thanks for the question.
CQM is rather a blunt animal. It is an echo and response at the PPP layer, which we then graph the timing of. Not much more or less than that. So with that methodology, it's rather hard to get inbuilt diagnostics of the sort you are suggesting.
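As a rough illustration of what "graphing the timing of an echo and response" boils down to, here is a minimal Python sketch that reduces one interval of echo samples to loss and min/average/max latency. This is not how CQM itself is implemented; the function name, the sample layout (`None` meaning no reply) and the example values are all assumptions made for illustration.

```python
from statistics import mean
from typing import Optional, Sequence


def summarise_echoes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarise one interval of echo samples.

    Each entry is a round-trip time in milliseconds, or None when no
    reply was received for that echo (counted as loss).
    """
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    lost = sent - len(replies)
    return {
        "sent": sent,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "min_ms": min(replies) if replies else None,
        "avg_ms": mean(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }


# Hypothetical interval: four echoes sent, one unanswered.
summary = summarise_echoes([8.0, 9.0, None, 10.0])
print(summary)
```

A summary like this says that echoes were slow or lost, but not where along the path that happened, which is why locating a fault still relies on reading the shape of the graph.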
We have built up a lot of knowhow over the years of how to interpret and read the graphs though as is described in the following article : https://support.aa.net.uk/CQM_Graphs
Over the years these have been able to narrow down a surprising variety of faults.
It would be theoretically possible for us to have realtime stats from things like intermediate hardware, if our carriers allowed it. But it wouldn't be "part of CQM". It could potentially be correlated against CQM data, though. Honestly I doubt this is likely to happen though, as carriers are famously defensive about giving anyone detailed stats about their networks and equipment (and understandably so).
If you've not seen the wiki page above before, it might give you a new appreciation of what sorts of stuff we can infer by looking at graphs at least!
Cheers,
Alex.
---
Bloor
GM, A&A.
|
|
|
Adding to Andrew's point... just for clarity (and others have already said similar to this) ...
I am 100% certain this specific customer's problem is nothing whatsoever to do with the LNS troubles that are the subject of this thread. At no point has there been any impact on throughput. This has been an issue with stability, not throughput in live use.
We will obviously work with the customer to resolve it, just as we would with any other.
Alex.
---
Bloor
GM, A&A.
Edited by aabloor (Tue 23-Apr-24 16:37:38)
|
|
|
Tangential thought :
One possible use of AI would be to make a stab at interpretation of graph results.
We probably won't do it, but it feels like it would be a perfect application.
Alex.
---
Bloor
GM, A&A.
|
|
|
Tangential thought :
One possible use of AI would be to make a stab at interpretation of graph results.
We probably won't do it, but it feels like it would be a perfect application.
Alex.
It's what my day job and others do: use AI to interpret telemetry and both assist with reporting and take action where appropriate.
|
|
|
SoC manufacturers tend to assume you will be running something like Linux on their chips. Certainly our approach of an entire ground-up OS, hardware drivers, networking etc is rare to the extent that some initially have trouble believing it to be true. But it really is.
Hmm. As a business model that sounds over ambitious for a relatively small company. Is your core expertise developing embedded systems or is it running an ISP network? If you are not 2 separate divisions with an arm's length commercial relationship, such that the ISP is free to source its embedded systems elsewhere, I would not look at you as an ISP.
As for developing without Linux, you are again requiring 2 different core expertises, OS development and embedded network system development. There is a dichotomy on something like this between starting with nothing and building upwards or starting with Linux and cutting it down. The first course seems more suited to a large player, where the OS and the application development operate at arm's length and you are forced to maintain abstraction between layers rather than taking [road to hell] advantage of the potential to abrogate abstraction when it gets in the way of an easy solution. As an outsider, I don't have the insight to what is going on with A&A, but the troubles reported here look like what I expect some time after strict layering of software has been abrogated.
Of course I could be wrong about the situation ...
|
|
|
It is an ambitious business model.
But this is (quite extremely) not our first product.
In the last 20 years, we have done hardware and firmware on the FireBrick Plus and Soho, FireBrick 105, FireBrick 2500, FireBrick 2700, FireBrick 6000, FireBrick 2900, and now FireBrick 9000. We have many thousands of units out there in the wild, not just with A&A customers. We did change architecture after the FB105.
In summary, we do have considerable experience in this field.
We are a company of between 20 and 25 people.
Some of them work on the ISP business (only).
Some of them have a split role.
And some are essentially FireBrick only.
We have people whose only job is FireBrick software development and nothing else. This is not a case of some people who know devops "having a crack" at low level firmware development, by a very very long chalk.
The two business activities are somewhat symbiotic. It would not really make any difference, other than adding admin overhead to try and split them up. A&A The ISP absolutely is free to buy hardware from wherever. Splitting them up really wouldn't change that.
I feel your reply may possibly (entirely reasonably) assume we have far less experience and track record than we do. We have been doing this for years. It is hard but historically we have punched considerably above our small weight.
We have a website that has some in-depth blog posts which should also highlight broad aspects of our approach www.firebrick.co.uk if not perhaps in this subject area. We cover mistakes we've made in the past, and how we've resolved them. We fully intend to document this current situation, once we are comfortable it has been resolved.
Example of such a blog : https://www.firebrick.co.uk/about/news/fb2700-psu-tr...
And a bit of general history : https://www.firebrick.co.uk/about/history/
Alex.
---
Bloor
GM, A&A.
Edited by aabloor (Wed 24-Apr-24 11:39:57)
|
|
|
It is an ambitious business model ...
I am not doubting A&A's experience - it is needed to get away with what I consider actually to be a low level of issues for the ambition you have. Obviously you are a long way with this, so I wouldn't be suggesting a switch to Linux, but if I am questioning anything about your development it is the decision not to start with a Linux core and strip that down.
And I don't think that it is people who know devops "having a crack" at low level firmware development. One of the hardest things with some development is holding it all in one head, which can lead to breaches of abstraction - this is where abstraction carried out between layers owned by different people leads to a conversation where neither side knows too much about the other and a conversation between 2 people is far more effective in splitting a problem than trying to resolve it all in one mind. Yes, your size makes you nimble, but it can also place too much of a load on someone who understands both sides of a problem.
|
|
|
|
I think it's worth re-evaluating your business model from time to time, as what may have worked successfully for 20 years may not work for the next 20 years.
|
|
|
|
No more LNS drops for me since 7th April 2024 - now that the FB9000 firmware has been reverted.
I am so glad to no longer be a beta tester, and once again have the reliable connection that I pay for!
- Perlen
|
|
|
I have had no issues since FTTP circuit activated. (week 3 April)
Edited by Chrysalis (Sat 04-May-24 01:48:17)
|
|
|
I am so glad to no longer be a beta tester, and once again have the reliable connection that I pay for!
Maybe they could consider some kind of discount to encourage use of beta LNS instead of foisting the updates on the unsuspecting.
|
|
|
|
They publish how to use the "test" device. No discount mentioned.
|
|
|
|
I am aware, but it seems they've said part of their issue may have been not having enough folk on the test LNS before it went to live; there isn't any incentive at present to be on the test server (particularly given the premium pricing). Some kind of discount might encourage more to sign up for the test LNS.
|
|
|
I am aware, but it seems they've said part of their issue may have been not having enough folk on the test LNS before it went to live; there isn't any incentive at present to be on the test server (particularly given the premium pricing). Some kind of discount might encourage more to sign up for the test LNS.
Something sensible might be to divide, say, 20-30% of the monthly sub by 30 and then add an automatic compensation scheme that credits the account by that amount for every 24h spent on a test LNS; this would also apply during periods when an open incident within their control is affecting service. If AAISP ever decide they don't need people testing, or that the testing isn't as valuable as they say it is, then they could avoid paying this compensation by capping max connections on the test LNS or taking it down.
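The suggested scheme can be sketched in a few lines of Python. The 25% share is just a midpoint of the 20-30% range suggested above, the £60 monthly price is a made-up figure, and the function names are hypothetical; this is not an actual A&A tariff or policy.

```python
def daily_credit(monthly_sub: float, share: float = 0.25, days_in_month: int = 30) -> float:
    """Credit per qualifying 24h period: a share of the monthly sub divided by 30."""
    return monthly_sub * share / days_in_month


def monthly_credit(monthly_sub: float, qualifying_days: int, share: float = 0.25) -> float:
    """Total credit for the month, capped at 30 qualifying days.

    A qualifying day is a 24h period spent on a test LNS, or one during
    which an open incident within the ISP's control affected service.
    """
    days = max(0, min(qualifying_days, 30))
    return round(daily_credit(monthly_sub, share) * days, 2)


# Hypothetical example: £60/month, 10 days spent on the test LNS.
print(daily_credit(60.0))        # credit per day
print(monthly_credit(60.0, 10))  # credit for the month
```

The cap means the credit can never exceed the chosen share of the monthly subscription, however many qualifying days are logged.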
|
|
|
|
Week 2 April here and it's been 100% other than when I've been faffing internally.
|
|
|
Week 2 April here and it's been 100% other than when I've been faffing internally.
Yeah, I expect my CQM doesn't look great since the circuit was installed, but there has been lots of work on my equipment, and some outages from national grid work also (I have moved the ONT to UPS power since earlier this week, though). Not noticed any outages ISP side yet.
Edited by Chrysalis (Fri 24-May-24 16:35:30)
|
|
|
Update on the status page:
Our LNSs remain stable running the 'Factory' software. Work continues in our test lab and on non-customer affecting parts of our network to track down the problem with the alpha software.
For the past month or so our service has no incident of LNS hangs, most of our FB9000 LNSs have an uptime approaching 100 days. We are confident that the Factory software is stable.
Interesting that it was "alpha" software that was causing the problems. But good that the released software is stable.
|
|
|
Update on the status page:
Our LNSs remain stable running the 'Factory' software. Work continues in our test lab and on non-customer affecting parts of our network to track down the problem with the alpha software.
For the past month or so our service has no incident of LNS hangs, most of our FB9000 LNSs have an uptime approaching 100 days. We are confident that the Factory software is stable.
Interesting that it was "alpha" software that was causing the problems. But good that the released software is stable.
Seeing as they develop the software themselves I wouldn't read too much into their nomenclature of calling it "alpha". It wouldn't have been in their interests to knowingly put buggy software into production. It probably was thought of as stable and about to make it to that as such, but then the problems happened so it never quite ever got out of the stage of officially being "alpha".
Andrews & Arnold Home ::1 on Draytek 2862ac - Why settle for inferior?
|
|
|
|
I do wonder though, what shiny new features were so important in this new software that (a) they put it straight into production, and (b) they persisted with trying to fix it *in production* for months, rather than rolling back to the stable software straight away.
The positive from this is that the existence of software which is stable under the same load strongly suggests that it's *not* flaky hardware after all. Not 100%, but very likely.
|
|
|
I do wonder though, what shiny new features were so important in this new software that (a) they put it straight into production, and (b) they persisted with trying to fix it *in production* for months, rather than rolling back to the stable software straight away.
The positive from this is that the existence of software which is stable under the same load strongly suggests that it's *not* flaky hardware after all. Not 100%, but very likely.
I'm assuming they've gone back to the original Firebrick OS that only runs on 2 cores and the OS they need to get working is the complete rewrite that allows the new Firebrick to run using all the cores. This would explain why they can't just compare the code differences between the working one and the one that crashes, as they can't be compared. This might also explain why they needed to throw up more of the new Firebricks as they would be under-performing.
All conjecture on my part of course, but if you read this blog https://www.firebrick.co.uk/about/news/version-20/ the dates of the new version 2.0 OS going live coincided with all the problems starting, and prior to the OS 2.0, the new Firebricks were stable when running on the original older software utilising only 2 cores.
Edited by E300 (Sat 25-May-24 09:50:00)
|
|
|
|
We had a very short outage at 23:40 last night. It's the first time since they reverted to the stable firmware. Anyone else have any similar issues?
|
|
|
|
I also had an outage last night between 23:41 and 23:49, during which time my router actually received a WAN address in the 10.93 range - and I'm now with Zen via BTW! I put it down to a BTW issue.
|
|
|
I'm assuming they've gone back to the original Firebrick OS that only runs on 2 cores and the OS they need to get working is the complete rewrite that allows the new Firebrick to run using all the cores.
Ha - "complete rewrite". MikroTik did that for RouterOS v6 -> v7, and some three years later the "stable" train is still in complete flux and there's no "long-term" version at all.
|
|
|
Yeah, it looks like it wasn't AAISP wide, no outage here (via CityFibre).
|
|
|
We had a very short outage at 23:40 last night. It's the first time since they reverted to the stable firmware. Anyone else have any similar issues?
I'm on the BTW network and also observed the same outage, for 11 minutes. Internet came back online shortly after. There was no network status page for this fault; I did check Zen Internet’s pages, as they sometimes include publicly viewable downtime that BTW has planned on exchanges.
|
|
|
We had a very short outage at 23:40 last night. It's the first time since they reverted to the stable firmware. Anyone else have any similar issues?
Interesting I had an outage May/30/2024 23:41:46 for 3 seconds. Aquiss, Openreach fibre, City Fibre backhaul.
The fact that this has occurred across such a wide range of ISPs tends to suggest there might be a single point of failure out there for quite substantial disruption.
|
|
|
10.93.x.x?
I smell CGNAT
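For anyone wanting to check what kind of address their router has been handed, here is a small sketch using Python's standard `ipaddress` module. `10.93.0.1` is just an example host in the range mentioned (the post only gives "10.93"), and `classify_wan_address` is a hypothetical helper. An RFC 1918 address on the WAN side shows the router is not directly on a public address; on its own it cannot distinguish CGNAT from, say, a temporary holding pool.

```python
import ipaddress

# RFC 1918 private ranges and the RFC 6598 shared range reserved for carrier-grade NAT.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
SHARED_CGN = ipaddress.ip_network("100.64.0.0/10")


def classify_wan_address(addr: str) -> str:
    """Classify a WAN IPv4 address as RFC 6598 shared, RFC 1918 private, or other."""
    ip = ipaddress.ip_address(addr)
    if ip in SHARED_CGN:
        return "rfc6598-shared"
    if any(ip in net for net in RFC1918):
        return "rfc1918-private"
    return "public-or-other"


for example in ("10.93.0.1", "100.64.0.1", "8.8.8.8"):
    print(example, "->", classify_wan_address(example))
```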
|
|
|
|
I have 100% uptime since 25 April, and even then that was because I was doing something internally.
|
|
|
Currently at 75 days, since the day I plugged ONT into my UPS.
|
|
|
It seems like the dramas just go on and on:
https://aastatus.net/recent.cgi
Tuesday night: disconnections of broadband lines, 3x LNSs lost power.
Wednesday: shuffling people back around to the right LNS.
Wednesday afternoon: routing and traffic loss problems for two hours.
This morning, Thursday: massive packetloss and disruption for 45 minutes.
Edited by perlen (Thu 24-Oct-24 08:26:55)
|
|
|
Tuesday night: disconnections of broadband lines, 3x LNSs lost power.
Wednesday: shuffling people back around to the right LNS.
Wednesday afternoon: routing and traffic loss problems for two hours.
This morning, Thursday: massive packetloss and disruption for 45 minutes.
I'm not excusing the disruption, but I suspect a number of providers would simply not report these sort of issues. I think it's fair to say that it's quite unusual for a residential broadband provider to (e.g.) invite their customers to email support for a RFO.
|
|
|
I'm not excusing the disruption, but I suspect a number of providers would simply not report these sort of issues. I think it's fair to say that it's quite unusual for a residential broadband provider to (e.g.) invite their customers to email support for a RFO.
Well given how transparent they say they are, they are now managing the bad PR by hiding the blip chart on their status page and not publicly saying what the issues are, but you can email in, perhaps sign an NDA before they tell you
We do apologise to customers affected by the problems this afternoon. Please feel free to email [email protected] for a 'Reason for Outage'
Blip chart is accessible still here: https://control.aa.net.uk/blip.cgi
Perhaps this is a new issue with their Firebrick 9000? Did they ever fix the random crashing issue or are they still using the last firmware that was stable?
Haven't they also replaced all their older but stable LNSs with the 9000s now?
Thankfully I'm an ex customer, having already been put off by the last lot of troubles. I'm currently on 173 days of uptime without a single issue, that is, from the day of switchover from AA to Unchained.
Edited by E300 (Thu 24-Oct-24 09:10:39)
|
|
|
I'm not excusing the disruption, but I suspect a number of providers would simply not report these sort of issues. I think it's fair to say that it's quite unusual for a residential broadband provider to (e.g.) invite their customers to email support for a RFO.
Well given how transparent they say they are, they are now managing the bad PR by hiding the blip chart on their status page and not publicly saying what the issues are, but you can email in, perhaps sign an NDA before they tell you
We do apologise to customers affected by the problems this afternoon. Please feel free to email [email protected] for a 'Reason for Outage'
Blip chart is accessible still here: https://control.aa.net.uk/blip.cgi
Perhaps this is a new issue with their Firebrick 9000? Did they ever fix the random crashing issue or are they still using the last firmware that was stable?
Haven't they also replaced all their older but stable LNSs with the 9000s now?
Thankfully I'm an ex customer, having already been put off by the last lot of troubles. I'm currently on 173 days of uptime without a single issue, that is, from the day of switchover from AA to Unchained.
I'm not sure how effectively they've "hidden" the blip chart given it's still available via Google and their wiki. I'm struggling to find the Unchained blip chart though?
|
|
|
I'm not sure how effectively they've "hidden" the blip chart given it's still available via Google and their wiki. I'm struggling to find the Unchained blip chart though? 
I did say they had hidden it on their service status page, the obvious place their customers would look if they are having issues, it was always my first stop when I had problems so I knew it wasn't just me and I didn't need to start messing with my own kit. Why remove it now? Why ask people to email in for an explanation of the outage? I can only assume they are managing the PR situation.
Their openness was one of their unique selling points, if that is vanishing along with reliability, then what do they offer over and above any other average ISP?
My current ISP never claimed to have a blip chart, no other ISPs do as far as I know, still I've had no outages with my current ISP that needed me to check any sort of blip chart.
|
|
|
|
I contacted them regarding it and received an open and detailed summary, as usual.
|
|
|
I contacted them regarding it and received an open and detailed summary, as usual.
Congrats on your first post and welcome to the forum.
|
|
|
The LNS power loss was a data centre power feed issue, and the rebalance after was a domino effect from that.
It has been very stable since I signed up, hopefully this packet loss issue gets resolved soon.
Most other ISPs would be telling users to reboot their equipment without even disclosing a problem.
Also there is still a link to the blip page on the customer control panel page; it's there to help customers see quickly whether a problem is on their own side or not.
Edited by Chrysalis (Thu 24-Oct-24 13:20:34)
|
|
|
I'm not sure how effectively they've "hidden" the blip chart given it's still available via Google and their wiki. I'm struggling to find the Unchained blip chart though? 
They've hidden it more effectively now, as it's been taken down completely; perhaps it will reappear when things are working better.
Edit: looks like only available if you log into the control panel, so not for public consumption anymore.
Edited by E300 (Thu 24-Oct-24 13:29:07)
|
|
|
|
Yeah it seems to be hidden now as you say, and also no longer updating. Less than ideal.
Still no update posted regarding this morning's issues...
|
|
|
My graph from this morning:
https://ibb.co/bvmrkck
Edited by perlen (Thu 24-Oct-24 21:48:36)
|
|
|
My graph from this morning:
https://ibb.co/bvmrkck
Yeah, that tracks.
|
|
|
I would suspect given the lack of detail regarding the outage on Thursday, as well as the removal of the blip graph, it might have something to do with an ongoing DDoS. This is just speculation though and I have no evidence for this.
The lack of updates on the FB9000 bug fix is disappointing though, as I thought the idea was we were going to be kept in the loop on that.
Andrews & Arnold Home ::1 on Draytek 2862ac - Why settle for inferior?
|
|
|
Maybe we need to take precedence from: "Please feel free to email [email protected] for a Reason for Outage", to now "Please feel free to email [email protected] for a Progress with FireBrick 9000 firmware".
|
|
|
I would suspect given the lack of detail regarding the outage on Thursday, as well as the removal of the blip graph, it might have something to do with an ongoing DDoS. This is just speculation though and I have no evidence for this.
There's a person on the right track.
|
|
|
From what I can see from the last update made, there is no new firmware; they rolled back to the stable factory build and are running on that for the foreseeable future. It's doing the job and they have deployed enough of them to handle the load in that configuration.
That's my interpretation of what they have disclosed on it.
|
|
|
Another DDOS just now?
https://ibb.co/Dk5VbNf
|
|
|
Another DDOS just now?
https://ibb.co/Dk5VbNf
I've seen exactly the same BQM today on two other very separate connections of mine, so I don't believe this is down to A&A...
|
|
|
|
Got the same pattern on my A&A connection. Same time.
But, as Pheasant says…
|
|
|
|
|
|
|
I'm on BT in the north of Ireland and got the same spike...
https://www.thinkbroadband.com/broadband/monitoring/...
|
|
|
I didn't notice until I saw this thread, but yeah, I have it.
Given it's also on Zen and IDNet, it looks like it is not specific to AAISP, as you said.
|
|
|
Oh no! More FB firmware updates are on the way:
https://aastatus.net/42728
The work outlined below will start from Saturday 23rd November.
Background:
Our FireBrick team has been working on the 'hang' problem that we faced with the LNSs earlier in the year.
The nature of the problem has made investigating the problem very time consuming as it is extremely difficult to reproduce. However, we do believe that a plausible cause has been identified, and code changes have been made to mitigate the problem.
We have been testing this new code, both in our test lab and on a few select A&A routers, for over two months. During this time the new code has not caused the hardware to hang, where older versions of the code did.
Our next step is to run the new code on our LNSs, the ones our customers connect to for their broadband connections.
We plan to do this slowly, out of hours and in a couple of phases.
We believe the cause of the hang is related to how memory is initially allocated for the tasks the FireBrick will be performing, this means that if the hardware is going to hang then this will most likely happen over the first couple of days (or first couple of hours).
Stage one:
We plan to upgrade only one of our LNSs at first. We will move broadband connections on to it in the early hours of the morning and then move them back off a few hours later. This means that during the day, customers will be on the normal set of LNSs.
Then, each night, over the course of two weeks, the LNS will be power cycled and we will move an increasing number of connections over, until it is at the point of taking twice the amount of connections that we'd normally run on an LNS. (We normally run LNSs at around 40% capacity, so twice the number of connections is not a problem.)
Stage two:
Once we have confirmed that the hang is not happening, the second phase would be to run customer connections on the upgraded LNS for a few days at a time.
We will go through a cycle of: move connections off, reboot the LNS, move connections on, wait a few days. Repeat. We will do this with an increasing number of connections until it's at the point of taking a normal amount of connections.
More information and to opt out:
So as to minimise impact to customers, the work of moving connections off and on will happen overnight between 1AM and 5AM.
As mentioned, this phase of upgrading involves only one LNS being upgraded. This will be the one named 'i.gormless'. The connections that will be moved on to 'i.gormless' will be those currently on the LNS named 'h.gormless'. If you are currently on 'h.gormless' (as seen on the top/left of your line quality graph) and want to opt out, then please email support.
Once this phase has been completed, we will review and plan the next stages.
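The capacity claim in the stage one description is easy to sanity-check: at around 40% normal utilisation, twice the usual number of connections is around 80%, still under the unit's limit. Here is a minimal Python sketch of such a ramp. The 40% figure, the two-week duration and the 2x target come from the announcement; the linear night-by-night ramp and the function name are assumptions, since the notice only says "an increasing number".

```python
def ramp_schedule(nights: int = 14, normal_utilisation: float = 0.40,
                  target_multiple: float = 2.0) -> list:
    """Return (night, multiple_of_normal_load, utilisation) tuples.

    Assumes the load on the test LNS grows linearly each night until it
    reaches target_multiple times the normal connection count.
    """
    schedule = []
    for night in range(1, nights + 1):
        multiple = target_multiple * night / nights
        utilisation = multiple * normal_utilisation
        schedule.append((night, round(multiple, 3), round(utilisation, 3)))
    return schedule


plan = ramp_schedule()
for night, multiple, utilisation in plan:
    print(f"night {night:2d}: {multiple:.2f}x normal load, {utilisation:.0%} of capacity")

# Even at the end of the ramp the unit should stay below full capacity.
assert all(utilisation <= 1.0 for _, _, utilisation in plan)
```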
|
|
|
+1
Gigaclear in Kent which is connected to Vodafone and Plusnet VDSL in Anglesey.
Michael Chare
|
|
|
However, we do believe that a plausible cause has been identified
Sure I've heard that before somewhere
|
|
|
We had several blips in the early hours of Saturday:
My Broadband Ping.
Since Monday our TBB monitor has looked like this one from today (and we were out of the house from approx 7am to 7pm today):
My Broadband Ping
|
|
|
Did you ask them about Saturday? All I can say is that my Saturday graph is clean. The 2nd graph looks like something automated running, like a speedtest or something.
|
|
|
|
I just wanted to write in and say that uptime has improved considerably. AAISP did a rollback on their LNS firmware, figured out what was causing the crashes, and in the end rolled out new firmware that’s stable. I’ve had a solid connection for most of December/January and I’ve remained a customer of AA. Just wanted to chime in and say things have improved.
|
|
|
|
I have now left Andrews and Arnold, and I am glad that I have done so.
My AAISP FTTP connection was the worst experience I have ever had with home broadband - the reliability/uptime was worse than even my old ADSL connections!
I am now with EE and have 5ms pings to major sites as opposed to 9ms pings with AAISP, and 1600/120 for £15 less than AAISP with 900/110.
I have had zero outages with EE for the last 3 months.
AAISP treating its broadband customers as beta testers for FireBrick firmware is unforgivable.
|
|
|
|
That's the wonder of the Internet: your experience does not echo others'. Sure, they had a blip, but it was dealt with transparently and has been sorted.
Meanwhile their tech support is excellent.
Still worth the premium price to me and others.
|
|
|
|
Perlen, according to his post, has been with EE for three months, so it is a bit of a dated comment really.
I too have had no issues for the last three months and TBH my only temptation to move away from A&A is the Aquiss half price for 6 months offer.
I agree with you: they had a blip which took some time to resolve, but that is history now.
|
|
|
|
Folks pay their money and make their choice. FTTP has eroded some of the A&A USP, but there are plenty of other attractions that appeal to their customers. I can't say I understand posting after months of silence to announce a departure, or posting at all if it's not relevant to others, but that's life.
|
|
|
As XGS said, there was no need to announce 3 months later; people move between ISPs all the time. You were clearly not at ease though, so for your own peace of mind you moved.
But in terms of your latency, the biggest things affecting that will likely be the wholesale backhaul your connection is going over and your geographical location. For reference, here is my typical ping to a UK location on AAISP using CityFibre national backhaul. As an example, I know I get higher pings over BT Wholesale backhaul, as their routing isn't optimal for my city.
Pinging cloudflare.com [104.16.132.229] with 32 bytes of data:
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
Reply from 104.16.132.229: bytes=32 time=4ms TTL=59
Ping statistics for 104.16.132.229:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 4ms, Maximum = 4ms, Average = 4ms
|
|
|
|
Delete 6 months: insert 3 months.
|
|
|
You obviously haven't seen the post on the 21st of Feb in this thread.
6 months half price
|
|
|
|
I took out their business L2TP service last week and within 45 minutes of opening a support ticket I had the routed /29 that they say is available, and everything is working great. I'm happy to keep recommending them to people who have the budget.
|
|
|
|
I stand corrected. That change didn’t last long!
|
|
|
I stand corrected. That change didn’t last long!
Exactly! Temptation renewed as a result.
|