|
|
For the second evening running, I've had a massive latency spike on an almost idle Zen FTTC connection. It's far worse today than yesterday - I currently have ~55ms unexpected extra latency to the Zen gateway.
BQM for 24/09/2013
BQM for 25/09/2013 (ignore the packet loss this afternoon - I was working on the rack here and had to take the router down for several short periods, though the BT Openreach modem remained on)
Local monitoring (pinging the Zen gateway every second) confirms this is not a BQM artefact and that the connection is almost idle (0-1000kbit/s down and 0-150kbit/s up).
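For anyone wanting to reproduce this kind of local check, here is a minimal sketch in Python of turning saved ping output into an "extra latency" figure. It assumes the ping output has been captured to text lines containing the usual `time=… ms` field. The names `parse_rtt_ms` and `excess_latency_ms`, the 192.0.2.1 address and the 10ms baseline in the usage example are illustrative assumptions, not the monitoring actually used here.

```python
import re
from statistics import median

# Matches the "time=10.3 ms" (or "time<1ms") field printed by common ping tools.
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtt_ms(line):
    """Return the round-trip time in ms from one line of ping output, or None."""
    match = RTT_PATTERN.search(line)
    return float(match.group(1)) if match else None


def excess_latency_ms(lines, baseline_ms):
    """Median RTT of the replies found in `lines`, minus the expected baseline.

    Lines without an RTT (timeouts, headers) are skipped.
    Returns None if no replies were found at all.
    """
    rtts = [rtt for rtt in map(parse_rtt_ms, lines) if rtt is not None]
    if not rtts:
        return None
    return median(rtts) - baseline_ms


if __name__ == "__main__":
    # Hypothetical sample lines; 192.0.2.1 is a documentation address.
    sample = [
        "64 bytes from 192.0.2.1: icmp_seq=1 ttl=63 time=64.8 ms",
        "64 bytes from 192.0.2.1: icmp_seq=2 ttl=63 time=65.2 ms",
        "Request timeout for icmp_seq 3",
        "64 bytes from 192.0.2.1: icmp_seq=4 ttl=63 time=66.0 ms",
    ]
    print(excess_latency_ms(sample, baseline_ms=10.0))
```

Using the median rather than the mean keeps a single stray spike from distorting the figure.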
I've tried dropping the PPP session, but keep getting reconnected to "dsl1.wh-man". The Zen portal shows I've retained my 79999/20000 sync speeds and it went back to my usual ~10ms fast path latency last night, so I doubt the explanation is to do with the VDSL2 link between here and the cabinet. Even if I'm right that this isn't an issue with the VDSL2 link, I appreciate that there are multiple possible explanations to do with the Zen, BT Wholesale and local BT Openreach networks.
Can anyone from Zen throw any light on what is happening? Is anyone else seeing anything similar?
It may be pure coincidence, but I'm a little uneasy that I've unknowingly signed up for a much more contended service when I moved to Unlimited Fibre 2.
Edit: added links to local monitoring graphics
Edited by deleted (Thu 26-Sep-13 00:50:50)
|
|
|
I've just had a chat with Zen technical support.
The agent confirmed that there are no errors showing on the VDSL2 link to the cabinet, which tends to confirm my suspicion that the problem did not originate there. He also confirmed that there are no routing or prioritisation differences between customers on the old metered products and the new unlimited ones - the difference is purely one of marketing and billing. It is pure coincidence that this problem occurred shortly after moving to an unlimited tariff.
Zen won't receive some relevant contention reports from their suppliers until tomorrow, which hampers their understanding of where the problem is. The network operations people have been alerted to this occurrence, there's a link to this thread on the internal ticket, and I've promised to keep this thread updated over the next few days. Zen have also added some sort of monitoring to my connection so they're aware of what is happening.
The problem is not just a busy router showing higher ICMP latency. 'Real' traffic is affected - my NTP servers typically have 0.15 to 0.8ms jitter to external servers, but this shot up to several tens of ms at 9pm last night. The latency on my IPv6 first hop, which heads over a SixXS tunnel terminating at gblon2 (the protocol 41 tunnel traffic typically routes via Zen's BGP peering with Goscomb at Telehouse North) showed a similar increase in latency to my IPv4 ICMP first hop latency.
The BQM graphics use local time, not UTC. The other graphics, which originate from my in-house servers, also use local time.
The most likely explanation is link saturation somewhere. This could be:
- on the BT Openreach network between the FTTx cabinet and the GEA aggregation node, which I'd expect to be at my local Flitwick (SMFK) exchange as there's no GEA-FTTx on any surrounding exchanges
- on the BT Wholesale network (there's no Zen PoP at my exchange and the Zen portal confirms I'm on WBMC)
- on the Zen links from the BT Wholesale network
- on the Zen network between BT Wholesale and the gateway I'm connected to
The latter two options seem unlikely. Zen have a philosophy of having spare capacity, and Zen didn't seem aware of any saturation problems.
The BT Openreach network may be to blame, though this seems the less likely of the remaining two options.
Virgin offered broadband here several years before the first ADSL was available. A lot of people are still on Virgin, judging by the SSIDs of the wireless networks I can see. FTTx has only been available here for 9 months and ADSL2+ speeds are fairly good (around 14Mbit/s downstream is achievable), though the consumer ISPs are marketing FTTx heavily and a lot of their customers may have switched to fibre products.
My understanding is that there's usually plenty of spare fibre in the FTTx network, though I appreciate not all of it is lit. It might be that a fibre fault has caused traffic to my cabinet to take a different route which is becoming saturated in the evening.
The problem seems most likely to have originated somewhere in the BT Wholesale network. Time will tell!
There was a brief spike on my graphs to 290ms latency to the Zen gateway just before 11:00 local today, 29/09/2013, though it would be conjecture to ascribe any significance to it. BQM didn't pick anything up at that time.
I'll post further graphics if the problem recurs. If it does not recur, I'll confirm that in writing.
|
|
|
The problem seems most likely to have originated somewhere in the BT Wholesale network. Time will tell!
I've had suspicions that the BT Wholesale WBC network has congestion/contention, and have never known anyone able to get anywhere with identifying it. You might be the only one!
James BT Infinity 2 19/09/2012 - Sold 42/6 - Getting 46/8 - Sync 50 / 9 Mbps @ 470m approx
14 years of broadband (ntl: cable to BT FTTC) - Router: Asus RT-N66U - Modem: Huawei HG612 speedtest
|
|
|
|
|
I'm with Plusnet on a 100/15Mbps FTTP connection (to Bradwell Abbey, Milton Keynes).
The support guys at Plusnet think it is an issue with Bradwell Abbey (major BT node). The evidence seems to suggest that.
Here is my ping graph: http://www.thinkbroadband.com/ping/share-large/a9870...
I normally get 3-4ms pings to bbc.co.uk and these are anything from 30-50ms at the moment. Speeds are also way down (getting 17-20Mbps down from my usual 88-91Mbps). It's a little strange as my SamKnows reports were showing no speed drops, but they must not have been running the tests during the 20:00-22:30 peak time.
|
|
|
There's certainly something in common between our problems - the shapes of our BQM graphs are the same. As both our graphs show, it's at it again this evening.
For any support personnel reading, here's my BQM for 26/09/2013 and my local monitoring for the 10 hour period including tonight's event.
I dropped my PPP session at 21:37 and got reconnected to "dsl7.wh-man" (so I changed gateway this time). As I expected, the change of gateway made no difference.
The commonality between our problems rules out the ISP and ISP links to BT Wholesale, as we're using different ISPs. It all but rules out the BT Openreach network local to me, as we're on different exchanges about fifteen miles apart and I can't think that the GEA aggregation nodes for Flitwick and Bradwell Abbey exchanges are the same.
It now seems almost certain to be a BT Wholesale network issue.
Edit: fixed typo, s/Openworld/Openreach/g (thanks jchamier), updated local monitoring graphic to show end of event
Edited by deleted (Thu 26-Sep-13 23:04:53)
|
|
|
the BT Openworld
I think you mean Openreach. (Openworld was the old name for the ADSL ISP, now just called Broadband and part of the BT Retail division.)
|
|
|
the BT Openworld I think you mean Openreach. (Openworld was the old name for the ADSL ISP now just called Broadband and part of BT retail division.).
I do indeed mean BT Openreach - it's been a long day and that was a brain boo-boo.
I remember dial-up BT Openworld...
|
|
|
it's been a long day and that was a brain boo-boo.
I remember dial-up BT Openworld...
I remember my parents thinking of getting Openworld 512k ADSL and being put off by the requirement for the USB modem back in 2000/2001. I had been on NTL cable over a year by then and they were jealous.
|
|
|
|
I'm getting exactly the same shaped 'hump' on my BQM on Plusnet FTTC in Northampton (Weston Favell exchange).
|
|
|
|
Hi David,
We've reported it to BTW for the Plusnet users we've seen, and I've sent the case number over to Zen too. All the users we've seen so far are on fibre (both FTTC and FTTP) and on the Milton Keynes node. Some routers will let you see the BTW BRAS you are connected to in the event logs. All the ones we've seen so far are bras-redXX-2401.mqd.21cn-infra.bt.net, so you might be able to check if you're in the same pattern.
|
|
|
|
Thanks for that, Dave. I've spoken to Zen this morning, and the staff member I'm dealing with will be alerted to the ongoing problem when he gets in this morning (I called just after 0900, he doesn't arrive until 1000).
This thread is linked to the Zen ticket, so those within Zen following this matter should see any updates posted here. My PPP logs show "bras-red5.mqd", so it looks like I'm in the same pattern as your affected users. I wonder what "mqd" is an abbreviation for.
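A quick way for others to check a log entry against that naming pattern is a regular expression. This is only a sketch in Python: the pattern is inferred from the two forms quoted in this thread ("bras-redXX-2401.mqd.21cn-infra.bt.net" and "bras-red5.mqd"), so the assumption that the name is "bras-red", a number, an optional numeric suffix, then ".mqd" may not cover every router's log format. `is_mqd_bras` is a hypothetical helper name.

```python
import re

# Assumed shape: "bras-red<number>", an optional "-<digits>" suffix, then ".mqd",
# followed by either the end of the name or a further dot-separated domain.
MQD_BRAS = re.compile(r"^bras-red(\d+)(?:-\d+)?\.mqd(?:\.|$)")


def is_mqd_bras(name):
    """Return True if `name` looks like one of the 'mqd' BRAS hostnames."""
    return MQD_BRAS.match(name.strip().lower()) is not None


if __name__ == "__main__":
    for candidate in ("bras-red5.mqd", "bras-red12-2401.mqd.21cn-infra.bt.net"):
        print(candidate, is_mqd_bras(candidate))
```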
|
|
|
You had the spike last night too?
|
|
The author of the above post is a thinkbroadband staff member. It may not constitute an official statement on behalf of thinkbroadband.
|
|
|
You had the spike last night too?
Yes. I posted about this last night, though could have made it clearer that was the third night in a row.
|
|
|
You are not alone
http://www.coolwebhome.co.uk/images/zenbqm.jpg
(One image is near perfect, which shows this is not a BQM issue, and while it's not every Zen customer by a long way, it looks to be around 1 in 10 of those that have a BQM running)
Will take a peek at other providers now. Had to blank out personal information obviously, but left the odd large town place name in position. There were others with latency changes, but not of the same precise pattern and a lot more variable, suggesting line saturation from their own activities
|
|
The author of the above post is a thinkbroadband staff member. It may not constitute an official statement on behalf of thinkbroadband.
|
|
|
Similar shape with PlusNet in around 1 in 10 of the BQM graphs.
The same shape also shows up on IDNet and Andrews & Arnold, at the same time and for a similar proportion of users.
So the idea that it may be one node in the BT Wholesale network seems a reasonable guess.
|
|
The author of the above post is a thinkbroadband staff member. It may not constitute an official statement on behalf of thinkbroadband.
|
|
|
Thanks so much for that investigative work. The composite you posted shows clearly that something is amiss for a proportion of users.
I'm not looking forward to this weekend if this fault isn't fixed, as I could have a much longer period of high latency and jitter when everyone is at home using their connections.
Hopefully the ISPs can put pressure on BT Wholesale to do something. It would be great if something could be posted about what went wrong, but I appreciate that may not be possible.
Maybe this part of the country has become the Bermuda Triangle of Internet connections. First the Northampton fire affecting the Be/O2 network, now this.
|
|
|
Tempted to suggest that, as it's affecting AAISP, Adrian Kennard and AAISP are good people to talk to: past masters at pressuring BT, and as they run their own BQM targeting customers they will have all the location information to hand.
Time for ISP techs to do some emailing between themselves.
|
|
The author of the above post is a thinkbroadband staff member. It may not constitute an official statement on behalf of thinkbroadband.
|
|
|
|
Hi all
I can confirm that we are working alongside PlusNet to resolve this issue and we have added Zen's affected lines to an open case at BT. The latest update is that investigations are continuing and it will be passed over to their 24/7 team to look at over the weekend.
Please accept our apologies for any issues that have been caused.
|
|
|
We are seeing this latency too... Seems to be on the various Milton Keynes BRASs, affecting all lines. FTTC, FTTP and ADSL affected.
Generally, we are seeing an extra 40 milliseconds or so, between 8pm and 11pm.
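To put a number on a hump like this from logged samples, one simple approach is to compare the median latency inside the evening window with the median outside it. The Python sketch below assumes the samples are already available as (hour of day, RTT in ms) pairs in local time. The function name `evening_excess_ms` and the sample values are made up for illustration and are not taken from any ISP's monitoring.

```python
from statistics import median


def evening_excess_ms(samples, start_hour=20, end_hour=23):
    """Median RTT inside the evening window minus median RTT outside it.

    `samples` is a list of (hour_of_day, rtt_ms) pairs in local time.
    The window is start_hour <= hour < end_hour (8pm to 11pm by default).
    Returns None if either side of the comparison has no samples.
    """
    inside = [rtt for hour, rtt in samples if start_hour <= hour < end_hour]
    outside = [rtt for hour, rtt in samples if not (start_hour <= hour < end_hour)]
    if not inside or not outside:
        return None
    return median(inside) - median(outside)


if __name__ == "__main__":
    # Illustrative values only: roughly 10ms off-peak, roughly 50ms in the evening.
    samples = [(9, 10.0), (14, 11.0), (18, 10.0),
               (20, 50.0), (21, 52.0), (22, 49.0), (23, 10.0)]
    print(evening_excess_ms(samples))
```

Medians are used so that short bursts of packet loss or a single outlier do not dominate the comparison.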
Edited by andrewhearn (Fri 27-Sep-13 17:01:38)
|
|
The above post has been made by an ISP REPRESENTATIVE (although not necessarily the ISP being discussed in the post).
|
|
|
I've passed our affected lines (hundreds) over to BT too.
|
|
The above post has been made by an ISP REPRESENTATIVE (although not necessarily the ISP being discussed in the post).
|
|
|
|
Thanks for the update, Mick, also to you, Andrew, for the information from AAISP's perspective.
This is sounding like a fault affecting a large number of customers when added up across all ISPs using BT Wholesale.
|
|
|
Fault is probably the wrong word, network may be within spec, don't have latency specs for WBC backhaul to hand.
An issue is probably a better way to approach it. Be interesting to hear what the resolution is, though I remember back in the early ADSL days this sort of thing was more common and then by magic it would vanish.
|
|
The author of the above post is a thinkbroadband staff member. It may not constitute an official statement on behalf of thinkbroadband.
|
|
|
|
|
|
|
Fault is probably the wrong word, network may be within spec, don't have latency specs for WBC backhaul to hand.
An issue is probably a better way to approach it. Be interesting to hear what the resolution is, though I remember back in the early ADSL days this sort of thing was more common and then by magic it would vanish.
I accept that broadband is a contended product, and the effects of backhaul contention may show up in end-user connections. I could accept a graceful degradation to higher latency and lower bandwidth per user, though I appreciate it is likely impossible to implement traffic management that cleanly in a network between the ISP and the customer.
What has happened for three consecutive nights is a sudden significant increase in latency causing considerable jitter on an almost idle connection. My NTP servers are far from happy about it, and I suspect the line would be unusable for VoIP. If it happens again, I'll run some VoIP tests to try to quantify how bad things are during periods of degradation.
There's been no inkling of this kind of issue until this week - the latency graph on my monitor shows a straight line from my switch to FTTC in January until this occurred. I would expect upgrades to be implemented in the core of the BT Wholesale network long before substantial degradation of service became a nightly event. Certainly, this gives the impression that some link or device is hitting 100% utilisation.
The sudden onset of this problem and the lack of graceful degradation in end-user experience makes me suspect it is a consequence of some sort of failure or degraded operating condition. That's why I used the word 'fault'. However, out of deference to BT Wholesale, I can go with the word 'issue', so long as I can hold out hope of some sort of resolution.
As you say, it would be interesting to discover what the root cause is, and what resolves the issue.
I hold out hope that Zen will install a PoP in the Flitwick exchange, meaning my traffic doesn't have to pass over the BT Wholesale network. Realistically, though, there are likely to be many more attractive options for Zen when it comes to expanding their network. Flitwick is a fairly small and overwhelmingly residential town, which is currently an island for FTTx surrounded by exchanges where no FTTx commercial deployment will take place.
|
|
|
Not often you get 3 ISPs all replying to you with updates
Good stuff! Just so long as I don't get a bill from all three.
Seriously, the co-operation shown in this thread is very helpful, and will hopefully help move this issue towards resolution.
|
|
|
|
|
|
|
|
Just as a matter of interest, have you done a tracert on your connection before, during and after your spike?
|
|
|
|
There was no evidence of routes changing, though the way Zen's setup is configured, the BT Wholesale network is wholly transparent. The first hop to respond after my router is losubs.subs.dsl<num>.wh-man.zen.net.uk
There's no evidence of a spike this evening.
|
|
|
We've not seen any latency problems since Thursday evening, so it looks like it may have been fixed. I'll see if we have an update or further information from BT...
|
|
The above post has been made by an ISP REPRESENTATIVE (although not necessarily the ISP being discussed in the post).
|
|
|
OOI, our monitoring should normally pick this kind of problem up and publish it on: http://clueless.aa.net.uk/congestion.cgi
It didn't actually work this time around. We know why and will get it fixed, so feel free to bookmark the page for future reference.
|
|
The above post has been made by an ISP REPRESENTATIVE (although not necessarily the ISP being discussed in the post).
|
|
|
|
I've seen no further problems since Thursday evening. It would be interesting if there is any information you can share on what went wrong, Andrew.
|
|
|
Just took a scan through the Zen BQMs and they all look nicely unique, i.e. no obvious blip like there was.
|
|
The author of the above post is a thinkbroadband staff member. It may not constitute an official statement on behalf of thinkbroadband.
|
|
|
Looks like AAISP have a new page for stuff like this, entitled Current broadband congestion or fault areas
http://clueless.aa.net.uk/congestion.cgi
Might be useful to help you and others identify faults such as the one you've experienced in this thread!
|
|
|
Yes, we've had confirmation that it's fixed. BT found a link was out of service which caused reduced capacity. It was fully restored just before 6pm on September 27.
regards,
Phil.
|