I've just had a chat with Zen technical support.
The agent confirmed that there are no errors showing on the VDSL2 link to the cabinet, which supports my suspicion that the problem did not originate there. He also confirmed that there are no routing or prioritisation differences between customers on the old metered products and the new unlimited ones; the difference is purely one of marketing and billing. It is pure coincidence that this problem occurred shortly after moving to an unlimited tariff.
Zen won't receive some relevant contention reports from their suppliers until tomorrow, which hampers their understanding of where the problem is. The network operations people have been alerted to this occurrence and there's a link to this thread on the internal ticket. I've promised to keep this thread updated over the next few days. Zen have also added some sort of monitoring to my connection so they're aware of what is happening.
The problem is not just a busy router showing higher ICMP latency. 'Real' traffic is affected: my NTP servers typically see 0.15 to 0.8ms of jitter to external servers, but this shot up to several tens of ms at 9pm last night. My IPv6 first hop heads over a SixXS tunnel terminating at gblon2 (the protocol 41 tunnel traffic typically routes via Zen's BGP peering with Goscomb at Telehouse North), and its latency rose by a similar amount to that of my IPv4 ICMP first hop.
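For anyone wanting to put a rough number on jitter from their own delay logs, here is a minimal Python sketch. It assumes one simple definition, the mean absolute difference between consecutive delay samples, which is not how ntpd derives the jitter figure it reports. The function name and the sample values are made up for illustration and are not taken from my servers.

```python
def mean_jitter_ms(delays_ms):
    """Mean absolute difference between consecutive delay samples, in ms.

    This is a deliberately simple measure chosen for illustration;
    it is NOT the calculation ntpd uses for its own jitter figure.
    """
    if len(delays_ms) < 2:
        return 0.0  # jitter is undefined for fewer than two samples
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical samples: a quiet period and a congested one.
quiet = [10.0, 10.4, 10.1, 10.5]
busy = [12.0, 45.0, 20.0, 60.0]

print(f"quiet: {mean_jitter_ms(quiet):.2f} ms")
print(f"busy:  {mean_jitter_ms(busy):.2f} ms")
```

Comparing the figure for a quiet period against the evening peak gives a crude indication of whether real traffic is seeing the same degradation as ICMP.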
The BQM graphics use local time, not UTC. The other graphics, which originate from my in-house servers, also use local time.
The most likely explanation is link saturation somewhere. This could be:
- on the BT Openreach network between the FTTx cabinet and the GEA aggregation node, which I'd expect to be at my local Flitwick (SMFK) exchange as there's no GEA-FTTx at any of the surrounding exchanges
- on the BT Wholesale network (there's no Zen PoP at my exchange and the Zen portal confirms I'm on WBMC)
- on the Zen links from the BT Wholesale network
- on the Zen network between BT Wholesale and the gateway I'm connected to
The last two options seem unlikely. Zen have a philosophy of keeping spare capacity, and Zen didn't seem aware of any saturation problems.
The BT Openreach network may be to blame, though this seems the less likely of the remaining two options.
Virgin offered broadband here several years before the first ADSL was available. A lot of people are still on Virgin, judging by the SSIDs of the wireless networks I can see. FTTx has only been available here for 9 months and ADSL2+ speeds are fairly good (around 14Mbit/s downstream is achievable), though the consumer ISPs are marketing FTTx heavily and a lot of their customers may have switched to fibre products.
My understanding is that there's usually plenty of spare fibre in the FTTx network, though I appreciate not all of it is lit. It might be that a fibre fault has caused traffic to my cabinet to take a different route which is becoming saturated in the evening.
The problem seems most likely to have originated somewhere in the BT Wholesale network. Time will tell!
There was a brief spike on my graphs to 290ms latency to the Zen gateway just before 11:00 local today, 29/09/2013, though it would be conjecture to ascribe any significance to it. BQM didn't pick anything up at that time.
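For picking isolated spikes like this out of a latency log automatically, something along these lines might do. It is only a sketch: the function name, the threshold factor and the 50ms floor are my own arbitrary assumptions, and apart from the 290ms value the sample figures are invented for illustration rather than read off my graphs.

```python
from statistics import median


def find_spikes(samples, factor=5.0, floor_ms=50.0):
    """Return the (timestamp, latency_ms) pairs well above the median latency.

    samples: list of (timestamp, latency_ms) tuples.
    A sample counts as a spike when it exceeds both factor * median and
    an absolute floor, so small wobbles on a fast link are ignored.
    Both thresholds are arbitrary assumptions, not recommended values.
    """
    if not samples:
        return []
    baseline = median(latency for _, latency in samples)
    threshold = max(factor * baseline, floor_ms)
    return [(ts, latency) for ts, latency in samples if latency > threshold]


# Hypothetical samples: only the 290 ms figure matches the spike described above.
example = [
    ("10:56", 9.8),
    ("10:57", 10.1),
    ("10:58", 290.0),
    ("10:59", 10.3),
    ("11:00", 9.9),
]
print(find_spikes(example))
```

A single flagged sample like this says little on its own. Sustained runs of flagged samples at the same time each evening would be much more suggestive of saturation.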
I'll post further graphics if the problem recurs. If it does not recur, I'll confirm that in writing.