Slow speed after GEA migration :: Zen Internet

Pages in this thread: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | [16] | 17 | 18 | 19 | (show all)

XGS_Is_On
(member) Thu 10-Nov-22 11:08:45

Re: Slow speed after GEA migration

[re: pluralist]

In reply to a post by pluralist:
Aggregation |= Segregation. They are complete opposites.

Maybe you have since looked up LAG in rather more detail than I did. There's no way that could be a typo.

What is perfectly reasonable is not necessarily perfectly implemented. You don't seem to be aware how the tiniest error can be made or incompatibility can be present across such a complex technology. Or how the tiniest component in electronics can develop an intermittent fault or be outside "guaranteed" tolerances and only show up under a very rare combination of circumstances in your 900,000 combinations of circumstances.

Guess what! Skimmed milk masquerades as cream. (You may need to google that).

I'm glad I never had a person with such belief in the infallibility of themselves, others or equipment as yourself working for me in IT.

No use of 'segregation' in my post. 'Segmentation' in an earlier one as that's part of what routers do and why we have subnet masks.

Your suggesting I'm arrogant having claimed Zen may have made a mistake with basic functionality having just read about it on Wikipedia is beyond comedy. The appeal to authority you made regarding how long you'd been on the forum for set the scene and you just ran with it.

May I suggest using Wikipedia or similar to find the difference between control planes and forwarding planes as a start rather than the appeal to forum authority and indignance at being challenged?

As clearly I know nothing and am being arrogant for disagreeing I'll defer to yourself, your Wikipedia research and the Dunning-Kruger effect.

pluralist
(knowledge is power) Thu 10-Nov-22 13:00:20

Re: Slow speed after GEA migration

[re: XGS_Is_On]

In reply to a post by XGS_Is_On:
In reply to a post by pluralist:
Aggregation |= Segregation. They are complete opposites.
...
I'm glad I never had a person with such belief in the infallibility of themselves, others or equipment as yourself working for me in IT.
No use of 'segregation' in my post. 'Segmentation' in an earlier one as that's part of what routers do and why we have subnet masks.

Correct. That part of my post is now struck out, but I stand by the rest of it. See the edit line added there. However, aggregation, as helpfully pointed out by devonkev (unlike yourself), is still a different matter.

Your suggesting I'm arrogant having claimed Zen may have made a mistake with basic functionality having just read about it on Wikipedia is beyond comedy.

You are the comedian. Your grammar and punctuation are terrible. There are at least three very different ways of construing that sentence. As it stands the most obvious reading is that I suggested you claimed Zen may have made a mistake. (That's if we accept your opening "Your" as "You're").

I made no such suggestion. However I do see the insulting (for the second time) intention of what you were trying to express.

The appeal to authority you made regarding how long you'd been on the forum for set the scene and you just ran with it.

My post did not claim authority; it just stated the period of time and experience during which I had never seen an LAG-labelled hop, particularly one with three routers in the same hop. Given your poor writing I suppose your misunderstanding is to be expected. I do know, and have for many years known, everything you said in the post it was a reply to. Really elementary stuff. But nothing whatsoever to do with hop 6.

May I suggest using Wikipedia or similar to find the difference between control planes and forwarding planes as a start rather than the appeal to forum authority and indignance at being challenged?

Now you are simply being childish because you are losing every time you have a go at me bar my one sleepy slip-up.

As clearly I know nothing and am being arrogant for disagreeing I'll defer to yourself, your Wikipedia research and the Dunning-Kruger effect.

Well, you do seem to have a bit of course-work knowledge about networking, but how deep that is, and how much experience and capability you have in complex problem diagnosis and solving, is a different question altogether.

The Zen people I'm sure know far more than you, and yes you are arrogant. Particularly in saying that any particular Link Aggregation strategy and implementation is simple. No network designer can know everything that might be presented to the system by legitimate users! It's easier to guard against deliberate threats by hackers, and that's hard enough.

Maybe in another five or ten years you'll begin to realise how little you currently know about reality.

Connections: OnePlus 8 Pro on Three 4+ (LTE)/5G and at home Three Mobile, with (Three)ZTE MF286D router giving about 113/20Mbps.

The best of all possible countries.

XGS_Is_On
(member) Thu 10-Nov-22 13:40:55

LAGs and ECMP and Traceroutes: Oh my!

[re: XGS_Is_On]

Going to take a few minutes to briefly flesh out the other posts I've made on this topic rather than getting into a urinating contest with the forum technical Gods.

If you, as a Zen user, have seen 'ae' in a traceroute, you've gone over a LAG - Juniper equipment, and others, name LAG interfaces 'ae' for aggregated Ethernet. I remember seeing this plenty when I used Zen retail.

Cisco call them Port Channels, so links with 'po' and a number often indicate Cisco LAGs.

All these do is, when a packet needs to use the link, basically do a bit of maths and work out which physical link to use; that's it. When there are 2 links it's essentially nothing more than distilling the packet headers down to odd or even. Where there are 4 it's turning those headers into a number, dividing by 4 and taking the remainder, then using link 0, 1, 2 or 3. Once that's done every packet in that connection will use the same link. Different connections will use different links, so if one link were the problem the slowdown would be inconsistent, hitting only the connections hashed onto it. Anything, whether GEA, BT Wholesale or TalkTalk, that goes over those links would experience exactly the same issues. If the two sides of the LAG choose different links it doesn't matter - these are cables connecting the same chassis, and likely the same line cards, on either side, probably running next to each other and physically cable managed together in a bundle.
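To make the odd/even and divide-by-4 description concrete, here is a minimal Python sketch of hash-based LAG member selection. Using CRC32 over a 5-tuple string is an assumption for illustration only; real switches use vendor-specific hash inputs and functions, and `pick_lag_member` is a hypothetical name.

```python
import zlib

def pick_lag_member(src_ip, dst_ip, proto, src_port, dst_port, n_links):
    """Map a flow's 5-tuple to one LAG member link numbered 0 .. n_links-1."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    flow_hash = zlib.crc32(key)      # assumed hash; vendors differ
    return flow_hash % n_links       # 2 links: odd/even; 4 links: remainder 0-3

flow = ("192.0.2.10", "198.51.100.20", 6, 51515, 443)
two = pick_lag_member(*flow, n_links=2)
four = pick_lag_member(*flow, n_links=4)

# Every packet of the same flow hashes identically, so the flow sticks to one link.
assert all(pick_lag_member(*flow, n_links=4) == four for _ in range(1000))

# Different connections (here: different source ports) spread across the links.
counts = [0] * 4
for port in range(40000, 40100):
    counts[pick_lag_member("192.0.2.10", "198.51.100.20", 6, port, 443, 4)] += 1
print(counts)
```

Nothing in the calculation depends on the traffic's origin (GEA or otherwise), only on the header fields fed into the hash.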

TL;DR it's extremely unlikely to be anything to do with that.

ECMP is nothing more than having multiple routes to the same place and using all of them. This can and does create asymmetry and jitter. However, how much jitter do you reckon is possible on links between the same chassis in the same building, probably in the same line of racks? And don't you think if there were issues like that you'd see them in the following hops?
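A rough sketch of per-flow ECMP, assuming each router hashes the 5-tuple with its own (hypothetical) seed. Because the return traffic is hashed by a different router over the swapped tuple, a flow can leave on one link and come back on the other, which is the asymmetry mentioned above. `ecmp_choice`, the seeds and the link labels are all made up for illustration.

```python
import zlib

def ecmp_choice(router_seed, flow, next_hops):
    """Pick one of several equal-cost next hops for a flow (per-flow hashing)."""
    key = (router_seed + "|" + "|".join(map(str, flow))).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

def reverse(flow):
    """The same connection as seen in the return direction."""
    src, dst, proto, sport, dport = flow
    return (dst, src, proto, dport, sport)

paths = ["link-A", "link-B"]  # hypothetical equal-cost links between the same two routers
asymmetric = 0
for sport in range(50000, 50100):
    flow = ("192.0.2.10", "198.51.100.20", 6, sport, 443)
    out = ecmp_choice("router-1", flow, paths)             # chosen on one side
    back = ecmp_choice("router-2", reverse(flow), paths)   # chosen independently on the other
    asymmetric += out != back
print(f"{asymmetric} of 100 flows return over a different link than they left on")
```

Each individual flow is still deterministic, so the asymmetry on its own doesn't cause reordering within a connection.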

Traceroute: https://www.thousandeyes.com/learning/glossary/trace...

Traceroute most commonly uses Internet Control Message Protocol (ICMP) echo packets with variable time to live (TTL) values. The response time of each hop is calculated.

Time To Live is sent starting at 1, then 2, and so on until you reach the destination. Most devices that receive the packet and aren't tunnelling it will count the TTL down by one when they receive it. If the TTL reaches zero they will send the packet to their control plane, or a slow forwarding path, to be dealt with, as it requires some action from the device itself beyond just forwarding it.
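The TTL countdown can be simulated in a few lines. This is a toy model, assuming every router decrements the TTL and none tunnel the packet; it doesn't send any real packets, and the hop names are placeholders.

```python
def traceroute(path, max_ttl=30):
    """Simulate TTL-limited probing along a list of hops (last entry = destination)."""
    results = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        for i, hop in enumerate(path):
            if i == len(path) - 1:
                # the probe made it to the destination, which answers the echo request
                results.append((ttl, hop, "echo reply"))
                return results
            remaining -= 1  # each forwarding router counts the TTL down
            if remaining == 0:
                # TTL expired here: handed to the control plane / slow path for an ICMP reply
                results.append((ttl, hop, "time exceeded"))
                break
    return results

for ttl, hop, reply in traceroute(["hop1", "hop2", "server"]):
    print(ttl, hop, reply)
```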

https://traceroute.home.blog/category/general-networ... has a nice quote:

In terms of configuration the control plane should be considered as an interface through which any traffic destined for the device must pass. This traffic can enter through any physical interface, but before it is processed it passes through the control plane “interface”.

Packets with TTL 1 hitting the router enter the control plane or a 'slow' data plane as they can't go further. They must be dealt with by this device and need to have an ICMP TTL expired message generated. The time it takes to generate and send this message, and for it to arrive back at you, is your latency for that hop in the traceroute. Where the TTL is above 1 the router will send the packet on through the fast data plane / forwarding path, which in the case of these routers is a high-speed, high-bandwidth, very low latency Application Specific Integrated Circuit (ASIC) routing and switching fabric.
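Here's a toy model of why a slow-responding hop doesn't inflate the hops after it, assuming symmetric return paths and made-up delay figures. Each hop's RTT is twice the one-way delay to it plus that hop's own time to generate the ICMP reply; probes for later hops only pass through the earlier routers' fast path, so the earlier slow-path delay isn't added to them.

```python
def hop_rtts(link_delays_ms, slow_path_ms):
    """Estimated traceroute RTT for each hop.

    link_delays_ms[i]: one-way delay from the previous hop to hop i.
    slow_path_ms[i]:   time hop i takes to generate its ICMP reply.
    """
    rtts, one_way = [], 0.0
    for link, slow in zip(link_delays_ms, slow_path_ms):
        one_way += link                    # fast-path transit accumulates hop by hop
        rtts.append(2 * one_way + slow)    # only this hop's own slow-path cost is added
    return rtts

# hypothetical figures: hop 3 (index 2) has a busy control plane
rtts = hop_rtts([1.0, 2.0, 0.5, 0.5, 3.0], [0.1, 0.1, 25.0, 0.1, 0.1])
print(rtts)
```

The spike shows up at one hop only and the next hop drops back down, which is the pattern to expect from control plane handling rather than a forwarding problem.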

There are, of course, buffers between where this data hits the router and the CPUs that handle the control plane and slow data plane. When these buffers fill, or if there is policing in place to drop everything that isn't answered immediately, the traceroute probe is dropped. Until then packets wait for the CPU to service them and send out the ICMP TTL expired message. This is right at the bottom of the CPU's priority list and is itself throttled to protect the system, so you've got policing both on entry to the control plane in the first place and on generation of the ICMP messages. This is to protect the router from having its CPU drained.
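Control plane policing is commonly built on some form of rate limiter. The sketch below uses a token bucket, which is an assumption (vendors use various policer designs), with a hypothetical limit of 10 replies per second and a burst of 3. Probes arriving faster than that get no reply and show as `*` in the traceroute.

```python
class TokenBucket:
    """Simple policer allowing `rate` replies per second, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now):
        # refill tokens for the time elapsed since the last probe, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # ICMP TTL expired message gets generated
        return False      # probe silently dropped

# hypothetical limit: 10 replies/s, burst of 3; six probes arrive 1 ms apart
bucket = TokenBucket(rate=10, burst=3)
answers = [bucket.allow(t / 1000) for t in range(6)]
print(["reply" if a else "*" for a in answers])
```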

That set of CPUs likely has plenty of other work to do handling the huge routing table. They probably export telemetry, generate alerts, etc, etc. As long as the higher latency isn't carried into the hops after that one in a traceroute, it isn't an issue.

https://hal.inria.fr/hal-01111190/document

The main problem that arises when making use of TTL-limited probes is that ICMP feedback from routers is often neither instantaneous nor entirely reliable. Indeed, as the generation of ICMP error messages takes place in the slow path of the data plane, manufacturers and operators impose a low priority on it, in order to minimize the overall load on routers. Other internal tasks mostly related to the control plane, like route computation and management operations, might take precedence over it, especially when resources are shared between slow path and control plane

If nothing shows up in the hops after the ones using LAGs / ECMP, it's nothing to do with those either.

That's it. I appreciate the frustration, this thread is huge, but don't get sidetracked with this stuff. ANY issues with the LAGs, ECMP or the higher ping response coming from the Zen edge router would, if relevant, show in every hop from that point down to the destination. It really is that simple - you'd see latency, loss or jitter throughout. You don't.
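The rule of thumb above (a problem has to persist through every later hop to be real) can be written as a small heuristic. This is a sketch only: it looks at latency alone, ignores loss and jitter, and the per-hop baseline figures and 20 ms threshold are hypothetical.

```python
def localise_problem(rtts_ms, baseline_ms, threshold_ms=20.0):
    """Return the index of the first hop from which elevated latency persists
    to the end of the trace, or None if nothing persists.

    A hop that is slow on its own but followed by normal hops is treated as a
    slow ICMP responder (control plane / slow path), not a forwarding problem.
    """
    elevated = [rtt - base > threshold_ms for rtt, base in zip(rtts_ms, baseline_ms)]
    start = None
    for i, is_high in enumerate(elevated):
        if is_high and start is None:
            start = i
        elif not is_high:
            start = None  # latency recovered, so the earlier spike was cosmetic
    return start

baseline = [2, 6, 7, 8, 14]
print(localise_problem([2, 6, 32, 8, 14], baseline))   # isolated spike at one hop
print(localise_problem([2, 6, 40, 45, 60], baseline))  # latency carried through to the end
```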

I appreciate I don't have 40,000+ posts in this forum's technical section, however if you are really, really bored you can read https://www.rfc-editor.org/rfc/rfc7747.html for why some edge devices have higher latency when being pinged, and the truly excruciating https://www.rfc-editor.org/rfc/rfc4098 for control plane / forwarding / data plane stuff. I have no intention of reading either.

The issues almost certainly relate to Zen's Plexus network, hence why some people on it are fine, others aren't, even when they're going across the dreaded ECMP LAGs.

Coffee break over. Cheers.

EDIT: Just to reiterate, Pluralist, I am not reading your posts here since I responded to the last one, it being abundantly clear it's a waste of both my time reading them and yours writing them. This post is for the benefit of those actually interested in this, and to avoid support tickets flying into Zen because people see 'lag' in a traceroute and think it's a problem. I'm sure if you look hard enough through some documentation you can find something to nitpick; I've simplified a ton. However, quoting documentation on something you only realised existed yesterday says a lot more about you than it does me, so it's your call whether you waste your time trying to be 'right' or keep that to The Park.

EDIT 2: It's overly simplistic not to mention that many routers have 2 data plane forwarding paths, the fast path and the slow one, and in some cases the slow forwarding path, usually sharing resources with the control plane, handles traceroute responses. In others it goes to the control plane itself. It depends, but it should be mentioned for clarity, and it cuts off some pedantry even though the end result is the same.

Edited by XGS_Is_On (Thu 10-Nov-22 14:00:20)

Chrysalis
(legend) Thu 10-Nov-22 20:47:51

Re: LAGs and ECMP and Traceroutes: Oh my!

[re: XGS_Is_On]

I assume each flow is kept to the same port for the duration of the flow?

VM Gig1 - AAISP L2TP

XGS_Is_On
(member) Fri 11-Nov-22 01:00:33

Re: LAGs and ECMP and Traceroutes: Oh my!

[re: Chrysalis]

Yes.

For instance:

https://support.huawei.com/enterprise/en/doc/EDOC110...

Load Balancing in LACP Mode section.

LACP only supports flow-based balancing. Once the calculation is done a flow is sticky to that link as that result isn't changing. Into fast path on its way that goes.

Chrysalis
(legend) Fri 11-Nov-22 02:25:38

Re: LAGs and ECMP and Traceroutes: Oh my!

[re: XGS_Is_On]

That's good to hear, so I am not sure what the issue would be with LAG then; it would be similar to RSS and the like.

VM Gig1 - AAISP L2TP

XGS_Is_On
(member) Sat 12-Nov-22 01:16:50

Re: LAGs and ECMP and Traceroutes: Oh my!

[re: Chrysalis]

Me neither. The only times I've seen issues, the symptoms were some links in the LAG not being used and the remaining ones congesting as a result, such as when software is limited to using 4 links in a LAG but allows 8 to be configured. That, alongside extremely basic configuration errors that result in some or even all of the LAG not coming up, worst case leaving a single link alive as a fallback to congest.
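The '4 usable out of 8 configured' failure can be shown with the same kind of flow hashing. The sketch assumes, hypothetically, that the software hashes modulo the number of links it actually supports rather than the number configured; the constants and `member_for` are made up for illustration.

```python
import zlib

CONFIGURED_LINKS = 8
USABLE_LINKS = 4  # hypothetical software limit, as in the scenario above

def member_for(flow_id):
    # the (hypothetical) limited software hashes over the links it supports, not those configured
    return zlib.crc32(flow_id.encode()) % USABLE_LINKS

load = [0] * CONFIGURED_LINKS
for n in range(10_000):
    load[member_for(f"flow-{n}")] += 1
print(load)  # members 4-7 never carry traffic; 0-3 carry everything
```

Half the bundle's capacity sits idle while the other half congests, and that congestion would then show in every hop after the LAG.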

This is pretty obvious in its effects right down the route after the congestion point but I'll leave it to the expert to comment further.

XGS_Is_On
(member) Sat 12-Nov-22 02:25:12

Re: Slow speed after GEA migration DELETED

[re: pluralist]

Post deleted by XGS_Is_On

Pheasant
(knowledge is power) Fri 18-Nov-22 17:23:51

Re: Slow speed after GEA migration

[re: E300]

Tagging this on the end. Apparently all now resolved after…ahem “2 weeks”. Wishful thinking perhaps?

https://www.ispreview.co.uk/index.php/2022/11/uk-isp...

deleted
(deleted) Fri 18-Nov-22 17:50:47

Re: Slow speed after GEA migration

[re: Pheasant]

In reply to a post by Pheasant:
Apparently all now resolved after…ahem “2 weeks”. Wishful thinking perhaps?

I'm sure all Zen customers will be able to get a good night's sleep tonight knowing Zen have spoken 😎🤣
