My company runs an OMA PoC service which uses UDP to send Presence ("user logged in") information to user terminals. We recently had complaints from some customers that it had stopped working, though other aspects of the system (voice, messaging, contact lists etc.) continued to work. Presence stopped working in our office too when the terminal was connected over our BT Business Broadband connection, but continued to work with other carriers (Virgin, Vodafone 4G hotspot etc.).

I ran a ping test from our server to our office IP address (Red Hat Linux, so "ping -s <icmp-payload-size> <host IP>") and discovered that 1496-byte packets got through (payload=1468 + ICMP header=8 + IP header=20) but 1500-byte packets (payload=1472) didn't. I changed the MTU on my server NICs from 1500 to 1492 to give a bit of headroom, and now I can "ping" with any payload size up to 35320 and, more importantly, the UDP Presence service is working again.

This must be because my server's network layer is now fragmenting ICMP and UDP datagrams into IP packets of at most 1492 bytes, and these are getting all the way across the Internet from my server data centre to BT and through their infrastructure to my office. However, 1500-byte packets can't make that journey via BT, though they still can through other carriers, so the outbound leg from my datacentre cannot be at fault.
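For reference, the arithmetic behind those packet sizes can be sketched as below. This is only an illustration of the sums, assuming a plain 20-byte IPv4 header with no options and the 8-byte ICMP echo header; the function names are my own and not part of any tool.

```python
IP_HEADER = 20    # IPv4 header without options (assumption)
ICMP_HEADER = 8   # ICMP echo request header


def max_unfragmented_icmp_payload(mtu):
    """Largest 'ping -s' value that fits in a single IP packet for this MTU."""
    return mtu - IP_HEADER - ICMP_HEADER


def fragment_sizes(icmp_payload, mtu):
    """Total size in bytes of each IP packet the sender would emit.

    Every fragment except the last must carry a multiple of 8 data bytes,
    because the IPv4 fragment offset is counted in 8-byte units.
    """
    remaining = icmp_payload + ICMP_HEADER
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk + IP_HEADER)
        remaining -= chunk
    return sizes


print(fragment_sizes(1468, 1500))  # [1496]      (the size that got through)
print(fragment_sizes(1472, 1500))  # [1500]      (the size that did not)
print(fragment_sizes(1472, 1492))  # [1492, 28]  (same ping after lowering the MTU)
```

With the server MTU at 1492, the 1472-byte ping that used to travel as one 1500-byte packet is instead sent as two smaller packets, which matches what I am seeing.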
My question is: what has changed to cause this to happen all of a sudden? It was fine at the start of last week, then broke around Wednesday/Thursday. Has someone at BT installed a misconfigured router between my office and my datacentre so that in-flight IP packet fragmentation and reassembly no longer works? Or is there a hop with a 1496-byte MTU that didn't use to be there?
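One way I could try to tell those two cases apart is to probe with the Don't Fragment bit set and look for the largest payload that still gets through. The sketch below is a rough idea rather than a tested tool: it assumes the Linux iputils "ping" (whose "-M do" option prohibits fragmentation), assumes that if a size passes then every smaller size passes too, and uses helper names I made up. The sending NIC's MTU would need to be back at 1500 for the upper sizes to leave the server at all.

```python
import subprocess


def make_ping_probe(host):
    """Return a function that reports whether one DF-flagged ping of a given
    payload size gets a reply. Assumes Linux iputils ping is on the PATH."""
    def probe(payload):
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe


def largest_passing_payload(probe, low=0, high=1472):
    """Binary search for the largest payload in [low, high] that the probe accepts.

    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

Calling `largest_passing_payload(make_ping_probe("203.0.113.10"))` (a placeholder address from the documentation range, not my real office IP) and adding 28 to the result would give the largest packet the path carries unfragmented.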
Another question: my BT Business Hub allows me to set the MTU, but only to a maximum of 1492 bytes. Why is this? Are they hedging their bets? (Yes, I understand that this affects outbound packets, not inbound ones, so this won't be causing my fragmentation issue.)