While everyone else is giving advice about tracing this problem, I thought I'd add something about the error statistics you have shown us...
For the first week or so my speed was hovering around 76Mb, and 20Mb upload. The connection itself was functioning completely fine. I did notice on day one that there were quite a few CRC errors appearing. I think it was on day 2 that interleaving was applied to the line, which remains on the line to this day.
That would be the normal time that DLM would choose to intervene.
In order to see if a higher level of intervention is needed, it will broadly be watching the CRC error rate (although it strictly seems to be the ES value).
To see if lower intervention is needed, it will be looking for very low levels of CRC errors, and probably fewer FEC errors too.
The level of interleaving seems to have increased a lot since then. The speeds are now a lot less, and the level of interleaving is now at: Delay: Downstream: 9, Upstream: 6. D: Downstream: 1723, Upload: 551.
These figures aren't enough to see what interleaving or FEC have been set to.
DLM's level of intervention can be seen from the combination of INP and delay, and the initial level for downstream is usually set to an INP of 3 and a delay of 8ms. Those parameters usually lead to an increase in latency of 8ms and a FEC overhead of around 20%.
Broadly, a larger value of INP means that the noise bursts appear to be of longer duration, and need extra protection. A larger value of 'delay' gives the modems permission to utilise interleaving to a larger degree, which induces longer delays.
When the two modems synchronise, they set the parameters I (capital i) and D to determine interleaving - the grid used to interleave is size I x D, so you need both parameters to see how much interleaving is happening; the size of this grid largely determines the latency seen.
Parameters R and N are also determined when the modems synchronise, and determine how much overhead is used by the FEC process - R bytes in every block of N bytes are used to add protection. The R bytes are pure overhead, and are bytes that are lost to your end-to-end sync speed.
The interplay between I, D, R and N gives the "INP" protection demanded by DLM.
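To picture why spreading bytes over a grid helps against a burst of noise, here is a minimal sketch of a textbook block interleaver. It is not a model of what the modems actually do, and the grid size (4 codewords of 6 bytes) is a toy value I picked for illustration, not the I and D from this line.

```python
def interleave(data, rows, cols):
    """Write row by row into a rows x cols grid, read out column by column."""
    if len(data) != rows * cols:
        raise ValueError("data must fill the grid exactly")
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(data, rows, cols):
    """Inverse of interleave(): put each byte back in its original position."""
    if len(data) != rows * cols:
        raise ValueError("data must fill the grid exactly")
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = data[i]
            i += 1
    return out


# Toy example: 4 codewords (rows) of 6 bytes each (cols).
rows, cols = 4, 6
codewords = list(range(rows * cols))
sent = interleave(codewords, rows, cols)

# A noise burst wipes out 4 consecutive bytes on the wire.
burst = range(8, 12)
received = ["X" if i in burst else b for i, b in enumerate(sent)]

# After de-interleaving, count the damaged bytes in each codeword.
restored = deinterleave(received, rows, cols)
errors_per_codeword = [
    restored[r * cols:(r + 1) * cols].count("X") for r in range(rows)
]
print(errors_per_codeword)  # [1, 1, 1, 1]
```

Without the interleaving step, the same 4-byte burst would all land in one codeword; with it, each codeword only has one damaged byte for FEC to repair. The price is that the receiver has to wait for the whole grid before it can read anything back out, which is where the extra latency comes from.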
For your line, the stats show DLM has set:
INP: 6.00 4.00
delay: 9.00 6.00
For downstream, that's a pretty high INP value, alongside a delay value that isn't normal - suggesting that DLM has indeed been trying to vary the settings to solve your error rate. That in turn suggests you have a non-normal error behaviour.
For upstream, the settings aren't as bad as downstream, but they are very bad for upstream - there is often less need for any intervention at all on the upstream side.
The modems have responded with:
R: 16 10
D: 1723 551
I: 48 32
N: 48 32
Those are very small blocks at 48 bytes downstream and 32 bytes upstream.
Downstream, FEC is using 16 bytes out of every 48 as overhead - or one third (33%). Upstream it is 10 bytes every 32, or 31%.
Both are pretty high values - and you are "wasting" a lot of your bandwidth in the form of error protection.
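For anyone who wants to repeat that sum on their own stats, a minimal sketch follows. The function name is my own; the R and N values are the ones quoted above.

```python
def fec_overhead(r: int, n: int) -> float:
    """Fraction of each N-byte block that is taken up by the R protection bytes."""
    if n <= 0 or not 0 <= r <= n:
        raise ValueError("need N > 0 and 0 <= R <= N")
    return r / n


# (R, N) as reported by the modem for each direction.
line = {"downstream": (16, 48), "upstream": (10, 32)}

for direction, (r, n) in line.items():
    print(f"{direction}: {r} of every {n} bytes = {fec_overhead(r, n):.1%} overhead")
```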
The number of FECs doesn't seem to have really reduced in this time, although interleaving has been increasing progressively.
You wouldn't expect the FEC count (which is the same as the RSCorr count) to decrease as intervention gets ramped up.
The noise you are getting is real, and causes errors in the bitstream. The FEC process will help correct them - if it succeeds, RSCorr gets incremented, and if it fails, RSUncorr gets incremented. While the noise continues, you *will* get errors, and you see these counters increment.
When DLM intervenes (or increases intervention) it is because it is watching the CRC count (or the ES counter that acts as a summary of the CRC errors); it isn't watching the FEC counter.
When the engineer came around recently and plugged in his JDSU, he advised that the max rate that my line was capable of was showing as about 74Mb, and that my actual rate was around 60Mb. This made me wonder, as according to the HG612 stats, the max rate that the line should be capable of is displayed as 93Mb.
1) Is there any reason for the difference between the maximum rate reading on the JDSU compared to the max rate figure on the HG612?
There seem to be a lot of things "off" in your statistics, and these might accumulate to explain the differences between the original speeds you saw, the sync speed you have now (48Mbps), and the 2 different maximum speeds.
[NB: A display of both the "--stats" and the "--pbParams" commands would be good, done at the same time. They will help show attenuation, SNR and power levels, split over the bands. Graphs would be good too]
First: The SNRM value. Your stats show an SNRM of 13.6dB, which is higher than the target of 6dB. That means your modem isn't currently synchronised at the best speed for the current noise level, and begs the question of "why not?" Meanwhile, the "max attainable" is an estimate made by the modem that does aim at this target value.
The usual reasons for not having an SNRM of 6dB are that (a) you've already hit the maximum speed of 80Mbps, and have margin to spare, or (b) that some noise present at the time of the sync has now gone, or (c) you are banded with an artificially capped speed.
(a) doesn't apply.
(c) seems unlikely, as it would be a very odd speed cap.
Noise in (b), of course, can merely reduce the SNR margin, or it can introduce errors, or both.
If (b) is true, it suggests that you are working in a scenario where noise comes and goes - and therefore that the "max attainable" at any one time depends on a snapshot of the current noise. As the noise changes, so does the max attainable.
Second: The max attainable speed itself
For some reason, when the HG612 estimates a "max attainable" while FEC/interleaving are turned on, it seems to overestimate considerably. Even when "ordinary" levels of FEC/interleaving are activated, this over-estimate can be as much as 10Mbps.
If this "error" in the estimate is linear, it could account for 15Mbps of the difference.
Third: Throw DLM into the mix. With it altering settings regularly in order to get rid of your errors, it could be that the FEC overhead is changing significantly too.
You might just be comparing apples with pears.
For the first day, the line seemed stable enough with no disconnections (and was fine for daily use anyway); and as I didn't have any additional latency on the line, everything felt a little snappier I think. For the first week or so, the actual speed still remained at about 74/75Mb. It was only the last few days that the speed seems to have dropped dramatically as well.
The 8ms won't make much difference to everyday surfing, but it might be obvious in gaming.
However, the one place you'd have seen the difference between day 1 and day 3 (or later) would be in the packet loss rates. Without FEC, every CRC error would become a lost packet (for ping packets) or a packet that required re-transmission (for downloads, so reducing throughput).
A TBB BQM would have helped you see the packet loss on days 1 and 2, and then visualise the change in latency on day 3.
2) I would much prefer to have a slower downstream speed, and for there to be no interleaving on the line, but this doesn't seem to be possible. It seems that instead, a heavy level of interleaving is being applied by the DLM, coupled with a reduction in line speeds, but with no noticeable difference in the number of errors being reported. Is there any chance of DLM noticing that the caps on speeds and interleaving aren't actually reducing the number of errors, and so will relent, and restore original speed with interleaving off?
You are looking at the wrong meaning of "errors" here, as the FEC count is a count of corrected errors - and DLM's whole purpose is to turn CRC errors into FEC corrected errors.
As you still have errors, DLM is still trying to tune the settings for your line. If DLM finally relents, it won't be to take away the FEC/interleaving settings. It will be to band your connection, to limit the speed.
Unfortunately, this happens so rarely, that we probably can't tell you what it will then go through in terms of FEC/interleaving.
However, by the sounds of things, your noise is likely to defeat a lot of the systems.
3) Does anyone know why there is such a big difference between the max rate figure in the HG612 stats and my actual sync speed. Max rate in HG612 stats: 93564. Actual: 48703. Surely if my line is (theoretically) capable of much higher speeds, shouldn't I be seeing a much higher actual speed than I am?
I think I answered this above:
- Your max isn't really that high when FEC/interleaving is turned on. Expect it to be lower by 10-15Mbps, possibly more.
- Your FEC/interleaving settings are using 33% of your speed for error correction.
- Your SNRM isn't 6dB, so your current sync speed is lower than the capability.
The errors don't seem to be coming in every second, but maybe every so many minutes a whole load of errors will appear.
Distinct signs of an intermittent fault, but hopefully others are helping on this front.
I probably won't be around much for the next few weeks to see the results if you do this, but... you will probably help to track down the fault by running something to track the statistics 24/7.
BaldEagle's programs will help to do this.
OHF: 34510549 1275770
OHFErr: 5 0
RS: 2453252362 3087014
RSCorr: 128896 26406
RSUnCorr: 137 0
In 16 hours, the number of RS blocks is very high - not surprising, because they are small blocks (48 bytes downstream; on my link they are 240 bytes).
However, there is a pretty low level of RSCorr (0.005%) and very, very few RSUncorr, leading to just 5 CRC errors (the same as OHFErr).
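The 0.005% figure is just the downstream RSCorr count divided by the downstream RS count. A minimal sketch of that arithmetic, using only the downstream counters quoted above (the helper name is my own):

```python
# Downstream counters from the stats above, covering roughly 16 hours.
rs_blocks = 2453252362
rs_corrected = 128896
rs_uncorrected = 137


def rate_percent(events: int, total: int) -> float:
    """Events as a percentage of all RS blocks seen (0.0 if none were seen)."""
    return 100.0 * events / total if total else 0.0


ds_corr = rate_percent(rs_corrected, rs_blocks)
ds_uncorr = rate_percent(rs_uncorrected, rs_blocks)

print(f"corrected:   {ds_corr:.4f}% of RS blocks")
print(f"uncorrected: {ds_uncorr:.7f}% of RS blocks")
```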
On your current line (running too slow, with SNRM of 13dB) and with high FEC/interleaving, this suggests either that the FEC settings are far too high, and should come down, or that your intermittent error hasn't occurred much in those 16 hours.
Tracking would be a good idea....
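If you just want to see the idea behind tracking before setting up a proper tool, here is a minimal sketch. It only assumes the "Name: downstream upstream" layout of the counters pasted above; how you collect the text from the modem at regular intervals is up to you and isn't shown, and the function names are my own.

```python
import re

# Matches lines such as "RSCorr: 128896 26406" (downstream, then upstream).
STAT_LINE = re.compile(r"^\s*(\w+):\s+(\d+)\s+(\d+)\s*$")


def parse_counters(text: str) -> dict[str, tuple[int, int]]:
    """Pull 'Name: downstream upstream' integer counters out of pasted stats."""
    counters = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            name, down, up = match.groups()
            counters[name] = (int(down), int(up))
    return counters


def counter_deltas(earlier, later):
    """Increase in each counter between two snapshots of the same sync."""
    return {
        name: (later[name][0] - earlier[name][0],
               later[name][1] - earlier[name][1])
        for name in later
        if name in earlier
    }


snapshot = parse_counters("""
OHF: 34510549 1275770
OHFErr: 5 0
RS: 2453252362 3087014
RSCorr: 128896 26406
RSUnCorr: 137 0
""")
print(snapshot["RSUnCorr"])  # (137, 0)
```

Feed counter_deltas() two snapshots taken a few minutes apart and the jumps in RSCorr and RSUnCorr show when the bursts arrive, which is what you need for an intermittent fault. I'd treat a negative delta as a sign that the counters restarted (for example after a resync) rather than as real data.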