Many ISPs have a pool of recursive servers for their clients: partly for load balancing, and partly for redundancy in case one or more of them fails. It's not unknown for there to be configuration differences between the various servers, which might make some more "forgiving" than others. That would explain why it works for a while, occasionally.
Sometimes ISPs will even use a diversity of hardware and/or software, again to protect against bugs, defects or vulnerabilities in any one implementation, and that too could explain the observed behaviour.
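To illustrate how a mixed pool produces intermittent results, here is a toy simulation. It assumes each query is sent to one resolver picked uniformly at random. Real ISPs may use anycast, hashing or other schemes. It also assumes that some resolvers tolerate the authoritative server's problem and others don't. The resolver names and the `lenient` flag are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Resolver:
    name: str      # hypothetical resolver label
    lenient: bool  # hypothetical: True if this resolver tolerates the broken authoritative setup


def resolve(pool, rng):
    """Send one query to a randomly chosen resolver (a stand-in for the ISP's
    load balancing) and return True if that resolver answers successfully."""
    server = rng.choice(pool)
    return server.lenient


def success_rate(pool, trials, seed=0):
    """Fraction of simulated lookups that succeed across `trials` queries."""
    rng = random.Random(seed)
    ok = sum(resolve(pool, rng) for _ in range(trials))
    return ok / trials


pool = [
    Resolver("resolver-a", lenient=False),
    Resolver("resolver-b", lenient=False),
    Resolver("resolver-c", lenient=True),
]
print(f"success rate: {success_rate(pool, 10_000):.2f}")
```

With one forgiving server out of three, roughly a third of lookups succeed. From the client's side, that looks like "works occasionally" even though nothing on the client changed.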
This is nothing but speculation, of course... the real corrective action is, as you have done, to get the issues addressed on the authoritative server (which ideally should be servers!).