This is the latest in our series of posts about interesting things we have found, such as cyber security threats, server misconfigurations and software bugs, while analysing the DNS traffic to the .uk authoritative servers using our turing tool. This story is about a pair of bugs that we found at the same time, one on Google’s DNS servers and one in the BIND DNS software, and that interacted with one another.
It started when, using turing, we noticed a lot of SERVFAIL responses in our DNS traffic, as shown in the screenshot above. Closer analysis of the traffic revealed that the requests were all for very long domain names (longer than 255 bytes), which are not compliant with the DNS protocol. In fact, it looked as though a series of random subdomains had been prepended to a legitimate domain name, causing the resulting name to exceed the 255 byte limit. Further inspection showed that the offending queries all came from a specific set of addresses: those we see when someone queries Google’s 8.8.8.8 public DNS service.
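To make the 255 byte limit concrete, here is a minimal sketch of how the length of a name could be checked. It assumes plain ASCII names written in dotted form and ignores empty labels; the function names and the example names are ours for illustration and are not part of turing.

```python
MAX_NAME_OCTETS = 255   # limit on a whole domain name in wire format
MAX_LABEL_OCTETS = 63   # limit on a single label


def split_labels(name: str) -> list:
    """Split a dotted name into labels, dropping the trailing root dot."""
    return [label for label in name.rstrip(".").split(".") if label]


def wire_length(name: str) -> int:
    """Return the number of octets the name occupies on the wire."""
    # Each label costs one length octet plus its own bytes;
    # the terminating root label adds one final octet.
    return sum(1 + len(label.encode("ascii")) for label in split_labels(name)) + 1


def is_valid_length(name: str) -> bool:
    """True if every label and the whole name fit within the limits."""
    if any(len(label.encode("ascii")) > MAX_LABEL_OCTETS for label in split_labels(name)):
        return False
    return wire_length(name) <= MAX_NAME_OCTETS


print(wire_length("example.co.uk"))        # 15
print(is_valid_length("example.co.uk"))    # True

# Thirteen 20-character labels in front of the same name push it past the limit.
too_long = ".".join(["a" * 20] * 13) + ".example.co.uk"
print(wire_length(too_long), is_valid_length(too_long))  # 288 False
```

Note that the limit applies to the encoded name, so the dotted text form of a name can look slightly shorter than the length that actually counts.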
A SERVFAIL response, however, indicates a temporary error condition, so the reaction of the resolver making the query is to try again, either asking the next authoritative server in its list or asking the same one again. Clearly this query will never be valid, and the correct way to indicate this would be to send a FORMERR response back.
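The difference between the two response codes can be illustrated with a deliberately simplified model of a resolver. This is a sketch under our own assumptions, not the behaviour of any particular resolver: it retries only on SERVFAIL and treats every other response as final, whereas real resolvers apply more nuanced rules. The response code values (FORMERR is 1, SERVFAIL is 2) are the standard ones; `should_retry` and `count_queries` are hypothetical helpers.

```python
from enum import IntEnum
from typing import Sequence


class Rcode(IntEnum):
    NOERROR = 0
    FORMERR = 1    # the query itself is malformed
    SERVFAIL = 2   # the server failed; the condition may be temporary
    NXDOMAIN = 3


def should_retry(rcode: Rcode) -> bool:
    """Simplified rule: only a SERVFAIL suggests that asking again may help."""
    return rcode == Rcode.SERVFAIL


def count_queries(server_rcodes: Sequence[Rcode]) -> int:
    """Count the queries sent when trying each server in turn.

    server_rcodes holds the response each server in the list would give.
    The resolver stops as soon as a response is not worth retrying.
    """
    sent = 0
    for rcode in server_rcodes:
        sent += 1
        if not should_retry(rcode):
            break
    return sent


# Four servers that all answer SERVFAIL are all queried.
print(count_queries([Rcode.SERVFAIL] * 4))  # 4
# Four servers that would answer FORMERR: the resolver stops after the first.
print(count_queries([Rcode.FORMERR] * 4))   # 1
```

In this model a query that can never succeed costs one packet when answered with FORMERR, but one packet per server (or more, if the resolver loops round the list) when answered with SERVFAIL.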
The problem occurred because Google’s public DNS resolver prepends nonce labels (labels containing a random string) to requests destined for top level domain (TLD) servers, such as the servers for .UK. This is done to make cache poisoning orders of magnitude harder, because the query has far more entropy associated with it. At times this process would “loop”, prepending nonces over and over again until the requested domain name was longer than the 255 byte limit. This is a protocol violation, which meant that these domains could not be resolved. We spoke to Google about this problem, which they had been unaware of but were able to fix soon afterwards.
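A small simulation shows how quickly repeated nonce prepending exhausts the limit. This is only a sketch: the nonce format (16 hexadecimal characters), the helper names and the starting name `example.co.uk` are our assumptions for illustration, not a description of Google’s implementation.

```python
import secrets

MAX_NAME_OCTETS = 255  # limit on a whole domain name in wire format


def wire_length(name: str) -> int:
    """Octets the name occupies on the wire: one length octet per label plus the root."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def prepend_nonce(name: str, nonce_bytes: int = 8) -> str:
    """Prepend one random label (hypothetical format: hex of nonce_bytes random bytes)."""
    nonce = secrets.token_hex(nonce_bytes)  # 2 * nonce_bytes hex characters
    return f"{nonce}.{name}"


def rounds_until_too_long(name: str, nonce_bytes: int = 8) -> int:
    """Count how many nonce labels can be prepended before the name exceeds the limit."""
    rounds = 0
    while wire_length(name) <= MAX_NAME_OCTETS:
        name = prepend_nonce(name, nonce_bytes)
        rounds += 1
    return rounds


# Each 16-character nonce adds 17 octets, so a 15-octet name
# crosses the 255 octet limit on the 15th prepend.
print(rounds_until_too_long("example.co.uk"))  # 15
```

Under these assumptions it takes only fifteen trips round the loop for a short, perfectly legitimate name to become unresolvable.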
There was still the issue of the way in which we responded, which was amplifying the effect by suggesting that it was worth asking the query again. We run different nameserver software on our infrastructure, but we were only seeing this effect from our BIND nodes. Strictly speaking, a SERVFAIL response to these queries is a protocol violation, one which ISC (the organisation that develops the BIND software) was unaware of until we flagged the issue to them. ISC were able to fix their code for the next release, and so our “two for the price of one” bug sale had come to a successful end.
This was a unique situation in which two rare and exotic bugs in very popular implementations combined to amplify DNS traffic unnecessarily. Fortunately, turing is able to search for and cross-correlate queries from Google’s resolvers with very long domain names that resulted in SERVFAIL ‘loops’ instead of FORMERR, which enabled us to identify the root of this problem.