Developer Blog

Some time ago I came across a very strange issue: tests that passed on one host started failing on another host with a NullPointerException. The hosts were practically identical, and after some debugging it turned out that Java treats hosts with non-standard characters in their names in a special way.

The Wikipedia article https://en.wikipedia.org/wiki/Hostname says that underscores are not allowed in host names, but it’s easy to see that many standard Linux tools accept them anyway. Add the following entry to /etc/hosts:

127.6.7.8  local_host

and run the following commands:

$ cd /etc/
# Python 2 module; with Python 3 use: python3 -m http.server
$ python -m SimpleHTTPServer &
$ telnet local_host 8000
# connection will succeed
$ wget http://local_host:8000/hosts -O /tmp/hosts
# file will be stored to /tmp/hosts
$ curl http://local_host:8000/hosts
# file contents will be printed

Java handles this, too:

---
import java.net.*;

public class URLTest {
    public static void main(String[] args) throws Exception {
        // URL keeps the host name as given, underscore included
        URL url = new URL(args[0]);
        System.out.println("URL host is " + url.getHost());
    }
}
---
$ java URLTest http://alta_vista.com
URL host is alta_vista.com

But sometimes it silently turns strict validation on:

---
import java.net.*;

public class URITest {
    public static void main(String[] args) throws Exception {
        // URI parses the same string without throwing, but getHost() may be null
        URI uri = new URI(args[0]);
        System.out.println("URI host is " + uri.getHost());
    }
}
---
$ java URITest http://alta_vista.com
URI host is null

What??…

Let’s look at the URI Javadoc.

URI(String) https://docs.oracle.com/javase/8/docs/api/java/net/URI.html#URI-java.lang.String-

Throws URISyntaxException ‘if the given string violates RFC 2396, as augmented by the above deviations’, but this didn’t happen: no exception was thrown.

getHost() https://docs.oracle.com/javase/8/docs/api/java/net/URI.html#getHost--

Returns ‘the host component of this URI, or null if the host is undefined’. If you read the Javadoc of this method long enough, you have a chance to understand that a host name may consist only of alphanumerics, ‘-’ and dots. When the authority part does not match that, URI does not reject it: it treats the authority as registry-based rather than server-based, leaves the host undefined, and getHost() returns null.
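To make the null host easier to spot, here is a minimal sketch of a fail-fast check. The class name UriHostCheck and the helper requireHost are made up for illustration; getAuthority() and parseServerAuthority() are standard java.net.URI methods, and the latter throws URISyntaxException when the authority cannot be parsed as server-based.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriHostCheck {
    // Hypothetical helper: returns the host, or throws instead of silently returning null.
    static String requireHost(String s) throws URISyntaxException {
        // parseServerAuthority() re-parses strictly and throws if the host is not valid
        URI uri = new URI(s).parseServerAuthority();
        if (uri.getHost() == null) {
            // URIs without any authority (e.g. mailto:) still have no host
            throw new URISyntaxException(s, "URI has no host");
        }
        return uri.getHost();
    }

    public static void main(String[] args) throws Exception {
        URI uri = new URI("http://alta_vista.com");
        System.out.println("authority = " + uri.getAuthority()); // authority = alta_vista.com
        System.out.println("host = " + uri.getHost());           // host = null
        try {
            requireHost("http://alta_vista.com");
        } catch (URISyntaxException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The authority is still there as a string, so nothing is lost at parse time; only the host, user-info and port accessors come back empty.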

Given that most people don’t distinguish between URL and URI, and that no exception is thrown when the URI is created, your code may or may not work with host names containing an underscore, depending on which class you use and which of its methods you call.

Bottom line: use only ASCII alphanumerics and ‘-’ in host names, unless you want to take an ‘Advanced Java by examples’ crash course in the middle of a deadline.
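If you want to enforce that rule in code, something along these lines might do. It is only a sketch of the rule of thumb above, not a full RFC validator: the class HostNames and the method isSafe are hypothetical names, and length limits are not checked.

```java
import java.util.regex.Pattern;

public class HostNames {
    // One label: ASCII letters and digits, with '-' allowed only in the middle.
    private static final String LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";

    // Whole name: one or more labels separated by dots.
    private static final Pattern SAFE_HOST =
            Pattern.compile(LABEL + "(?:\\." + LABEL + ")*");

    // Hypothetical helper: true if the name uses only alphanumerics, '-' and dots.
    static boolean isSafe(String host) {
        return host != null && SAFE_HOST.matcher(host).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSafe("local-host")); // true
        System.out.println(isSafe("local_host")); // false
    }
}
```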
