Some time ago I came across a very strange issue: tests that worked on one host started failing on another host with a NullPointerException. The hosts were nearly identical, and after some debugging it turned out that Java handles non-standard characters in host names in a special way.

This Wikipedia article says that the underscore is not allowed in host names, but it’s easy to see that many standard Linux tools accept it anyway. Add the following entry to /etc/hosts:

127.6.7.8 local_host

and run the following commands:

$ cd /etc/
$ python -m SimpleHTTPServer &
// SimpleHTTPServer is a Python 2 module; on Python 3 the equivalent is: python3 -m http.server
$ telnet local_host 8000
// connection will succeed
$ wget http://local_host:8000/hosts -O /tmp/hosts
// file will be stored to /tmp/hosts
$ curl http://local_host:8000/hosts
// file contents will be printed

Java handles this, too:

import java.net.*;

public class URLTest {
    public static void main(String[] args) throws Exception {
        URL url = new URL(args[0]);
        System.out.println("URL host is " + url.getHost());
    }
}

$ java URLTest http://alta_vista.com
URL host is alta_vista.com

… but sometimes it silently turns strict validation on:

import java.net.*;

public class URITest {
    public static void main(String[] args) throws Exception {
        URI uri = new URI(args[0]);
        System.out.println("URI host is " + uri.getHost());
    }
}

$ java URITest http://alta_vista.com
URI host is null

What?..

Let’s look at the URI Javadoc. The URI(String) constructor throws URISyntaxException ‘if the given string violates RFC 2396, as augmented by the above deviations’, but that didn’t happen here: no exception was thrown.

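getHost() returns ‘the host component of this URI, or null if the host is undefined’. If you read the Javadoc of this class long enough, you have a chance to understand why: a host name may consist only of alphanumeric characters, ‘-’ and dots. When the authority does not parse as such a server-based authority, it is treated as registry-based, the host is left undefined, and null is returned.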
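The fallback described above can be made visible with two more URI methods: getAuthority(), which still returns the raw authority string, and parseServerAuthority(), which re-parses the authority strictly and throws URISyntaxException when it is not server-based. The following is only a minimal sketch; the class name UriHostCheck and the helper requireHost are made up for illustration and are not part of the original tests.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriHostCheck {
    // Hypothetical helper: returns the host, or throws if the authority
    // cannot be parsed as a server-based authority (e.g. the host contains '_').
    // For URIs without any authority it simply returns null.
    static String requireHost(String text) throws URISyntaxException {
        URI uri = new URI(text).parseServerAuthority();
        return uri.getHost();
    }

    public static void main(String[] args) throws Exception {
        String text = args.length > 0 ? args[0] : "http://alta_vista.com";

        URI uri = new URI(text);
        // getHost() is null for a registry-based authority,
        // while getAuthority() still holds the original text.
        System.out.println("host      = " + uri.getHost());
        System.out.println("authority = " + uri.getAuthority());

        try {
            System.out.println("strict host = " + requireHost(text));
        } catch (URISyntaxException e) {
            // This is the exception the constructor did not throw.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Calling something like requireHost at the point where the URI enters the program turns the later NullPointerException into an immediate, descriptive error.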

Given that most people don’t differentiate between URL and URI, and that no exception is thrown when the URI is created, your code may or may not work with host names containing underscores, depending on which class you use and which methods of that class are called.

Bottom line: always use only ASCII alphanumerics and ‘-’ in host names unless you want to take an ‘Advanced Java by examples’ crash course in the middle of a deadline.
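If host names come from configuration, a small check at startup can enforce this rule before any URL or URI class gets involved. The sketch below is an assumption-laden illustration: the class HostNameCheck and the method isPortableHostName are hypothetical names, and the pattern only covers the ‘alphanumerics, ‘-’ and dots’ rule. It does not enforce length limits and does not replicate the exact grammar java.net.URI applies.

```java
import java.util.regex.Pattern;

public class HostNameCheck {
    // One label: starts and ends with an ASCII letter or digit,
    // with '-' allowed only in between.
    private static final String LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";

    // A host name: one or more labels separated by single dots.
    private static final Pattern HOST_NAME =
            Pattern.compile(LABEL + "(?:\\." + LABEL + ")*");

    // Hypothetical helper: true if the name uses only the characters
    // recommended above (no underscores, no non-ASCII characters).
    static boolean isPortableHostName(String host) {
        return host != null && HOST_NAME.matcher(host).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"localhost", "alta-vista.com", "alta_vista.com", "local_host"};
        for (String host : samples) {
            System.out.println(host + " -> " + isPortableHostName(host));
        }
    }
}
```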
