Java and underscores in host names

  • JAVA
  • Linux
  • Host Names
Written by Anton Luht • 30 Oct 2018

Some time ago I came across a very strange issue: tests that worked on one host started failing on another host with a NullPointerException. The hosts were nearly identical, and after debugging it turned out that Java handles non-standard characters in host names in a special way.

The Wikipedia article on hostnames (https://en.wikipedia.org/wiki/Hostname) says that underscores are not allowed in host names, but it's easy to see that many standard Linux tools accept them. Add the following entry to /etc/hosts:

127.6.7.8  local_host

and run the following commands:

$ cd /etc/
$ python -m SimpleHTTPServer &   # Python 2; with Python 3 use: python3 -m http.server &
$ telnet local_host 8000
# connection will succeed
$ wget http://local_host:8000/hosts -O /tmp/hosts
# file will be stored to /tmp/hosts
$ curl http://local_host:8000/hosts
# file contents will be printed

Java handles this, too:

import java.net.*;

public class URLTest {
  public static void main(String[] args) throws Exception {
    // URL parses the host leniently, so the underscore is accepted
    URL url = new URL(args[0]);
    System.out.println("URL host is " + url.getHost());
  }
}
---
$ java URLTest http://alta_vista.com
URL host is alta_vista.com

But sometimes it silently turns strict validation on:

import java.net.*;

public class URITest {
  public static void main(String[] args) throws Exception {
    // The constructor does not throw here, but getHost() returns null
    URI uri = new URI(args[0]);
    System.out.println("URI host is " + uri.getHost());
  }
}
---
$ java URITest http://alta_vista.com
URI host is null

What??...

Let's look at the URI javadoc. URI(String) (https://docs.oracle.com/javase/8/docs/api/java/net/URI.html#URI-java.lang.String-) throws URISyntaxException 'if the given string violates RFC 2396, as augmented by the above deviations', but this didn't happen: no exception was thrown.

getHost() (https://docs.oracle.com/javase/8/docs/api/java/net/URI.html#getHost--) returns 'the host component of this URI, or null if the host is undefined'. If you read the Javadoc of this method long enough, you have a chance to understand that null is returned whenever a host name contains anything other than alphanumerics, '-' and dots.
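A small sketch that makes the behaviour visible: the URI keeps the whole authority string, but leaves the host undefined when it cannot parse the authority as a server-based one. The class name UriHostDemo and the helper describe() are made up for this illustration; only URI.create(), getAuthority() and getHost() come from the JDK.

```java
import java.net.URI;

public class UriHostDemo {
  // Hypothetical helper: shows the raw authority next to the parsed host.
  static String describe(String s) {
    URI uri = URI.create(s);
    return "authority=" + uri.getAuthority() + ", host=" + uri.getHost();
  }

  public static void main(String[] args) {
    // Underscore: the authority survives, but the host is undefined
    System.out.println(describe("http://alta_vista.com"));
    // Hyphen: the host is parsed as expected
    System.out.println(describe("http://alta-vista.com"));
  }
}
```

So when getHost() unexpectedly returns null, comparing it with getAuthority() is a quick way to see whether the host name itself is the problem.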

Given that most people don't differentiate between URL and URI, and that no exception is thrown when the URI is created, your code may or may not work with host names containing underscores, depending on which class you use and which of its methods are called.
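If you can't control the host names you receive, one possible way to avoid the silent null is to fail fast. The sketch below is only one option and not something the javadoc prescribes: StrictHost and hostOf() are hypothetical names, and the code relies on URI.parseServerAuthority(), which throws URISyntaxException when the authority cannot be parsed as a server-based one.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class StrictHost {
  // Hypothetical helper: returns the host, or throws instead of returning null.
  static String hostOf(String s) {
    try {
      // Forces strict parsing of the authority as user-info, host and port
      String host = new URI(s).parseServerAuthority().getHost();
      if (host == null) {
        // URIs without any authority (e.g. file:///tmp/x) still have no host
        throw new IllegalArgumentException("No host in " + s);
      }
      return host;
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("No valid host in " + s, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(hostOf("http://alta-vista.com"));
    try {
      hostOf("http://alta_vista.com");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

This turns a NullPointerException somewhere downstream into an error at the place where the address is first read.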

Bottom line: always use only ASCII alphanumerics and '-' in host names unless you want to take an 'Advanced Java by examples' crash course during a deadline.