Monthly Archive for April, 2013

Hadoop 2: “alpha” elephant or not? Part 2: features and opinions

In the first part of this article I looked into the development that brought us Hadoop 2. Let's now try to analyze whether Hadoop 2 is ready for general consumption, or whether it's all just business hype at this point. Are you better off sticking with the old, not-that-energetic grandpa who nonetheless delivers every time, or riding with the younger fella who might be a bit “unstable”?

New features

Hadoop 2 introduces a few very important features such as

    • HDFS High Availability (HA) with the Quorum Journal Manager (QJM). This is what it does:

      …In order for the Standby node to keep its state synchronized with the Active node in this implementation, both nodes communicate with a group of separate daemons called JournalNodes…In the event of a fail-over, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a fail-over occurs.

      There's an alternative approach to HDFS HA that requires an external filer (a NAS or NFS server) to store a copy of the HDFS edit logs. In the event of a failure of the primary NameNode, a new one can be brought up and the network-stored copy of the logs used to serve the clients. This is essentially a less optimal approach than QJM, as it involves more moving parts and requires more complex operational support.

    • HDFS federation, which essentially allows combining multiple namespaces/namenodes into a single logical filesystem. This allows for better utilization of higher-density storage.
    • YARN, which essentially implements the concept of Infrastructure-as-a-Service: you can deploy your non-MR applications to cluster nodes using YARN resource management and scheduling. Another advantage is the split of the old JobTracker into two independent services, resource management and job scheduling. That gives a certain advantage in the case of a fail-over and in general is a much cleaner approach to the MapReduce framework implementation. YARN is API-compatible with MRv1, so you don't need to change your MR applications (beyond, perhaps, recompiling the code); just run them on YARN. A brief operational sketch follows this list.
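
To make the HA and YARN items above a bit more concrete, here is a minimal operational sketch. The NameNode IDs nn1/nn2, the job jar and the class name are placeholders of my own; only the hdfs/hadoop commands and the mapreduce.framework.name property are standard Hadoop 2.

# check which NameNode currently holds the active role
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# initiate a graceful fail-over from nn1 to nn2
hdfs haadmin -failover nn1 nn2
# an existing MRv1 job jar runs on YARN unchanged, provided mapred-site.xml
# sets mapreduce.framework.name to "yarn" (jar and class names are hypothetical)
hadoop jar my-job.jar com.example.MyJob /input /output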

Improvements

The majority of the optimizations were made on the HDFS side. Just a few examples:

  • overall filesystem read/write improvements: I've seen reports of a >30% performance increase from 1.x to 2.x with the same workload
  • read performance improvements when the DataNode and the client are collocated (HDFS-347; yet to be included in the 2.0.5 release)

A good overall overview of the HDFS roadmap can be found here.

Vendors

Here’s how the bets are spread among commercial vendors, with respect to supported production-ready versions:

             Hadoop 1.x   Hadoop 2.x
Cloudera     x[1]         x
Hortonworks  x
Intel                     x
MapR         x[1]         x
Pivotal                   x
Yahoo!                    x[2]
WANdisco                  x

The worldview of software stacks

In any platform ecosystem there are always a few layers: they are like onions; onions have layers 😉

  • in the center there’s a core, e.g. OS kernel
  • there are a few inner layers: the system software, drivers, etc.
  • and the external layers of the onion… err, the platform — the user space applications: your web browser and email client and such

The Hadoop ecosystem isn’t that much different from Linux. There’s

  • the core: Hadoop
  • system software: HBase, ZooKeeper, Spring Batch
  • user space applications: Pig, Hive, users’ analytics applications, ETL, BI tools, etc.

The responsibility for bringing all the pieces of the Linux onion together lies with the Linux distribution vendors: Canonical, Red Hat, SUSE, etc. They pull certain versions of the kernel, libraries, and system and user-space software into place and release these collections to the users. But first they make sure everything fits together nicely, and they add some of their secret sauce on top (think Ubuntu Unity, for example). Kernel maintenance is not part of the distribution vendors' daily business, yet they do submit patches and new features; a set of kernel maintainers is then responsible for bringing changes into the kernel mainline. Kernel advancements happen under very strict guidelines: breaking compatibility with user space is rewarded by placing the guilty person straight into the 8th circle of Inferno.

Hadoop practices a somewhat different philosophy than Linux, though. Hadoop 1.x is considered stable, and only critical bug fixes are incorporated into it (Table2), whereas Hadoop 2.x is moving forward at a higher pace and most improvements are going there. That comes at a cost to user-space applications. The situation has supposedly been addressed by labeling Hadoop 2 as 'alpha' for about a year now. On the other hand, such tagging arguably prevents user feedback from flowing into the development community. Why? Because users and application developers alike are generally scared away by the “alpha” label: they'd rather sit and wait until the magic of stabilization happens. In the meantime, they might as well use Hadoop 1.x.

And, unlike the Canonical or Fedora project, there’s no open-source integration place for the Hadoop ecosystem. Or is there?

Integration

There are 12+ different components in the Hadoop stack (as represented by the BigTop project). They all move at their own pace and, more often than not, support both versions of Hadoop. This complicates development and testing and creates a large number of integration issues between these projects. Just think about the variety of library dependencies that might suddenly conflict or carry bugs (HADOOP-9407 comes to mind). Every component also brings its own configuration, adding insult to injury on top of all the tweaks needed in Hadoop itself.

All this creates a lot of trouble for the DevOps teams who need to install, maintain, and upgrade your average Hadoop cluster. In many cases, DevOps simply don't have the capacity or knowledge to build and test a new component of the stack (or a newer version of it) before bringing it into the production environment. Most smaller companies and application developers don't have the expertise to build and install multiple versions from the release source tarballs, or to configure and performance-tune the installation.

That's where software integration projects like BigTop come into the spotlight. BigTop was started by Roman Shaposhnik (ASF Bigtop PMC Chair) and Konstantin Boudnik (ASF Bigtop PMC) on the Yahoo! Hadoop team back in 2009-2010, as a continuation of earlier work on software integration and OS distributions. BigTop provides a versatile tool for creating software stacks with predefined properties: it validates the compatibility of the stack's parts and produces native Linux packaging to ease the installation experience.

BigTop includes a set of Puppet recipes (Puppet being an industry-standard configuration management system) that allow spinning up a Hadoop cluster in about 10 minutes, configured for either Kerberized or non-secure environments. A typical BigTop release looks like a stack's bill of materials plus source code; it lets anyone quickly build and test a packaged Hadoop cluster with a number of typical system and user-space components in it. Most modern Hadoop distributions use BigTop openly or under the hood, making BigTop a de facto integration spot for all upstream projects.
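
For the curious, here is a rough sketch of what that build-and-deploy workflow looks like. The repository URL is the GitHub mirror of the ASF repository, and the make target and Puppet paths reflect my recollection of the BigTop layout of that era; treat them as assumptions and check the README of the release you actually build.

# grab the BigTop sources
git clone https://github.com/apache/bigtop.git && cd bigtop
# build native packages for one component (target names vary by release)
make hadoop-rpm
# apply the bundled Puppet recipes to stand up a cluster node
# (module and manifest paths assumed from the bigtop-deploy tree)
sudo puppet apply -d --modulepath=bigtop-deploy/puppet/modules bigtop-deploy/puppet/manifests/site.pp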

Conclusions

Here’s Milind Bhandarkar (Chief Architect at Pivotal):

As part of HAWQ stress and longevity testing, we tested HDFS 2.0 extensively, and subjected it to loads it had never seen before. It passed with flying colors. Of course, we have been testing the new features in HDFS since 0.22! eBay was the first to test new features in HDFS 2.0, and I had joined Konstantin Shvachko to declare Hadoop 0.22 stable, when the rest of the community called it crazy. Now they are realizing that we were right.

YARN, too, is known for very high stability. Arun Murthy, release manager of all the 2.0.x-alpha releases and one of the YARN authors, wrote in the 2.0.3-alpha release email:

# Significant stability at scale for YARN (over 30,000 nodes and 14 million applications so far, at time of release – see here)

And there's this view, which I guess is shared by a number of application developers and users sitting on the sidelines:

I would expect to have a non-alpha semi-stable release of 2.0 by late June or early July.  I am not an expert on this and there are lots of things that could show up and cause those dates to slip.

In the meanwhile, six out of seven vendors are using or selling Hadoop 2.x-based storage and data analytics solutions, system software, and services. Who is right? Why has the “alpha” tag been kept on for so long? Hopefully, now you can make your own informed decision.

References:

[1]: EOLed or effectively getting phased out

[2]: Yahoo! is using Hadoop 0.23.x in production, which essentially is very close to the Hadoop 2.x source base


WANdisco CEO, David Richards, presents at Tech London Advocates Launch as Founding Member

Last week saw the launch of Tech London Advocates, a new advocacy group set up by angel and venture investor Russ Shaw to support London's technology start-ups into high growth.

 

With a founding membership of 150 comprising international CEOs, CTOs, fund managers and private investors, Tech London Advocates launched with an event featuring presentations by high profile executives, including our own founder and CEO, David Richards.

 

High profile supporters of Tech London Advocates include Saul Klein, partner at Index Ventures; David Richards, founder and CEO of WANdisco; Julie Meyer, founder of Ariadne Capital; Sherry Coutu, co-chair of Silicon Valley Comes to the UK; Simon Devonshire, director at Wayra Europe; Dan Crow, CTO of Songkick; and Rajeeb Dey, CEO and founder of Enternships.

 

Tech London Advocates will work in partnership with existing groups and initiatives to support ongoing efforts to establish London as a world-class hub for digital and technology businesses. WANdisco is honored to be part of the advocacy group.

 

Hadoop 2: “alpha” elephant or not?

Today I will look into the state of Hadoop 2.x and try to understand what has kept it in the alpha state to date. Is it really an “alpha” elephant? This question keeps popping up on the Internet and in conversations with customers and business partners. Let's start with some facts.

The first anniversary of the Hadoop 2.0.0-alpha release is around the corner. Commit 989861cd24cf94ca4335ab0f97fd2d699ca18102 was made on May 8th, 2012, cutting the first-ever release branch of the Hadoop 2 line (in the interest of full disclosure: the actual release didn't happen until a few days later, on May 23rd).[1]

It was a long-awaited event, and sure enough, the market accepted it enthusiastically. The commercial vendor Cloudera announced its first Hadoop 2.x-based CDH4.0 at the end of June 2012, according to this statement from Cloudera's VPoP: just a month after 2.0.0-alpha went live! So, was it solid, fact-based trust in the quality of the code base, or something else? An interesting nuance: MapReduce v1 (MRv1) was brought back despite the presence of YARN (a new resource scheduler and a replacement for the old MapReduce). One of those things that make you go, “Huh…?”

We've just seen the 2.0.4-alpha RC vote close: the fifth release in a row in just under one year. Many great features went in: YARN, HDFS HA, and HDFS performance optimizations, to name a few. An incredible amount of stabilization has been done lately, especially in 2.0.4-alpha. Let's consider some numbers:

Table1: JIRAs committed to Hadoop between 2.0.0-alpha and 2.0.4-alpha releases

HADOOP 383
HDFS 801
MAPREDUCE 219
YARN 138

That's about 1,500 fixes and features since the beginning, which was to be expected considering the scope of the implemented changes and the need to smooth things out.

Now let's look for a moment at Hadoop 1.x, essentially the same old Hadoop 0.20.2xx per the latest genealogy of elephants: a well-respected and stable patriarchy. Hadoop 1.x had 8 releases altogether in 14 months:

  • 1.0.0 released on Dec 12, 2011
  • 1.1.2 released on Feb 15, 2013

Table2: JIRAs committed to Hadoop between 1.0.0 and 1.1.2 releases

HADOOP 110
HDFS 111
MAPREDUCE 84

That's about five times fewer fixes and improvements (roughly 305 versus 1,540) than went into Hadoop 2.x over about the same period. If frequency of change is any indication of stability, then perhaps we are onto something.

“Wow,” one might say, “no wonder the ‘alpha’ tag has been so sticky!” Users definitely want to know if the core platform is turbulent and unstable. But wait… wasn’t there that commercial release that happened a month after the first OSS alpha? If it was more stable than the official public alpha, then why did it take the latter another five releases and 1,500 commits to get where it is today? Why wasn’t the stabilization simply contributed back to the community? Or, if both were of the same high quality to begin with, then why is the public Hadoop 2.x still wearing the “alpha” tag one year later?

Before moving any further: all 13 releases, for 1.x and 2.x alike, were managed by engineers from Hortonworks. Tipping my hat to those guys and to all the contributors to the code!

So, is Hadoop 2 that unstable after all? In the second part of this article I will dig into the technical merits of the new development line so we can decide for ourselves. To be continued

References:
[1] All release info is available from the official ASF Hadoop releases page.


On coming fragmentation of Hadoop platform

I just read this interview with the CEO of Hortonworks, in which he expresses a fear of Hadoop fragmentation. He calls attention to a valid issue in the Hadoop ecosystem: forking is getting to the point where the product space is likely to become fragmented.

So why should the BigTop community bother? Well, for one, Hadoop is the core upstream component of the BigTop stack. By filling this unique position, it has a profound effect on downstream consumers such as HBase, Oozie, etc. Although projects like Hive and Pig can partially avoid potential harm by statically linking with Hadoop binaries, this isn't a solution for any sane integration approach. As a side note: I am especially thrilled by Hive's way of working around multiple incompatibilities in the MR job submission protocol. The protocol has been evolving naturally for quite some time, and no one could have guaranteed compatibility even in versions like 0.19 or 0.20. Anyway, Hive solved the problem by simply generating a job jar, constructing a launch string and then – you got it already, right? – System.exec()'ing the whole thing. In a separate JVM, that is! Don't believe me? Go check the source code yourself.

Anecdotal evidence aside, there’s a real threat of fracturing the platform. And there’s no good reason for doing so even if you’re incredibly selfish, or stupid, or want to monopolize the market. Which, by the way, doesn’t work for objective reasons even with so-called “IP protection” laws in place. But that’s a topic for another day.

So, what’s HortonWorks’ answer to the problem? Here it comes:

Amid current Hadoop developments—is there any company NOT launching a distribution with some value added software?—Hortonworks stands out. Why? Hortonworks turns over its entire distribution to the Apache open source project.

While it is absolutely necessary for any human endeavor to be collaborative in order to succeed, the open source niche might be a tricky one. There are literally no incentives for all players to play by the book, and there’s always that one very bold guy who might say, “Screw you guys, I’m going home,” because he is just… you know…

Where could these incentives come from? How can we be sure that every new release is satisfactory for everyone’s consumption? How do we guarantee that HBase’s St.Ack and friends won’t be spending their next weekend trying to fix HBase when it loses its marbles because of that tricky change in Hadoop’s behavior?

And here comes a hint of an answer:

We’re building directly in the core trunk, productizing the package, doing QA and releasing.

I have a couple of issues with this statement. But first, a spoiler alert: I am not going to attack either Hortonworks or their CEO. I don't have a chip on my shoulder — not even an ARM one. I am trying to demonstrate the fallacy in the logic and show what doesn't work and why. And now here's the laundry list:
  • “building directly in the core trunk”: Hadoop isn't released from the trunk, and this is a headache. It is one of the issues the BigTop community faced during the most recent stabilization exercise for the Hadoop 2.0.4-alpha release. Why is that a problem? Well, for one, there's a policy that “everything should go through the trunk”. In the context of Hadoop's current state this means you first commit to the trunk, then back-port to branch-2, which is the landing ground for all Hadoop 2.x releases, just as branch-1 is the landing ground for all Hadoop 1.x releases. If an active release (or several) happens to be under way, you also need to back-port the commit to the corresponding release branch(es), such as 2.0.4-alpha in this particular example; a sketch of this flow follows the list. In practice, some of the changes reach only about two-thirds of the way down, and that's the best-case scenario. This approach also gives fertile ground to all “proponents” of open-source Hadoop, because once their patches are committed to the trunk, they are as open-source as the next guy's. They might get released in a couple of years, but hey — what's a few months between friends, right?
  • “productizing the package”: is Mr. Bearden aware of when development artifacts for an ongoing Hadoop release were last published in the open? Because I don't know of any such publication to date. Neither does Google, by the way. Even the official source tarballs weren't available until about three weeks ago. Why does that constitute a problem? How do you expect to perform any reasonable integration validation if you don't have an official snapshot of the platform? Once your platform package is “productized”, it is a day late to pull your hair out. If you happen to find some issues — come back later. At the next release, perhaps?
  • “doing QA and releasing”: we are trying to build an open-source community here, right? That means the code, the tests and their results, the bug reports, and the discussions should all be in the open. The only place where the Hadoop ecosystem is being tested at any reasonable length and depth is BigTop. Read here for yourself. And feel free to check the regular builds and test runs for _all_ the components that BigTop releases, in both secured and non-secured configurations. What are you testing with, and how, Mr. Bearden?
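
To illustrate the commit flow described in the first item above, here is a rough sketch using the project's git mirror. The JIRA number and commit hash are placeholders; the branch names follow Hadoop's convention at the time.

# the fix lands in trunk first
git checkout trunk
git commit -am "HADOOP-XXXX. Fix something important"
# back-port it to branch-2, the landing ground for Hadoop 2.x releases
git checkout branch-2
git cherry-pick <commit-hash-from-trunk>
# and back-port once more to the active release branch, if there is one
git checkout branch-2.0.4-alpha
git cherry-pick <commit-hash-from-trunk>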

So, what was the solution? Did I miss it in the article? I don't think so, because a single player — even one as respected as Hortonworks — can't solve the issue in question without ensuring that anything produced by the Hadoop project's developers is always in line with the expectations of downstream players.

That's how you prevent fracturing: by putting in the open a solid, well-integrated reference implementation of the stack – one that can be installed by anyone using open-standard packaging and loaded with third-party applications without tweaking them every time you go from Cloudera's cluster to MapR's, or between another pair of vendors'. Does it sound like I am against making money in open-source software? Not at all: most people in the OSS community do this on the dime of their employers or as part of their own business.

You can consider BigTop's role in the Hadoop-centric environment to be similar to that of Debian in the Linux kernel/distribution ecosystem. By helping to close the gap between the applications and the fast-moving core of the stack, BigTop essentially brings reassurance of the Hadoop 2.x line's stability to the user space and the community. BigTop helps make sure that vendor products are compatible with each other and with the rest of the world, helps avoid vendor lock-in, and helps guarantee that recent Microsoft stories will not be replayed all over again.

Are there means to achieve the goal of keeping the core contained? Certainly: BigTop does just that. Recent announcements from Intel, Pivotal, and WANdisco are living proof of it: they are all using BigTop as the integration framework and consolidation point. Can these vendors deviate even under such a top-level integration system? Sure. But it will be immensely harder to do.


Ignoring Files with SmartSVN

It’s common for Apache Subversion projects to contain files you don’t wish to place under version control; for example, your own notes or a list of tasks you need to complete.

Users of SmartSVN, the popular cross-platform SVN client from WANdisco, will be reminded of these unversioned files whenever they perform an ‘svn commit.’ In most instances you’ll want to add these files to SmartSVN’s ignore list to prevent them from cluttering up your commit dialog, and to safeguard against accidentally committing them to the repository.

To add a file to SmartSVN’s ignore list:

1) Select the unversioned file you wish to ignore.

2) Open the ‘Modify’ menu and click ‘Ignore…’ If the ‘Ignore’ option is greyed out, double check the file in question hasn’t already been committed!

3) Choose either 'Ignore Explicitly,' which adds the selected file/directory to the ignore list, or 'Ignore As Pattern.'

If ‘Ignore As Pattern’ is selected, SmartSVN ignores all files with the specified naming convention. Enter the names of the files you wish to ignore, or use the * wildcard to ignore all files that:

  • End with the specified file extension (*.png, *.txt, *.class)
  • Contain certain keywords (test_*, draft*)

The above two options are useful if you wish to ignore a group of related files, for example all image files. You can also opt to ignore all files, by entering the * wildcard and no other information.

4) Select ‘OK’ to add the file(s) to SmartSVN’s ignore list.
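
For reference, SmartSVN's ignore list corresponds to Subversion's standard svn:ignore directory property, so the same effect can be achieved from the command line. A rough sketch (the pattern and commit message are just examples):

# set an ignore pattern on the current directory
# (note: this overwrites any existing svn:ignore value; use 'svn propedit' to append)
svn propset svn:ignore "*.class" .
# the property change itself still needs to be committed
svn commit --depth empty -m "Ignore compiled class files" .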

Ignore Patterns Property

You may also wish to apply the 'Ignore Patterns' property to your project. This has the same effect as selecting 'Ignore As Pattern' in SmartSVN's ignore dialog (described above), but it doesn't require you to select a file first. This means you can configure SmartSVN to ignore groups of files before you even add them to your project.

To apply the ‘Ignore Patterns’ property:

1) Open the ‘Properties’ menu and select ‘Ignore Patterns…’

[screenshot: the 'Edit Ignore Patterns' dialog]

2) Enter the names of the files you wish to ignore. Again, you can use the * wildcard where necessary.

Visit http://www.smartsvn.com/download to try SmartSVN Professional free before you buy.

Understanding SmartSVN’s Revision Graph

SmartSVN, the popular cross-platform client for Apache Subversion, provides all the tools you need to manage your SVN projects out of the box, including a comprehensive Revision Graph.

SmartSVN’s Revision Graph offers an insight into the hierarchical history of your files and directories, by displaying information on:

  • Merged revisions

  • Revisions yet to be merged

  • Whether a merge occurred in a specific revision

  • Which changes happened in which branch

  • When a file was moved, renamed or copied, along with its history

The Revision Graph is useful for several tasks, including identifying the changes made in each revision before rolling back to a previous one, and gathering more information about the state of a project before a merge.

Accessing the Revision Graph

To access the Revision Graph, open the ‘Query’ menu and select ‘Revision Graph.’

[screenshot: opening the Revision Graph from the 'Query' menu]

Understanding the Revision Graph

In the Revision Graph, projects are mainly represented by:

  • Nodes – represent a specific entry (file/directory) at a specific revision.

  • Branches – a collection of linked nodes at the same URL.

The main section of the Revision Graph is the ‘Revisions’ pane, which displays the parent-child relationships between revisions. Revisions are arranged by date, with the newest at the top. In addition to the main ‘Revisions’ pane, the SmartSVN Revision Graph includes several additional views:

  • Revision Info – displays information on the selected revision (such as revision number, date, author who created the revision etc.)

[screenshot: the Revision Info view]

  • Directories and Files – displays the files modified in the selected revision. This is useful for pinpointing the revision at which a particular file changed or disappeared from the project.

From this screen, you can access several additional options:

  • Export – export the Revision Graph as an HTML file by selecting ‘Export as HTML…’ from the ‘Graph’ menu. This file can then be easily shared with other team members.

  • Merge Arrows – select the ‘Show Merge Arrows’ option from the ‘Query’ menu to view the merge arrows. These point from the merge source to the merge target revisions. If the merge source is a range of revisions, the corresponding revisions will be surrounded by a bracket. This allows you to get an overview of merges that have occurred within your project, at a glance.

  • Merge Sources – select the ‘Show Merge Sources’ option from the ‘Query’ menu to see which revisions have been merged into the currently selected target revision.

  • Merge Targets – select ‘Show Merge Targets’ from the ‘Query’ menu to see the revisions where the currently selected target revisions have been merged.

  • Search – if you’re looking for a particular revision, you can save time by using ‘Edit’ and ‘Search.’ Enter the ‘Search For’ term and specify a ‘Search In’ location.

  • Branch Filter – clicking the ‘Branch Filter’ option in the ‘View’ menu allows you to filter the display for certain branches. This is particularly useful if you’re examining a large project consisting of many different branches.

WANdisco Announces SVN MultiSite Plus

We are proud to announce SVN MultiSite Plus, the newest product in our enterprise Subversion product line. WANdisco completely re-architected SVN MultiSite and the result is SVN MultiSite Plus, a replication software solution delivering dramatically improved performance, flexibility and scalability for large, global organizations.

SVN MultiSite Plus enables non-stop performance, scalability and backup, alongside 24/7 availability for globally distributed Apache Subversion deployments. This new product takes full advantage of recent enhancements to our patented active-active replication technology to improve flexibility, scalability, performance and ultimately developer and administrator productivity.

“SVN MultiSite has been improving performance and productivity for global enterprises since 2006 and SVN MultiSite Plus builds on those features for even greater benefits,” said David Richards, WANdisco CEO. “We’re committed to providing organizations with the most robust and flexible solutions possible and we’re confident SVN MultiSite Plus will meet and exceed the requirements of the largest globally distributed software development organizations.”

To find out more, visit our SVN MultiSite Plus product page, download the datasheet, or see how it compares to SVN MultiSite. You can try SVN MultiSite Plus firsthand by signing up for a free trial, or attend the free, online SVN MultiSite Plus demo we’ll be holding on May 1st. This webinar will demonstrate how SVN MultiSite Plus:

  • Eliminates up to 90% of communication overhead at each location

  • Eliminates downtime completely by providing administrators with the ability to add/remove servers on-the-fly

  • Delivers additional savings over SVN MultiSite through tools consolidation and greater deployment flexibility

  • Provides increased efficiency and flexibility with selective repository replication

  • And more.

This webinar is free but register now to secure a spot.

Subversion Tip of the Week

An Apache Subversion working copy can be created quite simply by running the 'svn checkout' command. However, sometimes you'll want more control over the contents of your working copy; for example, when you're working on a large project and only need to check out a single directory.

In this post, we share two ways to get greater control over your checkout commands.

1. Checkout a particular revision

By default, Subversion performs a checkout of the HEAD revision, but in some instances you may wish to check out a previous revision, for example when you're recovering a file or directory that was deleted in the HEAD revision.

To specify a revision other than HEAD, add the -r switch when performing your checkout:

svn checkout (URL) -r(revision number) (Location)

In this example, we are performing a checkout of the project as it existed at revision 10.

[screenshot: checking out the project at revision 10]
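
Since the screenshot doesn't carry over here, a concrete version of that command might look like the following (the repository URL and target folder are made up for illustration):

svn checkout https://svn.example.com/repos/myproject/trunk -r10 myproject-r10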

2. Performing Shallow Checkouts

A standard Subversion checkout copies the entire directory, including every folder and file. This can be too time-consuming if you’re working on a large project, or too complicated if your project contains many different branches, tags and directories. If you don’t require a copy of your entire project, a ‘shallow checkout’ restricts the depth of the checkout by preventing Subversion from descending recursively through the repository.

To perform a shallow checkout, run the ‘svn checkout’ command with one of the following switches:

  • --depth immediates: check out the target and any of its immediate file or directory children. This is useful if you don't require any of the children's contents.

  • --depth files: check out the target and any of its immediate file children.

  • --depth empty: check out the target only, without any of its files or children. This is useful when you're working with a large project, but only require the contents of a single directory.

In this example we are performing a shallow checkout of a 'bug fix branch' located within the branches folder, specifying that only the immediate file children should be included (--depth files):

[screenshot: a shallow checkout using --depth files]
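
With a made-up repository URL, the full command would look something like this:

svn checkout https://svn.example.com/repos/myproject/branches/bug-fix-branch --depth files bug-fix-branch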

Looking for a cross-platform Subversion client? Get a free trial of SmartSVN Professional at www.smartsvn.com/download

WANdisco Releases New Version of Hadoop Distro

We’re proud to announce the release of WANdisco Distro (WDD) version 3.1.1.

WDD is a fully tested, production-ready version of Apache Hadoop 2 that’s free to download. WDD version 3.1.1 includes an enhanced, more intuitive user interface that simplifies Hadoop cluster deployment. WDD 3.1.1 supports SUSE Linux Enterprise Server 11 (Service Pack 2), in addition to RedHat and CentOS.

“The number of Hadoop deployments is growing quickly and the Big Data market is moving fast,” said Naji Almahmoud, senior director of global business development, SUSE, a WANdisco Non-Stop Alliance partner. “For decades, SUSE has delivered reliable Linux solutions that have been helping global organizations meet performance and scalability requirements. We’re pleased to work closely with WANdisco to support our mutual customers and bring Hadoop to the enterprise.”

All WDD components are tested and certified using the Apache BigTop framework, and we’ve worked closely with both the open source community and leading big data vendors to ensure seamless interoperability across the Hadoop ecosystem.

“The integration of Hadoop into the mainstream enterprise environment is increasing, and continual communication with our customers confirms their requirements – ease of deployment and management as well as support for market leading operating systems,” said David Richards, CEO of WANdisco. “With this release, we’re delivering on those requirements with a thoroughly tested and certified release of WDD.”

WDD 3.1.1 can be downloaded for free now. WANdisco also offers Professional Support for Apache Hadoop.

Apache Subversion Team Releases 1.7.9 and 1.6.21

The Apache Subversion team has announced two new releases: Subversion 1.7.9 and 1.6.21.

Subversion 1.7.9 improves the error messages for the svn:date and svn:author properties and the logic in mod_dav_svn's implementation of lock, alongside a number of other features and fixes:

  • Doxygen docs now ignore prefixes when producing the index

  • Javahl status api now respects the ignoreExternals boolean

  • Executing unnecessary code in log with limit is avoided

  • A fix for a memory leak in `svn log` over svn://

  • An incorrect authz failure when using the neon HTTP library has been fixed

  • A fix for an assertion when rep-cache is inaccessible

More information on Apache Subversion 1.7.9 can be found in the Changes file.

Meanwhile, Subversion 1.6.21 improves memory usage when committing properties in mod_dav_svn, and also improves logic in mod_dav_svn’s implementation of lock, alongside bug fixes including:

  • A fix for a post-revprop-change error that could cancel commits

  • A fix for a compatibility issue with g++ 4.7

More information on Apache Subversion 1.6.21 can be found in the Changes file.

Both versions can be downloaded free via the WANdisco website.