Monthly Archive for February, 2014

Apache Announces Subversion 1.7.16

On the heels of Subversion 1.8.8, I’m pleased to announce the release of 1.7.16 on behalf of the Apache Subversion project. Along with the official Apache Software Foundation source releases, our own fully tested and certified binaries are available from our website.

1.7.16 is a bugfix and security fix release and does not include major new features. It includes the following changes:

  • A security fix that prevents Apache httpd from crashing when SVNListParentPath is turned on and certain requests are received. Further details on the issue can be found in the advisory the Apache Subversion project has published.
  • Reduced memory usage in both server implementations during checkout and export operations.
  • Fixed an issue that caused executing a copy of a relocated path to break the working copy.
  • Resolved a number of regressions in diff introduced in 1.7.14. Most notably, requesting a diff between a specified revision and a working copy file that had an svn:mime-type property set would fail.

For a complete list of new features and fixes, visit the Apache changelogs for Subversion 1.7.

You can download our fully tested, certified binaries for Subversion 1.7.16 free here.

WANdisco’s binaries are a complete, fully-tested version of Subversion based on the most recent stable release, including the latest fixes, and undergo the same rigorous quality assurance process that WANdisco uses for its enterprise products that support the world’s largest Subversion implementations.

Demo of Non-Stop HBase: Brett Rudenstein – Strata SC 2014

Brett Rudenstein, Senior Product Manager for Big Data, sat down with theCUBE’s Dave Vellante and Wikibon’s Jeff Kelly at Strata to examine why customers are asking how they can use their idle clusters for more than just disaster recovery, what Non-Stop HBase means for enterprise architecture, and the implications for real-time mission-critical applications. Rudenstein provides a demonstration of Non-Stop Hadoop’s continuous availability to show how WANdisco’s active-active replication is applied to HBase region servers, enabling real-time data visualization.

Be sure to watch theCUBE’s interviews with CEO David Richards and CTO Jagane Sundar, CMO Jim Campigli, and Dr. Konstantin Boudnik for more from Strata Santa Clara.

More information about Non-Stop Hadoop is available on our website.

Interview: Dr. Konstantin Boudnik – Strata Santa Clara 2014

The always entertaining and educational Dr. Konstantin Boudnik – a.k.a. “Cos” – gave theCUBE an insider’s perspective on the current state of Hadoop adoption and innovation. Cos cuts straight to the point, discussing why enterprise solutions available today aren’t meeting enterprise demands, the need for continuous global availability, the problems with HBase’s built-in failover capability in environments with multiple region servers, and what WANdisco’s technology means for the future of real-time applications.

Don’t miss theCUBE’s interviews with CEO David Richards and CTO Jagane Sundar, and CMO Jim Campigli for more from Strata Santa Clara.

More information about Non-Stop Hadoop is available on our website.

Interview: Jim Campigli & Jagane Sundar – Strata Santa Clara 2014

Drilling deeper into the current state of Hadoop in the enterprise, CMO Jim Campigli and CTO and VP of Engineering for Big Data Jagane Sundar spoke with theCUBE to discuss why the majority of Hadoop clusters aren’t yet in production, how WANdisco is enabling worldwide data availability and disaster recovery, the use cases that enterprises are most interested in, and what’s driving demand for continuous availability and Non-Stop Hadoop in various industries.

Watch theCUBE’s interview with CEO David Richards and CTO Jagane Sundar, and keep an eye out for more interviews with WANdisco execs and engineers from Strata Santa Clara coming soon.

More information about Non-Stop Hadoop is available on our website.

Apache Announces Crash Fixes and Performance Improvements for Subversion 1.8.8

Today I’m pleased to announce the release of Subversion 1.8.8 on behalf of the Apache Subversion project. Along with the official Apache Software Foundation source releases, our own fully tested and certified binaries are available from our website.

1.8.8 is a bugfix and security fix release and does not include major new features. It includes the following changes:

  • A security fix that prevents Apache httpd from crashing when SVNListParentPath is turned on and certain requests are received. Further details on the issue can be found in the advisory the Apache Subversion project has published.
  • Reduced memory usage in both server implementations during checkout and export operations.
  • Fixed an issue that caused executing a copy of a relocated path to break the working copy.
  • Support verifying SSL server certificates using the Windows CryptoAPI when the certificate has an intermediary certificate between it and the root certificate. This restores the ability to verify certificates automatically as was the case before intermediate certificates became commonly used.
  • Clients receiving redirects from DAV servers can now automatically relocate the working copy even if the working copy is not rooted at the repository root.
  • Improve performance when built with SQLite 3.8, which has a new query planner.
  • Fix errors that occurred when executing a move between an external and the parent working copy.
  • Resolve a performance regression with log when used against old servers with a single revision range.
  • Decrease the disk I/O needed to calculate the differences between 3 files during a merge.
  • Prevent problems with symlinks being checked out on Windows to a NAS that doesn’t support a flush operation.
  • When committing, do not change the permissions on files in the working copy.
  • When committing, fix an assertion due to pool lifetime issues. This was usually seen by git-svn as an error about a path not being canonical.
  • Fix an error with status that caused failures on some lesser-used platforms such as PPC due to a missing sentinel value.
  • When creating a rep-cache.db file in a FSFS repository, use the proper permissions so that it can be used without an admin fixing the permissions.
  • Fix the mod_dav_svn SVNAllowBulkUpdates directive so that it can be changed in different blocks.
  • Fix mod_dav_svn to include properties in the reports when requested by the client, so that the client doesn’t need to request them separately.
  • Fix the help text of svnserve to correctly document the default size of the memory cache: the default is 16 MB in all modes, not 128 MB in threaded mode.
  • Reduce the size of dump files when the ‘--deltas’ option is used by calculating the delta even when we haven’t stored a delta in the source repository due to the skip delta algorithm.
  • Fixed several build issues when building bindings. Most notably OS X can build the SWIG bindings out of the tarball without regenerating the interfaces.
  • Developers using the Subversion APIs will find numerous documentation fixes and some API changes and should refer to the CHANGES file for details.

For a complete list of new features and fixes, visit the Apache changelogs for Subversion 1.8.

You can download our fully tested, certified binaries for Subversion 1.8.8 here.

Using Subversion on Windows? Download TortoiseSVN 1.8.5 now.

WANdisco’s binaries are a complete, fully-tested version of Subversion based on the most recent stable release, including the latest fixes, and undergo the same rigorous quality assurance process that WANdisco uses for its enterprise products that support the world’s largest Subversion implementations.

Interview: David Richards & Jagane Sundar – Strata Santa Clara 2014

Strata Santa Clara was exciting for us with our announcement of Non-Stop HBase attracting great interest. During an interview with theCUBE, CEO David Richards and CTO and VP of Engineering for Big Data Jagane Sundar discussed Non-Stop HBase architecture and use cases, trends in enterprise Hadoop adoption, the leaders that will emerge in the Big Data market, and more.

Stay tuned for more interviews from theCUBE and visit our website to learn more about Non-Stop Hadoop.

Detecting Dependency Trends in Components Using R and Hadoop

As I’ve been experimenting with Flume to ingest ALM data into a Hadoop cluster, I’ve made a couple of interesting observations.

First, the Hadoop ecosystem makes it easy for any team to start using these tools to gather data from disparate ALM sources. You don’t need big enterprise data warehouse (EDW) tools – just Flume and a small Hadoop cluster, or even just a VM from one of the Hadoop vendors to get started. These tools are free and easy to use in a small deployment, and you simply scale everything up as your needs grow.

Second, once the data is in Hadoop, you have access to the growing set of free data analysis tools for Hadoop, ranging from Hive and Pig, to scripted MapReduce jobs and more powerful tools like R.

My most recent experiment utilized the RMR package from Revolution Analytics, which provides a bridge between R, MapReduce, and HDFS. In this case, I had already used Flume to ingest Git commit data from a couple of related Git repositories, and I decided to look for any unusual relationships in the commit activity for the components in the system, including:

  • The most active components

  • The number of commits that affected more than one component

  • Which pairs of components tended to see work in the same commit

That last item I often find very interesting, as it may indicate some dependencies between components that aren’t otherwise obvious.
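Stripped of the MapReduce machinery, the counting scheme itself is simple. Here is a minimal sketch in plain Python (the function name and the sample commit data are made up for illustration): for each commit, count every component touched, every pair of components touched together, and a catch-all “MULTI” bucket for cross-component commits.

```python
from collections import Counter
from itertools import combinations

def count_components(commits):
    """Count component activity, co-occurring component pairs,
    and multi-component commits ("MULTI")."""
    counts = Counter()
    for touched in commits:
        comps = sorted(set(touched))         # unique components in this commit
        counts.update(comps)                 # per-component activity
        for a, b in combinations(comps, 2):  # every pair seen in the same commit
            counts[a + "-" + b] += 1
        if len(comps) > 1:
            counts["MULTI"] += 1             # any cross-component commit
    return counts

# hypothetical commit -> components data
commits = [["app", "spec"], ["app"], ["app", "spec", "doc"]]
print(count_components(commits))
```

The same tallies fall out of a word-count-style MapReduce job, as in the R script below, once each commit is mapped to this expanded list of keys.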

I had all the Git data stored on HDFS, so I used a ‘word count’-style MapReduce task to provide the counts. A partial R script is shown below.

# libraries
require(rmr2)

dfs.git = mapreduce(
  input = "/user/admin/git",
  map = function(k, v) {
    comps = c()

    for(i in 1:nrow(v)) {
      lcomps = c()

      # … some cleanup work to extract components ...
      lcomps = append(lcomps, component)
      lcomps = sort(unique(lcomps))
      numUnique = length(lcomps)

      # record every pair of components touched in the same commit
      # (guarded so a single-component commit doesn't emit bogus pairs)
      multis = c()
      if(numUnique > 1) {
        for(j in 1:(numUnique - 1)) {
          for(jj in (j + 1):numUnique) {
            multis = append(multis, paste0(lcomps[j], "-", lcomps[jj]))
          }
        }
      }
      lcomps = append(lcomps, multis)

      if(numUnique > 1) {
        lcomps = append(lcomps, "MULTI")
      }

      comps = append(comps, lcomps)
    }
    keyval(comps, 1)
  },
  reduce = function(k, vv) {
    keyval(k, sum(vv))
  })

Now that I’ve got these counts for each component and component pair, I can easily get it back into R for further manipulation.

out = from.dfs(dfs.git)
comps = unlist(out[[1]])
count = unlist(out[[2]])
results = data.frame(comps=comps, count = count)
results = results[order(results[,2], decreasing=T), ]
r = results[results$count > 250, ]
barplot(r$count,names.arg=r$comps,las=3,col="blue")

I’ll just focus on the most active components and pairs, which I can see in this plot.

Anything interesting there? Maybe. It certainly looks like the ‘app’ component is far and away the busiest component, so perhaps it’s ripe for refactoring. I also notice that ‘app’ and ‘spec’ tend to be updated a lot in the same commit, and there’s a lot of cross-component work (“MULTI”) going on. And what’s missing? Well, the ‘doc’ module isn’t updated very often with other components.  Perhaps we’re not being good about documenting test cases right away.

But the main point is that I can now do some interesting data exploration with a minimum amount of work and no investment in an EDW.

So even if your ALM data isn’t ‘Big Data’ yet, you can still take advantage of the flexibility, low barriers to entry, and scalability of the Hadoop ecosystem. You’ll have some fairly interesting realizations before you know it!

 

SmartSVN 8.5 Preview 2 Released

A couple of days ago (10th Feb) we released SmartSVN 8.5 Preview 2. SmartSVN is the cross-platform graphical client for Apache Subversion.

New SmartSVN 8 features include:

  • Native Subversion libraries used for improved performance

SmartSVN 8 fixes include:

  • Various authentication fixes
  • Fixed conflicts with previously installed JavaHL version
  • Fixed several native crash causes
  • Fixed support of repositories where the user does not have access to the repository root
  • Fixed error after entering master password
  • Fixed local repository creation
  • Windows: attempt to launch a second SmartSVN instance no longer produces an error

For a full list of all improvements and bug fixes, view the changelog.

Have your feedback included in a future version of SmartSVN

Many issues resolved in this release were raised via our dedicated SmartSVN forum, so if you’ve got an issue or a request for a new feature, head over there and let us know.

You can download Preview 2 for SmartSVN 8.5 from our early access page.

Haven’t yet started with SmartSVN? Claim your free trial of SmartSVN Professional here.

Apache Announces Subversion 1.9 Alpha

Today the Apache Software Foundation (ASF) announced the alpha release of 1.9.0. As usual, you can download the official source release from the ASF, or download our fully tested and certified binaries from our website.

The alpha release of 1.9.0 is intended to provide an opportunity for users to give their feedback early in the release process. During the 1.8.0 release candidate process we received some feedback on the behavior of conflict resolution that we found difficult to change without delaying the release significantly. With 1.9.0, we’d like to get feedback earlier not only to speed up our release process, but to support the continued production of alpha releases.

In particular, we’d like feedback on the improvements to the interactive conflict menus, the new reverse blame support, and the new svn auth command. We believe the interactive conflict menus in the command line client are now easier to understand. Reverse blame, which moves through history in the opposite direction, lets you see when lines were deleted, not just added or changed. Likewise, the new svn auth command lets you view and manipulate your authentication credential cache.

Given that this is an alpha, we don’t recommend it for production use since there are known issues (see the release announcement) and testing has not been fully completed to ensure it is stable.  Additionally, things will almost certainly change before 1.9.0 is released. But if you have a test environment and you can spare some time to look into it, we’d like to hear your feedback.

One thing you may notice about 1.9.0 is that we’ve focused on smaller features and performance improvements. The biggest change that’s coming in 1.9.0 is the new FSFS format 7 with logical addressing which will improve server performance.

For a complete list of new features and fixes, visit the Apache changelog for Subversion 1.9. We’ll be publishing more blogs about upcoming 1.9 features and improvements in the coming months so stay tuned here for more details.

You can download our fully tested, certified Subversion binaries here. Certified 1.9.0 alpha binaries are available for the following operating systems:

  • Windows
  • Redhat Enterprise Linux 6
  • Redhat Enterprise Linux 5
  • CentOS 6
  • CentOS 5
  • Ubuntu 12.04
  • Ubuntu 10.04

Announcing Non-Stop HBase

Today at Strata Santa Clara we announced Non-Stop HBase, providing continuous availability across multiple data centers any distance apart. HBase is an open source, non-relational, distributed database modeled after Google’s BigTable and used for random, real-time read/write access to Big Data including Facebook’s messaging platform.

In the same way that HDFS has a single point of failure in the NameNode, HBase has a Master Server that manages the cluster and Region Servers that store portions of tables and perform work on the data. HBase is sensitive to the loss of the Region and Master Servers.

Non-Stop Hadoop (including Hortonworks and Cloudera editions) applies WANdisco’s patented replication technology to these two availability and performance bottlenecks in HBase’s architecture – its region and master servers – to eliminate the risk of downtime and data loss.

“HBase is used for real-time interactive applications built on Hadoop,” said David Richards, Chairman and CEO of WANdisco. “Many of these Big Data applications are mission critical and even the cost of one minute of downtime is unacceptable. One hundred percent uptime for HBase is now a reality with our Non-Stop HBase technology. For the first time we’ve eliminated the risk of downtime and data loss by enabling HBase to be continuously available on a global scale.”

Stop by the WANdisco booth at Strata Santa Clara and don’t miss CTO Jagane Sundar’s session, “Non-Stop HBase – Making HBase Continuously Available for Enterprise Deployment” Thursday, February 13th.

Visit our website to learn more about how Non-Stop Hadoop provides LAN-speed performance and access to the same data at every location with automatic failover and recovery both within and across data centers.