WANdisco Fusion Blog

Making the Hadoop Hybrid Cloud a Reality

Here is a recap of our first webinar on the cloud, "Making Hybrid Cloud a Reality." A summary follows; for the details, please watch the replay.
Last year we shipped Fusion, a product that replicates data between Hadoop clusters. It has mainly been used between on-premises Hadoop clusters, but recently we have been focusing on support for hybrid cloud and on-premises deployments. Because Fusion replicates data bidirectionally and immediately between cloud and on-premises systems without interrupting normal operations, it enables inexpensive disaster recovery with near-zero RTO/RPO. It also makes it easy to use the cloud only at peak times (at year end, for example), or to consolidate on-premises Hadoop clusters, even ones running different distributions, into the cloud.
Replication works not only between HDFS clusters but also between HDFS and S3/EMR. Flat files (POSIX or NFS) on systems that are not running Hadoop can also be replicated to S3.
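For the simplest of these paths, flat files into S3, the underlying data movement can be pictured as a few lines of ordinary client code. The sketch below is a one-shot baseline for what Fusion automates continuously; the bucket and paths are hypothetical, and it assumes the boto3 AWS SDK:

```python
# One-shot copy of flat files from a POSIX/NFS mount into S3.
# Illustrative baseline only: Fusion replicates such data continuously
# and without stopping normal operations. Names below are hypothetical.
import os
import boto3

s3 = boto3.client("s3")

def upload_tree(local_root: str, bucket: str, prefix: str) -> None:
    """Walk a locally mounted (e.g. NFS) directory and push each file to S3."""
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            key = prefix + "/" + os.path.relpath(local_path, local_root)
            s3.upload_file(local_path, bucket, key)

upload_tree("/mnt/nfs/data", "example-dr-bucket", "flat-files")
```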
Hadoop DataNodes realistically have to sit in a single data center, but Fusion's active-active replication enables transactional data movement (more precisely, movement with a guaranteed write order) between on-premises systems and the cloud with no data loss.
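To see why the write-order guarantee matters, consider a toy applier that receives operations over the WAN, possibly out of order, and applies them strictly by sequence number, so every site converges on the same history. This is a conceptual sketch, not Fusion's actual protocol:

```python
# Apply replicated writes strictly in sequence order: ops that arrive
# early are parked until their turn, so all replicas see the same history.
class OrderedApplier:
    def __init__(self):
        self.next_seq = 0   # next sequence number eligible to apply
        self.pending = {}   # out-of-order ops parked until their turn
        self.state = {}     # the replicated store (path -> data)

    def receive(self, seq: int, path: str, data: str) -> None:
        self.pending[seq] = (path, data)
        # Drain every operation that is now in order.
        while self.next_seq in self.pending:
            p, d = self.pending.pop(self.next_seq)
            self.state[p] = d
            self.next_seq += 1

applier = OrderedApplier()
applier.receive(1, "/data/f", "v2")  # arrives early: parked
applier.receive(0, "/data/f", "v1")  # 0 then 1 now apply, ending at "v2"
print(applier.state)                 # {'/data/f': 'v2'}
```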


About Kenji Ogawa (小川 研之)

Kenji has been leading WANdisco's business in Japan since November 2013.
Before that, he worked at NEC on the development of domestic mainframes, Unix systems, and middleware, and subsequently on scouting Silicon Valley startups, partner management, and offshore development in India.

Why Data Driven Companies Rely on WANdisco Fusion

Hadoop is now clearly gaining momentum. We are seeing more and more customers attempting to deploy enterprise-grade applications. Data protection, governance, performance, and availability are top concerns. WANdisco Fusion’s level of resiliency is enabling customers to move out of the lab and into production much faster.

As companies start to scale these platforms and begin the journey to becoming data driven, they are completely focused on business value and return on investment. WANdisco’s ability to optimize resource utilization by eliminating the need for standby servers resonates well with our partners and customers. These companies are not Google or Facebook. They don’t have an endless supply of hardware and their core business isn’t delivering technology.

As these companies add data from more sources to Hadoop, they are implementing backup and disaster recovery plans and deploying multiple clusters for redundancy. One of our customers, a large bank, is beginning to utilize the cloud for DR.

I’ve met 11 new customers in the past eight days. Five of them have architected cloud into their data lake strategy and are evaluating the players. They are looking to run large data sets in the cloud for efficiency as well as backup and DR.

One of those customers, a leader in IT security, tells me they plan to move their entire infrastructure to the cloud within the next 12 months. They already have 200 nodes in production today, which they expect to double in a year.

Many of our partners are interested in how they can make it easy to onboard data from behind the firewall to the cloud while delivering the best performance. They recognize this is fundamental to a successful cloud strategy.

Companies are already embarking on migrations from one Hadoop platform to another. We’re working with customers on migration from MapR to HDP, CDH to HDP, CDH to Oracle BDA, and because we are HCFS compatible, GPFS to IOP. Some of these are petabyte scale.

For many of these companies, WANdisco Fusion’s ability to eliminate downtime, data loss and business disruption is a prerequisite to making that transition. Migration has never been undertaken lightly. I’ve spoken to partners who are unable to migrate their customers due to the required amount of downtime and risk involved.

One customer I met recently completed a large migration to HDP and just last week acquired a company that has a large cluster on Cloudera. We’re talking to them about how we can easily provide a single consistent view of the data. This will allow them to get immediate value from the data they have just acquired. If they choose to migrate completely, they are in control of the timing.

Customers measure their success by time to value. We’re working closely with our strategic partners to ensure our customers don’t have to worry about the nuts and bolts, irrespective of distribution or of on-premises, cloud, or hybrid environment, so they can concentrate on the business outcome.

Please reach out to me if these use cases resonate and you would like to learn more.

Peter Scott
SVP Business Development


Configuring multiple zones in Hadoop

Hortonworks, a WANdisco partner and another member of the Open Data Platform, recently published a list of best practices for Hadoop infrastructure management.  One of the top recommendations is configuring multiple zones in Hadoop.  Having development, test, and production environments gives you a safe way to test upgrades and new applications without disturbing a production system.

One of the challenges with creating multiple similar zones is sharing data between them.  Whether you’re testing backup procedures and application functionality, or prototyping a new data analysis algorithm, you need to see similar data in all the zones.  Otherwise you’re not really testing in a production-like environment.

But in a large cluster, transferring terabytes of data between zones can be time-consuming, and it’s tough to tell how stale the data really is.  That’s where WANdisco Fusion becomes an essential part of your operational toolkit.  WANdisco Fusion provides active-active data replication between Hadoop clusters.  You can use it to effectively share part of your Hadoop data between dev/test/prod zones in real time.  All of the zones can make full use of the data, although you can of course use your normal access control system to prevent updates from certain zones.
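To make the staleness problem concrete, here is a sketch of the manual check that continuous replication renders unnecessary: snapshot the same directory tree on two clusters and diff paths, sizes, and modification times. The hostnames and path are hypothetical, and the example assumes pyarrow with libhdfs available on the client:

```python
# Compare one replicated directory across a prod and a test cluster to
# estimate how stale the test copy is. Hostnames/paths are hypothetical.
from pyarrow import fs

def snapshot(hdfs: fs.HadoopFileSystem, root: str) -> dict:
    """Map every file under `root` to its (size, mtime)."""
    infos = hdfs.get_file_info(fs.FileSelector(root, recursive=True))
    return {i.path: (i.size, i.mtime) for i in infos
            if i.type == fs.FileType.File}

prod = snapshot(fs.HadoopFileSystem("prod-namenode", 8020), "/data/shared")
test = snapshot(fs.HadoopFileSystem("test-namenode", 8020), "/data/shared")

missing = prod.keys() - test.keys()
stale = {p for p in prod.keys() & test.keys() if prod[p] != test[p]}
print(f"{len(missing)} files missing in test, {len(stale)} files differ")
```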

DevOps principles are coming to Hadoop, so contact one of our solutions architects today to see how WANdisco Fusion can help you maintain multiple zones in your Hadoop deployment.

WANdisco Fusion Q&A with Jagane Sundar, CTO

Tuesday we unveiled our new product: WANdisco Fusion. Ahead of the launch, we caught up with WANdisco CTO Jagane Sundar, who was one of the driving forces behind Fusion.

Jagane joined WANdisco in November 2012 after the firm’s acquisition of AltoStor and has since played a key role in the company’s product development and rollout. Prior to founding AltoStor along with Konstantin Shvachko, Jagane was part of the original team that developed Apache Hadoop at Yahoo!.

Jagane, put simply, what is WANdisco Fusion?

JS: WANdisco Fusion is a wonderful piece of technology that’s built around a strongly consistent transactional replication engine, allowing for the seamless integration of different types of storage for Hadoop applications.

It was designed to help organizations get more out of their Big Data initiatives, answering a number of very real problems facing the business and IT worlds.

And the best part? All of your data centers are active simultaneously: You can read and write in any data center. The result is you don’t have hardware that’s lying idle in your backup or standby data center.

What sort of business problems does it solve?

JS: It provides two new important capabilities for customers. First, it keeps data consistent across different data centers no matter where they are in the world.

And it gives customers the ability to integrate different storage types into a single Hadoop ecosystem. With WANdisco Fusion, it doesn’t matter if you are using Pivotal in one data center, Hortonworks in another and EMC Isilon in a third – you can bring everything into the same environment.

Why would you need to replicate data across different storage systems?

JS: The answer is very simple. Anyone familiar with storage environments knows how diverse they can be. Different types of storage have different strengths depending on the individual application you are running.

However, keeping data synchronized is very difficult if not done right. Fusion removes this challenge while maintaining data consistency.

How does it help future proof a Hadoop deployment?

JS: We believe Fusion will form a critical component of companies’ workflow update procedures. You can update your Hadoop infrastructure one data center at a time, without impacting application availability or having to copy massive amounts of data once the update is done.

This helps you deal with updates from both Hadoop and application vendors in a carefully orchestrated manner.

Doesn’t storage-level replication work as effectively as Fusion?

JS: The short answer is no. Storage-level replication is subject to latency limitations that are imposed by file systems. The result is you cannot really run storage-level replication over long distances, such as a WAN.

Storage-level replication is nowhere near as functional as Fusion: it has to happen at the LAN level, not over a true wide area network.

With Fusion, you have the ability to integrate diverse systems such as NFS with Hadoop, allowing you to exploit the full strengths and capabilities of each individual storage system. I’ve never worked on a project as exciting and as revolutionary as this one.
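As a concrete analogy for that single client-side view, the open-source fsspec library (not Fusion itself) lets the same client code address HDFS, S3, and an NFS mount through one interface. The hosts, bucket, and paths below are hypothetical, and each backend needs its driver installed (pyarrow for hdfs://, s3fs for s3://):

```python
# One code path, three storage back ends: only the URL scheme changes.
from fsspec.core import url_to_fs

for url in ("hdfs://namenode:8020/data/events/part-00000",
            "s3://example-bucket/data/events/part-00000",
            "file:///mnt/nfs/data/events/part-00000"):
    filesystem, path = url_to_fs(url)  # pick the right driver per scheme
    print(url, filesystem.size(path))  # uniform API regardless of store
```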

How did WANdisco Fusion come about?

JS: By getting inside our customers’ data centers and witnessing the challenges they faced. It didn’t take long to notice the diversity of storage environments.

Our customers found that different storage types worked well for different applications – and they liked it that way. They didn’t want strict uniformity across their data centers, but to be able to leverage the strengths of each individual storage type.

At that point we had the idea for a product that would help keep data consistent across different systems.

The result was WANdisco Fusion: a fully replicated transactional engine that makes the work of keeping data consistent trivial. You only have to set it up once and never have to bother with checking if your data is consistent.

This vision of a fully utilized, strongly consistent, diverse storage environment for Hadoop is what we had in mind when we came up with the Fusion product.

You’ve been working with Hadoop for the last 10 years. Just how disruptive is WANdisco Fusion going to be?

JS: I’ve actually been in the storage industry for more than 15 years now. Over that period I’ve worked with shared storage systems, and I’ve worked with Hadoop storage systems. WANdisco Fusion has the potential to completely revolutionize the way people use their storage infrastructure. Frankly, this is the most exciting project I’ve ever been part of.

As the Hadoop ecosystem evolved I saw the need for this virtual storage system that integrates different types of storage.

Efforts to make Hadoop run across different data centers have been mostly unsuccessful. For the first time, we at WANdisco have a way to keep your data in Hadoop systems consistent across different data centers.

The reason this is so exciting is that it transforms Hadoop into something that runs in multiple data centers across the world.

Suddenly you have capabilities that even the original inventors of Hadoop didn’t really consider when it was conceived. That’s what makes WANdisco Fusion exciting.

The inspiration for WANdisco Fusion


Roughly two years ago, we sat down to start work on a project that finally came to fruition this week.

At that meeting, we set ourselves the challenge of redefining the storage landscape. We wanted to map out a world where there was complete shared storage, but where the landscape remained entirely heterogeneous.

Why? Because we’d witnessed the beginnings of a trend that has only grown more pronounced with the passage of time.

From the moment we started engaging with customers, we were struck by the extreme diversity of their storage environments. Regardless of whether we were dealing with a bank, a hospital, or a utility provider, different types of storage had been introduced across every organization for a variety of use cases.

In time, however, these same companies wanted to start integrating their different silos of data, whether to run real-time analytics or to gain a full 360-degree view of performance. Yet preserving diversity across data centers was critical, given that each storage type has its own strengths.

They didn’t care about uniformity. They cared about performance and this meant being able to have the best of both worlds. Being able to deliver this became the Holy Grail – at least in the world of data centers.

This isn’t quite the Gordian Knot, but it is certainly a very difficult, complex problem, and possibly one that could only be solved with our core patented IP, DConE.

Then we had a breakthrough.

Months later, I’m proud to formally release WANdisco Fusion (WD Fusion), the only product that enables WAN-scope active-active synchronization of different storage systems, bringing them together into one place.

What does this mean in practice? Well, it means you can use Hadoop distributions like Hortonworks, Cloudera, or Pivotal for compute; Oracle BDA for fast compute; and EMC Isilon for dense storage. You could even use a complete variety of Hadoop distros and versions. Whatever your setup, with WD Fusion you can leverage new and existing storage assets immediately.

With it, Hadoop is transformed from being something that runs within a single data center into an elastic platform that runs across multiple data centers throughout the world. WD Fusion allows you to update your storage infrastructure one data center at a time, without impacting application availability or having to copy vast swathes of data once the update is done.

When we were developing WD Fusion we agreed on two things. First, we couldn’t produce anything that made changes to the underlying storage system: it had to behave like a client application. Second, anything we created had to provide a single global namespace across the entire storage infrastructure.

With WD Fusion, we allow businesses to bring together different storage systems by leveraging our existing intellectual property – the same Paxos-powered algorithm behind Non-Stop Hadoop, Subversion Multisite and Git Multisite – without making any changes to the platform you’re using.
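For readers curious what “Paxos-powered” means in miniature, here is a toy single-decree Paxos round showing the style of majority agreement such an engine builds on. It is a teaching sketch, not DConE itself, which pipelines many such agreements to impose a total order on file-system operations:

```python
# Toy single-decree Paxos: a proposer drives one value to consensus
# across a majority of acceptors, surviving a minority of failures.
class Acceptor:
    def __init__(self):
        self.promised = -1           # highest ballot promised so far
        self.accepted = (-1, None)   # (ballot, value) last accepted

    def prepare(self, ballot):
        """Phase 1: promise to ignore ballots lower than `ballot`."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        """Phase 2: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    """Run one Paxos round; return the agreed value, or None on failure."""
    quorum = len(acceptors) // 2 + 1

    # Phase 1: gather promises; adopt any value a majority member
    # already accepted, so an earlier choice is never overwritten.
    grants = [acc for ok, acc in (a.prepare(ballot) for a in acceptors) if ok]
    if len(grants) < quorum:
        return None
    prior = max(grants, key=lambda t: t[0])
    if prior[1] is not None:
        value = prior[1]

    # Phase 2: ask the acceptors to accept the chosen value.
    votes = sum(a.accept(ballot, value) for a in acceptors)
    return value if votes >= quorum else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, ballot=1, value="mkdir /repl/dir1"))
```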

Another way of putting it is we’ve managed to spread our secret sauce even further.

We have some of the best computer scientists in the world working at WANdisco, but I’m confident that this is the most revolutionary project any of us have ever worked on.

I’m delighted to be unveiling WD Fusion. It’s a testament to the talent and character of our firm, the result of looking at an impossible scenario and saying: “Challenge accepted.”


About David Richards

David is CEO, President and co-founder of WANdisco and has quickly established WANdisco as one of the world’s most promising technology companies.

Since co-founding the company in Silicon Valley in 2005, David has led WANdisco on a course for rapid international expansion, opening offices in the UK, Japan and China. David spearheaded the acquisition of AltoStor, which accelerated the development of WANdisco’s first products for the Big Data market. The majority of WANdisco’s core technology is now produced out of the company’s flourishing software development base in David’s hometown of Sheffield, England, and in Belfast, Northern Ireland.

David has become recognised as a champion of British technology and entrepreneurship. In 2012, he led WANdisco to a hugely successful listing on London Stock Exchange (WAND:LSE), raising over £24m to drive business growth.

With over 15 years’ executive experience in the software industry, David sits on a number of advisory and executive boards of Silicon Valley start-up ventures. A passionate advocate of entrepreneurship, he has established many successful start-up companies in Enterprise Software and is recognised as an industry leader in Enterprise Application Integration and its standards.

David is a frequent commentator on a range of business and technology issues, appearing regularly on Bloomberg and CNBC. Profiles of David have appeared in a range of leading publications including the Financial Times, The Daily Telegraph and the Daily Mail.

Specialties: IPOs, startups, entrepreneurship, CEO leadership, vision, investing, board membership, advising, venture capital, offshore development, financing, M&A