Tag Archive for 'Hadoop'

Why Data Driven Companies Rely on WANdisco Fusion

Hadoop is now clearly gaining momentum, and we are seeing more and more customers attempting to deploy enterprise-grade applications. Data protection, governance, performance and availability are top concerns. WANdisco Fusion’s resiliency is enabling customers to move out of the lab and into production much faster.

As companies scale these platforms and begin the journey to becoming data-driven, they are completely focused on business value and return on investment. WANdisco’s ability to optimize resource utilization by eliminating the need for standby servers resonates well with our partners and customers. These companies are not Google or Facebook: they don’t have an endless supply of hardware, and their core business isn’t delivering technology.

As these companies add data from more sources to Hadoop, they are implementing backup and disaster recovery plans and deploying multiple clusters for redundancy. One of our customers, a large bank, is beginning to utilize the cloud for DR.

I’ve met 11 new customers in the past eight days. Five of them have architected cloud into their data lake strategy and are evaluating the players. They are looking to run large data sets in the cloud for efficiency as well as backup and DR.

One of those customers, a leader in IT security, tells me they plan to move their entire infrastructure to the cloud within the next 12 months. They already have 200 nodes in production today, which they expect to double in a year.

Many of our partners are interested in how they can make it easy to onboard data from behind the firewall to the cloud while delivering the best performance. They recognize this is fundamental to a successful cloud strategy.

Companies are already embarking on migrations from one Hadoop platform to another. We’re working with customers on migrations from MapR to HDP, CDH to HDP, CDH to Oracle BDA and, because we are HCFS-compatible, GPFS to IOP. Some of these are petabyte scale.

For many of these companies, WANdisco Fusion’s ability to eliminate downtime, data loss and business disruption is a prerequisite to making that transition. Migration has never been undertaken lightly. I’ve spoken to partners who are unable to migrate their customers due to the required amount of downtime and risk involved.

One customer I met recently completed a large migration to HDP and just last week acquired a company that has a large cluster on Cloudera. We’re talking to them about how we can easily provide a single consistent view of the data. This will allow them to get immediate value from the data they have just acquired. If they choose to migrate completely, they are in control of the timing.

Customers measure their success by time to value. We’re working closely with our strategic partners to ensure our customers don’t have to worry about the nuts and bolts – whatever the distribution, and whether the environment is on-premises, cloud or hybrid – so they can concentrate on the business outcome.

Please reach out to me if these use cases resonate and you would like to learn more.

Peter Scott
SVP Business Development


WANdisco Fusion Q&A with Jagane Sundar, CTO

On Tuesday we unveiled our new product: WANdisco Fusion. Ahead of the launch, we caught up with WANdisco CTO Jagane Sundar, one of the driving forces behind Fusion.

Jagane joined WANdisco in November 2012 after the firm’s acquisition of AltoStor and has since played a key role in the company’s product development and rollout. Prior to founding AltoStor along with Konstantin Shvachko, Jagane was part of the original team that developed Apache Hadoop at Yahoo!.

Jagane, put simply, what is WANdisco Fusion?

JS: WANdisco Fusion is a wonderful piece of technology that’s built around a strongly consistent transactional replication engine, allowing for the seamless integration of different types of storage for Hadoop applications.

It was designed to help organizations get more out of their Big Data initiatives, answering a number of very real problems facing the business and IT worlds.

And the best part? All of your data centers are active simultaneously: You can read and write in any data center. The result is you don’t have hardware that’s lying idle in your backup or standby data center.

What sort of business problems does it solve?

JS: It provides two new important capabilities for customers. First, it keeps data consistent across different data centers no matter where they are in the world.

And it gives customers the ability to integrate different storage types into a single Hadoop ecosystem. With WANdisco Fusion, it doesn’t matter if you are using Pivotal in one data center, Hortonworks in another and EMC Isilon in a third – you can bring everything into the same environment.
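
Everything in the Hadoop ecosystem reaches storage through the Hadoop-Compatible File System (HCFS) interface, which is what makes mixing storage types feasible in the first place. Below is a minimal sketch of that abstraction using the standard Hadoop FileSystem API – not Fusion’s own interfaces, and with hypothetical host names and paths:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HcfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Each URI resolves to whatever HCFS implementation backs that
            // cluster (HDFS, Isilon, etc.); the application code is identical.
            FileSystem siteA = FileSystem.get(URI.create("hdfs://site-a:8020/"), conf);
            FileSystem siteB = FileSystem.get(URI.create("hdfs://site-b:8020/"), conf);

            // Write a file at site A.
            Path file = new Path("/data/events/part-0001");
            try (FSDataOutputStream out = siteA.create(file)) {
                out.writeUTF("example record");
            }

            // With a replication layer keeping the namespaces consistent,
            // the same path becomes readable at site B as well.
            System.out.println(siteB.exists(file));
        }
    }

Because every store sits behind the same interface, a replication engine operating at this layer can span distributions and storage products that share no common internals.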

Why would you need to replicate data across different storage systems?

JS: The answer is very simple. Anyone familiar with storage environments knows how diverse they can be. Different types of storage have different strengths depending on the individual application you are running.

However, keeping data synchronized across these systems is very difficult to get right. Fusion removes this challenge while maintaining data consistency.

How does it help future proof a Hadoop deployment?

JS: We believe Fusion will form a critical component of companies’ workflow update procedures. You can update your Hadoop infrastructure one data center at a time, without impacting application availability and without having to copy massive amounts of data once the update is done.

This helps you deal with updates from both Hadoop and application vendors in a carefully orchestrated manner.

Doesn’t storage-level replication work as effectively as Fusion?

JS: The short answer is no. Storage-level replication is subject to latency limitations that are imposed by file systems. The result is you cannot really run storage-level replication over long distances, such as a WAN.

Storage-level replication is nowhere near as functional as Fusion: it has to happen at the LAN level, not over a true wide area network.

With Fusion, you have the ability to integrate diverse systems such as NFS with Hadoop, allowing you to exploit the full strengths and capabilities of each individual storage system – I’ve never worked on a project as exciting and as revolutionary as this one.

How did WANdisco Fusion come about?

JS: By getting inside our customers’ data centers and witnessing the challenges they faced. It didn’t take long to notice the diversity of storage environments.

Our customers found that different storage types worked well for different applications – and they liked it that way. They didn’t want strict uniformity across their data centers, but to be able to leverage the strengths of each individual storage type.

At that point we had the idea for a product that would help keep data consistent across different systems.

The result was WANdisco Fusion: a fully replicated transactional engine that makes the work of keeping data consistent trivial. You only have to set it up once and never have to bother with checking if your data is consistent.

This vision of a fully utilized, strongly consistent, diverse storage environment for Hadoop is what we had in mind when we came up with the Fusion product.

You’ve been working with Hadoop for the last 10 years. Just how disruptive is WANdisco Fusion going to be?

JS: I’ve actually been in the storage industry for more than 15 years now. Over that period I’ve worked with shared storage systems, and I’ve worked with Hadoop storage systems. WANdisco Fusion has the potential to completely revolutionize the way people use their storage infrastructure. Frankly, this is the most exciting project I’ve ever been part of.

As the Hadoop ecosystem evolved I saw the need for this virtual storage system that integrates different types of storage.

Efforts to make Hadoop run across different data centers have been mostly unsuccessful. For the first time, we at WANdisco have a way to keep your data in Hadoop systems consistent across different data centers.

The reason this is so exciting is because it transforms Hadoop into something that runs in multiple data centers across the world.

Suddenly you have capabilities that even the original inventors of Hadoop didn’t really consider when it was conceived. That’s what makes WANdisco Fusion exciting.

Application Specific Data? It’s So 2013

Looking back at the past 10 years of software, the word ‘boring’ comes to mind. The buzzwords were things like ‘web services’ and ‘SOA’. CIOs loved the promise of these things, but they could not deliver. The idea of build once, reuse everywhere really was the nirvana.

Well it now seems like we can do all of that stuff.

As I’ve said before, Big Data is not a great name because it implies that all we are talking about is a big database with tons of data. Actually, that’s only part of the story. Hadoop is the new enterprise application platform, and the key word there is platform. If you could have a single general-purpose data store that could service ‘n’ applications, then the whole notion of database design is over. Think about the new breed of apps on a cell phone, the social media platforms and the web search engines: most of these already work this way, storing data in a general-purpose, non-specific data store that is then used by a wide variety of applications. The new phrase for this data store is a ‘data lake’, implying a large quantum of ever-growing and changing data stored without any specific structure.

The CIOs I’ve spoken with recently are very excited by the prospect of both amalgamating data so it can be used and bringing into play data that previously could not be used: unstructured data in a wide variety of formats, like Word documents and PDF files. This also means the barriers to entry are low. Many people believe that adopting Hadoop requires a massive re-skilling of the workforce. It does, but not in the way most people think. Actually, getting the data into Hadoop is the easy bit (‘data ingestion’ is the new buzzword). It’s not like the old relational database days, where you first had to model the data using normalization techniques and then use ETL to get the data into a usable format. With a data lake you simply set up a server cluster and load the data; creating a data model and using ETL is simply not required.
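
To make the ‘just load it’ point concrete, here is a minimal sketch using the standard Hadoop FileSystem API (all paths are hypothetical; the shell equivalent is hdfs dfs -put):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DataLakeIngest {
        public static void main(String[] args) throws Exception {
            // Connects to whatever fs.defaultFS points at, e.g. hdfs://namenode:8020
            FileSystem fs = FileSystem.get(new Configuration());

            // No data model, no normalization, no ETL: raw files land as-is.
            fs.mkdirs(new Path("/lake/raw/documents"));
            fs.copyFromLocalFile(new Path("/incoming/contract-2013-04.pdf"),
                                 new Path("/lake/raw/documents/"));
            fs.copyFromLocalFile(new Path("/incoming/meeting-notes.docx"),
                                 new Path("/lake/raw/documents/"));
        }
    }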

The real transformation and re-skilling is in application development. Applications are moving to the data – in today’s client-server world it’s the other way around. We have seen this type of re-skilling before, such as the move from COBOL to object-oriented programming.

In the same way that client-server technology disrupted mainframe computer systems, big data will disrupt client-server. We’re already seeing this in the market today. It’s no surprise that the most successful companies in the world today (Google, Amazon, Facebook, etc.) are all actually big data companies. This isn’t a ‘might be’ – it’s already happened.


About David Richards

David is CEO, President and co-founder of WANdisco and has quickly established WANdisco as one of the world’s most promising technology companies. Since co-founding the company in Silicon Valley in 2005, David has led WANdisco on a course for rapid international expansion, opening offices in the UK, Japan and China. David spearheaded the acquisition of AltoStor, which accelerated the development of WANdisco’s first products for the Big Data market. The majority of WANdisco’s core technology is now produced out of the company’s flourishing software development bases in David’s hometown of Sheffield, England and in Belfast, Northern Ireland. David has become recognised as a champion of British technology and entrepreneurship. In 2012, he led WANdisco to a hugely successful listing on the London Stock Exchange (WAND:LSE), raising over £24m to drive business growth. With over 15 years’ executive experience in the software industry, David sits on a number of advisory and executive boards of Silicon Valley start-up ventures. A passionate advocate of entrepreneurship, he has established many successful start-up companies in enterprise software and is recognised as an industry leader in Enterprise Application Integration and its standards. David is a frequent commentator on a range of business and technology issues, appearing regularly on Bloomberg and CNBC. Profiles of David have appeared in a range of leading publications including the Financial Times, The Daily Telegraph and the Daily Mail.

A View From Strata NY: Big Data is Getting Bigger

In general, a trade show is a dangerous place to gauge sentiment. Full of marketing and sales, backslapping and handshakes, and marketecture rather than architecture, the world is indeed viewed through rose-tinted spectacles. Strata, the Hadoop Big Data conference held in New York last week, was very interesting – albeit through my rose-tinted spectacles.

Firstly, the sheer volume of people – over 3,500 – is telling. This show used to be a few hundred, primarily techies inventing the future. The show is now bigger, much bigger. A cursory glance at the exhibit hall revealed a mix of the biggest tech companies and hot start-ups. The keynotes, to the disappointment of those original techies, were primarily press-driven product releases lacking real technical substance. This is not such a bad thing, though: it’s a sign that Hadoop is coming of age. It’s what happens when technology moves into the mainstream.

Second, the agenda has changed quite dramatically. Companies looking to deploy Hadoop are no longer trying to figure out how it might fit into their data centers; they are trying to figure out how to deploy it. 2014 will indeed be the end of trials and the beginning of full-scale enterprise roll-out. The use cases are all over the place. Analysts yearn for clues and clusters to explain this: ‘Are you seeing mainly telcos or financial services?’ Analysts must, of course, try to enumerate in order to explain, but the wave and shift is seismic, and the only explanation is a fundamental shift in the very nature of enterprise applications.

My third theme is the discussion around why Hadoop is driving this move to rewrite enterprise applications. As someone at the show told me, ‘the average age of an enterprise application is 19 years’. Hence, this is part of a classic business cycle. Hadoop is a major technological shift that takes advantage of dramatic changes in the capabilities and economics of hardware. Expensive spinning disks, processing speeds, bandwidth, networks and so on were the limitations, and hence the assumptions, that the last generation of enterprise applications had to deal with. Commodity hardware and massive in-memory processing are the new assumptions that Hadoop takes advantage of. In a few years we will not be talking about ‘Big Data’; we will simply use the term ‘Data’, because it will no longer be unusual for it to be so large in relative terms.

My fourth observation was that Hadoop 2 has changed the agenda for the type of use case. In very rough terms, Hadoop 1 was primarily about storage and batch processing; Hadoop 2 is about YARN and run-time applications. In other words, processing can now take place on top of Hadoop, rather than storing in Hadoop but processing somewhere else. This change is highly disruptive because it means that software vendors cannot rely on customers using their products alongside Hadoop; rather, they are talking about building on top of Hadoop. To them, Hadoop is a new type of operating system. This disruption is very good news for the new breed of companies building pure applications from the ground up, and really bad news for those who believe they can mildly integrate or even store data in two places. That’s not going to happen. Some of the traditional companies had a token presence at Strata that suggests they are still unsure of exactly what they are going to do – they are neither fully embracing nor ignoring this new trend.

My final observation is about confusion. There’s a lot of money at stake here, so naturally everyone wants a piece of the action. There’s a lot of flashing lights and noise from vendors, lavish claims and a lack of substance. Forking core open source is nearly always a disaster. As open-source guru Karl Fogel says, forks happen due to irreconcilable disagreements – technical disagreements or interpersonal conflicts – and are something developers should be afraid of and try to avoid in any way they can. Forking creates natural barriers to using third-party products, and with an open source project moving as quickly as this one, you have to stay super-close to the de facto open source project.

A forked version of core Hadoop is not Hadoop, it’s something else.  If customers go down a forked path it’s difficult to get back and they will lose competitive edge because they will be unable to use the community of products being built as part of the wider community.  Customers should think of Hadoop like an operating system or database.  If it’s merely embedded and heavily modified then this is not Hadoop.

So 2014 it is then. As the Wall Street Journal put it: ‘Elephant in the Room to Weigh on Growth for Oracle, Teradata’.

Here’s a great video demo of the new @WANdisco continuous availability technology running on Hortonworks Hadoop 2.2 Distro

 


WANdisco Releases New Version of Hadoop Distro

We’re proud to announce the release of WANdisco Distro (WDD) version 3.1.1.

WDD is a fully tested, production-ready version of Apache Hadoop 2 that’s free to download. WDD 3.1.1 includes an enhanced, more intuitive user interface that simplifies Hadoop cluster deployment, and adds support for SUSE Linux Enterprise Server 11 (Service Pack 2) in addition to Red Hat and CentOS.

“The number of Hadoop deployments is growing quickly and the Big Data market is moving fast,” said Naji Almahmoud, senior director of global business development, SUSE, a WANdisco Non-Stop Alliance partner. “For decades, SUSE has delivered reliable Linux solutions that have been helping global organizations meet performance and scalability requirements. We’re pleased to work closely with WANdisco to support our mutual customers and bring Hadoop to the enterprise.”

All WDD components are tested and certified using the Apache BigTop framework, and we’ve worked closely with both the open source community and leading big data vendors to ensure seamless interoperability across the Hadoop ecosystem.

“The integration of Hadoop into the mainstream enterprise environment is increasing, and continual communication with our customers confirms their requirements – ease of deployment and management as well as support for market leading operating systems,” said David Richards, CEO of WANdisco. “With this release, we’re delivering on those requirements with a thoroughly tested and certified release of WDD.”

WDD 3.1.1 can be downloaded for free now. WANdisco also offers Professional Support for Apache Hadoop.

Free Webinar: Enterprise-Enabling Hadoop for the Data Center

We’re pleased to announce that WANdisco will be co-hosting a free Apache Hadoop webinar with Tony Baer, Ovum’s lead Big Data analyst. Ovum is an independent analyst and consultancy firm specializing in the IT and telecommunications industries.

This webinar, ‘Big Data – Enterprise-Enabling Hadoop for the Data Center’, will cover the key issues of availability, performance and scalability and how Apache Hadoop is evolving to meet these requirements.

“This webinar will discuss the importance of availability, performance and scalability,” said Ovum’s Tony Baer. “Ovum believes that for Hadoop to be successfully adopted in the enterprise, it must become a first-class citizen with IT and the data center. Availability, performance and scalability are key issues, and also where significant innovation is occurring. We’ll discuss how the Hadoop platform is evolving to meet these requirements.”

Topics include:

  • How Hadoop is becoming a first class, enterprise-hardened technology for the data center
  • Hadoop components and the role of reliability and performance in those components
  • Disaster recovery challenges faced by globally distributed organizations and how replication technology is crucial to business continuity
  • The importance of seamless Hadoop migration from the public cloud to private clouds, especially for organizations that require secure 24/7 access with real-time performance

‘Big Data – Enterprise-Enabling Hadoop for the Data Center’ will be held on Tuesday, April 30th at 10:00 am Pacific / 1:00 pm Eastern. Register for this free webinar here.

WANdisco’s March Roundup

Following the recent issuance of our “Distributed computing systems and system components thereof” patent, which covers the fundamentals of active-active replication over a Wide Area Network, we’re excited to announce the filing of three more patents. These patents involve methods, devices and systems that enhance security, reliability, flexibility and efficiency in the field of distributed computing, and will have significant benefits for users of our Hadoop Big Data product line.

“Our team continues to break new ground in the field of distributed computing technology,” said David Richards, CEO for WANdisco. “We are proud to have some of the world’s most talented engineers in this field working for us and look forward to the eventual approval of these most recent patent applications. We are particularly excited about their application in our new Big Data product line.”

Our Big Data product line includes Non-Stop NameNode, WANdisco Hadoop Console and WANdisco Distro (WDD).

This month, we also welcomed Bas Nijjer, who built CollabNet UK from a startup into a multimillion-dollar recurring-revenue business, to the WANdisco team. Bas has a proven track record of increasing customer wins, accelerating revenue and delivering customer satisfaction, and he takes on the role of WANdisco Sales Director, EMEA.

“Bas is an excellent addition to our team, with great insight on developing and strengthening sales teams and customer relationships as well as enterprise software,” said David Richards. “His expertise and familiarity with EMEA and his results-oriented attitude will help strengthen the WANdisco team and increase sales and renewals. We are pleased to have him join us.”

If joining the WANdisco team interests you, visit our Careers page for all the latest employment opportunities.

We’ve also posted lots of new content at the WANdisco blog. Users of SmartSVN, our cross-platform graphical Subversion client, can find out how to get even more out of their installation with our ‘Performing a Reverse Merge in SmartSVN’ and ‘Backing Up Your SmartSVN Data’ tutorials. For users running the latest and greatest 7.5.4 release of SmartSVN, we’ve put together a deep dive into the fixes and new functionality with our ‘What’s New in SmartSVN 7.5.4?’ post. If you haven’t tried SmartSVN yet, you can claim your free trial of this release by visiting http://smartsvn.com/download

We also have a new post from James Creasy, WANdisco’s Senior Director of Product Management, where he takes a closer look at the “WAN” in “WANdisco:”

“We’ve all heard about the globalization of the world economy. Every globally relevant company is now highly dependent on highly available software, and that software needs to be equally global. However, most systems that these companies rely on were architected with a single machine in mind. These machines were accessed over a LAN (local area network) by mostly co-located teams.

All that changed, starting in the 1990s with the widespread adoption of outsourcing. The WAN computing revolution had begun in earnest.”

You can read “What’s in a name, WANdisco?” in full now.

Also at the blog we address the hot topic of ‘Is Subversion Ready for the Enterprise?’ And, if you need more information on the challenges and available solutions for deploying Subversion in an enterprise environment, be sure to sign up for our free-to-attend ‘Scaling Subversion for the Enterprise’ sessions. Taking place a few times a week, these webinars cover limitations and risks related to globally distributed SVN deployments, as well as free resources and live demos to help you overcome them. Take advantage of the opportunity to get answers to your business-specific questions and live demos of enterprise-class SVN products.

WANdisco Files Three New Patents with USPTO

We are pleased to announce the filing of three new patents with the United States Patent and Trademark Office (USPTO) related to distributed computing.

These three innovations involve methods, devices and systems that enhance security, reliability, flexibility and efficiency in the field of distributed computing. The patents are expected to have significant benefits for users of our new Hadoop Big Data product line.

“Our team continues to break new ground in the field of distributed computing technology,” said David Richards, CEO for WANdisco. “We are proud to have some of the world’s most talented engineers in this field working for us and look forward to the eventual approval of these most recent patent applications. We are particularly excited about their application in our new Big Data product line.”

Our Big Data product line includes Non-Stop NameNode, which turns the NameNode into an active-active shared-nothing cluster, and the comprehensive wizard-driven management dashboard ‘WANdisco Hadoop Console.’ We also offer a free-to-download, fully-tested and production-ready version of Apache Hadoop 2. Visit the WANdisco Distro (WDD) to learn more.

This news comes after we announced the issuance of our “Distributed computing systems and system components thereof” patent, which covers the fundamentals of active-active replication over a Wide Area Network.

 

Continuous Availability versus High Availability

Wikipedia’s page on Continuous Availability is available here:

http://en.wikipedia.org/wiki/Continuous_availability

A quick perusal tells us that High Availability can be ‘accomplished by providing redundancy or quickly restarting failed components’. This is very different from ‘Continuously Available’ systems that enable continuous operation through planned and unplanned outages of one or more components.

As large global organizations move from using Hadoop for batch storage and retrieval to mission critical real-time applications where the cost of even one minute of downtime is unacceptable, mere high availability will not be enough.

Solutions such as HDFS NameNode High Availability (NameNode HA) that come with Apache Hadoop 2.0 and the Hadoop distributions based on it are subject to downtimes of 5 to 15 minutes. In addition, NameNode HA is limited to a single data center, and only one NameNode can be active at a time, creating a performance as well as an availability bottleneck. Deployments that incorporate WANdisco Non-Stop Hadoop are not subject to any downtime, regardless of whether a single NameNode server or an entire data center goes offline. There is no need for maintenance windows with Non-Stop Hadoop, since you can simply bring down the NameNode servers one at a time and perform your maintenance operations. The remaining active NameNodes continue to support real-time client applications as well as batch jobs.
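
For reference, the stock NameNode HA model is visible in the client configuration itself: one nameservice, two NameNodes, and a failover proxy that hunts for whichever node is currently active. A minimal sketch using the standard Apache Hadoop 2 property names (host names are hypothetical):

    import org.apache.hadoop.conf.Configuration;

    public class NameNodeHaClientConfig {
        public static Configuration build() {
            Configuration conf = new Configuration();
            // One logical nameservice backed by exactly two NameNodes,
            // of which only one can be active at any moment.
            conf.set("dfs.nameservices", "mycluster");
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
            conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");
            // The client retries against each NameNode until it finds the active one.
            conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
            conf.set("fs.defaultFS", "hdfs://mycluster");
            return conf;
        }
    }

The active/standby pair in this configuration is exactly where the failover window, and the single-data-center limitation, come from.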

The business advantages of a continuously available, multi-data center aware system are well known to IT decision makers. Here are some examples that illustrate how both real-time and batch applications can benefit and how new use cases can be supported:

  • A Batch Big Data DAG is a chain of applications wherein the output of a preceding job is used as the input to a subsequent job. At companies such as Yahoo, these DAGs take six to eight hours to run, and they are run every day. Fifteen minutes of NameNode downtime may cause one of these jobs to fail. As a result of this single failure, the entire DAG may not run to completion, creating delays that can last many hours.
  • Global clickstream analysis applications that enable businesses to see and respond to customer behavior or detect potentially fraudulent activity in real-time.
  • A web site or service built to use HBase as a backing store will be down if the HDFS underlying HBase goes down when the NameNode fails. This is likely to result in lost revenue and erode customer goodwill.  Non-Stop Hadoop eliminates this risk.
  • Continuous Availability systems such as WANdisco Non-Stop Hadoop can be administered with fewer staff, because the failure of one NameNode out of five is not an emergency event; it can be dealt with during regular business hours. Significant staffing cost savings can be achieved, since Continuously Available systems do not require 24×7 sysadmin coverage. In addition, in a distributed multi-data center environment, Non-Stop Hadoop can be managed from one location.
  • There are no passive or standby servers or data centers that essentially sit idle until disaster strikes.  All servers are active and provide full read and write access to the same data at every location.

See a demo of Non-Stop Hadoop for Cloudera and Non-Stop Hadoop for Hortonworks in action and read what leading industry analysts like Gartner’s Merv Adrian have to say about the need for continuous Hadoop availability.

 


WANdisco Announces Free Online Hadoop Training Webinars

We’re excited to announce a series of free one-hour online Hadoop training webinars, starting with four sessions in March and April. Time will be allowed for audience Q&A at the end of each session.

Wednesday, March 13 at 10:00 AM Pacific, 1:00 PM Eastern

“A Hadoop Overview” will cover Hadoop, from its history to its architecture, as well as:

  • HDFS, MapReduce, and HBase
  • Public and private cloud deployment options
  • Highlights of common business use cases and more

Wednesday, March 27 at 10:00 AM Pacific, 1:00 PM Eastern

“Hadoop: A Deep Dive” covers Hadoop misconceptions (not all clusters include thousands of machines), as well as:

  • Real world Hadoop deployments
  • Review of major Hadoop ecosystem components including: Oozie, Flume, Nutch, Sqoop and others
  • In-depth look at HDFS and more

Wednesday, April 10 at 10:00 AM Pacific, 1:00 PM Eastern

“Hadoop: A MapReduce Tutorial” will cover MapReduce at a deep technical level (a minimal example of the logical flow follows the topic list below) and will highlight:

  • The history of MapReduce
  • Logical flow of MapReduce
  • Rules and types of MapReduce jobs
  • De-bugging and testing
  • How to write foolproof MapReduce jobs
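
For a head start on that logical flow, here is the canonical word-count job – the standard minimal MapReduce example rather than material from the webinar itself:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Map phase: emit (word, 1) for every token in the input line.
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                // Reduce phase: sum the counts for each word.
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }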

Wednesday, April 24 at 10:00 AM Pacific, 1:00 PM Eastern

“Hadoop: HBase In-Depth” will provide a deep technical review of HBase (a brief client API sketch follows the list below) and cover:

  • Its flexibility, scalability and components
  • Schema samples
  • Hardware requirements and more
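
And as a small taste of the client API ahead of the session, a minimal sketch using the classic HBase HTable client (table, row key and column family names are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "clickstream");
            try {
                // Write one cell: row key, column family "d", qualifier "url".
                Put put = new Put(Bytes.toBytes("user42#2013-04-30T10:00"));
                put.add(Bytes.toBytes("d"), Bytes.toBytes("url"), Bytes.toBytes("/pricing"));
                table.put(put);

                // Read it back by row key.
                Get get = new Get(Bytes.toBytes("user42#2013-04-30T10:00"));
                Result result = table.get(get);
                System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("d"), Bytes.toBytes("url"))));
            } finally {
                table.close();
            }
        }
    }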

Space is limited so click here to register right away!