Monthly Archive for June, 2012

A Real Time Sentiment Analysis Application using Hadoop and HBase in the Cloud

Download TwitterSampler.java

Download the html files

I gave a talk about a Real Time Sentiment Analysis Application at the Hadoop Summit 2012.

Here are the slides from this presentation:

http://www.slideshare.net/Hadoop_Summit/realtime-sentiment-analysis-app-using-hadoop-and-h-base

This is an application that evaluates the sentiment of Twitter users towards a small number of pre-determined keywords, stores the results in HBase, and displays a graph of sentiment versus time. Users can scroll back and forth in time to see how the sentiment changed.

Download the java files and the html files for this project from the links at the top of this post.

There are three parts to this program.

  1. Using the Twitter API to get a stream of tweets (public status updates)
  2. Doing sentiment analysis on the tweets and storing the results in HBase
  3. Using a JavaScript program running in the browser to call back into HBase using the REST gateway, and plotting the output
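
As a rough illustration of the third part, a row can be fetched from the HBase REST gateway with a plain HTTP GET. This is a minimal sketch and not the code from the talk: the host and port (localhost:8080), the table name ‘sentiment’ and the row key format are all assumptions made for illustration.

```shell
# Hypothetical helper: fetch one row from the HBase REST gateway as JSON.
# Assumes the gateway listens on localhost:8080 and that a table named
# "sentiment" exists; neither detail is taken from the original project.
HBASE_REST_URL="${HBASE_REST_URL:-http://localhost:8080}"

fetch_row() {
  table="$1"   # HBase table name
  row="$2"     # row key, for example a keyword plus a time bucket
  curl -s -H "Accept: application/json" "$HBASE_REST_URL/$table/$row"
}

# Example call (needs a running gateway):
#   fetch_row sentiment hadoop-201206131200
```

The gateway's JSON output carries row keys, column names and cell values base64-encoded, so browser code along these lines would need to decode them before plotting.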

I am co-founder of AltoStor – we develop software that turns HBase and Hadoop into a metered, billed service in the public cloud or in Enterprise VMware. I used our HBase Workbench to develop this entire project. The code itself will run on stock Hadoop 1.0.x with HBase 0.92.x – you don’t need the Workbench. The Workbench is simply the easiest way for you to get started developing Big Data applications.

Good luck playing with these technologies.

— Jagane

About Jagane Sundar

WANdisco’s June Roundup

Happy summer! As well as enjoying the nice weather and longer days, this month we announced an exciting update for the uberSVN community.

uberSVN keeps going from strength to strength and, with an ever-growing community of users, we need your feedback more than ever to ensure we continue to deliver the features and functionality you need. Have some thoughts on the uberSVN user interface? Head over to SVNForum.org now to read our proposals for a redesigned Users/Teams tab and a redesigned Admin tab. If you’re a registered SVNForum.org user you can join the discussion by posting at the relevant thread (if you’re not already registered, then signing up is quick and easy!) Alternatively, send us your feedback directly.

We also announced a dedicated channel for keeping in touch with the uberSVN community, the Latest Release Channel. Everyone signed up to the Latest Release Channel will get a sneak preview of upcoming releases at least a few weeks before the rest of the uberSVN user base. This gives you the opportunity to test new features and see how they fit into your ALM environment before the update becomes widely available.

Members of the Latest Release Channel already have access to an update to uberSVN Chimney House. This update features a list of improvements and new functionality, including:

  • Improvements to uberSVN APIs and internal development of uberSVN SDK.
  • Further improvements to the way uberSVN handles LDAP and LDAPS.
  • New manageAPPS page allows you to see metadata attached to your APP license.
  • The latest Apache Subversion 1.7.5 binaries set to active by default.
  • …..and more!

Not on the Latest Release Channel? Expect to see the uberSVN Chimney House update in the Stable Release Channel within the next two weeks. We’ve had a great uptake on the Latest Release Channel so far – if you’re an uberSVN user who wants to make your voice heard, then head over to our blog announcement to find out more.

But it hasn’t all been about uberSVN: we’re pleased to announce that registration for Subversion Live 2012 will be opening shortly!

After getting a great response from the Apache Subversion community in 2011, this year’s conference series is bigger and better than ever, with events taking place in San Francisco (October 10th & 11th), Greenwich, Connecticut (October 16th & 17th) and London (October 23rd & 24th).

This year’s sessions will include:

  • What’s coming in 1.8
  • Merge & Performance Improvements
  • Hook Scripts
  • Branching & Merging Best Practices

We look forward to meeting up with the Apache Subversion community later this year! In the meantime, be sure to follow @WANdisco and @uberSVN for all the latest conference news.

If you can’t wait until the conference for your Subversion training, we’ve just announced another set of free SVN training webinars for the Apache Subversion community. After receiving lots of feedback, we’ve added another one hour course on branching and merging, alongside plenty of other webinar goodness.

Finally, fancy winning an iPad or a Kindle Fire? You may remember we announced the 2012 Worldwide Developer Survey last month. The survey will help us to build a picture of the developer’s perspective on software development, and trends in the software change and configuration control management tools market. We already announced that, at the end of the Worldwide Developer Survey we’ll make the results available to everyone who took part but we’ve been so pleased with the response that we’re also entering all respondents into a prize draw. The top prize will be an iPad, with two lucky runners-up receiving a Kindle Fire. If you haven’t completed the survey, make sure you send us your answers before 13th July 2012 to be in with a chance of winning an iPad or Kindle Fire.

Good luck!

Subversion Tip of the Week

Getting More out of ‘Check for Modifications’

When modifying a working copy with TortoiseSVN, ‘Check for Modifications’ is a useful function for pinpointing exactly which files you’ve changed, and which files have been changed and committed by others. But did you know that you can perform other useful actions from the ‘Check for Modifications’ screen?

  • You first need to access the ‘Check for Modifications’ screen. Start by selecting the ‘Check for Modifications…’ option from the TortoiseSVN menu.

  • TortoiseSVN will then bring up a dialog displaying every file that has been changed in your working copy, with colour coding to highlight the status.

You can perform several other operations from inside this dialog.

1) Recover deleted items – If you have deleted a file by accident, you can recover it by right-clicking on the file and selecting the ‘Revert’ option.

You will be asked to confirm the revert. Click ‘Revert’ to re-add the deleted file to your working copy.

2) Delete unversioned/ignored files – to send these files to the recycle bin, right-click on the file and select ‘Delete.’

You will then be asked to confirm the delete.

Tip: If you wish to permanently delete the file, you can hold the ‘Shift’ key while selecting ‘Delete’ to bypass the recycle bin. This will bring up a slightly different confirmation message.

3) Examine a file in detail – if you need more information on the contents of a file, you can drag the file in question from the dialog into another application, for example a text editor or an IDE. This will automatically display the contents of the file.

Our Support Engineers are the Sherpas of Source Control Management! Just as traditional Sherpas use their deep knowledge of local terrain to assist mountain climbers in reaching the highest peaks and avoiding pitfalls along the way, WANdisco’s Subversion Sherpas use their extensive experience to guide customers away from problems and enable them to get the most out of Subversion. Find out how our Sherpas can help you!

Deleting Branches in Subversion

In Apache Subversion, it’s easy to create new branches to the point where they become confusing. To simplify things, branches can typically be divided into two categories:

Permanent – you may want to store certain folders or projects in permanent branches, e.g. copies of released code stored in tagged branches.

Temporary – can be used to test new technology or experiment with new features. Temporary branches should have a defined lifetime, and at the end of that lifetime they should be deleted. It is good practice to label a branch as temporary in the accompanying log message, and to note who is responsible for deleting that branch.

To delete a branch, simply:

  • Select the branch you wish to delete.
  • Right-click and select the ‘delete’ command.
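
For command-line users, the same deletion can be sketched with ‘svn delete’ run against the branch URL. This is a minimal sketch: the function name and the repository URL in the example are hypothetical.

```shell
# Delete a branch directly in the repository. Because the target is a URL,
# the delete is committed immediately and needs a log message.
delete_branch() {
  branch_url="$1"   # full URL of the branch to remove
  message="$2"      # log message recorded with the deleting revision
  svn delete "$branch_url" -m "$message"
}

# Example (hypothetical repository URL):
#   delete_branch http://svn.example.com/repos/project/branches/experiment \
#     "Remove temporary experiment branch"
```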

There are several reasons why you might delete a branch:

1) House-keeping – regularly deleting branches helps to reduce the clutter in the branches directory, and to avoid confusion for anyone browsing the repository, who might expect to find ongoing development in abandoned branches. When all abandoned branches are routinely deleted, a glance at the branches directory can tell you which branches are still active.
2) Following a merge – when you’ve finished all the work in a branch and merged the changes back to the trunk, the branch becomes completely redundant and can be deleted.
3) Following reintegration – the ‘--reintegrate’ command option allows Subversion to merge from a branch to the trunk, by replicating only the changes that are unique to the branch. Subversion achieves this by comparing the latest trunk with the latest branch, and applying the resulting difference to the trunk. This is useful in a number of situations, including when all the changes made within the trunk have been ported to the branch, so the only difference is the branch changes. A ‘--reintegrate’ merge uses Subversion’s merge-tracking features to calculate the correct revision ranges to use, and checks to ensure the branch is truly up-to-date with the latest changes in the trunk. These checks ensure the reintegration will not overwrite work other team members have committed to the trunk.

Once a ‘--reintegrate’ merge has been performed, the branch shouldn’t be used for further development: Subversion’s merge tracking will treat the reintegration commit as a trunk change, and will attempt to merge the branch-to-trunk merge back into the branch. The reintegrated branch should therefore be deleted, and if you wish to continue working on the branch, you should re-create it from the trunk.
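
The reintegrate, delete and re-create cycle described above can be sketched as follows. This is an illustration under stated assumptions: it assumes a standard ‘trunk’ and ‘branches’ layout and a clean, up-to-date working copy of the trunk as the current directory, and the function name is hypothetical.

```shell
# Reintegrate a branch into trunk, delete it, then re-create it from trunk.
# Run from a working copy of trunk with no local modifications.
reintegrate_and_recreate() {
  repo_url="$1"   # repository root URL (assumed to contain trunk/ and branches/)
  branch="$2"     # branch name under branches/
  svn update &&
  svn merge --reintegrate "$repo_url/branches/$branch" &&
  svn commit -m "Reintegrate $branch into trunk" &&
  svn delete "$repo_url/branches/$branch" -m "Remove reintegrated branch $branch" &&
  svn copy "$repo_url/trunk" "$repo_url/branches/$branch" -m "Re-create $branch from trunk"
}

# Example (hypothetical URL and branch name):
#   reintegrate_and_recreate http://svn.example.com/repos/project feature-x
```

Each step is chained with ‘&&’ so that a failed merge or commit stops the sequence before the branch is deleted. In practice you would review the merge with ‘svn status’ and ‘svn diff’ before committing rather than running everything in one go.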

Recovering Deleted Branches

Deleting branches in Subversion does not completely remove them from the repository; it just removes them from the HEAD revision. You can always go back to an earlier revision to view and recapture the branch, or the files that were in that branch. Useful commands for revisiting deleted branches include:

  • svn log --verbose – run this in the directory that used to contain the deleted item. ‘--verbose’ displays a list of all the changed items in each revision, allowing you to locate the exact revision where you deleted the desired file or directory.
  • svn merge -r – can be used to roll back a change that has already been committed. ‘svn merge -r’ merges the changes from your current revision back to a revision with the changes you wish to revert to. Rolling back a change is like any other svn merge operation, so ‘svn status’ and ‘svn diff’ should be employed to approve the changes. Since the merge happens at the local working copy level, you need to use ‘svn commit’ to send the final version to the repository.
  • svn checkout URL@(number) – checks out a deleted branch as it existed at the given revision. Because the branch is no longer present in HEAD, supply the revision as a peg revision (the ‘@(number)’ suffix on the URL).
  • svn switch – updates the working copy to a different URL, typically one that shares a common ancestor with the current working copy.
  • svn revert filename – rolls back local changes, on the local working copy only.
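
To bring a deleted branch back into HEAD rather than just inspect it, one common approach is to copy it from the last revision in which it still existed. The sketch below assumes you have already found that revision with ‘svn log --verbose’; the function name and the example URL and revision number are hypothetical.

```shell
# Restore a deleted branch by copying it from an earlier revision.
# The "@revision" suffix is a peg revision: it tells Subversion to look up
# the URL as it existed in that revision instead of in HEAD.
restore_branch() {
  branch_url="$1"   # URL the branch had before it was deleted
  last_rev="$2"     # last revision in which the branch still existed
  svn copy "$branch_url@$last_rev" "$branch_url" \
    -m "Restore branch as of r$last_rev"
}

# Example (hypothetical URL and revision):
#   restore_branch http://svn.example.com/repos/project/branches/experiment 41
```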

Alternatives to Deleting

If you do not wish to delete branches, there are some alternatives commonly employed by developers:

  • create an ‘inactive’ folder and move your unneeded branches to that folder.
  • rename the branches to show they are inactive.
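
Both alternatives come down to a server-side ‘svn move’. The sketch below covers the ‘inactive’ folder approach; placing that folder at ‘branches/inactive’ is an assumption, as are the function names and the example URL.

```shell
# One-time setup: create the folder that will hold retired branches.
create_inactive_folder() {
  repo_url="$1"   # repository root URL
  svn mkdir "$repo_url/branches/inactive" -m "Create folder for inactive branches"
}

# Move a branch into the inactive folder instead of deleting it.
archive_branch() {
  repo_url="$1"   # repository root URL
  branch="$2"     # branch name under branches/
  svn move "$repo_url/branches/$branch" "$repo_url/branches/inactive/$branch" \
    -m "Move $branch to the inactive folder"
}

# Example (hypothetical URL and branch name):
#   create_inactive_folder http://svn.example.com/repos/project
#   archive_branch http://svn.example.com/repos/project old-idea
```

Renaming a branch to mark it inactive works the same way, with a new name in the same directory as the destination of ‘svn move’.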

Award-Winning uberSVN Platform Hits New Milestone

uberSVN may have just come out of beta, but already the award winning, open ALM platform for Apache Subversion has hit another milestone: uberSVN registration keys hit the 20,000 mark yesterday! (That’s not to mention the total number of downloads, which surpassed 35,000 a while back!)

It’s an exciting time to be an uberSVN user – and we have plenty more planned for the ever-growing uberSVN community. We’ll be announcing an update to uberSVN Chimney House soon (don’t forget to sign up for the Latest Release Channel if you want to get this update early!) We’re also in the process of completely overhauling the existing uberSVN interface. We’re currently redesigning the Administration and Users/Team tabs, along with our social coding and Backup/Restore features. This is the first phase in our roadmap for UI improvements. Now is the time to shout if there’s something bugging you!

Longer term we’re planning to create an SDK so you can customize uberSVN however you want, and we’ll have some very special rewards for contributors to the uberSVN experience.

Thank you to the community for helping to make uberSVN such a success! Be sure to follow @uberSVN to keep up to date with all the latest news and updates.

Top Ten Reasons to try TortoiseSVN

So, you’ve just installed Apache Subversion, and now you’re wondering whether you should run it from the command line, or download a Subversion client. As a successful and established open source project, Subversion has a vibrant ecosystem of tools, and there’s no shortage of clients out there, if you decide a client is the way to go. For Windows users, TortoiseSVN is a user-friendly, easy-to-use Subversion client that can be downloaded and used for free; simply visit the WANdisco website to download the latest version. Not convinced? Here are our top ten reasons why Windows users should give TortoiseSVN a go.

1. It’s Established – TortoiseSVN saw its first public release (version 0.4) way back in 2003, in a release that was linked with the 0.17 release of Subversion. Since then, TortoiseSVN has gone from strength to strength, and is today maintained by a global community of contributors.

2. Integrates with Windows shell – TortoiseSVN seamlessly integrates with Windows’ file explorer, giving Windows users the ability to run TortoiseSVN commands through a tool they are already familiar with.

With TortoiseSVN, there is no need to learn how to use the Windows command line, so you can work with Subversion without typing a single command.

3. Context sensitive menu – TortoiseSVN automatically registers your current location, and only populates its menu with relevant commands. You will not find any options in the TortoiseSVN menu that you cannot use in your current situation.

4. The freedom to choose – TortoiseSVN isn’t designed with a particular IDE in mind, so you can use it with whatever development tools best suit your project.

5. Status of your files at-a-glance – TortoiseSVN shows you the status of your files at-a-glance, with a range of handy icon overlays.

6. It’s Open Source – Cost is one of the most commonly-cited reasons for adopting open source solutions such as TortoiseSVN, but there are many more benefits to using open source in your project. Open source projects are usually collaborative efforts between many developers, so users reap the benefit of potentially thousands of developers, all with their own particular skills and areas of expertise. Another benefit of this culture of collaboration is the transparent, archived communication you can find on an open source project’s mailing lists and forums. This communication can be an invaluable (and free!) source of information for TortoiseSVN users and, if you can’t find the answers you were looking for, you can always ask the community directly through these channels. With all these benefits, it’s easy to see why open source is gaining popularity – even in the enterprise!

7. Professional support options – There’s no doubt that open source solutions have a lot to offer, but one of the major concerns many organizations have is the level of support available for open source solutions in enterprise development. Forums and mailing lists aren’t always the ideal place to go for advice on enterprise-level concerns. Thankfully, as TortoiseSVN is a long-established open source solution, there are professional support options available for it. WANdisco offer enterprise-class support for TortoiseSVN, as part of our Subversion professional support offerings. Stefan Küng, the TortoiseSVN project’s lead developer since 2003, heads our dedicated TortoiseSVN support team, bringing TortoiseSVN users phone and email support, feature enhancements, upgrades, named support contacts, and more.

8. TortoiseMerge – TortoiseSVN comes integrated with a number of additional tools that are designed to make TortoiseSVN even more user-friendly. This includes ‘TortoiseMerge,’ a diff / merge tool that makes it easy to pinpoint the differences in text files, merge those changes, and review and apply unified diff files.

9. TortoiseBlame – Sometimes, you just need someone to blame! The TortoiseBlame tool shows who is responsible for making specific changes, to specific files. Alongside the name of the person responsible for the change, TortoiseBlame displays the log message for each commit when you hover over the relevant line.

10. TortoiseIDiff – Files under version control take many shapes and forms; sometimes, they’re not even text-based. TortoiseSVN has the ‘TortoiseIDiff’ tool, which is specifically designed for comparing image files. TortoiseIDiff can display two images side-by-side, or display images blended over one another.

Ready to get started with TortoiseSVN? The latest version of TortoiseSVN can be downloaded from WANdisco http://www.wandisco.com/subversion/download#tortoise

Extra help with your TortoiseSVN implementation is available through WANdisco’s enterprise-class support for TortoiseSVN, which includes 24 x 7 support, named support contacts and online case tracking.

Take Part in the 2012 Worldwide Dev Survey & Win an iPad

Have you taken the 2012 Worldwide Developer Survey yet? This survey will help us to build a picture of the developer’s perspective on software development, and trends in the software change and configuration control management tools market.

We already announced that, at the end of the Worldwide Developer Survey we’ll make the results available to everyone who took part, and have so far received some great data about the way you’re using software configuration control management solutions.

We’ve been so pleased with the response that in addition to sharing our data with you, we’re going to enter all respondents into a prize draw. The top prize will be an iPad, with two lucky runners-up receiving a Kindle Fire.

If you’ve already taken part in the survey, your name has automatically been entered into our free prize draw. If you haven’t completed the survey, make sure you send us your answers before 13th July 2012 to be in with a chance of winning an iPad or Kindle Fire. Good luck!

Subversion Tip of the Week

Checking for Modifications with TortoiseSVN

When modifying a working copy with TortoiseSVN, it can be useful to pinpoint exactly which files you have changed, and which files have been changed and committed by others, before you perform your commit. TortoiseSVN has a ‘Check for Modifications’ function especially for this.

  • Select the ‘Check for Modifications…’ option from the TortoiseSVN menu.

  • TortoiseSVN will bring up a dialog displaying every file that has been changed in your working copy, with colour coding to highlight the status:

1) Blue – files modified locally.
2) Purple – files that have been added to your working copy.
3) Dark red – files that have been deleted.

In this example, the ‘Admin Guide’ and ‘Wiki’ files have been modified, the ‘Logo’ file has been added, and the ‘New Bitmap Image’ file has been deleted.

From this dialog, you can perform several other operations:

1) Display local changes – right-click on the file you wish to examine, and select ‘Compare with base’ from the drop-down menu.

This will bring up a TortoiseMerge dialog displaying all the changes that have been made locally.

2) Check for changes in the repository – It is also possible to view changes that other team members have made in the repository. Simply right-click on the file you wish to examine and select ‘Show differences as unified diff.’

The TortoiseUDiff dialog will then display all the changes that have been committed to the repository, since you last performed an ‘SVN Update.’

Subversion Tip of the Week

Examining the History of Log Messages

At some point during the development process, it may be useful to see the entire history of your project’s log messages. You can view it using the ‘svn log’ command, followed by the path of your working copy:

svn log (PATH)

If this is too much information, you can filter the log messages being displayed. To view the history of a particular file, use that file’s path:

svn log (PATH TO FILE)

It is also possible to see the log message from a particular revision, by adding the ‘--revision’ switch. To view the log message for the second revision, we would use:

svn log --revision 2 (PATH)

Finally, you can execute the ‘svn log’ command with the ‘--xml’ switch, to see the log output formatted as XML:

svn log --xml (PATH)

We did it!

WANdisco IPO Market Open LSE

A week ago today (June 1st 2012) I was given the great honor of opening the London Stock Exchange on the same day that WANdisco officially listed on the exchange. It was the proudest day of my business life. It was pretty difficult not to get emotional at the moment that we ‘rang the bell’. The IPO was a highly significant event in what so far has been an incredible story.

WANdisco is no ordinary Silicon Valley start-up. This isn’t a ‘fat and lazy’ venture-backed-to-the-hilt business where the greatest personal risk one takes is whether to wear brown or grey slacks!

Our story starts with some world-beating technology invented in the garage of Dr. Yeturu Aahlad, but that is where any similarity to the Silicon Valley stories you may have heard ends, because the rest is pure blood, sweat and tears. The company started in late 2005 in Naeem’s apartment in Fremont, California. Three months later we closed our first big deal and that enabled us to move to proper offices just up the road in Pleasanton. It would have been very easy for us to take venture capital at that point.

Venture Capital is great for some people. It means you get a regular pay-check. You can go home and not worry about kids’ school fees, going out for dinner, buying a new car; the list goes on and on. We said “no”. And we said “no” because we believed that this company could be rather special. The interesting mix of our growing blue-chip customer list and unique technology gave us the confidence to go it alone.

It’s easy to sit here now and say “yeah, we were right”. But, back then it was a great risk for each and every one of us personally. That start-up team of Jim Campigli, Yeturu Aahlad, Naeem Akhtar and yes yours truly are heroes. We had to sail the ship through some pretty treacherous waters of down economies and slow IT spending. But we flourished and amazingly grew.

By 2008 we were looking for a way to scale the business and hire more engineers & support staff to enable us to grow. The conundrum we had, as a self-funded start-up, was growing in line with our sales bookings. That’s not easy in Silicon Valley, with the high salaries and high expectations that people have. So we looked further afield to India and China. The problems for a small company like us in those geographies were both the cultural differences and time-zone constraints. They are not easy to overcome for a small start-up.

In the end we settled on my hometown of Sheffield in the UK. I have written before of all of the virtues of Sheffield as a development center. Even now I get raised eyebrows when I mention that we have software engineers in the town. I even heard the other day that one of our competitors criticized us for setting up in an ‘old rundown steel town’.

This serves as a great motivator for the team though. The whole ethos that the United States is built on is supposed to be about economic prosperity and upward social mobility achieved via hard work – last time I checked Feudalism died away in the 15th century – there is no god-given right to success!

Hard work and dedication are precisely what we have plenty of. As part of the IPO diligence process I listened in to a call with a multinational chipmaker who described the support he was getting (from our Sheffield office) as “world class”, “the best of any of our vendors” and, more importantly to me, “they really care”. I was so proud last year when we won the coveted “Made in Sheffield” mark on our products.

Onto the IPO itself. Well, I think we did OK!

• It was almost 4x over subscribed.
• The list of institutional investors is to die for (Fidelity, BlackRock, M&G, Octopus, Legal & General, Cazenove, Artemis, Hargreave Hale and Standard Life).
• The business is still largely employee owned. Every employee with tenure has stock.

For those that don’t know the London Stock Exchange (LSE), it’s the most international of all the world’s stock exchanges, with around 3,000 companies from over 70 countries. The LSE is also the world’s fourth largest exchange, boasting some of the world’s largest companies such as Shell, HSBC, Vodafone and BP.

So, yes June 1, 2012 was a proud and historic moment in the history of WANdisco. But this is the beginning of a new chapter. Our goal now is growth. Yes we had a great party last Friday but Monday was very much business as usual.

Finally I must say thank you to a bunch of people. Firstly, to the whole company and to my management team of Jim, Peter, Rob, Ian and Nick, and my assistant Jerilyn. Then to Panmure Gordon (George, Adam, Charlie, Fred, Giles, Grishma, Ben and Tim), DLA (Jon, Rob, Rachel), KPMG (Chris, Euan, Dion, John, Julie, Maria, Richard, Philip), Gunderson Dettmer (Ward, Cindy, Lisa, Ashlee, Jackie), Seven Hills (Nick, Michael, Alex, Rosie, Stuart), FTI (Matt, Jon, Sophie) and Travers Smith (Aahron, Lisa, James).

About David Richards

David is CEO, President and co-founder of WANdisco and has quickly established WANdisco as one of the world’s most promising technology companies. Since co-founding the company in Silicon Valley in 2005, David has led WANdisco on a course for rapid international expansion, opening offices in the UK, Japan and China. David spearheaded the acquisition of Altostor, which accelerated the development of WANdisco’s first products for the Big Data market. The majority of WANdisco’s core technology is now produced out of the company’s flourishing software development base in David’s hometown of Sheffield, England and in Belfast, Northern Ireland. David has become recognised as a champion of British technology and entrepreneurship. In 2012, he led WANdisco to a hugely successful listing on the London Stock Exchange (WAND:LSE), raising over £24m to drive business growth. With over 15 years' executive experience in the software industry, David sits on a number of advisory and executive boards of Silicon Valley start-up ventures. A passionate advocate of entrepreneurship, he has established many successful start-up companies in Enterprise Software and is recognised as an industry leader in Enterprise Application Integration and its standards. David is a frequent commentator on a range of business and technology issues, appearing regularly on Bloomberg and CNBC. Profiles of David have appeared in a range of leading publications including the Financial Times, The Daily Telegraph and the Daily Mail. Specialties: IPOs, startups, entrepreneurship, CEO, visionary, investor, board member, advisor, venture capital, offshore development, financing, M&A