Sunday, February 24, 2013

Set up Hadoop on OSX

Let's say you're looking to get your MacBook Pro all set up with a local Hadoop instance to play with for that long flight across the Atlantic (or some other time when you really want to be running locally).

The following steps work for OSX 10.7 Lion
The following tips may be useful the first time you're setting this up on a new computer:
  1. Turn on SSH (System Preferences => Sharing => Remote Login => "On")
  2. For the JAVA_HOME variable, you could put in the current full path: 
    • export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home
    • or, you could do it the smart way:
    • export JAVA_HOME=`/usr/libexec/java_home`
  3. If you get this error: 
    ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /Users/jchen/Data/Hadoop/dfs/data: namenode namespaceID = 773619367; datanode namespaceID = 2049079249

    It's because you've formatted the namenode more than once, which is easy to do when you're walking through tutorials. The answer is well spelled out here (which is another good setup tutorial). The summary is: either start over (delete the datanode directory and then reformat the namenode) or manually edit the VERSION file in the datanode directory so that its namespaceID matches the namenode's.
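
The manual fix can be sketched as a small shell function. This is only a sketch under a few assumptions: that each storage directory keeps its metadata in current/VERSION with a line of the form namespaceID=<number> (the layout I have seen in Hadoop 1.x), that the Hadoop daemons are stopped first, and that the function name sync_namespace_id is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: copy the namenode's namespaceID into the datanode's
# VERSION file. Assumes <dir>/current/VERSION exists in both directories.
sync_namespace_id() {
    name_dir=$1
    data_dir=$2
    name_version="$name_dir/current/VERSION"
    data_version="$data_dir/current/VERSION"

    # Read the ID that the namenode expects.
    ns_id=$(sed -n 's/^namespaceID=//p' "$name_version")
    if [ -z "$ns_id" ]; then
        echo "no namespaceID found in $name_version" >&2
        return 1
    fi

    # Keep a backup, then rewrite only the namespaceID line.
    cp "$data_version" "$data_version.bak" || return 1
    sed "s/^namespaceID=.*/namespaceID=$ns_id/" "$data_version.bak" > "$data_version"
}
```

You would call it with your own dfs.name.dir and dfs.data.dir locations, for example sync_namespace_id /path/to/dfs/name /path/to/dfs/data, and then restart the daemons. Writing through a backup copy instead of using sed -i keeps the sketch working with both the BSD sed on OSX and GNU sed.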

Tuesday, July 12, 2011

Enabling X11 Forwarding to Ubuntu on AWS

If you're using Amazon Web Services EC2, you probably discovered that the RightScale Linux images are very good starting points. The Ubuntu server image is a good starting point for many standard dev or demo purposes. Usually the ssh session is good enough, but recently I wanted to run JConsole on the server to access some MBeans published by the graph database Neo4J.

Here are the changes I made to my basic server setup script (you do automate all the steps you like to apply to any new EC2 image, right?). I added two X-related packages and switched from the headless OpenJDK to the full OpenJDK:
apt-get install xauth -y
apt-get install x11-apps -y
#need full jdk, not headless in order to run jconsole as UI
#apt-get install openjdk-6-jre-headless -y
apt-get install openjdk-6-jdk -y

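And remember to update your SSH command to include trusted X11 forwarding (-Y) and, optionally, compression (-C). Substitute your own key file and instance hostname for the placeholders:
ssh -i <key-file> root@<instance>.compute-1.amazonaws.com -Y -C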
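
Once logged in, a quick check like the following can confirm that forwarding is actually active before you launch JConsole. It is a rough sketch, not part of my setup script: the function name x11_status is made up, and it assumes that sshd sets the DISPLAY variable for forwarded sessions and that xauth is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: report whether this SSH session looks ready for X11.
x11_status() {
    # sshd sets DISPLAY (often something like localhost:10.0) when forwarding is on.
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; X11 forwarding is not active"
        return 1
    fi
    # xauth must be installed on the server for the forwarded display to be authorized.
    if ! command -v xauth >/dev/null 2>&1; then
        echo "xauth not found; install it and reconnect"
        return 1
    fi
    echo "X11 forwarding looks active on $DISPLAY"
}
```

If the check passes, running xclock (from the x11-apps package installed above) is a cheap way to see a window appear locally before trying jconsole.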


Friday, March 4, 2011

Content Access Becomes King

The old mantra was that "Content is King" but that's changing quickly and it's now more true to say that "Content Access is King." The software applications that put the content in front of the user are what determine the winner.

In the overall evolution of the mobile app market, the open web, and the semantic web, one irony is that while content is becoming more accessible, more accessed, and more widely used, the seeming importance of the actual content is going down. Consumers have always thought of the content access tool as the product, not the content itself. When it was difficult to get to the content repositories, the Bloomberg terminal, LexisNexis green screen, and the CompuServe dial-up software were the product and there was no way to get to the content without going through the access platform.

The internet and search engines started the trend - consumers did a search, got results, and never knew that the content was not owned by Google or Yahoo. They did not think about the quality, accuracy, or completeness of the search results because they were much more focused on the ease of access. This trend accelerates with newer internet applications and the mobile internet in particular. Users, both consumers and professionals, will buy the apps that give them access to content, regardless of the content source. Yelp! and Urban Spoon replaced Zagat because it was easy to find and contribute reviews on your mobile device, not because the content was better or more complete than Zagat's.

To compete in this industry today information services companies must:
  1. Continue to make your content better - try to maintain the Accuracy, Completeness, and Timeliness gap between proprietary content in captive repositories and the Free Content so that the upstarts stay at 80% solutions (or go down, not up)
  2. Innovate and deliver the world-class applications and access points that consumers want. Incumbent information providers need to create rich, always-on, easy-to-use mobile apps that make it easy to find, read, and contribute content from any device at any time, apps that consumers like to use and promote to their friends. That takes a rich user experience and cutting-edge software development.
  3. Open up your content repositories through APIs to get your content used by new innovators. Embedding your content in the largest social media applications, the new local apps, and every entrepreneur's crazy new idea gets your content in front of new users and new customers. By making it easy to use your content in new ways and new ideas, you also forestall the creation of even more competitive content sources and make your content the preferred choice for every new innovator.
  4. Overhaul your content supply chain. Make it nimble, make it flexible. New content sources, new content enrichment, new content integration. Faster, cheaper, and better.
More sophisticated customers may realize that free or smaller content providers do not meet the very high Accuracy, Completeness, and Timeliness hurdles that Professional information workers have. But they will still buy and use the easy-access mobile solutions to get access to the 80% content anywhere and all the time. As the 80% solutions get better (85%, 90%, ...), the professionals may decide they don't need to pay full price either.

Most ironically, the more content providers try to lock up their content, the more it will become commoditized. Startups will build great software products on a free content source because it's available and easy to use, and those products will convince customers that the free content is good enough. There goes your proprietary brand and content differentiation.

Thursday, February 17, 2011

Don't Fire QA - Embed QA into Agile teams

Mike Gualtieri recently wrote a post about how one financial data company was able to improve software quality by firing their QA team and making developers responsible for testing the software. This is a good concept - have the team of engineers responsible for creating the software also be responsible for proving the software works. I agree that you should break down the barrier between a development team and a QA team. I disagree that you should fire your testers. You should embed them into Agile teams.

I met with a client today who described doing weekly releases to a very high volume online advertising system. This system has hundreds of servers around the globe and is directly revenue-generating software with complex algorithms. His approach was not to fire the QA team but to embed them into the development scrum teams to help the developers make even better software.

This is one of the basic tenets of typical Agile software development approaches. To quote Janet Gregory, Agile teams have "Blurred Lines Between Roles":
  • Agile developers are "test infected"
  • Agile testers and programmers collaborate
  • Agile testers and customers collaborate
  • "Whole Team" responsibility for testing
  • Everyone understands the business
When you follow these approaches, you get Unit tests from your developers. You get acceptance tests written for user stories BEFORE the code is written. You get developers who try to prove their code works, not who just try the happy path and throw the code over the wall to QA and wait to see what bugs get reported back.

When you have developers who are responsible for executing regression tests, you get automated unit and regression tests to validate the software in every build. And when you have developers who are responsible for successfully deploying the software to production they think about how to automate the deployment so it works every time. This is all good!

But you still need people who are trained to think about testing, requirements, and user story completeness, and who are used to looking for corner cases, and thinking like a user, and who have seen bad data, funny encodings, and the hundred other oddities that a skilled tester can find in software.

This is why it's better to embed the QA testers into the scrum teams. The TEAM is responsible for the software quality. The one or two engineers on the team with a QA background go about ensuring quality differently - they work with the users and customers more to ensure the team really understands how the system is going to be used. They build automated tests for parts of the system that are not easy to automate with unit tests. They double check the consistency of the look & feel. They run stress tests. They manage sample data. They ensure that two user stories do not conflict with each other. They do all the things that a developer focused on a single user story might miss or not realize were important, because after all the developer's code passed the unit tests and passed the acceptance tests.

If you want great software, you need more than just coders. You need testers - but you need Agile testers who are part of the software creation process, not a separate team given an impossible task of "finding all the bugs" in code casually written by the coders. Don't fire, EMBED QA!

Custom app dev is DEAD. Long live the Agile Business Platforms.

Custom application development is dead. Over the next 3 years, Agile Business Platforms like force.com and Mendix will replace custom development for 90% of business applications. The ability to rapidly prototype business requirements and deploy scalable, working applications in a fraction of the time of traditional Enterprise application development processes is a game-changing business advantage. No one who understands the ROI and business value will hire a Java or .Net developer to build a new business application from scratch. Anyone looking to reduce costs and improve business agility by reinventing their legacy systems needs to look at a tool like Mendix that can deliver working business applications immediately and support continuous Agile business improvements.

The traditional 2- or 3-year Enterprise application development process run in the traditional way by the IT team is a waste of money and time and sacrifices key business agility. In today's hyper competitive and fast moving world, no business can afford to wait that long to introduce new capabilities, integrate with new supply chain partners, or automate existing costly manual processes. Agility, flexibility, and lower cost are the name of the game.

These Agile Business Platforms can be either on-premise or cloud-based Platform-as-a-Service (PaaS) options. The key is to be able to have a business analyst sit with users and business people and turn requirements into prototypes immediately. This way the business people can "touch and feel" the application and see how their business process will work. They can provide feedback and iterate through processes, problems, and ideas in a matter of days, not months. This is the definition of an Agile business and it is the promise of on-demand IT services that require a minimum of custom coding and maintenance. The companies who embrace and benefit from these cloud platforms will be able to out-innovate and out-compete their competitors by trying new business ideas, improving business processes, and leveraging the global supply chain of partners to produce the best products, services, and customer experience. IT must be the enabler, not the bottleneck, to this true Business Agility.

Long live the new Agile Business Platforms.

Tuesday, February 8, 2011

Cross Platform Mobile App Development


In order to avoid headaches, reduce time, and reach a broader audience, it is critical to have a good cross platform (or is that cross-device?) mobile application development framework to enable a "Write Once, Run Anywhere" experience without having your dev team try to learn a handful of different SDKs and a zillion different libraries. With the plethora of different mobile platforms and operating systems, to reach the largest audience you would need to target at least 3 separate major SDKs - Apple's iOS, Google's Android, and RIM's Blackberry. And then let's not forget the other smaller (but still significant) players: HP/Palm webOS, Microsoft's Win Phone 7, and Symbian.

So that's at least 6 separate SDKs and versions of mobile apps your team would have to build. Oh boy, that quick mobile app you wanted to build just got a lot harder.

Or did it? What if you could use a standard application development toolkit, maybe something a lot of developers already have experience with, that worked across all the major mobile devices? That would suddenly cut your 6 separate SDKs back down to 1, plus some wrappers to get the native apps built and deployed on each platform.

Sounds pretty good - right?

Well - it's here, and it's HTML5. That's right, your favorite good old fashioned web development toolkit is also the best mobile development toolkit for building cross device mobile applications.

One of the best tools for packaging your HTML5 based app for each mobile platform is PhoneGap, an open source tool that uses each major SDK to provide a native mobile app for each platform. These HTML5 mobile apps have full access to native features and look like all the other apps you are already using. Heck, a lot of apps you are using are already developed using PhoneGap. The PhoneGap team is working on additional enhancements to automate the build process, so that even the work of setting up and building all the different flavors of your mobile app is automated.

Then there are various JavaScript libraries available to make your app shine and give you full development tools for building that killer business logic you need. Some of the ones AGS uses in our application development are:
  1. jQuery Mobile - open source jQuery plug-in with great mobile app theming support
  2. Sencha - ExtJS based commercial desktop & mobile app library
  3. Rhomobile - a set of products for full enterprise mobile application development
By leveraging these tools and techniques, we are able to build full-featured mobile apps that work on multiple platforms at the same speed (or faster in some cases) as traditional web applications.

Saturday, February 5, 2011

Dynamic Named Ranges in Excel

I want to share a tip I read about (and used) today for Dynamic Named Ranges in Excel. I've used Excel for a long time, and spent many many many hours building spreadsheets that mimic little databases because a client or user couldn't support a database but really "wanted" a simple spreadsheet. I often use named ranges for data validation to substitute for lookup tables and foreign key constraints. A common problem is that when users add a new value to the bottom of the lookup table, the new value falls outside the named range and is not picked up by the lookup.

I've tried teaching users and writing instructions on how to expand the range, and had fallen into the habit of making the last element be a row of dashes ("-----") with instructions to "add a new entry to the list by inserting a row ABOVE the dashes." Not very elegant at all, but it did usually work.

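The nice folks at ozgrid.com gave a good, simple explanation for how to create Dynamic Named Ranges that will automatically expand to include new values added at the end of the list. The core idea is to define the name with a formula rather than a fixed range, typically something along the lines of =OFFSET($A$1,0,0,COUNTA($A:$A),1) for a list that starts in A1 and has no blank cells, so the range grows as entries are added. Great time saver!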