Wednesday, January 26, 2011

Networked Content Manifesto

Yesterday Temis distributed printouts of their newly released "Networked Content Manifesto" at the SIIA conference in New York. Ignoring the irony of a firm focused on digital content processing printing copies of its content and handing them out (or maybe that's an accurate reflection of the current state of online publishing?), the manifesto is a good introduction to the concepts surrounding semantic technologies and "Content Enrichment". In Alliance's Information Services industry focus, we work frequently with clients on implementing taxonomies, ontologies, classification systems, and other tools to automate the "Enrichment" portion of the information supply chain processing pipeline. Combining these sophisticated tools with good Master Data or Master Entity repositories and linking to other internal content or the public Linked Data initiative provides a much richer experience for researchers and content users.

By providing more meaning - more semantic information - about the concepts, people, and entities in a document, and by providing easy ways to navigate the overall content space, we create a richer experience for the end user and make it easier for her to discover the information she is looking for. For the publisher this translates into increased usage, which means easier subscription renewals - so that's a good thing too!
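
As a concrete illustration (with made-up field names and identifiers, not any vendor's schema), an enriched article record might carry the recognized entities, their links to a Master Entity repository or Linked Data URIs, and the assigned taxonomy terms right alongside the raw text - a minimal sketch in Python:

    # Hypothetical "enriched" article record - field names and IDs are illustrative only.
    enriched_article = {
        "id": "doc-001",
        "title": "Acme Corp. acquires Widget Labs",
        "body": "Acme Corp. announced today that it will acquire Widget Labs...",
        "entities": [
            {
                "text": "Acme Corp.",
                "type": "Organization",
                "master_id": "ORG-12345",   # link to a Master Entity repository
                "linked_data_uri": "http://dbpedia.org/resource/Acme_Corporation",
            },
        ],
        "concepts": ["Mergers and Acquisitions", "Information Technology"],  # taxonomy terms
    }

Once entities are resolved to stable master IDs like this, "show me everything else about Acme Corp." becomes a simple lookup rather than a fuzzy text search - which is exactly the richer navigation experience described above.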

Saturday, January 22, 2011

Automated Functional Tester

This week saw the roll-out of Alliance's new "Automated Functional Tester" (AFT) framework. This is a comprehensive and powerful automated testing framework that enables end-to-end testing of complex web applications. It is driven by easy-to-write business requirements, handles rich AJAX interactions as well as integrated Windows security and file operations, enables fast regression-suite runs, and produces rich reporting. The framework builds on and integrates a number of open-source testing tools to provide fuller capability and more comprehensive testing than any individual tool. And because it uses standard test execution environments like Selenium, tests generated by the framework can be executed on Cloud testing platforms for performance testing or large-scale cross-browser testing.
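
To give a flavor of the kind of test the framework drives, here is a minimal sketch of a Selenium WebDriver check written in Python - the URL, element IDs, and expected result are hypothetical placeholders, not part of AFT itself:

    # Minimal Selenium WebDriver sketch; the page, element IDs, and assertion
    # below are hypothetical placeholders, not AFT internals.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get("http://example.com/login")
        driver.find_element(By.ID, "username").send_keys("testuser")
        driver.find_element(By.ID, "password").send_keys("secret")
        driver.find_element(By.ID, "login-button").click()
        # Business requirement: after logging in, the user lands on the dashboard
        assert "Dashboard" in driver.title
    finally:
        driver.quit()

Because the underlying test is plain Selenium, the same script can be pointed at a Cloud testing grid to run across many browsers in parallel.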

We use automated testing extensively, from developer-written unit tests to fully automated test suites. Clients sometimes balk at the high license fees associated with full-featured commercial testing tools like QTP, and this open-source-based framework lets us provide custom software development backed by fully automated testing at a lower price point!

Great job to the AFT team!

Why use the Cloud?

A lot of people ask "What is Cloud Computing?" There are good answers to that question, and I'm sure I'll expand on them in this blog as well. SaaS, PaaS, IaaS, Public, Private, Hybrid, Virtualization, Storage, Compute, Developer Clouds, Production Clouds, etc. Lots and lots of definitions of "What is Cloud."

But there's a fundamental question I want to answer - Why use the Cloud? What's the business value for Cloud Computing?

Well - it's all about 3 things:
  1. Agility
  2. Capability
  3. Economics
Agility - using on-demand software and infrastructure enables you to be more flexible and achieve a given result faster. If you're looking to implement a CRM package, turning on Salesforce.com or NetSuite takes a day or two to start and a week or two to get rolling - much quicker than the usual months-long implementation effort for an on-premise package installation. If you need to turn on some developer or testing lab server instances, you can sign up with Skytap, GoGrid or Amazon.com, configure a virtual instance, and be completely done by lunch instead of going through the usual few-weeks procurement process to buy a new server, get the IT department to install and configure it, and finally start using it productively. Using cloud-based development platforms like Azure, Long Jump, Force.com or App Engine enables much faster software development and implementation and lets you start realizing the business benefit of your application much sooner.
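
As a small illustration of the "done by lunch" point, here is a minimal sketch of launching a development server on Amazon EC2 with the boto library - the region, AMI ID, key pair, security group, and instance type are placeholder assumptions:

    # Minimal sketch: provisioning a dev/test server on Amazon EC2 with boto.
    # The region, AMI ID, key pair, security group, and instance type are placeholders.
    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")   # credentials come from the environment
    reservation = conn.run_instances(
        "ami-12345678",                # hypothetical machine image for a dev server
        key_name="dev-keypair",
        instance_type="m1.small",
        security_groups=["dev-lab"],
    )
    instance = reservation.instances[0]
    print("Requested instance %s - no purchase order, no rack space" % instance.id)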

Capability - Building on proven, scalable services enables much richer functionality and scalability than you could achieve from scratch, even if you could afford to build out all of the infrastructure and functionality yourself. You can add rich features to your web application - web analytics, photo sharing, data entry forms, data visualization, and other business functions - much faster by integrating existing software services like Google Analytics, Flickr, Caspio, or Birst than you ever could by writing requirements, designing, building, and testing the software, deploying the new feature, and worrying about scalability. And when you need to deploy robust, scalable applications that support thousands of users around the world, deliver fast performance to every corner of the globe, and meet modern expectations for responsiveness and user experience, Cloud-based Content Delivery Networks, in-memory caching and databases, load balancing between data centers and across tiers, and instantaneously available storage and compute capacity give companies world-class capabilities they could never hope to achieve by building out their own internal data centers.
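
For example, adding photo search to an application can be as simple as calling Flickr's public REST API instead of building and scaling a photo service yourself - a minimal sketch in Python (the API key is a placeholder; see Flickr's developer documentation for the full parameter set):

    # Minimal sketch: calling Flickr's public REST API for photo search.
    # The API key is a placeholder.
    import json
    import urllib.parse
    import urllib.request

    params = urllib.parse.urlencode({
        "method": "flickr.photos.search",
        "api_key": "YOUR_API_KEY",      # placeholder
        "text": "data center",
        "per_page": 5,
        "format": "json",
        "nojsoncallback": 1,
    })
    with urllib.request.urlopen("https://api.flickr.com/services/rest/?" + params) as resp:
        result = json.load(resp)

    for photo in result["photos"]["photo"]:
        print(photo["title"])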

Economics - Like many revolutions, by "doing things differently" Cloud-based solutions are able to provide very large cost savings. For individual services, whether reliable storage or scalable web servers, a huge IaaS provider like Rackspace can deliver a much lower cost per unit than an internal data center could. These large Cloud providers invest in automation that lets them efficiently manage orders of magnitude more capacity per engineer. They are able to buy in bulk from server and equipment vendors, site data centers next to low-cost electricity sources, and operate their data centers like a modern "factory" instead of the "cottage industry" of internal corporate data centers - with all the cost savings you would expect. The same is true of SaaS providers, who can invest massive amounts in software engineering, user experience, and design to produce world-class software and then share those costs across dozens, thousands, or millions of subscribers. An individual corporation would need to invest much, much more to build its own version of a software package or service than it would cost to subscribe to a well-implemented service.
For the Cloud subscriber, there is another benefit to subscribing to a cloud service rather than buying or building your own internal version - the switch from a large up-front capital investment to a monthly operating cost. This on-demand pricing model lets you start small and incrementally increase your investment as users move to the service or your customer base grows, rather than forcing you to over-provision for your hoped-for one- or two-year projections. The subscription model also preserves your flexibility to evaluate whether the deployed solution is meeting the needs of your customers: the cost to expand the service, switch to a different solution, or turn it off completely if the business model is not working no longer hinges on a huge up-front investment that must be managed and written off - you simply stop paying the monthly subscription and move on.
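
To make the capital-versus-operating trade-off concrete, here is a back-of-the-envelope sketch - every number in it is a hypothetical assumption for illustration, not a benchmark:

    # Back-of-the-envelope comparison; all figures are hypothetical assumptions.
    build_upfront = 150_000          # assumed up-front build/buy cost ($)
    build_annual_support = 30_000    # assumed annual maintenance ($/year)
    subscription_monthly = 4_000     # assumed SaaS subscription ($/month)

    for years in (1, 2, 3, 5):
        build_total = build_upfront + build_annual_support * years
        subscribe_total = subscription_monthly * 12 * years
        print(f"Year {years}: build/buy = ${build_total:,}  subscribe = ${subscribe_total:,}")

Even in a scenario where the subscription eventually costs more, the subscriber keeps the option to scale up, switch, or walk away at any point - an option the up-front investor has already paid for.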

As you can see, using Cloud-based services provides very real, concrete business value by enabling more Agility, more Capability, and better Economics. There are also numerous other intangible benefits to moving to Cloud services: you have to automate your deployment, you can test more easily, you have to define real service boundaries and interfaces, you can prototype integrations and new services quickly, you have access to a wider range of technology options, and on and on.

That's Why you SHOULD use the Cloud!!

Thursday, December 16, 2010

The Best Tool for the Job

I spent some time this month learning more about the LexisNexis Data Analytics Supercomputer (or HPCC) system. It is a great tool for building and deploying lightning-fast Content Services with high-quality content enrichment - turning commodity content into a valuable information product that professionals will pay for. It is purpose-built by a team of very smart technologists who have been turning out content-based products for a long time. For re-engineering a large-scale data processing system with hundreds or thousands of input files running on a variety of maxed-out Unix servers or mainframes, HPCC is a great fit.

The system design reminded me a lot of the notion that when you are doing something over and over again, the right approach is not just to get better and faster at repeating the specific task, but to find a better tool that eliminates the task. If you're a professional carpenter putting in a lot of nails, you may be tempted to look for a better hammer. The best hammer money can buy will certainly help you hammer in nails better. And that hammer will feel great - like an extension of your arm. It will have perfect balance and let you bang in nails all day long without feeling tired. You could be recognized as a true black-belt hammering expert, able to pound on those nails as long as anyone.

But if you spend the time to find & use a better tool (or even invent a new tool) -- say, a nail gun! -- you won't be 5% or 10% faster at hammering - you won't be hammering at all. For the first hour the new nail gun will feel clunky, inelegant and ungainly compared to that perfectly balanced hammer you're used to. You may resist because the hammer you're used to has worked so well for so long - you can't count how many hours you've used it to bang in the same nail over and over and over again.

But once you grok the new nail gun and get used to the new way of accomplishing your task, you'll see how much faster the new tool lets you put in nails. In fact, you'll be so fast at putting in nails that you'll stop measuring how long it takes to put in a nail and start thinking about the fact that you're 1,000% faster at putting up wood framing, which is the actual goal. This insightful leap requires you to realize that continuing down your existing path and perfecting your use of your current tool is not the best approach - optimization will not win over innovation.

In software development we are blessed that it is so easy to become 5% or 10% more productive. There is always a new trick to learn, always a new pattern to understand, always one more language tip to master, always a faster hotkey or way to do the same task with fewer clicks, always a piece of code to copy & paste from a previous module or an internet example, always a faster way to repeat yesterday's solution again today. Hmm... maybe "blessed" should really be "cursed".

And maybe the real blessing is that it is also very feasible to invent or find completely new approaches and new tools that make us orders of magnitude more productive - that let us deliver more business value. The key is to recognize when a new tool will not just save a few minutes here and there, but will actually save weeks or months of effort!

If you're working on an information supply chain that is processing terabytes of content or billions of rows of data, the LexisNexis DAS system and the ECL language are definitely one of those orders-of-magnitude improvement tools. It may take a while to stop thinking in terms of SQL and good old RDBMS design, but once you get used to the power of your new nail gun, you'll love it!

Friday, November 26, 2010

Forcing a new Google Gadget to load, not from cache

One common challenge when working with Google Gadgets is forcing a new development version to load rather than reusing a cached version. A quick search turns up several suggestions and conversations about how to do this, but they are not complete. Renaming the gadget for each save is overkill and not worth it. The developer gadget does not appear to correctly skip the cache in Safari. A little playing around reminded me that the key is to edit a bogus querystring parameter so the URL is unique each time you re-add the gadget.

So, as the first suggestion above says, add a parameter like ?nocache=1 to the end of your gadget moduleurl when adding it.

AND... when you need to make another edit, change the parameter! The 2nd time you need to force a reload, set the URL to ?nocache=2

Next time, ?nocache=3

And so on...
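
For example, with a hypothetical gadget hosted at www.example.com, the module URL you add would evolve like this:

    http://www.example.com/gadgets/stocks.xml?nocache=1    (first add)
    http://www.example.com/gadgets/stocks.xml?nocache=2    (after the next edit)
    http://www.example.com/gadgets/stocks.xml?nocache=3    (and so on)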

Google Gadgets

I started experimenting with Google Gadgets today - and I am very impressed with how easy the iGoogle and OpenSocial frameworks are to use. This is a great example of how a good framework, combined with some on-demand hosting services and mashups of REST-based data sources, can make lightweight (or "rowboat") application development very simple. Mixing data from multiple sources via easy-to-integrate URLs is a great development paradigm for creating operational dashboards or quick-glance BI reports.

In my particular case, I wanted to be able to quickly glance at a page of stock charts to see whether any stocks I'm currently following are at an interesting point. FinViz.com is a wonderful freemium site for simple analysis, but following 4 or 5 stocks required just too many searches and clicks. FinViz, however, follows the good Web 2.0 pattern of making its charts publishable and linkable through a URL, so you can combine them into a dashboard or mashup very easily. (More proof that REST-based web services are better for distributing your data or enabling partners to integrate easily. SOAP is OK for heavyweight system integration, but if your goal is to get your data used, use REST!)
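
Here is a minimal sketch of the same mashup idea in Python - generating a one-page HTML dashboard of chart images from linkable chart URLs. The finviz.com URL pattern shown is my reading of the publishable chart link; substitute whatever link the site actually publishes for your charts:

    # Minimal sketch: build a one-page dashboard of chart images from linkable URLs.
    # The finviz.com chart URL pattern is an assumption - use the link the site publishes.
    tickers = ["AAPL", "GOOG", "MSFT", "IBM"]
    chart_url = "http://finviz.com/chart.ashx?t={ticker}&ty=c&ta=1&p=d"

    images = "\n".join(
        '<img src="%s" alt="%s daily chart"/>' % (chart_url.format(ticker=t), t)
        for t in tickers
    )
    with open("stock_dashboard.html", "w") as f:
        f.write("<html><body><h1>Watchlist</h1>\n%s\n</body></html>" % images)
    # Open stock_dashboard.html (or point a gadget at a hosted copy) for a
    # quick-glance view of every chart on one page.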

A few hours of reading the docs and scripting and now I've got a configurable stock dashboard integrated with my iGoogle home page. I just have to decide how to handle the charts when the gadget is not maximized -- any suggestions? :-)

Wednesday, November 3, 2010

Integrating Content in Customer Workflow Applications

As I discussed in my last post, it is critical for information firms to distribute and deliver their content where their customers want to purchase, read, and use it. In addition to supporting more platforms and mobile devices, publishers need to integrate their content into their customers' workflow applications. Typical customers do not purchase data, reports, articles, or analyses for fun - they purchase these content pieces to help them accomplish their larger business objectives. This might mean checking the credit history of a loan applicant, scouring scientific literature for a grant application, or analyzing the legal precedents for an upcoming case. In each situation, the goal of the customer is not to get a document to display on her bookshelf, but to get access to the information in the course of performing a larger task.

Obviously, the key to making this easy is not to force the customer to leave their workflow tools and go to a discrete content website, but to INTEGRATE the content into the workflow tool. In many cases Information Providers are moving up the value chain and selling those workflow-enabling tools themselves - for example:
In each case, the Information Provider adds value to the raw underlying content by providing a business-process-focused (or Knowledge Worker-focused) workflow tool that helps the knowledge worker turn the content from "information" into "knowledge". Of course, this value-add is in addition to the direct content value-add from the aggregation, classification, entity recognition & linking, and analytics applied in the Information Factory.

By building these Workflow solutions on top of standard REST or Web Services APIs, the provider is able to leverage its existing content repositories, search, and enrichment capabilities without having to create duplicate product stacks. And, as I mentioned last time, the same content could be directly integrated with a customer's proprietary workflow tools or data. A well-thought-out and well-designed API enables many additional distribution channels and revenue-generation options with a high ROI by supporting both provider-built workflow tools and customer integration. And of course those same APIs can support the new mobile applications that customers want.
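
As a rough sketch of what that customer-side integration might look like, here is a small Python example calling a hypothetical provider search endpoint - the URL, parameters, response fields, and the attach_to_case helper are all assumptions, since a real provider API defines its own contract and authentication:

    # Hypothetical content-API integration inside a customer's workflow tool.
    # The endpoint, parameters, response fields, and attach_to_case are assumptions.
    import json
    import urllib.parse
    import urllib.request

    def search_content(query, api_key):
        """Call a (hypothetical) provider search endpoint and return matching documents."""
        params = urllib.parse.urlencode({"q": query, "format": "json"})
        req = urllib.request.Request(
            "https://api.provider.example.com/v1/search?" + params,
            headers={"Authorization": "Bearer " + api_key},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["documents"]

    # Inside a loan-origination or case-preparation workflow:
    # for doc in search_content("Acme Corp credit history", api_key="..."):
    #     attach_to_case(doc["id"], doc["title"])   # hypothetical workflow hook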

There is also a very fast-growing and productive middle ground for Information Providers to integrate with - Microsoft SharePoint. It is one of the fastest-growing products in Microsoft history and is used (admittedly to varying extents) by many, or even most, Corporate customers. Providing pre-built Web Parts that customers can easily install into their SharePoint portal to integrate a provider's search and document retrieval capabilities is very powerful, and it offers a low-cost, very easy way to move beyond a generic internet content web site to a more integrated Enterprise application. Adding SharePoint workflow-enabled components lets an information provider deliver the full range of on-premise, Enterprise capability without the cost and complexity of Enterprise application development and maintenance.