Thursday, December 24, 2009

Data Visualization and Web 2.0

I love data visualization techniques. From my early days as an operations data analyst and throughout my software development career, finding patterns in data and finding an easy way to convey those patterns through a graph or other visualization has always been fun. Working on custom application development projects that provided a picture of how the business was doing, where customers were spending, etc., was fun. Now, working with our Business Information Services clients to help create innovative approaches to information discovery and data analysis is fun. It really is true that often "a picture is worth 1,000 words."

I vividly recall a stubborn memory leak my team had been trying to track down for several weeks. This was a long time ago, in the days of VB6 COM DLLs running inside ASP web pages, and we were pretty sure our code was not leaking. The team had found memory leaks before and had traced every single one of them to circular references in our COM object model, which kept the reference counts from ever reaching zero, so the automatic release never occurred. Historically, it had been easy to find a leak by running a simple load script that executed each page thousands of times in isolation and watching to see which page showed the leak. But not this time. We had run the load tests several times and never found the leak. We scanned the code thoroughly. We added as many "set obj = nothing" safety lines as we could. But the production web servers kept leaking memory, and we were forced to move the automatic server restarts from weekly to daily and hope our band-aid would hold.

One day, I had some downtime and decided to see if I could find a correlation between the memory in use and the pages invoked on the production system. An hour or two later, I had pulled all the IIS log files, gotten memory trace dumps from the systems team, and started my analysis. After a bit of awk, grep, Access, and other quick-and-dirty processing to pull out the data I wanted, adjust for time zones, aggregate hits into cumulative 15-minute buckets, and otherwise line up the datasets, I was ready to plot the data.
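The awk, grep, and Access steps themselves aren't shown here. As a rough illustration only, this is how the same bucketing might look in Python, assuming IIS logs in the W3C extended format (a `#Fields:` directive naming `date`, `time`, and `cs-uri-stem` columns). The function names, the sample log lines, and the URLs are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def bucket_start(ts):
    """Round a timestamp down to the start of its 15-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def parse_w3c_hits(lines, utc_offset=timedelta(0)):
    """Yield (timestamp, url) for each request line in a W3C extended log.

    Column positions are taken from the '#Fields:' directive, so field
    order does not matter. utc_offset shifts the logged time to match
    the clock used by the memory traces (the time zone adjustment).
    """
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
            continue
        if line.startswith("#"):
            continue  # other directives such as #Software, #Version, #Date
        row = dict(zip(fields, line.split()))
        ts = datetime.strptime(row["date"] + " " + row["time"], "%Y-%m-%d %H:%M:%S")
        yield ts + utc_offset, row["cs-uri-stem"]


def cumulative_hits(hits, buckets):
    """Return {url: [running total of hits at each bucket]}.

    buckets is a sorted list of bucket start times (for example, the
    times of the memory samples). Hits in a bucket that is not in the
    list are not counted.
    """
    per_url = defaultdict(Counter)
    for ts, url in hits:
        per_url[url][bucket_start(ts)] += 1
    result = {}
    for url, counts in per_url.items():
        total = 0
        series = []
        for b in buckets:
            total += counts[b]  # Counter returns 0 for an empty bucket
            series.append(total)
        result[url] = series
    return result


if __name__ == "__main__":
    log = [
        "#Software: Microsoft Internet Information Services 6.0",
        "#Fields: date time cs-method cs-uri-stem",
        "2009-12-01 10:02:11 GET /default.asp",
        "2009-12-01 10:14:59 GET /proxy/api.asp",
        "2009-12-01 10:16:00 GET /proxy/api.asp",
    ]
    buckets = [datetime(2009, 12, 1, 10, 0), datetime(2009, 12, 1, 10, 15)]
    print(cumulative_hits(parse_w3c_hits(log), buckets))
    # {'/default.asp': [1, 1], '/proxy/api.asp': [1, 2]}
```

Aligning the hit series to an explicit bucket list (rather than only the buckets that happen to have hits) keeps every URL's curve on the same time axis as the memory samples, which is what makes the two lines comparable on one plot.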

Instantly, the location of the leak was obvious. The two lines, cumulative hits to one particular URL and memory in use, sat nearly on top of each other. The correlation jumped out, completely overwhelming the noise from the other URLs and pages. This is the power of a good visualization. (Of course, the leak turned out to be coming from a web services API proxy URL, not from a page in the website, which is where everyone had focused! Since the proxy was not 'in' the website, it had been ignored for weeks while the team hunted for the answer.)
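Eyeballing the plot was all it took here, but the same comparison can also be scored. A minimal sketch, assuming per-URL cumulative hit series and memory samples that are already aligned to the same 15-minute buckets, ranks URLs by Pearson correlation with memory in use. The helper names and all the numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(vx * vy)


def rank_urls(cumulative, memory):
    """Sort (url, r) pairs by how closely each hit curve tracks memory, best first."""
    scores = {url: pearson(series, memory) for url, series in cumulative.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    memory_mb = [100, 140, 180, 220]  # hypothetical memory samples
    cumulative = {
        "/default.asp": [50, 60, 90, 95],
        "/proxy/api.asp": [10, 20, 30, 40],
    }
    for url, r in rank_urls(cumulative, memory_mb):
        print(f"{url}: {r:.2f}")
    # /proxy/api.asp: 1.00
    # /default.asp: 0.96
```

One caution: cumulative counts only ever go up, so nearly every busy URL will correlate positively with a steadily growing memory curve (the made-up `/default.asp` series still scores about 0.96). A score like this is best used to shortlist candidates for plotting, not as proof on its own.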

Recently, some colleagues and I were discussing the areas in which Alliance Global Services provides solutions to clients. This is a pretty broad topic, and we talked about the industries we serve (including our focus on Business Information Services), the geographies we serve (mostly the Northeast US, from about Virginia to Boston), and the types of services we provide (Custom Software Development, Application Architecture Analysis). We also talked about the easiest way to visualize our coverage areas.

Well, today I had a little downtime before the holidays. So I took a list of our client locations, used some simple geocoding tools, and put together two quick samples of mapping in the Web 2.0 world: one using a Yahoo! map through batchgeocode.com and the other using the Google Visualization API.

Batchgeocode.com made it very easy to process the first set of data and create a map, but that was as far as you could take it. Google was a different story: getting the map running required coding, but then I had full control. To see the first map, visit this blog on Alliance Global Services.
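The Google code itself isn't shown in this post. As a hedged sketch of the data side only: assuming the client locations have already been geocoded to latitude/longitude pairs, a short Python script could emit them as the JSON form of a Google Visualization DataTable (a `cols` list plus a `rows` list of `{"c": [{"v": ...}]}` cells). The page's JavaScript would then pass that object to `google.visualization.DataTable` and draw it with the Map visualization, whose latitude/longitude input is documented as two number columns followed by an optional string description. The function name and the sample points are illustrative, not the actual client list.

```python
import json


def map_datatable(locations):
    """Build a DataTable literal (cols/rows) for google.visualization.Map.

    locations: iterable of (latitude, longitude, description) tuples.
    Raises ValueError for coordinates outside the valid lat/long range.
    """
    rows = []
    for lat, lon, desc in locations:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"bad coordinates for {desc!r}: {lat}, {lon}")
        rows.append({"c": [{"v": lat}, {"v": lon}, {"v": desc}]})
    return {
        "cols": [
            {"id": "lat", "label": "Latitude", "type": "number"},
            {"id": "lon", "label": "Longitude", "type": "number"},
            {"id": "desc", "label": "Location", "type": "string"},
        ],
        "rows": rows,
    }


if __name__ == "__main__":
    # Hypothetical points for illustration; not actual client locations.
    sample = [
        (42.3601, -71.0589, "Boston, MA"),
        (40.7128, -74.0060, "New York, NY"),
        (37.5407, -77.4360, "Richmond, VA"),
    ]
    print(json.dumps(map_datatable(sample), indent=2))
```

Generating the table from the geocoded list, rather than hand-editing JavaScript, is one way to get the "full control" side of the trade-off: the same data can be re-filtered (by industry or service line, say) and re-emitted without touching the page.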

Obviously it's not perfect, but lots of fun for a quick afternoon's work!