Thursday, January 20th, 2011

A Better Way to Look at Data – Beta Testers Wanted

I love doing data-driven posts here at the Urbanophile. But I’ve always found it ridiculously tedious and time consuming to do what I’d consider even basic data analysis without either resorting to expensive and complicated tools, ones that normally assume you’re doing some type of hard core economic modeling, or else doing hours of hand cranking. So I decided to fix that by building some infrastructure to let me do in seconds what used to take hours, along with things that would have been almost impossible for me to do without it. Then I figured, if this is so incredibly valuable to me, why not to other people? So I plan to release this soon to the public as a commercial service – and a powerful one at a very attractive price point.

I’ve had people on my system for a while, but I’ll soon want to do a broader beta test, so I’m asking for volunteers. As an enticement, I’ll be offering a discount to my beta testers – with no obligation to buy, of course. This is the system I’ve been talking about that has easy query access to the IRS place to place migration data, including MSA-MSA migration. I’m not including access to that data set in the beta, but if you are interested in it, you’ll definitely want to sign up for this. Just shoot me an email.

To give you some reasons to check it out, here’s a teaser sample of what you can do with the system that should show you some of its power. Note that all of these charts were made in less than 30 seconds each, end to end. I repeat, less than 30 seconds start to finish.

College Degree Density

You may remember my recent post on changes in college degree density. It was an extension of work done by Rob Pitingolo. This post attracted tons of national attention ranging from the Wall Street Journal to the Atlantic. Here’s the main chart:

Keep in mind again that I made this in less than 30 seconds. Here are some of the implications of this:

  • This is just some cool information. The fact that Manhattan increased its college degree density by more than the total population density of most cities is pretty amazing. If it weren’t so hard to do stuff like this, maybe we’d be finding a lot more cool facts.
  • You can see that I’ve got not just college degree attainment from Census 2000 and the American Community Survey, but that I can automatically apply functions like density to it. There are many other functions like this that can be invoked with a click.
  • You can see that I can also run actual queries, in this case to get the top 10 counties. Queries can both accommodate a change over a date range, and apply the transformation functions like density. Again, there’s much more where this came from.
  • The data can be rendered on a bar chart for easily dumping into a blog post, document, or presentation.
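
For readers who want to see what "apply density, then take the top 10 by change" amounts to, here is a minimal Python sketch. It is not the system's actual code. The `County` record, the function names, and the three sample rows are all made up for illustration; real inputs would be degree counts from Census 2000 and the American Community Survey plus county land area.

```python
from dataclasses import dataclass


@dataclass
class County:
    """Hypothetical record; field names are assumptions, not the system's schema."""
    name: str
    land_area_sq_mi: float
    degrees_start: int  # college degree holders in the earlier period
    degrees_end: int    # college degree holders in the later period


def density_change(county: County) -> float:
    """Change in degree holders per square mile between the two periods."""
    return (county.degrees_end - county.degrees_start) / county.land_area_sq_mi


def top_density_change(counties, n=10):
    """Return the n counties with the largest gain in degree density."""
    ranked = sorted(counties, key=density_change, reverse=True)
    return [(c.name, round(density_change(c), 1)) for c in ranked[:n]]


# Placeholder rows, invented purely to exercise the functions.
SAMPLE = [
    County("County A", 10.0, 1000, 1500),
    County("County B", 100.0, 5000, 6000),
    County("County C", 20.0, 2000, 4000),
]

for name, change in top_density_change(SAMPLE, n=2):
    print(f"{name}: +{change} degree holders per sq mi")
```

The point of the sketch is the order of operations: the transformation (density) is applied to the change over the date range, and only then is the ranking cut off at the top n.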

I asked Rob Pitingolo how long it took him to do the data analysis for his blog post. He told me a week. I doubt he was heads down the whole time, but he told me it was a ton of tedious work. His post was a bit different from mine, but this should show you the kind of time savings and capabilities we’re talking about here: one week vs. 30 seconds.

Free Yourself from the Tyranny of GIS

You always see people do cool thematic maps like those red-blue election maps that turn states or counties or regions different colors based on data. (Technically, these are called “choropleth” maps.) You ever wonder how those were created? I did.

As it turns out, they are usually cranked out using complicated professional tools like Adobe Illustrator or ArcGIS. Tools like ArcGIS are super-powerful, but also pricey and complicated, mostly requiring specialist GIS jockeys to use. Using ArcGIS to create a simple thematic map is like using a tactical nuclear weapon to get rid of the spider in your bathtub.

My system is different. Here’s another 30 second chart, this one from my post on the 2010 Census state population results. It shows which states grew faster vs. slower than the US average:

One of the things my system does is let you render the result of almost any query in the system automatically on a thematic map for states, counties, or MSAs – all without knowing a single thing about shapefiles or cartography or other stuff you probably don’t care about. So not only can you run a query and apply a function like percent change, you can also map it just as easily as bar charting it or showing it in a table or exporting to Excel. And you get the map in a format you can actually use.

In this case I took advantage of a built-in function that lets me do separate colors for values above or below a threshold. One of the built-in values is a comparison vs. national. You don’t even have to know what that national value is – the system pulls it automatically, though in this case you can see the value of 9.7% in the legend. I elected a monochrome pattern, which is one of the built-in coloring algorithms, but I could have picked a different one. Again, all of this done in less than 30 seconds end to end.
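
The above-or-below-a-threshold coloring can be sketched in a few lines of Python. This is an illustration only, not how the system is implemented. The 9.7% national figure is the one shown in the map legend; the two hex colors, the function name, and the placeholder state values are assumptions.

```python
NATIONAL_GROWTH_PCT = 9.7  # national value shown in the map legend

# Two shades of one hue stand in for a monochrome pattern (arbitrary choices).
COLOR_ABOVE = "#08519c"
COLOR_AT_OR_BELOW = "#c6dbef"


def color_by_threshold(values, threshold):
    """Map each area to one of two colors depending on its value vs. the threshold."""
    return {
        name: COLOR_ABOVE if value > threshold else COLOR_AT_OR_BELOW
        for name, value in values.items()
    }


# Placeholder growth rates in percent, invented for illustration.
growth = {"State A": 12.0, "State B": 3.5, "State C": 9.7}

for name, color in color_by_threshold(growth, NATIONAL_GROWTH_PCT).items():
    print(name, color)
```

In this sketch a value exactly equal to the threshold falls in the lower class; a real tool would have to pick a convention for that case.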

Liberate Yourself from Bogus Restrictions and Hand Summarization

One of the things that irks me is how so many of the free query tools put bogus restrictions on what you can do. One of them is asking you to select a geographic level before doing any analysis, like “Do you want to look at states, counties, or MSAs?” The Census Bureau’s population estimates data is a classic example of this.

But why not let me query anything I want together as long as the data is comparable (which population certainly is)? Here’s another 30 second chart I created that compares population growth last year in what I call my “Chicago vertical stack.”

Mayor Daley should be pleased. The city of Chicago didn’t beat the national average, but it beat out every other level in my stack. Back to the city, anyone?

My system defines basically all of the standard hierarchies, as you can see. This includes things like the BEA Economic Area that aren’t widely used. The Census Bureau doesn’t actually slice data by Economic Area, but since I know which counties make it up, my system automatically rolls up the data if it is present and there’s a rollup rule. This lets you use constructs like Economic Area that may never have been easily usable before.

You may be wondering: if I can roll up standard hierarchies, why not custom ones? Well, I can. You can see the line labeled “CMAP Service Area.” This is the seven county area serviced by the Chicago Metropolitan Agency for Planning. You can see that it is labeled as user defined, and it works just like any standard geography in the system.
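
Here is a rough Python sketch of what rolling county data up into a region might look like, whether the region is a standard one or user defined. It is a guess at the general technique, not the system's code. It assumes the rollup rule for a count like population is a simple sum, and the region name, county names, and numbers are placeholders.

```python
def rollup(member_counties, county_values):
    """Sum county values for a region.

    Returns None when any member county has no data, mirroring the idea
    that a rollup only happens if the underlying data is present.
    """
    total = 0
    for county in member_counties:
        if county not in county_values:
            return None
        total += county_values[county]
    return total


def pct_change(start, end):
    """Percent change from start to end."""
    return 100 * (end - start) / start


# Hypothetical user-defined region and county populations.
custom_region = ["County A", "County B"]
pop_start = {"County A": 1000, "County B": 3000, "County C": 500}
pop_end = {"County A": 1100, "County B": 3100, "County C": 550}

region_start = rollup(custom_region, pop_start)
region_end = rollup(custom_region, pop_end)
print(f"Custom region growth: {pct_change(region_start, region_end):.1f}%")
```

Note that the growth rate is computed from the summed totals. Averaging the member counties' individual growth rates would give a different, and wrong, answer for the region.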

Now you’re asking, “How is it possible to have done all this in 30 seconds? You couldn’t possibly have defined that CMAP region and created this list in 30 seconds.” No, I couldn’t. But I didn’t have to. Since this is a way I like to look at the Chicago region, I created them a long time ago, and saved them to a list. The system lets me do that, and now when I come back, I can run queries like this against my saved list without having to go through the tedium of re-selecting all the geographies. Now that I’ve got most of my frequently accessed geography lists defined, I find that more often than not I bypass the geography selection process completely, saving even more time. It’s awesome for benchmarking.

Beta Testers Wanted

If you work for an organization that uses data – demographic, economic, and other data about cities, counties, regions, and states – then you’ll definitely want to check this out. The price will be $495 per year (only a bit over $40 a month), which is frankly a no-brainer for any real organization. Keep in mind that just getting the IRS to ship you their data on a CD – in the form of over 3,500 Excel spreadsheets, I might add – is $500. You’d pay about $360 a year just for a GIS-free mapping tool like Indiemapper. If you participate in the beta test, though, you can sign up at a price of $395 a year (a bit over $30 a month) – a 20% discount. Right now I’m only going to roll out annual plans. A monthly plan could follow, but it would be at a significant price premium to annual and not include IRS migration data.

I hope you’ll be interested. To sign up for the beta and qualify for the discount, again with no obligation to buy anything, just send me a note. If your head hurts as badly as mine does after trying to use the new Census Factfinder, you’ll be glad you did. And thanks for your help.

Topics: Demographic Analysis, Technology

5 Responses to “A Better Way to Look at Data – Beta Testers Wanted”

  1. Zathras says:

    Looks very interesting. One question: are the analysis tools restricted to the data which is included? Or can the user use the tools on his or her own data?

    Either way, I’ll shoot you an email for this.

  2. Right now it is restricted to system data with one exception: you can create a thematic map of your own data. I do want to eventually allow user data loading, but first need to make sure the core product works and is market viable. Thanks for your interest. Cheers.

  3. Alon Levy says:

    I’ll add to Aaron’s blurb and say that beyond letting you organize data, the tool is useful to learn things that are not always obvious. For example: using the state migration income numbers and a little bit of work, you can see that in California the people who move out are richer than the people who move in – perhaps contrary to the stereotype of California pricing the poor out.

  4. Jarrett says:

    Aaron.

    Any chance of getting below MSA and County, to maps of data by city or census tract?

    I ask because out west, counties and MSAs are just too big. You can’t say anything useful about Los Angeles, say, with data at that level.

    Or will that be in v2.0?

    Cheers, Jarrett

  5. Jarrett, right now, I do not support sub-city/county data like zip code or census tract level, though the architecture is built to be extensible to it. Right now I’ve got plenty of goodness to deliver at the levels above that. But stay tuned.

About the Urbanophile

Aaron M. Renn is an opinion-leading urban analyst, consultant, speaker, and writer on a mission to help America’s cities thrive and find sustainable success in the 21st century.

Copyright © 2006-2014 Urbanophile, LLC, All Rights Reserved - Click here for copyright information and disclosures