Sunday, February 27th, 2011
You all know that I love doing data driven posts. But I found myself frustrated that it would take me literally hours to create even simple blog posts doing what I figured was very basic analysis, like putting up something about what happened in the latest Census estimates release. There was just tons of tedious work involved.
It can be surprisingly difficult to answer seemingly basic questions about cities, like:
- Which large metro areas grew their GDP the most in the last year?
- How does Chicago benchmark against New York on job creation?
- What counties in Indiana increased their Hispanic population share in the last decade by the most?
- How did the population growth rate in the city of Chicago compare to Cook County, the metro area, state, and nation last year?
- Where do the people who move to Indianapolis come from?
Answering these questions can involve lots of drudge work to download raw data, manipulate it in Excel to find what you want, then type it into an HTML table or put it into a chart you can use in a post, presentation, or document. It can literally take hours, sometimes days.
There are tons of free tools that let you access data, but every one I’ve seen is almost useless for real data analysis. They more or less only let you look up facts – like the population of Chicago – or display grids of numbers. It’s telling that the Census Bureau’s tool is actually called “Fact Finder.” If they create graphs, it’s mostly what they want to show, not what you want to, and almost invariably only in Flash, so that you can’t take it out of their system without doing a screen shot.
Conversely, there are tons of pro tools that do fantastic stuff, programs like SAS, ArcGIS, or Moody’s Economy.com. The problem is that these cost huge amounts of money, are aimed at high end power users doing hard core statistical analysis and the like, or both; and are often hard to use as a result. There’s a reason that there’s an entire job category out there called “GIS Analyst”.
So I gave up on my search to find something that met my needs, and instead decided to build my own private database and query tools. Then I discovered that’s what half the world is doing, which seems like a waste. So I figured if this is so valuable to me, which it is, maybe it’s valuable to others and they might use it too.
And so my latest venture, the Telestrian data terminal, was born (see www.telestrian.com). For people who work with data about cities, counties, regions, and states, Telestrian is all about providing three big-time benefits:
- Huge time and money savings. I can honestly say that having Telestrian for my own use during development has reduced the amount of time I spend on many data analysis tasks by over 95%. I’m serious. Stuff that would have taken me hours or been nearly impossible before I can now do in a few seconds. And as we all know, time is money.
- New capabilities. Notice that I’ve been posting more maps lately? That’s because I can actually make them now. And with Telestrian, so can you – and a lot more.
- New revenue opportunities. If you are a consultant, I’ll show you how Telestrian can power new types of engagements you can sell. In fact, I originally thought it would make a nice proprietary tool for my own consulting business.
And if you’re wondering whether this is the system with the IRS migration data, the answer is yes, so read on.
You can read more about the benefits and walk through a few examples in my white paper, A Better Way to Find, Look At, Analyze, and Display Civic Data. I’ll highlight a few examples of the benefits in action.
Massive Time Savings
You’ve seen lots of studies that rate metro areas on college degree attainment, like Brookings’ wonderful State of Metropolitan America. Let’s say we’re doing an update to that study for them, and want to look specifically at growth in the share of people who have professional or graduate degrees. Which of the top 100 metro areas had the greatest change in their percentage of population with graduate or professional degrees?
With the Telestrian system, you can answer that question in about 30 seconds. We just go to that data element and do it. Telestrian gives you a common toolset on every data element. The Query tab is what most people gravitate to, since it is what lets you look up data by geography and date like other sites do. But that’s arguably the least powerful thing in the system. If you go to Analyze, you can run powerful parameterized queries that let you mine the data in a snap. Here’s the query you want. You can click to enlarge this screen shot:
Note that we set a threshold to only look at places greater than 510,000 people in population. This gives us the top 100, which is what Brookings looks at.
Bam, here’s the answer:
You’ll note that on the left there are a ton of options for working with the results. Maybe we want to dump that into a blog post like this one in a form you can actually read, for example. In just a couple clicks we can export an HTML table that we can paste right here:
| Row | Metro | 2000 | 2009 | Change in % of Total Adult (25+) Population |
| --- | --- | --- | --- | --- |
| 1 | Washington-Arlington-Alexandria, DC-VA-MD-WV | 607,122 (19.1%) | 820,534 (22.6%) | 3.49% |
| 2 | Buffalo-Niagara Falls, NY | 74,319 (9.5%) | 96,625 (12.5%) | 3.05% |
| 3 | Baltimore-Towson, MD | 201,072 (11.9%) | 267,724 (14.8%) | 2.95% |
| 4 | Boston-Cambridge-Quincy, MA-NH | 455,971 (15.4%) | 574,092 (18.3%) | 2.94% |
| 5 | Poughkeepsie-Newburgh-Middletown, NY | 41,647 (10.5%) | 57,859 (13.3%) | 2.85% |
| 6 | Worcester, MA | 50,857 (10.3%) | 70,294 (13.1%) | 2.82% |
| 7 | Hartford-West Hartford-East Hartford, CT | 96,943 (12.5%) | 123,378 (15.3%) | 2.80% |
| 8 | St. Louis, MO-IL | 158,331 (9.0%) | 220,061 (11.6%) | 2.61% |
| 9 | Portland-South Portland-Biddeford, ME | 34,082 (10.2%) | 46,163 (12.8%) | 2.54% |
| 10 | Columbia, SC | 37,534 (9.1%) | 55,623 (11.6%) | 2.52% |
Or maybe put these into a bar chart. Voilà!
Yes, the Telestrian system even truncates those overly long metro area names if you want it to. At this point we’ve spent about one total minute in the system.
If you’ve worked with this data at all, you’ll know that it comes from two completely different data sets. The 2000 data comes from Census 2000 and the 2009 data comes from the American Community Survey. So you’d have to manually extract both, merge them, merge in the population data somehow, trim it down to the top 100 metros, calculate the percentage attainment, calculate the percentage point change, sort on that, then hand create the tables or charts. But what’s worse, you may remember that the Census 2000 data is distributed in that old 1990s-era CMSA/PMSA stuff that isn’t comparable with today’s metro area definitions. So you have to download the county data and manually re-aggregate all the 2000 data to current metros yourself, unless you find a source that did it for you already.
Or you can just spend about a minute in the Telestrian tool.
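For anyone curious what the calculation part of that manual workflow boils down to once the data is finally merged, here is a minimal Python sketch. It is only an illustration, not how Telestrian does it: the records, field names, and the `grad_share_change` function are all hypothetical, and the numbers are made up. The only thing taken from the example above is the 510,000 population threshold.

```python
# Hypothetical, made-up records standing in for the merged Census 2000 / ACS 2009 data.
records = [
    {"metro": "Metro A", "pop": 2_000_000,
     "adults_2000": 1_200_000, "grad_2000": 120_000,
     "adults_2009": 1_300_000, "grad_2009": 169_000},
    {"metro": "Metro B", "pop": 800_000,
     "adults_2000": 500_000, "grad_2000": 50_000,
     "adults_2009": 520_000, "grad_2009": 57_200},
    {"metro": "Metro C", "pop": 300_000,  # below the threshold, so it gets dropped
     "adults_2000": 180_000, "grad_2000": 9_000,
     "adults_2009": 190_000, "grad_2009": 19_000},
]


def grad_share_change(rows, min_pop=510_000):
    """Return (metro, share_2000, share_2009, point_change), largest change first."""
    results = []
    for r in rows:
        if r["pop"] <= min_pop:
            continue  # keep only places above the population threshold
        share_2000 = 100.0 * r["grad_2000"] / r["adults_2000"]
        share_2009 = 100.0 * r["grad_2009"] / r["adults_2009"]
        # Percentage POINT change in the share, not percent change in the count.
        results.append((r["metro"], share_2000, share_2009, share_2009 - share_2000))
    results.sort(key=lambda row: row[3], reverse=True)
    return results


for metro, s0, s1, change in grad_share_change(records):
    print(f"{metro}: {s0:.1f}% -> {s1:.1f}% ({change:+.2f} points)")
```

Note that this covers only the last few steps. The extraction, the merge across data sets, and the re-aggregation of 2000 county data to current metro definitions are where most of the hours go.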
Beyond change in the percentage of a parent data value, there are several other functions you can use in your search too, such as total raw value, total change, percent change, density, and location quotient. The Telestrian data terminal can almost turn you into a one man Brookings Institution.
Calculating data like the above is tedious, but conceivably doable. But there are some things that are almost impossible to do yourself without the right tools. One of them is to create thematic maps of your results, like those red-blue election maps. Most people create those with ArcGIS, but if you don’t have or can’t use it, or don’t have a graphic designer on call, making one can be almost impossible. I sure didn’t know how to make them.
Using ArcGIS to make a simple thematic map is like using a tactical nuclear weapon to get rid of the spider in your bath tub. That’s why I built it right into the system, letting you render almost any of those Analyze queries directly to a thematic map. We do that in the app on the Map tab, which is similar to Analyze but gives you some other options. Let’s just map our same query for all US metros:
In blues, we see places where the percentage of people with graduate degrees increased, in reds those where it actually decreased. I could have picked my own thresholds for coloring, but decided to go with one of the built in algorithms, in this case a 5 bucket sort. This took about 30 seconds total to create by the way, so don’t think that just because I filed this under “new capability” it wouldn’t save you lots of time too, even if you already have and can use ArcGIS.
By the way, these maps are image files (PNG), not Flash, so you can actually right click and save them to use them as you see fit. And you can make them pretty much as big or small as you want with no resolution loss or distortion. To see an example of what I mean by that, just click here.
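To give a feel for what a 5 bucket classification can look like, here is a minimal Python sketch. It assumes equal-count buckets assigned by rank, which is one common approach and not necessarily what the built in algorithm does. The `quantile_buckets` function is hypothetical; the sample values are the ten percentage point changes from the table earlier in this post.

```python
def quantile_buckets(values, n_buckets=5):
    """Assign each value a bucket index from 0 (lowest) to n_buckets - 1 (highest).

    Values are ranked, then split into buckets holding roughly equal counts.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * n_buckets // len(values)
    return buckets


# Percentage point changes for the ten metros in the table above, in table order.
changes = [3.49, 3.05, 2.95, 2.94, 2.85, 2.82, 2.80, 2.61, 2.54, 2.52]
print(quantile_buckets(changes))  # [4, 4, 3, 3, 2, 2, 1, 1, 0, 0]
```

Each bucket index would then be mapped to a color, with a separate choice needed for how to split blues and reds around zero.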
Make More Money
This one also saves time and gives you new capabilities, but additionally it enables consultants to make more money too. Cities and states spend hundreds of millions of dollars on human capital and “brain drain” initiatives. But frankly very few places have much of a clue about their human capital networks. Where do people who move in come from? Where do people who leave go? How much money and how big of families are they taking or leaving?
A big problem is the data. The Census Bureau only publishes net migration, but doesn’t talk about where people come from or go to. The IRS publishes that in its migration data, but it is super painful to use. For one thing, other than the last handful of years, the data only comes in the form of over 3,500 Excel spreadsheets. (They will mail those to you on a CD for $500). And the data only tracks state-state and county-to-county when often what we really care about is metro-metro or metro-state. Unless you have and can use (sparse matrix, anyone?) a tool like SAS (which is thousands of dollars a year and doesn’t come with any data) and crack the code on data import, it’s virtually hopeless.
But with Telestrian, all that data has been processed for you, and presented not just at the county-county and state-state level, but also at the metro-metro, and metro-state level. And there are tons of summary metrics taken from the IRS files, as well as other bespoke calculations of things like migration rates and intra-metro migration (e.g., core to suburb moves). Over 100 items in all.
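The basic idea behind turning county-to-county flows into metro-to-metro flows can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual processing: the county names, the lookup table, the flow counts, and the `rollup_flows` function are all hypothetical, and the real IRS files have many more fields and quirks.

```python
from collections import defaultdict

# Hypothetical lookup of which metro each county belongs to.
county_to_metro = {
    "County 1": "Metro X",
    "County 2": "Metro X",
    "County 3": "Metro Y",
    "County 4": "Metro Y",
}

# Hypothetical county-to-county flows: (origin, destination, number of movers).
county_flows = [
    ("County 1", "County 3", 100),
    ("County 2", "County 3", 50),
    ("County 1", "County 2", 30),   # both in Metro X: an intra-metro move
    ("County 4", "County 1", 80),
    ("County 3", "County 5", 10),   # County 5 is not in the lookup, so it is skipped
]


def rollup_flows(flows, lookup):
    """Sum county-to-county flows into (origin metro, destination metro) totals."""
    metro_flows = defaultdict(int)
    for origin, dest, movers in flows:
        origin_metro = lookup.get(origin)
        dest_metro = lookup.get(dest)
        if origin_metro is None or dest_metro is None:
            continue  # one end of the move is outside any metro in the lookup
        metro_flows[(origin_metro, dest_metro)] += movers
    return dict(metro_flows)


for (origin, dest), movers in sorted(rollup_flows(county_flows, county_to_metro).items()):
    print(f"{origin} -> {dest}: {movers}")
```

Pairs where origin and destination are the same metro are the intra-metro moves, which is why they are kept rather than thrown away.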
Want to know where the money is going when it leaves Atlanta and how much of it ends up there? Here you go, looking at 2000-2008:
Of course this data is available in raw form, exportable to Excel if you want it. Again, it’s about 30 seconds or so to make this.
This only scratches the surface of what you can do with migration. I hope it is easy to see that there are huge market opportunities for consultants to use this to start helping cities and states map out their human capital networks and find ways to take advantage of them. Much more on this later.
So What Is Telestrian?
So what does Telestrian actually do? A full feature summary is available for your perusal, but in brief, Telestrian provides the following.
- Data Repository. It contains an aggregated data repository of over 600 data elements, including core data such as population, sex, age, race, migration, education, immigration, commuting, highway congestion, health data, labor force and unemployment, jobs and wages, GDP, personal and household income, poverty, and more. I consider this a “starter set” and there’s virtually unlimited room to expand, which I have big ambitions to do.
- Common Analysis Toolset. Run parameterized queries to mine the data and analyze results sets. Includes things like filtering by state or population; applying functions like percent change, total change, or location quotient; and calculating CAGR, index values, percentage of parent, and much more.
- Task Automation. In addition to automatically applying functions like the above, Telestrian also automatically applies rollups of regions, allows saving of commonly used geography lists so you don’t have to recreate them over and over, defining custom regions, etc. The various components of the system are also integrated to enable rapid end to end processing.
- Visualization. Render results to bar, column, area, line, and pie charts (Flash or image), or export to Excel/CSV or HTML tables. Thematic maps can be made at the national level for states, counties or metros, or at the state level for counties.
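For readers who have not met a couple of the functions named above, here is a minimal Python sketch of the standard textbook definitions of location quotient and CAGR. The function names and sample numbers are hypothetical, and this says nothing about how Telestrian computes them internally.

```python
def location_quotient(local_part, local_total, national_part, national_total):
    """Local share of something divided by the national share of the same thing.

    A value above 1 means the place is more concentrated in it than the nation.
    """
    local_share = local_part / local_total
    national_share = national_part / national_total
    return local_share / national_share


def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values that are `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Made-up numbers: 20 of 100 local jobs in a sector vs. 10 of 100 nationally.
print(location_quotient(20, 100, 10, 100))   # 2.0
# Made-up numbers: growing from 100 to 121 over 2 years.
print(f"{cagr(100, 121, 2):.1%}")            # 10.0%
```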
The focus of the system is data about cities, counties, regions (MSA, CSA, EA, etc), and states, though national level data is also available.
Pricing is currently on an annual basis at only $495/year (a bit over $40 a month). But for a limited time to my loyal readers who work for organizations who might be able to use this, I am offering it at $395/year (less than $35/month). If you use it for one project like that grad degree one, it has already paid for itself. I might offer a monthly plan in the future, but it will be at a price premium to the annual, and not include access to IRS data. A free trial is available with no credit card required and no obligation so you can try it for yourself without risk. IRS data is not included in the trial.
Consider that just to have the IRS send you their raw data on a CD – in the form of over 3,500 spreadsheets – is $500 by itself. You’d pay well over $300/year just for GIS-free mapping with something like Indiemapper. To say nothing of the untold thousands you could spend on high end products.
For those of you who work as consultants, planners, journalists, analysts, economic developers, agency staffers, etc. who work with this data and need to do more than just look up simple facts, I’d ask you to take a look, and if you see the value – which I’m confident you will – please buy.
Since this is the official launch day, I’d ask that you please be gentle if we run into performance or other type issues right here at the start. I will increase site capacity as fast as I can if need be, and of course candid feedback is always welcome. Again, the link is www.telestrian.com.
I’ll wrap up with a couple more fun examples, but before I do I want to tell you a few problems Telestrian is NOT designed to solve:
- If you need statistical analysis like multi-variate regressions, you need SAS or SPSS or something.
- If you need data at the zip code, Census tract, or other level below city or county, you need tools from ESRI or one of the many specialist providers who will help you decide where to locate your store or whatever else you need.
- If you need to look at detailed breakdowns like jobs at the 4-digit NAICS code or black-female-and-hispanic, look for something like Moody’s Economy.com.
- If you need to know the unemployment rate the minute it hits the wire, get a Bloomberg terminal.
- If you need non-US data, again go get Moody’s Economy.com.
If you have problems like these that involve very detailed, complex, or time sensitive considerations, I’m sorry. You probably do need to spend a lot of money and hire some specialists.
Fun With Data
Here are a couple more fun pieces of data analysis.
First, a comparison of job growth in New York vs. Chicago vs. the US. I actually go through how to do this example in the Telestrian User’s Guide (yes, the system actually has documentation).
This is a great example of how you can query data at any geography level simultaneously if the data supports it, and the use of indexes for comparison of regions with very different sizes. If you’re familiar with the Current Employment Statistics, you’ll also know that the US data and Metro data come from two separate data sets, but I allow you to query them together.
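If indexing is new to you, the idea is simply to rescale each series so its first value is 100, which puts a metro and the whole nation on the same chart. Here is a minimal Python sketch; the `index_series` function and the job counts are hypothetical, made-up numbers rather than real employment data.

```python
def index_series(values, base=100.0):
    """Rescale a series so that its first value equals `base`."""
    first = values[0]
    return [base * v / first for v in values]


# Made-up job counts for a small region and a much larger one.
small_region = [200, 210, 190]
large_region = [20_000, 20_400, 19_800]

print(index_series(small_region))  # [100.0, 105.0, 95.0]
print(index_series(large_region))  # [100.0, 102.0, 99.0]
```

Once both series start at 100, the raw size difference drops out and only the relative growth is left to compare.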
By the way, I created every single chart in my Chicago vs. New York blog post from last fall combined in about five minutes using a development version of the system. If you at all benchmark or compare cities, I think you’re in the sweet spot of the product. That’s doubly true if you compare places at different geographic levels (such as metro vs. nation or county vs. state, etc) since Telestrian puts no arbitrary restrictions on what geographies you can query together.
One more. Here’s a national county map of unemployment rates for October 2010 (not seasonally adjusted):
It’s a cool graphic, but I especially posted it because data visualization guru Nathan Yau wrote a long blog post at his widely read blog Flowing Data that explained how to create a map almost like this in 14 easy steps – easy if you know how to program in Python, that is. As he put it, “There are about a million ways to make a choropleth map. You know, the maps that color regions by some metric. The problem is that a lot of solutions require expensive software or have a high learning curve…or both.” Yau’s solution requires you to know how to write computer software. Telestrian is almost de minimis in cost to any real organization and only requires you to know how to surf the net. With that, about 30 seconds later you can have your map.
Thanks so much for reading and I hope you’ll check it out and decide to buy – remember, it’s www.telestrian.com. It’s a great way to support the work I do – but much more importantly I’m confident the business value is very real and significant because I’m enjoying it every day myself.