Tuesday, March 22nd, 2011
This is another post in which I want to highlight some of what you can do with my Telestrian data terminal system. Today I want to look at migration data. Please note that while Telestrian is free to try, free accounts don’t have access to this data.
Study after study, expert after expert touts the importance of human capital to the 21st century knowledge economy. Virtually every city and state is spending gigantic sums of money – often aggregating to hundreds of millions per state – on “brain drain” initiatives and other attempts to retain or attract people. Yet shockingly few places have ever taken the time to even attempt to research their human capital networks. That is, where do people who move to a place come from? Where do the ones who leave go? How much money do they take with them? How are these flows changing over time? How do my migration rates compare with other places? Is my net migration low because people are leaving or because nobody is coming?
These strike me as very basic questions. One of the things that led me to build Telestrian is that answering them was difficult. The Census Bureau only reports overall net migration, and nothing about origins and destinations. The other best source of data is the IRS, which publishes place-to-place migration based on where people file tax returns. This isn’t totally comprehensive, but it’s a huge data set. The problem is that it is ridiculously cumbersome to use. Professional demographers with mad SAS skillz have done it, but little of this analysis has penetrated the public consciousness, and it has been hard for mortal man to do anything with the data for a few reasons.
One is the way the IRS organizes the data. First, they partition it by geography type, flow direction, state, and year. This results in over 200 files being issued per year. Only a handful of recent years have consolidated files. Plus, the format is Excel. Add it up and you’ve got to process over 3,000 spreadsheets just to get the data.
Plus, each of these thousands of files contains a very large number of record types and data elements, so you have to figure out which ones you want. You can’t just slurp the whole thing.
Additionally, the data is only supplied at the county-county and state-state level. But with metro areas being the drivers of the modern economy, a metro area view of the data is what is really needed. That isn’t supplied. And only in and out migration is available, and other potentially useful metrics aren’t calculated.
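To make the metro-level gap concrete, here is a minimal sketch of how county-to-county flows could be rolled up into metro-to-metro flows. It is only an illustration and not how Telestrian does it: the crosswalk, the county codes, the flow numbers, and the function name `rollup_to_metro` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical county-to-metro crosswalk; codes and names are placeholders.
COUNTY_TO_METRO = {
    "county_a1": "Metro A",
    "county_a2": "Metro A",
    "county_b1": "Metro B",
}

# Hypothetical county-to-county flows: (origin, destination, returns).
COUNTY_FLOWS = [
    ("county_a1", "county_b1", 120),
    ("county_a2", "county_b1", 30),
    ("county_b1", "county_a1", 80),
    ("county_a1", "county_a2", 500),  # stays inside Metro A
    ("county_a1", "county_x9", 40),   # destination is not in any metro
]


def rollup_to_metro(flows, crosswalk):
    """Sum county-to-county flows into metro-to-metro flows.

    Flows touching a county that is missing from the crosswalk are skipped.
    Flows that start and end in the same metro are returned separately.
    """
    between = defaultdict(int)
    within = defaultdict(int)
    for origin, dest, returns in flows:
        origin_metro = crosswalk.get(origin)
        dest_metro = crosswalk.get(dest)
        if origin_metro is None or dest_metro is None:
            continue
        if origin_metro == dest_metro:
            within[origin_metro] += returns
        else:
            between[(origin_metro, dest_metro)] += returns
    return dict(between), dict(within)


between, within = rollup_to_metro(COUNTY_FLOWS, COUNTY_TO_METRO)
print(between)
print(within)
```

Keeping the within-metro flows apart from the between-metro flows matters, because a move from one county to its neighbor in the same metro is not a metro-to-metro move at all.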
I found it impossible to work with manually myself, so decided to bite the bullet and process all those files into a database, creating all the elements I wanted on it, and putting the query and analysis functions I needed on top of it.
The result is to the best of my knowledge the first system anywhere that gives non-specialists a way to work with this data effectively. Some of what I have includes:
- A full matrix of in, out, gross, and net migration for households (tax returns), people (exemptions), income (AGI), household size (exemptions/return), and household income (AGI/return)
- Aggregated flows at the metro-metro and metro-state level in addition to state-state and state-county
- Other summary data, such as total in-migrant returns and dozens of others, partitioned by data element, so they can easily be queried across states, years, etc.
- Additional derived data such as intra-metro migration (e.g., core to suburb) and migration rates.
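The matrix in the first bullet can be sketched in a few lines. This assumes the three raw elements are returns, exemptions, and AGI, as described in the list above. The numbers are made up, and the function names `migration_matrix` and `per_return` are hypothetical.

```python
ELEMENTS = ("returns", "exemptions", "agi")


def migration_matrix(inflow, outflow):
    """Build in, out, gross, and net values for each data element."""
    matrix = {}
    for element in ELEMENTS:
        i, o = inflow[element], outflow[element]
        matrix[element] = {"in": i, "out": o, "gross": i + o, "net": i - o}
    return matrix


def per_return(flow):
    """Derived per-household measures for one direction of flow."""
    return {
        "household_size": flow["exemptions"] / flow["returns"],
        "household_income": flow["agi"] / flow["returns"],
    }


# Made-up numbers for one place in one year (AGI in dollars).
inflow = {"returns": 1000, "exemptions": 2000, "agi": 50_000_000}
outflow = {"returns": 800, "exemptions": 1800, "agi": 44_000_000}

matrix = migration_matrix(inflow, outflow)
print(matrix["returns"])    # households
print(per_return(inflow))   # size and income of arriving households
print(per_return(outflow))  # size and income of departing households
```

In this toy case the place gains households on net, yet the departing households are larger and have higher income per return than the arriving ones. That kind of contrast is why the per-return measures are worth having alongside the raw counts.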
Let’s take a look at what some of this lets us do. And by the way, as we go through these, remember that it took less than a minute to do each of them.
Mapping Regional Migration Networks
One of the things I was interested in was migration between large Midwestern cities. I decided to take a look at gross migration (in+out) because I was interested in human capital circulation. Net migration doesn’t always tell you that, since 10-5=5 and 10005-10000=5 as well. Here’s a look at migration between Chicago metro and other Midwest cities:
Clearly there’s a gravity model at work (size/distance relationship). I find the big gap between the top five and the rest interesting. You can look at the evolution in the year by year view:
Interesting to see the similar patterns. I notice the significant increase in migration with Indianapolis. One could dig into the county level data to see what’s going on (is it legitimately Chicago migration, or is it some type of state thing with Northwest Indiana, which is part of Chicago metro?) in more detail. But perhaps that’s indicative of something interesting perking.
Sources of In-Migration
I’ve long noted that Indianapolis is the Midwest net in-migration champ. But does that mean it’s really a talent magnet, or is it just a mini-version of the Chicago effect, with Indy sucking in mostly people from within the state? Let’s take a look. In addition to metro-metro, we have metro-state flows, so what do they tell us? Here are the top ten net migration sources for Indianapolis from 2000-2008:
You can see that overwhelmingly the net in-migration to metro Indianapolis comes from elsewhere in Indiana. The values plunge off a cliff quickly. In fact, if you look at the full state result list and exclude Indiana, over that entire time frame Indy only attracted a bit over 8,000 people net from out of state. 84% of total net migration came from within Indiana itself. That would tend to undermine the story of Indy as a bigtime magnet.
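The in-state share is simple arithmetic on the metro-state net flows. Here is a minimal sketch with invented numbers (not the Indianapolis figures) and a hypothetical helper name, `in_state_share`.

```python
def in_state_share(net_by_state, home_state):
    """Fraction of a metro's total net migration that comes from its own state."""
    total = sum(net_by_state.values())
    if total == 0:
        raise ValueError("total net migration is zero; the share is undefined")
    return net_by_state[home_state] / total


# Invented net migration into a metro, by source state (people).
net_by_state = {
    "Home State": 3000,
    "State B": 700,
    "State C": 500,
    "State D": -200,  # the metro loses people to State D on net
}

share = in_state_share(net_by_state, "Home State")
out_of_state_net = sum(v for k, v in net_by_state.items() if k != "Home State")
print(f"{share:.0%} of net migration is in-state")
print(f"{out_of_state_net} people net from out of state")
```

One caveat with this kind of share: because net flows can be negative, the in-state share can exceed 100% when the metro loses people to other states overall.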
I showed a migration shed map for Franklin County, Ohio (Columbus) before. The state of Ohio was clearly visible, with some hard migration boundaries at the state line. Some might suggest a university reason for this, so let’s do a similar one for Marion County, Indiana (Indianapolis). In a variant, let’s highlight net inflows in blue and net outflows in red.
That map speaks for itself.
I recently observed that Michigan had America’s lowest in-migration rate for all states. Here’s the list of bottom ten states for in-migration rates (per 1000 persons) in 2008:
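For anyone who wants to roll their own version of this measure, here is a minimal sketch of a rate per 1000 persons. The choice of denominator (total population here) is an assumption, since different methods are possible. The state names and numbers are placeholders, not the actual 2008 data.

```python
def in_migration_rate(in_migrants, population, per=1000):
    """In-migrants per `per` residents."""
    return in_migrants * per / population


# Placeholder data: state -> (in-migrants, population).
states = {
    "State A": (30_000, 2_000_000),
    "State B": (90_000, 3_000_000),
    "State C": (20_000, 4_000_000),
}

rates = {name: in_migration_rate(m, p) for name, (m, p) in states.items()}

# Lowest rates first, keeping the bottom two.
bottom = sorted(rates.items(), key=lambda item: item[1])[:2]
for name, rate in bottom:
    print(f"{name}: {rate:.1f} per 1000")
```

Using a rate instead of a raw count keeps large states from ranking high simply because they are large.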
Even if you don’t like my calculations of variables like this and prefer a different method, I make extracting the underlying data so easy, it’s a snap to roll your own analysis.
I also calculate movements within MSA, including suburb to core, core to suburb, and suburb to suburb. There doesn’t seem to be a good definition of “core” county on a standardized basis. The OMB definition of central vs. outlying doesn’t seem to work. And of course the geographic structure of metro areas varies such that you have to keep many caveats in mind when comparing across them. Nevertheless, I developed a bespoke definition of core I think works well, which is fully documented on the site. (Some places like NYC are defined as multi-core, so there is also an intra-core migration element).
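Once a set of core counties is chosen, classifying the flows inside a metro is mechanical. The sketch below assumes a hypothetical core definition and made-up flows. It is not the definition documented on the site, and the names `classify` and `intra_metro_shares` are illustrative.

```python
from collections import Counter


def classify(origin, dest, core):
    """Label a within-metro move by whether each end is a core county."""
    origin_core, dest_core = origin in core, dest in core
    if origin_core and dest_core:
        return "intra_core"  # only possible in multi-core metros
    if origin_core:
        return "core_to_suburb"
    if dest_core:
        return "suburb_to_core"
    return "suburb_to_suburb"


def intra_metro_shares(flows, core):
    """Percentage of within-metro migration falling in each category."""
    totals = Counter()
    for origin, dest, people in flows:
        totals[classify(origin, dest, core)] += people
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {kind: 100 * n / grand_total for kind, n in totals.items()}


# Made-up single-core metro: (origin county, destination county, people).
core = {"core_county"}
flows = [
    ("core_county", "suburb_1", 500),
    ("suburb_1", "core_county", 300),
    ("suburb_1", "suburb_2", 200),
]

shares = intra_metro_shares(flows, core)
print(shares)
```

Everything downstream depends on the `core` set, which is why the definition of a core county deserves so much care.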
Here’s a breakdown of the percentage of migration in a few sample MSAs shown in comparison:
Looking at Money
As I said, you don’t have to look just at people; you can also look at money. Here’s a map I ran before of top destinations for AGI leaving Atlanta:
Wrapping It Up
This only scratches the surface of what you can do with this data. I have extracted or generated over 100 different data elements from the IRS data sets, all of which require no specialist skills to query, merely a knowledge of how to surf the web. Of course, you have to know something about the data, what you are looking to find, and what the results mean, but at least you don’t have to spend your time groveling through thousands of spreadsheets.
The types of analysis and applications this enables from a human capital perspective are incredible. Louisville showed one great example. Some researchers there did a study that found out where people who left Louisville went, then they did a road show of “Louisville Reunion” tours with the mayor and others reconnecting with the diaspora, dishing out Maker’s Mark, and touting the good things going on back in the ‘Ville. Want to recruit back boomerangers? Wouldn’t it be nice to know where to find them?
I hope you’ll find out the answer to this and many other questions over at www.telestrian.com.
By the way, if you are a more serious researcher who is interested in getting a full copy of this data set so that you can run arbitrary queries, join to your own data sets, etc., shoot me an email. I’m open to licensing full feeds of the data in an easy to use database form as well.