Blog - Data Is Life
The personal blog of Roger Filmyer, focusing on the interaction between policy and data analysis/statistics.
Last November, Strava released a feature called the Global Heatmap on their labs page. This weekend, a number of security analysts used this map to show the importance of military device security. When focused on countries like Djibouti, Syria, or Somalia, the map clearly shows Western military outposts that might otherwise be nondescript airstrips, near-impossible to find in vast desert expanses… So much cool stuff to be done. Outposts around Mosul (or locals who enjoy running in close circles around their houses): pic.
Ever had a hard drive crash and burn? Backblaze, an online backup company, has a little over 6 a day. It’s a small fraction of the 35,000 they have up and running right now, but it means they are constantly logging performance and diagnostic metrics for their hard disk arrays. Conveniently, they have also decided to release this data to the public in a set of CSV files, with a script to import them into an SQL database.
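As a rough illustration of the CSV-to-SQL step, here is a minimal sketch using Python's built-in `csv` and `sqlite3` modules. This is not Backblaze's own import script. The table name, the four columns, and the sample rows are assumptions made up for the example; the real files carry many more diagnostic columns.

```python
import csv
import io
import sqlite3

# Hypothetical sample shaped like a daily drive-stats CSV.
# Column names and values are assumptions for illustration only.
SAMPLE_CSV = """date,serial_number,model,failure
2015-01-01,SN001,MODEL_A,0
2015-01-01,SN002,MODEL_A,1
2015-01-01,SN003,MODEL_B,0
2015-01-02,SN001,MODEL_A,0
2015-01-02,SN003,MODEL_B,1
"""


def load_drive_stats(csv_text, conn):
    """Create the (hypothetical) drive_stats table and insert every CSV row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS drive_stats ("
        "date TEXT, serial_number TEXT, model TEXT, failure INTEGER)"
    )
    rows = [
        (r["date"], r["serial_number"], r["model"], int(r["failure"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO drive_stats VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def failures_per_day(conn):
    """Return (date, number of failed drives) pairs, ordered by date."""
    cur = conn.execute(
        "SELECT date, SUM(failure) FROM drive_stats GROUP BY date ORDER BY date"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
load_drive_stats(SAMPLE_CSV, conn)
print(failures_per_day(conn))
```

Swapping `io.StringIO(csv_text)` for an open file handle would let the same loader read a downloaded CSV, assuming its header contains the columns used here.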
“Payroll employment rises by 280,000 in May; unemployment rate essentially unchanged (5.5%) http://t.co/1Y9cSWJUIB #JobsReport #BLSdata” (BLS-Labor Statistics, @BLS_gov, June 5, 2015). Swing and a miss! But this time it’s a surprise in the opposite direction: I was 80,000 jobs too short. Here’s what the distribution of guesses looked like for this month’s go-around: One thing I’ve noticed is the remarkable consistency of the average guess. I’ve plotted out all of the rounds of #NFPGuesses from February to today, and notice how the average has come out at around 225,000 each time.
Yesterday, John Bohannon of io9 posted an article, “I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here’s How.”, that went absolutely viral. Bohannon ran an expose on the pervasiveness of bad nutritional science and bad nutritional science reporting by creating his own bogus study. It’s a fascinating read showing how the system breaks down if you can get through a few initial barriers. But I wanted to talk specifically about p-hacking, about what John & Co.
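One common form of p-hacking is to measure many outcomes and report whichever one happens to cross p < 0.05. The sketch below is only the textbook arithmetic behind that, not a reconstruction of the chocolate study: it assumes every test is independent and that no real effect exists, and the test counts in the loop are arbitrary.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of `num_tests` independent tests
    comes out 'significant' at level `alpha` when no effect is real."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Arbitrary test counts, chosen only to show how quickly the odds climb.
for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

With a single test the chance is just alpha, but it grows with every extra outcome you check, which is why measuring lots of things on a small sample almost guarantees a "finding".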
(Spoiler: It’s not, despite a very strong correlation.) Source: Spurious Correlations at tylervigen.com. When I was in my Econometrics class at college, my professor drilled into me the “Ten Commandments of Applied Econometrics”, from an influential paper of the same name by Peter Kennedy. These rules apply as much to econometrics as they do to any statistical modelling exercise: Thou shalt use common sense and economic theory. The “Common Sense” that Kennedy (and by extension, Professor Khemraj) talks about is truly basic methodology.
Last weekend, a post on Reddit’s linguistics subforum showing a map of linguistic diversity was a big hit. This map used a metric called Greenberg’s Linguistic Diversity Index, which is the probability that two randomly chosen inhabitants of a given country have different mother tongues. States like largely homogeneous South Korea and Haiti have low scores (0.003 and 0.000, respectively), while places like Tanzania and Papua New Guinea, where every village might speak a different language, have LDIs of 0.
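The index can be computed as one minus the sum of squared population shares, LDI = 1 - Σ pᵢ², where pᵢ is the fraction of people whose mother tongue is language i. Here is a minimal sketch of that calculation. The speaker counts are invented for illustration and do not describe any real country.

```python
def linguistic_diversity_index(speaker_counts):
    """Greenberg's LDI: the chance that two randomly drawn people
    have different mother tongues, i.e. 1 - sum(p_i ** 2)."""
    total = sum(speaker_counts)
    if total <= 0:
        raise ValueError("need at least one speaker")
    return 1.0 - sum((count / total) ** 2 for count in speaker_counts)


# Made-up populations, not real census figures.
one_language = [1000]            # everyone shares a mother tongue
two_equal = [500, 500]           # two languages, evenly split
many_small = [10] * 100          # a hundred equally sized language groups

for name, counts in [("one", one_language), ("two", two_equal), ("many", many_small)]:
    print(name, round(linguistic_diversity_index(counts), 3))
```

A single shared language gives 0, and the score approaches 1 as the population splits into more and more equally sized language groups.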
If there’s anything that 23andMe, last.fm, Strava, or any of those countless Facebook apps have shown us, it’s that we love analyzing our own data and discovering new things about ourselves. A great source of data is your iTunes library. If you’re anything like me, you listen to music constantly: at home, at work, or on the go. With iPods (and iPhones) having been popular for over a decade, iTunes could potentially have data on a significant portion of your life.
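For a sense of what that data looks like, iTunes can export its library as an XML property list, which Python's built-in `plistlib` can read. The sketch below is one possible starting point, not necessarily the approach taken in the full post. It assumes the usual `Tracks` dictionary with `Artist` and `Play Count` keys, and the tiny library embedded here is invented.

```python
import plistlib
from collections import Counter

# Invented three-track library in the XML plist layout iTunes exports.
SAMPLE_LIBRARY = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>Tracks</key>
  <dict>
    <key>1</key>
    <dict>
      <key>Name</key><string>Song One</string>
      <key>Artist</key><string>Artist A</string>
      <key>Play Count</key><integer>12</integer>
    </dict>
    <key>2</key>
    <dict>
      <key>Name</key><string>Song Two</string>
      <key>Artist</key><string>Artist A</string>
      <key>Play Count</key><integer>3</integer>
    </dict>
    <key>3</key>
    <dict>
      <key>Name</key><string>Song Three</string>
      <key>Artist</key><string>Artist B</string>
    </dict>
  </dict>
</dict>
</plist>
"""


def plays_by_artist(library_bytes):
    """Sum play counts per artist; tracks without a play count add 0."""
    library = plistlib.loads(library_bytes)
    counts = Counter()
    for track in library.get("Tracks", {}).values():
        counts[track.get("Artist", "Unknown")] += track.get("Play Count", 0)
    return counts


print(plays_by_artist(SAMPLE_LIBRARY).most_common())
```

To run it on a real export, read the file in binary mode and pass the bytes to `plays_by_artist`.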
I spent all of yesterday trying to get RStudio Server to build on my iMac. In my opinion, RStudio is the best IDE for R, and being able to run it on another computer remotely is icing on the cake. My Samsung Chromebook with crouton really doesn’t have the “oomph” to… well… do anything meaningful. But building has been anything but a trivial process, and I want to post here to document some pitfalls I’ve found myself running into… this isn’t a well-documented process.
The United States has heard repeated calls for more gun control legislation in the wake of the Sandy Hook Elementary School shooting. Every day it seems there’s a new mass shooting, with dire implications for the state of our country. But these mass shootings are isolated events that have almost been tailor-made to provoke disproportionate media attention. The day-to-day assaults, kidnappings, and murders affect a lot more people. Liberals claim that gun control makes places safer by making guns harder to obtain and to use illegally.
A few days ago, I found a really cool project on Twitter called OpenElections, which is trying to create a master dataset of every certified election result in the US. It’s gotten a good deal of critical acclaim, including a grant from the Knight Foundation. Unfortunately, the work isn’t easy. If you’re lucky, you’ll get an Excel sheet. But oftentimes you’ll get a bad-quality scan of an image like this…
Condo construction in Brickell, Miami (southbeachcars on Flickr). A few weeks ago, Stephen Smith (who runs Market Urbanism) was comparing the fates of Miami and Vancouver, two cities that have experienced massive housing construction booms. Both cities have grown tremendously… and grown upward. This comes in the face of major land constraints: the Everglades for Miami, and the Cascade Mountains for Vancouver. But how much do these barriers actually impact development?