This morning, I took a look through the junk drawer in my kitchen, and it wasn't pretty. I found quite a few useful things, including my car keys, a $5 bill and a Starbucks gift card. But I also found things that were of no use to me whatsoever: keys to a car I haven't owned in 5 years, 100 loose toothpicks and a pack of gum that appears to be from around the turn of the century.
As I was cleaning up the loose toothpicks one...at...a...time, a thought occurred to me: "big data" is a lot like the junk drawer. Having a lot of data (or stuff in the drawer) isn't what's valuable; being able to organize the data in a way that helps you get things done is where the real value is.
Our team of Data Engineers and Data Scientists at MapQuest is working hard on two main themes that will help turn your junk drawer into valuable data that helps you make better decisions and solve real-world problems.
The first theme is adding context to the data. The location data we gather has information about latitude, longitude, altitude, speed, bearing, and much, much more. All of this information is useful, but we can make this information even more valuable by adding context. Knowing that the GPS trace was observed at 40.4870679,-106.8361813 is interesting, but few of us think in terms of latitude and longitude. When we add context to the latitude and longitude, we know that the location is Bob's Conoco Station on Lincoln Avenue in Steamboat Springs, Colorado. We know Bob's Conoco Station is in an area that has a high concentration of businesses, and restaurants in particular. We also know the income and age statistics for the people who live in the area. This context is significantly more interesting to many of our customers than a simple lat/long.
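To make the idea concrete, here is a minimal sketch of adding context to a raw coordinate by looking up the nearest known place. It is an illustration under stated assumptions, not a description of MapQuest's actual pipeline: the `PLACES` table, the `add_context` function and the 50-meter cutoff are all hypothetical, and only the first entry's name and coordinates come from the example above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical place table. Only the first entry's name and coordinates
# come from the post; the other rows are placeholders for illustration.
PLACES = [
    {"name": "Bob's Conoco Station", "category": "gas station",
     "lat": 40.4870679, "lon": -106.8361813},
    {"name": "Example Diner", "category": "restaurant",
     "lat": 40.4880, "lon": -106.8340},
    {"name": "Example Bookstore", "category": "retail",
     "lat": 40.4850, "lon": -106.8300},
]


def add_context(lat, lon, places=PLACES, max_distance_m=50.0):
    """Return the nearest place within max_distance_m, or None if none is close.

    The 50 m default is an assumed cutoff, chosen only for this sketch.
    """
    if not places:
        return None
    nearest = min(places, key=lambda p: haversine_m(lat, lon, p["lat"], p["lon"]))
    distance = haversine_m(lat, lon, nearest["lat"], nearest["lon"])
    if distance > max_distance_m:
        return None
    # Copy the place record and attach how far away the observation was.
    return {**nearest, "distance_m": distance}
```

Calling `add_context(40.4870679, -106.8361813)` returns the record for Bob's Conoco Station along with its distance from the observation. A real system would join against much richer sources (business listings, neighborhood statistics) and would use a spatial index rather than a linear scan.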
The second theme is something I like to call "better than real". Whenever you collect large amounts of data, whether it's location-based or not, you're going to see some degree of inaccuracy. MapQuest data gets its location information from GPS, a constellation of satellites that orbit the Earth. While GPS is typically very accurate, there are a few challenging areas, including urban canyons, underpasses and tunnels, that often produce inaccurate location data. The inaccurate readings we see in these scenarios are real, in the sense that the numbers don't lie. But users don't always want to see what the GPS says happened; they want to see what actually happened.
For example, we often see GPS traces that "drift", so they appear away from the road network. If the person is going 75 mph but the lat/long puts them in a corn field, do we have a daredevil farmer on our hands? That's certainly possible, and it's also possible we've found a newly constructed road. But more likely we've found a trace that has drifted and needs to be snapped to the interstate that's 10 meters away. There's a lot more going on behind the scenes, but the idea is simple: by making intelligent assumptions like this one, we make the data more useful because it better represents what actually happened.
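The snapping step described above can be sketched roughly as follows. This is a simplified illustration, not MapQuest's implementation: the function names, the 25-meter snap radius and the 10 mph speed floor are all assumptions made for the example, and roads are modeled as straight segments between two (lat, lon) points. It uses a flat-plane approximation that is only reasonable over short distances.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def to_local_xy(lat, lon, ref_lat, ref_lon):
    """Project lat/long to meters on a flat plane centered on a reference point."""
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def snap_to_segment(point, seg_start, seg_end):
    """Return (snapped_point, distance_m) for the closest spot on one segment.

    All points are (lat, lon) tuples. The observed point is the plane's origin.
    """
    ref_lat, ref_lon = point
    ax, ay = to_local_xy(seg_start[0], seg_start[1], ref_lat, ref_lon)
    bx, by = to_local_xy(seg_end[0], seg_end[1], ref_lat, ref_lon)
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0  # degenerate segment: both ends are the same point
    else:
        # Fraction along the segment of the perpendicular projection,
        # clamped so the result stays between the two endpoints.
        t = max(0.0, min(1.0, -(ax * dx + ay * dy) / seg_len_sq))
    distance = math.hypot(ax + t * dx, ay + t * dy)
    snapped = (seg_start[0] + t * (seg_end[0] - seg_start[0]),
               seg_start[1] + t * (seg_end[1] - seg_start[1]))
    return snapped, distance


def correct_drift(point, speed_mph, road_segments,
                  max_snap_m=25.0, min_speed_mph=10.0):
    """Snap a fast-moving point to the nearest road if one is close enough.

    Thresholds are assumed values for this sketch. Points that are slow, or
    far from every road, are returned unchanged: they may be a parked car,
    a new road, or the daredevil farmer.
    """
    if speed_mph < min_speed_mph or not road_segments:
        return point
    snapped, distance = min(
        (snap_to_segment(point, start, end) for start, end in road_segments),
        key=lambda result: result[1],
    )
    return snapped if distance <= max_snap_m else point
```

Given a trace point about 10 meters off a road segment at 75 mph, `correct_drift` moves the point onto the segment; a point a kilometer from any road is left where it was observed. Production map matching typically considers the whole sequence of points, heading and road connectivity rather than one point at a time.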
And this is just the beginning! We'll continue to build on these capabilities throughout 2017 to make your data better and to help you make more informed decisions, so stay tuned!