April 17, 2008
Statistics and Travel
Posted by rootdir under FOTW | Tags: web, travel, google, prediction, statistics
Recently Google, oddly having to play catch-up with Microsoft, released a feature within their Maps tool that more or less predicts what speeds will generally be obtainable when driving on certain roads at a certain time of day. Let me start off by saying that the amount of data to be crunched for this is enormous.
Fortunately, most of the work has already been done for Google, Microsoft, and any other players. Why am I taking a shit on this instead of praising it? There is a government agency that has been around for a while called the BTS, or in a longer phrasing: the U.S. Department of Transportation, Research and Innovative Technology Administration, Bureau of Transportation Statistics. They specialize in the kind of data these prediction algorithms could be built from. Actually, on second thought, they already figured all of this out. The equations have been written out for a number of years; I remember reading a research paper on this when I was about 18 or 19 (not that long ago, actually). Yes, I am a nerd, I subscribe to the BTS Journal. And why not, it's free.
There is an enormous amount of data published by the government that covers these topics, but you have to know where to look, which is why I’m surprised this hasn’t been seen before on any number of traffic-geared sites. Anyway, back on topic…
Google’s results are pretty good, but they aren’t perfect. Any Portlander will tell you that I-5 around the Vancouver-Portland transit area is pretty much screwed from 4:30 to 6. Seriously, you’d be better off riding a bike; I have many times enviously glared at a biker while waiting at an I-5 on-ramp. Likewise, I can personally tell you that 26 W around Cedar Mills is not in 50 mph+ condition during rush hour. And maybe that is why they only provide predictions in 15-minute blocks. With the number of variables involved, it’s impossible to predict anything like this without more real-time data. A problem with predicting these types of systems is that the data you have is worthless without someone actually out on the road following the predicted plan:
I’m not sure whether Google loaded up their company cars with GPS trackers and sent them out during traffic conditions to test the system, but this becomes a necessary step. The data would need to be further enriched, based not just on data stream reports of traffic, as probably provided by Traffic.com to Google (just like Microsoft), but on actual drivers in traffic. Dash Express is a navigation unit that is working to solve this dilemma, because each of its users enriches the data and the predictions for future trips. From this point of view, Dash could actually become more valuable for the data they provide than for the systems and subscriptions they sell to users.
This brings me to my final point. An age-old quandary is: why is the line I am in the longest? Answer: because you are in it. Any variable that isn’t fixed, like human input and performance, is something that can never be perfectly calculated. The line is longest because you are physically “one more”… the average number of people in a line (x) becomes x + 1 when you join it, which, averaged over enough trips, will be more than any other line.
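For the curious, here is a quick sketch of that "x + 1" argument as a simulation. Everything in it is my own assumption for illustration (the number of lines, the uniform random line lengths, the function name `simulate_lines`); it only shows that if every line has the same average length, the one you join comes out about one person longer on average. It does not show that your line is the longest every single time.

```python
import random


def simulate_lines(num_lines=4, max_len=10, trials=20000, seed=1):
    """Average length of the line you join vs. the average of the others.

    Assumption (not from any real data): every line's length is drawn
    uniformly from 0..max_len, and you pick a line at random.
    """
    rng = random.Random(seed)
    mine_total = 0.0
    others_total = 0.0
    for _ in range(trials):
        lengths = [rng.randint(0, max_len) for _ in range(num_lines)]
        pick = rng.randrange(num_lines)
        # Your line now holds everyone who was already there, plus you.
        mine_total += lengths[pick] + 1
        others = lengths[:pick] + lengths[pick + 1:]
        others_total += sum(others) / len(others)
    return mine_total / trials, others_total / trials


if __name__ == "__main__":
    mine, others = simulate_lines()
    print(f"your line: {mine:.2f}  other lines: {others:.2f}")
```

The gap between the two averages should hover around 1, which is just you.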
The issue of traffic prediction can bring this rule into play: say I have 500 users using Google traffic in the morning to predict their morning commute and maybe 10,000 who are not. The 500 may look at the data and say, “well, this road has historically been 20 mph and below, so I will take this alternate route,” and proceed to use their alternate routes. Those who were originally traveling on the alternate routes then see all this new traffic and make their own cost-benefit decision: “should I stay or should I go now” (ok, now I’m humming the song, dammit). So, unless the only roads in existence were the ones for which traffic prediction was available, i.e. a closed system, Google’s algorithms, like BTS’, will never be able to model us.
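To put toy numbers on that feedback effect, here is a minimal sketch. The congestion formula, the capacity, the free-flow time, and the split of cars between roads are all made up by me for illustration; none of it comes from Google or the BTS. The point is only that a forecast built on yesterday's traffic goes stale the moment people act on it.

```python
def travel_time(free_flow_min, capacity, cars):
    """Toy congestion model (an assumption): time grows linearly with load."""
    return free_flow_min * (1 + cars / capacity)


def alternate_route_times(alt_cars=2000, users=500,
                          free_flow_min=20, capacity=4000):
    """Return (forecast, actual) minutes on the alternate route.

    forecast: based on the cars that historically used the route.
    actual:   after the prediction users all divert onto it.
    """
    forecast = travel_time(free_flow_min, capacity, alt_cars)
    actual = travel_time(free_flow_min, capacity, alt_cars + users)
    return forecast, actual


if __name__ == "__main__":
    forecast, actual = alternate_route_times()
    print(f"forecast: {forecast} min, actual: {actual} min")
```

With these invented numbers the alternate route was forecast at 30 minutes and ends up at 32.5 once the 500 show up, and that is before anyone already on it decides to bail for yet another road.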
All said, I’m damn glad that more companies are thinking about this issue and applying their skills. No offense to Microsoft, but Google will probably be the first company to tap into Dash’s userbase, being the more savvy of the two. Neither may be able to model us perfectly, but tracking everyone (let’s hope this avoids the license plate issue in StreetView? M’kay) may be the best solution to this problem.
At least until computers control the traffic. Let’s hope they aren’t running Vista.
I am not a professional statistician, which makes what I said above, from a practical standpoint, worthless. But that also means that if I’m wrong I probably won’t really care anyway. *wink*






