Bit Of A Hack

Tracking busses by scraping the web

Pebble bus time app

So, here's a bit of a hack...

The bus services in and around York all report live times for most bus journeys. There is a crude website that lets you look up which busses will soon be arriving at a particular bus stop. I wanted to have a similar thing on my Pebble smartwatch. It would be so handy to just look at your watch, see "University 4, Morrell Library, 5 mins" and know instantly when the bus home was getting here.

This is the site:

So I found the stop number for the stop across the street from me and typed it in. The results were spot on: you see a countdown, in minutes, for every bus arriving at the stop, and when a bus is due you see the word 'due'. Perfect. It turns out that the people who provide bus users with this data have to pay for it. It is free up to a point, but make too many API requests and they want money to let you in. So I thought I'd just piggyback on somebody who was already subscribed to the system, in this case WYMetro, by means of a scraper for their site.

If you want to use the official channel then you have to ask these guys really nicely about their 'TNDS' system:

Anyway, back to piggybacking on WYMetro... They have recently updated their live timing site to be less text-scraper-friendly, but you can still get at the old site with this link:

It's just a hack to get a less graphic mode. Once there, I requested details for a stop, looked at the response source and started writing a scraper.

They've done a really good job of making this bit easy to scrape. I wrote a PHP script to run on my own server. It requests the timing page from their site and generates a JSON object with the times in. That meant I could write a really simple Android app that reads in the JSON, parses it, gets the details of the next bus and sends them to a custom Pebble watchface.
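The fetch step is worth guarding, since the whole thing depends on somebody else's site answering. What follows is only a rough sketch, not code from the script: the helper name fetchPage and the five second timeout are made up for illustration. It wraps file_get_contents so that a failed or empty response comes back as null instead of a warning.

```php
<?php
// Hypothetical helper, not part of the original script.
// The default timeout is an assumption; pick whatever suits your server.
function fetchPage(string $url, float $timeoutSeconds = 5.0): ?string
{
    // The 'http' context options apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);
    // file_get_contents() returns false on failure; @ hides the warning.
    $body = @file_get_contents($url, false, $context);
    if ($body === false || $body === '') {
        return null;
    }
    return $body;
}
```

A caller can then treat null as "nothing came back" and report an error without looking at the page at all.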

The data on the site is in an HTML table. The first thing the scraper does is look for the start of the table:

$infostart = strpos($data, "<table") + 133;

Then it finds the end and does some basic checking to make sure the data is valid and then takes a copy of all the data in the table tag:

$infoend = strpos($data, "</table>");
if($infoend > $infostart){
    $length = $infoend - $infostart;
    $table = substr($data, $infostart, $length);
}

Once that's done, it's a simple case of walking the table and getting the values. These are then printed as JSON data.
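Fixed offsets such as the + 133 above stop working as soon as the page markup shifts by a character. For comparison, here is a sketch of the same table walk done with PHP's DOM extension. This is not how the script does it. It assumes the DOM extension is installed and that each bus is one table row with four cells in the order route, destination, estimate, floor (the names the script uses in its JSON). The function name parseDepartures and the sample row are made up.

```php
<?php
// Hypothetical helper, not part of the original script: walk the first
// <table> with DOMDocument instead of counting characters.
// Assumes each data row holds four cells: route, dest, est, floor.
function parseDepartures(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid, so collect parse errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $doc->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return [];
    }

    $keys = ['route', 'dest', 'est', 'floor'];
    $departures = [];
    foreach ($table->getElementsByTagName('tr') as $row) {
        $cells = [];
        foreach ($row->getElementsByTagName('td') as $cell) {
            $cells[] = trim($cell->textContent);
        }
        // Skip header rows and anything that is not a four-cell row.
        if (count($cells) === 4) {
            $departures[] = array_combine($keys, $cells);
        }
    }
    return $departures;
}

// Invented sample row, for illustration only.
$sample = '<table><tr><td>4</td><td>University</td><td>5 mins</td><td>Yes</td></tr></table>';
echo json_encode(parseDepartures($sample)), "\n";
```

The trade-off is a dependency on the DOM extension in exchange for not caring how many characters of markup sit in front of the first cell.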

The script also produces some time data and some nice error messages if things go wrong. You can give it a go right here. This is a link to the live data for the bus stop at the library for the University of York:

If you want to try out a different stop then you need its NaPTAN (National Public Transport Access Node) number. You can get some info about the NaPTAN database from the Department for Transport here:

Or, if you just want to look up a NaPTAN number for a bus stop, any of the ones you find using this page will work:

NaPTAN numbers are quoted on most of the bus stops in York and usually start 3290. Other places tend to start with other numbers. I have tried it in York and in Wakefield and both work, but that's it. I can't promise this works anywhere else.
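Since the stop number goes straight from the request into the URL that gets scraped, it is cheap to check it first. This is a sketch only and not part of the script. The accepted pattern (1 to 12 letters or digits) is my assumption about what a stop code looks like, and both function names are made up.

```php
<?php
// Hypothetical check, not in the original script. The allowed pattern
// (1 to 12 letters or digits) is an assumption; adjust it to the codes
// you actually see on your stops.
function isPlausibleStopCode(string $stop): bool
{
    return preg_match('/\A[A-Za-z0-9]{1,12}\z/', $stop) === 1;
}

// Return the value ready for a query string, or null if it looks wrong.
// rawurlencode() is applied anyway as a second line of defence.
function stopQueryValue(string $stop): ?string
{
    return isPlausibleStopCode($stop) ? rawurlencode($stop) : null;
}
```

Anything that comes back as null can be answered with an error without making a request to the timing site.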

If you want to get the app on your Pebble (and you have an Android™ phone) then you can go here:

It is free, don't worry. The app lets you associate a stop with a location. If you have multiple stops enabled, it will use the phone's location services (if enabled) to figure out which of the selected stops is closest to you and then display the information for that stop.
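The closest-stop choice can be sketched in a few lines. The app's own code isn't shown here, so this is an illustration in PHP of one common way to do it (great-circle distance using the haversine formula), not a description of what the app actually does. The function names, stop labels and coordinates are all made up.

```php
<?php
// Great-circle distance in metres between two points (haversine formula).
function haversineMetres(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadius = 6371000.0; // mean Earth radius in metres
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    // min() guards against rounding pushing sqrt($a) just above 1.
    return 2 * $earthRadius * asin(min(1.0, sqrt($a)));
}

// Return the label of the stop closest to the given position, or null
// if there are no stops. $stops maps a label to [latitude, longitude].
function nearestStop(array $stops, float $lat, float $lon): ?string
{
    $best = null;
    $bestDistance = INF;
    foreach ($stops as $label => [$stopLat, $stopLon]) {
        $distance = haversineMetres($lat, $lon, $stopLat, $stopLon);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = (string) $label;
        }
    }
    return $best;
}

// Made-up labels and coordinates, for illustration only.
$stops = [
    'stop_a' => [53.9480, -1.0520],
    'stop_b' => [53.9590, -1.0820],
];
echo nearestStop($stops, 53.9485, -1.0530), "\n";
```

For a handful of stops in one city a straight loop like this is plenty; nothing cleverer is needed.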

This is the full PHP script...


        {
        <?php
        //URL to scrape
        $url1 = "";
        $url2 = "&textonly=1&pda=1";
        //get the requested stop from the get request parameter 'stop'
        if (isset($_GET['stop'])){
            $url = $url1 . $_GET['stop'] . $url2;
            //load data from URL.
            $data = file_get_contents($url);
            //Check to see if the "No busses" text is shown.
            $nobus = strpos($data, "There are no departures in the next hour from this stop.");
            //strict comparison: strpos can return 0, which is == false
            if ($nobus !== false){
                //AHA! No busses
                echo "\"error\":\"no_busses_within_hour\",\n";
            }else{
                //if we get data back
                if (isset($data)&&$data!=""){
                    //Find the start of the data
                    $infostart = strpos($data, "<table") + 133;
                    $infoend = strpos($data, "</table>");
                    //check valid
                    if($infoend > $infostart){
                        //get length and extract data from table
                        $length = $infoend - $infostart;
                        $table = substr($data, $infostart, $length);
                        //print the JSON object 'data'
                        echo "\"data\":[\n";
                        $stilldata = true; //(Loop var)
                        //while the table still has data to parse...
                        while ($stilldata){
                            //get each of the four columns
                            for ($i = 0; $i<=3; $i++){
                                $thisstart = strpos($table, ">")+1;
                                $thisend = strpos($table, "</td>")-6;
                                if ($thisend > $thisstart){
                                    $thislength = $thisend - $thisstart;
                                    $thisdata = substr($table, $thisstart, $thislength);
                                    //Add the data to the JSON object under the right name
                                    switch ($i){
                                        case 0:
                                        echo "{\"route\":\"" . $thisdata . "\", ";
                                        break;
                                        case 1:
                                        echo "\"dest\":\"" . $thisdata . "\", ";
                                        break;
                                        case 2:
                                        echo "\"est\":\"" . $thisdata . "\", ";
                                        break;
                                        case 3:
                                        echo "\"floor\":\"" . $thisdata . "\"}, ";
                                        $thisend +=5;//account for a tr end tag
                                        break;
                                    }
                                    //remove the parsed data from the table
                                    $table = substr($table, $thisend+11);
                                }else{
                                    //nothing left to parse, stop after this row
                                    $stilldata = false;
                                }
                            }
                            echo "\n";
                        }
                        echo "],\n";
                        echo "\"error\":\"no_error\",\n";
                    }else{
                        echo "\"error\":\"scrape_error\",\n";
                    }
                }else{
                    echo "\"error\":\"empty_get\",\n";
                }
            }
        }else{
            echo "\"error\":\"no_stop_specified\",\n";
        }
        ?>
        "date":"<?php echo date("d/m/y");?>",
        "time":"<?php echo date("H-I:i");?>",
        "unixtime":"<?php echo time();?>"
        }
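One thing to watch with output built by hand like this: the script prints a comma after every entry in the list, including the last one, and nothing escapes quotes inside the scraped values. Strict JSON parsers reject a trailing comma. A sketch of one way round that is to collect everything into an array and hand it to json_encode. The field names and date formats are copied from the script, but the function name buildResponse and the sample row are made up, and this is not what the script itself does.

```php
<?php
// Hypothetical helper: assemble the response as an array and let
// json_encode() handle commas and escaping. Field names and the date/time
// formats are copied from the script; the function name is made up.
function buildResponse(array $departures, string $error, int $now): string
{
    $response = [
        'error'    => $error,
        'date'     => date('d/m/y', $now),
        'time'     => date('H-I:i', $now),
        'unixtime' => (string) $now, // the script emits this as a string
    ];
    if ($error === 'no_error') {
        // Put the departures first, matching the script's output order.
        $response = ['data' => array_values($departures)] + $response;
    }
    return json_encode($response);
}

// Invented sample row, for illustration only.
echo buildResponse(
    [['route' => '4', 'dest' => 'University', 'est' => '5 mins', 'floor' => 'Yes']],
    'no_error',
    time()
), "\n";
```

Note that json_encode escapes the slashes in the date as \/ by default, which is still valid JSON and decodes back to the same string.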
