After some usual testing (i.e. removing the latest completed draw from my data files and running through the code as if it were a draw day), the script now can grab all the main Lotto, Thunderball and Hotpicks numbers/results in a single run. I do wonder if I shouldn't merge back the Euro Millions code into that script as well! The newest script is now cron'ed up for the draw evenings and will be run every 2 minutes.
The first test is tonight for the main Lotto and Thunderball, which I'll still be doing the live updates for. I probably need some way to improve those live updates as well so that it's all done via the Web rather than a mix of Web and command line.
The reason for all this intense work is that I want to just have to do "manual" (live) updates on a Saturday night only and everything else will be auto-updated without requiring me to be present. In other words, almost mirroring exactly what the BBC have done - no more live or recorded TV coverage except on Saturday night from now on. If they're allowed to do this, so should I!
In case you're moaning that I'm "stealing" results from other sites, I'm only doing exactly what I've been doing for many years manually, but in a scripted fashion from now on, plus the info is publicly available with no login required either.
If you also object to me "battering" other sites, there aren't many pages loaded and they're all done serially. The worst case is Wed/Sat when it grabs 5 pages every 2 minutes from the UK official site for the main Lotto, Thunderball and Hotpicks. For Euro Millions grabs, it's only 1 page per site (5 sites in all) every 2 minutes, which is nothing really.
Also, it's important to note that as soon as all the info I need has been obtained for a particular draw, I no longer access any sites that provide just that draw info for the rest of the draw evening. I will also be slightly adjusting the frequency of requests so that the scraping period is as short as it can be without missing any results.
Just added newsflash generation for the main Lotto, Thunderball and HotPicks draws, which I'd completely forgotten about. Unlike the somewhat complex Euro Millions newsflash generation where bits of the data come in at different times and I have to produce different newsflashes as appropriate, these newsflashes were much easier.
The new newsflash code just announces the full results when they turn up, with the only tricky bit being whether it's for the main Lotto draw, the Thunderball draw, the HotPicks draw or any combination of the three. This is because Camelot don't "early release" just the winning numbers like the Irish, Spanish and Portuguese sites do for the Euro Millions for example. Instead, Camelot hold back any info until it's all known for a particular draw and then publish it all at once.
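A minimal sketch of the "any combination of the three" part in Python (this is not the actual script, which isn't shown here). The function name and the headline wording are made up for illustration; the only thing taken from the text above is that a newsflash has to name whichever of the three draws have just had their full results published.

```python
def newsflash_headline(draws):
    """Build one headline naming every draw whose full results just arrived.

    `draws` is a list such as ["Lotto", "Thunderball"]; an empty list means
    nothing new turned up on this run, so no newsflash is produced.
    """
    if not draws:
        return None
    if len(draws) == 1:
        names = draws[0]
    else:
        # "A and B" for two draws, "A, B and C" for three.
        names = ", ".join(draws[:-1]) + " and " + draws[-1]
    return f"Full results now available for {names}"
```

Building the names separately from the sentence keeps the one-draw, two-draw and three-draw cases in a single code path.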
Talking of newsflashes, I've now added a 3.00pm cron job that will add a Tuesday, Wednesday, Friday and Saturday newsflash announcing when the winning numbers, full results or the Saturday live TV show will take place. I get the TV show start time automatically from digiguide.tv (which I do actually pay a subscription to, but you don't need it for BBC One listings).
Tonight's run was a mixed bag, as I probably expected. The Thunderball results came through first at 9.43pm and were actually a perfect update in a total of 43 seconds thanks to yesterday's fixes. The main draw at 9.49pm wasn't so lucky - whilst most of the data was grabbed, it looks like the prize amounts didn't scan properly and they were left as zero in the data file.
It looks like I'll have to put some simple validation on those (e.g. minimum values I'd expect for each prize amount and ditto for the prize winners too) - if any value fails validation, both the winners and prize amounts should all be zeroed (i.e. ignored for that run).
I disabled the cron job, fixed the data file, created a manual newsflash and then re-generated all the pages. Strangely, further testing later on in the evening resulted in a perfect update for the main Lotto data file. It makes me suspect I again got one of those "error processing this page" messages you get briefly as Camelot update the full results page. Hopefully some validation should ignore that error and pick up the data in the next run 2 minutes later.
The prize tier validation code is now in place - all prizes in either the main Lotto or Thunderball must be greater than or equal to slightly below the all-time lowest prize in each tier. If any aren't, all the prize and winners tiers for that draw are zeroed and therefore ignored until the next run. This should mean that tonight's zero prize amounts in most tiers of the main Lotto can't ever happen again.
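A rough Python sketch of that all-or-nothing check (not the real code, which isn't shown here). The function name and the list-based layout are assumptions, and the floor values passed in would have to be chosen slightly below the all-time lowest prize for each tier; none are given here.

```python
def validate_tiers(prizes, winners, floors):
    """Accept the scraped tiers only if every prize meets its floor.

    prizes, winners and floors are parallel lists, one entry per prize tier.
    On any failure both lists come back zeroed, so the draw is treated as
    "not available yet" and picked up again on the next run.
    """
    tiers = len(floors)
    complete = len(prizes) == tiers and len(winners) == tiers
    if not complete or any(p < f for p, f in zip(prizes, floors)):
        return [0] * tiers, [0] * tiers
    return list(prizes), list(winners)
```

Zeroing the winners along with the prizes means a half-parsed page never leaves a mixture of good and bad figures in the data file.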
The cron job has been re-enabled and the next real run for any of the grabbing code is on New Year's Day of course, where we'll see if my multi-Millionaire Raffle scraping actually works or not (probably not, judging by these first runs!). At least we've had 100% successful runs of the Thunderball and HotPicks data grabbing, so I'm halfway through the automation so far. I'm aiming for next Wednesday for the final fixes and then it should be able to be left unattended after that.
Remember that I'm manually inspecting the data updates and running the page generation by hand tonight, so although I'm giving times the data was grabbed, it might be several minutes later before the pages are updated.
The first coding issue tonight was that one of my shell array variables wasn't set and caused a test statement error, but that was easily fixed. I also put the wrong directory name in for the real newsflashes, so I had to cut and paste a newsflash statement in manually for the first two updates (that's now also fixed).
The Irish site produced the European individual jackpot prize of 26,703,677 Euros and we hit the first serious bug of the evening. The problem with the Irish site is that it announces that jackpot figure (along with the winning numbers) and nothing else! Not the number of jackpot winners or any other lower tier info and this kicked in an issue where the UK jackpot winners are set to 1 and the UK jackpot prize to 0 in my data file - I've always used this as a way to tell that the UK jackpot prize info isn't available yet.
When manually updating, I know to keep the European jackpot prize set to zero as well, but the Irish update overrode that and "1 UK jackpot winner of £0" predictably turned up on the home page. I edited the European jackpot prize field to zero and re-generated the pages.
At 9.12pm, the Swiss official site came up with the full results, which included no Euro Millions jackpot winners. This showed a flaw in my code as well because of the lack of UK results - the full results wouldn't be shown until I zeroed the number of UK winners and put in the estimated UK jackpot prize (which is the exact Euro figure * the exchange rate from last Tuesday). I did all this and re-generated the pages.
Needless to say, the cron job kept putting the wrong info back in the data file (resulting in all sorts of crazy EM jackpot info on the home page!), so I've disabled it now and am running it by hand. I'm looking into the home page jackpot announcement code as we speak and once that's sorted, the cron job can return.
I tracked down the code that produced the "1 UK winner of £0" on the home page - it needed to suppress any announcement until the UK jackpot prize was non-zero - and also the statement that there were no winners "including 1 from the UK"! The cron job has been re-enabled.
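A hedged Python sketch of that suppression rule, using the data file convention described earlier (UK winners set to 1 with a UK prize of 0 means "not known yet"). The function name and the exact wording of the returned text are invented for illustration; the real home page code isn't shown here.

```python
def uk_jackpot_line(uk_winners, uk_prize_gbp):
    """Return home page text for the UK jackpot, or None to say nothing yet."""
    if uk_prize_gbp == 0:
        # Placeholder state (e.g. winners=1, prize=0): UK info not released.
        return None
    if uk_winners == 0:
        return f"No UK jackpot winners (jackpot worth £{uk_prize_gbp:,})"
    plural = "" if uk_winners == 1 else "s"
    return f"{uk_winners} UK jackpot winner{plural} of £{uk_prize_gbp:,}"
```

Testing the prize first means the placeholder winners count is never looked at, which is what avoids both the "£0" line and the "including 1 from the UK" line.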
It looks like we had two updates to finish the evening - the UK results came in around 9.42pm and the Portuguese pan-European ticket sales figures at around 9.46pm (it could have been either way around because I was fixing code at the time). Of course, my newsflash was wrong about announcing that the Thunderball full results were available (they still weren't around 10 minutes later), so that's a fix I'll be putting in. It's because the EM and Thunderball scripts are separate and I don't check the progress of the second from the first.
Another bug surfaced at the tail end, this time in the Thunderball grabbing, which came in at 9.55pm. Some of my number parsing returned zero for some of the balls and I didn't even bother checking for this :-( That'll need to invalidate all 6 balls if any of them are zero. I also need to do more country-specific logging so I can actually work out which country site did the first updates of each type during the evening. Not a great first real run, but it'll be better in the future, honest :-)
I think I've fixed all the Euro Millions autogen bugs now - found a small one on the individual draw page when the UK jackpot prize isn't known (I have to remove the cost of 1 Euro in sterling). The tardiness of the UK official site w.r.t. Euro Millions results is quite annoying. It's not only the slowest of all the official sites to update, but it does hold up a fair chunk of info (the European winners and European jackpot prize amount in Euros) from being displayed at the moment (they are grabbed and in my data file though!).
I may have to see if I can code a workaround for this, because I normally update those figures manually well before the UK figures are released. Oh and I added specific country/update messages to the logging, so I now know exactly which country updated what data and what time they did it (I only record the first to do so).
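The "only record the first" logging could look something like this in Python. It's a sketch only: the dictionary, the data type labels and the time format are assumptions, not what the real logging uses.

```python
def record_first(first_seen, data_type, country, timestamp):
    """Remember which country's site was first to supply a type of data.

    first_seen maps a data type (e.g. "winning_numbers") to a
    (country, timestamp) pair. Later sites supplying the same data are
    ignored. Returns True only when a new "first" was recorded.
    """
    if data_type in first_seen:
        return False
    first_seen[data_type] = (country, timestamp)
    return True
```

For example, calling it with ("full_results", "Switzerland", "21:12") and then again with ("full_results", "UK", "21:42") leaves Switzerland in the log.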
I've also fixed any remaining issues with the Thunderball autogen, mainly by validating every ball (1-39 for first 5, 1-14 for the 6th) and if any are outside their permitted range, the whole lot are ignored. I think this happened because Camelot have a bizarre "Error processing this page" message whilst their system is updating the full results page and my page grabber may have got that and tried some mad parsing of it. Whilst I was at it, I've moved the final cron job to just before 11pm, since all the info for both draws tonight turned up before 10pm.
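A minimal Python sketch of that range check, using the limits given above (1-39 for the first five balls, 1-14 for the sixth). The function name and the "empty list means ignore" convention are assumptions for the example.

```python
def validate_thunderball(balls):
    """Return the six balls if all are in range, otherwise an empty list."""
    if len(balls) != 6:
        return []
    main, thunderball = balls[:5], balls[5]
    in_range = all(1 <= b <= 39 for b in main) and 1 <= thunderball <= 14
    # One bad ball (e.g. a zero from a failed parse) rejects the whole set.
    return list(balls) if in_range else []
```

A zero from parsing an error page fails the lower bound, so the set is dropped and the next run gets another go at it.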
That's a wrap for Friday - if I have the strength tomorrow, I'll start working on the main Lotto auto-update too. Trickiest bit of that is probably calculating the ticket sales that aren't on the official Camelot web site (even though they used to be and Camelot should start publishing them again!). I use a reverse calculation formula with iteration to close in on the sales figure, but it's never been coded to be put in my data file automatically.
First up are the Euro Millions draws (yes, there's the main draw for that and then the raffle of course). I picked these draws because they're most technically complex to automatically generate numbers and results for, involving scraping 5 Web sites, merging data and generating newsflashes as the scraping progresses.
Obviously, I spec'ed up the process as bullet points on what I do manually each Tuesday and Friday night and then coded it all up over the space of several days. I won't describe the full process here, but here's what it will do:
To test all this, I simply created a new data file minus the latest completed draw, set the configs to think it was a draw day and then ran the code with some bits changed (e.g. put the newsflash in a temporary dir away from the normal one, don't re-generate the pages at all). I did these test runs grabbing from just one site at a time and then eventually grabbing from all 5 in one go. Once I'd got a new data file with the latest draw results inserted exactly as I'd done manually for that draw, I was fairly happy that the next draw (tomorrow) would do a good job.
What I'll do for tomorrow's Euro Millions is just run the script manually several times during the evening and I've also disabled the page generation so I can inspect the updated data file and confirm it looks correct before running the page generation manually. I need to do it this way because the 5 sites are regularly updated during the evening and I need to see that my code will handle the varying HTML correctly, no matter what state it is in.
Once any corrections are done for tomorrow's draw, I will then cron up the script to run automatically from next Tuesday onwards. I will still manually check it on Tuesday and probably the following Friday, but if it looks good at that point, I'll leave it to run "unattended". And, yes, it does call the raffle script I talked about below if it detects that the raffle number on the official site is a link to a page of numbers rather than the single raffle number it usually is.
Update: Using much of the code for the Euro Millions automation, I have now automated Thunderball too, although this was much easier since I'm only scraping two pages on the UK official site (the CSV winning numbers/machine/ball set file and the HTML results page). I have to be careful about Camelot wrongly claiming that the Thunderball jackpot prize is £0 when there are no winners of it (it's actually £500,000 - this is a mistake they've made for years on their site), but I have coded for this of course.
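The £0 workaround amounts to something like the following Python sketch. The names are made up for the example, and it assumes the function is only called once the full results page has actually been published, so a zero really does mean "no winners" rather than "not known yet".

```python
THUNDERBALL_TOP_PRIZE_GBP = 500_000  # fixed top prize, as stated above


def corrected_top_prize(winners, scraped_prize_gbp):
    """Replace the official site's £0 top prize when nobody won it."""
    if winners == 0 and scraped_prize_gbp == 0:
        return THUNDERBALL_TOP_PRIZE_GBP
    return scraped_prize_gbp
```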
Yes, this leaves just the main Lotto left to automate and I will be tackling that very shortly. Once that's completed, the site will effectively "run itself", though obviously I will have to keep an eye on the automation since any slight changes to the URLs or their contents will break the updates. I will still be doing the live Saturday draw updates since there's still a live Saturday TV show, but from 2013 onwards, everything else will be automated.
With Christmas Day and New Year's Day having 25 raffle jackpots each, there would potentially be 50 manual edits (albeit done with a cut'n'paste of the raw numbers initially to avoid mistakes). Hence, the new script saves me having to do all this - it's a simple file include operation (the C strings are saved to a temp file) after running the script on each of the two days. Of course, Camelot could change the format of the official page to flummox me :-)
Update: I've now coded the raffle number grabber more thoroughly to be automated. It now creates a separate header file with the C code for each multi-jackpot raffle draw and a master include file that then includes all those generated headers. The original C code that had a long set of per-draw case statements now just includes the master file and that's it.
The updated script will now only update headers that don't have a complete set of numbers in them (important, because I've got to keep old headers since Camelot trashes any lottery info more than 6 months old from their Web site!) and the script is now run every 5 minutes between 9.00pm and midnight on Tuesdays and Fridays. This means that tonight's 25 raffle numbers will appear automatically on this site within 5 minutes of the official site publishing them (and ditto on New Year's Day of course).
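Here is a hedged Python sketch of the header handling described above: one generated header of C strings per multi-jackpot draw, a master file that includes them all, and a check so that complete headers are never touched again. The function names, the one-string-per-line layout and the file names in the usage note are all assumptions; the real generator and its output format aren't shown here.

```python
EXPECTED_NUMBERS = 25  # raffle jackpots in each special draw, per the text


def render_header(raffle_numbers):
    """Render raffle numbers as C string literals, one per line."""
    return "".join(f'    "{n}",\n' for n in raffle_numbers)


def count_entries(header_text):
    """Count the string literal lines in an existing generated header."""
    return sum(
        1 for line in header_text.splitlines() if line.strip().startswith('"')
    )


def needs_update(header_text, expected=EXPECTED_NUMBERS):
    """True if the header is missing or holds fewer numbers than expected."""
    return count_entries(header_text) < expected


def render_master(header_names):
    """Render the master include file that pulls in every per-draw header."""
    return "".join(f'#include "{name}"\n' for name in header_names)
```

On each run, a header would only be rewritten when needs_update() says so, which is what protects old headers once the official site has dropped the draw. The master file for two hypothetical headers, render_master(["raffle_a.h", "raffle_b.h"]), is just two #include lines.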