Next month: February 2013
In the end (after "no jackpot winners" wrongly appeared several times for a minute at a time), I disabled the cron job, updated the data file manually and then spent a few hours debugging the problems. Rather stupidly, I spelled "Portugal" as "Portugual" when looking for country names on the official Irish site, so it was even allocating 2 winners to Belgium and none to Portugual in my code! Yes, that typo has been fixed now.
I also didn't have any code to divide Camelot's jackpot prize pool figure in sterling by the number of European jackpot winners when there are no UK jackpot winners, so this code has now been added. A quick run through with individual site scraping and then all-site scraping seems to have fixed the bugs, for now at least.
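For illustration only, here's roughly what that division amounts to. This is a minimal Python sketch, not my actual code: the function name and the sample figures are made up, and returning nothing when there are no winners at all is my assumption.

```python
def individual_jackpot_gbp(pool_gbp, european_winners):
    """Split the sterling jackpot pool between the European winners.

    Hypothetical helper: intended for the case where no UK ticket won,
    so the European winner count is the only one available.
    Returns None when there are no winners at all (assumption).
    """
    if european_winners <= 0:
        return None
    return round(pool_gbp / european_winners, 2)


# Made-up figures: a 30 million pound pool shared by 2 European winners.
print(individual_jackpot_gbp(30_000_000, 2))
```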
In fact, at 9.16pm last night, it correctly extracted from the Irish official site that there was indeed one jackpot winner. However, whatever data it grabbed at 9.44pm from the Portuguese official site overrode that and switched the data file back to no winners. The order I scan sites is important, because the first one scanned to return information is the one that takes priority (it's me chickening out from writing "majority rules" code, which bit me badly here of course).
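For what it's worth, "majority rules" doesn't have to be much code. Here's a minimal Python sketch of the idea, which is not what my scraper does today: the site names and the dictionary layout are purely illustrative, and treating a tie as "no answer yet" is my assumption.

```python
from collections import Counter


def majority_value(readings):
    """Return the value most sites agree on, or None if there is no clear winner.

    `readings` maps a (hypothetical) site name to the value scraped from it.
    Sites that returned nothing are stored as None and are ignored.
    """
    votes = Counter(v for v in readings.values() if v is not None)
    if not votes:
        return None
    ranked = votes.most_common(2)
    # A tie between the top two values means no majority, so report nothing.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Illustrative readings for "number of jackpot winners".
readings = {"irish": 1, "portuguese": 0, "swiss": 1, "spanish": None}
print(majority_value(readings))
```

With only two sites reporting and disagreeing, this returns None rather than letting the last site scanned win, which is the behaviour that would have helped here.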
I have increased the logging somewhat now, so that whatever data is extracted is now displayed for every update of the data file (unfortunately, the number of European jackpot winners wasn't logged, but it is now). I've also added a little more validation to the European winners data, which may help.
And, yes, I re-ran the new code with the fixes and it works fine with the final data on the various official sites I scrape. Remember that I write out the data obtained after each cron job run and don't rescan it in later runs, so any bad then good data from a site won't correct itself. I suspect the Portuguese site bad data was fixed by that site later on Friday evening, but I didn't pick it up.
However, the buffoons at Camelot often get their full results table wrong when there's no individual prize winners in a tier. They occasionally also zero the prize amount for any no-winners tier, which is completely incorrect and, yep, they wrongly did this for the 5+bonus tier. I did a reverse sales calculation and came up with a figure of £708,782 which is what's in my data file now. As I type this a full 3 days later, Camelot have still not corrected their site to state the unwon 5+bonus individual prize!
It gets worse, because I have minimum prize value validation (if any prize is below a threshold set slightly lower than the all-time lowest, all prize and number of winners tiers are zeroed on the assumption that the data is unreliable). Needless to say, this code kicked in on Wednesday because of Camelot's woeful "zeroes across the board" on the 5+bonus tier. I had to do some manual updates to sort this out. I've now changed my validation code to allow "0" as a valid prize value if the number of winners is also "0", since Camelot moronically think that's correct! I may phone Camelot today to see if they'll give me Wednesday's unwon 5+bonus prize figure that's missing from their Web site.
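The revised rule can be sketched in a few lines of Python. This is an illustration under assumptions, not my real code: the tier names, the floor values and the data layout are all invented for the example.

```python
def tier_is_valid(prize, winners, floor):
    """Accept a tier if its prize looks plausible.

    `floor` is an assumed threshold set slightly below the all-time lowest
    prize for that tier. A prize of 0 is tolerated only when nobody won.
    """
    if prize == 0 and winners == 0:
        return True
    return prize >= floor


def validate_tiers(tiers, floors):
    """Return the tiers unchanged if all pass, else zero every prize and winner count."""
    ok = all(
        tier_is_valid(t["prize"], t["winners"], floors[name])
        for name, t in tiers.items()
    )
    if ok:
        return tiers
    return {name: {"prize": 0, "winners": 0} for name in tiers}


# Invented floors and results: the 5+bonus tier has no winners and a zero prize.
floors = {"5+bonus": 10_000, "match4": 20}
tiers = {
    "5+bonus": {"prize": 0, "winners": 0},
    "match4": {"prize": 62, "winners": 15_000},
}
print(validate_tiers(tiers, floors) == tiers)
```

A zero prize with a non-zero number of winners still fails the check, so genuinely broken data is zeroed as before.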
There was no major problem with tonight's autogen updates, though the newsflash incorrectly stated that the HotPicks full results were available when they weren't (the main Lotto ones were!). I've fixed this bug already. The main Lotto results were pretty freaky tonight - all the numbers (even the bonus) were under 28, so the date-loving public matched them in droves, leading to high numbers of winners and low prizes in all the tiers. Perhaps the most shocking was only getting £4 more for matching 4 numbers compared to 3!
There was an interesting jackpot prize rounding issue with the official Irish Euro Millions site tonight. Although the jackpot prize is always an exact multiple of one Euro (i.e. the Eurocent field is ".00"), the Irish site has managed to display a figure (34,191,507) that is one whole Euro less than all the other non-UK official sites that show the prize in Euros (they show 34,191,508.00). I've used the higher figure since multiple sites all equally disagreed with the Irish site.
Hence, a new draw file data line was added with zeroes for the main Lotto winning numbers whilst also containing the number of HotPicks prize winners for Wednesday. This resulted in question marks being displayed for Wednesday night's main Lotto ball graphic for a short while. The Thunderball numbers came in at 9.21pm and just as I disabled the cron job at 9.25pm having got back home, the main Lotto results came through and I edited those in manually to play it safe.
The fix I've now applied for the HotPicks issue is simple - until the main Lotto winning numbers are known (i.e. they are non-zero in my data file), nothing from main Lotto or HotPicks (the latter uses the main Lotto numbers anyway) will be displayed from either draw that evening, even if info like winners/prizes/machine/ball set is available.
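As a rough Python sketch of that gate (the field names and the shape of the draw record are assumptions made for the example, not my actual data file format):

```python
def displayable(draw):
    """Return the draw info to show, or an empty dict if nothing should be shown yet.

    `draw` is a hypothetical record whose "main_numbers" entry holds zeroes
    until the main Lotto winning numbers are known.
    """
    numbers = draw.get("main_numbers", [])
    if not numbers or any(n == 0 for n in numbers):
        # Winning numbers unknown: hide main Lotto and HotPicks info alike.
        return {}
    return draw


# HotPicks winners have arrived, but the main Lotto numbers are still zeroes.
pending = {"main_numbers": [0, 0, 0, 0, 0, 0], "hotpicks_winners": 1234}
print(displayable(pending))
```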
There were no bad data updates at all tonight, but one Euro Millions issue remains. It scraped the winning numbers in ascending order at 8.38pm (Spanish site), the numbers in drawn order at 8.46pm (Portuguese site), the number of European winners in each tier at 8.50pm (Swiss site), the jackpot prize amount in Euros at 8.58pm (Irish site) and the pan-European ticket sales at 9.32pm (Portuguese site again). Yet it wasn't until 9.44pm, when the hopelessly tardy UK figures came in from Camelot, that I could show anything more than the winning numbers (inc. drawn order) and the fact that there were no jackpot winners, with the latter only stated in a newsflash as well.
The reason for the paucity of info for almost an hour is that filling this gap was actually quite a tricky thing I used to do manually, at the point the European winners and the European jackpot prize figure (in Euros) had come through but there was no equivalent info for the UK winners/prizes yet.
I'd take the previous draw's currency conversion rate of pounds to Euros (which I'd copy in for this latest draw too until the UK results came in with a more exact figure), use it to get the sterling value of the individual jackpot prize, set the number of winners (if the winning countries weren't known from the Irish site at that point) in all countries to zero except France, who I assume win the lot :-)
The pages were then re-generated with that info and a newsflash explaining that the French winners are just a guess. If the Irish site then updates with the winning countries, the country split would be fixed, the newsflash changed to remove the "guess" statement and the pages re-generated again.
When the UK results then come in, I have to re-calculate the currency rate and then re-generate the pages once again. Because the UK site is so tardy, at this point we always have all the info we need and the updates are over for the night. I think the reason I ducked out on this "limbo" code is that I have to fill in a rough UK individual prize (the currency rate always fluctuates between draws), the previous draw's currency conversion rate and an intermediate newsflash statement about the French guess.
I then have to redo all those three bits of info again when the UK figures are through (whereas my code by default assumes that if a previous run of the script stored any info, it shouldn't ever be changed - this will have to have an exception for the UK stuff it looks like). Anyway, the upshot is that this would be the last major scraping change needed and we'd see all the info bar the UK stuff by 9.00pm next time once I've completed the changes.
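To make the "limbo" step concrete, here's a minimal Python sketch of the manual procedure described above. It isn't the code I've written yet: the function and field names are hypothetical, the figures are invented, and I've assumed the stored rate is expressed as Euros per pound.

```python
def provisional_jackpot(prize_eur, winners, prev_eur_per_gbp, countries=None):
    """Build a provisional jackpot record before the UK figures arrive.

    prize_eur        individual jackpot prize in Euros
    winners          number of European jackpot winners
    prev_eur_per_gbp previous draw's rate (assumed to be Euros per pound)
    countries        country split if the Irish site has listed it, else None
    """
    prize_gbp = round(prize_eur / prev_eur_per_gbp, 2)
    if countries is None:
        # Winning countries unknown: guess that France wins the lot.
        split, guessed = {"France": winners}, True
    else:
        split, guessed = dict(countries), False
    return {
        "prize_gbp": prize_gbp,
        "split": split,
        "guessed": guessed,      # drives the "just a guess" newsflash
        "provisional": True,     # marks fields to overwrite when UK data lands
    }


# Invented figures: a 1,250,000 Euro prize at a previous rate of 1.25.
print(provisional_jackpot(1_250_000, 2, 1.25))
```

The "provisional" marker is one way to carve out the exception mentioned above: anything flagged that way may be overwritten by a later run, while everything else stays locked once stored.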
For the record, the official site updates tonight were 9.19pm for Thunderball winning numbers (42 seconds for page updates), 9.21pm for Thunderball full results (57 secs for updates), 9.31pm for main Lotto full results (83 seconds) and 9.34pm for HotPicks results (61 seconds). The newsflash indicating main Lotto results were available wasn't updated until the HotPicks results came in, so I probably need to look at that, but otherwise I'd say a pretty good evening of updates.
I'd never really looked at the official site on a Wednesday prior to 10.30pm and it was interesting to note that even though the full results were in by around 9.30pm and the site was announcing a rollover for Saturday's draw, you incredibly aren't allowed to buy tickets for that rollover until 8.00am on Thursday morning. A massive sales opportunity missed by Camelot there surely?
My favourite bit - 90 minutes after all the results are on the official site, the embarrassingly wrong caption "Lotto is closed while we are drawing today's lucky numbers!" is still on their pages. More like "we can't be bothered opening Lotto sales because we're going to spend 9 hours offline each day doing a cold database backup because we haven't learned how to sell tickets continuously using hot backups". Yes, you read that right, you can't buy lottery tickets online a staggering 37.5% of the time - the sales opportunities lost by this don't bear thinking about.
As is usually the case, I disabled the cron job and did manual updates of the data file. I've now fixed all the issues above and run it through the usual tests and it's looking a lot healthier. One slightly dodgy piece of code left is the allocation of countries to the winning jackpots - the Irish site is the only one that lists these, but not in an easily parseable way.
Hence, I've allocated one winner to each country mentioned and if there's any winners left over, they get added to the first country in my data file order that is named by the Irish site (which doesn't include the UK and has France first, so that luckily worked for tonight!). I will try to tidy this up in the future, but it'll have to do for now. The cron job has been re-enabled, so we're now looking at Friday evening as the last "attended" run before I leave it to its own devices.
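In Python terms, the stopgap allocation looks something like the sketch below. It's an illustration only: the country order and names are invented apart from France being first, and it assumes there are at least as many winners as countries named.

```python
def allocate_winners(total_winners, named, file_order):
    """Give one winner to each named country, leftovers to the first named one.

    named       set of countries mentioned by the Irish site
    file_order  countries in (hypothetical) data file order
    Assumes total_winners >= number of named countries.
    """
    split = {country: 0 for country in file_order}
    ordered = [country for country in file_order if country in named]
    if not ordered:
        return split
    for country in ordered:
        split[country] = 1
    leftover = total_winners - len(ordered)
    if leftover > 0:
        # Crude: the extras all go to the first named country in file order.
        split[ordered[0]] += leftover
    return split


# Invented example: 3 winners, with France and Spain named.
order = ["France", "UK", "Spain", "Belgium", "Portugal"]
print(allocate_winners(3, {"France", "Spain"}, order))
```

The weakness is obvious from the example: if the third winner was actually Spanish, France gets the credit anyway.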
Previous month: December 2012