So, it’s always interesting and fun to look forward to the next Tournament of Champions. Keith Williams over at The Final Wager put a model together for the 2015 tournament, and I have adapted my Coryat game-prediction model to determine each current 4-timer’s chances of making the next Tournament.
For my main projections, I am assuming that #ToC2017 will air in November 2017, with a cutoff at the end of Season 33. The cutoff could instead be variable, depending on when the field fills with 5-time champions, but for the purposes of this projection I am working with a fixed cutoff date. As of March 6, 2017, that leaves 94 games until the cutoff. The model also assumes that there will be a Teachers Tournament in May 2017, with an automatic qualification spot available.
Thus, for the 15-player field, there are currently 3 automatic qualifiers (and a fourth to come) and 7 superchampions (not counting Cindy Stowell). That leaves four open spots. There are also currently six 4-time champions; the top four are Tim Kutz, Todd Giese, Rob Liguori, and Fred Vaughn. Thus, I am interested in the chances of these four players keeping their spots in the next tournament.
My Approach & Algorithm
If you’ve spent much time around The Jeopardy! Fan, you’ll know that I already have a Coryat Score-based game prediction model, which predicts the length of a champion’s run, given their previous Coryat scores. I have adapted the same model as the basis of my Tournament of Champions model.
The model’s algorithm works as follows:
- Determine the average Coryat score of the current champion. For the first champ in the model, this is known. For future unknown champions, it is randomly generated from a normal distribution, with the mean and standard deviation set from game data from November 23, 2015 (the start of the current qualifying period) to the present. The current mean Coryat is $14,512, with a standard deviation of $4,351.
- Given the Coryat score, determine the player’s chance r of winning their next game (from the trend line formula used by the single-game predictive model), and from there, the length of their run. In this case, I am using the ceiling function on r / (1 - r), which is the sum of the infinite geometric series r + r^2 + r^3 + ... and therefore the expected number of wins for a player who wins each game with probability r.
- If the run reaches 4 games, or 5 games or longer, add the champion to the corresponding counter (we’ll need this info later!), then continue with the next champion until we have reached the number of games that we wish to look at.
- At the end of a set of games, look at the number of 4- and 5-timers. If there are 5 or more 5-timers, then there will either be an early cutoff, or a 5-timer will be left out (neither of these situations is good for any of our 4-timers). If there are exactly 4 5-timers, then again, this isn’t good for our 4-timers, but there won’t be an early cutoff either. It’s when there are 3 or fewer 5-timers that things get a bit more interesting.
- If there are 3 or fewer 5-timers, then we need to see where the randomly generated 4-timers fit into the current standings. Of course, Tim has $107,000, Todd $82,403, Rob $72,601, and Fred $65,700. I have taken 4-day running totals of the champions’ winnings, again from November 23, 2015 to the present, and used their mean and standard deviation (mean: currently $76,309; standard deviation: $17,877) to randomly generate normally distributed winnings for the simulated 4-time champions. These then get compared with the current standings to determine who the last qualifier will be for that simulation. For example, if there are 3 5-timers and the highest simulated 4-timer gets a score of $95,000, then Tim will be the last player in. If there happen to be only 2 5-timers and the highest simulated 4-time score is $75,000, then Todd would be the last player in.
- Once one set of games is finished, re-run the simulation a large number of times, tracking the outcome each time. For my main simulation, I’ll be running 100,000 sets of games.
- Once all of the sets are finished, output the mean and standard deviation of the number of 5-timers and 4-timers, the number of times the simulation predicts an overfull field, and the number of times each of Tim, Todd, Rob, and Fred qualifies.
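The steps above can be sketched roughly as follows. This is a minimal illustration and not the actual script behind the projections. The means, standard deviations, and current totals are the figures quoted above. The trend line formula is not given in this post, so `win_probability` is a hypothetical logistic placeholder with invented constants. The function names, the truncation of a run at the cutoff, and the small default number of sets are likewise assumptions made for the example.

```python
import math
import random

# Figures quoted in the post.
CORYAT_MEAN, CORYAT_SD = 14_512, 4_351
FOUR_DAY_MEAN, FOUR_DAY_SD = 76_309, 17_877
CURRENT_FOUR_TIMERS = {"Tim": 107_000, "Todd": 82_403, "Rob": 72_601, "Fred": 65_700}
OPEN_SPOTS = 4  # 15-player field - 4 automatic qualifiers - 7 superchampions


def win_probability(coryat):
    """HYPOTHETICAL placeholder for the trend line; both constants are invented."""
    r = 1 / (1 + math.exp(-(coryat - 16_000) / 5_000))
    return min(r, 0.95)  # cap keeps r / (1 - r) finite


def run_length(coryat):
    """Projected number of wins: ceiling of r / (1 - r)."""
    r = win_probability(coryat)
    return math.ceil(r / (1 - r))


def simulate_set(games_left, rng):
    """Fill the remaining games with random champions; count 4- and 5-timers."""
    fours = fives = games = 0
    while games < games_left:
        coryat = rng.gauss(CORYAT_MEAN, CORYAT_SD)
        # Assumption: a run still in progress at the cutoff is truncated there.
        wins = min(run_length(coryat), games_left - games)
        games += wins
        if wins >= 5:
            fives += 1
        elif wins == 4:
            fours += 1
    return fours, fives


def qualifiers(new_four_timer_totals, five_timers):
    """Names of the current 4-timers who still fit into the open spots."""
    spots = max(OPEN_SPOTS - five_timers, 0)
    pool = [(total, name) for name, total in CURRENT_FOUR_TIMERS.items()]
    pool += [(total, None) for total in new_four_timer_totals]  # simulated players
    pool.sort(key=lambda entry: entry[0], reverse=True)
    return {name for _, name in pool[:spots] if name is not None}


def run_simulation(n_sets=1_000, games_left=94, seed=0):
    """Repeat the set of games many times and report qualification rates."""
    rng = random.Random(seed)
    counts = dict.fromkeys(CURRENT_FOUR_TIMERS, 0)
    overfull = 0
    for _ in range(n_sets):
        fours, fives = simulate_set(games_left, rng)
        if fives > OPEN_SPOTS:
            overfull += 1
        totals = [rng.gauss(FOUR_DAY_MEAN, FOUR_DAY_SD) for _ in range(fours)]
        for name in qualifiers(totals, fives):
            counts[name] += 1
    rates = {name: count / n_sets for name, count in counts.items()}
    return rates, overfull / n_sets


if __name__ == "__main__":
    rates, overfull_rate = run_simulation()
    print(rates, overfull_rate)
```

Because the placeholder trend line is invented, the numbers this prints will not match the projections on this site. The sketch is only meant to show how the counters and the standings comparison fit together.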
This is being implemented using a Python script, and will generally be updated on a weekly basis, on both the ToC Tracker page and the Week In Review.
March 9, 2017 update: Version 2 of this model makes the following changes to the algorithm, all to project the chances of the current champion in the early stages of their run:
- For the current champion, randomly generate a run length. To do this, I first generate an average Coryat from a normal distribution centered halfway between the player’s average Coryat and the overall mean, with a standard deviation of half of the overall standard deviation. Once that average Coryat is determined, I determine the win chances as above and use the floor function on the infinite geometric series formula given above to generate the length of the run for each simulation.
- If the run length is 5 games, add them to the counter and list them separately as having qualified (we’ll need this later). If the run length is 4 games, generate their 4-game total. The normal distribution here is centered on their current total plus an extra $19,077 for each further win needed to reach 4, with a smaller standard deviation the closer they are to 4 wins. Once the 4-game total is generated, we slot them into our list of 4-timers, sliding the other 4-timers on our list down as required.
- We then run the rest of the simulation as before, making the comparisons, and outputting the percentage of simulations in which Tim, Todd, Rob, Fred, and the current champion each qualify for the ToC.
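As a rough sketch of the Version 2 changes, assuming the same hypothetical logistic placeholder for the trend line (its constants are invented, since the real formula is not given here): the current champion’s Coryat is drawn halfway between their own average and the overall mean, the run length uses the floor instead of the ceiling, and a projected 4-game total is slotted into the standings. The post only says that the standard deviation shrinks as the champion approaches 4 wins, so the square-root scaling below is an assumption, as are all of the function names.

```python
import math
import random

# Figures quoted in the post.
CORYAT_MEAN, CORYAT_SD = 14_512, 4_351
FOUR_DAY_SD = 17_877
PER_WIN_BONUS = 19_077  # extra winnings added per further win needed to reach 4


def win_probability(coryat):
    """HYPOTHETICAL placeholder for the trend line; both constants are invented."""
    r = 1 / (1 + math.exp(-(coryat - 16_000) / 5_000))
    return min(r, 0.95)  # cap keeps r / (1 - r) finite


def sample_current_champion_run(avg_coryat, rng):
    """Draw a Coryat between the champ's average and the field's, then floor r / (1 - r)."""
    center = (avg_coryat + CORYAT_MEAN) / 2
    coryat = rng.gauss(center, CORYAT_SD / 2)
    r = win_probability(coryat)
    return math.floor(r / (1 - r))


def sample_four_game_total(current_total, wins_so_far, rng):
    """Project a 4-game total from the champion's winnings so far."""
    remaining = max(4 - wins_so_far, 0)
    mean = current_total + PER_WIN_BONUS * remaining
    # Assumption: the spread shrinks with the square root of the wins still needed.
    sd = FOUR_DAY_SD * math.sqrt(remaining / 4)
    return rng.gauss(mean, sd)


def slot_in(standings, name, total):
    """Insert a new 4-timer and re-rank, sliding lower totals down."""
    return sorted(standings + [(name, total)], key=lambda entry: entry[1], reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    standings = [("Tim", 107_000), ("Todd", 82_403), ("Rob", 72_601), ("Fred", 65_700)]
    # "Champ" and the figures passed in below are made up for illustration.
    if sample_current_champion_run(18_000, rng) == 4:
        total = sample_four_game_total(45_000, 2, rng)
        standings = slot_in(standings, "Champ", total)
    print(standings)
```

The floor can return a run shorter than the wins the champion already has. How the real script handles that case is not stated, so the sketch leaves it alone.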
Day By Day Predictions:
(Best viewed on Desktop or Tablet)
(Note: The annotations are of various champions whose runs had a significant effect on qualifying chances.)
This is some beautiful number-crunching! Always fun to pore over another stat geek’s work.
Logistic regression for the win!!!
Great analysis, Andy!