Monday, January 4, 2016

Applying combinatorics to DFS (DraftKings) NFL and NHL lineups, using GNU tools and the R programming language (Part 1)

Fair warning, I am a lazy developer and a bit of a crazy person. If you want to know the extent, I recommend having a look at the “about me” section. If you are comfortable with that, please proceed. But you've been warned!

This article is about using bad math and poorly conceived ideas to make a guess for Daily Fantasy Sports betting, e.g. DraftKings.

Recently someone gifted me a few dollars' worth of plays on DraftKings. Being unemployed with time on my hands, and knowing nothing about sports, I decided to apply my brand of math to this. I've seen nearly every episode of Numb3rs...sort of, so I was like, combinatorics...that's the way to go.

Combinatorics is a branch of mathematics concerning the study of "finite and discrete structures."

Based on my 8th-grade understanding of mathematics, that means combinatorics is essentially the mixing of lists. In this case it's mixing lists of players to find the lineup with the most points that still fits under the salary cap on DraftKings. You know, using math to do something noble.
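
To make "mixing lists" a bit more concrete, here is a minimal R sketch on a toy problem. Every player name, price, point total, and the cap itself are invented for illustration; the only real parts are the base R functions combn and choose.

```r
# Toy data: every name and number here is invented for illustration.
players <- data.frame(
  Name   = c("A", "B", "C", "D"),
  Price  = c(6000, 5000, 4000, 3000),
  Points = c(20, 18, 12, 9),
  stringsAsFactors = FALSE
)
cap <- 10000  # hypothetical cap for a two-player "lineup"

# combn() returns one column per way of choosing 2 of the 4 rows;
# choose(4, 2) says how many columns to expect.
pairs <- combn(nrow(players), 2)
n_pairs <- choose(4, 2)

# Total price and points for each pair.
price  <- apply(pairs, 2, function(idx) sum(players$Price[idx]))
points <- apply(pairs, 2, function(idx) sum(players$Points[idx]))

# Keep pairs at or under the cap, then take the one with the most points.
ok   <- which(price <= cap)
best <- ok[which.max(points[ok])]
best_pair <- players$Name[pairs[, best]]
print(best_pair)
```

The real lineups work the same way, just with more lists and far more combinations.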

I'm not very good at many things, but cutting up a CSV (comma-separated values, i.e. a text file with a consistent format) is an exception, and DraftKings lets you export a CSV file with the basic stats, e.g. player position, player name, average points per game, salary for purchase in the lineup, etc.


First I cut up the CSV with good ol' awk and sed:

awk -F "\"*,\"*" '{print $1,"," $2,"," $3,"," $5}' DKSalaries.csv | sed "s/\"QB /QB/g" | sed "s/\"RB /RB/g" | sed "s/\"WR /WR/g" | sed "s/\"TE /TE/g" | sed "s/\"DST /DST/g" > nflredux


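And then, to add the FLEX players (on DraftKings the NFL FLEX slot takes an RB, WR, or TE, so quarterbacks are left out; nflflex is just a scratch file so that nflredux isn't read and appended to at the same time):

grep -E "^(WR|RB|TE)" nflredux | sed -E "s/^(WR|RB|TE)/FLEX/" > nflflex && cat nflflex >> nflredux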
I did a similar thing to the NHL CSVs. If for some odd reason you need these one-liners, hit me up on Twitter @bsdpunk.
So I altered an R script I found on Stack Overflow. The original assumes a sample size of 4; mine is expanded so it can actually find real matches with a larger sample size. To get true results you need something more than a 10-year-old laptop; in fact you would probably be best off rewriting it for parallel processing and running it on a Parallella.
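
If you would rather skip awk and sed, the same trimming can be sketched in R. This is only a rough equivalent, not what I actually ran: the inline CSV rows are invented, the column names are my assumption about the DraftKings export, and flex_from is a parameter you should adjust to your contest.

```r
# Inline stand-in for DKSalaries.csv; the rows are invented and the column
# names are an assumption about the export format.
csv_text <- '"Position","Name","Salary","GameInfo","AvgPointsPerGame","teamAbbrev"
"QB","Quarterback One",7000,"AAA@BBB",21.5,"AAA"
"RB","Runner One",6000,"AAA@BBB",15.0,"AAA"
"WR","Receiver One",5500,"CCC@DDD",13.2,"CCC"
"TE","Tight One",3500,"CCC@DDD",8.1,"DDD"
"DST","Defense One",2800,"AAA@BBB",7.0,"BBB"'

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Keep the four columns the awk command prints, renamed to what the R
# scripts expect (Position, Price, Points).
dk <- data.frame(
  Position = raw$Position,
  Name     = raw$Name,
  Price    = raw$Salary,
  Points   = raw$AvgPointsPerGame,
  stringsAsFactors = FALSE
)

# Duplicate the FLEX-eligible rows under a FLEX label.
# Assumption: FLEX takes RB, WR, or TE; change flex_from if yours differs.
flex_from <- c("RB", "WR", "TE")
flex <- dk[dk$Position %in% flex_from, ]
flex$Position <- "FLEX"
dk <- rbind(dk, flex)

print(table(dk$Position))
```

Doing it this way also gives the columns the names the scripts use, so there is no header to fix by hand.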
I believed the current version of R tries to take advantage of multi-core, since the “lapply” function runs very quickly. As far as I can tell, though, base R's lapply runs on a single core (the multi-core variant, mclapply, lives in the parallel package); it is quick here simply because it only touches a handful of small data frames. The slowness is due to my shitty for loop, and I am not even sure how you would parallelize that, maybe RabbitMQ... I'll do that in the future if my blog generates revenue (which means I will probably never do that).
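# Read the trimmed CSV; it needs Position, Price and Points columns.
dk <- read.csv("nflredux")

# Split by position and keep 15 rows of each, in random order.
dk <- lapply(split(dk, dk$Position), function(x) x[sample(15), ])

# Put the position groups in the same order as `rows` below.
dk <- dk[c("QB","WR","RB","TE","DST","FLEX")]

# Number of lineups the grid below will hold (19,440,000).
15*choose(10,3)*choose(10,2)*15*4*4

# Row choices per group: 1 QB, 3 WR, 2 RB, 1 TE, 1 DST, 1 FLEX.
rows <- list(t(1:15), combn(10,3), combn(10,2), t(1:15), t(1:4), t(1:4))
dims <- sapply(rows, NCOL)

# One row of `inds` per lineup: which column of each `rows` entry to use.
inds <- expand.grid(mapply(`:`, 1, dims))
dim(inds)

# Build lineup number `ind` as a data frame of players.
extract <- function(ind) {
    g <- inds[ind,]
    # Loop over all six groups so the FLEX slot is included.
    do.call(rbind, lapply(seq_along(rows), function(i) dk[[i]][rows[[i]][,g[[i]]], ]))
}

# win holds c(lineup number, total price, total points) of the best so far.
win <- c(0, 0, 0)
# Stops at a hard-coded 17,000,000 lineups, short of the full nrow(inds).
for(i in 1:17000000)
{
    extracted <- extract(i)
    # A lineup that spends exactly the cap is still legal.
    if(sum(extracted$Price) <= 50000){
        if(win[3] < sum(extracted$Points)){
            win <- c(i, sum(extracted$Price), sum(extracted$Points))
            print(win)
            print(extracted)
        }
    }
}

print(win)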
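
On the question of how you would parallelize a loop like the one above: one possible shape, and I stress possible, is to cut the lineup numbers into chunks, find the best lineup in each chunk on a separate core, and then compare the chunk winners. The sketch below uses mclapply from the parallel package that ships with R. score_lineup is a made-up stand-in that fakes prices and points from the lineup number; in the real script it would call extract(i) and sum Price and Points.

```r
library(parallel)

# Hypothetical stand-in for "score lineup i": returns c(index, price, points).
# The arithmetic only fakes some numbers; it has no meaning.
score_lineup <- function(i) {
  c(i, 40000 + (i * 37) %% 15000, (i * 7919) %% 200)
}

# Best lineup within one chunk of indices, under the salary cap.
best_in_chunk <- function(idx, cap = 50000) {
  best <- c(0, 0, 0)
  for (i in idx) {
    s <- score_lineup(i)
    if (s[2] <= cap && s[3] > best[3]) best <- s
  }
  best
}

n <- 10000  # small on purpose so the sketch runs in a moment

# mclapply forks processes, which does not work on Windows, so fall back to 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Eight contiguous chunks of lineup numbers.
chunks <- split(seq_len(n), cut(seq_len(n), 8, labels = FALSE))

# Each chunk is searched independently; the chunk winners are then compared.
partial <- mclapply(chunks, best_in_chunk, mc.cores = cores)
win <- partial[[which.max(sapply(partial, `[`, 3))]]
print(win)
```

This works because the loop only keeps a running maximum, so the chunks never need to talk to each other. Whether it helps enough on an old laptop is another matter.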


Here is my NHL script:
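# Read the trimmed NHL CSV; it needs Position, Price and Points columns.
dk <- read.csv("thisone")
# Split by position and keep 15 rows of each, in random order.
dk <- lapply(split(dk, dk$Position), function(x) x[sample(15), ])

# Put the position groups in the same order as `rows` below.
dk <- dk[c("G","W","C","D","U")]
# Number of lineups the grid below will hold (17,199,000).
15*choose(15,3)*choose(15,2)*choose(4,2)*4

# Row choices per group: 1 G, 3 W, 2 C, 2 D, 1 U.
rows <- list(t(1:15), combn(15,3), combn(15,2), combn(4,2), t(1:4))

dims <- sapply(rows, NCOL)
# One row of `inds` per lineup: which column of each `rows` entry to use.
inds <- expand.grid(mapply(`:`, 1, dims))

dim(inds)

# Build lineup number `ind` as a data frame of players.
extract <- function(ind) {
    g <- inds[ind,]
    do.call(rbind, lapply(1:5, function(i) dk[[i]][rows[[i]][,g[[i]]], ]))
}

# Sanity check: look at the first lineup.
extract(1)

# win holds c(lineup number, total price, total points) of the best so far.
win <- c(0, 0, 0)
# Stops at a hard-coded 17,000,000, just short of the full nrow(inds).
for(i in 1:17000000)
{
    extracted <- extract(i)
    #print(i)
    #print(sum(extracted$Price))
    # A lineup that spends exactly the cap is still legal.
    if(sum(extracted$Price) <= 50000){
        if(win[3] < sum(extracted$Points)){
            #print(sum(extracted$Points))
            win <- c(i, sum(extracted$Price), sum(extracted$Points))
            print(win)
            print(extracted)
        }
    }
}

print(win)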
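
A side note on memory: both scripts build inds with expand.grid, which means holding millions of rows just to look up one lineup at a time. A sketch of an alternative, assuming you only ever need the indices for one lineup number, is to decode that number on demand with base R's arrayInd. The helper name decode and the toy sizes are mine, not part of the scripts.

```r
# Number of choices per roster group; toy sizes so the check below is quick.
dims <- c(3, 4, 2)

# What the scripts do: materialize every combination of indices up front.
inds <- expand.grid(lapply(dims, seq_len))

# Alternative: decode lineup number i into one index per group on demand.
# expand.grid varies its first column fastest, which matches the
# column-major order that arrayInd() uses.
decode <- function(i, dims) as.integer(arrayInd(i, dims))

# Check that the two agree for every lineup number in the toy grid.
same <- all(sapply(seq_len(prod(dims)), function(i) {
  all(decode(i, dims) == unlist(inds[i, ]))
}))
print(same)

# With the NHL sizes, any lineup can be looked up without storing the grid.
nhl_dims <- c(15, choose(15, 3), choose(15, 2), choose(4, 2), 4)
print(prod(nhl_dims))              # size of the full search space
print(decode(prod(nhl_dims), nhl_dims))  # indices of the very last lineup
```

Inside extract, g would then come from decode(ind, dims) instead of inds[ind,], and the big table never has to exist.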

As I said, these probably won't give you the best results, because on my shitty laptop the scripts take far too long to run; in fact a full run would take longer than the time between when the lineups are posted and when the game goes live. So even with parallel processing, my current hardware couldn't get through the complete problem set in that window. With a Parallella I probably could...with, you know, revenue.
I could do a lot of pre-calculation on the players before the game, sort of like an incremental development thing. And I actually have access to a lot of older and newer NHL stats despite most sports APIs charging 360 dollars a month! THE FUCK? INFORMATION IS FREE...WE ALREADY WON THIS WAR! Luckily I know an autistic man who loves hockey and can tell me what the weather was like when the Predators stomped the Rangers.
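
One shape the pre-calculation could take, sketched on invented numbers: work out the total price and points of every way to fill each roster group ahead of time, so that scoring a lineup becomes simple addition instead of building a data frame. The pools, the cap, and the helper group_totals are all hypothetical; the toy roster is two wingers plus one center.

```r
# Toy pools; all numbers are invented.
wingers <- data.frame(Price  = c(7000, 6000, 5000, 4000),
                      Points = c(5.0, 4.2, 3.1, 2.5))
centers <- data.frame(Price  = c(8000, 6500, 5000),
                      Points = c(6.0, 4.8, 3.0))
cap <- 19000  # hypothetical cap for this toy roster

# The part that can run ahead of time: total price and points for every
# way of picking k players from a pool. combn() applies FUN to each combination.
group_totals <- function(pool, k) {
  data.frame(
    Price  = combn(pool$Price,  k, FUN = sum),
    Points = combn(pool$Points, k, FUN = sum)
  )
}
w   <- group_totals(wingers, 2)  # one row per winger pair
cen <- group_totals(centers, 1)  # one row per center

# Combining groups is then addition over a grid: rows are winger pairs,
# columns are centers.
price  <- outer(w$Price,  cen$Price,  `+`)
points <- outer(w$Points, cen$Points, `+`)
points[price > cap] <- -Inf      # rule out lineups over the cap

# Position of the best remaining lineup: c(winger pair, center).
best <- arrayInd(which.max(points), dim(points))
best_wingers <- combn(nrow(wingers), 2)[, best[1]]
print(best_wingers)
print(best[2])
print(price[best])
```

With the real group sizes the full grid is still large, so this only moves the cost around; it does avoid calling rbind seventeen million times.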
So there's a lot more of this particular type of number crunching to do. And I think there's a lot of game theory you could use to make these results much better, like weighting players on highly offensive teams higher, and increasing the weight if they are playing a team not considered to have a good defense. I don't know how you would mathematically determine who is highly offensive or who has a lesser defense; if someone knows a way to calculate that, or has an idea, shoot it at me. You could also use something like a normal distribution to show a bell curve of a given player's performance, and make decisions based on that.
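
Here is a rough sketch of the bell curve idea, with invented game logs for two hypothetical players who have the same average. It assumes fantasy points are roughly normally distributed, which is a big assumption and probably not true for real players; p_at_least and target are names I made up for the example.

```r
# Invented game logs: fantasy points per game for two hypothetical players.
steady  <- c(14, 15, 16, 15, 14, 16, 15)
streaky <- c(2, 30, 5, 28, 4, 33, 3)

# Fit a normal curve to a player using the sample mean and standard
# deviation, then ask how likely a game of at least `target` points is.
p_at_least <- function(scores, target) {
  pnorm(target, mean = mean(scores), sd = sd(scores), lower.tail = FALSE)
}

target <- 20
p_steady  <- p_at_least(steady, target)
p_streaky <- p_at_least(streaky, target)
print(c(mean_steady = mean(steady), mean_streaky = mean(streaky)))
print(c(steady = p_steady, streaky = p_streaky))
```

Both players average the same, so the scripts above would treat them as equal, but under this model only the streaky one has a real chance at a big game. Which of the two you want depends on the kind of contest you are entering.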
Of course combinatorics doesn't take into account the possibility of rigged matches, cheating, or players not playing to their full potential. Though it could be argued that if they play this way consistently, it is reflected in the stats.
OK, so all of this is PART 1, i.e. written before anything was actually played. I'll update you with some stats in PART 2.

***The NFL script is actually broken; I will address this in a future post.
