I’ve been trying to teach my computer to play Gin Rummy. Actually, it’s a simplified version of rummy that uses 7-card hands in a straight race to “gin”. For the rules, see below. My hope is to learn more about machine learning, a long-time interest of mine that has recently become accessible to duffers like me through some great frameworks and high-powered CPUs.
I decided to try Gin Rummy because I was inspired by AlphaGo. This has sent me down the path of Reinforcement Learning, where rather than learning patterns from examples, the machine learns behavior through a system of rewards and penalties meant to express the learning objective. The learning algorithm is one that adjusts behavior to maximize the cumulative reward. I have been using RL4J, part of the deeplearning4j family, recently made part of the Eclipse Foundation.
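To make “maximize the reward” concrete, here is a toy sketch of the idea in Java – not RL4J, and nothing to do with cards yet, just the bare trial-and-error loop. The payoffs and class name are made up for illustration: the learner tries two actions, observes rewards, and nudges its estimates until it prefers the better-paying action.

```java
import java.util.Random;

public class RewardLearner {
    // Learns, by trial and error, which of two actions pays better.
    static int learn() {
        double[] trueReward = {0.2, 0.8};   // hypothetical payoffs, unknown to the learner
        double[] estimate  = {0.0, 0.0};    // the learner's running value estimates
        double alpha = 0.1;                 // learning rate
        Random rng = new Random(42);        // fixed seed for repeatability

        for (int step = 0; step < 1000; step++) {
            // Epsilon-greedy: usually exploit the best estimate, sometimes explore.
            int action = (rng.nextDouble() < 0.1)
                    ? rng.nextInt(2)
                    : (estimate[1] > estimate[0] ? 1 : 0);
            double reward = trueReward[action];
            // Nudge the chosen action's estimate toward the observed reward.
            estimate[action] += alpha * (reward - estimate[action]);
        }
        return estimate[1] > estimate[0] ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println("preferred action: " + learn());
    }
}
```

Everything interesting in RL – states, delayed rewards, neural networks as the value estimator – is elaboration on this loop.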
As it is Springtime and thus a season of optimism and generation, I will say here that I hope to write a few blog posts on this topic over the warm weather. I’ll postpone the details until those thing-with-feathers posts; here I will just offer a couple of high-level observations.
The first is that machine learning is still very general-purpose – meaning that you have a lot of work to do to get it ready to do what you actually want. Some problems, like image classification, are well-trod pathways with a lot of tools and code either already-working or at least formerly-working. Others – like card games – don’t necessarily have the same amount of support. A lot of work went into wrestling my problem into a form that seemed like it ought to be machine-learnable.
I started by building a simulator that could run my simplified Gin Rummy game, with pluggable Strategy components for each player. Then I wrapped a Neural Network in such a strategy. This was great for playing Gin Rummy, but learning was a different matter: learning required recasting the game as a Markov Decision Process, in which a “State” is managed by an external learning engine that invokes my code after each decision to assess the associated reward or penalty. Let’s just say it took a lot of testing to get it right.
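The contrast between the two shapes is worth seeing side by side. Below is a toy sketch in Java – the names and interfaces are hypothetical, not my actual simulator or RL4J’s API. In the simulator, the game loop calls a Strategy; in the MDP form, the learning engine holds the state and drives the game one decision at a time, collecting a reward after each step.

```java
// Hypothetical names and shapes -- a sketch of the control inversion,
// not the actual simulator or RL4J interfaces.
public class ControlFlow {

    // Simulator-style: the game loop calls a pluggable Strategy ("push").
    interface Strategy {
        int chooseDiscard(int[] hand);
    }

    // MDP-style: the learning engine holds the state and drives the game
    // one decision at a time, collecting a reward after each ("pull").
    static class ToyGinMdp {
        private int turnsLeft;

        int reset() { turnsLeft = 3; return turnsLeft; }

        StepResult step(int action) {
            turnsLeft--;
            double reward = (action == 0) ? 1.0 : -1.0;  // toy reward signal
            return new StepResult(turnsLeft, reward, turnsLeft == 0);
        }
    }

    static class StepResult {
        final int observation; final double reward; final boolean done;
        StepResult(int o, double r, boolean d) { observation = o; reward = r; done = d; }
    }

    // One full episode under a fixed "always pick action 0" policy.
    static double runEpisode() {
        ToyGinMdp mdp = new ToyGinMdp();
        mdp.reset();
        double total = 0;
        StepResult r;
        do {
            r = mdp.step(0);
            total += r.reward;
        } while (!r.done);
        return total;
    }

    public static void main(String[] args) {
        System.out.println("episode reward: " + runEpisode());
    }
}
```

Turning a simulator written in the first style into something shaped like the second is where most of my wrestling happened.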
The second is that when things don’t work – in other words, the kind of learning you are looking for is just not happening – there are between 2 and 2.8 zillion parameters that can be adjusted to change how the Neural Network works, how the learning works, or both. There is a plethora of different network types, each with its own 0.7–1.2 zillion parameters. Without guidance, it can be hard to even know what questions to ask.
The third is that it’s a hilly solution space out there, with more local minima than are dreamt of in your philosophy. You can stoically wait for your stochastic learning injection to blind-man’s-bluff its way out if you like, but I prefer to get these things addressed while I am still living – so I resorted to a genetic algorithm: spinning up hundreds of models at a time, training them for a while, and selecting the ones that exhibited significant improvement. This was probably the biggest hands-on breakthrough I have achieved so far – combining ML/NN with genetic algorithms to more effectively explore the solution space.
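The select-and-mutate loop itself is simple; here is a toy version in Java. In the real thing each “genome” is a trained model; here, to keep the sketch self-contained, fitness is just closeness to a made-up target value – the structure (evaluate, keep the best half, refill with mutated copies) is the point.

```java
import java.util.Arrays;
import java.util.Random;

// A toy sketch of the select-and-mutate loop; the real version trains
// neural networks, but here "fitness" is just closeness to a target value.
public class GeneticSearch {
    static final double TARGET = 0.75;   // hypothetical optimum
    static final Random RNG = new Random(7);

    static double fitness(double genome) {
        return -Math.abs(genome - TARGET);   // higher is better
    }

    static double evolve(int generations, int popSize) {
        // Start with a random population of candidate "models".
        double[] pop = new double[popSize];
        for (int i = 0; i < popSize; i++) pop[i] = RNG.nextDouble();

        for (int g = 0; g < generations; g++) {
            // Rank the population by fitness, best first...
            Double[] ranked = Arrays.stream(pop).boxed().toArray(Double[]::new);
            Arrays.sort(ranked, (a, b) -> Double.compare(fitness(b), fitness(a)));
            // ...keep the best half, and refill with mutated copies of the survivors.
            for (int i = 0; i < popSize / 2; i++) {
                pop[i] = ranked[i];
                pop[i + popSize / 2] = ranked[i] + RNG.nextGaussian() * 0.05;
            }
        }
        return pop[0];   // best survivor
    }

    public static void main(String[] args) {
        System.out.printf("best genome: %.3f%n", evolve(30, 100));
    }
}
```

Because the survivors are carried over unchanged, the best candidate never gets worse from one generation to the next – which is exactly the property that lets the outer loop climb out of holes the inner (gradient) learning gets stuck in.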
This post is too long already, so I will just leave you with my fourth observation: I wish my stuff ran faster.