Learning Machine Learning with Gin Rummy

I’ve been trying to teach my computer to play Gin Rummy. Actually, it’s a simplified version of rummy that uses 7-card hands and simply races to “gin”. For the rules, see below. My hope is to learn more about machine learning, a long-time interest of mine that has recently become accessible to duffers like myself through some great frameworks and high-powered CPUs.

I decided to try Gin Rummy because I was inspired by AlphaGo. This has sent me down the path of Reinforcement Learning, where rather than learning patterns from examples, the machine learns behavior through a system of rewards and penalties meant to express the objective of the learning. The machine’s learning algorithm is one that maximizes the reward. I have been using RL4J, part of the deeplearning4j family, which was recently made part of the Eclipse Foundation.

As it is Springtime and thus a season of optimism and generation, I will say here that I hope to write a few blog posts on this topic over the warm weather. I’ll postpone the details until those thing-with-feathers posts; here I will just offer a couple of high-level observations.


First Observation

The first is that machine learning is still very general-purpose – meaning that you have a lot of work to do to get it ready to do what you actually want. Some problems, like image classification, are well-trod pathways with a lot of tools and code either already-working or at least formerly-working. Others – like card games – don’t necessarily have the same amount of support. A lot of work went into wrestling my problem into a form that seemed like it ought to be machine-learnable.

I started by building a simulator that could operate my simplified Gin Rummy game, with pluggable Strategy components for each player. Then I wrapped a Neural Network in such a strategy. This was great for playing Gin Rummy, but learning was a different matter: for that I needed a Markov Decision Process, which turns the control flow inside out. Instead of my simulator driving play, a “State” is managed by an external learning engine, which invokes my code after each decision to apply the move and assess the associated reward/penalty. Let’s just say it took a lot of testing to get it right.
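To make that inversion concrete, here is a rough sketch of what such an MDP wrapper can look like with RL4J. Everything game-specific – GinRummyGame, GinState, the encode/apply/isGin helpers, the action and observation sizes, and the reward values – is a hypothetical stand-in rather than my actual code, and RL4J’s package layout has shifted between releases, but the shape of the interface is the point: the learning engine owns the loop and calls reset() and step(); the simulator just answers.

```java
// Package paths are those of recent RL4J releases and may differ by version.
import org.deeplearning4j.gym.StepReply;
import org.deeplearning4j.rl4j.mdp.MDP;
import org.deeplearning4j.rl4j.space.ArrayObservationSpace;
import org.deeplearning4j.rl4j.space.DiscreteSpace;
import org.deeplearning4j.rl4j.space.Encodable;
import org.deeplearning4j.rl4j.space.ObservationSpace;

// Hypothetical stand-in for the simulator built earlier; real logic omitted.
class GinRummyGame {
    double[] encode() { return new double[52]; }
    void apply(int action) { /* advance the game by one move */ }
    boolean isGin() { return false; }
    boolean isOver() { return false; }
}

// Hypothetical observation: the current hand encoded as a fixed-length vector.
class GinState implements Encodable {
    private final double[] vector;
    GinState(double[] vector) { this.vector = vector; }
    @Override public double[] toArray() { return vector; }
}

public class GinRummyMdp implements MDP<GinState, Integer, DiscreteSpace> {
    private static final int NUM_ACTIONS = 16; // placeholder: draw/discard choices
    private static final int OBS_SIZE = 52;    // placeholder: one slot per card

    private final DiscreteSpace actionSpace = new DiscreteSpace(NUM_ACTIONS);
    private final ObservationSpace<GinState> observationSpace =
            new ArrayObservationSpace<>(new int[] { OBS_SIZE });

    private GinRummyGame game; // the pluggable simulator from the step above

    @Override public ObservationSpace<GinState> getObservationSpace() { return observationSpace; }
    @Override public DiscreteSpace getActionSpace() { return actionSpace; }

    @Override public GinState reset() {
        game = new GinRummyGame();          // deal a fresh hand
        return new GinState(game.encode()); // hypothetical encoding helper
    }

    @Override public StepReply<GinState> step(Integer action) {
        game.apply(action); // the engine has decided; the simulator applies the move
        double reward = game.isGin() ? 1.0 : -0.01; // placeholder reward shaping
        return new StepReply<>(new GinState(game.encode()), reward, isDone(), null);
    }

    @Override public boolean isDone() { return game.isOver(); }
    @Override public MDP<GinState, Integer, DiscreteSpace> newInstance() { return new GinRummyMdp(); }
    @Override public void close() { }
}
```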


Second Observation

The second is that when things don’t work – in other words, the kind of learning you are looking for is just not happening – there are between 2 and 2.8 zillion parameters that can be adjusted to change how the Neural Network works, how the learning works, or both. There is a plethora of different types of Networks, each with its own 0.7–1.2 zillion parameters. Without guidance, it can be really hard to even know what questions to ask.
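To give a flavor of the parameter space, here is a small slice of a deeplearning4j network configuration – nothing here is from my actual model, and the values are arbitrary placeholders – where nearly every builder call is a knob. And this is before you even reach the learning-side knobs like discount factors and exploration schedules.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class KnobsEverywhere {
    public static MultiLayerConfiguration build() {
        return new NeuralNetConfiguration.Builder()
                .seed(42)                            // knob: random seed
                .updater(new Adam(1e-3))             // knobs: optimizer and learning rate
                .weightInit(WeightInit.XAVIER)       // knob: weight initialization
                .l2(1e-4)                            // knob: regularization strength
                .list()
                .layer(new DenseLayer.Builder()      // knobs: how many layers, how wide
                        .nIn(52).nOut(128)           // placeholder sizes
                        .activation(Activation.RELU) // knob: activation function
                        .build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE) // knob: loss
                        .nIn(128).nOut(16)
                        .activation(Activation.IDENTITY)
                        .build())
                .build();
    }
}
```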


Third Observation

The third is that it’s a hilly solution space out there, with more local minima than are dreamt of in your philosophy. You can stoically wait for your stochastic learning injection to blind-man’s-bluff its way out if you like, but I prefer to get these things addressed while I am still living – so I resorted to a genetic algorithm, spinning up hundreds of models at a time, training them for a while, and selecting the ones that exhibited significant improvement. This was probably the biggest hands-on breakthrough I have achieved so far – combining ML/NN with genetic algorithms to more effectively explore the solution space.
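For the curious, the selection loop itself is nothing exotic. Here is a stripped-down sketch of the idea, not my actual code: trainBriefly and fitness are hypothetical stand-ins for “train for a while” and “measure improvement” (however you choose to score it), and mutation simply adds Gaussian noise to a clone’s weights via nd4j.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class GeneticSearch {

    // "Mutation": copy a parent and jitter its flattened weights with Gaussian noise.
    static MultiLayerNetwork mutate(MultiLayerNetwork parent, double sigma) {
        INDArray noisy = parent.params().add(
                Nd4j.randn(parent.params().shape()).muli(sigma));
        MultiLayerNetwork child = parent.clone();
        child.setParams(noisy);
        return child;
    }

    // One generation: train everyone briefly, keep the fittest few,
    // and refill the population with mutated copies of the survivors.
    static List<MultiLayerNetwork> evolve(List<MultiLayerNetwork> population,
                                          UnaryOperator<MultiLayerNetwork> trainBriefly,   // hypothetical
                                          ToDoubleFunction<MultiLayerNetwork> fitness,     // hypothetical
                                          int survivors, double sigma) {
        population.replaceAll(trainBriefly);
        population.sort(Comparator.comparingDouble(fitness).reversed());

        List<MultiLayerNetwork> next = new ArrayList<>(population.subList(0, survivors));
        int parent = 0;
        while (next.size() < population.size()) {
            next.add(mutate(next.get(parent++ % survivors), sigma));
        }
        return next;
    }
}
```

Selection pressure does the exploring; the cost is that every generation pays for hundreds of training runs.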

This post is too long already, so I will just leave you with my fourth observation – I wish my stuff ran faster.


Ed Schwarz

Ed has been delivering software systems for an undisclosed number of years. Before co-founding Gorilla Logic, he was Director of eBusiness Consulting at Sun Microsystems, and back in the day he was on Wall Street exec’ing and tech’ing at Lehman Brothers and Moody’s Investors Service. Ed lives in New York, so don’t try anything funny.
