[FOM] NIM/Deep Learning
Joe Shipman
joeshipman at aol.com
Sun Oct 15 22:37:34 EDT 2017
Binary coding of nim pile sizes is cheating if the bits of the codes are accessible. If they aren’t, then I would expect this to be a difficult problem because of the “global” nature of the XOR function, but DeepMind could probably do it in the same way they created a Go champion.
― JS
Sent from my iPhone
> On Oct 15, 2017, at 3:48 PM, Harvey Friedman <hmflogic at gmail.com> wrote:
>
> My original posting on NIM and AI
> http://www.cs.nyu.edu/pipermail/fom/2017-September/020612.html spurred
> some other FOM postings, and my followup posting is at
> http://www.cs.nyu.edu/pipermail/fom/2017-September/020619.html
>
> Since then, I have been in correspondence about it with Scott Anderson
> and a specialist in Deep Learning, John Schulman. John has kindly
> given me permission to post his correspondence with me and Scott on
> this topic.
>
> FROM JOHN SCHULMAN
> October 14, 2017
> 6:11PM
>
> I think it'd be a fun project to try the latest algorithms on Nim and
> see what happens. It'd serve as a good unit test for whether one has a
> convergent algorithm or not.
>
> My first thought was that it'd be really hard to define the inputs and
> outputs in a way that are compatible with neural networks--variable
> sized objects are always hard to deal with--but in this case there's a
> neat solution.
>
> You deal with the variable sized input (game state) by using a
> recurrent neural network and feeding it one pile at a time, and you
> deal with the variable sized output by defining your neural network as
> a function that maps game state -> scalar score. This approach (called
> using an "afterstate") was used by TD-gammon, the neural network
> trained to play pro-level Backgammon back in the early 90s.
>
> Here's pseudocode for the network that evaluates the board state:
>
> x_i: binary vector representing i-th pile
>
> h_i: real vector, network’s internal state
>
> score: real scalar, score assigned to game state
>
> Update, Score: One or two layer neural nets
>
> theta_run, theta_score: parameters of neural nets
>
> h_0 = Zeros(k)
>
> h_1 = Update(h_0,x_0,theta_run)
>
> …
>
> h_n = Update(h_n-1,x_0,theta_run)
>
> score = Score(h_n,theta_score)
>
> Here, the Update needs to perform the XOR operation, and Score just
> needs to sum up the bits to get the nim-sum. XORs aren't easy for
> neural nets, so this might be hard to learn, but it should be doable
> with enough data.
>
> TD-Gammon, AlphaGo, and OpenAI's Dota bot all are trained using
> self-play--you play the agent against itself or a randomly-selected
> older version of itself, and then you update the agent as if the other
> player is fixed.
> This converges under some conditions which aren't satisfied in
> practice, but empirically it works pretty well.
>
> John
>
> MY REPLY:
> October 15, 2017
> 3:23AM
>
> Thanks for getting this to me!
>
> I had previously started a discussion of NIM/deep learning on the FOM
> email list. I don't think the discussion got the attention of deep
> people in deep learning (smile), but there are a lot of readers with a
> casual interest and some awareness and some knowledge of deep learning
> who do read FOM regularly.
>
> http://www.cs.nyu.edu/pipermail/fom/2017-September/020612.html
>
> http://www.cs.nyu.edu/pipermail/fom/2017-September/020619.html
>
> By the way, the second link here at the end has some NIM variants that
> I have a feeling might be "humanly deep", and perhaps while humans try
> to wrap their heads around them, or such things, we can run the
> machines on them in tandem, and compare their progress. Man v. machine
> on them as the years go by!
>
> E.g., maybe what happens is that until the humans make the ultimate
> breakthrough in these NIM variants and related games, the machines
> crush us. After that, we crush them!
>
> E.g., maybe the humans are completely clueless until the NIM variants
> get small after a lot of plays, and maybe the same for the machines,
> but the machines can crush the game by exhaustive search of the game
> tree at sizes where the humans are clueless.
>
> MORE FROM JOHN SCHULMAN
> October 15, 2017
> 12:49PM
>
> I did some preliminary experimentation: https://github.com/joschu/nim
>
> The first question was whether one could train a neural net with
> gradient descent to approximate the nim-sum: elementwise XOR of a
> variable number of input bit vectors.
>
> I tried using two different neural network structures, and both worked
> straightforwardly, though one learned 5x faster than the other. The
> last training run (bottom of the ipynb file) corresponds to up to 10
> 5-bit vectors.
>
> Learning the optimal strategy purely with self-play (as in TD-Gammon
> or the Dota bot) would be far less direct and would probably take a
> huge number of samples, but I see no fundamental reason why it
> wouldn't work, given a neural network that is able to approximate the
> nim-sum.
>
> I don't think it's that surprising that we can approximate the optimal
> policies for Nim or its variants with neural nets. I think the
> ultimate challenge would be to have a neural net that actually
> searches for and verifies the mathematical rule--i.e., does some kind
> of approximate Solomonoff induction. There have been some attempts to
> do this kind of things (see "neural turing machines", "reinforcement
> learning neural turing machines", etc) but there's a ways to go.
>
> Scott--not sure if successful learning of Nim would be paper-worthy
> for the machine learning community, unless there's something
> nontrivial and general that goes into the solution. (Maybe the result
> would be more interesting to other fields?) But at the very least, it
> could be a fun blog post.
>
> Harvey Friedman
> _______________________________________________
> FOM mailing list
> FOM at cs.nyu.edu
> http://www.cs.nyu.edu/mailman/listinfo/fom
More information about the FOM
mailing list