My sister seems to be a die-hard fan of the Rubik’s cube. I googled it and found some algorithms for solving it here.
I am interested in applying the PURR-PUSS machine learning algorithm to this. It would be really interesting to know whether a machine can replicate the same algorithms or perhaps discover some others. I think one of the most challenging aspects for a PURR-PUSS designer is designing the best associative template for the Rubik’s cube.
Should the events be registered as the positions of the different colours on 3 different faces?
How can we represent the associated actions? As 1 global action type, or as different types of rotational action based on the rotation axes?
Assuming we have a perfect event-action type representation, how should we organise the association template? Does a template in which the current state of the Rubik’s cube is associated with a certain action suffice? Or should we also consider past events, because at least 1 previous event can affect the associated action?
For now, this is what I think.
Assuming that we have Red, Green and Blue blocks on a 3-dimensional 3×3 Rubik’s cube, the most important information is as follows:
-------------
| a | b | c | <- one side; each position in {a, b, c, …} holds an R, G or B value.
-------------
| d | e | f |
-------------
| g | h | i |
-------------
3 event types:
top[n] := (a,b,c,d,e,f,g,h,i)
left[n] := (a,b,c,d,e,f,g,h,i)
right[n] := (a,b,c,d,e,f,g,h,i)
3 action types:
rotateHorizontalToLeft[n] := (rowPosition, amountOfRotation=1,2,3)
rotateVerticalToUp[n] := (columnPosition, amountOfRotation=1,2,3)
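To make the notation above a bit more concrete, here is a minimal Python sketch of the event and action types. It is only an illustration: the names Face, Event, Action and all_actions are my own, row and column positions are assumed to be numbered 0 to 2, amountOfRotation is assumed to mean quarter turns, and only the two action types actually listed above are included.

```python
from itertools import product
from typing import NamedTuple, Tuple

# Nine colour letters in the order (a, b, c, d, e, f, g, h, i).
Face = Tuple[str, ...]

class Event(NamedTuple):
    """One observation of the three visible faces at step n."""
    top: Face
    left: Face
    right: Face

class Action(NamedTuple):
    kind: str      # "rotateHorizontalToLeft" or "rotateVerticalToUp"
    position: int  # row or column index, assumed to be 0, 1 or 2
    amount: int    # assumed to be quarter turns: 1, 2 or 3

def all_actions():
    """Enumerate every (kind, position, amount) combination."""
    kinds = ("rotateHorizontalToLeft", "rotateVerticalToUp")
    return [Action(k, p, a) for k, p, a in product(kinds, range(3), (1, 2, 3))]

actions = all_actions()
print(len(actions))  # 18
```

Under these assumptions there are 2 x 3 x 3 = 18 distinct actions to choose from at each step.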
At the moment, I think the following single-step association template suffices:
top[n], left[n], right[n] =>rotateHorizontalToLeft[n] / rotateVerticalToUp[n]
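A single-step template like this can be pictured as a lookup table from the current event to the actions that have followed it. The sketch below is not the real PURR-PUSS implementation, just a plain counting table to show the idea; the class name SingleStepTemplate and its methods learn and predict are hypothetical.

```python
from collections import Counter, defaultdict

class SingleStepTemplate:
    """Counts which action followed each (top, left, right) event."""

    def __init__(self):
        # event -> Counter of actions seen after that event
        self.memory = defaultdict(Counter)

    def learn(self, event, action):
        self.memory[event][action] += 1

    def predict(self, event):
        """Return the most frequently seen action, or None for a novel event."""
        seen = self.memory.get(event)
        if not seen:
            return None
        return seen.most_common(1)[0][0]

# An event is the tuple (top, left, right), each face a tuple of 9 colours.
event = (("R",) * 9, ("G",) * 9, ("B",) * 9)
template = SingleStepTemplate()
template.learn(event, ("rotateHorizontalToLeft", 0, 1))
template.learn(event, ("rotateHorizontalToLeft", 0, 1))
template.learn(event, ("rotateVerticalToUp", 2, 1))
print(template.predict(event))  # ('rotateHorizontalToLeft', 0, 1)
```

A template that also looks at the previous event would simply use a longer key, for example (previous event, current event), at the cost of a much larger table.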
PP will be rewarded when each of the 3 sides shows a single colour.
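The reward condition could be checked with a few lines of Python. This is a sketch under my reading of the rule, namely that each of the three visible faces must be uniformly coloured; the function names face_is_uniform and reward are made up for illustration.

```python
def face_is_uniform(face):
    """True when all nine positions of a face hold the same colour."""
    return len(set(face)) == 1

def reward(top, left, right):
    """Return 1 when every visible face is uniform, otherwise 0."""
    return 1 if all(face_is_uniform(f) for f in (top, left, right)) else 0

solved = reward(("R",) * 9, ("G",) * 9, ("B",) * 9)
scrambled = reward(("R",) * 8 + ("G",), ("G",) * 9, ("B",) * 9)
print(solved, scrambled)  # 1 0
```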
Based on this, I would optimistically expect the PP brain to be capable of learning the system after a very large number of simulations. Just to create an LTM map, there are far too many possibilities in the world. Optimising the LTM path to ensure a reward will be very intensive, as the LTM network is very big.
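To get a rough feel for the size of the problem, here is a back-of-the-envelope count in Python. It is only a loose upper bound for the simplified model in this post (3 colours, 3 visible faces of 9 positions each, and the 2 listed action types with 3 positions and 3 rotation amounts); most of these colourings are not reachable on a real cube, so the true number is smaller.

```python
colours = 3
positions = 3 * 9          # three visible faces, nine positions each
actions = 2 * 3 * 3        # action kinds x positions x rotation amounts

# Upper bound on distinct (top, left, right) events.
events_upper_bound = colours ** positions
# Upper bound on distinct event => action associations.
associations_upper_bound = events_upper_bound * actions

print(events_upper_bound)        # 7625597484987
print(associations_upper_bound)  # 137260754729766
```

Even as a crude estimate, trillions of possible events make it clear why a single flat LTM would be expensive to build and search.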
So I wonder whether it would be possible to create a mini-reward system, in which the system receives a mini-reward signal whenever an association matches one of those existing Rubik’s cube algorithms. Then I might be able to partition the LTM based on the mini-reward nodes. The LTM footprint might then be smaller, as we would have a series of LTMs like LTM[k=1:n].
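One way to picture the partitioning is to cut a recorded sequence of (event, action) steps wherever a mini-reward fires, so that each piece becomes its own small LTM[k]. The Python sketch below does only that; the function partition_by_mini_rewards, the placeholder step names and the set of mini-reward events are all hypothetical, and how PURR-PUSS would actually store each partition is left open.

```python
def partition_by_mini_rewards(trajectory, is_mini_reward):
    """Split a list of steps into segments that each end at a mini-reward."""
    segments, current = [], []
    for step in trajectory:
        current.append(step)
        if is_mini_reward(step):
            segments.append(current)
            current = []
    if current:
        # Trailing steps that have not reached a mini-reward yet.
        segments.append(current)
    return segments

# Placeholder steps; "e1" and "e3" stand for events that earn a mini-reward.
trajectory = [("e0", "a0"), ("e1", "a1"), ("e2", "a2"), ("e3", "a3"), ("e4", "a4")]
mini_reward_events = {"e1", "e3"}
ltm = partition_by_mini_rewards(trajectory, lambda step: step[0] in mini_reward_events)
print(len(ltm))  # 3
```

Each segment is much shorter than the full trajectory, which is the sense in which the footprint of any single LTM[k] might shrink.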




