Background
The bandrobot test, one of the demos in ONA, is aimed at testing the reasoner's multistep event inferencing/subgoaling (via NAL-7 & NAL-8 temporal/procedural inference).
The scene, rendered as ASCII art, looks like this:
+++++++++++++++++++++|
---------------------|
A |
o |
'''U'''''''''''''''''|
This is a single-player game; the main goal is to control the robot A, pick up the ball o, and drop it into the bucket U.
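For reference, a minimal sketch of this kind of world is shown below. This is a simplified stand-in, not the actual bandrobot demo code from ONA; the coordinates, action set, and success condition are assumptions made purely for illustration.

// Minimal, self-contained sketch of a bandrobot-like world.
// NOT the actual ONA demo code; positions and actions are assumed.
#include <stdio.h>
#include <stdbool.h>

typedef enum { LEFT, RIGHT, PICK, DROP } Action;

typedef struct {
    int robot;      // x position of the robot (A)
    int ball;       // x position of the ball (o)
    int bucket;     // x position of the bucket (U)
    bool carrying;  // whether the robot currently holds the ball
} World;

// Apply one action and return true when the ball lands in the bucket.
static bool step(World *w, Action a) {
    switch (a) {
        case LEFT:  w->robot -= 1; break;
        case RIGHT: w->robot += 1; break;
        case PICK:  if (w->robot == w->ball) w->carrying = true; break;
        case DROP:
            if (w->carrying) {
                w->carrying = false;
                w->ball = w->robot;
                if (w->robot == w->bucket) return true; // goal reached
            }
            break;
    }
    if (w->carrying) w->ball = w->robot; // carried ball moves with the robot
    return false;
}

int main(void) {
    World w = { .robot = 1, .ball = 7, .bucket = 3, .carrying = false };
    Action plan[] = { RIGHT, RIGHT, RIGHT, RIGHT, RIGHT, RIGHT, PICK,
                      LEFT, LEFT, LEFT, LEFT, DROP };
    for (size_t i = 0; i < sizeof plan / sizeof plan[0]; i++) {
        if (step(&w, plan[i])) { printf("goal reached\n"); return 0; }
    }
    printf("goal not reached\n");
    return 0;
}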
In this game, ONA is expected to learn the procedural knowledge by comparing the frequency of beliefs (corresponding to the relative positions of the robot and the ball/bucket), which is logically represented by the inference rules { <{S1} |-> [P]>, <{S2} |-> [P]> } |- <({S1} * {S2}) --> (+ P)> (t_frequency_greater) and { <{S1} |-> [P]>, <{S2} |-> [P]> } |- <({S1} * {S2}) --> (= P)> (t_frequency_equal).
Using this representation of relative position, ONA is able to learn "pick when the robot's position equals the ball's, drop when the robot's position equals the bucket's, and move left/right to make their positions equal", thereby providing evidence that ONA has an efficient procedural learning mechanism (sensorimotor intelligence).
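How such relative-position percepts could be derived from raw coordinates is sketched below. This is a hedged illustration only; the Narsese-like event strings and the "pos" property name are placeholders, not necessarily the input format the bandrobot demo actually uses.

// Sketch: turning raw x-coordinates into relative-position events.
// The event strings are illustrative placeholders, not ONA's actual input.
#include <stdio.h>

// Emit an equality event when both objects share a position, otherwise a
// "greater" event with the object at the larger position named first.
static void emit_relation(const char *a, int xa, const char *b, int xb) {
    if (xa == xb) {
        printf("<({%s} * {%s}) --> (= pos)>. :|:\n", a, b);
    } else if (xa > xb) {
        printf("<({%s} * {%s}) --> (+ pos)>. :|:\n", a, b);
    } else {
        printf("<({%s} * {%s}) --> (+ pos)>. :|:\n", b, a); // larger first
    }
}

int main(void) {
    int robot = 3, ball = 7, bucket = 3;
    emit_relation("robot", robot, "ball",   ball);    // robot left of ball
    emit_relation("robot", robot, "bucket", bucket);  // robot at the bucket
    return 0;
}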
Problem
As the title says, ONA currently achieves high performance on this game through self-learning.
However, if we change the random seed of the whole reasoner, it becomes apparent that this high performance may be accidental:
If the reasoner does not babble "precisely", the robot cannot learn any effective knowledge for achieving the goal.
Even when the robot achieves the goal by coincidence, if the second goal satisfaction arrives much later, the "right knowledge" represented by temporal implications will have faded out, and the reasoner falls back into the "random babbling without decisions" state, as if the accidental experience of success had never happened.
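To make the fading concrete, here is a toy numeric illustration. This is not ONA's actual forgetting/decay formula; the decay rate, boost, and threshold are assumed values chosen only to show the effect of a long gap between successes.

// Toy illustration: an implication's priority is boosted on each rare
// confirming success and decays every cycle in between. The decay rate,
// boost, and decision threshold are assumed values, not ONA parameters.
#include <stdio.h>

int main(void) {
    double priority = 0.0;
    const double decay = 0.995;   // per-cycle decay (assumed)
    const double boost = 0.5;     // boost on a confirming success (assumed)
    const double decision_threshold = 0.1;

    int gap_cycles = 2000;        // cycles between first and second success
    priority += boost;            // first accidental success
    for (int t = 0; t < gap_cycles; t++) {
        priority *= decay;
    }
    printf("priority after %d cycles: %f (%s threshold %.2f)\n",
           gap_cycles, priority,
           priority < decision_threshold ? "below" : "above",
           decision_threshold);
    // With these numbers the implication drops back below the threshold,
    // so the reasoner returns to undirected babbling, matching the
    // behaviour described above.
    return 0;
}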
Pictures
The successful case on mysrand(666)
Failing cases on mysrand(667) and mysrand(668)
ARCJ137442 changed the title from "The high performance in the bandrobot test maybe accidental" to "The high performance in the bandrobot test may be accidental" on Oct 10, 2024.
I agree, robust learning is not achieved for this particular example.
I also have a test script which runs it with different seeds to evaluate it; I can commit it soon.
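As a rough outline (not that script), a seed sweep could look like the sketch below, where run_bandrobot is a hypothetical wrapper around the demo that seeds it via mysrand and reports how often the ball was dropped into the bucket; a fake stub is used here so the sketch runs stand-alone.

// Sketch of a seed-sweep evaluation over the bandrobot demo.
#include <stdio.h>
#include <stdlib.h>

// Placeholder for the real demo run; the real version would call
// mysrand(seed), run the demo for a fixed number of reasoning cycles,
// and return the number of successful drops. Here it fakes a result.
static int run_bandrobot(unsigned int seed, int cycles) {
    srand(seed);
    (void)cycles;
    return rand() % 3; // fake success count
}

int main(void) {
    const int cycles = 10000;
    int seeds_tested = 0, seeds_succeeded = 0;
    for (unsigned int seed = 600; seed < 700; seed++) {
        int successes = run_bandrobot(seed, cycles);
        seeds_tested++;
        if (successes > 0) seeds_succeeded++;
        printf("seed %u: %d successes\n", seed, successes);
    }
    printf("%d / %d seeds produced at least one success\n",
           seeds_succeeded, seeds_tested);
    return 0;
}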
Part of the problem is that, by the design of this experiment, reward can only be obtained when the object at the right position is picked up and then dropped at the target location, which is a rare occurrence under motor babbling, and when it does happen there are tons of other hypotheses to weed out.
The solution will be to take what we learned from NACE and add the corresponding curiosity model to ONA: https://github.com/patham9/NACE
Another imminent change: the numeric representation is the initial, incomplete one that was added experimentally.
In the meantime there is a solid implementation of numeric spaces which allows the system both to condition on concrete values and to perform comparisons between numeric measurements.
With this new numeric value handling, learning also seems much more robust: http://91.203.212.130/AniNAL/demo_complex_continuous_verbal.html