In Part 1, we described the task of building Minecraft AI agents with open-ended behavior capable of achieving arbitrary goals to use as a testbed for AGI development. As mentioned, attempts to solve a particular problem frequently do not lead to progress towards AGI but instead lead to focusing on domain-specific heuristics and solutions. If a goal or task is narrowly defined, narrow solutions can be made that are more efficient than general solutions (as long as they correspond to the goal); and no progress toward broad AI will be achieved.
Here in Part 2, we will explore why an AGI-ish Minecraft agent is valuable, what problems it uniquely solves, and why it matters for solving complex real-world problems. We will dive into the following areas:
- What problems are encountered by narrow AI solutions?
- What is AGI-ish about Minecraft as a testbed for AI agents?
- In what ways do imperative programming and reinforcement learning both fall short on novel and complex problems?
- How practical and efficient are AGI-ish agents at solving novel problems, as demonstrated in Minecraft?
The inability of proto-AGI systems to outperform specialized solutions is very natural. This inability doesn’t necessarily imply that the design of such systems is flawed or unnecessary. But how can one tell apart a flawed design from a good but incomplete solution? Why is AGI needed if narrow solutions are more efficient?
The answer is that we need to take into account both the efficiency of the solutions themselves and the human effort and resources spent on creating them. Creating one narrow solution to one narrow goal is simpler and more efficient than creating a generally intelligent system; but a robust, generally intelligent system will solve many narrow goals, so it is more efficient when measured over time and total effort.
We want to show that even though what we have developed so far is not a full AGI, introducing some AGI-ish features allows achieving some forms of behavior practically unachievable otherwise.
Ultimately, usage of AGI technologies will become the only practical route, in the same sense as deep learning is now generally more practical than feature engineering.
What approach would be practical for creating general-purpose Minecraft agents? The application of deep learning is not practical at the moment, as it is a data- and training-intensive approach. Collecting tens of millions of frames of human gameplay and training deep neural networks on them requires substantial human and machine resources. And the result is agents capable of solving only a few predefined tasks in restricted conditions, which is not very impressive. Of course, these results are interesting from the deep RL and imitation learning standpoints, but they are not practical for broader applications and not scalable to highly complex situations.
One interesting direction for expanding the intelligence and capability of our agent is to go beyond video input. Such an agent would use ray and grid observations, which are available in the Malmo Minecraft mod and provide ground-truth information on blocks and entities. Programming a useful general-purpose agent directly would then be the most practical approach. Even if a generally capable agent is not created, the work and the results will illustrate what is AGI-ish about this task, helping us refine our principles of AGI.
Let us try to approach the creation of an agent naively. The most straightforward method is to describe our own actions in Minecraft to the computer in terms of algorithms of behavior. When teachers define what an “algorithm” is, the traditional definition is “a finite sequence of rigorous, well-defined instructions.” While this definition is not wrong, it can be interpreted too simplistically.
In the case of a robot, one may identify “instructions” with such motion commands as “go straight,” “turn right,” etc. The next step would be to condition these commands over sensor readings and internal states and add loops. This is precisely how small children are taught to program Lego robots in Scratch. A “finite sequence of instructions” can contain loops, conditions, variable assignments, arithmetic operations, and random choices and can unfold into arbitrarily complex sequences of “mental” operations — quite complex. Yet, for the layman, the definition can still sound like “just a sequence of commands,” which are sent to motors and tell the robot what to do — simple and straightforward.
Nevertheless, this imperative programming style is also natural for creating basic Minecraft agents which can do something useful — to a point.
Running is a simple activity in Minecraft. You don’t need to coordinate a lot of motors like a real-world robot — you just need to send the command “run.” With ground truth information about adjacent blocks, it is also simple to mine resources of interest. It is therefore tempting to describe to the agent what to do in a sequence of actions — find a tree, approach it, cut some wood, craft the wooden pickaxe, dig down at a certain angle, mine some stone, etc. Each of these actions can be detailed into concrete, straightforward algorithms. Searching for a tree is as simple as running and scanning. This would not in any way be considered AGI or even AI. But this would not matter if the task is simply to create a useful agent by any means.
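To make the imperative style concrete, here is a minimal sketch of such a scripted agent. Everything in it is an assumption made for illustration: `ToyWorld` is a hypothetical one-dimensional stand-in for the game, not the Malmo API, and crafting is collapsed into a single step rather than following real Minecraft recipes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorld:
    """Hypothetical stand-in for the game: a 1-D strip of blocks."""
    blocks: list
    position: int = 0
    inventory: dict = field(default_factory=dict)


def find_tree(world):
    """Run forward, scanning the next block, until a log is adjacent."""
    while world.blocks[world.position + 1] != "log":
        world.position += 1  # "run": silently assumes the way ahead is safe


def cut_wood(world):
    """Destroy the adjacent log and put it in the inventory."""
    world.blocks[world.position + 1] = "air"
    world.inventory["log"] = world.inventory.get("log", 0) + 1


def craft_pickaxe(world):
    """Simplified crafting: one log becomes one wooden pickaxe."""
    world.inventory["log"] -= 1
    world.inventory["wooden_pickaxe"] = 1


# The whole "behavior" is a fixed sequence of skills.
SCRIPT = [find_tree, cut_wood, craft_pickaxe]


def run_script(world):
    for skill in SCRIPT:
        skill(world)
    return world.inventory
```

On a strip such as `["grass", "grass", "grass", "log", "grass"]` the script works as intended. Nothing in it says what to do if a block of lava, water, or a cactus lies between the agent and the tree; each of those cases would need its own hand-written branch.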
However, this approach to creating agents encounters issues even in simplified settings. While the agent is running, it can encounter a lava lake. We have to foresee this and describe to the agent what to do in this situation. The agent can turn away or try to get around the lava lake. Just turning away from any obstacle could lead to further undesirable actions — the agent could get caught in a cycle of turning away from one obstacle to another.
This is just one of many examples where the environment can thwart imperatively programmed agents:
- If a water lake obstacle is encountered, it might be preferable to jump into it. But there can be difficulty getting out if the shore is high. While this is solvable, the agent can approach its target (e.g., a tree) while still being in the water and start cutting it, which may result in drowning or the inability to attack one block long enough to destroy it.
- While digging diagonally, the agent can encounter water, which will start flowing, and the agent can be caught in the loop of moving forward and trying to dig but being carried back by the water.
The resulting behavior looks somewhat like animal instincts applied when conditions of their applicability are not satisfied. Although simple time constraints can generally break infinite loops, this behavior will still be inefficient.
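The digging-against-water loop and the time-constraint remedy can be sketched in a few lines. This is a toy model under stated assumptions: positions are integers, the hypothetical `push_back` parameter stands for how far the flow carries the agent each tick, and `max_steps` is the time constraint.

```python
def dig_forward(start, goal, push_back, max_steps):
    """Imperative skill: step forward and dig every tick, up to max_steps ticks.

    Returns ("reached", ticks_used) or ("gave_up", max_steps).
    """
    if start >= goal:
        return ("reached", 0)
    position = start
    for step in range(1, max_steps + 1):
        position += 1          # the skill: move forward and dig
        position -= push_back  # the environment: water carries the agent back
        if position >= goal:
            return ("reached", step)
    return ("gave_up", max_steps)
```

With `push_back=0` the agent reaches a goal three blocks away in three ticks. With `push_back=1` it makes no progress at all, and the step budget does end the loop, but only after the agent has spent the entire budget repeating an action that never had a chance of working.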
Of course, the algorithms can be improved, making them reasonably reliable for the Minecraft flat world. However, if we switch to the default world of Minecraft, there will be too many unforeseen situations. It is possible in principle to handle all of them explicitly, in an imperative way, but it is practically infeasible. There are just too many factors, and they can combine in a practically unlimited number of unforeseen ways. The agent can die by standing near a cactus, but whether it should avoid cacti or cut them may depend on other factors. Whether to go around a deep canyon, climb down, or just go somewhere else may also depend on numerous factors. Manual coding of reactions to all possible combinations of these factors, and all the variety of possible contexts, is highly problematic.
The algorithm can be improved, but there are always some unaccounted cases.
This type of difficulty becomes apparent when we interact with simple chatbot AIs on a website — they can answer specific limited queries, but anything outside their narrow expertise returns errors. These systems cannot handle complexity or variations and cannot work with or solve novel problems.
To replace manual case-by-case imperative coding with training, using a deep reinforcement learning (RL) model sounds reasonable, theoretically. But as was discussed previously, it doesn’t work well in practice. What is interesting to note is that such models map known situations to learned actions. While the handcrafted imperative algorithm is like an instinct, such RL models are like trained reflexes. This means that if they find themselves in a novel situation, they will behave inadequately.
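The "trained reflex" picture can be illustrated with the most extreme simplification of such a model: a lookup table from situations to actions. This is a sketch, not how deep RL is implemented; real networks generalize to nearby states through function approximation, and the class name, situation labels, and action names below are all hypothetical.

```python
import random


class ReflexPolicy:
    """Toy policy: a direct mapping from known situations to learned actions."""

    def __init__(self, actions, seed=0):
        self.actions = list(actions)
        self.table = {}                  # situation -> action seen to work
        self.rng = random.Random(seed)   # seeded for reproducibility

    def learn(self, situation, action):
        """Store the action that paid off in this situation during training."""
        self.table[situation] = action

    def act(self, situation):
        """Return (action, source); unknown situations get a random action."""
        if situation in self.table:
            return self.table[situation], "learned"
        return self.rng.choice(self.actions), "random"
```

After `learn("cactus_ahead", "turn")`, the policy responds to `"cactus_ahead"` with the learned reflex. For `"lava_ahead"`, which it has never seen, it has nothing to consult, so it falls back to a random action.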
The idea that intelligence manifests itself in novel situations for which no ready solution is available is an old idea in cognitive psychology. The best thing a deep RL model can do in an unknown situation is to start performing random motor commands and learn the correct way to act in similar situations by numerous repetitions. However, when the AI is driving your car, and the novel situation is an oncoming train, “trying random actions” is hardly a satisfactory solution.
This ‘reflexive’ reaction to novel problems is even less sophisticated than the behavior of lower animals, which can detect something is wrong and start overproducing motor actions.
Beginning Minecraft players may not know that cacti cause harm, but they will learn it at once and be able to use this experience differently depending on the context.
A reinforcement learning algorithm may learn to avoid cacti, but it will require a lot of trials.
Of course, problems of generalization, one-shot learning, etc., are studied intensively in the field of deep learning. However, as long as deep RL models learn imperative algorithms, they will act like reflexes, which just follow instructions without understanding what is happening. They will never truly understand context, environment, and problem-solving unless their architecture is changed.
“General Intelligence” is the ability to solve novel tasks or to deal with novel situations.
Debugging hardcoded skills is similar to debugging an ordinary program that displays concrete behavior in concrete situations. Except that, in an ordinary program, we don’t expect the program to start doing anything by itself which wasn’t explicitly programmed.
Flaws in the agent’s behavior could be considered ordinary bugs to be fixed, and can in fact be treated as such for simple environments, giving the impression that this approach is practical. But the number of such “flaws” grows very rapidly for even moderately complex environments. As the environment increases in complexity, the number of situations to ‘debug’ increases even more rapidly. A distinct desire arises for the agents to somehow see for themselves when they are doing something senseless.
This desire is obscured in the case of deep RL agents, although the problem remains that agents have no clue what they are doing. If a DNN learns an imperative algorithm, this can save us from having to handcraft this algorithm; but even a learned algorithm remains imperative.
Let us emphasize that this is not just a matter of ideology, a belief that agents should think like humans; this is a matter of practicality. A direct mapping from states and environmental conditions to optimal actions is a restricted subset of algorithms. For example, if we take an NP-complete problem (a problem for which no polynomial-time algorithm is known, so finding a solution may take exponential time in the worst case), such restricted algorithms cannot provide a general solution to it.
These agents will be able to cache solutions for some instances of this problem discovered during training (random enumeration of actions in RL). This can be good enough for a complex but fixed task like the game “Go”. However, it scales up badly with increasing complexity, requiring vast training data; and it still cannot handle novel situations efficiently (e.g., a novel element in Go, like tweaking game board size or shape, may make a trained model act completely unsatisfactorily).
Reinforcement Learning algorithms lack the skill to generalize well beyond the region of their training set or definition, even when enough information is available.
So what kind of algorithm can handle complexity and open-ended tasks?
Humans have long dreamed of telling the computer what to do, not how. We want to give the computer not a ready unambiguous set of instructions that it can mindlessly execute, and we don’t want it to mindlessly try random actions until it finds a good enough sequence. We want it to understand what it is doing and why.
This does not necessarily mean a guaranteed output of an unambiguous sequence of actions to achieve the task. Some trial-and-error may still be necessary. Further, if the agent constructs a precise plan and continues executing it after something has gone wrong, this will look no better than the execution of an imperative skill in inadequate conditions.
Nevertheless, the very possibility for an agent to infer necessary actions depending on its current goals or desires and present situation makes the agent’s behavior much more flexible. It also removes the necessity for the agent to have ready responses (either hardcoded or pre-trained) for various combinations of multiple factors.
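One simple way to picture "inferring necessary actions from the goal and the present situation" is backward chaining over declarative knowledge. The sketch below is an illustration under stated assumptions and is not the architecture of our agent: the `RECIPES` table is hypothetical and deliberately simplified (every recipe yields exactly one item, unlike real Minecraft crafting).

```python
# Hypothetical "what is needed for what" knowledge; one item per recipe.
RECIPES = {
    "wooden_pickaxe": {"planks": 3, "stick": 2},
    "stick": {"planks": 2},
    "planks": {"log": 1},
}


def _expand(goal, count, remaining, steps):
    """Satisfy `count` units of `goal` from stock first, then via subgoals."""
    in_stock = min(remaining.get(goal, 0), count)
    remaining[goal] = remaining.get(goal, 0) - in_stock
    for _ in range(count - in_stock):
        # Items without a recipe have no ingredients and must be gathered.
        for ingredient, needed in RECIPES.get(goal, {}).items():
            _expand(ingredient, needed, remaining, steps)
        steps.append(("craft " if goal in RECIPES else "gather ") + goal)


def plan(goal, count, inventory):
    """Derive an action list for the goal given what the agent already holds."""
    remaining = dict(inventory)  # copy, so the caller's inventory is untouched
    steps = []
    _expand(goal, count, remaining, steps)
    return steps
```

The same goal produces different behavior in different situations. With `{"planks": 3, "stick": 2}` in the inventory, `plan("wooden_pickaxe", 1, ...)` is just `["craft wooden_pickaxe"]`; with an empty inventory it expands into gathering logs and crafting planks and sticks first. No response was written in advance for either combination.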
The following video demonstrates problems encountered by imperative agents. We believe that the spirit of AGI can be better understood through their example — solving each task or problematic situation manually or by brute force is unrealistic, and preparing and annotating sufficient training data is unscalable, even within simplified virtual worlds. Overcoming these limitations in practice is aligned with making agents more AGI-ish.
Our agent described in the previous post has explicit goals, which can be set either by a human or by the agent itself for exploration purposes. Its behavior immediately creates a different impression than an imperative algorithm agent: it looks purposeful rather than mechanical. Of course, the ability to describe what to do is not enough for the agent to act successfully. Both “how-to” knowledge and imperative skills are needed, and they can be integrated in a neural-symbolic architecture. This combination already helps to create a more useful agent, which can autonomously explore the Minecraft world for hours; something that is difficult to achieve with other algorithms.
This is just the first step. The agent doesn’t yet have a global representation of the situation and may not notice that it keeps doing the same thing in some abstract sense. The assessment of the purposefulness and effectiveness of actions does not yet apply to elementary actions within low-level skills, so the agent cannot improve its skills deliberatively.
An agent must anticipate the results of its actions, and compare them with reality. Introducing such a feature is essential, not only to make the agent fancier in terms of cognitive architecture-centered AGI but in order to move forward in creating a general-purpose Minecraft agent in practice.
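A minimal sketch of such anticipation, assuming a toy world where the state is an integer position: before each action the agent predicts the outcome, and after it compares the prediction with what actually happened. The `predict` function, the `patience` threshold, and the `world_step` callback are hypothetical names introduced only for this illustration.

```python
def predict(state, action):
    """Hypothetical forward model: what the agent expects an action to do."""
    if action == "forward":
        return state + 1
    return state


def run_with_monitoring(state, action, world_step, goal, patience=3, max_steps=100):
    """Repeat an action, but stop once reality keeps contradicting expectation.

    Returns ("reached", ticks), ("replan", ticks), or ("gave_up", max_steps).
    """
    surprises = 0
    for step in range(1, max_steps + 1):
        expected = predict(state, action)
        state = world_step(state, action)  # what actually happened
        if state >= goal:
            return ("reached", step)
        # Count consecutive mismatches between expectation and observation.
        surprises = surprises + 1 if state != expected else 0
        if surprises >= patience:
            return ("replan", step)
    return ("gave_up", max_steps)
```

In a world where flowing water cancels every step (`world_step = lambda s, a: s`), the agent asks for a new plan after three ticks instead of exhausting its step budget. In a dry world (`lambda s, a: s + 1`) the check never fires and the agent simply reaches the goal.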
An overly simplistic understanding of algorithms creates an impression or even a belief (especially among non-programmers) that human behavior is not algorithmic. However, this impression starts to fade as algorithms become more complex and less imperative.