A new breed of robot

Published on 21 October 2008

By Michael Felsberg

An overview of the challenges faced by artificial cognitive system design and of the progress made through the COSPAL project initiative.

Designers of artificial cognitive systems (ACS) have, until recently, tended to adopt one of two approaches to thinking robots: classical rule-based artificial intelligence or artificial neural networks. However, a new breed of cognitive, learning robot developed through the European Union-funded project COSPAL (COgnitive Systems using Perception-Action Learning) combines the best of both worlds.

The classical approach to artificial intelligence (AI) relies on a rule-based system, in which the designer supplies the knowledge and scene representations, obliging the robot to follow a set decision-making process.

Biologically inspired artificial neural networks (ANNs), on the other hand, rely on processing continuous signals and a non-linear optimisation process to reach a response, which, due to the lack of preset rules, requires developers to carefully balance the system constraints and its freedom to act autonomously.

The problem is that, used individually, these systems have major shortcomings when it comes to developing advanced ACS architectures. Classical AI cannot solve a task if it has not been pre-programmed to do so, while an ANN on its own is too simple to solve complex tasks.

To combat these shortcomings, researchers in the COSPAL project used ANN to handle the low-level functions based on the visual input their robots received, and then employed classical AI on top of that in a supervisory function. This fusion of systems enables the robots to explore the world around them through direct interaction, creating ways to act in it and controlling their actions accordingly. This harnesses the strengths of both approaches: classical AI’s superiority in functions akin to human rationality, and ANN’s superiority in performing tasks for which humans would use their subconscious – things like basic motor skills and low-level cognitive tasks.

The most important difference between the COSPAL approach and most of the previous state of the art is that the COSPAL ACS is scalable. It is able to learn by itself and can solve increasingly complex tasks with no additional programming.

There is a direct mapping from the visual percepts to performing the action. With previous systems, if something in the environment changed that the low-level system was not programmed to recognise, it would give random responses, but the supervising AI process would not realise anything was wrong. With the COSPAL approach, the system realises something is different and, if its actions do not result in success, it tries something else.
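
The layering described here can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the COSPAL implementation: a nearest-neighbour lookup stands in for the trained low-level network, and the names `nearest_action`, `act_with_supervision` and `succeeds`, the toy percepts and the action labels are all hypothetical.

```python
def nearest_action(percept, memory):
    """Low-level layer: reuse the action stored for the most similar known percept."""
    def squared_distance(known):
        return sum((a - b) ** 2 for a, b in zip(known, percept))
    return memory[min(memory, key=squared_distance)]


def act_with_supervision(percept, memory, all_actions, succeeds):
    """Supervisory layer: check the outcome and try alternatives after a failure."""
    first_guess = nearest_action(percept, memory)
    candidates = [first_guess] + [a for a in all_actions if a != first_guess]
    for attempts, action in enumerate(candidates, start=1):
        if succeeds(percept, action):
            memory[tuple(percept)] = action  # remember what worked for this percept
            return action, attempts
    return None, len(candidates)


# Hypothetical toy world: percepts are 2D feature vectors, actions are labels.
memory = {(0.0, 0.0): "grasp", (1.0, 1.0): "release"}
actions = ["grasp", "release", "push"]


def succeeds(percept, action):
    # Made-up rule standing in for feedback from the environment or a teacher.
    return action == ("push" if percept[0] > 5 else "grasp")


first = act_with_supervision((9.0, 9.0), memory, actions, succeeds)
second = act_with_supervision((9.0, 9.0), memory, actions, succeeds)
print(first, second)
```

On the unfamiliar percept the low-level guess fails, the supervisor notices and works through the alternatives, and the action that finally succeeds is stored, so the second encounter needs only one attempt.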

The shape-sorter

This trial-and-error learning approach was tested by making the COSPAL robot complete a shape-sorting puzzle but without telling it what it had to do. As it tried to fit pegs into holes it gradually learnt what would fit where, allowing it to complete the puzzle more quickly and accurately each time.

The puzzle was to be solved using an industrial robot (Stäubli RX90), a side-view camera and a camera mounted on the end-effector, both cameras with a resolution of 1024×768.

Since the puzzle board was lying flat on the ground, the problem was not a full 3D problem but rather a 2+D problem, where the (relevant) height-position of objects consists of three different levels: on the ground; lifted; inserted. Solving this task with an engineered system is believed to be straightforward using standard methods. It is worth mentioning, however, that such a system would probably fail if something outside the model space happened. For COSPAL, the aim of the shape-sorter scenario was to create a system that can learn to solve the puzzle starting from as little prior knowledge as possible.

The system has to learn through imitation – otherwise there is no way to make the system learn at all. The system knows how to close and open the gripper. This also includes a change of the vertical position over the ground, i.e., closing the gripper means lowering, closing, and lifting it again.

Bootstrapping

Bootstrapping (in which simple visual percepts are discovered first and then more complex ones built on top) has been applied to accelerate the learning of basic capabilities. Without bootstrapping, the system would start to learn such basic capabilities as visual servoing – a closed-loop control mechanism with visual feedback – and the concept of objects by explorative learning, where positive reward is given whenever consistent actions are performed. This is a lengthy process used by infants, who learn hand-eye coordination (corresponding to visual servoing) and object constancy. The following bootstrapping was performed before the system started its ordinary, incremental learning:

  • Visual servoing
  • Object-gripper relation
  • Object-hole relation

The latter two relations were bootstrapped in combination and both processes deserve further explanation.

In the visual servoing mechanism, the controlling system successively reduces some error obtained from visual feedback until the final goal position is reached. This is in contrast to open-loop systems, which compute a goal position and approach it in a single movement. Visual servoing can be sorted into variants, depending on whether it is based on image coordinates or coordinates in 3D space, and whether it uses knowledge about camera calibration and inverse kinematics or direct control within a visual-motor model.
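
The difference between the two control styles can be illustrated with a one-dimensional toy model. The sketch below is not taken from COSPAL: the functions `open_loop_move` and `closed_loop_servo`, the scalar gains and the step size are assumptions chosen purely for illustration, with `assumed_gain` standing for the controller's (wrong) belief about how a motor command moves the target in the image.

```python
def open_loop_move(start, goal, assumed_gain, true_gain):
    """Compute a single command from the assumed model and execute it once."""
    command = (goal - start) / assumed_gain
    return start + true_gain * command


def closed_loop_servo(start, goal, assumed_gain, true_gain,
                      step=0.5, tolerance=1e-3, max_iterations=100):
    """Repeatedly measure the remaining image error and correct a fraction of it."""
    position = start
    for iteration in range(max_iterations):
        error = goal - position               # visual feedback
        if abs(error) < tolerance:
            return position, iteration
        command = step * error / assumed_gain
        position += true_gain * command       # the real system reacts with its true gain
    return position, max_iterations


# The controller believes the gain is 1.0, while the real system has gain 1.4.
open_result = open_loop_move(0.0, 100.0, assumed_gain=1.0, true_gain=1.4)
closed_result, iterations = closed_loop_servo(0.0, 100.0, assumed_gain=1.0, true_gain=1.4)
print(open_result, closed_result, iterations)
```

With the mismatched gain the single open-loop movement overshoots the goal by a wide margin, whereas the feedback loop shrinks the error on every pass and ends within the tolerance. The loop only converges while `step * true_gain / assumed_gain` stays below 2, which is why a cautious step size is used.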

Within COSPAL, only image-based, direct visual servoing was applied, since engineered models were to be avoided where possible. The overall robustness of the system, i.e., its capability to deal with unforeseen situations, is improved in two ways by this strategy:

  • No camera calibration – if the (side-view) camera is moved, the visual servoing control adapts to the new geometry.
  • No explicit inverse kinematics – even for robots without joint-feedback or partially modified configuration (e.g. weight), the visual servoing control adapts to the current situation.

Using a direct servoing method requires constant learning regarding the robot control, but in particular the initial visual-motor model needs to be estimated. To accelerate this initial estimation, the model is bootstrapped by moving the robot to a variety of configurations in an effort to generate learning data for the visual-motor model. This means that the major part of the model is acquired in a batch-learning process.
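
A minimal sketch of such a batch bootstrap is given below, assuming (purely for illustration) that the visual-motor model is a linear 2×2 map from small motor displacements to image displacements. `TRUE_RESPONSE` is a hypothetical stand-in for the real robot and camera, and the least-squares fit is a swapped-in standard technique, not necessarily the learning method used in COSPAL.

```python
import random

# Hypothetical ground truth, hidden from the learner: image shift per unit motor shift.
TRUE_RESPONSE = ((3.0, -1.0), (0.5, 2.0))


def observe_image_shift(motor_shift, rng, noise=0.01):
    """Stand-in for moving the robot and measuring the shift seen by the camera."""
    return tuple(
        row[0] * motor_shift[0] + row[1] * motor_shift[1] + rng.gauss(0.0, noise)
        for row in TRUE_RESPONSE
    )


def collect_bootstrap_data(n_samples, rng):
    """Move to a variety of random configurations and record what the camera sees."""
    samples = []
    for _ in range(n_samples):
        motor_shift = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        samples.append((motor_shift, observe_image_shift(motor_shift, rng)))
    return samples


def fit_visual_motor_model(samples):
    """Batch least-squares fit of J in image_shift ≈ J · motor_shift."""
    sxx = [[0.0, 0.0], [0.0, 0.0]]   # sum of x xᵀ
    syx = [[0.0, 0.0], [0.0, 0.0]]   # sum of y xᵀ
    for x, y in samples:
        for i in range(2):
            for k in range(2):
                sxx[i][k] += x[i] * x[k]
                syx[i][k] += y[i] * x[k]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    # Normal equations: J = (sum y xᵀ)(sum x xᵀ)⁻¹
    return [[sum(syx[r][k] * inv[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


model = fit_visual_motor_model(collect_bootstrap_data(200, random.Random(0)))
print(model)
```

The fitted matrix would then serve as the initial model for the servoing loop, to be refined continually during operation.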

Object relations are essential for formulating the concept of objects and goals of actions. Objects without relations to their own system, i.e. the gripper, are functionless and there is no reason to build a concept for them – they belong to the background.

Turning around this argument allows for learning the appearance of objects: the manipulator moves randomly and moves objects by chance. The resulting change of appearance helps to register the objects, as such, and their location relative to the gripper.

However, as with visual servoing, fully random exploration would take too much time in practice, so the process is accelerated by putting the objects into the manipulator and then moving them around.

It is similar for the holes: The system could learn the proper relation of objects to holes by random exploration, but the likelihood of insertion by chance is extremely low. Instead, it is better to start with the objects in the holes, and perform the opposite action, moving the object away from the hole, to establish the relation between objects and holes.

Hierarchical learning

After these two bootstrapping steps, the system can start to acquire further competences by incremental learning. There needs to be some intrinsic moment in the system to learn, which corresponds to motivation in people, but fortunately this can be forced into a technical system as a part of the engineered principles. Presumably, it is enough to impose some moment to imitate what the system has observed. This moment leads to learning hierarchical competences according to the following steps in a complexity chain:

  1. Reproduce the same action
    The system observes a teacher doing the following action sequence: a) approach object 1; b) align gripper to object 1; c) grasp object 1; d) approach hole 1; e) align to hole 1; f) release object 1. Note that these single actions are not programmed into the system, but are shown by the teacher to the system.
    What the system observes is about the following: a) put the gripper in relation to object 1 according to side-view camera (here the bootstrapped competences are required); b) put the gripper in relation to object 1 according to end-effector camera; c) close gripper (prior knowledge); d) put object in relation to hole 1 according to side-view camera; e) put object in relation to hole 1 according to end-effector camera; f) open gripper. By repeating exactly the same observations, the system starts to imitate the taught sequence.
  2. Generalise to different position
    In the previous example, ‘object 1’ and ‘hole 1’ have been identified to a large extent by their spatial position rather than their visual appearance. If both are now moved to different positions, the system has to find them through random exploration: unsuccessful trials will lead to reduced likelihood for similar actions and successful trials increase the likelihood.
    Eventually, the system will repeatedly go to object 1, independently of where it is located. The system has therefore replaced the identification by spatial position with identification by appearance. A similar transition happens for hole 1: after some false trials using the wrong holes or other objects, the system starts to identify the correct object-to-hole relation. The system has learned to put object 1 into hole 1 independently of the position.
  3. Generalise to different objects
    Object 1 is now removed (after it has been inserted successfully) such that there is no object 1 to approach. The system randomly selects another object (success) or hole (failure). Eventually, the empty gripper is most likely moved to an object. If an object is in the gripper, the teacher reports only success for the fitting hole, and the system generalises to use the bootstrapped object-to-hole relation for the object currently in the gripper. At the end of this stage, the system has learned to insert an arbitrary object into the appropriate hole. Due to the intrinsic moment to imitate the initial action, the system will continue until all objects are inserted.
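
The trial-and-error principle in steps 2 and 3, where failures reduce and successes increase the likelihood of similar actions, can be sketched as follows. This is a simplified illustration, not the COSPAL learning rule: the multiplicative `reward` and `penalty` factors, the shape names and the function names are all assumptions.

```python
import random


def learn_object_hole_relation(objects, holes, fits, rng,
                               trials=300, reward=1.5, penalty=0.5):
    """Raise or lower the selection weight of each (object, hole) pair by trial and error."""
    weights = {obj: {hole: 1.0 for hole in holes} for obj in objects}
    for trial in range(trials):
        obj = objects[trial % len(objects)]            # take the objects in turn
        hole_weights = [weights[obj][hole] for hole in holes]
        hole = rng.choices(holes, weights=hole_weights)[0]
        if fits[obj] == hole:                          # teacher reports success
            weights[obj][hole] *= reward
        else:                                          # failed insertion
            weights[obj][hole] *= penalty
    return weights


def preferred_hole(weights, obj):
    """Hole the system is now most likely to try first for the given object."""
    return max(weights[obj], key=weights[obj].get)


# Hypothetical puzzle: three shapes and the hole each one fits.
fits = {"cube": "square", "cylinder": "round", "prism": "triangular"}
weights = learn_object_hole_relation(
    list(fits), list(fits.values()), fits, random.Random(1)
)
print({obj: preferred_hole(weights, obj) for obj in fits})
```

Early choices are essentially random, but each reported success makes the fitting hole more likely to be picked again, so the exploration phase fades out on its own.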

Robustness and scalability

Several attempts to disturb the running system have been made. For example, objects have been moved while they were being approached. In the worst case, the system fails in its initial trial and either repeats the same action or picks another object and returns later to the first one.

One might now argue that the chosen scenario was too simple to be useful, but due to the generic bootstrapping, the robot arm could be replaced with any other actuator, the objects can be replaced, and even the relations between objects and holes (goals) can be replaced.

The COSPAL system is generic enough to be used for many other assembly tasks or, as was shown during the COSPAL project, to control a radio-controlled car. In this experiment, the system learned to steer the car towards coloured balls. The actuator was now the car, the objects were the balls, and the sub-goals were to get close to the balls. This was achieved by simply replacing the bootstrapping and learning examples.

The radio-controlled car was the starting point for a new project on cognitive systems, where the setting is changed to a dynamic one with interacting agents and a more serious demonstrator. In the DIPLECS project (Dynamic Interactive Perception-action LEarning in Cognitive Systems, www.diplecs.eu), an artificial cognitive system is supposed to learn appropriate and adaptive assistance for drivers; the system warns the driver if required and if the system expects the driver to accept the warning. Furthermore, partly autonomous control of a radio-controlled car, using generic, learned recognition and sub-goals, is planned to be demonstrated based on the COSPAL architecture.

Dr Michael Felsberg is work package leader in the EU project MATRIS and coordinator of the EU projects COSPAL and DIPLECS. This work has been supported by EC Grant IST-2003-004176 COSPAL.

www.cospal.org.



  • Artificial cognitive systems versus classical system design

    For the definition of artificial cognition, the major principles of an engineered intelligent system design have to be looked at. An engineered intelligent system is the implementation of a complex world model, which gets some input or measurement from the world and registers its model according to the acquired data. Two easy-to-understand examples follow.

    Consider for example a chess computer: The inputs are the positions of all pieces (a discrete state space) and the model contains all the chess-rules in a suitable representation (mostly an algebraic formulation). The output is the next move (a state transition). Another example of an intelligent system is a ball-catching robot-arm. The model, based on Newton’s laws of motion, gets measurements for the ball’s position from some sensor (a computer vision system). The system generates a control signal for a robotic hand to catch the ball.

    The two examples show essential differences of problem formulation: The chess computer lives in a discrete state space, i.e., the whole chess world can be described in an error-free way by a finite amount of data. The chess computer has to solve a mapping problem from the input space to the output space, which consists of a finite (although extremely large) number of possible options.

    This is different for the ball-catching robot: measurements are continuous and contain errors, either by random deviations from the true value or by outliers caused, for example, by failures of the sensor. Hence, the model has to be registered against unreliable or even absent data in a continuous setting in order to generate appropriate control signals – an area where control theory has achieved much progress. Nevertheless, the structure of the problem remains simple since it is assumed that the whole relevant part of the world (ball and robot) can be described completely by the system model.

    Artificial cognition comes in when system design is extended from uncertain data to uncertain models. If reality deviates from the engineered model, either slightly, due to model inaccuracies, or entirely, because the model is wrong or incomplete, the system must still produce sensible output.

    Some might still object that this is an engineering problem, since the model can be made more accurate and detailed to cover all problematic cases. However, this would require complete knowledge about the problem including all possible complications.

    This is possible, in principle, for all cases of systems acting in a discrete world, like the chess example, but in the second example it would mean describing a mapping from an infinite number of possible measurements to control signals. This second mapping cannot be described in a finite way, so it cannot be implemented in a technical system. The real world is so complex that it cannot be modelled completely in a technical system.

    Fortunately, it is not necessary to model the whole world – the only requirement is that the system acts ‘sensibly’ for input from the real world. This creates another problem of terminology for artificial cognition: what is a sensible action? Due to the absence of a complete model, it is impossible to define the meaning of sensible explicitly. However, it can be defined implicitly, based on the assessment of a teacher. These assessments can only be done during the lifetime of the system, i.e., the system has to learn ‘on the fly’ while solving problems.

    Since the teacher should spend as little time with the system as possible, the system must make the best possible use of already acquired competences and should learn from single examples. If the system does not know how to combine available competences for a particular task, it makes a random selection and performs exploratory learning.
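
For contrast with the learning approach, the ball-catching example earlier in this sidebar shows what a classical engineered system looks like: the model (Newton's laws) is fixed and only the data are uncertain. The sketch below is an assumption-laden illustration, not a description of any real robot. It fits the two unknowns of a vertical throw to noisy height measurements by least squares, and the names `fit_ballistic` and `time_at_height`, the noise level and the sample times are hypothetical.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s², part of the engineered model


def fit_ballistic(measurements):
    """Least-squares estimate of (h0, v0) in h(t) = h0 + v0·t − ½·G·t²."""
    # Adding back the known gravity term leaves a straight line in t.
    times = [t for t, _ in measurements]
    values = [h + 0.5 * G * t * t for t, h in measurements]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    spread = sum((t - mean_t) ** 2 for t in times)
    v0 = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values)) / spread
    h0 = mean_y - v0 * mean_t
    return h0, v0


def time_at_height(h0, v0, catch_height):
    """Later root of h(t) = catch_height, i.e. when the ball comes down to the hand."""
    discriminant = v0 * v0 + 2.0 * G * (h0 - catch_height)
    if discriminant < 0:
        raise ValueError("the ball never reaches the catch height")
    return (v0 + math.sqrt(discriminant)) / G


# Simulated sensor readings: true h0 = 1.5 m, true v0 = 4.0 m/s, 1 cm noise.
rng = random.Random(0)
measurements = [
    (t, 1.5 + 4.0 * t - 0.5 * G * t * t + rng.gauss(0.0, 0.01))
    for t in (0.02 * i for i in range(16))
]
h0, v0 = fit_ballistic(measurements)
catch_time = time_at_height(h0, v0, catch_height=1.0)
print(h0, v0, catch_time)
```

The estimate copes with measurement noise, but it is only as good as the model: if the ball bounces off something or is affected by wind, nothing in this code can notice, which is the gap that artificial cognition is meant to close.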