Agent entity registry

How much flexibility should be given to the environment (problib design)? Should the user be able to register their own “entities” and use those dynamically later on, perhaps by assigning certain agents to certain entity types? This would, for example, allow the end user to define their own entity logic when substitution is possible (e.g. different types of cars in a traffic simulation). Compare this to more restrictive envs like Car Racing, where the user has no control over what type of car is being used and is not concerned with any entity details; rather, they must provide an agent to control that entity, which is a part of the environment by nature.

I think the env and/or gym formulation should be as general as possible, allowing for the dynamic registry of entities. When more refined, restrictive environments like the one given above are needed, one can simply wrap up the one-to-one registry logic in a packaged env without a problem.

There are a few input combinations to consider:

  • Agent and Entity: the Gym maps the agent to entity internally, propagating the agent actions forward to that entity inside of the env. The env makes use of that entity’s update() method, which essentially defines how that entity is meant to respond to certain actions (different cars might have different accelerations, different tire traction, etc).
  • Agent: assign the agent to the single canonical entity, or initialize it randomly amongst a single entity class (e.g. a random cell position in a grid).
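The two input combinations above can be sketched as follows. This is a minimal illustration rather than the actual problib design: only `update()` and the Entity base class come from these notes, while `SportsCar`, `Truck`, `ConstantAgent`, `act()`, `register()`, `step()`, and the acceleration values are hypothetical names and numbers chosen for the example.

```python
class Entity:
    """Base class: an entity decides how it responds to an action."""

    def __init__(self, position=0.0):
        self.position = position

    def update(self, action):
        raise NotImplementedError


class SportsCar(Entity):
    acceleration = 2.0  # hypothetical value

    def update(self, action):
        self.position += self.acceleration * action


class Truck(Entity):
    acceleration = 0.5  # hypothetical value

    def update(self, action):
        self.position += self.acceleration * action


class ConstantAgent:
    """Hypothetical agent that always requests the same action."""

    def __init__(self, action=1.0):
        self.action = action

    def act(self, observation):
        return self.action


class Gym:
    def __init__(self, default_entity_cls):
        self.default_entity_cls = default_entity_cls
        self.pairs = []  # (agent, entity) pairs

    def register(self, agent, entity=None):
        # Agent only: fall back to the single canonical entity class.
        if entity is None:
            entity = self.default_entity_cls()
        self.pairs.append((agent, entity))
        return entity

    def step(self):
        # Propagate each agent's action forward to its own entity.
        for agent, entity in self.pairs:
            entity.update(agent.act(entity.position))


gym = Gym(default_entity_cls=Truck)
car = gym.register(ConstantAgent(), SportsCar())  # Agent and Entity
truck = gym.register(ConstantAgent())             # Agent only
gym.step()
print(car.position, truck.position)  # 2.0 0.5
```

Both agents request the same action, but each entity responds to it in its own way, which is the substitution the car example describes.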

Essentially there should be different levels of completeness to environments, starting at the most flexible:

  1. Entity types can be defined by the client, and registered dynamically along with Agents (must inherit the Entity base class).
  2. Predefined entity types are the only ones that can be registered. For example, in a car racing environment “racer agents” may only be allowed to use the car given to each racer by the env. This isn’t really here to be restrictive, so much as it is to take the load off of a client that shouldn’t need to define their own entities.

Meet Entity Spaces. They allow a designer to compactly define the entities that are allowed in the environment, even when the end user doesn’t have access to the objects directly.

I was getting caught up thinking that allowing an Entity to have an update() makes the Env redundant. After all, what’s the point of the environment if all it has to do is call update() on all of its entities and pass along actions? It turns out there is a lot more that it’s responsible for. For example, imagine Pacman. The env allows the Ghost and Pacman entities to define how they move in accordance with the requested action; they each have their own tightly scoped set of internal attributes to modify, and can respond in whatever way the client desires. Perhaps the Blue ghost doesn’t listen to his action half the time! Allowing this flexibility enables each entity to respond differently to the actions requested by the agents. ACTUALLY this is a BAD EXAMPLE. That action decision is OUTSIDE THE SCOPE of an entity; if an agent’s decision process defers half the time, that should be encoded in the agent decision process (maybe?!). A more reasonable example might be a difference in speed between ghosts. Perhaps the Blue ghost can only move half as fast as the others. This certainly IS outside the action mechanism’s control and is purely a factor of the environment.

Now once these entity updates have been made, are we done? No, not at all. The environment is then responsible for updating food booleans based on these entity positions (the food being a static, non-agent part of the environment), as well as keeping track of the current score, among other things. There is more to the global state than just a collection of agent states: there are the fundamental underlying rules of the world that have to be implemented inside of tick(). How the entities themselves respond to actions is just a factor that the environment is happy to relinquish to the entities themselves, and that reaction deserves to be defined by a client if desired.

Having Entities work this way simply allows a designer/client to have the ability to extend the functionality of the environment by tweaking an internal update mechanism that should be editable. It gives an extra degree of freedom, increasing the value of a set of environment rules. Allowing the update mechanism isn’t “overbearing” or “too deep” (since the env now has nothing to do); it simply defers logic to something that a client can define while still being understood by the internal env and its other rules.
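A toy version of this split, assuming a one-dimensional corridor instead of a real Pacman maze: `update()` and `tick()` are the names used in these notes, while `Mover`, `CorridorEnv`, the speed values, and the 10-point food reward are made up for illustration.

```python
class Mover:
    """Hypothetical entity: moves along a 1-D corridor at its own speed."""

    def __init__(self, x, speed=1):
        self.x = x
        self.speed = speed

    def update(self, action):
        # action is -1, 0 or +1; how far it carries the entity is entity logic.
        self.x += action * self.speed


class CorridorEnv:
    def __init__(self, length, pacman, ghosts):
        self.length = length
        self.pacman = pacman
        self.ghosts = ghosts
        self.food = [True] * length
        self.food[pacman.x] = False  # no food on the starting cell
        self.score = 0
        self.done = False

    def tick(self, pacman_action, ghost_actions):
        # 1. Defer the response to each action to the entity itself.
        self.pacman.update(pacman_action)
        for ghost, action in zip(self.ghosts, ghost_actions):
            ghost.update(action)

        # 2. World rules that no single entity owns.
        for entity in [self.pacman, *self.ghosts]:
            entity.x = max(0, min(self.length - 1, entity.x))  # walls
        if self.food[self.pacman.x]:
            self.food[self.pacman.x] = False  # food is static env state
            self.score += 10
        if any(ghost.x == self.pacman.x for ghost in self.ghosts):
            self.done = True


# The second ghost moves half as fast as the first for the same action.
env = CorridorEnv(10, Mover(0), [Mover(9, speed=2), Mover(8, speed=1)])
env.tick(+1, [-1, -1])
print(env.pacman.x, [g.x for g in env.ghosts], env.score)  # 1 [7, 7] 10
```

The entities only ever touch their own position; walls, food, score, and termination stay inside `tick()`.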

One problem with allowing the entity’s response to actions to be defined by a client is that it makes things difficult to regulate. How do I ensure that the entity update conforms to game rules? A dummy object? Check here for discussion: Simulation constraint system.

I thought about what happens if we have tons of agents that are all really just using some large, expensive decision process that we can only have one of (say, a big neural net that eats up a lot of memory). Then it’s no big deal: just have the agent class reference that single instance so all instantiated agents use that one existing object (or maybe make a query to an API). That way, nothing really changes, and we semantically allow the creation of many agent objects as more or less light-weight action/state trackers. Otherwise there really wouldn’t be a point in having multiple agent objects if they all just use the same decision process. This still makes me question the design, and whether there should just be one “agent brain” per decision process.

  • I would argue in general it makes sense for there to be separate agents; imagine multi-agent model-based reinforcement learning. Every agent needs to maintain an internal model according to the observations that have been made so far for that agent. I suppose this isn’t much different from just storing state/action histories; despite the internal model or other information varying, they still share the logic behind the action mechanism.
  • I think I’m also getting off track here. There is no need to worry about method definitions taking up too much space just because many objects of that class are created. That worry is just silly; the class defines the methods once for all instances (if they are instance methods), and at runtime the interpreter need only refer to these canonical copies of the method logic and apply them appropriately.
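A small sketch of the shared-brain idea, assuming Python and a class attribute as the sharing mechanism. `SharedPolicy`, `decide()`, and the toy decision rule are placeholders for whatever expensive decision process is actually used.

```python
class SharedPolicy:
    """Stand-in for one large, expensive decision process."""

    def __init__(self):
        self.calls = 0

    def decide(self, observation):
        self.calls += 1
        return 1 if observation < 0 else -1  # toy rule, purely illustrative


class Agent:
    # Class attribute: a single policy instance shared by every Agent.
    policy = SharedPolicy()

    def __init__(self, name):
        self.name = name
        self.history = []  # per-agent (observation, action) record

    def act(self, observation):
        action = self.policy.decide(observation)
        self.history.append((observation, action))
        return action


agents = [Agent(f"agent-{i}") for i in range(3)]
actions = [agent.act(obs) for agent, obs in zip(agents, [-2, 0, 5])]

print(actions)                                           # [1, -1, -1]
print(all(a.policy is Agent.policy for a in agents))     # True
print(agents[0].act.__func__ is agents[1].act.__func__)  # True
```

Each agent keeps only its own history, while the policy object and the method code exist once, which matches both bullets above.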

I wondered how Berkeley Pacman went about applying agent updates once it knew the action they wanted to take. In the main game loop, Pacman is “agent index” 0, and all other indices indicate ghosts. Agents are queried for an action, and that action (as well as the agent index) is sent through to a generateSuccessor() function that is responsible for generating the next game state (note that agents’ actions are taken one at a time, i.e. each action generates a new state). This function then sends the action, along with the current game state, to a PacmanRules.applyAction(state, action) method, which basically ensures the provided action is a legal one, moves Pacman’s position according to the action at the game speed (i.e. one block), and then checks whether Pacman is currently on a piece of food, removing that piece if so.

  • So overall we can see that there are agent/entity class-specific rules, defined separately from the main game loop, for updating agent information in the state. These are roughly analogous to the entity update() methods, except the method gets access to the entire game state and can check other rules defined inside of the PacmanRules “entity”.
  • For me this is a hard design choice. It means somewhat scattering access to the game state, allowing individual entities to modify the state directly and determine what is valid. While this can be okay and built out in a responsible manner, I’m not really sure if it’s the most logical choice. If I were to take a more restrictive approach to the above Pacman update, it might look like this:
    • Entities are only allowed to make updates to their own state, and even these updates may not stand. So the action is submitted through to the Pacman entity (analogous to the PacmanRules object).
    • The entity updates its own internally stored position (e.g. some .x .y attributes it stores).
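One way the restrictive approach above could play out, as a sketch only. The notes stop after the entity updates its own position, so the validate-and-revert step here is an assumption about what “these updates may not stand” might mean; `PacmanEntity`, `GridEnv`, the wall set, and the N/S/E/W action encoding are hypothetical.

```python
class PacmanEntity:
    """Entity that only ever modifies its own .x and .y attributes."""

    MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}

    def __init__(self, x, y):
        self.x, self.y = x, y

    def update(self, action):
        dx, dy = self.MOVES[action]
        self.x += dx
        self.y += dy


class GridEnv:
    def __init__(self, walls, entity):
        self.walls = set(walls)
        self.entity = entity

    def tick(self, action):
        before = (self.entity.x, self.entity.y)
        self.entity.update(action)  # entity touches only its own state
        after = (self.entity.x, self.entity.y)
        if after in self.walls:
            # Assumed rule: the env has the final say and reverts the move.
            self.entity.x, self.entity.y = before
            return False
        return True


env = GridEnv(walls={(1, 0)}, entity=PacmanEntity(0, 0))
print(env.tick("E"), (env.entity.x, env.entity.y))  # False (0, 0)
print(env.tick("S"), (env.entity.x, env.entity.y))  # True (0, 1)
```

In this version the entity never sees the game state at all; validity is decided entirely by the env after the fact.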

Consider the D3.js update pattern for managing the registry. I think it matches perfectly: we create new DOM elements for new data elements, delete DOM elements when their data elements are removed, and leave currently paired data-DOM elements alone. This, in conjunction with agent-entity class maps, could yield a winning formula for adaptive, automatic registry/maintenance of entities and agents throughout a simulation lifetime. The question now, I suppose, is how to handle more manual situations. For example, it’d be relatively straightforward in the Cellular automata env to just register the same agent and entity type on initialization. But what to do when I want the env to start with one cell having a different value than the default?
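The enter/update/exit idea maps onto plain set operations. The sketch below is written in Python rather than D3, and `sync_registry`, `make_entity`, and the string placeholders for entities are hypothetical; it does not address the manual-override question raised above.

```python
def sync_registry(registry, agent_ids, make_entity):
    """Reconcile an {agent_id: entity} registry with the wanted agent ids.

    Returns the (enter, update, exit) id sets, mirroring the D3 data join.
    """
    current = set(registry)
    wanted = set(agent_ids)

    enter = wanted - current    # new agents: need a fresh entity
    exit_ = current - wanted    # departed agents: entity is removed
    update = wanted & current   # existing pairs: left alone

    for agent_id in enter:
        registry[agent_id] = make_entity(agent_id)
    for agent_id in exit_:
        del registry[agent_id]
    return enter, update, exit_


registry = {"a": "entity-a", "b": "entity-b"}
enter, update, exit_ = sync_registry(
    registry, ["b", "c"], lambda agent_id: f"entity-{agent_id}"
)
print(sorted(registry.items()))  # [('b', 'entity-b'), ('c', 'entity-c')]
print(enter, update, exit_)      # {'c'} {'b'} {'a'}
```

Calling this once per simulation step would keep the registry in line with whichever agents currently exist.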

An Approach

Currently under consideration are agent and entity class registration, as well as maps between agents and entities by default. What does this mean?

  • Agent registration is the simple process of telling a Gym about an agent class by providing the class and the name to be used for that class: gym.register_agent_class(AgentClass, 'agent'). The exact same process applies for entity registration.
    • Note that this can also be done using so-called “agent spaces” and “entity spaces”. Upon creating a gym, you specify a dictionary of <name>: <Class> pairs for both agents and entities, and can later specify maps using these names.
  • Agent entity maps are links between an agent class and an entity class. When a mapping is set like gym.register_mapping(AgentClass, EntityClass), whenever an agent of type AgentClass is registered, it will be assigned a corresponding EntityClass entity inside of the env.
  • After more thought and development, a refined approach for environments has been detailed in a new Zettel: Environment registry
  • Same goes for the Gym: Gym registry
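A rough sketch of how the registration calls described above might fit together. Only `register_agent_class` and `register_mapping` are named in these notes; `register_entity_class`, `register_agent`, the `agent_space`/`entity_space` constructor arguments, and the `RacerAgent`/`Car` classes are assumptions made for the example, and the refined designs in the linked Zettels may differ.

```python
class Gym:
    def __init__(self, agent_space=None, entity_space=None):
        # Optional "spaces": dictionaries of name -> class given up front.
        self.agent_classes = dict(agent_space or {})
        self.entity_classes = dict(entity_space or {})
        self.mappings = {}  # agent class -> entity class
        self.entities = {}  # id(agent) -> entity instance

    def register_agent_class(self, cls, name):
        self.agent_classes[name] = cls

    def register_entity_class(self, cls, name):  # assumed mirror of the above
        self.entity_classes[name] = cls

    def register_mapping(self, agent_cls, entity_cls):
        self.mappings[agent_cls] = entity_cls

    def register_agent(self, agent):  # hypothetical instance registration
        # Look up the entity class mapped to this agent's class.
        entity = self.mappings[type(agent)]()
        self.entities[id(agent)] = entity
        return entity


class RacerAgent:
    pass


class Car:
    pass


gym = Gym()
gym.register_agent_class(RacerAgent, 'agent')
gym.register_entity_class(Car, 'car')
gym.register_mapping(RacerAgent, Car)

entity = gym.register_agent(RacerAgent())
print(type(entity).__name__)  # Car
```

The same setup could start from spaces instead, e.g. `Gym(agent_space={'agent': RacerAgent}, entity_space={'car': Car})`, followed by the `register_mapping` call.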