Development page for the Probability Library project

TODO | project:problib status:pending

  • Polish up cellular automata env (2021-01-26 23:59)
  • Implement virus related sims (ncase + 3B1B) !!!
  • Implement simple Monopoly env !!
  • Devise problib documentation scheme and update for recent changes !!
  • Define model base structure, port old NN into system !!!
  • Detail structure from Gym and RLLib for inspiring/improving sim module !!!
  • Public early release, refine library focus !!
  • Add sampling techniques !
  • Consider creating a CLI tool for basic interaction !
  • Add lessons from 516 final project !
  • Resolve evolution interface
  • Add MLE and MAP
  • Buy up and/or
  • Create diagrams for documentation and pages
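The MLE/MAP item above could start as simple closed-form estimators. A toy sketch (the API shape here is a guess, not anything decided for problib), using a Bernoulli parameter as the example:

```python
# Hypothetical sketch of the planned MLE/MAP additions:
# closed-form estimates for a Bernoulli parameter from 0/1 samples.
def bernoulli_mle(samples):
    # maximum likelihood estimate: the sample mean
    return sum(samples) / len(samples)

def bernoulli_map(samples, alpha=1.0, beta=1.0):
    # MAP estimate under a Beta(alpha, beta) prior
    # (mode of the Beta(k + alpha, n - k + beta) posterior)
    n, k = len(samples), sum(samples)
    return (k + alpha - 1) / (n + alpha + beta - 2)
```

With alpha = beta = 1 (uniform prior) the MAP estimate reduces to the MLE, which is a quick sanity check for whatever implementation lands in the library.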


  • Determined an approach for the interaction between environment and agent. The problem was determining what an observation really looks like when the agent isn’t physically in the environment. How should this be implemented in a way that makes semantic sense? Details in the zettel.
  • Another big problem to address is Agent entity registry. How should agents be assigned to entities, and what does this look like from the client’s perspective? What happens when you have entities on initialization of the env? What about when you have tons of agents, or just want a class of agents to be registered for all entities of a certain type? Again, more info in the zettel.
  • Another pressing issue is how we properly implement and formalize constraints and game logic. How do we make it easy for clients to create logical rules? How does this fit in with allowing custom entities? Should entities themselves be responsible for managing their impact on the state, or should they just react to an action and rely on internal mechanisms to clean up? These questions are addressed in Simulation constraint system.
  • Having so much trouble figuring out a useful registration scheme for the gym. I’m opting for a simple and possibly shortsighted approach, but it’s not worth trying to think any more about at this point. I’m doing simple type mappings for bulk registration, and instance-only registration for agents and entities. I’m also allowing this only at initialization of the gym so I don’t overcomplicate the API with extraneous methods. At this point I don’t anticipate having to dynamically (on the fly) register agent or entity classes, but perhaps there’s a point where it can’t be specified during initialization. We’ll deal with it when the time comes.
  • Throwing internal index updates inside of the entity registry method for the env. These are the dictionaries indexing into certain collections of entity attributes for easy lookup.
  • State refreshes in the gym (a problem that plagued the genetic implementation) are just going to happen automatically on entity registry. That’s the only time we need to update the state, so we’ll just have the registry output a new state for the gym to set.
  • Redesigning env output as packet, a dictionary of state, reward, done, and extra. Extra is going to include internally managed indexes as mentioned above for easy use inside of view functions.
  • The entities key in the environment state is the only formal attribute: just a collection of eid: Entity pairs. Other standard structures exist but are not required.
  • Pivot on initialization params, now using an options dict. This seems pretty clean; want to see how other libs do it.
  • Using an options object: .get for optional parameters, direct indexing for required parameters so an error is thrown if they’re missing.
  • Removed draw() method from Env base, redundant with recent View development
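A combined sketch of several of the decisions above, to make them concrete. All names here are assumptions for illustration, not actual problib API: the options dict with direct indexing for required params and .get for optional ones, init-time-only registration with index updates at registry time, and the packet-shaped env output.

```python
# Hypothetical Env sketch combining the options-dict convention,
# registry-time index/state updates, and packet-style output.
class Env:
    def __init__(self, options, entity_policies=None):
        self.width = options['width']                    # required: KeyError if absent
        self.max_steps = options.get('max_steps', 1000)  # optional, with default
        # bulk registration via simple type mapping, allowed only at init
        self.entity_policies = dict(entity_policies or {})
        # formal state: 'entities' is the only required key, eid -> Entity
        self.state = {'entities': {}}
        self.indexes = {}                                # internal attribute indexes

    def register(self, eid, entity):
        # index updates and state refresh happen here, at registry time,
        # since that's the only point the entity collection changes
        self.state['entities'][eid] = entity
        self.indexes.setdefault(type(entity).__name__, []).append(eid)

    def tick(self):
        # env output as a packet: state, reward, done, extra
        return {
            'state': self.state,
            'reward': 0.0,
            'done': False,
            'extra': {'indexes': self.indexes},  # for easy use in view functions
        }
```

A missing required key fails loudly (KeyError on `options['width']`), while optional keys quietly take their defaults, which is the whole appeal of the direct-indexing-vs-.get split.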


  • Should the Gym allow an initial set of agent-entity instances for registry?
  • How to deal with “live” state, i.e. one that changes as entities are updated?
  • Are we allowing “options” dicts/objects for passing parameter values in?


  • Bayesian inference, distributions, emphasis on visualization
  • Consider, far down the line, the idea of having a “plugin” that allows clients to specify in an HTML doc basically just a query for some system they’d like to visualize on their page, and the problib servers will populate that area. Not sure if this would ever work out, but it sure sounds neat.
  • Consider how this library ties into broader resource goals, like making reference documents about the material the library covers (not docs, but actual note-like material).


  • A large Python collection of interesting implementations and useful tools
  • Probabilistic simulations
  • Running experiments and visualizing distributions, etc
  • A repository for reference on how to implement certain technical programming tasks


See hub problib modules

Structure Philosophy

  • Local aggregates module structure for ideal API usage
  • Youngest child objects placed in single file together
  • Base classes for local module located inside file (maybe?)
  • All cross-subpackage references are to be local imports.
  • Library examples placed under the root examples directory, followed by the appropriate submodule name inside that directory. This is what most public packages seem to do to avoid confusion of the core code for the library itself. Some packages even have a separate repo entirely for examples (like PyTorch).
  • Base class naming scheme: take note of Ray RLlib bases
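The local-import rule above can be illustrated with a tiny sketch: cross-subpackage references are deferred to function scope rather than sitting at module top level, which sidesteps circular-import problems. Shown here with a stdlib module so it's runnable; a problib subpackage would slot in the same way.

```python
# Local-import convention: the import happens at the point of use,
# not at the top of the module, so subpackages can reference each
# other without import cycles.
def parse_timestamp(text):
    from datetime import datetime  # local import, resolved on first call
    return datetime.strptime(text, '%Y-%m-%d')
```

The trade-off is a tiny per-call lookup cost (cached after the first import) in exchange for a dependency graph that never breaks at import time.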


  • Problib has the potential to be more than just an intuitive guide…a high-level lib that weaves together common libs and extends them in useful ways. Like a new env system that could use Gym, but is multi-agent. Or implements RL algorithms using PyTorch as a model backend (could also define my own models for fun). So you could throw together a model from PyTorch, a problib NN, or even scikit into an algorithm structure which uses an env instance. This could actually be very useful given the disarray of popular tools: making it easier to piece together fundamental components of a larger ecosystem, where each component can use those existing, popular tools. So basically infrastructure for connecting them in a modular way, again along with potentially better extensions and utilities like multi-agent envs. A nicer, bigger, more meaningful goal. ALSO has optional API and front-end connectivity to make it easy for developers to send data forward for monitoring or visualization. Lightweight, dead simple, easy to understand and manage.
    • Can just immediately think back to school projects: for 516 we needed to build a naive traffic sim. It should be easy to prototype this with problib without tons of overhead. Of course, it wasn’t too difficult without the library, but one could’ve helped with quicker development and additional functionality.
    • Pacman for 511A. We had a terrible time trying to extend DeepQ and PPO using TF/PyTorch into our own multi-agent turn-based game setting. It should be easy to use problib structures to manage and prototype the proper simulation structure, along with the algorithms themselves.
  • Think there is quite a bit of room for multi-agent envs. There isn’t any good standard as far as I’m aware, and trying to bend standard libs to work is a pain (as we saw today). I think there is a real need for an easy-to-understand, flexible, multi-agent framework that, while perhaps sacrificing efficiency, people can actually work with at a lower level than something like TF or PyTorch implementations.

Design notes from other libraries

Thought dump

  • The hierarchical entity-policy assignment system is fine. I think it makes sense and is no worse than the furthest-child system seen in something like the RLlib policy mapping function.

  • The main issue right now is the initialization schematic. How are we allowing agents to be specified AND ensuring later internal creation is properly set up? You can either have a single mapping function from entities to policies, or you can build up your groups and defaults and let the hierarchical structure get filled in that way. In either case, we’re also trying to handle views and indexes at the same time, so this should be elegantly thrown in somehow.

  • Spaces…can our individual observation spaces be defined using only the (formal) state space and a transformation? Seems possible…can pass the abstract form through the transformation (and maybe this function is part of the spaces library using a supported set of local transformations) to get a new abstract form, or at least can sample from the original space and pass those through the transformation each time. Something to think on.
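The idea in the bullet above (derive an observation space by pushing state-space samples through a view transformation) can be sketched directly. Everything here is a toy assumption, not problib API: a fake formal state sampler, a distance-based view, and Monte Carlo samples of the induced observation space.

```python
# Sketch: approximate an agent's observation space by sampling the
# formal state space and passing each sample through the view transform.
import random

def state_space_sample():
    # toy formal state: positions of two entities on a 1-D line [0, 10]
    return {'entities': {0: random.uniform(0, 10), 1: random.uniform(0, 10)}}

def view(state, eid=0):
    # transformation: agent eid observes only distances to other entities
    me = state['entities'][eid]
    return [abs(me - pos) for e, pos in state['entities'].items() if e != eid]

# push state samples through the view to characterize the observation space
obs_samples = [view(state_space_sample()) for _ in range(100)]
```

The alternative mentioned above (passing the abstract space form itself through a supported set of transformations, rather than sampling) would give exact bounds; sampling is the cheap fallback when the transform isn't one the spaces library knows how to map analytically.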

  • The update process seems useful, but build it as we need it. Figure out some of the core issues, begin building your set of envs, and as you find a need for it, piece the update process together then. Don’t really have a good reason to let it stand in the way of building actual environments.

  • More problib development, some on chalk board some here:

    • Seems fairly clear that we have lots of modular pieces to put together all the components necessary for specifying the multi-agent observation spaces, action spaces, transformations on the state space (views), indexes, etc. Gave some thought to dynamic groups and realized they were kinda just indexes…more to think about there.

    • Need to decide on structure of spec if we allow multi-agent details on init, holding named groups, initialized entities, views, indexes:

        '<group_name>': {
            'entities': [ ... ],
            'policy': <func to policy or direct policy>,
            'view': <view func>,
            'indexes': [ ... ],
        }, ...
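A concrete filled-in instance of that spec, to test the shape against a real-ish case (all names hypothetical; the policy and view bodies are placeholders):

```python
# Filled-in example of the multi-agent init spec sketched above.
def car_view(state, eid):
    # each car observes only its own neighborhood (placeholder logic)
    return {'nearby': []}

class CarPolicy:
    def act(self, obs):
        return 'noop'  # placeholder action

spec = {
    'cars': {
        'entities': [],           # no pre-built entity instances at init
        'policy': CarPolicy(),    # direct policy instance (vs. a func to policy)
        'view': car_view,         # observation transformation for the group
        'indexes': ['position'],  # entity attributes to index for easy lookup
    },
}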
    • More and more finding the entity .update() function a reasonable automatic way to advance an agent forward from within the environment. The pygame library has examples that seem to do this, and then check for inconsistencies/collisions/etc afterward. This makes sense to me, although I may need to pay attention to the signature of the update() method (is it always parameterless?).

    • Think a lot more of the game logic’s responsibility ends up being on the policy than I originally thought. Like in a traffic env, the environment isn’t really enforcing anything per se (although it definitely can if that’s decided). Instead, it’s the cars’ responsibility to, say, decide to slow down as they reach a traffic light, or to properly make a turn, etc. This all boils down to the policy getting the right information and (possibly deterministically) using that to make a sensible decision.

    • There’s also the question of the responsibility for certain changes to entity attributes. Think about a police car turning on its lights: is that something that’s enabled only after the policy decides to submit an action enabling the lights, or is it enforced by the environment whenever a rule of some sort is broken within the range of the police car? I suppose this is just a design question for a specific env more than anything, and less about the general framework. Either situation should be equally easy to implement, I guess, is the main point there.

    • Thoughts on treating the sim setup like an MVC framework. The env logic is the model, policies are controllers for entities, and views are analogous to the plib views (both as latent observers for human rendering and for observation spaces).
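Two of the notes above (parameterless entity update(), and game logic living mostly in the policy) can be tied together in one minimal sketch. All names are hypothetical; the env's only job in the loop is to invoke controllers and advance entities, MVC-style:

```python
# Sketch: entities advance via a parameterless update() (pygame-sprite
# style), while the slowing-near-a-light logic lives in the policy.
class Car:
    def __init__(self, pos=0.0, speed=1.0):
        self.pos, self.speed = pos, speed

    def update(self):
        # automatic, parameterless advancement from within the env
        self.pos += self.speed

class CautiousPolicy:
    def act(self, car, light_pos):
        # the policy, not the env, enforces slowing near the light
        if 0 < light_pos - car.pos < 2:
            car.speed = max(0.0, car.speed - 0.5)

def step(cars, policy, light_pos):
    for car in cars:
        policy.act(car, light_pos)  # controller adjusts its entity
        car.update()                # entity advances itself
    # the env could check for collisions/inconsistencies afterward here
```

In the MVC framing from the last bullet: `step` is the model's tick, `CautiousPolicy` is the controller, and a view function would sit read-only on top of the resulting state.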

Trying to get total problib thoughts together from recent dev cycles. This includes ideas from above, as well as on chalkboard:

While there are lots of very specific, seemingly small but consequential ideas (think individual entity update() methods), there are some large overarching issues being addressed at the core. These are

  • The core env registration scheme. How are we supplying environment parameters, custom entities, necessary policy details, etc?