STS Software Architecture

Overview of the software architecture of STS.

This document provides an overview of the software architecture of STS, as well as a development workflow for those contributing to STS itself.

For a detailed walkthrough of STS’s use cases, see this page.

For searchable code documentation, see this page.

Software Architecture

Simulation State

All the important state of the simulation can be accessed through a single object, instantiated from sts/simulation_state.py. This file stores the configuration parameters specified by the user, handles instantiation of the simulation object, and allows the control flow to access relevant state.

Topologies

STS currently supports two default network topologies: full meshes, and fat trees. For more information on their configuration parameters, see the documentation.

You might also consider using STS’s topology creation GUI to create your own custom topology:

$ ./simulator.py -c config/gui.py

Traffic Generation

For an overview of how to generate dataplane traffic in STS, see this page.

Control Flow

STS has six modes of operation. Each mode is implemented as a separate module under sts/control_flow/.

Interactive

This mode provides a command line interface for users to interactively step through the execution of the network. Type help for more information on the command line interface.

All events observed in interactive mode are recorded for later replay.

Fuzzer

This mode programmatically generates random inputs. The core of Fuzzer is a simple loop:

while True:
    check_dataplane_messages()     # deliver, drop, or delay buffered dataplane packets
    check_controlplane_messages()  # deliver buffered OpenFlow messages
    inject_inputs()                # e.g. failures, recoveries, host migrations
    sleep()                        # wait before starting the next round

We refer to each iteration of this loop as a ‘logical round’.

By default Fuzzer generates its inputs based on the probabilities defined in config/fuzzer_params.py. That is, in a given round, the probability that an event will be triggered is defined by the parameter specified in that file.
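For illustration, a parameter file might look like the following sketch (the variable names and values here are hypothetical, not the actual contents of config/fuzzer_params.py):

```python
# Hypothetical fuzzer parameter file. Each value is interpreted as the
# per-round probability that the corresponding event is injected.
switch_failure_rate = 0.05    # probability of failing a random switch
switch_recovery_rate = 0.5    # probability of recovering a failed switch
link_failure_rate = 0.02      # probability of cutting a random link
link_recovery_rate = 0.25     # probability of restoring a cut link
dataplane_drop_rate = 0.1     # probability of dropping a buffered packet
```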

Fuzzer allows you to check invariants of your choice at specified intervals.

See the documentation on Fuzzer.__init__ for more information about parameters.

Fuzzer will drop into interactive mode if the user sends a ^C (SIGINT) signal.

Replayer

Given an event trace generated by Interactive or Fuzzer, Replayer injects the inputs from the trace in a way that tries, as best it can, to reproduce the same result. It does this by listening for the internal events in the trace and replaying each input once it sees that the input’s causal dependencies have been met.

event_scheduler.py determines how long the simulator waits for each internal event before timing out.
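The waiting logic can be sketched as follows (a simplified illustration under assumed names, not the actual event_scheduler.py code):

```python
import time

def wait_for_internal_event(event_has_occurred, timeout_seconds, poll_interval=0.05):
    """Poll until the internal event is observed or the timeout expires.

    Returns True if the event occurred, or False if we timed out and the
    replayer should move on without it.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if event_has_occurred():
            return True
        time.sleep(poll_interval)
    return False
```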

MCSFinder

Given an event trace, MCSFinder executes delta debugging to find the minimal causal sequence. For each subsequence chosen by delta debugging, it instantiates a new Replayer object to replay the execution, and checks at the end whether the bug appears. To avoid garbage collection overhead, MCSFinder runs each Replay in a separate process, and returns the results via XMLRPC. See sts/util/rpc_forker.py for the mechanics of forking.
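The core idea of delta debugging can be sketched as a simplified ddmin loop (this is an illustration of the algorithm, not MCSFinder’s actual implementation, which additionally forks a Replayer per subsequence):

```python
def ddmin(events, bug_reproduces):
    """Return a locally minimal subsequence of `events` for which
    bug_reproduces(subsequence) is still True. Assumes the bug
    reproduces on the full input sequence."""
    n = 2  # number of chunks to split the trace into
    while len(events) >= 2:
        chunk = max(len(events) // n, 1)
        subsets = [events[i:i + chunk] for i in range(0, len(events), chunk)]
        reduced = False
        for i in range(len(subsets)):
            # Try removing subset i; keep everything else.
            complement = [e for j, s in enumerate(subsets) if j != i for e in s]
            if bug_reproduces(complement):
                events = complement      # the bug survives without subset i
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(events):
                break                    # already at single-event granularity
            n = min(n * 2, len(events))  # try finer-grained subsets
    return events
```

For example, if a bug only reproduces when events 3 and 7 are both present, ddmin prunes everything else.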

The runtime statistics of MCSFinder are stored in a dictionary and logged to a json file.

InteractiveReplayer

Given an event trace (possibly minimized by MCSFinder), InteractiveReplayer allows you to interactively step through the trace (a la OFRewind) in order to understand the conditions that triggered a bug. This is helpful for:

OpenFlowReplayer

Delta debugging does not fully minimize traces (often for good reason, e.g. delicate timings). In particular, we have observed that minimized traces often contain many OpenFlow messages that time out or are overwritten, i.e. are not directly relevant for triggering an invalid network configuration.

OpenFlowReplayer replays the OpenFlow messages from an event trace, enabling:

The tool can then spit back out a new event trace without the irrelevant OpenFlow messages, to be replayed again by Replayer or InteractiveReplayer.

Experiment Results

Experiment results are automatically placed in their own subdirectory under experiments/. There you can find console output, serialized event traces, and config files for replay and MCS finding.

By default, the name of the results directory is inferred from the name of the config file. You can specify a custom name with the -n parameter to simulator.py, and you can have a timestamp appended to each directory name with the -t parameter.

Event Traces

The event types logged by Interactive and Fuzzer are defined in sts/replay_event.py.

Events are eventually serialized to JSON. The format of the JSON files is documented here.

During replay, events are stored in an EventDag object, which is essentially a linked list of events. Each input event object knows how to inject itself, and each internal event object knows how to wait for the corresponding internal event to occur. Replay proceeds simply by invoking each event’s proceed() method.
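This design can be sketched as follows (a minimal illustration; apart from proceed(), the class and method names are assumptions, not the real sts/replay_event.py API):

```python
class InputEvent:
    """An input event knows how to inject itself into the simulation."""
    def __init__(self, inject):
        self.inject = inject

    def proceed(self):
        self.inject()
        return True

class InternalEvent:
    """An internal event knows how to wait for its counterpart to occur."""
    def __init__(self, wait_for):
        self.wait_for = wait_for

    def proceed(self):
        return self.wait_for()  # False would indicate a timeout

class EventDag:
    """Essentially a linked list of events; replay walks it in order."""
    def __init__(self, events):
        self.events = list(events)

    def replay(self):
        return [event.proceed() for event in self.events]
```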

Ensuring Validity of Traces

The subsequences chosen by delta debugging may not always make sense. For example, it does not make sense to replay a recovery event if the preceding failure event has been pruned.

To cope with the possibility of invalid subsequences, we define ‘Atomic Input’ pairs that must be removed together by delta debugging. For example, we ensure that failure/recovery pairs are treated atomically, and we ensure that chains of host migration events for a given host are always consecutive in terms of location (i.e. we ensure that hosts don’t magically teleport to new locations) despite the possibility of delta debugging pruning intermediate host migration events.
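A hypothetical sketch of the failure/recovery rule: before handing events to delta debugging, each failure is grouped with its recovery into one atomic unit, so the pair is always pruned or kept together (the event representation and function names here are illustrative):

```python
def group_atomic(events, partner):
    """Group each failure with its recovery so delta debugging prunes
    them as a single unit. `partner` maps failure -> recovery."""
    atoms, skip = [], set()
    for event in events:
        if event in skip:
            continue  # already folded into an earlier atom
        if event in partner:
            atoms.append((event, partner[event]))
            skip.add(partner[event])
        else:
            atoms.append((event,))
    return atoms
```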

Concurrency Model

STS is single threaded. All sockets are set to non-blocking, and all I/O operations or blocking calls such as sleep() are routed through a central select loop.

The select loop is encapsulated in an IOMaster object, found at sts/util/io_master.py. The IOMaster creates IOWorker objects to wrap each socket; these maintain read/write buffers to enable ‘fire-and-forget’ I/O semantics, so that clients do not have to wait for blocking calls to complete.

Message Buffering

STS buffers all messages that are passed throughout the system. There are two important buffer objects to note:

This buffering allows Fuzzer or Interactive to perturb the order or timing of events in the system. Messages are not allowed through until the main control loop explicitly gives permission.
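A minimal sketch of this buffering discipline (class and method names are illustrative, not STS’s actual buffer objects):

```python
from collections import deque

class MessageBuffer:
    """Holds pending messages; nothing is delivered until the control
    loop explicitly permits it, which is what lets Fuzzer or Interactive
    reorder or delay messages."""
    def __init__(self):
        self.pending = deque()

    def enqueue(self, msg):
        self.pending.append(msg)

    def permit(self, msg):
        """Deliver one specific pending message, in any order."""
        self.pending.remove(msg)
        return msg

    def permit_next(self):
        """Deliver the oldest pending message."""
        return self.pending.popleft()
```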

Invariant Checking

STS primarily uses headerspace analysis (hassel) to check network invariants. All hassel code can be found under sts/hassel.

We use two parts of hassel:

We convert our OpenFlow routing tables to headerspace transfer functions in sts/hassel/config_parser/openflow_parser.py.

We generate a topology transfer function in sts/hassel/topology_loader/topology_loader.py.
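As a toy illustration of the headerspace idea (not hassel’s actual representation), headers can be modeled as strings over {0, 1, x}, where x is a wildcard bit; a header lies in a rule’s headerspace when every non-wildcard bit agrees:

```python
def hs_match(pattern, header):
    """True if `header` is in the headerspace described by `pattern`.
    Both are strings over '0', '1', 'x'; 'x' matches anything."""
    return len(pattern) == len(header) and all(
        p == 'x' or h == 'x' or p == h for p, h in zip(pattern, header))

def hs_intersect(a, b):
    """Bitwise intersection of two wildcard patterns, or None if empty."""
    out = []
    for pa, pb in zip(a, b):
        if pa == 'x':
            out.append(pb)
        elif pb == 'x' or pa == pb:
            out.append(pa)
        else:
            return None  # contradictory concrete bits: empty intersection
    return ''.join(out)
```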

Defining New Invariants

We use a shim layer to make all invocations into hassel: sts/invariant_checker.py.

This defines static methods for common invariants.

To add a new invariant, add a static method there.

Defining Custom Invariants

If you just want to compose invariants, or perform some other computation on top of an existing invariant, define a new method in config/invariant_checks.py. This is where all invariant checks must be explicitly named for event serialization purposes.
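A hypothetical composed check might look like the following (the check names and calling convention are invented for illustration; real checks live in config/invariant_checks.py):

```python
def check_everything(simulation_state, checks):
    """Compose several invariant checks: run each named check and
    collect all reported violations into a single list."""
    violations = []
    for name, check in checks:
        for violation in check(simulation_state):
            violations.append((name, violation))
    return violations
```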

Determinism

We have implemented several optional features to achieve better determinism during replay.

Multiplexed Sockets

The operating system exhibits non-determinism in the order in which it schedules socket operations. That is, if you make the same sequence of socket syscalls, the O/S may actually perform them in a different order under the hood. We cope with this by multiplexing all socket connections onto a single socket.

Multiplexed sockets require a module written by us to be running within the controller software.

See sts/util/socket_mux for more information.
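The multiplexing idea can be sketched as framing each logical connection’s data with a channel id over one shared socket (a simplified JSON-framed illustration, not the actual sts/util/socket_mux wire format):

```python
import json

def mux_frame(channel_id, payload):
    """Encode one logical connection's payload for the shared socket."""
    return (json.dumps({"channel": channel_id, "data": payload}) + "\n").encode()

def demux_frames(raw):
    """Split data read from the shared socket back out per channel."""
    channels = {}
    for line in raw.decode().splitlines():
        frame = json.loads(line)
        channels.setdefault(frame["channel"], []).append(frame["data"])
    return channels
```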

Sync Protocol

The sync protocol (sts/syncproto) is in charge of extracting or feeding information to the controller software. It can:

The sync protocol requires a module written by us to be running within the controller software.

Dependency on POX

STS depends on POX for library functionality (that is, we do not use POX for its controller functionality). Here is the specific library functionality we make use of:

Development Workflow

Console Output

All output to the console is serialized (and optionally colored with bash codes). Console output is also tee’d to a separate file (console.out) in the experiment results directory. See:

Testing

All unit and integration tests are under the tests/ subdirectory. We use nose to run tests:

$ nosetests

This will find and run all files with ‘_test’ in the name.

Tools

There are many useful tools in the tools/ subdirectory:

----- config file format -----
Config files are python modules that may define the following variables:
  fields           => an array of field names to print; uses default_fields if undefined.
  filtered_classes => a set of classes to ignore, from sts.replay_event
  ...
See example_pretty_print_config.py for an example.

Questions?

Send questions or feedback to: sts-dev@googlegroups.com