STS Software Architecture

This document provides an overview of the software architecture of STS, as well as a development workflow for those contributing to STS itself.

For a detailed walkthrough of STS’s use cases, see this page.

For searchable code documentation, see this page.

Software Architecture

Simulation State

All the important state of the simulation can be accessed through a single object, instantiated from sts/simulation_state.py. This file stores the configuration parameters specified by the user, handles instantiation of the simulation object, and allows the control flow to access relevant state.

Topologies

STS currently supports two default network topologies: full meshes, and fat trees. For more information on their configuration parameters, see the documentation.

You might also consider using STS’s topology creation GUI to create your own custom topology:

$ ./simulator.py -c config/gui.py

Traffic Generation

For an overview of how to generate dataplane traffic in STS, see this page.

Control Flow

STS has six modes of operation. Each mode is implemented in its own module under sts/control_flow/.

Interactive

This mode provides a command line interface for users to interactively step through the execution of the network. Type help for more information on the command line interface.

All events observed in interactive mode are recorded for later replay.

Fuzzer

This mode programmatically generates random inputs. The core of Fuzzer is a simple loop:

while True:
  check dataplane messages
  check controlplane messages
  inject inputs
  sleep

We refer to each iteration of this loop as a ‘logical round’.

By default Fuzzer generates its inputs based on the probabilities defined in config/fuzzer_params.py. That is, in a given round, the probability that an event will be triggered is defined by the parameter specified in that file.
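The following is a minimal sketch of how per-round probabilities can drive input generation. It is not the actual Fuzzer code: the event names, the probability values, and the helper functions are all hypothetical, and the real parameter names are whatever config/fuzzer_params.py defines.

```python
import random

# Hypothetical per-round probabilities. The real parameter names live in
# config/fuzzer_params.py and may differ.
EVENT_PROBABILITIES = {
    "switch_failure": 0.02,
    "link_failure": 0.05,
    "traffic_injection": 0.30,
}

def choose_events(probabilities, rng):
    """Return the event types triggered in one logical round."""
    return [name for name, p in sorted(probabilities.items())
            if rng.random() < p]

def run_rounds(num_rounds, probabilities, seed=0):
    """Run a fixed number of logical rounds and record what was injected."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    trace = []
    for round_number in range(num_rounds):
        for event in choose_events(probabilities, rng):
            trace.append((round_number, event))
    return trace
```

Seeding the random number generator is a design choice of this sketch: it makes two runs with the same seed produce the same sequence of inputs.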

Fuzzer allows you to check invariants of your choice at specified intervals.

See the documentation on Fuzzer.__init__ for more information about parameters.

Fuzzer will drop into interactive mode if the user sends a ^C signal.

Replayer

Given an event trace generated by Interactive or Fuzzer, Replayer does its best to inject the inputs in the trace in a way that reproduces the same result. It does this by listening for the internal events in the trace and replaying each input once it sees that the input’s causal dependencies have been met.

event_scheduler.py determines how long the simulator waits for each internal event before timing out.

MCSFinder

Given an event trace, MCSFinder executes delta debugging to find the minimal causal sequence. For each subsequence chosen by delta debugging, it instantiates a new Replayer object to replay the execution, and checks at the end whether the bug appears. To avoid garbage collection overhead, MCSFinder runs each replay in a separate process, and returns the results via XMLRPC. See sts/util/rpc_forker.py for the mechanics of forking.
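To illustrate the idea, here is a simplified sketch of the textbook delta debugging loop (ddmin). It is not MCSFinder’s implementation: the function name is hypothetical, the triggers_bug predicate stands in for a full Replayer run followed by an invariant check, and everything runs in a single process.

```python
def ddmin(events, triggers_bug):
    """Return a sublist of events that still triggers the bug and from
    which no single event can be removed without losing the bug."""
    assert triggers_bug(events), "the full trace must reproduce the bug"
    granularity = 2
    while len(events) >= 2:
        chunk = max(1, len(events) // granularity)
        subsets = [events[i:i + chunk] for i in range(0, len(events), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [e for j, s in enumerate(subsets) if j != i for e in s]
            if triggers_bug(subset):
                # The bug survives in one chunk: keep only that chunk.
                events, granularity, reduced = subset, 2, True
                break
            if triggers_bug(complement):
                # The bug survives without this chunk: drop the chunk.
                events, granularity, reduced = complement, max(granularity - 1, 2), True
                break
        if not reduced:
            if granularity >= len(events):
                break  # already tried removing every single event
            granularity = min(len(events), granularity * 2)
    return events
```

For example, with a predicate that requires both a link failure and one particular flow_mod, `ddmin` prunes the unrelated traffic and host migration events and returns only the two events that matter.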

The runtime statistics of MCSFinder are stored in a dictionary and logged to a json file.

InteractiveReplayer

Given an event trace (possibly minimized by MCSFinder), InteractiveReplayer allows you to interactively step through the trace (a la OFRewind) in order to understand the conditions that triggered a bug. This is helpful for:

visualizing the network topology
tracing the series of link/switch failures/recoveries
tracing the series of host migrations
tracing the series of flow_mods
tracing the series of traffic injections
perturbing the original event sequence by adding / removing inputs interactively

OpenFlowReplayer

Delta debugging does not fully minimize traces (often for good reason, e.g. delicate timings). In particular, we have observed that minimized traces often contain many OpenFlow messages that time out or are overwritten, i.e. are not directly relevant for triggering an invalid network configuration.

OpenFlowReplayer replays the OpenFlow messages from an event trace, enabling:

automatic filtering of flow_mods that timed out or were overwritten by later flow_mods
automatic filtering of flow_mods that are irrelevant to a set of flow entries specified by the user
interactive bisection of the OpenFlow trace to infer which messages were and were not relevant for triggering the bug (especially useful for tricky cases requiring human involvement). (TBD)

The tool can then spit back out a new event trace without the irrelevant OpenFlow messages, to be replayed again by Replayer or InteractiveReplayer.
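The first kind of filtering can be sketched as follows. This is an illustration under simplifying assumptions, not the tool’s actual logic: flow_mods are represented as plain dictionaries with hypothetical keys, a later flow_mod is assumed to overwrite an earlier one only when the switch and match are identical, and only hard timeouts are considered (a hard timeout of 0 means the entry never expires, as in OpenFlow 1.0).

```python
def is_expired(mod, now):
    """True if the entry installed by this flow_mod has timed out by `now`."""
    return mod["hard_timeout"] != 0 and mod["time"] + mod["hard_timeout"] <= now

def relevant_flow_mods(flow_mods, now):
    """Drop flow_mods that were overwritten or had timed out by `now`.

    `flow_mods` must be in chronological order.
    """
    latest = {}
    for mod in flow_mods:
        # A later flow_mod for the same switch and match replaces the earlier one.
        latest[(mod["switch"], mod["match"])] = mod
    return [mod for mod in flow_mods
            if latest[(mod["switch"], mod["match"])] is mod
            and not is_expired(mod, now)]
```

Note that an entry which overwrote an earlier entry and then expired removes both from the result, since neither is installed at time `now`.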

Experiment Results

Experiment results are automatically placed in their own subdirectory under experiments/. There you can find console output, serialized event traces, and config files for replay and MCS finding.

By default, the name of the results directory is inferred from the name of the config file. You can specify a custom name with the -n parameter to simulator.py, or append a timestamp to each directory name with the -t parameter.

Event Traces

The event types logged by Interactive and Fuzzer are defined in sts/replay_event.py.

Events are eventually serialized to JSON. The format of the JSON files is documented here.

During replay, events are stored in an EventDag object, which is essentially a linked list of events. Each input event object knows how to inject itself, and each internal event object knows how to wait for the appropriate internal event. Replay proceeds simply by invoking each event’s proceed() method.
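The structure described above can be sketched roughly as follows. The class names, the dictionary used as a stand-in for the simulation, and the polling limit are all hypothetical; only the idea of a linked list whose events each expose proceed() comes from the description above.

```python
class Event:
    """Base class: every event knows how to advance the replay."""
    def __init__(self, label):
        self.label = label
        self.next = None  # next event in the linked list

    def proceed(self, simulation):
        raise NotImplementedError

class InputEvent(Event):
    def proceed(self, simulation):
        # An input event injects itself into the simulation.
        simulation["injected"].append(self.label)
        return True

class InternalEvent(Event):
    def proceed(self, simulation):
        # An internal event is done once it has been observed.
        return self.label in simulation["observed"]

class EventList:
    """A linked list of events, iterated in order."""
    def __init__(self, events):
        self.head = events[0] if events else None
        for earlier, later in zip(events, events[1:]):
            earlier.next = later

    def __iter__(self):
        event = self.head
        while event is not None:
            yield event
            event = event.next

def replay(event_list, simulation, max_polls=3):
    """Call proceed() on each event; return labels of events that timed out."""
    timed_out = []
    for event in event_list:
        for _ in range(max_polls):
            if event.proceed(simulation):
                break
        else:
            timed_out.append(event.label)
    return timed_out
```

In this sketch a timeout is a fixed number of polls; a real replay would wait on wall-clock or logical time instead.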

Ensuring Validity of Traces

The subsequences chosen by delta debugging may not always be sensible. For example, it does not make sense to replay a recovery event if the preceding failure event has been pruned.

To cope with the possibility of invalid subsequences, we define ‘Atomic Input’ pairs that must be removed together by delta debugging. For example, we ensure that failure/recovery pairs are treated atomically, and we ensure that chains of host migration events for a given host are always consecutive in terms of location (i.e. we ensure that hosts don’t magically teleport to new locations) despite the possibility of delta debugging pruning intermediate host migration events.
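As a rough illustration of the failure/recovery case, the sketch below bundles each failure with its matching recovery so that a pruning algorithm can only keep or drop them together. The tuple representation of events and both function names are hypothetical, and host migration chains are not handled here.

```python
def group_atomic_inputs(events):
    """Group event indices into units that must be pruned together.

    Each event is a (kind, element) tuple. A recovery joins the unit of
    the most recent unmatched failure of the same element.
    """
    units = []
    open_failures = {}  # element -> unit holding its unmatched failure
    for index, (kind, element) in enumerate(events):
        if kind == "recovery" and element in open_failures:
            open_failures.pop(element).append(index)
        else:
            unit = [index]
            units.append(unit)
            if kind == "failure":
                open_failures[element] = unit
    return units

def expand(units, events):
    """Turn a chosen subset of units back into an ordered event list."""
    return [events[i] for i in sorted(i for unit in units for i in unit)]
```

Delta debugging would then operate on the list of units rather than on individual events, and call `expand` on each candidate subset before replaying it.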

Concurrency Model

STS is single threaded. All sockets are set to non-blocking, and all I/O operations or blocking calls such as sleep() are routed through a central select loop.

The select loop is encapsulated in an IOMaster object, found at sts/util/io_master.py. The IOMaster creates IOWorker objects to wrap each socket. Each IOWorker maintains read and write buffers to enable ‘fire-and-forget’ I/O semantics, so that clients do not have to wait around for blocking calls to complete.
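The pattern can be sketched with the standard library alone. The Master and Worker classes below are hypothetical simplifications written for this document, not the IOMaster and IOWorker classes themselves; they only show how buffered, non-blocking sockets can be serviced by one select call.

```python
import select
import socket

class Worker:
    """Wraps one non-blocking socket with send and receive buffers."""
    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.send_buf = b""
        self.recv_buf = b""

    def send(self, data):
        # Fire-and-forget: queue the bytes and return immediately.
        self.send_buf += data

class Master:
    """Owns the single select loop that services every worker."""
    def __init__(self):
        self.workers = []

    def create_worker(self, sock):
        worker = Worker(sock)
        self.workers.append(worker)
        return worker

    def select_once(self, timeout=0.1):
        by_sock = {w.sock: w for w in self.workers}
        # Only ask about writability for sockets with queued data.
        writers = [w.sock for w in self.workers if w.send_buf]
        readable, writable, _ = select.select(list(by_sock), writers, [], timeout)
        for sock in writable:
            worker = by_sock[sock]
            sent = sock.send(worker.send_buf)
            worker.send_buf = worker.send_buf[sent:]
        for sock in readable:
            by_sock[sock].recv_buf += sock.recv(4096)
```

A caller queues data with `worker.send(...)` and keeps going; the bytes are written on a later call to `select_once`, and arrive in the peer’s `recv_buf` on a subsequent one.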

Message Buffering

STS buffers all messages that are passed throughout the system. There are two important buffer objects to note:

OpenflowBuffer (found in sts/openflow_buffer.py): buffers OpenFlow messages between switches and controllers (both incoming and outgoing messages)
BufferedPatchPanel (found in sts/topology.py): buffers dataplane messages sent between switches

This buffering allows Fuzzer or Interactive to perturb the order or timing of events in the system. Messages are not allowed through until the main control loop explicitly gives permission.
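A minimal sketch of this hold-until-permitted behaviour is shown below. The class and method names are hypothetical and are not taken from OpenflowBuffer or BufferedPatchPanel.

```python
from collections import deque

class PendingBuffer:
    """Holds messages until the control loop permits each one."""
    def __init__(self, deliver):
        self._deliver = deliver  # callback that forwards a permitted message
        self._pending = deque()

    def receive(self, message):
        # Nothing is forwarded on arrival; the message just waits.
        self._pending.append(message)

    def pending(self):
        return list(self._pending)

    def permit(self, message):
        # The caller picks which message goes next, so it controls ordering.
        self._pending.remove(message)
        self._deliver(message)
```

Because the caller chooses which pending message to permit, it can deliver messages in a different order from the one in which they arrived, or delay them across several rounds.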

Invariant Checking

STS primarily uses headerspace analysis (hassel) to check network invariants. All hassel code can be found under sts/hassel.

We use two parts of hassel:

The python version can be found under sts/hassel/hsa-python. This version is potentially much slower for large networks, but is much easier to modify.
The optimized C version can be found under sts/hassel/hassel-c. This must be explicitly compiled with:
$ (cd sts/hassel/hassel-c; make)

We convert our OpenFlow routing tables to headerspace transfer functions in sts/hassel/config_parser/openflow_parser.py.

We generate a topology transfer function in sts/hassel/topology_loader/topology_loader.py.

Defining New Invariants

We use a shim layer to make all invocations into hassel: sts/invariant_checker.py.

This defines static methods for common invariants.

To add a new invariant, add a static method there.

Defining Custom Invariants

If you just want to compose invariants, or perform some other computation on top of an existing invariant, define a new method in config/invariant_checks.py. This is where all invariant checks must be explicitly named for event serialization purposes.
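As an illustration, a composed check might look roughly like the sketch below. Everything here is an assumption made for the example: the two underlying checks are stand-ins rather than real InvariantChecker methods, the convention that a check returns a list of violations is assumed, and the name of the lookup table is hypothetical.

```python
# Hypothetical stand-ins for existing invariant checks. Each takes the
# simulation and returns a list of violations (empty means no violation).
def check_loops(simulation):
    return ["loop: %s" % path for path in simulation.get("loops", [])]

def check_blackholes(simulation):
    return ["blackhole at %s" % sw for sw in simulation.get("blackholes", [])]

def check_loops_and_blackholes(simulation):
    """Composed check: report the violations of both underlying checks."""
    return check_loops(simulation) + check_blackholes(simulation)

# Explicit name -> function table, so that a serialized event can refer to
# a check by name and replay can look the function up again.
NAMED_CHECKS = {
    "check_loops": check_loops,
    "check_blackholes": check_blackholes,
    "check_loops_and_blackholes": check_loops_and_blackholes,
}
```

Naming each check explicitly is what allows an event trace to record which check was run: functions themselves cannot be written to JSON, but their names can.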

Determinism

We have implemented several optional features to achieve better determinism during replay.

Multiplexed Sockets

The operating system displays non-determinism in the order it schedules socket operations. That is, if you make the same sequence of socket syscalls, the OS may actually perform them in a different order under the hood. We cope with this by multiplexing all socket connections onto a single socket.

Multiplexed sockets require a module written by us to be running within the controller software.

See sts/util/socket_mux for more information.

Sync Protocol

The sync protocol (sts/syncproto) is in charge of extracting or feeding information to the controller software. It can:

Override gettimeofday in the controller, and instead have STS send fake values
Inform STS whenever an internal state change has occurred. This is currently implemented by monkeypatching the controller’s logging library, and routing logging messages through STS. Optionally, the controller can be blocked at internal state transitions until STS gives explicit acknowledgment.
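The logging-based notification can be illustrated with a standard logging handler. This is only a sketch of the general idea: it attaches a handler instead of monkeypatching, it does not block the controller, and the class name and the notify callback are hypothetical.

```python
import logging

class StateChangeHandler(logging.Handler):
    """Forwards every log record to a callback instead of printing it."""
    def __init__(self, notify):
        super().__init__()
        self._notify = notify  # stand-in for sending a message to STS

    def emit(self, record):
        self._notify(record.name, record.getMessage())

# Usage: route one logger's messages through the handler.
seen = []
logger = logging.getLogger("controller.sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the records away from the root logger
logger.addHandler(StateChangeHandler(lambda name, msg: seen.append((name, msg))))
logger.info("link %s down", "s1-s2")
```

After the call to `logger.info`, `seen` holds one entry containing the logger name and the formatted message, which is the information a listener would need to recognize an internal state change.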

The sync protocol requires a module written by us to be running within the controller software.

Dependency on POX

STS depends on POX for library functionality (that is, we do not use POX for its controller functionality). Here is the specific library functionality we make use of:

Our software switches are instances of NXSoftwareSwitches from pox/lib/openflow/nx_software_switch.py (also see software_switch.py)
STSIOWorker subclasses the IOWorker from pox/lib/ioworker/io_worker.py
We use the revent library for event handling
POX’s dataplane packet classes are used to encapsulate packets
libopenflow_01 is used to parse and encapsulate OpenFlow messages
We use pox.util.connect_socket_with_backoff to create and connect non-blocking sockets

Development Workflow

Console Output

All output to the console is serialized (and optionally colored with bash codes). Console output is also Tee’ed to a separate file in the experiments results directory (console.out). See:

sts/util/console.py for bash code coloring and console Tee’ing
sts/util/procutils.py for how the controller’s console output is treated

Testing

All unit and integration tests are under the tests/ subdirectory. We use nose to run tests:

$ nosetests

This will find and run all files with ‘_test’ in the name.

Tools

There are many useful tools in the tools/ subdirectory:

check_compile.sh: Checks all python files for syntax or import errors.
clean.sh: Removes all extraneous files, e.g. .pyc files.
pretty_print_input_trace.py: Prints an events.trace file in human readable format. This script’s output is highly configurable; simply pass the path to a config file with -c. The config files have the following format:

----- config file format: ----
config files are python modules that may define the following variables:
fields  => an array of field names to print. uses default_fields if undefined.
filtered_classes => a set of classes to ignore, from sts.replay_event
...
see example_pretty_print_config.py for an example.

tabulate_events.py: Groups classes of events (e.g. LinkFailures/Recoveries) together and prints them to the console in human readable format.
trace_traffic_injection.py: Given the path to an event trace file and an event id (e.g. “e120”), this tool traces the path a packet takes through the network. In particular, it prints all dataplane permits and drops, as well as the ofp_packet_in and ofp_packet_out messages associated with the packet injected by the given traffic injection event.
reindent_pox.sh: Canonicalize the whitespace formatting of all python files.
run.sh: Run the simulator iteratively until an invariant violation occurs. Pass the simulator command line as arguments, e.g. $ run.sh ./simulator.py ...
visualization/visualize1D.html: A webpage for visualizing event traces. Especially useful for debugging non-deterministic replays by comparing the timings of different replay runs.

A common workflow:
- Run ./simulator.py -c experiments/experiment_name/mcs_config.py
- Discover that the final MCS does not trigger the bug.
- Open visualize1D.html in a web browser.
- Load either the original (fuzzed) trace, experiments/experiment_name/events.trace, or the first replay of this trace, experiments/experiment_name_mcs/interreplay_0_reproducibility/events.trace, as the first timeline. I have found that it is often better to load the first replay rather than the original fuzzed trace, since this has timing information that matches the other replays much more closely.
- Load the final replay of the MCS trace, experiments/experiment_name_mcs/interreplay_._final_mcs_trace/events.trace, as the second timeline.
- Hover over events to see further information about them, including functional equivalence with events in the other traces.
- Load intermediate replay trace timelines if needed. Intermediate replay traces from delta debugging runs can be found in experiments/experiment_name_mcs/interreplay_*
visualization/visualize2D.html: A webpage for showing a Lamport time diagram of an event trace. Useful for visually spotting the root causes of race conditions and other nasty bugs.

Questions?

Send questions or feedback to: sts-dev@googlegroups.com