Pisces: Real-Time Video Transport Framework

Internet-Draft	Pisces	October 2024
Zheng, et al.	Expires 21 April 2025

Abstract

This document specifies Pisces, an ensemble video transport framework for real-time communication. Pisces combines the complementary benefits of rule-based and learning-based approaches without modifying the codec layer. It uses an incremental and iterative reinforcement learning model to adapt to unseen environments. When the real environment closely matches the training environment, the learning-based approach is active; otherwise, the rule-based approach is used to ensure transport safety. Proactively probing network capacity with both the rule-based approach and the learning model makes Pisces efficient and robust. Pisces can be deployed in WebRTC, where it replaces the default Google Congestion Control (GCC) algorithm.

3. Design Overview

3.1. Adaptive Bitrate Framework Overview

Figure 1 shows the main structure of Pisces, an adaptive bitrate framework. Pisces consists mainly of an ensemble bitrate adaptation module and an incremental and iterative learning module. The bitrate adaptation module draws on both the rule-based and the learning-based approaches, running both control logics and determining the final bitrate from the collected network feedback. The learning module works in an incremental and iterative manner, sustaining high performance by relearning only part of the model.

3.1.1. Bitrate Adaptation Module

The monitor module collects network information over the latest monitoring interval. The raw data is transformed into historical states and rewards, which are fed to the RL agent. The RTP/RTCP statistics, including throughput and delay, are sent directly to the rule-based counterpart. The RL agent then derives the historical state from the monitor module and generates one candidate bitrate, x_rl. The transitions generated by the RL agent are stored in the replay buffer of the learning module. The rule-based counterpart runs its fixed control logic to produce the other candidate bitrate, x_cl, from the RTCP feedback. Both x_rl and x_cl are passed to the utility module and evaluated by their corresponding utility values.

3.1.2. Incremental and Iterative Learning Module

The subspace partitioning module adaptively partitions the original state space into two subspaces, each of which can be partitioned further. When the environment changes, the function refitting module refits only the Q-value functions of the changed subspaces, that is, those belonging to a variation state set, and updates the model so that the RL agent stays synchronized with the learning module. States in the unchanged subspaces are not involved. Partitioning the state space into subspaces lets Pisces use a simple and practical fitting function while preserving generalization ability.
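The partition-and-refit idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the state space is one-dimensional, each subspace is split at its midpoint, and a constant (the mean target) stands in for the fitted Q-value function. The names Subspace, leaf_for, and refit are hypothetical and do not come from this document.

```python
class Subspace:
    """Half-open interval [low, high) of a 1-D state space with its own fit."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.q_value = 0.0     # constant fit standing in for a Q-value function
        self.refit_count = 0   # how many times this subspace was refitted
        self.children = None   # (left, right) once partitioned

    def split(self):
        """Partition this subspace into two halves at its midpoint."""
        mid = (self.low + self.high) / 2
        self.children = (Subspace(self.low, mid), Subspace(mid, self.high))
        return self.children

    def leaf_for(self, state):
        """Return the innermost subspace that contains `state`."""
        if self.children is None:
            return self
        left, right = self.children
        if state < left.high:
            return left.leaf_for(state)
        return right.leaf_for(state)


def refit(root, samples):
    """Refit only the leaves touched by `samples`, a list of (state, target).

    Leaves that receive no sample keep their previous fit untouched.
    Returns the list of leaves that were refitted.
    """
    by_leaf = {}
    for state, target in samples:
        by_leaf.setdefault(root.leaf_for(state), []).append(target)
    for leaf, targets in by_leaf.items():
        leaf.q_value = sum(targets) / len(targets)
        leaf.refit_count += 1
    return list(by_leaf)
```

In this sketch, new samples that fall in a single leaf cause exactly one refit, while the fits of all other subspaces are left as they were.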


      +------------------------------------------------------------------+
      |Incremental and Iterative Learning                                |
      | +---------------+       +--------------+       +---------------+ |
      | |   Function    <-------+   Subspace   <-------+ Replay Buffer | |
      | |   Refitting   |       | Partitioning |       +--------^------+ |
      | +---------------+       +--------------+                |        |
      +------------------------------------------------------------------+
              +-----------------------------------------------+
                  model synchronization                      |
      +------------------------------------------------------------------+
      |Bitrate Adaptation       x_rl     |                               |
      |    +-------------------+   +-----v-------+  state +----------+   |
      |    |sending            <---+  RL Agent   <--------+          <---+
      | <--+rate               |   +-------------+        |          |   |
      |    |      Control      |x_cl                      |          |   |
      |    |      Module       |   +-------------+   rtp/ |          |   |
      |    |                   <---+  Rule Based |   rtcp |  Monitor |   |
      |    |                   |   |  Estimator  <--------+  Module  |   |
      |    +---------^---------+   +-------------+        |          <---+
      |              |                         performance|          |   |
      |    +---------+---------------------------+metrics |          |   |
      |    |           Utility Module            <--------+----------+   |
      |    +-------------------------------------+                       |
      +------------------------------------------------------------------+

Figure 1: Main Structure of Pisces

3.2. State Machine Overview

The control module determines the switching logic among the stages and produces the sending rate for each stage. As Figure 2 shows, the overall control logic consists of four stages: queue draining, exploration, evaluation, and exploitation.

At the start of each stage, Pisces estimates the degree of congestion from the currently measured delay and the latest minimum delay at the sender side. Once the estimated delay exceeds a threshold 𝛿 and the current sending rate is higher than the receiving rate, which indicates that the queue is continually building up, Pisces enters the Drain Queue Stage and decreases the bitrate. This stage lasts one RTT.
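The drain condition can be sketched as a small predicate. This is a sketch under stated assumptions: the congestion estimate is taken to be the queuing delay (current delay minus minimum delay), whereas the pseudocode in Section 4.3.1 compares a ratio of RTTs instead. The function name and argument names are hypothetical.

```python
def should_enter_drain(current_delay, min_delay, send_rate, recv_rate, delta):
    """Return True when the queue appears to be building up.

    The congestion estimate used here is the queuing delay, i.e. the
    current delay minus the latest minimum delay. Treating the estimate
    as a difference is an assumption of this sketch.
    """
    queuing_delay = current_delay - min_delay
    return queuing_delay > delta and send_rate > recv_rate
```

For example, with an 80 ms delay, a 50 ms minimum delay, and a hypothetical threshold of 20 ms, the predicate holds only if the sender is also outpacing the receiver.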

Once the Drain Queue Stage ends, Pisces enters the Exploration Stage. The Exploration Stage takes one RTT, during which both the rule-based counterpart and the RL agent generate candidate bitrates from the network state collected in this stage. If the two candidate bitrates diverge sufficiently, indicating a disagreement, Pisces enters the Evaluation Stage to judge which one is better; otherwise, Pisces remains in the Exploration Stage.

In the Evaluation Stage, one RTT is divided into two evaluation intervals (EIs), one for each candidate bitrate. To minimize interference during the evaluation, Pisces tries the smaller candidate bitrate in the first EI and the larger one in the second EI. This ordering limits the side effects of queue accumulation that would arise if the larger bitrate were tried first. When the evaluation ends, Pisces enters the Exploitation Stage.

The Exploitation Stage temporarily uses the bitrate determined in the last control cycle as the sending rate. The main purpose of this stage is to quantify the performance of the two candidate bitrates tried in the Evaluation Stage, since their network feedback should have reached the sender by this point. The collected feedback is fed into the utility function to calculate the corresponding utility values. The bitrate with the higher utility value is preferred and is set as the final sending rate. Finally, Pisces enters the next control cycle, starting again with the Exploration Stage. Note that the previous action a_{t-1}, the last state s_{t-1}, the reward rew, and the current state s_t together form a sample, which is stored in the replay buffer used for incremental and iterative learning. The detailed behavior of each stage is described below.
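The sample stored in the replay buffer can be pictured as a four-field record. The following is a minimal sketch, assuming a fixed-capacity buffer that evicts the oldest sample first and draws uniformly at random; neither policy is specified in this document, and the names Transition and ReplayBuffer are hypothetical.

```python
from collections import deque, namedtuple
import random

# One sample as described above: last state, previous action, reward,
# and current state.
Transition = namedtuple(
    "Transition", ["prev_state", "prev_action", "reward", "state"]
)


class ReplayBuffer:
    """Fixed-capacity FIFO store; the oldest samples are evicted first."""

    def __init__(self, capacity):
        self.samples = deque(maxlen=capacity)

    def push(self, prev_state, prev_action, reward, state):
        self.samples.append(Transition(prev_state, prev_action, reward, state))

    def sample(self, batch_size):
        # Uniform sampling without replacement. How the learning module
        # draws samples is not stated, so this choice is an assumption.
        pool = list(self.samples)
        return random.sample(pool, min(batch_size, len(pool)))

    def __len__(self):
        return len(self.samples)
```

A bounded buffer keeps memory use constant on long-lived connections, at the cost of forgetting old network conditions.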

4. Detailed Algorithm

4.1. State Machine

Pisces implements a state machine that tracks the performance of both the rule-based CC agent and the learning-based CC agent. Utility values are then calculated, and decisions are made based on the current state.

4.1.1. State Transition Diagram

The state transition diagram in Figure 2 shows the switching logic among the stages:


        +-----------+   +-----------+   +-----------+    +------------+
        |Queue Drain+-> |Exploration+-> | Evaluation+--> |Exploitation|
        +----+------+   +-----------+   +-----------+    +-----+------+
        ^                                                 |
        |                                                 |
        +-------------------------------------------------+

Figure 2: State Machine of Pisces

4.1.2. State Machine Operation Overview

When starting up, Pisces tries to ramp up its sending rate quickly. To get the best of rule-based CC and learning-based CC, Pisces must continuously monitor the divergence between the bitrates proposed by the two algorithms and then evaluate which one better suits the current link capacity. If the rule-based agent performs better, Pisces must use its bitrate to guide later transmission; otherwise, the bitrate from the learning-based agent is picked to improve transmission quality. Pisces runs these measurements periodically to ensure the best utilization of the rule-based and learning-based agents. The frequency and duration of the measurements depend on the current link status. This state machine has several goals:

Achieve better utilization by learning from link states
Avoid the safety problems of learning-based methods by running the rule-based model in parallel
Learn continuously by sampling link states and actions to keep the model effective

4.1.3. State Machine Tactics

In Pisces, at any given time the sender can choose one of the following tactics:

Using the rule-based model: When the rule-based method achieves a higher utility value during the evaluation phase, use this method to obtain improved performance.
Using the learning-based model: When the learning-based method achieves a higher utility value during the evaluation phase, use this method to obtain improved performance.
Staying with the previously selected bitrate: If there are no new comparative results between the rule-based and learning-based methods, continue using the previous bitrate.
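The three tactics can be summarized as one selection function. This is an illustrative sketch rather than the normative procedure: the function name, the use of None to signal that no fresh comparison exists, and the tie-break in favor of the rule-based candidate are all assumptions.

```python
def select_bitrate(previous_bitrate, x_cl, x_rl, utility_cl=None, utility_rl=None):
    """Pick the sending rate according to the three tactics.

    utility_cl and utility_rl are None when no evaluation has produced
    a fresh comparison in this control cycle.
    Returns a (bitrate, tactic_name) pair.
    """
    if utility_cl is None or utility_rl is None:
        return previous_bitrate, "previous"
    if utility_rl > utility_cl:
        return x_rl, "learning-based"
    # Ties fall back to the rule-based candidate (an assumption of this sketch).
    return x_cl, "rule-based"
```

Falling back to the rule-based candidate on a tie follows the document's stated preference for the rule-based approach as the safe default.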

4.2. Algorithm Organization

The Pisces algorithm is driven by both events and time. State transitions happen upon transport connection initialization, when a processInterval event is triggered, and when a feedback message is received. All of the sub-steps invoked are described below.

4.2.1. Initialization

Upon transport connection initialization, Pisces executes its initialization steps:

class Pisces:
  def __init__(self):
    self.state = PiscesState.START_UP
    self.feedback_processor = FeedbackProcessor()
    self.rl_agent = RLAgent()
    self.rule_agent = RuleAgent()
    self.last_bitrate = DEFAULT_START_BITRATE
    self.last_rtt = DEFAULT_START_RTT
    self.last_loss = 0
    self.last_receiving_rate = DEFAULT_START_BITRATE
    self.last_checkpoint = timeNow()
    self.smoothed_rtt = DEFAULT_START_RTT
    # min_rtt and max_bw are read by later steps; the initial values
    # chosen here are assumptions.
    self.min_rtt = DEFAULT_START_RTT
    self.max_bw = DEFAULT_START_BITRATE
    # Candidate bitrates tried in the two evaluation intervals.
    self.first_bitrate = DEFAULT_START_BITRATE
    self.second_bitrate = DEFAULT_START_BITRATE
    self.first_agent_utility = 0
    self.second_agent_utility = 0
    self.last_utility = 0
    self.feedbacks_received = []

The most important parts of Pisces are rl_agent and rule_agent, which make decisions based on learning models and predefined rules, respectively. FeedbackProcessor derives link state information from TWCC feedback.
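The pseudocode in this document refers to PiscesState, timeNow(), and several constants without defining them. A minimal sketch of those definitions follows. The state and constant names are taken from the pseudocode; every numeric value is an illustrative assumption rather than a recommended setting, except PROBE_BITRATE_GAIN, which reflects the doubling described for the Startup stage.

```python
from enum import Enum, auto
import time


class PiscesState(Enum):
    """Stages referenced by the pseudocode in this document."""
    START_UP = auto()
    DRAIN = auto()
    EXPLORATION = auto()
    EVALUATION_FIRST = auto()
    EVALUATION_SECOND = auto()
    EXPLOITATION_FIRST = auto()
    EXPLOITATION_SECOND = auto()


# Placeholder values chosen for illustration only.
DEFAULT_START_BITRATE = 300_000    # bits per second (assumed)
DEFAULT_START_RTT = 0.1            # seconds (assumed)
PROBE_BITRATE_GAIN = 2.0           # Startup doubles the bitrate each RTT
DRAIN_BITRATE_GAIN = 0.5           # assumed reduction factor in Drain
DRAIN_ENTER_THRESHOLD_RTT = 1.25   # assumed RTT inflation that triggers Drain
BITRATE_VARIANCE_THRESHOLD = 0.1   # assumed relative gap that triggers Evaluation


def timeNow():
    """Monotonic clock in seconds, so intervals ignore wall-clock jumps."""
    return time.monotonic()
```

Splitting Evaluation and Exploitation into FIRST and SECOND states mirrors the two half-RTT intervals used by the pseudocode.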

4.2.2. Per-ProcessInterval Steps

When a processInterval event is triggered, Pisces takes further action based on its current state.

class Pisces:
  def onProcessInterval(self):
    # ...

4.2.3. Per-Feedback Steps

When a feedback message is received, Pisces does not process it immediately. Instead, it stores the message in a buffer and processes the buffered messages together later, to match the RTT-based control cycle.

class Pisces:
    def onFeedbackMsg(self, msg):
        self.feedbacks_received.append(msg)
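The buffering behavior can be sketched as a small helper that accumulates messages and hands them over in one batch. Whether processed messages are discarded afterwards is not stated in this document; this sketch assumes they are, so that no message is processed twice. The class and method names are hypothetical.

```python
class FeedbackBuffer:
    """Accumulates feedback messages between two control-cycle checkpoints."""

    def __init__(self):
        self._pending = []

    def on_feedback_msg(self, msg):
        # Store only; no per-message processing happens here.
        self._pending.append(msg)

    def drain(self):
        """Return all buffered messages in arrival order and reset the buffer."""
        batch, self._pending = self._pending, []
        return batch
```

A caller would invoke drain() once per control interval and pass the returned batch to the feedback processor.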

4.3. State Machine Operation

4.3.1. Startup Stage

When a Pisces flow starts up, it performs an exponential bitrate ramp-up to search for the potential link capacity, which later guides the learning-based model. This is done by doubling the bitrate each RTT until an observable rise in delay occurs. Pisces always starts in the Startup stage.

When initializing a connection, Pisces sets its state to START_UP. As Pisces explores the available bandwidth, it updates the last and maximum receiving rates achieved as feedback arrives. During this stage, Pisces keeps track of the receiving rate, the current RTT, and the minimum RTT observed. When the sending rate exceeds the link capacity, a queue forms at the bottleneck link and a rise in RTT is observed. On this RTT signal, Pisces has probed the current link capacity and exits the START_UP stage. To empty the buffer filled during START_UP, Pisces enters the DRAIN stage, which prevents further congestion events.

class Pisces:
  # ...
  def onProcessInterval(self):
    if self.state == PiscesState.START_UP:
      if timeNow() < self.last_checkpoint + self.smoothed_rtt:
        return self.last_bitrate
      self.processFeedback()
      if self.last_rtt > self.min_rtt * DRAIN_ENTER_THRESHOLD_RTT:
        return self.enterDrain()
      self.last_checkpoint = timeNow()
      self.last_bitrate *= PROBE_BITRATE_GAIN
      return self.last_bitrate
    # ...
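The ramp-up can be traced with a toy simulation. The link model is an assumption made purely for illustration: a single FIFO bottleneck in which any rate above the capacity adds queuing delay equal to the excess bits sent in one RTT divided by the capacity. The function name and the default threshold are hypothetical.

```python
def simulate_startup(capacity, base_rtt, start_bitrate, gain=2.0, threshold=1.25):
    """Multiply the bitrate by `gain` each RTT until the RTT inflates.

    Toy link model (assumption): sending above `capacity` for one RTT
    queues (bitrate - capacity) * base_rtt bits, which adds
    queued_bits / capacity seconds of delay.
    Returns the bitrates tried, ending with the one that triggered the exit.
    """
    if gain <= 1.0:
        raise ValueError("gain must exceed 1 for the ramp-up to terminate")
    bitrate = start_bitrate
    queue_delay = 0.0
    tried = []
    while True:
        tried.append(bitrate)
        excess_bits = max(0.0, bitrate - capacity) * base_rtt
        queue_delay += excess_bits / capacity
        rtt = base_rtt + queue_delay
        # Same exit test as the pseudocode: RTT above min RTT times a threshold.
        if rtt > base_rtt * threshold:
            return tried
        bitrate *= gain
```

With a 2 Mbit/s link, a 100 ms base RTT, and a 300 kbit/s start, this model tries 300k, 600k, 1.2M, 2.4M, and 4.8M bit/s before exiting. The 2.4M step overshoots only slightly, so its RTT inflation stays below the assumed 1.25 threshold.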

4.3.2. Drain Stage

Upon exiting Startup, or when a significant RTT increase occurs in the Exploration stage, Pisces enters its Drain state. In Drain, Pisces aims to quickly reduce the queue built up during transmission by reducing the current sending rate on each bitrate update event. The reduced bitrate is then synchronized to both agents.

class Pisces:
  # ...
  def enterDrain(self):
    self.state = PiscesState.DRAIN
    self.last_checkpoint = timeNow()
    self.last_bitrate *= DRAIN_BITRATE_GAIN
    self.rule_agent.setBitrate(self.last_bitrate)
    self.rl_agent.setBitrate(self.last_bitrate)
    return self.last_bitrate

The Drain stage lasts one RTT, after which Pisces exits Drain and enters Exploration.

class Pisces:
  # ...
  def onProcessInterval(self):
    # ...
    elif self.state == PiscesState.DRAIN:
      if timeNow() < self.last_checkpoint + self.smoothed_rtt:
        return self.last_bitrate
      self.processFeedback()
      return self.enterExploration()
    # ...

4.3.3. Exploration Stage

Pisces aims to select the best-performing algorithm for the current network state. To achieve this, Pisces runs the rule-based congestion control agent and its learning-based counterpart simultaneously, then collects feedback to determine which one performs better. This stage takes one RTT to finish. During this stage, Pisces uses the bitrate chosen in the last control loop to reset both agents. The bitrate given by the rule-based agent is then picked as the final bitrate. The learning-based agent still monitors network feedback but provides its bitrate only as a reference. The Exploration stage ends once the gap between the bitrates given by the rule-based agent and the learning-based agent exceeds a threshold.

class Pisces:
  # ...
  def enterExploration(self):
    self.state = PiscesState.EXPLORATION
    self.last_checkpoint = timeNow()
    self.rule_agent.setBitrate(self.last_bitrate)
    self.rl_agent.setBitrate(self.last_bitrate)
    return self.last_bitrate

class Pisces:
  # ...
  def onProcessInterval(self):
    # ...
    elif self.state == PiscesState.EXPLORATION:
      if timeNow() < self.last_checkpoint + self.smoothed_rtt:
        return self.last_bitrate
      self.last_checkpoint = timeNow()
      self.processFeedback()
      if (
        abs(self.rule_agent.getBitrate() - self.rl_agent.getBitrate())
        < BITRATE_VARIANCE_THRESHOLD * self.last_bitrate
      ):
        self.last_bitrate = self.rule_agent.getBitrate()
        return self.last_bitrate
      return self.enterEvaluationFirst()
    # ...

4.3.4. Evaluation Stage

To evaluate the performance of the rule-based and learning-based algorithms, Pisces takes one RTT to try both candidate bitrates and then picks the better-performing one. The Evaluation stage is divided into two parts, one for each bitrate. Note that Pisces always starts the evaluation with the smaller bitrate to limit potential harm from congestion.

class Pisces:
  # ...
  def enterEvaluationFirst(self):
    self.last_checkpoint = timeNow()
    self.state = PiscesState.EVALUATION_FIRST
    x_cl = self.rule_agent.getBitrate()
    x_rl = self.rl_agent.getBitrate()
    self.first_bitrate = min(x_cl, x_rl)
    self.second_bitrate = max(x_cl, x_rl)
    return self.first_bitrate

  def enterEvaluationSecond(self):
    self.last_checkpoint = timeNow()
    self.state = PiscesState.EVALUATION_SECOND
    return self.second_bitrate

class Pisces:
  # ...
  def onProcessInterval(self):
    # ...
    elif self.state == PiscesState.EVALUATION_FIRST:
      if timeNow() < self.last_checkpoint + self.smoothed_rtt / 2:
        return self.first_bitrate
      self.processFeedback()
      return self.enterEvaluationSecond()
    elif self.state == PiscesState.EVALUATION_SECOND:
      if timeNow() < self.last_checkpoint + self.smoothed_rtt / 2:
        return self.second_bitrate
      self.processFeedback()
      return self.enterExploitationFirst()
    # ...

4.3.5. Exploitation Stage

The feedback resulting from each bitrate choice is expected to return one RTT later, during the Exploitation stage. Because Pisces tried the two bitrates in the first and second halves of the Evaluation RTT, their feedback returns in the first and second halves of the Exploitation stage, respectively. Using the utility function and several network-layer metrics collected from the feedback, Pisces measures the performance of the rule-based agent and the learning-based agent and picks the one that behaves better. After selecting the bitrate, Pisces enters the Exploration stage again.

class Pisces:
  # ...
  def enterExploitationFirst(self):
    self.last_checkpoint = timeNow()
    self.state = PiscesState.EXPLOITATION_FIRST
    return self.last_bitrate
  def enterExploitationSecond(self):
    self.last_checkpoint = timeNow()
    self.state = PiscesState.EXPLOITATION_SECOND
    return self.last_bitrate

class Pisces:
  # ...
  def onProcessInterval(self):
    # ...
    elif self.state == PiscesState.EXPLOITATION_FIRST:
      if timeNow() < self.last_checkpoint + self.smoothed_rtt / 2:
        return self.last_bitrate
      self.processFeedback()
      self.first_agent_utility = self.getUtility()
      return self.enterExploitationSecond()
    elif self.state == PiscesState.EXPLOITATION_SECOND:
      if timeNow() < self.last_checkpoint + self.smoothed_rtt / 2:
        return self.last_bitrate
      self.processFeedback()
      self.second_agent_utility = self.getUtility()
      if (
        max(self.first_agent_utility, self.second_agent_utility)
        > self.last_utility
      ):
        if self.first_agent_utility > self.second_agent_utility:
          self.last_bitrate = self.first_bitrate
        else:
          self.last_bitrate = self.second_bitrate
      return self.enterExploration()
    # ...

4.4. Feedback Process

Pisces uses transport-wide congestion control feedback to derive link metrics. The packet formats are specified in [draft-holmer-rmcat-transport-wide-cc-extensions-01]. The RTT, receiving rate, and loss calculations are based on Google Congestion Control (GCC), described in Section 5 (Delay-based control) of [draft-ietf-rmcat-gcc-02].

4.5. Utility Calculation

Pisces uses a utility function to evaluate the performance of a bitrate selection. The utility function bounds the bandwidth of a video flow to the range [MIN_BAND, MAX_BAND], which meets the requirements of most industry scenarios. It also speeds up the training procedure. The utility function considers the impact of relative and absolute throughput, packet loss rate, delay, and delay jitter.

class Pisces:
  # ...
  def getUtility(self):
    # Ratio of the minimum RTT to twice the latest RTT.
    delay_metric = self.min_rtt / (self.last_rtt * 2)
    return (
      self.last_receiving_rate - 10 * self.last_loss * self.last_receiving_rate
    ) / self.max_bw * delay_metric - 2 * delay_metric
4.6. Updating Control Parameters

Most of the algorithms used to process network signals, including the RTT, loss rate, and receiving rate estimates, come from the delay-based estimator of GCC XX. They are implemented in FeedbackProcessor. Feedback provided by the WebRTC Transport-Wide CC extension is stored in Pisces.feedbacks_received so that network signals can be processed on an RTT basis.

class Pisces:
  # ...
  def processFeedback(self):
    self.feedback_processor.process(self.feedbacks_received)
  # ...

After FeedbackProcessor has finished processing the network signals, the callbacks of Pisces, the rule-based agent, and the learning-based agent are called. These metrics are used later in utility calculation and bitrate selection.

class Pisces:
  # ...
  def onFeedback(self, stats):
    self.last_rtt = stats.rtt
    self.last_loss = stats.loss
    self.last_receiving_rate = stats.receiving_rate
    self.smoothed_rtt = WEIGHT_RTT * self.smoothed_rtt + stats.rtt * (1 - WEIGHT_RTT)
    self.min_rtt = WEIGHT_MINRTT * self.min_rtt + stats.min_rtt * (1 - WEIGHT_MINRTT)
    self.max_bw = WEIGHT_MAXBW * self.max_bw + stats.max_bw * (1-WEIGHT_MAXBW)
    # ...

Pisces uses an EWMA (exponentially weighted moving average) filter when calculating smoothed_rtt, min_rtt, and max_bw. The max_bw and min_rtt metrics are special cases: they need to remain stable while still keeping up with the link state. The EWMA hyperparameters can be tuned to trade off sensitivity against robustness.
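The trade-off between sensitivity and robustness can be seen by running the same samples through two weights. This is an illustrative sketch: the helper names, the sample RTTs, and the two weights are assumptions, while the update rule matches the one used in onFeedback.

```python
def ewma(previous, sample, weight):
    """Blend a new sample into the previous estimate.

    `weight` is the share kept from the previous estimate, matching the
    role of WEIGHT_RTT, WEIGHT_MINRTT, and WEIGHT_MAXBW in the pseudocode.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be within [0, 1]")
    return weight * previous + (1 - weight) * sample


def smooth(samples, weight, initial):
    """Run the filter over a sequence and return every intermediate estimate."""
    estimates = []
    current = initial
    for sample in samples:
        current = ewma(current, sample, weight)
        estimates.append(current)
    return estimates


# One RTT spike (in ms) seen through a stable and a reactive filter.
rtt_ms = [100, 300, 100, 100]
stable = smooth(rtt_ms, weight=0.75, initial=100)
reactive = smooth(rtt_ms, weight=0.25, initial=100)
```

The 300 ms spike moves the estimate to 150 ms with a weight of 0.75 but to 250 ms with a weight of 0.25. Two samples later the estimates are 128.125 ms and 109.375 ms, so the larger weight damps the spike but also recovers more slowly.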

4.6.1. Updating Rule-based Agent

The rule-based agent is a modified version of GCC. The feedback processing part of GCC is extracted into the Feedback Processor so that it can provide network signals to the learning-based agent as well. The delay-based bandwidth estimator and the loss-based estimator are kept in the rule-based agent, and the decision procedure remains the same.

class Pisces:
  # ...
  def onFeedback(self, stats):
    # ...
    self.rule_agent.onStats(stats)

4.6.2. Updating Learning-based Agent

Upon selection, the learning-based agent in Pisces uses a reinforcement learning model to produce a bitrate candidate. The model is built on PPO (Proximal Policy Optimization) and takes as features the loss rate, trendline, bitrate metric, delay metric, and network state.

Among these features, the loss rate, trendline, and network state are obtained directly from the stats provided by FeedbackProcessor. The bitrate metric and delay metric require further calculation.

class Pisces:
  # ...
  def onFeedback(self, stats):
    # ...
    features = [
      self.last_loss,
      stats.trendline,
      self.last_bitrate / self.max_bw,  # bitrate metric
      self.min_rtt / self.last_rtt,     # delay metric
      stats.network_state,
    ]
    self.rl_agent.onFeature(features)

class RLAgent:
  # ...
  def onFeature(self, features):
    self.history.append(features)
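Section 3.1.1 says the RL agent derives a historical state, but this document does not say how the feature history is turned into a model input. One possible arrangement is sketched below, assuming a fixed-length window whose feature vectors are concatenated oldest first and zero-padded until the window fills. The class name, the window length, and the padding rule are all assumptions.

```python
from collections import deque


class FeatureHistory:
    """Keeps the most recent feature vectors and flattens them into one state."""

    def __init__(self, window, num_features):
        self.window = window
        self.num_features = num_features
        self.history = deque(maxlen=window)  # oldest entries drop out first

    def on_feature(self, features):
        if len(features) != self.num_features:
            raise ValueError("unexpected number of features")
        self.history.append(list(features))

    def state(self):
        """Return the window as a flat list, oldest first, zero-padded in front."""
        missing = self.window - len(self.history)
        flat = [0.0] * (missing * self.num_features)
        for features in self.history:
            flat.extend(features)
        return flat
```

A fixed window keeps the input size constant, which most policy networks require.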
