Network Functions (NFs) are pervasive in today's networks. They implement core network functionality: from fundamental features such as bridging and network address translation, to accelerating the network with WAN optimizers and load balancers, to guaranteeing security with firewalls, port-scan detectors, and intrusion detection systems.
The ACES network will incorporate accelerated software NFs across the infrastructure to meet the requirements of its use cases. A crucial challenge is guaranteeing the required performance while keeping the flexibility offered by software.
Context. NFs were originally implemented as fixed-function, closed-source appliances, but there has recently been a transition to implementing them in software on commodity off-the-shelf servers. These software NFs gain flexibility and ease of deployment but face a steeper performance challenge: to process packets at current line-rate speeds (100+ Gbps), they must use multiple CPU cores. The difficulty lies in doing so without breaking the NF's core functionality.
At high line rates (e.g., 100 Gbps), each packet must be processed within a very short time budget, making inter-core coordination complex and costly. Avoiding this synchronization is not only difficult but also error-prone, requiring a deep understanding of the NF, a meticulous implementation, and careful avoidance of common parallelization pitfalls.
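As a minimal sketch of one such pitfall (a toy example, not taken from any real NF): a counter shared across threads must protect its read-modify-write sequence, or increments are silently lost. This is exactly the kind of bug that a sequential mindset misses.

```python
# Toy illustration of a classic parallelization pitfall: a shared
# counter updated from several threads. The increment below is only
# correct because of the lock; dropping the lock can silently lose
# updates, since "counter += 1" is a read-modify-write, not one step.
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:  # protects the read-modify-write sequence
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 40_000  # with the lock, no increments are lost
```

At packet-processing time budgets of tens of nanoseconds, even a correct lock like this one is too expensive on the fast path, which is why avoiding synchronization altogether is so valuable.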
Automatic NF parallelization. In ACES, we advocate a paradigm shift in NF parallelization: the burden of parallelization should not be put on the developer but instead be automatically performed by compilers. This approach empowers developers to reason about their NFs in a sequential mindset while reaping the full benefits of parallelization. In this context, we developed Maestro, a system that facilitates the automatic parallelization of software network functions.
Maestro uses static-analysis tools to analyze the sequential implementation of the NF and automatically generates an enhanced parallel version that carefully configures the NIC to distribute traffic across cores while preserving semantics. When possible, Maestro orchestrates a shared-nothing architecture, with each core operating independently without shared memory coordination, maximizing performance. Otherwise, Maestro choreographs a fine-grained read-write locking mechanism that optimizes operation for typical Internet traffic.
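The shared-nothing idea can be sketched with a toy simulation (illustrative only; Maestro's actual generated code is native packet-processing code, and the dispatch below stands in for the NIC's hardware hash): each simulated core owns a private flow table, so the packet path needs no locks at all.

```python
# Illustrative shared-nothing sketch (not Maestro's generated code):
# each simulated core keeps a private flow table, so no synchronization
# is needed on the packet path.
from collections import Counter

NUM_CORES = 4

def core_for(flow):
    # Stand-in for the NIC's receive-side hash: deterministically pins
    # a flow (reduced here to a src/dst pair) to one core.
    return hash(flow) % NUM_CORES

def process(packets):
    # One private table per core: no shared state, no locks.
    tables = [Counter() for _ in range(NUM_CORES)]
    for flow in packets:
        tables[core_for(flow)][flow] += 1  # touches only this core's table
    return tables

packets = [("10.0.0.1", "10.0.0.2")] * 3 + [("10.0.0.3", "10.0.0.4")] * 2
tables = process(packets)

# Every flow's state lives on exactly one core.
for flow in {f for t in tables for f in t}:
    owners = [i for i, t in enumerate(tables) if flow in t]
    assert len(owners) == 1
```

Because all packets of a flow land on the core that owns that flow's state, per-flow updates never need coordination, which is what makes this configuration the performance-maximizing one.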
To find a shared-nothing solution, Maestro analyzes the NF and infers how it should partition state and packets across cores to avoid synchronization altogether (i.e., a sharding solution). To concretize this sharding solution, Maestro formulates it as an SMT problem and uses a solver (e.g., Z3) to find a NIC configuration that enforces it. Finally, it applies this configuration and automatically generates performance-oriented parallel code that sidesteps the pitfalls of parallel programming.
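One constraint such a sharding solution must often satisfy can be shown with a toy example (my illustration, not Maestro's formulation): an NF that tracks bidirectional connections needs both directions of a flow steered to the same core. Real NICs hash with a keyed function whose key the solver must derive; here symmetry is simply faked by hashing the unordered endpoint pair.

```python
# Toy illustration of a sharding constraint: for an NF tracking
# bidirectional flows, the dispatch hash must send both directions of
# a connection to the same core. Hashing the *unordered* pair of
# endpoints makes the mapping order-independent by construction.
NUM_CORES = 4

def symmetric_core(src, dst):
    # frozenset hashing ignores order, so (src, dst) and (dst, src)
    # always land on the same core.
    return hash(frozenset((src, dst))) % NUM_CORES

fwd = symmetric_core("10.0.0.1:80", "10.0.0.2:1234")
rev = symmetric_core("10.0.0.2:1234", "10.0.0.1:80")
assert fwd == rev  # both directions share a core, so flow state stays local
```

In the real system this property cannot be hand-waved: it must hold for the NIC's actual hardware hash, which is why Maestro encodes the requirement as an SMT problem and lets the solver find a configuration that satisfies it.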
Evaluation. We parallelized 8 common software NFs. The Figure shows how their performance scales with the number of cores. They generally scale linearly until bottlenecked by PCIe when using small packets (an optimal outcome), or by the 100 Gbps line rate with typical Internet traffic. Maestro further outperforms modern hardware-based transactional-memory mechanisms, even for challenging, parallel-unfriendly workloads.
Maestro was presented at NSDI24, and its source code is available on GitHub.