Prior to entering the high-performance, user-level networking space in April 2017, I wasn't even aware that the field of high-performance interconnects existed. I barely knew the sockets interface. I had heard about the concepts of DMA and RDMA but knew nothing about RDMA programming (while RDMA is technically a subset of high-performance, user-level networking, the terms are commonly used interchangeably). At SC'16, I had heard about InfiniBand, but only at Mellanox Technologies' booths, so my brain subconsciously associated the two. To the RDMA novices: I was wrong; to the RDMA experts: you can imagine the turmoil in my head when the first RDMA interface I was learning to use was the abstract `libfabric`.
While I could digress further from the topic of this post and write about my then-confused state of mind, I will instead leave a list of links for those just getting started with the high-performance, user-level networking space:
- If you don't know much about `sockets`, I suggest you quickly read up on the API and write a simple send/receive program with it to understand the implications of maintaining flow control of data transmission/reception in your code.
- OFI guide: read all sections before `OFI Architecture`; they motivate the existence of this field.
- InfiniBand: this Red Hat guide defines what InfiniBand is.
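For the `sockets` exercise suggested above, here is a minimal sketch of what I mean by handling flow control in your own code. It is written in Python, whose `socket` module is a thin wrapper over the same sockets API, and it uses `socket.socketpair()` so that both ends live in one process. The helper names (`send_all`, `recv_exact`, `demo`) and the 4-byte length prefix are my own choices for illustration, not part of any standard.

```python
import socket


def send_all(sock, payload):
    """Call send() repeatedly until the kernel has accepted every byte."""
    view = memoryview(payload)
    sent = 0
    while sent < len(payload):
        # send() may accept fewer bytes than offered; it returns the count.
        sent += sock.send(view[sent:])
    return sent


def recv_exact(sock, nbytes):
    """Call recv() repeatedly until exactly nbytes have arrived."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than requested
        if not chunk:
            raise ConnectionError("peer closed before sending everything")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo(message=b"hello, sockets"):
    """Send one length-prefixed message over a connected socket pair."""
    a, b = socket.socketpair()
    try:
        # A byte stream has no message boundaries, so prefix the length
        # (4 bytes, big-endian; an assumption made for this sketch).
        send_all(a, len(message).to_bytes(4, "big") + message)
        length = int.from_bytes(recv_exact(b, 4), "big")
        return recv_exact(b, length)
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

Because sender and receiver share a single thread here, keep the message small enough to fit in the socket buffer; a real program would run the two ends in separate processes or on separate machines.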
From what I have read so far, InfiniBand is the first official interface (building off of the Virtual Interface Architecture) for high-performance, user-level networking. It is high-performance because, unlike with TCP/IP, the kernel is not involved (hence, user-level) in operations that transmit or receive data. The kernel is involved only in creating the resources used to issue data transmission/reception. Additionally, unlike TCP/IP, the InfiniBand interface permits RDMA operations (remote reads, writes, atomics, etc.). The InfiniBand (IB) specification has both hardware and software components.
`libibverbs` is the software component (Verbs API) of the IB interface. As `sockets` is to TCP/IP, `libibverbs` is to IB. Your best bet to learn how to code with `libibverbs` is the amazing RDMAmojo blog written by Dotan Barak, the creator of the man pages for `libibverbs`. You could even rely solely on his blog to learn about the InfiniBand concepts. He writes about the API in excruciating detail with very helpful FAQs. Here's his big-picture tutorial-style presentation. Other critical software components of the IB interface are the user-space libraries and kernel-space modules that implement the API and enable IB resource creation. This is precisely what the OpenFabrics Enterprise Distribution (OFED) is. OFED's user-space libraries are in the rdma-core repository and the kernel components are in the drivers/infiniband subsystem of the Linux kernel tree.
The hardware component of IB is where different vendors come into play. The IB interface is abstract; hence, multiple vendors can have different implementations of the IB specification. Mellanox Technologies has been an active, prominent InfiniBand hardware vendor. In addition to meeting the IB hardware specifications in the NIC design, the vendors have to support the `libibverbs` API by providing a user-space driver and a kernel-space driver that actually do the work (of setting up resources on the NIC) when a `libibverbs` function such as `ibv_open_device` is called. These vendor-specific libraries and kernel modules are a standard part of OFED. The vendor-specific user-space libraries are called providers in rdma-core. These providers span both IB and other technologies, such as RoCE and iWARP, that implement RDMA over Ethernet adapters (I'll delve into the convergence between IB and Ethernet in another post). Mellanox OFED (MOFED) is Mellanox's implementation of the OFED libraries and kernel modules. MOFED contains certain optimizations that are targeted towards Mellanox hardware (the mlx4 and mlx5 providers) but haven't been incorporated into OFED yet.
`libfabric` is another, fairly recent API that intends to provide a level of abstraction higher than that of `libibverbs`. Alongside InfiniBand, several other user-level networking interfaces exist. Typically they are proprietary and vendor-specific: Cray has the uGNI interface, Intel Omni-Path has PSM2, Cisco has usNIC, and so on. The underlying concepts (message queues, completion queues, registered memory, etc.) are similar across the different interfaces, with certain differences in capabilities and semantics. The OpenFabrics Interfaces (OFI) intends to unify all of the available interfaces by providing an abstract API: `libfabric`. Each vendor will then support the OFI through its `libfabric-provider` that will call corresponding functions in its own interface. This way, a user-level networking application written using the `libfabric` API is portable across different vendors. Based on the hardware that the application will be running on, the right `libfabric-provider` can then be selected.
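To make the provider idea concrete, here is a toy sketch of the pattern: the application is written once against an abstract interface, and a provider chosen at run time translates each call into a vendor's own interface. Everything in it is hypothetical (the class names, the `toy-verbs` and `toy-psm2` labels, `select_provider`, and the string that stands in for a real transfer); it only models the dispatch structure and is not the `libfabric` API.

```python
from abc import ABC, abstractmethod


class ToyProvider(ABC):
    """Hypothetical stand-in for a libfabric provider (not the real API)."""

    name = "base"

    @abstractmethod
    def send(self, dest, payload):
        """Translate the portable call into the vendor's own interface."""


class ToyVerbsProvider(ToyProvider):
    name = "toy-verbs"

    def send(self, dest, payload):
        # A real provider would call into the vendor interface here;
        # this toy only reports what it was asked to do.
        return f"{self.name}: sent {len(payload)} bytes to {dest}"


class ToyPsm2Provider(ToyProvider):
    name = "toy-psm2"

    def send(self, dest, payload):
        return f"{self.name}: sent {len(payload)} bytes to {dest}"


# Registry of the providers known to this toy "library".
PROVIDERS = {cls.name: cls for cls in (ToyVerbsProvider, ToyPsm2Provider)}


def select_provider(name):
    """Pick the provider that matches the hardware the job runs on."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"no provider named {name!r}") from None


def application(provider):
    """Application code: written once against the abstract interface."""
    return provider.send("node42", b"ping")


if __name__ == "__main__":
    for name in PROVIDERS:
        print(application(select_provider(name)))
```

The point of the sketch is that `application` never changes; only the argument to `select_provider` does when the underlying hardware changes.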
Hope that helps you with an introduction to high-performance, user-level networking. If not, hope it gives you enough search keywords to use in your favorite search engine. All the best!
If you are interested in learning how transmitting a message works under the hood, check out my newer blog post: How are messages transmitted on InfiniBand?