Keeping things BORING for Application Developers
One of the underlying mantras driving the CCIX Software Work Group was coined by Red Hat’s Jon Masters – “keep it boring.” But when has “boring” ever been good? Well, that depends on your definition of boring, how “boring” is implemented and just why you’re trying to keep it boring in the first place.
An essential part of establishing the CCIX Standard was to make hardware responsible for the important task of maintaining cache coherency. This allows Application Developers to focus on what they do best – creating the applications their customers need, while delivering them to market more quickly and with lower development costs. That is something we can all get excited about.
A CCIX accelerator refers to a processing unit, other than the application processors in the host, that provides accelerated computation for a specific use case. These accelerators can be GPUs*, FPGAs, Smart NICs, or other SoCs. CCIX also enables IO masters with compute capabilities to be considered as accelerators. Because the CCIX architecture provides hardware-designed accelerators with the ability to maintain cache coherency, the CCIX protocol does not require software support for cache management. Hardware coherency enables software to be simpler and portable, thus reducing development effort and cost. Or, as Jon stated, making software development “boring”.
CCIX also provides another key feature that greatly reduces the software development burden. But before elaborating on this feature, it is worth considering why standardization matters. Standardization is the cornerstone of successful software development – indeed, the software community thrives on it. Standardization is a very broad term, but in the context of software, it simply means that the key components of a software stack all use the same set of interfaces to exchange information and data with each other. An important aspiration for software enablement is to allow existing software stacks to “just work” with CCIX, requiring minimal to no porting.
From a software perspective, CCIX makes an important distinction between the hardware accelerator (e.g. FPGA, GPU) and accelerator functions. Defining these different components of a hardware accelerator allows CCIX to do the heavy lifting, and creates a standard representation of the accelerator that is exposed to software via well-accepted industry-standard interfaces and traditional software programming methodology. In other words, CCIX makes things even more “boring” for software developers!
Here is a brief overview of how the CCIX Acceleration Function (AF) works, and the various components that comprise the CCIX AF.
The CCIX AF comprises the logic components of the accelerator that provide the necessary computation. The software programming interface provided by the CCIX AF is called the Acceleration Function Core (AFC). An AF may comprise one or more AFCs. Through an AFC, software can communicate with the compute engine(s) within the AF, configure the AF, query the AF for status and capabilities, and assign the AF to software-based “consumers” such as a Virtual Machine.
The AFC contains all of the required components for supervisory software to manage the AF operation. In a virtualized environment, an AFC can be assigned by a hypervisor to a Virtual Machine, while an OS driver may manage the AFC in a bare-metal system (see figure below).
Finally, each AFC supports one or more contexts, each called an Acceleration Function Thread (AFT): a physical unit that software can use to take advantage of the acceleration capability of the AF. AFTs are mutually independent hardware threads; each executes internal acceleration logic commands issued by its user and operates on data supplied by that user.
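The AF, AFC and AFT relationship described above can be pictured with a small Python model. This is only an illustrative sketch, not the CCIX programming interface: every class, method and field name here (for example build_af, assign, attach_user) is hypothetical, and the "acceleration" is a placeholder that simply reports what it was asked to do. It assumes an AFC must be assigned to an owner (a Virtual Machine or a bare-metal OS driver) before its AFTs can be handed to users.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AccelerationFunctionThread:
    """AFT: an independent hardware context serving one user (hypothetical model)."""
    thread_id: int
    user: Optional[str] = None

    def run(self, command: str, data: bytes) -> str:
        # Placeholder for real acceleration logic: only describes the request.
        if self.user is None:
            raise RuntimeError("AFT has no user attached")
        return f"AFT{self.thread_id} ran {command!r} on {len(data)} bytes for {self.user}"


@dataclass
class AccelerationFunctionCore:
    """AFC: the software-facing interface that supervisory software manages."""
    core_id: int
    threads: List[AccelerationFunctionThread] = field(default_factory=list)
    owner: Optional[str] = None  # e.g. a VM name or a bare-metal OS driver

    def assign(self, owner: str) -> None:
        # A hypervisor or OS would perform this step in a real system.
        if self.owner is not None:
            raise RuntimeError(f"AFC{self.core_id} is already assigned to {self.owner}")
        self.owner = owner

    def attach_user(self, user: str) -> AccelerationFunctionThread:
        # Hand the first free AFT to the user; AFTs are independent of each other.
        if self.owner is None:
            raise RuntimeError(f"AFC{self.core_id} has not been assigned yet")
        for thread in self.threads:
            if thread.user is None:
                thread.user = user
                return thread
        raise RuntimeError(f"AFC{self.core_id} has no free AFT")


@dataclass
class AccelerationFunction:
    """AF: the accelerator logic, exposed to software through one or more AFCs."""
    name: str
    cores: List[AccelerationFunctionCore] = field(default_factory=list)


def build_af(name: str, num_cores: int, threads_per_core: int) -> AccelerationFunction:
    """Build an AF with num_cores AFCs, each holding threads_per_core AFTs."""
    cores = [
        AccelerationFunctionCore(
            core_id=c,
            threads=[AccelerationFunctionThread(thread_id=t) for t in range(threads_per_core)],
        )
        for c in range(num_cores)
    ]
    return AccelerationFunction(name=name, cores=cores)


if __name__ == "__main__":
    af = build_af("compression", num_cores=2, threads_per_core=2)
    af.cores[0].assign("vm-a")             # virtualized case: AFC given to a VM
    af.cores[1].assign("host-os-driver")   # bare-metal case: AFC managed by an OS driver
    aft = af.cores[0].attach_user("app-1")
    print(aft.run("deflate", b"hello"))
    # prints: AFT0 ran 'deflate' on 5 bytes for app-1
```

The point of the model is the layering: software never reaches the accelerator logic directly, but goes through an AFC for management and through an AFT for the actual work.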
Cache coherency is a bedrock function of every HPC application. Through the use of Acceleration Function components and the CCIX Architecture, CCIX enables software developers to dedicate their development time to their product rather than to working out how to manage coherency. Furthermore, the AF interface allows standardization to be put into practice. Last but not least, the AF interface offers flexibility of implementation: it can be implemented over existing interconnect architectures such as PCIe.
As you can see, there is really nothing “boring” about the CCIX architecture and software. But thanks to the mantra of “keep it boring”, the hard work done by and for CCIX Consortium members enables software developers to let CCIX handle all the boring stuff and get on with the things that are truly exciting – creating amazing applications!
To learn more about CCIX Acceleration Functions and how CCIX manages cache, download the free CCIX® Software Developer’s Guide.
*GPU – Graphics Processing Unit
FPGA – Field-Programmable Gate Array
Smart NIC – Smart Network Interface Card
SoC – System on Chip