Friday 07 March 2025
The push for efficient computing has led researchers toward architectures that handle both dense (regular) and sparse (irregular) memory-access patterns, a crucial requirement for machine learning (ML) and high-performance computing (HPC) applications. A recent result in this field is Occamy, a 432-core dual-chiplet RISC-V system designed to accelerate both dense and sparse computations.
Occamy’s architecture is built around two compute chiplets, each containing 216 cores, for a combined peak of 768 double-precision GFLOP/s across the system. The cores are arranged in a hierarchical structure, and a latency-tolerant hierarchical interconnect moves data between them and links the two chiplets together.
One of Occamy’s most notable features is that it handles both dense and sparse computations efficiently on the same cores. This is achieved through specialized in-core streaming units (SUs), which accelerate dense linear algebra operations such as matrix multiplication as well as stencil codes and sparse workloads.
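To make the distinction concrete, the following Python sketch contrasts the two access patterns that a streaming unit has to feed to the floating-point unit: a regular, unit-stride (dense) stream and an index-driven (sparse) stream. It only illustrates the memory-access patterns. The function names are hypothetical, and the code does not model Occamy’s actual SU hardware or instruction set.

```python
def dense_dot(a, b):
    """Dense dot product: both operands are read with unit stride."""
    acc = 0.0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc


def sparse_dense_dot(values, indices, dense):
    """Sparse-dense dot product.

    The sparse vector is stored in compressed form as (values, indices).
    Each index selects one element of the dense vector, so the dense
    operand is read indirectly rather than with a fixed stride.
    """
    acc = 0.0
    for v, idx in zip(values, indices):
        acc += v * dense[idx]
    return acc


dense_vec = [1.0, 2.0, 3.0, 4.0, 5.0]

# The sparse vector [0, 10, 0, 0, 20] in compressed form.
sp_values = [10.0, 20.0]
sp_indices = [1, 4]

print(dense_dot(dense_vec, dense_vec))                      # 55.0
print(sparse_dense_dot(sp_values, sp_indices, dense_vec))   # 120.0
```

In the sparse case the address of each dense element depends on data loaded at run time, which is the kind of access a conventional core handles poorly without dedicated support.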
To demonstrate Occamy’s capabilities, the researchers benchmarked the system on a range of workloads. On dense linear algebra, it achieved an FPU (floating-point unit) utilization of 89%, competitive with existing solutions. On stencil codes, Occamy reached an FPU utilization of 83% and a technology-node-normalized compute density of 11.1 DP-GFLOP/s/mm², leading state-of-the-art processors by 1.7× and 1.2×, respectively.
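FPU utilization is commonly defined as achieved floating-point throughput divided by theoretical peak throughput. The sketch below works through that arithmetic under two assumptions that are not stated in the article: that this definition applies, and that the system-wide peak is 768 DP-GFLOP/s. The resulting throughput figures are therefore implied values, not reported measurements.

```python
# Assumed system-wide peak in DP-GFLOP/s (an assumption for illustration).
PEAK_DP_GFLOPS = 768.0


def achieved_gflops(utilization, peak=PEAK_DP_GFLOPS):
    """Throughput implied by a given FPU utilization (0.0 to 1.0)."""
    return utilization * peak


def fpu_utilization(achieved, peak=PEAK_DP_GFLOPS):
    """FPU utilization implied by a given achieved throughput."""
    return achieved / peak


dense_gflops = achieved_gflops(0.89)    # dense linear algebra
stencil_gflops = achieved_gflops(0.83)  # stencil codes

print(round(dense_gflops, 1))    # 683.5
print(round(stencil_gflops, 1))  # 637.4
```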
Occamy was also evaluated on sparse workloads drawn from ML and HPC. On sparse-sparse linear algebra, the system achieved a throughput of up to 187 GCOMP/s at an energy efficiency of 17.4 GCOMP/s/W. This level of performance matters for applications that mix dense and sparse computations, such as graph neural networks.
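The GCOMP/s figure can be made more tangible with a small sketch. It assumes that a "COMP" counts one index comparison, the basic step when matching the nonzero positions of two sparse operands, and it reads the reported efficiency as GCOMP/s per watt. Both are assumptions, the function name is hypothetical, and the power value at the end is only what the two quoted numbers would imply if they describe the same operating point.

```python
def sparse_sparse_dot(idx_a, val_a, idx_b, val_b):
    """Dot product of two sparse vectors with sorted index lists.

    Returns (result, comparisons), where comparisons counts how many
    index pairs were examined while intersecting the two index lists.
    """
    i = j = 0
    acc = 0.0
    comparisons = 0
    while i < len(idx_a) and j < len(idx_b):
        comparisons += 1
        if idx_a[i] == idx_b[j]:
            # Matching nonzero positions: multiply and accumulate.
            acc += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return acc, comparisons


result, comps = sparse_sparse_dot(
    [0, 3, 7], [1.0, 2.0, 3.0],
    [3, 5, 7], [4.0, 5.0, 6.0],
)
print(result, comps)  # 26.0 4

# Power implied by the quoted figures, assuming the same operating point.
implied_power_w = 187.0 / 17.4
print(round(implied_power_w, 2))  # 10.75
```

Only two of the four comparisons lead to a multiply-accumulate, which is why comparisons rather than floating-point operations are a natural throughput measure for this kind of workload.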
The development of Occamy marks a significant step toward more efficient computing. By combining a latency-tolerant hierarchical interconnect with in-core streaming units, the researchers have created a system that sustains high FPU utilization on both dense and sparse workloads while keeping energy consumption low.
Cite this article: “Occamy: A High-Performance Computing System for Efficient Dense and Sparse Computation”, The Science Archive, 2025.
Machine Learning, High-Performance Computing, Occamy, RISC-V, Sparse Computations, Dense Computations, Linear Algebra, Stencil Codes, Floating-Point Operations, Hierarchical Interconnect.







