Occamy: A High-Performance Computing System for Efficient Dense and Sparse Computation

Friday 07 March 2025

The quest for efficient computing has led researchers to develop systems that handle both dense and sparse memory access patterns, which are central to machine learning (ML) and high-performance computing (HPC) applications. A recent result in this field is Occamy, a 432-core dual-chiplet RISC-V system designed to accelerate both dense and sparse computations.

Occamy’s architecture is built around two compute chiplets, each containing 216 cores, for a system-wide peak of 768 double-precision GFLOP/s (billions of floating-point operations per second). Within each chiplet, the cores are arranged in a hierarchical structure, enabling efficient data communication between them. The chiplets are paired with two HBM2E memory stacks, and a latency-tolerant hierarchical interconnect ties the cores together.

One of Occamy’s most notable features is its ability to handle both dense and sparse computations efficiently. This is achieved through specialized in-core streaming units (SUs). The SUs accelerate dense linear algebra operations, such as matrix multiplication, while also supporting the irregular memory accesses of sparse computations and the structured access patterns of stencil codes.
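
To make the distinction between dense and sparse access concrete, the following Python sketch contrasts the two patterns in software. It only illustrates the access patterns, not Occamy's hardware or programming interface; the function names `dense_dot` and `sparse_dense_dot` and the values-plus-indices storage format are assumptions made for this example.

```python
def dense_dot(a, b):
    """Dense pattern: both operands are read at consecutive positions."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def sparse_dense_dot(values, indices, dense):
    """Sparse pattern: only nonzeros are stored, and each stored index
    selects which dense element to read (an indirect access)."""
    total = 0.0
    for v, idx in zip(values, indices):
        total += v * dense[idx]
    return total


# The sparse vector [0, 3, 0, 0, 2] stored as nonzero values plus indices
# (hypothetical example data).
values, indices = [3.0, 2.0], [1, 4]
dense = [1.0, 2.0, 3.0, 4.0, 5.0]

# Both formulations compute the same dot product.
assert sparse_dense_dot(values, indices, dense) == dense_dot(
    [0.0, 3.0, 0.0, 0.0, 2.0], dense
)
```

In the dense loop every address is known in advance, while in the sparse loop each read depends on an index that must itself be loaded first. That data-dependent addressing is what makes sparse workloads harder to keep a floating-point unit busy on.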

To demonstrate Occamy’s capabilities, researchers ran a series of benchmarks across various algorithms and workloads. On dense linear algebra tasks, the system achieved 89% FPU (floating-point unit) utilization, a result competitive with existing solutions. On stencil codes, Occamy reached an FPU utilization of 83% and a technology-node-normalized compute density of 11.1 DP-GFLOP/s/mm², leading state-of-the-art processors by up to 1.7 times.
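
As a rough worked example, the utilization figures can be turned into sustained throughput by scaling the 768 DP-GFLOP/s peak quoted in the title of the referenced paper. This is a back-of-the-envelope sketch, not a reported result: it assumes the utilization percentages are relative to the full-system peak, which the article does not state, and the helper name `achieved_gflops` is hypothetical.

```python
# System peak in double-precision GFLOP/s, taken from the referenced paper's title.
PEAK_DP_GFLOPS = 768.0


def achieved_gflops(fpu_utilization, peak=PEAK_DP_GFLOPS):
    """Estimate sustained throughput as utilization times peak.

    Assumes utilization is measured against the full-system peak.
    """
    return fpu_utilization * peak


dense_la = achieved_gflops(0.89)  # dense linear algebra
stencil = achieved_gflops(0.83)   # stencil codes

print(f"dense linear algebra: about {dense_la:.0f} DP-GFLOP/s")
print(f"stencil codes:        about {stencil:.0f} DP-GFLOP/s")
```

Under that assumption, the two workloads would sustain roughly 684 and 637 DP-GFLOP/s respectively. If the benchmarks were run on a subset of the system, the absolute numbers would scale down accordingly.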

Occamy’s performance was also evaluated on workloads from ML and HPC. On sparse-sparse linear algebra, the system achieved a throughput of up to 187 GCOMP/s at an energy efficiency of 17.4 GCOMP/s/W. This level of performance matters for applications that mix dense and sparse computations, such as graph neural networks.
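
The throughput and efficiency figures together imply an approximate power draw, as the short calculation below shows. It is a sketch under two assumptions that the article does not confirm: that the 17.4 figure is an energy efficiency in GCOMP/s per watt, and that both numbers were measured at the same operating point. The variable names are illustrative only.

```python
# Figures quoted in the article.
throughput_gcomp_per_s = 187.0       # peak throughput, GCOMP/s
efficiency_gcomp_per_s_per_w = 17.4  # energy efficiency, GCOMP/s per watt

# Power = throughput / efficiency, valid only if both figures refer to the
# same operating point (an assumption made here).
implied_power_w = throughput_gcomp_per_s / efficiency_gcomp_per_s_per_w

print(f"implied power: about {implied_power_w:.1f} W")
```

This gives a little under 11 W. Because "up to" figures are often reported at different voltage and frequency settings, the value should be read as an order-of-magnitude estimate rather than a measured power.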

The development of Occamy marks a significant step forward in the pursuit of efficient computing solutions. By combining advanced architecture design with specialized hardware accelerators, researchers have created a system capable of tackling complex computational tasks while minimizing energy consumption.

Cite this article: “Occamy: A High-Performance Computing System for Efficient Dense and Sparse Computation”, The Science Archive, 2025.

Machine Learning, High-Performance Computing, Occamy, RISC-V, Sparse Computations, Dense Computations, Linear Algebra, Stencil Codes, Floating-Point Operations, Hierarchical Interconnect.

Reference: Paul Scheffler, Thomas Benz, Viviane Potocnik, Tim Fischer, Luca Colagrande, Nils Wistoff, Yichao Zhang, Luca Bertaccini, Gianmarco Ottavi, Manuel Eggimann, et al., “Occamy: A 432-Core Dual-Chiplet Dual-HBM2E 768-DP-GFLOP/s RISC-V System for 8-to-64-bit Dense and Sparse Computing in 12nm FinFET” (2025).
