QUAC 100 Hardware Architecture
QUAC100-HW-002 Rev 2.0 — February 2026
Complete hardware reference for the Dyber QUAC 100 Quantum-Resistant Universal Accelerator Card. This document covers the AMD Versal HBM architecture, PCIe Gen5 x8x8 interface, 16 parallel Radix-32 NTT engines, HBM2e memory subsystem, QRNG subsystem, power distribution, thermal design, and mechanical specifications.
Product Overview
The QUAC 100 is a PCIe Gen5 x8x8 hardware accelerator that integrates NIST-standardized post-quantum cryptographic acceleration, a hardware security module targeting FIPS 140-3 Level 3, and quantum random number generation in a single device. Built on the AMD Versal HBM platform with 32 GB of HBM2e memory, it completes a full ML-KEM-512 cycle in under 700 ns and sustains up to 1.2 million full KEM cycles per second.
<700ns Full KEM Cycle Latency
>800 Mbps QRNG Conditioned Output
Key Specifications Summary
Parameter Specification
Form Factor PCIe full-height, 3/4 length, dual-slot
Host Interface PCIe Gen5 x8x8 (or Gen4 x16)
FPGA Platform AMD Versal HBM
HBM Memory 32 GB HBM2e DRAM (2× 16GB stacks)
Memory Bandwidth 819 GB/s
NTT Engines 16 parallel Radix-32 units @ 1 GHz
QRNG Dual free-running ring oscillators, >800 Mbps conditioned, NIST SP 800-90B
Power 190W TDP
Cooling Passive (requires ≥200 LFM airflow)
Operating Temperature 0°C to +50°C ambient
Warranty 3 years standard (5 years GOV SKU)
Platform Architecture
The QUAC 100 is built on the AMD Versal HBM adaptive compute acceleration platform, combining high-performance programmable logic with integrated HBM2e memory for maximum cryptographic throughput. The architecture is optimized for lattice-based post-quantum cryptography with dedicated hardware acceleration for NTT operations, polynomial arithmetic, and hash functions.
Subsystem Specification
FPGA Platform AMD Versal HBM Series
Programmable Logic High-density adaptive logic with AI Engine array
NTT Acceleration 16 parallel Radix-32 butterfly engines @ 1 GHz
Hash Acceleration SHA-3/SHAKE hardware cores, 20 Gbps aggregate
HBM Memory 32 GB HBM2e (2× 16GB stacks), 819 GB/s bandwidth
PCIe Interface Gen5 x8x8 (or Gen4 x16) with DMA engines
Security Features Secure boot, key storage enclave, side-channel countermeasures
NTT Engine Array
The Number Theoretic Transform (NTT) engine array is the computational core for ML-KEM (Kyber) and ML-DSA (Dilithium) operations. The array implements 16 parallel Radix-32 butterfly units operating at 1 GHz. Each unit completes a 256-point transform in fewer than 64 cycles, which accelerates polynomial multiplication over the ring $\mathbb{Z}_q[X]/(X^{256}+1)$.
Component Specification
NTT Engines 16 parallel units, Radix-32 butterfly architecture
Operating Frequency 1.0 GHz
Pipeline Depth 8 stages
NTT Transform Size 256-point
Cycles per NTT <64 cycles
Modulus (ML-KEM) q = 3329
Modulus (ML-DSA) q = 8,380,417
Polynomial Ops 256-parallel add/sub/mul + Barrett/Montgomery reduction
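The ring arithmetic that the NTT array accelerates can be illustrated in software. The following is a minimal reference sketch of schoolbook multiplication in Z_q[X]/(X^256 + 1), using the ML-KEM modulus from the table above. It is not the card's algorithm: the hardware uses NTT-based multiplication, while this quadratic-time version only shows the result an NTT multiplier has to reproduce. The function name `negacyclic_mul` is hypothetical.

```python
Q_MLKEM = 3329   # ML-KEM modulus from the specification table
N = 256          # ring is Z_q[X]/(X^256 + 1)

def negacyclic_mul(a, b, q=Q_MLKEM, n=N):
    """Schoolbook product of two polynomials in Z_q[X]/(X^n + 1).

    a and b are lists of n coefficients, lowest degree first.
    """
    if len(a) != n or len(b) != n:
        raise ValueError("both polynomials must have exactly n coefficients")
    c = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] = (c[k] + ai * bj) % q
            else:
                # X^n = -1 in this ring, so wrapped terms change sign
                c[k - n] = (c[k - n] - ai * bj) % q
    return c

# X * X^255 = X^256, which equals -1 (that is, q - 1) in the ring
x1 = [0, 1] + [0] * (N - 2)
x255 = [0] * (N - 1) + [1]
product = negacyclic_mul(x1, x255)
```

A reference model like this is mainly useful as a test oracle when comparing results returned by an accelerated multiplier.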
NTT Hardware Components
Module Function Performance
NTT_FORWARD[0:15] Forward NTT Transform <64 cycles @ 1 GHz
NTT_INVERSE[0:15] Inverse NTT Transform <64 cycles @ 1 GHz
POLY_MUL Coefficient-wise multiplication 1 polynomial/cycle
POLY_ADD_SUB Vector addition/subtraction 1 polynomial/cycle
BARRETT_REDUCE Barrett modular reduction 1-cycle latency
MONTGOMERY_REDUCE Montgomery modular reduction 1-cycle latency
SAMPLER_CBD Centered Binomial Distribution 1 sample/cycle
COMPRESS_ENGINE Compression/Decompression Variable latency
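The table above specifies the reduction modules only by latency. As an illustration of what a Barrett reducer computes, here is a minimal sketch for the ML-KEM modulus. The shift width `K = 24` is an assumption chosen so that any product of two reduced coefficients (below 3329 squared) fits the input range. The hardware's actual parameters are not given in this document.

```python
Q = 3329                 # ML-KEM modulus
K = 24                   # assumed shift width; inputs must be below 2**K
M = (1 << K) // Q        # precomputed floor(2**K / Q)

def barrett_reduce(x, q=Q, k=K, m=M):
    """Return x mod q for 0 <= x < 2**k without a division by q."""
    if not 0 <= x < (1 << k):
        raise ValueError("input outside the supported range")
    t = (x * m) >> k     # estimate of x // q, low by at most 1
    r = x - t * q        # therefore 0 <= r < 2q
    if r >= q:           # a single conditional subtraction finishes the job
        r -= q
    return r

largest_product = (Q - 1) * (Q - 1)
reduced = barrett_reduce(largest_product)
```

In software the conditional subtraction is usually written in a branch-free form to keep the operation constant-time. The branch is kept here for readability.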
Hash Acceleration
Dedicated SHA-3 and SHAKE hardware accelerators provide high-throughput hashing for key derivation, message hashing, and the extensive hash operations required by SLH-DSA (SPHINCS+).
Component Specification
SHA3-256/384/512 Hardware accelerated, up to 20 Gbps (12 Gbps for SHA3-512)
SHAKE128/256 Extendable output, 20 Gbps
Merkle Tree Engine Parallel leaf hash computation for SPHINCS+
HMAC Acceleration Hardware HMAC-SHA3
Security Features
The QUAC 100 implements comprehensive security features including hardware-enforced isolation, side-channel countermeasures, and secure key storage.
Feature Description
Secure Boot Authenticated boot chain with RSA-4096/ECDSA verification
Key Storage Enclave Hardware-isolated key storage with access controls
Side-Channel Countermeasures Constant-time operations, power analysis resistance, EM shielding
Tamper Detection Environmental sensors, mesh detection (GOV SKU)
Zeroization Hardware-triggered emergency key destruction
Multi-Tenant Isolation SR-IOV with hardware-enforced memory separation
Mechanical Specifications
267mm Card Length (3/4-length PCIe)
111mm Card Height (full-height)
~450g Weight (with heatsink)
2-slot Width (dual-width profile)
PCIe Compliance
Requirement Specification QUAC 100 Implementation
Card Type Full-height, 3/4 length, dual-slot Compliant
Slot Width Dual-width (2 slots) Required for heatsink
Edge Connector x16 mechanical 164 pins (82 per side)
Gold Fingers 50μ" minimum Hard gold plating
Insertion Cycles >100 cycles Wear-resistant plating
CEM Compliance PCIe CEM 4.0 Fully compliant
PCB Specifications
Parameter Value Tolerance
PCB Length 267.00 mm ±0.25 mm
PCB Width 111.00 mm ±0.25 mm
PCB Thickness 2.40 mm ±0.20 mm
Layer Count 16 layers —
Copper Weight (outer) 2 oz (70 μm) —
Copper Weight (inner) 1 oz (35 μm) —
Edge Connector Gold 30 μ" minimum —
PCIe Gen5 x8x8 Interface
The QUAC 100 implements a PCIe Gen5 x8x8 (bifurcated) interface providing 64 GB/s bidirectional theoretical bandwidth between the host system and the accelerator. The implementation also supports Gen4 x16 fallback negotiation.
Parameter Specification
Generation PCIe Gen5 x8x8 (or Gen4 x16 fallback)
Lane Width x8x8 (bifurcated) or x16
Per-Lane Rate 32 GT/s (Gen5); 16 GT/s (Gen4 fallback)
Encoding 128b/130b
Bidirectional Bandwidth 64 GB/s theoretical
Reference Clock 100 MHz ±100 ppm (HCSL)
DMA Channels 8 independent (4 Host→Device, 4 Device→Host)
MSI-X Vectors 8
SR-IOV Support Up to 8 virtual functions
Power Management D0, D1, D2, D3hot, D3cold states
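The link bandwidth figures can be sanity-checked with a short calculation. The sketch below assumes the standard PCIe signaling rates of 32 GT/s per lane for Gen5 and 16 GT/s for Gen4, together with the 128b/130b encoding listed in the table. The function name is hypothetical, and packet-level protocol overhead (TLP headers, flow control) is ignored.

```python
def link_gbytes_per_s(gt_per_s, lanes, payload_bits=128, block_bits=130):
    """One-direction bandwidth in GB/s after line encoding."""
    return gt_per_s * lanes * payload_bits / block_bits / 8

# Two bifurcated Gen5 x8 links, summed, in one direction
gen5_x8x8 = 2 * link_gbytes_per_s(32, 8)

# Single Gen4 x16 fallback link, one direction
gen4_x16 = link_gbytes_per_s(16, 16)

# Raw 16-lane Gen5 rate before encoding overhead
gen5_raw = 32 * 16 / 8

print(f"Gen5 x8x8: {gen5_x8x8:.1f} GB/s per direction")
print(f"Gen4 x16:  {gen4_x16:.1f} GB/s per direction")
print(f"Gen5 raw:  {gen5_raw:.1f} GB/s per direction")
```

The raw 16-lane Gen5 rate works out to 64 GB/s in one direction, which appears to be the basis for the headline figure. After encoding the usable rate is slightly lower, and the Gen4 x16 fallback provides about half of it.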
Configuration Space
Register Value Description
Vendor ID 0x1DB7 Dyber, Inc.
Device ID 0x0100 QUAC 100
Class Code 0x100000 Encryption controller
BAR0 Size 16 MB Memory-mapped register space
BAR2 Size 256 MB DMA buffer space
Transaction Latency
Operation Typical Maximum Notes
Configuration Read 100 ns 1 μs
Configuration Write 100 ns 1 μs
Memory Read (32-bit) 200 ns 500 ns BAR0 access
Memory Write (32-bit) 150 ns 400 ns Posted write
DMA Read (4 KB) 2 μs 10 μs Host to device
DMA Write (4 KB) 2 μs 10 μs Device to host
MSI-X Interrupt 300 ns 1 μs Latency to host
HBM2e Memory Subsystem
The QUAC 100 features 32 GB of HBM2e (High Bandwidth Memory) integrated in the AMD Versal HBM package. The two stacks provide 819 GB/s of theoretical bandwidth for cryptographic operations, key storage, and batch processing.
Parameter Specification
Memory Type HBM2e (High Bandwidth Memory 2e)
Total Capacity 32 GB (2× 16GB stacks)
Stack Configuration 2 stacks, 8-Hi each
Theoretical Bandwidth 819 GB/s
Channels 16 independent channels per stack
Channel Width 64-bit per channel
ECC SECDED per channel
Operating Voltage 1.2V
Memory Bandwidth Utilization
Workload Bandwidth Used Efficiency
ML-KEM Batch Processing ~400 GB/s 49%
ML-DSA Signing ~350 GB/s 43%
Key Store Operations ~200 GB/s 24%
Mixed Workload (typical) ~450 GB/s 55%
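The peak bandwidth and the efficiency column can both be reproduced from the stack configuration. The sketch below assumes a per-pin data rate of 3.2 Gbps, a value not stated in this document. It is chosen because it yields the quoted 819 GB/s for 2 stacks of 16 channels at 64 bits each. Function names are hypothetical.

```python
def hbm_peak_gbytes_per_s(stacks, channels_per_stack, channel_bits, pin_gbps):
    """Peak bandwidth in GB/s: total data pins times per-pin rate, in bytes."""
    total_pins = stacks * channels_per_stack * channel_bits
    return total_pins * pin_gbps / 8

def utilization_pct(used_gbytes_per_s, peak_gbytes_per_s):
    """Share of peak bandwidth in use, rounded to a whole percent."""
    return round(100 * used_gbytes_per_s / peak_gbytes_per_s)

PIN_GBPS = 3.2  # assumption: per-pin rate is not given in this document
peak = hbm_peak_gbytes_per_s(stacks=2, channels_per_stack=16,
                             channel_bits=64, pin_gbps=PIN_GBPS)

# Bandwidth-used figures from the utilization table, in GB/s
workloads = {
    "ML-KEM Batch Processing": 400,
    "ML-DSA Signing": 350,
    "Key Store Operations": 200,
    "Mixed Workload (typical)": 450,
}
efficiency = {name: utilization_pct(used, 819) for name, used in workloads.items()}
```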
QRNG Architecture
The quantum random number generation subsystem provides high-quality entropy for all cryptographic operations. It implements dual free-running ring oscillator entropy sources with SHA-3/SHAKE post-processing and continuous NIST SP 800-90B health testing.
Parameter Specification
Entropy Source Dual free-running ring oscillators
Conditioned Output Rate >800 Mbps
Post-Processing SHA-3 / SHAKE-256 conditioning
Health Testing NIST SP 800-90B: repetition count, adaptive proportion, startup tests
Min-Entropy Available via MIN_ENTROPY register (offset 0x4018)
Output Interface Memory-mapped + DMA
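Of the health tests listed above, the repetition count test is the simplest to illustrate. The sketch below follows the test as described in NIST SP 800-90B: it fails when a sample value repeats too many times in a row, with the cutoff derived from the assessed min-entropy per sample. The min-entropy values and the false-positive exponent used here are illustrative assumptions, not the QUAC 100's configured parameters.

```python
import math

def rct_cutoff(min_entropy_per_sample, alpha_exp=20):
    """Cutoff C = 1 + ceil(-log2(alpha) / H), with alpha = 2**-alpha_exp."""
    if min_entropy_per_sample <= 0:
        raise ValueError("min-entropy must be positive")
    return 1 + math.ceil(alpha_exp / min_entropy_per_sample)

def repetition_count_test(samples, cutoff):
    """Return True if no value occurs `cutoff` or more times in a row."""
    it = iter(samples)
    try:
        run_value = next(it)
    except StopIteration:
        return True          # nothing to test
    run_length = 1
    for s in it:
        if s == run_value:
            run_length += 1
            if run_length >= cutoff:
                return False  # source appears stuck
        else:
            run_value = s
            run_length = 1
    return True

cutoff = rct_cutoff(1.0)                           # assumed 1 bit per sample
healthy = repetition_count_test([0, 1] * 100, cutoff)
stuck = repetition_count_test([1] * cutoff, cutoff)
```

A lower min-entropy estimate raises the cutoff, since long runs become more plausible for a weaker source.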
Power Distribution & Management
The QUAC 100 operates at 190W TDP, drawing power from the PCIe slot plus two 8-pin auxiliary power connectors.
Power Budget
Power Rail Source Voltage Typical Power
VCC_12V PCIe Slot + 2× 8-pin Aux 12V ±8% 180W
VCC_3V3_AUX PCIe Slot 3.3V ±9% 10W
Total TDP 190W
Power Dissipation by Subsystem
Subsystem Typical Primary Heat Source
Versal HBM FPGA 130W NTT engines, logic, AI Engines
HBM2e Memory 30W Memory I/O, refresh
Power Conversion 18W Regulator losses
QRNG, Clocks, I/O 12W Miscellaneous
Total 190W
Power vs. Workload
Workload Power (W) Efficiency Notes
Idle 35 — Device enabled, no operations
ML-KEM-512 (100%) 190 6,316 ops/W 1.2M ops/s at 190W
ML-KEM-768 (100%) 110 7,270 ops/W 800K ops/s at 110W
ML-DSA-65 Sign (100%) 105 2,670 ops/W 280K ops/s at 105W
Mixed Workload (typical) 95 — Representative datacenter load
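The efficiency column is throughput divided by power. A short check, using only the figures from the table above (the function name is hypothetical):

```python
def ops_per_watt(ops_per_s, watts):
    """Energy efficiency as operations per second per watt."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return ops_per_s / watts

# (throughput in ops/s, power in W) taken from the table
rows = {
    "ML-KEM-512": (1_200_000, 190),
    "ML-KEM-768": (800_000, 110),
    "ML-DSA-65 Sign": (280_000, 105),
}
efficiency = {name: round(ops_per_watt(ops, w)) for name, (ops, w) in rows.items()}
```

The computed values agree with the table to within its rounding.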
Thermal Design
The QUAC 100 uses passive cooling with a high-performance heatsink, requiring adequate system airflow for proper operation.
Thermal Design Requirements
Requirement Specification Rationale
TDP 190W Maximum sustained power
Operating Ambient 0°C to +50°C Data center & enterprise deployment
Junction Temperature <100°C Reliability and performance
Thermal Margin ≥15°C below limits Sustained operation headroom
Cooling Method Passive heatsink ≥200 LFM required (300 LFM recommended)
Thermal Resistance
Path Value Notes
Junction to Case (θJC ) 0.10 °C/W Package specification
Case to Heatsink (θCS ) 0.05 °C/W With thermal interface material
Heatsink to Ambient (θSA ) 0.20 °C/W At 300 LFM airflow
Junction to Ambient (θJA ) 0.35 °C/W Total thermal resistance
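The resistances above combine in series, and junction temperature follows from the usual first-order relation Tj = Ta + P × θJA. The sketch below works through that arithmetic. Which power figure flows through the junction path is an assumption: the example uses the 130 W attributed to the Versal device in the dissipation table, not the 190 W card total. This simple model is an estimate at 300 LFM only and is not expected to reproduce measured figures.

```python
THETA_JC = 0.10   # junction to case, degC/W
THETA_CS = 0.05   # case to heatsink, degC/W
THETA_SA = 0.20   # heatsink to ambient at 300 LFM, degC/W
THETA_JA = THETA_JC + THETA_CS + THETA_SA

def junction_temp_c(ambient_c, power_w, theta_ja=THETA_JA):
    """First-order junction temperature estimate in degC."""
    return ambient_c + power_w * theta_ja

def max_ambient_c(tj_limit_c, power_w, theta_ja=THETA_JA):
    """Highest ambient that keeps the junction at or below tj_limit_c."""
    return tj_limit_c - power_w * theta_ja

FPGA_POWER_W = 130  # assumption: FPGA share of the 190 W budget
tj_at_25 = junction_temp_c(25, FPGA_POWER_W)
ambient_limit = max_ambient_c(100, FPGA_POWER_W)
```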
Airflow Requirements
Airflow (LFM) Tj at 100% Load, 25°C Ambient Status
0 (Natural) Thermal shutdown Not supported
100 95°C Throttling likely
200 85°C Minimum required
300 75°C Recommended
400+ <70°C Excellent
Environmental Specifications
Parameter Operating Storage
Temperature 0°C to +50°C -40°C to +85°C
Humidity 10% to 90% (non-condensing) 5% to 95% (non-condensing)
Altitude 0 to 3,048 m (10,000 ft) 0 to 12,192 m (40,000 ft)
Vibration 0.5G, 5–500 Hz 1.0G, 5–500 Hz
Shock 10G, 11 ms half-sine 30G, 11 ms half-sine
Airflow Requirement ≥200 LFM across card N/A
ML-KEM (Kyber) Performance
Operation ML-KEM-512 ML-KEM-768 ML-KEM-1024 Unit
Full Cycle Latency <700 ns <950 ns <1.33 μs —
Full Cycle Throughput 1,200K 800K 550K ops/s
KeyGen Throughput 1,400K 950K 650K ops/s
Encaps Throughput 1,300K 900K 600K ops/s
Decaps Throughput 1,200K 800K 550K ops/s
ML-DSA (Dilithium) Performance
Operation ML-DSA-44 ML-DSA-65 ML-DSA-87 Unit
Sign Latency 850 ns 1.2 μs 1.8 μs —
Verify Latency 320 ns 480 ns 700 ns —
Sign Throughput 400K 280K 180K ops/s
Verify Throughput 900K 650K 450K ops/s
SLH-DSA (SPHINCS+) Performance
Operation SLH-DSA-128s SLH-DSA-192s SLH-DSA-256s Unit
Sign Latency 500 μs 800 μs 1,200 μs —
Verify Latency 25 μs 35 μs 50 μs —
Sign Throughput 2K 1.25K 833 ops/s
Verify Throughput 40K 29K 20K ops/s
Symmetric Cryptography Performance
Algorithm Throughput Latency (1 KB) Notes
AES-128-GCM 20 Gbps 0.4 μs Authenticated encryption
AES-256-GCM 16 Gbps 0.5 μs Authenticated encryption
SHA3-256 20 Gbps 0.4 μs Hash function
SHA3-512 12 Gbps 0.7 μs Hash function
SHAKE128/256 20 Gbps Variable Extendable output
QRNG Output >800 Mbps — Conditioned entropy (dual free-running ring oscillators, raw ~1.5 Gbps)
Register Map
All registers are accessible via the PCIe BAR0 memory space. The register map is organized into functional regions with 4 KB alignment per region.
BAR0 Memory Map Summary
Offset Size Region Description
0x0000_0000 4 KB Device Control Device identification and control
0x0000_1000 4 KB Interrupt Control Interrupt status and masking
0x0000_2000 4 KB DMA Control DMA engine configuration
0x0000_3000 4 KB Crypto Control Cryptographic engine control
0x0000_4000 4 KB QRNG Control Quantum RNG control and status
0x0000_5000 4 KB Key Management Key storage and operations
0x0000_6000 4 KB Power Management Power states and monitoring
0x0000_7000 4 KB Debug/Diagnostic Debug registers and counters
0x0001_0000 64 KB Job Queues Command submission queues
0x0010_0000 1 MB Completion Queues Command completion queues
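Driver and tooling code often needs to map a BAR0 offset back to its functional region. The sketch below encodes the memory map above as a table and performs that lookup. The names `BAR0_REGIONS` and `region_for_offset` are hypothetical and not part of any Dyber SDK.

```python
KB = 1024
MB = 1024 * KB

# (base offset, size in bytes, region name), transcribed from the memory map
BAR0_REGIONS = [
    (0x0000_0000, 4 * KB, "Device Control"),
    (0x0000_1000, 4 * KB, "Interrupt Control"),
    (0x0000_2000, 4 * KB, "DMA Control"),
    (0x0000_3000, 4 * KB, "Crypto Control"),
    (0x0000_4000, 4 * KB, "QRNG Control"),
    (0x0000_5000, 4 * KB, "Key Management"),
    (0x0000_6000, 4 * KB, "Power Management"),
    (0x0000_7000, 4 * KB, "Debug/Diagnostic"),
    (0x0001_0000, 64 * KB, "Job Queues"),
    (0x0010_0000, 1 * MB, "Completion Queues"),
]

def region_for_offset(offset):
    """Return the region containing a BAR0 byte offset, or None if unmapped."""
    for base, size, name in BAR0_REGIONS:
        if base <= offset < base + size:
            return name
    return None

print(region_for_offset(0x4018))   # falls inside the QRNG Control region
print(region_for_offset(0x8000))   # gap between Debug and Job Queues
```

Returning `None` for gaps makes it easy to reject accesses to offsets the map leaves undefined.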
Device Control Registers (Base: 0x0000)
Offset Name Width Access Description
0x0000 DEV_ID 32 RO Device ID (0x0100_1DB7)
0x0004 DEV_REV 32 RO Device revision (HW rev | FW ver)
0x0008 DEV_CAP 32 RO Device capabilities bitmap
0x000C DEV_CTRL 32 RW Device control (enable, reset)
0x0010 DEV_STATUS 32 RO Device status (ready, error)
0x0014 DEV_CONFIG 32 RW Device configuration
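The DEV_ID reset value 0x0100_1DB7 packs the two PCI identifiers from the configuration space table into one word. The field layout below (vendor ID in the low 16 bits, device ID in the high 16 bits) is inferred from that value and from the usual PCI ordering. It is an assumption, not a documented bit definition.

```python
EXPECTED_VENDOR_ID = 0x1DB7   # Dyber, Inc.
EXPECTED_DEVICE_ID = 0x0100   # QUAC 100

def split_dev_id(dev_id):
    """Split a 32-bit DEV_ID word into (vendor_id, device_id)."""
    vendor_id = dev_id & 0xFFFF           # assumed: low half-word
    device_id = (dev_id >> 16) & 0xFFFF   # assumed: high half-word
    return vendor_id, device_id

def is_quac100(dev_id):
    """True if the word matches the identifiers listed in this document."""
    return split_dev_id(dev_id) == (EXPECTED_VENDOR_ID, EXPECTED_DEVICE_ID)

vendor, device = split_dev_id(0x0100_1DB7)
```

A probe routine could read DEV_ID first and refuse to touch other registers when `is_quac100` returns False.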
QRNG Registers (Base: 0x4000)
Offset Name Access Description
0x4000 QRNG_CTRL RW QRNG control register
0x4004 QRNG_STATUS RO QRNG status and health
0x4008 ENTROPY_AVAIL RO Available entropy (bytes)
0x400C ENTROPY_RATE RO Entropy generation rate (Mbps)
0x4010 HEALTH_TEST RO Health test results
0x4018 MIN_ENTROPY RO Min-entropy estimate
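Reading these registers amounts to 32-bit little-endian loads at the listed offsets. The sketch below shows this against any buffer-like object, with a `bytearray` standing in for the device so the example runs without hardware. On Linux the buffer could instead be an `mmap` of the device's `resource0` sysfs file, but that step, the 32-bit register width, and the helper names are assumptions. Bit-level field layouts are not given in this document, so raw values are returned undecoded.

```python
import struct

# Register offsets within BAR0, from the QRNG register table
QRNG_REGS = {
    "QRNG_CTRL": 0x4000,
    "QRNG_STATUS": 0x4004,
    "ENTROPY_AVAIL": 0x4008,
    "ENTROPY_RATE": 0x400C,
    "HEALTH_TEST": 0x4010,
    "MIN_ENTROPY": 0x4018,
}

def read_reg32(bar0, offset):
    """Read one little-endian 32-bit register from a buffer-like BAR0 view."""
    if offset % 4 != 0:
        raise ValueError("register offsets must be 4-byte aligned")
    return struct.unpack_from("<I", bar0, offset)[0]

def qrng_snapshot(bar0):
    """Return the raw value of every QRNG register, keyed by name."""
    return {name: read_reg32(bar0, off) for name, off in QRNG_REGS.items()}

# Stand-in for the mapped BAR0 region, with one illustrative value filled in
fake_bar0 = bytearray(0x5000)
struct.pack_into("<I", fake_bar0, QRNG_REGS["ENTROPY_RATE"], 800)

snapshot = qrng_snapshot(fake_bar0)
print(snapshot["ENTROPY_RATE"])   # prints 800, the value written above
```

Keeping the access logic independent of the backing buffer lets the same code be unit-tested against a simulated register file.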
Ordering Information
The QUAC 100 is available in several pre-configured SKUs. Contact sales@dyber.org for pricing, availability, and volume discounts.
SKU Name Configuration KEM Ops/s Power Warranty
QUAC100-STD Standard AMD Versal HBM, 32GB HBM2e, 190W TDP, passive cooling 1.2M+ 190W 3 yr
QUAC100-GOV Government STD + FIPS 140-3 L3, tamper-evident, USA supply chain 1.2M+ 190W 5 yr
QUAC100-DEV Developer Kit STD + 1-year SDK Pro license + debug tools + training 1.2M+ 190W 3 yr
Compliance & Standards
Standard Status Scope
FIPS 140-3 Level 3 IUT — atsec (target Q4 2026) Cryptographic boundary, key management, self-tests
Common Criteria EAL4+ Planned Security target for hardware accelerator
CNSA 2.0 Aligned NSA algorithm requirements for national security systems
NIST SP 800-90B Compliant QRNG entropy source health testing
PCIe CEM 4.0 Compliant Card electromechanical specification
RoHS 3 (EU 2015/863) Compliant Restriction of hazardous substances
REACH Compliant Registration, evaluation, authorization of chemicals
WEEE Compliant Waste electrical and electronic equipment
Country of Origin USA ITAR-free design, trusted supplier program