QUAC 100 Hardware Architecture

QUAC100-HW-002, Rev 2.0, February 2026

Complete hardware reference for the Dyber QUAC 100 Quantum-Resistant Universal Accelerator Card. This document covers the AMD Versal HBM architecture, PCIe Gen5 x8x8 interface, 16 parallel Radix-32 NTT engines, HBM2e memory subsystem, QRNG subsystem, power distribution, thermal design, and mechanical specifications.

Product Overview

The QUAC 100 is a PCIe Gen5 x8x8 hardware accelerator integrating NIST-standardized post-quantum cryptographic acceleration, a FIPS 140-3 Level 3 hardware security module, and quantum random number generation in a single device. Built on the AMD Versal HBM platform with 32 GB of HBM2e high-bandwidth memory, the QUAC 100 is specified at more than 1.2 million ML-KEM-512 operations per second with a full KEM cycle latency under 700 ns.

1.2M+ ML-KEM-512 ops/sec

<700 ns full KEM cycle latency

32 GB HBM2e @ 819 GB/s

>800 Mbps QRNG conditioned output

Key Specifications Summary

| Parameter | Specification |
| --- | --- |
| Form Factor | PCIe full-height, 3/4 length, dual-slot |
| Host Interface | PCIe Gen5 x8x8 (or Gen4 x16) |
| FPGA Platform | AMD Versal HBM |
| HBM Memory | 32 GB HBM2e DRAM (2× 16GB stacks) |
| Memory Bandwidth | 819 GB/s |
| NTT Engines | 16 parallel Radix-32 units @ 1 GHz |
| QRNG | Dual free-running ring oscillators, >800 Mbps conditioned, NIST SP 800-90B |
| Power | 190W TDP |
| Cooling | Passive (requires ≥200 LFM airflow) |
| Operating Temperature | 0°C to +50°C ambient |
| Warranty | 3 years standard (5 years GOV SKU) |

Platform Architecture

The QUAC 100 is built on the AMD Versal HBM adaptive compute acceleration platform, combining high-performance programmable logic with integrated HBM2e memory for maximum cryptographic throughput. The architecture is optimized for lattice-based post-quantum cryptography with dedicated hardware acceleration for NTT operations, polynomial arithmetic, and hash functions.

| Subsystem | Specification |
| --- | --- |
| FPGA Platform | AMD Versal HBM Series |
| Programmable Logic | High-density adaptive logic with AI Engine array |
| NTT Acceleration | 16 parallel Radix-32 butterfly engines @ 1 GHz |
| Hash Acceleration | SHA-3/SHAKE hardware cores, 20 Gbps aggregate |
| HBM Memory | 32 GB HBM2e (2× 16GB stacks), 819 GB/s bandwidth |
| PCIe Interface | Gen5 x8x8 (or Gen4 x16) with DMA engines |
| Security Features | Secure boot, key storage enclave, side-channel countermeasures |

NTT Engine Array

The Number Theoretic Transform (NTT) engine array is the computational core for ML-KEM (Kyber) and ML-DSA (Dilithium) operations. The array implements 16 parallel Radix-32 butterfly units operating at 1 GHz, enabling polynomial multiplication over the ring Z_q[X]/(X^256 + 1), with each 256-point transform completing in fewer than 64 cycles.

| Component | Specification |
| --- | --- |
| NTT Engines | 16 parallel units, Radix-32 butterfly architecture |
| Operating Frequency | 1.0 GHz |
| Pipeline Depth | 8 stages |
| NTT Transform Size | 256-point |
| Cycles per NTT | <64 cycles |
| Modulus (ML-KEM) | q = 3329 |
| Modulus (ML-DSA) | q = 8,380,417 |
| Polynomial Ops | 256-parallel add/sub/mul + Barrett/Montgomery reduction |
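
As a point of reference for what the engines compute, the following is a minimal software model of multiplication in Z_q[X]/(X^256 + 1), using the two moduli listed above. It is a schoolbook (quadratic-time) sketch for checking results, not a description of how the Radix-32 hardware performs the transform; the function name `negacyclic_mul` is illustrative only.

```python
# Reference (schoolbook) multiplication in Z_q[X]/(X^n + 1).
# Software model only; the hardware uses NTT-based multiplication instead.

N = 256            # number of coefficients per polynomial
Q_MLKEM = 3329     # ML-KEM modulus
Q_MLDSA = 8380417  # ML-DSA modulus


def negacyclic_mul(a, b, q, n=N):
    """Multiply two length-n coefficient lists modulo X^n + 1 and modulo q."""
    if len(a) != n or len(b) != n:
        raise ValueError("both polynomials must have exactly n coefficients")
    out = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                # X^n == -1 in this ring, so wrapped terms change sign.
                out[k - n] = (out[k - n] - ai * bj) % q
    return out


# X^255 * X = X^256, which equals -1 in this ring.
x = [0, 1] + [0] * (N - 2)
x255 = [0] * (N - 1) + [1]
product = negacyclic_mul(x255, x, Q_MLKEM)
print(product[0])  # 3328, i.e. -1 mod 3329
```

The sign flip on wrapped terms is the only difference from ordinary cyclic convolution, and it is the property an NTT-based multiplier has to reproduce.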

NTT Hardware Components

| Module | Function | Performance |
| --- | --- | --- |
| NTT_FORWARD[0:15] | Forward NTT Transform | <64 cycles @ 1 GHz |
| NTT_INVERSE[0:15] | Inverse NTT Transform | <64 cycles @ 1 GHz |
| POLY_MUL | Coefficient-wise multiplication | 1 polynomial/cycle |
| POLY_ADD_SUB | Vector addition/subtraction | 1 polynomial/cycle |
| BARRETT_REDUCE | Barrett modular reduction | 1-cycle latency |
| MONTGOMERY_REDUCE | Montgomery modular reduction | 1-cycle latency |
| SAMPLER_CBD | Centered Binomial Distribution | 1 sample/cycle |
| COMPRESS_ENGINE | Compression/Decompression | Variable latency |
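
For readers unfamiliar with Barrett reduction, here is a small software sketch of the technique for the ML-KEM modulus. The shift width `K = 24` and the function name are assumptions made for illustration; the actual datapath widths of the BARRETT_REDUCE module are not given in this document.

```python
Q = 3329             # ML-KEM modulus
K = 24               # assumed shift width; inputs must satisfy x < 2**K (q*q < 2**24 holds)
M = (1 << K) // Q    # precomputed constant floor(2**K / q) = 5039


def barrett_reduce(x, q=Q, k=K, m=M):
    """Return x mod q for 0 <= x < 2**k using a multiply, a shift, and one conditional subtract."""
    if not 0 <= x < (1 << k):
        raise ValueError("input out of range for this Barrett constant")
    t = (x * m) >> k   # estimate of floor(x / q); low by at most 1
    r = x - t * q      # therefore 0 <= r < 2q
    return r - q if r >= q else r


# The product of two reduced coefficients always fits the input range.
print(barrett_reduce(3328 * 3328))  # 1, since 3328 is -1 mod 3329
```

The division is replaced by a multiplication with a precomputed constant, which is why this style of reduction maps well onto fixed-latency hardware.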

Hash Acceleration

Dedicated SHA-3 and SHAKE hardware accelerators provide high-throughput hashing for key derivation, message hashing, and the extensive hash operations required by SLH-DSA (SPHINCS+).

| Component | Specification |
| --- | --- |
| SHA3-256/384/512 | Hardware accelerated, 20 Gbps |
| SHAKE128/256 | Extendable output, 20 Gbps |
| Merkle Tree Engine | Parallel leaf hash computation for SPHINCS+ |
| HMAC Acceleration | Hardware HMAC-SHA3 |

Security Features

The QUAC 100 implements comprehensive security features including hardware-enforced isolation, side-channel countermeasures, and secure key storage.

| Feature | Description |
| --- | --- |
| Secure Boot | Authenticated boot chain with RSA-4096/ECDSA verification |
| Key Storage Enclave | Hardware-isolated key storage with access controls |
| Side-Channel Countermeasures | Constant-time operations, power analysis resistance, EM shielding |
| Tamper Detection | Environmental sensors, mesh detection (GOV SKU) |
| Zeroization | Hardware-triggered emergency key destruction |
| Multi-Tenant Isolation | SR-IOV with hardware-enforced memory separation |

Form Factor & Mechanical Specifications

267 mm card length (full-length PCIe)

111 mm card height (full-height)

~450 g weight (with heatsink)

2-slot width (dual-width profile)

PCIe Compliance

| Requirement | Specification | QUAC 100 Implementation |
| --- | --- | --- |
| Card Type | Full-height, 3/4 length, dual-slot | Compliant |
| Slot Width | Dual-width (2 slots) | Required for heatsink |
| Edge Connector | x16 mechanical | 164 pins (82 per side) |
| Gold Fingers | 50 μin minimum | Hard gold plating |
| Insertion Cycles | >100 cycles | Wear-resistant plating |
| CEM Compliance | PCIe CEM 4.0 | Fully compliant |

PCB Specifications

| Parameter | Value | Tolerance |
| --- | --- | --- |
| PCB Length | 267.00 mm | ±0.25 mm |
| PCB Width | 111.00 mm | ±0.25 mm |
| PCB Thickness | 2.40 mm | ±0.20 mm |
| Layer Count | 16 layers | - |
| Copper Weight (outer) | 2 oz (70 μm) | - |
| Copper Weight (inner) | 1 oz (35 μm) | - |
| Edge Connector Gold | 30 μin minimum | - |

PCIe Gen5 x8x8 Interface

The QUAC 100 implements a PCIe Gen5 x8x8 (bifurcated) interface providing 64 GB/s of theoretical bandwidth in each direction (16 lanes at 32 GT/s, before encoding overhead) between the host system and the accelerator. The implementation also supports Gen4 x16 fallback negotiation, which halves the per-lane rate to 16 GT/s.

| Parameter | Specification |
| --- | --- |
| Generation | PCIe Gen5 x8x8 (or Gen4 x16 fallback) |
| Lane Width | x8x8 (bifurcated) or x16 |
| Per-Lane Rate | 32 GT/s (Gen5); 16 GT/s (Gen4 fallback) |
| Encoding | 128b/130b |
| Theoretical Bandwidth | 64 GB/s per direction at Gen5 (16 lanes, before encoding overhead) |
| Reference Clock | 100 MHz ±100 ppm (HCSL) |
| DMA Channels | 8 independent (4 Host→Device, 4 Device→Host) |
| MSI-X Vectors | 8 |
| SR-IOV Support | Up to 8 virtual functions |
| Power Management | D0, D1, D2, D3hot, D3cold states |
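
The link-bandwidth arithmetic can be sketched as follows. The per-lane rates of 32 GT/s for Gen5 and 16 GT/s for Gen4 are the standard PCIe signalling rates; the function name is illustrative, and the calculation deliberately ignores packet (TLP/DLLP) overhead, so real DMA throughput will be lower.

```python
def pcie_bandwidth_gbytes(gt_per_s, lanes, enc_payload=128, enc_total=130):
    """Bandwidth per direction in GB/s after 128b/130b line encoding.

    Ignores TLP/DLLP protocol overhead, so this is an upper bound.
    """
    return gt_per_s * lanes * (enc_payload / enc_total) / 8


gen5_x8x8 = pcie_bandwidth_gbytes(32.0, 16)  # two x8 links, 16 lanes in total
gen4_x16 = pcie_bandwidth_gbytes(16.0, 16)   # fallback mode

print(f"Gen5 x8x8: {gen5_x8x8:.1f} GB/s per direction")  # 63.0
print(f"Gen4 x16:  {gen4_x16:.1f} GB/s per direction")   # 31.5
```

The raw figure of 64 GB/s corresponds to 16 lanes at 32 GT/s before the 128b/130b encoding is taken into account.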

Configuration Space

| Register | Value | Description |
| --- | --- | --- |
| Vendor ID | 0x1DB7 | Dyber, Inc. |
| Device ID | 0x0100 | QUAC 100 |
| Class Code | 0x100000 | Encryption controller |
| BAR0 Size | 16 MB | Memory-mapped register space |
| BAR2 Size | 256 MB | DMA buffer space |

Transaction Latency

| Operation | Typical | Maximum | Notes |
| --- | --- | --- | --- |
| Configuration Read | 100 ns | 1 μs | - |
| Configuration Write | 100 ns | 1 μs | - |
| Memory Read (32-bit) | 200 ns | 500 ns | BAR0 access |
| Memory Write (32-bit) | 150 ns | 400 ns | Posted write |
| DMA Read (4 KB) | 2 μs | 10 μs | Host to device |
| DMA Write (4 KB) | 2 μs | 10 μs | Device to host |
| MSI-X Interrupt | 300 ns | 1 μs | Latency to host |

HBM2e Memory Subsystem

The QUAC 100 features 32 GB of HBM2e (High Bandwidth Memory) integrated directly into the AMD Versal HBM package. This provides 819 GB/s of theoretical memory bandwidth for cryptographic operations, key storage, and batch processing.

| Parameter | Specification |
| --- | --- |
| Memory Type | HBM2e (High Bandwidth Memory 2e) |
| Total Capacity | 32 GB (2× 16GB stacks) |
| Stack Configuration | 2 stacks, 8-Hi each |
| Theoretical Bandwidth | 819 GB/s |
| Channels | 16 independent channels per stack |
| Channel Width | 64-bit per channel |
| ECC | SECDED per channel |
| Operating Voltage | 1.2V |

Memory Bandwidth Utilization

| Workload | Bandwidth Used | Efficiency |
| --- | --- | --- |
| ML-KEM Batch Processing | ~400 GB/s | 49% |
| ML-DSA Signing | ~350 GB/s | 43% |
| Key Store Operations | ~200 GB/s | 24% |
| Mixed Workload (typical) | ~450 GB/s | 55% |
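
The peak figure and the efficiency column can be reproduced with a few lines of arithmetic. The stack, channel, and width values come from the tables above; the per-pin data rate of 3.2 Gb/s is an assumption (it is not stated in this document) chosen because it yields the quoted 819 GB/s.

```python
STACKS = 2
CHANNELS_PER_STACK = 16
CHANNEL_WIDTH_BITS = 64
PIN_RATE_GBPS = 3.2  # assumption: per-pin data rate, not given in this document

# 2 * 16 * 64 = 2048 data bits; 2048 * 3.2 Gb/s / 8 = 819.2 GB/s
peak_gbytes = STACKS * CHANNELS_PER_STACK * CHANNEL_WIDTH_BITS * PIN_RATE_GBPS / 8


def utilization_pct(used_gbytes, peak=819.0):
    """Share of the quoted 819 GB/s peak, rounded to a whole percent."""
    return round(100 * used_gbytes / peak)


workloads = {
    "ML-KEM Batch Processing": 400,
    "ML-DSA Signing": 350,
    "Key Store Operations": 200,
    "Mixed Workload (typical)": 450,
}
for name, used in workloads.items():
    print(f"{name}: {utilization_pct(used)}%")
```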

QRNG Architecture

The quantum random number generation subsystem provides high-quality entropy for all cryptographic operations. It implements dual free-running ring oscillator entropy sources with SHA-3/SHAKE post-processing and continuous NIST SP 800-90B health testing.

| Parameter | Specification |
| --- | --- |
| Entropy Source | Dual free-running ring oscillators |
| Conditioned Output Rate | >800 Mbps |
| Post-Processing | SHA-3 / SHAKE-256 conditioning |
| Health Testing | NIST SP 800-90B: repetition count, adaptive proportion, startup tests |
| Min-Entropy | Available via QRNG_MIN_ENTROPY register |
| Output Interface | Memory-mapped + DMA |
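
To illustrate the kind of check the health-testing row refers to, here is a software sketch of the SP 800-90B repetition count test. The cutoff formula C = 1 + ceil(-log2(alpha) / H) follows the standard; the false-positive rate alpha = 2^-20 and the per-sample min-entropy values used below are illustrative assumptions, not parameters of the QUAC 100.

```python
import math


def rct_cutoff(min_entropy_per_sample, alpha_exp=20):
    """Cutoff C = 1 + ceil(-log2(alpha) / H), with alpha = 2**-alpha_exp."""
    return 1 + math.ceil(alpha_exp / min_entropy_per_sample)


def repetition_count_test(samples, cutoff):
    """Return True if no value occurs `cutoff` or more times in a row."""
    run_value, run_length = None, 0
    for s in samples:
        if s == run_value:
            run_length += 1
            if run_length >= cutoff:
                return False  # source appears stuck; entropy failure
        else:
            run_value, run_length = s, 1
    return True


cutoff = rct_cutoff(1.0)  # hypothetical H = 1 bit per sample
print(cutoff)                                       # 21
print(repetition_count_test([0, 1] * 50, cutoff))   # True
print(repetition_count_test([1] * 21, cutoff))      # False
```

A lower assessed min-entropy raises the cutoff, because long runs become more plausible for a healthy but weaker source.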

Power Distribution & Management

The QUAC 100 operates at 190W TDP, drawing power from the PCIe slot plus two 8-pin auxiliary power connectors.

Power Budget

| Power Rail | Source | Voltage | Typical Power |
| --- | --- | --- | --- |
| VCC_12V | PCIe Slot + 2× 8-pin Aux | 12V ±8% | 180W |
| VCC_3V3_AUX | PCIe Slot | 3.3V ±9% | 10W |
| Total TDP | - | - | 190W |

Power Dissipation by Subsystem

| Subsystem | Typical | Primary Heat Source |
| --- | --- | --- |
| Versal HBM FPGA | 130W | NTT engines, logic, AI Engines |
| HBM2e Memory | 30W | Memory I/O, refresh |
| Power Conversion | 18W | Regulator losses |
| QRNG, Clocks, I/O | 12W | Miscellaneous |
| Total | 190W | - |

Power vs. Workload

| Workload | Power (W) | Efficiency | Notes |
| --- | --- | --- | --- |
| Idle | 35 | - | Device enabled, no operations |
| ML-KEM-512 (100%) | 190 | 6,316 ops/W | 1.2M ops/s at 190W |
| ML-KEM-768 (100%) | 110 | 7,270 ops/W | 800K ops/s at 110W |
| ML-DSA-65 Sign (100%) | 105 | 2,670 ops/W | 280K ops/s at 105W |
| Mixed Workload (typical) | 95 | - | Representative datacenter load |
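
The efficiency column is simply throughput divided by power. A short sketch using the figures from the Notes column is shown below; the exact quotients differ slightly from the table, which appears to round the ML-KEM-768 and ML-DSA-65 rows to three significant figures.

```python
# (operations per second, watts) taken from the Notes column above
workloads = {
    "ML-KEM-512": (1_200_000, 190),
    "ML-KEM-768": (800_000, 110),
    "ML-DSA-65 Sign": (280_000, 105),
}


def ops_per_watt(ops_per_s, watts):
    """Energy efficiency as operations per second per watt."""
    return ops_per_s / watts


for name, (ops, watts) in workloads.items():
    print(f"{name}: {ops_per_watt(ops, watts):,.0f} ops/W")
# ML-KEM-512: 6,316 ops/W
# ML-KEM-768: 7,273 ops/W
# ML-DSA-65 Sign: 2,667 ops/W
```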

Thermal Design

The QUAC 100 uses passive cooling with a high-performance heatsink, requiring adequate system airflow for proper operation.

Thermal Design Requirements

| Requirement | Specification | Rationale |
| --- | --- | --- |
| TDP | 190W | Maximum sustained power |
| Operating Ambient | 0°C to +50°C | Data center & enterprise deployment |
| Junction Temperature | <100°C | Reliability and performance |
| Thermal Margin | ≥15°C below limits | Sustained operation headroom |
| Cooling Method | Passive heatsink | ≥200 LFM required (300 LFM recommended) |

Thermal Resistance

| Path | Value | Notes |
| --- | --- | --- |
| Junction to Case (θJC) | 0.10 °C/W | Package specification |
| Case to Heatsink (θCS) | 0.05 °C/W | With thermal interface material |
| Heatsink to Ambient (θSA) | 0.20 °C/W | At 300 LFM airflow |
| Junction to Ambient (θJA) | 0.35 °C/W | Total thermal resistance |
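
The thermal resistances combine in series, and a first-order junction temperature estimate is Tj = Ta + P × θJA. The sketch below applies that relation; using the 130 W FPGA figure as the power flowing through this path is an assumption, and θSA is only valid at 300 LFM. The airflow table that follows gives the document's own Tj figures, which need not match this simplified estimate.

```python
THETA_JC = 0.10  # junction to case, degC/W
THETA_CS = 0.05  # case to heatsink, degC/W
THETA_SA = 0.20  # heatsink to ambient at 300 LFM, degC/W

theta_ja = THETA_JC + THETA_CS + THETA_SA  # series path, about 0.35 degC/W


def junction_temp_c(ambient_c, power_w, theta=theta_ja):
    """First-order steady-state estimate: Tj = Ta + P * theta_JA."""
    return ambient_c + power_w * theta


# Assumption: only the FPGA's 130 W passes through the junction-to-ambient path.
tj = junction_temp_c(ambient_c=25.0, power_w=130.0)
print(f"Estimated Tj: {tj:.1f} degC")  # 70.5
```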

Airflow Requirements

| Airflow (LFM) | Tj at 100% Load, 25°C Ambient | Status |
| --- | --- | --- |
| 0 (Natural) | Thermal shutdown | Not supported |
| 100 | 95°C | Throttling likely |
| 200 | 85°C | Minimum required |
| 300 | 75°C | Recommended |
| 400+ | <70°C | Excellent |

Environmental Specifications

| Parameter | Operating | Storage |
| --- | --- | --- |
| Temperature | 0°C to +50°C | -40°C to +85°C |
| Humidity | 10% to 90% (non-condensing) | 5% to 95% (non-condensing) |
| Altitude | 0 to 3,048 m (10,000 ft) | 0 to 12,192 m (40,000 ft) |
| Vibration | 0.5G, 5–500 Hz | 1.0G, 5–500 Hz |
| Shock | 10G, 11 ms half-sine | 30G, 11 ms half-sine |
| Airflow Requirement | ≥200 LFM across card | N/A |

Performance Benchmarks

ML-KEM (Kyber) Performance

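| Operation | ML-KEM-512 | ML-KEM-768 | ML-KEM-1024 | Unit |
| --- | --- | --- | --- | --- |
| Full Cycle Latency | <700 ns | <950 ns | <1.33 μs | - |
| Full Cycle Throughput | 1,200K | 800K | 550K | ops/s |
| KeyGen Throughput | 1,400K | 950K | 650K | ops/s |
| Encaps Throughput | 1,300K | 900K | 600K | ops/s |
| Decaps Throughput | 1,200K | 800K | 550K | ops/s |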

ML-DSA (Dilithium) Performance

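| Operation | ML-DSA-44 | ML-DSA-65 | ML-DSA-87 | Unit |
| --- | --- | --- | --- | --- |
| Sign Latency | 850 ns | 1.2 μs | 1.8 μs | - |
| Verify Latency | 320 ns | 480 ns | 700 ns | - |
| Sign Throughput | 400K | 280K | 180K | ops/s |
| Verify Throughput | 900K | 650K | 450K | ops/s |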

SLH-DSA (SPHINCS+) Performance

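| Operation | SLH-DSA-128s | SLH-DSA-192s | SLH-DSA-256s | Unit |
| --- | --- | --- | --- | --- |
| Sign Latency | 500 | 800 | 1,200 | μs |
| Verify Latency | 25 | 35 | 50 | μs |
| Sign Throughput | 2K | 1.25K | 833 | ops/s |
| Verify Throughput | 40K | 29K | 20K | ops/s |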

Symmetric Cryptography Performance

| Algorithm | Throughput | Latency (1 KB) | Notes |
| --- | --- | --- | --- |
| AES-128-GCM | 20 Gbps | 0.4 μs | Authenticated encryption |
| AES-256-GCM | 16 Gbps | 0.5 μs | Authenticated encryption |
| SHA3-256 | 20 Gbps | 0.4 μs | Hash function |
| SHA3-512 | 12 Gbps | 0.7 μs | Hash function |
| SHAKE128/256 | 20 Gbps | Variable | Extendable output |
| QRNG Output | >800 Mbps | - | Conditioned entropy (dual free-running ring oscillators, raw ~1.5 Gbps) |
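
The 1 KB latency column is consistent with the time needed to stream 1 KB through each engine at its quoted throughput. The sketch below shows that arithmetic, assuming 1 KB means 1024 bytes and ignoring any fixed setup or pipeline latency.

```python
def stream_time_us(payload_bytes, throughput_gbps):
    """Time in microseconds to move payload_bytes at throughput_gbps (10**9 bits/s)."""
    return payload_bytes * 8 / (throughput_gbps * 1e9) * 1e6


throughputs_gbps = {
    "AES-128-GCM": 20,
    "AES-256-GCM": 16,
    "SHA3-256": 20,
    "SHA3-512": 12,
}
for name, gbps in throughputs_gbps.items():
    print(f"{name}: {stream_time_us(1024, gbps):.1f} us")
# AES-128-GCM: 0.4 us
# AES-256-GCM: 0.5 us
# SHA3-256: 0.4 us
# SHA3-512: 0.7 us
```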

Register Map

All registers are accessible via the PCIe BAR0 memory space. The register map is organized into functional regions with 4 KB alignment per region.

BAR0 Memory Map Summary

| Offset | Size | Region | Description |
| --- | --- | --- | --- |
| 0x0000_0000 | 4 KB | Device Control | Device identification and control |
| 0x0000_1000 | 4 KB | Interrupt Control | Interrupt status and masking |
| 0x0000_2000 | 4 KB | DMA Control | DMA engine configuration |
| 0x0000_3000 | 4 KB | Crypto Control | Cryptographic engine control |
| 0x0000_4000 | 4 KB | QRNG Control | Quantum RNG control and status |
| 0x0000_5000 | 4 KB | Key Management | Key storage and operations |
| 0x0000_6000 | 4 KB | Power Management | Power states and monitoring |
| 0x0000_7000 | 4 KB | Debug/Diagnostic | Debug registers and counters |
| 0x0001_0000 | 64 KB | Job Queues | Command submission queues |
| 0x0010_0000 | 1 MB | Completion Queues | Command completion queues |
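
A driver or diagnostic tool typically needs to translate a BAR0 offset into a region and a region-relative offset. The following is a minimal sketch of that lookup using the table above; the names `BAR0_REGIONS` and `region_for` are illustrative and are not part of any Dyber SDK.

```python
KB = 1024
MB = 1024 * KB

# (base offset, size in bytes, region name), transcribed from the BAR0 map
BAR0_REGIONS = [
    (0x0000_0000, 4 * KB, "Device Control"),
    (0x0000_1000, 4 * KB, "Interrupt Control"),
    (0x0000_2000, 4 * KB, "DMA Control"),
    (0x0000_3000, 4 * KB, "Crypto Control"),
    (0x0000_4000, 4 * KB, "QRNG Control"),
    (0x0000_5000, 4 * KB, "Key Management"),
    (0x0000_6000, 4 * KB, "Power Management"),
    (0x0000_7000, 4 * KB, "Debug/Diagnostic"),
    (0x0001_0000, 64 * KB, "Job Queues"),
    (0x0010_0000, 1 * MB, "Completion Queues"),
]


def region_for(offset):
    """Return (region name, offset within region), or None if the offset is unmapped."""
    for base, size, name in BAR0_REGIONS:
        if base <= offset < base + size:
            return name, offset - base
    return None


print(region_for(0x4018))  # ('QRNG Control', 24)
print(region_for(0x8000))  # None: falls in the gap before the job queues
```

Returning `None` for gaps makes it easy to reject accesses to the unlisted ranges between regions.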

Device Control Registers (Base: 0x0000)

| Offset | Name | Width | Access | Description |
| --- | --- | --- | --- | --- |
| 0x0000 | DEV_ID | 32 | RO | Device ID (0x0100_1DB7) |
| 0x0004 | DEV_REV | 32 | RO | Device revision (HW rev / FW ver) |
| 0x0008 | DEV_CAP | 32 | RO | Device capabilities bitmap |
| 0x000C | DEV_CTRL | 32 | RW | Device control (enable, reset) |
| 0x0010 | DEV_STATUS | 32 | RO | Device status (ready, error) |
| 0x0014 | DEV_CONFIG | 32 | RW | Device configuration |
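
The DEV_ID value 0x0100_1DB7 appears to pack the PCIe Device ID (0x0100) into the upper 16 bits and the Vendor ID (0x1DB7) into the lower 16 bits. That layout is inferred from the values in this document rather than stated by it, so the sketch below should be read with that assumption in mind.

```python
EXPECTED_VENDOR_ID = 0x1DB7  # Dyber, Inc. (from the configuration space table)
EXPECTED_DEVICE_ID = 0x0100  # QUAC 100


def decode_dev_id(value):
    """Split a 32-bit DEV_ID into (vendor_id, device_id); layout is an assumption."""
    vendor_id = value & 0xFFFF
    device_id = (value >> 16) & 0xFFFF
    return vendor_id, device_id


def is_quac100(value):
    """True if the DEV_ID value matches the documented vendor and device IDs."""
    return decode_dev_id(value) == (EXPECTED_VENDOR_ID, EXPECTED_DEVICE_ID)


vendor, device = decode_dev_id(0x0100_1DB7)
print(hex(vendor), hex(device))  # 0x1db7 0x100
print(is_quac100(0x0100_1DB7))   # True
```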

QRNG Registers (Base: 0x4000)

| Offset | Name | Access | Description |
| --- | --- | --- | --- |
| 0x4000 | QRNG_CTRL | RW | QRNG control register |
| 0x4004 | QRNG_STATUS | RO | QRNG status and health |
| 0x4008 | ENTROPY_AVAIL | RO | Available entropy (bytes) |
| 0x400C | ENTROPY_RATE | RO | Entropy generation rate (Mbps) |
| 0x4010 | HEALTH_TEST | RO | Health test results |
| 0x4018 | MIN_ENTROPY | RO | Min-entropy estimate |
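
The sketch below shows how the QRNG offsets might be read from a mapped BAR0 window. It runs against a plain `bytearray` standing in for the device, with made-up register contents, and it assumes the QRNG registers are 32 bits wide and little-endian (the table does not list a width). Bit-field layouts inside QRNG_STATUS and HEALTH_TEST are not documented here, so they are not decoded.

```python
import struct

# Absolute BAR0 offsets from the QRNG register table
ENTROPY_AVAIL = 0x4008
ENTROPY_RATE = 0x400C


def read32(bar0, offset):
    """Read one little-endian 32-bit register (width is an assumption)."""
    return struct.unpack_from("<I", bar0, offset)[0]


# Stand-in for a mapped BAR0 window; the values written here are invented test data.
bar0 = bytearray(0x5000)
struct.pack_into("<I", bar0, ENTROPY_AVAIL, 4096)
struct.pack_into("<I", bar0, ENTROPY_RATE, 812)

print(read32(bar0, ENTROPY_AVAIL))  # 4096 (bytes)
print(read32(bar0, ENTROPY_RATE))   # 812 (Mbps)
```

On real hardware the `bytearray` would be replaced by a memory-mapped view of BAR0 obtained through the operating system's PCI interface.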

Ordering Information

The QUAC 100 is available in several pre-configured SKUs. Contact sales@dyber.org for pricing, availability, and volume discounts.

| SKU | Name | Configuration | KEM Ops/s | Power | Warranty |
| --- | --- | --- | --- | --- | --- |
| QUAC100-STD | Standard | AMD Versal HBM, 32GB HBM2e, 190W TDP, passive cooling | 1.2M+ | 190W | 3 yr |
| QUAC100-GOV | Government | STD + FIPS 140-3 L3, tamper-evident, USA supply chain | 1.2M+ | 190W | 5 yr |
| QUAC100-DEV | Developer Kit | STD + 1-year SDK Pro license + debug tools + training | 1.2M+ | 190W | 3 yr |

Compliance & Standards

| Standard | Status | Scope |
| --- | --- | --- |
| FIPS 140-3 Level 3 | Implementation Under Test (IUT) with atsec, target Q4 2026 | Cryptographic boundary, key management, self-tests |
| Common Criteria EAL4+ | Planned | Security target for hardware accelerator |
| CNSA 2.0 | Aligned | NSA algorithm requirements for national security systems |
| NIST SP 800-90B | Compliant | QRNG entropy source health testing |
| PCIe CEM 4.0 | Compliant | Card electromechanical specification |
| RoHS 3 (EU 2015/863) | Compliant | Restriction of hazardous substances |
| REACH | Compliant | Registration, evaluation, authorization of chemicals |
| WEEE | Compliant | Waste electrical and electronic equipment |
| Country of Origin | USA | ITAR-free design, trusted supplier program |