WebNN NPU Compute Measurement

Estimate effective NPU compute available to the browser by running a WebNN NPU-only matrix multiplication workload

Category

Dev Tools

How to Use

  1. Choose the NPU-only backend, compute workload size, and measurement duration
  2. Click "Measure NPU" to build the WebNN graph, run warmup batches, and start the timed measurement
  3. Review measured NPU TOPS, peak TOPS, sustained TOPS, average latency, and iteration count
  4. If NPU context creation fails, expect an error instead of a result: NPU-only mode does not fall back to GPU or CPU

Examples

  • NPU-only compute measurement

    Input: Backend: NPU Only | Workload: Compute | Duration: 20s

    Output: Measured NPU Compute 8.42 TOPS | Peak 9.10 TOPS | Sustained 8.01 TOPS

  • CPU/GPU baseline

    Input: Backend: CPU or GPU Baseline | Workload: Balanced | Duration: 10s

    Output: Compare against NPU-only mode to see browser scheduling differences

FAQ

Is this the vendor-rated NPU peak TOPS?
No. This is the effective, browser-visible TOPS for the WebNN graph, so it includes browser, driver, and scheduling overhead plus a small amount of synchronization overhead.
How does it make sure the NPU is used?
NPU-only mode requests deviceType=npu and does not fall back to GPU or CPU. If the browser cannot create an NPU context, the tool reports an error instead of a result from another backend.
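As a rough illustration of the no-fallback behavior, the sketch below requests an NPU context and returns an error value when that fails, without retrying on another device. It assumes the browser's WebNN implementation accepts a `deviceType` option on `createContext`, as the answer above describes. The names `MLLike`, `NpuContextResult`, and `createNpuOnlyContext` are hypothetical and are not taken from the tool's source.

```typescript
// Minimal structural types so the sketch compiles without WebNN type
// declarations. These names are illustrative assumptions.
type DeviceType = "cpu" | "gpu" | "npu";

interface MLLike {
  createContext(options: { deviceType: DeviceType }): Promise<unknown>;
}

type NpuContextResult =
  | { ok: true; context: unknown }
  | { ok: false; error: string };

// Request an NPU context and report failure instead of retrying on GPU or CPU.
async function createNpuOnlyContext(
  ml: MLLike | undefined
): Promise<NpuContextResult> {
  if (!ml) {
    return { ok: false, error: "WebNN is not available in this browser" };
  }
  try {
    const context = await ml.createContext({ deviceType: "npu" });
    return { ok: true, context };
  } catch (err) {
    // Deliberately no fallback: a GPU or CPU number would not be an NPU result.
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `NPU context creation failed: ${reason}` };
  }
}
```

In a browser, the argument would be `navigator.ml` (which may be undefined where WebNN is not exposed). Returning a tagged result rather than throwing keeps the "error instead of a result" path explicit for the caller.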
Why use matrix multiplication?
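Dense matmul layers are common neural-network operators and map well to AI accelerators. The tool counts operations per inference as 2 × batch × hidden × hidden × layers, where each multiply-accumulate counts as two operations. It then divides the total operations executed by the measured execution time in seconds and by 10^12 to report TOPS.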
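The arithmetic can be sketched as follows. This is a minimal illustration of the formula as stated here, not the tool's actual code. The `Workload` shape, the function names, and the example sizes (batch 64, hidden 4096, 8 layers, 1000 iterations in 2 seconds) are assumptions chosen for the example.

```typescript
// Hypothetical workload description; field names mirror the formula's terms.
interface Workload {
  batch: number;
  hidden: number;
  layers: number;
}

// Operations for one inference: 2 x batch x hidden x hidden x layers.
function opsPerInference(w: Workload): number {
  return 2 * w.batch * w.hidden * w.hidden * w.layers;
}

// Effective TOPS: total operations / elapsed seconds / 10^12.
function effectiveTops(
  w: Workload,
  iterations: number,
  elapsedSeconds: number
): number {
  if (iterations <= 0 || elapsedSeconds <= 0) {
    throw new RangeError("iterations and elapsedSeconds must be positive");
  }
  return (opsPerInference(w) * iterations) / elapsedSeconds / 1e12;
}

// Illustrative numbers only, not measured results.
const example: Workload = { batch: 64, hidden: 4096, layers: 8 };
console.log(opsPerInference(example)); // 17179869184
console.log(effectiveTops(example, 1000, 2).toFixed(2)); // "8.59"
```

Because the elapsed time is wall-clock time seen from the page, the result includes whatever overhead sits between the script and the hardware, which is why it will generally be lower than a vendor-rated peak.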

Related tools