# A Power Reduction Scheme by Arithmetic Format Conversion for a DSP to Estimate Qubit States Under 4K Cryogenic Environment

1<sup>st</sup> Takashi Imagawa *Meiji University* Kanagawa, Japan imagawa@meiji.ac.jp

2<sup>nd</sup> Ryo Kishida *Toyama Prefectural University* Toyama, Japan ryokishida@pu-toyama.ac.jp

3<sup>rd</sup> Yuki Koyama *Kyoto Institute of Technology* Kyoto, Japan ykoyama@vlsi.es.kit.ac.jp

4<sup>th</sup> Kazutoshi Kobayashi *Kyoto Institute of Technology* Kyoto, Japan kazutoshi.kobayashi@kit.ac.jp

5<sup>th</sup> Takefumi Miyoshi *QuEL, Inc.* Tokyo, Japan miyoshi@quel-inc.com

*Abstract*—To deploy quantum computer systems in various fields, the number of qubits in a quantum computer unit must be increased by a very large factor. One obstacle to this increase is the large scale of the quantum-classical interface (QC-IF), which sits in a room-temperature environment. We aim to reduce the scale of this control infrastructure by implementing a digital signal processor (DSP) that can operate in a cryogenic environment. The DSP reduces the communication rate by encoding and decoding the signals exchanged between the qubits and the QC-IF, and thereby reduces the scale of the infrastructure. The DSP must consume little power so that its own heat dissipation does not disturb the cryogenic environment. This poster focuses on a DSP for estimating the state of qubits. In the existing implementation, the large internal memory (SRAM) dominates power consumption and circuit area. We convert the arithmetic data format in the submodule containing the large SRAMs from extra-long integers to single-precision floating-point numbers. This modification reduces the power and area of the entire circuit by 47.7% and 54.3%, respectively. We confirmed that it affects neither the processing performance nor the primary output value.

*Index Terms*—quantum computing, cryogenic, low-power, DSP, floating-point

This work is supported by JST Moonshot R&D Grant Number JP-MJMS226A.

## I. INTRODUCTION

To expand the fields in which quantum computers can be deployed, the number of qubits in these systems must be increased. A roadmap published by IBM shows systems with more than 1,000 logical qubits being implemented in the 2030s [1]. To achieve this, the number of physical qubits must be increased by a factor of 1,000 or more over the current level.

One obstacle to this increase is the large scale of the QC-IF, which manages and observes the qubits. Each qubit in a dilution refrigerator must be connected directly to the QC-IF in a room-temperature environment. The number of cables increases linearly with the number of qubits, so the equipment grows accordingly.

One possible way to reduce the number of cables is to reduce the amount of data exchanged through the QC-IF. This requires signal processing such as encoding and decoding inside the dilution refrigerator. Our goal is to implement digital signal processors (DSPs) that can operate in cryogenic environments so that part of the processing can be performed inside the dilution refrigerator. This is expected to allow the number of qubits to increase without increasing the scale of the system.

One challenge in implementing DSPs that operate in cryogenic environments is the temperature rise caused by the heat they dissipate. Signal processing for qubit control therefore requires low-power implementations that still maintain high data rates and low latency.

This poster focuses on signal processing for acquiring the state of a qubit. FPGA-based systems for this purpose have been deployed in the superconducting quantum computers at RIKEN and Osaka University, but their QC-IFs operate at room temperature [2], [3]. We have already fabricated a subset of this processing as an ASIC in a 22 nm process technology, and verification and measurement of the chip are in progress. The power consumption estimated from the placement-and-routing results, however, exceeds 300 mW.

To reduce the power consumption, we convert the arithmetic representation of the data signals in the dominant module from extra-long integers to single-precision floating-point numbers. This reduces the size of the internal memory (SRAM) and of the arithmetic unit, resulting in a 47.7% reduction in power and a 54.3% reduction in area for the entire circuit. Although this modification reduces arithmetic precision, we confirmed that it does not affect the results of qubit state estimation.

## II. TARGET SIGNAL PROCESSING

Figure 1 (a) shows a block diagram of the signal processing targeted in this poster. It is part of the processing performed in the FPGA-based QC-IF for fault-tolerant quantum computing (FTQC) proposed in [2], [3]. The input is a stream of digitized complex numbers, obtained by passing the reflected wave from microwave irradiation of a qubit through an A/D converter. The input signal is passed through low-pass filters, and the filtered signal is then accumulated. The result is provided to the *classification* module to estimate the state of the qubit.

Fig. 1. Overview of the target signal processing and its modification: (a) reference processing flow, (b) modified flow.

The accumulation is divided into two stages, as shown in Fig. 2. The first stage adds successive signals in the time direction (called *sum*), and the second stage accumulates signals that are temporally distant from each other (called *integration*). Since an integration section contains many sum sections, an SRAM with 1,000 words or more is required to hold the intermediate data.
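The two-stage accumulation can be sketched as follows, under one possible reading of the description above. The function name, the parameters `sum_len` and `depth`, and the round-robin mapping of *sum* results onto memory words are illustrative assumptions, not details of the reference design. Only one component (real or imaginary) is modeled, using exact Python integers.

```python
from typing import List

def two_stage_accumulate(samples: List[int], sum_len: int, depth: int) -> List[int]:
    """Toy model of the *sum* stage followed by the *integration* stage.

    samples: filtered input stream (one component, integer valued)
    sum_len: ASSUMED number of consecutive samples added in one sum section
    depth:   ASSUMED number of sum sections per integration section,
             i.e. the number of memory words that hold intermediate data
    """
    memory = [0] * depth  # stands in for the SRAM, one word per sum section
    n_sections = len(samples) // sum_len
    for k in range(n_sections):
        # Stage 1 (sum): add successive samples in the time direction.
        partial = sum(samples[k * sum_len:(k + 1) * sum_len])
        # Stage 2 (integration): add to the word that collects the results
        # located `depth` sum sections apart in time.
        memory[k % depth] += partial
    return memory

# Twelve samples, sum sections of two samples, three memory words.
print(two_stage_accumulate(list(range(12)), sum_len=2, depth=3))  # [14, 22, 30]
```

In this model the memory depth equals the number of sum sections per integration section, which is why a long integration section translates directly into a large SRAM.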

In the FPGA implementations, most data signals are represented in an integer format to ensure computational accuracy, and no rounding is performed during the process. For example, the result of an N-bit multiplication is 2N bits wide, and the entire result is passed to the subsequent arithmetic units or memories. In the target system, the primary input has 16-bit real and imaginary parts, but the output of the filter part is 89 bits, the output of *sum* is 101 bits, and the internal and output data of *integration* are 121 bits. The data signal is converted to the single-precision floating-point format just before the *classification*.

For these reasons, the *integration* module requires large SRAMs, which dominate the overall circuit power and area.
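The bit growth described above follows the usual rules for rounding-free fixed-point arithmetic: a product needs the sum of the operand widths, and adding n signed words needs ceil(log2(n)) extra bits. The sketch below applies these rules. The term counts 2^12 and 2^20 are assumptions chosen only because they reproduce the stated widths (89 to 101 bits and 101 to 121 bits); the source does not give the actual section lengths. The helper names are likewise hypothetical.

```python
def product_bits(a_bits: int, b_bits: int) -> int:
    """Width of the full-precision product of two signed operands."""
    return a_bits + b_bits

def accumulate_bits(word_bits: int, n_terms: int) -> int:
    """Width needed to add n_terms signed words with no overflow or rounding."""
    # (n_terms - 1).bit_length() equals ceil(log2(n_terms)) for n_terms >= 1.
    return word_bits + (n_terms - 1).bit_length()

def sram_bits(words: int, word_bits: int) -> int:
    """Total storage bits of a memory with the given depth and word width."""
    return words * word_bits

# The "N-bit multiplication becomes 2N-bit" example with N = 16.
print(product_bits(16, 16))             # 32
# ASSUMED term counts that reproduce the widths quoted in the text.
print(accumulate_bits(89, 1 << 12))     # 101
print(accumulate_bits(101, 1 << 20))    # 121
# Storage for the minimum depth of 1,000 words at 121 bits,
# next to the 32-bit single-precision width for comparison.
print(sram_bits(1000, 121), sram_bits(1000, 32))  # 121000 32000
```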

## III. ARITHMETIC FORMAT CONVERSION FOR POWER REDUCTION

As shown in Fig. 1 (b), the arithmetic format of the data signal is converted from the integer format to the single-precision floating-point format before *integration*. For computational consistency, the integer adders in the *integration* are replaced with floating-point ones. This modification reduces the SRAM size in the *integration* and is expected to reduce the circuit area and power consumption.

On the other hand, replacing the integer adders with floating-point ones may itself increase area and power consumption. A further concern is that rounding during processing may change the primary output, that is, the estimated state of the qubits.
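The rounding introduced by this conversion can be modeled exactly with Python integers. The sketch below assumes IEEE 754 binary32 with round-to-nearest, ties-to-even; the source does not state the rounding mode, and the function names are hypothetical. Every value in this path is an integer, so a floating-point addition is modeled as an exact integer addition followed by rounding to 24 significant bits.

```python
SIGNIFICAND_BITS = 24  # binary32: 23 stored fraction bits plus the implicit leading one

def to_float32_value(x: int) -> int:
    """Return the integer value binary32 would hold after rounding x (ties to even)."""
    sign = -1 if x < 0 else 1
    n = abs(x)
    excess = n.bit_length() - SIGNIFICAND_BITS  # low-order bits that do not fit
    if excess > 0:
        q = n >> excess                     # the 24 bits that are kept
        rem = n & ((1 << excess) - 1)       # the bits that are dropped
        half = 1 << (excess - 1)
        if rem > half or (rem == half and (q & 1)):
            q += 1                          # round up, or break a tie toward even
        n = q << excess
    if n.bit_length() > 128:
        raise OverflowError("magnitude exceeds the binary32 range")
    return sign * n

def integrate_float32(terms):
    """Convert each term, then accumulate with rounding after every addition."""
    acc = 0
    for t in terms:
        acc = to_float32_value(acc + to_float32_value(t))
    return acc

# Eight terms of about 100 bits each (ASSUMED stimulus).
terms = [3**63 + k for k in range(8)]
exact = sum(terms)
approx = integrate_float32(terms)
print(exact.bit_length(), abs(approx - exact) / exact)
```

A 121-bit magnitude stays below the binary32 limit of 2^128, so the conversion cannot overflow; what is lost is precision, with each rounding contributing a relative error of at most 2^-24.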

## IV. EVALUATION

Power consumption and circuit area are estimated by synthesizing, placing, and routing the two circuits, *reference* and *modified*. The circuits are described in VHDL and SystemVerilog. Design Compiler and IC Compiler II from Synopsys are used for logic synthesis and placement-and-routing, respectively. In the logic synthesis, clock gating is applied to reduce area and power. The SRAM macros and the floating-point arithmetic unit are generated using an SRAM compiler and the DesignWare Library from Synopsys. The process technology is a 22 nm bulk process.

Fig. 3. Breakdown of power consumption of circuits before and after modification.

Figure 3 shows the breakdown of power consumption in the *reference* and *modified* circuits. The *others* bar includes the classification and converter circuits as well as the parameter control circuits. The chart shows that the *integration* dominates power in the *reference* circuit. In the *modified* circuit, the total power consumption is reduced by 47.7% owing to the lower power of the *integration*. The power consumption of the *integration* itself is reduced by 69.8%, which is comparable to the 73.5% reduction in SRAM word size (from 121 bits to 32 bits). The impact of replacing the integer adders with floating-point ones is therefore negligible. Although the details of the area are omitted for reasons of space, the trend is the same, and the total circuit area is reduced by 54.3%.
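The word-size figure quoted above follows directly from the two widths given in the text:

$$
\frac{121 - 32}{121} = \frac{89}{121} \approx 0.7355
$$

Assuming the number of SRAM words is unchanged by the modification, the total number of storage bits shrinks by the same ratio.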

To evaluate the effect of rounding during processing, we compared the outputs of the two circuits for randomly generated input signals and parameters such as filter coefficients. We used 125 parameter sets to simulate one million estimations of qubit state, and the outputs were identical in all cases. The latency in cycles and the throughput of the two circuits are also equivalent. From these results, we conclude that the arithmetic format conversion affects neither the performance nor the accuracy of the primary output.
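A comparison of this kind can be illustrated with a toy software model. The sketch below does not reproduce the evaluation above. The sign-threshold classifier, the stimulus generator, and the term count are all hypothetical. Binary32 rounding is emulated with the standard `struct` module, and the integer inputs pass through a binary64 conversion first, so the model only approximates a direct hardware conversion.

```python
import random
import struct

def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def classify(value: float) -> int:
    """HYPOTHETICAL classifier: state 0 for a non-negative value, else state 1."""
    return 0 if value >= 0 else 1

def make_terms(rng: random.Random, n_terms: int = 64) -> list:
    """ASSUMED stimulus: integers of about 91 bits whose sign encodes the state."""
    offset = (1 << 90) if rng.randrange(2) == 0 else -(1 << 90)
    return [offset + rng.randint(-(1 << 88), 1 << 88) for _ in range(n_terms)]

def compare_paths(terms):
    # Reference path: exact integer accumulation, one conversion at the end.
    ref_value = f32(float(sum(terms)))
    # Modified path: convert every term first, then round after every addition.
    acc = 0.0
    for t in terms:
        acc = f32(acc + f32(float(t)))
    rel_diff = abs(acc - ref_value) / abs(ref_value)
    return classify(ref_value), classify(acc), rel_diff

rng = random.Random(0)
results = [compare_paths(make_terms(rng)) for _ in range(1000)]
mismatches = sum(1 for ref, mod, _ in results if ref != mod)
worst = max(d for _, _, d in results)
print(mismatches, worst)
```

In this toy setup the terms are generated far from the decision boundary, so the two paths agree by construction. The point is the method: run the same stimulus through both paths and compare the classified outputs rather than the raw accumulated values. A faithful check would use the actual filter coefficients and classification parameters.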

## V. CONCLUSION

We aim to realize DSPs that can operate in cryogenic environments, toward large-scale quantum computers with a large number of qubits. This requires a low-power DSP implementation. In this poster, we convert the numerical format of the processing that dominates power and area in the conventional implementation from extra-long integers to single-precision floating-point numbers. We demonstrate that this modification reduces power by 47.7% and area by 54.3% while maintaining computational accuracy and performance. This is a solid step toward the realization of advanced signal processing at cryogenic temperatures and a practical-scale quantum computer.

## REFERENCES

- [1] IBM. (2023) Technology IBM Quantum Computing. [Online]. Available: https://www.ibm.com/quantum/technology
- [2] M. Negoro *et al.*, "Superconducting qubit control with a system of an integrated microwave board and FPGA," in *APS March Meeting*, vol. 67, 2022.
- [3] T. Miyoshi *et al.*, "A fully pipelined architecture of quantum-classical interface for realizing fault-tolerant quantum computer," in *IEEE International Conference on Quantum Computing and Engineering (QCE)*, 2023, pp. 322–323.