# Disturbance Aware Dynamic Power Reduction in Synchronous 2RW Dualport 8T SRAM by Self-Adjusting Wordline Pulse Timing

Yoshisato Yokoyama, Koji Nii, Senior Member, IEEE, Yuichiro Ishii, Shinji Tanaka, and Kazutoshi Kobayashi, Senior Member, IEEE

Abstract— An effective design is proposed to reduce the dynamic power consumption of a common-clock synchronous 2-read/write (2RW) dual-port (DP) 8T SRAM. A self-adjusting wordline (WL) pulse timing control circuit is newly introduced for read/write operations. The row address inputs of port A and port B are compared in each cycle to detect whether both ports access the same row. When both ports access the same row, a disturbance occurs, which is inherent to the 2RW DP 8T SRAM. In that case, the WL pulse width is extended to compensate for the disturbance while maintaining sufficient read/write margins. When the ports access different rows, no disturbance occurs, and the WL pulse width is shortened to reduce excessive bitline discharge power. Test chips are designed and fabricated to implement the proposed 2RW DP SRAM macros on 40-nm, 28-nm, and 7-nm Fin-FET technologies. Measured data show that read and write powers are reduced respectively by 6-13% and 13-28% with the proposed circuits. No speed degradation is found compared to conventional designs. Area overheads are found to be less than 1%.

*Index Terms*—7-nm, 8T, 28-nm, 40-nm, Disturbance, Dual-port, Dynamic power, row access, SRAM, Wordline pulse, *V*<sub>min</sub>

## I. INTRODUCTION

ALONG with scaling down of devices, the total memory capacity embedded in a die tends to increase year by year.

To achieve more energy-efficient computing while maintaining high performance, parallel-processing systems with many cores are in strong demand [1]–[8], combined with power control schemes such as power gating or dynamic voltage-frequency scaling (DVFS). In such green, high-performance, energy-efficient systems-on-a-chip (SoCs), embedded multi-port SRAMs are frequently required in addition to single-port SRAMs, because multi-port SRAMs serve as register files with many ports and as cache/buffer memories that allow data to be accessed in parallel between processing units and memories.

Typical multi-port cache/buffer memories in parallel-processing systems have mainly two individual ports. A 1-read/1-write (1R1W) 2-port (2P) SRAM with a decoupled read-BL 8T bitcell is widely used as a buffer memory for image processing or as a cache for multi-core CPUs [9]–[20]. However, these 1R1W 2P 8T bitcells provide only read access on port B. They cannot perform write operations from both ports, which is often required in parallel computing systems. High-density pseudo-DP SRAMs with a single-port 6T SRAM bitcell were also proposed [21], [22], but they have speed shortcomings because of double-pumping of the internal clock.

Manuscript received Mar. XX, 2022; revised XX XX, 2022, and XX XX, 2022; accepted XX XX, 2022. Date of publication XX XX, 2022; date of current version XX XX, 2022. (Corresponding author: Yoshisato Yokoyama.)

Yoshisato Yokoyama, Koji Nii and Kazutoshi Kobayashi are with Kyoto Institute of Technology, Kyoto 606-8585, Japan (e-mail: yyokoyama@vlsi.es.kit.ac.jp, koji.nii@vlsi.es.kit.ac.jp).

Meanwhile, 2-read/write (2RW) dual-port (DP) SRAMs are used especially as data caches or shared cache memories for multi-core CPU architectures [1]–[4]. DP SRAMs also function as block RAMs in FPGAs [5], [6] or as buffer memories for reconfigurable processors [7], [8]. In the hardware implementation of 2RW DP SRAMs, the differential 2RW DP 8T bitcell is necessary to support high-speed read and write operations from both port A and port B.

Fig. 1 presents a schematic diagram of a 2RW DP 8T SRAM bitcell. It has two sets of wordlines (WLs) and bitlines (BLs), namely WL<sub>A</sub>, WL<sub>B</sub>, BL<sub>A</sub>\_T, BL<sub>B</sub>\_T, BL<sub>A</sub>\_B, and BL<sub>B</sub>\_B, for simultaneous read and/or write operations in parallel. Two pull-up PMOSs (PU1, PU2) and two pull-down NMOSs (PD1, PD2) form cross-coupled inverters that act as a latch with complementary internal nodes (MT, MB). There are two pairs of NMOS pass-gates (PG1<sub>A</sub>, PG2<sub>A</sub> and PG1<sub>B</sub>, PG2<sub>B</sub>) for access from both ports. Fig. 1 also depicts a simple block diagram of a common clock synchronous 2RW DP SRAM macro with an 8T bitcell. RA<sub>A</sub> (RA<sub>B</sub>), CA<sub>A</sub> (CA<sub>B</sub>), and D<sub>A</sub>/Q<sub>A</sub> (D<sub>B</sub>/Q<sub>B</sub>) respectively denote the row address inputs, column address inputs, and data inputs/outputs for port A (port B). CEN<sub>A</sub> (CEN<sub>B</sub>) and WEN<sub>A</sub> (WEN<sub>B</sub>) respectively stand for the chip enable and write enable signals. The signal "CLK" stands for the common clock for port A and port B. The power and ground symbols shown in Fig. 1 are commonly used for all later figures. Fig. 2 presents a timing chart of a 2RW DP 8T SRAM macro. Both port A and port B operate synchronously with the rising CLK edge. The chip enable and write enable control signals are independent for each port. If CEN<sub>A</sub> (CEN<sub>B</sub>) is "1", the port performs no operation (NOP). If CEN<sub>A</sub> (CEN<sub>B</sub>) is "0" and WEN<sub>A</sub> (WEN<sub>B</sub>) is "1", the port performs a read operation. If CEN<sub>A</sub> (CEN<sub>B</sub>) is "0" and WEN<sub>A</sub> (WEN<sub>B</sub>) is "0", it performs a write operation. Each port has its own address inputs and data inputs/outputs, so it can perform a NOP, read, or write operation independently.
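The per-port decoding described above can be summarized in a short behavioral sketch. The function name `port_operation` and the string labels are illustrative assumptions and are not part of the macro interface; only the active-low CEN/WEN semantics are taken from the text.

```python
def port_operation(cen: int, wen: int) -> str:
    """Decode one port's operation from its active-low control inputs."""
    if cen == 1:
        return "NOP"  # chip enable deasserted: the port is idle
    # Port enabled: WEN = 1 selects read, WEN = 0 selects write.
    return "READ" if wen == 1 else "WRITE"


# Port A and port B are decoded independently in each clock cycle.
for cen, wen in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print(f"CEN={cen} WEN={wen} -> {port_operation(cen, wen)}")
```

Because each port is decoded on its own, any combination of NOP, read, and write on the two ports can occur in the same cycle.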

Shinji Tanaka is with Renesas Electronics Corporation, Tokyo 187-8588, Japan (e-mail: shinji.tanaka.yn@renesas.com).

Yoshisato Yokoyama, Koji Nii and Yuichiro Ishii were with Renesas Electronics Corporation, Tokyo 187-8588, Japan.



Fig. 1. Schematic diagram of 2-read/write (2RW) dual-port (DP) 8T SRAM bitcell and block diagram of common clock synchronous 2RW DP SRAM macro.



Fig. 2. Timing chart of common clock synchronous 2RW DP SRAM.

Many studies have addressed 2RW 8T DP SRAM designs [23]–[40], [47], [48]. Earlier works [24], [25] particularly addressed the enhancement of write/read margins for lower-voltage operation. Compact 2RW 8T bitcells were proposed to improve the macro density [23], [27]. Prevention or detection of disturbance between the accesses of both ports has been discussed [27]–[31], [39], [46]. Demonstrations on FD-SOI devices [36], [37] and on Fin-FET devices [34], [35], [39] have also been reported. Computing-in-memory based on the 2RW 8T SRAM bitcell has also been proposed recently as another application [40]. Several works have aimed at reducing the standby power of 2RW DP SRAMs [26], [27], [37], but few studies have addressed dynamic power reduction of 2RW DP SRAMs [23], [38]. To achieve an energy-efficient parallel computing system, dynamic power reduction is important along with leakage power reduction.

As described in this paper, we propose a dynamic power reduction scheme for common clock synchronous 2RW DP SRAMs under both read operation and write operation. By detecting the same and different row address access, WL pulse timing is self-adjusted to minimize the BL discharging power while maintaining sufficient read and write margins [38]. We confirmed the proposed technique is effective on a 7-nm Fin-FET advanced technology as well as 28-nm and 40-nm planar bulk CMOS technologies.

This paper is organized as follows. Section II describes the disturbance issues in DP SRAM design. Section III describes the concept of reducing dynamic power in DP SRAM macros. Section IV presents the proposed disturbance-aware circuit for reducing dynamic power. Section V describes the test chips designed and fabricated on 40-nm, 28-nm, and 7-nm Fin-FET technologies with well-balanced 8T bitcells, together with the silicon measurement results. A brief conclusion is presented in Section VI.

## II. DISTURBANCES IN 2RW DUAL PORT 8T SRAM

## A. Read Disturbance

The DP 8T SRAM has two disturbance modes: read-disturbance and write-disturbance. Fig. 3 presents the read-disturbance at the same row address access, where both WL<sub>A</sub> and WL<sub>B</sub> are activated. The different row access mode, shown in Fig. 3(a), works like a single-port (SP) 6T SRAM in which either WL<sub>A</sub> or WL<sub>B</sub> is activated, so disturbance from the other port never happens. In the same row access mode, shown in Fig. 3(b), an undesirable cell current flows through the other BL into the pull-down (PD) NMOS of the accessed bitcell. In that case, the read current ($I_{read}$) of the target BL through the pass-gate (PG) and PD NMOSs decreases because of the additional cell current from the other BL. As a result, the discharge speed of BL<sub>A</sub> decreases. Here, the current that disturbs the operation of the other port is defined as the "disturbing current" ($I_{dist}$), as shown in Fig. 3(b). $I_{dist}$ comes from the pre-charge circuit of the other port. In this work, the technique of pre-charging unselected columns is used to suppress the peak rush current when pre-charging BLs. In other words, the pre-charge PMOSs in unselected columns are always on.



Fig. 3. Read disturbance issue.

## B. Write Disturbance

Fig. 4 shows the other disturbance issue, which occurs in the write operation. Like the read-disturbance, it is caused by BL current from the other port. In the different row access mode, shown in Fig. 4(a), the write operation is a competition between the pull-down current $I_{PG}$ through PG1<sub>A</sub> in Fig. 1 and the pull-up current $I_{PU}$ through PU1 in Fig. 1. In the same row access mode, shown in Fig. 4(b), $I_{dist}$ from the pre-charge circuit through PG1<sub>B</sub> is added as a further pull-up current. In this case, the ability to flip the stored data is weaker than in the different row access mode.

Furthermore, even if the voltage of the MT node would become lower and PU1 would turn off, the MT node does not completely become 0 V.  $I_{dist}$  remains as long as WL<sub>B</sub> is activated; the MT node is raised from the 0 V level by competition between PG1<sub>B</sub> pulling up and PG1<sub>A</sub> pulling down. Then, the MT node becomes a certain intermediate voltage. If PU2 is weak and if the pulse width of WL<sub>A</sub> is not sufficiently wide, then the voltage difference between MT and MB might not be sufficient to flip while WL<sub>A</sub> is activated. MT is lowered when WL<sub>A</sub> reaches a low level. As a result, MT and MB nodes are not flipped, resulting in the write fail.



Fig. 4. Write disturbance issue.

## C. Theoretical Consideration

This subsection presents a theoretical discussion of the degradation of the read cell current ($I_{read}$) of the DP 8T bitcell by read disturbance. Fig. 5 shows the current flows of the two PG NMOSs and the PD NMOS in the bitcell at different row access and same row access, where the datum stored in the bitcell is "0". The read current of BL<sub>A</sub> in the different row access, denoted $I_{readA-diff}$, flows to VSS through the corresponding PG and PD of the bitcell. In the different row access mode, the currents through PG<sub>A</sub> ($I_{PGA}$) and PD ($I_{PD}$) are both equal to $I_{readA-diff}$, as shown in Eq. (1). $I_{PGA}$ and $I_{PD}$ are calculated using the fundamental NMOS current equations for the saturation and linear regions, as presented respectively in Eqs. (2) and (3) [41].



Fig. 5. Current analysis in different and same row accesses.

$$I_{\rm readA-diff} = I_{\rm PGA} = I_{\rm PD} \tag{1}$$

$$I_{\rm PGA} = \frac{1}{2} \mu C_{\rm ox} \frac{W_{\rm G}}{L_{\rm G}} (V_{\rm DD} - V_{\rm th-PGA} - dV_{\rm th-PGA} - V_{\rm m})^2 \quad (2)$$

$$I_{\rm PD} = \mu C_{\rm ox} \frac{W_{\rm D}}{L_{\rm D}} \left( (V_{\rm DD} - V_{\rm th-PD}) V_{\rm m} - V_{\rm m}^{2} \right) \tag{3}$$

Here, $V_{\rm m}$ denotes the voltage of the complementary internal node of the bitcell, and $V_{\rm DD}$ is the supply voltage. Also, $V_{\rm th-PGA}$ and $V_{\rm th-PD}$ respectively represent the threshold voltages of PG<sub>A</sub> and PD. $\mu$ stands for the carrier mobility and $C_{\rm ox}$ denotes the gate oxide capacitance per unit area. $W_{\rm G}$, $L_{\rm G}$, $W_{\rm D}$, and $L_{\rm D}$ respectively represent the gate width and length of the PG NMOSs and the PD NMOS. Strictly speaking, the threshold voltage of PG<sub>A</sub> is affected by the body bias effect, and the resulting threshold voltage shift ($dV_{\rm th-PGA}$) is defined as (4). Variables $\varepsilon$, $q$, $N_{\rm A}$, and $\phi_{\rm F}$ respectively represent the permittivity, the elementary charge, the acceptor doping concentration, and the Fermi potential.

$$dV_{\rm th-PGA} = \frac{\sqrt{2\varepsilon q N_{\rm A}}}{C_{\rm ox}} \left( \sqrt{V_{\rm m} + 2\phi_{\rm F}} - \sqrt{2\phi_{\rm F}} \right) \tag{4}$$

To simplify the equations, the body bias effect is omitted, $\frac{1}{2}\mu C_{\rm ox}(W_{\rm G}/L_{\rm G})$ and $\frac{1}{2}\mu C_{\rm ox}(W_{\rm D}/L_{\rm D})$ are replaced by the gain factors $\beta_{\rm G}$ and $\beta_{\rm D}$, and $V_{\text{th-PGA}}$ and $V_{\text{th-PD}}$ are set to the same NMOS threshold voltage ($V_{\text{thn}}$). Then (2) and (3) are solved for $I_{\text{readA-diff}}$ as (5). Similarly, $I_{\text{readA-same}}$ for the same row access is given as (6). By comparing Eqs. (5) and (6), it is derived that $I_{\text{readA-same}}$ is smaller than $I_{\text{readA-diff}}$, as stated in (7).

$$I_{\text{readA-diff}} = \frac{1}{2} \beta_{\text{G}} \left( 1 - \frac{1}{1 + \frac{\beta_{\text{D}}}{\beta_{\text{G}}}} \right) (V_{\text{DD}} - V_{\text{thn}})^2 \qquad (5)$$

$$I_{\text{readA-same}} = \frac{1}{2} \beta_{\text{G}} \left( 1 - \frac{1}{1 + 2\frac{\beta_{\text{D}}}{\beta_{\text{G}}}} \right) (V_{\text{DD}} - V_{\text{thn}})^2 \qquad (6)$$

$$\therefore I_{\rm readA-same} < I_{\rm readA-diff} \tag{7}$$
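The direction of this inequality can also be checked numerically by balancing the pass-gate current of Eq. (2) against the pull-down current of Eq. (3) directly. The sketch below is not the closed form of Eqs. (5) and (6). It assumes that the body effect is omitted, that both ports use identical pass-gates with both BLs held at the pre-charge level, and that the device values are purely hypothetical. The names `read_current`, `k_g`, and `k_d` are illustrative only.

```python
def read_current(k_g: float, k_d: float, v_ov: float, ports_on: int):
    """Return (V_m, I_readA) from the balance of Eq. (2) against Eq. (3).

    k_g, k_d : mu * C_ox * W / L of one pass-gate and of the pull-down (A/V^2)
    v_ov     : V_DD - V_thn in volts (body effect omitted)
    ports_on : 1 for different row access, 2 for same row access
    """
    # Balance: ports_on * (1/2) * k_g * (v_ov - V_m)^2 = k_d * V_m * (v_ov - V_m)
    # Dividing by (v_ov - V_m) gives a linear equation in V_m.
    half_kg = 0.5 * k_g * ports_on
    v_m = half_kg * v_ov / (half_kg + k_d)
    # Only the share flowing through PG_A discharges BL_A.
    i_read_a = 0.5 * k_g * (v_ov - v_m) ** 2
    return v_m, i_read_a


# Hypothetical device values, chosen only for illustration.
K_G, K_D, V_OV = 100e-6, 200e-6, 0.5

vm_diff, i_diff = read_current(K_G, K_D, V_OV, ports_on=1)
vm_same, i_same = read_current(K_G, K_D, V_OV, ports_on=2)
print(f"diff row: V_m = {vm_diff:.3f} V, I_readA = {i_diff * 1e6:.2f} uA")
print(f"same row: V_m = {vm_same:.3f} V, I_readA = {i_same * 1e6:.2f} uA")
```

Under these assumptions, the second pass-gate raises the internal node voltage, which lowers the overdrive of PG<sub>A</sub> and therefore the current available to discharge BL<sub>A</sub>.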

## D. Practical Simulation Results

To ascertain the practical effects of read-disturbance and write-disturbance, Monte Carlo simulations for the 40-nm, 28-nm, and 7-nm technologies were conducted considering high-sigma local variations. Fig. 6 shows the read and write waveforms of WL and BL in the same/different row access modes for a 28-nm DP 8T bitcell at the slow-NMOS and slow-PMOS (SS) corner, low voltage, and 40°C condition. Each waveform is obtained from 1000 iterations of Monte Carlo simulation in which the local variation of each transistor is simply scaled by 2×. The worst-case waveforms correspond approximately to $6\sigma$; they show the range of variations but not the practical distributions. In the write operation, as presented in Fig. 6, the negative BL write-assist technique is applied to the 28-nm and 7-nm designs in this work, so that the BL is biased to a negative voltage.

Fig. 7 defines the read-operation and write-operation margins. $\Delta V_{\text{sense}}$ is the differential voltage between BL<sub>A</sub>\_T and BL<sub>A</sub>\_B ($\Delta V_{\text{BL}}$) that the sense amplifier (SA) requires to sense the read data correctly, and $T_{\text{sense}}$ is the time that $\Delta V_{\text{BL}}$ takes to reach the required $\Delta V_{\text{sense}}$ after WL<sub>A</sub> is activated (Fig. 7(a)). For the write operation, $\Delta V_{\text{flip}}$ is the voltage difference between the complementary nodes MT and MB ($\Delta V_{\text{MEM}}$) that is required to flip them before WL<sub>A</sub> falls. $T_{\text{flip}}$ is the time that $\Delta V_{\text{MEM}}$ takes to reach the required $\Delta V_{\text{flip}}$ after WL<sub>A</sub> and BL<sub>A</sub>\_T are activated (Fig. 7(b)). $T_{\text{sense}}$ and $T_{\text{flip}}$ are longer in the same row access than in the different row access, as shown in Fig. 6.
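As an illustration of these definitions, the following sketch extracts a threshold-crossing time from a sampled waveform by linear interpolation. It applies equally to $T_{\text{sense}}$ (threshold $\Delta V_{\text{sense}}$) and $T_{\text{flip}}$ (threshold $\Delta V_{\text{flip}}$). The function name `time_to_reach` and the sample waveform are hypothetical and are not taken from the simulations in this paper.

```python
def time_to_reach(times, delta_v, threshold):
    """Return the first time at which delta_v reaches threshold, or None.

    times    : sample times measured from WL activation (any unit)
    delta_v  : differential voltage at each sample (same unit as threshold)
    """
    for i, value in enumerate(delta_v):
        if value >= threshold:
            if i == 0:
                return times[0]
            # Linear interpolation between the two samples around the crossing.
            t0, t1 = times[i - 1], times[i]
            v0, v1 = delta_v[i - 1], value
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return None  # threshold never reached within the waveform


# Hypothetical BL differential waveform: time in ps, voltage in mV.
t_ps = [0, 100, 200, 300, 400]
dv_bl_mv = [0, 10, 25, 45, 70]

t_sense = time_to_reach(t_ps, dv_bl_mv, threshold=50)
print(f"T_sense = {t_sense:.0f} ps")
```

A waveform that never reaches the threshold returns `None`, which corresponds to a sensing or write failure for the simulated pulse width.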

TABLE I presents simulation results of the practical $T_{\text{sense}}$ and $T_{\text{flip}}$ in the 40-nm, 28-nm, and 7-nm technologies. In the same row access mode, $T_{\text{sense}}$ and $T_{\text{flip}}$ are worsened by up to approximately 39% and 145%, respectively, compared to the different row access mode. Each value in TABLE I is obtained by an accurate $6\sigma$ worst-case simulation using the importance sampling Monte Carlo method [42]. Note that $T_{\text{flip}}$ of 28-nm is larger than that of 40-nm because the 28-nm technology is optimized for eFlash, which minimizes leakage power rather than maximizing speed.



Fig. 6. Monte-Carlo simulation waveforms on 28-nm for read operation and write operation.



Fig. 7. Definition of  $\Delta V_{\text{sense}}$  and  $T_{\text{sense}}$  in read operation and  $\Delta V_{\text{flip}}$  and  $T_{\text{flip}}$  in write operation.

TABLE I

SUMMARY OF $T_{\text{SENSE}}$ AND $T_{\text{FLIP}}$ IN SAME/DIFF ROW ACCESS MODE

Condition: SS, Vtyp -10%, -40°C. $T_{\text{sense}}$ at -6σ with $\Delta V_{\text{sense}}$ = 50 mV; $T_{\text{flip}}$ at -6σ with $\Delta V_{\text{flip}}$ = Vtyp × 0.9 × 0.7.

| Node  | $T_{\text{sense}}$ same (ps) | $T_{\text{sense}}$ diff (ps) | $T_{\text{sense}}$ same/diff | $T_{\text{flip}}$ same (ps) | $T_{\text{flip}}$ diff (ps) | $T_{\text{flip}}$ same/diff |
|-------|------|------|------|------|------|------|
| 40-nm | 754  | 618  | 122% | 1253 | 526  | 238% |
| 28-nm | 477  | 344  | 139% | 1560 | 636  | 245% |
| 7-nm  | 254  | 225  | 113% | 1072 | 452  | 237% |
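The same/diff ratios in TABLE I follow directly from the tabulated times. The short sketch below recomputes them; the dictionary layout and the name `ratio_percent` are illustrative choices, while the numbers are the ones listed in the table.

```python
# (same, diff) times in ps, copied from TABLE I.
TABLE_I = {
    "40-nm": {"t_sense": (754, 618), "t_flip": (1253, 526)},
    "28-nm": {"t_sense": (477, 344), "t_flip": (1560, 636)},
    "7-nm": {"t_sense": (254, 225), "t_flip": (1072, 452)},
}


def ratio_percent(same_ps: float, diff_ps: float) -> int:
    """Same-row time as a percentage of the different-row time."""
    return round(100 * same_ps / diff_ps)


for node, row in TABLE_I.items():
    sense = ratio_percent(*row["t_sense"])
    flip = ratio_percent(*row["t_flip"])
    print(f"{node}: T_sense same/diff = {sense}%, T_flip same/diff = {flip}%")
```

The largest ratios, 139% for $T_{\text{sense}}$ and 245% for $T_{\text{flip}}$, both occur at 28-nm.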

## III. DYNAMIC POWER REDUCTION

In Section II, we explained that read-disturbance and write-disturbance respectively lengthen $T_{\text{sense}}$ and $T_{\text{flip}}$. These disturbances must be taken into account when designing the internal timing of the 2RW DP SRAM macro for stable operation, so the timing is usually set for the worst case, the same row access condition, irrespective of the actual access pattern. However, this leaves excessive timing margins for single-port access by either port A or port B and for different row address access by both ports. These margins induce an undesirable power overhead, as discussed in the following. In other words, if the internal timing can be adjusted dynamically according to the access mode, there is an opportunity to reduce excessive power consumption.

Fig. 8 presents waveforms of $T_{\text{sense}}$ and $T_{\text{flip}}$ in the same and different row access modes when the WL pulse width is adjusted to optimize the timing and power consumption. Typically, the WL pulse width for the read operation is shorter than that for the write operation because of poor write recovery in the same row access [35]. Theoretically, the BL charging and discharging powers are derived from the following equations. $P_{\text{read}}$ and $P_{\text{write}}$, which are the read and write power per I/O bit, are defined respectively as (8) and (9).

$$P_{\text{read}} = V_{\text{DD}} \times \left( \int_{t_0}^{t_{\text{R}}} I_{\text{read}} \, dt \right) \times N_{\text{M}} \tag{8}$$

$$P_{\text{write}} = V_{\text{DD}} \times \left( \int_{t_0}^{t_{\text{W}}} I_{\text{read}} \, dt \right) \times \left( N_{\text{M}} - 1 \right) + \frac{1}{2} C_{\text{B}} V_{\text{DD}}^2 \quad (9)$$

Here, $C_{\rm B}$ stands for the BL capacitance, $V_{\rm DD}$ denotes the supply voltage, $N_{\rm M}$ expresses the number of columns per I/O, $t_0$ signifies the WL assert timing, and $t_{\rm R}$ and $t_{\rm W}$ respectively mean the WL negate timings in the read operation and write operation. Fig. 9 shows the related dynamic power consumption of the 2RW DP SRAM macro. The macro mainly consumes power for charging and discharging BLs while the WL is activated. In the read operation, it is necessary to discharge the selected column BLs by $\Delta V_{\text{sense}}$, and the unselected column BLs consume the same power as the selected column BLs. In the write operation, the selected column BLs are fully discharged by the write-driver circuit, and the unselected column BLs consume power similarly to the read operation. If the WL pulse width in the different row access is the same as that in the same row access, then the BLs consume excessive power, as shown in Fig. 9. Furthermore, the technique of pre-charging unselected columns is usually used to suppress the peak rush current in pre-charging BLs, as shown in Fig. 5. In that case, unselected BL

Here, we assume that the probability of the same row being accessed by both ports is 1/(number of rows) if the row addresses are accessed randomly. For example, it is only 0.4% in the 256-row macro configuration. Although the probability of the same row access is small, the same row access must be regarded as the worst condition for maintaining the design margin. In other words, 99.6% of operations have excessive timing margins because the different row access has better timing margins for both read and write operations, as shown in Fig. 10. These excessive timing margins consume undesirable dynamic power through extra discharge of the BL capacitances.
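The argument above can be made concrete with a small numerical sketch that combines Eqs. (8) and (9) with the 1/(number of rows) probability. The sketch assumes a constant $I_{\text{read}}$ over the WL pulse so that the integrals reduce to products, and every device value below is hypothetical. The function names are illustrative and do not come from the design.

```python
def same_row_probability(rows: int) -> float:
    """Probability that two uniformly random row addresses coincide."""
    return 1.0 / rows


def p_read(vdd, i_read, t_wl, n_m):
    """Eq. (8) with constant I_read over a WL pulse of width t_wl."""
    return vdd * i_read * t_wl * n_m


def p_write(vdd, i_read, t_wl, n_m, c_b):
    """Eq. (9) with constant I_read: (N_M - 1) unselected columns plus one full BL swing."""
    return vdd * i_read * t_wl * (n_m - 1) + 0.5 * c_b * vdd ** 2


def expected(p_same, value_same, value_diff):
    """Average over random accesses when the pulse width follows the access mode."""
    return p_same * value_same + (1.0 - p_same) * value_diff


# Hypothetical values, for illustration only.
VDD, I_READ, N_M, C_B = 0.9, 10e-6, 4, 20e-15
T_SAME, T_DIFF = 1.0e-9, 0.75e-9  # WL pulse widths for same/different row access

p = same_row_probability(256)
fixed = p_read(VDD, I_READ, T_SAME, N_M)  # pulse always sized for the same-row case
adaptive = expected(p, fixed, p_read(VDD, I_READ, T_DIFF, N_M))
print(f"same-row probability: {100 * p:.2f}%")
print(f"read saving with adaptive pulse: {100 * (1 - adaptive / fixed):.1f}%")
```

With these assumed numbers, almost the entire pulse-width reduction carries over to the average, because same-row accesses are rare. In the write case, the saving is diluted by the fixed $\frac{1}{2} C_{\rm B} V_{\rm DD}^2$ term of Eq. (9).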



Fig. 8. Waveform of same/diff row access and inflection of  $T_{\text{sense}}/T_{\text{flip}}$ .



Fig. 9. Power consumption of SRAM macro by discharging BLs.



Fig. 10. Probability of same/diff. row access and power reduction by reducing WL pulse width in write operation.

## IV. ADJUSTING WL PULSE FOR POWER REDUCTION

To eliminate the extra dynamic power consumption, we propose a self-adjusting WL pulse timing control circuit. Fig. 11 presents a block diagram of the proposed circuit, where RA<sub>A</sub> and RA<sub>B</sub> respectively represent the row address inputs of port A and port B. The input signal CLK is a common clock for port A and port B, so both ports operate synchronously. If the row address inputs of port A and port B are exactly the same and the enable signals of both port A (CEN<sub>A</sub>) and port B (CEN<sub>B</sub>) are "0" (active-low), then the output signal SR of the row address comparator (XNOR) becomes "1" to flag the same row access mode. Otherwise, when the row addresses differ or either CEN<sub>A</sub> or CEN<sub>B</sub> is "1" (not enabled), the SR signal becomes "0". Fig. 11 also shows a truth table for each operation mode over each row-address input (RA<sub>A</sub>, RA<sub>B</sub>) and each enable signal (CEN<sub>A</sub>, CEN<sub>B</sub>).
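The SR flag logic described above can be written as a short behavioral sketch. The function name `same_row_flag` is an illustrative assumption; the behavior follows the text, with a bitwise XNOR over the row addresses gated by the two active-low enables.

```python
def same_row_flag(ra_a: int, ra_b: int, cen_a: int, cen_b: int) -> int:
    """Return SR = 1 only when both ports are enabled and address the same row."""
    both_enabled = cen_a == 0 and cen_b == 0  # CEN signals are active-low
    addresses_match = (ra_a ^ ra_b) == 0      # every XNOR bit is 1
    return 1 if both_enabled and addresses_match else 0


# A few representative cases.
print(same_row_flag(0x2A, 0x2A, 0, 0))  # same row, both enabled
print(same_row_flag(0x2A, 0x2B, 0, 0))  # different rows
print(same_row_flag(0x2A, 0x2A, 0, 1))  # port B not enabled
```

Single-port access and different row access are both treated as SR = 0, so the shorter WL pulse is used in each of these cases.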



Fig. 11. Block diagram of proposed DP SRAM and truth table for each operation mode over each row-address input (RA<sub>A</sub>, RA<sub>B</sub>) and each enable signal (CEN<sub>A</sub>, CEN<sub>B</sub>).

The SR flag is provided to the following row-decoder and column I/O blocks through the control (CTL) block to adjust the WL pulse width and related column I/O timing signals such as SA enable, WD enable, and pre-charge control. Fig. 12 shows details of the timing generator in the CTL block and the column I/O circuit. The timing generator has a replica BL, shown as RBL<sub>A</sub> in Fig. 12, which determines the WL pulse width. The replica BL is connected to two kinds of replica bitcells. Replica on-bitcells discharge the replica BL to generate the appropriate timing, and replica off-bitcells act as a parasitic capacitance load imitating the bitcell capacitance. The proposed circuit has an additional capacitance C<sub>SR</sub>, composed of NMOSs, connected to node RBL<sub>A</sub> as shown in Fig. 12. If the signal SR is "1", then C<sub>SR</sub> becomes effective and increases the propagation delay. As a result, the WL pulse width increases because of the increased RBL<sub>A</sub> delay. The trigger timing of the sense-amplifier enable (SAE) is also pushed out in the read operation in the same row access mode (SR="1"). In the write operation, a further offset delay is added to the path through a multiplexer to ensure the write ability by further extending the WL pulse width in the same row access mode. As a result, the write-driver period is also increased in the same row access mode. The inserted logic delay is composed of an inverter chain of logic MOSs optimized at the worst PVT corner (NMOS: slow, PMOS: slow, low voltage, and low temperature). It causes small excess delays at the other PVT corners, but it is needed to keep the write margin under all PVT conditions. If the inserted delay were generated by SRAM bitcell circuits, it would track PVT variations better. However, generating the delay by SRAM bitcells needs additional memory cell array area. In contrast, the logic delay composed of logic MOSs has almost no area impact because it is implemented in the peripheral control block.
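A first-order behavioral sketch of this timing scheme is given below. It is not the circuit of Fig. 12. It assumes that the replica BL is discharged by a constant current down to a fixed trip voltage, so that the delay scales with the total capacitance on RBL<sub>A</sub>, and that the write offset is a fixed delay. All names and values are hypothetical.

```python
def wl_pulse_width(sr: int, is_write: bool, c_rbl: float, c_sr: float,
                   v_trip: float, i_replica: float, t_offset_write: float) -> float:
    """Estimate the WL pulse width set by the replica BL delay, in seconds."""
    # C_SR loads the replica BL only in the same row access mode (SR = 1).
    c_total = c_rbl + (c_sr if sr else 0.0)
    # Constant-current discharge to the trip point: t = C * dV / I.
    t_pulse = c_total * v_trip / i_replica
    # The extra logic delay is multiplexed in only for same-row writes.
    if sr and is_write:
        t_pulse += t_offset_write
    return t_pulse


# Hypothetical values, for illustration only.
params = dict(c_rbl=20e-15, c_sr=5e-15, v_trip=0.45,
              i_replica=30e-6, t_offset_write=0.2e-9)

for sr in (0, 1):
    for is_write in (False, True):
        t = wl_pulse_width(sr, is_write, **params)
        mode = "same row" if sr else "diff row"
        op = "write" if is_write else "read"
        print(f"{mode} {op}: {t * 1e12:.0f} ps")
```

In this simplified model, the C<sub>SR</sub> term scales with the replica discharge current and therefore tracks it across conditions, whereas the fixed write offset does not.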

[Fig. 12 labels: control and timing generator with replica bitcells (on/off) and C<sub>SR</sub>; RBL<sub>A</sub> is delayed by C<sub>SR</sub> if SR is high; the read and write WL pulse widths depend on the ICK<sub>A</sub> pulse width; LWVSS<sub>A</sub> is clamped to VSS if the DP SRAM has no write-assist; negative boost write-assist (28-nm and 7-nm); timing chart of CLK, ICK<sub>A</sub>, RBL<sub>A</sub>, BBL<sub>A</sub>, BBLD<sub>A</sub>, BK<sub>A</sub>, SAE<sub>A</sub>, and NBST<sub>A</sub> for read and write in different-row and same-row accesses.]

Fig. 12. Proposed self-adjusting WL pulse timing control circuit and I/O circuit.

In the demonstrations on 28-nm and 7-nm, write-assist techniques are applied to compensate for the poor write margin of the 2RW DP 8T SRAM bitcells at the same row access. Many write-assist techniques have been reported to date [24], [25], [30], [32], [34], [35]. In this work, the negative BL write-assist technique [24], [25], [32] is applied to the 28-nm and 7-nm designs but not to the 40-nm design, as presented in Fig. 12. Although the negative boost consumes additional dynamic power because it drives coupling capacitors, it is necessary irrespective of the same or different row access in the advanced processes. The proposed circuit has no penalty in access time or address setup time because the setup clock path is longer than the data path. The area overhead is less than 1%.

Fig. 13 shows SPICE simulation waveforms of the 28-nm DP SRAM macro at the worst process-voltage-temperature (PVT) conditions in the read and write operations. The simulated WL pulse width for the read operation can be shortened by 25% in the different row access compared to the same row access. Here, we assume the same BL swing in both modes: the delta voltage between CBR<sub>A</sub> and CTR<sub>A</sub> in Fig. 12 in the same row access is almost identical to that in the different row access. The simulated WL pulse width for the write operation, as presented in Fig. 13(b), can be shortened even further, by 51%, in the different row access compared to the same row access. This simulation demonstrates that the write-disturbance strongly affects the difference in WL pulse width between the same and different row accesses. For the power reduction simulation results presented in Fig. 14, we estimated the dynamic power for two column MUX configurations (MUX8 and MUX2) in 40-nm, the MUX4 type in 28-nm, and the MUX2 type in 7-nm under the worst PVT conditions.



Fig. 13. Simulation waveforms of 28-nm DP SRAM 48-kbit MUX4 (1024-word x 48-bit) with adjusting WL pulse timing control.

Fig. 14 portrays the estimated charging/discharging power per column with the proposed circuits. The dynamic power of the read operation and write operation is reduced by 13-29% and 18-56%, respectively. In Fig. 14(b), the power consumption of the selected column in the write operation becomes smaller than that of the unselected columns as the technology node becomes smaller. This shows that $T_{\rm flip}$ becomes larger in an advanced technology, which requires a longer WL pulse width to perform the write operation. Fig. 15 portrays the PVT dependency of the dynamic power reduction by the proposed circuit in 40-nm MUX8. The read power reduction shows almost the same rate at each PVT corner. This means that the WL pulse width increase by $C_{SR}$ matches well with the discharging current of the bitcell in reading. Meanwhile, the write power reduction fluctuates across PVT corners. In this work, the logic delay of BBLD<sub>A</sub> in Fig. 12 is further added to cover the $T_{flip}$ degradation. The logic delay has the advantage of a small area overhead, but its PVT dependency does not match that of the bitcell discharging current. Although the reduction rate fluctuates, write power reductions are confirmed at the different PVT corners.



Fig. 14. Simulation result of dynamic power consumption at typical PVT condition.



Fig. 15. PVT dependency of estimated dynamic power reduction of 40-nm MUX8.

## V. TEST CHIP DESIGN AND EVALUATION


## A. 2RW DP 8T SRAM bitcell layout

Layouts of various types for 2RW DP 8T SRAM bitcells have been reported [23], [38], [44]; they are lithography- and logic-process-friendly compact layouts that need no additional process steps. Fig. 16 shows representative bitcell layouts. Type (a) [23] has a compact area but poor symmetry between the true/bar BLs of each pair and between port A and port B, because of process variations in manufacturing such as misalignment of photomasks, layout-dependent effects, or mismatch of gate length/width caused by limitations of optical proximity correction in lithography. In this work, types (b) [38], [43] and (c) [44] are used for the demonstrations because layouts of these types reduce such imbalances. The symmetry of the type (b) DP 8T bitcell layout is validated with an isolated test structure for direct measurement of each $I_{read}$ in the 40-nm technology. Fig. 17 presents the measured distributions of $I_{read}$ for the A-true, A-bar, B-true, and B-bar BLs implemented in the 40-nm technology. In the graph, all values of $I_{read}$ are normalized by the median and sigma. Type (b) has small differences among all $I_{read}$ values, whereas type (a) has offsets between the true/bar BLs or between the port A/B BLs. From the measured $I_{read}$ distributions, the write margin (WM), read margin (RM), and static noise margin (SNM) of the adopted type (b) bitcell are expected to improve by virtue of the small offsets, resulting in a good minimum operating voltage ($V_{\min}$) and a shorter $T_{\text{sense}}$ (access time). The offsets in the type (a) layout lead to extra design margins, inducing power and timing overhead and a deterioration of $V_{\min}$. Type (c), which is used for the 28-nm and 7-nm demonstrations in this work, also keeps symmetry. It is therefore expected that type (c) also has small offsets between port A/B and between true/bar. In the following discussion, we simulated or measured SRAM macros using the bitcell layouts of Fig. 16(b) for 40-nm and Fig. 16(c) for 28-nm and 7-nm. The coupling noise between the BLs of each port is sufficiently small because power supply or ground wiring is placed between the ports as a shield.



Fig. 16. Layout plots of 2RW DP 8T SRAM bitcells.



Fig. 17. Measured  $I_{read}$  distributions of A/B-true/bar BLs in 40-nm technology (n=20320).

## B. Test chip implementations and evaluations

Fig. 18 shows die photographs of the test chips using 40-nm, 28-nm embedded flash (eFlash) [45], and 7-nm Fin-FET CMOS technologies. Two types of 40-nm DP SRAM macros (MUX2 and MUX8), a 28-nm eFlash MUX4 macro, and a 7-nm FinFET MUX2 macro are implemented in the test chips. Fig. 19(a)–19(d) show cumulative distribution functions (CDFs) of $V_{min}$ for each macro at the -40°C worst condition. The medians of $V_{min}$ for the TT process are 0.71 V (40-nm MUX2 and MUX8), 0.79 V (28-nm eFlash), and 0.54 V (7-nm FinFET). The 28-nm eFlash process is optimized for eFlash IPs. The standby leakage is minimized for automotive applications, so the threshold voltages of the core and SRAM transistors are higher than those of normal CMOS technologies. For that reason, the 28-nm $V_{min}$ is not better than those of 40-nm and 7-nm.



Fig. 18. Photographs of the test chips and layout plots of the proposed 2RW DP SRAM macros: (a) 40-nm [38], (b) 28-nm eFlash [45], and (c) 7-nm FinFET.

Fig. 20 shows a typical shmoo plot of the 7-nm FinFET DP SRAM macro (supply voltage vs. read access time). The measured read access time is 0.58 ns at the typical supply voltage of 0.75 V. Fig. 21 shows the measured dynamic power consumption of write and read operations vs. the supply voltage. The WL pulse width has been optimized to ensure the read/write margins under the worst condition (for example, SS/VDD-10%/-40°C). The measured typical dynamic power consumptions are 19.5 µW/MHz (40-nm MUX8 read operation), 19.6 µW/MHz (40-nm MUX8 write operation), 6.9 µW/MHz (28-nm eFlash read operation), 9.3 µW/MHz (28-nm eFlash write operation), 3.4 µW/MHz (7-nm FinFET read operation), and 5.9 µW/MHz (7-nm FinFET write operation). Here, all powers are obtained under random address access, for which the probability of same-row access is less than 0.4%. The measured data show that, compared with the conventional macros, the dynamic power of the proposed DP SRAM macros is reduced by 7%, 13%, and 6% for the read operation and by 18%, 28%, and 13% for the write operation in the 40-nm, 28-nm eFlash, and 7-nm FinFET technologies, respectively. TABLE II presents the features of the demonstrated test chips. Fig. 22 plots the power comparison across process technologies. The proposed circuit decreases the power consumption, as shown for 40-nm. Furthermore, the power consumption decreases as the process technology advances and the supply voltage is lowered. TABLE III compares this work with other 2RW 8T DP SRAM works from the last 10 years. At 28 nm, the dynamic power of this work is lower than that of the other disclosed work.
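The stated same-row probability of less than 0.4% can be checked with a short calculation. This is a sketch under stated assumptions, not the authors' procedure: both ports are assumed to issue independent, uniformly distributed addresses, the number of rows is assumed to be the word count divided by the column MUX ratio, and "1k" and "2k" in TABLE II are read as 1024 and 2048 words. The function names and the normalized power values passed to `average_power` are hypothetical.

```python
def same_row_probability(words, col_mux):
    """P(port A and port B select the same row) for uniform random addresses."""
    rows = words // col_mux  # assumption: rows = words / column MUX ratio
    return 1.0 / rows


def average_power(p_same, power_same_row, power_diff_row):
    """Expected power when the WL pulse width depends on the access mode."""
    return p_same * power_same_row + (1.0 - p_same) * power_diff_row


# (words, column MUX) per macro, read from TABLE II with 1k = 1024 words.
macros = {
    "40-nm MUX2": (512, 2),
    "40-nm MUX8": (2048, 8),
    "28-nm MUX4": (1024, 4),
    "7-nm MUX2": (512, 2),
}

for name, (words, col_mux) in macros.items():
    p = same_row_probability(words, col_mux)
    print(f"{name}: rows = {words // col_mux}, P(same row) = {p:.4%}")

# Hypothetical normalized powers: 1.0 for the extended pulse (same row),
# 0.8 for the shortened pulse (different rows).
p_same = same_row_probability(512, 2)
print(f"average normalized power = {average_power(p_same, 1.0, 0.8):.4f}")
```

Under these assumptions every macro has 256 rows, so the same-row probability is 1/256, or about 0.39%, which is consistent with the stated bound. Because same-row accesses are this rare under random addressing, the average power is dominated by the shortened-pulse case.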



Fig. 19. Distributions of measured $V_{min}$ at -40°C worst temperature.



Fig. 20. Shmoo plot of 7-nm 32k-bit DP SRAM macro (supply voltage vs. read access time) at 25°C.



Fig. 21. Measured dynamic power vs. supply voltage at 25°C, TT process.

TABLE II Features of the test chips

| Technology node                     |          | 40-nm         | 40-nm         | 28-nm        | 7-nm         |
|-------------------------------------|----------|---------------|---------------|--------------|--------------|
| Configuration                       | Capacity | 36k           | 38k           | 48k          | 32k          |
|                                     | Bit      | 73            | 19            | 48           | 64           |
|                                     | Word     | 512           | 2k            | 1k           | 512          |
|                                     | Col. MUX | 2             | 8             | 4            | 2            |
| Physical macro size (µm<sup>2</sup>) |         | 195.3 x 191.6 | 180.9 x 203.7 | 95.3 x 297.1 | 44.4 x 123.0 |
| Bit density (Mbit/mm<sup>2</sup>)   |          | 0.99          | 1.06          | 1.74         | 6.00         |
| Read access time (ns)               | Typical  | 1.4           | 1.6           | 1.54         | 0.58         |
| Frequency (MHz)                     | Typical  | 500           | 600           | 438          | 1207         |
| $V_{min}$ (V)                       | Typical  | 0.71          | 0.71          | 0.79         | 0.54         |
| Measured power (µW/MHz)             | Read     | -             | 19.5          | 6.9          | 3.4          |
|                                     | Write    | -             | 19.6          | 9.3          | 5.9          |
| Energy (pJ/bit/access)              | Read     | -             | 1.026         | 0.144        | 0.053        |
|                                     | Write    | -             | 1.032         | 0.193        | 0.092        |
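The energy and bit-density rows of TABLE II appear to follow from the other rows, and the relationship can be sketched as below. The formulas are assumptions inferred from the tabulated numbers rather than definitions given in the text: energy per bit per access is taken as the measured power in µW/MHz (numerically equal to pJ per cycle) divided by the word width, and bit density as the total bit count divided by the macro area, with "k" read as 1024 and 1 Mbit as 10^6 bits. The function names are hypothetical.

```python
def energy_per_bit_pj(power_uw_per_mhz, bits_per_word):
    """Energy in pJ per bit per access.

    1 uW/MHz = 1e-6 W / 1e6 Hz = 1e-12 J, i.e. 1 pJ per cycle.
    Assumption: one access of one full word per cycle.
    """
    return power_uw_per_mhz / bits_per_word


def bit_density_mbit_per_mm2(bits_per_word, words, width_um, height_um):
    """Bit density in Mbit/mm^2.

    bits/um^2 is numerically equal to Mbit/mm^2 when 1 Mbit = 1e6 bits,
    because 1 mm^2 = 1e6 um^2.
    """
    return bits_per_word * words / (width_um * height_um)


# 40-nm MUX8 macro: 19 bits x 2048 words, 19.5 uW/MHz read power.
print(f"40-nm MUX8 read energy: {energy_per_bit_pj(19.5, 19):.3f} pJ/bit")

# 7-nm macro: 64 bits x 512 words, 44.4 um x 123.0 um, 3.4 uW/MHz read power.
print(f"7-nm read energy: {energy_per_bit_pj(3.4, 64):.3f} pJ/bit")
print(f"7-nm bit density: {bit_density_mbit_per_mm2(64, 512, 44.4, 123.0):.2f} Mbit/mm^2")
```

Under these assumptions the computed values agree with the tabulated ones to within one unit of the last printed digit, which suggests the energy figures are per data bit of the accessed word.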



Fig. 22. Dynamic power comparison over 40-nm, 28-nm, and 7-nm technologies.

TABLE III Comparison with other 2RW DP SRAM works

|                                   | TCAS-II<br>2012<br>[30] | SOCC<br>2013<br>[31] | VLC<br>2014<br>[32] | A-SSCC<br>2014<br>[33] | IEDM<br>2015<br>[34] | S3S<br>2017<br>[37] | ISSCC<br>2019<br>[39] | ISNCC<br>2020<br>[48] | TVLSI<br>2021<br>[47] | This work | This work | This work |
|-----------------------------------|------|------|------|------|------|------|------|------|------|------|------|------|
| Technology (nm)                   | 40   | 40   | 28   | 40   | 16   | 65   | 7    | 7    | 40   | 40   | 28   | 7    |
| Capacity (kbit)                   | 1024 | 72   | 32   | 24   | 64   | 32   | 16   | 512  | 16   | 38   | 48   | 36   |
| Bit density (Mbit/mm<sup>2</sup>) | -    | 1.48 | 2.1  | 0.86 | 3.60 | 0.75 | -    | -    | 0.85 | 1.06 | 1.74 | 6.00 |
| Access time (ns)                  | 4.2  | -    | 0.6  | 1.67 | 0.69 | 4.51 | -    | -    | 2.32 | 1.60 | 1.54 | 0.58 |
| Frequency (MHz)                   | -    | -    | 1000 | -    | -    | -    | 2100 | 1359 | -    | 435  | 455  | 1207 |
| Write power (µW/MHz)              | -    | -    | 10.4 | -    | 10.3 | -    | -    | -    | -    | 19.6 | 9.3  | 5.9  |
| Read power (µW/MHz)               | -    | -    | 10.4 | -    | 5.5  | 6.9  | -    | -    | -    | 19.5 | 6.9  | 3.4  |
| Typical VDD (V)                   | 1.0  | 1.1  | 1.0  | 1.25 | 0.9  | 0.75 | 0.8  | -    | 1.05 | 1.1  | 1.05 | 0.75 |

#### VI. CONCLUSION

A self-adjusting WL pulse timing control circuit is proposed for a common-clock synchronous 2RW DP 8T SRAM to reduce dynamic power consumption in read/write operations, with awareness of disturbance issues. Test chips are designed and fabricated to implement the proposed 2RW DP SRAM macros on 40-nm, 28-nm eFlash, and 7-nm Fin-FET technologies. Measured data show that read and write powers are reduced by 6–13% and 13–28%, respectively, with the proposed circuits. No speed degradation is found compared to conventional designs. Area overheads are less than 1%.

#### REFERENCES

- [1] H. Okano, A. Suga, T. Shiota et al., "An 8-Way VLIW Embedded Multimedia Processor Built in 7-Layer Metal 0.11 μm CMOS Technology," in *IEEE ISSCC Dig. Tech. Papers*, pp. 374-375, Feb. 2002.
- [2] T. Shiota, K. Kawasaki, Y. Kawabe, W. Shibamoto, A. Sato et al., "A 51.2GOPS 1.0GB/s-DMA Single-Chip Multi-Processor Integrating Quadruple 8-Way VLIW Processors," in *IEEE ISSCC Dig. Tech. Papers*, pp. 194-195, 593, Feb. 2005.
- [3] M. Nakajima, T. Yamamoto, M. Yamasaki, K. Kaneko, T. Hosoki, "Homogeneous Dual-Processor core with Shared L1 Cache for Mobile Multimedia SoC," in *VLSI Cir. Symp. Dig.*, pp. 216-217, June 2007.
- [4] Igor Loi, Luca Benini, "A multi banked, Multi-ported, Non-blocking shared L2 cache for MPSoC platforms," *Design, Automation & Test in Europe Conference & Exhibition (DATE)*, pp. 1-6, Mar. 2014.
- [5] https://www.xilinx.com/support/documentation/data\_sheets/ds099.pdf
- [6] https://www.xilinx.com/support/documentation/user\_guides/ug473\_7Series\_Memory\_Resources.pdf
- [7] T. Toi, N. Nakamura, Y. Kato, T. Awashima, K. Wakabayashi, "Highlevel Synthesis Challenges for Mapping a Complete Program on a Dynamically Reconfigurable Processor," *IPSJ Trans. on System LSI Design Methodology*, Vol. 3, pp. 91-104, Feb. 2010.
- [8] T. Toi, N. Nakamura, T. Fujii, T. Kitaoka, K. Togawa, K. Furuta and T. Awashima, "Optimizing Time and Space Multiplexed Computation in a Dynamically Reconfigurable Processor," in *Proc. Int. Conf. on Field Programmable Tech. (ICFPT)*, pp. 106-111, Dec. 2013.
- [9] H. Fujiwara, K. Nii, J. Miyakoshi, Y. Murachi, Y. Morita, H. Kawaguchi, and M. Yoshimoto, "A Two-Port SRAM for Real-Time Video Processor Saving 53% of Bitline Power with Majority Logic and Data-Bit Reordering," in *Proc. Int. Symp. Low Power Electronics and Devices* (ISLPED), pp. 61-66, Oct. 2006.
- [10] Y. Morita, H. Fujiwara, H. Noguchi, Y. Iguchi, K. Nii, H. Kawaguchi, and M. Yoshimoto, "An Area-Conscious Low-Voltage-Oriented 8T-SRAM Design under DVS Environment," in *VLSI Cir. Symp. Dig.*, pp. 256-257, June 2007.
- [11] S. Ishikura, M. Kurumada, T. Terano, Y. Yamagami, N. Kotani, K. Satomi, K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, T. Oashi, H. Makino, H. Shinohara, and H. Akamatsu, "A 45 nm 2-port 8T-SRAM Using Hierarchical Replica Bitline Technique with Immunity From Simultaneous R/W Access Issues," *IEEE J. of Solid-State Circuits*, Vol. 43, No. 4, pp. 938-945, April 2008.

- [12] Leland Chang, Robert K. Montoye, Yutaka Nakamura, Kevin A. Batson, Richard J. Eickemeyer, Robert H. Dennard, Wilfried Haensch, and Damir Jamsek, "An 8T-SRAM for Variability Tolerance and Low-Voltage Operation in High-Performance Caches," *IEEE J. of Solid-State Circuits*, vol. 43, no. 4, pp. 956-963, April 2008.
- [13] S. P. Park, S. Y. Kim, D. Lee, J.-J. Kim, W. P. Griffin, and K. Roy., "Column-Selection-Enabled 8T SRAM Array with ~1R/1W Multi-Port Operation for DVFS-Enabled Processors," *ISLPED*, pp. 303–308, 2011.
- [14] H. Fujiwara, L.-W. Wang, Y.-H. Chen, K.-C. Lin, D. Sum, S.-R. Wu, J.-J. Liaw, C.-Y. Lin, M.-C. Chiang, H.-J. Liao, S.-Y. Wu, and J. Chang, "A 64kb 16 nm asynchronous disturb current free 2-port SRAM with PMOS pass-gates for FinFET technologies," in *IEEE ISSCC Dig. Tech. Papers*, pp. X–X, Feb. 2015.
- [15] K.-H. Koo, L. Wei, J. Keane, U. Bhattacharya, E. A. Karl and K. Zhang, "A 0.094 μm<sup>2</sup> high density and aging resilient 8T SRAM with 14 nm FinFET technology featuring 560 mV VMIN with read and write assist," in *VLSI Circuits Symp., Dig. Tech. Papers*, pp. C266–C267, Jun. 2015.
- [16] J. P. Kulkarni, J. Keane, K.-H. Koo, S. Nalam, Z. Guo, E. Karl, and K. Zhang, "5.6 Mb/mm<sup>2</sup> 1R1W 8T SRAM Arrays Operating Down to 560 mV Using Small-Signal Sensing With Charge Shared Bitline and Asymmetric Sense Amplifier in 14 nm FinFET CMOS Technology," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 1, pp. 229–239, 2017.
- [17] Jaydeep P. Kulkarni, Carlos Tokunaga, Minki Cho, Muhammad M. Khellah, James W. Tschanz, and Vivek K. De, "FMAX / VMIN and noise margin impacts of aging on domino read, static write, and retention of 8T 1R1W SRAM arrays in 22 nm high-k/metal-gate tri-gate CMOS," in *VLSI Cir. Symp. Dig.*, pp. C116-C117, June 2017.
- [18] Makoto Yabuuchi, Yasumasa Tsukamoto, Hidehiro Fujiwara, Miki Tanaka, Shinji Tanaka, and Koji Nii, "A 28-nm 1R1W 2-port 8T SRAM Macro with Screening Circuitry against Read Disturbance and Wordline Coupling Noise Failures," *IEEE Trans. on VLSI Systems (TVLSI)*, Vol. 26, No. 11, pp. 2335-2344, June 2018.
- [19] J. P. Kulkarni, A. Malavasi, C. Augustine, C. Tokunaga, J. Tschanz, M. M. Kellah, V. De, "Low Swing and Column Multiplexed Bitline Techniques for Low-Vmin, Noise-Tolerant, High-Density, 1R1W 8T-bitcell SRAM in 10 nm FinFET CMOS," in *VLSI Cir. Symp. Dig.*, pp. 1-2, June 2020.
- [20] Alexander Fritsch, Rajiv Joshi, Sudipto Chakraborty, Holger Wetter, Uma Srinivasan, Matthew Hyde, Otto Torreiter, Michael Kugel, Dan Radko, Hyong Kim, Daniel Friedman, "A 6.2 GHz Single Ended Current Sense Amplifier (CSA) Based Compileable 8T SRAM in 7 nm FinFET Technology," in *IEEE ISSCC Dig. Tech. Papers*, pp. 334-336, Feb. 2021.
- [21] C. Wu, M. Chang, C. Chen, R. Lee, H. Liao and J. Chang, "A configurable 2-in-1 SRAM compiler with constant-negative-level write driver for low Vmin in 16 nm Fin-FET CMOS," *IEEE Asian Solid-State Circuits Conference (A-SSCC)*, pp. 145-148, Nov. 2014.
- [22] Y. Ishii, M. Yabuuchi, Y. Sawada, M. Morimoto, Y. Tsukamoto, Y. Yoshida, K. Shibata, T. Sano, S. Tanaka and K. Nii, "A 5.92-Mb/mm<sup>2</sup> 28-nm Pseudo 2-ReadWrite Dualport SRAM using Double Pumping Circuitry," *IEEE Asian Solid-State Circuits Conference (A-SSCC)*, pp. 17-20, Nov. 2016.
- [23] K. Nii, Y. Tsukamoto, S. Imaoka, and H. Makino, "A 90 nm dual-port SRAM with 2.04 μm<sup>2</sup> 8T-thin cell using dynamically controlled column bias scheme," in *ISSCC Dig. Tech. Papers*, pp. 508-509, 543, Feb. 2004.
- [24] D.P. Wang, H.J. Liao, H. Yamauchi, Y.H. Chen, Y.L. Lin, S.H. Lin, D. C. Liu, H.C. Chang, and W. Hwang, "A 45 nm Dual-Port SRAM with Write and Read Capability Enhancement at Low Voltage," in *Proc. IEEE Int. System-on-Chip Conf. (SOCC)*, pp. 211-214, Sep. 2007.
- [25] K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, Y. Oda, K. Usui, T. Kawamura, N. Tsuboi, T. Iwasaki, K. Hashimoto, H. Makino, and H. Shinohara, "A 45-nm Single-port and Dual-port SRAM family with Robust Read/Write Stabilizing Circuitry under DVFS Environment," *VLSI Cir. Symp. Dig.*, pp. 212-213, June 2008.
- [26] Peter Geens and Wim Dehaene, "A dual port dual width 90 nm SRAM with guaranteed data retention at minimal standby supply voltage", in *Proc. ESSCIRC*, pp. 290-293, Sep. 2008.
- [27] Koji Nii, Yasumasa Tsukamoto, Makoto Yabuuchi, Yasuhiro Masuda, Susumu Imaoka, Keiichi Usui, Shigeki Ohbayashi, Hiroshi Makino and Hirofumi Shinohara, "Synchronous Ultra-High-Density 2RW Dual-Port 8T-SRAM With Circumvention of Simultaneous Common-Row-Access," in *IEEE J. of Solid-State Circuits*, vol. 44, no. 3, pp. 977-986, Mar. 2009.
- [28] Y. Ishii, H. Fujiwara, K. Nii, H. Chigasaki, O. Kuromiya, T. Saiki, A. Miyanishi and Y. Kihara, "A 28-nm dual-port SRAM macro with active bitline equalizing circuitry against write disturb issue," in *VLSI Cir. Symp. Dig.*, pp. 99-100, June 2010.

- [29] Y. Ishii, H. Fujiwara, S. Tanaka, Y. Tsukamoto, K. Nii, Y. Kihara and K. Yanagisawa, "A 28 nm dual-port SRAM macro with screening circuitry against write-read disturb failure issues," *IEEE J. of Solid-State Circuits*, vol. 46, no. 11, pp. 2535-2544, Nov. 2011.
- [30] Jui-Jen Wu, Meng-Fan Chang, Shau-Wei Lu, Robert Lo, and Quincy Li, "A 45-nm Dual-Port SRAM Utilizing Write-Assist Cells Against Simultaneous Access Disturbances," in *IEEE Trans. on Circuits and Systems II*, vol. 59, no. 11, pp. 790-794, Nov. 2012.
- [31] N. Lien, C. Chuang and W. Wu, "Method for resolving simultaneous same-row access in Dual-Port 8T SRAM with asynchronous dual-clock operation," in *Proc. IEEE Int. System-on-Chip Conf. (SOCC)*, pp. 105-109, Sep. 2013.
- [32] S. Tanaka, Y. Ishii, M. Yabuuchi, T. Sano, K. Tanaka, Y. Tsukamoto, K. Nii and H. Sato, "A 512-kb 1-GHz 28-nm Partially Write-Assisted Dual-Port SRAM with Self-Adjustable Negative Bias Bitline," *VLSI Cir. Symp. Dig.*, pp. 113-114, June 2014.
- [33] Y. Yokoyama, Y. Ishii, K. Tanaka, T. Fukuda, Y. Tsujihashi, A. Miyanishi, S. Asayama, K. Maekawa, K. Shiba and K. Nii, "40 nm Dualport and two-port SRAMs for automotive MCU applications under the wide temperature range of -40 to 170°C with test screening against write disturb issues," in *Proc. A-SSCC*, pp. 25-28, Nov. 2014.
- [34] Koji Nii, Makoto Yabuuchi, Yoshisato Yokoyama<sup>†</sup>, Yuichiro Ishii, Takeshi Okagaki, Masao Morimoto, Yasumasa Tsukamoto, Koji Tanaka, Miki Tanaka and Shinji Tanaka, "2RW dual-port SRAM design challenges in advanced technology nodes," *IEEE Int. Electron Devices Meeting (IEDM)*, pp. 11.1.1-11.1.4, Dec. 2015.
- [35] Yen-Huei Chen, Kao-Cheng Lin, Ching-Wei Wu, Wei-Min Chan, Jhon-Jhy Liaw, Hung-Jen Liao and Jonathan Chang, "A 16 nm Dual-Port SRAM with Partial Suppressed Word-line, Dummy Read Recovery and Negative Bit-line Circuitries for Low VMIN Applications," in *VLSI Cir. Symp. Dig.*, June 2016.
- [36] M. Khayatzadeh, M. Saligane, J. Wang, M. Alioto, D. Blaauw and D. Sylvester, "17.3 A reconfigurable dual-port memory with error detection and correction in 28 nm FDSOI," in *IEEE ISSCC Dig. Tech. Papers*, pp. 310-312, Feb. 2016.
- [37] Y. Yamamoto, T. Hasegawa, M. Yabuuchi, K. Nii, Y. Sawada, S. Tanaka, Y. Shinozaki, K. Ito, H. Shinkawata and S. Kamohara, "An implementation of 2RW dual-port SRAM using 65 nm Silicon-on-Thin-Box (SOTB) for smart IoT," *IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conf.*, doi: 10.1109/S3S.2017.8309224, Oct. 2017.
- [38] Y. Yokoyama, Y. Ishii, H. Okuda and K. Nii, "A dynamic power reduction in synchronous 2RW 8T dual-port SRAM by adjusting wordline pulse timing with same/different row access mode", in *Proc. A-SSCC*, pp. 13-16, Nov. 2017.
- [39] Hidehiro Fujiwara, Chih-Yu Lin, Hsien-Yu Pan, Cheng-Han Lin, Po-Yi Huang, Kao-Cheng Lin, Jhon-Jhy Liaw, Yen-Huei Chen, Hung-Jen Liao, Jonathan Chang, "A 7 nm 2.1 GHz Dual-Port SRAM with WL-RC Optimization and Dummy-Read-Recovery Circuitry to Mitigate Read-Disturb-Write Issue", in *IEEE ISSCC Dig. Tech. Papers*, pp. 390-392, Feb. 2019.
- [40] S. Jain, L. Lin and M. Alioto, "±CIM SRAM for Signed In-Memory Broad-Purpose Computing From DSP to Neural Processing," in *IEEE J. of Solid-State Circuits*, vol. 56, no. 10, pp. 2981-2992, Oct. 2021.
- [41] Simon M. Sze and Kwok K. Ng, "Physics of Semiconductor Devices," John Wiley & Sons, Inc. Publication, July 2006.
- [42] Takeshi Kida, Yasumasa Tsukamoto and Yuji Kihara, "Optimization of importance sampling Monte Carlo using consecutive mean-shift method and its application to SRAM dynamic stability analysis", in *Proc. Int. Symp. on Quality Electronic Design (ISQED)*, pp. 572–579, Mar. 2012.
- [43] Koji Nii, "Semiconductor Memory," Patent US6529401B2, Mar. 2003.
- [44] Koji Nii and Atsushi Miyanishi, "Semiconductor memory device," Patent US6347062B2 Feb. 2002.
- [45] T. Yamauchi, Y. Yamaguchi, T. Kono and H. Hidaka, "Embedded flash technology for automotive applications," *IEEE Int. Electron Devices Meeting (IEDM)*, pp. 28.6.1–28.6.4, Dec. 2016.
- [46] M. Sultan M. Siddiqui, Sumit Srivastav, Dattatray Ramrao Wanjul, Manankumar Suthar and Sudhir Kumar, "A 7-nm Dual Port 8T SRAM with Duplicated Inter-Port Write Data to Mitigate Write Disturbance," in *Proc. International Conference on VLSI Design and 2018 17th International Conference on Embedded Systems (VLSID)*, pp. 266-270, Oct. 2018.
- [47] Yoshisato Yokoyama, Yuichiro Ishii, Koji Nii, Kazutoshi Kobayashi, "Cost-Effective Test Screening Method on 40-nm Embedded SRAMs for Low-Power MCUs," *IEEE Trans. on VLSI Systems (TVLSI)*, Vol. 29, No. 7, pp. 1495-1499, June 2021.
- [48] Suresh Babu Kotha, Kumar Rahul, Mohammad Anees, Santosh Yachareni, Subodh Kumar, "High speed low power SEU tolerant Pseudo dual port memory in 7nm," in *Proc. International Symposium on Networks, Computers and Communications (ISNCC)*, Oct. 2021.





**Yoshisato Yokoyama** received a B.E. degree in science and engineering from Chuo University, Tokyo, Japan, in 2000, an M.E. degree in electrical engineering from Tokyo Institute of Technology, Tokyo, Japan, in 2002, and a Ph.D. degree in engineering from Kyoto Institute of Technology, Kyoto, Japan, in 2021. He joined NEC Electronics Corporation, Kanagawa, Japan, in 2003. He was in charge of the development of advanced FinFET SRAM compilers at Renesas Electronics, Tokyo, Japan, until 2021.

**Koji Nii** received B.E. and M.E. degrees in electrical engineering from Tokushima University, Tokushima, Japan, in 1988 and 1990, respectively, and a Ph.D. degree in informatics and electronics engineering from Kobe University, Hyogo, Japan, in 2008. In 1990, he joined the ASIC Design Engineering Center, Mitsubishi Electric Corporation, Itami, Japan, where he worked on the design of 0.8-µm to 130-nm embedded SRAMs and CAMs for CMOS ASICs and on research into SOI SRAM development. In 2003, he was transferred to Renesas Technology Corporation, Itami, Japan, a joint company of Mitsubishi Electric Corp. and Hitachi Ltd. in the semiconductor field. There he worked on the design of 45-nm to 90-nm embedded low-power and high-speed SRAMs and on research into SRAM assist circuit techniques to enhance the functional margin against variations. In April 2009, he moved to Kodaira, Tokyo, where he worked on the research and development of embedded SRAM/TCAM/ROM and low-power design techniques with power gating in 28-nm high-k/metal-gate, advanced 7–16-nm FinFET, and FD-SOI SRAM macros. In 2018, he moved to Floadia Corporation, an embedded flash IP company in Kodaira, Tokyo. He is now with TSMC Design Technology Japan, Inc., where he heads the memory design team developing advanced FinFET SRAM compilers, custom cache SRAMs, register files, CAMs, and computing-in-memory (CiM) IPs. His current position is Director, Japan Memory Design Program, Memory Solution Division. Dr. Nii holds over 100 US patents and has published 41 IEEE/IEICE papers and given over 100 talks at major international conferences. He received the Best Paper Awards at the IEEE International Conference on Microelectronic Test Structures (ICMTS) in 2007 and the IEEE International Symposium on Quality Electronic Design (ISQED) in 2013. He also received the LSI IP Design Awards in 2007 and 2008, Japan. He was a member of the Technical Program Committees of the IEEE CICC (2010-2016) and IEEE IEDM (2015-2016). He is an Associate Editor of the IEEE Trans. on VLSI Systems. He is a senior member of the IEEE Solid-State Circuits Society, IEEE Computer Society, and the IEEE Electron Devices Society. He is a member of the Institute of Electronics, Information and Communication Engineers (IEICE), Japan. He was also a Visiting Professor at the Graduate School of Natural Science and Technology, Kanazawa University, Ishikawa, Japan (2012-2018) and is now a Senior Fellow of the Kyoto Lab for a Greener Future, Kyoto Institute of Technology, Kyoto, Japan (2019–).



**Yuichiro Ishii** received B.S. and M.S. degrees in electronic engineering from Hokkaido University, Sapporo, Japan, in 1996 and 1998, respectively, and a Ph.D. degree in engineering from Kanazawa University, Ishikawa, Japan, in 2018. In 1998, he joined the System LSI Division, Mitsubishi Electric Corporation, Itami, Japan. Since then, he has been engaged in the development of embedded SRAMs. In 2003 and 2010, he was transferred to Renesas Technology Corporation and Renesas Electronics Corporation, respectively. In 2020, he was transferred to TSMC Design Technology Japan, Inc., Yokohama, Japan, where he has worked on the research and development of advanced-node embedded SRAMs.





**Shinji Tanaka** received a B.E. degree in electronic engineering from Nagoya University, Nagoya, Japan, in 1989. He joined Mitsubishi Electric Corporation, Itami, Japan, in 1989. He was engaged in the development of DRAM and embedded SRAM. He is currently a Section Manager at the Design Platform Technology Department, Renesas Electronics Corporation, Tokyo, Japan.

**Kazutoshi Kobayashi** (M' 1996, S' 2019) received B.E., M.E., and Ph.D. degrees in electronic engineering from Kyoto University, Japan, in 1991, 1993, and 1999, respectively. Starting as an Assistant Professor in 1993, he was promoted to Associate Professor in the Graduate School of Informatics, Kyoto University, and stayed in that position until 2009. For two years during this time, he served as an Associate Professor at the VLSI Design and Education Center (VDEC), the University of Tokyo. Since 2009, he has been a Professor at Kyoto Institute of Technology. While in the past he focused on reconfigurable architectures utilizing device variations, his current research interest is in improving the reliability (soft errors, bias temperature instability, and plasma-induced damage) of current and future VLSIs. Since 2013, he has also conducted research on gate drivers for power transistors. He was the recipient of the IEICE Best Paper Award in 2009, the IRPS Best Poster Award in 2013, and the IEICE Electronics Society Award in 2021.