# A 65 nm Low-Power Adaptive-Coupling Redundant Flip-Flop

Masaki Masuda, Kanto Kubota, Ryosuke Yamamoto<sup>†</sup>, Jun Furuta\*, Kazutoshi Kobayashi<sup>†</sup>‡ and Hidetoshi Onodera\*‡

†Kyoto Institute of Technology, \*Kyoto University, ‡JST, CREST, Japan

# Abstract

We propose a low-power redundant flip-flop that operates with high reliability at clock frequencies over 1 GHz. It is based on the low-power adaptive-coupling flip-flop (ACFF) and the highly reliable bistable cross-coupled dual modular redundancy (BCDMR) flip-flop. Its power dissipation is almost equivalent to that of the transmission-gate FF at 10% data activity, at the cost of a 3x area penalty. Experiments with $\alpha$-particle and neutron irradiation show highly reliable operation with no errors at 1.2 V and 1 GHz. We also measured chips from five different process corners under $\alpha$ irradiation. Soft error rates are almost equivalent across these corner chips.

# Introduction

To protect flip-flops (FFs) from soft errors caused by $\alpha$ particles or neutrons, several redundant flip-flop structures have been proposed, such as TMR (Triple Modular Redundancy), DICE (Dual Interlocked storage Cell) [1], BISER (Built-In Soft Error Resilience) [2] and RHBD-MSFF (Radiation Hardening By Design Master-Slave Flip-Flop) [3]. As process scaling advances, reliability decreases steadily [4]. Currently, processors for servers are implemented with some redundancy to guarantee reliability [1]. In the near future, redundancy will also be needed in consumer products for low-power portable applications. Conventional redundant FFs have large area and power overheads. It is very hard to reduce the area penalty, since redundancy requires additional transistors, but the power penalty can be reduced by adopting low-power techniques. We propose a low-power redundant flip-flop that operates with high reliability at over 1 GHz with almost the same power as transmission-gate FFs. We have fabricated a test chip in a 65 nm process. The chip includes several kinds of FFs, among them the proposed BCDMR-ACFF (Bistable Cross-coupled Dual Modular Redundancy Adaptive Coupling Flip-Flop). As for power dissipation, it consumes 38.5% of the power of the



Fig. 1. Schematic diagram of BCDMR-ACFF



Fig. 2. Schematic diagram of ACFF

BCDMR FF [5] at 0% data activity according to the measurement results. The experimental results of $\alpha$-particle and neutron irradiation show that no error is observed in the proposed redundant FF array in operation up to 1 GHz.

# Low-Power Highly-Reliable Redundant Flip-Flop

Fig. 1 shows the proposed low-power, highly reliable redundant flip-flop, named "BCDMR-ACFF", which is based on the BCDMR FF for high reliability and the ACFF [6] for low-power operation.

Fig. 2 shows the schematic of ACFF. It operates with a single-phase clocking scheme using pass-transistors. Because no local clock buffers are used, power



Fig. 3. Schematic diagram of TGFF

dissipation can be reduced. As data activity becomes low, total power dissipation is drastically reduced. However, PMOS pass-transistors are too weak to pass a sufficiently large drain current. Because the PMOS pass-transistors are located in front of the master latch, it is difficult to overwrite the master latch. The two adaptive-coupling (AC) transistors make it easy to overwrite the master latch. When the next value is the same as the current value, the cross-coupled loop keeps the current value. When it is different, the AC elements weaken the held value. The ACFF has two fewer transistors than the transmission-gate (TG) FF shown in Fig. 3. Ref. [6] shows that the ACFF can operate down to a supply voltage of 0.75 V in a 40 nm process. It is possible to operate at lower voltages if AC elements are also embedded in the slave latch.

Fig. 4 shows the schematic of BCDMR. It consists of two pairs of master and slave latches and two "2C+K" parts. Each "2C+K" part includes two inverting Muller C-elements and one keeper. If one of the two latches is flipped by a transient soft error, the C-element output becomes high impedance and the keeper holds the original value. The upset latch recovers when the next clock edge arrives at the FF. BCDMR is based on the BISER structure, which is more area-efficient than triple modular redundancy (TMR). Fig. 5 shows the schematic of BISER. It has only a single C-element at each latch and is vulnerable to a Single-Event Transient (SET) pulse produced by the C-element [7].
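The role of the C-element and keeper described above can be illustrated with a small behavioral sketch. It is a logic-level simplification with a single C-element and one keeper, not the transistor-level "2C+K" circuit of Fig. 4, and the function names are hypothetical.

```python
from typing import Optional


def inverting_c_element(a: int, b: int) -> Optional[int]:
    """Inverting Muller C-element: drives NOT(a) when both inputs agree.

    Returns None to represent a high-impedance (undriven) output
    when the two inputs disagree.
    """
    if a == b:
        return 1 - a
    return None


def output_node(a: int, b: int, kept: int) -> int:
    """Value on the output node: the keeper holds `kept` while the C-element floats."""
    driven = inverting_c_element(a, b)
    return kept if driven is None else driven


# Both redundant latches hold 1, so the output node is driven to 0.
out = output_node(1, 1, kept=0)
# A soft error flips one latch: the inputs disagree and the keeper holds the old value.
out_after_upset = output_node(0, 1, kept=out)
print(out, out_after_upset)
```

In this model a single flipped latch never changes the output node, which is the property the redundant structure relies on.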

Fig. 6 shows two possible cases of upsets in redundant FFs caused by SET pulses arriving at the inputs of master or slave latches [7], [8]. The SET pulse width ranges from several hundred ps to 1000 ps depending on the tap (well contact) density, gate sizes and other factors [9], [10], [11]. The probability that a SET pulse is captured by latches depends on the clock frequency. If a pulse arrives at a



Fig. 4. Schematic diagram of BCDMR



Fig. 5. Schematic diagram of BISER



Fig. 6. SET pulse coming to the input of redundant master or slave latches (*n* LATs) to be captured by positive or negative edge of clock.

clock edge, it will be captured by multiple redundant master or slave latches. For example, a 500 ps SET pulse is captured with a probability of roughly 50% at a 1 GHz clock.
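The 50% figure follows from a simple first-order model, sketched below as an assumption rather than the authors' method: a pulse of width $w$ arriving at a uniformly random time overlaps the capturing clock edge with probability $\min(1, w/T)$, where $T$ is the clock period. Setup and hold windows are ignored, and the function name is hypothetical.

```python
def set_capture_probability(pulse_width_ps: float, clock_hz: float) -> float:
    """First-order probability that a SET pulse overlaps a capturing clock edge."""
    period_ps = 1e12 / clock_hz  # clock period in picoseconds
    return min(1.0, pulse_width_ps / period_ps)


# Capture probability of a 500 ps pulse at the clock frequencies used in the experiments.
for f_hz in (100e6, 300e6, 800e6, 1e9):
    p = set_capture_probability(500.0, f_hz)
    print(f"{f_hz / 1e6:6.0f} MHz: {p:.2f}")
```

Under this model the capture probability grows linearly with clock frequency until it saturates, which is why SET-induced upsets matter most in the GHz range.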

To remove a SET pulse arriving at the master latches, a delay element ($\delta$) such as the one in [12] can be used, as indicated by the dotted lines in Fig. 4. However, BISER is still vulnerable to a SET pulse produced by the C-element between the master and slave latches. There are two methods to remove a SET pulse arriving at the slave latches. The first is to insert delay elements in front of the slave latches, but this makes the area and delay penalties much larger. The second is to duplicate the C-elements, which incurs smaller area and delay penalties than the first method. With duplicated C-elements, a SET pulse from CM0 is captured only by SL0. The BCDMR FF based on the conventional TGFF shows over 100× better error resilience than TGFFs under spallation neutron irradiation [7].

Fig. 7 shows an alternative structure of BCDMR-ACFF. Fig. 1 and Fig. 7 differ in how the C-elements and slave latches are connected.



Fig. 7. Another schematic of BCDMR-ACFF with a smaller number of transistors, but with lower resiliency.

The number of transistors is smaller than in Fig. 1. However, a SET pulse from a master latch may be captured by both redundant slave latches. Thus the structure in Fig. 1 is used to guarantee high reliability at the cost of two additional inverters.

Table I compares redundant and non-redundant FFs in terms of area, delay and power, from which the ADP (area-delay-power) product is computed. BCDMR-ACFF consumes nearly 4x the power of the transmission-gate FF (TGFF) at 100% data activity ($\alpha = 100\%$; 3.79x in Table I), while it consumes almost the same power at $\alpha = 10\%$. Note that the power values are obtained from circuit-level simulations in which 8 FFs are driven by a 2x clock buffer. Without this clock buffer, the ACFF consumes much less power, because it requires no local clock buffer. The delay of the BCDMR-ACFF is almost equivalent to that of the TGFF. The area is about 3x that of the TGFF. Fig. 8 shows power dissipation at 1.2 V as a function of data activity, normalized by that of the TGFF. BCDMR-ACFF consumes less power than the original BCDMR below 40% data activity. BCDMR-ACFF thus achieves low-power operation in practice, because the average data activity of flip-flops in an SoC is typically between 5% and 15% [6]. BCDMR-ACFF consumes 27% of the power of the original BCDMR at 0% data activity. The ADP product of BCDMR-ACFF in Fig. 9 is about 2.0 at $\alpha$=0%, which is almost 3.8x smaller than that of the BCDMR FF implemented with TGFFs. A low-power BCDMR FF can be constructed from any kind of low-power master-slave edge-triggered FF. This is the most significant advantage of the BCDMR structure.

Table I: Area, delay and power of FFs normalized by TGFF.

| FF         | Area | Delay | Power ($\alpha$=10%) | Power ($\alpha$=100%) |
|------------|------|-------|----------------------|-----------------------|
| ACFF       | 1.05 | 0.72  | 0.51                 | 1.21                  |
| BCDMR FF   | 2.84 | 1.27  | 2.21                 | 2.73                  |
| BCDMR-ACFF | 3.16 | 1.11  | 1.16                 | 3.79                  |
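As a worked example, the ADP product can be recomputed from the Table I entries as area × delay × power. The sketch below uses only the tabulated values. The dictionary and function names are hypothetical, and the TGFF reference (1.0 in every column by definition of the normalization) is left out.

```python
# Values copied from Table I, normalized by TGFF:
# (area, delay, power at alpha=10%, power at alpha=100%)
TABLE_I = {
    "ACFF": (1.05, 0.72, 0.51, 1.21),
    "BCDMR FF": (2.84, 1.27, 2.21, 2.73),
    "BCDMR-ACFF": (3.16, 1.11, 1.16, 3.79),
}


def adp(area: float, delay: float, power: float) -> float:
    """Area-delay-power product of one FF at one data activity."""
    return area * delay * power


# Map each FF name to its (ADP at 10%, ADP at 100%) pair.
adp_table = {
    name: (adp(area, delay, p10), adp(area, delay, p100))
    for name, (area, delay, p10, p100) in TABLE_I.items()
}

for name, (adp10, adp100) in adp_table.items():
    print(f"{name:11s} ADP@10% = {adp10:5.2f}  ADP@100% = {adp100:5.2f}")
```

At $\alpha$ = 10% this gives about 4.07 for BCDMR-ACFF against about 7.97 for the BCDMR FF, so the adaptive-coupling variant already has roughly half the ADP at a typical data activity.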



Fig. 8. Power dissipation at 1.2 V as a function of data activity, normalized by TGFF.



Fig. 9. ADP products at 1.2 V as a function of data activity, normalized by TGFF.

# Test Chip

We have fabricated a 2 mm $\times$ 4 mm test chip in a 65 nm bulk CMOS process. Fig. 10 shows the chip with its detailed structure and the cell layout of the BCDMR-ACFF. The cell layout uses the double-height cell structure [13], in which the cell is twice as high as single-height cells such as inverters or NAND gates. Single- and double-height cells can be handled together correctly by commercial place-and-route tools. The critical nodes are separated as far apart as possible



Fig. 10. Chip micrograph with detailed structure and BCDMR-ACFF layout

Table II: No. of FFs on the fabricated chip.

|                | No. of FFs |
|----------------|------------|
| TGFF           | 2336       |
| ACFF           | 2272       |
| BCDMR(3W)      | 16800      |
| BCDMR(2W)      | 16800      |
| BCDMR-ACFF(3W) | 16384      |
| BCDMR-ACFF(2W) | 16384      |

without area penalty to prevent simultaneous flips of redundant components [7][14]. The sensitive area of PMOS transistors is much smaller than that of NMOS transistors [15][16]. The double-height cell shares the N-well (PMOS transistor) region. The horizontal distance is more critical than the vertical distance. Therefore, all master or slave components such as ML0/1 and 2C+K(ML) are placed in a checkerboard pattern. Four kinds of FFs are implemented on a die: BCDMR-ACFFs, BCDMR FFs, ACFFs and TGFFs on the twin-well (2W) structure, and BCDMR-ACFFs and BCDMR FFs on the triple-well (3W) structure. Table II shows the number of bits of each kind of FF. All of these FFs are connected in series as a shift register. The chip has two clock pins, SHIFT\_CLK and PLL\_CLK. The former is used for the shift operation, while the latter is used during irradiation. To satisfy the hold constraints of all serially connected FFs, the clock signals are supplied from the tail of the shift register (CI to CO), while the shift input is supplied from the head (SI to SO). In order to measure the soft-error resilience of these FFs at around 1 GHz, a PLL (Phase-Locked Loop) is used to multiply the clock frequency by up to 80x.

Fig. 11 shows the simplified schematic structure of the shift register and the clock distribution scheme. All FFs are connected in series on the shift operation (LOOP=0). Clock signals are also connected in series from head to tail, while all FFs



Fig. 11. Clock distribution that guarantees operation at over 1 GHz during soft-error experiments and no hold violations during shift operations

are in the loop mode during irradiation, in which 8 FFs form a loop to capture flipped values. As shown in Fig. 11, redundant FFs are connected by two wires. Therefore, a SET pulse from the previous FF is never captured by both redundant latches in BCDMR. In the test chip, there is no delay element ($\delta$) between FFs. During irradiation, the clock signal is supplied from PLL\_CLK. The clock distribution tree consists of a clock stem and clock branches so that a high-frequency clock can be distributed to the FFs. If such a high-frequency clock were supplied from SHIFT\_CLK through the serially connected clock branches, it would vanish in the middle of the branches because of propagation-induced pulse-width fluctuation.

# Experimental Results

The error resilience of the FFs on the fabricated chip is measured with $\alpha$ particles from a 3 MBq $^{241}\mathrm{Am}$ source and with neutron irradiation at RCNP (Research Center for Nuclear Physics) of Osaka University [17]. Fig. 12 shows the neutron beam spectrum compared with the terrestrial neutron spectrum at ground level in Tokyo. The average acceleration factor is $3.8 \times 10^8$ in this measurement. In this work, all FFs are initialized to 0.

In the $\alpha$-particle irradiation, the clock frequency is set to 0 Hz, 100 MHz, 300 MHz, 800 MHz and 1 GHz. At 0 Hz, we measured two patterns (CLK=0 or 1). Flipped values are read out every 5 min. Fig. 13 shows the measurement results of the non-redundant FFs (TGFF, ACFF) under $\alpha$ irradiation. ACFF has lower error rates than TGFF at all measured frequencies. Fig. 14 shows the measurement results of the redundant FFs (BCDMR, BCDMR-ACFF) under $\alpha$ irradiation at 0 Hz. We observed a few errors in the BCDMR and BCDMR-ACFF regions, and only at 0 Hz.



Fig. 12. Neutron spectrum at RCNP.



Fig. 13. No. of errors (flipped FFs) of non-redundant FFs per 1 kbit after 5 min of $\alpha$ irradiation at 1.2 V.

The BCDMR structure holds its value with two latches and a keeper. If one latch is upset, the other latch and the keeper hold the correct value. The upset latch recovers when the next clock edge arrives at the FF. When no clock is applied, the upset latch remains upset. If the other latch is upset afterwards, the output of the FF becomes wrong. BCDMR-ACFF has lower error rates than TGFF. BCDMR-ACFF is as reliable as BCDMR against $\alpha$ particles.
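The difference between clocked and 0 Hz operation can be traced with a toy event model. This is a sketch under simplifying assumptions (one stage, a non-inverting view of the output, and a clock edge that rewrites both latches with the held output as in the loop mode). The class and function names are hypothetical.

```python
class RedundantCell:
    """Toy model of two redundant latches whose agreement updates a keeper."""

    def __init__(self, value: int) -> None:
        self.latches = [value, value]
        self.kept = value  # value held by the keeper

    def upset(self, index: int) -> None:
        """A particle strike flips one latch."""
        self.latches[index] ^= 1

    def clock(self, data: int) -> None:
        """A clock edge rewrites both latches, repairing any upset latch."""
        self.latches = [data, data]

    def output(self) -> int:
        """The output follows the latches only when they agree; otherwise the keeper holds."""
        a, b = self.latches
        if a == b:
            self.kept = a
        return self.kept


def run(clock_between_upsets: bool) -> int:
    """Apply two upsets to different latches, optionally with a clock edge in between."""
    cell = RedundantCell(0)
    cell.upset(0)
    held = cell.output()  # latches disagree, so the keeper still holds 0
    if clock_between_upsets:
        cell.clock(held)  # the correct held value is written back
    cell.upset(1)
    return cell.output()


clocked_out = run(True)   # the upset latch is repaired before the second strike
static_out = run(False)   # 0 Hz: both latches end up flipped
print(clocked_out, static_out)
```

With a clock edge between the two strikes the stored 0 survives, while without it both latches agree on the wrong value and the output flips, matching the observation that errors appear only at 0 Hz.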

In the neutron irradiation, the clock frequency is set to 100 MHz, 300 MHz, 800 MHz and 1 GHz. Multiple DUTs (Devices Under Test) were measured at the same time to increase the number of observed errors. Flipped values are read out every 5 min. Fig. 15 shows the measurement results of the non-redundant FFs under neutron irradiation. FIT (Failure In Time) is the number of errors in $10^9$ hours. ACFF has lower error rates than TGFF except at 800 MHz. However, the number of observed errors is very small because of the small number of bits, so it is difficult to compare the error resilience. No error is observed in the redundant FFs up to 1 GHz. The SER of the redundant FFs is smaller than 5.1 FIT/Mbit.
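The conversion from an accelerated measurement to FIT/Mbit can be sketched as follows. This is the generic definition, not the authors' exact procedure. The irradiation time is not given in the text, so the error count and beam time in the example are hypothetical, 1 Mbit is assumed to be $10^6$ bits, and only the acceleration factor and the TGFF bit count are taken from the paper.

```python
def fit_per_mbit(errors: int, beam_hours: float, acceleration: float, bits: int) -> float:
    """Convert errors counted under an accelerated beam into FIT/Mbit at ground level.

    FIT is the number of errors per 1e9 device-hours; `acceleration` is the
    ratio of beam flux to terrestrial flux.
    """
    equivalent_hours = beam_hours * acceleration  # field-equivalent exposure time
    mbits = bits / 1e6                            # assumption: 1 Mbit = 1e6 bits
    return errors * 1e9 / (equivalent_hours * mbits)


# Hypothetical example: 3 errors in 2 hours of beam time on the 2336 TGFF bits,
# with the average acceleration factor of 3.8e8 quoted in the text.
example = fit_per_mbit(errors=3, beam_hours=2.0, acceleration=3.8e8, bits=2336)
print(f"{example:.1f} FIT/Mbit")
```

When no error is observed, only an upper bound on the SER can be stated, and its value depends on the statistical convention used for the zero-count case.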



Fig. 14. No. of errors (flipped FFs) of redundant FFs per 1 kbit after 5 min of $\alpha$ irradiation at 1.2 V.



Fig. 15. Soft Error Rate (FIT/Mbit) from neutron irradiations at 1.2V.

We measured the power dissipation of the redundant FFs on the fabricated chip while changing the data activity. The clock signal can be supplied only to specified FF regions on the fabricated chip. The local loop structure in the upper-right part of Fig. 11 can be used to change the data activity, $\alpha$. When these 8 FFs store the same value, $\alpha$ is 0%, while it becomes 100% when the checkerboard pattern is stored in these 8 FFs. Fig. 16 shows the measurement results at a supply voltage of 1.2 V, normalized by the power of TGFF. FFs implemented with the ACFF structure achieve low-power operation at lower data activities. At 0% data activity, BCDMR-ACFF consumes 38.5% of the power of the original BCDMR.
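The relation between the stored pattern and the data activity can be sketched as follows, assuming that each FF in the 8-FF loop loads the value of its predecessor on every clock edge, so that $\alpha$ is the fraction of FFs whose value changes per cycle. The function name is hypothetical, and the intermediate pattern is an illustration that is not taken from the measurements.

```python
def data_activity(pattern: str) -> float:
    """Fraction of FFs in a circular shift loop that toggle on each clock edge."""
    n = len(pattern)
    # FF i loads the value of FF i-1; index -1 wraps around and closes the loop.
    toggles = sum(pattern[i] != pattern[i - 1] for i in range(n))
    return toggles / n


print(data_activity("00000000"))  # all FFs store the same value
print(data_activity("01010101"))  # checkerboard pattern
print(data_activity("00110011"))  # an intermediate pattern (illustrative)
```

The first two patterns reproduce the 0% and 100% cases described above, and patterns with fewer transitions around the loop give intermediate activities.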

# Measurement Results of Process Corner Chips

With aggressive process scaling, variations in transistor parameters are increasing year by year. We fabricated several corner chips that have



Fig. 16. Measured power dissipation at 1.2 V normalized by the power of TGFF



Fig. 17. Number of errors per 1 kbit of TGFFs versus clock frequency

slower or faster transistors. We have four types of corner chips: FF, FS, SF and SS. FS means that PMOS is fast while NMOS is slow. They were fabricated by controlling doping and channel length. Including TT (both PMOS and NMOS are typical) chips, we measured five kinds of corner chips under $\alpha$ irradiation. Fig. 17 shows the number of errors per 1 kbit of TGFFs versus clock frequency, while Fig. 18 shows those of ACFFs. We cannot see any specific differences among these corner chips, which is similar to the result for 40 nm SRAMs in [18].

# Conclusions

We have fabricated a 65 nm chip including the low-power redundant FF called BCDMR-ACFF, which combines the low-power ACFF and the highly reliable BCDMR FF. The ADP product of BCDMR-ACFF is smaller than that of the original BCDMR when data activity is below 40%. At 0% data activity, the ADP product of BCDMR-ACFF is about 2x that of the TGFF. The error resilience of the



Fig. 18. Number of errors per 1 kbit of ACFFs versus clock frequency

FFs is measured with $\alpha$ particles and white neutron irradiation. No error is observed in the proposed BCDMR-ACFF under $\alpha$-particle and neutron irradiation at clock frequencies up to 1 GHz, except at 0 Hz. As for power dissipation, the measurement results show that BCDMR-ACFF consumes 38.5% of the power of the original BCDMR at 0% data activity. Measurements of chips from five process corners reveal no specific differences in soft error rates. We expect that BCDMR-ACFF has better error resilience than the original BCDMR against $\alpha$ particles, because BCDMR-ACFF is based on ACFF, which has lower error rates than TGFF in almost all measured conditions.

Acknowledgment: The VLSI chip was fabricated in the chip fabrication program of VDEC, the University of Tokyo, STARC, e-Shuttle, Inc., and Fujitsu Ltd.

# References

- [1] D. Krueger, E. Francom, and J. Langsdorf. Circuit design for voltage scaling and SER immunity on a quad-core Itanium processor. In *ISSCC*, pages 94–95, Feb. 2008.
- [2] M. Zhang, S. Mitra, T. M. Mak, N. Seifert, N. J. Wang, Q. Shi, K. S. Kim, N. R. Shanbhag, and S. J. Patel. Sequential element design with built-in soft error resilience. *IEEE Trans. VLSI Sys.*, 14(12):1368–1378, Dec. 2006.
- [3] B. I. Matush, T. J. Mozdzen, L. T. Clark, J. E. Knudsen. Area-Efficient Temporally Hardened by Design Flip-Flop Circuits. *IEEE Trans. on Nucl. Sci.*, 57(6):3588–3595, Dec. 2010.
- [4] N. Seifert, P. Slankard, M. Kirsch, B. Narasimham, V. Zia, C. Brookreson, A. Vo, S. Mitra, B. Gill, J. Maiz. Radiation-Induced Soft Error Rates of Advanced CMOS bulk Devices. *IRPS*, 217–225, Mar. 2006.
- [5] J. Furuta, C. Hamanaka, K. Kobayashi, and H. Onodera. A 65nm Bistable Cross-coupled Dual Modular Redundancy Flip-Flop Capable of Protecting Soft Errors on the C-element. In *VLSI Circuits Symp.*, pages 123–124, June 2010.
- [6] K. T. Chen, T. Fujita, H. Hara, and M. Hamada. A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40nm CMOS. In *ISSCC*, pages 338–340, Feb. 2011.
- [7] R. Yamamoto, C. Hamanaka, J. Furuta, K. Kobayashi, and H. Onodera. An Area-efficient 65 nm Radiation-Hard Dual-Modular Flip-Flop to Avoid Multiple Cell Upsets. *IEEE Trans. on Nucl. Sci.*, 58(6):3053–3059, Dec. 2011.
- [8] K. M. Warren, A. L. Sternberg, J. D. Black, R. A. Weller, R. A. Reed, M. H. Mendenhall, R. D. Schrimpf, L. W. Massengill. Heavy Ion Testing and Single Event Upset Rate Prediction Considerations for a DICE Flip-Flop. *IEEE Trans. on Nucl. Sci.*, 56(6):3130–3137, Dec. 2009.
- [9] H. Nakamura, K. Tanaka, T. Uemura, K. Takeuchi, T. Fukuda, S. Kumashiro. Measurement of neutron-induced single event transient pulse width narrower than 100ps. *IRPS*, 694–697, May 2010.
- [10] M. J. Gadlage, J. R. Ahlbin, B. Narasimham, B. L. Bhuva, L. W. Massengill, R. A. Reed, R. D. Schrimpf, G. Vizkelethy. Scaling Trends in SET Pulse Widths in Sub-100 nm Bulk CMOS Processes. *IEEE Trans. on Nucl. Sci.*, 57(6):3336–3341, Dec. 2010.
- [11] J. Furuta, C. Hamanaka, K. Kobayashi, and H. Onodera. Measurement of neutron-induced SET pulse width using propagation-induced pulse shrinking. *IRPS*, 5B.2.1–5B.2.5, Apr. 2011.
- [12] S. Shambhulingaiah, L. T. Clark, T. J. Mozdzen, N. D. Hindman, S. Chellappa, K. E. Holbert. Temporal Sequential Logic Hardening by Design with a Low Power Delay Element. *RADECS Proceedings*, B-6: 144–149, Sep. 2011.
- [13] T. Uemura, Y. Tosaka, H. Matsuyama, K. Shono, C. J. Uchibori, K. Takahisa, M. Fukuda, K. Hatanaka. SEILA: Soft Error Immune Latch for Mitigating Multi-node-SEU and Local-clock-SET. *IRPS*, 218–223, May 2010.
- [14] J. E. Knudsen, L. T. Clark. An Area and Power Efficient Radiation Hardened by Design Flip-Flop. *IEEE Trans. on Nucl. Sci.*, 53(6):3053–3059, Dec. 2006.
- [15] G. Toure, G. Hubert, K. Castellani-Coulie, S. Duzellier, and J. Portal. Simulation of Single and Multi-Node Collection: Impact on SEU Occurrence in Nanometric SRAM Cells. *IEEE Trans. on Nucl. Sci.*, 58(3):862–869, Jun. 2011.
- [16] P. Dodd, A. Shaneyfelt, K. Horn, D. Walsh, G. Hash, T. Hill, B. Draper, J. Schwank, F. Sexton, and P. Winokur. SEU-sensitive volumes in bulk and SOI SRAMs from first-principles. *IEEE Trans. on Nucl. Sci.*, 48(6):1893–1903, Dec. 2001.
- [17] C. W. Slayman. Theoretical Correlation of Broad Spectrum Neutron Sources for Accelerated Soft Error Testing. *IEEE Trans. on Nucl. Sci.*, 57(6):3163–3168, Dec. 2010.
- [18] G. Gasiot, M. Glorieux, S. Uznanski, S. Clerc, P. Roche. Experimental characterization of process corners effect on SRAM alpha and neutron soft error rates. *IRPS*, 3C.4.1–3C.4.5, 2012.