Circuit-level Insight into Reliability Issues of Si-based Semiconductor Chips
Kazutoshi Kobayashi
Department of Electronics, Graduate School of Science and Technology, Kyoto Institute of Technology, Japan

Abstract
This invited talk gives circuit-level insight into two reliability issues of Si-based semiconductor chips. The first topic is aging degradation, which causes hard errors. The second topic is soft errors, which are temporary failures caused by a radiation particle strike.

1. Introduction
Silicon-based semiconductor chips are now used everywhere in our daily life. The near-perfect diamond cubic crystal structure of silicon makes chips very reliable. However, the reliability of silicon-based chips has become a serious concern, because even a single-bit error may affect social infrastructure, automobiles, medical devices, and other critical systems. This invited talk gives circuit-level insight into two reliability topics: aging degradation and soft errors. Aging degradation causes hard errors that cannot be recovered, while soft errors are temporary errors that can be recovered.

2. Aging Degradation
2.1 Principle of Bias Temperature Instability
Due to aggressive process scaling, aging degradation is becoming an important concern in semiconductor chips for automotive and other mission-critical applications. MOS (Metal Oxide Semiconductor) transistors are degraded by stress applied between the gate and source terminals. In P-type MOS (PMOS) transistors, a negative gate-source bias \( V_{gs} \) degrades transistor parameters such as the threshold voltage \( V_{th} \) and the transconductance \( g_{m} \). The degradation on PMOS transistors is therefore called NBTI (negative bias temperature instability). Fig. 1 shows the bias conditions of NBTI on a PMOS transistor and PBTI (positive bias temperature instability) on an NMOS transistor. As can be expected from the name BTI, the degradation is accelerated by the bias \( V_{gs} \) and the ambient temperature. BTI-induced \( V_{th} \) degradation is typically proportional to \( t^{n} \), where \( t \) is the stress time. The universal model proposed in [1] states that the time exponent \( n \) is around 1/6.
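
As a worked illustration of this power law, assume the threshold voltage shift can be written as \( \Delta V_{th}(t) = A\,t^{n} \) with a fitting constant \( A \) (the symbol \( A \) and the decade-spaced comparison are assumptions made here for illustration, not taken from [1]). On a log-log plot this is a straight line with slope \( n \), and for \( n = 1/6 \) each tenfold increase in stress time raises the shift by roughly 47%:

\[
\log \Delta V_{th}(t) = \log A + n \log t, \qquad
\frac{\Delta V_{th}(10\,t)}{\Delta V_{th}(t)} = 10^{n} = 10^{1/6} \approx 1.47 .
\]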

2.2 BTI-sensitive and -insensitive Ring Oscillators to Measure Aging Degradations
A ring oscillator (RO) consists of an odd number of inverters connected in a loop. Its oscillation period is \( 2n\delta T \), so its oscillation frequency is \( 1/(2n\delta T) \), where \( n \) is the number of stages and \( \delta T \) is the delay of a single inverter. Note that \( n \) should be a prime number in order to prevent propagation of harmonics.
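
The relation between stage delay and oscillation frequency can be sketched in a few lines of Python. This is a minimal sketch of an ideal RO whose period is \( 2n\delta T \); the function name, the 11-stage count and the 10 ps stage delay are hypothetical values chosen only for illustration.

```python
def ring_oscillator_frequency(num_stages: int, stage_delay_s: float) -> float:
    """Frequency in Hz of an ideal ring oscillator with period 2 * n * delta_T."""
    if num_stages < 1 or num_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverter stages")
    if stage_delay_s <= 0:
        raise ValueError("stage delay must be positive")
    period_s = 2 * num_stages * stage_delay_s
    return 1.0 / period_s


# Hypothetical example: 11 stages with a 10 ps inverter delay (about 4.5 GHz).
fresh_hz = ring_oscillator_frequency(11, 10e-12)

# If aging lengthens every stage delay by 1%, the frequency drops by about 1%.
aged_hz = ring_oscillator_frequency(11, 10e-12 * 1.01)
relative_drop = (fresh_hz - aged_hz) / fresh_hz
```

Because the frequency is inversely proportional to the stage delay, a small relative increase in delay appears as an almost equal relative decrease in frequency, which is why counting oscillations is a convenient way to observe aging.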

We use an engineering tester and a Peltier element to measure aging degradation, as shown in Fig. 2 [2]. The ROs oscillate only periodically, for short intervals in which the number of oscillations is counted to quantify the degradation. For almost all of the remaining time, they stop oscillating and are kept under stress.

In [2], we proposed BTI-sensitive and BTI-insensitive ROs in order to remove fluctuations caused by slight changes in the temperature or the stress bias during a long measurement period of up to \( 10^{5} \) s. We fabricated these ROs in a 65-nm thin-BOX (buried oxide) FDSOI (fully depleted silicon on insulator) process technology. The results were obtained by subtracting the measurement results of the BTI-sensitive ROs from those of the BTI-insensitive ones. As shown in Figs. 3 and 4, the raw degradation rates of the oscillation frequencies of the BTI-sensitive and BTI-insensitive ROs fluctuated, while the differences obtained by the subtraction fit smoothly to \( t^{0.15} \).
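
The subtraction idea can be illustrated with synthetic numbers. The sketch below is not the measurement flow of [2]: it assumes an idealized case in which both ROs see exactly the same environmental drift, all data values are invented, and the sign is chosen so that the difference is positive. The exponent is then recovered by a least-squares line fit on log-log axes.

```python
import math


def fit_power_law_exponent(times_s, degradation):
    """Least-squares slope of log(degradation) versus log(time)."""
    xs = [math.log(t) for t in times_s]
    ys = [math.log(d) for d in degradation]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic (invented) data: stress times from 10 s to 1e5 s.
times_s = [10.0 ** k for k in range(1, 6)]

# Common fluctuation seen by both ROs (temperature or supply drift).
common_drift = [0.004, -0.003, 0.005, -0.002, 0.003]

# BTI-induced frequency degradation rate, assumed to follow t^0.15.
bti_only = [0.001 * t ** 0.15 for t in times_s]

sensitive = [b + c for b, c in zip(bti_only, common_drift)]   # BTI + drift
insensitive = list(common_drift)                              # drift only

# Subtracting the two cancels the common drift and leaves the BTI term.
difference = [s - i for s, i in zip(sensitive, insensitive)]
exponent = fit_power_law_exponent(times_s, difference)
```

In this idealized setting the fitted exponent comes back as the 0.15 that was put in; with real data the cancellation is only approximate.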

2.3 Ultra Long-term Measurement Results of BTI-induced Aging Degradation
The previous section introduced the measurement system using the engineering tester, but its measurement periods were up to \( 10^{4} \) s (a few hours). Conventional measurements of BTI-induced degradation end at around \( 10^{4} \) to \( 10^{5} \) s, since typical measurement systems consist of commercially available instruments that must be shared among many research projects. In [3], we constructed a measurement system using a microcontroller and an FPGA (field programmable gate array) that can keep working for an ultra long term, as shown in Fig. 5, and disclosed measurement results of 7-nm FinFET (fin-shaped field effect transistor) ROs up to \( 1.25 \times 10^{7} \) s (\( \simeq \) 144 days). The measurement results at the nominal operating voltage of 0.75 V and an ambient temperature of 125°C followed a degradation tendency proportional to \( t^{n} \). However, unlike the results of the previous section, in which the ROs were fabricated in a 65 nm process technology, the \( V_{th} \) degradations of the 7-nm ROs followed \( t^{0.23 \sim 0.25} \), as shown in Fig. 6.
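
To give a feel for what the different exponents imply, the following arithmetic compares how much a pure power law grows between \( 10^{5} \) s and \( 1.25 \times 10^{7} \) s. It assumes, purely for illustration, that a single exponent holds over the whole interval; the end points are the measurement periods quoted in the text, and 0.24 is the midpoint of the reported range.

\[
\left( \frac{1.25 \times 10^{7}}{10^{5}} \right)^{0.15} = 125^{0.15} \approx 2.06, \qquad
125^{0.24} \approx 3.19 .
\]

Under this assumption, the degradation grows about 3.2 times over the interval for an exponent of 0.24, compared with about 2.1 times for an exponent of 0.15.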

3. Soft Error
Soft errors are temporary failures caused by a radiation particle strike on a semiconductor chip, which flips the data stored in storage cells. Unlike the aging degradation discussed...

3.1 Mitigation Techniques on Memory Cells

Many countermeasures have been proposed to mitigate soft errors on semiconductor chips. Soft errors were first reported on DRAM (dynamic random access memory) in 1979 [4]. Currently, however, the soft error rates (SERs) of DRAMs have become much smaller. In [5], the SERs of DDR4 and GDDR5 DRAMs are around a few FIT/Gb. FIT is a unit of failure rate that means the number of errors in \( 10^{9} \) device-hours. In SRAMs (static random access memories), typical SERs are around a few hundred to a few thousand FIT/Mb. For personal use, soft errors on DRAMs or SRAMs can be ignored. For critical applications such as autonomous driving, avionics and medical use, however, some kind of soft error mitigation technique must be used. For DRAMs and SRAMs, an error correction code called SECDED (single error correction and double error detection) is commonly used. SECDED can correct a single-bit error and detect a two-bit error in a word.
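
The SECDED behavior can be illustrated with the smallest textbook instance, an extended Hamming (8,4) code: a Hamming(7,4) codeword plus one overall parity bit. This is only a sketch with hypothetical function names; memories typically protect much wider words (a (72,64) code is a common choice), but the principle is the same.

```python
def secded_encode(data_bits):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 form a Hamming(7,4)
    codeword with parity bits at positions 1, 2 and 4.
    """
    if len(data_bits) != 4 or any(b not in (0, 1) for b in data_bits):
        raise ValueError("expected four bits, each 0 or 1")
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data_bits
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of ones even
    return code


def secded_decode(code):
    """Return (data_bits, status); status is 'ok', 'corrected' or 'double_error'."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of the positions of all set bits
    parity_odd = sum(code) % 2 == 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Single-bit error: the syndrome is its position (0 = overall parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits were flipped.
        return None, "double_error"
    return [code[3], code[5], code[6], code[7]], status


word = [1, 0, 1, 1]
stored = secded_encode(word)

upset = list(stored)
upset[5] ^= 1                      # single-bit upset
recovered, status = secded_decode(upset)

upset[2] ^= 1                      # second upset in the same word
_, double_status = secded_decode(upset)
```

Here the single upset is corrected and the original word is recovered, while the second upset in the same word is reported as an uncorrectable double error.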

3.2 Mitigation Techniques on Latches or Flip-flops

Redundancy is used to protect a latch or flip-flop (FF). The simplest mitigation technique is triplication. The TMR (triple modular redundancy) latch has a voter that masks an error in one of the triplicated latches. However, the area, delay and power overheads of TMR are large. We have proposed a number of FFs with lower overheads than TMR. Fig. 7 shows the bistable cross-coupled dual modular redundancy (BCDMR) FF, which was revised from the BISER (built-in soft error resilience) FF. The BISER FF has the drawback that its SER increases with the clock frequency. In the BCDMR FF, the additional C-elements and keepers resolve this weak point of the BISER FF. We fabricated the BCDMR FF in 65 nm bulk and 16 nm FinFET process technologies [6–8]. Fig. 8 shows measurement results obtained by white neutron irradiation. A white neutron beam is an accelerated neutron beam whose energy spectrum is similar to that of neutrons at the terrestrial level. As shown in Fig. 8, the SERs of all BCDMR FFs are less than 50 FIT/Mbit, which is 1/10 of the SER of a conventional D-type FF (DFF) in the 65 nm process. “Interleaved” denotes a layout structure in which redundant elements that must not be flipped at the same time are placed far from each other. The interleaved structure decreases the SER from 50 FIT/Mbit to 9 FIT/Mbit in 65 nm bulk. The layout of the 16 nm BCDMR FF was not interleaved; the SER of an interleaved BCDMR FF is expected to be much smaller than that of the non-interleaved one.
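
The two building blocks mentioned above, the TMR voter and the C-element, can be described behaviorally as follows. This is a logic-level sketch with hypothetical function names; it does not model the BCDMR circuit of Fig. 7, only the textbook behavior of a majority voter and a Muller C-element.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """TMR voter: output the value held by at least two of the three copies."""
    return (a & b) | (b & c) | (a & c)


def c_element(a: int, b: int, previous_output: int) -> int:
    """Muller C-element: follow the inputs when they agree, otherwise hold."""
    return a if a == b else previous_output


# TMR: one of the three latches storing 1 is flipped by a particle strike.
tmr_output = majority_vote(1, 0, 1)

# Dual redundancy: both latches store 1 and the C-element output is 1.
held = c_element(1, 1, previous_output=0)
# One latch is flipped; the inputs disagree, so the old value is held.
after_upset = c_element(0, 1, previous_output=held)
```

The voter needs three copies to out-vote a flipped latch, whereas the C-element needs only two copies because it refuses to change its output while they disagree. This is the basic reason why dual-modular structures can have lower overheads than TMR.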

We also proposed a mitigation technique for the FDSOI process technology. In FDSOI, every transistor is isolated by a BOX (buried oxide) layer. Thus the stacked structure is very effective in preventing soft errors on latches or FFs [9]. The guard-gate structure proposed in [10] reduces SEUs (single event upsets) at the cost of larger delay and area overheads. To overcome those overheads, we proposed the FRFF (feedback recovery FF) shown in Fig. 9 [11]. In the FRFF, the guard-gate structure is embedded in the secondary latch. In the 65 nm FDSOI process, the area, delay and power overheads of the FRFF are only 3%, 6% and 3%, respectively. The SERs measured by white neutron irradiation are shown in Fig. 10. Note that DFRFF is the dual feedback recovery FF, in which additional delay elements are implemented in the secondary latch. Because the overheads are kept small, the SER reduction is moderate: the average SERs of the FRFF and DFRFF are 1/3 and 1/5 of that of a DFF, respectively. Nevertheless, the FRFF is a useful countermeasure against soft errors where large overheads cannot be allowed. We also fabricated those FFs in a 22 nm FDSOI process technology [12]. Due to the strict design rules of the 22 nm FDSOI process, the overheads become larger than those in the 65 nm FDSOI process. However, the SERs of the FRFF and DFRFF under carbon beam irradiation are less than 1 FIT/Mbit.
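
The filtering effect of a guard gate can be illustrated with a small discrete-time model. The sketch assumes the usual behavioral description of a guard gate, namely that the output changes only when the signal and a delayed copy of it agree; the function name, the sample sequences and the three-step delay are hypothetical, and the model is non-inverting and ignores all analog effects.

```python
def guard_gate_filter(samples, delay_steps, initial_output=0):
    """Pass a sampled signal through a behavioral guard-gate model.

    The output is updated only when the current sample equals the sample
    from delay_steps earlier; otherwise the previous output is held.
    Before the first sample, the signal is assumed to sit at samples[0].
    """
    output = initial_output
    filtered = []
    for i, current in enumerate(samples):
        delayed = samples[i - delay_steps] if i >= delay_steps else samples[0]
        if current == delayed:
            output = current
        filtered.append(output)
    return filtered


# A 2-step glitch (for example a radiation-induced pulse) is removed
# by a 3-step delay.
glitch = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
glitch_out = guard_gate_filter(glitch, delay_steps=3)

# A genuine transition passes, but three steps late.
transition = [0, 0, 1, 1, 1, 1, 1, 1]
transition_out = guard_gate_filter(transition, delay_steps=3)
```

The second case shows where the delay overhead comes from: pulses shorter than the delay are suppressed, but every legitimate transition is postponed by the same amount.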

4. Conclusion

In this paper, two topics related to circuit reliability issues of Si-based semiconductor chips were explained. Aging degradation must be considered during the design phase in order to mitigate permanent failures, while soft errors must be mitigated to prevent erroneous operations. In Section 2, several ring oscillators were introduced to measure aging degradation. The ultra long-term measurement system can keep operating for several months. The ROs in 7 nm FinFET have been degrading in proportion to \( t^{0.24} \) for more than 100 days. In Section 3, several radiation-hard FFs were introduced for bulk and FDSOI process technologies. The BCDMR FF was proposed for bulk and FinFET processes, and the FRFF was proposed for FDSOI.

References