



# Circuit-level Insight of Soft Errors and Aging Degradations

Kazutoshi Kobayashi Kyoto Institute of Technology Japan kazutoshi.kobayashi@kit.ac.jp

# Outline

- Reliability Issues in VLSIs
- My Experience of Reliability
- Soft Errors
  - What is soft error?
  - How to measure soft errors
  - Mitigation technique and our proposed radiation-hard flops
- Circuit Reliability
  - RO-based BTI-induced degradation measurement
  - VDD and VBB control to suppress BTI-induced degradation while keeping circuit performance.
  - How PID influences initial and aging degradation

#### Reliability Issues in VLSIs



# First Step to Reliability

- In 2007, a Japanese funding agency started a research program named "Dependable VLSI".
  - 11 projects started in 2007-2009.
  - Our project is "Dependable VLSI Platform using Robust Fabrics" managed by Prof. Onodera (Kyoto Univ.)




# From Novice to Expert

- I was a novice researcher in the field of reliability in 2007.
  - In 2007, I wrote "FIT is the number of errors in 10<sup>9</sup> hours" in my notebook.
  - I did not know the word "RTN".
    - "What is telegraph?"
- First attendance at IRPS in 2008
- First soft error paper at the VLSI Symposium in 2010
- First soft error paper at IRPS in 2011
  - 15 papers + 1 tutorial at IRPS from 2011 to 2022. Best poster award in 2013
- IEDM RSD TPC member (2019–20), IRPS Circuit Reliability topic chair (2022)






# Paper Lists of Reliability

- 138 papers (including international conferences) were published from 2009 to 2022.
- Most-cited papers (by Scopus)
  - R. Yamamoto, C. Hamanaka, J. Furuta, K. Kobayashi, and H. Onodera, "An Area-efficient 65 nm Radiation-Hard Dual-Modular Flip-Flop to Avoid Multiple Cell Upsets", IEEE TNS, vol.58, 2011 (50 citations)
  - K. Kobayashi, K. Kubota, M. Masuda, Y. Manzawa, J. Furuta, S. Kanda, and H. Onodera, "A Low-Power and Area-Efficient Radiation-Hard Redundant Flip-Flop, DICE ACFF, in a 65 nm Thin-BOX FD-SOI", IEEE TNS, vol.61, 2014 (45 citations)
  - J. Furuta, C. Hamanaka, K. Kobayashi, and H. Onodera, "A 65nm Bistable Cross-coupled Dual Modular Redundancy Flip-Flop Capable of Protecting Soft Errors on the C-element", VLSI Circuit Symposium, 2010 (34 citations)
  - K. Ito, T. Matsumoto, S. Nishizawa, H. Sunagawa, K. Kobayashi, and H. Onodera, "The Impact of RTN on Performance Fluctuation in CMOS Logic Circuits", IRPS, 2011 (24 citations)
  - J. Furuta, K. Kobayashi, and H. Onodera, "Impact of Cell Distance and Well-contact Density on Neutron-induced Multiple Cell Upsets", IEEE IRPS, 2013 (23 citations)

Red: Soft errors Blue: Circuit Reliability

Full List on http://www-vlsi.es.kit.ac.jp/database/paper-e.php5

# What is Soft Error?



- Caused when a radiation particle penetrates Si and generates electron-hole (e-h) pairs
  - A neutron causes this only when it hits a Si atom (not always); an α particle causes it whenever it passes through the chip
- Upsets storage cells such as SRAMs and FFs
- A pressing issue for semiconductor chips in automotive, aerospace, and HPC applications
- Not many companies or researchers know soft errors well. Unknown errors → soft errors?

#### Scaling Trend of Soft Error Rate (SER)



- [TNS11] R. Yamamoto et al., TNS, pp. 3053-3059 (2011)
- [TNS14] K. Kobayashi et al., TNS, pp. 1881-1888 (2014)
- [UemuraPhD] T. Uemura, "A Study on Soft Error Mitigation for Microprocessor in Bulk CMOS Technology", PhD thesis (2011)
- [Intel 22nm] N. Seifert et al., TNS, pp. 2666-2673 (2012)
- [Intel 14nm] N. Seifert et al., TNS, pp. 2570-2577 (2015)
- [Samsung 10nm] M. Jin et al., IEDM 15.1.1-15.1.4 (2016)

# **SEU Mitigation Techniques**

- Dual lock-step on architecture level
  - For automotive and aerospace
- Parity and ECC on circuit/algorithm level
  - For SRAM and DRAM
- Majority voting on circuit level
  - For latches and flip-flops
- SOI/FinFET on process/device level
  - For automotive and HPC

Dual lock-step

[M. Baleani, et al., CASES, 2013]



# Soft Error Mitigation Techniques

- Circuit-level
  - Majority voting such as TMR, DICE, and BCDMR FF
  - Large area, delay, and power (ADP) overheads
- Process-level
  - SOI (Silicon on Insulator)
    - 10-100x more robust than bulk
    - No ADP overhead, but more expensive to fabricate
  - FinFET
    - Robust but very costly (only for iPhone, FPGA, ...)
  - Circuit-level technique for SOI
    - Stacked structure



Stacked FF

# BISER FF



- Built-in Soft-Error Resilience FF
  - Developed by Intel and Stanford
  - Two latches and a weak keeper hold data
  - C-element resolves SBU on latches
  - Area-efficient but vulnerable to a single-event transient (SET) pulse from the C-element



[M. Zhang, S. Mitra, et al., Trans. VLSI Sys., 14(12):1368-1378, 2006]
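The masking of a single-bit upset (SBU) by the C-element can be illustrated with a small behavioural sketch. It is only a logic-level model under stated assumptions: the function names are hypothetical, the weak keeper is modelled as a stored previous value, and output polarity is ignored (a real C-element may be inverting).

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Behavioural C-element: follow the inputs only when they agree.

    Polarity is ignored here (a real C-element may be inverting).
    """
    return a if a == b else prev


def biser_output(latch_a: int, latch_b: int, kept: int) -> int:
    # The keeper holds `kept` whenever the two latches disagree.
    return c_element(latch_a, latch_b, kept)


# Both latches store 1, so the output becomes 1.
q = biser_output(1, 1, kept=0)
# A single-bit upset flips one latch; the output keeps its old value.
q_after_sbu = biser_output(0, 1, kept=q)
```

The sketch does not model a transient generated inside the C-element itself, which is the weakness noted above.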

#### BCDMR FF [Furuta et al., VLSI Cir. 2010]



- Bistable Cross-coupled Dual Modular Redundancy FF
  - Strong against an SET pulse from C-element
  - Duplicated C-elements strongly assist in keeping correct data. No area overhead because of smaller transistors on the C-elements



# Alpha and Neutron Results



Fabricated in a 65 nm bulk

- BCDMR is robust against soft errors at higher clock frequencies
- Below 10 FIT at 100 MHz. BISER in twin well is 50 FIT. BCDMR FF in twin well has no errors

# BCDMR FF in Scaled Technology



- Similar SERs b/w 65nm interleaved and 16nm not-interleaved BCDMR
  - Interleaved layout decreases SER

[K. Kobayashi et al., IRPS 2017]



Interleaved: place redundant storage cells as far apart as possible

# Soft Errors in Bulk and SOI



- BOX layer prevents charge collection from the substrate
  - SOI is resistant to soft errors. SER is 1/10 to 1/100 of that of bulk

#### Experimental Results of Standard FF



[Y. Morita, VLSI Tech. Symp., pp 166-167, 2008]

Standard FF

# Soft-error Mitigation for SOI

Stacked Transistor Structure on SOI



- No simultaneous turn-on
  - All transistors are isolated by BOX layer.
  - Not effective on bulk process
- With area and delay overheads
- 1/3 to 1/10 SER reduction on stacked FF

[A. Makihara, TNS 2004]

#### Stacked Latch on HPC Processor

- 22 nm IBM System z Microprocessor



- Additional transistors on latch
  - This figure was not included in the paper, but in slides

# Guard-Gate Flip Flop (GGFF)

- 100x higher soft-error tolerance in 16 nm FinFET
  - Longer delay and 12 additional trs.



[A. Balasubramanian, IEEE TNS, vol. 52, no. 6, pp. 2531–2535, 2005.]
[H. Zhang et al., IRPS, pp. 5C-3–1–5C–3–5, 2016]

#### Filtering Out SET Pulse by Guard Gate



- Two inverters delay SET pulse
  - Output of C-element is stable if $\tau$ > SET pulse width
  - Delay time to flip latch becomes longer (+$\tau$)
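The filtering condition above can be checked with a discrete-time sketch. This is an illustrative model, not the measured circuit: time is counted in samples, `tau` is the inverter delay in samples, and `guard_gate` and `pulse` are hypothetical helper names.

```python
def guard_gate(signal, tau, init=0):
    """Feed `signal` and a copy delayed by `tau` samples into a C-element."""
    out, state = [], init
    for t, now in enumerate(signal):
        delayed = signal[t - tau] if t >= tau else init
        if now == delayed:      # both inputs agree: output follows them
            state = now
        out.append(state)       # otherwise the C-element holds its state
    return out


def pulse(width, length=20, start=5):
    """A 0 -> 1 -> 0 pulse of `width` samples (hypothetical SET waveform)."""
    return [1 if start <= t < start + width else 0 for t in range(length)]


short = guard_gate(pulse(width=2), tau=3)   # pulse narrower than tau
long_ = guard_gate(pulse(width=6), tau=3)   # pulse wider than tau
```

In this model the narrow pulse never reaches the output, while the wide pulse passes through with its edges shifted by `tau` samples, which corresponds to the extra latch delay noted above.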



#### Feedback Recovery FF



[K. Yamada et al, IEEE S3S, 2018]

Duplicated FRFF (DFRFF)

- Construct guard gate by master and slave latches
- FRFF
  - Only 2 additional transistors
  - Only master latch is strong
- DFRFF
  - 6 additional transistors
  - Both master and slave latches are strong

## Circuit Performance in 65 nm FDSOI

| FF            | Area        | Delay       | Power       | ADP  | # of Tr. |
|---------------|-------------|-------------|-------------|------|----------|
| Standard FF   | 1.00        | 1.00        | 1.00        | 1.00 | 24       |
| Guard-Gate FF | 1.47 (1)    | 2.20 (1)    | 1.06 (1)    | 3.42 | 36       |
| FRFF          | 1.06 (0.72) | 1.06 (0.48) | 1.03 (0.97) | 1.16 | 26       |
| DFRFF         | 1.18 (0.80) | 1.08 (0.49) | 1.02 (0.96) | 1.29 | 30       |

Values are normalized to the Standard FF; values in parentheses are normalized to the Guard-Gate FF.
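As a quick cross-check of the ADP column, the area-delay-power product can be recomputed from the normalized values in the table above. This is a minimal sketch; the dictionary layout and the `adp` helper are illustrative, and because the inputs are rounded to two decimals the products may differ from the table in the last digit.

```python
# Normalized (area, delay, power) taken from the table above.
flip_flops = {
    "Standard FF": (1.00, 1.00, 1.00),
    "Guard-Gate FF": (1.47, 2.20, 1.06),
    "FRFF": (1.06, 1.06, 1.03),
    "DFRFF": (1.18, 1.08, 1.02),
}


def adp(area, delay, power):
    """Area-delay-power product of normalized values."""
    return area * delay * power


for name, values in flip_flops.items():
    print(f"{name:14s} ADP = {adp(*values):.2f}")
```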





FRFF is faster than the Guard-Gate FF because it has fewer inverters from input to output

#### **Neutron Irradiation Results**



- Guard-Gate FF with 240% ADP overhead is the strongest, but FRFF with 16% overhead and DFRFF with 30% overhead are 3-4x more radiation-hard than the Standard FF

#### Heavy-ion Results



- The master latch (ML) of FRFF is stronger against soft errors than the slave latch (SL) because of delay time
  - More delay is required on SL
- Average cross sections (CSs) of DFRFF are 1/20 (Ar) and 1/6 (Kr) of those of TGFF
  - Kr produces a longer error pulse than Ar

# DFRFF in 22 nm FDSOI

- DFRFF and DFRFFLD (long-delay version) were designed and fabricated in 22 nm FDSOI in collaboration with Dolphin Design (Vincent Huard) since 2019
- Presented at RADECS 2022 in Venice last week
- DFRFFLD is 100x more radiation-hard than the standard FF
  - Ready for automotive and aerospace applications.
- Another paper also designed and measured our DFRFF in the same 22 nm FDSOI
  - Authors stated "We designed 14 FFs. But DFRFF is strongest of all!"










# Bias Temperature Instability (BTI)

- Aging degradation
  - NBTI (Negative BTI)
    - $V_{\rm gs}$  of PMOS < 0 V
  - PBTI (Positive BTI)
    - $V_{\rm gs}$  of NMOS > 0 V





- Dangling bonds or defects in gate oxide
- $|V_{\rm th}|$ $\uparrow$ by trapping carriers
- Time constants to trap carriers are distributed from $10^{-9}$ to $10^{9}$ s



## ROs to Measure Aging Degradation

Only NAND-gates Ring Oscillator

- EN = 0: RO stops and PBTI occurs.

Only NOR-gates Ring Oscillator

- ENB = 1: RO stops and NBTI occurs.



#### NBTI-sensitive and -insensitive RO



NBTI-sensitive RO

- NBTI is accelerated
- |*Vgs*| = VDD >> *Vth*



NBTI-insensitive RO

- NBTI is suppressed
- |Vgs| = Vth

#### PBTI-sensitive and -insensitive RO





PBTI-sensitive RO

- PBTI is accelerated
- |*Vgs*| = VDD >> *Vth*

PBTI-insensitive RO

- PBTI is suppressed
- |Vgs| = Vth

#### Extract BTI w/o Fluctuations



- Temporal bias and temperature fluctuations during long-term measurement affect the results
- Goal: extract BTI w/o fluctuations by subtraction b/w BTI-sensitive and -insensitive ROs
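The subtraction idea can be sketched as follows, assuming that both ROs see the same bias and temperature fluctuation and that only the sensitive RO degrades. The traces are synthetic, and the function names and the 100 MHz base frequency are hypothetical.

```python
import math


def frequency_shift(freqs):
    """Relative frequency change of one RO with respect to its first sample."""
    f0 = freqs[0]
    return [(f - f0) / f0 for f in freqs]


def extract_bti(sensitive, insensitive):
    """Subtract the insensitive RO's shift from the sensitive RO's shift."""
    return [s - i for s, i in zip(frequency_shift(sensitive),
                                  frequency_shift(insensitive))]


# Synthetic traces (not measured data): both ROs share one fluctuation,
# only the sensitive RO also carries a slow BTI-like drift.
times = range(0, 1000, 100)
fluct = [0.002 * math.sin(t / 150.0) for t in times]
drift = [-0.001 * math.log(t + 1) for t in times]
sensitive = [100e6 * (1 + d + n) for d, n in zip(drift, fluct)]
insensitive = [100e6 * (1 + n) for n in fluct]

bti_only = extract_bti(sensitive, insensitive)
```

With these assumptions `bti_only` reproduces `drift`, since the common-mode term cancels in the difference.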

# Test Chip & Measurement Setup





RO structures (840 ROs in all):

- PBTI-sensitive
- PBTI-insensitive
- NBTI-sensitive
- NBTI-insensitive

Measurement system: engineering tester + Peltier heater

# Measurement Results (PBTI & NBTI)



- Osc. freqs. of NBTI and PBTI ROs fluctuate around 5x10<sup>3</sup> s
  - May be due to temperature or voltage fluctuations
  - Can be removed by subtraction

#### Subtraction Results



- Fluctuations are removed by subtraction!
- Smoothly increase with time
- PBTI: logarithmic (slightly power-law)
- NBTI: power-law
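One common way to test for power-law behaviour is a straight-line fit on log-log axes. The sketch below assumes a degradation of the form A·t^n, where the prefactor `A` and the helper `fit_power_law` are illustrative names, and it uses synthetic data rather than the measured curves.

```python
import math


def fit_power_law(times, shifts):
    """Least-squares line through (log t, log shift); returns (A, n)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in shifts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    n = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - n * mx), n


# Synthetic example: shift = 0.5 * t^0.2
times = [10, 100, 1000, 10000]
shifts = [0.5 * t ** 0.2 for t in times]
A, n = fit_power_law(times, shifts)
```

A logarithmic trend, as seen for PBTI, would instead appear as a straight line when the shift is plotted against log t on a linear vertical axis.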

## BTI Suppression by Controlling VDD & VBB





- BTI-induced degradation $\propto t^n$
- NBTI can be suppressed by reducing V<sub>DD</sub>
  - NBTI-induced degradation becomes < 10% at V<sub>BB</sub> = 0.20 V
- VBB (body bias) is controlled to compensate for performance degradation
  - Pros: suppresses BTI and dynamic power
  - Cons: increases static power
    - BTI time exponent x P<sub>dynamic</sub> x P<sub>static</sub> is almost constant

# Plasma Induced Damage (PID)

Charging damage from antenna during back-end-of-line (BEOL) metallization process

[Figure: plasma, antenna, and defects; BOX ≈ 10 nm]

- Generates defects, affecting
  - gate oxide breakdown
  - threshold voltage ($V_{\rm th}$)
  - oscillation frequency
- Multilayer wiring
- Thin gate oxide
- PID has become a serious reliability issue

#### Antenna Ratio (AR)

- Antenna Ratio: $\mathrm{AR} = \frac{\text{Antenna area}}{\text{Gate area}}$
- Indicates the strength of PID



- Upper limit: 500
- Difficult to stay below AR 500 in large scale circuits
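A design-rule style check of the antenna ratio might look like the sketch below. The areas are hypothetical values chosen only for illustration, and `antenna_ratio` and `exceeds_limit` are illustrative names rather than part of any EDA tool.

```python
AR_LIMIT = 500  # upper limit quoted above


def antenna_ratio(antenna_area, gate_area):
    """AR = antenna (metal) area divided by gate area, same units for both."""
    return antenna_area / gate_area


def exceeds_limit(ar, limit=AR_LIMIT):
    return ar > limit


# Hypothetical areas in um^2, chosen only for illustration.
ar = antenna_ratio(antenna_area=1000.0, gate_area=0.5)
violates = exceeds_limit(ar)
```

With these example areas the ratio is 2000, four times the limit of 500.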

How does PID affect initial and aging degradation?

# Ring Oscillators to measure PID

1. Current starved RO to measure initial degradation by PID



2. Measure correlation b/w initial and aging degradation

- Antennas on wires inside Ring Oscillators (ROs)

[R. Kishida et al, JJAP, 2015]

#### **PMOS Type** Current Starved RO



- PMOS w/o antenna (Ref. Tr.) as reference
- PMOS w/ antenna (PID Tr.)
- $|V_{th}|$ of PID Tr. increases $\Rightarrow$ virtual VDD voltage and frequency $\downarrow$

### NMOS Type Current Starved RO



- NMOS Trs. b/w GND and RO GND
- $|V_{th}|$ of PID Tr. increases $\Rightarrow$ virtual GND voltage $\uparrow$ and frequency $\downarrow$
- Compare frequencies w/ Ref. and PID Tr. to evaluate PID depending on antenna layers

# Test Chip



- 65 nm FDSOI process
- 2 mm x 1.5 mm
- 70 ROs
- 2k <u>antenna ratio</u>
  - Metal area / Gate area
  - 4x bigger than the upper limit

Measurement conditions

- 1.0 V (nominal)
- Room Temp.

#### Measurement Results of Initial Frequencies in PMOS and NMOS



- Freq. is decreased by increasing metal layers in PMOS
- <u>Higher</u> freq. in NMOS Trs. than Ref. Tr.
  - $|V_{th}|$ decreases by PID
- Freq. becomes <u>lower</u> from M2 to M5.
  - Why? Because of positive charging damage in high-k (HK)

# Difference of PID b/w Dielectrics



- Positive charging damage in HK dielectrics [6]
- SiON and HK in fabricated process
- $|V_{\rm th}|$ $\uparrow$ by PID in SiON
- $|V_{\rm th}|$ $\downarrow$ by positive charge in HK of NMOS

[K. Eriguchi et al., ICICDT, 2008]

#### RO to measure initial and aging degradation

[R. Kishida et al, S3S, 2016]

- RO composed of NORs



# Measurement Flow

- Initial frequency to evaluate PID
- Oscillation stop to induce NBTI
- Measure frequency after NBTI stress
  - Frequency decrease by NBTI.



# Test Chip

- 65 nm process
- 1.8 V
- 80 °C
- AR: 100 to 1k in steps of 100
- 576 ROs for each AR
- Bulk and thin-BOX FDSOI



#### **NBTI Measurement**

- Dots: average of measurements
- Fitting: $f(t) = S_{NBTI} \log(t+1) + f_0$
  - $S_{NBTI}$: degradation factor
  - $f_0$: initial frequency
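The fitting function above is linear in $\log(t+1)$, so $S_{NBTI}$ and $f_0$ can be obtained by ordinary least squares. The sketch below assumes a natural logarithm (the base is not stated above) and uses synthetic data generated from the model itself; `fit_nbti` is a hypothetical helper name.

```python
import math


def fit_nbti(times, freqs):
    """Least-squares fit of f(t) = S_NBTI * log(t + 1) + f_0."""
    xs = [math.log(t + 1) for t in times]   # natural log is an assumption
    mx, my = sum(xs) / len(xs), sum(freqs) / len(freqs)
    s_nbti = (sum((x - mx) * (f - my) for x, f in zip(xs, freqs))
              / sum((x - mx) ** 2 for x in xs))
    return s_nbti, my - s_nbti * mx


# Synthetic data generated from the model itself (not measured values).
times = [0, 10, 100, 1000, 10000]
freqs = [-0.3 * math.log(t + 1) + 100.0 for t in times]
s_nbti, f0 = fit_nbti(times, freqs)
```

A negative `s_nbti` corresponds to a frequency that decreases under NBTI stress.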



# Degradation Factor S<sub>NBTI</sub>



- Similar tendencies in bulk and FDSOI
  - NBTI caused by PID in FDSOI can be estimated to be the same as in bulk
- NBTI is accelerated by PID when AR $\leq$ 600
  - Should consider NBTI caused by PID even within the AR limit

- S<sub>NBTI</sub> (> AR600) / NBTI correlates with initial frequency?

#### Correlation b/w NBTI and Initial Frequency



- Correlation coefficient ($\rho$) = 0.24 (weak) in bulk
  - RDF (random dopant fluctuation) is more dominant than gate oxide variation
- $\rho$ = 0.68 (strong) in FDSOI
  - Gate oxide variation is more dominant than RDF
    - Slower ROs w/ higher $V_{\rm th}$ have smaller electric fields
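For reference, a correlation coefficient of this kind can be computed per RO from the initial frequency and the fitted degradation factor. The sketch assumes $\rho$ is the Pearson coefficient, which is not stated above, and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-RO values, for illustration only.
initial_freq = [1.00, 1.02, 0.98, 1.05, 0.95]
s_nbti = [-0.30, -0.32, -0.29, -0.33, -0.27]
rho = pearson(initial_freq, s_nbti)
```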

# Conclusion

- We have been researching reliability issues at the circuit level for 15 years. Over 100 papers were published.
- Our activities are mainly focused on soft errors and aging degradation
- Many radiation-hard flip-flops were proposed, fabricated, and measured.
  - Recent work was done in 22 nm FDSOI in collaboration with Dolphin Design
- BTI-induced aging degradation can be measured by BTI-sensitive and -insensitive ROs.
- A BTI-suppression method controls VDD and VBB while keeping circuit performance
- PID-induced damage was measured by ROs.
  - Upper-layer antennas damage gate dielectrics, decreasing PMOS Vth and increasing NMOS Vth.
  - Correlation b/w initial and aging degradation is strong in FDSOI but weak in bulk

# Acknowledgement

- All members of our VLSI-system Lab. in KIT.
- d.lab VDEC in Univ. of Tokyo for chip fabrication and EDA support
- All corporate collaborators, including Renesas, Socionext, and ROHM



#### Low power FF

- Adaptive Coupling FF (ACFF)
  - Low power w/o clock buffer

[K. T. Chen, ISSCC, pp. 338-340, 2011]



|             | Area | Delay | Power | # of Tr. |
|-------------|------|-------|-------|----------|
| Standard FF | 1.00 | 1.00  | 1.00  | 24       |
| ACFF        | 1.00 | 1.46  | 0.55  | 22       |



- AC element attenuates SET pulse to decrease critical charge (Qcrit)

#### Low Power Radhard FFs





| # of Tr. | Area |
|----------|------|
| 28       | 1.00 |
| 24       | 0.85 |
| 126      | 5.20 |
| 72       | 2.50 |
| 72       | 2.40 |
| 56       | 2.00 |
| 48       | 2.10 |

BCDMR ACFF



#### DICE ACFF

#### Neutron SER [FIT/Mbit]

|       | Standard FF (nonredundant) | ACFF (nonredundant) | TMR FF (redundant) | BCDMR FF (redundant) | BCDMR ACFF (redundant) | DICE FF (redundant) | DICE ACFF (redundant) |
|-------|-------|-------|---|-----|---|-----|------|
| Bulk  | 554.3 | 265.7 | 0 | 7.3 | 0 | 8.5 | 16.4 |
| FDSOI | 34.7  | 0     | 0 | 0   | 0 | 0   | 0    |

- Both FFs achieve low power at low data activity and low SER
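To put the FIT/Mbit values in the table above into perspective, they can be converted into an expected number of upsets, using the definition that 1 FIT is one error per 10<sup>9</sup> hours. The flip-flop count and the operating time below are hypothetical, and 1 Mbit is assumed to be 10<sup>6</sup> bits.

```python
def expected_errors(ser_fit_per_mbit, n_bits, hours):
    """Expected upsets; 1 FIT is one error per 10**9 hours."""
    mbits = n_bits / 1e6       # assumes 1 Mbit = 10**6 bits
    return ser_fit_per_mbit * mbits * hours / 1e9


# Neutron SER of the standard FF from the table above.
hours = 10 * 365 * 24          # ten years of operation (hypothetical)
n_ffs = 1_000_000              # hypothetical flip-flop count
bulk = expected_errors(554.3, n_ffs, hours)
fdsoi = expected_errors(34.7, n_ffs, hours)
```

The ratio `bulk / fdsoi` depends only on the two table entries, not on the assumed chip size or lifetime.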

[K. Kobayashi et al, IEEE TNS, vol.61, no. 4, pp. 1881-1888, 2014] [M. Masuda et al, IEEE TNS, vol.60, no. 4, pp. 2750-2755, 2013]

# Measurement Results of Initial Frequencies in PMOS



- Normalized Frequency (Freq.) =  $f_{\rm PID}/f_{\rm Ref}$ 
  - How the initial frequency differs from that of Ref. Tr.
- Lower Freq. in PID Trs. than Ref. Tr.
- Degradation by PID grows with upper metal layers
  - 3.1% decrease from Ref. to M5
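The normalized frequency defined above can be turned into a percentage change as in the following sketch. The two frequencies are hypothetical values chosen to reproduce a 3.1% drop; they are not the measured data.

```python
def normalized_freq(f_pid, f_ref):
    """f_PID / f_Ref, as defined above."""
    return f_pid / f_ref


def percent_change(f_pid, f_ref):
    """Deviation of the normalized frequency from 1, in percent."""
    return (normalized_freq(f_pid, f_ref) - 1.0) * 100.0


# Hypothetical frequencies in MHz, chosen to give a 3.1% drop.
f_ref, f_m5 = 1000.0, 969.0
change = percent_change(f_m5, f_ref)
```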