

# Radiation Hardening by Design of Digital Circuits

ASICON 2019 Tutorial, Kazutoshi Kobayashi, Kyoto Institute of Technology, Japan

#### Outline

- Introduction
  - Reliability issues, soft errors, scaling trend and soft errors on HPC
- Single Event Effect and its Mitigation Techniques
  - SEU, MCU, MBU, parity, ECC, Bit interleaving and Majority voting
- Realistic Issues Caused by Soft Errors
  - My experiences, SRAM/DRAM, Avionics, Smartphone, FPGA and Raspberry Pi
- Evaluation of Radiation Hardness
  - Circuit simulation and Device simulation
  - Alpha, Neutron, Heavy ions and Field test
- Our Attempts and Results on Soft Errors
  - Contribution of NMOS and PMOS to soft errors
  - Mitigation techniques for bulk and FDSOI
- Summary

#### Reliability Issues in VLSIs



#### What is Soft Error?



- Caused when a radiation particle penetrates Si and generates electron-hole (e-h) pairs
  - When a neutron hits a Si atom (not always), or whenever an alpha particle passes through the chip
- Upsets storage cells such as SRAMs and FFs
- A pressing issue for semiconductor chips in automotive, aerospace, and HPC
- Few companies and researchers know soft errors well. Unexplained errors → soft errors?

#### **Reliability Metrics: FIT**

- FIT (Failure in Time)
  - Number of errors per $10^9$ hours (114k years)
  - Equivalently, number of errors per 1 M (10<sup>6</sup>) products per 1000 (10<sup>3</sup>) hours (about 40 days)
- Example
  - FIT rate of a 1 $\mu$m 1 Mbit SRAM: 200 FIT/Mbit at 3 V
  - 1 error / 570 years / Mbit



Measurement Data of 1 Mbit SRAM at RCNP
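The FIT arithmetic above can be checked in a few lines. This is a minimal sketch assuming a constant failure rate and 8760 hours per year (leap years ignored); the function names are illustrative, not from any library.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h; leap years ignored

def fit_from_errors(errors: float, devices: float, hours: float) -> float:
    """FIT = errors per 1e9 device-hours."""
    return errors * 1e9 / (devices * hours)

def fit_to_mttf_years(fit: float) -> float:
    """Mean time to failure in years for a constant failure rate given in FIT."""
    return 1e9 / fit / HOURS_PER_YEAR

print(fit_from_errors(1, 1e6, 1e3))    # 1 error per 1M devices per 1000 h -> 1 FIT
print(1e9 / HOURS_PER_YEAR)            # 1e9 hours expressed in years (~114k)
print(round(fit_to_mttf_years(200)))   # 200 FIT/Mbit -> one error per ~571 years per Mbit
```

The last line reproduces the "1 error / 570 years / Mbit" figure of the example.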

#### Soft Errors Threaten Safety

- Error rates
  - Standard SRAM/FF: ~1000 FIT/Mbit
  - Standard ASIC: 100,000 FIT/chip ≈ 1 error/year
- Automotive and aviation
  - A single error can lead to an accident





ISO 26262 definition of automotive safety integrity levels (ASIL)

| Level     | FIT rate | Objective                                    |
|-----------|----------|----------------------------------------------|
| ASIL-A    | < 1000   | Convenience (rear camera)                    |
| ASIL-B, C | < 100    | Safety (brake assistance, dashboard display) |
| ASIL-D    | < 10     | Fully automated driving (Waymo, Tesla)       |
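As a rough illustration of how a per-Mbit SER turns into a chip-level FIT budget, the sketch below multiplies the SER by a hypothetical amount of on-chip memory and looks up the strictest level in the table above. It assumes every upset in every bit counts as a failure, which overestimates the real failure rate; the memory sizes and the "ASIL-B/C" label are illustrative.

```python
from typing import Optional

# FIT limits from the ISO 26262 table above, strictest level first
ASIL_LIMITS = [("ASIL-D", 10), ("ASIL-B/C", 100), ("ASIL-A", 1000)]

def chip_fit(ser_fit_per_mbit: float, memory_mbit: float) -> float:
    """Chip-level FIT if every upset in every bit is counted as a failure."""
    return ser_fit_per_mbit * memory_mbit

def strictest_asil(fit: float) -> Optional[str]:
    """Return the strictest level whose FIT limit is met, or None if none is met."""
    for level, limit in ASIL_LIMITS:
        if fit < limit:
            return level
    return None

# Hypothetical examples with unprotected SRAM at ~1000 FIT/Mbit
print(strictest_asil(chip_fit(1000, 4)))     # None: 4000 FIT misses even ASIL-A
print(strictest_asil(chip_fit(1000, 0.05)))  # 50 FIT -> ASIL-B/C
```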

#### Scaling Trend of Soft Error Rate (SER)



[UemuraPhD] T. Uemura, "A Study on Soft Error Mitigation for Microprocessor in Bulk CMOS Technology", PhD thesis (2011); [Intel 22nm] N. Seifert et al., TNS, pp. 2666-2673 (2012); [Intel 14nm] N. Seifert et al., TNS, pp. 2570-2577 (2015); [Samsung 10nm] M. Jin et al., IEDM, pp. 15.1.1-15.1.4 (2016)

## Scaling Trend / Failures in HPC



| System          | Failures/day/TF | Failures/day/10PF | MTBF in 10PF   |
|-----------------|-----------------|-------------------|----------------|
| Cray XT3/XT4    | 0.1 ~ 1         | 1000 ~ 10000      | 9 s ~ 1.5 min  |
| Clusters x86-64 | 2.6 ~ 8.0       | 26000 ~ 80000     | 1 s ~ 3.3 s    |
| Blue Gene L/P   | 0.01 ~ 0.03     | 100 ~ 300         | 5 min ~ 15 min |

MTBF: Mean Time Between Failures

[H. D. Simon, ACTS Workshop, 2006]

- SER per data center increases exponentially with technology scaling
  - Other systems (autonomous cars, etc.) show a similar trend
- At the same failure rate per TF as a TF-class HPC, a 10PF HPC runs only seconds to minutes between failures
  - Must be 10,000x stronger against soft errors
- Soft errors must be addressed in HPC
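The 10PF columns of the table follow from scaling the per-TF failure rate linearly with system size (10 PF = 10,000 TF) and taking MTBF as one day divided by the failures per day. A minimal sketch of that arithmetic, under the linear-scaling assumption:

```python
SECONDS_PER_DAY = 86_400
TF_IN_10PF = 10_000  # 10 PF = 10,000 TF

def failures_per_day(rate_per_day_per_tf: float, size_tf: float) -> float:
    """Assumes failures grow linearly with system size."""
    return rate_per_day_per_tf * size_tf

def mtbf_seconds(fails_per_day: float) -> float:
    return SECONDS_PER_DAY / fails_per_day

systems = {"Cray XT3/XT4": (0.1, 1.0),
           "Clusters x86-64": (2.6, 8.0),
           "Blue Gene L/P": (0.01, 0.03)}
for name, (low, high) in systems.items():
    f_low, f_high = failures_per_day(low, TF_IN_10PF), failures_per_day(high, TF_IN_10PF)
    # the higher failure rate gives the shorter MTBF
    print(f"{name}: {f_low:.0f}-{f_high:.0f} failures/day, "
          f"MTBF {mtbf_seconds(f_high):.1f}-{mtbf_seconds(f_low):.1f} s")
```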

#### Soft Errors on HPC

- 88k processor cores on the K computer

MTTF (Mean Time to Failure) when a single core runs without error for 10 years (11,000 FIT):



K computer at Riken, Japan

8 CPU cores $\rightarrow$ 10 years / 8 = over 1 year

88k CPU cores $\rightarrow$ 10 years / 88k ≈ 60 min

Each core must run without error for 240 years (<500 FIT) to keep all 88k cores running for 24 hours.
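These numbers follow from treating the cores as independent with constant failure rates, so the system failure rate is the per-core rate times the core count. A sketch under that assumption (function names are illustrative):

```python
HOURS_PER_YEAR = 24 * 365

def system_mttf_hours(core_mttf_hours: float, n_cores: int) -> float:
    """Independent cores, constant rates: system failure rate = n_cores x core rate."""
    return core_mttf_hours / n_cores

def required_core_fit(target_system_hours: float, n_cores: int) -> float:
    """Per-core FIT needed so that n_cores together reach the target system MTTF."""
    return 1e9 / (target_system_hours * n_cores)

core_mttf = 10 * HOURS_PER_YEAR                            # 10 years per core
print(1e9 / core_mttf)                                     # ~11,400 FIT per core
print(system_mttf_hours(core_mttf, 8) / HOURS_PER_YEAR)    # 8 cores: 1.25 years
print(system_mttf_hours(core_mttf, 88_000) * 60)           # 88k cores: ~60 minutes
print(required_core_fit(24, 88_000))                       # ~473 FIT for 24 h of operation
print(24 * 88_000 / HOURS_PER_YEAR)                        # i.e. ~241 error-free years per core
```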




## Single Event Effects

- Single Event Effects (SEE)
  - SEU (single event upset) == Soft Error
    - Flips a storage node in a flip-flop (FF) or memory cell
  - SEL (single event latch-up)
    - Turns on a parasitic thyristor, so a large current flows from VDD to ground
  - SEB (single event burnout)
    - Turns on a power transistor and burns it out



Diagram from Gianluca Boselli, TI



#### Single Event Transient / Upset (SET / SEU)

- Single Event Transient (SET) pulse
  - Current (Voltage) pulse induced by charged particle
  - If captured by a storage element, it flips the stored value (SEU)
- Single Event Upset (SEU)
  - SET inside a storage element may directly flip a stored value



## Charge Generation by a Particle Hit





- If a neutron hits a Si atom, the nuclear reaction generates charged particles (α, proton) → e-h pairs → current pulse
- Electrons drift to the drain region through the field of the reverse-biased junction; electrons also diffuse to the drain
- Funneling: the generated charge temporarily extends the depletion region
- α particles from radioisotopes directly generate e-h pairs

## **SEU Mitigation Techniques**

- Dual lock-step at the architecture level
  - For automotive and aerospace
- Parity and ECC at the circuit/algorithm level
  - For SRAM and DRAM
- Majority voting at the circuit level
  - For latches and flip-flops
- SOI/FinFET at the process/device level
  - For automotive and HPC

Dual lock-step

[M. Baleani, et al., CASES, 2013]



#### MCU and MBU

[JEDEC Standard: JESD89A]

- MCU (Multiple Cell Upset) : A single event that induces several cells (e.g. memory cells or flip-flops) in an IC to flip their state at one time.
- MBU (Multiple Bit Upset) : A single event that induces upset of multiple cells where two or more of the error bits occur in the same logical word



## SBU/MBU Mitigation on SRAM

- Parity
  - Single Error Detection (SED), for SBU only: Parity $= \bigoplus \text{word}$; stored $1 = \bigoplus(01011101)$, read $\bigoplus(01001101) = 0 \neq 1$ → error detected
- ECC (Error Correction Code) for SBU and MBU
  - Single Error Correct and Double Error Detect (SEC-DED) for MBU
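The parity example can be reproduced directly. A minimal sketch: the parity bit is the XOR of all bits, so any single flip changes it, while a second flip in the same word restores it and goes unnoticed.

```python
def parity(word: str) -> int:
    """XOR of all bits: 1 if the word has an odd number of 1s."""
    p = 0
    for bit in word:
        p ^= int(bit)
    return p

stored_word = "01011101"
stored_parity = parity(stored_word)           # five 1s -> parity 1

single_flip = "01001101"                      # bit at index 3 flipped
double_flip = "01000101"                      # bits at index 3 and 4 flipped
print(parity(single_flip) != stored_parity)   # True: error detected
print(parity(double_flip) != stored_parity)   # False: double error slips through
```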



A double-bit MBU can be detected but not corrected by SEC-DED

SEUs on FFs cannot be mitigated by parity or ECC because FFs are placed randomly and ECC adds area/delay overhead
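To make the SEC-DED behavior concrete, here is a sketch of an extended Hamming (8,4) code, a textbook SEC-DED code chosen only for illustration; real memories protect wider words (e.g. (72,64)), and the code used in any particular product is not specified here. It corrects any single-bit error and flags, without correcting, any double-bit error.

```python
from typing import List, Tuple

def encode(data: List[int]) -> List[int]:
    """Extended Hamming (8,4). Index 0: overall parity; 1, 2, 4: check bits; 3, 5, 6, 7: data."""
    c = [0] * 8
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]      # covers positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]      # covers positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]      # covers positions with bit 2 set
    c[0] = sum(c[1:]) % 2          # makes the total number of 1s even
    return c

def decode(code: List[int]) -> Tuple[str, List[int]]:
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos        # XOR of positions of set bits; 0 for a valid word
    odd = sum(c) % 2 == 1          # an odd number of bits flipped
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd:                      # single error: syndrome is its position (0 = overall parity bit)
        c[syndrome] ^= 1
        status = "corrected"
    else:                          # nonzero syndrome with even parity: double error
        status = "double error detected"
    return status, [c[3], c[5], c[6], c[7]]

word = encode([1, 0, 1, 1])
one = list(word); one[6] ^= 1
two = list(word); two[2] ^= 1; two[5] ^= 1
print(decode(one))      # ('corrected', [1, 0, 1, 1])
print(decode(two)[0])   # double error detected
```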

## MCU Rate Elevation by Scaling







- Sensitive area does not scale
  - The probability of MCU increases with scaling
- Redundancy becomes ineffective on scaled process nodes
- Interleaving must be adopted to eliminate MBU on SRAM

[J. Furuta et al., IRPS, 6C.3.1-6C.3.4, 2013]

#### Bit Interleaving on SRAM





[N. Seifert et al., IRPS, 2006, pp. 217-225]

- Adjacent bits may be flipped at the same time by a particle strike.
- Interleaved: bit cells of the same word are not placed next to each other [Zhao2014]
- MBU probability < 10% at cell distance > 0.5 $\mu$m
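A simple model shows why interleaving turns an MBU into several correctable SBUs. The sketch assumes one SRAM row holding `interleave` words, with bit `b` of word `w` at column `b * interleave + w`, and an MCU that flips a run of adjacent columns; this mapping is an illustrative assumption, not a specific product's layout.

```python
from typing import Dict, Tuple

def physical_column(word: int, bit: int, interleave: int) -> int:
    """Column of bit `bit` of word `word` in a row holding `interleave` words."""
    return bit * interleave + word

def logical_address(column: int, interleave: int) -> Tuple[int, int]:
    """Inverse mapping: column -> (word, bit)."""
    return column % interleave, column // interleave

def errors_per_word(first_col: int, width: int, interleave: int) -> Dict[int, int]:
    """Flipped bits per word for an MCU covering `width` adjacent columns."""
    hits: Dict[int, int] = {}
    for col in range(first_col, first_col + width):
        word, _ = logical_address(col, interleave)
        hits[word] = hits.get(word, 0) + 1
    return hits

print(errors_per_word(5, 3, interleave=1))  # {0: 3}: a 3-bit MBU, uncorrectable by SEC-DED
print(errors_per_word(5, 3, interleave=4))  # {1: 1, 2: 1, 3: 1}: three correctable SBUs
```

As long as the MCU spans no more columns than the interleaving degree, each word sees at most one flipped bit, which SEC-DED can correct.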

#### Three Types of MCU Mechanism



## Parity and ECC on Commercial Proc.

- Intel Xeon E5-2600 v3 (22 nm, 18 cores)





- Parity or ECC on vulnerable registers and SRAMs
- SER is reduced to 1/4

[B. Bowhill et al., ISSCC, 4.5, 2015]

## SEU Mitigation on FF / Latch

- Majority voting over multiple storage nodes





- If one of three latches is flipped, voter resolves contradiction
- A delay element ($\tau$) prevents an SET pulse from being captured by multiple latches
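The voter can be modeled as a bitwise 2-out-of-3 majority function. This behavioral sketch shows that a single flipped copy is outvoted, while the same bit flipped in two copies (an MCU hitting two redundant latches) propagates to the output.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority, as computed by a TMR voter."""
    return (a & b) | (b & c) | (a & c)

stored = 0b1011
flipped = stored ^ 0b0100                        # same word with bit 2 upset

print(bin(majority(stored, flipped, stored)))    # 0b1011: one bad copy is outvoted
print(bin(majority(flipped, flipped, stored)))   # 0b1111: two bad copies win the vote
```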

## Dual Interlocked Storage Cell (DICE)

- Most frequently-used redundant latch
  - Simple but effective and patent-free
  - Over 600 citations [T. Calin et al., IEEE TNS, 43(6):2874-2878, 1996]



- Two latches are mutually connected
- If one node is flipped, the other nodes restore it
- Lower power, area and delay overhead than TMR

#### **DICE on Commercial Processor**



[D. Krueger et al., ISSCC, pp. 94-95, 2008]




#### My Experiences

- Flight to Hawaii
  - Could not control the volume of an iPod touch (1<sup>st</sup> gen)
  - Recovery after reboot
- Tour of cyclotron facility
  - Cyclotron suspended
  - Digital still camera (DSC) malfunctioned and became uncontrollable
  - Recovered only after removing the battery (no mechanical power switch)





#### **Example on Commercial Products**



- After SRAMs were replaced with DRAMs with ECC, the number of errors dropped to about 1/10
  - SRAM is weak against soft errors
  - DRAM with ECC is very strong

#### 90% of temporary failures are caused by soft errors

#### Another Example on Work Station



#### EE Times:

#### SRAM soft errors cause hard network problems

Anthony Cataldo, EE Times (08/17/2001)

> SAN MATEO, Calif. — Networking equipment is growing increasingly susceptible to soft errors — nonrecoverable, temporary misfires that can play havoc with things like traffic destinations — as chip ...

[EETimes 2001]



Sun CEO Scott McNealy [Forbes 2000] We never buy IBM's SRAMs

- SRAM from IBM happened to be weak against soft errors.
- Some of Sun's mission-critical servers faltered because of soft errors in cache memory

#### Accident of Avionics by Soft Errors





- Fly-by-wire control system failure leading to a dangerous pitch-down event on autopilot (Oct. 2008)
  - About one third of the passengers were injured
- Soft error rate at 10 km (35 kft) altitude is 100x that at sea level
  - The geomagnetic field and the atmosphere protect systems on the ground

[R. Baumann, Part B: Landmarks in terrestrial single event effects (SEE), NSREC 2013 Short Course Notes]

#### Soft Errors on Smartphones



- Expose smartphones (e.g., iPhone 3s) to a neutron beam
- MTTF (Mean Time to Failure) of iPhone 3s:
  - Once in 2000 years at sea level
  - Once in 4 years at 10 km (35 kft)
- Once in 6 flights when 500 passengers use smartphones for 12 hours

| DUT        | # of events | MTTF (y) at sea level | MTTF (y) at 35 kft |
|------------|-------------|-----------------------|--------------------|
| iPhone3    | 5           | 6000                  | 20                 |
| iPhone3s   | 8           | 2000                  | 4                  |
| Blackberry | 11          | 2000                  | 6                  |

[Y. Chen, "Cosmic Ray Effects on Personal Entertainment Applications for Smartphones", REDW (2013)]
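The "once in 6 flights" figure follows from the 35 kft MTTF. A sketch of the arithmetic, assuming each phone fails independently at a constant rate and every passenger uses one phone for the whole flight:

```python
HOURS_PER_YEAR = 24 * 365

def flights_per_failure(mttf_years: float, passengers: int, flight_hours: float) -> float:
    """Flights between failures when every passenger uses one phone for the whole flight."""
    phone_hours_per_flight = passengers * flight_hours
    return mttf_years * HOURS_PER_YEAR / phone_hours_per_flight

print(round(flights_per_failure(4, 500, 12), 2))   # 5.84 -> "once in about 6 flights"
```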

#### Soft Errors on Embedded PC / FPGAs

- Irradiate a Raspberry Pi 3 (RasPi) and two FPGAs (discussed later) with a white spallation neutron beam
- Run programs on the RasPi automatically after reboot
  - Decode MPEG-4 video in an infinite loop
  - Compute multiplication of 32-bit x 32-bit random integers
  - Browsing

Spallation neutron beam: ~10<sup>8</sup>x acceleration



Raspberry Pi 3 (from raspberrypi.org)

## Raspberry Pi 3 (www.raspberrypi.org)

#### An embedded computer running full-spec Linux

- Quad Core 1.2 GHz Broadcom BCM2837 64-bit CPU (40 nm) including ARM Cortex-A53
- 1 GB DDR-SDRAM
- Up to 64 GB micro SD
- WLAN and Bluetooth Low Energy (BLE) on board
- 100 Base Ethernet
- 40-pin extended GPIO
- 4 USB2.0 ports
- 4 Pole stereo output and composite video port
- Full size (1920x1080) HDMI



#### **Test Setups on Neutron Experiments**



- White spallation neutron beam at RCNP

#### Example of Shutdown



MTTF (Mean Time to Failure): 227 s under the beam → 2950 (~3k) years at sea level → one failure / day / 1M Pis → ≈ 39 FIT / Pi
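The shutdown numbers can be cross-checked from the two MTTFs. This sketch derives the acceleration factor implied by 227 s under the beam versus 2950 years at sea level, and the expected failures per day across one million Pis, assuming a constant failure rate and 8760 hours per year. The implied factor is consistent with the ~4×10<sup>8</sup> quoted for RCNP in the neutron-test section.

```python
HOURS_PER_YEAR = 24 * 365
SECONDS_PER_HOUR = 3600

beam_mttf_s = 227            # measured MTTF under the spallation beam
field_mttf_years = 2950      # reported sea-level equivalent

field_mttf_h = field_mttf_years * HOURS_PER_YEAR
acceleration = field_mttf_h * SECONDS_PER_HOUR / beam_mttf_s   # implied acceleration factor
failures_per_day_1m = 1e6 * 24 / field_mttf_h                   # 1M Pis, 24 h each

print(f"acceleration ~ {acceleration:.1e}")                      # ~4.1e+08
print(f"failures/day per 1M Pis ~ {failures_per_day_1m:.2f}")   # ~0.93
```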

#### Soft Errors on FPGA

- SRAM-based FPGA is very weak against soft errors
  - Huge amount of SRAMs to store configuration
- Flash-based FPGA is very strong
  - Flash memory does not flip easily on a radiation particle hit



SRAM-based island-style FPGA

SB (Switch Block)

#### Neutron Acceleration Test Results

- Configured as a 50k-bit shift register on FPGAs
- Initialized with a checkerboard pattern (0 or 1 / 500 FFs)
  - Leave the FPGAs for 30 s to 10 min, then read out the stored data



#### Error Rates and MTTF of FPGAs

| Error type                          | Mechanism             | SRAM-based | Flash-based  |
|-------------------------------------|-----------------------|------------|--------------|
| SEU                                 | SEU on a flip-flop    | Observed   | Observed     |
| Firm error on configuration memory  | Stuck-at fault        | Observed   | Not observed |
| Firm error on configuration memory  | Repeating burst error | Observed   | Not observed |

Figure: SER [FIT/Mbit] of SEUs on flip-flops and of firm errors vs. measurement time [min] (0.5, 1, 2, 3, 5, 10, average)

| Measurement time [min] | 0.5 | 1   | 2   | 3   | 5   | 10  |
|------------------------|-----|-----|-----|-----|-----|-----|
| Firm error on SRAM     | 16% | 12% | 31% | 51% | 57% | 95% |

|                                | SRAM  | Flash  |
|--------------------------------|-------|--------|
| Number of firm errors in 16 h  | 149   | 0      |
| MTTF [h] (ground level)        | 4.1e7 | >6.1e9 |

- Frequent firm errors on SRAM-based FPGA; no firm errors on Flash-based FPGA
- Flash-based FPGA meets the requirement of ASIL-B/C (<100 FIT)
- Periodic refresh or reboot is mandatory on SRAM-based FPGA



# How to Evaluate Soft Errors

- Simulation
  - Circuit Simulation
    - Logic gate, SRAM, latch, flip-flops
  - Device Simulation
    - Discrete MOSFET, logic gate, SRAM or latch
  - Logic Simulation
    - Transient or static simulation by fault injection
- Measurement
  - Accelerated test
    - $\alpha$  Particle, Neutron, Heavy ion and Muon
  - Field test
    - High altitude for higher neutron flux
    - Underground for lower neutron flux

### **Circuit Simulation**



Double Exp. Model

$$I(t) = I_0\left(\exp(-\alpha t) - \exp(-\beta t)\right)$$

Single Exp. Model

$$I(t) = \frac{2Q}{T\sqrt{\pi}} \sqrt{\frac{t}{T}} \exp\left(-\frac{t}{T}\right), \qquad \int_0^{\infty} I(t)\,dt = Q$$

[Mes1982]
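Both pulse models can be checked numerically: the single-exponential form integrates to $Q$, and the double-exponential pulse carries $Q = I_0(1/\alpha - 1/\beta)$ for $\beta > \alpha$. The sketch below uses a plain trapezoidal integral; the pulse parameters are hypothetical and chosen only for illustration.

```python
import math

def double_exp(t, i0, alpha, beta):
    """Double-exponential pulse I0*(exp(-alpha t) - exp(-beta t)), beta > alpha."""
    return i0 * (math.exp(-alpha * t) - math.exp(-beta * t))

def single_exp(t, q, tau):
    """Single-exponential pulse with total charge q and time constant tau (T above)."""
    return 2 * q / (tau * math.sqrt(math.pi)) * math.sqrt(t / tau) * math.exp(-t / tau)

def collected_charge(current, t_end, steps=200_000):
    """Trapezoidal integral of current(t) from 0 to t_end."""
    dt = t_end / steps
    total = 0.5 * (current(0.0) + current(t_end))
    total += sum(current(k * dt) for k in range(1, steps))
    return total * dt

# Hypothetical parameters: Q = 10 fC, T = 20 ps; I0 = 1 mA, 1/alpha = 200 ps, 1/beta = 50 ps
q_single = collected_charge(lambda t: single_exp(t, 10e-15, 20e-12), 40 * 20e-12)
q_double = collected_charge(lambda t: double_exp(t, 1e-3, 1 / 200e-12, 1 / 50e-12), 40 * 200e-12)
print(q_single, q_double)   # ~1.0e-14 C and ~1.5e-13 C (= I0 * (200 ps - 50 ps))
```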



- Attach a current source to replicate the current pulse induced by a particle hit
- Obtain the critical charge $Q_{\rm crit}$, the minimum collected charge that flips the stored value
- SER is computed by [Shiv2002]

$$N_{\rm SER} \propto F \cdot A \cdot \exp\left(-\frac{Q_{\rm crit}}{Q_{\rm s}}\right)$$

$F$: neutron flux; $A$: sensitive area (e.g., drain area); $Q_{\rm s}$: charge collection efficiency

[Mes1982] G. C. Messenger, IEEE TNS, vol. 29, no. 6, pp. 2024–2031, 1982. [Shiv2002] P. Shivakumar et al., ICDFN, pp. 389–398, 2002.
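Because the SER expression is only a proportionality, it is most useful for ratios. The sketch compares two hypothetical designs with the same flux and sensitive area but different $Q_{\rm crit}$; the values are illustrative.

```python
import math

def relative_ser(flux, area, q_crit, q_s):
    """N_SER up to an unknown constant: F * A * exp(-Qcrit / Qs)."""
    return flux * area * math.exp(-q_crit / q_s)

# Hypothetical: same flux and area, Qcrit raised from 2 fC to 4 fC, Qs = 1 fC
ratio = relative_ser(1.0, 1.0, 4e-15, 1e-15) / relative_ser(1.0, 1.0, 2e-15, 1e-15)
print(ratio)   # exp(-2), about 0.135
```

Doubling $Q_{\rm crit}$ here cuts the SER by about 7x; the exponential dependence is why small gains in critical charge matter.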

#### **Circuit Simulation Results**



# **Device Simulation**

- Limitation of circuit simulation
  - Considers only the charge collected at the drain
  - Hard to replicate the parasitic bipolar effect
- Construct a 2D or 3D device structure in TCAD
  - Synopsys Sentaurus is used



Device model for NMOS, circuit model for PMOS

#### Device Simulation (Sentaurus by Synopsys)

- Can replicate C-V and I-V characteristics to calibrate device parameters



Figure: simulated C-V and I-V characteristics (voltage axis in a.u.)

# **Device Simulation Results**



- Strike the device with a heavy ion of a given LET (linear energy transfer) [MeV-cm<sup>2</sup>/mg]

LET: energy deposited per unit path length, normalized by the material density

- Yields the critical LET that causes an upset (not the SER)
- SER can be computed by using PHITS (Particle and Heavy Ion Transport code System) https://phits.jaea.go.jp/

[T. Sato, et al, Journal of Nuclear Sci. & Tech., 2013]

[J. Furuta, et al., SISPAD, 2017]
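To relate a heavy-ion LET to the charge it leaves in silicon, a commonly used conversion assumes 3.6 eV per e-h pair and a Si density of 2.33 g/cm³, which gives roughly 10 fC/µm per MeV·cm²/mg. A sketch of that conversion (constants as stated; the function name is illustrative):

```python
E_PAIR_EV = 3.6            # mean energy to create one e-h pair in Si [eV]
RHO_SI_MG_PER_CM3 = 2330   # Si density [mg/cm^3]
Q_E = 1.602176634e-19      # elementary charge [C]

def let_to_fc_per_um(let_mev_cm2_per_mg: float) -> float:
    """Charge deposited per micrometre of Si track for a given LET [MeV*cm^2/mg]."""
    ev_per_um = let_mev_cm2_per_mg * RHO_SI_MG_PER_CM3 * 1e6 / 1e4   # MeV/cm -> eV/um
    pairs_per_um = ev_per_um / E_PAIR_EV
    return pairs_per_um * Q_E * 1e15                                  # C -> fC

print(round(let_to_fc_per_um(1.0), 1))   # ~10.4 fC/um
print(round(let_to_fc_per_um(64.0)))     # Xe at TIARA (LET 64): ~664 fC/um
```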

# Alpha Irradiation Test



- <sup>241</sup>Am or <sup>232</sup>Th source placed on the chip
- Alpha particles are stopped by a sheet of paper, so the die must be exposed directly
  - Ceramic package with a removable lid
  - Decap a mold package
  - Better to remove the polyimide layer to increase SER
  - Place the DUT as close to the alpha source as possible (< 1 mm is recommended)



[JEDEC Standard: JESD89A]

https://www.jedec.org/sites/default/files/docs/jesd89a.pdf (not free of charge)

# Alpha Irradiation in Vacuum Chamber

Vacuum chamber





- To reduce the shielding effect of air

#### SER computation method



- $\alpha$ emission rate from mold packages: 0.0005 to 0.024 count/(cm<sup>2</sup>·h)
  - 0.001 count/(cm<sup>2</sup>·h) is generally used

$$F_{\text{source}} = F_{\alpha}/2 \cdot GF$$

$$F_{\text{acc}} = \frac{F_{\text{source}} \left[ \text{count/sec} \right] \cdot 3600 \left[ \text{sec} \right]}{0.001 \left[ \text{count/cm}^2 \cdot \text{hour} \right] \cdot a \left[ \text{cm}^2 \right]}$$

$$SER_{\text{alpha}} \left[ \text{FIT/Mbit} \right] = \frac{N_{\text{error}}}{N_{\text{SE}} \left[ \text{bit} \right]} \cdot \frac{1}{F_{\text{acc}}} \cdot \frac{3600}{T_{\text{irr}} \left[ \text{sec} \right]} \cdot 10^9 \left[ \text{hour} \right] \cdot 10^6 \left[ \text{bit} \right]$$

$F_{\alpha}$: α source flux; $GF$: geometry factor; $a$: α source area; $N_{SE}$: number of storage elements under test [bit]; $T_{irr}$: irradiation time

[T. Uemura, PhD Thesis, Osaka University, 2015] [JEDEC Standard: JESD89A]
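The three formulas above can be chained into one SER calculation. This sketch mirrors the symbols in the equations; it assumes $F_\alpha$ is the total emission rate of the source in count/s, and the example inputs are hypothetical.

```python
def source_rate(f_alpha: float, gf: float) -> float:
    """F_source = F_alpha / 2 * GF [count/s]: half of the alphas head toward the DUT."""
    return f_alpha / 2 * gf

def acceleration_factor(f_source: float, area_cm2: float, package_rate: float = 0.001) -> float:
    """F_acc = F_source * 3600 / (package_rate [count/(cm^2 h)] * a [cm^2])."""
    return f_source * 3600 / (package_rate * area_cm2)

def ser_alpha_fit_per_mbit(n_error: int, n_se_bits: float, f_acc: float, t_irr_s: float) -> float:
    """SER_alpha = N_error / N_SE / F_acc * 3600 / T_irr * 1e9 * 1e6 [FIT/Mbit]."""
    return n_error / n_se_bits / f_acc * 3600 / t_irr_s * 1e9 * 1e6

# Hypothetical inputs: 1e4 count/s source, GF = 0.5, 1 cm^2 source, 1 Mbit DUT, 1 h, 100 errors
f_acc = acceleration_factor(source_rate(1e4, 0.5), 1.0)
print(f"{f_acc:.2e}")                                            # 9.00e+09
print(round(ser_alpha_fit_per_mbit(100, 1e6, f_acc, 3600), 2))   # 11.11
```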

# White Neutron Irradiation Test

- Accelerator must be used
  - Only a few facilities are available in the world
    - RCNP in Japan (cyclotron), LANSCE in USA (LINAC), TRIUMF in Canada, TSL in Sweden
  - White (spallation) neutrons: energy spectrum similar to that at sea level
    - Acceleration factor at RCNP is $\sim 4\times10^8$ (1 s of irradiation ≈ 10 years at sea level)
      - Lots of DUTs must be prepared
        - 1000 FIT/Mbit ≈ 2500 errors/Mbit in 1 day of irradiation
        - Only a few errors occur on radiation-hardened (rad-hard) storage cells



# Neutron Test Setups



- Accelerated neutrons are harmful to the human body and to test instruments
  - Humans and PCs must stay outside the beam room
  - Test instruments must be placed to the side of the beam opening

# Heavy-Ion Irradiation

- Accelerator must be used
  - We use TIARA and CYRIC in Japan; Berkeley Lab also has a heavy-ion accelerator
  - Better to put the DUT in a vacuum chamber to preserve the heavy-ion energy

TIARA @ QST: https://www.taka.qst.go.jp/index\_e.php

CYRIC @ Tohoku U.: http://www.cyric.tohoku.ac.jp/english/index.html

| Heavy ion | LET [MeV/(mg/cm<sup>2</sup>)] | Energy [MeV] |
|-----------|------------------------------|--------------|
| N     | 3.4                          | 56     |
| Ne    | 6.6                          | 75     |
| Ar    | 16                           | 150    |
| Kr    | 40                           | 322    |
| Xe    | 64                           | 454    |

Heavy ions at TIARA





# Field Test

- Must prepare huge amount of DUTs
  - 1 s at RCNP ≈ 10 years at sea level
  - 100 errors/year at 1000 FIT/Mbit
- Many more neutrons at higher altitude
  - 20x at 4000 m (13,000 feet)





Field test on the summit of Mauna Kea on the island of Hawaii [Tosaka et al., IRPS 2008]
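One way to read "100 errors/year at 1000 FIT/Mbit" is as a sizing question: how much storage must be deployed to observe a given number of errors. A sketch of that calculation, assuming the SER scales linearly with neutron flux (so 20x flux at 4000 m gives 20x errors); the function and parameter names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365

def mbit_needed(target_errors: float, ser_fit_per_mbit: float, hours: float,
                flux_factor: float = 1.0) -> float:
    """Mbit under test needed to expect `target_errors` upsets in `hours`.
    flux_factor scales the sea-level SER (assumes SER is proportional to neutron flux)."""
    errors_per_mbit = ser_fit_per_mbit * flux_factor * hours / 1e9
    return target_errors / errors_per_mbit

print(round(mbit_needed(100, 1000, HOURS_PER_YEAR)))                   # ~11,416 Mbit at sea level
print(round(mbit_needed(100, 1000, HOURS_PER_YEAR, flux_factor=20)))   # ~571 Mbit at 4000 m
```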




# Contribution of NMOS and PMOS to Soft Errors

- NMOS is more vulnerable to soft errors than PMOS
  - Mainly due to carrier mobility

[P. Hazucha, IEDM2003]



| Level | Structure    | PMOS   | NMOS   | Sensitivity        |
|-------|--------------|--------|--------|--------------------|
| L0    | Non stacked  | Weak   | Weak   | P/NMOS sensitive   |
| L1    | NMOS stacked | Weak   | Strong | PMOS sensitive     |
| L2    | Full stacked | Strong | Strong | P/NMOS insensitive |

Stacked structures are robust in SOI (explained later)

#### **Measurement Results**



[K. Yamada et al., IRPS, pp. P-SE.3-1-5, 2018]

# LET Dependence by Heavy Ion Beam



# Soft Error Mitigation Techniques

- Circuit-level
  - Majority voting, e.g., TMR, DICE, and BCDMR FF
  - Large area, delay, and power (ADP) overheads
- Process-level
  - SOI (Silicon on Insulator)
    - 10-100x stronger than bulk
    - No ADP overhead, but more expensive to fabricate
  - FinFET
    - Strong, but very costly (only for iPhone, FPGAs, ...)
  - Circuit-level technique for SOI
    - Stacked structure



Stacked FF

[A. Makihara, TNS 2004]

# Triple Modular Redundancy



- If one of three FFs is flipped, voter removes an error
- If two FFs are flipped, voter cannot remove errors
- Redundant FFs are weak against MCU (Multiple Cell Upset)
- MCU rate increases with process scaling



[N. Gaspard et al., IRPS, pp. SE.6.1-SE.6.5., 2013]

# Placement of TMR FF to prevent MCU



pp. 1745-1353, 2015]

# BISER FF



- Built-in Soft-Error Resilience FF
  - Developed by Intel and Stanford
  - Two latches and a weak keeper hold the data
  - A C-element resolves an SBU on either latch
  - Area-efficient, but vulnerable to an SET (single event transient) pulse generated in the C-element

[M. Zhang, S. Mitra, et al., Trans. VLSI Sys., 14(12):1368-1378, 2006]
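The C-element behavior that BISER relies on can be modeled in a few lines: the output follows the two latch values when they agree and holds its previous value when they disagree. A behavioral sketch (not a transistor-level model):

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element: output follows the inputs when they agree, otherwise holds."""
    return a if a == b else prev

out = 1                       # both latches store 1
out = c_element(0, 1, out)    # SEU flips one latch: inputs disagree, output holds 1
out = c_element(1, 1, out)    # latch restored: output still 1
out = c_element(0, 0, out)    # new data 0 written to both latches: output becomes 0
print(out)                    # 0
```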



#### BCDMR FF [Furuta et al., VLSI Cir. 2010]



- Bistable Cross-coupled Dual Modular Redundancy FF
  - Strong against an SET pulse from C-element
  - Duplicated C-elements strongly assist in keeping the correct data. No area overhead, because the C-element transistors are small



# Alpha and Neutron Results



Fabricated in a 65 nm bulk process

- BCDMR is strong against soft errors at higher clock frequency
- Below 10 FIT at 100 MHz
  - BISER in twin well: 50 FIT
  - BCDMR FF in twin well: no error

# BCDMR FF in Scaled Technology



- Similar SERs between 65 nm interleaved and 16 nm non-interleaved BCDMR FFs
  - Interleaved layout decreases SER

[K. Kobayashi et al., IRPS 2017]



Interleaved: place redundant storage cells as far apart as possible

### Low power FF

- Adaptive Coupling FF (ACFF): low power without a clock buffer

[K. T. Chen, ISSCC, pp. 338-340, 2011]



|             | Area | Delay | Power | # of Tr. |
|-------------|------|-------|-------|----------|
| Standard FF | 1.00 | 1.00  | 1.00  | 24       |
| ACFF        | 1.00 | 1.46  | 0.55  | 22       |



- The AC element attenuates SET pulses, which increases the critical charge (Qcrit)

#### Low Power Radhard FFs





| FF          | # of tr. | Area |
|-------------|----------|------|
| Standard FF | 28       | 1.00 |
| ACFF        | 24       | 0.85 |
| TMR FF      | 126      | 5.20 |
| BCDMR FF    | 72       | 2.50 |
| BCDMR ACFF  | 72       | 2.40 |
| DICE FF     | 56       | 2.00 |
| DICE ACFF   | 48       | 2.10 |

BCDMR ACFF



DICE ACFF

#### Neutron SER [FIT/Mbit]

| Process | Standard FF (nonredundant) | ACFF (nonredundant) | TMR FF | BCDMR FF | BCDMR ACFF | DICE FF | DICE ACFF |
|---------|----------------------------|---------------------|--------|----------|------------|---------|-----------|
| Bulk    | 554.3                      | 265.7               | 0      | 7.3      | 0          | 8.5     | 16.4      |
| FDSOI   | 34.7                       | 0                   | 0      | 0        | 0          | 0       | 0         |

Both ACFF-based radiation-hard FFs achieve low power at low data activity as well as low SER

[K. Kobayashi et al., IEEE TNS, vol. 61, no. 4, pp. 1881-1888, 2014] [M. Masuda et al., IEEE TNS, vol. 60, no. 4, pp. 2750-2755, 2013]

# Soft Errors in Bulk and SOI



- The BOX (buried oxide) layer prevents carriers generated in the substrate from being collected
  - SOI is resistant to soft errors. SER is 1/10-1/100 of bulk

#### **Experimental Results of Standard FF**



[Y. Morita, VLSI Tech. Symp., pp 166-167, 2008]

Standard FF

# Soft-error Mitigation for SOI

- Stacked transistor structure on SOI



- No simultaneous turn-on of the stacked transistors
  - All transistors are isolated by the BOX layer
  - Not effective on bulk process
- Incurs area and delay overheads
- 1/3 to 1/10 SER reduction on stacked FF

[A. Makihara, TNS 2004]

# Stacked Latch on HPC Processor

- 22 nm IBM System z microprocessor



- Additional transistors on latch
  - This figure appeared in the presentation slides, not in the paper

# AC Slave / All-stacked FF

- ACFF on master + stacked structure on master / slave



AC slave-stacked FF (AC\_SS FF)

| FF          | Area | Delay | Power | # of Tr. |
|-------------|------|-------|-------|----------|
| Standard FF | 1.00 | 1.00  | 1.00  | 24       |
| ACFF        | 1.00 | 1.45  | 0.62  | 22       |
| AC_SS FF    | 1.12 | 1.49  | 0.65  | 26       |
| AC_AS FF    | 1.24 | 2.17  | 0.66  | 28       |

Stacking only the slave latch is enough to reduce SER

AC all-stacked FF (AC\_AS FF)




[H. Maruoka et al, RADECS, 2016]

# Stacked FF and SLCCFF



| FFs          | Area | Delay | Power |
|--------------|------|-------|-------|
| Standard DFF | 1.00 | 1.00  | 1.00  |
| Stacked FF   | 1.12 | 2.00  | 2.13  |
| SLCCFF       | 1.24 | 1.67  | 1.89  |

- SLCCFF (Stacked Leveling Critical Charge FF) targets lower power and faster operation than the stacked FF

#### **Experimental Results**



- 1/10 SER w/ Stacked Structures on SOI
  - Not effective on bulk
- SLCCFF is faster and lower-power than Stacked FF

#### Issue of Stacked Structure



- A high-energy particle can turn on both stacked transistors
  - 18 MeV is the upper limit of secondary ions produced by a neutron hit
- Node separation is effective but area-consuming

# Reduction Sensitive Range Structure



 $\sim$ 500 nm on 150 nm

- Reduction Sensitive Range (RSR)
  - Additional wire promotes recombination of electrons and holes
  - Not effective on 150 nm FDSOI, but effective on 65nm FDSOI
    - Stacked structure is enough on 150 nm because the stacked transistors are separated by a sufficient distance

### **Device Simulation Results**



- Hole density drops to 0 after 100 ps

## Two FFs with Additional Wire



- RSRFF (Reduction Sensitive Range FF)
  - Large delay overhead due to additional wires
- RSRLDFF (RSR with Low Delay FF)
  - RSRFF + SLCCFF to reduce delay overhead

## Heavy Ion Results and Performance



- Irradiated with Xe (LET: 67.5 MeV-cm<sup>2</sup>/mg)
  - No error on RSRLDFF
- RSRLDFF: 29% delay reduction with 5% area overhead compared with stacked FF

## Guard-Gate Flip Flop (GGFF)

- 100x higher soft-error tolerance in 16 nm FinFET
  - Longer delay and 12 additional transistors



[A. Balasubramanian, IEEE TNS, vol. 52, no. 6, pp. 2531-2535, 2005.]
[H. Zhang et al., IRPS, pp. 5C-3-1-5C-3-5, 2016]

#### Filtering Out SET Pulse by Guard Gate



- Two inverters delay the SET pulse by $\tau$
  - Output of the C-element stays stable if $\tau$ > SET pulse width
  - Delay time to flip the latch becomes longer (+$\tau$)
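The guard-gate filtering can be illustrated with a discrete-time behavioral model in which a C-element compares a node with a copy of itself delayed by $\tau$. A pulse no longer than $\tau$ never appears on both inputs at once and is filtered, while a real data transition passes after a $\tau$ delay. The time step, `TAU`, and pulse widths below are hypothetical.

```python
from typing import List

def delayed(signal: List[int], tau: int) -> List[int]:
    """Delay a sampled waveform by tau samples (models the two-inverter delay)."""
    if tau == 0:
        return list(signal)
    return [signal[0]] * tau + signal[:-tau]

def guard_gate(signal: List[int], tau: int, initial: int) -> List[int]:
    """C-element driven by the node and its delayed copy."""
    out, state = [], initial
    for a, b in zip(signal, delayed(signal, tau)):
        state = a if a == b else state
        out.append(state)
    return out

def pulse(length: int, start: int, width: int) -> List[int]:
    """Node held at 0 with a 0->1->0 SET pulse of `width` samples."""
    return [1 if start <= t < start + width else 0 for t in range(length)]

TAU = 5   # hypothetical: 5 samples of delay
print(max(guard_gate(pulse(40, 10, 3), TAU, 0)))   # 0: pulse shorter than tau is filtered
print(max(guard_gate(pulse(40, 10, 8), TAU, 0)))   # 1: pulse longer than tau leaks through
step = [0] * 10 + [1] * 30
print(guard_gate(step, TAU, 0).index(1))           # 15: real data arrives tau samples late
```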



### Feedback Recovery FF



[K. Yamada et al, IEEE S3S, 2018]

Duplicated FRFF (DFRFF)

- Construct a guard gate from the master and slave latches
  - FRFF
    - Only 2 additional transistors
    - Only the master latch is strong
  - DFRFF
    - 6 additional transistors
    - Both master and slave latches are strong

#### **Circuit Performance**

| FF            | Area        | Delay       | Power       | ADP  | # Tr. |
|---------------|-------------|-------------|-------------|------|-------|
| Standard FF   | 1.00        | 1.00        | 1.00        | 1.00 | 24    |
| Guard-Gate FF | 1.47 (1)    | 2.20 (1)    | 1.06 (1)    | 3.42 | 36    |
| FRFF          | 1.06 (0.72) | 1.06 (0.48) | 1.03 (0.97) | 1.16 | 26    |
| DFRFF         | 1.18 (0.80) | 1.08 (0.49) | 1.02 (0.96) | 1.29 | 30    |

Values in parentheses are normalized to the Guard-Gate FF.





FRFF is faster because it has fewer inverters from input to output
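The ADP column and the parenthesized ratios can be recomputed from the area, delay, and power columns. A short sketch with values copied from the table; it reproduces the ADP column to within rounding (e.g., 3.43 vs. 3.42 for the Guard-Gate FF and 1.30 vs. 1.29 for DFRFF).

```python
# (area, delay, power) normalized to the standard FF, copied from the table above
FFS = {
    "Standard FF":   (1.00, 1.00, 1.00),
    "Guard-Gate FF": (1.47, 2.20, 1.06),
    "FRFF":          (1.06, 1.06, 1.03),
    "DFRFF":         (1.18, 1.08, 1.02),
}

def adp(name: str) -> float:
    """Area-delay-power product."""
    area, delay, power = FFS[name]
    return area * delay * power

def relative_to_ggff(name: str) -> tuple:
    """Area, delay, and power ratios against the Guard-Gate FF (the parenthesized values)."""
    return tuple(round(x / g, 2) for x, g in zip(FFS[name], FFS["Guard-Gate FF"]))

for name in FFS:
    print(f"{name:14s} ADP={adp(name):.2f}  vs GGFF={relative_to_ggff(name)}")
```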

#### Neutron Irradiation Results



The guard-gate FF, with 240% ADP overhead, is the strongest, but FRFF (16% overhead) and DFRFF (30% overhead) are 3-4x more radiation-hard than the standard FF

#### Heavy-ion Results



- On FRFF, the master latch (ML) is more tolerant of soft errors than the slave latch (SL) because of its delay time
  - The SL requires more delay
- Average cross sections (CSs) of DFRFF are 1/20 (Ar) and 1/6 (Kr) of those of TGFF
  - Kr produces longer error pulses than Ar
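Heavy-ion results are compared through cross sections. The per-bit SEU cross section is commonly computed as the number of errors divided by the ion fluence times the number of bits, so a 1/20 ratio means 20x fewer errors at the same fluence. This is a minimal sketch. The error counts, fluence and FF count below are hypothetical, not the measured data, and TGFF is taken to be the conventional transmission-gate FF used as the baseline.

```python
def cross_section(errors: int, fluence: float, n_bits: int) -> float:
    """Per-bit SEU cross section in cm^2/bit: errors / (fluence [ions/cm^2] * bits)."""
    if fluence <= 0 or n_bits <= 0:
        raise ValueError("fluence and number of bits must be positive")
    return errors / (fluence * n_bits)


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not measured data.
    fluence = 1.0e7  # ions/cm^2
    n_ffs = 10_000
    tgff = cross_section(400, fluence, n_ffs)
    dfrff = cross_section(20, fluence, n_ffs)
    print(f"TGFF : {tgff:.1e} cm^2/FF")
    print(f"DFRFF: {dfrff:.1e} cm^2/FF ({tgff / dfrff:.0f}x smaller)")
```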

## Summary of FFs for FDSOI

| FF          | Area | Delay | Power | Rad-hard level (master) | Rad-hard level (slave) |
|-------------|------|-------|-------|-------------------------|------------------------|
| Standard FF | 1.00 | 1.00  | 1.00  | 1                       | 1                      |
| ACFF        | 1.00 | 1.45  | 0.62  | 3                       | 1                      |
| Stacked FF  | 1.12 | 2.00  | 2.13  | 2                       | 2                      |
| AC_SS FF    | 1.12 | 1.49  | 0.65  | 3                       | 2                      |
| AS_AS FF    | 1.24 | 2.17  | 0.66  | 3                       | 2                      |
| SLCC FF     | 1.24 | 1.67  | 1.89  | 2                       | 2                      |
| RSRFF       | 1.24 | 2.16  | 1.07  | 3                       | 3                      |
| RSRLDFF     | 1.35 | 1.35  | 1.08  | 3                       | 3                      |
| GGFF        | 1.47 | 2.20  | 1.06  | 2                       | 2                      |
| FRFF        | 1.06 | 1.06  | 1.03  | 2                       | 1                      |
| DFRFF       | 1.18 | 1.08  | 1.02  | 2                       | 2                      |

Rad-hard level: 1 (weak) $\rightarrow$ 2 $\rightarrow$ 3 (strong)
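The table can also be read as a selection problem: require a minimum rad-hard level for each latch, then pick the cheapest FF. The helper `pick` below is hypothetical. It assumes ADP = area × delay × power, as in the Circuit Performance table, and treats the qualitative levels 1-3 as comparable integers. The tutorial does not prescribe this criterion.

```python
from typing import Callable, NamedTuple, Optional


class FF(NamedTuple):
    name: str
    area: float
    delay: float
    power: float
    master: int  # rad-hard level of the master latch (1 = weak ... 3 = strong)
    slave: int   # rad-hard level of the slave latch

    @property
    def adp(self) -> float:
        return self.area * self.delay * self.power


TABLE = [
    FF("Standard FF", 1.00, 1.00, 1.00, 1, 1),
    FF("ACFF",        1.00, 1.45, 0.62, 3, 1),
    FF("Stacked FF",  1.12, 2.00, 2.13, 2, 2),
    FF("AC_SS FF",    1.12, 1.49, 0.65, 3, 2),
    FF("AS_AS FF",    1.24, 2.17, 0.66, 3, 2),
    FF("SLCC FF",     1.24, 1.67, 1.89, 2, 2),
    FF("RSRFF",       1.24, 2.16, 1.07, 3, 3),
    FF("RSRLDFF",     1.35, 1.35, 1.08, 3, 3),
    FF("GGFF",        1.47, 2.20, 1.06, 2, 2),
    FF("FRFF",        1.06, 1.06, 1.03, 2, 1),
    FF("DFRFF",       1.18, 1.08, 1.02, 2, 2),
]


def pick(min_master: int, min_slave: int,
         key: Callable[[FF], float] = lambda ff: ff.adp) -> Optional[FF]:
    """Return the FF minimizing `key` (ADP by default) among those meeting
    both minimum rad-hard levels, or None if no FF qualifies."""
    candidates = [ff for ff in TABLE
                  if ff.master >= min_master and ff.slave >= min_slave]
    return min(candidates, key=key, default=None)


if __name__ == "__main__":
    for levels in [(1, 1), (2, 2), (3, 3)]:
        best = pick(*levels)
        print(levels, best.name, f"ADP={best.adp:.2f}")
```

With the table values, `pick(3, 3)` returns RSRLDFF, `pick(2, 2)` returns AC_SS FF (its low power keeps its ADP near 1.08), and `pick(2, 2, key=lambda ff: ff.delay)` returns DFRFF. Which metric to minimize depends on the design target.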

## For Outer Space Missions

- NanoBridge FPGA (NEC)
  - ReRAM (NanoBridge) stores the configuration instead of SRAMs
  - A programmed NanoBridge acts as a resistor
  - NanoBridge itself suffers no single-event upsets

[S. Kaeriyama et al., JSSC, 2005]

- Radiation-hard NanoBridge FPGA for highly reliable applications
  - The current FPGA uses standard FFs without radiation hardening, even though its configuration data is radiation-hard
  - Standard FFs are replaced by **BCDMR FFs**



When a positive voltage is applied, a cross-link of copper atoms (red circles) forms between the ruthenium and copper electrodes, turning the switch ON.







Launched into space by an Epsilon rocket on Jan. 18, 2019 (without radiation hardening on the FFs)

https://www.axelspace.com/en/solution/rapis1/



## Summary

- Soft errors threaten our safety
  - 90% of temporary failures are caused by soft errors
  - Soft errors must be addressed in mission-critical applications: automotive, avionics and HPC servers
- Soft error estimation methodologies
  - Circuit simulation: fast but less accurate
  - Device simulation: slow (~1000x slower) but more accurate
  - Accelerated tests: alpha tests are easy; neutron and heavy-ion tests require accelerators; field tests take a long time
- Our attempts and results
  - NMOS dominates soft errors: 97.7% of neutron-induced errors originate from NMOS
  - BCDMR FF achieves an SER of ~10 FIT/MFF in both 65 nm bulk and 16 nm FinFET; BCDMR ACFF achieves both low power and low SER
  - Stacked structure for SOI
    - AC\_SS FF for low power; SLCCFF is area-efficient but has a large delay overhead
    - **RSRLDFF** offers low delay for high performance
    - DFRFF is area-delay-power efficient (ADP overhead is only 30%)
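To see what ~10 FIT/MFF means for a whole chip, recall that FIT counts failures per 10<sup>9</sup> device-hours and scales with the number of FFs. This is a minimal sketch. It assumes independent, equally sensitive FFs and a hypothetical design with 10 million FFs. It also takes ~1000 FIT/MFF for standard FFs, based on the ~1000 FIT/Mbit quoted for standard SRAM/FF earlier in the tutorial.

```python
HOURS_PER_YEAR = 24 * 365


def chip_fit(fit_per_mff: float, n_ffs: int) -> float:
    """Chip-level FIT from a rate given in FIT per million FFs (FIT/MFF),
    assuming FF upsets are independent so FIT scales linearly with FF count."""
    return fit_per_mff * n_ffs / 1e6


def mtbf_years(fit: float) -> float:
    """Mean time between failures in years; 1 FIT = 1 failure per 1e9 hours."""
    return 1e9 / fit / HOURS_PER_YEAR


if __name__ == "__main__":
    n_ffs = 10_000_000  # hypothetical design with 10 M flip-flops
    for label, rate in [("standard FF (~1000 FIT/MFF, assumed)", 1000.0),
                        ("BCDMR FF (~10 FIT/MFF)", 10.0)]:
        fit = chip_fit(rate, n_ffs)
        print(f"{label}: {fit:.0f} FIT, MTBF ~ {mtbf_years(fit):.0f} years")
```

Under these assumptions, the hardened design drops from 10,000 FIT (an upset roughly every 11 years per chip) to 100 FIT.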

#### **Fabricated Chips**





- 65 nm bulk (Fujitsu)
- 28 nm FDSOI (STMicroelectronics)
- 65 nm bulk/FDSOI (Renesas)

#### VLSI Design and Test for Systems Dependability

- Editor: Shojiro Asai
- Price: USD \$280
- Part II: VLSI Issues in Systems Dependability
  - Radiation-Induced Soft Errors: Eishi H. Ibe, Shusuke Yoshimoto, Masahiko Yoshimoto, Hiroshi Kawaguchi, Kazutoshi Kobayashi, Jun Furuta et al.
  - Electromagnetic Noises: Makoto Nagata, Nobuyuki Yamasaki, Yusuke Kumura, Shuma Hagiwara, Masayuki Inaba
  - Variations in Device Characteristics: Hidetoshi Onodera, Yukiya Miura, Yasuo Sato, Seiji Kajihara, Toshinori Sato, Ken Yano et al.
  - Time-Dependent Degradation in Device Characteristics and Countermeasures by Design: Takashi Sato, Masanori Hashimoto, Shuhei Tanakamaru, Ken Takeuchi, Yasuo Sato, Seiji Kajihara et al.
  - Connectivity in Wireless Telecommunications: Kazuo Tsubouchi, Fumiyuki Adachi, Suguru Kameda, Mizuki Motoyoshi, Akinori Taira, Noriharu Suematsu et al.
  - Connectivity in Electronic Packaging: Hiroki Ishikuro, Tadahiro Kuroda, Atsutake Kosuge, Mitsumasa Koyanagi, Kang Wook Lee, Hiroyuki Hashimoto et al.
  - Responsiveness and Timing: Tomohiro Yoneda, Yoshihiro Nakabo, Nobuyuki Yamasaki, Masayoshi Takasu, Masashi Imai, Suguru Kameda et al.
  - Malicious Attacks on Electronic Systems and VLSIs for Security: Takeshi Fujino, Daisuke Suzuki, Yohei Hori, Mitsuru Shiozaki, Masaya Yoshikawa, Toshiya Asai et al.
  - Test Coverage: Masahiro Fujita, Koichiro Takayama, Takeshi Matsumoto, Kosuke Oshima, Satoshi Jo, Michiko Inoue et al.
  - Unknown Threats and Provisions: Nobuyasu Kanekawa, Takashi Miyoshi, Masahiro Fujita, Takeshi Matsumoto, Hiroaki Yoshida, Satoshi Jo et al.

## Acknowledgement

- This work is supported by LEAP, STARC, JSPS KAKENHI Grant No. 15H02677 and JST OPERA.
- The VLSI chips in our papers were fabricated through the chip fabrication program of the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Fujitsu, Renesas Electronics and STMicroelectronics, and were designed with EDA tools supported by VDEC, Synopsys Inc., Cadence Design Systems and Mentor Graphics.
- Many thanks to Prof. Kumashiro and Prof. Furuta of KIT and my students.

#### International Symposium on Reliability

March 13, 2020 (Fri.), Kyoto Institute of Technology, Kyoto, Japan

**Tentative Invited Speakers**

- Dr. P. Roche, STMicroelectronics
- Prof. B. Bhuva, Vanderbilt Univ.
- Dr. D. Linten, imec
- Dr. M. C. Trinczek, TRIUMF
- Prof. T. Grasser, TU Wien
- Prof. S. Mitra, Stanford Univ.

**Topics:**

- **Soft errors** by Roche and Bhuva
- **ESD** by Linten
- **RTN/BTI** by Grasser
- **Resilient Computing** by Mitra
- **Beam Facility** by Trinczek

Contact: Prof. Kobayashi (Kazutoshi.kobayashi@kit.ac.jp)


R. Yamamoto, C. Hamanaka, J. Furuta, K. Kobayashi, and H. Onodera, "An Area-efficient 65 nm Radiation-Hard Dual-Modular Flip-Flop to Avoid Multiple Cell Upsets", IEEE Transactions on Nuclear Science (TNS), vol. 58, no. 6, pp. 3053-3059, Dec. 2011.

T. Uemura, "A Study on Soft Error Mitigation for Microprocessor in Bulk CMOS Technology", PhD Thesis, Osaka University, 2015.

N. Seifert, B. Gill, S. Jahinuzzaman, J. Basile, V. Ambrose, Q. Shi, R. Allmon, and A. Bramnik, "Soft error susceptibilities of 22 nm tri-gate devices", IEEE Transactions on Nuclear Science (TNS), vol. 59, no. 6, pp. 2666-2673, Dec. 2012.

N. Seifert, S. Jahinuzzaman, J. Velamala, R. Ascazubi, N. Patel, B. Gill, J. Basile, and J. Hicks, "Soft error rate improvements in 14-nm technology featuring second-generation 3D tri-gate transistors", IEEE Transactions on Nuclear Science (TNS), vol. 62, no. 6, pp. 2570-2577, Dec. 2015.

M. Jin, C. Liu, J. Kim, J. Kim, H. Shim, K. Kim, G. Kim, S. Lee, T. Uemura, M. Chang, T. An, J. Park, and S. Pae, "Reliability characterization of 10nm FinFET technology with multi-VT gate stack for low power and high performance", International Electron Devices Meeting (IEDM), pp. 15.1.1-15.1.4, Dec. 2016.

H. Liu, M. Cotter, S. Datta, and V. Narayanan, "Technology assessment of Si and III-V FinFETs and III-V tunnel FETs from soft error rate perspective", International Electron Devices Meeting (IEDM), pp. 25.5.1-25.5.4, Dec. 2012.

H. D. Simon, "Petascale Computing in the U.S.", ACTS Workshop, Berkeley, California, 2006.

M. Baleani, A. Ferrari, L. Mangeruca, A. Sangiovanni-Vincentelli, M. Peri, and S. Pezzini, "Fault-tolerant platforms for automotive safety-critical applications", Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES '03), pp. 170-177, ACM, New York, NY, USA, 2003.

JEDEC Standard JESD89A, "Measurement and Reporting of Alpha Particle and Terrestrial Cosmic Ray Induced Soft Errors in Semiconductor Devices".

B. Bowhill, B. Stackhouse, N. Nassif, Z. Yang, A. Raghavan, C. Morganti, C. Houghton, D. Krueger, O. Franza, J. Desai, J. Crop, D. Bradley, C. Bostak, S. Bhimji, and M. Becker, "4.5 The Xeon® processor E5-2600 v3: A 22nm 18-core product family", International Solid-State Circuits Conference (ISSCC), pp. 1-3, Feb. 2015.

N. Seifert, P. Slankard, M. Kirsch, B. Narasimham, V. Zia, C. Brookreson, A. Vo, S. Mitra, B. Gill, and J. Maiz, "Radiation-induced soft error rates of advanced CMOS bulk devices", International Reliability Physics Symposium (IRPS), pp. 217-225, Mar. 2006.

H. Zhao, S. Fan, L. Chen, Y. Song, and L. Geng, "A 0.2 V–1.8 V 8T SRAM with bit-interleaving capability", IEICE Electronics Express, vol. 11, no. 8, p. 20140229, 2014.

D. Krueger, E. Francom, and J. Langsdorf, "Circuit design for voltage scaling and SER immunity on a quad-core Itanium processor", International Solid-State Circuits Conference (ISSCC), pp. 94-95, Feb. 2008.

J. Warnock et al., "22nm next-generation IBM System z microprocessor", International Solid-State Circuits Conference (ISSCC), pp. 70-71, Feb. 2015.

K. Shimbo, T. Toba, K. Nishii, E. Ibe, Y. Taniguchi, and Y. Yahagi, "Quantification & mitigation techniques of soft-error rates in routers validated in accelerated neutron irradiation test and field test", SELSE (Silicon Errors in Logic System Effects), 2011.

Y. Chen, "Cosmic Ray Effects on Personal Entertainment Applications for Smartphones", 2013 IEEE Radiation Effects Data Workshop (REDW), San Francisco, CA, pp. 1-4, 2013.

R. Bowmann, "Part B – Landmarks in terrestrial single event effects (SEE)", NSRE2013 Short Course Notes.

G. C. Messenger, "Collection of charge on junction nodes from ion tracks", IEEE Transactions on Nuclear Science (TNS), vol. 29, no. 6, pp. 2024-2031, 1982.

P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, "Modeling the effect of technology trends on the soft error rate of combinational logic", International Conference on Dependable Systems and Networks (DSN), pp. 389-398, 2002.

T. Sato, K. Niita, N. Matsuda, et al., "Particle and heavy ion transport code system, PHITS, version 2.52", Journal of Nuclear Science and Technology, vol. 50, no. 9, pp. 913-923, 2013.

J. Furuta, S. Umehara, and K. Kobayashi, "Analysis of Neutron-induced Soft Error Rates on 28nm FD-SOI and 22nm FinFET Latches by the PHITS-TCAD Simulation System", International Conference on Simulation of Semiconductor Processes and Devices (SISPAD), pp. 185-188, Kamakura, Japan, 2017.

C. W. Slayman, "Theoretical correlation of broad spectrum neutron sources for accelerated soft error testing", IEEE Transactions on Nuclear Science (TNS), vol. 57, no. 6, pp. 3163-3168, Dec. 2010.

Y. Tosaka, R. Takasu, T. Uemura, H. Ehara, H. Matsuyama, S. Satoh, A. Kawai, and M. Hayashi, "Simultaneous measurement of soft error rate of 90 nm CMOS SRAM and cosmic ray neutron spectra at the summit of Mauna Kea", International Reliability Physics Symposium (IRPS), pp. 727-728, May 2008.

P. Hazucha, T. Karnik, J. Maiz, S. Walstra, B. Bloechel, J. Tschanz, G. Dermer, S. Hareland, P. Armstrong, and S. Borkar, "Neutron soft error rate measurements in a 90-nm CMOS process and scaling trends in SRAM from 0.25-µm to 90-nm generation", International Electron Devices Meeting (IEDM), pp. 21.5.1-21.5.4, Dec. 2003.

K. Yamada, H. Maruoka, J. Furuta, and K. Kobayashi, "Radiation-Hardened Flip-Flops with Low-Delay Overhead Using PMOS Pass-Transistors to Suppress SET Pulses in a 65 nm FDSOI Process", IEEE Transactions on Nuclear Science (TNS), vol. 65, no. 8, pp. 1814-1822, Apr. 2018.

J. Furuta, C. Hamanaka, K. Kobayashi, and H. Onodera, "A 65nm Bistable Cross-coupled Dual Modular Redundancy Flip-Flop Capable of Protecting Soft Errors on the C-element", Symposium on VLSI Circuits, pp. 123-124, Honolulu, Hawaii, USA, June 2010.

A. Makihara, T. Yamaguchi, Y. Tsuchiya, T. Arimitsu, H. Asai, Y. Iide, H. Shindou, S. Kuboyama, and S. Matsuda, "SEE in a 0.15 µm fully depleted CMOS/SOI commercial process", IEEE Transactions on Nuclear Science (TNS), vol. 51, no. 6, pp. 3621-3625, 2004.

N. Gaspard, S. Jagannathan, Z. Diggins, M. McCurdy, T. D. Loveless, B. L. Bhuva, L. W. Massengill, W. T. Holman, T. S. Oates, Y. Fang, S. Wen, R. Wong, K. Lilja, and M. Bounasser, "Estimation of hardened flip-flop neutron soft error rates using SRAM multiple-cell upset data in bulk CMOS", International Reliability Physics Symposium (IRPS), pp. SE.6.1-SE.6.5, Apr. 2013.

J. Furuta, K. Kobayashi, and H. Onodera, "Impact of Cell Distance and Well-contact Density on Neutron-induced Multiple Cell Upsets", International Reliability Physics Symposium (IRPS), pp. 6C.3.1-6C.3.4, Monterey, CA, USA, Apr. 2013.

J. Furuta, K. Kobayashi, and H. Onodera, "Impact of Cell Distance and Well-contact Density on Neutron-induced Multiple Cell Upsets", IEICE Transactions on Electronics, vol. E98-C, no. 4, pp. 1745-1353, Apr. 2015.

M. Zhang, S. Mitra, T. M. Mak, N. Seifert, N. J. Wang, Q. Shi, K. S. Kim, N. R. Shanbhag, and S. J. Patel, "Sequential element design with built-in soft error resilience", IEEE Transactions on VLSI Systems, vol. 14, no. 12, pp. 1368-1378, Dec. 2006.

K. Kobayashi, J. Furuta, H. Maruoka, M. Hifumi, S. Kumashiro, T. Kato, and S. Kohri, "A 16 nm FinFET Radiation-hardened Flip-Flop, Bistable Cross-coupled Dual-Modular-Redundancy FF for Terrestrial and Outer-Space Highly-reliable Systems", International Reliability Physics Symposium (IRPS), pp. SE2.1-SE2.3, Monterey, CA, USA, Apr. 2017.

K. T. Chen, T. Fujita, H. Hara, and M. Hamada, "A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40nm CMOS", International Solid-State Circuits Conference (ISSCC), pp. 338-340, Feb. 2011.

H. Maruoka, M. Hifumi, J. Furuta, and K. Kobayashi, "A Non-Redundant Low-Power Flip Flop with Stacked Transistors in a 65 nm Thin BOX FDSOI Process", Conference on Radiation and its Effects on Components and Systems (RADECS), Bremen, Germany, Sept. 2016.

K. Kobayashi, K. Kubota, M. Masuda, Y. Manzawa, J. Furuta, S. Kanda, and H. Onodera, "A Low-Power and Area-Efficient Radiation-Hard Redundant Flip-Flop, DICE ACFF, in a 65 nm Thin-BOX FD-SOI", IEEE Transactions on Nuclear Science (TNS), vol. 61, no. 4, pp. 1881-1888, Aug. 2014.

M. Masuda, K. Kubota, R. Yamamoto, J. Furuta, K. Kobayashi, and H. Onodera, "A 65 nm Low-Power Adaptive-Coupling Redundant Flip-Flop", IEEE Transactions on Nuclear Science (TNS), vol. 60, no. 4, pp. 2750-2755, Aug. 2013.

P. Roche, J. L. Autran, G. Gasiot, and D. Munteanu, "Technology downscaling worsening radiation effects in bulk: SOI to the rescue", International Electron Devices Meeting (IEDM), pp. 31.1.1-31.1.4, Dec. 2013.

Y. Morita, R. Tsuchiya, T. Ishigaki, N. Sugii, T. Iwamatsu, T. Ipposhi, H. Oda, Y. Inoue, K. Torii, and S. Kimura, "Smallest Vth variability achieved by intrinsic silicon on thin BOX (SOTB) CMOS with single metal gate", Symposium on VLSI Technology, pp. 166-167, June 2008.

J. Furuta, J. Yamaguchi, and K. Kobayashi, "A Radiation-Hardened Non-Redundant Flip-Flop, Stacked Leveling Critical Charge Flip-Flop in a 65 nm Thin BOX FD-SOI Process", IEEE Transactions on Nuclear Science (TNS), vol. 63, no. 4, pp. 2080-2086, Aug. 2016.

A. Balasubramanian, B. L. Bhuva, J. D. Black, and L. W. Massengill, "RHBD techniques for mitigating effects of single-event hits using guard-gates", IEEE Transactions on Nuclear Science (TNS), vol. 52, no. 6, pp. 2531-2535, Dec. 2005.

H. Zhang, H. Jiang, T. R. Assis, D. R. Ball, K. Ni, J. S. Kauppila, R. D. Schrimpf, L. W. Massengill, B. L. Bhuva, B. Narasimham, S. Hatami, A. Anvar, A. Lin, and J. K. Wang, "Temperature dependence of soft-error rates for FF designs in 20-nm bulk planar and 16-nm bulk FinFET technologies", International Reliability Physics Symposium (IRPS), pp. 5C-3-1-5C-3-5, Apr. 2016.

K. Yamada, J. Furuta, and K. Kobayashi, "Radiation-Hardened Flip-Flops with Small Area and Delay Overheads Using Guard-Gates in FDSOI Processes", SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), Burlingame, CA, USA, Oct. 2018.

S. Kaeriyama et al., "A nonvolatile programmable solid-electrolyte nanometer switch", IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 168-176, Jan. 2005.

M. Ebara, K. Yamada, K. Kojima, Y. Tsukita, J. Furuta, and K. Kobayashi, "Evaluation of Soft-Error Tolerance by Neutrons and Heavy Ions on Flip Flops with Guard Gates in a 65 nm Thin BOX FDSOI Process", Conference on Radiation and its Effects on Components and Systems (RADECS), no. F-1, Montpellier, France, Sept. 2019.

## Author Biography



Kazutoshi Kobayashi received his B.E., M.E. and Ph.D. degrees in Electronic Engineering from Kyoto University, Japan, in 1991, 1993 and 1999, respectively.

Starting as an assistant professor in 1993, he was promoted to associate professor in the Graduate School of Informatics, Kyoto University, and held that position until 2009. For two years during this period, he served as an associate professor at the VLSI Design and Education Center (VDEC), the University of Tokyo. Since 2009, he has been a professor at Kyoto Institute of Technology.

While his past work focused on reconfigurable architectures utilizing device variations, his current research interest is improving the reliability (soft errors, bias temperature instability and plasma-induced damage) of current and future VLSIs. Since 2013, he has also been conducting research on gate drivers for power transistors.

He received the IEICE Best Paper Award in 2009 and the IRPS Best Poster Award in 2013.

#### Supplemental Slides

## **Terrestrial and Outer Space**



- SEEs must be mitigated in a huge number of products

## **Technology Downscaling Trend**



- FIT/Mbit decreases with process scaling, but integration density increases
- FIT rate per area is 8x higher at 28 nm than at 150 nm

https://www.xilinx.com/support/documentation/white\_papers/wp395-Mitigating-SEUs.pdf

# Scaling Trend

- Technology scaling on bulk
  - 0.5x / Gen [Intel 14nm]
  - Probability of neutron hit  $\downarrow \rightarrow$  SER  $\downarrow$
  - Critical charge  $\downarrow \rightarrow$ SER  $\uparrow$
  - Beyond 28 nm, the sensitive area becomes larger than a transistor, so scaling no longer decreases the probability of a neutron hit
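The two competing bulk trends can be illustrated with an empirical model widely attributed to Hazucha and Svensson, in which SER ∝ flux × sensitive area × exp(-Q<sub>crit</sub>/Q<sub>s</sub>). This model is not part of the tutorial. The sketch uses it only to show how a smaller sensitive area lowers SER while a smaller critical charge raises it exponentially. All numbers are hypothetical, and Q<sub>s</sub> (charge-collection efficiency) is held fixed for simplicity.

```python
import math


def ser(flux: float, area: float, qcrit: float, qs: float, k: float = 1.0) -> float:
    """Empirical SER model: SER = K * flux * A * exp(-Qcrit / Qs).

    flux  : particle flux hitting the chip
    area  : sensitive (charge-collecting) area of the cell
    qcrit : critical charge needed to flip the cell
    qs    : charge-collection efficiency (same unit as qcrit)
    k     : fitting constant
    """
    return k * flux * area * math.exp(-qcrit / qs)


def generation_ratio(area_scale: float, qcrit_old: float, qcrit_new: float,
                     qs: float) -> float:
    """SER(new node) / SER(old node) for one cell, with flux, K and Qs unchanged."""
    return ser(1.0, area_scale, qcrit_new, qs) / ser(1.0, 1.0, qcrit_old, qs)


if __name__ == "__main__":
    # Hypothetical numbers: sensitive area halves, Qcrit drops from 2.0 to 1.4 fC.
    print(f"area only : {generation_ratio(0.5, 2.0, 2.0, qs=0.5):.2f}")
    print(f"area+Qcrit: {generation_ratio(0.5, 2.0, 1.4, qs=0.5):.2f}")
```

With these made-up numbers, halving the area alone gives 0.50x, but adding the Q<sub>crit</sub> drop gives about 1.66x, so the exponential term wins. Once the sensitive area stops shrinking with the transistor, only the Q<sub>crit</sub> term remains.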



- Technology scaling on FinFET
  - 0.2x / Gen [Intel 14nm]
  - Current drive capability ∝ Fin height (No area overhead)
    - Does not increase reverse-biased drain junction area [Intel 14nm]

## SERs on 65/28 nm FDSOI



- Heavy ions produce more SEUs than neutrons
  - This makes it possible to compare SERs in SOI processes
- 28 nm is 18x stronger than 65 nm in errors/bit and 5x stronger in errors/area
- Transistor volume on SOI scales with the process node
  - Transistor volume on bulk includes the substrate region
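The two ratios are consistent only if bit density also changed between the nodes. Write errors per area as errors per bit ($e$) times bits per area ($D$), and use the 65 nm/28 nm ratios quoted above. The quoted numbers then imply:

$$
\frac{e_{65} D_{65}}{e_{28} D_{28}} = \frac{e_{65}}{e_{28}} \cdot \frac{D_{65}}{D_{28}}
\quad\Rightarrow\quad
5 = 18 \cdot \frac{D_{65}}{D_{28}}
\quad\Rightarrow\quad
\frac{D_{28}}{D_{65}} = \frac{18}{5} = 3.6
$$

That is, the measured 28 nm structures hold roughly 3.6x more bits per unit area. This figure is only what the two quoted ratios imply, not a separately reported density.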

# SEU, SET and MCU

- Single Event Upset (SEU)
  - A particle hit on a storage cell (SRAM or FF) flips the stored datum
- Single Event Transient (SET)
  - Transient pulse induced by a particle hit
  - Becomes an SEU if captured by a storage element
- Multiple Cell Upset (MCU)
  - A single particle hit flips multiple bits
  - Charge sharing: generated carriers are collected by multiple nodes
  - Parasitic Bipolar Effect (PBE): an elevated well potential turns on transistors
  - Multiple hit: a particle penetrates multiple storage cells
  - MBU: an MCU within a single memory word
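Whether an MCU becomes an MBU depends on how bits are mapped to words. With bit interleaving (listed in the outline among the mitigation techniques), physically adjacent cells belong to different words. An MCU from charge sharing then leaves at most one flipped bit per word, which single-error-correcting ECC can fix. The sketch assumes a hypothetical single-row layout in which column `c` belongs to word `c % interleave`.

```python
from typing import Dict, Iterable, List


def word_of(column: int, interleave: int) -> int:
    """Hypothetical layout: with `interleave`-way bit interleaving, bit j of
    word w sits at column j * interleave + w, so column c belongs to word
    c % interleave."""
    return column % interleave


def classify(upset_columns: Iterable[int], interleave: int) -> Dict[str, object]:
    """Classify the cells flipped by ONE particle within one memory row."""
    cols = sorted(set(upset_columns))
    per_word: Dict[int, List[int]] = {}
    for c in cols:
        per_word.setdefault(word_of(c, interleave), []).append(c)
    return {
        "mcu": len(cols) > 1,                                # several cells flipped
        "mbu": any(len(v) > 1 for v in per_word.values()),  # several bits of one word
        "bits_per_word": {w: len(v) for w, v in per_word.items()},
    }


if __name__ == "__main__":
    hit = [4, 5]  # two physically adjacent cells flipped by charge sharing
    print(classify(hit, interleave=1))  # no interleaving
    print(classify(hit, interleave=4))  # 4-way interleaving
```

The same two-cell hit is an MBU without interleaving, but with 4-way interleaving it becomes two single-bit errors in different words. Cells that are 4 columns apart would still land in the same word.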




#### # of SEU on Satellites



- # of SEUs/day/bit at geostationary orbit
- A few SEUs/day/Mbit
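Space rates are usually quoted per day, while the terrestrial rates in this tutorial use FIT (errors per 10<sup>9</sup> hours). A small conversion shows the gap to the ~1000 FIT/Mbit terrestrial figure quoted earlier. For illustration only, it assumes that "a few" means 3 SEU/day/Mbit.

```python
HOURS_PER_DAY = 24


def per_day_to_fit(errors_per_day: float) -> float:
    """Convert an error rate in errors/day to FIT (errors per 1e9 hours)."""
    return errors_per_day / HOURS_PER_DAY * 1e9


if __name__ == "__main__":
    geo = per_day_to_fit(3)  # "a few" taken as 3 SEU/day/Mbit (assumption)
    ground = 1000.0          # ~1000 FIT/Mbit for standard SRAM/FF on the ground
    print(f"GEO: {geo:.3g} FIT/Mbit, about {geo / ground:.0f}x the terrestrial rate")
```

Under this assumption, 3 SEU/day/Mbit is 1.25 × 10<sup>8</sup> FIT/Mbit, roughly five orders of magnitude above the terrestrial rate.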

### **Fabrication Process**



- 65 nm FDSOI with a thin BOX layer (10-15 nm), developed by Renesas Electronics
  - No channel doping, to reduce process variations
  - Substrate bias is controlled through the thin BOX
  - Similar to the 28 nm thin-BOX FDSOI of STMicroelectronics, which has a 25 nm BOX layer

[Y. Morita, VLSI Tech. Symp., pp 166-167, 2008]

### **Device Simulation Results**



#### Dual Lock-step for Automotive

- Double modular redundancy: a lock-step dual core (LSDC) runs a main CPU and a checker CPU in lock-step and compares their outputs
- Abbreviations
  - LSDC: Lock-step dual core
  - A/D Conv.: Analog-Digital Converter
  - R/D Conv.: Resolver-Digital Converter
  - IMTS: Intelligent Motor-Timer System

[H. Kimura et al., ISSCC2017, 3-5]
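The dual lock-step idea can be sketched behaviorally: both cores receive the same inputs every cycle, and a comparator flags the first mismatch. This is a minimal sketch, not the design in the cited ISSCC paper. `run_lockstep`, the toy `accumulate` core and the `upset` injection are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LockstepResult:
    outputs: List[int] = field(default_factory=list)
    mismatch_cycle: int = -1  # -1 means main and checker always agreed


def run_lockstep(step: Callable[[int, int], int], inputs: List[int],
                 upset: Tuple[int, int] = (-1, 0)) -> LockstepResult:
    """Run a main core and a checker core in lock-step and compare every cycle.

    `step(state, x)` is a hypothetical core model returning the next state.
    `upset = (cycle, mask)` XORs `mask` into the main core's state at `cycle`
    to emulate an SEU.
    """
    main = checker = 0
    result = LockstepResult()
    for cycle, x in enumerate(inputs):
        main, checker = step(main, x), step(checker, x)
        if cycle == upset[0]:
            main ^= upset[1]
        if main != checker:
            # Dual redundancy only detects the error; it cannot tell which core
            # is wrong, so the system would stop or fall back to a safe state here.
            result.mismatch_cycle = cycle
            return result
        result.outputs.append(main)
    return result


def accumulate(state: int, x: int) -> int:
    """Toy 8-bit accumulator used as the core model."""
    return (state + x) & 0xFF


if __name__ == "__main__":
    print(run_lockstep(accumulate, [1, 2, 3, 4]))
    print(run_lockstep(accumulate, [1, 2, 3, 4], upset=(2, 0x10)))
```

Injecting a flipped bit (mask 0x10) into the main core at cycle 2 is caught in that same cycle. Two copies only detect the error; correcting it requires a third copy and majority voting.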