# Measurement Results of Within-Die Variations on a 90nm LUT Array for Speed and Yield Enhancement of Reconfigurable Devices

Kazuya Katsuki, Manabu Kotani, Kazutoshi Kobayashi and Hidetoshi Onodera

Graduate School of Informatics, Kyoto University, Kyoto, Japan. {katsuki, kotani, kobayasi, onodera}@vlsi.kuee.kyoto-u.ac.jp

Abstract— The speed and yield of reconfigurable devices can be enhanced by exploiting within-die (WID) variations. An LUT array LSI is fabricated in a 90nm process to measure WID and die-to-die (D2D) variations. Performance fluctuations are measured by counting the number of LUTs through which a signal passes within a given time. Both D2D and WID variations are clearly observed in the measurements.

## I. INTRODUCTION

Process scaling makes it possible to integrate billions of transistors on a die, but it is difficult to manufacture such small transistors with uniform characteristics. Down-scaling increases the variation in transistor performance, which differs both die-to-die (D2D) and within-die (WID). As reported in [1], WID variations are clearly observed in a 90nm process, and they become dominant as process scaling proceeds [2].

The degradation of transistor performance caused by variations affects gate delay and ultimately degrades the speed and yield of LSIs. On the other hand, WID variations can be compensated for by reconfiguration. If a circuit is reconfigured according to the process variations of each chip, speed and yield can be enhanced. To place functional blocks according to the process variations, the variation data must first be obtained. Once they are obtained, functional blocks can be placed according to the measured variations of each chip and the lengths of the blocks' critical paths.

We have fabricated an LUT array LSI to confirm whether WID variations are clearly observable in reconfigurable devices. Its structure is described in Section II, measurement results are presented in Section III, and Section IV concludes the paper.

## II. FABRICATED LUT ARRAY

Fig. 1 shows the structure of a logic block (LB), which contains a 4-bit LUT and a scan flip-flop (SDFF). An LUT consists of 16 flip-flops that store the LUT configuration and five MUX4s (4-input multiplexers). The output signal **Mout** from the MUX4 is sent to the adjacent LUT. Fig. 2 shows the array structure of logic blocks in the fabricated chip. They are laid out in a fractal structure to observe scalable process variations. If they were laid out in a line, WID variations might cancel out. The fractal structure makes it possible to measure WID variations in scalable square regions.

Fig. 1. Structure of a logic block. During measurement, a signal is transmitted along the dashed arrow through two MUX4s per LB.

Fig. 2. Structure of the LUT array. LBs are connected in a fractal structure to observe scalable process variations.
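One way to picture why a fractal ordering yields scalable square regions is a Hilbert curve. The text does not say which fractal is used on the chip, so the choice of curve, the function name `hilbert_d2xy`, and the 32×32 grid below are assumptions made purely for illustration.

```python
def hilbert_d2xy(side, d):
    """Map chain index d to (x, y) on a side x side grid (side is a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or reflect the sub-square built so far.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


SIDE = 32  # hypothetical grid size, not the chip's actual floorplan
coords = [hilbert_d2xy(SIDE, d) for d in range(SIDE * SIDE)]


def region_is_square(start, count):
    """Check that `count` consecutive chain positions fill one square region."""
    pts = coords[start:start + count]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    return width == height and width * height == count and len(set(pts)) == count


def chain_is_adjacent():
    """Check that every LB in the chain neighbors the next one."""
    return all(
        abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
        for a, b in zip(coords, coords[1:])
    )
```

In this sketch every aligned run of 4, 16, 64, ... consecutive LBs covers a square, so the same chain can be read out at several region sizes, whereas a straight line of LBs would span the whole die and average the variation away.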

To measure the process variations, a signal is propagated through the LUTs starting from the first LB in a square region, and its arrival is captured by the SDFF in each LB. The LUTs are configured as follows during the measurement.

- The LUT in the first LB is configured to become true for any input value.
- The LUT in the second LB is configured to become true if the input **B** from the previous **Sout** becomes true.
- The LUTs in the other LBs are configured to become true only if the input **A** from **Mout** becomes true.
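The three configurations above can be written as 16-entry truth tables, one entry per flip-flop of the LUT. This is a minimal sketch: the input names `c` and `d`, the address bit order (A as the least significant bit), and the helper `lut_table` are assumptions, since the text only names inputs **A** and **B**.

```python
def lut_table(predicate):
    """Build the 16-entry truth table of a 4-input LUT from a Boolean function."""
    table = []
    for addr in range(16):
        a = bool(addr & 1)  # assumed: input A is address bit 0
        b = bool(addr & 2)  # assumed: input B is address bit 1
        c = bool(addr & 4)
        d = bool(addr & 8)
        table.append(1 if predicate(a, b, c, d) else 0)
    return table


# First LB: true for any input value.
FIRST_LB = lut_table(lambda a, b, c, d: True)
# Second LB: true if input B (driven by the previous Sout) is true.
SECOND_LB = lut_table(lambda a, b, c, d: b)
# Remaining LBs: true only if input A (driven by the previous Mout) is true.
OTHER_LB = lut_table(lambda a, b, c, d: a)
```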

When a clock pulse is applied to the SDFFs under the above LUT configuration, **Sout** of the first LB becomes true, and this true signal is transmitted through the LBs. While the signal is propagating, another clock pulse is applied to the SDFFs. The SDFFs in the LBs that the true signal has already reached then become true. If WID variations are present, the number of LBs reached will differ between square regions, as shown in Fig. 2. Fig. 3 shows a micrograph of the fabricated LSI.

Fig. 3. Chip micrograph of the 90nm LUT array LSI, which includes 2,048 logic blocks located at the bottom.
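The capture step can be modeled in a few lines. This is a behavioral sketch only: the per-LB delay values are invented, and the functions `capture` and `count_transmitted` are hypothetical names, not part of the measurement setup described here.

```python
def capture(lb_delays_ns, clock_period_ns):
    """Return the SDFF values latched by the second clock pulse.

    lb_delays_ns[i] is the assumed propagation delay of the i-th LB in the chain.
    An SDFF latches 1 if the true signal reached its LB within the clock period.
    """
    elapsed = 0.0
    latched = []
    for delay in lb_delays_ns:
        elapsed += delay
        latched.append(1 if elapsed <= clock_period_ns else 0)
    return latched


def count_transmitted(lb_delays_ns, clock_period_ns):
    """Number of LBs the signal passed through before the second pulse."""
    return sum(capture(lb_delays_ns, clock_period_ns))


# Two hypothetical 16-LB regions measured with the same 4.0 ns clock period.
slow_region = [0.5] * 16   # made-up delay per LB, in ns
fast_region = [0.25] * 16  # made-up delay per LB, in ns
slow_count = count_transmitted(slow_region, 4.0)
fast_count = count_transmitted(fast_region, 4.0)
```

A faster region lets the signal pass more LBs in the same period, which is the difference between square regions that the measurement looks for.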

## III. MEASUREMENT RESULTS

In a single measurement, little variation appears because the transistor speed is quantized as a number of LBs. To overcome the quantization and resolve the differences clearly, the clock cycle (the time allowed for transmission) is varied from 4.0ns to 8.0ns in 0.1ns steps. The measurement is repeated 100 times per clock cycle at a resolution of 16 ( $4\times4$ ) LBs, and the average of the 100 results is taken as the number of transmissions at that cycle. With the clock cycle on the horizontal axis and the average number of transmissions on the vertical axis, the gradient is calculated by the least-squares method. The gradient depends on the performance of each block of LBs, and the ratio of two gradients is equivalent to the ratio of the speeds. We therefore use these gradients as the performance indicator.
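The gradient extraction can be sketched as an ordinary least-squares line fit over the sweep. The data below are synthetic and chosen only to make the arithmetic visible. The function names and the two example blocks are assumptions, not measured values.

```python
def average_count(trials):
    """Average the transmission counts from repeated measurements at one clock cycle."""
    return sum(trials) / len(trials)


def least_squares_gradient(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Clock cycles from 4.0 ns to 8.0 ns in 0.1 ns steps (41 points).
clock_ns = [4.0 + 0.1 * i for i in range(41)]

# Synthetic averaged counts for two hypothetical 16-LB blocks.
slow_block = [2.0 * t + 1.0 for t in clock_ns]  # 2.0 LBs per ns
fast_block = [2.2 * t + 1.0 for t in clock_ns]  # 2.2 LBs per ns

slow_gradient = least_squares_gradient(clock_ns, slow_block)
fast_gradient = least_squares_gradient(clock_ns, fast_block)
speed_ratio = fast_gradient / slow_gradient
```

Averaging many repeats at each cycle turns the integer count into a fractional value, and fitting across the whole sweep reduces the effect of the remaining quantization steps on the slope.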

Fig. 4 shows the statistics of WID variations on one fabricated chip. The peripheral LBs tend to be fast and the central LBs slow. The other 24 chips show the same tendency. A possible reason for this concave delay curve is performance degradation in the central portion caused by IR drop. To separate the WID and D2D variations, the statistics from the 25 dice are averaged for every block of 16 LBs. These averaged gradients are called the reference delays. The average of the residual errors between the measured and reference delays on a die is regarded as the D2D process variation of that die, which is shown in Fig. 5. Fig. 6 shows the WID variations of the slowest, typical and fastest chips. Each distribution is obtained by subtracting the measured delays from the reference delays. The three distributions are close to Gaussian, which suggests that the residual-based method is practical for extracting WID variations. Figs. 5 and 6 reveal that the WID and D2D variations are of the same order in the 90nm process.
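The residual-based separation can be sketched as follows. Two points are assumptions rather than statements from the text: the sign convention (measured minus reference) and the step that subtracts each die's mean residual to center its WID distribution. The function name `decompose` and the toy data are also hypothetical.

```python
def decompose(measured):
    """Split per-block indicators into reference, D2D, and WID parts.

    measured[d][b] is the performance indicator of block b on die d.
    """
    n_dice = len(measured)
    n_blocks = len(measured[0])
    # Reference value per block: average over all dice.
    reference = [
        sum(die[b] for die in measured) / n_dice for b in range(n_blocks)
    ]
    # Residual of each block against the reference (assumed sign convention).
    residuals = [
        [die[b] - reference[b] for b in range(n_blocks)] for die in measured
    ]
    # D2D variation of a die: average residual over its blocks.
    d2d = [sum(row) / n_blocks for row in residuals]
    # WID variation: residual with the die-level offset removed (assumption).
    wid = [[r - offset for r in row] for row, offset in zip(residuals, d2d)]
    return reference, d2d, wid


# Toy data: 2 dice with 2 blocks each, where die 1 is uniformly faster.
reference, d2d, wid = decompose([[1.0, 3.0], [3.0, 5.0]])
```

In the toy data the two dice differ only by a constant offset, so the whole difference is attributed to D2D variation and the WID part is zero.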



Fig. 4. Statistics of a fabricated die, using the gradient from the least-squares method as the performance indicator. Peripheral LBs are fast and central ones are slow.



Fig. 5. Observed D2D variations, obtained from the average residual error between the measured values and the averaged reference values.

Fig. 6. Extracted WID variations from three representative chips. Left: slowest, Center: typical, Right: fastest. They are similar to the Gaussian distribution. The scale of the x axis is the same as in Fig. 5.

## IV. CONCLUSION

We propose compensating for WID variations by reconfiguration. An LUT array LSI was fabricated in a 90nm process to measure the process variations of reconfigurable devices. D2D and WID variations are clearly observed on the fabricated chips, and they are of the same order in the 90nm process. It is therefore likely that WID variations will become dominant in the near future, which suggests that the proposed method will be effective.

## REFERENCES

- [1] S. Ohkawa, M. Aoki, and H. Masuda. Analysis and Characterization of Device Variations in an LSI Chip Using an Integrated Device Matrix Array. *IEEE Transactions on Semiconductor Manufacturing*, *Vol.17, No.2*, pages 155–165, 2004.
- [2] Samie B. Samaan. The Impact of Device Parameter Variations on the Frequency and Performance of VLSI Chips. In *ICCAD2004*, pages 343–346, 2004.
- [3] Kazuya Katsuki, Manabu Kotani, Kazutoshi Kobayashi, and Hidetoshi Onodera. A Yield and Speed Enhancement Scheme under Within-die Variations on 90nm LUT Array. In Proceedings of IEEE 2005 Custom Integrated Circuits Conference, pages 601–604, 2005.