JEI Letters

New automatic defect classification algorithm based on a classification-after-segmentation framework

Sang-Hak Lee, Hyung-Il Koo, Nam-Ik Cho

Seoul National University, Department of Electrical Engineering and Computer Science, San 56-1, Shilim-Dong, Kwanak-Gu, Seoul, Korea

J. Electron. Imaging. 19(2), 020502 (June 01, 2010). doi:10.1117/1.3429116
History: Received September 07, 2009; Revised April 12, 2010; Accepted April 19, 2010; Published June 01, 2010; Online June 01, 2010

Open Access

We propose a new method that classifies wafer images according to their defect types for automatic defect classification in semiconductor fabrication processes. Conventional image classifiers using global properties cannot be used in this problem, because the defects usually occupy very small regions in the images. Hence, the defects should first be segmented, and the shape of the segment and the features extracted from the region are used for classification. In other words, we need to develop a classification-after-segmentation approach for the use of features from the small regions corresponding to the defects. However, the segmentation of scratch defects is not easy due to the shrinking bias problem when using conventional methods. We propose a new Markov random field-based method for the segmentation of wafer images. Then we design an AdaBoost-based classifier that uses the features extracted from the segmented local regions.


Automatic defect classification (ADC) is a step in wafer fabrication that classifies defects into predefined types, e.g., particle, scratch, etc. Figures 1(a) and 1(b) show examples of a particle and scratches, respectively. By correctly classifying the defects, the cause of the defects can be analyzed, and this information is used for improving the process and consequently the yield. Hence, there have been many studies on ADC. For example, Kameyama and Kosugi [1] proposed a method that exploits a hyperellipsoid clustering network (HCN) with radial basis functions (RBF) and model switching. Also, smart beam search (SBS) using a support vector machine (SVM) was proposed for feature selection [2]. However, it is difficult to apply conventional appearance-based pattern classification methods (e.g., techniques used in face detection [3]) to ADC, because defects in the same class vary widely in shape. Also, since the defect regions occupy very small portions of the image, global feature statistics (e.g., frequency or filter bank responses) are of little use. To resolve these difficulties, we propose a new method based on classification-after-segmentation.

Fig. 1. Example images and segmentation results: (a) particle image, (b) scratch image, (c) proposed segmentation for particle image, (d) proposed segmentation for scratch image, (e) Potts model-based segmentation for particle image, and (f) Potts model-based segmentation for scratch image.

The most essential part of the process is arguably the correct segmentation of defects. Due to the shrinking bias problem [7], however, conventional state-of-the-art segmentation methods based on maximum a posteriori Markov random fields (MAP-MRF) with a Potts model smoothness term [4-6] often oversegment a scratch (a thin object) into several objects, and thus a scratch can appear as particles. To deal with this problem, we propose a new MAP-MRF based segmentation method. To be specific, we develop a new energy function for the MAP-MRF scheme based on the Retinex theory [8] for handling the blurry scratches and the color inconsistency. After segmentation, we define several features (shape, intensity, and so on) for each defect, and develop an AdaBoost classifier to tell whether the patch corresponds to a particle or to another defect (including a scratch).

In the experiments, we show that our segmentation method reduces the oversegmentation of scratches and thus keeps the false alarm rate low, even for high particle detection rates.

Among the many segmentation methods, the state-of-the-art MAP-MRF approach is adopted here [4-6]. That is, segmentation is achieved by minimizing an energy function:

\[ E(f) = \sum_{p \in P} V_p(f_p) + \sum_{p \in P} \sum_{q \in N(p)} V_{p,q}(f_p, f_q), \tag{1} \]

where $P$ is a set of sites, $N(p)$ is an eight-neighborhood system, $f_p \in \{1, \ldots, N\}$ is a label of the site $p$, and $f = \{f_p\}_{p \in P}$. The label $f_p$ indicates that the site $p$ belongs to the segment whose label is $f_p$. Specifically, the data term $V_p(f_p)$ is designed so that pixels having similar intensity values are clustered into a single segment:

\[ V_p(f_p) = |I(p) - M \times f_p|, \tag{2} \]

where $I(p)$ is the intensity of a given pixel $p$, and $M$ denotes the difference of intensity between the adjacent labels. $V_{p,q}(f_p, f_q)$ is called the smoothness term, defined as

\[ V^C_{p,q}(f_p, f_q) = \exp\left[ -\frac{|I(p) - I(q)|^2}{\sigma^2} \right] \delta(f_p, f_q), \tag{3} \]

where $\sigma$ is a constant and $\delta(f_p, f_q)$ is defined as

\[ \delta(f_p, f_q) = \begin{cases} 1 & \text{if } f_p \neq f_q \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

in conventional works [5,6]. $V^C_{p,q}(f_p, f_q)$ enforces the continuity of labels by penalizing label discontinuities. However, it is well known that since $V^C_{p,q}(f_p, f_q)$ tends to minimize the boundary length [9] (shrinking bias), this method often segments a long and thin object into several parts. Also, blurred boundaries deteriorate the performance of $V_p(\cdot)$ (especially when capturing thin objects). Hence, many long and thin objects in wafer images are frequently segmented into several parts, and they are often confused with particles.
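As a rough illustration of how the energy in Eq. 1 is evaluated for a candidate labeling, the following minimal sketch computes the data term of Eq. 2 and the contrast-weighted Potts term of Eqs. 3 and 4 on a toy image. The list-of-lists image layout, the values of M and σ, and the exponent form exp(-|I(p)-I(q)|²/σ²) are assumptions made for the example; the minimization itself (e.g., by graph cuts) is not shown.

```python
import math

def neighbors8(r, c, rows, cols):
    """Yield the eight-neighborhood N(p) of site p = (r, c) inside the grid."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc

def data_term(intensity, label, M):
    """Eq. 2: |I(p) - M * f_p|."""
    return abs(intensity - M * label)

def contrast_term(ip, iq, fp, fq, sigma):
    """Eqs. 3-4: exp(-|I(p)-I(q)|^2 / sigma^2) if the labels differ, else 0."""
    if fp == fq:
        return 0.0
    return math.exp(-((ip - iq) ** 2) / sigma ** 2)

def energy(image, labels, M, sigma):
    """Eq. 1: data terms plus pairwise terms over all ordered neighbor pairs."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += data_term(image[r][c], labels[r][c], M)
            for rr, cc in neighbors8(r, c, rows, cols):
                total += contrast_term(image[r][c], image[rr][cc],
                                       labels[r][c], labels[rr][cc], sigma)
    return total

# Toy 3x5 image (assumed values): a thin bright line on a darker background.
image = [[64] * 5, [128] * 5, [64] * 5]
one_segment = [[1] * 5 for _ in range(3)]       # every site gets label 1
two_segments = [[1] * 5, [2] * 5, [1] * 5]      # the line gets label 2

e_one = energy(image, one_segment, M=64, sigma=32.0)
e_two = energy(image, two_segments, M=64, sigma=32.0)
print(e_one, e_two)
```

In this toy case the labeling that follows the line has the lower energy, because the intensity contrast across the boundary is large. When the boundary is blurred, the contrast weight approaches 1 and every boundary pixel pair becomes expensive, which is the situation in which the shrinking bias appears.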

To prevent oversegmentation, we develop a new pairwise function, which is added to the smoothness term (see Eq. 7):

\[ V^R_{p,q}(f_p, f_q) = \{ 1 - \delta[L(p), L(q)] \} \times \delta(f_p, f_q), \tag{5} \]

where $L(p)$ is a Retinex filtering of the input [8]. To be specific, $L(p)$ for our purpose is defined as

\[ L(p) = \begin{cases} 1, & \text{if } I(p)/G(p) < T \\ 0, & \text{otherwise} \end{cases} \tag{6} \]

where $G(p)$ is a Gaussian blurred image at pixel $p$, and $T$ is a predefined parameter (in our implementation, $T$ is 0.9). Note that the Retinex filtering compensates for the blurring in the image formation process, and for the color inconsistency caused by illumination change [8]. Therefore, $L(p) = L(q)$ and $p \in N(q)$ means that $p$ and $q$ have the same intrinsic colors [even if $|I(p) - I(q)| \gg 0$], and the labels satisfying $f_p \neq f_q$ should be penalized. In summary, $V^R_{p,q}(f_p, f_q)$ alleviates the shrinking bias by penalizing the discontinuities occurring on the pixels having the same intrinsic colors. A similar idea can also be found in the binarization of document images [10]. Finally, the smoothness term in Eq. 1 is given by

\[ V_{p,q}(f_p, f_q) = \lambda_1 V^C_{p,q}(f_p, f_q) + \lambda_2 V^R_{p,q}(f_p, f_q), \tag{7} \]

where $\lambda_1$ and $\lambda_2$ are two balancing parameters.
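The Retinex label of Eq. 6 and the combined smoothness term of Eq. 7 can be sketched as follows. This is only an illustration under assumptions: the blur width and radius, the clamped borders, the values of σ, λ1, and λ2, and the exponent form of the contrast term are not specified in the text and are chosen here for the example. Only T = 0.9 comes from the text.

```python
import math

def gaussian_blur(image, sigma=2.0, radius=4):
    """Separable Gaussian blur with clamped borders; returns G as a list of rows."""
    rows, cols = len(image), len(image[0])
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma))
              for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    kernel = [w / norm for w in kernel]

    def clamp(i, n):
        return max(0, min(n - 1, i))

    horiz = [[sum(w * image[r][clamp(c + k - radius, cols)]
                  for k, w in enumerate(kernel))
              for c in range(cols)] for r in range(rows)]
    return [[sum(w * horiz[clamp(r + k - radius, rows)][c]
                 for k, w in enumerate(kernel))
             for c in range(cols)] for r in range(rows)]

def retinex_label(image, T=0.9, sigma=2.0, radius=4):
    """Eq. 6: L(p) = 1 if I(p) / G(p) < T, else 0.

    The test is written as I(p) < T * G(p), which is equivalent for G(p) > 0
    and avoids a division by zero.
    """
    blurred = gaussian_blur(image, sigma, radius)
    return [[1 if image[r][c] < T * blurred[r][c] else 0
             for c in range(len(image[0]))] for r in range(len(image))]

def smoothness(ip, iq, Lp, Lq, fp, fq, sigma_c=32.0, lam1=1.0, lam2=2.0):
    """Eq. 7: lam1 * V^C + lam2 * V^R for one neighboring pair (p, q)."""
    if fp == fq:                       # delta(f_p, f_q) = 0: no penalty
        return 0.0
    v_c = math.exp(-((ip - iq) ** 2) / sigma_c ** 2)   # Eq. 3 (assumed form)
    v_r = 1.0 if Lp == Lq else 0.0                     # Eq. 5
    return lam1 * v_c + lam2 * v_r

# Toy 9x9 image (assumed values): a dark one-pixel line on a bright background.
image = [[200] * 9 for _ in range(9)]
image[4] = [100] * 9
labels = retinex_label(image)
for row in labels:
    print(row)
```

In this example only the pixels on the dark line receive L(p) = 1, so cutting the line into two labels along its length incurs the extra λ2 penalty, while the boundary between the line and the background does not.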

Since each particle and scratch is segmented into a single region by the proposed segmentation algorithm, we can use several features extracted from each segment to determine whether it is a particle defect or not.

Features

Let $S$ be the set of position vectors in a given segment, $|S|$ be the size of $S$, and $I(x,y)$ be the pixel intensity at $(x,y) \in S$. Then the features can be summarized as follows.

Mean intensity:

\[ \frac{1}{|S|} \sum_{(x,y) \in S} I(x,y). \tag{8} \]

Shape descriptor:

\[ \frac{\lambda_{\max}}{\lambda_{\min}}, \tag{9} \]

where $\lambda_{\max}$ and $\lambda_{\min}$ are the two eigenvalues of the covariance matrix of $S$ (when the value is close to unity, the shape has no directional preference, and $\lambda_{\max}/\lambda_{\min} \gg 1$ means that the shape is thin and elongated).

Texture measure:

\[ \sum_{i=1}^{4} \lambda_i^2, \tag{10} \]

where the $\lambda_i$ are the eigenvalues of the covariance matrix of the vectors at points $(x,y) \in S$,

\[ \left( \left| \frac{\partial I}{\partial x} \right|, \left| \frac{\partial I}{\partial y} \right|, \left| \frac{\partial^2 I}{\partial x^2} \right|, \left| \frac{\partial^2 I}{\partial y^2} \right| \right). \tag{11} \]

The vector in Eq. 11 is computed over $(x,y) \in S$ and the covariance matrix of these vectors is found. Then, Eq. 10 is defined as the measure of texture, which has a large value when there exists a particular directional texture.

Measure of orientation bias (of edges): obtained from the histogram of

\[ \arctan\left( \frac{\partial I / \partial y}{\partial I / \partial x} \right). \tag{12} \]

By computing the orientation histogram and summing the sizes of the four dominant bins, we can measure the bias of the orientation distribution.
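The four features can be sketched in a few lines. The following is a minimal illustration, not the implementation used in the experiments: it assumes that a segment is a list of (x, y) tuples, that the image is a list of rows indexed as image[y][x], that derivatives are central differences at interior pixels, and that the orientation histogram has 16 bins and is normalized to a fraction. The texture measure uses the identity that, for a symmetric matrix, the sum of squared eigenvalues equals the sum of its squared entries.

```python
import math

def mean_intensity(segment, image):
    """Eq. 8: average of I(x, y) over the segment S."""
    return sum(image[y][x] for x, y in segment) / len(segment)

def covariance(vectors):
    """Covariance matrix (dividing by n) of a list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    return [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / n
             for j in range(d)] for i in range(d)]

def shape_descriptor(segment):
    """Eq. 9: lambda_max / lambda_min of the 2x2 covariance of the positions."""
    (a, b), (_, c) = covariance(segment)
    half_trace = (a + c) / 2.0
    root = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam_max, lam_min = half_trace + root, half_trace - root
    return math.inf if lam_min <= 0 else lam_max / lam_min

def derivative_vector(image, x, y):
    """Signed derivatives behind Eq. 11 (central differences, interior pixel)."""
    ix = (image[y][x + 1] - image[y][x - 1]) / 2.0
    iy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    ixx = image[y][x + 1] - 2 * image[y][x] + image[y][x - 1]
    iyy = image[y + 1][x] - 2 * image[y][x] + image[y - 1][x]
    return ix, iy, ixx, iyy

def texture_measure(segment, image):
    """Eq. 10: sum of squared eigenvalues of the covariance of Eq. 11 vectors."""
    vecs = [tuple(abs(v) for v in derivative_vector(image, x, y))
            for x, y in segment]
    cov = covariance(vecs)
    return sum(e * e for row in cov for e in row)   # trace(C^2) for symmetric C

def orientation_bias(segment, image, bins=16, top=4):
    """Eq. 12: fraction of edge orientations in the `top` largest bins."""
    hist = [0] * bins
    for x, y in segment:
        ix, iy, _, _ = derivative_vector(image, x, y)
        if ix == 0 and iy == 0:
            continue                                # no edge at this pixel
        theta = math.atan2(iy, ix) % math.pi        # orientation modulo pi
        hist[min(int(theta / math.pi * bins), bins - 1)] += 1
    total = sum(hist)
    return sum(sorted(hist, reverse=True)[:top]) / total if total else 0.0

# Toy shapes: a 3x3 blob and a 10x2 bar (assumed test data).
blob = [(x, y) for x in range(3) for y in range(3)]
bar = [(x, y) for x in range(10) for y in range(2)]
print(shape_descriptor(blob), shape_descriptor(bar))

# Toy image: a horizontal intensity ramp, sampled at interior pixels.
ramp = [[10 * x for x in range(6)] for _ in range(6)]
inner = [(x, y) for x in range(1, 5) for y in range(1, 5)]
print(mean_intensity(inner, ramp), texture_measure(inner, ramp),
      orientation_bias(inner, ramp))
```

The blob gives a shape descriptor close to 1, while the bar gives a much larger value, which matches the interpretation of Eq. 9 for particles and scratches.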

AdaBoost

For machine learning using the extracted features, we use the AdaBoost algorithm, where each weak classifier is based on the log-likelihood ratio test

\[ h_t(x) = \ln \frac{p_t^+(x)}{p_t^-(x)}, \tag{13} \]

where $x$ indicates the output of segmentation in the feature space, $p_t^+(x)$ denotes the weighted histogram of the $t$'th feature for positive samples, and $p_t^-(x)$ is defined similarly for negative samples. Then, after the training process, a strong classifier $F(x)$ is given by

\[ F(x) = \sum_{t=1}^{T} \alpha_t h_t(x), \tag{14} \]

where, from the standard AdaBoost algorithm [11], $\alpha_t$ is determined as a function of the current error rate $e$:

\[ \alpha_t = \log \frac{1 - e}{e}. \tag{15} \]
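A minimal sketch of Eqs. 13 to 15 is given below. Several details are assumptions, since the text does not fix them: the number of histogram bins, the small constant that keeps the histograms away from zero, choosing in each round the feature with the lowest weighted error, measuring the error on the sign of h_t(x), and the reweighting rule (misclassified samples are scaled by exp(α_t) and the weights are renormalized). The sample values are made up for the example.

```python
import math

def bin_index(v, lo, hi, bins):
    """Map a feature value to a histogram bin, clamping out-of-range values."""
    k = int((v - lo) / (hi - lo) * bins)
    return min(max(k, 0), bins - 1)

def weighted_hist(pairs, lo, hi, bins, eps=1e-6):
    """Normalized weighted histogram of (value, weight) pairs; eps avoids log(0)."""
    hist = [eps] * bins
    for v, w in pairs:
        hist[bin_index(v, lo, hi, bins)] += w
    total = sum(hist)
    return [h / total for h in hist]

def make_weak(samples, labels, weights, j, bins):
    """Eq. 13 for feature j: h(x) = ln(p+(x) / p-(x))."""
    vals = [s[j] for s in samples]
    lo, hi = min(vals), max(vals) + 1e-9
    pos = weighted_hist([(v, w) for v, y, w in zip(vals, labels, weights) if y > 0],
                        lo, hi, bins)
    neg = weighted_hist([(v, w) for v, y, w in zip(vals, labels, weights) if y < 0],
                        lo, hi, bins)

    def h(x):
        k = bin_index(x[j], lo, hi, bins)
        return math.log(pos[k] / neg[k])
    return h

def train(samples, labels, rounds=3, bins=8):
    """Return a list of (alpha_t, h_t); labels are +1 (particle) or -1 (other)."""
    n = len(samples)
    weights = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        candidates = []
        for j in range(len(samples[0])):
            h = make_weak(samples, labels, weights, j, bins)
            wrong = [(h(s) > 0) != (y > 0) for s, y in zip(samples, labels)]
            err = sum(w for w, bad in zip(weights, wrong) if bad)
            candidates.append((err, j, h, wrong))
        err, j, h, wrong = min(candidates, key=lambda c: c[0])
        if err >= 0.5:                      # no weak classifier better than chance
            break
        err = max(err, 1e-10)               # keep alpha finite when err is 0
        alpha = math.log((1.0 - err) / err)             # Eq. 15
        strong.append((alpha, h))
        weights = [w * math.exp(alpha) if bad else w
                   for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return strong

def strong_score(strong, x):
    """Eq. 14: F(x) = sum_t alpha_t * h_t(x)."""
    return sum(alpha * h(x) for alpha, h in strong)

# Hypothetical samples: (mean intensity, shape descriptor).
samples = [(120, 1.1), (110, 1.3), (130, 1.0), (120, 9.0), (110, 14.0), (130, 20.0)]
labels = [1, 1, 1, -1, -1, -1]
model = train(samples, labels, rounds=2)
print(strong_score(model, (120, 1.2)), strong_score(model, (120, 15.0)))
```

In this toy data set the mean intensity carries no information, so the shape descriptor is selected and a compact segment receives a positive score while an elongated one receives a negative score.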

The dataset for the experiment consists of defect images acquired by a 266-nm bright field inspection instrument (12-in. ⟨100⟩ oriented silicon wafer, magnification of more than 10^4 times). From this dataset, we use 380 images including particle defects and 150 images including scratch defects as training samples. Then we test 200 images containing particle defects and 200 images having no particle defects (they contain other defects such as scratch defects). As can be seen in Fig. 1, the proposed term improves segmentation performance. The main purpose of ADC is to automatically separate particles from the other kinds of defects (mostly scratches); hence we evaluate the performance of ADC using the detection ratio (DR) of particle defects and the false alarm (FA) rate of other defects classified as particles, which are defined as

\[ \mathrm{DR} = \frac{\text{number of correctly detected particle defects}}{\text{number of all particle defects}}, \tag{16} \]

\[ \mathrm{FA} = \frac{\text{number of defects misclassified as particle}}{\text{number of all scratch defects}}. \tag{17} \]

In typical detection problems, if we try to increase the DR, the FA also increases and vice versa. Hence the performance of a classifier or detector can be measured by the receiver operating characteristic (ROC) that shows the DR versus the FA. When the FA is kept low for high DRs, the detector is considered to be a good one. We compare the ROCs before and after applying our new method in Fig. 2, where it can be observed that our method keeps the DR very high even for very low (down to 0.05) FA rates.
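The DR and FA of Eqs. 16 and 17, and the ROC points obtained by sweeping a decision threshold, can be computed as in the sketch below. The scores and ground-truth flags are made-up values, and the rule that a sample is classified as a particle when its score is at or above the threshold is an assumption for illustration.

```python
def rates(scores, is_particle, threshold):
    """Return (DR, FA) of Eqs. 16 and 17 for one decision threshold."""
    detected = sum(1 for s, p in zip(scores, is_particle)
                   if p and s >= threshold)
    false_alarms = sum(1 for s, p in zip(scores, is_particle)
                       if not p and s >= threshold)
    n_particles = sum(1 for p in is_particle if p)
    n_others = len(is_particle) - n_particles
    return detected / n_particles, false_alarms / n_others

def roc_curve(scores, is_particle):
    """(FA, DR) pairs obtained by sweeping the threshold over observed scores."""
    points = []
    for t in sorted(set(scores), reverse=True):
        dr, fa = rates(scores, is_particle, t)
        points.append((fa, dr))
    return points

# Hypothetical classifier scores F(x) and ground truth (True = particle).
scores = [2.1, 1.4, 0.3, 0.8, -0.2, -1.5]
is_particle = [True, True, True, False, False, False]

print(rates(scores, is_particle, 0.0))
print(rates(scores, is_particle, 1.0))
roc = roc_curve(scores, is_particle)
print(roc)
```

Lowering the threshold moves along the curve toward higher DR and higher FA, which is the trade-off described above.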

Fig. 2. ROC curve comparing our method to the Potts model-based method.

In this work, we propose a new approach to ADC based on the classification-after-segmentation framework. The wafer image is first segmented based on the MAP-MRF approach, where a new energy function is designed to prevent a scratch from being fragmented into several regions. Then, an AdaBoost classifier is trained using the features extracted from the segments. According to the experimental results on a wide variety of particles, the proposed approach shows good classification performance.

This research was supported by the Ministry of Culture, Sports and Tourism (MCST), and the Korea Culture Content Agency (KOCCA) in the Culture Technology (CT) Research and Development Program 2009.

References

[1] K. Kameyama and Y. Kosugi, "Semiconductor defect classification using hyperellipsoid clustering neural networks and model switching," Proc. Int. Joint Conf. on Neural Networks (IJCNN'99), Washington, DC, July 10–16, 1999, pp. 3505–3510 (1999).
[2] P. Gupta, D. Doermann, and D. DeMenthon, "Beam search for feature selection in automatic SVM defect classification," Proc. 16th Int. Conf. on Pattern Recognition, vol. 2, pp. 212–215, IEEE, Piscataway, NJ (2002).
[3] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proc. Int. Conf. on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 511–518, IEEE, Piscataway, NJ (2001).
[4] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Trans. Pattern Anal. Mach. Intell. 23(11), 1222–1239 (2001).
[5] Y. Boykov and M. Jolly, "Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images," Proc. Int. Conf. on Computer Vision, pp. 105–112, IEEE, Piscataway, NJ (2001).
[6] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut: interactive foreground extraction using iterated graph cuts," in SIGGRAPH '04, pp. 309–314, ACM, Washington, DC (2004).
[7] S. Vicente, V. Kolmogorov, and C. Rother, "Graph cut based image segmentation with connectivity priors," Proc. Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1–8, IEEE, Piscataway, NJ (2008).
[8] E. L. Land, "The retinex theory of colour vision," Scientific Am. 237(6), 108–128 (1977).
[9] V. Kolmogorov and Y. Boykov, "What metrics can be approximated by geo-cuts, or global optimization of length/area and flux," Proc. Int. Conf. on Computer Vision, vol. 1, pp. 564–571, IEEE Computer Society, Washington, DC (2005).
[10] M. Pilu and S. Pollard, "A light-weight text image processing method for handheld embedded camera," Proc. Brit. Mach. Vision Conf., pp. 547–556, British Machine Vision Association (BMVA), Malvern, UK (2002).
[11] R. E. Schapire, "The boosting approach to machine learning: an overview," Proc. MSRI Workshop Nonlinear Est. Class., pp. 149–172, Springer, New York (2002).
© 2010 SPIE and IS&T
