In semiconductor fabrication, yield is reduced by defects that appear systematically within specific patterns of the physical layout design. These defective patterns are commonly known as hotspots, and they can arise from various causes. Several approaches to hotspot detection are known. One of them is Machine Learning (ML), in which known hotspot and non-hotspot patterns are used to train a model that is then used to predict new hotspots. The objective of ML approaches is to maximize the hit rate (i.e. to find all potential hotspots) and to minimize the false alarm rate (i.e. to reduce the overhead of false positives). The model’s ability to distinguish hotspots from non-hotspots depends on the coverage of the training dataset. The real-world challenge in training an ML system to classify hotspots and non-hotspots is the imbalanced nature of the problem: the known hotspot patterns are always the minority class. Another challenge specific to hotspot classification is the difficulty of correctly classifying non-hotspots that are similar to hotspots. These “hard-to-classify” patterns are ones with a high mask error enhancement factor (MEEF), because small variations in the pattern can flip it between hotspot and non-hotspot. Together, these two challenges make conventional methods of handling imbalanced training datasets inadequate for hotspot detection. This paper presents a flow for quantified training dataset selection that puts extra focus on the patterns that are hard to classify because of their close similarity to known hotspots. Model accuracy is shown to improve when the quantified sampling approach is adopted, compared with conventional sampling approaches.
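To make the two objectives concrete, the following is a minimal sketch of how hit rate and false alarm rate could be computed from labeled predictions. It assumes binary labels with 1 for hotspot and 0 for non-hotspot, and it assumes the false alarm rate is defined as the fraction of true non-hotspots flagged as hotspots; the text above does not fix an exact formula, and the function name and sample data are hypothetical.

```python
def hit_and_false_alarm_rates(y_true, y_pred):
    """Return (hit_rate, false_alarm_rate) for binary labels, 1 = hotspot.

    Assumed definitions (not taken from the paper):
      hit_rate         = TP / (TP + FN)
      false_alarm_rate = FP / (FP + TN)
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # hotspot correctly detected
        elif truth == 1 and pred == 0:
            fn += 1  # hotspot missed
        elif truth == 0 and pred == 1:
            fp += 1  # non-hotspot flagged as hotspot (false alarm)
        else:
            tn += 1  # non-hotspot correctly passed
    hit_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return hit_rate, false_alarm_rate


# Hypothetical imbalanced sample: 2 hotspots among 10 patterns.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
hit_rate, false_alarm_rate = hit_and_false_alarm_rates(y_true, y_pred)
print(hit_rate, false_alarm_rate)
```

In this toy sample, overall accuracy is 80% even though half of the hotspots are missed, which illustrates why accuracy alone is a poor measure on imbalanced data and why the two rates are tracked separately.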