A quantified approach of dataset selection for training ML models on hard-to-classify patterns

Mohamed Ismail; Mohamed Bahnas; Tiago Reimann; Ilhami Torunoglu; Kareem Madkour

doi:10.1117/12.2586265

22 February 2021

Proceedings Volume 11614, Design-Process-Technology Co-optimization XV; 116140A (2021) https://doi.org/10.1117/12.2586265
Event: SPIE Advanced Lithography, 2021, Online Only

Abstract

In semiconductor fabrication, yield is reduced by defects that appear systematically within specific patterns of the physical layout design. These defective patterns are commonly known as hotspots, and they can arise from various causes. Several approaches to hotspot detection are known. One of them is machine learning (ML), in which known hotspot and non-hotspot patterns are used to train a model that is then used to predict new hotspots. The objective in ML approaches is to maximize the hit rate (i.e., to find all potential hotspots) and to minimize the false alarm rate (i.e., to reduce the overhead of false positives). The model’s ability to distinguish hotspots from non-hotspots depends on the coverage of the training dataset. The real-world challenge in training an ML system to classify hotspots and non-hotspots is the imbalanced nature of the problem: the known hotspot patterns are always the minority class. Another challenge specific to hotspot classification is the difficulty of correctly classifying non-hotspots that are similar to hotspots. These “hard-to-classify” patterns are ones with a high mask error enhancement factor (MEEF), as small variations in the pattern can flip it between hotspot and non-hotspot. Together, these two challenges make conventional methods of handling imbalanced training datasets inadequate for hotspot detection. This paper presents a flow for quantified training dataset selection that puts extra focus on patterns that are hard to classify because of their close similarity to known hotspots. The quantified sampling approach is shown to improve model accuracy compared with conventional sampling approaches.
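The two metrics named in the abstract, and the idea of giving extra weight to non-hotspots that resemble hotspots, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the false alarm rate is assumed to be the fraction of non-hotspots flagged as hotspots, patterns are assumed to be already encoded as numeric feature vectors, similarity is assumed to be Euclidean distance, and the function names and the `n_hard`/`n_random` parameters are hypothetical. It is not the authors' selection flow.

```python
import math
import random


def hit_rate(labels, preds):
    """Fraction of true hotspots (label 1) that the model flags as hotspots."""
    hits = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    total = sum(1 for y in labels if y == 1)
    return hits / total if total else 0.0


def false_alarm_rate(labels, preds):
    """Fraction of true non-hotspots (label 0) flagged as hotspots.

    Assumed definition; the abstract only describes false alarms as
    the overhead of false positives.
    """
    alarms = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    total = sum(1 for y in labels if y == 0)
    return alarms / total if total else 0.0


def select_non_hotspots(hotspots, non_hotspots, n_hard, n_random, seed=0):
    """Hypothetical sampler: return indices of the n_hard non-hotspots
    closest to any known hotspot, plus n_random indices drawn from the rest.

    Assumes at least one hotspot and equal-length feature vectors.
    """
    def nearest_hotspot_distance(vec):
        return min(math.dist(vec, h) for h in hotspots)

    # Rank non-hotspots from most to least hotspot-like.
    ranked = sorted(
        range(len(non_hotspots)),
        key=lambda i: nearest_hotspot_distance(non_hotspots[i]),
    )
    hard = ranked[:n_hard]
    rest = ranked[n_hard:]
    rng = random.Random(seed)
    extra = rng.sample(rest, min(n_random, len(rest)))
    return hard, extra


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0, 0]
    preds = [1, 0, 1, 0, 0, 0]
    print(hit_rate(labels, preds))          # 0.5
    print(false_alarm_rate(labels, preds))  # 0.25

    hotspots = [(0.0, 0.0)]
    non_hotspots = [(5.0, 5.0), (0.1, 0.0), (3.0, 0.0), (0.0, 0.2)]
    hard, extra = select_non_hotspots(hotspots, non_hotspots, n_hard=2, n_random=1)
    print(hard)  # [1, 3]
```

In this toy example the two non-hotspots nearest the hotspot are always kept, while the remaining slot is filled at random, which is one simple way to combine targeted and conventional sampling.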

Citation

Mohamed Ismail, Mohamed Bahnas, Tiago Reimann, Ilhami Torunoglu, and Kareem Madkour "A quantified approach of dataset selection for training ML models on hard-to-classify patterns", Proc. SPIE 11614, Design-Process-Technology Co-optimization XV, 116140A (22 February 2021); https://doi.org/10.1117/12.2586265