Spatially augmented guided sequence-based bidirectional encoder representation from transformer networks for hyperspectral classification studies
Yuanyuan Zhang, Wenxing Bao, Hongbo Liang, Yanbo Sun
Abstract

In recent years, bidirectional encoder representation from transformers (BERT) models have achieved superior performance on hyperspectral images (HSIs). BERT can capture the long-range correlations between HSI elements, but its use of the local spatial and spectral band information of HSIs is insufficient. We propose a spatially augmented guided sequence BERT network for HSI classification, referred to as SAS-BERT, which makes more effective use of the spatial and spectral information of HSIs by improving the BERT model. First, a spatial augmentation learning module is added in the preprocessing stage to obtain more significant spatial features before the network input and to better guide the spatial sequence. Then a spectral correlation module is used to represent the spectral band features of the HSI and to establish a correlation with the spatial locations of the image to obtain better classification performance. Experimental results on three datasets show that the proposed method achieves better classification performance than other state-of-the-art methods.

1. Introduction

With hundreds of narrow, continuous spectral bands, hyperspectral images (HSIs)1 better represent the semantic information of remotely sensed features.2,3 Their rich spectral features provide a powerful tool for achieving accurate pixel-level classification.4,5 HSI classification is widely used in precision agriculture,6–8 mineral surveying,9 anomaly detection,10 and land cover mapping.11 Compared with traditional visual images, HSIs have two unique features.12 First, the heterogeneity of matter leads to the high-dimensional and nonlinear spectral characteristics of hyperspectral data. Second, there is a high correlation between adjacent spectral bands, and spatial correlation can supplement spectral information to provide a more accurate interpretation of surface cartography.

Classification of hyperspectral remote sensing images is fundamental to HSI processing and applications. Its ultimate aim is to assign a unique label to each image element. By analyzing the spectral and spatial characteristics of the various types of land features in an HSI, each image element is assigned to the category of the actual land feature it represents, thus enabling land cover classification. Conventional methods generally use band selection and feature extraction for dimensionality reduction, compressing the original spectral image elements into a low-dimensional space; examples include principal component analysis,13 support vector machines (SVMs),14 and random forests.15 These methods are constrained by the properties of HSIs, and their classification results are limited. HSIs possess both spectral properties and spatial dependence, which calls for a joint representation of spectral and spatial features. Therefore, some researchers have proposed spatial–spectral feature extractors for HSIs. For example, spatial information was combined with multinomial logistic regression for HSI feature extraction.16 Kang et al.17 proposed an edge-preserving filtering-based framework for spectral–spatial classification, significantly improving the classification accuracy of SVM with fewer computational resources. However, the above models all consist of shallow structures; such networks contain few hidden layers but require a large number of neurons, so shallow network structures cannot adequately describe HSIs. A minimal sketch of a conventional pixel-wise pipeline of this kind is given below.
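The following sketch chains PCA-based dimensionality reduction with an RBF-kernel SVM as an illustration of such a conventional baseline. It assumes scikit-learn is available and that the HSI cube and its label map are already loaded as NumPy arrays; all variable names are illustrative rather than those of the cited works.

```python
# Minimal sketch of a conventional pixel-wise HSI baseline: PCA for band
# reduction followed by an RBF-kernel SVM. Assumes scikit-learn is installed
# and that `cube` (H x W x B) and `labels` (H x W, 0 = unlabeled) are NumPy
# arrays loaded elsewhere; all names here are illustrative only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def pca_svm_baseline(cube, labels, n_components=30, train_ratio=0.05):
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(np.float32)   # one spectral vector per pixel
    y = labels.reshape(-1)
    X, y = X[y > 0], y[y > 0]                    # keep only labeled pixels
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_ratio, stratify=y, random_state=0)
    pca = PCA(n_components=n_components).fit(X_tr)
    clf = SVC(kernel="rbf", C=100.0, gamma="scale")
    clf.fit(pca.transform(X_tr), y_tr)
    return accuracy_score(y_te, clf.predict(pca.transform(X_te)))
```

Because each pixel is classified purely from its own (reduced) spectrum, such a pipeline ignores spatial context, which is precisely the limitation that spatial–spectral and deep models aim to overcome.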

With the development of artificial intelligence, deep-learning-based methods are widely used in remote sensing classification. For example, the spatially updated super-voxel stacked autoencoder method is an improved deep network that uses the spatial context of similar spectra within contiguous pixels for effective HSI classification.18 Among these deep-learning models, the convolutional neural network (CNN) has a unique advantage in processing high-dimensional HSI data. For example, Chen et al.19 designed a 3D convolutional neural network capable of efficiently extracting spectral and spatial features with good classification performance. Li et al.20 proposed a fully convolutional neural network framework, applying deconvolution to HSI for the first time. Zhong et al.21 proposed a deep residual network for HSI classification, which used ResNets and CNNs of different depths and widths to investigate the effect of deep-learning model size on HSI classification accuracy. Kanthi et al.22 proposed a 3D deep feature extraction CNN model for HSI classification using spectral and spatial information, which divides the HSI data into 3D patches and feeds them into the proposed model for deep feature extraction.

However, the CNN's reliance on convolutional kernels with a geometrically fixed structure hinders the modeling of long-range dependencies between features at nonlocal locations. Transformers can effectively alleviate the small perceptual range and low descriptive efficiency caused by geometrically structured convolutional filter banks such as those of CNNs. Originally proposed as a sequence-to-sequence (seq2seq) model for machine translation, the transformer has become a mainstream model in natural language processing (NLP). This simple and efficient architecture also performs well in HSI classification. He et al.23 were the first to apply a spatial transformer network to obtain the optimal input to a CNN for HSI classification. Zhao et al.24 proposed a convolutional transformer network that fuses the spectral information and spatial locations of HSIs using central position coding. Hong et al.25 proposed SpectralFormer to capture the relationships between spectral bands along the spectral dimension. Zhong et al.26 designed the spectral–spatial transformer network (SSTN), consisting of a spectral association module and a spatial attention module, and used a new factorized architecture search framework to determine the hierarchical operations and block order of SSTN.

The bidirectional encoder representations from transformers (BERT) model27 is built on the transformer: it takes the encoder part of the transformer to obtain a bidirectional encoder representation, forming a self-encoding language model. BERT is structurally simpler than the full transformer in that it uses only the transformer's encoder. The BERT model was proposed in October 2018 by Google AI and quickly became popular in NLP. Because it is built on the transformer, BERT has powerful linguistic representation and feature extraction capabilities. However, BERT models consume substantial hardware resources, which limits their reproducibility and slows model convergence. RoBERTa, proposed by Liu et al.,28 does not change the network structure of BERT but only modifies some pretraining methods, including dynamic masking and discarding the next-sentence prediction task. Lan et al.29 proposed ALBERT, arguing that the model parameters of BERT were too large and too resource-intensive; they proposed word-vector factorization, cross-layer parameter sharing, and sentence-order prediction to reduce the model size and thus improve the training speed. Cui et al.30 proposed MacBERT, which does not use the mask token, instead replacing the masked position with another similar word and then letting the model correct the error, achieving better results. Related improved models are MASS,31 UNILM,32 and SpanBERT,33 all of which have achieved good results.

He et al. argued that natural language models are similar to HSIs in some respects. They assembled multiple self-attention layers into transformer units and converted the image elements of HSIs into sequences as input data, applying them to HSI classification tasks with satisfactory results. Their proposed HSI-BERT converts spectral image sets into sequences for characterization. This approach does not disrupt the inherent continuous spectral distribution of HSIs, supports more flexible and dynamic input regions, and generalizes more easily.34 The BERT model captures rich nonlocal image element information. However, the complex spectral–spatial distribution of HSIs results in low efficiency of BERT feature extraction and weak feature description in the high-dimensional spectral space. Each pixel of the HSI is tokenized in the BERT sequence mode, and a multihead self-attention mechanism extracts the nonlocal spatial information of the HSI, but the representation of local spatial features of HSIs remains insufficient. Moreover, the multihead self-attention mechanism only considers the feature characterization of the spatial pixel sequence and fails to exploit the rich spectral features unique to HSIs. To solve the above problems, this paper proposes, based on the BERT model, a spatially augmented guided sequential BERT network for HSI classification called SAS-BERT.

The main contributions of this paper are as follows.

  • This paper uses a new framework for HSI classification based on unsupervised BERT networks. The framework adds a spatial augmentation module to the BERT model to capture the local spatial information of HSI and to guide the sequence patterns. The nonlocal and local spatial information of HSI is utilized more effectively.

  • To enhance the ability of the BERT model to describe remotely sensed features, this paper combines the rich spectral information of HSIs to extract spectral features and establish correlations with spatial locations, so as to better account for the intraclass consistency and interclass variability between spectral bands.

  • Compared with the BERT model, the network proposed in this paper aggregates the spatial augmentation module and the spectral association module to integrate and invert the spatial information of HSIs. The experimental results illustrate that the model obtains a more satisfactory classification accuracy than the state-of-the-art methods.

The remainder of this paper is organized as follows. Section 2 presents a relevant introduction to the BERT model. Section 3 presents the scheme of the proposed SAS-BERT network and its components. Section 4 presents the experimental results and analytical findings. In Sec. 5, conclusions are drawn.

2. Related Introduction

2.1. BERT Network Structure

The BERT model has been fine-tuned with good results on a range of downstream NLP tasks. The general framework of the BERT model is shown in Fig. 1. It is mainly based on the transformer's encoder module, referred to here as TEn and shown in Fig. 2; the BERT model is obtained by stacking TEn blocks.

Fig. 1

Diagram of the general framework of the BERT model. The BERT model consists of an embedding layer, an encoder layer, and an FCL. The encoder layer is obtained by stacking multiple TEn.


Fig. 2

Structural diagram of TEn.


The BERT model consists of two main modules: embedding and encoder layers. The embedding layer includes token embedding, position embedding, and segment embedding, and segment embedding is not required for HSI classification. The token embedding implements a vector representation of the image element itself, and the position embedding learns the position properties of the image element.
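As an illustration of the token and position embeddings described above, the following NumPy sketch embeds one HSI patch; the dimensions and random initialization are placeholders for learned parameters, not the exact layers of the original model.

```python
# Sketch of the embedding step for one HSI patch: each pixel (token) is
# projected to d dimensions and a learned position embedding is added.
# Shapes and random initialization are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
m, n, b, d = 11, 11, 103, 64                       # patch size, bands, model width

W_tok = rng.normal(scale=0.02, size=(b, d))        # token projection (learned in practice)
pos_emb = rng.normal(scale=0.02, size=(m * n, d))  # position embedding (learned in practice)

patch = rng.random((m, n, b))                      # one 11 x 11 x 103 HSI patch
tokens = patch.reshape(m * n, b)                   # flatten pixels into a sequence
X = tokens @ W_tok + pos_emb                       # embedded sequence, shape (121, 64)
```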

The embedding layer is followed by the encoder layer, which contains the multiheaded self-attention (MHSA) module (Fig. 3). This mechanism improves the model's ability to focus on different locations and multiple regions of interest simultaneously. The self-attention mechanism requires three vectors Q, K, and V to calculate the attention of each image element; they are obtained by multiplying the image element by three learned weight matrices. In multiheaded attention, multiple weight matrices for Q, K, and V are randomly initialized and independently map the input vectors to different subspaces, thus enriching the feature representation of the information. Different heads of the MHSA mechanism attend to different information. If multihead attention with p heads is used, the input vector is divided into p independent vectors, each of which uses self-attention to calculate its attention weights; the results are then merged. It is therefore a parallel mechanism within a submodule: all heads work independently and in parallel. Internal to the multihead self-attention mechanism is a scaled dot-product attention (Fig. 4), calculated as follows:

Eq. (1)

\mathrm{attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V,
where Q, K, and V denote the query, key, and value, respectively, and $d_k$ denotes the dimension of the key vector. After the input image is converted into a sequence, every pixel vector in the sequence is multiplied by three randomly initialized matrices to obtain the three vectors Q, K, and V. As shown in Fig. 4, the attention mechanism takes Q as the target pixel whose label is to be predicted; one Q vector is generated per pixel vector, so each pixel in the sequence can serve as the target pixel. K, as a context pixel of Q, is matched against Q for similarity. The similarity between the query and each key is used as a weight, and the values of the individual context pixels are then weighted and fused with the value of the target pixel as the output of the attention.
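For illustration, the following is a minimal NumPy sketch of the scaled dot-product attention of Eq. (1); the function names are ours, and in the actual network Q, K, and V come from learned projections.

```python
# Minimal NumPy version of the scaled dot-product attention of Eq. (1).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # similarity of each query to every key
    weights = softmax(scores, axis=-1)             # attention weights per target pixel
    return weights @ V                             # weighted fusion of the context values
```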

Fig. 3

Schematic diagram of the MHSA mechanism.


Fig. 4

Schematic diagram of the scaled dot product attention mechanism.


The MHSA contains p heads $(H_1, H_2, \ldots, H_p)$:

Eq. (2)

\mathrm{MHSA}(X)=\mathrm{concat}(H_1,H_2,\ldots,H_p)W^{O},
where $W^{O}$ is a learned parameter matrix, and $H_i$ is the output of the $i$'th attention head. Each head $H_i$ of the self-attention structure can be described as follows:

Eq. (3)

H_i=\mathrm{attention}(XW_i^{Q},XW_i^{K},XW_i^{V}),
where X is the target image element, and the learned parameter matrices $W_i^{Q},W_i^{K},W_i^{V}\in\mathbb{R}^{d\times d/p}$ are used for the affine projections.
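A corresponding sketch of Eqs. (2) and (3) is given below; it reuses the scaled_dot_product_attention() function from the previous sketch, and the randomly initialized weight matrices stand in for the learned projections.

```python
# Sketch of the p-head attention of Eqs. (2) and (3), reusing
# scaled_dot_product_attention() from the previous sketch. The randomly
# initialized matrices stand in for the learned W_i^Q, W_i^K, W_i^V and W^O.
import numpy as np

def multi_head_self_attention(X, p, rng=np.random.default_rng(0)):
    L, d = X.shape
    d_h = d // p                                    # per-head width d/p
    W_Q = rng.normal(scale=0.02, size=(p, d, d_h))
    W_K = rng.normal(scale=0.02, size=(p, d, d_h))
    W_V = rng.normal(scale=0.02, size=(p, d, d_h))
    W_O = rng.normal(scale=0.02, size=(d, d))
    heads = [scaled_dot_product_attention(X @ W_Q[i], X @ W_K[i], X @ W_V[i])
             for i in range(p)]                     # H_i = attention(X W_i^Q, X W_i^K, X W_i^V)
    return np.concatenate(heads, axis=-1) @ W_O     # concat(H_1, ..., H_p) W^O
```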

3. Modified BERT Network Structure

In this paper, the BERT network is modified by adding a spatial augmentation module and a spectral correlation module. The modified network is named the spatially augmented guided sequence-based BERT network (SAS-BERT). The spatial augmentation module guides the sequence patterns and obtains local spatial information of the HSI. The spectral correlation module establishes correlations between spectral kernels and spatial locations, enhancing the description of the spectral information of the image. A schematic diagram of the SAS-BERT framework is shown in Fig. 5. "Random" in Fig. 5 refers to the random seeds, meaning that the input samples for the 10 repeated experiments were randomly selected. Assume the original HSI dataset is $I\in\mathbb{R}^{H\times W\times b}$, where $H\times W$ is the spatial size and $b$ is the number of spectral bands. When fed into the network, each sample is associated with a one-hot category vector $Y=\{y_1,y_2,\ldots,y_C\}\in\mathbb{R}^{1\times1\times C}$, where $C$ is the number of land cover categories. Taking the Pavia dataset as an example, the spectral dimension is 103 and the number of land cover categories is 9, as described in the dataset introduction in Sec. 4. The spatial patch size is set to $11\times11$. SAS-BERT consists of the spatially augmented learning module and the spe-BERT module, which contains the spectral correlation module and the multiheaded self-attention module.

Fig. 5

Schematic diagram of the SAS-BERT framework.


3.1. Spatially Augmented Learning Module

The MHSA mechanism of the BERT model captures rich nonlocal spatial contextual information but ignores the representation of local spatial information on HSIs. In this paper, we guide HSI’s local spatial feature extraction through the spatial augmented learning module to obtain an optimal input and then send it to the BERT module for further feature extraction and classification tasks.35 The diagram of the spatially augmented learning module is shown in Fig. 6, which contains three parts that are described below.

Fig. 6

Diagram of the spatially augmented learning module.


The first section is the localization network ($F_{loc}$), shown in red in Fig. 6, which completes the extraction of local spatial information. Through several hidden layers (convolution, pooling, fully connected layers, etc.), it regresses the affine transformation of Eq. (4), which maps the coordinates of the output feature map to those of the input. The parameters of the affine transformation are used to rotate, translate, and scale the original input image so as to obtain an optimal input image for the BERT model; the ultimate goal of the localization network is good classification performance. The network takes the input image $I$ and, after several convolution or fully connected operations, a regression layer outputs the affine transformation matrix $A_\theta\in\mathbb{R}^{2\times3}$, whose entries are learnable parameters. Four transformations are obtained by changing the parameters of $A_\theta$: translation, scaling, rotation, and cropping. These parameters map the coordinates of the output feature map to the corresponding coordinates of the input feature map, which are obtained according to $A_\theta$:

Eq. (4)

F_{loc}(I)=A_\theta=\begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix},
where $I$ is the input image; $A_\theta$ is the affine transformation matrix; $F_{loc}(\cdot)$ denotes the function formed by the fully connected and other layers; and $a,b,c,d,e,f$ produce different transformations as their values change. For example, $c,f$ control translation, $a,e$ control scaling, $a,b,d,e$ control rotation, and $b,d$ control cropping.

The second section is the grid generator, as shown in the yellow section of Fig. 6. The position mapping relationship is based on the affine transformation matrix Aθ. The affine transformation matrix Aθ and the lattice points of the output feature map are used to find the coordinates of each pixel point of the output feature map corresponding to the input image:

Eq. (5)

S_\theta(G_{(i,j)})=\begin{pmatrix}x_i^{I}\\ y_j^{I}\end{pmatrix}=A_\theta\begin{pmatrix}x_i^{O}\\ y_j^{O}\\ 1\end{pmatrix},
where $S_\theta(\cdot)$ denotes the function formed by parametric grid sampling; $G_{(i,j)}=(x_i^{O},y_j^{O})$ denotes the individual grid coordinates of the output feature map; and $(x_i^{I},y_j^{I})$ denotes the corresponding grid coordinates of the output feature map on the input image.

The third section is the sampler, shown in the blue section of Fig. 6. The pixel value of the output image at $(i,j)$ is computed from the input-image pixel values around $(x_i^{I},y_j^{I})$. An interpolation algorithm calculates the pixel values of the output image based on the location mapping: because the grid coordinates obtained in the second part may be fractional, interpolation is used to obtain the pixel values at those locations. Bilinear interpolation is used here:

Eq. (6)

V_{i,j}^{l}=\sum_{n=0}^{H}\sum_{m=0}^{W}I_{n,m}^{l}\cdot\max(0,1-|x_i^{I}-m|)\max(0,1-|y_j^{I}-n|),
where $V_{i,j}^{l}$ is the pixel value of the output feature map at channel $l$ and coordinate $(x_i^{O},y_j^{O})$, and $I_{n,m}^{l}$ is the pixel value of the input feature map at channel $l$ and coordinate $(n,m)$, where $(n,m)$ traverses all coordinate points of the input feature map. The smaller $|x_i^{I}-m|$ and $|y_j^{I}-n|$ are, the closer $(x_i^{I},y_j^{I})$ is to $(n,m)$; the larger $1-|x_i^{I}-m|$ and $1-|y_j^{I}-n|$ are, the larger the max terms and hence the weight. Because $(x_i^{I},y_j^{I})$ lies within a lattice cell of the $(n,m)$ points, four nonzero weights are obtained, and their weighted sum gives the output $V_{i,j}^{l}$.
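To make the grid generation and bilinear sampling of Eqs. (5) and (6) concrete, the following NumPy sketch applies a given affine matrix to a single patch. The function name and the pixel-unit coordinate convention are our assumptions, and the double loop deliberately mirrors the double sum of Eq. (6) rather than an optimized implementation.

```python
# Sketch of the grid generator and bilinear sampler of Eqs. (5) and (6) for a
# single patch. A_theta would normally be regressed by the localization
# network; here it is passed in directly.
import numpy as np

def affine_sample(image, A_theta, out_h, out_w):
    """image: (H, W, L) input patch; A_theta: (2, 3) affine matrix."""
    H, W, L = image.shape
    # Output grid coordinates (x_i^O, y_j^O), augmented with 1 for the affine map.
    ys, xs = np.meshgrid(np.arange(out_h), np.arange(out_w), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])   # (3, out_h*out_w)
    # Source coordinates (x_i^I, y_j^I) on the input image, Eq. (5).
    x_src, y_src = A_theta @ grid
    out = np.zeros((out_h * out_w, L))
    # Bilinear interpolation, Eq. (6): each output pixel is a weighted sum of
    # the four input pixels surrounding its source coordinate.
    for row in range(H):
        for col in range(W):
            w = np.maximum(0, 1 - np.abs(x_src - col)) * np.maximum(0, 1 - np.abs(y_src - row))
            out += w[:, None] * image[row, col]
    return out.reshape(out_h, out_w, L)
```

With the identity matrix A_theta = [[1, 0, 0], [0, 1, 0]] the patch is reproduced unchanged; changing the six entries produces the translation, scaling, rotation, and cropping variants evaluated in Sec. 4.4.1.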

The spatial augmentation module can be trained end to end with the entire convolutional network. It is inserted directly into the convolutional network and yields good classification performance.

3.2. spe-BERT Module

The spe-BERT module contains an MHSA module and a spectral correlation module. One of the essential components of the original BERT model is the MHSA mechanism, shown in Fig. 3, which can link information at different positions of the input image elements to obtain nonlocal long-range dependencies. However, the MHSA mechanism only considers the spatial aspect of the image element's feature representation and ignores the spectral band characteristics that are key to HSI interpretation. Spectral features are the key features for distinguishing ground objects in HSIs. Therefore, this paper adds a spectral correlation module to the BERT module to aggregate the spectral and spatial features of HSIs for better classification performance.

The schematic diagram of the spectral correlation module is shown in Fig. 7. It comprises a multibranch network structure that uses two-dimensional convolution to integrate and invert the spatial information of the image element sequences along the spectral dimension. The sequenced feature cube X is fed into the spectral correlation module, and its spectral correlation kernel is calculated as follows:

Eq. (7)

M_1=\sigma(\zeta(X;W_{M_1}))\in\mathbb{R}^{mn\times k},

Eq. (8)

\mathrm{Asso}=M_1^{T}\cdot X^{T}=(X\cdot M_1)^{T}\in\mathbb{R}^{k\times l},
where $\zeta(\cdot)$ denotes a convolution operation that generates an $mn\times k$ tensor, and $\sigma(\cdot)$ is the softmax function, which assigns independent weights to each image element. $M_1$ is the generated mask, which integrates the spatial information together with the input features X to obtain a feature map of the spectral kernel with respect to the spatial locations. The output of the spectral correlation kernel is as follows:

Eq. (9)

M_2=\sigma(\zeta(X;W_{M_2}))\in\mathbb{R}^{mn\times k},

Eq. (10)

\mathrm{SpeAsso}=\mathrm{Asso}^{T}\cdot M_2^{T}=(M_2\cdot\mathrm{Asso})^{T}\in\mathbb{R}^{m\times n\times l},
where $M_2$ is another generated mask, and $\zeta(\cdot;W_{M_1})$ shares its training parameters with $\zeta(\cdot;W_{M_2})$.

Fig. 7

Schematic diagram of the spectral correlation module.


The spectral correlation module establishes the correlation between spectral kernels and spatial locations, compensating for the spectral information that the BERT model otherwise misses in HSIs.
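The following NumPy sketch is one interpretation of the spectral correlation kernel of Eqs. (7) to (10). The exact tensor layout is not fully specified by the text, so the sketch stores the pixel sequence X as (m*n, l), implements the convolution as a per-pixel linear map (a 1x1 convolution), and applies the softmax over the spatial axis; since the two branches share parameters, the same weight matrix can be passed for both arguments.

```python
# Sketch of the spectral correlation kernel of Eqs. (7)-(10); the layout and
# the 1x1-convolution interpretation of zeta(.) are our assumptions.
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spectral_correlation(X, W_M1, W_M2, m, n):
    """X: (m*n, l) pixel sequence; W_M1, W_M2: (l, k) projection weights."""
    M1 = softmax(X @ W_M1, axis=0)        # Eq. (7): spatial weights, shape (mn, k)
    Asso = M1.T @ X                       # Eq. (8): spectral kernel vs. space, (k, l)
    M2 = softmax(X @ W_M2, axis=0)        # Eq. (9): second mask, (mn, k)
    SpeAsso = M2 @ Asso                   # Eq. (10): redistributed features, (mn, l)
    return SpeAsso.reshape(m, n, -1)      # back to an (m, n, l) feature map
```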

3.3. SAS-BERT

In this paper, we have chosen the Pavia University dataset of dimensions 610×340×103 as an example to illustrate the designed SAS-BERT model. The Pavia University dataset is a selection of hyperspectral data from images taken in 2003 of the Italian city of Pavia, containing nine feature classes. Figure 8 details the SAS-BERT algorithm flow. The entire network consists of a spatial augmentation module, a spe-BERT module, and a fully connected layer (FCL). The spe-BERT module, in turn, contains the MHSA mechanism module (a module of the BERT model itself) and the spectral correlation module.

Fig. 8

Detailed framework diagram of SAS-BERT.


SAS-BERT uses a cross-entropy loss function to minimise losses:

Eq. (11)

L_{ce}=-\frac{1}{B}\sum_{i=1}^{B}\sum_{j=1}^{C}y_{i,j}\log(\hat{y}_{i,j}),
where $y$ and $\hat{y}$ denote the actual and predicted one-hot label vectors, respectively, $B$ is the number of samples in a batch, and $C$ is the number of categories. $y_{i,j}$ denotes the scalar of the $j$'th category for the $i$'th sample.
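A short NumPy version of Eq. (11) is given below for reference; it assumes y_true holds one-hot labels and y_pred holds softmax probabilities, both of shape (B, C).

```python
# NumPy version of the batch cross-entropy loss of Eq. (11).
import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))
```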

Algorithm 1 describes the detailed training process of SAS-BERT.

Algorithm 1

SAS-BERT algorithm.

Input: Hyperspectral image cubes I, model depth d, number of attention heads h, training batch b, training epochs e.
Output: Classification evaluation results for Xtest and predicted classification maps.
1: Begin
2: Input hyperspectral image cubes I;
3: for i = 0 to epoch e do
4:  for j = 0 to batch b do
5:   Generate the affine transformation matrix Aθ using the localization network;
6:   The grid generator calculates the coordinates of each pixel of the output feature map corresponding to the input image according to Aθ;
7:   As shown in Eq. (6), calculate the pixel values of the output image using bilinear interpolation to obtain a spatially augmented input image;
8:   Convert the spatially augmented image into a sequence;
9:   for k = 1 to depth d do
10:    Feed the sequenced image into the spectral correlation module to obtain the feature map SpeAsso relating the spectral kernel to the spatial locations;
11:    Divide the last dimension of the feature map into h attention heads;
12:    Obtain the attention features of each attention head by Eq. (1);
13:    Concatenate the attention features of the h attention heads;
14:    Combine them with the spectral correlation feature map SpeAsso;
15:    k++;
16:   end for
17:   j++;
18:  end for
19:  i++;
20: end for
21: Load the model and feed Xtest into the model for prediction;
22: Obtain the classification evaluation results and predicted classification maps.
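As a structural sketch of the forward computation inside the loops of Algorithm 1, the snippet below composes the functions from the earlier sketches (affine_sample, spectral_correlation, multi_head_self_attention) for a single patch. The identity affine matrix, the random weights, and the additive fusion in the last step are stand-in assumptions, and the gradient updates of the actual training procedure are omitted.

```python
# Forward pass for one patch, mirroring steps 5-16 of Algorithm 1.
import numpy as np

rng = np.random.default_rng(0)
m, n, l, d_model, depth, h, k = 11, 11, 103, 64, 2, 4, 16

patch = rng.random((m, n, l))                          # one HSI patch
A_theta = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])                  # placeholder for the localization output
aug = affine_sample(patch, A_theta, m, n)              # steps 5-7: spatial augmentation
W_emb = rng.normal(scale=0.02, size=(l, d_model))
X = aug.reshape(m * n, l) @ W_emb                      # step 8: sequence conversion + embedding

W_M = rng.normal(scale=0.02, size=(d_model, k))        # shared mask weights (Sec. 3.2)
for _ in range(depth):                                 # steps 9-16: stacked spe-BERT layers
    spe = spectral_correlation(X, W_M, W_M, m, n).reshape(m * n, d_model)  # step 10
    att = multi_head_self_attention(X, p=h, rng=rng)   # steps 11-13
    X = att + spe                                      # step 14: fuse branches (addition as a stand-in)
```

In the actual model these operations are differentiable layers wrapped in the epoch and batch loops of Algorithm 1, followed by the FCL classifier and trained with the cross-entropy loss of Eq. (11).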

4. Experimental Results

All experiments in this paper are implemented on an Ubuntu 18.04 system using a GeForce RTX 2080 GPU and TensorFlow with CUDA 9.0 and Python 3.6. All subsequent training and testing experiments were conducted based on this environment.

4.1. Experimental Datasets

In this paper, three classical real datasets, Indian Pines (IN), Pavia University (PU), and Houston (HOU), are used for the experiments.

  • Indian Pines. The IN dataset was acquired by the airborne visible/infrared imaging spectrometer (AVIRIS) over a test site of Indian pine trees in Indiana, United States, in 1992. A 145×145 sub-image was annotated for HSI classification studies. The spatial resolution is 20 m, and the image contains 16 vegetation classes. Bands 104 to 108, 150 to 163, and 220, which are affected by water absorption, were removed, and the remaining 200 bands were retained.

  • Pavia University. The PU image was acquired by the airborne reflective optics system imaging spectrometer (ROSIS) over Pavia, Italy; a 610×340 sub-image was annotated and selected for HSI classification studies. The spatial resolution is 1.3 m, and the image contains nine classes of land features. Twelve noise-affected bands were removed, leaving 103 bands.

  • Houston. The HOU data were acquired by the ITRES CASI-1500 sensor. Initially used in the 2013 IEEE GRSS Data Fusion Contest, the dataset was provided by the Hyperspectral Image Analysis group and the NSF-funded Center for Airborne Laser Mapping (NCALM) at the University of Houston, USA. The image has a size of 349×1905, contains 15 feature classes, and comprises 144 spectral bands in the range of 364 to 1046 nm.

4.2. Evaluation Indicators

In this paper, three classification evaluation metrics, overall accuracy (OA), average accuracy (AA), and the kappa coefficient (Kappa), are used to validate the experimental performance of SAS-BERT. OA indicates the percentage of correctly classified pixels out of the total number of pixels. AA indicates the mean, over all categories, of the ratio of correctly classified pixels in a category to the total number of pixels in that category, and thus measures per-category classification quality. The Kappa coefficient measures the consistency between the classification result and the reference map for the image as a whole. Higher OA, AA, and Kappa values indicate better classification results.
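For reference, the following NumPy sketch computes the three metrics from a confusion matrix; it assumes integer class labels and that every class appears at least once in the test labels.

```python
# OA, AA, and Kappa computed from a confusion matrix; y_true and y_pred are
# 1-D integer label arrays over the test pixels.
import numpy as np

def oa_aa_kappa(y_true, y_pred, num_classes):
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total                                # overall accuracy
    per_class = np.diag(cm) / cm.sum(axis=1)                 # per-class accuracy
    aa = per_class.mean()                                    # average accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)                             # kappa coefficient
    return oa, aa, kappa
```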

4.3. Parameter Adjustment

To improve the efficiency of the SAS-BERT network, this paper sets the neighbourhood size to 11×11, the learning rate to $3\times10^{-4}$, the batch size to 16, the number of training epochs to 200, and the dropout rate to 0.3, and samples 5%, 1%, and 2% of the samples from IN, PU, and HOU, respectively, for training. This paper experiments with several factors affecting SAS-BERT's representation of HSIs.

4.3.1. TEn layers

Evaluating the number of TEn layers at different depths in BERT: the effect of the TEn layer depth on HSI classification accuracy is verified for the BERT network on the three datasets above. Letting the number of attention heads be h = 10, the classification results for different TEn layer depths on the IN, PU, and HOU datasets are shown in Fig. 9.

Fig. 9

Line graphs of classification accuracy for different TEn layer depths on the (a) IN, (b) PU, and (c) HOU datasets.


As shown in Fig. 9, the highest classification accuracy was obtained on the IN and HOU datasets when the TEn layer depth was 5, and on the PU dataset when the TEn layer depth was 2. Therefore, the TEn layer depths selected in this paper are 5, 2, and 5 for the IN, PU, and HOU datasets, respectively.

4.3.2. Number of self-attention heads

Evaluation of the number of heads of the MHSA mechanism: different numbers of heads in the MHSA mechanism of the BERT model significantly impact classification performance, so this paper evaluates the effect of the number of attention heads on the classification results. Let the TEn layer depths d of the IN, PU, and HOU datasets be 5, 2, and 5, respectively. The classification accuracies for different numbers of attention heads on the IN, PU, and HOU datasets are shown in Fig. 10.

Fig. 10

Line graphs of classification accuracy for different numbers of attention heads on the (a) IN, (b) PU, and (c) HOU datasets.


As shown in Fig. 10, different numbers of attention heads were set for the proposed model. The highest evaluation results were obtained with 10 attention heads on the IN and PU datasets and with 4 attention heads on the HOU dataset. Therefore, the numbers of attention heads selected in this paper for the IN, PU, and HOU datasets are 10, 10, and 4, respectively.

4.4. Experiments Related to the Spatial Augmentation Learning Module and the Spectral Correlation Module

4.4.1. Affine transformation

Evaluation of the affine transformations of the spatial augmentation learning module: since the affine transformation of the spatial augmentation module includes translation, rotation, and scaling, each transformation may significantly impact the experimental results. In this paper, direct mapping, translation, scaling, and rotation of the image elements were evaluated. The TEn layer depth d was set to 5, 2, and 5 for the IN, PU, and HOU datasets, respectively, and the numbers of attention heads were 10, 10, and 4, respectively. The classification accuracies of the different transformations on the IN, PU, and HOU datasets are shown in Table 1, where the classification accuracy is denoted as CA and the spatial augmentation learning module as SAL.

Table 1

Classification accuracy of different affine transformations on the IN, PU, and HOU datasets.

Dataset | CA | SAL (dm) | SAL (tra) | SAL (dou) | SAL (hal) | SAL (30 deg) | SAL (45 deg) | SAL (60 deg)
IN | OA (%) | 95.08 ± 1.10 | 93.63 ± 0.97 | 94.83 ± 0.87 | 92.82 ± 1.01 | 94.93 ± 0.82 | 95.01 ± 1.00 | 95.13 ± 0.70
IN | AA (%) | 87.73 ± 1.11 | 89.92 ± 2.46 | 89.30 ± 3.01 | 87.27 ± 1.22 | 87.18 ± 2.35 | 88.04 ± 2.23 | 87.35 ± 1.91
IN | k×100 | 94.38 ± 1.26 | 92.73 ± 1.10 | 94.11 ± 0.99 | 91.80 ± 1.16 | 94.22 ± 0.93 | 94.31 ± 1.14 | 94.45 ± 0.79
PU | OA (%) | 97.12 ± 0.41 | 96.85 ± 0.56 | 97.19 ± 0.50 | 96.21 ± 0.60 | 96.91 ± 0.88 | 96.86 ± 0.73 | 97.18 ± 0.42
PU | AA (%) | 94.54 ± 1.30 | 94.60 ± 1.16 | 95.36 ± 0.88 | 93.51 ± 1.06 | 94.67 ± 1.12 | 94.22 ± 1.50 | 95.07 ± 0.68
PU | k×100 | 96.17 ± 0.55 | 95.81 ± 0.74 | 96.27 ± 0.66 | 94.96 ± 0.79 | 95.89 ± 1.17 | 95.83 ± 0.98 | 96.25 ± 0.56
HOU | OA (%) | 90.91 ± 0.93 | 88.23 ± 1.32 | 91.09 ± 0.91 | 91.12 ± 1.57 | 90.34 ± 0.98 | 90.95 ± 0.57 | 90.41 ± 1.04
HOU | AA (%) | 89.75 ± 1.27 | 88.15 ± 2.02 | 90.78 ± 1.07 | 91.20 ± 1.62 | 89.54 ± 1.12 | 90.77 ± 0.69 | 89.71 ± 1.69
HOU | k×100 | 90.16 ± 1.00 | 87.27 ± 1.43 | 90.37 ± 0.99 | 90.40 ± 1.70 | 89.55 ± 1.06 | 90.21 ± 0.62 | 89.63 ± 1.13
Note: SAL (dou) has the best experimental results on the Indian and Pavia datasets, and SAL (hal) has the best experimental results on the Houston dataset.

The affine transformations from left to right in Table 1 are direct mapping (dm), translation (tra), doubling (dou), halving (hal), and rotation by 30 deg, 45 deg, and 60 deg, where rotation is a clockwise rotation by the given angle around the origin. On the IN dataset, doubling the image element and rotating it by 60 deg give the highest results; the AA of the 60 deg rotation is almost 2% lower than that of doubling, so doubling gives the higher overall classification accuracy when the metrics are considered together. On the Pavia dataset, the doubling transformation outperforms the other transformations. On the HOU dataset, halving the image elements gives better classification results than the other transformations.

When only the spatial augmentation module is added, the classification accuracy is better than that of the original BERT module for most affine transformations. For the IN dataset, the OA and Kappa under translation and halving are lower than those of the original BERT module, but the AA is still higher than that obtained by using the BERT module directly. The results on the PU dataset were all better than, or not significantly different from, the classification accuracy of the original BERT module. The behaviour of the HOU dataset is essentially the same as that of the PU dataset, with the classification accuracy under the translation transformation slightly worse than that of the original BERT module.

The effect of using the spatial augmentation and spectral correlation modules together still needs to be clarified; therefore, each affine transformation is combined with the spectral correlation module in this paper, as described later.

4.4.2. Spectral correlation modules

Experiments on the spectral correlation module: the results of the experiments with the addition of the spectral correlation module alone are shown in Table 2.

Table 2

Classification accuracy of the spectral correlation module on the IN, PU, and HOU datasets.

Dataset | CA | HSI-BERT | spe-BERT
IN | OA (%) | 93.36 ± 0.88 | 94.95 ± 0.89
IN | AA (%) | 85.01 ± 1.90 | 90.47 ± 2.25
IN | k×100 | 92.42 ± 1.02 | 94.24 ± 1.02
PU | OA (%) | 96.38 ± 0.58 | 97.41 ± 0.54
PU | AA (%) | 93.41 ± 0.90 | 95.24 ± 1.71
PU | k×100 | 95.19 ± 0.77 | 96.56 ± 0.72
HOU | OA (%) | 89.59 ± 2.10 | 91.12 ± 0.69
HOU | AA (%) | 87.55 ± 3.17 | 91.06 ± 0.83
HOU | k×100 | 88.74 ± 2.28 | 90.40 ± 0.75
Note: Bold values represent the optimal results under the same evaluation index.

As shown in Table 2, including the spectral correlation module improved the classification accuracy on the IN, PU, and HOU datasets compared with not adding it, with a particularly significant increase in the AA of the IN dataset. The classification benefit of the spectral correlation module is therefore clear.

4.4.3. Experiments combining spatial augmentation and spectral correlation modules

Experiments were conducted with the spatial augmentation and spectral correlation modules used together; the results are shown in Table 3.

Table 3

Classification accuracy on the IN, PU, and HOU datasets using the combined spatial augmentation and spectral correlation modules.

Method | IN OA (%) | IN AA (%) | IN k×100 | PU OA (%) | PU AA (%) | PU k×100 | HOU OA (%) | HOU AA (%) | HOU k×100
SAL (dm) + spe | 95.35 ± 0.59 | 90.56 ± 2.13 | 94.70 ± 0.68 | 97.24 ± 0.28 | 95.49 ± 0.47 | 96.33 ± 0.37 | 90.80 ± 0.96 | 90.50 ± 1.12 | 90.06 ± 1.04
SAL (tra) + spe | 94.26 ± 0.65 | 90.39 ± 2.46 | 93.45 ± 0.74 | 97.08 ± 0.62 | 94.78 ± 1.25 | 96.12 ± 0.83 | 89.29 ± 1.06 | 89.53 ± 0.71 | 88.41 ± 1.14
SAL (dou) + spe | 95.15 ± 0.37 | 90.55 ± 2.27 | 94.47 ± 0.42 | 97.22 ± 0.29 | 94.84 ± 0.93 | 96.30 ± 0.38 | 90.83 ± 0.96 | 90.86 ± 1.00 | 90.09 ± 1.04
SAL (hal) + spe | 93.99 ± 1.39 | 89.37 ± 1.73 | 93.15 ± 1.57 | 96.62 ± 0.51 | 94.73 ± 0.67 | 95.50 ± 0.68 | 90.58 ± 0.97 | 90.72 ± 1.02 | 89.81 ± 1.05
SAL (30 deg) + spe | 95.08 ± 0.55 | 90.36 ± 2.02 | 94.39 ± 0.63 | 96.92 ± 0.50 | 94.95 ± 0.82 | 95.92 ± 0.66 | 90.95 ± 0.65 | 91.22 ± 0.64 | 90.21 ± 0.71
SAL (45 deg) + spe | 95.47 ± 0.91 | 91.49 ± 2.15 | 94.84 ± 1.04 | 97.20 ± 0.64 | 94.71 ± 1.48 | 96.28 ± 0.85 | 90.80 ± 1.12 | 90.83 ± 1.15 | 90.05 ± 1.21
SAL (60 deg) + spe | 95.64 ± 0.85 | 91.38 ± 1.95 | 95.03 ± 0.97 | 97.53 ± 0.44 | 95.87 ± 0.93 | 96.71 ± 0.58 | 91.20 ± 0.85 | 91.21 ± 0.75 | 90.48 ± 0.92
spe-BERT | 94.95 ± 0.89 | 90.47 ± 2.25 | 94.24 ± 1.02 | 97.41 ± 0.54 | 95.24 ± 1.71 | 96.56 ± 0.72 | 91.12 ± 0.69 | 91.06 ± 0.83 | 90.40 ± 0.75
SAL | 94.83 ± 0.87 | 89.30 ± 3.01 | 94.11 ± 0.99 | 97.19 ± 0.50 | 95.36 ± 0.88 | 96.27 ± 0.66 | 91.12 ± 1.57 | 91.20 ± 1.62 | 90.40 ± 1.70
HSI-BERT | 93.36 ± 0.88 | 85.01 ± 1.90 | 92.42 ± 1.02 | 96.38 ± 0.58 | 93.41 ± 0.90 | 95.19 ± 0.77 | 89.59 ± 2.10 | 87.55 ± 3.17 | 88.74 ± 2.28
Note: Bold values represent the optimal results under the same evaluation index.

As shown in Table 3, comparing the experimental results, the classification accuracy on the IN, PU, and HOU datasets is best when the 60 deg rotation is combined with the spectral correlation module, and this combination also outperforms using the spatial augmentation module or the spectral correlation module alone. On the HOU dataset, the two modules used separately give essentially the same results, so the difference between using the spatial augmentation module and the spectral correlation module alone on the HOU dataset is not significant.

4.5. Comparison with Various Algorithms

The SAS-BERT algorithm proposed in this paper is experimentally compared with other deep-learning methods, which fall into two groups: CNN-based algorithms (CNN,19 SSRN,36 HybridSN,37 and LS2CM38) and transformer-based methods (CTN,24 SSTN,26 and HSI-BERT34).

In these experiments, the SAS-BERT input hyperspectral cube has a spatial size of 11×11. The network is optimized with the Adam optimizer for 200 epochs, with the learning rate set to $3\times10^{-4}$, the batch size to 16, and the dropout rate to 0.3. The settings of the comparison algorithms follow their original papers. The classification accuracies of the comparison experiments are shown in Table 4.

Table 4

Comparison of classification accuracy of algorithms on IN, PU, and HOU datasets.

Method | IN OA (%) | IN AA (%) | IN k×100 | PU OA (%) | PU AA (%) | PU k×100 | HOU OA (%) | HOU AA (%) | HOU k×100
CNN | 85.24 ± 1.14 | 87.54 ± 1.53 | 83.16 ± 1.31 | 78.63 ± 2.37 | 78.53 ± 1.63 | 70.72 ± 3.91 | 74.23 ± 2.91 | 79.90 ± 1.46 | 72.13 ± 3.15
SSRN | 84.64 ± 9.88 | 80.16 ± 4.98 | 82.67 ± 10.81 | 94.89 ± 3.33 | 91.38 ± 6.41 | 93.23 ± 4.41 | 84.89 ± 4.94 | 86.52 ± 4.80 | 83.68 ± 5.32
HybridSN | 86.80 ± 0.67 | 90.73 ± 0.56 | 84.89 ± 0.75 | 94.76 ± 1.56 | 93.15 ± 2.07 | 93.01 ± 2.11 | 88.78 ± 0.90 | 89.80 ± 0.68 | 87.86 ± 0.97
LS2CM | 92.21 ± 2.01 | 91.31 ± 4.34 | 91.12 ± 2.30 | 96.04 ± 2.13 | 94.61 ± 2.47 | 94.76 ± 2.80 | 86.87 ± 4.09 | 89.50 ± 2.84 | 85.80 ± 4.42
CTN | 93.57 ± 1.09 | 88.07 ± 3.44 | 92.66 ± 1.24 | 95.08 ± 0.63 | 93.36 ± 0.87 | 93.47 ± 0.83 | 80.36 ± 2.33 | 81.66 ± 2.30 | 78.73 ± 2.53
SSTN | 94.54 ± 0.60 | 77.53 ± 1.37 | 93.77 ± 0.69 | 96.37 ± 0.84 | 91.07 ± 2.79 | 95.17 ± 1.13 | 85.45 ± 1.17 | 82.96 ± 1.06 | 84.24 ± 1.27
HSI-BERT | 93.36 ± 0.88 | 85.01 ± 1.90 | 92.42 ± 1.02 | 96.38 ± 0.58 | 93.41 ± 0.90 | 95.19 ± 0.77 | 89.59 ± 2.10 | 87.55 ± 3.17 | 88.74 ± 2.28
SAS-BERT | 95.49 ± 0.51 | 91.14 ± 1.05 | 94.86 ± 0.58 | 97.41 ± 0.32 | 95.63 ± 0.90 | 96.57 ± 0.43 | 91.23 ± 0.69 | 91.48 ± 0.54 | 90.52 ± 0.75
Note: Bold values represent the optimal results under the same evaluation index.

Table 4 shows the classification results of the various methods. SAS-BERT outperforms the compared CNN-based and transformer-based methods on the IN, PU, and HOU datasets, with a particularly clear improvement over the HSI-BERT method. Overall, the SAS-BERT classification method performs better than the current state-of-the-art methods.

Figures 11–13 show the classification maps of the comparison experiments on the Indian, Pavia, and Houston datasets. In these figures, the SAS-BERT algorithm produces more accurate classification maps with smoother and clearer edges, confirming its better classification results.

Fig. 11

Classification maps on the Indian dataset: (a) a reference map of the real ground category on the Indian dataset; a classification map of (b) CNN; (c) SSRN; (d) HybridSN; (e) LS2CM; (f) CTN; (g) SSTN; (h) HSI-BERT; (i) SAS-BERT; and (j) each ground cover category marker colour for the Indian dataset.


Fig. 12

Classification maps on the Pavia dataset: (a) a reference map of the real ground category on the Pavia dataset; a classification map of (b) CNN; (c) SSRN; (d) HybridSN; (e) LS2CM; (f) CTN; (g) SSTN; (h) HSI-BERT; (i) SAS-BERT; and (j) each ground cover category marker colour for the Pavia dataset.


Fig. 13

Classification maps on the Houston dataset: (a) a reference map of the real ground category on the Houston dataset; a classification map of (b) CNN; (c) SSRN; (d) HybridSN; (e) LS2CM; (f) CTN; (g) SSTN; (h) HSI-BERT; (i) SAS-BERT; and (j) each ground cover category marker colour for the Houston dataset.


5. Conclusion

This paper proposes the SAS-BERT method for HSI classification, which builds on the BERT model to extract spatial and spectral information. The method improves the performance of BERT-based classification by aggregating augmented spatial features and spectral features to represent HSI features. It improves the representation of local spatial information using a spatial augmentation module, which transforms the input image so that it is characterized by more distinct representational features; this allows better performance in classification tasks and helps to reduce the overall training cost of the network. In addition, it exploits the spectral properties of HSIs, which fully reflect the internal physical structure of matter, and establishes their correlation with spatial locations, significantly improving the interpretation of HSIs through feature description. Results from experiments on three widely used datasets show that the SAS-BERT model outperforms current state-of-the-art CNN and transformer classification models.

Code, Data, and Materials Availability

The data presented in this paper are publicly available in Ref. 39. The archived version of the code described in this manuscript can be freely accessed through Github: https://github.com/zyy1234aiyou/SAS-BERT.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Project No. 62201438), the Ningxia Key R&D Program of China (Project No. 2021BEG03030), and the Image and Intelligence Information Processing Innovation Team of the National Ethnic Affairs Commission of China.

References

1. M. Imani and H. Ghassemian, "An overview on spectral and spatial information fusion for hyperspectral image classification: current trends and challenges," Inf. Fusion 59, 59–83 (2020). https://doi.org/10.1016/j.inffus.2020.01.007
2. X. Xu et al., "Multisource remote sensing data classification based on convolutional neural network," IEEE Trans. Geosci. Remote Sens. 56(2), 937–949 (2017). https://doi.org/10.1109/TGRS.2017.2756851
3. M. Zhang, W. Li, and Q. Du, "Diverse region-based CNN for hyperspectral image classification," IEEE Trans. Image Process. 27(6), 2623–2634 (2018). https://doi.org/10.1109/TIP.2018.2809606
4. M. Ahmad et al., "Hyperspectral image classification-traditional to deep models: a survey for future prospects," IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 15, 968–999 (2022). https://doi.org/10.1109/JSTARS.2021.3133021
5. B. Tu et al., "Spectral–spatial hyperspectral classification via structural-kernel collaborative representation," IEEE Geosci. Remote Sens. Lett. 18(5), 861–865 (2021). https://doi.org/10.1109/LGRS.2020.2988124
6. Z. Xia et al., "Crop classification based on feature band set construction and object-oriented approach using hyperspectral images," IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 9(9), 4117–4128 (2016). https://doi.org/10.1109/JSTARS.2016.2577339
7. K. R. Manjunath, S. S. Ray, and D. Vyas, "Identification of indices for accurate estimation of anthocyanin and carotenoids in different species of flowers using hyperspectral data," Remote Sens. Lett. 7(10), 1004–1013 (2016). https://doi.org/10.1080/2150704X.2016.1210836
8. E. M. Paoletti et al., "Capsule networks for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens. 57(4), 2145–2160 (2019). https://doi.org/10.1109/TGRS.2018.2871782
9. L. Ni, H. Xu, and X. Zhou, "Mineral identification and mapping by synthesis of hyperspectral VNIR/SWIR and multispectral TIR remotely sensed data with different classifiers," IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 13, 3155–3163 (2020). https://doi.org/10.1109/JSTARS.2020.2999057
10. J. Zhou et al., "A novel cluster kernel RX algorithm for anomaly and change detection using hyperspectral images," IEEE Trans. Geosci. Remote Sens. 54(11), 6497–6504 (2016). https://doi.org/10.1109/TGRS.2016.2585495
11. C. Shang et al., "Spectral-spatial generative adversarial network for super-resolution land cover mapping with multispectral remotely sensed imagery," IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 16, 522–537 (2023). https://doi.org/10.1109/JSTARS.2022.3228741
12. H. Liang et al., "Spectral–spatial attention feature extraction for hyperspectral image classification based on generative adversarial network," IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 14, 10017–10032 (2021). https://doi.org/10.1109/JSTARS.2021.3115971
13. R. Sunkara, A. K. Singh, and G. R. Kadambi, "Class information-based principal component analysis algorithm for improved hyperspectral image classification," in Int. Conf. Mach. Intell. for GeoAnalytics and Remote Sens. (MIGARS), 1–4 (2023). https://doi.org/10.1109/MIGARS57353.2023.10064597
14. G. Liu et al., "Hyperspectral image classification based on fuzzy nonparallel support vector machine," in Global Conf. Rob., Artif. Intell. and Inf. Technol. (GCRAIT), 242–246 (2022). https://doi.org/10.1109/GCRAIT55928.2022.00058
15. J. Xia et al., "Random forest ensembles and extended multiextinction profiles for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens. 56(1), 202–216 (2017). https://doi.org/10.1109/TGRS.2017.2744662
16. J. Li, J. M. Bioucas-Dias, and A. Plaza, "Spectral-spatial hyperspectral image segmentation using subspace multinomial logistic regression and Markov random fields," IEEE Trans. Geosci. Remote Sens. 50(3), 809–823 (2012). https://doi.org/10.1109/TGRS.2011.2162649
17. X. Kang, S. Li, and J. A. Benediktsson, "Spectral–spatial hyperspectral image classification with edge-preserving filtering," IEEE Trans. Geosci. Remote Sens. 52(5), 2666–2677 (2013). https://doi.org/10.1109/TGRS.2013.2264508
18. A. Mughees and L. Tao, "Hyper-voxel based deep learning for hyperspectral image classification," in IEEE Int. Conf. on Image Process. (ICIP), 840–844 (2017). https://doi.org/10.1109/ICIP.2017.8296399
19. Y. Chen et al., "Deep feature extraction and classification of hyperspectral images based on convolutional neural networks," IEEE Trans. Geosci. Remote Sens. 54(10), 6232–6251 (2016). https://doi.org/10.1109/TGRS.2016.2584107
20. J. Li et al., "Classification of hyperspectral imagery using a new fully convolutional neural network," IEEE Geosci. Remote Sens. Lett. 15(2), 292–296 (2018). https://doi.org/10.1109/LGRS.2017.2786272
21. Z. Zhong et al., "Deep residual networks for hyperspectral image classification," in IEEE Int. Geosci. and Remote Sens. Symp. (IGARSS), 1824–1827 (2017). https://doi.org/10.1109/IGARSS.2017.8127330
22. M. Kanthi, T. H. Sarma, and C. S. Bindu, "A 3D-deep CNN based feature extraction and hyperspectral image classification," in IEEE India Geosci. and Remote Sens. Symp. (InGARSS), 229–232 (2020). https://doi.org/10.1109/InGARSS48198.2020.9358920
23. X. He and Y. Chen, "Optimized input for CNN-based hyperspectral image classification using spatial transformer network," IEEE Geosci. Remote Sens. Lett. 16(12), 1884–1888 (2019). https://doi.org/10.1109/LGRS.2019.2911322
24. Z. Zhao, D. Hu, H. Wang, and X. Yu, "Convolutional transformer network for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett. 19, 1–5 (2022). https://doi.org/10.1109/LGRS.2022.3169815
25. D. Hong et al., "SpectralFormer: rethinking hyperspectral image classification with transformers," IEEE Trans. Geosci. Remote Sens. 60, 1–15 (2021). https://doi.org/10.1109/TGRS.2022.3172371
26. Z. Zhong et al., "Spectral–spatial transformer network for hyperspectral image classification: a factorized architecture search framework," IEEE Trans. Geosci. Remote Sens. 60, 1–15 (2021). https://doi.org/10.1109/TGRS.2022.3225267
27. J. Devlin et al., "BERT: pre-training of deep bidirectional transformers for language understanding," (2018).
28. Y. Liu et al., "RoBERTa: a robustly optimized BERT pretraining approach," (2019).
29. Z. Lan et al., "ALBERT: a lite BERT for self-supervised learning of language representations," (2019).
30. Y. Cui et al., "Revisiting pre-trained models for Chinese natural language processing," (2020).
31. K. Song et al., "MASS: masked sequence to sequence pre-training for language generation," (2019).
32. L. Dong et al., "Unified language model pre-training for natural language understanding and generation," in Adv. in Neural Inf. Process. Syst. (2019).
33. M. Joshi et al., "SpanBERT: improving pre-training by representing and predicting spans," Trans. Assoc. Comput. Linguist. 8, 64–77 (2020). https://doi.org/10.1162/tacl_a_00300
34. H. Ji et al., "HSI-BERT: hyperspectral image classification using the bidirectional encoder representation from transformers," IEEE Trans. Geosci. Remote Sens. 58(1), 165–178 (2019). https://doi.org/10.1109/TGRS.2019.2934760
35. M. Gong et al., "A spectral and spatial attention network for change detection in hyperspectral images," IEEE Trans. Geosci. Remote Sens. 60, 1–14 (2022).
36. Z. Zhong et al., "Spectral-spatial residual network for hyperspectral image classification: a 3D deep learning framework," IEEE Trans. Geosci. Remote Sens. 56(2), 847–858 (2017). https://doi.org/10.1109/TGRS.2017.2755542
37. S. K. Roy et al., "HybridSN: exploring 3D–2D CNN feature hierarchy for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett. 17(2), 277–281 (2019). https://doi.org/10.1109/LGRS.2019.2918719
38. Z. Meng et al., "A lightweight spectral-spatial convolution module for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett. 19, 1–5 (2021). https://doi.org/10.1109/LGRS.2021.3069202
39. M. Graña, M. A. Veganzons, and B. Ayerdi, "Hyperspectral Remote Sensing Scenes," http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes (2021).

Biography

Yuanyuan Zhang is currently pursuing her MEng degree from the School of Computer Science and Engineering, North Minzu University, Yinchuan, China. She received her BS degree in software engineering from North Minzu University, Yinchuan, China, in 2020. Her current research interests include hyperspectral image classification, image processing, artificial intelligence, and deep learning.

Wenxing Bao received his BEng degree in industrial automation from Xidian University, Xi’an, China, in 1993, and his MSc degree in electrical engineering and his PhD in electronic science and technology from Xi’an Jiaotong University, Xi’an, China, in 2001 and 2006, respectively. He is currently a professor and a vice president of North Minzu University, Yinchuan, China. His research interests include digital image processing, remote sensing image classification, and fusing.

Hongbo Liang received his BS degree in computer science and technology from North Minzu University, Yinchuan, China, in 2018 and his MEng degree from the School of Computer Science and Engineering, North Minzu University, Yinchuan, China, in 2021. He is currently pursuing his PhD in communication engineering from Hefei University of Technology, Hefei, China. His research interests include hyperspectral image processing, SAR image processing, remote sensing image classification, computer vision, and deep learning.

Yanbo Sun is currently pursuing his MEng degree at the School of Computer Science and Engineering, North Minzu University, Yinchuan, China. He received his BS degree in software engineering from Nanyang Normal University, Nanyang, China, in 2022. His current research interests include hyperspectral image change detection, hyperspectral image classification, and deep learning.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Yuanyuan Zhang, Wenxing Bao, Hongbo Liang, and Yanbo Sun "Spatially augmented guided sequence-based bidirectional encoder representation from transformer networks for hyperspectral classification studies," Optical Engineering 62(10), 103103 (31 October 2023). https://doi.org/10.1117/1.OE.62.10.103103
Received: 2 June 2023; Accepted: 11 October 2023; Published: 31 October 2023
KEYWORDS: Transformers, Head, Image classification, Optical engineering, Feature extraction, Data modeling, Education and training
