KEYWORDS: Data modeling, Autoregressive models, Statistical analysis, Statistical modeling, Fourier transforms, Data analysis, 3D modeling, Mathematical modeling, Mathematics, Data conversion
This paper establishes a statistical framework of forest-coverage models for spatio-temporal data. The forest coverage ratio of grid-cell data is modeled with human population density and relief energy as explanatory variables. The likelihood of the forest ratios is decomposed into the product of two likelihoods. The first likelihood, discussed by Nishii and Tanaka (2010), is due to trinomial logistic distributions on three categories: the ratios take the value zero, the value one, or values between zero and one. We develop a more precise model for the second likelihood, for partly deforested ratios, by introducing a) spline functions for the additive mean structure, b) wide spatial dependency of the normal error terms, and c) an extended logistic-type transform of the forest ratio. For spatio-temporal data, we implement autoregressive terms based on the ratios observed in the past. The proposed model was applied to real grid-cell data and resulted in a significant improvement over our previous model.
Tanaka and Nishii (2005) showed that deforestation can be elucidated quantitatively by nonlinear logit regression models in four East Asian test fields: forest areal rate F as the target variable, and human population size (N) and relief energy (R: difference of the minimum altitude from the maximum in a sampled area) as explanatory variables, whose functional forms had been suggested by step functions fitted to one-kilometer-square high-precision grid-cell data, first in Japan (n = 6825): log(F/(1 - F)) = β0 + g(N) + h(R) + error, where g(N) and h(R) are regression functions of the explanatory variables N and R, respectively. Likelihood functions with spatial dependency were derived, and several deforestation models were selected for application to four regions in East Asia by calculating their relative appropriateness to the data. As the measure of appropriateness, Akaike's Information Criterion (AIC) was used. To formulate the East Asian dataset, the landcover dataset estimated from NOAA observations available at UNEP, Tsukuba was used for F, the gridded population of the world of CIESIN, US for N, and GTOPO30 of USGS for R. The resolutions were matched by taking their common multiple of a 20-minute square. Tanaka and Nishii (ibid.) omitted the data with F = 0.0 and F = 1.0 in order to employ logit models. Unfortunately, the reduction of the data size for regression led to instability of parameter estimation: for the test field in Harbin, China, n = 76 for 0.0 < F < 1.0, but n = 504 for 0.0 ≤ F ≤ 1.0. In this study, we therefore compare models based on all the data, especially including F = 1.0, by the following extended logit transformation with two additional positive parameters κ and λ: log((F + κ)/(1 - F + λ)) = β0 + g(N) + h(R) + error. Obvious improvements in terms of relative appropriateness to the data are observed for the extended logit models.
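The advantage of the extended transform can be illustrated with a minimal sketch (the parameter values κ = λ = 0.05 below are illustrative only; in the paper κ and λ are estimated from the data):

```python
import numpy as np

def ordinary_logit(F):
    """Ordinary logit log(F / (1 - F)); diverges at F = 0 and F = 1."""
    F = np.asarray(F, dtype=float)
    return np.log(F / (1.0 - F))

def extended_logit(F, kappa=0.05, lam=0.05):
    """Extended logit log((F + kappa) / (1 - F + lam)).

    Finite at F = 0 and F = 1, so fully deforested and fully
    forested grid cells can be retained in the regression.
    """
    F = np.asarray(F, dtype=float)
    return np.log((F + kappa) / (1.0 - F + lam))
```

With the ordinary logit, cells with F = 0.0 or F = 1.0 map to ±infinity and must be discarded; the extended version keeps them finite, which is exactly what allows the larger sample (n = 504 instead of 76 in the Harbin field) to be used.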
We propose a contextual unsupervised classification method for geostatistical data based on a combination of the Ward clustering method and Markov random fields (MRF). The image is clustered into classes by using not only the spectra of pixels but also spatial information. For the classification of remote sensing data of low spatial resolution, the treatment of mixed pixels is important. From the knowledge that most mixed pixels are located on the boundaries between land covers, we first detect edge pixels and remove them from the image. We also introduce a new measure of spatial adjacency of the classes, which is used in the MRF-based update of the class labels. Clustering of the edge pixels is processed as the final step. It is shown that the proposed method gives higher accuracy than a conventional clustering method does.
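The MRF-based label update can be sketched as an ICM-style sweep; this is a minimal illustration assuming a simple Potts smoothness prior over 4-neighbours, not the paper's actual adjacency measure or edge-pixel handling:

```python
import numpy as np

def icm_update(labels, log_post, beta=1.0):
    """One ICM sweep: relabel each pixel by maximizing its spectral
    log posterior plus beta times the number of agreeing 4-neighbours
    (a Potts-type spatial prior)."""
    H, W, K = log_post.shape
    new = labels.copy()
    for i in range(H):
        for j in range(W):
            score = log_post[i, j].copy()
            # Reward classes that agree with the current neighbour labels.
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < H and 0 <= nj < W:
                    score[labels[ni, nj]] += beta
            new[i, j] = int(np.argmax(score))
    return new
```

A noisy pixel whose spectral evidence weakly favours the wrong class is pulled back to the label of its neighbourhood, which is the contextual effect the abstract describes.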
Consider a confusion matrix obtained by a classifier of land-cover categories. Usually, misclassification rates are not uniformly distributed over the off-diagonal elements of the matrix: some categories are easily separated from the others, and some are not. The loss function used by AdaBoost ignores this difference. If we derive a classifier that is efficient at separating categories close to the remaining categories, the overall accuracy may be improved. In this paper, an exponential loss function with different costs for misclassification is proposed for multiclass problems. Costs due to misclassification must be pre-assigned. We then obtain an empirical cost risk function to be minimized, and a minimization procedure is established (Cost AdaBoost). Similar treatments for logit loss functions are discussed, and a Spatial Cost AdaBoost is also proposed. Our purpose is originally to minimize the expected cost, but if the costs can be defined appropriately, they are also useful for reducing error rates. A simple numerical example shows that the proposed method is useful for reducing error rates.
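A minimal binary sketch of a cost-weighted empirical exponential risk may clarify the idea; the paper's Cost AdaBoost is multiclass and more involved, and the cost values here are purely illustrative:

```python
import numpy as np

def empirical_cost_risk(scores, y, cost):
    """Cost-weighted exponential risk (1/n) * sum c(y_i) exp(-y_i f(x_i)).

    y takes values in {-1, +1}; `cost` maps each true label to the
    cost of misclassifying it. With all costs equal to 1 this reduces
    to the ordinary AdaBoost empirical risk.
    """
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.where(y > 0, cost[+1], cost[-1])
    return float(np.mean(c * np.exp(-y * scores)))
```

Raising the cost of one class makes errors (and low margins) on that class dominate the risk, so the minimizing classifier is pushed to separate that class more carefully.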
Spatial AdaBoost, proposed by Nishii and Eguchi (2005), is a machine learning technique for contextual supervised image classification of land-cover categories of geostatistical data. The method classifies a pixel through a convex combination of the log posterior probability at the current pixel and averages of log posteriors in various neighborhoods of the pixel. Weights for the log posteriors are tuned by minimizing the empirical risk based on the exponential loss function. It is known that the method classifies test data very fast and shows performance similar to that of the Markov-random-field-based classifier in many cases. However, it is also known that the classifier gives poor results for some data when the exponential loss puts too large a penalty on misclassified data. In this paper, we consider a robust Spatial boosting method by taking a robust loss function instead of the exponential loss. For example, the logit loss function gives an approximately linear penalty for misclassified data and is robust. The Spatial boosting methods are applied to artificial multispectral images and benchmark data sets. It is shown that Spatial LogitBoost based on the logit loss can classify the benchmark data very well even though Spatial AdaBoost based on the exponential loss fails to classify the data.
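The robustness contrast between the two losses is easy to see numerically as functions of the margin m = y f(x):

```python
import numpy as np

def exp_loss(m):
    """Exponential loss of AdaBoost: exp(-m)."""
    return np.exp(-m)

def logit_loss(m):
    """Logit loss log(1 + exp(-m)); for large negative margins it grows
    only linearly in -m, so badly misclassified points are penalized far
    less heavily than under the exponential loss."""
    return np.log1p(np.exp(-m))
```

At a margin of m = -10 the exponential loss is about e^10 (over 22,000), while the logit loss is about 10: a single outlier can dominate the exponential risk but not the logit risk, which is the mechanism behind Spatial LogitBoost's robustness.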
Deforestation is the result of complex causality chains in most cases, but identification of a limited number of factors can provide a comprehensive general understanding of this vital phenomenon at a broad scale, as well as a projection for the future. Only two factors -- human population size (N) and relief energy (R: difference of the minimum altitude from the maximum in a sampled area) -- were found to give sufficient elucidation of deforestation by nonlinear logit regression models, whose functional forms were suggested by step functions fitted to one-kilometer-square high-precision grid-cell data in Japan (n = 6825). A likelihood with spatial dependency was derived, and several deforestation models were selected for application to East Asia by calculating their relative appropriateness to the data. As the measure of appropriateness, Akaike's Information Criterion (AIC) was used. The logit model is employed so as to avoid anomalies at the asymptotic lower and upper bounds; therefore the forest areal rate is restricted to 0 < F < 1. To formulate the East Asian dataset, the landcover dataset estimated from NOAA observations available at UNEP, Tsukuba was used for F, the gridded population of the world of CIESIN, US for N, and GTOPO30 of USGS for R. The resolutions were matched by taking their common multiple of a 20-minute square. It was suggested that data of full forest coverage, F = 1.0, which were excluded from the calculations this time because of the logit transformation, should play an important role in stabilizing parameter estimation.
A multistep method for segmentation of the feature space using a triplet decision tree is developed, and another approach to cope with uncertain samples by an extended Bayesian discriminant function is introduced. The latter has a lower limit on the posterior probability of classification. The triplet decision tree includes a division-wait mechanism that postpones the decision about uncertain samples, which lie in a marginal area and cannot be assigned definitely to any category; a third node is generated for such samples. The triplet tree method is improved by introducing linearly combined variables related to principal components, and this refinement accomplishes flexible and effective segmentation. Results of experiments with simulated data and real remotely sensed data are compared for the two methods from the viewpoint of the partitioning of the feature space and classification accuracy. When the normality or representativeness of the samples holds, the classifier with the extended quadratic discriminant function has the best performance. The advantage of the triplet tree appears when the categories are diversified in nature or the training samples have poor representativeness.
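The division-wait idea can be sketched with a posterior-threshold rule; this is only an illustration of the mechanism (the threshold 0.8 and the routing rule are assumptions, not the paper's actual node-generation procedure):

```python
import numpy as np

def triplet_assign(posteriors, threshold=0.8):
    """Assign each sample to its most probable class only when the
    maximum posterior reaches `threshold`; otherwise route it to a
    'wait' node (label -1) whose decision is postponed, mimicking the
    division-wait mechanism of the triplet decision tree."""
    posteriors = np.asarray(posteriors, dtype=float)
    best = posteriors.argmax(axis=1)
    conf = posteriors.max(axis=1)
    return np.where(conf >= threshold, best, -1)
```

Samples with a confident posterior are split into the usual two child nodes, while marginal samples collect in the third node for a later, better-informed decision.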
Several measures assessing the accuracy of land-cover classification are available, e.g., overall and class-averaged accuracies. The kappa statistic is also widely used for this purpose. In this article, we discuss the properties of these criteria and point out that the kappa statistic has an unfavorable feature. We propose alternative coefficients based on Kullback-Leibler information. Further, significance tests for the difference between the coefficients derived from classification results are established.
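For reference, the standard kappa statistic being critiqued is computed from the confusion matrix as follows (this shows only the conventional coefficient, not the paper's Kullback-Leibler-based alternatives):

```python
import numpy as np

def cohens_kappa(C):
    """Cohen's kappa from a confusion matrix C (rows: truth, cols: prediction).

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed overall
    accuracy and p_e is the agreement expected by chance from the
    row and column margins.
    """
    C = np.asarray(C, dtype=float)
    n = C.sum()
    p_o = np.trace(C) / n
    p_e = (C.sum(axis=1) * C.sum(axis=0)).sum() / n ** 2
    return (p_o - p_e) / (1.0 - p_e)
```

Because kappa depends on the margins through p_e, two classifiers with the same overall accuracy can receive different kappa values, which is the kind of behavior that motivates looking at alternative coefficients.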