This PDF file contains the front matter associated with SPIE
Proceedings Volume 8183, including the Title Page, Copyright
information, Table of Contents, and the Conference Committee listing.
HPC for Remote Sensing and Astronomical Data Processing
Fuzzy clustering is one of the most frequently used methods for identifying homogeneous regions in remote sensing
images. In the case of large images, the computational costs of fuzzy clustering can be prohibitive unless
high performance computing is used. Therefore, efficient parallel implementations are highly desirable. This
paper presents results on the efficiency of a parallelization strategy for the Fuzzy c-Means (FCM) algorithm. In
addition, the parallelization strategy has been extended to two FCM variants that incorporate
spatial information (Spatial FCM and Gaussian Kernel-based FCM with spatial bias correction). The high-level
requirements that guided the formulation of the proposed parallel implementations are: (i) find an appropriate
partitioning of large images in order to ensure a balanced processor load; (ii) use collective
computations as much as possible; (iii) reduce the cost of communication between processors. The parallel implementations
were tested through several test cases including multispectral images and images having a large number
of pixels. The experiments were conducted on both a computational cluster and a BlueGene/P supercomputer
with up to 1024 processors. Generally, good scalability was obtained both with respect to the number of clusters
and the number of spectral bands.
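As an illustration of the computations being parallelized, the following NumPy sketch performs one FCM iteration on a local block of pixels; the function name, the fuzzifier m = 2 and the MPI-style reduction mentioned in the comments are our assumptions, not details taken from the paper.

    import numpy as np

    def fcm_step(X, C, m=2.0):
        """One FCM iteration on a local block of pixels X (n_pixels x n_bands).

        In the parallel setting, each processor would run this on its own image
        partition and combine num/den with a collective reduction (e.g. an
        MPI Allreduce) before updating the shared centroids C (n_clusters x n_bands).
        """
        # squared distances between every pixel and every cluster centroid
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # fuzzy memberships from inverse-distance weighting
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # local contribution to the centroid update
        W = U ** m
        num = W.T @ X
        den = W.sum(axis=0)[:, None]
        return U, num / den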
This study develops efficient 3D processor array (3D-PA) accelerator units, combined with
the HW/SW co-design technique, on an FPGA platform for the real-time
enhancement/reconstruction of large-scale remote sensing (RS) imagery for geospatial applications. The addressed
architecture implements the previously proposed robust fused Bayesian-regularization (RFBR) enhanced radar imaging
method for the solution of ill-conditioned inverse spatial spectrum pattern (SSP) estimation problems. Finally, we show
how the proposed 3D-PA accelerators drastically reduce the computational load of real-world geospatial imagery
tasks, making them suitable for real-time implementation.
The extended Kalman filter is one of the most widely used techniques for state estimation of nonlinear systems. In its
two steps of forecast and data assimilation, many matrix operations including multiplication and inversion are involved.
As recent graphics processing units (GPUs) have been shown to provide substantial speedups in matrix operations, we explore in
this work a GPU-based implementation of the extended Kalman filter. The Compute Unified Device Architecture
(CUDA) on Nvidia GeForce GTX 590 hardware is compared with a single-threaded CPU
counterpart. Experiments were conducted on typical large-scale over-determined systems with thousands of components
in states and measurements. Within the GPU memory limit, a speedup of 1386x is achieved for a system with
5000 measurement components and 3750 state components. The speedup profile for various
combinations of measurement and state sizes serves as a useful reference for future implementations of the extended Kalman
filter in real large-scale applications.
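For reference, one forecast plus data-assimilation cycle of the extended Kalman filter can be sketched in NumPy as below; the model functions f, h and their Jacobians F, H are placeholders, since the abstract does not specify the underlying system.

    import numpy as np

    def ekf_step(x, P, z, f, F, h, H, Q, R):
        """One forecast + data-assimilation step of the extended Kalman filter."""
        # forecast: propagate state and covariance through the linearized model
        x_pred = f(x)
        Fx = F(x)
        P_pred = Fx @ P @ Fx.T + Q
        # data assimilation: Kalman gain and measurement update
        Hx = H(x_pred)
        S = Hx @ P_pred @ Hx.T + R            # innovation covariance
        K = P_pred @ Hx.T @ np.linalg.inv(S)  # the costly inversion a GPU targets
        x_new = x_pred + K @ (z - h(x_pred))
        P_new = (np.eye(len(x)) - K @ Hx) @ P_pred
        return x_new, P_new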
Future space missions are based on a new generation of instruments that often generate vast amounts of data.
Transferring this data to ground and, once there, between different computing facilities is by no means an easy
task. A clear example of these missions is Gaia, a space astrometry mission of ESA. To carry out the
data reduction tasks on ground, an international consortium has been set up. Among its tasks, perhaps the most
demanding one is the Intermediate Data Updating, which will have to repeatedly re-process nearly 100 TB of
raw data received from the satellite using the latest instrument calibrations available. On the other hand, one
of the best data compression solutions is the Prediction Error Coder, a highly optimized entropy coder that
performs very well with data following realistic statistics. Regarding file formats, HDF5 provides a completely
indexed, easily customizable file with quick, parallel access. Moreover, HDF5 has a friendly presentation
format and multi-platform compatibility. Thus, it is a powerful environment to store data compressed using
the above mentioned coder. Here we show the integration of both systems for the storage of Gaia raw data.
However, this integration can be applied to the efficient storage of any kind of data. Moreover, we show that
the file sizes obtained using this solution are similar to those obtained using other compression algorithms that
require more computing power.
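A minimal sketch of this storage pattern follows, assuming the h5py bindings and using zlib as a stand-in for the Prediction Error Coder (which is not publicly packaged); the dataset name and layout are illustrative.

    import zlib
    import numpy as np
    import h5py

    def write_compressed(path, blocks):
        # Store externally compressed record blocks in an indexed HDF5 file.
        with h5py.File(path, "w") as f:
            dt = h5py.vlen_dtype(np.dtype("uint8"))
            ds = f.create_dataset("records", (len(blocks),), dtype=dt)
            for i, raw in enumerate(blocks):
                ds[i] = np.frombuffer(zlib.compress(raw), dtype=np.uint8)

    def read_block(path, i):
        # Random access by index: only the requested record is decompressed.
        with h5py.File(path, "r") as f:
            return zlib.decompress(bytes(f["records"][i]))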
Java is a commonly used programming language, although its use in High Performance Computing (HPC) remains
relatively low. One of the reasons is a lack of libraries offering specific HPC functions to Java applications. In
this paper we present a Java-based framework, called DpcbTools, designed to provide a set of functions that fill
this gap. It includes a set of efficient message-passing data communication functions which, when a
low-latency network such as Myrinet is available, provide higher throughput and lower latency than the standard
solutions used by Java. DpcbTools also includes routines for launching, monitoring and managing Java
applications on several computing nodes, making use of JMX to communicate with remote Java VMs. The
Gaia Data Processing and Analysis Consortium (DPAC) is a real case where scientific data from the ESA Gaia
astrometric satellite will be entirely processed using Java. In this paper we describe the main elements of DPAC
and its usage of the DpcbTools framework. We also assess the usefulness of DpcbTools through
a performance evaluation and an analysis of its impact on some DPAC systems deployed on the MareNostrum
supercomputer (Barcelona Supercomputing Center).
The main goal of this study is to characterize the effects of lossy image compression procedures on the spatial patterns of
remotely sensed images, as well as to test the performance of job distribution tools specifically designed for obtaining
geostatistical parameters (variogram) in a High Performance Computing (HPC) environment. For this purpose,
radiometrically and geometrically corrected Landsat-5 TM images from April, July, August and September 2006 were
compressed using two different methods: Band-Independent Fixed-Rate (BIFR) and three-dimensional Discrete Wavelet
Transform (3d-DWT) applied to the JPEG 2000 standard. For both methods, a wide range of compression ratios (2.5:1,
5:1, 10:1, 50:1, 100:1, 200:1 and 400:1, from soft to hard compression) were compared. Variogram analyses conclude
that all compression ratios maintain the variogram shapes and that the higher ratios (more than 100:1) reduce
the variance of the sill parameter by about 5%. Moreover, the parallel solution in a distributed environment demonstrates that HPC
offers a suitable scientific test bed for time-demanding execution processes, such as geostatistical analyses of remote
sensing images.
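For context, the empirical semivariogram underlying these analyses is gamma(h) = (1 / 2|N(h)|) * sum over pairs (i, j) in N(h) of (z_i - z_j)^2, where N(h) is the set of point pairs separated by (approximately) lag h, and the sill is the plateau gamma(h) reaches at large lags. A brute-force NumPy sketch, with arbitrary lag spacing and tolerance:

    import numpy as np

    def empirical_variogram(coords, values, lags, tol):
        # Pairwise distances and squared value differences (O(n^2) sketch).
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
        sq = (values[:, None] - values[None, :]) ** 2
        gamma = []
        for h in lags:
            mask = np.triu(np.abs(d - h) <= tol, k=1)  # count each pair once
            gamma.append(0.5 * sq[mask].mean() if mask.any() else np.nan)
        return np.array(gamma)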
The Consultative Committee for Space Data Systems (CCSDS) Rice Coding is a recommendation
for lossless compression of satellite data. It was also integrated with HDF (Hierarchical Data Format)
software for lossless compression of scientific data, and was proposed for lossless compression of
medical images. The CCSDS Rice coding is an approximate adaptive entropy coder. It uses a subset of
the family of Golomb codes to produce a simpler, suboptimal prefix code. The default preprocessor is a
unit-delay predictor with positive mapping. The adaptive entropy coder concurrently applies a set of
variable-length codes to a block of consecutive preprocessed samples. The code option that yields the
shortest codeword sequence for the current block of samples is then selected for transmission. A unique
identifier bit sequence is attached to the code block to indicate to the decoder which decoding option to
use. In this paper we explore the parallel efficiency of the CCSDS Rice code running on Graphics
Processing Units (GPUs) with Compute Unified Device Architecture (CUDA). The GPU-based
CCSDS Rice encoder will process several codeword blocks in a massively parallel fashion on different
GPU multiprocessors. We parallelized the CCSDS Rice coding by using reduction sum for code option
selection, prefix sum for intra-block and inter-block bit stream concatenation as well as asynchronous
data transfer. For NASA AVIRIS hyperspectral data, the speedup is near 6× as compared to the
single-threaded CPU counterpart. The CCSDS Rice coding contains many flow-control instructions,
which significantly affect instruction throughput by causing threads of the same CUDA warp to
diverge. Consequently, the different execution paths must be serialized, increasing the total number of
instructions executed within the same warp. We conclude that this branching and divergence issue is
the bottleneck of Rice coding and leads to a smaller speedup than other entropy coders achieve on GPUs.
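A simplified sequential sketch of the steps described above (unit-delay prediction, positive mapping, and per-block selection of the Rice parameter k) is given below; it omits the standard's low-entropy and no-compression options, and the 4-bit option identifier is an illustrative choice rather than the CCSDS field layout.

    def rice_encode_block(samples, k_max=8):
        prev = 0
        mapped = []
        for s in samples:
            d = s - prev                    # unit-delay predictor residual
            prev = s
            mapped.append(2 * d if d >= 0 else -2 * d - 1)  # positive mapping

        def bits_for(k):                    # length: unary quotient + stop bit + k LSBs
            return sum((m >> k) + 1 + k for m in mapped)

        k = min(range(k_max + 1), key=bits_for)   # code option selection
        out = format(k, "04b")                    # identifier tells decoder the option
        for m in mapped:
            out += "1" * (m >> k) + "0"           # quotient in unary
            out += format(m & ((1 << k) - 1), f"0{k}b") if k else ""
        return out

The per-sample branching visible here (data-dependent unary lengths, the option search) is exactly the kind of flow control that causes warp divergence on a GPU.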
The popularity of Graphic Processing Units (GPUs) opens a new avenue for general-purpose
computation including the acceleration of algorithms. Massively parallel computations using GPUs
have been applied in various fields by researchers. Arithmetic coding (AC) is widely used in lossless
data compression and shows better compression efficiency than the well-known Huffman Coding.
However, AC possesses much higher computational complexity due to frequent multiplication and
branching operations. In this paper, we implement a block-parallel arithmetic encoder on NVIDIA GPUs using
the Compute Unified Device Architecture (CUDA) programming model. The
source data sequence is divided into small blocks. Each CUDA thread processes one data block so that
data blocks can be encoded in parallel. By exploiting the GPU computational power, a significant
speedup is achieved. We show that the GPU-based AC speedup result depends on data distribution and
size. It is observed that the GPU speedup increases with higher compression ratios, due to the fact that
higher compression ratio corresponds to smaller compressed data output which reduces the bit stream
concatenation time as well as the device-to-host transfer time. Applied to selected test images from
the USC-SIPI image database, we obtain speedup values ranging from 26x to 42x, with compression
ratios ranging from 1.4 to 2.7.
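To make the block-parallel idea concrete, the sketch below encodes each block independently with an exact (big-integer) arithmetic coder, so blocks could be processed concurrently, one per thread; a production GPU encoder would use fixed-precision integer ranges, and the per-block static model here is an assumption.

    from fractions import Fraction
    from collections import Counter

    def encode_block(block):
        # Static model fitted to the block itself (cumulative intervals).
        freq = Counter(block)
        total = len(block)
        cum, c = {}, 0
        for sym in sorted(freq):
            cum[sym] = (Fraction(c, total), Fraction(c + freq[sym], total))
            c += freq[sym]
        # Narrow the unit interval symbol by symbol.
        low, width = Fraction(0), Fraction(1)
        for sym in block:
            lo, hi = cum[sym]
            low, width = low + width * lo, width * (hi - lo)
        # Emit the shortest binary fraction lying inside [low, low + width).
        k = 0
        while Fraction(1, 2 ** k) > width:
            k += 1
        code = -((-low.numerator * 2 ** k) // low.denominator)  # ceil(low * 2^k)
        return (format(code, f"0{k}b") if k else ""), freq      # bits + model

    # Blocks encode independently, i.e. one thread per block in the GPU design.
    codes = [encode_block(b) for b in (b"abracadabra", b"mississippi")]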
High-performance computing is necessary for remote sensing image compression to achieve real-time output. The
FORMOSAT-5 Remote Sensing Instrument (RSI) has one panchromatic (PAN) band and four multispectral (MS) bands
with a total data rate of 970 Mbps. Three Xilinx Virtex 5 FPGAs with external memory are used to
perform real-time image data compression based on CCSDS 122.0-B-1. Parallel and concurrent handling strategies are
used to achieve high-performance computing in the process.
Hyperspectral unmixing is a very important task for remotely sensed hyperspectral data exploitation. It addresses
the (possibly) mixed nature of pixels collected by instruments for Earth observation, which is due to several
phenomena including limited spatial resolution, the presence of mixing effects at different scales, etc. Spectral
unmixing involves the separation of a mixed pixel spectrum into its pure component spectra (called endmembers)
and the estimation of the proportion (abundance) of each endmember in the pixel. Two models have been widely used
in the literature in order to address the mixture problem in hyperspectral data. The linear model assumes that
the endmember substances are sitting side-by-side within the field of view of the imaging instrument. On the
other hand, the nonlinear mixture model assumes nonlinear interactions between endmember substances. Both
techniques can be computationally expensive, in particular, for high-dimensional hyperspectral data sets. In this
paper, we develop and compare parallel implementations of linear and nonlinear unmixing techniques for remotely
sensed hyperspectral data. For the linear model, we adopt a parallel unsupervised processing chain made up
of two steps: i) identification of pure spectral materials or endmembers, and ii) estimation of the abundance of
each endmember in each pixel of the scene. For the nonlinear model, we adopt a supervised procedure based
on the training of a parallel multi-layer perceptron neural network using intelligently selected training samples
also derived in parallel fashion. The compared techniques are experimentally validated using hyperspectral data
collected at different altitudes over a so-called Dehesa (semi-arid environment) in Extremadura, Spain, and
evaluated in terms of computational performance using high performance computing systems such as commodity
Beowulf clusters.
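Under the linear model, each pixel spectrum x is approximated as E a + n, where the columns of E hold the endmember signatures and a the abundances; step (ii) then reduces to a pixel-parallel least-squares solve. A sketch of the unconstrained variant (names illustrative):

    import numpy as np

    def unconstrained_abundances(X, E):
        # X: pixels as rows (n_pixels x n_bands); E: endmembers as columns
        # (n_bands x n_endmembers). Solves a = argmin ||E a - x|| per pixel,
        # an embarrassingly parallel step across pixels.
        A, *_ = np.linalg.lstsq(E, X.T, rcond=None)
        return A.T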
Spectral unmixing is a very important task for remotely sensed hyperspectral data exploitation. It involves the
separation of a mixed pixel spectrum into its pure component spectra (called endmembers) and the estimation
of the proportion (abundance) of each endmember in the pixel. In recent years, several algorithms have been
proposed for: i) automatic extraction of endmembers, and ii) estimation of the abundance of endmembers in
each pixel of the hyperspectral image. The latter step usually imposes two constraints in abundance estimation:
the non-negativity constraint (meaning that the estimated abundances cannot be negative) and the sum-to-one
constraint (meaning that the sum of endmember fractional abundances for a given pixel must be unity).
These two steps comprise a hyperspectral unmixing chain, which can be very time-consuming (particularly for
high-dimensional hyperspectral images). Parallel computing architectures have offered an attractive solution for
fast unmixing of hyperspectral data sets, but these systems are expensive and difficult to adapt to on-board
data processing scenarios, in which low-weight and low-power integrated components are essential to reduce
mission payload and obtain analysis results in (near) real-time. In this paper, we perform an inter-comparison
of parallel algorithms for automatic extraction of pure spectral signatures or endmembers and for estimation
of the abundance of endmembers in each pixel of the scene. The compared techniques are implemented in
graphics processing units (GPUs). These hardware accelerators can bridge the gap towards on-board processing
of this kind of data. The considered algorithms comprise the orthogonal subspace projection (OSP), iterative
error analysis (IEA) and N-FINDR algorithms for endmember extraction, as well as unconstrained, partially
constrained and fully constrained abundance estimation. The considered implementations are inter-compared
using different GPU architectures and hyperspectral data sets collected by NASA's Airborne Visible Infra-Red
Imaging Spectrometer (AVIRIS).
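As a sketch of the fully constrained case, one common device (used, for instance, in Heinz and Chang's FCLS algorithm) handles non-negativity with a non-negative least-squares solver and the sum-to-one constraint with a heavily weighted row of ones appended to the system; the weight delta below is a tunable assumption.

    import numpy as np
    from scipy.optimize import nnls

    def fcls_abundances(x, E, delta=1e3):
        # E: endmembers as columns (n_bands x n_endmembers); x: one pixel.
        # The appended row pushes sum(a) toward 1; NNLS enforces a >= 0.
        n_end = E.shape[1]
        E_aug = np.vstack([E, delta * np.ones((1, n_end))])
        x_aug = np.append(x, delta)
        a, _ = nnls(E_aug, x_aug)
        return a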
Endmember extraction is an important task for remotely sensed hyperspectral data exploitation. It comprises
the identification of spectral signatures corresponding to macroscopically pure components in the scene, so that
mixed pixels (resulting from limited spatial resolution, mixing phenomena happening at different scales, etc.) can
be decomposed into combinations of pure component spectra weighted by an estimation of the proportion (abundance)
of each endmember in the pixel. In recent years, several algorithms have been proposed for automatic
extraction of endmembers from hyperspectral images. These algorithms can be time-consuming (particularly for
high-dimensional hyperspectral images). Parallel computing architectures have offered an attractive solution for
fast endmember extraction from hyperspectral data sets, but these systems are expensive and difficult to adapt
to on-board data processing scenarios, in which low-weight and low-power hardware components are essential to
reduce mission payload, overcome downlink bandwidth limitations in the transmission of the hyperspectral data
to ground stations on Earth, and obtain analysis results in (near) real-time.
In this paper, we perform an inter-comparison of the hardware implementations of two widely used techniques
for automatic endmember extraction from remotely sensed hyperspectral images: the pixel purity index (PPI)
and the N-FINDR. The hardware versions have been developed in field programmable gate arrays (FPGAs). Our
study reveals that these reconfigurable hardware devices can bridge the gap towards on-board processing of remotely
sensed hyperspectral data and provide implementations that can significantly outperform the (optimized)
equivalent software versions of the considered endmember extraction algorithms.
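For reference, the core of the PPI algorithm can be sketched as follows: project every pixel onto random unit vectors (skewers) and count how often each pixel is an extreme of a projection; pixels with high counts are endmember candidates. The skewer count and seed are illustrative choices.

    import numpy as np

    def ppi_scores(X, n_skewers=1000, seed=0):
        # X: pixels as rows (n_pixels x n_bands).
        rng = np.random.default_rng(seed)
        skewers = rng.normal(size=(n_skewers, X.shape[1]))
        skewers /= np.linalg.norm(skewers, axis=1, keepdims=True)
        proj = X @ skewers.T                          # (n_pixels x n_skewers)
        counts = np.zeros(len(X), dtype=int)
        extremes = np.concatenate([proj.argmax(axis=0), proj.argmin(axis=0)])
        np.add.at(counts, extremes, 1)                # each skewer votes twice
        return counts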
One of the main drawbacks encountered when dealing with hyperspectral images is the vast amount of data to process.
This is especially dramatic when data are acquired by a satellite or an aircraft due to the limited bandwidth of the channel
needed to transmit data to a ground station. Several solutions are being explored by the scientific community. Software
approaches have limited throughput performance, are power hungry and most of the time do not meet the expectations
of real-time applications. From the hardware point of view, FPGAs, GPUs and even the Cell processor
represent attractive options, although they are complex solutions and pose potential problems for on-board inclusion.
However, there is sometimes an impetus to develop new architectural and technological solutions when there is
plenty of past work that can be exploited to solve present drawbacks. In this scenario, H.264/AVC
stands as the state-of-the-art standard in video coding, showing increased compression efficiency with respect to any
previous standard; although mainly used for video applications, it is worthwhile to explore its suitability for
processing hyperspectral imagery.
In this work, an inductive exercise of compressing hyperspectral cubes with H.264/AVC is carried out. An exhaustive set
of simulations has been performed, applying this standard locally to each spectral band and evaluating globally the
effect of the quantization parameter, QP, in order to determine an optimum configuration of the baseline encoder for INTRA
prediction modes. Results are presented in terms of spectral angle as a metric for determining the feasibility of
endmember extraction. These results demonstrate that, under certain assumptions, the use of standard video codecs
represents a good compromise solution in terms of complexity, flexibility and performance.
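The spectral angle used in this evaluation is simply the angle between an original pixel spectrum and its decompressed version; a minimal sketch:

    import numpy as np

    def spectral_angle(x, y):
        # 0 means the spectral shape survived compression, which is what
        # matters for subsequent endmember extraction.
        cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
        return np.arccos(np.clip(cos, -1.0, 1.0))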
HPC for Hyper- and Multispectral Remote Sensing II
Hyperspectral image compression has received considerable interest in recent years due to the enormous data
volumes collected by imaging spectrometers for Earth Observation. JPEG2000 is an important technique for
data compression which has been successfully used in the context of hyperspectral image compression, in both
lossless and lossy fashion. Due to the increasing spatial, spectral and temporal resolution of remotely sensed
hyperspectral data sets, fast (onboard) compression of hyperspectral data is becoming a very important and
challenging objective, with the potential to reduce the limitations in the downlink connection between the
Earth Observation platform and the receiving ground stations on Earth. For this purpose, implementations of
hyperspectral image compression algorithms on specialized hardware devices are currently being investigated.
In this paper, we develop an implementation of the JPEG2000 compression standard in commodity graphics
processing units (GPUs). These hardware accelerators are characterized by their low cost and weight, and can
bridge the gap towards on-board processing of remotely sensed hyperspectral data. Specifically, we develop GPU
implementations of the lossless and lossy modes of JPEG2000. For the lossy mode, we investigate the utility of the
compressed hyperspectral images for different compression ratios, using a standard technique for hyperspectral
data exploitation such as spectral unmixing. In all cases, we investigate the speedups that can be gained by
the GPU implementations with regard to the serial implementations. Our study reveals that GPUs represent
a source of computational power that is both accessible and applicable to obtaining compression results within valid
response times for information extraction applications on remotely sensed hyperspectral imagery.
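At the heart of lossless JPEG2000 is the reversible CDF 5/3 lifting transform; the following one-level 1-D sketch (even-length signal, symmetric boundary extension) shows the predict/update structure, and the fact that rows and columns transform independently is the kind of parallelism a GPU implementation can exploit.

    import numpy as np

    def dwt53_1d(x):
        x = np.asarray(x, dtype=np.int64)
        even, odd = x[0::2], x[1::2]
        # predict: high-pass d[n] = x[2n+1] - floor((x[2n] + x[2n+2]) / 2)
        right = np.append(even[1:], even[-1])        # symmetric extension
        d = odd - ((even + right) >> 1)
        # update: low-pass s[n] = x[2n] + floor((d[n-1] + d[n] + 2) / 4)
        left = np.append(d[0], d[:-1])               # symmetric extension
        s = even + ((left + d + 2) >> 2)
        return s, d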
Anomaly detection is an important task for remotely sensed hyperspectral data exploitation. One of the most
widely used and successful algorithms for anomaly detection in hyperspectral images is the Reed-Xiaoli (RX)
algorithm. Despite its wide acceptance and high computational complexity when applied to real hyperspectral
scenes, few documented parallel implementations of this algorithm exist, in particular for multi-core processors.
The advantage of multi-core platforms over other specialized parallel architectures is that they are a low-power,
inexpensive, widely available and well-known technology. A critical issue in the parallel implementation of RX
is the sample covariance matrix calculation, which can be approached in global or local fashion. This aspect is
crucial for the RX implementation since the consideration of a local or global strategy for the computation of
the sample covariance matrix is expected to affect both the scalability of the parallel solution and the anomaly
detection results. In this paper, we develop new parallel implementations of the RX in multi-core processors and
specifically investigate the impact of different data partitioning strategies when parallelizing its computations.
For this purpose, we consider both global and local data partitioning strategies in the spatial domain of the
scene, and further analyze their scalability in different multi-core platforms. The numerical effectiveness of the
considered solutions is evaluated using receiver operating characteristics (ROC) curves, analyzing their capacity
to detect thermal hot spots (anomalies) in hyperspectral data collected by NASA's Airborne Visible Infra-Red
Imaging Spectrometer (AVIRIS) system over the World Trade Center in New York five days after the terrorist attacks
of September 11th, 2001.
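For reference, the global variant of RX scores each pixel by its Mahalanobis distance from scene-wide background statistics, which makes the central role of the sample covariance matrix explicit; a local variant would instead estimate the mean and covariance from a neighborhood around each pixel.

    import numpy as np

    def rx_global(X):
        # X: pixels as rows (n_pixels x n_bands).
        mu = X.mean(axis=0)
        Xc = X - mu
        cov = (Xc.T @ Xc) / (len(X) - 1)          # sample covariance matrix
        icov = np.linalg.inv(cov)
        return np.einsum("ij,jk,ik->i", Xc, icov, Xc)   # RX score per pixel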
Advanced architectures have been proposed for efficient orthorectification of digital airborne camera images,
including a system based on GPU processing and distributed computing able to geocorrect three digital still
aerial photographs per second. Here, we address the computationally harder problem of geocorrecting image
data from airborne pushbroom sensors, where each individual image line has associated its own camera attitude
and position parameters. Using OpenGL and CUDA interoperability and projective texture techniques, originally
developed for fast shadow rendering, image data is projected onto a Digital Terrain Model (DTM) as if by a slide
projector placed and rotated in accordance with GPS position and inertial navigation (IMU) data. Each line is
sequentially projected onto the DTM to generate an intermediate frame, consisting of a unique projected line
shaped by the DTM relief. The frames are then merged into a geometrically corrected georeferenced orthoimage.
To target hyperband systems, avoiding the high dimensional overhead, we deal with an orthoimage of pixel
placeholders pointing to the raw image data, which are then combined as needed for visualization or processing
tasks. We achieved faster than real-time performance in a hyperspectral pushbroom system working at a line rate
of 30 Hz with 200 bands and 1280 pixel wide swath over a 1 m grid DTM, reaching a minimum processing speed
of 356 lines per second (up to 511 lps), over eleven (up to seventeen) times the acquisition rate. Our method
also allows the correction of systematic GPS and/or IMU biases by means of 3D user interactive navigation.
DubaiSat-1 (DS1) captures multispectral images with 5-meter resolution using three visible bands, blue (420 to 510 nm),
green (510 to 580 nm) and red (600 to 720 nm), and one near-IR band (760 to 890 nm). It also has a panchromatic channel
with 2.5-meter resolution (420 to 720 nm) [1]. Under certain conditions, degradation in quality might occur in DS1-captured
images. The aim of this project is to enhance the quality of the images in terms of resolution, sharpness and color
quality. It is well known that enhancement is a very difficult task due to the significant noise increase
resulting from any sharpening action. Moreover, the color of the captured images might sometimes become saturated,
so some areas will be given false coloring (i.e., some colors will be presented as gray instead of their original colors).
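The abstract does not state which enhancement algorithm is used; as an illustration of why sharpening amplifies noise, consider classic unsharp masking, which adds back the high-frequency detail removed by a blur, and noise occupies exactly those frequencies.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def unsharp_mask(img, sigma=1.5, amount=1.0):
        # sigma sets the detail scale; amount is the sharpening gain, which
        # boosts edge contrast and high-frequency noise alike.
        blurred = gaussian_filter(img.astype(float), sigma=sigma)
        return img + amount * (img - blurred)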
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Tsunami propagation in shallow-water zones is often modeled by the shallow water equations (also called
Saint-Venant equations), which are derived from the conservation of mass and conservation of momentum equations.
Adding a friction slope to the conservation of momentum equations enables the system to simulate
propagation over coastal areas, which means the system is also able to estimate the inundation zone caused by the
tsunami. Applying a Neumann boundary condition and a Hansen numerical filter brings further
complexity into the system. We solve the system using the two-step finite-difference MacCormack scheme
which is potentially parallelizable. In this paper, we discuss the parallel implementation of the MacCormack
scheme for the shallow water equations in modern graphics processing unit (GPU) architecture using NVIDIA
CUDA technology. On a single Fermi-generation NVIDIA GPU C2050, we achieved 223x speedup with the
result output at each time step over the original C code compiled with -O3 optimization flag. If the experiment
only outputs the final time step result to the host, our CUDA implementation achieved around 818x speedup
over its single-threaded CPU counterpart.
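A minimal 1-D sketch of the two-step MacCormack scheme for the frictionless shallow water equations follows (interior update only; boundary conditions, the friction slope and the Hansen filter are omitted, and wet cells h > 0 are assumed). Every grid point updates independently from its neighbors' previous values, which is what makes the scheme parallelize so well in CUDA.

    import numpy as np

    G = 9.81  # gravitational acceleration (m/s^2)

    def flux(h, hu):
        # Flux vector of the 1-D shallow water equations.
        return hu, hu ** 2 / h + 0.5 * G * h ** 2

    def maccormack_step(h, hu, dt, dx):
        f1, f2 = flux(h, hu)
        # predictor: forward differences
        hp, hup = h.copy(), hu.copy()
        hp[:-1] = h[:-1] - dt / dx * (f1[1:] - f1[:-1])
        hup[:-1] = hu[:-1] - dt / dx * (f2[1:] - f2[:-1])
        g1, g2 = flux(hp, hup)
        # corrector: backward differences, averaged with the predictor
        h_new, hu_new = h.copy(), hu.copy()
        h_new[1:-1] = 0.5 * (h[1:-1] + hp[1:-1] - dt / dx * (g1[1:-1] - g1[:-2]))
        hu_new[1:-1] = 0.5 * (hu[1:-1] + hup[1:-1] - dt / dx * (g2[1:-1] - g2[:-2]))
        return h_new, hu_new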
The Weather Research and Forecasting (WRF) model is a numerical weather prediction and atmospheric simulation
system. It has been designed for both research and operational applications. WRF code can be run in different computing
environments ranging from laptops to supercomputers. The Purdue Lin scheme is a relatively sophisticated microphysics
scheme in WRF. The scheme includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow and
graupel. In this paper, we accelerate the Purdue Lin scheme on NVIDIA Graphics Processing Units
(GPUs). Lately, GPUs have evolved into highly parallel, multi-threaded, many-core processors possessing tremendous
computational speed and a high memory bandwidth. We discuss how our GPU implementation exploits the massive
parallelism, resulting in a highly efficient acceleration of the Purdue Lin scheme. We utilize a low-cost personal
supercomputer with 512 CUDA cores on a GTX590 GPU. We achieve an overall speedup of 156× in case of 1 GPU as
compared to the single-threaded CPU version. Since the Purdue Lin microphysics scheme is only an intermediate module of
the entire WRF model, host-device I/O need not happen, i.e. its input data is already available in GPU global
memory from previous modules and its output data should reside in GPU global memory for later use by other
modules. The speedup without host-device data transfer time is 692×.
The Weather Research and Forecast (WRF) model is the most widely used community weather forecast and research
model in the world. There are several single-moment ice microphysics schemes in WRF. The mixed-phase WRF Single
Moment 5-class (WSM5) scheme represents the condensation, precipitation, and thermodynamic effects of latent heat release. In this
paper, we show our optimization efforts on WSM5. The processing time can be reduced from 16928 ms on a CPU to
48.3 ms using a General Purpose Graphics Processing Unit (GPGPU). Thus, the speedup is 350x without I/O using a
single GPU. Taking I/O transfer times into account, the speedup is 202x.
The Weather Research and Forecasting (WRF) model is the latest-generation numerical weather prediction model. It has
been designed to serve both operational forecasting and atmospheric research needs. It proves useful for a broad
spectrum of applications for scales ranging from meters to thousands of kilometers. WRF computes an approximate
solution to the differential equations which govern the air motion of the whole atmosphere. The Kessler microphysics module
in WRF is a simple warm-cloud scheme that includes water vapor, cloud water and rain. The modeled microphysics
processes are rain production, fall and evaporation; the accretion and auto-conversion of cloud water are
also included, along with the production of cloud water from condensation. In this paper, we develop an efficient WRF
Kessler microphysics scheme which runs on Graphics Processing Units (GPUs) using the NVIDIA Compute Unified
Device Architecture (CUDA). The GPU-based implementation of Kessler microphysics scheme achieves a significant
speedup of 70x over its CPU-based single-threaded counterpart. The speedup on a GPU without host-device data
transfer time is 816x. Since the Kessler microphysics scheme is just an intermediate module of the entire WRF model,
GPU I/O should not occur, i.e. its input data should already be available in GPU global memory from previous
modules and its output data should reside in GPU global memory for later use by other modules. Thus, the
limited scaling of the Kessler scheme with I/O will not be an issue once all modules have been rewritten using CUDA.
High-speed WRF running completely on GPUs promises more accurate forecasts in considerably less time.
The Weather Research and Forecasting (WRF) model is an atmospheric simulation system designed for both
operational and research use. This common-tool aspect promotes closer ties between the research and operational
communities. It contains a lot of different physics and dynamics options reflecting the experience and input of the broad
scientific community. The WRF physics categories are microphysics, cumulus parameterization, planetary boundary
layer, land-surface model and radiation. Explicitly resolved water vapor, cloud and precipitation processes are included
in microphysics. Several bulk water microphysics schemes are available within WRF, with different numbers of
simulated hydrometeor classes and different methods for estimating their size distributions, fall speeds and
densities. The Stony Brook University (SBU-YLIN) microphysics scheme is a 5-class scheme in which
riming intensity is predicted to account for mixed-phase processes. In this paper, we develop an efficient graphics
processing unit (GPU) based SBU-YLIN scheme. The WRF computational domain is a 3D grid laid over the Earth, and
SBU-YLIN performs the same computation at each spatial position in the whole domain. This repetition of the same
computation on different data sets suits the GPU's Single Instruction Multiple Data (SIMD) architecture. The
GPU-based SBU-YLIN scheme is compared to a CPU-based single-threaded counterpart. The implementation achieves a 213x
speedup with I/O compared to a Fortran implementation running on a CPU. Without I/O the speedup is 896x.
Electromagnetic scattering of vegetation is represented by a double-layer model
comprising a vegetation layer and a ground layer. The vegetation layer is composed of discrete
leaves approximated as ellipsoids. The ground layer is modeled as a random rough
surface. Investigation of the scattering field of a single leaf is carried out first. Then the leaves are
divided into different groups depending on their orientation. Considering the incoherent addition
property of Stokes parameters, the Stokes matrix and the phase matrix of every group are
calculated and finally added to obtain the total scattering coefficient. In the original
CPU-based sequential code, the Monte Carlo simulation to calculate the electromagnetic
scattering of vegetation takes 97.2% of the total execution time. In this paper we take advantage
of the large-scale parallelism of Compute Unified Device Architecture (CUDA) to create and
compute all the groups simultaneously. As a result, a speedup of up to 213x is achieved on a single
Fermi-generation NVIDIA GPU GTX 480.