Biometrics has been introduced in many facilities, and face authentication is one of the authentication methods attracting attention. However, face authentication is easily influenced by changes in lighting conditions. This study proposes a face authentication method that uses only a thermal camera and is therefore robust to lighting changes. The proposed method applies FaceNet [1], an authentication method for visible-light images, to each of four facial regions (the whole face, eyes, nose, and mouth) and improves accuracy by majority voting over the per-region results.
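As an illustration of the voting step described above, the following is a minimal Python sketch. It assumes FaceNet-style embeddings are already extracted per region, that a cosine-similarity threshold decides each per-region match, and that three of four votes are required; the threshold value and vote rule are assumptions for illustration, not values from the paper.

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def authenticate(probe, enrolled, threshold=0.7):
    """Accept if a majority of the four part-wise matches succeed.
    probe/enrolled: dicts mapping region name to embedding vector."""
    parts = ["face", "eyes", "nose", "mouth"]
    votes = sum(
        cosine_similarity(probe[p], enrolled[p]) >= threshold
        for p in parts
    )
    return votes >= 3  # majority of the four regions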
Sign language is a visual language that uses hand signs, arm movements, and facial expressions to convey information. However, learning sign language is a challenging task due to the variety of movements that must be executed precisely to express the correct information. Many studies have attempted to recognize and translate sign languages. Traditional methods face problems in real-world situations due to background and lighting variations. Recently, research applying multi-modal data, including human skeleton data, to the Sign Language Recognition (SLR) task has achieved remarkable success. This paper aims to build a model that recognizes sign language from skeleton data alone using a Graph Convolutional Network (GCN), and to construct a Japanese Sign Language dataset to support future SLR research.
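For orientation, one graph-convolution layer in the standard Kipf-Welling formulation, which GCN-based skeleton models build on, can be sketched as follows; the 21-node hand graph and feature sizes below are illustrative assumptions, not the paper's configuration.

import numpy as np

def gcn_layer(H, A, W):
    """One graph convolution: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Example: 21 hand keypoints with 3D coordinates as input features.
A = np.zeros((21, 21))            # adjacency of the skeleton graph
H = np.random.rand(21, 3)         # per-joint input features
W = np.random.rand(3, 16)         # learnable weights
out = gcn_layer(H, A, W)          # shape (21, 16)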
Motifs drawn on Nishiki-e (Japanese woodblock prints) need to be registered in a database as search tags. The accuracy of the motif tags, which are currently registered manually, is unstable because it depends on the knowledge and interests of the registrant. Therefore, this study proposes an automatic motif-tag generation method using deep learning to support cultural activities. For Nishiki-e, it is more difficult than for photographs to collect training images that contain specific motifs. In this study, we propose three methods for preparing training images. First, we applied a model that generates similar images from a single image to a small number of Nishiki-e containing motifs. Second, we applied a Nishiki-e style-transfer model to photographs containing motifs. Third, we combined a small number of photographs containing motifs with various background images. In particular, the third method can detect motifs from a small number of input images, like the first method, with an accuracy close to that of the second method.
A depth image from a single RGB-D camera contains many occlusions and much noise, so it is not easy to obtain 3D data of the whole human head. Point cloud deep learning, which allows direct input and output of point clouds, has recently attracted much attention. One such task, point cloud completion, which creates a complete point cloud from a partial one, has been studied. However, existing studies of point cloud completion evaluate only the shape and have not focused on colored point clouds. Therefore, this study proposes a machine-learning-based colored point cloud completion method for the human head. For training, a CG dataset was created from face and hair datasets. The proposed network inputs and outputs point clouds with XYZ coordinates and L*a*b* color information, and optionally has a discriminator that processes L*a*b*-D images produced by a differentiable point renderer. This study experimented with the network and the dataset and evaluated the results using point-domain and image-domain metrics.
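A standard point-domain metric for completion is the Chamfer distance. The sketch below extends it to 6-D rows (XYZ plus L*a*b*) so that shape and color error are measured jointly; this 6-D extension is an assumption for illustration and not necessarily the paper's exact metric.

import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (n, d) and Q (m, d).
    With d = 6 rows (XYZ + L*a*b*), shape and color are scored together."""
    diff = P[:, None, :] - Q[None, :, :]      # (n, m, d) pairwise differences
    dist = np.sum(diff ** 2, axis=-1)         # squared Euclidean distances
    return dist.min(axis=1).mean() + dist.min(axis=0).mean()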
Color constancy is a human ability to recognize the color of an object correctly even if the color of the illumination changes. We previously constructed a network that reproduces color constancy with pix2pix, a generative adversarial network. However, the current network has problems: for example, it cannot output the color and shape of object parts correctly when the illumination has an extreme color, and the object blends into the background of the image. This research tries to improve the accuracy of the color constancy network by using segmentation. We generate a mask image from the input image with a segmentation network, where the object is white and the background is black. Then, we feed the mask image to the network together with the input image, adding the mask information to the network's processing of the input. By inputting the mask image, information about the target object region is added to the color constancy network. This makes it possible to clarify the region of the object in the input image and to reproduce the shape and color of the object, which the existing color constancy network cannot reproduce.
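One simple way to feed the mask to the network, sketched below in PyTorch, is to concatenate it with the RGB input as a fourth channel before the first generator convolution; the exact injection point, module name, and layer sizes here are assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class MaskConditionedEncoder(nn.Module):
    """Hypothetical first generator stage taking an RGB image plus a mask."""
    def __init__(self):
        super().__init__()
        # 3 RGB channels + 1 mask channel -> 64 feature maps
        self.conv = nn.Conv2d(4, 64, kernel_size=4, stride=2, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, rgb, mask):
        x = torch.cat([rgb, mask], dim=1)  # concatenate along channel axis
        return self.act(self.conv(x))

# rgb: (B, 3, 256, 256); mask: (B, 1, 256, 256) with object=1, background=0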
Various models have been proposed to predict the future head/gaze orientation of a user watching a 360-degree video. However, most of these models do not take sound information into account, and there are few studies on the influence of sound on users in VR space. This study proposes a multimodal model for predicting head/gaze orientation in 360-degree videos based on a new analysis of users' head/gaze behavior in VR space. First, we focus on whether people are attracted to the sound source of a 360-degree video. We conducted a head/gaze tracking experiment with 22 subjects under AV (Audio-Visual) and V (Visual) conditions using 32 videos. The results confirmed that whether viewers were attracted to the sound source differed depending on the video. Next, we trained a deep learning model based on these results and constructed and evaluated a multimodal model combining visual and auditory information. As a result, we were able to construct a multimodal head/gaze prediction model that uses the sound source explicitly. However, in terms of accuracy improvement, we could not confirm any advantage of multimodalization. Finally, we discuss this problem and future prospects.
Deep neural networks (DNNs) achieve high performance in various tasks. However, their huge number of parameters and floating-point operations make them difficult to deploy on edge devices. Therefore, in recent years, much research has been done on compressing deep convolutional neural networks. Conventional methods prune according to fixed criteria, but it is unknown whether those criteria are optimal. To solve this problem, this paper proposes a method that selects parameters for pruning automatically. Specifically, all parameter information is given as input, and reinforcement learning is used to select and prune parameters that do not affect accuracy. Our method prunes one filter or node per action and compresses the network by repeating this action. The proposed method highly compressed a CNN with minimal degradation in accuracy, removing about 97.0% of the parameters with a 2.53% accuracy drop on the CIFAR-10 image classification task with VGG16.
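The per-step action can be pictured as follows: a minimal sketch assuming soft pruning (zeroing a filter's weights rather than structurally removing it), with the filter index supplied by the agent and the post-pruning accuracy feeding the reward; this simplification is an assumption, not the paper's exact environment.

import torch
import torch.nn as nn

def prune_filter(conv: nn.Conv2d, index: int) -> None:
    """One pruning action: zero out filter `index` (soft pruning).
    An RL agent would choose `index`; accuracy after the action
    would be measured to compute the reward."""
    with torch.no_grad():
        conv.weight[index].zero_()
        if conv.bias is not None:
            conv.bias[index].zero_()

conv = nn.Conv2d(3, 64, kernel_size=3)
prune_filter(conv, index=12)   # the agent's chosen action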
Recently, object recognition using CNNs has become widespread. However, medical image datasets are often small because the training data require a doctor's findings. On such small-scale datasets, CNNs cannot achieve sufficiently high recognition accuracy. One solution to this problem is transfer learning, which reuses weights learned on a large dataset. There is also research on pruning parameters that are unimportant for the target task during transfer learning. In this study, after transfer learning is performed, the convolution filters are evaluated using a pruning criterion, and low-scoring filters are replaced with high-scoring ones. To confirm the usefulness of the proposed method in terms of recognition accuracy, we compare it with three baselines: transfer learning only, pruning, and reinitializing the filters. As a result, we obtained higher recognition accuracy than the other methods. We confirmed that replacing filters can affect CNN object recognition on small-scale datasets.
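A minimal sketch of the evaluate-and-replace step, assuming the L1 norm of each filter as the pruning-style score (a common criterion in the pruning literature; the paper's evaluation may differ) and a hypothetical replacement count k:

import torch
import torch.nn as nn

def replace_weak_filters(conv: nn.Conv2d, k: int = 4) -> None:
    """Score filters by L1 norm and overwrite the k lowest-scoring
    filters with copies of the k highest-scoring ones."""
    with torch.no_grad():
        scores = conv.weight.abs().sum(dim=(1, 2, 3))  # one score per filter
        order = torch.argsort(scores)                  # ascending by score
        weak, strong = order[:k], order[-k:]
        conv.weight[weak] = conv.weight[strong].clone()

replace_weak_filters(nn.Conv2d(64, 128, kernel_size=3), k=4)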
In recent years, virtual reality (VR) and augmented reality (AR) have been developed and applied to various simulations for business and commercial use. In these simulations, computer graphics (CG) is very important for expressing virtual objects, and there are many studies on the representation of cloth. Some optical properties of an object are necessary to represent cloth with CG. These optical properties depend on the material of the thread, the number of threads, and their thickness, so it is difficult to represent cloth under changes in these properties. This study proposes a method to formulate the reflectance and transmittance that depend on the composition of the cloth. To formulate the reflection, we use the Kubelka-Munk theory together with cloth properties that can be easily obtained using a smartphone or similar device.
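For reference, the Kubelka-Munk theory relates a layer's absorption coefficient \(K\) and scattering coefficient \(S\) to its reflectance; for an opaque (infinitely thick) layer the standard relation is

\[
\frac{K}{S} = \frac{(1 - R_\infty)^2}{2 R_\infty},
\qquad
R_\infty = 1 + \frac{K}{S} - \sqrt{\left(\frac{K}{S}\right)^{2} + 2\,\frac{K}{S}},
\]

where \(R_\infty\) is the reflectance of the opaque layer. Finite-thickness versions of the theory additionally yield transmittance, which is the quantity relevant to thin cloth; which form the paper uses is not stated here.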
This paper proposes a method for estimating 3D information, such as the shape, orientation, size, and position of objects in a monocular image, and for reproducing scenes as 3D point clouds using Convolutional Neural Networks (CNNs). This study proposes a network that combines depth estimation, object detection, and point cloud estimation to estimate the 3D information of objects. The proposed network requires networks for object detection and segmentation, and a point cloud estimation network for object shape estimation. The point cloud estimation network is robust in reproducing an object's surface and can handle unknown objects through a semantic understanding of object shape. In addition to these networks, we combine a depth estimation network for estimating the depth of the entire scene and the distance between the camera and the object. In this paper, we focus on the point cloud estimation network. We estimate point clouds for real objects in the dataset images and evaluate the output point clouds.
In recent years, research on virtual fitting has been conducted in the fashion field. Many systems have been put to practical use with prepared garments, where companies use information on the shape, size, and fabric of their clothes to provide users with virtual fitting. When such data are not available, many methods exist for estimating the shape and size of clothes from images. Using these methods, users can virtually try on the clothes they want to wear, fitted to their body shape and pose. On the other hand, methods for estimating the fabric of clothes remain to be developed. Because the material of clothes determines their softness in virtual fitting, it is difficult to reproduce realistic movements and wrinkles of clothes with conventional virtual fitting systems. This study proposes a method for estimating the fabric material from clothes images, aiming at realistic virtual fitting. A dataset focusing on each fabric's texture and luster is constructed, and the material is estimated using a Convolutional Neural Network (CNN).
Cleaning is an inseparable part of daily life, but it is impossible to see with the naked eye which areas of a room have actually been cleaned. For this reason, when multiple people clean together and information on where cleaning has been performed cannot be shared, some areas may remain uncleaned. If Augmented Reality (AR) can be used to visualize the areas passed over by a hand or cleaning tool, visualizing the cleaned area will improve cleaning efficiency and increase motivation. The purpose of this research is to obtain and superimpose the location information of the passed-over areas using Simultaneous Localization and Mapping (SLAM) in order to visualize, with AR, the areas covered by the hand or cleaning tool.
In urban development, it is important to make plans that take into account changes in the appearance of natural objects over decades. This study proposes a tree growth simulation method for predicting such changes in the appearance of natural objects.
This study proposes a method for generating a 3D model of furniture from 3D point cloud data of a room captured by an RGB-D camera, in order to realize layout simulation of a real room with its furniture.
When printed material is imaged with a monocular digital camera, geometric distortions caused by folds result in an appearance different from the content of the original printed material. This study aims to reproduce the original appearance by correcting the captured image. In the proposed method, the geometric distortion is corrected by dividing the image of the printed material into local regions and deforming each region. In addition, brightness changes caused by shading are corrected.
In recent years, many SLAM (simultaneous localization and mapping) systems have appeared, showing impressive dense scene reconstruction. However, typical SLAM systems build 3D scenes at the point level without any semantic information. Many computer vision applications require a high level of scene understanding, and point-based SLAM is insufficient for them. This paper studies fusing 3D object recognition into a SLAM system, using a hand-held RGB-D camera and RTAB-Map to reconstruct a dense point cloud of a 3D indoor scene. We then use supervoxel-based point cloud segmentation to over-segment the scene. A 3D object classification model trained with PointNet is added to merge the segmentation process with object recognition. Our experiments in an indoor environment show the effectiveness of this system.
KEYWORDS: Video, Cameras, Projection systems, Personal digital assistants, Super resolution, Cell phones, Profiling, Computing systems, Data storage, Internet
There are various kinds of learning systems in the world, and many of them use video sources. These video sources vary according to the learning content and aims. In this paper, I describe the usability of learning systems that use a super-high-definition video source, focusing on the handling of super-high-resolution video. Furthermore, future progress and present problems are considered by proposing an on-demand learning system using a super-high-definition video source. Super high resolution here means 4K (4096x2160 pixels).
Geometric registration between a virtual object and the real space is the most basic problem in augmented reality. Model-based tracking methods allow us to estimate the three-dimensional (3-D) position and orientation of a real object by using a textured 3-D model instead of a visual marker. However, it is difficult to apply existing model-based tracking methods to objects that have movable parts, such as the display of a mobile phone, because these methods assume a single rigid-body model.
In this research, we propose a novel model-based registration method for objects composed of multiple rigid bodies. For each frame, the 3-D models of each rigid part of the object are first rendered according to the motion and transformation estimated from the previous frame. Second, control points are determined by detecting the edges of the rendered image and sampling pixels on these edges. Motion and transformation are then calculated simultaneously from the distances between the edges and the control points. The validity of the proposed method is demonstrated through experiments using synthetic videos.
Two methods are described to accurately estimate diffuse and specular reflectance parameters for colors, gloss intensity, and surface roughness over the dynamic range of the camera used to capture input images. Neither method needs to segment color areas on an image or to reconstruct a high dynamic range (HDR) image. The second method improves on the first by bypassing the requirement for explicit separation of diffuse and specular reflection components. In the latter method, diffuse and specular reflectance parameters are estimated separately using the least squares method. Reflection values are initially assumed to be diffuse-only reflection components and are subjected to the least squares method to estimate the diffuse reflectance parameters. Specular reflection components, obtained by subtracting the computed diffuse reflection components from the reflection values, are then subjected to a logarithmically transformed equation of the Torrance-Sparrow reflection model, and the specular reflectance parameters for gloss intensity and surface roughness are finally estimated using the least squares method. Experiments were carried out using both methods with simulation data at different saturation levels, generated according to the Lambert and Torrance-Sparrow reflection models, and using the second method with spectral images captured by an imaging spectrograph and a moving light source. Our results show that the second method estimates the diffuse and specular reflectance parameters for colors, gloss intensity, and surface roughness more accurately and faster than the first, so that colors and gloss can be reproduced more efficiently for HDR imaging.
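For reference, a common simplified form of the Torrance-Sparrow specular term and its logarithmic transformation (the exact formulation in the paper may include further factors) is

\[
I_s = \frac{k_s}{\cos\theta_r}\exp\!\left(-\frac{\alpha^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
\ln\!\left(I_s\cos\theta_r\right) = \ln k_s - \frac{1}{2\sigma^2}\,\alpha^2,
\]

where \(\theta_r\) is the viewing angle, \(\alpha\) is the angle between the surface normal and the halfway vector, \(k_s\) is the gloss intensity, and \(\sigma\) is the surface roughness. The transformed equation is linear in \(\alpha^2\), so \(\ln k_s\) and \(\sigma\) follow directly from a linear least squares fit.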
KEYWORDS: Photography, Digital imaging, Image quality, Silver, Analog electronics, Image enhancement, Digital photography, Digital image processing, Image processing, Digital cameras
To overcome shortcomings of digital images, or to reproduce the grain of traditional silver halide photographs, some photographers add noise (grain) to digital images. In an effort to find the factors of preferable noise, we analyzed how a professional photographer introduces noise into B&W digital images and found two noticeable characteristics: 1) there is more noise in mid-tones, gradually decreasing in highlights and shadows toward the ends of the tonal range, and 2) histograms in highlights are skewed toward shadows and vice versa, while they are almost symmetrical in mid-tones. Next, we examined whether the professional's noise could be reproduced. The symmetrical histograms were approximated by a Gaussian distribution and the skewed ones by a chi-square distribution. The images on which the noise was reproduced were judged by the professional himself to be satisfactory. As the professional said he added the noise so that "it looked like the grain of B&W gelatin silver photographs," we compared the two kinds of noise and found what they have in common: 1) more noise in mid-tones but almost none in the brightest highlights and deepest shadows, and 2) asymmetrical histograms in highlights and shadows. We think these common characteristics might be one condition for "good" noise.
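The two characteristics above can be imitated in a few lines. The following is a minimal Python sketch, assuming a parabolic amplitude profile that peaks at mid-tones and a blend of Gaussian and sign-flipped chi-square noise; max_sigma, df, and the blending rule are illustrative assumptions, not measurements from the study.

import numpy as np

def add_grain(img, max_sigma=12.0, df=4):
    """Add tone-dependent grain to an 8-bit grayscale image (H, W).
    Amplitude peaks in mid-tones and fades toward both tonal ends;
    highlights get noise skewed toward shadows and vice versa."""
    x = img.astype(np.float64) / 255.0
    sigma = max_sigma * 4.0 * x * (1.0 - x)         # 0 at ends, max mid-tone
    sym = np.random.normal(0.0, 1.0, img.shape)     # symmetric (mid-tones)
    chi = np.random.chisquare(df, img.shape) - df   # skewed, zero-mean
    w = np.abs(2.0 * x - 1.0)                       # 0 mid-tone -> 1 at ends
    skew_sign = np.where(x > 0.5, -1.0, 1.0)        # highlights skew darker
    noise = (1.0 - w) * sym + w * skew_sign * chi / np.sqrt(2.0 * df)
    return np.clip(img + sigma * noise, 0, 255).astype(np.uint8)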
Since commercial image detectors, such as charge-coupled device (CCD) cameras, have a limited dynamic range, it is difficult to obtain images that are truly unsaturated, and as a result the reflectance parameters may be estimated inaccurately. To solve this problem, we describe a method to estimate reflectance parameters from saturated spectral images. We separate the reflection data into diffuse and specular components at 5-nm intervals between 380 nm and 780 nm for each pixel of the spectral images, which are captured at different incident angles, and estimate the diffuse reflectance parameters by applying the Lambertian model to the diffuse components. To estimate the specular reflectance parameters from the specular components, we transform the Torrance-Sparrow equation to a linear form, assuming the Fresnel reflectance is constant. We then estimate the specular parameters for the intensity of the specular reflection and the standard deviation of the Gaussian distribution, using the least squares method on unsaturated values of the specular components. Since the Fresnel reflectance contributes to the physically based Torrance-Sparrow model in computer graphics and vision, we also estimate the Fresnel reflectance in terms of the Fresnel equation for the incident angle and the refractive index of the surface for dielectric materials, which varies with wavelength. We carried out experiments with measured data and with simulated specular components at different saturation levels, generated according to the Torrance-Sparrow model. Our experimental results reveal that the diffuse and specular reflectance parameters are estimated with high quality.
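For reference, the Fresnel reflectance for unpolarized light entering a dielectric from air, written with a wavelength-dependent refractive index \(n = n(\lambda)\) (a standard form consistent with the setting above), is

\[
R(\theta_i) = \frac{1}{2}\left[
\left(\frac{\cos\theta_i - n\cos\theta_t}{\cos\theta_i + n\cos\theta_t}\right)^{2} +
\left(\frac{n\cos\theta_i - \cos\theta_t}{n\cos\theta_i + \cos\theta_t}\right)^{2}
\right],
\qquad \sin\theta_i = n\sin\theta_t,
\]

where \(\theta_i\) is the incident angle and \(\theta_t\) is the refraction angle given by Snell's law.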
We propose a new framework for interactive Augmented Reality (AR) and Mixed Reality (MR) representation using both visible and invisible projection onto physical target objects. The projection-based approach to constructing AR/MR uses physical objects such as walls, books, plaster ornaments, and anything else onto which computer-generated content can be optically projected. In other words, projection makes it possible to use real objects as displays.
We mainly focus on capturing and utilizing the 3D shape of the object surface, whose information allows the AR/MR system to take visual consistency into account when merging physical and rendered objects. The 3D shape data of the object can be used to compensate for the distortion caused by the difference between the positions of the projectors and the viewer. Another advantage is the capability to generate proper visual occlusion between physical and virtual objects so that they appear to coexist in front of the viewer.
In this study, we employ near-infrared pattern projection for triangulation so that scanning and updating of the object's geometry data are performed automatically as a background process; the AR/MR representation can thus be rendered in parallel, following dynamic changes in the physical geometry.
We propose a new technique to faithfully reproduce both the color and the gloss of an object on a computer, using multispectral images. An imaging spectrograph equipped with a monochrome charge-coupled device (CCD) camera is fixed in front of the target object. Multispectral images of a linear portion of the object's surface are captured at suitable intervals by a measuring system comprising a light source orbiting the target object. To obtain spectral images of the whole surface, the target object is also rotated. The reflection is separated into diffuse and specular components according to the dichromatic reflection model, and the diffuse parameters are estimated at 5-nm intervals between 380 nm and 780 nm for each pixel. Since the CCD camera used to capture the images has a limited dynamic range, we suppose that the specular reflection is independent of wavelength for dielectrics and that the specular reflections are saturated, although some of them may be unsaturated. We adopt the Torrance-Sparrow reflectance model for the specular reflection and estimate the specular parameters using the least squares method for each pixel. Our experimental results reveal that the diffuse parameters for the color and the specular parameters for the gloss of the target object are satisfactorily estimated.
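For reference, the dichromatic reflection model used above decomposes the observed radiance as

\[
I(\lambda) = m_d\,c_d(\lambda) + m_s\,c_s(\lambda),
\]

where \(c_d(\lambda)\) and \(c_s(\lambda)\) are the spectral compositions of the diffuse and specular components and \(m_d\), \(m_s\) are geometric scale factors. For dielectrics, \(c_s(\lambda)\) is approximately the illuminant spectrum, which is consistent with the assumption above that the specular reflection is independent of wavelength.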
In this paper, we propose a new technique for measuring the whole three-dimensional shape of small moving objects. The proposed measurement system has a very simple structure: a CCD camera fitted with a fish-eye lens and a cylinder whose inner surface is mirror-coated. The CCD camera is set at the top of the cylinder, with its optical axis aligned to the cylinder's center. A captured image includes two types of information: a direct view of the target and reflected views. These two kinds of information are used to measure the shape of the target by stereo matching. Since the proposed method can acquire the target's shape from a single image, the three-dimensional shape of a moving object can be obtained by using an image sequence.
KEYWORDS: 3D metrology, Motion measurement, Cameras, Sensors, 3D acquisition, 3D image processing, 3D scanning, Imaging systems, Stereoscopic cameras, Gyroscopes
Wearable 3D measurement makes it possible to acquire 3D information about an object or an environment using a wearable computer. Recently, mobile phones in Japan can send pictures as well as voice and sound, and it is becoming easy to capture and send short movies with them. Meanwhile, computers are becoming compact and high-performance, and they can easily connect to the Internet via wireless LAN. In the near future, we will be able to use wearable computers anytime and anywhere, and three-dimensional data measured by a wearable computer could then be sent as a new kind of data. This paper proposes a method and system for measuring three-dimensional data of an object with a wearable computer. The method uses slit-light projection for 3D measurement and the user's motion instead of a scanning system.
KEYWORDS: Visualization, Wind energy, Internet, Virtual reality, 3D acquisition, 3D modeling, 3D visualizations, Energy efficiency, Information visualization, Network architectures
Under growing demand for energy saving, city planners should consider the efficiency of energy consumption from the beginning. Diversified analysis of end-use energy consumption is indispensable for exploring a desirable energy system in an urban area. When the visualization is available on the Internet, city planners can freely discuss given plans and ask for the help and comments of experts. This paper proposes a VR-based interactive visualization system utilizing the hyperlink function of VRML. The proposed visualization relates end-use energy consumption to consumers' geometrical arrangements and nests sets of visualizations, which city planners can observe in a virtual environment over the Internet. The proposed system was applied to a set of end-use electric power consumption data for a certain area. Experimental results show that the visualization lets users comprehend end-use trends and the characteristics of each consumer.