With the development of information technology and artificial intelligence, speech synthesis plays a significant role in human-computer interaction. However, current speech synthesis techniques still lack naturalness and expressiveness, so synthesized speech does not yet approach the standard of natural language. Another problem is that human-computer interaction based on speech synthesis is too monotonous to support a mechanism driven by the user's subjective state. This thesis reviews the historical development of speech synthesis and summarizes the general pipeline of the technique, pointing out that prosody generation is an important module in the synthesis process. Building on further research, a new human-computer interaction method is introduced that uses the eye-activity patterns of reading to control and drive prosody generation, enriching the forms of synthesis. The present state of speech synthesis technology is reviewed in detail. On the premise that eye-gaze data can be extracted, a speech synthesis method driven in real time by the eye-movement signal is proposed that can express the speaker's real speech rhythm: while the reader silently reads a corpus, reading information such as the eye-gaze duration per prosodic unit is captured, and a hierarchical prosodic duration model is established to determine the duration parameters of the synthesized speech. Finally, analysis verifies the feasibility of the proposed method.
KEYWORDS: Eye, Eye models, Control systems, Visual process modeling, Cognitive modeling, Mining, Signal processing, Motion controllers, Human-computer interaction, Systems modeling
Eye tracking has become a principal method for analyzing recognition issues in human-computer interaction, and capturing images of the human eye is the key problem in eye tracking. Based on further research, a new human-computer interaction method is introduced to enrich the form of speech synthesis. We propose a method of Implicit Prosody mining based on eye-image capture technology: parameters are extracted from images of the reader's eyes, used to control and drive prosody generation in speech synthesis, and a prosodic model with high simulation accuracy is established. The duration model is a key issue in prosody generation. For this model, the paper puts forward a new idea: obtain the eye-gaze duration during reading from the captured eye images, and synchronize this duration with the pronunciation duration in speech synthesis. Eye movement during reading is a comprehensive, multi-factor interactive process involving fixations, saccades, and regressions; therefore, how to extract the appropriate information from the eye images must be considered, and the gaze regularities of the eyes must be obtained as references for modeling. Based on an analysis of three current eye-movement control models and the characteristics of Implicit Prosody reading, the relative independence of the text speech-processing system and the eye-movement control system is discussed. It is shown that, under the same level of text familiarity, the eye-gaze duration during reading and the duration of internal (subvocal) pronunciation are synchronous. An eye-gaze duration model based on the prosodic structure of the Chinese language is presented, replacing previous machine-learning and probability-forecasting methods, to obtain the reader's real internal reading rhythm and to synthesize speech with a personalized rhythm. This research enriches the forms of human-computer interaction and has practical significance and application prospects for assisted speech interaction for the disabled. Experiments show that Implicit Prosody mining based on eye-image capture gives the synthesized speech more flexible expression.
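The core idea above, synchronizing per-prosodic-unit gaze duration with pronunciation duration, can be sketched as a simple rescaling: within each prosodic unit, baseline phone durations are scaled so their total matches the gaze duration the eye tracker measured for that unit. This is a minimal illustration under that assumption; the function names and the proportional-scaling rule are hypothetical, not the paper's actual hierarchical model.

```python
def scale_unit_durations(base_durations, gaze_duration_ms):
    """Scale baseline phone durations (ms) within one prosodic unit so
    their total matches the reader's eye-gaze duration for that unit."""
    total = sum(base_durations)
    if total <= 0:
        raise ValueError("baseline durations must sum to a positive value")
    factor = gaze_duration_ms / total
    return [d * factor for d in base_durations]

def drive_prosody(units):
    """units: list of (base_durations, gaze_duration_ms) per prosodic unit.
    Returns the flat list of gaze-driven phone durations."""
    driven = []
    for base, gaze in units:
        driven.extend(scale_unit_durations(base, gaze))
    return driven

# Example: two prosodic units with gaze durations from the eye tracker.
units = [([120.0, 80.0], 300.0),          # unit 1: 200 ms baseline, 300 ms gaze
         ([100.0, 100.0, 50.0], 200.0)]   # unit 2: 250 ms baseline, 200 ms gaze
print(drive_prosody(units))
```

A reader who lingers on a unit thus stretches its synthesized duration in proportion, which is one plausible way to carry the "personalized rhythm" into the synthesizer's duration parameters.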
To establish a measurement basis in a non-cooperative environment, this paper proposes an autonomous position-and-posture servo tracking method based on laser-light-guided monocular vision. The line of a linear laser projected onto a plane serves as the horizontal basis, while the laser light modulated by projection onto the reference plane is taken as the servo target. The position-and-posture change information of the modulated laser light is obtained by a monocular vision system, from which the attitude angle of the laser line can be calculated. The attitude angle is transmitted to a parallel tracking platform in real time and controls the movement of the platform to follow the laser light. The tracking-angle parameters of the parallel tracking platform under different positions and postures were verified with an inclinometer, which demonstrates the validity and effectiveness of the method. For the remaining measurement errors, the paper analyzes possible causes and offers feasible suggestions for further improving the precision of the system.
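The in-plane attitude angle of the projected laser line can be illustrated with a minimal sketch: given two points on the detected laser stripe in the camera image, the angle follows from the arctangent of the line's slope. This is a hedged simplification; the paper's actual pipeline involves camera calibration and the full position/posture computation, and the sign convention below assumes the usual image coordinates.

```python
import math

def laser_attitude_angle(p1, p2):
    """Estimate the in-plane attitude angle (degrees) of a projected laser
    line from two of its endpoints in the image. p1, p2: (x, y) pixel
    coordinates. Note: with image y increasing downward, the sign of the
    angle is flipped relative to the usual mathematical convention."""
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    return math.degrees(math.atan2(dy, dx))

# A horizontal stripe reads 0 degrees; a 45-degree tilt reads 45 degrees.
print(laser_attitude_angle((100, 200), (300, 200)))  # 0.0
print(laser_attitude_angle((0, 0), (100, 100)))      # 45.0
```

In the described system this angle would be streamed to the parallel tracking platform each frame as the servo reference.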
KEYWORDS: Digital watermarking, Image encryption, Image processing, Chemical elements, Computer security, Digital imaging, Feature extraction, Multimedia, Image restoration, Information science
To improve image encryption strength, this paper proposes an image encryption method based on a parasitic audio watermark, which relies on dual messages in the image domain and the speech domain for encryption protection. The method uses a Chinese phonetic synthesis algorithm to synthesize audio from the embedded text, then segments the sentence into prosodic phrases and obtains the complete set of initial-consonant and compound-vowel elements that reflect the audio features of the statement. These elements are sampled and scrambled, combined with an image watermark, and the composite is embedded into the image to be encrypted in the frequency domain, so that the processed image carries the image watermark information and parasitically hosts the audio feature information. After watermark extraction, the audio is re-synthesized with the same phonetic synthesis algorithm and compared with the original. Experiments show that no decryption method confined to either the image domain or the speech domain alone can break the protection, and the image gains higher encryption strength and security through the double encryption.
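The scrambling of the initial-consonant and compound-vowel elements can be sketched as a keyed, invertible permutation applied before embedding and inverted during extraction. The abstract does not specify the actual scrambling scheme or the frequency-domain embedding, so the functions below are an illustrative assumption only.

```python
import random

def scramble_elements(elements, key):
    """Apply a keyed permutation to the phonetic element sequence before
    embedding. Returns the scrambled list and the permutation order
    needed to invert it. (Sketch only; not the paper's exact scheme.)"""
    rng = random.Random(key)
    order = list(range(len(elements)))
    rng.shuffle(order)
    return [elements[i] for i in order], order

def unscramble_elements(scrambled, order):
    """Invert the keyed permutation during watermark extraction."""
    out = [None] * len(scrambled)
    for dst, src in enumerate(order):
        out[src] = scrambled[dst]
    return out

# Initial consonants and compound vowels of a sample phrase.
elements = ["zh", "ong", "g", "uo"]
scrambled, order = scramble_elements(elements, key=42)
assert unscramble_elements(scrambled, order) == elements
```

Because the permutation is keyed, an attacker working only in the image domain recovers scrambled phonetic data, which is consistent with the dual-domain protection claimed above.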