Vol. 10. Issue 4. Pages 242-251 (October - December 2017)
Original Article
Open Access
Audio–visual interaction in visual motion detection: Synchrony versus Asynchrony
Stephanie Rosemann (corresponding author: Stephanie.rosemann@uni-oldenburg.de; present address: Biological Psychology Lab, Department of Psychology, Faculty of Medicine and Health Sciences, European Medical School, Carl von Ossietzky University Oldenburg, 26111 Oldenburg, Germany), Inga-Maria Wefel, Volkan Elis, Manfred Fahle
Department of Human-Neurobiology, University of Bremen, Hochschulring 18, 28359 Bremen, Germany
Under a Creative Commons license
Abstract
Objective

Detection and identification of moving targets is of paramount importance in everyday life, even though it is rarely tested in optometric practice, mostly for technical reasons. There are clear indications in the literature that vision and hearing interact in the perception of moving targets, for example in noisy surrounds and in understanding speech. The main aim of visual perception, the ability that optometry seeks to optimize, is the identification of objects, from everyday objects to letters, as well as the spatial orientation of observers in natural surrounds. To subserve this aim, corresponding visual and acoustic features must be combined from the rich spectrum of signals supplied by natural environments.

Methods

Here, we investigated the influence of an auditory motion stimulus on visual motion detection, both with a concrete (left/right movement) and an abstract auditory motion (increase/decrease of pitch).

Results

We found that incongruent audiovisual stimuli led to significantly worse detection than the visual-only condition. Additionally, detection was significantly better in abstract congruent than in incongruent trials. For the concrete stimuli, the detection threshold was significantly lower in the asynchronous audiovisual conditions than in the unimodal visual condition.

Conclusion

We find a clear but complex pattern of partly synergistic and partly inhibitory audio–visual interactions. Asynchrony appears to play a positive role only in spatial audiovisual motion, while incongruence is disruptive mainly in simultaneous abstract configurations but not in concrete ones. As in speech perception in hearing-impaired patients, patients suffering from visual deficits should be able to benefit from acoustic information.

Keywords:
Cross-modal integration
Audiovisual motion
Congruence
Vertical
Horizontal
Synchrony
Introduction

We are surrounded by a loud, vivid, moving and changing environment where objects are perceived via different sensory systems. To achieve a unified perception, we have to localize, relate and integrate the different inputs to orient ourselves in the world and to interact with it – from cooking to driving a car. An important process subserving this goal is called cross-modal integration of visual and auditory signals and this interaction is more important than most people realize.

While the visual modality was long considered the dominant one, auditory stimuli have been shown to influence and modify visual perception.1–12 Beneficial cross-modal interactions were reported in which an additional auditory stimulus facilitated visual perception compared to unimodal visual stimulation.3,9,13–26 An advantage of cross-modal integration arises here because our brain associates features of auditory and visual objects with one another, e.g. high tones are associated with small objects or bright surfaces while low tones are more often combined with large objects and dark surfaces.27,28 In contrast, additional visual information during an auditory task produces only slight performance improvements.24,29–31 In the context of speech perception, seeing the articulatory movements and gestures of the speaker enhances the perception of speech.32,33 This audiovisual integration in speech perception is of particular importance in older adults34 or cochlear-implanted deaf patients.35 Patient and animal studies show strong associations between vision and hearing impairments,36 as well as compensation and cross-modal plasticity of one modality when the other is impaired.37–39 Patients with visual disorders (hemianopia or visuo-spatial neglect) benefit tremendously from audiovisual integration to improve or recover from visual deficits.40–42 In addition, slight deficits in vision may be compensated by the auditory domain, leading to an increased reliance on auditory rather than visual cues. This topic is therefore highly relevant for optometry and optometric practice, both for diagnostics and for treatment of visual deficits, by testing not only visual but also audio–visual abilities (analogous to hearing impairments).

In relation to cross-modal processing, the topic of motion perception has received much attention lately.1,43,44 The combination of visual and auditory motion signals has been confirmed to be beneficial when both stimuli move in the same direction.6,8,12,25,45–48 If one input conflicts with the other, detection of the target stimulus direction can be impaired.20 These effects have been shown mostly for the horizontal plane using apparent-motion stimuli and moving vertical bars, but also with random-dot paradigms and moving gratings. Auditory stimuli were tones fading from one ear to the other or changing loudspeaker locations. Some studies also explored cross-modal integration in the vertical plane using abstract auditory stimuli such as tones of ascending or descending pitch.6,7,49 Auditory input can benefit visual motion detection because it adds temporal information to the spatial information already available to the visual system.

Synchrony is an important factor in cross-modal binding and integration.50–52 The greatest effects are observed when both stimuli are co-localized and presented simultaneously, whereas a delayed presentation reduces the effect.7,13,21,53,54 However, larger time windows can still lead to cross-modal integration, depending on the context and complexity of the stimuli; in this case, the visual modality takes sound velocity into account.55–58 Additionally, congruence is an essential factor in multisensory motion perception. If both stimuli move in the same direction, detection thresholds can be significantly reduced, while incongruent stimuli lead to increased detection thresholds.7,13–15,18,45,49,53,59,60 Some of these studies showed clear congruence effects; others demonstrated only a clear advantage of multimodal over unimodal presentation.

Our study provides a new example of auditory–visual interaction relevant to the perception of moving objects. It underlines that, when dealing with patients’ complaints about ‘blurred’ vision, evaluation with static optometric charts may miss the relevant symptom, because seeing and hearing interact in the perception of moving objects.

As previous research focused mostly on spatial audiovisual motion detection in the horizontal plane, we wanted to complement current findings about motion detection processes with an experiment using abstract auditory stimuli. To compare audiovisual motion detection with a concrete versus an abstract auditory stimulus, we conducted two similar tasks in the same participants: one experiment with spatial and one with frequency-based auditory motion. In these we investigated the influence of an auditory stimulus on a visual motion task in order to expand current knowledge about cross-modal motion detection; a unimodal visual task served as control condition. We expected (1) lower threshold values for the audiovisual tasks than for the unimodal task, because an additional auditory stimulus has been shown to facilitate visual motion detection.8,47 Furthermore, we hypothesized (2) lower threshold values for congruent than for incongruent trials (referring to matching and non-matching auditory and visual stimuli), given reports of significantly better detection for congruent than for incongruent motion.6,7,14,18,49 A third condition in our experimental design concerned the synchrony of auditory and visual stimuli. The spatiotemporal window for cross-modal integration can be up to 1500ms,55,56 depending on the complexity of the stimuli. In our asynchronous condition, the auditory stimulus preceded the visual one but continued during visual stimulation. We reasoned that the preceding auditory stimulus functions as a cue and primes the participant. We therefore expected (3) that a preceding congruent auditory stimulus leads to lower detection thresholds, and a preceding incongruent auditory stimulus to higher detection thresholds, than in the simultaneous condition. Results showed that visual motion detection was influenced by incongruent frequency-based auditory motion and by asynchronous spatial auditory motion, yielding a complex and differentiated picture of different forms of audio–visual motion detection.

Material and methods
Subjects

Subjects (n=20) were eight male and twelve female healthy volunteers with a mean age of 24.2 (±3.5) years. Exclusion criteria were neurological, psychiatric (self-report) or ophthalmological disorders and auditory defects.

All subjects underwent a series of pre-examinations to screen for the exclusion criteria of visual and auditory defects. Visual acuity was tested at near and far distance (an acuity of 20/20, corresponding to 1.0, was required; correction with contact lenses or glasses was allowed). The Lang Test61 and the Ishihara Color Vision Test62 served as measures of stereo and color vision. Audiometry at eight frequencies per ear was used to assess hearing (HTTS software, Berlin, Germany).

Stimuli

The experiment consisted of visual and auditory stimuli for two different motion directions. Visual stimulation was always concrete. The auditory stimulus was either concrete or abstract. The configurations are described in more detail under ‘Experimental Design’.

Auditory stimuli

In the abstract auditory configuration (vertical) there was one tone with linearly increasing frequency (400–1200Hz) and one with linearly decreasing frequency (1200–400Hz). Each tone lasted 3000ms. This frequency range was chosen because it was comfortable and well perceived: a pilot study revealed that tones of higher frequencies, although within the optimal hearing range, were rated as uncomfortable by subjects. Tones similar to those in other studies22,25 were therefore used. Loudness was the same for all participants.

In the concrete auditory configuration (horizontal) there was a single 400Hz tone whose loudness shifted gradually from one ear to the other, inducing the perception of a moving tone.
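For concreteness, both stimulus types can be synthesized in a few lines. The sketch below (Python/NumPy) is not the original stimulus code: the sampling rate and the linear panning law are assumptions, since the article specifies only the frequencies, the duration, and the direction of change.

```python
import numpy as np

FS = 44100   # sampling rate in Hz (assumed; not reported in the article)
DUR = 3.0    # 3000 ms stimulus duration, as in the experiment

t = np.arange(int(FS * DUR)) / FS

def frequency_sweep(f0=400.0, f1=1200.0):
    """Abstract stimulus: tone whose frequency changes linearly (400-1200 Hz).

    For a linear sweep the instantaneous phase is the integral of
    f(t) = f0 + (f1 - f0) * t / DUR.  Swap f0 and f1 for the falling tone.
    """
    phase = 2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * DUR))
    return np.sin(phase)

def panned_tone(f=400.0, left_to_right=True):
    """Concrete stimulus: 400 Hz tone whose loudness moves between the ears."""
    carrier = np.sin(2 * np.pi * f * t)
    pan = t / DUR if left_to_right else 1.0 - t / DUR  # linear pan law (assumed)
    return np.column_stack(((1.0 - pan) * carrier, pan * carrier))  # L, R
```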

Visual stimuli

A screenshot of the visual stimuli can be seen in Fig. 1. For both configurations the screen was gray with a red fixation dot in the center. The fixation dot in each test had a size of 6arcmin. Visual stimuli were restricted to a central area of 10×10cm (11.3×11.3°). In this area 100 black dots with a size of 10arcmin each were moving with a velocity of 400arcmin/s. A percentage of these 100 dots was always moving in either upward or downward direction (vertical abstract configuration) or else in either leftward or rightward direction (horizontal concrete configuration) while the other dots were moving in random directions. Presentation time of the visual stimuli was 3000ms and dots were visible until they reached the edge of the central presentation area of 10×10cm. The percentage of dots moving together was computed for every trial depending on the response of the participant in the preceding trial. This design corresponds to a threshold value computation by a staircase method: How many dots must be moving together so that the participant can recognize the direction with a performance of 75% correct? Please find a detailed description of this design under ‘Data Analysis’.
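The dot-motion logic can be sketched as follows (Python/NumPy). Details such as whether the coherent subset is redrawn on every frame and how dots are replaced at the border are not specified in the article, so they are assumptions here.

```python
import numpy as np

def step_dots(xy, coherence, direction_deg, speed, rng):
    """Advance a random-dot field by one frame.

    A fraction `coherence` of the dots moves in `direction_deg`; the
    remaining dots move in independent random directions.  The coherent
    subset is redrawn on each frame (an assumption, see text).
    """
    n = xy.shape[0]
    coherent = rng.random(n) < coherence
    angles = np.where(coherent,
                      np.deg2rad(direction_deg),
                      rng.uniform(0.0, 2.0 * np.pi, n))
    return xy + speed * np.column_stack((np.cos(angles), np.sin(angles)))

rng = np.random.default_rng(0)
dots = rng.uniform(0.0, 678.0, size=(100, 2))      # 100 dots; 11.3 deg = 678 arcmin
dots = step_dots(dots, coherence=0.30, direction_deg=90.0,
                 speed=400.0 / 75.0, rng=rng)       # 400 arcmin/s at 75 Hz
```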

Figure 1.

View of the visual stimulus for both configurations. In both configurations the visual stimulus was shown for 3000ms. The screen is gray and in the center is the red fixation dot. Around the fixation dot there are 100 black dots which are moving in partly random directions.

Experimental design
Configurations

Our design comprised two different configurations and three different conditions. The configurations defined the motion of both the visual and the auditory stimuli: in the abstract configuration the visual stimuli were moving up- and downwards and the auditory stimuli had increasing and decreasing frequencies. For the concrete configuration the visual stimuli were moving left- and rightwards and the auditory stimuli were moving from the left to the right headphone or vice versa.

In the following we use the terms ‘frequency-based’ (abstract auditory stimuli) and ‘spatial’ (concrete auditory stimuli) to differentiate between the two different configurations.

Conditions

For each configuration there were three conditions: a simultaneous audiovisual, an asynchronous audiovisual and a visual control condition. The audiovisual conditions comprised 80 trials each and the visual condition 40 trials. In the simultaneous conditions, auditory and visual stimuli were presented together for 3000ms. In the asynchronous condition, the auditory stimulus preceded the visual one by 1500ms, leaving a 1500ms overlap of auditory and visual stimulation. In the visual control condition no auditory stimuli were present. This yielded six conditions: up-down simultaneous, up-down asynchronous, up-down visual, left-right simultaneous, left-right asynchronous and left-right visual.

Trials

For all audiovisual conditions there were two types of trials: congruent and incongruent. In congruent trials, visual and auditory stimuli moved in the same direction, e.g. dots moved upwards while the frequency increased (or dots moved leftwards while the loudness shifted left). In incongruent trials, visual and auditory stimuli moved in different directions, e.g. dots moved upwards while the frequency decreased. This holds for both configurations. Each audiovisual condition comprised 80 trials, half congruent and half incongruent; threshold values were computed separately for congruent and incongruent trials. The visual control condition comprised 40 trials, yielding one threshold value. This resulted in ten threshold values in total, two for purely visual motion detection and eight for audiovisual motion detection (Table 1).

Table 1.

Overview of the experimental design for audiovisual motion detection. For each configuration, each participant completed three separate experimental tests: visual only, plus simultaneous and asynchronous audiovisual tests (both consisting of congruent and incongruent trials).

Configuration    Condition     Trial
Frequency-based  Simultaneous  Congruent
                               Incongruent
                 Asynchronous  Congruent
                               Incongruent
                 Visual only
Spatial          Simultaneous  Congruent
                               Incongruent
                 Asynchronous  Congruent
                               Incongruent
                 Visual only
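The design’s combinatorics can also be enumerated directly; this Python sketch (labels are illustrative, not the study’s variable names) reproduces the count of ten threshold values:

```python
from itertools import product

configurations = ("frequency-based", "spatial")
audiovisual = list(product(("simultaneous", "asynchronous"),
                           ("congruent", "incongruent")))

# Four audiovisual thresholds plus one visual-only threshold per configuration
thresholds = [(cfg, cond, trial) for cfg in configurations
              for cond, trial in audiovisual]
thresholds += [(cfg, "visual only", None) for cfg in configurations]
assert len(thresholds) == 10  # eight audiovisual + two visual-only
```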
Experimental procedure

All following tests were presented on a computer monitor with a spatial resolution of 1600×1200 pixels (39×31°) and a frame rate of 75Hz. Subjects sat at a viewing distance of 50cm in front of the monitor with their head stabilized in a chin rest. When required, they wore stereo headphones (Sennheiser HD 201, frequency range 21–18000Hz, maximum sound pressure level 108dB).

Subjects held two colored buttons: the red one in the left hand and the green one in the right hand. In the frequency-based (vertical) configuration, the red button was to be pressed when downward motion was perceived and the green button when upward motion was perceived. In the spatial (horizontal) configuration, the right (green) and left (red) buttons corresponded to rightward and leftward motion, respectively.

All subjects completed all configurations in counterbalanced order (four audiovisual and two visual experimental conditions) with short breaks in between. During audiovisual conditions they wore headphones and were informed that they would hear an auditory stimulus. In all conditions, subjects had to direct their attention to the visual stimulus on the screen and respond to the perceived motion direction via button press. They were instructed to respond as quickly as possible.

Data analysis

The experimental design used in this study corresponds to a threshold computation by a staircase method. The percentage of dots moving in the same direction was adjusted by Matlab66 on every trial, depending on the response in the preceding trial. The aim was to estimate the percentage of dots that must move in the same direction for the subject to detect the motion direction correctly in 75% of trials (for each participant, for every configuration, in each condition; see Table 1). Congruent and incongruent trials of each condition were interleaved within a single experimental run but yielded separate threshold values.
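The article does not report which staircase rule was used to target 75% correct. One standard rule that converges there is Kaernbach’s weighted up-down procedure, sketched below; the starting level, step sizes and reversal-averaging rule are illustrative assumptions, and the toy observer is purely for demonstration.

```python
import random

def weighted_staircase(respond, start=50.0, step_down=1.0, n_trials=80,
                       lo=0.0, hi=100.0):
    """Estimate the 75%-correct coherence threshold (% of coherent dots).

    Weighted up-down rule (Kaernbach, 1991): after a correct response the
    level drops by step_down, after an error it rises by 3 * step_down, so
    the track converges where p(correct) = 3 / (3 + 1) = 75%.
    """
    level, reversals, last_dir = start, [], 0
    for _ in range(n_trials):
        correct = respond(level)                  # run one trial at this level
        direction = -1 if correct else 1
        if last_dir and direction != last_dir:
            reversals.append(level)               # record reversal points
        last_dir = direction
        step = -step_down if correct else 3.0 * step_down
        level = min(max(level + step, lo), hi)
    tail = reversals[-6:] or [level]              # mean of the last reversals
    return sum(tail) / len(tail)

# Toy observer for demonstration: higher coherence -> more likely correct
toy = lambda c: random.random() < min(0.95, 0.5 + c / 80.0)
print(weighted_staircase(toy))
```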

Threshold values of all participants were compared with SPSS (IBM SPSS Statistics 23). First, each configuration was analyzed separately. A repeated-measures ANOVA was conducted 1) between the audiovisual and the visual control conditions (to detect the general effect of an auditory stimulus), 2) between congruent and incongruent trials (to determine the effect of congruence) and 3) between simultaneous and asynchronous conditions (to determine the influence of synchrony of stimulus onset). Second, the two configurations were compared by paired t-tests to check for differences between frequency-based and spatial auditory stimulation on visual motion detection. The normality assumption required for the t-tests was met.

The same analyses were applied to the reaction time data obtained in all measurements.
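The analyses were run in SPSS; an equivalent open-source sketch in Python (pandas, scipy, statsmodels) is shown below. The long-format layout and the condition labels are assumptions, and the numbers are random placeholders rather than the study’s data.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Toy long-format data: 20 subjects x 5 conditions of one configuration
rng = np.random.default_rng(0)
conditions = ["sim_congruent", "sim_incongruent",
              "asyn_congruent", "asyn_incongruent", "visual_only"]
df = pd.DataFrame([{"subject": s, "condition": c, "threshold": rng.normal(32, 8)}
                   for s in range(20) for c in conditions])

# Repeated-measures ANOVA across the conditions
print(AnovaRM(df, depvar="threshold", subject="subject",
              within=["condition"]).fit())

# Post hoc paired t-test, e.g. simultaneous congruent vs incongruent
wide = df.pivot(index="subject", columns="condition", values="threshold")
t, p = stats.ttest_rel(wide["sim_congruent"], wide["sim_incongruent"])
print(f"t = {t:.3f}, p = {p:.3f}")

# Normality check (the study used the Kolmogorov-Smirnov test)
for c in conditions:
    print(c, stats.kstest(stats.zscore(wide[c]), "norm").pvalue)
```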

Results

In this study two experiments (frequency-based and spatial) were conducted with the same participants. The main aim was to expand current knowledge about cross-modal motion detection by investigating the influence of an auditory stimulus on a visual motion task at a more concrete and a more abstract level of processing. We used three conditions: a unimodal visual condition, a simultaneous audiovisual condition and an asynchronous audiovisual condition. In the following, the results for the two configurations of the motion detection task are presented separately. Normality of the data was verified by the Kolmogorov–Smirnov test.

Frequency-based configuration
Performance data

Mean values and statistical results for the frequency-based configuration can be seen in Fig. 2. The mean threshold value for the simultaneous conditions in the congruent trials was 30.0 (±9.8)% and for the incongruent trials it was 33.5 (±7.8)%. For the asynchronous conditions the mean threshold was 31.3 (±10.6)% for the congruent trials and 32.6 (±8.3)% for the incongruent trials. The visual condition yielded a mean threshold value of 28.5 (±9.7)%.

Figure 2.

Mean threshold values (±standard error) for all frequency-based configurations. Significant differences (p<0.05) are expressed by * and trends of difference (p<0.1) are expressed by T.


An ANOVA for repeated measurements for the frequency-based conditions yielded no significant difference, F(4,76)=1.406, p=0.24. Post hoc paired t-tests between single conditions yielded significant results: 1) simultaneous congruent versus incongruent conditions (p=0.025) and 2) visual condition versus simultaneous incongruent trials (p=0.016). A trend was observed for visual versus asynchronous incongruent trials (p=0.087). Effect size measures (η²=0.285) showed a small effect.

Reaction time data

Mean values and statistical results for the reaction time data of the frequency-based configuration can be seen in Fig. 3. The mean reaction time for the simultaneous conditions in the congruent trials was 2013.8 (±278.8)ms and for the incongruent trials it was 1940.6 (±282.9)ms. For the asynchronous conditions the mean reaction time was 1548.6 (±187.3)ms for the congruent trials and 1625.2 (±198.7)ms for the incongruent trials. In the visual condition the mean reaction time was 1685 (±180.4)ms.

Figure 3.

Mean reaction time data (±standard error) for all frequency-based configurations. Significant differences (p<0.05) are expressed by * and trends of difference (p<0.1) are expressed by T.


An ANOVA for repeated measurements for the reaction times in this configuration yielded no significant difference but a trend, F(4,76)=2.502, p=0.099. Post hoc paired t-tests between single conditions yielded significant results between simultaneous and asynchronous congruent trials (p=0.014). Several trends were observed as well: 1) asynchronous congruent versus incongruent conditions (p=0.077), 2) simultaneous versus asynchronous incongruent trials (p=0.058) and 3) visual condition versus simultaneous congruent trials (p=0.093). Effect size measures (η²=0.116) showed a small effect.

Spatial configuration
Performance data

Mean thresholds were 34.4 (±8.18)% for the congruent trials and 37.8 (±7.04)% for the incongruent trials in the simultaneous condition (Fig. 4). For the asynchronous condition the mean threshold values were 30.0 (±8.34)% for the congruent and 33.0 (±8.12)% for the incongruent trials. In the visual condition the mean threshold value was 37.8 (±9.3)%.

Figure 4.

Mean thresholds (±standard error) for all spatial configurations. Significant differences (p<0.05) are expressed by * and trends of difference (p<0.1) are expressed by T.


An ANOVA for repeated measurements for the spatial conditions yielded no significant difference, F(4,76)=2.263, p=0.1. Post hoc paired t-tests between single conditions yielded significant results: 1) visual versus asynchronous congruent trials (p=0.035) and 2) visual versus asynchronous incongruent trials (p=0.036). Additionally, two trends were observed: 1) between simultaneous congruent and incongruent conditions (p=0.056) and 2) between simultaneous incongruent and asynchronous incongruent conditions (p=0.055). Effect size measures (η²=0.312) showed a medium effect.

Reaction time data

Mean reaction times were 1970.2 (±297.9)ms for the congruent trials and 1959.9 (±286.6)ms for the incongruent trials in the simultaneous condition (Fig. 5). For the asynchronous condition the mean reaction times were 1660.4 (±207.3)ms for the congruent and 1709.8 (±229.9)ms for the incongruent trials. In the visual condition the mean reaction time was 1691.5 (±220.7)ms.

Figure 5.

Mean reaction time data (±standard error) for all spatial configurations. No significant differences between conditions were obtained.


An ANOVA for repeated measurements for the spatial conditions yielded no significant difference, F(4,76)=1.199, p=0.311. Post hoc paired t-tests between single conditions yielded no significant results or trends, and effect size measures (η²=0.059) showed a small effect.

Comparison of both configurations

One-sided paired t-tests between the two configurations showed significant differences for the congruent trials of the simultaneous condition (t=−1.800; p=0.044) and for the visual condition (t=−4.005; p<0.001). In these comparisons the frequency-based configuration yielded smaller threshold values than the spatial configuration.

Simultaneous and asynchronous conditions from both configurations were compared by a one-sided paired t-test, which showed a trend (t=1.572; p=0.06): asynchronous stimulation was slightly better than simultaneous presentation.
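A one-sided paired comparison of this kind can be done with scipy’s `alternative` argument (available from scipy 1.6); the values below are random placeholders, not the study’s data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
freq = rng.normal(30, 9, 20)     # per-subject thresholds, frequency-based (toy)
spatial = rng.normal(38, 9, 20)  # per-subject thresholds, spatial (toy)

# One-sided paired test: are frequency-based thresholds lower?
t, p = stats.ttest_rel(freq, spatial, alternative="less")
print(f"t = {t:.3f}, p = {p:.3f}")
```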

Table 2 gives an overview of the results of unimodal visual presentation compared to audiovisual stimulation.

Table 2.

Comparison of unimodal visual and audiovisual interactions.

                       Simultaneous  Asynchronous
Frequency congruent    Ø             Ø
Frequency incongruent  −             (−)
Spatial congruent      Ø             +
Spatial incongruent    Ø             +

Plus and minus denote significant improvement or impairment relative to the unimodal visual condition, brackets indicate a trend, and Ø denotes no effect.

Reaction time data did not significantly differ across configurations.

Discussion

The main aim of this study was to investigate the influence of a moving auditory stimulus at different levels of abstraction on visual motion detection. Twenty participants completed three different subtasks (a unimodal visual condition, a simultaneous audiovisual test and an asynchronous audiovisual task) in two configurations (frequency-based and spatial auditory motion).

The main results in the abstract frequency-based configuration were that incongruent trials mostly impaired the detection process compared to congruent and unimodal visual trials (see Fig. 2 and Table 2). A significant incongruence effect was observed only for the simultaneous condition. Preceding auditory stimuli led to an assimilation of congruent and incongruent thresholds, and both tended to be worse than purely visual stimulation (Fig. 2 and Table 2). In the more direct, spatial auditory motion condition, priming (concurrency effect) improved motion detection compared to unimodal visual presentation (Fig. 4 and Table 2). A small congruence effect was again observed for the simultaneous condition, while the asynchronous condition lacked a significant congruence advantage. These effects were not mediated by reaction times: reaction times did not differ between conditions in the spatial configuration (Fig. 5), and in the frequency-based configuration they differed significantly only between congruent synchronous and asynchronous trials. No significant reaction time differences were detected between the two configurations.

Cross-modality & congruence

The first two hypotheses predicted better detection thresholds for audiovisual than for unimodal visual stimuli, and better results for congruent than for incongruent trials. Hypothesis 1 must be rejected for the frequency-based configuration, as the threshold for the unimodal visual condition was lowest. Additionally, incongruence significantly increased the threshold in the simultaneous condition, leading to acceptance of hypothesis 2. In the spatial configuration, both congruent and incongruent trials of the asynchronous condition showed significantly lower detection thresholds than the visual task, so here hypothesis 1 can be partly accepted. Simultaneous and unimodal visual conditions did not differ significantly. Hypothesis 2 likewise cannot be fully accepted, because the statistics detected only a trend, not a significant difference, between congruent and incongruent trials in the simultaneous condition.

Prime and Harris51 did not find facilitation effects in their experiment either. Prediction of a moving visual stimulus was best in the unimodal condition and impaired in incongruent (auditory spatially displaced) conditions, which resembles the results found in our frequency-based but not our spatial configuration. Further studies dealing with frequency- or pitch-based6,49 and spatial motion detection14,47 used apparent-motion stimuli, coherent-dot paradigms and gratings; they showed a directional visual motion bias in the direction of the auditory stimulus. Other experiments did not find congruence effects13,60 and argue for probability summation at the decision level. Another study24 challenged this view by demonstrating that an additional auditory stimulus can facilitate visual motion detection even if it is non-informative; the authors argue that multisensory integration can indeed occur at sensory levels. Sadaghiani et al.7 performed a combined psychophysical and fMRI study and showed that different brain areas are involved in processing audiovisual motion (horizontal) and speech signals. Audiovisual pitch signals (vertical) were associated with moderate signal increases in both ‘motion’ and ‘speech’ areas. The authors argued that audiovisual motion integration emerges at the perceptual level while audiovisual speech integration emerges at the decision level. Maeda et al.6 also showed that presentation of words influenced visual motion not directly at the perceptual level but at the decision level. Pitch processing is associated with spatial dimensions27 but is also closely connected to language processing, and therefore involves both early and higher brain areas. In our experiment, pitch (frequency-based) and motion (spatial) configurations were mixed. Participants may have adopted a decision strategy for the frequency-based trials (no facilitation, but impairment by incongruence) and applied this strategy to the spatial trials as well (no facilitation effects; similar to Ref. 60).

Asynchrony

Hypothesis 3 dealt with the comparison of simultaneous and asynchronous presentation. The priming effect we expected can be seen, if at all, only in the spatially incongruent condition, where the preceding auditory stimulus yields a better detection threshold (statistical trend). In both configurations, frequency-based and spatial, the asynchronous congruent and incongruent thresholds did not differ significantly from each other. Simultaneous and asynchronous congruent trials also did not differ significantly. A comparison of all simultaneous trials with all asynchronous ones from both configurations yielded a trend, with slightly better results for asynchronous than for simultaneous presentation. Analysis of reaction time data showed significantly faster reaction times for congruent asynchronous than simultaneous trials and a trend for the incongruent trials in the frequency-based configuration. Therefore, only the reaction time data in this configuration somewhat resemble the expected results.

Another explanation for the slightly better detection in asynchronous trials may be that the preceding presentation of an auditory stimulus has an attentional effect on the perception of the stimuli.63,64 Thus, the asynchronous conditions might have induced increased alertness in participants, leading to improved accuracy or faster reaction times simply because attention was directed to the task. However, hypothesis 3 must be rejected, as no clear significant effects were found, although the results tend in the expected direction. In line with other studies,7,13,21,53,54 we see a reduced, rather small, influence of the auditory stimulus on visual motion detection for congruent and incongruent asynchronous stimuli. Although in our experiment the auditory stimulus preceded and then overlapped with the visual stimulus, integration of the two seems not to have taken place.

Abstract vs. concrete motion

In both configurations we find that congruent and incongruent trials differ in the simultaneous condition (statistical significance in the frequency configuration and a trend in the spatial configuration), but they do not differ in the asynchronous condition. Furthermore, the difference between asynchronous compared to synchronous stimulation across both configurations is nearly significant. Unimodal visual stimulation can be affected in both configurations: either positively by asynchrony in the spatial configuration or negatively by incongruence in the frequency-based configuration (Table 2). This is a very important finding as it shows that concurrency has an effect on spatial audiovisual stimuli and that congruence has an effect on frequency-based audiovisual stimuli.

Our auditory stimuli were comparable to those used by Jain et al.49 These authors showed that the motion direction of the auditory modality biased the perceived direction of the visual motion stimulus along three directional axes (vertical, horizontal and depth). They also used vertical displacement of auditory stimuli (spatial) and sounds gliding up and down in pitch (abstract), but they used ambiguous gratings (two oppositely moving sinusoidal gratings of the same spatial frequency) as visual stimuli. They thus had an unambiguous auditory and an ambiguous visual stimulus, while we used only unambiguous stimuli. One might infer that an additional acoustic motion stimulus influences ambiguous visual stimuli more effectively than unambiguous ones: in the former case visual motion perception is biased by the auditory stimulus, while in the latter case visual motion detection is impaired or facilitated by the auditory motion direction.

Conclusion

We conclude that a congruence effect, or more precisely an incongruence effect, was present in the frequency-based configuration. Moreover, temporal asynchrony had an effect on acoustic spatial motion. The preceding (priming) auditory stimulus in the asynchronous condition equalized congruent and incongruent thresholds in both the spatial and the frequency-based configuration and diminished the congruence effect visible in the simultaneous condition. Cross-modal perception is a highly complex process that is currently not fully understood, and we present results that are not in line with several previous studies, casting doubt on the generality of their findings.6,14,47 One might infer that multimodal facilitation and congruence effects between the visual and acoustic domains differ between frequency-based and spatial acoustic motion. For spatial audiovisual integration, asynchrony is an important factor, while it is not for frequency- or pitch-based audiovisual integration. Only for pitch-based acoustic motion does the congruence of the stimuli's motion directions play an essential role.

In the context of recovery from visual impairments and improvement of visual perception, multisensory integration and audio–visual interactions play an important role. In hearing-impaired people, additional visual input significantly increases speech perception.33 Additionally, cross-modal distraction is an indicator of hearing loss.65 Similarly, cross-modal distraction in the visual domain might be an indicator of a visual deficit. Slight impairments of the visual system could therefore be assessed by an audio–visual motion task, in which deficits would become obvious as the influence of the auditory system over the visual system grows: incongruent trials would then be answered mostly on the basis of auditory rather than visual information, leading to false answers and an increased detection threshold. As this topic involves rather complex mechanisms, multisensory tests for the improvement of visual functions should be selected and designed carefully and adapted to the needs of the patient. Thus, in order to find an appropriate test, the mechanisms of audio–visual motion within different paradigms should be further clarified.

Ethical approval

The study was approved by the local ethics committee of the University of Bremen and carried out in accordance with the Declaration of Helsinki. All subjects signed a written consent form.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. The first author received a PhD grant from the German National Academic Foundation (Studienstiftung des Deutschen Volkes).

Conflict of interest

The authors declare that they have no conflict of interest.

Acknowledgments

The authors would like to thank Dennis Trenner for writing the programs for data acquisition and analysis.

References
[1]
D. Burr, P. Thompson.
Motion psychophysics: 1985–2010.
Vis Res, 51 (2011), pp. 1431-1456
[2]
J. Driver, C. Spence.
Multisensory perception: beyond modularity and convergence.
Curr Biol, 10 (2000), pp. R731-R735
[3]
S. Hidaka, W. Teramoto, M. Keetels, J. Vroomen.
Effect of pitch-space correspondence on sound-induced visual motion perception.
Exp Brain Res, 231 (2013), pp. 117-126
[4]
N. Kitagawa, S. Ichihara.
Hearing visual motion in depth.
Nature, 416 (2002), pp. 172-174
[5]
T. Koelewijn, A. Bronkhorst, J. Theeuwes.
Attention and the multiple stages of multisensory integration: a review of audiovisual studies.
Acta Psychol (Amst), 134 (2010), pp. 372-384
[6]
F. Maeda, R. Kanai, S. Shimojo.
Changing pitch induced visual motion illusion.
Curr Biol, 14 (2004), pp. R990-R991
[7]
S. Sadaghiani, J.X. Maier, U. Noppeney.
Natural, metaphoric, and linguistic auditory direction signals have distinct influences on visual motion processing.
J Neurosci, 29 (2009), pp. 6490-6499
[8]
D. Sanabria, C. Spence, S. Soto-Faraco.
Perceptual and decisional contributions to audiovisual interactions in the perception of apparent motion: a signal detection study.
Cognition, 102 (2007), pp. 299-310
[9]
L. Shams, R. Kim.
Crossmodal influences on visual perception.
Phys Life Rev, 7 (2010), pp. 269-284
[10]
S. Shimojo, L. Shams.
Sensory modalities are not separate modalities: plasticity and interactions.
Curr Opin Neurobiol, 11 (2001), pp. 505-509
[11]
C. Spence.
Multisensory integration – solving the crossmodal binding problem. Comment on “Crossmodal influences on visual perception” by Shams & Kim.
Phys Life Rev, 7 (2010), pp. 285-286, discussion 295-298
[12]
W. Teramoto, Y. Manaka, S. Hidaka, et al.
Visual motion perception induced by sounds in vertical plane.
Neurosci Lett, 479 (2010), pp. 221-225
[13]
D. Alais, D. Burr.
No direction-specific bimodal facilitation for audiovisual motion detection.
Brain Res Cogn Brain Res, 19 (2004), pp. 185-194
[14]
O. Baumann, M.W. Greenlee.
Neural correlates of coherent audiovisual motion perception.
Cerebral Cortex, 17 (2007), pp. 1433-1443
[15]
C. Frings, C. Spence.
Crossmodal congruency effects based on stimulus identity.
Brain Res, 1354 (2010), pp. 113-122
[16]
S. Gleiss, C. Kayser.
Oscillatory mechanisms underlying the enhancement of visual motion perception by multisensory congruency.
Neuropsychologia, 53 (2014), pp. 84-93
[17]
P. Jaekl, A. Perez-Bellido, S. Soto-Faraco.
On the ‘visual’ in ‘audio–visual integration’: a hypothesis concerning visual pathways.
Exp Brain Res, 232 (2014), pp. 1631-1638
[18]
R. Kim, M.A. Peters, L. Shams.
0+1>1: how adding noninformative sound improves performance on a visual task.
Psychol Sci, 23 (2012), pp. 6-12
[19]
D.J. Lewkowicz, K.S. Kraebel.
The value of multisensory redundancy in the development of intersensory perception.
Dev Psychol, 373 (1986), pp. 373-377
[20]
J.J. McDonald, W.A. Teder-Salejarvi, S.A. Hillyard.
Involuntary orienting to sound improves visual perception.
Nature, 407 (2000), pp. 906-908
[21]
G.F. Meyer, S.M. Wuerger, F. Rohrbein, C. Zetzsche.
Low-level integration of auditory and visual motion signals requires spatial co-localisation.
Exp Brain Res, 166 (2005), pp. 538-547
[22]
J.A. Mossbridge, M. Grabowecky, S. Suzuki.
Changes in auditory frequency guide visual-spatial attention.
Cognition, 121 (2011), pp. 133-139
[23]
T. Noesselt, S. Tyll, C.N. Boehler, E. Budinger, H.J. Heinze, J. Driver.
Sound-induced enhancement of low-intensity vision: multisensory influences on human sensory-specific cortices and thalamic bodies relate to perceptual enhancement of visual detection sensitivity.
J Neurosci, 30 (2010), pp. 13609-13623
[24]
C. Spence, J. Driver.
Audiovisual links in exogenous covert spatial orienting.
Percept Psychophys, 59 (1997), pp. 1-22
[25]
Y. Takeshima, J. Gyoba.
Changing pitch of sounds alters perceived visual motion trajectory.
Multisens Res, 26 (2013), pp. 317-332
[26]
W.A. Teder-Salejarvi, T.F. Munte, F. Sperlich, S.A. Hillyard.
Intra-modal and cross-modal spatial attention to auditory and visual stimuli. an event-related brain potential study.
Brain Res Cogn Brain Res, 8 (1999), pp. 327-343
[27]
K.K. Evans, A. Treisman.
Natural cross-modal mappings between visual and auditory features.
J Vis, 10 (2011), 6.1–12
[28]
C. Spence.
Crossmodal correspondences: a tutorial review.
Atten Percept Psychophys, 73 (2011), pp. 971-995
[29]
D. Alais, J. Cass.
Multisensory perceptual learning of temporal order: audiovisual learning transfers to vision but not audition.
[30]
A. Bendixen, S. Grimm, L.Y. Deouell, N. Wetzel, A. Madebach, E. Schroger.
The time-course of auditory and visual distraction effects in a new crossmodal paradigm.
Neuropsychologia, 48 (2010), pp. 2130-2139
[31]
S. Soto-Faraco, C. Spence, A. Kingstone.
Cross-modal dynamic capture: congruency effects in the perception of motion across sensory modalities.
J Exp Psychol Hum Percept Perform, 30 (2004), pp. 330-345
[32]
J.L. Gilbert, C.R. Lansing, S.M. Garnsey.
Seeing facial motion affects auditory processing in noise.
Atten Percept Psychophys, 74 (2012), pp. 1761-1781
[33]
J. Kim, A. Sironic, C. Davis.
Hearing speech in noise: seeing a loud talker is better.
Perception, 40 (2011), pp. 853-862
[34]
C. Maguinness, A. Setti, K.E. Burke, R.A. Kenny, F.N. Newell.
The effect of combined sensory and semantic components on audio–visual speech perception in older adults.
Front Aging Neurosci, 3 (2011), pp. 19
[35]
J. Rouger, S. Lagleyre, B. Fraysse, S. Deneve, O. Deguine, P. Barone.
Evidence that cochlear-implanted deaf patients are better multisensory integrators.
Proc Natl Acad Sci U S A, 104 (2007), pp. 7295-7300
[36]
T. Michikawa, Y. Nishiwaki, Y. Kikuchi, et al.
Gender-specific associations of vision and hearing impairments with adverse health outcomes in older Japanese: a population-based cohort study.
BMC Geriatr, 9 (2009), pp. 50
[37]
S.G. Lomber, M.A. Meredith, A. Kral.
Adaptive crossmodal plasticity in deaf auditory cortex: areal and laminar contributions to supranormal vision in the deaf.
Prog Brain Res, 191 (2011), pp. 251-270
[38]
J.P. Rauschecker, L.R. Harris.
Auditory compensation of the effects of visual deprivation in the cat's superior colliculus.
Exp Brain Res, 50 (1983), pp. 69-83
[39]
R. Rettenbach, G. Diller, R. Sireteanu.
Do deaf people see better? Texture segmentation and visual search compensate in adult but not in juvenile subjects.
J Cogn Neurosci, 11 (1999), pp. 560-583
[40]
N.M. Dundon, C. Bertini, E. Ladavas, B.A. Sabel, C. Gall.
Visual rehabilitation: visual scanning, multisensory stimulation and vision restoration trainings.
Front Behav Neurosci, 9 (2015), pp. 192
[41]
F. Frassinetti, N. Bolognini, D. Bottari, A. Bonora, E. Ladavas.
Audiovisual integration in patients with visual deficit.
J Cogn Neurosci, 17 (2005), pp. 1442-1452
[42]
J. Lewald, M. Tegenthoff, S. Peters, M. Hausmann.
Passive auditory stimulation improves vision in hemianopia.
[43]
S. Nishida.
Advancement of motion psychophysics: review 2001–2010.
J Vis, 11 (2011), pp. 11
[44]
D. Talsma, D. Senkowski, S. Soto-Faraco, M.G. Woldorff.
The multifaceted interplay between attention and multisensory integration.
Trends Cogn Sci, 14 (2010), pp. 400-410
[45]
A. Alink, F. Euler, N. Kriegeskorte, W. Singer, A. Kohler.
Auditory motion direction encoding in auditory cortex and high-level visual cortex.
Hum Brain Mapp, 33 (2012), pp. 969-978
[46]
S. Hidaka, W. Teramoto, Y. Sugita, Y. Manaka, S. Sakamoto, Y. Suzuki.
Auditory motion information drives visual motion perception.
[47]
G.F. Meyer, S.M. Wuerger.
Cross-modal integration of auditory and visual motion signals.
Neuroreport, 12 (2001), pp. 2557-2560
[48]
S. Soto-Faraco, C. Spence, A. Kingstone.
Assessing automaticity in the audiovisual integration of motion.
Acta Psychol (Amst), 118 (2005), pp. 71-92
[49]
A. Jain, S.L. Sally, T.V. Papathomas.
Audiovisual short-term influences and aftereffects in motion: examination across three sets of directional pairings.
J Vis, 8 (2008), 7.1–13
[50]
A. Caclin, P. Bouchet, F. Djoulah, E. Pirat, J. Pernier, M.H. Giard.
Auditory enhancement of visual perception at threshold depends on visual abilities.
Brain Res, 1396 (2011), pp. 35-44
[51]
S.L. Prime, L.R. Harris.
Predicting the position of moving audiovisual stimuli.
Exp Brain Res, 203 (2010), pp. 249-260
[52]
M. Radeau, P. Bertelson.
Auditory–visual interaction and the timing of inputs.
Psychol Res, 49 (1987), pp. 17-22
[53]
S. Soto-Faraco, A. Kingstone, C. Spence.
Multisensory contributions to the perception of motion.
Neuropsychologia, 41 (2003), pp. 1847-1862
[54]
J. Vroomen, B. de Gelder.
Sound enhances visual perception: cross-modal effects of auditory organization on vision.
J Exp Psychol Hum Percept Perform, 26 (2000), pp. 1583-1590
[55]
H. Colonius, A. Diederich.
The optimal time window of visual-auditory integration: a reaction time analysis.
Front Integr Neurosci, 4 (2010), pp. 11
[56]
R.N. Denison, J. Driver, C.C. Ruff.
Temporal structure and complexity affect audio–visual correspondence detection.
Front Psychol, (2013), pp. 3
[57]
R.A. Stevenson, M.T. Wallace.
Multisensory temporal integration: task and stimulus dependencies.
Exp Brain Res, 227 (2013), pp. 249-261
[58]
Y. Sugita, Y. Suzuki.
Audiovisual perception: implicit estimation of sound-arrival time.
Nature, 421 (2003), p. 911
[59]
A. Ogawa, E. Macaluso.
Audio–visual interactions for motion perception in depth modulate activity in visual area V3A.
Neuroimage, 71 (2013), pp. 158-167
[60]
S.M. Wuerger, M. Hofbauer, G.F. Meyer.
The integration of auditory and visual motion signals at threshold.
Percept Psychophys, 65 (2003), pp. 1188-1196
[61]
J. Lang.
Klin. Mbl. Augenheilkunde, (1983), pp. 373-375
[62]
S. Ishihara.
Ishihara's Tests for Colour-Blindness.
Kanehara & Co., LTD, (1986),
[63]
J.J. McDonald, W.A. Teder-Sälejärvi, F.D. Russo, S.A. Hillyard.
Neural basis of auditory-induced shifts in visual time-order perception.
Nature Neurosci, 8 (2005), pp. 1197-1202
[64]
W. Feng, V.S. Störmer, A. Martinez, J.J. McDonald, S.A. Hillyard.
Sounds activate visual cortex and improve visual discrimination.
J Neurosci, 34 (2014), pp. 9817-9824
[65]
S. Puschmann, P. Sandmann, A. Bendixen, C.M. Thiel.
Age-related hearing loss increases cross-modal distractibility.
[66]
MATLAB Statistics Toolbox Release. The MathWorks Inc., Natick (2013).

Both authors contributed equally to this work.

Copyright © 2017. Spanish General Council of Optometry
Journal of Optometry