Inaudible components of the human infant cry influence haemodynamic responses in the breast region of mothers

Doi, Hirokazu; Sulpizio, Simone; Esposito, Gianluca; Katou, Masahiro; Nishina, Emi; Iriguchi, Mayuko; Honda, Manabu; Oohashi, Tsutomu; Bornstein, Marc H.; Shinohara, Kazuyuki

doi:10.1007/s12576-019-00729-x

Original Paper
Published: 30 November 2019

The Journal of Physiological Sciences volume 69, pages 1085–1096 (2019)

Abstract

Distress vocalizations are fundamental for survival, and both sonic and ultrasonic components of such vocalizations are preserved phylogenetically among many mammals. On this basis, we hypothesized that ultrasonic inaudible components of the acoustic signal might play a heretofore hidden role in humans as well. By investigating the human distress vocalization (infant cry), here we show that, similar to other species, the human infant cry contains ultrasonic components that modulate haemodynamic responses in mothers, without the mother being consciously aware of those modulations. In two studies, we measured the haemodynamic activity in the breasts of mothers while they were exposed to the ultrasonic components of infant cries. Although mothers were not aware of ultrasounds, the presence of the ultrasounds in combination with the audible components increased oxygenated haemoglobin concentration in the mothers’ breast region. This modulation was observed only when the body surface was exposed to the ultrasonic components. These findings provide the first evidence indicating that the ultrasonic components of the acoustic signal play a role in human mother–infant interaction.

Introduction

The cry qua distress vocalization is fundamental for survival and is preserved phylogenetically among many mammals [1, 7]. The vocalizations emitted by infants are acoustically similar across a wide array of taxonomic families [26]. Moreover, parental behaviour is governed by many phylogenetically preserved principles that are conserved from rodents to humans [35]. Because the cry serves as a signal in mammalian caregiving, determining its acoustic constituents and their functions is at the core of understanding human mother–infant interaction.

In mammals other than humans, such as rodents, cats, and primates [5, 9, 36, 37], high-frequency components in cry sounds (> 20 kHz) are emitted by young offspring to signal distress [47] due to hunger, physical discomfort, isolation, or capture by predators. These vocalizations elicit strong physiological and behavioural responses in caregivers. Considering that humans share similar neural circuits for processing infant cries with other mammalian species [3, 23], it seems plausible to hypothesize that humans also possess the neural machinery to process the ultrasonic cry sounds of infants [29].

To date, the cry sounds of human infants have been thought to contain only audible frequencies, with an average fundamental frequency of 300–600 Hz [17]. Here we ascertained that human infant cries contain ultrasonic components with frequencies (in some cases) exceeding 80 kHz (see Fig. 1) by using a purpose-made apparatus that allowed us to record and reproduce sounds with audible (< 20 kHz) and ultrasonic (> 20 kHz) components. Inspired by this initial observation, we then investigated the functional value of ultrasonic sounds in infant cry sounds.

Breastfeeding is a defining mammalian maternal behaviour [18]. It has been demonstrated that infants in a state of hunger emit cry sounds with particular acoustic characteristics that prompt breastfeeding [26]. Of particular relevance to the present study, Vuorenkoski et al. [43] reported that exposure to the cry sounds of an infant induces an increase in the temperature of the mother’s breast region. Skin temperature rise in the breast region related to breastfeeding has been observed in other studies [20, 42] and is generally attributed to increased blood influx induced by oxytocin secretion [42], partly because there is a close linkage between thermal regulation and blood circulation [15, 39]. Further, exposure to infant cry sounds is reported to induce increases in heart rate [16, 32]. On the basis of these findings, we decided to test whether the ultrasonic components of cry sounds can modulate haemodynamic responses in the breast region.

Experiment 1 was designed to elucidate the nature of the ultrasonic effect of the infant cry by, first, determining whether ultrasonic components of a typical infant cry influence the haemodynamic response in mothers and, second, by determining whether ultrasonic components of the cry alone would be sufficient to induce a haemodynamic response in mothers. We measured haemodynamic responses in the breast region of mothers in response to three types of cry sounds: natural cries, scrambled cries, and ultrasonic only cries. Both natural cries and scrambled cries contained audible and inaudible components, but the frequency structure of the inaudible components was disrupted in the scrambled cries. Because the audible components were left intact in the scrambled cries as well as in natural cries, these two types of cries sounded the same. Ultrasonic only cries contained only the inaudible components of the cry sound.

Haemodynamic activity in the mothers’ breasts was recorded through dual-channel near-infrared spectroscopy (NIRS), with two sensors attached directly to the skin surface of the right and left breasts. Analyses focused on the concentration of oxygenated and deoxygenated haemoglobin (oxyHb/deoxyHb) during the presentation of the cries. OxyHb/deoxyHb measurement is a sensitive indicator of a change in breast blood flow [40]. The comparison between haemodynamic responses to natural and scrambled cries supposedly reveals effects, if any, of the ultrasonic components in infant cry sounds. We included ultrasonic only cries as sound stimuli to ascertain whether the ultrasonic cry sounds alone would induce haemodynamic responses in mothers.

In their seminal study on the effects of ultrasonic sounds on humans, Oohashi et al. [30] claimed that the effects of ultrasonic components of sounds on neural and behavioural responses (“hypersonic effect”) are observed only when the listener’s entire body is exposed to the ultrasonic sounds, indicating a reliance of the “hypersonic effect” on systems other than, or in addition to, the auditory system. Thus, it is possible that, if there are any modulatory influences of ultrasonic components of the infant cry on the haemodynamics of the mother’s breast, they may be mediated by a mechanism similar to that proposed by Oohashi et al. [30].

To investigate this possibility, we conducted a second experiment, in which mothers were exposed to the same set of cry sounds used in experiment 1, but through headphones that conveyed ultrasonic as well as audible components of the sounds. If the perceptual system outside the inner ear plays a pivotal role in the induction of the ultrasonic effects of the infant cry, an effect of ultrasonic cry sounds similar to that observed in experiment 1 should not be observed in experiment 2, because the mothers’ bodily surface is not exposed to the cry sounds.

Experiment 1

Methods

Participants

Seventeen healthy mothers (M age = 32.3 years, SD = 4.5) took part (babies’ M age = 5.3 months; SD = 2.1) after giving written informed consent.

Materials and stimuli

The original cry sounds used for the creation of the experimental stimuli in experiment 1 and experiment 2 were chosen from a database of infant cries. We used spontaneous infant cries recorded from four different infants (aged 4–10 months). All infants were born at term and showed no signs of clinical conditions at birth or at the time of recording. Cries were recorded at least 2 h after the most recent breastfeeding to collect recordings of one bout of hunger cry from each infant. Recordings were performed using a free-field microphone (40BE; G.R.A.S. Sound & Vibration, Vedbaek, Denmark), a microphone preamplifier (26CB; G.R.A.S. Sound & Vibration, Vedbaek, Denmark), and a dual-channel sensor amplifier (SR-2200; Ono Sokki, Tokyo, Japan). The signals were digitized by a signal processor (0202 USB 2.0 Audio Interface; E-MU Systems, Scotts Valley, California, USA), with an A/D sampling frequency of 192 kHz, and stored on a PC. The microphone was situated at a constant distance of 15 cm from the infant’s mouth, and the total duration of the infant’s crying was recorded.

Recorded sounds of cries originally differed in length, with two cries having short recording lengths (1.35 and 2.07 s) and two having longer recording lengths (21.97 and 20.5 s). To create cry segments of equal duration and of a reasonable length to elicit an ultrasonic effect [31], four sound files of cries lasting for 45 s were made by duplicating and concatenating the original cry recordings.
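The duplication-and-concatenation step can be illustrated with a minimal sketch. It assumes a mono recording held in a NumPy array and assumes that the looped signal is trimmed to exactly 45 s; the text does not say how the final repetition was handled. The function name `loop_to_duration` and the synthetic input are hypothetical.

```python
import numpy as np

FS = 192_000      # A/D sampling frequency reported in the text (Hz)
TARGET_S = 45.0   # stimulus duration reported in the text (s)

def loop_to_duration(cry, fs=FS, target_s=TARGET_S):
    """Repeat a 1-D recording end to end and trim it to target_s seconds.

    Trimming the last repetition is an assumption, not a documented step.
    """
    cry = np.asarray(cry, dtype=float)
    n_target = int(round(target_s * fs))
    n_repeats = int(np.ceil(n_target / cry.size))
    return np.tile(cry, n_repeats)[:n_target]

# Synthetic noise standing in for a 1.35-s recording (not real cry data).
rng = np.random.default_rng(0)
short_cry = rng.standard_normal(int(1.35 * FS))
stimulus = loop_to_duration(short_cry)
```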

In experiment 1, four different natural cries (original recordings of cry sounds, containing both audible and intact ultrasonic components, produced by four different babies) were used. Two further versions of each cry were created: one with a scrambled ultrasonic component (scrambled cries) and one containing only the ultrasonic cry components (ultrasonic only cries). To create the scrambled cries, we first isolated the ultrasonic components of each cry by applying a high-pass filter with a 22-kHz cut-off frequency. The waveform above the cut-off frequency was divided into 20-ms segments. Each ultrasonic waveform segment was Fourier-transformed, its phase values in the frequency domain were scrambled, and the result was inverse Fourier-transformed to yield a scrambled waveform segment. Scrambled ultrasonic components were then created by concatenating these scrambled waveform segments in the original order [2]. Finally, after matching the RMS sound pressure of the scrambled ultrasonic components to that of the corresponding natural cry, we spliced the scrambled ultrasonic components onto the audible components of the cry to synthesize the scrambled cries.
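The scrambling procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the high-pass filter is approximated by a brick-wall FFT mask (the filter type is not reported), phases are drawn uniformly at random (the scrambling rule is not reported), RMS is matched between the scrambled and the original ultrasonic band, and the input is synthetic noise. All function names are hypothetical.

```python
import numpy as np

FS = 192_000        # sampling frequency from the text (Hz)
CUTOFF_HZ = 22_000  # high-pass cut-off from the text (Hz)
SEG_S = 0.020       # segment length from the text (s)

def split_bands(x, fs=FS, cutoff=CUTOFF_HZ):
    """Return (audible, ultrasonic) parts using a brick-wall FFT mask (assumed)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    ultrasonic = np.fft.irfft(np.where(freqs >= cutoff, spec, 0), n=x.size)
    return x - ultrasonic, ultrasonic

def phase_scramble_segment(seg, rng):
    """Keep the magnitude spectrum of one segment, randomize its phases."""
    spec = np.fft.rfft(seg)
    scrambled = np.abs(spec) * np.exp(1j * rng.uniform(-np.pi, np.pi, spec.size))
    scrambled[0] = spec[0]            # DC bin must stay real
    if seg.size % 2 == 0:
        scrambled[-1] = spec[-1]      # Nyquist bin must stay real
    return np.fft.irfft(scrambled, n=seg.size)

def scramble_ultrasonic(ultra, fs=FS, seg_s=SEG_S, seed=0):
    """Scramble each 20-ms segment and concatenate in the original order."""
    rng = np.random.default_rng(seed)
    n_seg = int(round(seg_s * fs))
    out = np.empty_like(ultra)
    for start in range(0, ultra.size, n_seg):
        seg = ultra[start:start + n_seg]
        out[start:start + seg.size] = phase_scramble_segment(seg, rng)
    return out

def rms(x):
    return np.sqrt(np.mean(np.square(x)))

# Synthetic 0.2-s noise standing in for a cry recording (not real data).
cry = np.random.default_rng(1).standard_normal(FS // 5)
audible, ultra = split_bands(cry)
scrambled_ultra = scramble_ultrasonic(ultra)
scrambled_ultra *= rms(ultra) / rms(scrambled_ultra)   # RMS matching
scrambled_cry = audible + scrambled_ultra
```

Because only the phases change, each scrambled segment keeps the magnitude spectrum of its source segment while its fine temporal structure is lost.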

Ultrasonic only cries were created using high-pass filtering of each of the natural cries with a cut-off frequency of 22 kHz. In contrast to the natural cries and scrambled cries, the ultrasonic only cries did not contain audible components and were inaudible to participants. Spectrograms of example sounds in each condition are shown in Fig. 2. The averaged sound pressure levels of each type of sound against background noise were 56.9 ± 4.47 dB for natural cry, 57.0 ± 4.43 dB for scrambled cry, and 30.3 ± 2.24 dB for ultrasonic only cry.

Apparatus and procedures

Each participant engaged in fNIRS measurement and a detection task that aimed to verify the validity of experimental manipulation. The detection task was conducted after the completion of the fNIRS measurement.

fNIRS measurement

Stimuli were presented through a 192-kHz high-resolution audio system, which allowed us to control stimulus presentation and play the ultrasonic components and audible components of cries through a speaker and a super tweeter. Specifically, we used a system designed with a 2-way monitor speaker (RL906; musikelectronic geithain gmbh, Germany) for the presentation of audible range components and a custom-made super tweeter (Trb-001-ngs; Katou Acoustics Consultant Office, Japan) with frequency response 20–96 kHz for the presentation of inaudible high-frequency range components. The two speakers were positioned in front of the participant at a distance of approximately 50 cm, as shown in Fig. 3. The audible and ultrasonic frequency ranges of each cry were presented simultaneously, through the speaker and the super tweeter, respectively.

For fNIRS measurement, we measured the oxyHb and deoxyHb in participants’ breast region using a dual-channel NIRS (NIRO-220, Shimadzu. Co.) during the presentation of the three types of cries. fNIRS emitters and probes were attached to the upper inner quadrant of both breasts [40], as shown in Fig. 3. To attach the emitters and probes, a rubber probe holder (approximately 60 × 30 mm) was affixed to the breast. The modified Lambert–Beer law was used to calculate the oxyHb and deoxyHb. The sampling rate was 1 Hz.

Participants sat in front of a 19-inch computer screen and speakers and passively listened to the cries. The temporal sequence of stimulus presentation was as follows. A white fixation cross subtending approximately 1.8° in height and 1.8° in width was displayed against a black background at the centre of the screen for 15 s to serve as the baseline. The cry stimulus was then presented for 45 s. At the onset of the cry stimulus, the colour of the fixation cross changed from white to red. The colour change of the fixation cross was incorporated into the experimental design so that participants noticed the start of stimulus presentation even when only inaudible sounds were being played in the ultrasonic only condition. At the end of the cry stimulus, the fixation colour changed back to white, and there was a 20-s post-stimulation period during which a white fixation cross was presented at the centre of the screen. Trials were separated by 5-s inter-trial intervals during which the screen was blank (only the black background was presented). Before starting the experiment, participants received verbal instructions from the experimenter and were asked to minimize their bodily movements. Three types of experimental blocks were created: one for the presentation of the natural cries, one for the presentation of scrambled cries, and one for the presentation of ultrasonic only cries. Each type of experimental block was presented twice, resulting in a total of six blocks. The order of the presentation of the four sound files of each cry type was randomized within each block, and the block order was pseudo-randomly determined across participants. The entire session lasted for approximately 45 min.
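The trial structure described above can be laid out as a small schedule builder. This is a sketch only: a plain shuffle stands in for the pseudo-random block order, whose actual constraints are not reported, and the names `build_schedule`, `PHASES_S` and the condition labels are hypothetical.

```python
import random

# Durations of the trial phases reported in the text (s).
PHASES_S = {"baseline": 15, "stimulus": 45, "post": 20, "inter_trial": 5}
CRY_TYPES = ["natural", "scrambled", "ultrasonic_only"]
N_FILES = 4          # sound files per cry type
BLOCK_REPEATS = 2    # each block type is presented twice

def build_schedule(seed=0):
    """Return a list of (cry_type, file_index) trials for one participant."""
    rng = random.Random(seed)
    blocks = CRY_TYPES * BLOCK_REPEATS
    rng.shuffle(blocks)                  # assumed stand-in for the block order
    schedule = []
    for cry_type in blocks:
        files = list(range(N_FILES))
        rng.shuffle(files)               # file order randomized within a block
        schedule.extend((cry_type, f) for f in files)
    return schedule

schedule = build_schedule()
trial_s = sum(PHASES_S.values())
total_min = len(schedule) * trial_s / 60
```

With these figures there are 24 trials of 85 s each, that is 34 min of trial time; the remainder of the reported session length is not itemized in the text.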

Detection task

At the start of each trial, a white fixation cross subtending approximately 1.8° in height and 1.8° in width appeared on the screen. One second after the appearance of the fixation cross, a short (3 s) excerpt of a cry sound was presented. The participant’s task was to press the “l” key with her right index finger as soon as she heard a sound. When the participant pressed the key, the sound presentation was terminated and the experiment proceeded to the next trial. If the participant did not press the key, the sound file was played for 3 s and the experiment automatically proceeded to the next trial. The white fixation cross remained on the screen while the sound was played, and there was no inter-trial interval. Thus, the fixation cross was presented throughout the task. The short excerpts of the four sound files that were used in each condition (natural cries, scrambled cries, ultrasonic only cries) of the fNIRS measurement were each presented twice in a pseudo-random order.

Data analysis

In the analysis, oxyHb waveforms were smoothed with a five-point moving average procedure and linearly detrended, and the oxyHb value at each time point was transformed into standardized oxyHb. The standardized oxyHb was computed as follows: First, the mean of the oxyHb values during the 15-s baseline period was subtracted from the oxyHb. Then, the resulting value was divided by the standard deviation of the oxyHb values obtained during the baseline period. Thereafter, the waveforms of the standardized oxyHb in all the trials of the same condition were averaged to generate the waveforms of standardized oxyHb for each participant in each condition. Standardized deoxyHb waveforms were computed for each participant in the same manner. Due to the high peak sound pressure in the original recordings, there were segments with signal overflow in some of the sound files, which introduces the possibility of clipping in some segments of stimulus sounds. However, we used data of all the eligible trials in the final analysis to increase the signal-to-noise ratio.
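The standardization steps can be sketched for a single channel as follows. Several details are assumptions because the text does not specify them: the moving average is centred with shortened windows at the edges, the linear trend is fitted over the whole 80-s trial, and the baseline standard deviation uses the sample estimator (`ddof=1`). The data are synthetic and the function names are hypothetical.

```python
import numpy as np

FS_NIRS = 1       # NIRS sampling rate from the text (Hz)
BASELINE_S = 15   # baseline duration from the text (s)

def moving_average(x, width=5):
    """Centred moving average; edge samples average over fewer points (assumed)."""
    kernel = np.ones(width)
    total = np.convolve(x, kernel, mode="same")
    count = np.convolve(np.ones(x.size), kernel, mode="same")
    return total / count

def detrend_linear(x):
    """Remove a least-squares straight line fitted over the whole trial (assumed)."""
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def standardize_trial(oxy, n_base=BASELINE_S * FS_NIRS):
    """Smooth, detrend, then z-score against the baseline period."""
    x = detrend_linear(moving_average(np.asarray(oxy, dtype=float)))
    base = x[:n_base]
    return (x - base.mean()) / base.std(ddof=1)

# Four synthetic 80-s trials (15 s baseline + 45 s stimulus + 20 s post).
rng = np.random.default_rng(0)
trials = [rng.standard_normal(80).cumsum() for _ in range(4)]
z_trials = np.stack([standardize_trial(trial) for trial in trials])
condition_mean = z_trials.mean(axis=0)   # per-condition average waveform
```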

In the first set of statistical analyses, the average of the standardized oxyHb/deoxyHb during the whole 45-s stimulation period was used as the dependent variable. OxyHb/deoxyHb were then analysed by a two-way analysis of variance (ANOVA) with the type of cry (natural cries vs. scrambled cries vs. ultrasonic only cries) and the channel side (left vs. right) as within-participant factors.

The measured waveforms of oxyHb/deoxyHb in each condition showed clear temporal fluctuation. Thus, in the second set of analyses, we examined the temporal course of the influences of cry type on haemodynamic response. To achieve this, the baseline period, stimulation period and post-stimulation period were segmented into 5-s time-windows. Then, oxyHb/deoxyHb in each condition was averaged within each time-window. This resulted in a total of 3 cry types × 2 channel sides × 16 time-windows (3 time-windows during the 15-s baseline, 9 time-windows during the 45-s cry stimulus presentation and 4 time-windows during the 20-s post-stimulation period) = 96 values for oxyHb and deoxyHb each. We decided to include the post-stimulation period in this analysis because several fNIRS studies have reported lasting influence of sensory stimulation on cortical haemodynamic responses after the end of stimulus presentation (see [8, 24] for a review). OxyHb/deoxyHb were then analysed by a three-way ANOVA with the channel side (2), time-window (16), and the type of cry (3) as within-participant factors.
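The time-window segmentation amounts to a reshape-and-average over the 1-Hz waveform. The sketch below assumes an 80-sample trial (baseline, stimulation and post-stimulation periods only) and uses a simple ramp as placeholder data; the function name and the phase labels are hypothetical.

```python
import numpy as np

WINDOW_S = 5   # time-window length from the text (s)
FS_NIRS = 1    # NIRS sampling rate from the text (Hz)

def window_means(z, fs=FS_NIRS, window_s=WINDOW_S):
    """Average a standardized waveform within consecutive 5-s windows."""
    n = window_s * fs
    n_win = z.size // n
    return z[:n_win * n].reshape(n_win, n).mean(axis=1)

# Placeholder 80-s waveform: 15 s baseline + 45 s stimulus + 20 s post.
z = np.arange(80.0)
w = window_means(z)

# Phase label of each window: 3 baseline, 9 stimulus, 4 post-stimulation.
labels = ["baseline"] * 3 + ["stimulus"] * 9 + ["post"] * 4
n_cells = 3 * 2 * len(w)   # cry types x channel sides x time-windows
```

Under this layout the seventh to eleventh windows cover 15 to 40 s after stimulus onset.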

Results

The temporal course of oxyHb in each condition is shown in Fig. 4a. A 2 × 3 ANOVA with oxyHb as the dependent variable showed a main effect of the type of cry [F (2, 32) = 6.47, p = 0.004, η²_p = 0.29]. The ANOVA table is presented in Table 1.

Table 1 Table of ANOVA results on oxyHb in experiment 1

Multiple comparisons by Holm’s sequentially rejective Bonferroni’s method revealed a higher level of oxyHb on presentation of natural cries than on presentation of scrambled cries [t (16) = 3.06, adjusted p = 0.022] and ultrasonic only cries [t (16) = 2.70, adjusted p = 0.031]; responses to the scrambled cries and ultrasonic only cries did not differ from each other [t (16) = 0.27, adjusted p = 0.78]. No effect of channel side was observed, and no interaction between the channel side and the type of cry emerged (Fs < 2, ps > 0.20).
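Holm's sequentially rejective procedure multiplies the smallest raw p value by the number of comparisons, the next smallest by one fewer, and so on, while keeping the adjusted values monotone. A minimal sketch follows; the raw p values in the example are purely illustrative and are not taken from the study.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Illustrative raw p values for three pairwise comparisons (hypothetical).
example_raw = [0.01, 0.04, 0.03]
example_adjusted = holm_adjust(example_raw)
```

With three comparisons the largest raw p value is left unchanged unless monotonicity raises it, which is why the least significant comparison often reports an adjusted p equal to or close to its raw value.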

A 2 × 16 × 3 ANOVA with oxyHb as the dependent variable revealed a significant main effect of the type of cry [F (2, 32) = 6.39, p = 0.0046, η²_p = 0.29]. This main effect was qualified by a significant two-way interaction between time-window and the type of cry [F (30, 480) = 2.22, p = 0.0003, η²_p = 0.12]. No other effect reached significance (Fs < 1.3, ps > 0.14).
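The reported effect sizes can be cross-checked against the F statistics, because partial eta squared equals F × df_effect / (F × df_effect + df_error). The short sketch below applies this identity to the values reported above; the function name is hypothetical.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)

# Values reported in the text for oxyHb in experiment 1.
eta_cry_2x3 = partial_eta_squared(6.47, 2, 32)       # main effect, 2 x 3 ANOVA
eta_cry_3way = partial_eta_squared(6.39, 2, 32)      # main effect, 2 x 16 x 3 ANOVA
eta_interaction = partial_eta_squared(2.22, 30, 480) # time-window x cry type
```

Rounded to two decimals these give 0.29, 0.29 and 0.12, which agrees with the reported η²_p values.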

Simple main effect analysis revealed a significant simple main effect of the type of cry in the seventh to eleventh time-windows, which roughly correspond to the latter half of the stimulus presentation period, as summarized in Table 2. Pairwise comparisons by Holm’s sequentially rejective Bonferroni’s method were carried out in each time-window. The results of pairwise comparisons are summarized in Table 3. As can be seen, oxyHb in response to natural cry sounds was higher than in response to both scrambled and ultrasonic only cries in the eighth time-window, around the apex of oxyHb fluctuation, but the difference between conditions was less clear in the other time-windows.

Table 2 ANOVA table of simple main effect of the type of cry on oxyHb in each time-window in experiment 1

Table 3 Results of pairwise comparisons in time-windows in which simple main effect of the type of cry reached significance

The temporal course of deoxyHb in each condition is shown in Fig. 4b. A 3 × 2 ANOVA with deoxyHb as the dependent variable revealed no significant effects (Fs < 2.4, ps > 0.10). The ANOVA results are summarized in Table 4.

Table 4 Table of ANOVA results on deoxyHb in experiment 1

A 2 × 16 × 3 ANOVA with deoxyHb as the dependent variable revealed a marginally significant main effect of the type of cry [F (2, 32) = 2.70, p = 0.082, η²_p = 0.14]; deoxyHb tended to decrease most prominently in the natural cry condition. This main effect was qualified by a significant two-way interaction between time-window and the type of cry [F (30, 480) = 2.03, p = 0.012, η²_p = 0.11]. No other effect reached or approached significance (Fs < 1.5, ps > 0.25). Simple main effect analysis revealed a significant simple main effect of the type of cry in the fourteenth time-window during the post-stimulation period [F (2, 32) = 5.14, p = 0.012, η²_p = 0.24]. Pairwise comparisons revealed significantly higher deoxyHb in response to the scrambled cry than to the natural cry [t (16) = 2.98, adjusted p = 0.03]. No other pairwise comparisons reached significance after adjustment (ts < 1.75, adjusted ps > 0.20). The simple main effect of the type of cry failed to reach significance in the other time-windows (Fs < 2.8, ps > 0.10).

In the detection task, participants pressed the key every time they were exposed to sound excerpts of natural cries or scrambled cries (100%). Participants almost never pressed the key on the presentation of the ultrasonic only cries (< 1.5%).