This application incorporates by reference the entire contents of each of U.S. application Ser. No. 15/057,867 filed on Mar. 1, 2016; U.S. application Ser. No. 15/057,890 filed on Mar. 1, 2016; and U.S. application Ser. No. 15/057,842 filed on Mar. 1, 2016.
Certain example embodiments of this invention relate to speech privacy systems and/or associated methods. More particularly, certain example embodiments of this invention relate to speech privacy systems and/or associated methods that disrupt the intelligibility of speech by, for example, superimposing onto a speech signal a replica of the original speech signal in which portions of it are delayed and/or adjusted in phase and/or adjusted in amplitude, with the time delays and/or amplitude adjustments oscillating over time.
Protecting speech privacy has become an increasingly important task in modern workplaces. Those who speak would like the content of their speech to be confined to their offices or conference rooms. Unintended listeners, on the other hand, would like not to be disturbed by the unnecessary oral information. Irritating speech from others is also problematic in settings other than offices including, for example, homes, libraries, banks, and/or the like, e.g., where people are often unaware that their speech is disturbing to others.
In fact, there are a number of potential adverse effects elicited by enduring annoying sounds. These adverse effects can range from productivity losses for organizations (e.g., for failure to maintain, and/or interruptions in, concentration) to medical issues for people (e.g., the onset of headaches caused by annoying sounds, irritability, increased heart rate, and/or the like) and even to the urge to seek a new work environment. Misophonia, a learned condition relating to the association of sound with something unpleasant, also happens from time-to-time. Some people suffer from acoustic hyper-vigilance or oversensitivity to certain sounds and intruding speech.
In many settings, sound annoyance oftentimes is related to loudness, abruptness, high pitch and, in the case of speech sounds, the speech content. In many cases, there are certain components in speech or noise that make them particularly disruptive or irritating. With respect to speech content, humans tend, regardless of the volume, to strain to hear what is said, which has been found to subconsciously add to the annoyance. That is, once one is aware of somebody speaking, one oftentimes becomes involuntarily involved, adding a sort of subconscious annoyance.
People oftentimes are irritated by high frequencies (e.g., sounds in the 2,000-4,000 Hz range). These sounds do not need to be of high intensity to be perceived to be loud. In this regard,
Sound waves, including speech, propagate primarily in a longitudinal way, by alternating compressions and rarefactions of air. When the waves hit a wall, the distortion of molecules creates pressure on the outside of the wall that, in turn, emanates secondary sound.
It will be appreciated that it would be desirable to design a wall with noise-cancellation, including speech-disrupting properties, for at least some settings. Some construction materials, including glass, are poor sound insulators. At the same time, use of glass is often advantageous, as it provides an excellent visual connectivity between offices and can contribute to the engagement of employees. Thus, it will be appreciated that it would be desirable to design an optically transparent wall with noise-cancellation properties, including speech disrupting properties, for at least some of these settings.
Sound-insulating windows have been known in the art. One mainstream approach involves increasing the Sound Transmission Class (STC) of the wall. STC is an integer rating of how well a wall attenuates sound. It is weighted over 16 frequencies across the range of human hearing. STC can be increased, for example, by using certain spacing in connection with double-pane glass walls in order to destructively resonate sound; increasing the STC of single- or double-pane walls by increasing thickness of the glass, and/or using laminated glass.
Unfortunately, however, these techniques come at a cost. For example, increasing the thickness of single-pane glass allows only modest sound abatement, while adding to the cost. The use of double-pane glass, albeit more effective, typically requires the use of at least two comparatively thick (e.g., 6-12.5 mm) glass sheets. These approaches also typically require high tolerances in the wall construction, and the use of special pliant mechanical connections in order to avoid flanking effects. Glass of such thickness is heavy and expensive, and results in a high installation cost.
Furthermore, double-pane walls typically work well primarily for low-frequency sounds. This can limit their effectiveness to a smaller number of applications such as, for example, to exterior walls to counteract the low-frequency noise of jet and car engines, noise of seaports, railways, etc. At the same time, most speech sounds responsible for both annoyance and speech recognition lie within the 1800+Hz range. It therefore would be desirable to achieve noise cancellation in this higher-frequency range, e.g., in order to help block irritating components and increase speech privacy.
Instead of abating higher-frequency noise, some acoustical solutions focus on sound masking. For instance, sounds of various frequencies may be electronically overlapped through a speaker, so that the extra sound is provided “on top of” the original noise. Sound masking can include Nature sounds ranging from waterfall and rain sounds to fire crackling and thunderstorm sounds. Various types of artificially-generated masking noises such as, for example, white, pink, brown, and other noises, also are used in this regard. A main purpose of these sound-masking techniques involves reducing annoyance of the surrounding noises, and such approaches can indeed obscure the irritation. Unfortunately, however, sound masking also creates additional noise, which some people perceive as irritating in itself. One problem of the above-mentioned sound masking techniques is that their frequencies lie outside the frequency range at which syllables, the building blocks of speech, occur. See, for example,
Still another example approach for achieving noise cancellation is used in Bose headphones. This approach involves registering incoming noise and creating a counteracting noise that is out of phase with the registered incoming noise. Although it is relatively easy for one to isolate oneself from the environment by wearing headphones, doing so does not prevent the person wearing the headphones from making noises that others find disturbing. That is, even though the person wearing the headphones might have created an isolating environment on an individual level, there is still an issue in creating an isolation area for a group such that others in the group cannot hear what is being said. Additionally, one difficulty of this concept for walls is that it typically only works well on a small area and is suitable primarily for continuous low-frequency sounds (such as, for example, the hum of engines). One reason for this is that only a narrow band of frequencies can be effectively tuned out of phase, and the higher the frequencies, the smaller the aural space of the effective noise cancellation would be.
Thus, it will be appreciated that it would be desirable to provide for techniques that overcome some or all of the above-described and/or other speech-masking problems. For example, it will be appreciated that it would be desirable to provide acoustic techniques that help reduce or otherwise compensate for sounds, including speech, that cause irritation and annoyance to people.
The inventor has recognized that it would be desirable to block the content of the speech from being understood by people around the person speaking in environments such as, for example, open or enclosed office spaces and/or other environments, adjacent offices separated by thin walls with low STC, vehicles (including, for example, commercial and private vehicles such as cars, trucks, trains, airplanes, etc.), bank teller spaces, hospitals, police stations, conference rooms, etc. Indeed, there seemingly is an ever-increasing demand for acoustic privacy, broadly speaking, in modern office spaces.
Current techniques, including the sound-masking and sound-cancelling techniques discussed above, do not target the content of the speech, and are not specifically speech intelligibility disrupting technologies. In fact, noise masking techniques known in the art are, in a fundamental way, not intended to effectively disrupt speech without causing a great deal of additional annoyance. In this regard, the inventor has realized that although the fundamental frequencies of human speech do lie in the same frequency spectrum as some of the available masking noises and/or ranges that can be at least partially cancelled, information-containing blocks have been found to appear at essentially different frequencies. Information-containing blocks in this context are formants, which represent the energy bursts of sound.
It thus has been recognized that it would be desirable to develop an acoustic-masking technique directed to disrupting the informational content of the speech without causing additional annoyance. It will be appreciated that masking techniques generally add a certain amount of loudness on top of the original speech. The techniques of certain example embodiments add only a small amount of additional loudness, e.g., because they specifically target essential cues of speech, such as formants.
In certain example embodiments, a method for disrupting speech intelligibility is provided, the method comprising: receiving, via a microphone, an original speech signal corresponding to original speech; generating an intelligibility-disrupting masking signal comprising smeared cues of the original speech in the original speech signal; and reducing the level of intelligibility of the original speech signal by outputting, through a speaker, the intelligibility-disrupting masking signal comprising the smeared speech cues.
Devices and systems incorporating such functionality also are contemplated herein, as are walls incorporating such devices and systems.
The features, aspects, advantages, and example embodiments described herein may be combined to realize yet further embodiments.
These and other features and advantages may be better and more completely understood by reference to the following detailed description of exemplary illustrative embodiments in conjunction with the drawings, of which:
Certain example embodiments relate to an acoustic wall assembly that uses active (by electronic means) sound reverberation to achieve speech intelligibility disruption functionality, and/or a method of making and/or using the same. Reverberation, added in an active manner, helps to mask irritating sounds that originate from inside or outside of a room equipped with such a wall assembly. This approach includes, for example, helping to make otherwise potentially disturbing speech be perceived as unintelligible (and thus, less annoying), in certain example embodiments.
Certain example embodiments add noise-masking and speech-disruptive properties to walls with a low STC, advantageously allowing for low-cost, low-weight solutions with speech-privacy qualities. Certain example embodiments may be used in high-STC walls, e.g., as a measure to further improve speech privacy and/or noise masking.
Reverberation sometimes is advantageous when compared to common sound-abating and masking techniques. For example, reverberation in some instances adds only the loudness necessary to disrupt speech or noise. No or only minimal unnecessary additional noise is created in some embodiments. Reverberation also advantageously is not restricted to specific wall assembly dimensions and/or geometries, can work equally well at low and high frequencies, and is “forgiving” with respect to the presence of flanking losses (which otherwise sometimes undermine sound isolation as a result of sound vibrations passing through a structure along an incident path such as, for example, through framing connections, electrical outlets, recessed lights, plumbing pipes, ductwork, and other acoustical gaps). Reverberation also advantageously is resistant to surveillance. Speech masked by white noise sometimes can be easy to decipher (e.g., by removing the additional randomly generated noise from the signal), and reverberation is difficult to decode because there basically is no reference signal (e.g., it is basically self-referenced). Furthermore, reverberation in at least some instances is activated by the original speech signal, and its volume is automatically adjusted to follow the volume of the original signal. An additional benefit of using reverberation relates to its ability to disrupt so-called “beating,” which is a potentially irritating infra-sound constructed by two different sound frequencies. Although infra-sound may not always be heard, per se, it can have an adverse subconscious effect. Still further, reverberation may be advantageous from a cost perspective, because it merely disrupts the informational part of speech, rather than trying to completely cover it at an expense of loudness. Indeed, reverberation oftentimes will require less energy than the addition of white noise, for example.
When it comes to speech in particular, certain example embodiments are effective in: disrupting the rhythm of speech, including fundamental frequencies and their harmonics; masking key acoustic cues of overlapping syllables and vowels; eliminating artificially created infra-sound with sub-threshold frequencies that resonate adversely with the brain waves; etc. Certain example embodiments use reverberation in the range of 4-6 Hz, which corresponds to the number of syllables pronounced per second in normal English speech.
Reverberation time, T60, is one measure associated with reverberation. It represents the time required for sound to decay 60 decibels from its initial level. Rooms with different purposes benefit from different reverberation times.
T60 can be calculated based on the Sabine formula, which in metric units is commonly written as T60 = 0.161 V/Se, with T60 in seconds:
In this formula, V is the volume and Se is a combined effective surface area of the room. The Se of each wall is calculated by multiplying the physical area by the absorption coefficient, which is a textbook value that varies for different materials. The following table provides the sound absorption coefficients of some common interior building materials.
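The Sabine calculation described above can be sketched as follows. This is a minimal illustration that assumes the common metric form of the formula (coefficient 0.161 s/m, V in cubic meters, Se in square meters). The room dimensions and absorption coefficients are hypothetical values chosen only for the example, and the function names are not taken from this description.

```python
SABINE_CONSTANT = 0.161  # s/m, assumed metric form of the Sabine formula


def effective_area(surfaces):
    """Sum of (physical area in m^2) x (absorption coefficient) per surface."""
    return sum(area * alpha for area, alpha in surfaces)


def sabine_t60(volume_m3, surfaces):
    """Estimate reverberation time T60 in seconds from volume and surfaces."""
    se = effective_area(surfaces)
    if se <= 0:
        raise ValueError("combined effective surface area must be positive")
    return SABINE_CONSTANT * volume_m3 / se


# Hypothetical 5 m x 4 m x 3 m room; coefficients are illustrative only.
room_surfaces = [
    (20.0, 0.30),  # floor
    (20.0, 0.60),  # ceiling
    (54.0, 0.05),  # four walls, 2 * (5 + 4) * 3 = 54 m^2
]
room_volume = 5.0 * 4.0 * 3.0
t60 = sabine_t60(room_volume, room_surfaces)
print(f"Se = {effective_area(room_surfaces):.1f} m^2, T60 = {t60:.2f} s")
```

In this sketch, raising the absorption coefficient of any surface increases Se and shortens T60, which is the relationship the formula expresses.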
An example of the effect that reverberation can have is presented in
As indicated above, certain example embodiments may use active approaches for triggering reverberation to serve in noise-masking and speech intelligibility disrupting roles. As will become clearer from the description below, active approaches may involve electronic, electromechanical, and/or selectively-controllable mechanical apparatus, to disrupt sound waves incident on and/or proximate to a wall assembly or the like. Passive approaches may complement such techniques, in certain example embodiments. In this regard, passive approaches may involve (for example) wall assemblies specifically engineered to trigger reverberation, e.g., through the incorporation of holes in the wall assemblies and/or the attachment or other formation of sound reverberating components therein and/or thereon, using natural properties of the thus-formed wall itself, etc.
Referring once again to
The sound masking circuit 608 determines whether the signal that is provided to it from the microphone 606 is within one or more predetermined frequency ranges, and/or contains noise with the one or more predetermined frequency ranges therein. A bandpass or other filter that is a part of the sound masking circuit 608 may be used in this regard. One of the one or more predetermined frequency ranges may correspond to speech and/or noise determined to be psychoacoustically disruptive, disturbing, or annoying. One of the one or more predetermined frequency ranges may correspond to the 2800-3200 Hz range, which helps to mask the sounds of most consonants (which may be the most statistically effective manner of masking sounds) and the information-carrying sounds of at least some syllables. One of the one or more predetermined frequency ranges may correspond to the frequency range of formants, as opposed to the fundamental frequency of speech, e.g., as discussed in detail below.
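One way to picture the frequency-range check performed by the sound masking circuit is the following sketch. It is an assumption-laden illustration rather than the circuit's actual design: it substitutes a Goertzel-based band-energy measurement for the bandpass filter, and the function names, the 16 kHz sample rate, the 20 ms frame, and the 0.5 threshold are all hypothetical. Only the 2800-3200 Hz range comes from the description above.

```python
import math


def goertzel_bin_power(samples, k):
    """Squared DFT magnitude |X[k]|^2 computed with the Goertzel recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def band_energy_fraction(samples, sample_rate, low_hz, high_hz):
    """Fraction of the frame's energy that lies between low_hz and high_hz."""
    n = len(samples)
    total = sum(x * x for x in samples)
    if n == 0 or total == 0.0:
        return 0.0
    k_lo = math.ceil(low_hz * n / sample_rate)
    k_hi = math.floor(high_hz * n / sample_rate)
    band = sum(goertzel_bin_power(samples, k) for k in range(k_lo, k_hi + 1))
    # Parseval: all bins sum to n * total. The factor 2 counts the mirrored
    # negative-frequency bins (valid when the band excludes DC and Nyquist).
    return 2.0 * band / (n * total)


def should_mask(samples, sample_rate, low_hz=2800.0, high_hz=3200.0,
                threshold=0.5):
    """True when enough of the frame's energy falls in the target range."""
    fraction = band_energy_fraction(samples, sample_rate, low_hz, high_hz)
    return fraction >= threshold


def tone(freq_hz, sample_rate=16000, n=320):
    """Hypothetical test frame: a pure sine tone of n samples."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


print(should_mask(tone(3000.0), 16000))  # tone inside the 2800-3200 Hz range
print(should_mask(tone(500.0), 16000))   # tone outside the range
```

A practical circuit would more likely use a running filter than a per-frame measurement; the frame-based form is used here only because it is easy to verify.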
Responsive to the detection of sound waves in the one or more predetermined frequency ranges, the sound masking circuit 608 creates a masking signal and actuates the speaker 610, e.g., to generate sound waves to smear, via a reverberative and/or other effect, noise in a predetermined frequency range that otherwise would pass through the wall. This includes, for example, disrupting the informational part of the perceived speech, thus reducing its intelligibility. Doing so, in turn, helps to selectively mask the detected sound waves as they pass from outside the outside major surface 600a of the wall 600 to inside the inside major surface 600b of the wall 600, thereby helping to reduce annoyance caused to the listener(s) 604. That is, the reverberation 612 in certain example embodiments helps disrupt perceived speech and/or irritating noises. In certain example embodiments, the noise in essence is concealed in a non-constant, potentially “on demand” or dynamic manner. Advantageously, this effect helps guard against surveillance, as laser microphones (for example) cannot pick up discrete sounds, reverberation is self-referencing and thus harder to decipher, there is no added white noise that can easily be subtracted, etc.
Although the microphone 606 and speaker 610 are shown on opposite sides of the wall 600 in
In addition to or in place of reverberation, certain example embodiments may implement active masking by means of reverse masking. The noise masking enabled by the sound masking circuit 608 may be performed in accordance with an algorithm (e.g., a reverberation algorithm) that uses a technique such as, for example, standard convolution, enhanced convolution, reverse reverberation, delay-controlled reverberation, and/or the like. The sound masking circuit 608 may process incoming noise 602 and control the speaker 610 in accordance with output from the algorithm, in certain example embodiments. In certain example embodiments, the algorithm may change the perceived loudness of incident noise in the time domain. Further details concerning an example algorithm that may be used in connection with certain example embodiments are provided below.
The wall 600 may be formed from any suitable material such as, for example, one or more sheets of drywall, glass, polycarbonate, plaster, and/or the like. In certain example embodiments, the wall or material(s) comprising the wall has/have acoustic absorption coefficients ranging from: 0.03-0.3 at 125 Hz, 0.03-0.6 at 250 Hz, 0.03-0.6 at 500 Hz, 0.03-0.9 at 1000 Hz, 0.02-0.9 at 2000 Hz, and 0.02-0.8 at 4000 Hz. In this regard,
With respect to a cross-sectional view, the outer and inner major surfaces 600a and 600b may be separate drywall surfaces separated, for example, by metal and/or wooden studs, or the like. The speaker 610 and/or sound masking circuit 608 may be provided above the wall 600 (e.g., in the ceiling and below, for example, an upper slab), to the side of the wall 600, or within the gap between the outer and inner major surfaces 600a and 600b. Similar to the above, the sound masking circuit 608 may be connected to a side of the wall 600 but concealed from view (e.g., by being hidden in the ceiling, behind molding, within the gap between the outer and inner major surfaces 600a and 600b, etc.). The same may be true for the microphone 606. The speaker 610 may generate reverberation 612 proximate to the top and/or sides of the wall 600, within the sides of the wall 600, etc., thereby triggering reverberation therein, thereof, or proximate thereto. Thus, in certain example embodiments, the wall 600 may be said to comprise first and second substantially parallel spaced apart substrates (of or including glass and/or the like), with the speaker 610 and the sound masking circuit 608 being located therebetween and/or thereon.
As alluded to above, the wall may be of or include glass. That is, certain example embodiments may be directed to a glass wall used in connection with an acoustic wall assembly. The glass wall may comprise one, two, three, or another number of sheets of glass. The glass may be regular float, heat-strengthened, tempered, and/or laminated glass. In certain example embodiments, the wall may be of or include an insulated glass (IG) unit, a vacuum insulated glass (VIG) unit, and/or the like. An IG unit may include first and second substantially parallel spaced apart substrates, with an edge seal formed around peripheral edges, and with the cavity between the substrates optionally being filled with an inert gas (e.g., Ar, Xe, and/or the like) with or without air. A VIG unit may include first and second substantially parallel spaced apart substrates, with an edge seal formed around peripheral edges, and spacers, with the cavity between the substrates being evacuated to a pressure less than atmospheric. Framing may be provided around the IG unit and/or the VIG unit in some instances, and that framing may be a part of the acoustic wall assembly. In certain example embodiments, other transparent materials may be used. In certain example embodiments, the naturally high sound-reflection coefficient of glass may be advantageous, e.g., when triggering reverberation and/or other noise masking effects.
In certain embodiments, one or more speakers may be located outside the wall 700. For instance, speakers may be located on one, two, or more sides of the wall 700, e.g., in or proximate to areas where some or all of listener(s) 704a-704d may be located, e.g., to mask the noise, disrupt the intelligibility of speech, etc. In such cases, reverberative effects 712a-712b and/or the like may be generated outside the wall 700. In addition, or in the alternative, one or more speakers may be located in the room to disrupt the sound therein, e.g., if potentially disruptive sound is generated in the room, outside the room, or both inside and outside the room.
It is believed that a wall's lateral dimensions may mostly affect the fundamental spectral regions of speech and their lower harmonics, while the distance between the two sheets of a wall primarily will affect high-frequency components and their higher harmonics. An example embodiment of a glass wall has dimensions 10 ft.×12 ft., with air spacing between two sheets of glass preferably in the range of 1-20 cm, more preferably in the range of 7-17 cm, and an example separation of 10 cm.
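The scaling suggested above (large lateral dimensions relating to low frequencies, the small gap between sheets relating to high frequencies) can be illustrated with a rough calculation. The sketch below assumes a deliberately simple idealization that is not taken from this description: each dimension is treated as a one-dimensional air path between rigid boundaries, with standing-wave frequencies f_n = n·c/(2L) and a speed of sound of 343 m/s. Real panels also have plate modes that depend on stiffness and mounting, which this ignores. The function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air near room temperature
FT_TO_M = 0.3048


def axial_mode_frequencies(length_m, count=3, c=SPEED_OF_SOUND_M_S):
    """First `count` standing-wave frequencies (Hz) for a path of length_m."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    return [n * c / (2.0 * length_m) for n in range(1, count + 1)]


# Example air gap of 10 cm between the two sheets.
gap_modes = axial_mode_frequencies(0.10)
# Example lateral dimensions of 10 ft and 12 ft.
width_modes = axial_mode_frequencies(10 * FT_TO_M)
height_modes = axial_mode_frequencies(12 * FT_TO_M)

print("gap (10 cm):", [round(f) for f in gap_modes])
print("10 ft side: ", [round(f, 1) for f in width_modes])
print("12 ft side: ", [round(f, 1) for f in height_modes])
```

Under these assumptions the 10 cm gap gives a lowest mode near 1.7 kHz, while the lateral dimensions give lowest modes of a few tens of hertz, consistent in direction with the belief stated above.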
The logging of step S912 may include, for example, creation of a record in a data file stored to a non-transitory computer readable storage medium and/or the like (e.g., a flash memory, a USB drive, RAM, etc.). The record may include a timestamp indicating the start and stop times of the event, as well as a location identifier (e.g., specifying the wall at which the sound was detected for instance in the event that there are multiple walls implementing the technology disclosed herein, the microphone that detected the sound for instance in the event that there are multiple microphones in a given wall, etc.). Information about the frequency range(s) and/or signals detected and/or generated may be stored to the record, as well. In certain example embodiments, circuitry may store a digital or other representation of the detected and/or generated sound, e.g., in the record or in an associated data file. As a result, speech or other noises may be recorded, potentially with entire conversations being captured and archived for potential subsequent analysis. For instance, the sound masking circuit (for example) may be used as a recording device (e.g., like a security camera, eavesdropping device, sound statistics monitoring device, and/or the like). In certain example embodiments, information may be stored locally and/or transmitted to a remote computer terminal or the like for potential follow-up action such as, for example, playback of noise events and/or conversations, analysis of same (e.g., to help reveal what types of noises were recorded most, what time of day is the noisiest, who makes the most kinds of different noises, etc.). Transmission may be accomplished by removing physical media (such as a flash drive, USB drive, and/or the like), through a wired connection (e.g., including transmissions over a serial, USB, Ethernet, or other cable), wirelessly (e.g., by Wi-Fi, Bluetooth, over the Internet, and/or the like), etc.
Information may be transmitted periodically and/or on-demand in different example embodiments.
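The kind of record described for step S912 might be laid out as in the following sketch. The field names, the identifiers, and the JSON-lines file format are assumptions made for illustration; the description above specifies only that start and stop times, a location identifier, and frequency-range information may be stored.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class NoiseEventRecord:
    """Hypothetical layout of one logged sound event."""
    start_time: str                 # ISO 8601 timestamp of event start
    stop_time: str                  # ISO 8601 timestamp of event stop
    wall_id: str                    # which wall detected the sound
    microphone_id: str              # which microphone in that wall
    frequency_ranges_hz: list = field(default_factory=list)


def to_json_line(record):
    """Serialize one record as a single JSON line."""
    return json.dumps(asdict(record))


def append_record(path, record):
    """Append the record to a local log file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(to_json_line(record) + "\n")


start = datetime(2016, 3, 1, 9, 0, 0, tzinfo=timezone.utc)
example = NoiseEventRecord(
    start_time=start.isoformat(),
    stop_time=(start + timedelta(seconds=12)).isoformat(),
    wall_id="wall-A",
    microphone_id="mic-1",
    frequency_ranges_hz=[[2800, 3200]],
)
line = to_json_line(example)
print(line)
```

An append-only, line-oriented file is used here because it suits both local storage and periodic or on-demand transmission; any other record store would serve equally well.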
In certain example embodiments, the sound masking circuit may be programmed to determine whether incident noise corresponds to a known pattern or type. For example, although annoying, alarm sounds, sirens, and/or the like, may be detected by the sound masking circuit and allowed to go through the wall assembly for safety, informational, and/or other purposes.
In certain example embodiments, the sound masking circuit may be programmed to operate as both a sound (e.g., speech) disrupter (e.g., through the use of reverberation and/or the like), as well as a sound sweetener. With respect to the latter, the sound masking circuit may generate reverberative and/or pleasant sounds to help mask potentially annoying noises and/or disrupt the intelligibility of speech. Pleasant sounds may be Nature sounds (e.g., the sound of the ocean, thunder, rain, waterfalls, etc.), sounds of animals (e.g., dolphins), soothing music, and/or the like. These sounds may be stored to a data store accessible by the sound masking circuit. When appropriate (e.g., when triggering reverberation as described above), the sound masking circuit may retrieve the sound sweetener and provide it as output to a speaker or the like (which may be, for example, the same or different speaker as is used as the air pump in certain example embodiments).
It will be appreciated that passive approaches to noise disruption and/or cancellation may be used in certain example embodiments, e.g., as the wall itself may be structured to serve as a reverberation-inducing resonator that involves acoustic contrast. This may be accomplished by having one or more (and preferably two or more) openings, slits, and/or the like, formed in the acoustic wall assembly, thereby using natural properties of the wall itself to create reverberative effects of a desired type. These features may be formed on one side of the acoustic wall assembly, adding directional properties to the acoustics of the wall assembly. For example, at least one opening may be made in the outside pane of a double-pane wall in order to make the effect directional, and so that the effect of reverberation is more pronounced outside of the wall. As another example, at least one opening may be made in the inside pane of the double-pane wall. This may be advantageous for some applications, like music halls, which may benefit from additional sound reverberation that makes sounds seem richer.
In certain example embodiments, additional reverberating elements may be affixed to a wall. The sound-masking reverberation-inducing element(s) may be provided in a direct contact with a single or partial wall, so the wall can act as a sound source in certain example embodiments. In certain example embodiments, the sound-masking reverberation-inducing element(s) may be provided between the walls in a wall assembly. Sound masking advantageously results in an increased noise/signal contrast, which makes speech perceived behind a single or partial wall less comprehensible and irritating sounds less annoying.
In certain example embodiments, a first set of features may be formed in and/or on an inner pane and a second set of features may be formed in and/or on an outer pane, e.g., keeping some annoying or disruptive sounds out and improving the acoustics “on the inside.” In certain example embodiments, multiple sets of features may be formed in and/or on one or both panes of a two-pane wall assembly, with each set of features targeting a different range to be eliminated and/or emphasized.
Other natural properties of the wall assembly (including size, space between adjacent upright walls, etc.) also may be selected to trigger desirable reverberative effects, e.g., as described above.
As alluded to above, it will be appreciated that these more passive techniques may be used in addition to the active techniques discussed above, e.g., with single- or two-wall acoustic wall assemblies.
The wall assembly thus may be made in the manner of a sound resonator with specifically designed fundamental resonant frequencies. As above, any suitable material may be used in constructing the walls. For example, because glass is a naturally good resonator, certain example embodiments are able to make use of a variety of resonant harmonics, which are the integer multiples of the fundamental frequency. Regardless of the material, tailoring of the incoming sound via the features may help to disrupt the frequency ranges of the speech and noise in order to make it unintelligible and/or less annoying. For example, it is possible to target those frequency ranges associated with consonants or formants when dealing with speech, etc. Moreover, because such a wall assembly is designed for selective sound disruption, it is possible in certain example embodiments to use thin glass and longer-lasting rigid joints in the wall assembly. This construction advantageously may make the entire design more solid and reliable. When glass is used, high tolerances may be desirable in order to help maximize the effectiveness of sound resonating properties by avoiding leakage, etc.
The walls described herein may be partial walls, e.g., walls that leave open space between separated areas. That is, the acoustic walls and acoustic wall assemblies may be full-height or partial-height in different instances. Single or double panel walls also may be used. Furthermore, although certain example embodiments have been described in connection with walls and/or rooms, it will be appreciated that the techniques described herein may be used in connection with more general areas where there are no or fewer defining partitions or structurally-defined breaks (e.g., in hospital rooms where curtains separate two patient areas, in lobbies, between the front and back seats of a car, between different rows or areas of an airplane, etc.).
Although passive or active (e.g., computer-generated) reverberation has been used by the assignee to reduce perceived speech intelligibility, it has been found that further improvements are still possible. For example, the human brain is adapted to deal with echoing sounds by giving a priority to the early-arrival signal. In addition, so-called phonemic restoration is known to help the brain restore the information of missing or overlapped sounds. These two phenomena sometimes filter out the identical time-delayed replicas and preserve the intelligibility of an original speech signal. This in turn can compromise the effectiveness of a straightforward reverberation. In the example embodiments described below, another potentially more effective method of disrupting the intelligibility and reducing the annoyance of the perceived speech that takes into account these issues is presented.
Referring once again to step S908 in
The above-mentioned approach has been found to produce a robust speech disruption. However, a noticeable increase in the perceived sound loudness may sometimes occur, and listeners may experience annoyance from the increased loudness. Thus, it will be appreciated that it would be desirable to further improve the technique to disrupt the original speech without significantly adding to its loudness and potential annoyance.
Humans tend to interpret replica sounds (as long as they are similar in shape) as part of the original sound, thus effectively ignoring the informational content and only focusing on increased loudness. This is known as the precedence effect. However, the replica signal can be further modified to disrupt the informational content and help reduce the impact of the precedence effect. Certain example embodiments therefore improve upon the technique described above by selectively disrupting the masking speech signal. As will become clearer from the description below, this selective disruption may take place in connection with formants, phonemes, consonant sounds, and/or other building blocks of speech.
Certain example embodiments use a frequency of oscillation of the reverberation delay in the range of several Hertz. This range is advantageous because it corresponds to the number of syllables per second in normal English speech. As a result, certain example embodiments enable speech intelligibility to be greatly disrupted without adding a significant amount of noise. That is, it has been recognized that the information-carrying frequency of the speech is in a different frequency range than the “annoyance” portion, so targeting the former allows the speech-content disruption to take place with little of the additional loudness otherwise caused by acoustic masking.
In certain example embodiments, the speech intelligibility disrupting masking signal may take the general pattern of the original speech signal. In certain example embodiments, the masking signal may be delayed with respect to the original signal, and/or multiple prerecorded voices may be added to the speech intelligibility disrupting signal (e.g., to create the perception of crowd noise). In certain example embodiments, other sounds (such as, for example, the above-described and/or other Nature sounds, sound “sweeteners,” and/or the like) may be added to further improve the speech intelligibility disruption effect.
In operation, a method for disrupting speech intelligibility comprises receiving, via a microphone or other listening device, an original speech signal. The original speech signal includes a plurality of formants (the building-blocks of speech intelligibility) and has a certain basic level of intelligibility perceivable by a human listener. The original speech signal is processed (e.g., using a hardware processor or other control circuitry) to identify frequency ranges associated with the formants that comprise the original speech signal. Various parameters then may be used to in essence alter the speech signal and make the intelligibility-disrupting masking signal. For instance, an intelligibility-disrupting signal may be generated to comprise intelligibility-disrupting formants that are in the same frequency range(s) as the formants that comprise the original speech signal, and the level of intelligibility of the resultant perceived speech can be reduced by outputting, through a speaker, the intelligibility-disrupting signal comprising the generated intelligibility-disrupting formants. The intelligibility-disrupting formants are generated within a frequency range of 0.02-8 Hz in some instances. In some cases, the intelligibility-disrupting formants are generated with a frequency of 2-6 Hz (e.g., 4 Hz).
In certain example embodiments, the intelligibility-disrupting signal may be time delayed relative to the original speech signal, e.g., such that the intelligibility-disrupting masking signal follows the general pattern of the original speech signal, is a time-delayed replica of the original speech signal, a time-phased replica of the original signal, an amplitude-modulated version of the original speech signal, and/or the like. A constant time delay range of 0-150 ms is preferred, with 40-120 ms being more preferred, and 60-110 ms being still more preferred. An example delay of 80 ms may be optimal in some instances, and in other instances, delays that average 80 ms may be optimal. In certain example embodiments, a dynamic reverberation additionally or alternatively may be used, e.g., such that the time delay oscillates in time.
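By way of illustration only, the constant-delay case can be sketched in a few lines of Python. The function name `add_delayed_replica`, the digital summation (standing in for the acoustic superposition a listener would hear), and the gain value of 0.5 are assumptions made for this sketch; only the 80 ms example delay is taken from the description above.

```python
def add_delayed_replica(signal, sample_rate, delay_ms=80.0, gain=0.5):
    """Return the signal plus a delayed, scaled copy of itself.

    gain=0.5 is an illustrative assumption, not a value from the description.
    """
    delay_samples = int(round(sample_rate * delay_ms / 1000.0))
    out = []
    for n, x in enumerate(signal):
        # The replica is silent until the delay has elapsed.
        replica = signal[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + gain * replica)
    return out


# A single click sampled at 1 kHz reappears 80 samples (80 ms) later.
click = [1.0] + [0.0] * 99
mixed = add_delayed_replica(click, sample_rate=1000)
```

In this toy input the click at sample 0 is followed by a half-amplitude copy at sample 80, which is the time-delayed replica on which the other modifications described herein are built.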
Gain relative to the original speech signal may be adjusted, additionally or alternatively, in certain example embodiments. Furthermore, the gain can be modulated in time, as well. For example, the intelligibility-disrupting masking signal may be generated such that loudness of the intelligibility-disrupting signal oscillates in time. Preferably, the gain (corresponding to the modulated intelligibility-disrupting signal summed with the original speech signal) is not too great, as this could create negative psychoacoustic effects, e.g., by creating too much loudness or disruption. In certain example embodiments, the gain applied is up to double the corresponding original speech signal. In certain example embodiments, the gain is, or averages to, 0.05-0.25%, more preferably 0.10-0.20%, with an example being 0.15%.
In certain example embodiments, the time delay and/or amplitude adjustment may be modulated at a given frequency or given frequencies. For example, the time delay and/or amplitude adjustment may be modulated at an oscillation frequency of, or averaging to, 1-10 Hz, more preferably 2-6 Hz, and 4 Hz as an example. It will be appreciated that the modulation may be the same or different for the time delay and the amplitude adjustment in different example embodiments. The delay and/or amplitude modulation may be provided in accordance with one or more algorithms in certain example embodiments. In certain example embodiments, the delay and/or amplitude modulation may be Gaussian, random, in accordance with a waveform (e.g., a sine wave, square wave, etc.), step-wise, in conformance with a predefined pattern (e.g., an increasing then decreasing frequency oscillation, etc.), a result of the application of an algorithm, and/or the like. In certain example embodiments, a dynamic time delay modulation of 40-400 Hz, more preferably 60-300 Hz, and 80-230 Hz, for example, may be used.
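As a non-limiting sketch, the modulation patterns mentioned above (sine wave, square wave, random) can be expressed as a single modulator function whose output may drive either the time delay or the amplitude adjustment. The function name `modulation_value`, its parameters, and the choice to sweep the delay between 60 ms and 110 ms are assumptions made for illustration; the 4 Hz rate is the example value given above.

```python
import math
import random


def modulation_value(t, freq_hz=4.0, low=0.0, high=1.0, shape="sine", rng=None):
    """Value of a modulator at time t (seconds), swinging between low and high."""
    phase = (t * freq_hz) % 1.0  # position within the current cycle, 0..1
    if shape == "sine":
        unit = 0.5 * (1.0 + math.sin(2.0 * math.pi * phase))
    elif shape == "square":
        unit = 1.0 if phase < 0.5 else 0.0
    elif shape == "random":
        unit = (rng or random).random()
    else:
        raise ValueError(f"unknown shape: {shape}")
    return low + (high - low) * unit


# Delay schedule for one second, sampled every 10 ms, oscillating at 4 Hz
# between 60 ms and 110 ms (an assumed range for this sketch).
delays_ms = [modulation_value(n / 100.0, 4.0, 60.0, 110.0) for n in range(100)]
```

The same function could be called a second time with a different rate or shape to modulate the gain, reflecting that the two modulations need not be identical.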
Certain example embodiments may further comprise outputting, through the speaker, an additional masking sound signal, together with the intelligibility-disrupting signal that comprises the generated intelligibility-disrupting formants. For instance, the intelligibility-disrupting signal may be generated to include a prerecorded mix of multiple voices. In addition, or in the alternative, a sound sweetener or the like may be used.
This functionality may be incorporated into an electronic device in certain example embodiments.
As alluded to above, other building blocks of speech may be targeted in certain example embodiments. For instance, fundamental frequencies of speech are known to occur between 85 Hz and 250 Hz. On top of this low-frequency “basic channel,” there are additional building blocks of speech, which comprise (a) “inert” vowels that primarily are responsible for the energetic formants determining the “power” of voice, and (b) information-carrying consonants.
Consonants contain little energy but are believed to be essential to intelligibility (at least when it comes to English and other languages), e.g., in the form of the meaning-distinguishing phonological units, i.e., phonemes (defined by both place of articulation and loudness) and frequency-dependent tonemes. Other speech building blocks, such as duration-dependent chronemes, also may be targeted in some instances. Vowels occur between 350 Hz and 2 kHz and are primarily volume-carrying blocks of speech. Targeting the low-volume information-carrying consonants and leaving high-volume vowels intact with the help of a spectral filter may further help reduce the annoyance during speech disruption.
Various consonants differ in the degree of constriction of the vocal cavity and the timing of articulation. Even so, most of them lie in a frequency range between 1.5 kHz and 4 kHz. In this regard,
Although the onset formant transition of key consonants differs depending on the following vowel, their phoneme interpretation remains unchanged. This knowledge can be used to trigger speech disruption based on the threshold frequency of consonants, which also may be thought of as primary information-carrying speech units in some instances.
Therefore, in certain example embodiments, the generation of a masking signal may be triggered based on reaching a threshold frequency that is higher than the frequency of most vowels but lower than the frequency of most consonants (e.g., around 1.5 kHz). A preset frequency range of 1.2-2 kHz may be effective in this regard, in certain example embodiments. This approach may help prevent the replication of most vowels, which carry little informational load but contribute to unwanted loudness, and instead may help focus the replica signal on the information-carrying consonants. A high-pass acoustic filter, for example, may be used in this regard. The
The masking signal in certain example embodiments may oscillate (temporal phasing) in such a way as to provide a delay between 20 ms and 95 ms, which corresponds to the voice onset time (VOT) of most consonants. VOT is the time between the release of a “stop” consonant and the onset of voicing. Modulation frequency of temporal phasing in the 1-10 Hz range may be advantageous, 2-10 Hz being more advantageous, 2-6 Hz being still more advantageous, and with 4 Hz being one example believed to be optimal. Amplitude modulations also may be implemented in certain example embodiments. Amplitude modulations of 10-100% of the original signal, and more preferably 40-90% of the original signal, have been found to be advantageous in this regard.
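Purely as an illustrative sketch, the consonant-targeting approach of the two preceding paragraphs can be approximated by high-pass filtering the captured speech at about 1.5 kHz and then reading the filtered signal through a delay that sweeps between 20 ms and 95 ms at 4 Hz. The first-order filter, the sinusoidal sweep, the linear interpolation, the 16 kHz sample rate, the pure test tones, and all function names are assumptions of this sketch; amplitude modulation is omitted for brevity.

```python
import math


def high_pass(signal, sample_rate, cutoff_hz=1500.0):
    """First-order RC high-pass: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0
    for x in signal:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def phased_replica(signal, sample_rate, min_ms=20.0, max_ms=95.0, rate_hz=4.0):
    """Copy of signal read through a delay sweeping min_ms..max_ms at rate_hz."""
    mid = 0.5 * (min_ms + max_ms)
    half = 0.5 * (max_ms - min_ms)
    out = []
    for n in range(len(signal)):
        t = n / sample_rate
        delay_ms = mid + half * math.sin(2.0 * math.pi * rate_hz * t)
        pos = n - delay_ms * sample_rate / 1000.0  # fractional read position
        i = math.floor(pos)
        frac = pos - i
        # Samples before the start of the recording are treated as silence.
        a = signal[i] if 0 <= i < len(signal) else 0.0
        b = signal[i + 1] if 0 <= i + 1 < len(signal) else 0.0
        out.append((1.0 - frac) * a + frac * b)  # linear interpolation
    return out


def tone(freq_hz, sample_rate, seconds):
    """Pure sine tone used here as a stand-in for a speech component."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(int(sample_rate * seconds))]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


SR = 16000  # assumed sample rate
vowel_like = tone(300.0, SR, 0.25)       # below the ~1.5 kHz threshold
consonant_like = tone(3000.0, SR, 0.25)  # above the threshold
speech = [a + b for a, b in zip(vowel_like, consonant_like)]
masker = phased_replica(high_pass(speech, SR), SR)
```

With these assumed values, the 300 Hz component is strongly attenuated while the 3 kHz component largely passes, so the replica is weighted toward the consonant range rather than the louder vowel range. A steeper filter could be substituted where sharper separation is desired.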
Certain example techniques that take into account internal reverberations will now be described. As noted above, different rooms have potentially different acoustical properties, including potentially different T60 values measured within the room. In rooms with high T60 values, too much reverberation can be an issue. For instance, rooms that incorporate glass walls or windows can present an increased challenge when it comes to high intelligibility of speech within the room: Internal reverberations from highly sound-reflecting surfaces act as masking signals. Different rooms (including those with glass) have been found to have annoying internal acoustic reverberations therein, particularly in low-frequency ranges (e.g., 20-200 Hz). Although there are some available solutions that help deal with annoying reverberations in an interior room (including, for example, using various sound-absorbing surfaces), these solutions tend to compromise the glass transparency and tend to add a significant cost.
Certain example embodiments additionally or alternatively provide an acoustic solution for reducing (and sometimes even eliminating) annoying acoustic reverberations within a room or area caused by reverberations in low-frequency ranges. For example, certain example embodiments generate a replica of the original speech signal that has an equalized (or substantially equalized) loudness, but lacks annoying reverberation in the lower portions of the spectrum.
In this way, a modified version of the acoustic pattern corresponding to the original speech is generated so that the level of the new, combined sound is equal or substantially equal to the combined level of the original sound and the annoying reverberation. The unwanted reverberation, however, is in essence “cut out” from the resultant spectrum in the modified version of the acoustic pattern, so there are no spikes therein.
It will be appreciated that the shape of the signal that essentially is cut-out may be square-shaped, in the pattern of a sine wave, Gaussian, and/or the like. In certain example embodiments, the shape of the signal that essentially is cut-out may be more precisely tailored to match the shape of the reverberation waveforms. In some instances, a single fundamental reverberation mode may be cut out, whereas in other instances wider frequency ranges will be removed. A delta-function causing an abrupt cutoff may be used in this regard, in certain example embodiments.
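For illustration, the cut-out shapes mentioned above can be modeled as a frequency-dependent gain that is applied when generating the modified replica. The function name `cutout_gain`, the `depth` parameter, and the assumption that the band edges sit two standard deviations from the center of the Gaussian are hypothetical choices; the 20-200 Hz band is the low-frequency range noted earlier.

```python
import math


def cutout_gain(freq_hz, low_hz=20.0, high_hz=200.0, shape="square", depth=1.0):
    """Gain (0..1) applied at freq_hz to carve out the low_hz..high_hz band."""
    if shape == "square":
        inside = low_hz <= freq_hz <= high_hz
        return 1.0 - depth if inside else 1.0
    if shape == "gaussian":
        center = 0.5 * (low_hz + high_hz)
        sigma = (high_hz - low_hz) / 4.0  # assumption: band edges at +/- 2 sigma
        return 1.0 - depth * math.exp(-0.5 * ((freq_hz - center) / sigma) ** 2)
    raise ValueError(f"unknown shape: {shape}")


# Gains at a few probe frequencies for each shape.
probe_hz = [10.0, 20.0, 110.0, 200.0, 1000.0]
square_gains = [cutout_gain(f, shape="square") for f in probe_hz]
gaussian_gains = [cutout_gain(f, shape="gaussian") for f in probe_hz]
```

The square shape removes the band abruptly, whereas the Gaussian shape tapers toward the band edges; a narrower band could be chosen when only a single reverberation mode is to be removed.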
Although
A test room was set up, and certain example techniques were evaluated. The test room was a typical drywall office with temporarily disabled HVAC fans, a reverberation time of 0.4 s, and no special acoustical insulation. Target speech signals were played with a Yamaha HS5 loudspeaker positioned behind one of the walls with an STC of 30. The signal was registered using a Crown Audio far-field microphone, processed with software, and played with an identical loudspeaker positioned within the room, 2 meters in front of the subject. The software used a combination of the following four audio effects: (1) constant time delay, (2) time delay varying in time (temporal phasing), (3) amplitude modulation, and (4) spectral filtering. Time delay, modulation frequency, and modulation depth were all tunable parameters. The speech stimuli were blocks of 100 prerecorded brief, 5-7 word-long, unrelated, syntactically and semantically correct utterances spoken at a normal pace by a male voice. The utterances were separately presented to each of ten subjects, who subjectively scored the perceived speech recognition and the annoyance of the masking sound. All subjects were native speakers of English with normal hearing. The following types of speech maskers were used in the experiment: white noise (WN), a time-delayed clone of a target speech signal (TD), a masker that was an optimized combination of the four audio effects described above (OC), and the OC masker supplemented with a multi-talker background (OCB).
In this test, the time delay of the OC masker was set to 80 ms. Time-delay phasing and amplitude modulation were performed at a rate of 3 to 5 modulations per second. Prerecorded speech of three talkers, two males and one female, speaking concurrently was used as background for the OCB masker. The OC optimization was performed to alter the clone signal just enough to smear the essential cues of target speech and make it incomprehensible at a bare minimum expense of additional annoyance. This approach is voice-activated, and the intensity of the masking signal is constantly self-adjusted to the intensity of the target speech.
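The voice-activated, self-adjusting behavior noted above can be sketched, under stated assumptions, as a frame-by-frame level follower: the masker level tracks the RMS level of the target speech and drops to zero when the target is quiet. The function name `masker_levels`, the frame-based processing, the 0.5 ratio, and the 0.01 activation threshold are hypothetical values chosen only for illustration and do not describe the software used in the test.

```python
import math


def masker_levels(target, frame_len, ratio=0.5, threshold=0.01):
    """Per-frame masker level that follows the RMS level of the target.

    Frames quieter than threshold yield 0.0, i.e., the masker stays silent.
    ratio and threshold are illustrative assumptions.
    """
    levels = []
    for start in range(0, len(target), frame_len):
        frame = target[start:start + frame_len]
        level = math.sqrt(sum(v * v for v in frame) / len(frame))
        levels.append(ratio * level if level >= threshold else 0.0)
    return levels


# One silent frame followed by one loud frame.
target = [0.0] * 4 + [1.0] * 4
levels = masker_levels(target, frame_len=4)
```

Here the silent frame produces no masking output, while the loud frame produces a masker at half the target level, so the masker adds loudness only while speech is present.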
The rates of delayed phasing and amplitude modulation of 3-5 cycles per second are similar to the number of syllables per second in normal English speech, which makes the OC masking highly selective in interfering with verbal rhythms of the target speech, as noted above. For comparison, and also as noted above, white noise and Nature sounds are poor speech maskers at moderate loudness because their temporal patterns are different from that of normal speech. Further minimization of the annoyance related to masking was performed using a spectral filter. The spectral filter balanced the contribution of spectral regions responsible for the energetic vowels and information-carrying consonants.
The scoring results are presented in
From the
Methods of making the above-described and/or other walls and wall assemblies are also contemplated herein. For the example active approaches described herein, such methods may include, for example, erecting walls, connecting microphones and air pumps to sound masking circuits, etc. Configuration steps for sound masking circuits (e.g., specifying one or more frequency ranges of interest, when/how to actuate an air pump, etc.) also are contemplated. Mounting operations may be used, e.g., with respect to the microphone and/or the air pump (including the hanging of speakers), etc. Integration with HVAC systems and/or the like also is contemplated.
In a similar vein, methods of retrofitting existing walls and/or wall assemblies also are contemplated and may include the same or similar steps. Retrofit kits also are contemplated herein.
Certain example embodiments have been described in connection with acoustic walls and acoustic wall assemblies. It will be appreciated that these acoustic walls and acoustic wall assemblies may be used in a variety of applications to alter perceived speech patterns, obscure certain irritating sound components emanated from adjacent areas, and/or the like. Example applications include acoustic walls and acoustic wall assemblies for rooms in a house; rooms in an office; defined waiting areas at doctors' offices, airports, convenience stores, banks, malls, etc.; exterior acoustic walls and acoustic wall assemblies for homes, offices, and/or other structures; outer elements (e.g., doors, sunroofs, or the like) for vehicles, as well as inner areas for vehicles (e.g., so that those sitting in the front seats can be acoustically obscured from their children sitting in the back seats, and vice versa); etc. Sound masking may be provided for noises emanating from an adjacent area, regardless of whether that adjacent area is another room, outside of the confines of the structure housing the acoustic wall and acoustic wall assembly, etc. Similarly, sound masking may be provided to prevent noises from entering into an adjacent area of this or other sort.
In certain example embodiments, a method for disrupting speech intelligibility is provided, the method comprising: receiving, via a microphone, an original speech signal corresponding to original speech; generating an intelligibility-disrupting masking signal comprising smeared cues of the original speech in the original speech signal; and reducing the level of intelligibility of the original speech signal by outputting, through a speaker, the intelligibility-disrupting masking signal comprising the smeared speech cues.
In addition to the features of the previous paragraph, in certain example embodiments, the intelligibility-disrupting masking signal may be time delayed relative to the original speech signal, e.g., by 20-150 ms, 80 ms, etc.
In addition to the features of the previous paragraph, in certain example embodiments, the time delay may oscillate in time, e.g., with the time delay oscillating in relation to the original signal within a range of 80-230 ms.
In addition to the features of any of the three previous paragraphs, in certain example embodiments, the intelligibility-disrupting masking signal may be generated such that the amplitude of the intelligibility-disrupting masking signal is modulated in time.
In addition to the features of any of the four previous paragraphs, in certain example embodiments, the intelligibility-disrupting masking signal may be generated such that gain corresponding to the intelligibility-disrupting masking signal added to the original speech signal is 0.05-0.25%.
In addition to the features of any of the five previous paragraphs, in certain example embodiments, the time delay may oscillate with an oscillation frequency of 1-10 Hz, e.g., with the time delay oscillating with an oscillation frequency of 2-6 Hz.
In addition to the features of any of the six previous paragraphs, in certain example embodiments, smeared cues may be generated at a frequency of 0.01-20 Hz, e.g., at a frequency of 2-6 Hz.
In addition to the features of any of the seven previous paragraphs, in certain example embodiments, the method may further comprise outputting, through the speaker, the intelligibility-disrupting masking signal together with a prerecorded mix of multiple voices, e.g., with the prerecorded mix of multiple voices comprising 2-7 different voices, 3 different voices, etc.
In certain example embodiments, a speech intelligibility disrupting device comprising control circuitry may be configured to implement the functionality of any of the eight previous paragraphs.
In certain example embodiments, a system may include the device of the previous paragraph.
In certain example embodiments, a wall may incorporate the system of the previous paragraph.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.