FILTERING SPURIOUS VOICEMAIL MESSAGES

Information

  • Patent Application
  • Publication Number
    20070280432
  • Date Filed
    June 01, 2006
  • Date Published
    December 06, 2007
Abstract
A method is provided for discarding an audio message if the audio message meets a predefined discarding condition. The condition may factor in at least the audio message's total length, the presence of spoken word(s) in the audio message, the presence of tone(s) in the audio message and the presence of click(s) in the audio message. An audio message may be discarded if no spoken words are detected in it and at least one of the following holds: its total length is shorter than a predefined minimal message length; it contains a tone and its total length minus the tone's length is shorter than the predefined minimal message length; or it contains a click sound and its total length minus the click's length is shorter than the predefined minimal message length. An audio message may first be stored in a memory buffer and then moved to a legitimate messages pool memory or to a spurious messages pool memory. A messages discriminator and a voicemail system that utilize the messages discrimination method are also provided.
Description

BRIEF DESCRIPTION OF THE FIGURES

Exemplary embodiments are illustrated in referenced figures. It is intended that the embodiments and figures disclosed herein be considered illustrative, rather than restrictive. The disclosure, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying figures, in which:



FIG. 1 schematically illustrates, by way of an example, a messages discrimination method to be used by a messages discriminator according to some embodiments of the present disclosure;



FIG. 2 schematically illustrates an exemplary messages discriminator utilizing the messages discrimination method of FIG. 1; and



FIG. 3 schematically illustrates an exemplary voicemail system using the messages discriminator of FIG. 2.





It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.


DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the disclosure. However, it will be understood by those skilled in the art that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present disclosure.


Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as “processing”, “computing”, “calculating”, “determining”, or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulate and/or transform data represented as physical, such as electronic, quantities within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.


The present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the disclosure is implemented in software, which includes but is not limited to firmware, resident software, microcode, and so on.


Embodiments of the present disclosure may include apparatuses for performing the operations described herein. Such an apparatus may be specially constructed for the desired purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer.


Furthermore, the disclosure may take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.


A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements may include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code has to be retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, and so on) can be coupled to the system either directly or through intervening I/O controllers.


Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.


The processes presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the desired method. The desired structure for a variety of these systems will appear from the description below. In addition, embodiments of the present disclosure are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.


As part of the present disclosure, the messages discriminator (MD) disclosed herein may discriminate spurious messages from legitimate messages by using several criteria, as described in connection with FIG. 1.


Referring now to FIG. 1, a block diagram of a messages discrimination process or method (generally shown at 100) is shown, which may be used by a messages discriminator in accordance with some embodiments of the present disclosure. Original (audio) Message 101 may be a message recorded in a voicemail system whose nature (spurious or legitimate) is unknown; Original Message 101 is therefore forwarded (shown at 121) to process 100 for discrimination. The discrimination may occur in real time, as the message is being recorded or transferred, or after the message, or a portion of it, has been recorded. Process 100 may include one or more of four decision making modules: Message Length 102, Voice Detection 103, Tone Detection 104 and Click Detection 105. Original Message 101 may be forwarded to one or more of the four decision making modules 102, 103, 104 and 105 (shown at 112, 113, 114 and 115, respectively) either serially (not shown) or simultaneously. FIG. 1 shows all four modules operating in parallel, but two or more modules may operate in parallel while others operate serially before or after them, and one or more of the modules may be omitted. Operating the modules in parallel shortens, relative to cascading them, the time needed to decide whether a recorded message (Original Message 101) is spurious or legitimate. A memory unit (not shown), or a memory space within a memory unit that may be associated with a given subscriber group or service, may include a messages pool containing one or more recorded messages, such as Original Message 101, and, optionally, metadata related to those messages. Some, all, or none of the messages in that messages pool may be spurious.
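The parallel fan-out described above can be sketched in Python with the standard `concurrent.futures` module. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `run_detectors` and the idea of passing each decision making module in as a plain callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# Hypothetical signature: each decision making module is a callable that takes the message.
Detector = Callable[[Any], Any]


def run_detectors(message: Any, detectors: Dict[str, Detector]) -> Dict[str, Any]:
    """Submit the same message to every enabled module at once and collect their outputs.

    Running the modules side by side means the overall decision waits only for the
    slowest module, rather than for the sum of all of them as in a cascade.
    """
    # ThreadPoolExecutor requires at least one worker, even if no detector is enabled.
    with ThreadPoolExecutor(max_workers=max(1, len(detectors))) as pool:
        futures = {name: pool.submit(fn, message) for name, fn in detectors.items()}
        # result() blocks until the corresponding module has finished.
        return {name: future.result() for name, future in futures.items()}
```

In this sketch, disabling a module (the control signals at 132 through 135) amounts to leaving it out of the `detectors` mapping. For CPU-bound DSP code in CPython, a `ProcessPoolExecutor` may parallelize better than threads, at the cost of requiring picklable, module-level detector functions.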


Message Length 102 may employ digital signal processing (DSP) tool(s) to measure, or calculate, and output (shown at 122) the total time length L (in seconds) of the message, including spoken words (if any) and the pauses between them, tones (if any) and background noise. Message Length 102 may receive (shown at 132) control signal(s) for controlling its operation, for example for enabling and disabling Message Length 102.
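For a message stored as an uncompressed PCM WAV file, the total length L follows directly from the frame count and the sampling rate: L = frames / rate. The sketch below assumes that storage format and uses Python's standard `wave` module. Voicemail platforms that store G.711 or other codecs would need a different reader. The function name is hypothetical.

```python
import io
import wave


def message_length_seconds(wav_bytes: bytes) -> float:
    """Total time length L of a PCM WAV message in seconds (speech, pauses, tones and noise alike)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        # Each frame holds one sample per channel, so frames / frames-per-second = duration.
        return wf.getnframes() / wf.getframerate()
```

For example, 12,000 mono frames recorded at 8 kHz give L = 1.5 s.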


Voice Detection 103 may employ speech detection or recognition algorithm(s) to decide whether the recorded message (Original Message 101, for example) includes one or more patterns that are unique to, associated with or representative of spoken words. Voice Detection 103 may output (shown at 123) a “word(s) detected” indication (in which case a variable, VP, is assigned the logical value “True”, or “1”) or a “word(s) not detected” indication (in which case VP is assigned the logical value “False”, or “0”). If Voice Detection 103 decides that one or more words have been detected in the recorded message, Voice Detection 103 may also output (shown at 123) the period of each detected word and/or the total speech time length (VL). The total speech time length may be expressed in various ways, for example: (1) as the difference between the time instant at which the first word was detected and the time instant at which the last word was detected, including the pauses between words, or (2) as the sum of the time lengths of the detected words. Voice Detection 103 may receive (shown at 133) control signal(s) for controlling its operation, for example for enabling and disabling Voice Detection 103.
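Given the per-word periods output at 123, VP and both variants of VL are simple to compute. The sketch below assumes that each detected word is reported as a `(start_s, end_s)` pair in seconds. This representation is hypothetical. It also reads variant (1) as running from the start of the first word to the end of the last word.

```python
from typing import List, Tuple

# Hypothetical representation of one detected word: (start_s, end_s) in seconds.
Word = Tuple[float, float]


def voice_summary(words: List[Word]) -> Tuple[bool, float, float]:
    """Return (VP, VL as a span including pauses, VL as the sum of word lengths)."""
    if not words:
        return False, 0.0, 0.0  # "word(s) not detected": VP is False and there is no speech time
    # Variant (1): from the earliest word start to the latest word end, pauses included.
    span = max(end for _, end in words) - min(start for start, _ in words)
    # Variant (2): only the time occupied by the words themselves.
    total = sum(end - start for start, end in words)
    return True, span, total
```

For words at 0.5 to 0.9 s and 1.2 to 1.8 s, variant (1) gives VL of about 1.3 s and variant (2) gives about 1.0 s. The 0.3 s pause between the words accounts for the difference.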


Tone Detection 104 may employ a DSP tool to detect the presence of a tone signal. A “tone signal” may include a single-frequency signal, or a signal of another known form, superimposed on the recorded message, whether the message is spurious or not. Tones may be generated by the calling telephone set, such as when the caller responds to an interactive voice response (IVR) system, or by the voicemail system itself. Tones may also be generated by the caller's telephony switch or PBX, such as when the caller hangs up the phone and, for some reason, the voicemail system does not detect the hang-up in time, in which case a fast busy tone will be recorded by the voicemail system. Tone signal(s) may also originate elsewhere as background noise. Tone Detection 104 may output (shown at 124) a “tone presence” indication (in which case a variable, TP, is assigned the logical value “True”, or “1”) or a “tone absence” indication (in which case TP is assigned the logical value “False”, or “0”). If Tone Detection 104 decides that a tone is present in the checked message (in Original Message 101, for example), Tone Detection 104 may also output (shown at 124) the tone's total time length (TL). Tone Detection 104 may receive (shown at 134) control signal(s) for controlling its operation, for example for enabling and disabling Tone Detection 104.
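The text leaves the DSP tool unspecified. One common choice for detecting a single known frequency is the Goertzel algorithm, which computes the energy of one DFT bin cheaply. The sketch below swaps that technique in as an assumption. It splits the message into 50 ms frames at 8 kHz, flags a frame as tonal when the target bin carries most of the frame's energy, and reports TP and TL. The frame length, the 0.5 ratio threshold and the 480 Hz example frequency are illustrative values that do not come from the disclosure. Multi-frequency tones such as dual-tone busy signals would need one Goertzel bin per component.

```python
import math
from typing import Sequence, Tuple


def goertzel_power(frame: Sequence[float], target_hz: float, sample_rate: int) -> float:
    """Squared magnitude of the DFT bin closest to target_hz, via the Goertzel recurrence."""
    n = len(frame)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_tone(samples: Sequence[float], sample_rate: int, target_hz: float,
                frame_len: int = 400, ratio_threshold: float = 0.5) -> Tuple[bool, float]:
    """Return (TP, TL): whether a target_hz tone is present and its total length in seconds."""
    tone_frames = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        if energy == 0.0:
            continue  # digital silence cannot contain a tone
        # A pure sinusoid centred on the bin gives power = (N / 2) * energy, i.e. a ratio of 1.
        ratio = goertzel_power(frame, target_hz, sample_rate) / (frame_len / 2 * energy)
        if ratio >= ratio_threshold:
            tone_frames += 1
    return tone_frames > 0, tone_frames * frame_len / sample_rate
```

For example, one second of a 480 Hz sine followed by one second of silence yields TP = True and TL = 1.0 s. A 1000 Hz sine produces no tonal frames when 480 Hz is the target.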


Click Detection 105 may employ a DSP tool to detect the presence of a click or a similar sound (such as the sound generated by electrical interference when a calling party hangs up the phone). Click Detection 105 may output (shown at 125) a “click present” indication (in which case a variable, CP, is assigned the logical value “True”, or “1”) or a “click absent” indication (in which case CP is assigned the logical value “False”, or “0”) and, optionally, the click's time length (CL). Click Detection 105 may receive (shown at 135) control signal(s) for controlling its operation, for example for enabling and disabling Click Detection 105.
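As with tones, the disclosure does not specify how clicks are detected. One plausible heuristic, offered here only as an assumption, treats a click as a short burst of energy. The sketch frames the signal into 5 ms blocks at 8 kHz and marks blocks whose RMS level exceeds a threshold. It then counts a run of loud blocks as a click only if the run lasts no longer than 20 ms, so that sustained sounds such as speech or tones are not mistaken for clicks. It returns CP and the total click length CL. All thresholds are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def detect_clicks(samples: Sequence[float], sample_rate: int, frame_len: int = 40,
                  rms_threshold: float = 0.1, max_click_s: float = 0.02) -> Tuple[bool, float]:
    """Return (CP, CL): whether short impulsive bursts occur and their total length in seconds."""
    # Longest run of loud frames that still counts as a click (at least one frame).
    max_frames = max(1, round(max_click_s * sample_rate / frame_len))
    loud: List[bool] = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        loud.append(math.sqrt(sum(x * x for x in frame) / frame_len) >= rms_threshold)
    click_frames = 0
    run = 0
    for is_loud in loud + [False]:  # trailing False closes a run that reaches the end
        if is_loud:
            run += 1
        else:
            if 0 < run <= max_frames:  # short burst: a click; long runs are ignored
                click_frames += run
            run = 0
    return click_frames > 0, click_frames * frame_len / sample_rate
```

With these settings, a 2 ms burst in otherwise silent audio occupies one 5 ms block, so CL = 0.005 s. A full second of steady tone is never reported as a click.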


Voice Detection 103, Tone Detection 104 and Click Detection 105 may utilize substantially any existing voice detection algorithm(s) to detect spoken words and tones in a recorded message (including a message being recorded in real time), for example in association with an algorithm called voice activity detection (VAD), which is used in speech processing to determine the presence or absence of human speech in a given audio signal. Further description of VAD may be found, for example, in “Voice Activity Detection in Noisy Environments” (Takeshi Yamada, Multimedia Laboratory, Institute of Information Sciences and Electronics, University of Tsukuba, RWCP Sound Scene Database in Real Acoustical Environments, Copyright (c) 1998-2001 Takeshi Yamada, University of Tsukuba), herein incorporated by reference. The main uses of VAD are in speech coding and speech recognition. A VAD may indicate not just the presence or absence of speech, but also whether the speech is voiced or unvoiced, sustained or early, and so on. Speech recognition technologies allow computers equipped with a source of sound input, such as a microphone, to interpret human speech, for example for transcription or as an alternative method of interacting with a computer. Speech recognition algorithms are also utilized by various speech-to-text applications. More complete description(s) related to speech processing may be found, for example, in “SPEECH and LANGUAGE PROCESSING: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition”, by Daniel Jurafsky and James H. Martin (Prentice-Hall, 2000, ISBN: 0-13-095069-6), herein incorporated by reference.
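The VAD methods cited above are considerably more elaborate than anything that fits here. Purely to illustrate the input and output involved, the sketch below substitutes the simplest possible detector, a frame-energy threshold. It turns raw samples into `(start_s, end_s)` speech segments of the kind Voice Detection 103 could report. The 20 ms frames, the RMS threshold and the 100 ms minimum segment length are assumptions. Such a detector also fires on tones and steady noise, which is why practical VADs add spectral and statistical features.

```python
import math
from typing import List, Optional, Sequence, Tuple


def energy_vad(samples: Sequence[float], sample_rate: int, frame_len: int = 160,
               rms_threshold: float = 0.02,
               min_speech_s: float = 0.1) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) segments whose frame RMS stays above a threshold long enough."""
    min_frames = max(1, round(min_speech_s * sample_rate / frame_len))
    n_frames = len(samples) // frame_len
    segments: List[Tuple[float, float]] = []
    run_start: Optional[int] = None
    for i in range(n_frames + 1):  # the extra iteration closes a segment that reaches the end
        active = False
        if i < n_frames:
            frame = samples[i * frame_len:(i + 1) * frame_len]
            active = math.sqrt(sum(x * x for x in frame) / frame_len) >= rms_threshold
        if active and run_start is None:
            run_start = i  # a candidate speech segment begins
        elif not active and run_start is not None:
            if i - run_start >= min_frames:  # discard bursts too short to be speech
                segments.append((run_start * frame_len / sample_rate,
                                 i * frame_len / sample_rate))
            run_start = None
    return segments
```

The returned segments can be passed directly to a VP/VL computation. An empty list corresponds to VP = False.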


Spurious Message 106 may use any combination of the indications forwarded to it (shown at 122 through 125) to evaluate a predefined discarding condition, in which one or more conditions may be used to decide whether Original Message 101 is spurious. Spurious Message 106 may accordingly output (shown at 126) a logical value “True” (S=1, Original Message 101 is spurious) or a logical value “False” (S=0, Original Message 101 is not spurious). For example, a checked message (Original Message 101, for example) may be considered a spurious message (S=1) if no spoken word(s) were detected in it (VP=0). The decision is more accurate if, additionally, at least one of the following three conditions is met:

  • 1. L<LMin, which means that the total time length of the checked message is shorter than a predetermined threshold value (LMin);
  • 2. TP=True and (L−TL)<LMin, which means that one or more tones were detected in the checked message and the net time remaining for a potential short non-spurious message (a message containing at least one discernible word), which is the difference (L−TL), is too short (<LMin); and
  • 3. CP=True and (L−CL)<LMin, which means that one or more clicks were detected in the checked (recorded) message and the net time remaining for a potential non-spurious message, which is the difference (L−CL), is too short (<LMin).


The above-described conditions are summarized in expression (1), which designates an exemplary discarding condition:






S=(not VP) and {(L<LMin) or [TP and (L−TL)<LMin] or [CP and (L−CL)<LMin]}  (1)


where S may have one of two logical states, “True” (the checked message is spurious) or “False” (the checked message is non-spurious, or legitimate), and LMin is a minimal time length (for example, 2 seconds) expected for a very short voicemail message containing at least one discernible word. Spurious Message 106 may receive (shown at 136) control signal(s) for controlling its operation, for example for enabling and disabling Spurious Message 106. The use of any one, two or three of the shown conditions, or of additional or other conditions, to form a discarding condition is explicitly contemplated. As noted previously, one or more conditions may be considered initially, and one or more further conditions may be considered if the first condition(s) yield preselected results.
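Expression (1) translates directly into a predicate. The minimal Python sketch below uses argument names that mirror the symbols above. LMin defaults to the 2-second example value. The function name is hypothetical.

```python
def is_spurious(l: float, vp: bool, tp: bool = False, tl: float = 0.0,
                cp: bool = False, cl: float = 0.0, l_min: float = 2.0) -> bool:
    """S = (not VP) and {(L < LMin) or [TP and (L-TL) < LMin] or [CP and (L-CL) < LMin]}."""
    if vp:
        return False  # any detected speech makes the message legitimate
    return (l < l_min) or (tp and (l - tl) < l_min) or (cp and (l - cl) < l_min)
```

As a worked example, a 3.5 s recording with no speech that contains 2.0 s of tone leaves L − TL = 1.5 s. That is less than LMin = 2 s, so S = True. The same recording without the tone (and without clicks) would be kept, because 3.5 s is at least LMin.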


Referring now to FIG. 2, a message discriminator (MD) (generally shown at 210) is shown, which utilizes the discrimination method or process shown in FIG. 1. Message discriminator 210 may include memory buffer (MBF) 211 for temporarily storing an incoming audio message, legitimate messages pool (LMP) memory 212 for storing legitimate messages, spurious messages pool (SMP) memory 213 for storing spurious messages, and controller 214. Controller 214 may apply process 100 of FIG. 1 to recorded messages such as Audio Message 201, and may further control, among other things, the flow of messages to/from MBF 211 (shown at 221), to/from LMP 212 (shown at 222) and to/from SMP 213 (shown at 223).


Audio Message 201 may be forwarded (shown at 202) from any wired, wireless or other telephonically oriented communication system (generally shown at 203) to messages discriminator 210. Audio Message 201 may be temporarily recorded and stored in MBF 211. Controller 214 may then employ method 100 of FIG. 1 to determine whether Audio Message 201 (as recorded and stored in MBF 211) is legitimate or spurious. Controller 214 may be adapted to discard Audio Message 201 if Audio Message 201 meets a predefined discarding condition. Controller 214 may also be adapted to store Audio Message 201 in LMP memory 212 if, according to the predefined discarding condition, Audio Message 201 is not to be discarded (because it is a legitimate message).


If, however, controller 214 determines that, according to the predefined discarding condition, Audio Message 201 is a spurious message and no spurious messages pool (such as SMP 213) exists, controller 214 may cause Audio Message 201 to be deleted from MBF memory 211. Alternatively, if controller 214 determines that Audio Message 201 is a spurious message and messages discriminator 210 also includes a spurious messages pool such as SMP 213, controller 214 may cause Audio Message 201 to be moved from MBF memory 211 to SMP memory 213. According to some embodiments, spurious message(s) may be stored in a sub-section (shown at 232) of LMP memory 212.
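The message flow of FIG. 2 can be modeled with in-memory lists standing in for MBF 211, LMP 212 and SMP 213. In this sketch, the class name, method names and return labels are hypothetical. The discarding condition is injected as a predicate, for example an implementation of expression (1). The sub-section 232 variant is not modeled.

```python
from typing import Callable, List, Optional


class MessagesDiscriminator:
    """Toy model of MD 210: a buffer, a legitimate pool and an optional spurious pool."""

    def __init__(self, is_spurious: Callable[[str], bool], has_smp: bool = True) -> None:
        self.is_spurious = is_spurious
        self.mbf: List[str] = []  # memory buffer (MBF 211)
        self.lmp: List[str] = []  # legitimate messages pool (LMP 212)
        self.smp: Optional[List[str]] = [] if has_smp else None  # SMP 213, if present

    def receive(self, message: str) -> str:
        """Buffer an incoming message, then route it according to the discarding condition."""
        self.mbf.append(message)
        message = self.mbf.pop(0)  # the controller takes the oldest buffered message
        if not self.is_spurious(message):
            self.lmp.append(message)  # legitimate: move MBF -> LMP
            return "legitimate"
        if self.smp is not None:
            self.smp.append(message)  # spurious and a pool exists: move MBF -> SMP
            return "spurious-stored"
        return "spurious-deleted"  # spurious and no pool: simply dropped from MBF
```

In both routing branches the buffer ends up empty. This matches the "first stored in a memory buffer and then moved" behavior described above.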


According to some embodiments, spurious message(s) may not be stored in a spurious messages pool memory such as SMP 213, or in a sub-section of a legitimate messages pool memory such as sub-section 232, but may instead be deleted immediately upon detection. In that case, a telephone subscriber associated with the stored legitimate message(s) may access those messages in the LMP memory in the usual way. That is, the telephone subscriber may dial, use or trigger a special access code to access message(s) stored in LMP memory 212, and respond to IVR instruction(s) to save or delete message(s). According to some embodiments, if spurious message(s) are stored in SMP memory 213, the telephone subscriber may independently access the legitimate message(s) stored in LMP memory 212 and the spurious message(s) stored in SMP memory 213 (or in memory sub-section 232, depending on the application used) by using two or more different access codes: one code for accessing messages in LMP 212 and another code for accessing message(s) in SMP 213.
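The two-code access scheme amounts to a lookup from access code to message pool. In the sketch below, the codes "1001" and "1002", the mapping and the function name are all hypothetical. A real system would provision codes per subscriber and authenticate the caller first.

```python
from typing import Dict, List, Optional

# Hypothetical access codes: one per pool, as described above.
ACCESS_CODES: Dict[str, str] = {"1001": "legitimate", "1002": "spurious"}


def open_pool(code: str, lmp: List[str], smp: Optional[List[str]]) -> Optional[List[str]]:
    """Return the message pool unlocked by `code`, or None if the code grants nothing."""
    pool = ACCESS_CODES.get(code)
    if pool == "legitimate":
        return lmp
    if pool == "spurious":
        return smp  # None when spurious messages are deleted rather than kept
    return None  # unknown code
```

When spurious messages are deleted on detection, the second code simply unlocks nothing.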


Referring now to FIG. 3, an exemplary telephone system wherein a message(s) discriminator (such as MD 210 of FIG. 2) is embedded in a voicemail system is schematically illustrated. Telephone device 301 is shown functionally coupled (shown at 302) to telephone system 303. Telephone device 304 is also shown functionally coupled (shown at 305) to telephone system 303. Message discriminator (MD) 307, which may operate in a similar manner as MD 210 of FIG. 2, is shown in FIG. 3 embedded in voicemail system 306, which may be functionally coupled (shown at 308) to telephone system 303. Voicemail system 306 may allocate, per telephone subscriber, a message(s) memory space: a legitimate message(s) pool memory (such as LMP memory 212 of FIG. 2) and (depending on the application used) a spurious message(s) pool memory (such as SMP memory 213 of FIG. 2). For example, voicemail system 306 may allocate a message(s) memory space (shown at 309) for storing legitimate (and, depending on the application used, also spurious) message(s) for the subscriber associated, for example, with telephone device 304.


A first telephone subscriber may use telephone device 301 to call a second telephone subscriber associated with telephone device 304 for establishing a communication path therebetween (shown at 310). The first telephone subscriber may call the second telephone subscriber over a wired telephone network (for example a PSTN network), a cellular telephone network, or partly over a wired telephone network and partly over a wireless telephone network, or any other telephone network, all of which are generally designated herein, for the sake of simplicity, as telephone system 303.


Assume that a communication path is established (shown at 310) between telephone devices 301 and 304, that telephone device 304 rings but the second telephone subscriber does not answer it in time, and that voicemail system 306 renders a voicemail service to the subscriber associated with telephone device 304. The telephone call may then be redirected (shown at 311) to voicemail system 306 after a predefined number of rings. The caller using telephone device 301 may then leave a message and hang up telephone 301, or he/she may hang up telephone 301 without leaving a message. However, as explained earlier, if the caller does not want to leave a message but fails to terminate the call session in time, voicemail system 306 assumes that the caller wants to leave a message and therefore automatically enters a record mode of operation. In such a case, a spurious message may be forwarded via communication path 311 and recorded in a common (or an allocated) MBF memory (such as MBF 211 of FIG. 2) in MD 307. The recording may last substantially from the instant at which recording began until the caller terminates the call or a predefined recording duration elapses, whichever occurs first.
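The "whichever occurs first" rule is simply the minimum of two durations. In this sketch, times are in seconds on a common call clock. `hangup_s` is `None` if the caller never disconnects. The function name and the 180 s limit used below are hypothetical.

```python
from typing import Optional


def recording_duration(record_start_s: float, hangup_s: Optional[float],
                       max_record_s: float) -> float:
    """Length of a voicemail recording: it ends at hang-up or at the limit, whichever comes first."""
    if hangup_s is None:  # the caller never disconnects, so only the limit stops recording
        return max_record_s
    return min(hangup_s - record_start_s, max_record_s)
```

Suppose recording starts at 20.0 s and the caller hangs up at 23.5 s. The result is a 3.5 s message. If that message contains no speech and, say, 2 s of fast busy tone, L − TL = 1.5 s falls below the 2-second LMin example, and expression (1) would mark it spurious.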


Once a (legitimate or spurious) message is recorded, voicemail system 306 may use substantially any unheard-message signaling technique to indicate to the subscriber, or user, of telephone device 304 that at least one unheard message is stored in voicemail system 306. For example, the signal may be a lamp on telephone device 304 that is switched on, switched off, or periodically switched on and off. According to another example, the signal may be a tone beep that the subscriber or user associated with telephone device 304 hears upon entering a call mode of operation, such as by lifting the telephone's handset or switching on the telephone device (depending on the telephone's type).


According to some embodiments, voicemail system 306 may forward to telephone device 304 a first signal associated with unheard legitimate message(s), and a second, distinct signal associated with unheard spurious message(s) (provided, of course, that spurious messages can be stored in a spurious messages pool memory such as SMP memory 213 of FIG. 2).
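Choosing which indication(s) to send reduces to checking the two pools. In the sketch below, the function name and the particular signal assignments are hypothetical. The assignments are a steadily lit lamp for unheard legitimate messages and a blinking lamp for unheard spurious ones, picked from the lamp examples in the text above.

```python
from typing import List

# Hypothetical signal assignments, chosen from the lamp examples above.
LEGITIMATE_SIGNAL = "lamp steadily on"
SPURIOUS_SIGNAL = "lamp blinking"


def unheard_message_signals(unheard_legitimate: int, unheard_spurious: int,
                            spurious_pool_enabled: bool) -> List[str]:
    """Return the indication(s) the voicemail system should send to the subscriber's device."""
    signals: List[str] = []
    if unheard_legitimate > 0:
        signals.append(LEGITIMATE_SIGNAL)
    # The second, distinct signal only applies when spurious messages are retained.
    if spurious_pool_enabled and unheard_spurious > 0:
        signals.append(SPURIOUS_SIGNAL)
    return signals
```

When both pools hold unheard messages, this sketch sends both signals. A device that can show only one indication would need a priority rule, which the disclosure leaves open.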


A messages discriminator such as MD 210 of FIG. 2 may be readily affiliated with, embedded in, or incorporated into substantially any existing voicemail system, as well as voicemail systems that may be devised in the future. In addition, as will be appreciated by a person of skill in the art, the message(s) discriminator disclosed herein is agnostic to the type of voicemail system: it is (or may easily be adapted to be) applicable both to legacy voicemail systems and to modern packet-based voice over Internet Protocol (“VoIP”) voicemail systems. The messages discriminator is also agnostic to the location of the voicemail service rendering party or system.


While certain features of the disclosure have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the disclosure.

Claims
  • 1. A messages discriminator, comprising: a controller adapted to discard an audio message if said audio message meets a predefined discarding condition; and a legitimate messages pool memory, wherein said controller is further adapted to store said audio message in said legitimate messages pool memory if said audio message is not discarded.
  • 2. The messages discriminator according to claim 1, further comprising: a spurious messages pool memory for storing discarded messages.
  • 3. The messages discriminator according to claim 1, wherein the predefined discarding condition factors in the audio message's total length; presence of spoken word(s) in said audio message; presence of tone(s) in said audio message; and the presence of click(s) in said audio message.
  • 4. The messages discriminator according to claim 1, wherein a discarding condition is defined as: no spoken words are detected in said audio message and at least one of the following conditions is met: the audio message's total length is shorter than a predefined minimal message length, or said audio message contains a tone and the audio message's total length minus the tone's length is shorter than said predefined minimal message length, or said audio message contains a click sound and the audio message's total length minus the click's length is shorter than said predefined minimal message length.
  • 5. The messages discriminator according to claim 2, wherein the controller is adapted to render accessible both messages stored in the legitimate messages pool memory and messages stored in the spurious messages pool memory, by using two distinct access codes.
  • 6. The messages discriminator according to claim 2, further comprising: a buffer memory for storing an audio message, wherein the controller is further adapted to move said audio message from said buffer memory to the legitimate messages pool memory or to the spurious messages pool memory, whichever the case may be.
  • 7. A voicemail system, comprising: a messages discriminator adapted to discard an audio message if said audio message meets a predefined discarding condition.
  • 8. The voicemail system according to claim 7, wherein the messages discriminator comprises: a controller adapted to discard the audio message if the predefined discarding condition is met; and a legitimate messages pool memory, wherein said controller is further adapted to store said audio message in said legitimate messages pool memory if said audio message is not discarded.
  • 9. The voicemail system according to claim 8, wherein the messages discriminator further comprises: a spurious messages pool memory for storing discarded messages.
  • 10. The voicemail system according to claim 7, wherein the predefined discarding condition factors in the audio message's total length; presence of spoken word(s) in said audio message; presence of tone(s) in said audio message; and the presence of click(s) in said audio message.
  • 11. The voicemail system according to claim 7, wherein a discarding condition is defined as: no spoken words are detected in said audio message and at least one of the following conditions is met: the audio message's total length is shorter than a predefined minimal message length, or said audio message contains a tone and the audio message's total length minus the tone's length is shorter than said predefined minimal message length, or said audio message contains a click sound and the audio message's total length minus the click's length is shorter than said predefined minimal message length.
  • 12. The voicemail system according to claim 9, wherein the controller is further adapted to render accessible both messages stored in the legitimate messages pool memory and messages stored in the spurious messages pool memory, by using two distinct access codes.
  • 13. The voicemail system according to claim 9, further comprising: a buffer memory for storing an audio message, wherein the controller is further adapted to move said audio message from said buffer memory to the legitimate messages pool memory or to the spurious messages pool memory, whichever the case may be.
  • 14. A method comprising: discarding an audio message if said audio message meets a predefined discarding condition.
  • 15. The method according to claim 14, wherein the predefined discarding condition factors in the audio message's total length; presence of spoken word(s) in said audio message; presence of tone(s) in said audio message; and the presence of click(s) in said audio message.
  • 16. The method according to claim 14, wherein a discarding condition is defined as: no spoken words are detected in said audio message and at least one of the following conditions is met: the audio message's total length is shorter than a predefined minimal message length, or said audio message contains a tone and the audio message's total length minus the tone's length is shorter than said predefined minimal message length, or said audio message contains a click sound and the audio message's total length minus the click's length is shorter than said predefined minimal message length.
  • 17. The method according to claim 14, wherein a discarded audio message is stored in a spurious messages pool memory.
  • 18. The method according to claim 14, wherein a non-discarded audio message is stored in a legitimate messages pool memory.
  • 19. The method according to claim 18, wherein messages stored in the legitimate messages pool memory and messages stored in the spurious messages pool memory are accessible using two distinct access codes.
  • 20. The method according to claim 18, wherein the audio message is first stored in a memory buffer and then moved to the legitimate messages pool memory or to the spurious messages pool memory, whichever the case may be.