This invention relates generally to automated audio data processing, and more specifically to a system and method for creating a complete transcription of an audio recording from separately transcribed redacted and unredacted words.
For various reasons, many companies need to transcribe audio recordings, for example to improve storage, searchability, and record keeping, or to obtain feedback for improving their interactive voice response (IVR) systems, neural networks, automatic speech recognition (ASR) systems, customer service content, and the like. Many audio recordings, especially in an IVR context, contain confidential data, such as personally identifiable information or payment card information. Stricter privacy laws, both in the United States and internationally, make it challenging for companies to transcribe such recordings. Not all transcription services are qualified to handle sensitive data, and those that are qualified are typically much more expensive. Therefore, there is a demand for a system and method that can create a complete transcription of an audio recording from separately transcribed redacted and unredacted words, thus enabling a company to protect the privacy of any confidential information during the transcription process.
The present disclosure describes a system, method, and computer program for creating a complete transcription of an audio recording, where redacted words in the audio recording are transcribed separately from unredacted words. The method is performed by a computer system (“the system”), such as the computer system illustrated in
Upon receiving an audio recording, the system determines whether the recording contains any confidential information, such as personally identifiable information or payment card information. Once the system identifies the presence of confidential information, it redacts the identified words or phrases and creates word-level timestamps for the redacted words or phrases and for the modified audio recording (i.e., the recording with the confidential words redacted). The system then extracts audio clips corresponding to the redacted words or phrases using the word-level timestamps. The system sends the modified audio recording and the extracted audio clips to separate transcription services. The transcription services may be third-party transcription services (i.e., an outside company hired to perform the transcription), in-house transcription services (i.e., the company itself performing the transcription), or a combination of the two, so long as the transcription services assigned to transcribe the confidential information contained in the extracted audio clips have an appropriate level of security that is compliant with any applicable privacy laws.
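The routing rule above, under which confidential clips go only to a transcription service with an appropriate level of security, can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `TranscriptionService` record, its `handles_confidential` flag, and the `route_jobs` helper are hypothetical names, and a real system would determine compliance from its own contractual and regulatory records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionService:
    name: str
    in_house: bool  # True for in-house, False for a third-party service
    # Hypothetical flag: True if the service is cleared to handle
    # confidential data under the applicable privacy laws.
    handles_confidential: bool


def route_jobs(modified_recording, clips, general, secure):
    """Pair each payload with the service that should transcribe it.

    The modified (redacted) recording may go to any service; the extracted
    clips, which contain the confidential words, may only go to a service
    cleared for confidential data.
    """
    if not secure.handles_confidential:
        raise ValueError(f"{secure.name} is not cleared for confidential audio")
    jobs = [(general, modified_recording)]
    jobs.extend((secure, clip) for clip in clips)
    return jobs
```

Keeping the clearance check inside the router means a misconfigured pipeline fails before any confidential clip leaves the system.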
When the system receives the transcribed modified audio recording and the transcribed audio clips back from the separate transcription services, it combines them by aligning the word-level timestamps of the transcribed audio clips with the word-level timestamps of the transcribed modified audio recording and inserting the transcriptions of the audio clips at the corresponding locations in the transcription of the modified audio recording. The result is a complete transcription of the original audio recording, produced while the privacy of any confidential information is protected throughout the transcription process.
In one embodiment, a method for creating a complete transcription of an audio recording from separately transcribed redacted and unredacted words comprises the following steps:
Upon receiving an audio recording, the system determines whether the recording contains any confidential information, such as personally identifiable information or payment card information. In certain embodiments, this is done by applying a named-entity recognition model to a machine transcription of the audio recording. The named-entity recognition model may be calibrated to over-identify confidential information. Once the system identifies the presence of confidential information, it redacts the identified words or phrases and creates word-level timestamps for the redacted words or phrases and for the modified audio recording. The system then extracts audio clips corresponding to the redacted words or phrases using the word-level timestamps. The system sends the modified audio recording and the extracted audio clips to separate transcription services. The transcription services may be third-party transcription services, in-house transcription services, or a combination of the two, so long as the transcription services assigned to transcribe the confidential information contained in the extracted audio clips have a higher level of security that is compliant with any applicable privacy laws.
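One possible way to realize the over-identification calibration mentioned above is to accept named entities at a deliberately low confidence threshold. The sketch below assumes the named-entity recognition model returns character-offset spans with a label and a confidence score; the `Entity` type, the label names, the `select_for_redaction` function, and the 0.3 default threshold are illustrative assumptions, not values taken from this disclosure.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int  # character offset into the machine transcription
    end: int    # exclusive end offset
    label: str
    score: float  # model confidence in [0, 1]


# Hypothetical label set; a real deployment would use whatever labels
# its named-entity recognition model emits.
CONFIDENTIAL_LABELS = frozenset(
    {"PERSON", "CARD_NUMBER", "ADDRESS", "DATE_OF_BIRTH", "PHONE"}
)


def select_for_redaction(entities, threshold=0.3, labels=CONFIDENTIAL_LABELS):
    """Keep every entity with a confidential label whose score clears a
    deliberately low threshold, so borderline detections are redacted
    rather than missed (over-identification)."""
    return [e for e in entities if e.label in labels and e.score >= threshold]
```

Lowering the threshold trades extra redactions, which only shift words to the more secure service, for fewer leaked confidential words.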
Example implementations of the method are described in more detail with respect to
The system receives an original audio recording (step 110). A plurality of words is redacted from the original audio recording to obtain a modified audio recording (i.e., a redacted audio recording) (step 120). In performing the redaction, word-level timestamps are created for the redacted words and for the modified audio recording. Each word-level timestamp includes a start time and an end time that indicate when the word begins and ends in the audio recording. In certain embodiments, the word-level timestamps for the modified audio recording are obtained by applying a machine speech-to-text model to the transcription of the modified audio recording, in which an alignment system within the model determines the most likely timestamps given the duration of each word in the transcription.
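The sketch below illustrates word-level timestamps and one possible redaction strategy for step 120. It assumes, for illustration only, that redacted words are replaced by silence rather than cut out, so the modified audio recording keeps the original timeline and its timestamps remain directly comparable with those of the redacted words. The `WordTimestamp` type and the `to_sample_range` and `redact` functions are hypothetical, and audio is modeled as a plain list of PCM samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordTimestamp:
    word: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds; the word has finished playing by this time


def to_sample_range(ts, sample_rate):
    """Convert a word-level timestamp to a half-open [first, last) sample range."""
    return int(round(ts.start * sample_rate)), int(round(ts.end * sample_rate))


def redact(samples, redacted_words, sample_rate):
    """Return a modified recording in which each redacted word is silenced.

    Masking (rather than cutting) keeps the modified recording on the same
    timeline as the original, so all timestamps stay comparable.
    """
    modified = list(samples)
    for ts in redacted_words:
        first, last = to_sample_range(ts, sample_rate)
        for i in range(max(first, 0), min(last, len(modified))):
            modified[i] = 0
    return modified
```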
In certain embodiments, the redacting step includes converting the original audio recording to machine-generated text using a machine speech-to-text model, where the machine-generated text includes word-level timestamps. A named-entity recognition model is then applied to the machine-generated text to identify words or phrases for redaction, and the word-level timestamps associated with the identified words or phrases are used to select the audio spans to redact in the original audio recording. In certain embodiments, the system is calibrated to over-identify words or phrases for redaction.
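Using the word-level timestamps to select audio spans for redaction can be sketched as below. The sketch assumes the named-entity recognition model reports character offsets into a transcript formed by joining the recognized words with single spaces; the `Word` type and `redaction_spans` function are hypothetical. Consecutive flagged words are merged so that a multi-word phrase, such as a full name, becomes a single audio span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def redaction_spans(words, entity_char_spans):
    """Map NER character spans onto the timed words of the machine-generated
    text and return (start, end) audio spans to redact, merging consecutive
    flagged words into one span."""
    # Rebuild the character offsets of each word, assuming the transcript
    # the NER model saw was the words joined by single spaces.
    offsets, pos = [], 0
    for w in words:
        offsets.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1

    # A word is flagged if its characters overlap any entity span.
    selected = [
        i for i, (cs, ce) in enumerate(offsets)
        if any(cs < ee and es < ce for es, ee in entity_char_spans)
    ]

    spans = []  # entries: (start_time, end_time, index_of_last_word)
    for i in selected:
        if spans and spans[-1][2] == i - 1:
            # Consecutive word: extend the current phrase span.
            start, _, _ = spans[-1]
            spans[-1] = (start, words[i].end, i)
        else:
            spans.append((words[i].start, words[i].end, i))
    return [(s, e) for s, e, _ in spans]
```

For the transcript "my name is Jane Doe thanks", an entity covering characters 11 to 19 flags "Jane" and "Doe" and yields a single span from the start of "Jane" to the end of "Doe".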
The audio clips of the redacted words are extracted from the original audio recording using the word-level timestamps for the redacted words (step 130). The modified audio recording is sent to a first transcription service (step 140a), while the extracted audio clips are sent to a second transcription service (step 140b). A transcription of the modified audio recording is received from the first transcription service (step 150a), and transcriptions of the audio clips are received from the second transcription service (step 150b). The transcriptions of the extracted audio clips are then combined with the transcription of the modified audio recording, using the word-level timestamps for the modified audio recording and the start and end timestamps for the extracted audio clips, to obtain a complete transcription of the original audio recording (step 160).
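If the start and end timestamps of the audio clips lie on the same timeline as the word-level timestamps of the modified audio recording, the combination in step 160 can reduce to an ordered merge. The following minimal sketch assumes that the redacted regions of the modified recording produce no words in the first service's transcription; the `TimedText` type and `combine` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedText:
    text: str
    start: float  # seconds on the original recording's timeline
    end: float


def combine(modified_words, clip_transcripts):
    """Insert each clip transcription into the transcription of the modified
    recording at the position given by its start timestamp.

    sorted() is stable, so items with equal start times keep their input
    order (modified-recording words before clip transcriptions).
    """
    merged = sorted(list(modified_words) + list(clip_transcripts),
                    key=lambda t: t.start)
    return " ".join(t.text for t in merged)
```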
Referring to step 110 above, system 200 receives an original audio recording 255 from audio recording database 210. Referring to step 120 above, redaction module 215 redacts a plurality of words from original audio recording 255 to create modified audio recording 242. In addition, redaction module 215 generates word-level timestamps 220 for the redacted words and word-level timestamps 257 for the modified audio recording.
Referring to step 130 above, audio clip extractor module 225 extracts audio clips of the redacted words from the original audio recording 255 using the word-level timestamps for the redacted words 220. Each audio clip 230 is associated with a start and end timestamp based on the word-level timestamps of the redacted word(s) within the clip. Referring to steps 140a and 140b above, the modified audio recording 242 is sent to a first transcription service 245, and the extracted audio clips 230, with their start and end timestamps, are sent to a second transcription service 235. Referring to steps 150a and 150b above, the system 200 receives a transcription 250 of the modified audio recording from the first transcription service 245 and transcriptions 240 of the audio clips from the second transcription service 235. Referring to step 160 above and discussed in greater detail with respect to
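Extracting clips from the original recording and keeping each clip's start and end timestamps attached, as audio clip extractor module 225 does, might look like the sketch below. The `AudioClip` type and `extract_clips` function are hypothetical, audio is modeled as a list of samples, and spans are assumed to be (start, end) pairs in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioClip:
    samples: list  # audio samples cut from the ORIGINAL recording
    start: float   # seconds; carried along so the transcript can be re-inserted
    end: float


def extract_clips(original_samples, spans, sample_rate):
    """Cut one clip per redacted span out of the original recording,
    tagging each clip with the span's start and end timestamps."""
    clips = []
    for start, end in spans:
        first = max(int(round(start * sample_rate)), 0)
        last = min(int(round(end * sample_rate)), len(original_samples))
        clips.append(AudioClip(original_samples[first:last], start, end))
    return clips
```

Carrying the timestamps on the clip object lets the second transcription service return them unchanged, which is what the later combination step relies on.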
The methods described with respect to
As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the above disclosure is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.