The invention relates generally to audio production and distribution. More particularly, the present invention relates to a system and method of semi-automated guided audio production and distribution.
A variety of technologies exist for recording, editing, and publishing audio in various output formats, including audiobooks, podcasts, radio, webinars, etc. Certain conventional recording technologies offer the ability to make a combined audio recording of both a host and a guest participant simultaneously on a single recording. Other technologies allow each participant (e.g., a host and a remote participant) to be recorded separately, with a separate audio track, containing only that participant's audio stream (e.g., one side of a conversation), stored locally on each participant's device. Typically, subsequently combining local recordings of individual audio streams improves the resulting audio quality compared to audio streams that are created together in real time at the time of recording via internet protocols such as VoIP (voice over internet protocol) or WebRTC (web real-time communication).
Post-production software allows users to improve the quality of the audio, e.g., analyze the audio files, convert the file formats if necessary, split stereo tracks, set panning (left/right balance), perform sound levelling, perform dynamics processing and equalization, remove problem frequencies and reduce noise, etc. This process also includes aligning the audio files, creating fades, trimming silences, inserting padding, creating joiner music, and mixing in background music. Additionally, post-production software offers tools to combine the audio files and create a finished product, e.g., perform final mixdown of the audio. This results in a finished master audio production and a compressed audio file that is suitable for distribution.
Finally, technology exists for distributing the finished audio. Existing distribution platforms host the compressed audio files and utilize metadata describing the audio files (e.g., text descriptions that allow it to be discovered by interested consumers), allowing users to stream or download it for consumption.
The above and other needs are met by a method for creating a digital audio production. The method includes receiving two or more audio segments, including at least one audio segment that is captured by a sound recording device; receiving user input related to the two or more audio segments; receiving a record plan comprising a first part that is based on the user input and a second part that is determined automatically based on the user input; and recording the digital audio production from the two or more audio segments based on the record plan.
In some embodiments, the first part of the record plan comprises a chronological ordering of at least a portion of the two or more audio segments in the digital audio production. In some embodiments, the second part of the record plan includes audio processing steps that are determined and configured automatically based on the user input. In some embodiments, certain audio processing steps are automatically applied to at least one of the two or more audio segments before the segments are combined. In some embodiments, certain audio processing steps are automatically applied to the digital audio production after the segments are combined. In certain embodiments, additional audio, which may include music, environmental sounds, or sound effects, may be added to the original audio segment. In certain embodiments, the additional audio may be processed and adjusted for level so as to constitute background while maintaining intelligibility of the primary audio segment content.
According to certain embodiments, the method also includes the step of compressing the digital audio production to form a digital file suitable for digital distribution and then digitally distributing the digital file according to the user input. In certain embodiments, the method includes storing the at least one audio segment captured by the sound recording device to local storage that is local to the sound recording device. The at least one audio segment is then wirelessly transmitted from the local storage to a remote storage. In certain preferred embodiments, the local storage is a memory of the sound recording device.
Certain embodiments of the method include providing a display device configured to display a user interface (UI) and a user input device for receiving the user input. In certain embodiments, the user interface comprises functional elements associated with the two or more audio segments, and the position of the functional elements within the UI determines the chronological ordering of the two or more audio segments. In certain embodiments, the UI includes a host UI and a host user input device that is configured to receive input from a host. Additionally, the UI includes a guest UI and a guest user input device configured to receive input from one or more guests. According to certain embodiments, the content of the host UI differs from the content of the guest UI. In some embodiments, a session invite must be provided to each of the one or more guests before the guest UI is provided to that guest. Certain embodiments of the method require receiving user input via the user input device to reposition the functional elements within the UI for modifying the chronological ordering of the two or more audio segments. In certain embodiments of the method, a content guide is displayed in the UI, which content guide displays contextual information associated with the functional elements currently displayed in the UI and is intended to provide guidance to the user in the decision-making and execution involved in creating the desired audio content.
In certain embodiments of the invention, at least two of the two or more audio segments are captured by separate sound recording devices. The two audio segments are preferably stored locally to a local storage of each of the sound recording devices. The two audio segments are then preferably wirelessly transmitted to a remote storage prior to being combined or audio processed.
In certain embodiments, the method requires creating two or more individual episodes of the digital audio production, wherein each episode includes a first audio segment that is universal to all of the two or more episodes and separate second audio segments that are unique to each of the two or more episodes. Certain preferred embodiments contemplate modifying the record plan to replace a first audio segment previously provided with a different and newly-provided first audio segment. After the newly-provided first audio segment has replaced the original first audio segment, the method requires re-creating the two or more individual episodes of the digital audio production to include the newly-provided first audio segment. In certain embodiments, the user of the system may be permitted to add, remove, change, re-order, insert, revise or otherwise amend previously published episodes. These amended episodes may automatically update existing instances at all points in the subsequent distribution supply chain without user intervention.
Also disclosed is an audio production and distribution system for creating and distributing a digital audio show that includes two or more related but separate episodes. The system first includes a setup module for receiving show-level user input related to show-level portions of the show that are repeated in each episode. A plan module of the system receives episode-level user input related to episode-level portions that are unique to each episode of the show. The plan module also arranges the show-level portions and episode-level portions according to a record plan. Next, a record module receives recorded show-level audio segments and recorded episode-level audio segments. A preview module is provided for audio processing the recorded audio segments, for ordering and combining the recorded audio segments to form an unprocessed episode, and for audio processing unprocessed episodes to form finished episodes. Lastly, a review module distributes the finished episodes.
In order to facilitate an understanding of the invention, the preferred embodiments of the invention, as well as the best mode known by the inventor for carrying out the invention, are illustrated in the drawings, and a detailed description thereof follows. It is not intended, however, that the invention be limited to the particular embodiments described or to use in connection with the apparatus illustrated herein. Therefore, the scope of the invention contemplated by the inventor includes all equivalents of the subject matter described herein, as well as various modifications and alternative embodiments such as would ordinarily occur to one skilled in the art to which the invention relates. The inventor expects skilled artisans to employ such variations as seem to them appropriate, including the practice of the invention otherwise than as specifically described herein. In addition, any combination of the elements and components of the invention described herein in any possible variation is encompassed by the invention, unless otherwise indicated herein or clearly excluded by context.
The presently preferred embodiments of the invention are illustrated in the accompanying drawings, in which like reference numerals represent like parts throughout, and in which:
This description of the preferred embodiments of the invention is intended to be read in connection with the accompanying drawings, which are to be considered part of the entire written description of this invention. The drawings are not necessarily to scale, and certain features of the invention may be shown exaggerated in scale or in somewhat schematic form in the interest of clarity and conciseness.
The use of the terms “a”, “an”, “the” and similar terms in the context of describing embodiments of the invention are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising”, “having”, “including” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. The terms “substantially”, “generally” and other words of degree are relative modifiers intended to indicate permissible variation from the characteristic so modified. The use of such terms in describing a physical or functional characteristic of the invention is not intended to limit such characteristic to the absolute value which the term modifies, but rather to provide an approximation of the value of such physical or functional characteristic.
Terms concerning attachments, coupling and the like, such as “attached”, “connected” and “interconnected”, refer to a relationship wherein structures are operatively coupled, e.g., secured or attached or communicatively coupled to one another either directly or indirectly through intervening structures, as well as both moveable and rigid attachments or relationships, unless otherwise specified herein or clearly indicated as having a different relationship by context. The term “operatively connected” is such an attachment, coupling or connection that allows the pertinent structures to operate as intended by virtue of that relationship.
The use of any and all examples or exemplary language (e.g., “such as” and “preferably”) herein is intended merely to better illuminate the invention and the preferred embodiments thereof, and not to place a limitation on the scope of the invention. Nothing in the specification should be construed as indicating any element as essential to the practice of the invention unless so stated with specificity.
The terms “audio content” and “audio production” are used throughout this description broadly to include any forms of recorded audio, regardless of the nature of their application or use. Non-limiting examples of “audio content” (and “audio production”) include, but are not limited to, recorded audio used in movies, audiobooks, podcasts, radio, webinars, and the like.
Creating audio content is particularly difficult for small and medium size business users, which may want to use audio to provide business-related content that is easy to consume, e.g., on a mobile device on the go. Such users require a high degree of professionalism with respect to the finished audio and cannot accept low quality results; however, they often do not have the expertise to create a polished audio product and often cannot afford to spend a lot of time doing so themselves. At the same time, hiring a professional production company is typically out of budget for such projects.
A difficulty encountered when an inexperienced user attempts to create audio content using conventional technologies is that there is no end-to-end solution for creating audio content. The inexperienced user often does not know how to create or structure successful audio content from the start or how to take the audio content through post-production (i.e., audio processing and assembly) to create professional quality audio content and get it distributed via a distribution platform.
With respect to planning, inexperienced users typically encounter difficulties from the start of creating audio content, i.e., difficulties are encountered in planning or setting up the audio content (e.g., choosing a format, identifying the segments to be included, sequencing the various segments, choosing related artwork and descriptive terms and titles used in distribution, etc.). Additionally, depending on the type of audio content that is of interest (e.g., interview format), users often encounter additional difficulties unique to that format (e.g., difficulty identifying appropriate participants for the audio content).
In addition to planning challenges, inexperienced users also encounter difficulties while attempting to create (i.e., record) the audio content. Although tools exist to record the audio, such as tools to patch in remote participants, inexperienced users can encounter difficulty with making the recordings, such as accessing scripts or notes to recall topics to be addressed while recording a given segment. Other difficulties include determining how long each segment should last, and identifying and coordinating with guests.
A difficult issue that an inexperienced user will encounter is finishing the audio content in terms of handling post-production editing, processing, and mixing. This is because the output of conventional recording solutions is typically two or more audio files that need to be imported into separate post-production software for editing and assembling an audio content file. While locally recorded, separate audio files improve the audio quality of the raw input files compared with those captured via internet protocols (e.g., VoIP) and allow for better post-production to take place, this approach also increases the complexity of the production process. The existing post-production tools, while offering a rich feature set, require a high degree of skill and are time consuming. Thus, an inexperienced user is often faced with spending many hours to remove unwanted noise, apply sound leveling, appropriately align and sequence audio files, incorporate music and fades, etc. These tasks are typically difficult enough that users either produce audio content having sub-optimal audio quality, hire a production assistant or company, or fail to adopt and use the audio medium.
Referring now to
As detailed below, the full end-to-end system 100 has several modules that work in conjunction with one another to assist the user in initializing (Setup module 102), planning (Plan module 104), recording (record module 106), post-processing audio (Preview module 108) and then automatically assembling and distributing (Review module 110) an audio production. These modules assist users in: (a) setting up global settings that may be applied to one or more finished audio productions (the overall production may be referred to herein as a “show” and may consist of multiple related but standalone “episodes”), (b) creating an episode plan that plans the content and arrangement of components within individual audio productions (or episodes), (c) creating and recording the content portions of each of the planned episodes, (d) processing those content portions according to an automated post-processing algorithm and according to the global show settings and the episode plan to create finished episodes, and (e) automatically assembling and distributing the finished episodes according to the global show settings.
Preferably, users interact with the system via a web-based user interface (UI) using a set of UI tools for each module. The UI tools of the system 100 may be provided by a central provider service (e.g., a cloud-hosted service) and are preferably displayed on a user's device (e.g., within a browser in a tabbed format or a mobile application). The goal of the UI is to assist inexperienced users in creating professional quality audio content. The UI includes all of the elements needed to guide the user through the setup, planning, recording and launching of the audio production. Screenshots of a UI used in operating the system 100 according to a preferred embodiment are provided in
Setup Module
With reference generally to
In this module and, preferably, in every other module, a content guide 124 is provided on the same screen as the input area 122. The content guide 124 functions like a user's manual for the system 100 that is automatically turned to the correct section to provide the most relevant information to the user. The information provided in the content guide 124 relates specifically to the information that the user is inputting into the input area 122. Preferably, the content guide 124 automatically updates in response to information provided by the user in the input area 122. For example, it may be preferred for a particular type of audio production to have a title or show description with a length between a minimum length and maximum length (i.e., number of characters). If the user inputs a title or description in the input area 122 that is outside of the recommended range, the content guide 124 may be updated to provide a warning to the user.
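By way of a non-limiting illustrative sketch, the length-based warning behavior of the content guide 124 might be implemented as follows; the field names and recommended ranges below are assumptions for illustration only and are not part of the disclosure:

```python
# Recommended character ranges per input field; the specific fields and
# limits here are hypothetical examples, not values from the disclosure.
RECOMMENDED_LENGTHS = {
    "title": (4, 100),         # (min_chars, max_chars)
    "description": (50, 600),
}

def content_guide_warning(field: str, text: str):
    """Return a warning string for the content guide if the input falls
    outside the recommended length range, or None if it is acceptable."""
    lo, hi = RECOMMENDED_LENGTHS[field]
    n = len(text)
    if n < lo:
        return f"{field} is {n} characters; at least {lo} are recommended."
    if n > hi:
        return f"{field} is {n} characters; at most {hi} are recommended."
    return None
```

In this sketch, the UI would re-run the check on each keystroke in the input area 122 and display the returned message, if any, in the content guide 124.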
The input area 122 shown in
Two sections of the system 100 shown in
This UI format is utilized in inputting globally-applied show sound information 164 (see
The content of the input area 122 and content guide 124 change depending on the combination of first element 132 and second element 136 selected. For example, as shown in
Next, while keeping the first element 132 the same (i.e., INTRO) and selecting the next second element 136 (i.e., RECORD INTRO), the content guide is replaced with an input-output (I/O) area 138 that allows the user to record the audio for the intro. Input areas are intended primarily to receive information or data from the user, whereas the I/O area 138 both receives information from the user and provides information or data back to the user. The I/O area 138 allows a user to provide audio for use as the intro in two ways. First, the user can upload a previously-created audio file. The user can also record audio by pressing the “record” button 174. The I/O area 138 contains a volume unit meter 140, which provides visual confirmation to the user that their microphone settings are correct and that the microphone is picking up audio signals. Preferably, the system 100 is configured so that the record button only works when audio is detected in order to prevent the user from recording silence. This feature is particularly important for novice content creators because it keeps them from wasting time and effort trying to record audio when the microphone is not functioning properly. The input area 122, previously populated with text, remains visible in this view so that the user can read that text as they record the intro. By pressing the record button 174, users can record audio directly with their local machine hardware 206 (
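The gating of the record button on detected audio, described above, could be sketched as follows; the RMS metering approach and the silence threshold value are illustrative assumptions, not a specific implementation from the disclosure:

```python
def rms(samples):
    """Root-mean-square level of a buffer of normalized samples (-1.0..1.0),
    as a volume-unit-meter-style measurement of input signal strength."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

# Hypothetical noise floor below which the input is treated as silence.
SILENCE_THRESHOLD = 0.01

def record_button_enabled(samples) -> bool:
    """Enable recording only when the meter detects signal above the floor,
    preventing the user from unknowingly recording silence."""
    return rms(samples) > SILENCE_THRESHOLD
```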
A related feature is illustrated in
Next, with reference to
Finally, with reference to
As shown in the above examples, the first and second navigation sections 130 and 134 assist the user in navigating between the steps for creating various globally-applied portions of a show. The sections 130, 134 also modify the UI to provide relevant tools and information that the user needs in each of those steps on a single screen, but without providing too much content or information that would overwhelm or confuse the novice. Lastly, sections 130, 134 allow the user to easily customize the global show settings.
As discussed below, audio tracks that are selected and recorded in the Setup module 102 may be further processed and assembled automatically using the Preview module 108, which process step is indicated by the letter “B” in
Plan Module
Referring again to
The Plan module 104 assists the user in defining one or more episode plans 158, which plan may include ordering audio segments that will make up the episode and entering related written information. The page shown in
Once the initial setup of an episode is complete, the user then moves on to planning the content of the episode. As shown in
Preferably, second elements 136 may be re-arranged, added, or deleted by the user. For example, the user might add additional globally-applied sounds, such as advertisements, teasers, etc. by adding additional second elements 136. Additionally, the user may configure the order of the final mixdown. Dragging second elements 136 upwards places them earlier in the final mixdown and dragging second elements downwards places them later in the final mixdown.
Preferably, the user enters the segment or show notes in the input area 122 prior to recording the associated audio file (e.g., show notes may form an outline or script of a proposed interview of a guest with questions to ask). These notes appear in a corresponding input area in the record module, as further described below, that is displayed to the user while recording the audio segment. This assists the inexperienced user that might encounter difficulty in setting up these elements of the audio production, which will be used later and can influence the overall quality of the final product.
Another example is given in
Once the user has planned each section by visiting each of the pages associated with each of the second elements 136, they are ready to record and create the episode-specific content using the record module 106. Lastly, a “Start recording” button 172 will allow the user to bypass the pages associated with the second element 136 and go immediately to the record UI.
Record Module
The record module 106 allows a user to record or upload, one section at a time, the various episode-specific audio segments that were previously planned using the Plan module 104. The UI tools presented to record audio are preferably influenced by the type of audio segment being recorded. For example, the UI shown in
The record UI preferably provides all of the information and tools that a user needs to record a particular audio segment on a single page. In
Preferably, the recording UI displays a detected hardware type (e.g., microphone type). In this case, this information is displayed in the I/O area 138. The hardware type can be detected automatically (e.g., when using a built-in microphone) or via user indication (e.g., when a user selects a microphone from a drop-down menu listing multiple options). The detected hardware type can be used in automated audio processing. Custom digital signal processing (DSP) and equalization (EQ) appropriate for the type of audio production can be implemented for detected local hardware (e.g., the microphone used for recording). For example, default sample rates (e.g., 44100 Hz), channels (e.g., stereo), bit depths (e.g., 32 bit), and file formats (e.g., uncompressed WAV) can be set for a detected type or class of microphone being used. These settings can be automatically populated for the user and used for recording.
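The mapping from a detected microphone class to default capture settings might be sketched as follows; the class names and the particular setting values are illustrative assumptions built from the examples given above (44100 Hz, stereo, 32 bit, uncompressed WAV):

```python
# Hypothetical mapping of detected microphone class to capture defaults.
DEFAULT_CAPTURE_SETTINGS = {
    "built-in": {
        "sample_rate": 44100, "channels": 1, "bit_depth": 16, "format": "wav",
    },
    "usb-condenser": {
        "sample_rate": 44100, "channels": 2, "bit_depth": 32, "format": "wav",
    },
}

def capture_settings(mic_class: str) -> dict:
    """Return recording defaults for a detected or user-selected microphone
    class, falling back to conservative defaults for unknown hardware."""
    return DEFAULT_CAPTURE_SETTINGS.get(
        mic_class, DEFAULT_CAPTURE_SETTINGS["built-in"]
    )
```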
With reference to
The real-time discussion is distinct from the local recording of audio used in post-production. A local recording of each party's individual contribution to that discussion is preferably stored locally to that user's device. Each audio track is an isolated recording that contains only the audio of one of the parties to the discussion. For example, the local recording stored to the host's device includes only the host's contribution to the discussion and each of the local recordings stored to each of the guests' devices includes only those guests' respective contributions to the discussion. The isolation and local recording of audio allows each of those isolated tracks to be mixed, improved, blended, etc. separately, which allows a higher quality “complete” conversation (i.e., a combination of all parties' improved contributions to the conversation) to be produced. As discussed below, the resulting isolated audio tracks are processed and assembled together to form a complete audio segment by the Preview module 108.
In certain embodiments, the host record UI 178 presented to the host is different from, and more robust (including additional information and features) than, the guest record UI 180 that is presented to a guest. The UI shown in
In certain preferred embodiments, host and remote guest audio may be recorded from an in-session UI that joins participants in a real-time session. In those cases, a session invite 190 must be provided by the host to the guest before the guest will be allowed to interact with the show, including recording audio segments for that particular segment. The session invite may be a URL link to a particular webpage associated with the show, a particular phone number associated with the show, or other similar devices. The user interacts with (e.g., clicks on) the session invite and is then taken to a call-in session interface for recording. The session invite is preferably shareable via multiple channels, including via email, text, etc. An add participant button 184 allows a host to create a call-in link 190 (e.g., a URL), which may then be transmitted to one or more guests. The session invites may be generic to all guests or may be specific to a particular guest. The remote guest uses the join link to be added to the session.
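Generating a session invite as a shareable URL, as described above, might look like the following sketch; the domain, path structure, and token scheme are placeholder assumptions, not details from the disclosure:

```python
import secrets

def create_session_invite(show_id: str, guest_id: str = None) -> str:
    """Create a join link for a recording session. Passing a guest_id makes
    the invite specific to one guest; omitting it yields a generic invite
    usable by any guest. The base URL is a hypothetical placeholder."""
    token = secrets.token_urlsafe(16)  # unguessable session token
    url = f"https://example.com/session/{show_id}/join?token={token}"
    if guest_id is not None:
        url += f"&guest={guest_id}"
    return url
```

The resulting link could then be shared via email, text, or other channels, as the text describes.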
After being added to the session, the remote guest preferably initially appears in a waiting area 186 of the host UI 178. The waiting area 186 allows the host to interact with the participants. For example, a chat area 192 may be provided in both the host UI 178 and guest UI 180 to allow the parties to communicate with one another. Preferably, a transcript of the chat is automatically captured and saved by the record module 106 and may be appended to that chapter's notes in the record and Plan modules. Also, while in the waiting area 186, participants can be passed audio (e.g., on-hold music, show tips, etc.). A participant may leave a voice message rather than waiting to be joined. For example, an auto-attendant may handle the participant's inputs and allow recording of a voice message. This voice message may be transcribed automatically, e.g., using natural language processing, to provide the host with a transcript of the audio. This may assist the host in determining whether the remote participant should be joined to the recording session, whether the voice file should be played during the recording session, or whether the voice file should be added after the recording session in a post-production step. This text preview of the recording may have applicability to live radio or other applications. Guests may be connected to the real-time communication session by the host, and guests that have been connected appear in the live participant area 188 and have their audio recorded. Guests' microphone levels preferably may be monitored, adjusted, and muted while they are in either the waiting or live participant area 186, 188.
Once audio segments for each of the non-global second elements 136 (
Preview Module
After the show has been initialized using the Setup module 102, as discussed above, several individual audio components have been created or selected for the show. For example, one or more audio tracks, recorded by the user, may be associated with each of the intro, outro, and CTA portions. These audio tracks may include, for example, voice tracks containing audio dialog. Additionally, the user may have optionally also selected pre-made background music or uploaded their own background music for each of the intro, outro, and CTA portions. With reference to
The Preview module 108 is preferably a remote (e.g., cloud-based) module provided by a provider that implements two main functions: auto processing of audio segments and auto assembly of audio segments. When implemented following the Setup module 102, these functions take the various user-supplied inputs discussed above (i.e., raw audio files recorded locally, music selections, ads, etc.), which are sequenced according to the user's choices in the Setup module 102, and produce, as an output, an audio file of the finished audio segment (i.e., finished intro, finished outro, finished CTA). When the Preview module 108 is implemented following the record module 106, the Preview module takes the various user-supplied inputs discussed above (i.e., raw host/guest files recorded locally, music selections, ads, etc.), which are sequenced according to the Episode Plan in the Plan module 104, and produces, as an output, an audio file of the finished audio production that is suitable for distribution.
Implementing the Preview module 108 initiates an Auto Processing Module 198 and an Auto Assembly Module 200 and starts automatic post-processing on the audio tracks and assembly of the final compiled audio production. The Auto Processing and Auto Assembly Modules 198, 200 are illustrated separately for ease of description, but the functionality described below could be implemented in more or fewer modules. The Preview module 108 will be described collectively and it includes automated functions of a multi-track digital audio workstation (DAW), relieving the inexperienced user from the need to handle the processing, editing and mixing of the audio files.
Generally, the Auto Processing Module 198 operates on the digital audio files to improve their sound quality using sound repair or recovery techniques. This includes applying predetermined processing of the audio such as normalizing the audio files' sound levels, applying noise filters, and applying fade techniques to boundary areas of the audio files. Further, editing techniques are applied in an automated fashion (e.g., padding or trimming audio files for proper alignment). Automated mixing techniques are also applied. For example, if background music is selected by the user, Auto Processing Module 198 combines the background audio with speech track audio. In another example, where separate audio tracks (e.g., speech audio) from a host and remote guest are provided, Auto Processing Module 198 mixes those files at appropriate sound levels to ensure speech audio quality.
After the individual segment files have been prepared, the Auto Assembly Module 200 assembles (i.e., stitches together) the processed audio files using an automated algorithm that adheres generally to the user's inputs (in terms of general sequencing), applies cross fades to segment transitions, and outputs a professional level output file, which again undergoes processing (e.g., target volume adjustment), and includes metadata gathered during the planning stage (e.g., title, description, and artwork).
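The stitching-with-crossfades behavior described above could be sketched on plain sample buffers as follows; the linear crossfade shape and fade length are illustrative assumptions rather than the disclosed algorithm:

```python
def crossfade_join(a, b, fade_len):
    """Join two segments (lists of normalized samples) with a linear
    crossfade of fade_len samples: a ramps down while b ramps up."""
    assert fade_len <= len(a) and fade_len <= len(b)
    head = a[:len(a) - fade_len]
    fade = [
        a[len(a) - fade_len + i] * (1 - i / fade_len) + b[i] * (i / fade_len)
        for i in range(fade_len)
    ]
    return head + fade + b[fade_len:]

def assemble(segments, fade_len):
    """Stitch an ordered list of processed segments into one output track,
    applying a crossfade at every segment transition."""
    out = segments[0]
    for seg in segments[1:]:
        out = crossfade_join(out, seg, fade_len)
    return out
```

Each crossfade overlaps the two segments, so the assembled length is the sum of the segment lengths minus one fade length per transition.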
By way of specific example, Auto Processing Module 198 acts as automated multi-track editing software and applies a predetermined set of configurations optimized for audio track post-production according to a predetermined set of algorithms designed to improve the audio quality and prepare the audio segments for joining in a step-wise fashion. The algorithm may vary depending on the type of audio production (e.g., podcast vs. audiobook). Examples of preset configurations applied in sequence include first applying sound leveling or normalization to a target amount. This adjusts the amplitude of the audio, e.g., raising the volume of speakers in the audio, but can also increase the level of unwanted sound (undesirable imperfections), which makes those imperfections easier to remove in a later step. For example, a track with audio below a particular threshold amplitude may be normalized to bring its amplitude up to a normalized, target level. Automatic amplitude levelling can be applied to the audio files to account for differences between speakers (e.g., a local host and remote guest that need to be normalized for sound levels). For example, after recording an episode, the speech amplitude is automatically adjusted to a target level (e.g., adjusting all peaks to a target level). This delivers consistency between different people speaking in the output audio file. In a related manner, Auto Processing Module 198 can adjust the amplitude of audio files or tracks in a relative manner. For example, when adding in a music file as a music bed, it may occur that the amplitude of the music file is too loud. This can cause poor audio quality in the final audio file, particularly if the conversation level is low. Therefore, in addition to adjusting conversational tracks, related tracks can have their amplitude adjusted in a relative manner.
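The peak normalization to a target level described above might be sketched as follows; the default target value (roughly -1 dBFS) is an illustrative assumption:

```python
def normalize_peak(samples, target_peak=0.89):
    """Scale a track (list of normalized samples) so its loudest peak
    reaches target_peak, raising quiet tracks to a consistent level.
    The default target is an assumed value, roughly -1 dBFS."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silent track; nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Applying the same target to both a host track and a remote guest track would deliver the consistency between speakers that the text describes.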
Dynamics processing (e.g., compression) and noise reduction using one or more filters (e.g., to remove common noise sources such as rumble and hum, or to apply de-essing) may then be applied. Auto Processing Module 198 applies a program or algorithm to each audio file individually or after one or more audio files are combined. For example, high-pass and low-pass filters may be used to remove unwanted noise frequencies outside the range of human speech. A predetermined selection of such filters may be employed to reduce unwanted noise in the audio files.
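The band-limiting step can be illustrated with simple one-pole filters. This is a sketch, not the module's actual filter design: real post-production tools would use higher-order filters, and the 80 Hz / 8 kHz corner frequencies are assumed values chosen to roughly bracket speech.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate=44100):
    """Simple one-pole low-pass: attenuates content above cutoff_hz."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for s in samples:
        prev = prev + alpha * (s - prev)
        out.append(prev)
    return out

def one_pole_highpass(samples, cutoff_hz, sample_rate=44100):
    """High-pass as input minus the low-passed signal (removes rumble/hum)."""
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    return [s - l for s, l in zip(samples, low)]

def speech_band_filter(samples, sample_rate=44100):
    """Band-limit roughly to speech: ~80 Hz high-pass, ~8 kHz low-pass."""
    return one_pole_lowpass(
        one_pole_highpass(samples, 80, sample_rate), 8000, sample_rate)
```

A predetermined chain of such filters, applied in sequence, corresponds to the "predetermined selection of filters" described above.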
Short fade-ins and fade-outs can be applied to each segment to reduce occurrences of unwanted sounds (e.g., pops) due to abrupt signal starts and stops. Similarly, Auto Processing Module 198 may insert padding (i.e., silence) into an audio file to adjust its timing (e.g., relative to another track). This is utilized, for example, to ensure that a remote guest's locally recorded file starts at a predetermined time with respect to the host's audio file (e.g., the guest's answer naturally follows the host's question), allowing the files to be more precisely aligned with one another. This form of adjustment is particularly useful, for example, with Q&A interviews, where the parties to the discussion are not responding to one another in real time. In that case, the conversation must be constructed from recorded questions that are answered at a later time or date, and padding is inserted into the final mixdown in order to correctly time-align the audio tracks.
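Both operations reduce to simple per-sample arithmetic, sketched below. The linear ramp and the 100-sample fade length are illustrative assumptions, not values from the specification.

```python
def apply_fades(samples, fade_len=100):
    """Linear fade-in and fade-out to avoid pops from abrupt starts/stops."""
    out = list(samples)
    n = min(fade_len, len(out))
    for i in range(n):
        ramp = i / n
        out[i] *= ramp                    # fade in from silence
        out[len(out) - 1 - i] *= ramp     # fade out to silence
    return out

def pad_start(samples, seconds, sample_rate=44100):
    """Insert leading silence so one track starts at a predetermined time
    relative to another (e.g., a guest's answer after the host's question)."""
    return [0.0] * int(seconds * sample_rate) + list(samples)
```

Time-aligning a Q&A recorded asynchronously then amounts to padding each answer track by the cumulative duration of the material that precedes it.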
In order to prepare the music bed, the Auto Processing Module 198 may sample the music selected by the user to prepare a shorter clip, which is normalized and has fades added (e.g., at the beginning). As above, the speech track (to be paired with the music) is normalized relative to the music track (e.g., using a target value) to ensure that the music is at an appropriate volume level as compared to the speech volume level. The speech track and music track are then paired, or mixed, to form a combined file.
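The final mixing step can be sketched as a sample-wise sum, with the music attenuated relative to the speech. The 0.25 music gain is an assumed illustrative value; the specification describes only that the relative levels are adjusted to a target.

```python
def mix_music_bed(speech, music, music_gain=0.25):
    """Mix a (pre-trimmed, pre-faded) music clip under speech at reduced gain."""
    length = max(len(speech), len(music))
    out = []
    for i in range(length):
        s = speech[i] if i < len(speech) else 0.0
        m = music[i] * music_gain if i < len(music) else 0.0
        out.append(s + m)
    return out
```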
After processing to adjust gain and remove noise, a volume adjustment may be performed for the tracks to a target loudness, e.g., −16 LUFS. Volume adjustment may be repeated at various points in the processing.
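A minimal sketch of the target-loudness step follows. True LUFS measurement per ITU-R BS.1770 involves K-weighting and gating; the RMS approximation below is a simplification assumed for illustration, as are the function names.

```python
import math

def rms_dbfs(samples):
    """RMS level in dBFS (a rough stand-in for true BS.1770 loudness)."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")

def adjust_to_target(samples, target_db=-16.0):
    """Apply a single gain so the track's RMS hits roughly target_db."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)          # silence: no meaningful adjustment
    gain = 10 ** ((target_db - current) / 20)
    return [s * gain for s in samples]
```

Because later steps (mixing, crossfading) change the overall level, this adjustment may be repeated at several points in the chain, as the text notes.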
In addition, the Preview Module 108, via the Auto Assembly Module 200, automatically assembles the segments. Generally, the segments are assembled using a series of crossfades between each segment. This assembles the segments in a linear ordering, as follows: segment 1<crossfade>segment 2<crossfade>, etc. If not already done, the segments may need to be trimmed or padded, which can be accomplished by determining their length and then applying a function that adjusts a segment at a predetermined point (e.g., at a point determined to allow for room for a given segment, such as an outro segment).
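The linear "segment 1 <crossfade> segment 2 <crossfade> …" assembly can be sketched as a fold over the segment list. The linear crossfade ramp and the 100-sample overlap are assumed illustrative choices.

```python
def crossfade_join(a, b, overlap=100):
    """Join two segments with a linear crossfade over `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    head, tail = a[:len(a) - overlap], b[overlap:]
    mixed = [
        a[len(a) - overlap + i] * (1 - i / overlap) + b[i] * (i / overlap)
        for i in range(overlap)
    ]
    return head + mixed + tail

def assemble(segments, overlap=100):
    """Assemble segments in linear order, crossfading each transition."""
    out = segments[0]
    for seg in segments[1:]:
        out = crossfade_join(out, seg, overlap)
    return out
```

Trimming or padding a segment before assembly (e.g., to leave room for an outro) is then a matter of measuring each segment's length and slicing or prepending silence at the predetermined point.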
After assembling the final audio file, the Auto Assembly Module 200 inserts file metadata (e.g., podcast title, episode number, etc., as provided by the user in the planning stage), chooses a file format, e.g., Wave PCM, MP3, etc., and selects the format settings (e.g., 256 kbps or 160 kbps, 44100 Hz MP3 settings). The Auto Assembly Module 200 populates ID3 tag data, which contains the creator name, title, year and genre of the audio file, along with the selected podcast artwork. In one example, an uncompressed master file is saved along with a compressed version for distribution (e.g., MP3).
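The export settings and tag mapping might look like the following sketch. The dictionary structure is illustrative, not a real encoder API; the ID3v2 frame IDs (TIT2, TPE1, TYER, TCON, APIC) are the standard fields for title, creator, year, genre, and attached artwork, and the numeric settings are the examples given in the text.

```python
# Assumed illustrative presets using the settings named in the description.
FORMAT_PRESETS = {
    "distribution": {"format": "MP3", "bitrate_kbps": 256, "sample_rate_hz": 44100},
    "master":       {"format": "Wave PCM", "bit_depth": 16, "sample_rate_hz": 44100},
}

def build_id3_tags(plan):
    """Map planning-stage metadata onto ID3v2-style tag frames."""
    return {
        "TIT2": plan["title"],          # title
        "TPE1": plan["creator"],        # creator name
        "TYER": plan["year"],           # year
        "TCON": plan.get("genre", "Podcast"),
        "APIC": plan.get("artwork"),    # attached picture (podcast art)
    }

tags = build_id3_tags({"title": "Episode 1", "creator": "Host", "year": "2019"})
```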
In summary, a template of preconfigured settings is applied to process the audio files and clean them up in a predetermined, ordered fashion. This is followed by a step-wise joining of segments together according to the record plan, with additional normalization to a target sound level. Thereafter, a final audio file is output in a compressed format with appropriate metadata for placement on a distribution platform.
In certain embodiments, artificial intelligence is used to assist or improve certain automated processing or assembly steps. For example, machine learning may be applied to increase the accuracy of the sound processing applied (e.g., intelligently selecting presets, such as matching noise-removal presets to those previously used, or intelligently selecting words, such as crutch words or profanity, for removal).
The coarse-level editing decisions made by the inexperienced user in the planning phase (i.e., the user's selection of the linear sequence of audio files) dictate the order in which the files are assembled by the Auto Assembly Module 200, with additional audio processing and editing applied automatically for the user. Therefore, at least part of the automated production processing is linked to the user inputs (e.g., supplied via the Setup, Plan, or Record UIs). The Preview Module 108 modifies the audio inputs using sound-optimization techniques (e.g., sound leveling, equalization, noise reduction, filtering), editing techniques (e.g., trimming, addition of fades, etc.), and mixing techniques (e.g., addition of a music bed to speech input files).
Review Module
After the final audio tracks have been produced, the final steps are to assemble the episode, via the Auto Assembly Module 200 (
An exemplary review UI is shown in
Each item in the list may be associated with an audio file that will be used in post-production. Because the ordering of the segments will impact the order and overall sound of the final audio production, if a user wishes to rework any parts of the audio production, e.g., re-record a segment, reorder the segments, etc., the user can return to a prior part of the UI (e.g., the Setup or record module) to accomplish the same. This can be accomplished before or after the audio production is assembled. Although the Auto Assembly Module 200 is shown as part of the Preview Module 108 in
Modularity
The functions of the system described above can be modularized and may be used in a variety of other contexts. For example, while the end-to-end system represents one example implementation of the current technology, various parts of the technology (e.g., the automated processing technology) may be decoupled from other module(s) (e.g., the planning module) to make the decoupled modules applicable to different contexts and intended uses. For example, the planning module may be less relevant to the recording of an audiobook when compared to recording an interview or Q&A type audio production. The modules may be modified to accommodate different contexts and intended uses (e.g., producing different audio, such as for an audiobook or audio for a video production). For example, the modules may be modified to display different information to the user (e.g., different planning modules) or incorporate different presets (e.g., automate post-production using a set of audiobook presets instead of podcast presets).
Several enhancements are possible in which the various UI views may be configured to provide supplemental information to the user, particularly the host user, in order to facilitate successful audio creation. Some of these feedback tools may be influenced by data gathered from the platform.
For example, estimates of the preferred time (i.e., duration) for given segments (e.g., intro segment) can be displayed to the user with a corresponding static estimate of the number of words. This will assist the user in planning (i.e., scripting) a segment and having it result in an audio file of suggested optimal duration, prior to recording it. These suggestions are preferably presented in the input area 122 or content guide 124 of the setup or planning UI (See
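The static word-count estimate can be derived from the suggested duration with a speaking-rate assumption. The 150 words-per-minute default below is a commonly cited conversational rate, assumed here for illustration; the specification does not fix a rate.

```python
def estimated_words(duration_seconds, words_per_minute=150):
    """Rough script length for a segment of the suggested duration."""
    return round(duration_seconds / 60 * words_per_minute)
```

A 60-second intro segment would thus be scripted at roughly 150 words, a figure the setup or planning UI could display alongside the suggested duration.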
The data used for these estimates (e.g., appropriate duration for an intro segment, etc.) may be predefined, obtained from another source, dynamically updated, or a combination of the foregoing. For example, if metrics are being recorded and analyzed across the platform, highest rated audio productions may be analyzed to determine an optimum time for a particular segment. This data may be used to adjust the feedback given to the host user attempting to script and record a new segment so that the host is aware of prior successful segment metrics.
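Deriving the suggestion from platform metrics could be as simple as the sketch below: rank productions by rating, take the top slice, and report the median duration for the segment of interest. The data shape, function name, and top-50 cutoff are all assumptions for illustration.

```python
def suggested_duration(productions, segment="intro", top_n=50):
    """Median duration of a segment across the highest-rated productions."""
    ranked = sorted(productions, key=lambda p: p["rating"], reverse=True)[:top_n]
    durations = sorted(p["segments"][segment] for p in ranked)
    mid = len(durations) // 2
    return (durations[mid] if len(durations) % 2
            else (durations[mid - 1] + durations[mid]) / 2)
```

Recomputing this as new ratings arrive gives the dynamically updated feedback described above.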
In another example, the platform could also be configured to gather data related to audio-focused metrics and to provide that data back to hosts to assist in collaboration and in improving audio productions. Identifying appropriate guests (or other participants) has conventionally been difficult for inexperienced creators. Most existing forms of social media tend to focus on visual content (e.g., picture posts), which is difficult to consume in certain settings (e.g., in a car), and do not focus on audio production. Matchmaking for the purpose of creating an audio-based product (e.g., a podcast) via a picture-based or visual-based social media platform is therefore difficult. Thus, the social networking platform of the present invention can be used in matchmaking between hosts and guests, enabling hosts to be matched with guests that meet certain criteria (e.g., expertise, area of interest, rating, etc.). Preferably, the platform includes an identity verification system (similar to blockchain identity verification) that assists users in verifying the identity of guests and participants prior to including them in a show. This could be implemented by assigning or recognizing a unique identity for users of the platform, such as user-specific session invites, and associating the unique identity with a verified guest or type of guest (e.g., an expert in a given topic, a highly rated guest or podcast producer, etc.).
Feedback may be provided by the platform (e.g., number of listens and downloads, user ratings) as well as by other users (i.e., content creators) of the platform that form a social network and provide feedback after listening to audio content on the platform. Feedback from other users of the platform may include audio-quality scoring, such as a lack of crutch-word usage, use of quality hardware (e.g., microphones used by remote guests), and the like. A scoring system could be tied to the above-described metrics within the platform to award creators, guests, or participants with currency (e.g., cryptocurrency) that is usable within or outside of the platform. This could create an opportunity for generating revenue via use of the platform by producing high-quality or well-received audio content.
Although this description contains many specifics, these should not be construed as limiting the scope of the invention but as merely providing illustrations of some of the presently preferred embodiments thereof, as well as the best mode contemplated by the inventor of carrying out the invention. The invention, as described herein, is susceptible to various modifications and adaptations as would be appreciated by those having ordinary skill in the art to which the invention relates.
This application claims the benefit of U.S. Provisional Patent Application No. 62/884,965, filed on Aug. 9, 2019 and entitled SYSTEM AND METHOD FOR SEMI-AUTOMATED GUIDED AUDIO PRODUCTION AND DISTRIBUTION, which is incorporated herein by reference in its entirety.