The present invention generally relates to preparing video files, and more specifically to preparing video files using a portable electronic device.
In recent years, the creation of video files has become a popular and significant tool in marketing and for freelance journalists. Professionals such as real estate agents, medical doctors, taxi drivers, lecturers and many others create video files to be distributed online. In many cases, the creators of such video files work without a team of professionals, such as an editor, a photographer and the like. Thus, creation and production of such video files is difficult and time consuming, and requires skills and dedicated equipment.
In order to have an impact online, for example on YouTube or on social networks, a video file has to be polished and meet certain criteria. Creating such high-quality videos is still difficult and time consuming, and video editing is particularly laborious for non-video-savvy users. It takes hours to create a video with a predefined script, especially when the creator is alone at the scene and uses a mobile electronic device such as a tablet, laptop or smartphone to capture the video, while working with timelines, selecting IN and OUT points, synchronizing the images with the audio track, and finally positioning the titling, setting motion effects, audio levels and the like.
The creators need to perform many time-consuming tasks in order to achieve a high-quality video file, such as placing the camera properly, adding images or a logo correctly, and avoiding mistakes such as placing the logo in a manner that hides an important object in the video, and many more. These tasks have a significant learning curve, which places a burden on business professionals during the standard working day.
It is an object of the claimed invention to disclose a method of composing a video file, comprising receiving a video file having speech from an electronic device used to capture the video file, extracting the speech from the video file and converting the speech to text, analyzing the text to identify breathing points in the text, assigning a start time code and an end time code to sections of the text, said sections of the text being defined between two breathing points in the text, and displaying the video file while each section of the sections of the text is displayed between the start time code and end time code associated with the identified breathing points.
In some cases, the method further comprises receiving a command from the user of the electronic device to add a media file to the video file, said media file being associated with a portion of the text converted from the speech of the video, and said media file being displayed in a time slot associated with the selected portion of the text.
In some cases, the method further comprises extracting the media file from a website page and adding the media file to the video file received from the mobile electronic device.
In some cases, the breathing points in the video are identified according to a script provided by a creator of the video file. In some cases, the method further comprises assigning a time code to the breathing points in the video file. In some cases, the breathing points are identified according to predefined stops in the script. In some cases, the method further comprises defining a camera angle for at least one of the two or more scenes.
In some cases, the method further comprises determining a background of a scene of the two or more scenes according to the viewer of the composed video. In some cases, the breathing points are commas and full stops in the text. In some cases, the breathing points are identified by analyzing the speech. In some cases, the method further comprises evaluating the breathing points to determine whether or not the presenter took enough time to breathe.
In some cases, evaluating the breathing points comprises analyzing images of the video and utilizing face recognition algorithms to identify changes in the presenter's face.
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
The present invention discloses a method for creating, editing, analyzing and sharing a digital video content by a presenter of the video content. The presenter of the video content may be the person who owns the video content and may seek to share the video content among video content consumers. In some cases, the presenter may be the person who uploaded the video content to a dedicated content control system and thereby granted with ownership permissions. In some other cases, the presenter may be a person granted with ownership permissions on some of the video content. Such ownership permissions may allow the presenter to manage the lifecycle of the video content. Managing the lifecycle of the video content may comprise actions such as, upload, edit, share, grant permissions to other participants, delete, and the like.
In some embodiments of the present invention, the lifecycle of the video content may begin by the presenter of the video content adding video files to a content control system. The process of adding video files may be supported by a dedicated interface such as a website interface, a command line interface, a programmable application interface, and the like. In some cases, the lifecycle of the video file may also comprise inputting a header topic into the content control system. The content control system disclosed in the present invention can be a computerized device such as a personal computer, server, cluster of servers, mobile electronic device, a tablet computer, a computerized mobile device, laptop and the like. In some cases, the content control system may be operated on a server connected to communication networks such as LAN, WAN, Internet connection and others. The content control system may also be configured to receive communication from presenters and participants seeking to manage, control or consume visual content stored in the content control system.
In possible embodiments of the present invention, the process of preparing and adding a video file to the content control system may be supported by a dedicated interface such as scripts controlled by the video content system. The script, or the scripts, may be prepared by the presenter, or sent from a remote device, such as from a colleague of the presenter, or a combination thereof. In some cases, the script may be capable of determining the progress speed of the script, or the total time of the video. In some other cases, the script prepared by the presenter may be operated in order to capture the presenter speaking in front of the camera. Thus, the presenter may also be provided with the option to edit the script and to add or remove parts of the script. For example, a presenter may utilize the camera integrated in a mobile device to shoot a video scene. The presenter may also have the option to upload the video content from the mobile device and then edit a script determining the video content properties such as the speed of the video content, the running time, external content of text or sound, and the like. The presenter may utilize the script in order to link an external sound file that may be played during some parts of the video content display, or to add an option for subtitles and text displayed during the video content display. In some other cases, the content added to the script may be automatically translated to other languages and be integrated as an additional content layer to the video. The content control system may also be configured to provide a graphic interface to the presenters in order to edit the script. Thus, the presenters may be able to manage and set the video properties via a graphic interface, and the content control system may translate it to a script capable of running on the client computerized device.
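By way of non-limiting illustration only, such a presenter-editable script might be represented on the client device as a simple structured description of the video content properties; the field names and values below are assumptions for the sketch, not a defined format of the disclosure:

```python
# A minimal sketch of a hypothetical script object interpreted by the client
# device; all field names and values are illustrative assumptions.
video_script = {
    "teleprompter_speed_wpm": 140,         # progress speed of the displayed script
    "total_running_time_sec": 90,          # total time of the video
    "external_audio": "intro_music.mp3",   # sound file played during parts of the display
    "subtitles": {"enabled": True, "translate_to": ["es", "fr"]},
    "text_overlays": [
        {"text": "Visit our showroom", "start_sec": 0.0, "end_sec": 3.0},
    ],
}
```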
In some embodiments of the present invention the content control system may provide the presenter with the option to extract and/or add information provided by social networks such as Twitter and Facebook. For example, a presenter may have the option to inject text from Twitter into a video content the presenter owns. The presenter may also have the option to define the time duration and the exact place on the screen of the injected text. The content control system may also have search capabilities which can be utilized by participants, for example in response to a search query defined by the title of the video content. The search query may be generated by the content control system according to the video content properties defined by the presenter. The video content properties may be defined automatically or by the presenter via a dedicated interface or a script inputted into the system. The video file may comprise a news article, a greeting, an opinion and the like. The content control system enables reporters, and indeed any person, to record themselves speaking using a camera of a mobile electronic device in order to create a quality video and distribute it.
In some cases, after one or more video shots are captured by a camera of a mobile device operated by the presenter, the video file can be composed, as detailed below. The composed video can be distributed to subscribers of the presenter, for example via social networks or messages such as email or SMS. In some cases, the composed video may also be distributed via a media corporation such as CNN, BBC and the like.
In some cases, the communication module 110 may be configured to transmit the video file in real-time. Thus, the video captured by the input unit 150 and converted to a video file may be transmitted to the content control system 180 automatically after the conversion process of the video file has completed. The client side 100 also comprises a sound trimming unit 135 designed to convert the sound content provided by the microphone 165 to an audio track. In some cases, the sound trimming unit 135 may be configured to remove sections in the audio track of the video file in which a background noise interferes with hearing the presenter's voice. The sound trimming unit 135 may also be configured to remove sound which may not be related to or part of the presenter's speech. In some embodiments of the present invention, the sound trimming unit 135 may be configured to sample the speaker's voice and then, per configuration settings or script commands, to remove background sounds and noise which may not belong to the presenter's speech. In some cases, the sound trimming unit 135 may provide an interface to the presenter operating the client side 100 to approve a removal action performed by the sound trimming unit 135.
Step 225 discloses the computerized system capturing the presenter's background prior to capturing the video shot. In some cases, capturing the background may take a predefined duration, in terms of seconds, and terminate with an alert, a notification or a message displayed on the display device of the presenter's computerized mobile device. Step 225 is optional and might be unnecessary, for example when using a blue/green screen or when matting algorithms are available, for example algorithms that require a scribble interface.
Step 230 discloses issuing an alert which indicates to the presenter when capturing the video shot begins. The alert may be a played sound, for example a countdown from five to zero. The alert may be played by the display device of the presenter's computerized device. In some cases, the computerized system may be configured to start capturing the video shot automatically after the alert has finished. Step 235 discloses displaying the script on a display device of the computerized device during the time the video shot is captured. This step is also optional, as some presenters do not need the script while taking the video shot. Moreover, some presenters may prefer using a rear camera of the computerized device, so they cannot see the screen with the script. The script may be displayed at a predefined speed, for example as inputted by the presenter prior to the video shot. The script may enable the presenter to be the sole creator of a quality video and to save time in preparing for a video shot, without the need to memorize the script or to rely on a crew in addition to the presenter.
Step 240 discloses adjusting video content properties according to a presenter input after capturing a video shot. Such video properties may be the teleprompter progress setting, audio level, location of the presenter and the like. Said adjustment may be performed via a specific interface as detailed below.
Step 245 discloses trimming the video file according to an audio track captured by the mobile electronic device while capturing the video content. Trimming the video file improves the video, for example by removing parts of the video in which the presenter does not speak. Naturally, the presenter may pause speaking, for example when breathing, and the trimming comprises identifying time slots in the video file that are longer than the natural breaks. The trimming comprises identifying audio levels throughout the video timeline, for example the audio levels in various time intervals throughout the video shot. Trimming the video may also remove a section in the video in which a background noise interferes with hearing the presenter's voice. Step 250 discloses receiving a user confirmation to upload the video file from the mobile electronic device to a predefined destination.
The display device interface 605 comprises an auto-trim button 620 that enables the presenter to automatically trim the video to the audio. For example, in case the video file is displayed in the display device interface 605 and the presenter decides to trim the video file in order to cut out parts of silence, the presenter may utilize the auto-trim button 620 to clean out the parts with the silence. In one embodiment the computerized system automatically identifies intervals in the video of a minimal predefined duration, for example 0.5 sec as indicated by the "Trim Level settings". In some exemplary cases, the computerized system determines whether the audio level in the intervals is higher than a predefined threshold, and if the audio is lower than the threshold, the interval is marked for further processing. Trimming also allows the system to automatically pre-select a subset of the script recorded by the presenter in order to keep it as part of the final video file.
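By way of non-limiting illustration, such interval marking may be sketched as follows; the audio representation and the RMS threshold value are assumptions, while the 0.5 second interval follows the "Trim Level settings" example above:

```python
import numpy as np

def mark_silent_intervals(samples, sample_rate, interval_sec=0.5, level_threshold=0.02):
    """Mark intervals whose audio level falls below a threshold, as candidates for auto-trim.

    `samples` is assumed to be a mono audio track as a numpy array of floats in [-1, 1];
    `interval_sec` is the minimal predefined duration (0.5 sec in the example above);
    `level_threshold` is an assumed RMS level threshold.
    """
    interval_len = int(interval_sec * sample_rate)
    marked = []  # (start_sec, end_sec) intervals flagged for further processing
    for start in range(0, len(samples), interval_len):
        chunk = samples[start:start + interval_len]
        rms = np.sqrt(np.mean(np.square(chunk.astype(np.float64))))
        if rms < level_threshold:
            marked.append((start / sample_rate, (start + len(chunk)) / sample_rate))
    return marked
```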
Display device interface 605 also comprises an upload button 650 used to upload the video file to a content control system as disclosed above, and a save button 655 used to save it. The display device interface 605 also comprises a background button 625 utilized to add a virtual background to the video content; for example, a background represented by an image, or a color, can be added to the video content. In some cases, when the presenter presses the auto-trim button, the video jumps to the beginning of the next audio section, where the following progression is performed. The background button 625 may also introduce a green-screen option to the presenter. Thus, upon choosing the green-screen option, the presenter may be able to pick a particular space or a specific color, and to convert it to a green-screen space. The presenter may be able to add diverse virtual backgrounds as wallpapers attached to the green-screen space. In case the user wishes to adjust the background, any other computerized system operated on a user device can be applied to change the background of the video behind the presenter. The presenter may also adjust the lighting in the area where the video is captured. The same video file can be saved in several versions with various computerized systems operated on different user devices, for example to enable the video file to be branded differently for varied customer types. In some cases, the computerized system may begin capturing the video automatically when the presenter's voice is detected.
The scene interface 705 may also enable the presenter to create a video sequence from a social network post, or from content residing at a link. The video sequence comprises several items extracted from the link; for example, the sequence begins with an image or several images, followed by the text, and the presenter receives it all automatically. The presenter can select the content or the link according to results from the search query generated according to the script. In such a search query, the keywords which may need to be used can be automatically identified from the script, and the query is then generated. In some cases, the video sequence is generated using a predefined automated template. For example, some templates define which components of a social network post should be presented in the video sequence and at what level of detail. For example, a video may be generated from the tweet of Twitter icon 735 and located above a scene box such as scene box 715.
In some cases, the scene interface may also enable the presenter to insert images that represent the video scenes as background. The presenter may utilize a script to insert an image or a video from a website or social network application into the scene box 745, such that the inserted image or video will replace the image of the presenter with a visual animation extracted from the social post. In some cases, the script may be utilized to generate a video which can be used as a background of a video scene by integrating a sequence of images stored in a specific web page, for example a Facebook album. For example, a presenter may choose an album on a social network application, or a sequence of images stored in a computerized system connected to a communication network, such as the Internet. The presenter may be able to define the image sequence as the background of the newly prepared video content. The presenter may also be able to define the display duration of each image in the sequence. In some cases, the video scenes may be defined by additional algorithms, for example algorithms which can utilize the number of words and/or the audio activity per time slot or time interval. The video scene creation algorithm may automatically detect changes in the video scenes according to audio activity, for example when a phrase starts, or according to duration constraints, as scenes cannot exceed a predefined maximum length. In some cases, the scenes should be between 3 and 10 seconds long.
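By way of non-limiting illustration, a scene creation algorithm of this kind may be sketched as follows; the word-timing input format and the pause threshold are assumptions, while the 3 to 10 second scene lengths follow the example above:

```python
def segment_scenes(word_times, min_len=3.0, max_len=10.0, max_gap=0.7):
    """Split a talk into scenes from word timing data.

    `word_times` is an assumed list of (word, start_sec, end_sec) tuples, e.g. from a
    time-coded transcript. A new scene starts on a long pause in audio activity
    (assumed gap threshold `max_gap`), subject to scenes of roughly 3 to 10 seconds.
    """
    scenes, current = [], []
    for word, start, end in word_times:
        if current:
            scene_start = current[0][1]
            gap = start - current[-1][2]
            too_long = end - scene_start > max_len
            long_pause = gap > max_gap and (current[-1][2] - scene_start) >= min_len
            if too_long or long_pause:
                scenes.append((scene_start, current[-1][2], [w for w, _, _ in current]))
                current = []
        current.append((word, start, end))
    if current:
        scenes.append((current[0][1], current[-1][2], [w for w, _, _ in current]))
    return scenes
```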
The method disclosed in the present invention also comprises a process for automatic detection and identification of breathing points in a speech of a video presenter. The breathing points may be used to define or detect video scenes in which a specific event or command is required. Such event or command may be removing a portion of the video, replacing the background, or artificially moving the camera.
Said process for detection and identification of breathing points may comprise a step of analyzing the script and identifying the possible breathing points, such as commas and full stops in the text. Then, the system can define the time stamp of the breathing points, for example to determine where the most important breathing pauses exist, by analyzing the signal of the audio track, for example using a speech to text algorithm. Once the process of identifying the candidates has completed, the breathing points can be evaluated, for example to determine whether or not the presenter took enough time to breathe, in which case the breathing point can serve as a change point within the video scene. Said process may also analyze images of the video and utilize face recognition algorithms to identify changes in the presenter's face, for example cases when the presenter's mouth is closed.
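By way of non-limiting illustration, combining script punctuation with audio timing may be sketched as follows; the alignment format and the pause threshold are assumptions, and the face-recognition evaluation is omitted from the sketch:

```python
import re

def find_breathing_points(script, aligned_words, min_pause=0.4):
    """Candidate breathing points are commas and full stops in the script,
    confirmed by an actual pause in the time-aligned speech.

    `aligned_words` is an assumed list of (word, start_sec, end_sec) tuples produced
    by a speech-to-text alignment; `min_pause` (seconds) is an assumed threshold for
    "enough time to breathe". Word matching is simplified and ignores repeated words.
    """
    candidate_words = {m.group(1).lower() for m in re.finditer(r"(\w+)[,.]", script)}
    breathing_points = []
    for i in range(len(aligned_words) - 1):
        word, _, end = aligned_words[i]
        pause = aligned_words[i + 1][1] - end
        if word.lower().strip(",.") in candidate_words and pause >= min_pause:
            breathing_points.append({"after_word": word, "time": end, "pause": pause})
    return breathing_points
```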
Step 930 discloses generating a video sequence according to content extracted from a web page, for example from a news website or from a social media post. The video sequence is automatically assembled from the content of the link, for example the title of the web page is displayed for 1.5 seconds, then the upper image for 2 seconds and the latter image for 1.5 seconds, as the scene associated with the video sequence has a duration of 1.5 seconds. The sequence may be generated according to a predefined automated formula or template, as detailed below. Step 935 discloses inserting a location signature of the video, for example Manhattan, for further analysis or to enhance distribution of the video file to subscribers according to geographic preferences of the subscribers. Step 940 discloses displaying the video on a display device, wherein the video is separated into the two or more scenes, with a background and a portion of the script for at least two of the two or more scenes. Creation of the video sequence may also comprise applying filters on at least a portion of the images of the video, for example a black and white filter. The filters may be applied either on the background images or on the foreground content, for example the presenter's image. In some cases, the filters may be applied on the audio track only, or on a predefined portion of the video, for example the second scene, or only when the presenter appears.
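By way of non-limiting illustration, such a template-driven sequence may be assembled as in the following sketch; the content dictionary and template structure are assumptions, while the per-item durations echo the example above:

```python
def build_sequence(page_content, template=None):
    """Assemble a video sequence from content extracted from a web page.

    `page_content` is an assumed dict such as {"title": ..., "images": [...]}.
    The template maps each item type to a display duration, echoing the example
    above (title 1.5 s, first image 2 s, following image 1.5 s).
    """
    template = template or [("title", 1.5), ("image", 2.0), ("image", 1.5)]
    images = iter(page_content.get("images", []))
    sequence, t = [], 0.0
    for item_type, duration in template:
        content = page_content.get("title") if item_type == "title" else next(images, None)
        if content is None:
            continue  # skip items the page does not provide
        sequence.append({"type": item_type, "content": content,
                         "start": t, "end": t + duration})
        t += duration
    return sequence
```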
The guest may be located in a remote location, relative to the presenter. The guest may download the mobile application, click on "Guest", search for the name of the journalist or the presenter, and a message from the presenter directs the guest to the relevant page in the mobile application. Then, the presenter is notified that the interview may start when the guest is on the take page.
After the interview begins, an audio channel is used to exchange audio files or streams between the presenter and the guest. In some cases, the video is recorded locally, in the Guest app on the guest's electronic device and in the Presenter app on the presenter's electronic device, and after the interview ends, the audio and video files from both electronic devices, of the presenter and the guest, are sent to a server.
Some text questions can be entered prior to the interview and read directly by the guest in the mobile application either in real-time or pre-loaded so that the guest may record the interview “offline” without the presenter being on the other side. The presenter listens to the guests in real-time and may mark interesting points said by the Guest in real-time. Questions, generated by the presenter, and answers, generated by the guest, may be displayed as a single sequence of scenes, as the scenes are cut according to a scene cutting algorithm. In some cases, video is presented only when the person is speaking and not when the person is listening. According to audio activity, the studio shows who is talking at a given moment.
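By way of non-limiting illustration, selecting whose video to show at a given moment according to audio activity may be sketched as follows; the per-interval audio level inputs are assumptions derived from the two recorded audio tracks:

```python
def active_speaker_timeline(presenter_levels, guest_levels, interval_sec=0.5):
    """Decide, per time interval, whose video to show during an interview,
    based on which side has the higher audio activity.

    `presenter_levels` and `guest_levels` are assumed lists of per-interval
    audio levels (e.g. RMS values) from the presenter and guest recordings.
    """
    timeline = []
    for i, (p, g) in enumerate(zip(presenter_levels, guest_levels)):
        speaker = "presenter" if p >= g else "guest"
        timeline.append({"start": i * interval_sec,
                         "end": (i + 1) * interval_sec,
                         "show": speaker})
    return timeline
```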
The present invention also discloses a method and a computerized system for producing a video file. The method elaborated below enables a person to create a video file using a single electronic device and to match the time in which a media file replaces a portion of a pre-recorded video to form the video desired by the user. The user's device may be portable, such as a laptop, tablet or smartphone, or may be a personal computer. Journalists, or any creator of a video file, often wish to embed a media file into a video file they pre-recorded. The pre-recorded video file may be a video file in which the user's device captures a video of the user, or a video showing other persons or scenes with the user's speech accompanying the video. The user may watch the video and realize that an image or video may improve the quality or popularity of the video, and wish to add the image or video at a specific time stamp in the video, without using complicated video editing software, using the portable electronic device. The video or image may be stored on the user's device, or on a link embedded into the video, such that when the video reaches a predefined time stamp, the user's device, or any device that displays the video, requests the video from the web server.
In step 1110, the user's electronic device transmits the video to a server for further processing. The server may be an online server, storing computerized software or firmware for analyzing the video file, as elaborated below. The server may have a user interface enabling the user to select an operation to be performed by the server on the video. In some other cases, the user may input the analysis commands into a user interface of the dedicated application which interacts with the server, for example sending the analysis or analysis type selected by the user to the server.
In step 1120, the server activates a selected analysis engine to identify breathing points in the video. The user may choose to analyze the video according to keywords, to automatically identify segments of the video using speech recognition algorithms, for example using natural language processing (NLP), or to transform the speech into a time-coded transcript. In some cases, the selection of the analysis engine may be done automatically by the server, for example according to properties of the video received from the user's device, or according to congestion or load already applied on one of the analysis engines. In some cases, the analysis engines are located remotely from the server, and provide on-demand analysis of the video, for example on a pay-per-video basis.
The breathing points may be stops in the speech associated with the video file, in which the presenter needed to breathe. The breathing points define scenes in the video.
In some exemplary cases, the video is sent without a text, and the user requests text divided into time slots as an output from the server. This way, the first step would be to convert the speech of the video into text, and then activate the text analysis engine to output time-coded text. In this case, the term transcript also applies to text extracted from the video in an automated manner, without receiving a text file from the user, in addition to the video file.
Step 1130 discloses assigning a start time code and an end time code to sections of the text, said sections of the text being defined between two breathing points in the text. For example, the first 5 words are grouped into one section, having a start time at 1.2 seconds from the beginning of the video file and an end time of 2.6 seconds from the beginning of the video file. The next 8 words are grouped into the second section, which begins at 4.5 seconds from the beginning of the video file and has an end time of 6.2 seconds from the beginning of the video file.
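By way of non-limiting illustration, grouping time-aligned words into such sections may be sketched as follows; the word alignment format is an assumption, and the breathing-point times may come from a detection step such as the one sketched earlier:

```python
def assign_section_time_codes(aligned_words, breathing_times):
    """Group time-aligned words into sections delimited by breathing points,
    and assign each section a start and end time code.

    `aligned_words` is an assumed list of (word, start_sec, end_sec) tuples;
    `breathing_times` is a sorted list of breathing-point times in seconds.
    Each returned section carries its text and its start/end time codes,
    echoing the example above (e.g. a first section spanning 1.2-2.6 seconds).
    """
    sections, current = [], []
    boundaries = iter(breathing_times)
    boundary = next(boundaries, None)
    for word, start, end in aligned_words:
        current.append((word, start, end))
        if boundary is not None and end >= boundary:
            sections.append({"text": " ".join(w for w, _, _ in current),
                             "start": current[0][1], "end": current[-1][2]})
            current = []
            boundary = next(boundaries, None)
    if current:
        sections.append({"text": " ".join(w for w, _, _ in current),
                         "start": current[0][1], "end": current[-1][2]})
    return sections
```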
Step 1140 discloses displaying the video file while each section of the sections of the text is displayed between the start time code and end time code associated with the identified breathing points. This way, the presenter can view the text associated with each scene during the entire scene. For example, the second scene is associated with the text of the second section, from 4.5 to 6.2 seconds from the beginning of the video file.
After the speech is divided into sections having start times and end times, the server transmits the time-coded transcript to the user's mobile electronic device, as shown in step 1150.
Step 1200 discloses the user's mobile electronic device receiving analyzed speech or text from the server. The analyzed speech or text may be provided directly to the dedicated software installed on the user's device, or via an email, SMS or instant message. The received speech may include time stamps in which segments of the video file end. The received analyzed text may associate words, parts of words and groups of words to time stamps of the video.
Step 1210 discloses displaying video divided into segments. The divided segments are displayed on the user's device. Each segment may consume the entire display or a portion thereof. The displayed segments comprise a user interface section for editing the segment. For example, the upper half of the screen shows the video, while the lower half shows buttons that enable the user to input commands, for example mark portions in the video, change background color, add a visual feature from a menu of the dedicated software, and the like. The user interface may also enable the user to mark the segment or time stamp in which a media file is to replace the pre-recorded video that was sent to the server for analysis. The group of segments may be displayed one on top of the other with their corresponding time stamps, and once the user selects a segment, the entire screen is allocated to the selected segment.
Step 1220 discloses the user's mobile electronic device receiving a user selection of a portion of the video to be edited. The user's selection may be provided on a user interface of a dedicated software operating on the user's device. In some other cases, the user's selection may be made by the user marking the portion with a cursor or using a touch-operated interface. In some other cases, the user may speak the text to be selected, and an application operating on the user's device or on the server will identify the text from the user's speech. After receiving the user selection of a portion of the video to be edited, the user's device automatically assigns the time stamp associated with the selected portion of the video, for example 7.2-9.8 seconds from the beginning of the video.
Step 1230 discloses receiving a user's selection of a media item to replace the selected portion of the video. The media item may be stored on the user's device, a web server or a cloud service, or may be referenced via a link to a media-sharing website such as CNN, ESPN or YouTube. The media item may be extracted from a social media link, such as Facebook, Twitter and the like. The user may input the link or the video file itself. The media file may be an audio file, an image or a video file. When the media file is audio or video having a predefined duration, the user's device may verify that the portion of the video selected by the user fits the duration of the video or audio file. That is, when the selected segment consumes 13 seconds and the video's duration is 30 seconds, the server can either play the first 13 seconds of the selected video or suggest an alternative, such as seconds 6 to 19. In case the selected video's duration is 4 seconds, the server may ask the user to select another video, at least of the segment's duration. This way, the user may re-select the media file or re-select the parts of the video to fit the selected media file. In some exemplary embodiments, the user may first select the media file, select the beginning of the text where the media file is to be inserted, and the user's device automatically marks the entire text consumed by the media file, for example marking 77 words in yellow when the selected video file is 12 seconds.
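By way of non-limiting illustration, the duration check may be sketched as follows; the returned structure and the choice of which alternative slice of a longer media item to suggest are assumptions, while the 13-second segment and 4-second clip examples follow the text above:

```python
def fit_media_to_segment(segment_duration, media_duration):
    """Check whether a selected media item fits the selected video segment.

    Durations are in seconds. A media item at least as long as the segment can be
    played from its start, or from a suggested alternative offset (an assumed
    choice); a shorter item (e.g. a 4 s clip for a 13 s segment) triggers a
    request to re-select.
    """
    if media_duration >= segment_duration:
        suggested_offset = (media_duration - segment_duration) / 2.0
        return {"action": "play_slice",
                "default": (0.0, segment_duration),
                "alternative": (suggested_offset, suggested_offset + segment_duration)}
    return {"action": "reselect",
            "reason": f"media must be at least {segment_duration:.1f} seconds long"}
```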
Step 1240 discloses editing the recorded video by replacing the time-coded selected portion with the selected media item. The editing comprises deleting the selected portion and adding the media file at the time stamp selected by the user. Step 1250 discloses modifying the media item in the video according to the user's selection. Modifying the media item may include adding a visual feature to the media item, for example changing a colored video to a black and white video, adjusting the volume, changing the video speed to slow-motion, tilting the video, applying zoom-in, adding a logo and the like. After modifying the media item, if so desired by the user, the video file is finalized. Hence, as disclosed in step 1260, the user publishes the video. Such publishing may be performed by sending the video to the broadcasting agency or channel that will show it, or by uploading the video to a social media account of the user, for example Facebook, Twitter and the like. In some exemplary cases, when the user first downloads the dedicated software on the user's device, the user also associates social media accounts with the software, such that when publishing the video, the software suggests to the user targets for the publishing, such as email accounts, instant messaging accounts, social media accounts, YouTube, and the like.
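By way of non-limiting illustration, replacing a time-coded portion of the recorded video may be sketched with a simple timeline model; the clip representation is an assumption and the actual rendering of the edited video is omitted:

```python
def replace_portion(timeline, start, end, media_item):
    """Replace the time-coded portion [start, end] of a recorded video timeline
    with a selected media item.

    `timeline` is an assumed list of clip dicts with "start" and "end" keys
    (seconds from the beginning of the video); clips overlapping the selection
    are cut at its edges and the media item is inserted in their place.
    """
    edited = []
    for clip in timeline:
        if clip["end"] <= start or clip["start"] >= end:
            edited.append(clip)                       # clip entirely outside the selection
        else:
            if clip["start"] < start:                 # keep the part before the selection
                edited.append({**clip, "end": start})
            if clip["end"] > end:                     # keep the part after the selection
                edited.append({**clip, "start": end})
    edited.append({"start": start, "end": end, "source": media_item})
    return sorted(edited, key=lambda c: c["start"])
```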
The server 1420 comprises a communication module 1428 configured to communicate with the user's device 1410. Such communication may run via the Internet or any other technique or protocol desired by the person skilled in the art. The server 1420 also comprises a text analysis engine 1422, configured to analyze the text associated with the video. The text may be a transcript of the video as inputted by the user into the user's device 1410 and sent to the server 1420, or the text may be extracted using speech-to-text techniques. The server 1420 also comprises a speech analysis engine configured to analyze speech as disclosed above. The output of the server 1420 comprises time stamps assigned to at least a portion of the text or to segments of the pre-recorded video file. Then, the user uses the user's device 1410 to insert a media file into the pre-recorded video at a time selected by the user. After the video is finalized, the user publishes the video, for example by sending the video from the user's device 1410 to a publisher's device, such as a web server configured to receive videos from journalists.
While the disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings without departing from the essential scope thereof. Therefore, it is intended that the disclosed subject matter not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but only by the claims that follow.
This application is a Continuation-In-Part of U.S. patent application Ser. No. 15/257,941 filed on Sep. 7, 2016, which claims the benefit of priority of U.S. Provisional Application No. 62/215,050 filed on Sep. 7, 2015 entitled APPARATUS AND METHOD FOR GENERATING A VIDEO FILE BY A PRESENTER OF THE VIDEO. The contents of the above applications are all incorporated by reference as if fully set forth herein in their entirety.
Provisional Application: No. 62/215,050, filed September 2015 (US)
Parent Application: Ser. No. 15/257,941, filed September 2016 (US)
Child Application: Ser. No. 16/016,626 (US)