1. Field
The exemplary embodiments are directed to creating multimedia, and more specifically, to capturing, annotating, and sharing multimedia tips.
2. Description of the Related Art
In most organizational environments, a significant amount of knowledge transfer occurs not through official talks or documents but rather in the form of the unscheduled, brief interchange of tacit information. Many systems have attempted to help capture or augment this type of transfer, but it is difficult to encapsulate this kind of information in a way that is easy to replicate and share.
Mobile devices are particularly well suited to capturing and sharing otherwise ephemeral information since they are usually at hand and are highly personalized and flexible, often including front-facing as well as rear-facing cameras that allow photo and video preview from a variety of user angles. Also, recent work has shown that people are already capturing a variety of multimedia information about products with their phones, including not only hardware but also computer and device screens.
Aspects of the exemplary embodiments involve an apparatus, including a camera receiving a video feed; and a product identification module identifying a product from the video feed and retrieving information regarding the product.
Aspects of the exemplary embodiments may further involve a method, including receiving a video feed from a camera; identifying a product from the video feed; and retrieving information regarding the product.
Aspects of the exemplary embodiments may further involve an apparatus, including a camera recording video from a video feed; a product identification module identifying a product from the video feed and retrieving information regarding the product; and a bookmark creation module generating a bookmark in the recorded video, the bookmark including an annotation and a tag generated from the retrieved information.
It is to be understood that both the foregoing and the following descriptions are exemplary and explanatory only and are not intended to limit the embodiments or the application thereof in any manner whatsoever.
The exemplary embodiments of the invention described here help users create tips for product operation and repair. A tip may contain one or more videos, each of which can include any number of multimedia bookmarks. Bookmarks are associated with a time in the video and contain a timestamp, a keyframe (a frame from the video selected to represent it), a name, and one or more annotations, each of which can contain any or all of: a short textual description of the bookmark; a high-resolution photo; a short audio clip; and a region marker that highlights a portion of the video frame at the time-code associated with the bookmark. In addition to its associated bookmarks, each tip can be given a name, an owner, and a short description, as well as any number of text tags. Certain embodiments of the invention save videos, bookmarks, and annotations locally to the mobile device. When the user submits a tip, each component is serialized, transmitted to the server, and stored in a database. Associated media files (videos, images) are stored on the server file system and referenced by path name in the database.
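By way of non-limiting illustration, the tip, bookmark, and annotation structures described above might be modeled as in the following Python sketch; all class and field names here are hypothetical and do not limit the embodiments:

```python
# Illustrative data model for tips, bookmarks, and annotations; all class
# and field names are hypothetical, not part of the embodiments themselves.
from dataclasses import dataclass, field, asdict
from typing import List, Optional, Tuple
import json

@dataclass
class Annotation:
    text: Optional[str] = None            # short textual description
    photo_path: Optional[str] = None      # high-resolution photo
    audio_path: Optional[str] = None      # short audio clip
    region: Optional[Tuple[int, int, int, int]] = None  # (x, y, w, h) marker

@dataclass
class Bookmark:
    timestamp_ms: int                     # time in the video
    keyframe_path: str                    # frame chosen to represent the video
    name: str = ""
    annotations: List[Annotation] = field(default_factory=list)

@dataclass
class Tip:
    name: str
    owner: str
    description: str = ""
    tags: List[str] = field(default_factory=list)
    video_paths: List[str] = field(default_factory=list)
    bookmarks: List[Bookmark] = field(default_factory=list)

    def serialize(self) -> str:
        # Serialized form sent to the server on submission; media files are
        # uploaded separately and referenced by path name in the database.
        return json.dumps(asdict(self))
```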
A search function allows the user to search the database of tips created by other users of the system. The search system also includes a rating system that allows users to vote on the quality and usefulness of a tip. The server maintains a record of votes cast for each tip, which is updated whenever a user casts a vote.
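One minimal sketch of the server-side vote record, assuming a simple relational store (the table and column names are illustrative only):

```python
# Illustrative server-side vote record using SQLite; table and column
# names are hypothetical.
import sqlite3

conn = sqlite3.connect("tips.db")
conn.execute("""CREATE TABLE IF NOT EXISTS votes (
                    tip_id  INTEGER,
                    user_id INTEGER,
                    value   INTEGER,              -- +1 or -1
                    PRIMARY KEY (tip_id, user_id))""")

def cast_vote(tip_id: int, user_id: int, value: int) -> None:
    # Update the record whenever a user casts (or changes) a vote.
    conn.execute("INSERT OR REPLACE INTO votes VALUES (?, ?, ?)",
                 (tip_id, user_id, value))
    conn.commit()

def rating(tip_id: int) -> int:
    # Aggregate quality/usefulness score for a tip.
    row = conn.execute("SELECT COALESCE(SUM(value), 0) FROM votes "
                       "WHERE tip_id = ?", (tip_id,)).fetchone()
    return row[0]
```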
Approaches other than OCR may also be used to search for products. For example, the server may store richer representations of products, including high-resolution photos or 3D models, or analyze such photos for image features. In this case, the mobile device could send an image as a query in addition to, or instead of, OCR text. The server could then attempt to match the image against its database of photos or 3D models to identify the object of interest, using image features within the image. Those versed in the art will appreciate that feature-point-based methods, such as but not limited to those invented by Lowe, form a basis for retrieval of images of similar objects.
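One possible realization of such feature-point matching is sketched below using OpenCV's freely available ORB features in place of Lowe's SIFT; the 0.75 ratio-test threshold and the use of a raw match count as a score are illustrative assumptions:

```python
# Illustrative feature-point matching with OpenCV's ORB (used here in
# place of Lowe's SIFT); the 0.75 ratio-test threshold is an assumption.
import cv2

def match_score(query_path: str, catalog_path: str) -> int:
    img1 = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(catalog_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des1, des2, k=2)
    # Lowe-style ratio test keeps only distinctive matches; the resulting
    # count serves as a similarity score for ranking catalog entries.
    good = [p[0] for p in pairs if len(p) == 2
            and p[0].distance < 0.75 * p[1].distance]
    return len(good)
```

The server could compute such a score against each stored catalog image and return the best-scoring products to the mobile device.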
Whether the user types the search term manually or uses the OCR scan or visual search functionality to generate one, the search is always performed the same way: the selected query is submitted to the server, matching results are found based on similarity between the query and the tip contents, and a list of the most similar tips is returned and shown to the user.
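The similarity computation could, for example, be realized with a standard TF-IDF scheme, as in the following sketch using scikit-learn; the embodiments are not limited to any particular matching strategy:

```python
# Illustrative similarity ranking between a query and stored tip contents,
# here using TF-IDF with scikit-learn; the actual scheme may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def search_tips(query: str, tip_texts: list, top_k: int = 5) -> list:
    vec = TfidfVectorizer()
    tip_matrix = vec.fit_transform(tip_texts)   # index the tip contents
    query_vec = vec.transform([query])          # same vocabulary for the query
    scores = cosine_similarity(query_vec, tip_matrix).ravel()
    # Indices of the most similar tips, best first.
    return scores.argsort()[::-1][:top_k].tolist()
```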
From the list of results, the user can either choose to play a video capture from the beginning or select an individual bookmark. In the video player 400, the user can play or pause the video and view the content associated with the capture in the seekbar 404, which can be clicked or gestured on to seek to parts of the video. The user can also swipe left or right to move to the previous or next bookmark.
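By way of illustration, the swipe navigation might be realized by locating the bookmark adjacent to the current playback position, as in this hypothetical helper:

```python
# Hypothetical helper for swipe navigation: given the current playback
# position, return the timestamp of the next or previous bookmark.
import bisect
from typing import List, Optional

def adjacent_bookmark(bookmark_times_ms: List[int], current_ms: int,
                      forward: bool) -> Optional[int]:
    times = sorted(bookmark_times_ms)
    if forward:
        i = bisect.bisect_right(times, current_ms)  # first bookmark after now
        return times[i] if i < len(times) else None
    i = bisect.bisect_left(times, current_ms)       # first bookmark at/after now
    return times[i - 1] if i > 0 else None          # the one just before it
```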
After choosing a name for the tip, the user can record a video capture 502. While recording, the user can touch the screen to add a bookmark at the current time. In the background, the application creates a bookmark with empty annotation values and associates it with the current time in the video. The application also extracts a video frame from that time and associates it with the bookmark. The user can capture all the videos for a tip immediately after the scanning step, but it is also possible to capture additional videos after stopping the capture 503, reviewing 505, and editing the current set of video captures and any bookmarks made 504.
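The background frame-extraction step described above might, for example, be implemented with OpenCV as follows; the paths and function name are illustrative:

```python
# Illustrative frame extraction at the bookmark time using OpenCV; the
# file paths are hypothetical.
import cv2

def extract_keyframe(video_path: str, timestamp_ms: int, out_path: str) -> bool:
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, timestamp_ms)  # seek to the bookmark time
    ok, frame = cap.read()
    cap.release()
    if ok:
        cv2.imwrite(out_path, frame)              # saved frame becomes the keyframe
    return ok
```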
More captures can be added at any time, and the same is true for modifying the capture, bookmark, and tip details. It should be apparent to one knowledgeable in the art that a tip previously created and submitted to the server 506 could later be altered by editing on a mobile device if such functionality were useful. Videos can be further processed 507 to remove black frames, as explained below.
When all the necessary details have been added, such as a tip name and description and a name for each bookmark, the tip can be uploaded (submitted) to the server. If the user did not provide all of the required information, the tip cannot be uploaded and a message is shown describing which details are missing. After the tip has been uploaded, the server automatically processes the captured video clips for removal of black frames or other purposes.
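A hypothetical sketch of this pre-upload check is shown below; the field names mirror the illustrative data model sketched earlier:

```python
# Hypothetical pre-upload validation; field names mirror the illustrative
# Tip/Bookmark sketch above. A non-empty result blocks the upload and is
# shown to the user as the list of missing details.
def missing_details(tip) -> list:
    missing = []
    if not tip.name:
        missing.append("tip name")
    if not tip.description:
        missing.append("tip description")
    missing += ["name for bookmark %d" % (i + 1)
                for i, b in enumerate(tip.bookmarks) if not b.name]
    return missing
```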
Once a tip and its associated captures are uploaded to the server, a variety of media processing can be performed offline.
Exemplary implementations are provided below. The following are hypothetical scenarios that illustrate how certain embodiments of the invention operate.
Ekim, a senior technician, heads out to a partner site to fix a problem with the Deluxe 9000-AX multifunction device. He also carries his mobile device to document the problem. Right away he can tell that the problem is likely a blown fuse. He opens his mobile device and, using an exemplary embodiment, begins recording the first clip. The mobile device suggests that he name the video "Deluxe 9000-AX", which it recognized from video frames of the device. As he then makes his recording, Olga passes by and asks him about his child's cricket match. Ekim places his phone down to chat with Olga, knowing that this part of the video will be removed automatically by the black-frame processing described below.
Rakaj, a rookie technician, is having difficulty identifying a problem with a device he is servicing. He can tell that the problem is likely electrical, but otherwise is stumped. He searches the video archive for the device, "Deluxe 8050", and gets 12 responses back. He scans through video keyframes until he comes across one that most resembles the issue he is facing on a very similar device. He clicks on the keyframe to start the video and sees that the problem is indeed similar to his. He then sees that it has been bookmarked as a "blown fuse" and goes back to the beginning to watch the rest of the video. When he finishes, he gives the video a positive vote and then fixes the problem.
The exemplary embodiments thereby provide a mobile application for capturing, editing, storing, and searching "tips" composed of videos and associated multimedia annotations that document product operation and repair. The systems of the exemplary embodiments include a variety of mechanisms to help tip creators augment videos with meta-data that can help other users search the archive of tips. Specifically, the systems of the exemplary embodiments utilize a mobile application to capture and edit one or more video clips and associated multimedia annotations of those clips, which are then uploaded to a server.
The exemplary embodiments allow users to record multiple video clips to be associated with a tip; allow users to augment video clips with bookmarks, where bookmarks can have one or more multimedia annotations including audio, still images, text, and marks placed on the captured videos; and allow users to upload tip contents (videos, bookmarks, and their annotations) to a server that hosts authored tips for later retrieval.
The exemplary embodiments may use OCR of live video frames to help find product names in the database; use OCR of live video frames to help search for product tips; use image features of live video frames to help find product names in the database; and use image features of live video frames to help search for product tips.
The exemplary embodiments can also use speech-to-text engines to generate searchable text for bookmarks from the video clip's audio track; and post-process submitted videos to automatically remove unwanted segments.
The exemplary embodiments also provide a mobile retrieval and playback platform for video tips with various affordances to navigate among bookmarks, view bookmarks, and skip between bookmarks while watching video playback.
The exemplary embodiments may also provide additional functionality to allow users to provide feedback (positive or negative) about submitted tips, and allow users to override OCR tools to enter product names or search tips via standard text entry.
By implementing the exemplary embodiments, users can document product issues they uncover in the field using their mobile phone to take videos that they can then annotate with a variety of multimedia annotations. These media are sent to a database that other users can search. Like many mobile applications, it is necessary to find a balance between ease-of-use and expressivity. If the application does not provide enough documentation support, users capturing information will be frustrated. On the other hand, if the tool forces the users into too many unnatural tasks, they will abandon it entirely. Similarly, users searching for help should be able to find information with minimal overhead. To address these issues, additional aspects of the exemplary embodiments that can help bootstrap documentation while also improving search are provided below.
Bookmark editing, annotation, and other meta-data: While recording video, users can click the screen to add a time-based bookmark without stopping the recording. After they finish recording, users can then move, delete, and annotate bookmarks with a variety of media. Users can also categorize captures as symptoms of a problem or solutions to a problem.
Live search: The integration of live OCR into the tool for both capture and search is useful to automatically set the product name without requiring text entry, which not only aids users uncomfortable with text entry on mobile devices, but also helps improve the consistency of the database. Users can launch a live video view that sends keyframes to a local OCR engine. The OCR engine extracts text from the scene and sends it to a server, which returns a scored list of products.
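A minimal sketch of this live-search step, assuming the Tesseract engine via the pytesseract binding (the server endpoint shown is a placeholder, not an actual interface):

```python
# Illustrative live-search step: OCR a keyframe locally, then query the
# server for a scored list of products. The URL is a placeholder, not a
# real endpoint.
import pytesseract
import requests
from PIL import Image

def live_search(keyframe_path: str) -> list:
    text = pytesseract.image_to_string(Image.open(keyframe_path)).strip()
    if not text:
        return []
    resp = requests.get("https://server.example/products/search",  # hypothetical
                        params={"q": text})
    return resp.json()  # e.g., [{"product": "...", "score": 0.9}, ...]
```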
Speech-to-text: If the users capturing video do not add annotations to a bookmark, the exemplary embodiments can automatically select audio from the captured video in a time window around the bookmark and apply speech-to-text conversion. If the speech is converted with a high enough confidence, the text can be used as a searchable annotation. A tip author may utter descriptive speech while recording a tip, in particular while marking a bookmark during capture. To make this speech useful, speech recognition (speech-to-text) may be employed on the server: the captured audio in the immediate neighborhood of a bookmark is extracted and processed, either by an external service or by a local process. If speech is recognized with a sufficiently high confidence level, it can be added to the searchable bookmark text to aid text-based retrieval of tips.
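One possible server-side realization is sketched below, assuming the ffmpeg binary is available for audio extraction and using the SpeechRecognition package as one example engine; the window size and the 0.8 confidence cutoff are illustrative assumptions:

```python
# Illustrative server-side speech-to-text around a bookmark. Assumes the
# ffmpeg binary is installed and uses the SpeechRecognition package as one
# possible engine; window size and the 0.8 confidence cutoff are assumptions.
import subprocess
from typing import Optional
import speech_recognition as sr

def bookmark_speech_text(video_path: str, timestamp_ms: int,
                         window_s: float = 5.0) -> Optional[str]:
    start = max(0.0, timestamp_ms / 1000.0 - window_s / 2)
    # Extract the audio neighborhood of the bookmark to a PCM WAV clip.
    subprocess.run(["ffmpeg", "-y", "-ss", str(start), "-t", str(window_s),
                    "-i", video_path, "-vn", "clip.wav"], check=True)
    recognizer = sr.Recognizer()
    with sr.AudioFile("clip.wav") as source:
        audio = recognizer.record(source)
    try:
        result = recognizer.recognize_google(audio, show_all=True)
    except sr.UnknownValueError:
        return None
    if not result:
        return None
    best = result["alternative"][0]
    # Only keep text recognized with sufficiently high confidence.
    if best.get("confidence", 0.0) >= 0.8:
        return best["transcript"]
    return None
```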
Black frames: After a video is submitted, the exemplary embodiments can cull sections of videos that were clearly not meant to be viewed, such as long sequences of black frames. This allows users capturing video to set their mobile device aside while it is recording so that they can focus on the objects they are documenting.
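One illustrative implementation of this culling step scans mean frame brightness with OpenCV; the darkness threshold and minimum run length below are assumed parameters:

```python
# Illustrative black-frame detector: scan mean frame brightness and report
# long dark runs as [start, end) frame ranges to cull. The brightness
# threshold and minimum run length are assumptions.
import cv2
from typing import List, Tuple

def black_runs(video_path: str, threshold: float = 12.0,
               min_run_frames: int = 30) -> List[Tuple[int, int]]:
    cap = cv2.VideoCapture(video_path)
    runs, start, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).mean() < threshold:
            if start is None:
                start = idx                 # a dark run begins
        elif start is not None:
            if idx - start >= min_run_frames:
                runs.append((start, idx))   # run long enough to cull
            start = None
        idx += 1
    if start is not None and idx - start >= min_run_frames:
        runs.append((start, idx))           # video ended during a dark run
    cap.release()
    return runs
```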
Moreover, other implementations of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. Various aspects and/or components of the described embodiments may be used singly or in any combination. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.