The present invention relates to a system and method for predictive camera control, for example, a method for selecting an angle of a camera in a virtual camera arrangement or selecting a camera from a multi-camera system.
Current methods of event broadcasts, for example sports broadcasts, consist primarily of pre-set shots from stationary and mobile cameras operating outside or on top of a field. Standard camera types used in sports broadcasts include stationary pan-tilt-zoom (PTZ) cameras, dolly cameras on rails which run along the edge of the field, boom cameras on long arms which give a limited view over the field, mobile stabilized cameras for the sidelines and end lines, and overhead cameras on cable systems. The cameras are positioned and pre-set shots are selected based on experience of typical game-play and related action. For maximum coverage and ease of storytelling a director will assign a number of camera shots to each camera operator; for example, a dolly camera operator may be tasked with tracking play along the sideline using either mid or close up shots. All camera feeds are relayed into a broadcast control room. A director determines which sequence of camera feeds will compose the live broadcast. The director selects camera feeds depending on which one best frames the action and how the director wishes to tell the story. The director uses cameras shooting at different angles to build toward an exciting moment, for example by transitioning from a wide shot to a tracking shot to a close up. Multiple camera shots are also used in replays to better describe a past action sequence by giving different views of the action and by presenting some shots in slow-motion.
The current method of broadcasting sport has evolved within the constraints of current camera systems and to accommodate the requirements of TV audiences. A problem with current systems of pre-set cameras and shots is that the current systems cannot always provide the best shot of the action and cannot respond quickly to changes in the action. Sports such as soccer (football) are high paced with dramatic changes in the direction and location of action on field. The slow response of physical cameras and operators makes capturing fast paced action difficult. As a work-around, the director may use wide shots when dramatic changes in action occur because the director does not have a camera in an appropriate location, ready to frame a mid or close up shot. Wide shots are necessary as wide shots provide context, but are less exciting than mid or close shots where the TV audience can see the expressions on players' faces or the interaction between players.
The limitations of physical camera setups inhibit not only the broadcasters' ability to respond to large location based changes, but also to respond to smaller localised changes. For example, players routinely turn around on field. In this instance a player may initially have been well framed, but once the player turns around their body may occlude their face, the ball and the direction of play. Seeing the back of a player is not as informative as a front or side view of the player. A front or side view can include the player's current location, oncoming opponents and the probable destination of the ball when it is passed. There are usually not enough cameras in current broadcast systems to capture all angles of play. Similarly in locations relating to security surveillance or a theatre, there may not be enough cameras to capture relevant persons or events from the perspective of a viewer.
A particular problem with current systems of broadcasting sport, for example, is that camera shots are selected in reaction to the action on the field. The director is not able to predict what will happen in the current action sequence or where the next action sequence will take place. One known method generates virtual camera views in a manner which pre-empts how the current action sequence will develop. The known method identifies objects with attached sensors, e.g., the ball, determines characteristics of the ball such as trajectory, and places a virtual camera view pre-emptively to film the ball as it moves. The known method is able to predict how the current action sequence will develop but is not able to determine where the next action sequence will take place.
Another known method uses information about regions of interest based on gaze data collected from multiple users. For example, if many spectators in a sporting area are looking or gazing in a particular direction, then that region is determined to be a region of interesting action and a camera shot can be selected to capture action from that region. The method using information about regions of interest uses gaze direction acquired from head mounted displays (HMDs) to identify and prioritise points of interest. The method using information about regions of interest may in some instances be used for generating heat maps or for displaying augmented graphics on the players and field. The disadvantage of the method using information about regions of interest is that the scene can only be captured after the “action” has started. In situations of fast paced action, it may not be possible to capture the action in time due to the inherent latency of the system.
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
One aspect of the present disclosure provides a computer-implemented method of selecting a camera angle, the method comprising: determining a visual fixation point of a viewer of a scene using eye gaze data from an eye gaze tracking device; detecting, from the eye gaze data, one or more saccades from the visual fixation point of the viewer, the one or more saccades indicating one or more regions of future interest to the viewer; selecting, based on the detected one or more saccades, a region of the scene; and selecting a camera angle of a camera, the camera capturing video data of the selected region using the selected angle.
In some aspects, the visual fixation point is determined by comparing eye gaze data of the viewer captured from the eye gaze tracking device and video data of the scene captured by the camera.
In some aspects, the detected saccades are used to determine a plurality of regions of future interest and selecting the region relates to selecting one or more of the plurality of regions of interest.
In some aspects, selecting the camera angle further comprises selecting the camera from a multi-camera system configured to capture video data of the scene.
In some aspects, the scene is of a game and selecting the region of the scene based on the one or more saccades comprises prioritising the one or more regions of future interest according to game plays detected during play of the game.
In some aspects, the scene is of a game and selecting the region of the scene comprises prioritising the one or more regions of future interest based upon one or more of standard game plays associated with players of the game, fitness of a team playing the game or characteristics of opponents marking players of the game.
In some aspects, the method further comprises determining the plurality of future points of interest based upon determining a direction of each of the one or more saccades.
Another aspect of the present disclosure provides a computer-implemented method of selecting a camera of a multi-camera system configured to capture a scene, the method comprising: detecting a visual fixation point of a viewer of the scene and one or more saccades of the viewer relative to the visual fixation point using eye gaze data from an eye gaze tracking device; determining an object of interest in the scene based on at least the detected one or more saccades of the viewer, the object of interest being determined to have increasing relevance to the viewer of the scene; and selecting a camera of the multi-camera system, the selected camera having a field of view including the determined object of interest in the scene, the camera capturing video data of the determined object of interest.
In some aspects, the method further comprises determining trajectory data associated with the determined object of interest, wherein the camera of the multi-camera system is selected using the determined trajectory data.
In some aspects, the method further comprises determining graphical content based on the determined object of interest, and augmenting the video data with the graphical content.
In some aspects, selecting the camera of the multi-camera system comprises selecting at least one camera of the multi-camera system and generating a virtual camera view using the selected at least one camera.
In some aspects, selecting the camera of the multi-camera system comprises determining a plurality of virtual camera views, the virtual camera views generated by the cameras of the multi-camera system; and prioritising the plurality of virtual camera views based upon proximity of each virtual camera view relative to the determined object of interest.
In some aspects, selecting the camera of the multi-camera system comprises determining a plurality of virtual camera views, the virtual camera views generated by the cameras of the multi-camera system; and prioritising the plurality of virtual camera views based on an angle of each virtual camera view relative to the object of interest.
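By way of illustration only, the proximity and angle based prioritisation described in the two preceding aspects may be sketched as a single scoring function. The weights, the planar coordinate convention and the view fields below are assumptions made for exposition, not features of the described arrangements:

```python
import math

def prioritise_views(views, object_pos, w_dist=0.5, w_angle=0.5):
    """Rank candidate virtual camera views against an object of interest.

    Each view is a dict with a camera 'position' (x, y) and a view
    'direction' angle in radians.  The field names and weights are
    illustrative assumptions.
    """
    scored = []
    for view in views:
        dx = object_pos[0] - view["position"][0]
        dy = object_pos[1] - view["position"][1]
        distance = math.hypot(dx, dy)
        # Angular offset between where the view points and the object,
        # wrapped into [0, pi].
        bearing = math.atan2(dy, dx)
        offset = abs(math.atan2(math.sin(bearing - view["direction"]),
                                math.cos(bearing - view["direction"])))
        # Lower score = higher priority: near, well-aligned views win.
        scored.append((w_dist * distance + w_angle * offset, view))
    scored.sort(key=lambda pair: pair[0])
    return [view for _, view in scored]
```

Under this sketch a nearby view already pointing at the object of interest is prioritised over a distant or poorly aligned one.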
In some aspects, the camera is selected based on time required to re-frame the camera to capture video data of the determined object of interest.
In some aspects, selecting the camera of the multi-camera system comprises selecting a setting of the camera based upon the determined object of interest.
Another aspect of the present disclosure provides a computer readable medium having a program stored thereon for selecting a camera angle, the program comprising: code for determining a visual fixation point of a viewer of a scene using eye gaze data from an eye gaze tracking device; code for detecting one or more saccades from the visual fixation point of the viewer, the one or more saccades indicating one or more regions of future interest to the viewer; code for selecting, based on the detected one or more saccades, a region of the scene; and code for selecting a camera angle of a camera, the camera capturing video data of the selected region using the selected angle.
Another aspect of the present disclosure provides apparatus for selecting a camera angle, the apparatus configured to: determine a visual fixation point of a viewer of a scene using eye gaze data from an eye gaze tracking device; detect, from the eye gaze tracking data, one or more saccades from the visual fixation point of the viewer, the one or more saccades indicating one or more regions of future interest to the viewer; select, based on the detected one or more saccades, a region of the scene; and select a camera angle of a camera, the camera capturing video data of the selected region using the selected angle.
Another aspect of the present disclosure provides a system, comprising: an eye gaze tracking device for detecting eye gaze data of a viewer of a scene; a multi-camera system configured to capture video data of the scene; a memory for storing data and a computer readable medium; and a processor coupled to the memory for executing a computer program, the program having instructions for: detecting, using the eye gaze tracking data, a visual fixation point of the viewer and one or more saccades of the viewer relative to the visual fixation point; determining an object of interest in the scene based on at least the detected one or more saccades of the viewer, the object of interest being determined to have increasing relevance to the viewer of the scene; and selecting a camera of the multi-camera system, the selected camera having a field of view including the determined object of interest in the scene, the selected camera capturing video data of the determined object of interest.
One or more embodiments of the invention will now be described with reference to the following drawings, in which:
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
In the arrangements described, one or more viewers at the sport stadium are wearing head mounted displays. The head mounted displays track the viewers' gaze data. The gaze data identifies fixations (when the eye stays still) and saccades (rapid eye movements between fixations). While watching the game, the viewers' eyes will track the game play with saccades and fixations. At times, however, the eyes of one or more of the viewers will have saccades and fixations which do not track the game play. The divergent saccades may be predictive saccades or may be due to other factors such as viewer distraction.
Predictive saccades can be distinguished from other saccades by specific attributes such as reduced speed, and can be characterised by an associated velocity profile when compared to velocity profiles of non-predictive or random saccades. Predictive saccades prepare the brain for a next action in a current task, for example, turning the head. Predictive saccades indicate where the viewer predicts the viewer's next or future point of interest or action will be. The arrangements described use predictive saccades to prioritise possible next future action events using an example game being played on a sporting field, e.g., determining to whom a player will kick a ball, out of all the players available to receive the ball. Further, the arrangements described use the prediction of the possible future action to adjust a camera angle to capture the action in time.
The predictive saccades of experts are more accurate and have lower latency than the predictive saccades of novices when playing or watching sports. Accordingly, some of the arrangements described use saccade data from expert viewers such as commentators.
The arrangements described relate to the predictive saccade direction pointing to possible future points of interest in the game play. When these points of interest are pre-emptively identified a suitable camera nearby can be selected and re-framed in preparation for action at that point of interest. Alternatively a virtual camera view can be generated to frame the point of interest before the action sequence occurs.
Some of the arrangements described use predictive saccade data to identify possible future points of interest then select one or more real cameras or position virtual camera views to get the best shot of the action.
The predictive camera positioning system 100 also includes a second camera 187, configured to capture video footage of a target location 165, such as a playing field or stadium, and target objects of interest 170 such as players, balls, goals and other physical objects on or around the target location 165. The target objects 170 are objects involved in a current action sequence.
The system 100 includes a point of interest prediction software architecture 190. The software architecture 190 operates to receive data from the eye gaze tracking camera 127 and the camera 187. The point of interest prediction software architecture 190 comprises a gaze location module 191. The gaze location module 191 uses data from the eye gaze tracking camera 127 to identify the gaze location of the viewer's eye 185. The point of interest prediction software architecture 190 also comprises a saccade recognition module 192. The saccade recognition module 192 uses data from the eye gaze tracking camera 127 to identify saccades, as distinct from fixations.
The point of interest prediction software architecture 190 also comprises a predictive saccade detection module 193 which uses data from the eye gaze tracking camera 127 and the saccade recognition module 192 to isolate predictive saccades and identify key predictive saccade characteristics such as direction, velocity and latency. The point of interest prediction software architecture 190 also comprises a point of interest module 194. The point of interest module 194 estimates one or more future points of interest by using data from the camera 187 and predictive saccade characteristics from the predictive saccade detection module 193.
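By way of illustration only, the chain of modules 191 to 194 can be sketched as a simple pipeline. The class and callable interfaces below are assumptions made for exposition and do not reflect the actual implementation of the software architecture 190:

```python
class PointOfInterestPredictor:
    """Illustrative sketch of the software architecture 190 as a
    chain of four pluggable modules; interfaces are assumptions."""

    def __init__(self, gaze_location, saccade_recognition,
                 predictive_detection, point_of_interest):
        self.gaze_location = gaze_location              # module 191
        self.saccade_recognition = saccade_recognition  # module 192
        self.predictive_detection = predictive_detection  # module 193
        self.point_of_interest = point_of_interest      # module 194

    def process(self, eye_frames, scene_frame):
        # Data flows 127 -> 191 -> 192 -> 193 -> 194, with the scene
        # camera 187 feeding the final estimation step.
        gaze = self.gaze_location(eye_frames)
        saccades = self.saccade_recognition(gaze)
        predictive = self.predictive_detection(saccades)
        return self.point_of_interest(predictive, scene_frame)
```

The separation lets each stage (gaze location, saccade recognition, predictive filtering, point of interest estimation) be replaced independently.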
The arrangements described relate to camera selection and positioning, and more particularly to a system and method for automatic predictive camera positioning to capture a most appropriate camera shot of future action sequences at a sports game. Selecting a camera in some arrangements relates to controlling selection of a camera parameter. The predictive camera positioning described is based on predictive saccades of viewers watching the game. As the predictive saccades of expert viewers are more accurate than those of novice viewers, some arrangements described hereafter use the saccades of commentators and other expert viewers at the sports game.
The saccades 320 relate to rapid eye movements between the fixations 310, during which movements the eye is not taking in visual data. In
There are two general categories of saccade, referred to as reflexive and volitional saccades. Reflexive saccades occur when the eye is being repositioned toward visually salient parts of the scene, for example high contrast or colourful objects. The second type of saccades, volitional saccades, occur when the eye is being moved to attend to objects or parts of the scene which are relevant to the viewer's current goal or task. For example, if a user is looking for her green pencil, her saccades will move toward green objects in the scene rather than visually salient objects. Reflexive saccades are bottom-up responding to the environment. Volitional saccades are top-down reflecting the viewer's current task.
Predictive saccades, also known as anticipatory saccades, are one type of volitional saccade used in the arrangements described. Predictive saccades help prepare the human brain for action and are driven by the viewer's current task. Predictive saccades tend toward locations in an environment where a next step of the viewer's task takes place. For example, predictive saccades may look toward the next tool required in a task sequence or may look toward an empty space where the viewer expects the next piece of information will appear. Accordingly, predictive saccades pre-empt changes in the environment that are relevant to the current task, and indicate regions of future interest to the viewer. Predictive saccades can have negative latencies, that is, the predictive saccades can occur up to 300 ms before an expected or remembered visual target has appeared. One study (Smit, Arend C., "A quantitative analysis of the human saccadic system in different experimental paradigms," 1989) describes how a batsman's gaze deviates from smooth pursuit of the cricket ball trajectory to pre-empt the ball's future bounce location. The Smit study explains that the bounce location is important in determining the post bounce trajectory of the ball, which is why the batsman's gaze pre-emptively moves there. The Smit study also found that expert batsmen are better than novices at reacting to ball trajectory, and proposes that expert batsmen react better than novices due to differences in their predictive saccades.
The arrangements described use predictive saccades to pre-emptively select or position a camera angle to capture the action.
As seen in
The computer module 101 typically includes at least one processor unit 105, and a memory unit 106. For example, the memory unit 106 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 101 also includes a number of input/output (I/O) interfaces including: an audio-video interface 107 that couples to the video display 114, loudspeakers 117 and microphone 180; an I/O interface 113 that couples to the keyboard 102, mouse 103, scanner 126, cameras 127 and 187 and optionally a joystick or other human interface device (not illustrated); and an interface 108 for the external modem 116 and printer 115. In some implementations, the modem 116 may be incorporated within the computer module 101, for example within the interface 108. The computer module 101 also has a local network interface 111, which permits coupling of the computer system 100 via a connection 123 to a local-area communications network 122, known as a Local Area Network (LAN). As illustrated in
The computer module 101 is typically a server computer in communication with the cameras 127 and 187. In some arrangements, the computer module 101 may be a portable or desktop computing device such as a tablet or a laptop. In arrangements where the eye gaze tracking camera 127 is a head mountable device, the computer module 101 may be implemented as part of the camera 127.
The camera 127 provides a typical implementation of an eye gaze tracking device for collecting and providing eye gaze tracking data. The eye gaze tracking camera 127 may comprise one or more image capture devices suitable for capturing image data, for example one or more digital cameras. The eye gaze tracking camera 127 typically comprises one or more video cameras, each video camera being integral to a head mountable display worn by a viewer of a game. Alternatively, the camera 127 may be implemented as part of a computing device or attached to a fixed object such as a computer or furniture.
The camera 187 may comprise one or more image capture devices suitable for capturing video data, for example one or more digital video cameras. The camera 187 typically relates to a plurality of video cameras forming a multi-camera system for capturing video of a scene. The camera 187 may relate to cameras integral to a head mountable display worn by a viewer and/or cameras positioned around the scene, for example around a field on which a game is played. The computer module 101 can control one or more settings of the camera 187 such as angle, pan-tilt-zoom settings, light settings including depth of field, ISO and colour settings, and the like. If the camera 187 is mounted on a dolly, the computer module 101 may control position of the camera 187 relative to the scene.
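By way of illustration only, a settings update pushed from the computer module 101 to a camera of the camera 187 might resemble the following sketch. The field names, units and limit ranges are illustrative assumptions, not features of any particular camera:

```python
from dataclasses import dataclass

@dataclass
class CameraSettings:
    """Settings the computer module could control on a camera;
    field names and defaults are illustrative assumptions."""
    pan: float = 0.0    # degrees
    tilt: float = 0.0   # degrees
    zoom: float = 1.0   # magnification factor
    iso: int = 400

def clamp_settings(s, pan_range=(-170.0, 170.0),
                   tilt_range=(-30.0, 90.0), zoom_range=(1.0, 30.0)):
    # Keep a requested re-frame within the camera's physical limits
    # before transmitting it to the device.
    s.pan = min(max(s.pan, pan_range[0]), pan_range[1])
    s.tilt = min(max(s.tilt, tilt_range[0]), tilt_range[1])
    s.zoom = min(max(s.zoom, zoom_range[0]), zoom_range[1])
    return s
```

Clamping before transmission avoids issuing a re-frame command the mechanical pan-tilt-zoom head cannot execute.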
The cameras 127 and 187 may each be in one of wired or wireless communication, or a combination of wired and wireless communication, with the computer module 101.
The I/O interfaces 108 and 113 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 109 are provided and typically include a hard disk drive (HDD) 110. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 112 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 100.
The components 105 to 113 of the computer module 101 typically communicate via an interconnected bus 104 and in a manner that results in a conventional mode of operation of the computer system 100 known to those in the relevant art. For example, the processor 105 is coupled to the system bus 104 using a connection 118. Likewise, the memory 106 and optical disk drive 112 are coupled to the system bus 104 by connections 119. Examples of computers on which the described arrangements can be practised include IBM-PC's and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems.
The methods relating to
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 100 from the computer readable medium, and then executed by the computer system 100. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 100 preferably effects an advantageous apparatus for methods of selecting a camera.
The software 133 is typically stored in the HDD 110 or the memory 106. The software is loaded into the computer system 100 from a computer readable medium, and executed by the computer system 100. Thus, for example, the software 133 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 125 that is read by the optical disk drive 112. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 100 preferably effects an apparatus for implementing the arrangements described.
In some instances, the application programs 133 may be supplied to the user encoded on one or more CD-ROMs 125 and read via the corresponding drive 112, or alternatively may be read by the user from the networks 120 or 122. Still further, the software can also be loaded into the computer system 100 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 100 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 101. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 101 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 133 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 114. Through manipulation of typically the keyboard 102 and the mouse 103, a user of the computer system 100 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 117 and user voice commands input via the microphone 180.
When the computer module 101 is initially powered up, a power-on self-test (POST) program 150 executes. The POST program 150 is typically stored in a ROM 149 of the semiconductor memory 106 of
The operating system 153 manages the memory 134 (109, 106) to ensure that each process or application running on the computer module 101 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 100 of
As shown in
The application program 133 includes a sequence of instructions 131 that may include conditional branch and loop instructions. The program 133 may also include data 132 which is used in execution of the program 133. The instructions 131 and the data 132 are stored in memory locations 128, 129, 130 and 135, 136, 137, respectively. Depending upon the relative size of the instructions 131 and the memory locations 128-130, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 130. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 128 and 129.
In general, the processor 105 is given a set of instructions which are executed therein. The processor 105 waits for a subsequent input, to which the processor 105 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 102, 103, data received from an external source across one of the networks 120, 122, data retrieved from one of the storage devices 106, 109 or data retrieved from a storage medium 125 inserted into the corresponding reader 112, all depicted in
The arrangements described use input variables 154, which are stored in the memory 134 in corresponding memory locations 155, 156, 157. The arrangements described produce output variables 161, which are stored in the memory 134 in corresponding memory locations 162, 163, 164. Intermediate variables 158 may be stored in memory locations 159, 160, 166 and 167.
Referring to the processor 105 of
a fetch operation, which fetches or reads an instruction 131 from a memory location 128, 129, 130;
a decode operation in which the control unit 139 determines which instruction has been fetched; and
an execute operation in which the control unit 139 and/or the ALU 140 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 139 stores or writes a value to a memory location 132.
Each step or sub-process in the processes of
The method of selecting a camera may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions of
In the arrangement described in relation to
The method 200 commences at an obtaining step 210. In execution of the obtaining step 210 the gaze location module 191, being part of the point of interest prediction architecture 190, obtains data from the video based gaze tracking camera 127.
There are a number of methods of video based eye tracking. One implementation shines an LED light into an eye of the commentator and measures a positional relationship between the pupil and a corneal reflection of the LED. An alternative implementation measures a positional relationship between reference points, such as the corner of the eye, and the moving iris. The measured positional relationships describe characteristics of the viewer's eye 185 such as saccade directions (320) and fixation locations (310), as shown in the scene 300.
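By way of illustration only, the pupil/corneal-reflection relationship can be reduced to a simple offset model. The linear per-axis gain, which in practice would come from a per-user calibration, is a simplifying assumption made for exposition:

```python
def gaze_vector(pupil_centre, glint_centre, gain=(1.0, 1.0)):
    """Estimate a 2D gaze direction from the pupil-glint offset.

    In pupil/corneal-reflection tracking, the vector from the LED
    glint to the pupil centre shifts as the eye rotates; a per-user
    calibration maps the offset to a gaze angle.  The linear gain
    model used here is an illustrative assumption.
    """
    dx = pupil_centre[0] - glint_centre[0]
    dy = pupil_centre[1] - glint_centre[1]
    return (gain[0] * dx, gain[1] * dy)
```

Because the corneal glint stays comparatively stable as the eye rotates, the pupil-glint offset varies chiefly with gaze direction rather than with small head movements.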
The method 200 progresses under execution of the processor 105 from the obtaining step 210 to a determining step 220. In execution of the determining step 220 the method 200 operates to determine a location or visual fixation point on which the viewer's eye 185 is fixating. The visual fixation point is identified by comparing eye gaze direction data from the gaze location module 191 with a field of view of the camera 187 filming video data of the target objects 170 in the target location 165. In the arrangements described the eye gaze tracking camera 127 and the camera 187 filming the target location 165 are part of a head mounted display which directly maps the camera 187 field of view with the gaze tracking data collected by the eye gaze tracking camera 127. There are existing head mounted display systems such as Tobii Pro Glasses 2, having both a forward facing camera, relating to the camera 187, and a backward eye gaze tracking device, relating to the camera 127. In some existing systems such as Tobii Pro Glasses 2, the forward and backward facing cameras are already calibrated to map eye gaze tracking data such as fixations 310 and saccades 320 captured through the backward facing camera 127 on to the image plane of the forward facing camera 187. Although existing head mounted display systems are currently used for gaze based research, similar systems are available in consumer head mounted displays where gaze is used for pointing and selecting objects in the real world. Gaze tracking using available systems may be used to practise some of the arrangements described. The consumer head mounted displays are also used to augment virtual data onto the viewer's scene.
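By way of illustration only, once the backward facing tracker and the forward facing camera are calibrated, mapping a gaze point onto the forward camera's image plane can be sketched as below. The assumption that the tracker reports gaze as normalised coordinates in the range 0 to 1 is made for exposition:

```python
def fixation_pixel(norm_gaze, frame_size):
    """Map a calibrated, normalised gaze point onto the forward
    camera's image plane.

    norm_gaze: (x, y) with each component in [0, 1] -- an assumed
    output format for the calibrated eye tracker.
    frame_size: (width, height) of the scene camera frame in pixels.
    """
    x, y = norm_gaze
    width, height = frame_size
    # Scale to the last valid pixel index in each axis.
    return (int(round(x * (width - 1))), int(round(y * (height - 1))))
```

The resulting pixel coordinate locates the fixation point 310 within the video frame captured by the forward facing camera 187.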
Other methods of identifying a user's fixation location (visual fixation point) may be used, for example, when the eye gaze tracking camera 127 is mounted on furniture or a computer in front of the viewer's eye 185. In arrangements where a head mounted display is not used the eye gaze tracking device 127 and the camera 187 filming the target location (which could be anywhere in the stadium) need to be calibrated so that the viewer's fixation location (e.g., 310 of the scene 300) can be mapped on to the field of view of the camera 187.
The method 200 progresses under execution of the processor 105 from the determining step 220 to a detecting step 230. In execution of the detecting step 230, predictive saccades of the viewer are detected and identified. Identifying the predictive saccades is achieved when the saccade recognition module 192 receives data from the gaze location module 191 and identifies the saccades 320, being eye movements, as distinct from the fixations 310, being moments when the eye is still. The saccades are detected via the eye gaze data collected by the eye gaze detection camera 127. In one implementation, the saccades are determined at the step 230 by measuring the angular eye position with respect to time. Some known methods can measure saccades to an angular resolution of 0.1 degrees. A saccade profile can be determined by measuring saccades over a period of time. For example, a saccade profile can be approximated as a gamma function. Taking the first derivative of the gamma function yields a velocity profile of the saccade. The predictive saccade detection module 193 then executes to identify predictive saccades using the velocity profiles. Predictive saccades are known to be ten to twenty percent slower than other saccades. In identifying the predictive saccades, the predictive saccade detection module 193 continuously monitors the saccade velocity profile and determines if there is a 10-20% drop in saccade velocity (which indicates that the saccade is predictive).
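The velocity based identification described for the step 230 can be sketched as follows. This is an illustrative sketch only, not the claimed implementation: the sampling rate, the baseline peak velocity, and the function names are assumptions introduced here for illustration.

```python
# Sketch: flag a saccade as predictive from its velocity profile.
# Assumptions (not from the specification text): gaze angles are sampled
# at a fixed rate, a saccade is supplied as a contiguous run of angular
# samples, and a reactive-saccade baseline peak velocity is known.
# Predictive saccades are taken to peak 10-20% slower than that baseline.

def angular_velocities(angles_deg, sample_rate_hz):
    """First derivative of angular eye position with respect to time."""
    dt = 1.0 / sample_rate_hz
    return [(b - a) / dt for a, b in zip(angles_deg, angles_deg[1:])]

def is_predictive(saccade_angles_deg, sample_rate_hz, baseline_peak_dps):
    """True when the saccade's peak velocity is 10-20% below the
    baseline peak velocity of reactive saccades."""
    velocities = angular_velocities(saccade_angles_deg, sample_rate_hz)
    peak = max(abs(v) for v in velocities)
    drop = 1.0 - peak / baseline_peak_dps
    return 0.10 <= drop <= 0.20
```

For example, against an assumed reactive baseline of 400 degrees per second, a saccade peaking at 340 degrees per second (a 15% drop) would be flagged predictive, while one peaking at the full 400 degrees per second would not.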
In one arrangement, the predictive saccades identified at the step 230 can be further refined or filtered by noting that the velocity profile of predictive saccades is more skewed than the velocity profile of other, more symmetric, saccade types.
In another arrangement, the step 230 could be further refined by comparing saccade trajectories with a target object.
Characteristics of the identified predictive saccades 450 are identified in execution of the identify characteristics of predictive saccades step 240. In execution of the step 240 the predictive saccade detection module 193 identifies the direction of the predictive saccades 450.
The method 200 progresses under execution of the processor 105 from the identifying step 240 to an identifying step 250. In execution of the identify points of interest step 250 the point of interest module 194 receives the direction of the predictive saccades 450 from the predictive saccade detection module 193 and determines a resultant direction of the predictive saccades 450. The resultant direction is used to determine a future point of interest axis 430. In one implementation, the point of interest module 194 determines an average direction of the saccades. The future point of interest axis 430 indicates a direction in which, as inferred from the viewer's predictive saccades 450, future points of interest will occur, and represents trajectory data used for determining regions of future interest. The predictive saccades 450 are effectively used to determine points or regions of future interest of the viewer based upon the determined direction of the saccades.
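The resultant direction computation at the step 250 can be sketched as below. The representation of each saccade as a 2D displacement on the image plane of the forward facing camera, and the function name, are assumptions for illustration only.

```python
import math

# Sketch: derive a future point of interest axis from predictive saccades.
# Assumption (for illustration): each saccade is a 2D displacement
# (dx, dy) on the image plane of the forward facing camera. The axis
# runs from the current fixation along the resultant (vector sum)
# direction of the recent predictive saccades.

def point_of_interest_axis(fixation_xy, saccades_xy):
    """Return (origin, unit direction) of the future point of interest axis."""
    sum_x = sum(dx for dx, _ in saccades_xy)
    sum_y = sum(dy for _, dy in saccades_xy)
    norm = math.hypot(sum_x, sum_y)
    if norm == 0.0:
        raise ValueError("saccades cancel out; no resultant direction")
    return fixation_xy, (sum_x / norm, sum_y / norm)
```

Regions of future interest can then be searched along this axis, for example at increasing distances from the fixation origin.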
Future points of interest can relate to a location, a player or another type of object on the field. For example future points of interest in
Future points of interest are determined by the point of interest module 194 according to the circumstance of the scene, for example the sport being viewed or a number of people in the scene for surveillance uses. In the arrangements described, a game of soccer is being broadcast, as reflected in
A scene 900 is shown in
Referring back to
In the arrangements described, the prioritisation at step 260 is determined according to game plays exhibited or detected during play of the current game. For example, referring to
In other arrangements, predetermined information such as one or more standard game plays, fitness of a team playing the game or the characteristics of opponents marking team members, are used to determine prioritisation of team member sets or prioritisation of future points of interest. A standard game play may for example relate to known play sequences used by or associated with a particular team or players of the game, or play sequences associated with a particular sport, for example a penalty shoot-out in soccer. Similarly, in surveillance or theatre applications, prioritising future points of interest may respectively depend on previous actions of persons of interest, or of actors.
Referring back to
Selecting a camera at step 270 in some arrangements comprises selecting one camera of a multi-camera system (e.g., where the camera 187 is a multi-camera system). In other arrangements, selecting the camera comprises controlling selection of a parameter of the camera, such as an angle of the camera, such that the camera can capture video data of the highest priority region of future interest, or controlling one or more camera settings such as light settings, cropping and/or focus suitable for capturing image data of the highest priority region of future interest.
In another arrangement, the camera selection could be further refined by prioritising cameras according to the time required to re-frame the shot, that is the time to change framing settings. Framing settings include one or more of pan, tilt, zoom, move (if on an automated dolly) and depth of field. The benefit of prioritising cameras according to time to re-frame the shot is responsiveness: in a fast paced sport, any camera that cannot re-frame quickly enough to capture the onset of an action sequence cannot be used.
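The re-frame time prioritisation described above can be sketched as follows. The per-axis rate model (framing axes moving concurrently, so re-frame time equals the slowest axis) and the data layout are assumptions for illustration, not part of the specification.

```python
# Sketch: prioritise cameras by the time needed to re-frame a shot.
# Assumptions (for illustration): each camera reports its pan/tilt/zoom
# deltas to the new framing and its maximum per-axis rates; axes move
# concurrently, so re-frame time is the slowest axis; cameras that
# cannot re-frame before the predicted action onset are excluded.

def reframe_time(deltas, rates):
    """Seconds to re-frame: the slowest of the concurrent axis moves."""
    return max(abs(deltas[axis]) / rates[axis] for axis in deltas)

def prioritise_cameras(cameras, onset_seconds):
    """Order usable cameras fastest-first; drop ones that are too slow."""
    timed = [(reframe_time(c["deltas"], c["rates"]), c["name"]) for c in cameras]
    return [name for t, name in sorted(timed) if t <= onset_seconds]
```

For example, a PTZ camera needing a 30 degree pan at 60 degrees per second (0.5 s) outranks a dolly needing a 90 degree pan (1.5 s), and a boom needing 2.0 s is excluded when the predicted onset is 1.6 s away.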
The method 200 progresses under execution of the processor 105 from the select step 270 to a re-frame step 280. In execution of the re-frame step 280 the direction of the camera or cameras selected in the select camera step 270 is modified so that the future point of interest can be framed. If there are multiple cameras, one or more of the camera directions are modified to generate close up shots, while one or more other selected cameras are used to generate wider shots. The method 200 accordingly has prepared the selected camera or cameras pre-emptively so as to be ready for a predicted future action. The selected camera or cameras capture video data of the selected region of the scene. The video data is captured using any settings or angle selected at step 270.
The video data recorded using the selected camera (and, if appropriate, selected camera settings) is sent to the director and is supplementary to pre-set camera feeds the director already has available for broadcast. If the action eventuates as predicted by the method 200 based on the viewer's predictive saccades 450, the director has the selected camera positioned to best capture that action, and can use the captured video data for a broadcast.
In another implementation, at step 270 instead of selecting an existing physical camera, the method 200 generates a virtual camera view to best capture the action at the future point of interest or around a future object of interest. Image data captured using one or more fields of view of one or more selected physical cameras can be used to generate video data of a portion of the scene, referred to as a virtual camera view. A virtual camera view relates to a view of a portion of the scene different to that captured according to the field of view of one of the physical cameras alone.
In some arrangements, a virtual camera system is used to capture a real scene (e.g. sport game or theatre) typically for broadcast. The virtual camera system is more versatile than a standard set of physical cameras because the virtual camera system can be used to generate a synthetic view of the scene. The computer system 101 modifies the captured footage to create a synthetic view. Creating a synthetic view can be done, for example, by stitching together overlapping fields of view from more than one camera, cropping the field of view of a single camera, adding animations or annotations to the footage and/or creating a 3D model from the captured footage.
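The simplest of the synthetic view techniques listed above, cropping the field of view of a single camera, can be sketched as below. Representing a frame as a 2D list of pixel values and the crop rectangle parameters are assumptions for illustration; stitching and 3D reconstruction are richer variants of the same idea, in that the output view is computed rather than captured directly.

```python
# Sketch: a synthetic view generated by cropping a virtual field of
# view out of a single physical camera frame. The frame is a 2D list
# of pixel values; (left, top, width, height) stand in for a virtual
# camera's framing (illustrative assumption, not the specification).

def crop_virtual_view(frame, left, top, width, height):
    """Return the sub-frame covered by the virtual camera's field of view."""
    if top + height > len(frame) or left + width > len(frame[0]):
        raise ValueError("virtual view exceeds the physical field of view")
    return [row[left:left + width] for row in frame[top:top + height]]
```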
In addition to modifying pan, tilt and zoom of physical cameras, generation of a virtual camera view may relate to modifying settings such as depth of field, colour, ISO settings and other image capture settings of the physical cameras. For example, if the virtual camera view relates to moving the field of view of a physical camera from a sunlit area of a scene to a shaded area of the scene, ISO settings of the physical camera may be modified.
In
In another arrangement the virtual camera views 510, 520, 530, 540 and 550 are prioritised according to two factors, being proximity of each virtual camera view to the future point of interest and camera angle of each virtual camera view relative to the future point of interest. The time required to generate virtual camera views is non-trivial and the time available during play of a game is typically relatively short. Accordingly, it is useful to prioritise the generation of virtual camera views.
In a first step the virtual camera views 510 to 550 are prioritised according to proximity to the future point of interest 410. The closer virtual camera views, being the views 530, 540 and 550, are assigned higher priority than other camera views, as the views 530, 540 and 550 are harder to replicate with physical cameras due to being effectively on the field. In a second step, the virtual camera view positioned to capture a front of the approaching future point of interest player is given a higher priority over other camera views. The prioritisation of the virtual camera views 510, 520, 530, 540 and 550 determines an order in which footage captured from each virtual camera view is presented to the director. Given the director's limited ability to take in all camera views and given that virtual camera views may be supplementary to existing pre-set physical camera feeds, the virtual camera views 510, 520, 530, 540 and 550 with the highest priority are elevated to the top of any camera feed list and are more likely to be seen and used by the director.
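The two-step prioritisation described above can be sketched as a two-key sort. The view positions, the "facing offset" angle convention (0 degrees meaning the view faces the approaching player head-on) and the threshold splitting on-field views from others are assumptions introduced for illustration.

```python
import math

# Sketch: prioritise virtual camera views by (1) proximity to the
# future point of interest and then (2) viewing angle relative to it.
# Assumptions (for illustration): each view has a 2D position and a
# facing offset in degrees (0 = head-on to the approaching player);
# a distance threshold separates on-field views from the rest.

def prioritise_views(views, poi_xy, near_threshold):
    """Closest views first; among views in the same proximity band,
    prefer the view most nearly facing the point of interest head-on."""
    def keys(view):
        dist = math.dist(view["position"], poi_xy)
        near = 0 if dist <= near_threshold else 1  # on-field views outrank others
        return (near, view["facing_offset_deg"], dist)
    return [v["name"] for v in sorted(views, key=keys)]
```

The returned ordering can then be used directly as the camera feed list presented to the director, highest priority first.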
Sport broadcasters are increasingly presenting graphics for TV audiences which appear to be integrated with the sport field and player actions. For example, broadcast footage may be augmented with graphical content that appears to be printed on a surface of a field on which a game is played. Sport viewers wearing augmented reality head mounted displays at the stadium also benefit from augmented graphical content, for example graphical content providing information about the game. However, time is required to generate graphics and apply the graphics to live sport broadcasts. The arrangements described predict or determine a next future point of interest of viewers of the game. The arrangements described therefore allow a graphics system to pre-emptively generate graphics based on the next future point of interest and present the generated graphics with reduced lag or without lag. The graphics system may for example be implemented as a module of the application 133 and controlled under execution of the processor 105.
Referring to
In another arrangement, team and opposition players likely to participate in the next future point of interest are identified, in accordance with step 250. Graphics are then generated for each of the future point of interest players. The generated graphics indicate whether the experts watching the game think the corresponding player will participate in the next action event. The insights are derived from expert viewers watching the same match. In this way novice viewers are given further game insights from expert viewers' predictive saccades.
The arrangements described are applicable to the computer and data processing industries and particularly for the video broadcast industries. For example, as referenced above, the arrangements described are suitable for capturing relevant footage for a sports broadcast by predicting where camera footage will be most relevant and providing the footage to the director for selection. The arrangements described are also suitable for capturing video data in other broadcast industries. For example, the arrangements described are suitable for security industries for capturing video data of a suspected person of interest from the point of view of a security person watching a crowd. Alternatively, the arrangements described may be suitable for capturing video data of a theatre setting.
Using predictive saccades to determine a future point of interest and select a camera or camera setting accordingly provides an effect of decreased lag in providing video data capturing live action of a scene. Using predictive saccades as described above can also provide an effect of capturing video data of scenes appropriate to live action, and/or of broadening the scope of live action coverage beyond predetermined camera positions. Determining a suitable camera or cameras, or suitable camera settings or position, in advance of an event actually occurring can also reduce cognitive and physical effort of camera operators, and/or reduce difficulties associated with manually adjusting light or camera settings in capturing live footage. In providing video data from a scene such as a game based upon a future point of interest of a viewer, final production of live broadcasts can be made more efficient.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive. For example, one or more of the features of the various arrangements described above may be combined.