Host CPU independent video processing unit

FIELD OF THE INVENTION

This invention relates to a video display processor for desktop computers processing multi-media signals.

BACKGROUND TO THE INVENTION

Computer multi-media signal processing involves combining and manipulating graphical and video images, the video images involving high data rates, particularly for moving images. Such systems are typically required to convert signals of the form received from a TV station, usually in a YVU or YCrCb color model, to RGB, the form usually used by a computer display, or vice versa, while adjusting brightness and correcting for color. They are required to perform blends, and scale the signals (stretch and/or contract) for the display, so that for example different sized video images can be superimposed in separate different sized windows. The typical host CPU of a computer system is hard-pressed to service these requirements in real time, and at the same time maintain service to other computer peripherals and devices.

For example, graphical stretches and reductions previously tended to be software implementations, and were application specific. However these are unsuitable for stretching or reducing live video images, due to the intensity of use of the computer CPU, creating a large overhead. In order to minimize CPU overhead, hardware scalers were produced. However these were typically used in digital to analog converters which translate the output of the graphics or display circuit immediately previous to the display. These scalers have only been able to scale upwards to multiples of the image source size. Further, since the output of the scaler is an analog signal suitable for the display, the image signals could only be displayed, and could not be read back digitally or operated on again.

Display processors for desktop computers were in the past able to superimpose one object upon another, for example the display of a cursor over background graphics. Such a processor typically incorporates a destination register, which stores pixel data relating to pixels to be displayed. Such data is often referred to as destination data. Other pixel data, to be superimposed (i.e. mixed) over the destination data, is stored in a source register and is referred to as source data. A computer program controls software comparisons of the pixel values, and selects for display the pixel value having either a component or a value which is in excess of the corresponding value of the destination pixel.

While such an operation has been successful for graphical data, even graphical data with a varying component, such as data which varies due to a moving cursor, it has not been very successful to provide a rich array of capabilities when video data is to be mixed with video data or with graphics data. Yet these capabilities have become increasingly important as multimedia demands are made on the desktop computer. One of the primary reasons for the inability to provide such capabilities is that with software comparisons, excessive interrupt and processing demands are made on the central processor, which inhibits it from servicing the remainder of the computer in a timely fashion.

A description of software processing of pixel data, including mixing of graphical data, may be found in the text “Graphics Programming For the 8514/A”, by Jake Richter and Bud Smith, M&T Publishing, Inc., Redwood City, Calif., copyright 1990, and which is incorporated herein by reference.

SUMMARY OF THE INVENTION

In order to solve this problem, a separate graphics processor system has been designed, containing a video sub-system. Except for the loading of a video memory which interfaces the video subsystem, the present invention operates independently of the host CPU, thus greatly relieving it of major operational overhead. The host CPU can thus service the remainder of the system, improving its response time. Yet full motion processed multi-media signals can be provided on a computer using the present video subsystem invention.

In accordance with the present invention a video display processor is comprised of apparatus for receiving digital input signal components of a signal to be displayed, apparatus for converting the components to a desired format, apparatus for scaling and blending the signals in the desired format, apparatus for outputting the scaled and blended signals for display or further processing and an arbiter and local timing apparatus for controlling all of the apparatus substantially independently of a host CPU.

BRIEF INTRODUCTION TO THE DRAWINGS

A better understanding of the invention will be obtained by a consideration of the detailed description below of a preferred embodiment, in conjunction with the following drawings, in which:

FIG. 1 is a block diagram of a preferred embodiment of the invention.

FIG. 2 illustrates a first form of signal packet carried by a control bus used in the preferred embodiment of the invention.

FIG. 3 illustrates a second form of signal packet.

FIG. 4 illustrates a third form of signal packet.

FIGS. 5 and 6 placed together illustrate a detailed block diagram of the invention.

FIG. 7 illustrates how FIGS. 5 and 6 should be placed together, and

FIG. 8 illustrates a computer display result from use of the invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 illustrates the invention in basic block form. Digital signals which conform to a particular color model, such as RGB or YVU, are stored in video memory 1, and are applied via high speed bus 3 to a line buffer 5. Signals from line buffer 5 are applied to a data translator circuit 7, which performs the functions to be described below. The output signal from the data translator circuit 7, referred to herein as a processed source signal, is applied to a multiplexer 9. Also applied to multiplexer 9 is a destination signal, read from the memory 1 by a destination signal read circuit 11. The multiplexer 9 multiplexes the processed source and destination signals, and produces an output signal which is stored in memory 1 for further processing, or for translation via digital to analog converter 13 and for display on display 15. A destination read interface circuit 11 (comprising e.g. a FIFO and a data unpacker) reads destination data from memory 1 and provides it to multiplexer 9.

Timing and control of the parts of the data translator 7, destination read circuit and multiplexer, as well as the reading of the memory 1 to read source data for providing the signals to buffer 5, is provided by arbiter and host CPU interface 17. These elements interface a main computer bus 19, such as an ISA bus, to which the main CPU 21 of the computer is connected. The interface connects to the arbiter, which receives signals from and sends signals to the CPU 21. Arbiter signals are generated in arbiter 17 for each of the units 7, 9 and 11 to control their operation, and cause an address generator 23 to generate appropriate addresses for each of the units 7, 9 and 11 to complete control signals for units 7, 9 and 11.

Further, CPU 21 establishes virtual connections between the units 7, 9 and 11 by sending signals via host interface 17 to memory 1 to set up a parameter list which defines the required operation (such as a color-space transformation, or a scaling of an image), and assigns specific trigger codes to that parameter list. There may be any number of virtual connections for any given process. Once all the virtual connections have been set up, the system operates independently of the CPU 21, thus relieving it from the video control, and allowing it to deal with other computer processes.

The system described herein triggers operation of the various units by sending a specific trigger code assigned to that operation, via a control bus 25. When any unit receives a trigger code, it locates the parameter list assigned to that specific message, and then performs the operation as defined in that parameter list. All this is performed independently of the computer CPU 21.

Parameter lists may be linked together, so that one trigger code can trigger a number of operations. Furthermore, as parameter lists exist in shared memory 1 and their structure is defined to all components, parameters can be altered concurrently with a process.
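The trigger mechanism can be pictured with a minimal sketch. The `ParameterList` class, its field names, the operation strings and the trigger code value `0x21` are all hypothetical and chosen only for illustration; the source does not define the layout of a parameter list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ParameterList:
    """Hypothetical parameter list held in shared memory (layout assumed)."""
    operation: str                            # e.g. "color_convert" or "scale"
    params: dict                              # operation-specific parameters
    next: Optional["ParameterList"] = None    # link to a further parameter list


def run_trigger(table: Dict[int, ParameterList], code: int) -> List[str]:
    """Walk the chain of parameter lists assigned to a trigger code.

    Returns the names of the operations in the order they would be performed.
    An unknown trigger code performs nothing.
    """
    performed = []
    node = table.get(code)
    while node is not None:
        performed.append(node.operation)
        node = node.next
    return performed


# Two linked lists: one trigger code starts both operations.
scale = ParameterList("scale", {"ratio": 2})
convert = ParameterList("color_convert", {"mode": "YCrCb->RGB"}, next=scale)
table = {0x21: convert}

# Because the lists live in shared storage, a parameter can be changed
# between (or during) triggers without rebuilding the chain.
scale.params["ratio"] = 3
```

Linking lists through a `next` reference is one simple way to let a single trigger code start several operations; other linkage schemes would serve equally well.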

Preferably the control bus uses a serial bus protocol to facilitate event synchronization between components in a multi-media computing environment. Each device on the bus has an opportunity to transmit a preferably 16 bit message to the other devices on the bus.

The bus requires only two pins on each device to implement: clock and data. The arbiter provides a stable clock and polls for requests from all devices connected to the bus. Polling for requests is accomplished by transmitting a series of “invitations”; one for each of the devices (addressed by ID number) on the bus 25. While only one arbiter is required, any of the devices could be made capable of performing the function, by using appropriate circuitry.

The arbiter constantly cycles through a series of invitations to allow each device on the bus 25 to use a brief time slot for signalling other components in the system. An invitation begins with a start bit and is followed by a device ID signal, an “invitation to send”. All devices receive the ID signal and decode its value. The device that matches the invitation ID can then choose to accept the invitation by asserting an invitation acknowledge bit into the bit stream. Following the invitation acknowledge bit, the selected device then broadcasts its signal event which represents some form of status or message. The significance of these messages is decoded by all devices on the bus 25 and acted upon by the appropriate target device(s). The arbiter cycles through all of the device IDs that are connected so that each device has an opportunity to broadcast a message. Messages or “signal events” are preferably 16 bit fields containing a 4 bit function code and a 12 bit data field.

A typical data packet, as shown in FIG. 2, begins when the arbiter transmits an invitation composed of a start bit (bit 0) followed by a 3 bit invitation ID (bits 1-3). It then should release the bus on cycle 4 leaving the bus in the de-asserted state. The device with matching ID then should take over the bus and assert an invitation acknowledge (bit 5) to indicate that it will commence transmission of the signal event. The sequence is depicted in the time bar chart below the packet example.

With respect to FIG. 3, in some cases a signal event from the invited source requires an acknowledgment from the destination or target of the signal event. In this case the service acknowledge signal should be driven from the target at bit location 22. Bit 21 is then used as a switchover time duration for the source of the signal event to release the bus to the target. Acknowledgment of a service request is required since devices may have very limited (or no) queuing capabilities. A true acknowledge (‘1’) then indicates that the target of the service request either has room in its request queue or it isn't busy performing a service and can therefore accept another request. When a request isn't acknowledged, the requester can retry each time it is invited to use the bus until the request is acknowledged.

Most of the time the bus 25 will contain only circulating invitations from the arbiter with no device actually accepting the invitations. In these cases the Signal Event portion of the packet is skipped. It is the responsibility of each device on the bus to monitor the invitation acknowledge of each invitation to determine when to begin looking for the next start bit. The abbreviated packet is depicted in FIG. 4.

It is not necessary for the arbiter to circulate ID codes that are never utilized. Consequently the arbiter could be programmable to allow some ID codes to be excluded. However, this will not have a large impact on worst case latency. For simplicity, it is sufficient to always cycle through each ID code from 0 to 7.

The problem of loss of synchronization can be dealt with by the following. If, for example, a device falsely detects a start bit then it must be able to re-sync within a brief period of time. For this purpose each bus device should monitor the bus to detect 10 consecutive low bits (called a “break”). Once a break is detected, each device knows that the next ‘1’ that is seen is a start bit. It is for this reason that bit 14 of a data packet is preferably always ‘1’ to ensure that the data packet can never contain 10 consecutive zeroes. The arbiter must insert a break after each set of 8 invitations to cause a re-synchronization.
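The re-synchronization rule can be sketched as a small bit-stream scanner. This is an illustrative model only: the function name `find_start_bits` and the representation of the bus as a list of 0/1 samples are assumptions, and the sketch reports only the first ‘1’ after each break rather than decoding whole packets.

```python
def find_start_bits(bits, break_len=10):
    """Return the indices of bits recognized as start bits after a break.

    A break is `break_len` or more consecutive low bits; the next '1'
    seen after a break is taken to be a start bit.
    """
    starts = []
    zeros = 0          # length of the current run of low bits
    synced = False     # True once a break has been seen
    for i, b in enumerate(bits):
        if b == 0:
            zeros += 1
            if zeros >= break_len:
                synced = True
        else:
            if synced:
                starts.append(i)
                synced = False
            zeros = 0
    return starts


# Some stray traffic, then a 10-bit break, then a new invitation.
stream = [1, 0, 1] + [0] * 10 + [1, 0, 1, 1]
```

With this stream the only bit accepted as a start bit is the first ‘1’ following the ten low bits; a run of nine low bits is not a break and yields no start bit.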

A full data packet consists of an invitation (start bit followed by an invitation ID), an invitation acknowledge and then a signal event. A signal event consists of a 4 bit function code followed by a 12 bit data field. The data field can also include an acknowledgment from the target (destination) of the signal event. The following table contains some of the function code definitions that could be used:

Function Code (4 Bits)          Data Field (12 Bits)
Audio Record Sync               12 bit time stamp
Audio Playback Sync             12 bit time stamp
Graphics scan line count        12 bit line number
Video scan line count           12 bit line number
Service Request (0xE)           10 bit service number
                                1 switch over bit (ignore data)
                                1 bit empty or ack from target device if possible
Service complete (0xF)          10 bit service number
(always paired with             1 bit (not used)
Service request)                1 bit service successful
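A minimal sketch of packing and unpacking a 16 bit signal event follows. The function codes 0xE and 0xF come from the table above; everything else is an assumption made for illustration, in particular the placement of the function code in the upper four bits of the word and the ordering of the service number, switch over bit and acknowledge bit inside the data field.

```python
SERVICE_REQUEST = 0xE    # function code from the table
SERVICE_COMPLETE = 0xF   # function code from the table


def pack_event(function_code, data):
    """Pack a 4 bit function code and a 12 bit data field into 16 bits.

    Assumed layout: function code in bits 15..12, data in bits 11..0.
    """
    if not 0 <= function_code <= 0xF:
        raise ValueError("function code must fit in 4 bits")
    if not 0 <= data <= 0xFFF:
        raise ValueError("data field must fit in 12 bits")
    return (function_code << 12) | data


def unpack_event(word):
    """Split a 16 bit signal event into (function_code, data)."""
    return (word >> 12) & 0xF, word & 0xFFF


def pack_service_data(service_number, switch_bit=0, ack=0):
    """Build the 12 bit data field of a service request.

    Assumed layout: 10 bit service number, then the switch over bit,
    then the acknowledge bit in the least significant position.
    """
    if not 0 <= service_number < 1024:
        raise ValueError("service number must fit in 10 bits")
    return (service_number << 2) | ((switch_bit & 1) << 1) | (ack & 1)


# Service request for service number 5, acknowledged by the target.
word = pack_event(SERVICE_REQUEST, pack_service_data(5, switch_bit=0, ack=1))
```

Keeping the pack and unpack helpers symmetric makes it easy to check that a word survives a round trip unchanged.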

A service is a set of operations requested by one device (the source) and performed by another (the target).

A service request is sent by the source device and consists of a 10 bit service number indicating one of 1024 services to be performed, and a 1 bit acknowledge from the target device indicating that the service request was received. It is important that the host CPU 21 allocate unique service numbers to each target so that two request receivers will not accept the same service number. A service complete message should be sent by the receiver of a service request to indicate that it has finished processing the request. It should also return a 1 bit flag indicating that the service was performed successfully or unsuccessfully. The service number it returns should be the same as the service number that it received and acknowledged in the service request. If a service request is received and accepted by a device then it should return a completion message at some later time.

A preferred embodiment of the invention is shown in detailed block diagram as illustrated in FIGS. 5 and 6, which should be assembled together as illustrated in FIG. 7. It should be understood that the various signal variables which will be shown as inputs to the various circuits are obtained from data decoded by bus interface circuits in each of the devices connected to the bus, which recognize the ID signals referred to above, receive packets designated for the circuits, and obtain the variable signals as data in the packets. The interface circuits would be known to a person skilled in the art, and thus will not be described; their designs do not form part of this invention.

Video signals in e.g. RGB or YCrCb models are received or are transmitted (by an I/O interface to a high speed bus connected to memory 1, not shown) to scaler 531.

Scaler circuit 531 receives source signal pixel data via source bus 532 from the memory bus. A destination bus 533 carries an output signal from the scaler to the color conversion unit.

The structure is comprised of an ALU 539 for performing a vertical blend function and an ALU 541 for performing a horizontal blend function. ALU 539 receives the vertical blending coefficients a_V and b_V and the vertical accumulate A_ccV flag.

Similarly, the ALU 541 receives from screen memory, via the data portion of the packet described earlier, the horizontal blend coefficients a_H and b_H and the accumulate A_ccH flag. The A_cc bits determine whether R should be added or zero should be added. A_cc is a flag specified in the coefficient list.

ALU 539 receives adjacent pixel data relating to the first or input trajectory on input ports Q and P, the data for the Q port being received via line buffer 543 from the data source, which can be the screen memory, via source bus 532. The output of line buffer 543 is connected to the input of line buffer 545 via multiplexer 562, the output of line buffer 545 being connected to the P port of ALU 539.

The output of ALU 539 is applied to the input of pixel latch 560. The Q pixel data is applied from the output of ALU 539 to the Q input port of ALU 541 and the P pixel data is applied from the output of pixel latch 560 to the P input port of ALU 541. The P pixel data is also applied to the other input of multiplexer 562.

The output of ALU 541 is applied to the input of pixel accumulator 549, which provides an output signal on bus 533 for application to a color conversion unit.

The line buffers are ideally the maximum source line size in length. The accumulator values A_ccV and A_ccH applied to ALU 539 and ALU 541 respectively determine whether R should be forced to zero or should equal the value in the accumulator.

In operation, a first line of data from a source trajectory is read into line buffer 543. The data of line buffer 543 is transferred to line buffer 545, while a second line of data is transferred from the source trajectory to the line buffer 543. Thus it may be seen that the data at the P and Q ports of ALU 539 represent pixels of two successive vertical lines.

Thus the output of the vertical blend ALU 539 is applied directly to the Q port of the horizontal blend ALU 541, and the output of vertical blend ALU 539 is also applied through a pixel latch 560 to the P port of ALU 541. The output of line buffer 543 is connected to the input of a multiplexer 562; the output of pixel latch 560 is connected to another input of multiplexer 562. The A_ccV input is connected to the control input of multiplexer 562. The output of multiplexer 562 is connected to the input of line buffer 545.

The vertical blend ALU 539 can only accumulate into the line buffer 545. The blend equation becomes

\frac{a_{V} P + b_{V} Q}{16} \to P

wherein the result of the equation is assigned back to P if a vertical accumulate is desired.

For the rest of each horizontal line the data relating to two consecutive horizontal pixels are applied on input lines Q and P to ALU 541 and are blended in accordance with the equation

\frac{a_{H} P + b_{H} Q}{16} + R \to R

The result of this equation is output from ALU 541 and is stored in pixel accumulator 549.
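The two blend equations share one arithmetic form, which can be sketched as a single function. This is an illustration under stated assumptions rather than a description of the ALU hardware: the function name `blend`, the use of truncating integer division for the divide by 16, and the example coefficient values are all assumed.

```python
def blend(a, b, p, q, r=0, accumulate=False):
    """Weighted blend of two pixel values: (a*P + b*Q) / 16.

    When `accumulate` is set (the A_cc flag), the previous accumulator
    value R is added to the result; otherwise zero is added.
    Integer division is an assumption about the rounding behaviour.
    """
    value = (a * p + b * q) // 16
    return value + r if accumulate else value


# Equal weights average two pixels of adjacent lines (vertical blend).
vertical = blend(8, 8, 100, 200)

# A 16/0 weighting passes P through unchanged (no scaling in this direction).
passthrough = blend(16, 0, 77, 5)

# Accumulating adds the running value R, as in the horizontal equation.
accumulated = blend(4, 4, 16, 16, r=10, accumulate=True)
```

Coefficient pairs that sum to 16 keep the overall gain at unity; pairs that sum to less, combined with accumulation over several source pixels, are one way a reduction could be expressed.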

The pixel data is transferred from line buffer 543 into line buffer 545. The source trajectory is read and transferred to line buffer 543. The steps described above for the vertical blending function are repeated for the rest of the image.

Coefficient generation in the vertical direction should be modified accordingly. Line buffer 545 is otherwise loaded whereby line buffer 543 data is transferred to it only when the source Y increment bit is set.

Smaller line buffer sizes, e.g. only 32 pixels, constrain the maximum source width, but have no effect on source height. Thus if the source width is greater than 32 pixels, the operation can be sub-divided into strips of less than 32 pixels wide. Since this may affect blending, the boundaries of these divisions should only occur after the destination has been written out (i.e. a horizontal destination increment). With a maximum stretch/reduce ratio of 16:1, the boundary thus lands between 16 and 32 pixels in the X direction. The coefficients at the boundary conditions should be modified accordingly.

In a successful prototype of the invention 32 pixel line buffers and a 128 element X coefficient cache were used. Y coefficients are not cached and were read on-the-fly. The embodiment is preferably pipelined, i.e. each block may proceed as soon as sufficient data is available.

It should be noted that the source trajectory should only increment with a source increment that is set in a coefficient list in the screen memory or equivalent. If the source is incremented in the X direction and not in the Y direction and the end of the source line is reached, the source pointer is preferred to be reset to the beginning of the current line. If the source is incrementing in both directions and the end of the source line is reached, it is preferred that the source pointer should be set to the beginning of the next line.

The destination trajectory should be incremented in a similar fashion as the source trajectory except that the destination increment bits of the coefficient list should be used.

Line buffer pointers should be incremented when the source increment bit is set in the X direction. They should be reset to zero when the end of the source line is reached. Data should not be written to line buffer 543 nor transferred to line buffer 545 if the source increment bit is not set in the Y direction. Destination data should only be written out from the pixel accumulator if both X and Y destination increment bits are set.

The X coefficient pointer in the screen memory should be incremented for each horizontal pixel operation, and the Y coefficient pointer should be incremented for each line operation.

The design described above which performs the vertical pixel blending prior to the horizontal pixel blending is arbitrary, and may be reversed in which horizontal blending is performed prior to vertical blending. It should be noted that blending in only one direction can be implemented, whereby one of the ALUs is provided with coefficients which provide unitary transformation, i.e. neither expansion nor contraction of the image.

The output of pixel accumulator 549 is applied via bus 533 to the input of a color space converter. This signal is typically comprised of three input signal components A_in, B_in and C_in. The input signals are applied to clippers 417, 418 and 419 respectively.

Also applied to each of the clippers 417, 418 and 419 are ceiling and floor limit data signals or values which establish ranges within which the input signal components should be contained.

When the input signals exceed, either positively or negatively, the limits designated by the ceiling or floor values, the respective signal component is saturated (clipped) to the ceiling or floor (upward or downward limit) respectively.

The output signals of the clippers are applied to respective inputs of a matrix multiplier 421, in the preferred embodiment a [3×3]×[3×1] matrix multiplier. Also input to the multiplier is an array 423 of parameter data which forms a color transformation matrix. The transformation performed in the matrix multiplier will be described below.

The three outputs of the matrix multiplier 421 are applied to three inputs of a vector adder 425. A 3×1 array 427 of parameters is input to vector adder 425, which performs the function [3×1]+[3×1], as will be described below. The parameters O_x in the array 427 constitute offset vectors.

The three outputs of vector adder 425 are applied to respective inputs of output clippers 429, 430 and 431 to which ceiling and floor limit data signals are applied. The output clippers operate similarly to the input clippers 417, 418 and 419, ensuring that the output signal components are contained within the range defined by the output ceiling and floor limits, and if the output signal components exceed those limits, they are clipped (saturated) to the ceiling and floor levels. The resulting output signals from clippers 429, 430 and 431, designated by A_out, B_out and C_out, constitute the three components of the output signal in either RGB or YCrCb format.

In a preferred embodiment, each of the R, G and B signals is equal to or greater than zero and equal to or smaller than 255 units, the Y component is equal to or larger than 16 and equal to or smaller than 235, and the Cr and Cb components are equal to or larger than 16 and equal to or smaller than 240.

To convert from YCrCb to RGB, the matrix multiplier 421 and vector adder 425 should perform the following transformation:

R=1.1636*(Y−16)+1.6029*(Cr−128)

G=1.1636*(Y−16)−0.8165*(Cr−128)−0.3935*(Cb−128)

B=1.1636*(Y−16)+2.0261(Cb−128)

To convert from RGB to YCrCb format, the multiplier and adder should perform the following transformations:

Y=+0.2570R+0.5045G+0.0980B+16

Cr=0.4373R−0.3662G−0.0711B+128

Cb=−0.1476R−0.2897G+0.4373B+128
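The two sets of equations above can be exercised directly in floating point. The sketch below is illustrative only: the function names, the rounding to the nearest integer, and the use of a software `clip` helper standing in for the output clippers are assumptions; the coefficients and the floor and ceiling ranges are those given in the text.

```python
def clip(x, floor, ceil):
    """Saturate x to the range [floor, ceil], as the clippers do."""
    return max(floor, min(ceil, x))


def ycrcb_to_rgb(y, cr, cb):
    """YCrCb to RGB using the coefficients given in the text."""
    r = 1.1636 * (y - 16) + 1.6029 * (cr - 128)
    g = 1.1636 * (y - 16) - 0.8165 * (cr - 128) - 0.3935 * (cb - 128)
    b = 1.1636 * (y - 16) + 2.0261 * (cb - 128)
    # Output clipping to the RGB range 0..255 (rounding is an assumption).
    return tuple(clip(round(v), 0, 255) for v in (r, g, b))


def rgb_to_ycrcb(r, g, b):
    """RGB to YCrCb using the coefficients given in the text."""
    y = 0.2570 * r + 0.5045 * g + 0.0980 * b + 16
    cr = 0.4373 * r - 0.3662 * g - 0.0711 * b + 128
    cb = -0.1476 * r - 0.2897 * g + 0.4373 * b + 128
    # Output clipping to Y 16..235 and Cr, Cb 16..240.
    return (clip(round(y), 16, 235),
            clip(round(cr), 16, 240),
            clip(round(cb), 16, 240))


black = ycrcb_to_rgb(16, 128, 128)      # darkest legal YCrCb value
white = ycrcb_to_rgb(235, 128, 128)     # brightest legal YCrCb value
```

Black and white make convenient checks because the chroma terms vanish at Cr = Cb = 128, so only the luminance gain and offset are exercised.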

For brightness, contrast, color saturation and hue control for a YCrCb signal, the input signal is YCrCb and the output is YCrCb, and the following transformations should be performed in the matrix multiplier and adder:

Y=Y_in*Contrast+Brightness

Cr=color_sat*(cos(hue)*(Cr_in−128)+sin(hue)*(Cb_in−128))+128

Cb=color_sat*(−sin(hue)*(Cr_in−128)+cos(hue)*(Cb_in−128))+128

The conversion from a YCrCb to a RGB signal can be expressed in the following matrix form.

\begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} 1.1636 & 1.6029 & 0.0000 \\ 1.1636 & -0.8165 & -0.3935 \\ 1.1636 & 0.0000 & 2.0261 \end{bmatrix} \begin{bmatrix} Y \\ Cr \\ Cb \end{bmatrix} + \begin{bmatrix} -223.8 \\ 136.3 \\ -278.0 \end{bmatrix}

or more concisely

RGB = W_{y \to r} * YCrCb + O_{y \to r}

where W is the color transformation matrix and O is the offset vector.

The matrix multiplication step is performed in the matrix multiplier 421 and the addition step is performed in the vector adder 425. The YCrCb elements constitute the values of the signal components in the input signal, and the numerical parameters in the 3×3 matrix constitute the W_x transformation parameters, while the values in the 3×1 matrix constitute the offset vector O.

For conversion from an RGB to YCrCb format, the transformation that should be performed in the matrix multiplier and vector adder is

\begin{bmatrix} Y \\ Cr \\ Cb \end{bmatrix} = \begin{bmatrix} 0.2570 & 0.5045 & 0.0980 \\ 0.4373 & -0.3662 & -0.0711 \\ -0.1476 & -0.2897 & 0.4373 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}

or more concisely

YCrCb = W_{r \to y} * RGB + O_{r \to y}

For brightness, contrast, color saturation and hue control in a YCrCb type signal, the input signal is YCrCb and the output signal is YCrCb. The matrix multiplier and vector adder should perform the following transformation.

\begin{bmatrix} Y_{out} \\ Cr_{out} \\ Cb_{out} \end{bmatrix} = \begin{bmatrix} Contrast & 0 & 0 \\ 0 & color\_sat \cdot \cos(hue) & color\_sat \cdot \sin(hue) \\ 0 & -color\_sat \cdot \sin(hue) & color\_sat \cdot \cos(hue) \end{bmatrix} \begin{bmatrix} Y_{in} \\ Cr_{in} \\ Cb_{in} \end{bmatrix} + \begin{bmatrix} Brightness \\ 128 (1 - color\_sat (\cos(hue) + \sin(hue))) \\ 128 (1 - color\_sat (\cos(hue) - \sin(hue))) \end{bmatrix}

In summary, for brightness, contrast, color saturation and hue control when converting from a YCrCb format to RGB, the transformation can be reduced to

RGB = W_{y \to r} * (W_{y \to y} * YCrCb + O_{y \to y}) + O_{y \to r}

For brightness, contrast, color saturation and hue control when converting from an RGB signal to a YCrCb type signal, the following reduced transformation is performed.

YCrCb = W_{y \to y} * (W_{r \to y} * RGB + O_{r \to y}) + O_{y \to y}

For performing brightness, contrast, color saturation and hue control in an RGB signal, both the input and output signals are in RGB format. The transformation performed in the multiplier and vector adder in reduced form is

RGB_{out} = W_{y \to r} * (W_{y \to y} * (W_{r \to y} * RGB_{in} + O_{r \to y}) + O_{y \to y}) + O_{y \to r}

As noted above, the clippers 417 to 419 and 429 to 431 ensure that all data passing through them is within the ranges specified. However if the input data is already within the specified ranges, the clippers may be deleted.

The three outputs of the matrix multiplier are respectively:

Aino = Ain*W_{11} + Bin*W_{21} + Cin*W_{31}

Bino = Ain*W_{12} + Bin*W_{22} + Cin*W_{32}

Cino = Ain*W_{13} + Bin*W_{23} + Cin*W_{33}

The three outputs of the vector adder are

Aouto = Aouti + O_{1}

Bouto = Bouti + O_{2}

Couto = Couti + O_{3}

All arithmetic is preferably performed on 10 bit wide signed integer data (1 bit sign, 1 bit integer and 8 bits fractional). This should be used under normal circumstances. However if over saturation, over contrast, or over brightness is desired, more integer bits may be required, increasing the number of total data bits and widening all other data paths. Floor and ceiling parameters on incoming and outgoing data channels are preferably 8 bits wide, and all other data paths are preferably 10 bits wide.

Preferred integer parameter sets for each respective operation are listed below. The dynamic range of Cr and Cb has been adjusted slightly such that all coefficients fall in the range [−512,+512). For YCrCb to RGB conversion:

W_{y \to r} = \begin{bmatrix} 298/256 & 404/256 & 0 \\ 298/256 & -206/256 & -99/256 \\ 298/256 & 0 & 511/256 \end{bmatrix}

O_{y \to r} = \begin{bmatrix} -220 \\ +136 \\ -278 \end{bmatrix}

The floor and ceiling parameters for the clipping registers preferably are:

A_in_ceil      235
A_in_floor     16
B_in_ceil      240
B_in_floor     16
C_in_ceil      240
C_in_floor     16
A_out_ceil     255
A_out_floor    0
B_out_ceil     255
B_out_floor    0
C_out_ceil     255
C_out_floor    0

For RGB to YCrCb conversion:

W_{r \to y} = \begin{bmatrix} 66/256 & 129/256 & 25/256 \\ 114/256 & -95/256 & -18/256 \\ -38/256 & -75/256 & 114/256 \end{bmatrix}

O_{r \to y} = \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}

The floor and ceiling parameters for the clipping registers preferably are:

A_in_ceil      255
A_in_floor     0
B_in_ceil      255
B_in_floor     0
C_in_ceil      255
C_in_floor     0
A_out_ceil     235
A_out_floor    16
B_out_ceil     240
B_out_floor    16
C_out_ceil     240
C_out_floor    16

For brightness, contrast, color saturation and hue control of YCrCb=>YCrCb:

W_{y \to y} = \begin{bmatrix} Contrast & 0 & 0 \\ 0 & color\_sat \cdot \cos(hue) & color\_sat \cdot \sin(hue) \\ 0 & -color\_sat \cdot \sin(hue) & +color\_sat \cdot \cos(hue) \end{bmatrix}

O_{y \to y} = \begin{bmatrix} Brightness \\ 128 (1 - color\_sat (\cos(hue) + \sin(hue))) \\ 128 (1 - color\_sat (\cos(hue) - \sin(hue))) \end{bmatrix}

The floor and ceiling parameters for the clipping registers preferably are:

A_in_ceil      235
A_in_floor     16
B_in_ceil      240
B_in_floor     16
C_in_ceil      240
C_in_floor     16
A_out_ceil     235
A_out_floor    16
B_out_ceil     240
B_out_floor    16
C_out_ceil     240
C_out_floor    16

For brightness, contrast, color saturation and hue control of YCrCb=>RGB:

W = W_{y \to r} * W_{y \to y}

O = W_{y \to r} * O_{y \to y} + O_{y \to r}

The clipping registers are set as with straight YCrCb to RGB conversion.
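The reduction of two cascaded transformations into a single matrix and offset can be checked numerically. The following is a sketch, not the hardware procedure: the helper names are hypothetical, the floating point YCrCb to RGB coefficients are taken from the equations given earlier in the text, and the control settings in the example are arbitrary.

```python
import math


def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    """3x3 matrix times 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def vadd(u, v):
    """Element-wise sum of two 3-vectors."""
    return [x + y for x, y in zip(u, v)]


# YCrCb -> RGB matrix and offset (floating point values from the text).
W_YR = [[1.1636, 1.6029, 0.0],
        [1.1636, -0.8165, -0.3935],
        [1.1636, 0.0, 2.0261]]
O_YR = [-223.8, 136.3, -278.0]


def adjust(contrast, brightness, sat, hue):
    """Brightness/contrast/saturation/hue matrix and offset (YCrCb -> YCrCb)."""
    c, s = math.cos(hue), math.sin(hue)
    w = [[contrast, 0.0, 0.0],
         [0.0, sat * c, sat * s],
         [0.0, -sat * s, sat * c]]
    o = [brightness, 128 * (1 - sat * (c + s)), 128 * (1 - sat * (c - s))]
    return w, o


def compose_ycrcb_to_rgb(w_yy, o_yy):
    """Fold the adjustment into the conversion: W = Wyr*Wyy, O = Wyr*Oyy + Oyr."""
    return matmul(W_YR, w_yy), vadd(matvec(W_YR, o_yy), O_YR)


w_yy, o_yy = adjust(contrast=1.2, brightness=5.0, sat=0.9, hue=0.1)
w, o = compose_ycrcb_to_rgb(w_yy, o_yy)

pixel = [120.0, 140.0, 100.0]
one_step = vadd(matvec(w, pixel), o)
two_step = vadd(matvec(W_YR, vadd(matvec(w_yy, pixel), o_yy)), O_YR)
```

Applying the composed pair once gives the same result as applying the adjustment and then the conversion, which is why a single pass through the multiplier and adder suffices.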

For brightness, contrast, color saturation and hue control of RGB=>YCrCb:

W = W_{y \to y} * W_{r \to y}

O = W_{y \to y} * O_{r \to y} + O_{y \to y}

Clipping registers are set as with straight RGB to YCrCb conversion.

For brightness, contrast, color saturation and hue control in RGB=>RGB:

W = W_{y \to r} * W_{y \to y} * W_{r \to y}

O = W_{y \to r} * (W_{y \to y} * O_{r \to y} + O_{y \to y}) + O_{y \to r}

The floor and ceiling parameters for the clipping registers preferably are:

A_in_ceil      255
A_in_floor     0
B_in_ceil      255
B_in_floor     0
C_in_ceil      255
C_in_floor     0
A_out_ceil     255
A_out_floor    0
B_out_ceil     255
B_out_floor    0
C_out_ceil     255
C_out_floor    0

It is preferred that all matrix multiplications should be performed in floating point and only converted to integer just before loading the coefficients to the hardware color conversion unit. This minimizes transformation error.
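A minimal sketch of this procedure, for the RGB=>YCrCb case with brightness, contrast, saturation and hue control, might look as follows. The matrices are composed in floating point and the coefficients are converted to integers only as a last step. The function names are hypothetical, the hue is assumed to be in radians, and the use of 8 fractional bits (a scale factor of 256, suggested by the /256 form of the coefficients) is an assumption rather than a stated property of the hardware.

```python
import math

W_R2Y = [
    [66 / 256, 129 / 256, 25 / 256],
    [114 / 256, -95 / 256, -18 / 256],
    [-38 / 256, -75 / 256, 114 / 256],
]
O_R2Y = [16.0, 128.0, 128.0]


def matmul(a, b):
    """3x3 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    """3x3 matrix times 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def adjustment(contrast, brightness, color_sat, hue):
    """W and O of the YCrCb => YCrCb adjustment (hue in radians, assumed)."""
    c, s = math.cos(hue), math.sin(hue)
    w = [[contrast, 0.0, 0.0],
         [0.0, color_sat * c, color_sat * s],
         [0.0, -color_sat * s, color_sat * c]]
    o = [brightness,
         128 * (1 - color_sat * (c + s)),
         128 * (1 - color_sat * (c - s))]
    return w, o


def compose_rgb_to_adjusted_ycrcb(contrast, brightness, color_sat, hue):
    """Floating-point W = W_yy * W_ry and O = W_yy * O_ry + O_yy."""
    w_yy, o_yy = adjustment(contrast, brightness, color_sat, hue)
    w = matmul(w_yy, W_R2Y)
    o = [a + b for a, b in zip(matvec(w_yy, O_R2Y), o_yy)]
    return w, o


def to_fixed_point(w, frac_bits=8):
    """Convert coefficients to integers only at the very end."""
    scale = 1 << frac_bits
    return [[round(k * scale) for k in row] for row in w]
```

Rounding once, after composition, avoids accumulating a separate rounding error for each intermediate matrix, which would be the case if each stage were quantized before multiplying.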

It should be noted that the input clipping parameters and output clipping parameters are preferably programmable. Thus any three component number set may be transformed into any other three component set as long as that transformation is linear. In particular, any three component color model may be transformed to any other three component color model as long as that transformation is linear. If the multipliers and data paths were widened, it would be practical to perform other useful transformations, such as xyz coordinate transformation for example.

The output of the color space conversion circuit is input to an output multiplexer 620. Source data is data relating to a video or graphical signal which is to be mixed with destination pixel data (or in short, simply destination data). Destination data is data already in the memory 1 which is to be displayed, and can result from another source such as a video input, in a manner known in the art.

It is preferred that the source data should be passed through an output masking gate 623. The output masking gate 623 should always be enabled, although it may be set such that it does not mask anything.

The output multiplexer 620 has a control input 621 to which a keying signal is applied. Thus depending on the value of the keying signal, a pixel of either destination data or source data is provided at the output 622 of the multiplexer 620. Data at the output 622 is written to the destination memory, which can be a destination register or the memory 1.

The destination and source data are also provided to inputs of an input multiplexer 624. A mode signal applied to a control input 625 of multiplexer 624 selects which of the signals, a pixel of either destination or source, will be provided at its output, from which the keying signal, if provided for that pixel, will be derived. The mode signal can be a bit provided to the mixing unit from a control register of the display processor.

Various components of data defining each pixel (7:0, 15:8, 23:16 and/or 31:24) are then individually passed through respective gates 627, 628, 629 and 630, each of which receives 8 mask bits IMASK from a control register of the display processor. This provides a means to mask off bits which will not participate in generating the keying signal, and thus to inhibit keying. OMASK and IMASK are preferably 32 bits wide, corresponding to the four 8-bit pixel components that are being operated upon. Since each of the components of data can define a particular characteristic of the pixel, e.g. color, embedded data, exact data, etc., this provides a means to inhibit or enable keying on one of those characteristics, or by using several of the components and masking switches, to inhibit or enable keying based on a range of colors, embedded data, etc.

The output of each of the gates 627, 628, 629, 630 is applied to one input of each comparator of a respective pair of comparators 633A and 633B, 634A and 634B, 635A and 635B, and 636A and 636B. Data values A and B are applied via masking gates 638A and 638B, 639A and 639B, 640A and 640B, and 641A and 641B respectively to the corresponding respective inputs of the comparators 633A-636B. The same masking bits IMASK that are applied to the gates 627-630 are applied to the respective corresponding gates 638A-641B. The data values A and B are static, and are masked by the gates in a similar manner as the destination or source data. Compare function selection signals FNA1, FNB1; FNA2, FNB2; ... FNB4 are applied to select the compare function of the corresponding comparators 633A-636B.

Each pair of comparators compares each 8 bit pixel component with two values, the respective masked pixel components from value A and from value B. Each component has a separate compare function with each of the two comparison values.

The results of all of the component comparisons with the A value are ANDed together in AND gate 643, and the results of all of the component comparisons with the B value are ANDed together in AND gate 645. The outputs of AND gates 643 and 645 are applied to logic circuit 647. A CSelect bit from a control register of the memory 1 is applied to a control input of logic circuit 647, to determine whether the results output from AND gates 643 and 645 should be ANDed or ORed together.

The output of logic circuit 647 is the keying signal. It is applied to control input 621 of the output multiplexer, preferably through inverter 649. A signal ISelect applied from a control register of the memory 1 to a control input of inverter 649 determines whether the keying signal should be inverted or not. This provides means to inverse key on the data, e.g. to instantly switch the other of the destination or source data as the keyed data into or around a keying boundary merely by implementing a 1 bit software switch command ISelect.

Thus if the key signal is FALSE, destination data is output from multiplexer 620. If the key signal is TRUE, the source data is masked with the output mask 623 and written to the destination.

The state of the mixing unit can be programmed by the following configuration, which can be stored in control or configuration registers:

Register Name	Number of Bits	Description
Mode	1	Selects either the source or destination for comparison.
CSelect	1	Selects AND or OR of the results of the A and B comparisons.
ISelect	1	Selects INVERT or no operation.
ValueA	32	Value A to compare.
ValueB	32	Value B to compare.
IMask	32	Input mask for masking off bits which will not participate in the comparison.
OMask	32	Output mask for preventing bits from being overwritten at the destination.
FNA1	3	Compare function for pixel component 1 and value A.
FNA2	3	Compare function for pixel component 2 and value A.
FNA3	3	Compare function for pixel component 3 and value A.
FNA4	3	Compare function for pixel component 4 and value A.
FNB1	3	Compare function for pixel component 1 and value B.
FNB2	3	Compare function for pixel component 2 and value B.
FNB3	3	Compare function for pixel component 3 and value B.
FNB4	3	Compare function for pixel component 4 and value B.

The eight possible comparison functions are the following:

Function Number	Description
000	False
001	True
010	Data >= Value
011	Data < Value
100	Data != Value
101	Data == Value
110	Data <= Value
111	Data > Value
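The behavior of the mixing unit described above can be modelled in software roughly as follows. This is a sketch for illustration only: the dictionary-based configuration, the function names, and the mapping of component 1 to bits 31:24 through component 4 to bits 7:0 are assumptions. The handling of the output mask (unmasked source bits replace destination bits, masked bits keep their destination value) is one reading of the register description, not a confirmed detail of the circuit.

```python
# Compare functions indexed by the 3-bit function number.
COMPARE = {
    0b000: lambda d, v: False,
    0b001: lambda d, v: True,
    0b010: lambda d, v: d >= v,
    0b011: lambda d, v: d < v,
    0b100: lambda d, v: d != v,
    0b101: lambda d, v: d == v,
    0b110: lambda d, v: d <= v,
    0b111: lambda d, v: d > v,
}


def components(word):
    """Split a 32-bit word into four 8-bit components.

    Assumed order: component 1 = bits 31:24 ... component 4 = bits 7:0.
    """
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def key_signal(pixel, cfg):
    """Derive the keying signal for one pixel from the configuration."""
    data = components(pixel & cfg["IMask"])
    val_a = components(cfg["ValueA"] & cfg["IMask"])
    val_b = components(cfg["ValueB"] & cfg["IMask"])
    # AND gates 643 and 645: every component comparison must pass.
    match_a = all(COMPARE[f](d, v) for f, d, v in zip(cfg["FNA"], data, val_a))
    match_b = all(COMPARE[f](d, v) for f, d, v in zip(cfg["FNB"], data, val_b))
    # Logic circuit 647: combine the A and B results per CSelect.
    if cfg["CSelect"] == "AND":
        key = match_a and match_b
    else:
        key = match_a or match_b
    # Inverter 649: optionally invert per ISelect.
    return (not key) if cfg["ISelect"] else key


def mix(source, destination, cfg):
    """Return the word written to the destination for one pixel."""
    tested = source if cfg["Mode"] == "SOURCE" else destination
    if not key_signal(tested, cfg):
        return destination
    omask = cfg["OMask"]
    return (source & omask) | (destination & ~omask & 0xFFFFFFFF)
```

Here `cfg["FNA"]` and `cfg["FNB"]` are lists of four function numbers, one per pixel component, and `cfg["ISelect"]` is a boolean that is true when inversion is selected.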

In the embodiment illustrated, four groups of bits, bits 0-7, bits 8-15, bits 16-23, and bits 24-31, defining four components of a single pixel, are separately processed, giving a very high degree of flexibility in keying. These four components can define the red, green and blue (RGB) color of a picture or can be each of the Y,U,V parameters for that type of picture. The fourth component is provided for in case a destination compare operation is desired to be performed. This fourth component is referred to as the alpha channel, and is usable by the application software.

However it will be noted that in some cases four, or three (if the alpha channel is not used), components need not be used. In a simpler system, such as a monochrome system, or in a system in which a color signal is to be processed by the use of only one component, only one mask 627, one pair of comparators 633A and 633B, and one pair of masks 638A and 638B can be used. AND gates 643 and 645 can then be dispensed with and the outputs of comparators 633A and 633B can be applied directly to inputs of logic circuit 647.

FIG. 8 illustrates the type of result that use of the present invention can provide. A full-screen graphics screen 651 can contain multiple overlapping full motion video streams Video 1, Video 2, and Video 3.

The live video windows may be partially obscured by other windows. To deal with odd clip regions, the program application software should assign an ID to each of the distinct regions: graphics, Video 1, Video 2, and Video 3. This ID should then be written to the alpha channel of each pixel in the destination. Each video source should then be keyed to its own ID using the mixing unit described above, so that writing is inhibited outside its own region.

To implement this, and assuming that the alpha channel has been set up (channel 4, bits 0-7), one possible video mixer configuration, i.e. the data provided from the control registers to the various control inputs described above, can be:

Register	Value
Mode	DESTINATION
CSelect	OR
ISelect	No operation
ValueA	REGION_ID
ValueB	don't care
IMask	000000FF
OMask	FFFFFF00
FNA1	TRUE
FNA2	TRUE
FNA3	TRUE
FNA4	Data == ValueA
FNB1	FALSE
FNB2	FALSE
FNB3	FALSE
FNB4	FALSE
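With this configuration the mixing unit reduces to a simple per-pixel test: a source pixel is written only where the low 8 bits of the destination hold the region ID of that video stream, and those 8 bits are themselves protected from being overwritten. The following sketch shows the net effect on one scan line; the function names and the sample pixel values are hypothetical, and the output mask is assumed to preserve the masked destination bits.

```python
IMASK = 0x000000FF  # only the alpha (region ID) bits take part in keying
OMASK = 0xFFFFFF00  # the alpha bits of the destination are never overwritten


def write_video_pixel(source, destination, region_id):
    """Net effect of the configuration above for one pixel."""
    # FNA4: Data == ValueA on the masked destination; B results are FALSE
    # and CSelect is OR, so only the A comparison decides the key.
    if (destination & IMASK) != (region_id & IMASK):
        return destination  # key FALSE: destination is left unchanged
    return (source & OMASK) | (destination & ~OMASK & 0xFFFFFFFF)


def write_video_line(source_line, destination_line, region_id):
    """Apply the keyed write to every pixel of a scan line."""
    return [write_video_pixel(s, d, region_id)
            for s, d in zip(source_line, destination_line)]
```

Because the region ID survives every write, several video streams can be written repeatedly into the same destination without the application having to recompute the clip regions for each frame.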

A possible video mixer configuration to mix two video streams, one of which is blue screened (to provide for video special effects), is as follows. The non-blue screened source may also be a computer generated background.

Register	Value
Mode	Blue-screened data is SOURCE
CSelect	AND
ISelect	INVERT
ValueA	Lower color bound
ValueB	Upper color bound
IMask	FFFFFF00
OMask	FFFFFF00
FNA1	Data > ValueA
FNA2	Data > ValueA
FNA3	Data > ValueA
FNA4	TRUE
FNB1	Data < ValueB
FNB2	Data < ValueB
FNB3	Data < ValueB
FNB4	TRUE

Register	Value
Mode	Blue-screened data is DESTINATION
CSelect	AND
ISelect	No operation
ValueA	Lower color bound
ValueB	Upper color bound
IMask	FFFFFF00
OMask	FFFFFF00
FNA1	Data > ValueA
FNA2	Data > ValueA
FNA3	Data > ValueA
FNA4	TRUE
FNB1	Data < ValueB
FNB2	Data < ValueB
FNB3	Data < ValueB
FNB4	TRUE
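Read as logic, the two configurations above amount to a per-component range test on the three color components, with the sense of the key depending on which stream carries the blue screen. The sketch below illustrates this; the function names, the placement of the three color components in bits 31:8, and the treatment of the output mask are assumptions made for the example. Note that both bounds are exclusive, since the compare functions used are strictly greater than and strictly less than.

```python
OMASK = 0xFFFFFF00  # the low 8 bits of the destination are preserved


def in_key_range(pixel, lower, upper):
    """True if every color component lies strictly between the bounds."""
    for shift in (24, 16, 8):
        d = (pixel >> shift) & 0xFF
        lo = (lower >> shift) & 0xFF
        hi = (upper >> shift) & 0xFF
        if not (lo < d < hi):
            return False
    return True


def blue_screen_mix(source, destination, lower, upper, keyed_on_source):
    """Net effect of the two blue-screen configurations for one pixel."""
    if keyed_on_source:
        # Mode SOURCE, ISelect INVERT: write the source where it is NOT blue.
        key = not in_key_range(source, lower, upper)
    else:
        # Mode DESTINATION, no inversion: write where the destination IS blue.
        key = in_key_range(destination, lower, upper)
    if not key:
        return destination
    return (source & OMASK) | (destination & ~OMASK & 0xFFFFFFFF)
```

In the first case the blue-screened stream is being written over the background, so its key-colored pixels are dropped; in the second the background is being written into the holes of a blue-screened image that is already in the destination.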

To overlay computer graphics or text on top of a video stream or graphical image, the following possible video mixer configuration can be used. It should be noted that this is similar to blue screening, except that the computer graphics signal is used to key on a specific color.

Register	Value
Mode	Graphics data is SOURCE
CSelect	OR
ISelect	INVERT
ValueA	Color Key
ValueB	Don't care
IMask	FFFFFF00
OMask	FFFFFF00
FNA1	Data == ValueA
FNA2	Data == ValueA
FNA3	Data == ValueA
FNA4	TRUE
FNB1	TRUE
FNB2	TRUE
FNB3	TRUE
FNB4	TRUE

Register	Value
Mode	Graphics data is DESTINATION
CSelect	OR
ISelect	No operation
ValueA	Color Key
ValueB	Don't care
IMask	FFFFFF00
OMask	FFFFFF00
FNA1	Data == ValueA
FNA2	Data == ValueA
FNA3	Data == ValueA
FNA4	TRUE
FNB1	TRUE
FNB2	TRUE
FNB3	TRUE
FNB4	TRUE

A person skilled in the art understanding this invention may now design variations or other embodiments, using the principles described herein. All such variations or embodiments are considered to fall within the scope of the claims appended hereto.


	Number	Date	Country
Parent	08/667872	Jun 1996	US
Child	09/637824		US

	Number	Date	Country
Parent	08/129355	Sep 1993	US
Child	08/667872		US

