The present invention relates generally to a data stream management system. More particularly, the present invention relates to a data stream management system for accessing mass data between different distributed server groups.
Video/audio server systems have recently become popular with the growth of the network and multimedia industries. With streaming technology, video/audio files can be transmitted and browsed at the same time. Furthermore, a link can be inserted into the streamed video/audio file so that a website can automatically change pages during playback of the video/audio. With this kind of server system, mass video/audio data streams can be transmitted to many clients at low cost. With digital broadband, users can easily watch or listen to video/audio on demand (video-on-demand) without a long wait. However, even when network bandwidth is large enough, it is still hard for current network server systems to provide such services efficiently and at high quality when many users request a certain video file (e.g., a real-time network broadcast of a baseball game) at the same time.
Multimedia files are usually huge. For example, a movie may have a size of 5 billion bytes, and playing back a television program may require a transmission rate of 200 million bytes per second. Furthermore, if every user can select a video stream from a video stream database of 10^14 bytes (e.g., 10 billion bytes per program multiplied by 1000 programs) and continuously play back the selected video at a transmission rate of 200 million bytes per second, and every program may be expected to be provided to thousands of users, then a system that can efficiently transmit the video stream at low cost is badly needed to meet users' endless demands.
Certain complex problems need to be taken into consideration when designing such a system, or else they may be even harder to solve afterwards. Demand differs from program to program and therefore cannot be assumed to be uniform. For example, some programs are more popular than others and are requested by a larger share of clients. Hence, if every program is evenly dispatched to every server, the capacity for each program may be limited, such that demand for a popular program may not be satisfied.
Some prior arts provide solutions for the aforementioned problems. Please refer to
Please refer to
Finally, please refer to
Hence, a data stream management system that can efficiently distribute and obtain data streams in a short time and can provide more data sources for popular data (video/audio files) is badly needed.
This paragraph extracts and compiles some features of the present invention; other features will be disclosed in the follow-up paragraphs. It is intended to cover various modifications and similar arrangements included within the spirit and scope of the appended claims.
In accordance with an aspect of the present invention, a data stream management system for accessing mass data includes: a client computer, for transmitting and receiving main data; and a plurality of distributed server groups, connected to the client computer via a network. Each of the distributed server groups includes: a determination unit, for determining whether the size of the main data from the client computer exceeds a predetermined size; a dividing unit, for dividing the main data into a plurality of data sections in units of the predetermined size and assigning the data sections different section numbers when the size of the main data exceeds the predetermined size, wherein the main data is treated as one data section when the size of the main data is smaller than the predetermined size; a plurality of distributed servers, for storing the data sections; a transmitting unit, for transmitting the data sections to different distributed servers; and a dispatching server, for controlling access to the distributed servers, and storing a global index for identifying in which distributed server each data section is located.
Preferably, the distributed server group further includes an updating unit, for updating the global index of the distributed server group therein, and transmitting the updated global index to other updating units of other distributed server groups.
Preferably, the updating unit updates the global index while the main data is transmitted or received by the client computer.
Preferably, the updating unit updates the global index regularly.
Preferably, the data section is distributed in different distributed servers of the distributed server groups.
Preferably, the data section is randomly distributed in different distributed servers.
Preferably, the data stream management system further includes an integrating unit, for locating each data section of the main data based on the global index, selecting a distributed server to access according to a specific condition, and integrating the data sections of the main data in sequence before providing them to the client computer when the client computer requests to receive the main data.
Preferably, the specific condition includes transmission rate and completeness of the data sections.
Preferably, the data stream management system further includes a proxy server having a memory, for accessing each data section of the main data from the distributed servers based on the global index, storing the data sections of different section numbers in the memory, and integrating the data sections of the main data in sequence by section number before providing them to the client computer when the client computer requests to receive the main data.
Preferably, the main data is a video/audio file.
Preferably, the global index includes at least one data section array.
In accordance with another aspect of the present invention, a method for accessing mass data by a data stream management system as aforementioned includes the following steps: a) transmitting main data; b) determining whether the size of the main data exceeds a predetermined size; c) dividing the main data into a plurality of data sections in units of the predetermined size and assigning the data sections different section numbers when the size of the main data exceeds the predetermined size, wherein the main data is treated as one data section when the size of the main data is smaller than the predetermined size; d) transmitting the data sections to different distributed servers; and e) updating a current location of each data section in a global index.
Preferably, the method further includes the following steps: f) locating each data section of the main data based on the global index when a request to receive the main data is obtained; g) selecting a distributed server to access according to a specific condition; and h) integrating the data sections of the main data in sequence by section number.
Preferably, the specific condition includes transmission rate and completeness of the data sections.
Preferably, the method further includes, between steps g) and h), a step of storing the data sections of different section numbers.
Preferably, the global index includes at least one data section array.
Please refer to
In the present embodiment, the first distributed server group 100 includes servers 1001, 1002, 1003 and 1004; the second distributed server group 130 includes servers 1301 and 1302; the third distributed server group 150 includes servers 1501 and 1502, wherein servers 1001, 1301 and 1501 are main servers.
The main servers 1001, 1301 and 1501 have the following functions: 1. Determining function: determining whether the size of the main data from the client computer 170 exceeds a predetermined size. 2. Dividing function: when the size of the main data exceeds the predetermined size, dividing the main data into a number of data sections in units of the predetermined size and assigning each data section a different section number. When the size of the main data is smaller than the predetermined size, the main servers 1001, 1301 and 1501 treat the main data as one data section. 3. Transmitting function: transmitting the data sections to different servers. With these functions, servers 1001, 1301 and 1501 each act as a dispatching server in the distributed server groups 100, 130 and 150, respectively, controlling access to the data sections between the servers 1002, 1003, 1004, 1302, 1502 and the main servers 1001, 1301 and 1501. Moreover, the main servers 1001, 1301 and 1501 each have a global index stored therein for identifying in which distributed server each data section is located. The global index includes at least one data section array. Servers 1002, 1003, 1004, 1302 and 1502 store and provide the data sections after receiving a broadcast notice from the main servers 1001, 1301 and 1501.
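The determining and dividing functions can be illustrated with a minimal Python sketch. The function name `split_into_sections` and the dictionary keyed by section number are assumptions made for illustration only, as is treating data exactly equal to the predetermined size as a single section, a case the description does not address.

```python
def split_into_sections(main_data: bytes, predetermined_size: int) -> dict[int, bytes]:
    """Divide main_data into numbered sections of at most predetermined_size bytes."""
    if predetermined_size <= 0:
        raise ValueError("predetermined_size must be positive")
    # Determining function: data that does not exceed the predetermined
    # size is kept as one data section (the equal case is an assumption).
    if len(main_data) <= predetermined_size:
        return {1: main_data}
    # Dividing function: cut in units of the predetermined size and
    # number the sections starting from 1.
    sections = {}
    starts = range(0, len(main_data), predetermined_size)
    for number, start in enumerate(starts, start=1):
        sections[number] = main_data[start:start + predetermined_size]
    return sections


# Hypothetical input: 2,500,000 bytes with a 1,000,000-byte predetermined size.
sections = split_into_sections(b"x" * 2_500_000, 1_000_000)
section_sizes = [len(sections[number]) for number in sorted(sections)]
```

With these hypothetical inputs the last section is shorter than the others, since only the final section can be smaller than the predetermined size.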
Main servers 1001, 1301 and 1501 also include an update function, for updating the global index of the distributed server group therein and transmitting the updated global index to the main servers of the other distributed server groups. The update is performed when the main data is transmitted or received by the client computer 170. Alternatively, the global indexes of the main servers 1001, 1301 and 1501 can be configured to update regularly. Furthermore, the data sections can be randomly or systematically distributed in different or the same distributed servers of different or the same distributed server groups.
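One way to picture the update function is sketched below in Python. The class `MainServer`, its method names, and the choice to merge a received index by set union are all assumptions for illustration; the description only states that the updated global index is transmitted to the main servers of the other groups.

```python
class MainServer:
    """Hypothetical main server holding a copy of the global index."""

    def __init__(self, name: str):
        self.name = name
        self.global_index: dict[str, set[str]] = {}  # section id -> holders
        self.peers: list["MainServer"] = []          # main servers of other groups

    def record_location(self, section_id: str, server_name: str) -> None:
        # Update the local global index, then transmit it to every peer.
        self.global_index.setdefault(section_id, set()).add(server_name)
        for peer in self.peers:
            peer.receive_index(self.global_index)

    def receive_index(self, remote_index: dict[str, set[str]]) -> None:
        # Assumed merge rule: union of known holders per data section.
        for section_id, holders in remote_index.items():
            self.global_index.setdefault(section_id, set()).update(holders)


m1001, m1301, m1501 = MainServer("1001"), MainServer("1301"), MainServer("1501")
m1001.peers = [m1301, m1501]
m1301.peers = [m1001, m1501]
m1501.peers = [m1001, m1301]

m1001.record_location("DA1", "1003")
m1301.record_location("DA2", "1302")
```

After these two calls every main server in the sketch holds the same index, which is the property the update function is meant to maintain.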
Main servers 1001, 1301 and 1501 further include an integrating function. The main servers 1001, 1301 and 1501 can locate each data section of the main data based on the global index, and then select a server to access according to a specific condition. When the client computer 170 requests to receive the main data, the data sections of the main data are integrated in sequence before being provided to the client computer 170. In this embodiment, the specific condition includes the transmission rate and the completeness of the data sections. In other words, based on the global index, the main servers 1001, 1301 and 1501 will select, for accessing the data sections, a server which provides the highest network transmission rate or which holds the most complete set of data sections of the main data (i.e., has the most data sections for integrating into the main data).
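The locate, select and integrate sequence can be sketched as follows. This is a simplified Python illustration that uses transmission rate alone as the specific condition; the function name `integrate_main_data`, the layout of `global_index`, and the server names and rates are hypothetical.

```python
def integrate_main_data(file_id, global_index, stores, rates):
    """Rebuild main data from its sections, in section-number order.

    global_index: {file_id: {section_number: [server names]}}
    stores:       {server name: {(file_id, section_number): bytes}}
    rates:        {server name: transmission rate (arbitrary units)}
    """
    located = global_index[file_id]            # locate every data section
    parts = []
    for number in sorted(located):             # integrate in sequence
        holders = located[number]
        # Select the holder with the highest transmission rate.
        chosen = max(holders, key=lambda name: rates[name])
        parts.append(stores[chosen][(file_id, number)])
    return b"".join(parts)


global_index = {"file1": {2: ["s2"], 1: ["s1", "s3"], 3: ["s3"]}}
stores = {
    "s1": {("file1", 1): b"AAA"},
    "s2": {("file1", 2): b"BBB"},
    "s3": {("file1", 1): b"AAA", ("file1", 3): b"C"},
}
rates = {"s1": 10, "s2": 5, "s3": 50}
result = integrate_main_data("file1", global_index, stores, rates)
```

Sorting by section number keeps the output correct even when the index lists the sections in a different order.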
In this embodiment, server 1004 acts as a proxy server (hereinafter called proxy server 1004). The proxy server 1004 has a memory (not shown). The proxy server 1004 accesses each data section of the main data from the distributed servers based on the global index, and stores the data sections of different section numbers in the memory. Then, when the client computer 170 requests to receive the main data, the proxy server 1004 integrates the data sections of the main data in sequence by section number before providing them to the client computer 170.
Although server 1004 acts as a proxy server in this embodiment, the proxy server is not limited to being in the distributed server groups 100, 130 or 150; it can be in the client computer 170 or be externally connected to the client computer 170, as shown in
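The role of the proxy server's memory can be illustrated with a short Python sketch in which sections arrive out of order and are integrated only once all are present. The class `ProxyBuffer` and its methods are hypothetical names, and the requirement that every section be present before integration is an assumption.

```python
class ProxyBuffer:
    """Hypothetical in-memory buffer keyed by section number."""

    def __init__(self, total_sections: int):
        self.total_sections = total_sections
        self.memory: dict[int, bytes] = {}

    def store(self, section_number: int, payload: bytes) -> None:
        self.memory[section_number] = payload

    def missing(self) -> list[int]:
        wanted = range(1, self.total_sections + 1)
        return [number for number in wanted if number not in self.memory]

    def integrate(self) -> bytes:
        # Assumed rule: integrate only when every section has been stored.
        if self.missing():
            raise LookupError(f"sections not yet stored: {self.missing()}")
        wanted = range(1, self.total_sections + 1)
        return b"".join(self.memory[number] for number in wanted)


buffer = ProxyBuffer(total_sections=3)
buffer.store(3, b"-end")      # sections may arrive in any order
buffer.store(1, b"start-")
missing_before = buffer.missing()
buffer.store(2, b"middle")
integrated = buffer.integrate()
```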
Please refer to
When a user wants to store a first video/audio file (main data) in a data stream management system 10 for other users to download, the user can transmit the first video/audio file to a main server 1001 through a client computer 170 (S101). In this embodiment, the first video/audio file has a size of 2.5 Mbytes. The main server 1001 will determine whether the size of the first video/audio file exceeds 1 Mbyte (the predetermined size) (S102). Because the size of the first video/audio file exceeds 1 Mbyte, the file is divided into three data sections, which are numbered DA1, DA2 and DA3 (S103). If the first video/audio file had a size smaller than 1 Mbyte, the main server 1001 would treat it as one single data section (S104). Next, the main server 1001 will transmit the data sections DA1, DA2 and DA3 to distributed servers 1003, 1302 and 1502, respectively (S105). Meanwhile, the main server 1001 will also update the current location of the data sections DA1, DA2 and DA3 in a global index (S106). As shown in table 1, the global index includes at least one data section array. The data section array records the correspondence between each of the distributed servers 1001, 1002, 1003, 1004, 1301, 1302, 1501 and 1502 and the data sections DA1, DA2 and DA3 (distributed servers storing a data section are marked with a check symbol).
In the present embodiment, the first distributed server group 100 has the widest bandwidth and the highest transmission rate among the three distributed server groups 100, 130 and 150. The third distributed server group 150 has the narrowest bandwidth and the lowest transmission rate. This difference in transmission rate will be taken into consideration later when describing differences in file browsing.
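The placement in steps S105 and S106 can be sketched as a small data structure. This Python illustration is only one possible representation of a data section array; the boolean-row layout, the helper `holders`, and the use of "x" in place of the check symbol are assumptions.

```python
SERVERS = ["1001", "1002", "1003", "1004", "1301", "1302", "1501", "1502"]

# Step S105: each data section is sent to one distributed server.
placement = {"DA1": "1003", "DA2": "1302", "DA3": "1502"}

# Step S106: one row per data section, one flag per distributed server.
data_section_array = {
    section: [server == holder for server in SERVERS]
    for section, holder in placement.items()
}


def holders(section: str) -> list[str]:
    """Return the servers flagged as storing the given data section."""
    row = data_section_array[section]
    return [server for server, stored in zip(SERVERS, row) if stored]


def render_row(section: str) -> str:
    """Render one row, using 'x' as a stand-in for the check symbol."""
    marks = ("x" if stored else "." for stored in data_section_array[section])
    return f"{section} " + " ".join(marks)


for section in data_section_array:
    print(render_row(section))
```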
Please refer to
Since the data sections of the first video/audio file are evenly distributed among servers 1003, 1302 and 1502, the servers cannot be compared. Hence, please refer to table 2. Suppose there is a second video/audio file which is divided into four data sections, numbered DB1, DB2, DB3 and DB4, by the main server 1501. The four data sections are stored in servers 1002 and 1003, 1003 and 1004, 1301 and 1502, respectively. Furthermore, a third video/audio file is divided into five data sections, numbered DC1, DC2, DC3, DC4 and DC5, by the main server 1501. The five data sections are stored in servers 1003 and 1502, 1002, 1301, 1302 and 1004. In this case, the global index includes three data section arrays, as shown in table 2.
Since DB1 and DB2 are each stored in two different servers, when the user wants to browse or download the second video/audio file from the main server 1501, the main server 1501 will select whichever of the two servers allows the data sections to be provided fastest. Here, server 1003 is selected for access to DB1 and DB2 because server 1003 stores both DB1 and DB2; that is, server 1003 has better completeness of data sections than servers 1002 and 1004. In another example, since DC1 and DC2 are each stored in two different servers, the main server 1501 will likewise select whichever of the two servers allows the data sections to be provided fastest when the user wants to browse or download the third video/audio file from the main server 1501. In this case, servers 1002 and 1003 will be selected for access to DC1 and DC2 because the first distributed server group 100 has the widest bandwidth and the highest transmission rate among the three distributed server groups.
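The two selection examples can be sketched in Python as follows. Several things here are assumptions rather than part of the description: ranking by completeness first and by group transmission rate second, the numeric rate ranks, the function names, and the per-section placements, which are one reading of the text for DB1 to DB4 and DC1 because table 2 is not reproduced here.

```python
# Assumed relative ranks only: group 100 is fastest, group 150 is slowest.
GROUP_RATE = {"100": 3, "130": 2, "150": 1}


def group_of(server: str) -> str:
    """Servers are numbered by group, e.g. server 1003 is in group 100."""
    return server[:3]


def select_servers(section_array: dict[str, list[str]]) -> dict[str, str]:
    """Pick one holder per data section of a single file."""
    # Completeness: how many sections of this file each server holds.
    completeness: dict[str, int] = {}
    for section_holders in section_array.values():
        for server in section_holders:
            completeness[server] = completeness.get(server, 0) + 1
    # Prefer the most complete holder, then the faster server group.
    return {
        section: max(
            section_holders,
            key=lambda s: (completeness[s], GROUP_RATE[group_of(s)]),
        )
        for section, section_holders in section_array.items()
    }


second_file = {
    "DB1": ["1002", "1003"],
    "DB2": ["1003", "1004"],
    "DB3": ["1301"],
    "DB4": ["1502"],
}
second_choice = select_servers(second_file)
dc1_choice = select_servers({"DC1": ["1003", "1502"]})
```

In this sketch server 1003 wins DB1 and DB2 on completeness, while DC1 is decided by the rate of the server group because both holders are equally complete.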
Please refer to table 1 and
Furthermore, the following points should also be noted regarding the present invention: 1. the number of main servers included in each distributed server group is not limited to one; a distributed server group can include many main servers, or the servers included in the distributed server group can all be main servers; 2. when the main data or a data section is smaller than the predetermined size, the content of the data section can be filled until its size reaches the predetermined size; 3. data sections are widely distributed to different distributed server groups, and are not copied one-to-one; 4. the data section arrays within the same distributed server group are approximately the same size, whereas they may differ in size between different distributed server groups, because the global index is dynamically updated by each main server.
While the invention has been described in terms of what is presently considered to be the most practical and preferred embodiment, it is understood that the invention need not be limited to the disclosed embodiment. On the contrary, it is intended to cover various modifications and similar arrangements included within the spirit and scope of the appended claims, which are to be accorded the broadest interpretation so as to encompass all such modifications and similar structures.
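Point 2 above, filling a short data section up to the predetermined size, might look like the following Python sketch. The function name `pad_section` and the zero fill byte are assumptions; the description does not say what the section is filled with or how the original length is recovered afterwards.

```python
def pad_section(section: bytes, predetermined_size: int, fill: bytes = b"\x00") -> bytes:
    """Fill a short data section up to the predetermined size.

    The fill byte is an assumption. A real system would also need to
    record the original length so the fill can be removed on retrieval.
    """
    if len(fill) != 1:
        raise ValueError("fill must be exactly one byte")
    if len(section) > predetermined_size:
        raise ValueError("section is larger than the predetermined size")
    return section.ljust(predetermined_size, fill)


padded = pad_section(b"abc", 8)
```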
Number | Date | Country | Kind |
---|---|---|---|
100104126 | Feb 2011 | TW | national |