Method and system for providing a unified data exchange and storage format

Information

  • Patent Application
  • 20030061241
  • Publication Number
    20030061241
  • Date Filed
    September 27, 2001
  • Date Published
    March 27, 2003
Abstract
A method and system is presented that provides for a unified format for both data exchange and data storage protocols in a network. The unified data format provides for an efficient protocol for receiving, parsing, decoding, and storing varying sizes of real-time data.
Description


BACKGROUND

[0001] 1. Field


[0002] The present disclosed embodiments relate generally to communication systems, and more specifically to a method and system that provides a unified format for data exchange and data storage protocols.


[0003] 2. Background


[0004] Existing formats for data exchange in a network largely differ from existing formats for data storage; therefore, data received by a data client has to be reformatted before being stored. Existing data exchange formats and disk storage formats are optimized for either small or large data sizes, but not both. Current data storage formats lack the features necessary to make them useful for exchanging real-time, size-varying data robustly over network connections, and current network protocols lack the features necessary for robustly storing data.


[0005] There is, therefore, a need in the art for a unified format that is suitable for both data exchange and storage so that the unified format would result in cost-effective implementation, compatibility with other tools, and code-reusability. There is also a need for a unified data exchange and storage format for efficient data compression/decompression to reduce hardware and/or software runtime costs.



SUMMARY

[0006] The embodiments disclosed herein address the above-stated needs by providing a method and system for formatting data in a network including a data client and a data manager. The method and system provides a unified data format that has a header section, a records section, and a tail section, such that the data format is applicable for both data exchange and data storage.


[0007] In another aspect of the invention, a data client apparatus includes a receiver configured to receive data from a data manager over a network connection and a transmitter configured to send the data to a storage unit, such that the data received over the network connection and the data sent to the storage unit have the same format.


[0008] In another aspect of the invention, a data structure for formatting data in a network includes a header section, a records section, and a tail section such that the data structure is applicable for both data exchange and data storage protocols.







BRIEF DESCRIPTION OF THE DRAWINGS

[0009]
FIG. 1 shows a representation of an exemplary network interface for implementing data exchange protocols;


[0010]
FIG. 2 shows a representation of an exemplary data format;


[0011]
FIG. 3 shows a representation of an exemplary overall data format;


[0012]
FIGS. 4A and 4B show a representation of an exemplary record format;


[0013]
FIG. 5 shows a representation of an exemplary header for an overall data format;


[0014]
FIG. 6 shows a representation of an exemplary record format; and


[0015]
FIG. 7 shows a block-diagram representation of an exemplary embodiment for the data client and data manager operating in FIG. 1.







DETAILED DESCRIPTION

[0016]
FIG. 1 shows a representation of an exemplary network interface between a data manager 102 and a data client 104, through connections 106 and 108. Data manager 102 may include a network element such as a Web server or an access network (AN). Data manager 102 may be in data communication with data sources such as a modem pool controller (MPC) 110, a modem pool transceiver (MPT) 112, and/or an access terminal (AT) 114. Data client 104 may include a client terminal such as a personal computer, an HDR analysis tool (HAT), or another data server incorporating data from data manager 102 into its own data.


[0017] In one embodiment, data communication between data manager 102 and data client 104 may be controlled by two protocols: data manager protocol (DMP) for data control, and binary data exchange format (BDEF) for data delivery and storage. The DMP protocol may include data manager control protocol (DMCP) and data manager discovery protocol (DMDP), as discussed in detail in the co-pending Patent Application.


[0018] Data Manager Protocol


[0019] The DMP protocol is designed for easy implementation and debugging. The design goals include having the commands atomic and the server connections stateless. An instruction does several things “atomically” when all the things are done immediately, and there is no chance of the instruction being half-completed, being interspersed, or being interrupted. A stateless server treats each request as an independent transaction, unrelated to any previous request. This simplifies the server design because it does not need to allocate storage to deal with conversations in progress or worry about freeing the storage if a client dies in the middle of a transaction. The commands and responses may be in American standard code for information interchange (ASCII), and their format is chosen to make machine parsing easy to implement.


[0020] Data Manager Control Protocol (DMCP)


[0021] Data manager 102 may support the DMCP protocol for sending and receiving DMCP commands, through two-way connection 106, for example. The two-way connection 106 may include TCP or user datagram protocol (UDP) connections. Data manager 102 may use a TCP connection for DMCP commands on port 1880. However, if data manager 102 supports the DMDP protocol, as defined below, it may also support the DMCP protocol on other ports.


[0022] Binary Data Exchange Format (BDEF)


[0023] A unified data exchange and storage format advantageously results in efficient implementations, compatibility to other network components, and code reusability. Such data format is suitable for large volumes of data of varying sizes that may come in real-time. FIG. 1 shows an exemplary network where the data exchanged over the network connection 108 and the data stored in the storage unit 116 have a unified format, e.g., BDEF. Data client 104 may receive data from data manager 102 over network connection 108 and send the data to storage unit 116 over connection 118. In one embodiment, the data exchange format over network connection 108 and the data storage format in storage unit 116 are unified.


[0024] In one embodiment, such a unified format may include an overall format and a record format. The overall format may be viewed as both a file format, which may define a data storage format, and a network format, which may define a data exchange format across a network.


[0025]
FIG. 2 shows an exemplary data structure for a unified data format, according to one embodiment. The overall format of a file 202 may include a header section 204, a records section 206, and a tail section 208. In one embodiment, the overall format 202 may have only a records section. The record format of a record within the records section 206 may also include a record header 210, a record body 212, and a record tail 214.
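The nesting described for FIG. 2 can be pictured with a short Python sketch. The class names, the use of raw `bytes` for each section, and the `to_bytes` helper are illustrative assumptions rather than part of the disclosed format; the sketch only shows that one serialized byte string can serve both as a file and as a network payload, and that an overall format with only a records section is still well formed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """One record: record header 210, record body 212, record tail 214."""
    header: bytes
    body: bytes
    tail: bytes

    def to_bytes(self) -> bytes:
        return self.header + self.body + self.tail


@dataclass
class OverallFormat:
    """Overall format 202: header 204, records 206, tail 208."""
    header: bytes = b""
    records: List[Record] = field(default_factory=list)
    tail: bytes = b""

    def to_bytes(self) -> bytes:
        # The same bytes may be written to storage or sent over a connection.
        body = b"".join(r.to_bytes() for r in self.records)
        return self.header + body + self.tail


# Placeholder section contents, chosen only for illustration.
stream = OverallFormat(
    header=b"HDR",
    records=[Record(b"h1", b"body1", b"t1"), Record(b"h2", b"body2", b"t2")],
    tail=b"END",
)
# An overall format that carries only a records section.
records_only = OverallFormat(records=stream.records)
```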


[0026]
FIG. 3 shows an exemplary detailed overall format for BDEF format. Rows 302 to 322 correspond to the header section 204, rows 324 to 332 correspond to the records section 206, and rows 334 to 336 correspond to the tail section 208. Some of these rows are described below:


[0027] OVERALL_FORMAT_ID 302 identifies the basic overall format. In one embodiment, the basic overall format is generic binary record format (GBRF). Any file viewer or editor may be used to examine the first four bytes of a file, or the OVERALL_FORMAT_ID 302, to determine the basic overall format, e.g., GBRF. As shown in FIG. 3, if the file is a GBRF-formatted file, the corresponding ID value should be set to the ASCII characters for GBRF.


[0028] OVERALL_FORMAT_VERSION 304 specifies what version of the basic overall format, e.g., GBRF, the file is based on.


[0029] RECORD_FORMAT_TYPE 306 and the RECORD_FORMAT_VERSION 308 specify the record format type, e.g., BDEF, and its version, e.g., 2, respectively. The prefix “0x” in “0x0002” indicates the hexadecimal representation of 2.


[0030] OVERALL_FORMAT_OPTIONS 310 specify format options. In one embodiment, two of the 16 bits may specify optional fields, and the other bits may be reserved, i.e., set to 0, for future format options.


[0031] EOR_SENTINEL 312 and EOS_SENTINEL 314 fields specify what sentinel values are used for the end of records and end of streams, respectively. EOR_SENTINEL 312 indicates the end of a record, and EOS_SENTINEL 314 indicates the end of an overall format. These values may not be fixed, so that GBRF-formatted files may be easily embedded into the records of other GBRF-formatted files, thereby providing for a recursive file structure.


[0032] TOTAL_RECORDS_SIZE 316, FIRST_RECORD_BYTE 318, and LAST_RECORD_BYTE 320 fields may be used for streams where the records are not in byte order, but are wrapped. Record wrapping is useful in situations where records are buffered locally in a fixed-size buffer and the amount of data is likely to exceed the buffer size. The wrapped records therefore keep only the latest records that will fit into the file, e.g., as in a FIFO queue.
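As a rough illustration of record wrapping, the Python sketch below restores time order from a wrapped records area. The exact meaning of the two offsets is not spelled out above, so the sketch assumes that FIRST_RECORD_BYTE is the zero-based offset of the oldest record's first byte and that LAST_RECORD_BYTE is the zero-based offset of the newest record's final byte. The function name `unwrap_records` is hypothetical.

```python
def unwrap_records(area: bytes, first_record_byte: int, last_record_byte: int) -> bytes:
    """Return the records area in time order (oldest byte first).

    Assumption: both offsets are zero-based positions inside `area`, and
    `last_record_byte` points at the final byte of the newest record.
    """
    if first_record_byte <= last_record_byte:
        # Not wrapped: the records already sit in byte order.
        return area[first_record_byte:last_record_byte + 1]
    # Wrapped: the oldest data runs to the end of the buffer and the
    # newest data continues from the start of the buffer.
    return area[first_record_byte:] + area[:last_record_byte + 1]


# An 8-byte buffer that first held "ABCDEFGH"; the writer then wrapped
# around and overwrote the two oldest bytes with "IJ".
wrapped = b"IJCDEFGH"
in_time_order = unwrap_records(wrapped, first_record_byte=2, last_record_byte=1)
```

Under these assumptions a reader can process `in_time_order` exactly as it would an unwrapped records section.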


[0033] END_OF_HEADER 322 may be used as a simple data integrity check after parsers have parsed the header information.


[0034] CRC field 334 at the end of the stream provides an application-level data integrity check. If the file is wrapped, then the CRC is calculated in time order. Data parsers may have to support this feature, but application writers may have the option of not implementing it.


[0035] END_OF_STREAM 336 provides another data integrity check. It may be used for checking non-reliable transports, for determining the end in real-time transports, and for data integrity checking in the GBRF recursive format.



EXAMPLE 1

[0036] The following example shows a header with no options, which requires 19 hexadecimal bytes to implement, e.g., “47 42 52 46 00 42 44 45 46 00 01 02 00 00 FA CE DE AD C0 DE.”


[0037] Referring to FIG. 5, the constituent bytes are defined as:


[0038] bytes 0 to 3 define OVERALL_FORMAT_ID 302, e.g., GBRF


[0039] byte 4 defines OVERALL_FORMAT_VERSION 304


[0040] bytes 5 to 8 define RECORD_FORMAT_TYPE 306, e.g., BDEF


[0041] bytes 9 to 10 define RECORD_FORMAT_VERSION 308


[0042] bytes 11 to 12 define OVERALL_FORMAT_OPTIONS 310, e.g., no options


[0043] bytes 13 to 14 and 15 to 16 define EOR_SENTINEL 312 and the EOS_SENTINEL 314, respectively


[0044] bytes 17 to 18 define END_OF_HEADER 322


[0045] The same header but with the CRC option enabled also requires 19 hexadecimal bytes, e.g., “47 42 52 46 00 42 44 45 46 00 01 02 00 01 FA CE DE AD C0 DE,” where bytes 11-12 define OVERALL_FORMAT_OPTIONS 310, e.g., B0=1, meaning that CRC is present. A trailer with no CRC option may require two hexadecimal bytes “DE AD.”
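The byte positions listed for Example 1 can be exercised with a small Python sketch built on the standard `struct` module. Big-endian byte order, the treatment of B0 as the least significant bit of OVERALL_FORMAT_OPTIONS, and the names `HEADER`, `CRC_PRESENT`, and `parse_header` are assumptions made for illustration. The field values packed at the end are hypothetical and are not a transcription of the byte strings quoted above.

```python
import struct

# Assumed layout, following the byte positions listed for Example 1:
# 4-byte ID, 1-byte version, 4-byte record type, then five 2-byte fields.
# Big-endian byte order is an assumption.
HEADER = struct.Struct(">4sB4sHHHHH")

CRC_PRESENT = 0x0001  # assumed: B0 is the least significant options bit


def parse_header(data: bytes) -> dict:
    """Split a 19-byte header into its named fields."""
    (fmt_id, fmt_ver, rec_type, rec_ver,
     options, eor, eos, end_of_header) = HEADER.unpack(data[:HEADER.size])
    if fmt_id != b"GBRF":
        raise ValueError("not a GBRF stream")
    return {
        "overall_format_version": fmt_ver,
        "record_format_type": rec_type.decode("ascii"),
        "record_format_version": rec_ver,
        "crc_present": bool(options & CRC_PRESENT),
        "eor_sentinel": eor,
        "eos_sentinel": eos,
        "end_of_header": end_of_header,
    }


# Hypothetical field values chosen for illustration.
raw = HEADER.pack(b"GBRF", 0, b"BDEF", 2, CRC_PRESENT, 0xFACE, 0xDEAD, 0xC0DE)
header = parse_header(raw)
```

Because the first four bytes are plain ASCII, the same check on `fmt_id` is what a file viewer would show when inspecting the start of a stored stream.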


[0046]
FIGS. 4A and 4B show an exemplary BDEF record format, according to one embodiment. Rows 402 to 434 correspond to the record header 210, rows 436 to 440 correspond to the record body 212, and rows 444 to 446 correspond to the record tail 214. Some of these fields are described below.


[0047] REC_SIZE_SIZE 402, which greatly simplifies implementation, provides the number of bits that are allocated for REC_SIZE 404. A data parser may quickly read one byte to determine how many bytes are needed to complete the REC_SIZE 404. Once the parser reads the entire REC_SIZE 404 and decodes it, the parser allocates memory efficiently.


[0048] Using REC_SIZE_SIZE 402 facilitates skipping a record. Applications that write records may already know the data size, so writing records is also made simple. However, an application writer may not need to worry about REC_SIZE_SIZE field 402 when writing variable REC_SIZE data, if the record writer hard-codes the upper 2 bits of a 32-bit word and uses the 30-bit word size even for small values. Therefore, the record format disclosed herein is optimally flexible, and may not require extra implementation costs.
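A minimal Python sketch of the size prefix follows. The text gives the two end points (a single byte leaving 6 bits for REC_SIZE, and a 32-bit word leaving 30 bits); the sketch assumes that a REC_SIZE_SIZE value of n means the size field occupies n + 1 bytes in total, that fields are big-endian, and that REC_SIZE counts the bytes following the size field. All function names are hypothetical.

```python
def read_rec_size(data: bytes, offset: int = 0) -> tuple:
    """Decode REC_SIZE_SIZE and REC_SIZE starting at `offset`.

    Assumption: REC_SIZE_SIZE value n means the size field occupies
    n + 1 bytes in total, leaving 6, 14, 22, or 30 bits for REC_SIZE.
    Returns (rec_size, bytes_consumed).
    """
    first = data[offset]
    rec_size_size = first >> 6  # upper 2 bits of the first byte
    n_bytes = rec_size_size + 1
    word = int.from_bytes(data[offset:offset + n_bytes], "big")
    rec_size = word & ((1 << (8 * n_bytes - 2)) - 1)  # drop the 2 prefix bits
    return rec_size, n_bytes


def write_rec_size_fixed(rec_size: int) -> bytes:
    """Always emit a 32-bit word with the upper 2 bits hard-coded to 0b11."""
    if not 0 <= rec_size < (1 << 30):
        raise ValueError("REC_SIZE must fit in 30 bits")
    return ((0b11 << 30) | rec_size).to_bytes(4, "big")


def skip_record(data: bytes, offset: int) -> int:
    """Return the offset of the next record without decoding this one.

    Assumption: REC_SIZE counts the bytes that follow the size field.
    """
    rec_size, consumed = read_rec_size(data, offset)
    return offset + consumed + rec_size
```

A writer that always calls `write_rec_size_fixed` never has to choose a prefix width, while a reader using `read_rec_size` handles both the short and the fixed-width forms.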


[0049] END_OF_RECORD 444 may also be specified in the overall format header field EOR_SENTINEL 312 (FIG. 3), which the record writer may use.


[0050] Both the data managers (producers) and the data clients (consumers) may know the definitions for different record BODY fields a priori, because they may be defined and mapped to a certain REC_TYPE value. BODY definitions may also be determined by some protocol or from a stored special record. However, a REC_TYPE value that is not explicitly known to the data consumer may be treated as if the BODY definition were an integer word field with however many bits are available to the BODY field. By having unknown REC_TYPE values treated as arbitrary-sized words, the data producers may quickly define and implement new data types such that they work automatically with existing data consumers.
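The fallback for unknown record types might be sketched in Python as follows. The registry `BODY_DECODERS`, the decoder assigned to type 0x2, and the big-endian interpretation of the integer word are all assumptions made for illustration; no actual BODY definitions are given here.

```python
from typing import Callable, Dict

# Hypothetical registry mapping REC_TYPE values to BODY decoders.
BODY_DECODERS: Dict[int, Callable[[bytes], object]] = {
    0x2: lambda body: body.decode("ascii"),  # assumed: type 2 carries ASCII text
}


def decode_body(rec_type: int, body: bytes):
    """Decode BODY with a known definition, else as one integer word."""
    decoder = BODY_DECODERS.get(rec_type)
    if decoder is None:
        # Unknown type: treat the whole BODY as a single integer whose
        # width is however many bits the BODY holds (big-endian assumed).
        return int.from_bytes(body, "big")
    return decoder(body)
```

With this arrangement a producer can start emitting a new record type immediately; older consumers still obtain a usable integer value instead of rejecting the record.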


[0051] In one embodiment, the BDEF streams may contain a data type summary record as the first record, thus saving the pairing of REC_TYPE values to attribute names and definitions, which may change over time.



EXAMPLE 2

[0052] In an exemplary embodiment, for TIME=0x123456789AB, CARD_IP=0xABCDEF01, defining data source, AT_IP=0x12345678, defining data target, REC_TYPE=0x2, and 4-byte BODY consisting of the characters 0, 1, 2, and 3, the record format may require 22 hexadecimal bytes, e.g., “15 02 11 23 45 67 89 AB AB CD EF 01 12 34 56 78 30 31 32 33 FA CE.”


[0053] Referring to FIG. 6, the constituent bytes and/or bits are described below:


[0054] the first two bits of byte 0 define REC_SIZE_SIZE 402, e.g., 0


[0055] the last six bits of byte 0 define REC_SIZE 404, e.g., 21 (0x15), indicating that 21 bytes are required to define the rest of the record format


[0056] the first bit of byte 1 defines REC_FORMAT 406, e.g., 0


[0057] the second bit of byte 1 defines REC_TYPE_SIZE 408, e.g., 0, indicating that 6 bits are required to define REC_TYPE 410


[0058] the last six bits of byte 1 define REC_TYPE 410, e.g., 2


[0059] bytes 2-7 define DATA_TIME_STAMP 414


[0060] bytes 8-11 define DATA_CARD_IP 416, defining data source IP address


[0061] bytes 12-15 define DATA_AT_IP 418, defining data target IP address


[0062] bytes 16-19 define data characters


[0063] bytes 20-21 define END_OF_RECORD 446
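The byte positions listed for Example 2 can be checked with a short Python sketch that decodes the example byte string as printed. Big-endian field order and the dictionary key names are assumptions; the sketch handles only the layout used in this example (one-byte size, one-byte type, and the three header fields listed) and is not a general BDEF parser.

```python
RECORD = bytes.fromhex(
    "15 02 11 23 45 67 89 AB AB CD EF 01 12 34 56 78 30 31 32 33 FA CE"
)


def parse_example_record(rec: bytes) -> dict:
    """Decode the single-byte-size layout used in Example 2.

    Assumptions: big-endian fields, one-byte REC_SIZE and REC_TYPE,
    and no optional header fields beyond those listed for the example.
    """
    return {
        "rec_size_size": rec[0] >> 6,           # first two bits of byte 0
        "rec_size": rec[0] & 0x3F,              # last six bits of byte 0
        "rec_format": rec[1] >> 7,              # first bit of byte 1
        "rec_type_size": (rec[1] >> 6) & 0x1,   # second bit of byte 1
        "rec_type": rec[1] & 0x3F,              # last six bits of byte 1
        "time_stamp": int.from_bytes(rec[2:8], "big"),
        "card_ip": int.from_bytes(rec[8:12], "big"),
        "at_ip": int.from_bytes(rec[12:16], "big"),
        "body": rec[16:20],
        "end_of_record": int.from_bytes(rec[20:22], "big"),
    }


fields = parse_example_record(RECORD)
```

Here `fields["rec_size"]` equals the number of bytes that follow byte 0, and `fields["body"]` holds the ASCII characters 0, 1, 2, and 3.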


[0064] An exemplary embodiment for data client 104, such as a cell phone or a personal digital assistant (PDA), or for data manager 102 operating in the system of FIG. 1 is illustrated in FIG. 7. The system in FIG. 7 includes antenna 702 for transmitting and receiving data. Antenna 702 is coupled to duplexer 704 for isolating the receiver path from the transmitter path. The duplexer 704 is coupled to receiver circuitry 706, forming the receiver path, and is coupled to amplifier 708 and transmit circuitry 710, forming the transmitter path. Amplifier 708 is further coupled to power control unit 712, which controls amplifier 708. Amplifier 708 receives the transmission signals from transmit circuitry 710. Signals received via antenna 702 are provided to power control unit 712, which may implement a closed loop power control scheme. Power control unit 712 is coupled to communication bus 714. Communication bus 714 provides a common connection among other modules in FIG. 7. Communication bus 714 is further coupled to memory unit 716. Memory unit 716 stores computer readable instructions for a variety of operations and functions applicable to data client 104 or data manager 102. The processor 718 performs the instructions stored in memory unit 716.


[0065] The word “exemplary” is used exclusively herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.


[0066] Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.


[0067] Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.


[0068] The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.


[0069] The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.


[0070] The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.


Claims
  • 1. An apparatus for exchanging data in a network, comprising: a receiver configured to receive data from a data producer over a network connection; and a transmitter configured to send the data to a storage unit, wherein the data received over the network connection and the data stored in the storage unit have the same format.
  • 2. The apparatus of claim 1 wherein the data includes a records section.
  • 3. The apparatus of claim 1 wherein the data includes a header section, a records section, and a tail section.
  • 4. The apparatus of claim 3 wherein a record in the records section includes a record header, a record body, and a record tail.
  • 5. The apparatus of claim 1 wherein the network includes a wireless network.
  • 6. The apparatus of claim 1 wherein the network includes an optical network.
  • 7. A method for formatting data in a network, comprising: providing a header section for the data; providing a records section for the data; and providing a tail section for the data, such that the data is applicable for both data exchange and data storage.
  • 8. The method of claim 7 further providing for a record in the records section a record header, a record body, and a record tail.
  • 9. The method of claim 7 further providing information identifying a format for the data.
  • 10. The method of claim 7 further providing information identifying a version for the data.
  • 11. The method of claim 7 further providing information identifying a record-format type.
  • 12. The method of claim 7 further providing information identifying a record-format version.
  • 13. The method of claim 7 further providing information identifying an option for the data.
  • 14. The method of claim 7 further providing information identifying the end of the data.
  • 15. The method of claim 7 further providing information identifying a size for the records section.
  • 16. The method of claim 7 further providing information identifying the first record in the records section.
  • 17. The method of claim 7 further providing information identifying the last record in the records section.
  • 18. The method of claim 7 further providing information identifying the end of the header section.
  • 19. The method of claim 7 further providing information about CRC.
  • 20. The method of claim 7 further providing information identifying a size of a record in the records section.
  • 21. The method of claim 7 further providing a size of a record in the records section.
  • 22. The method of claim 7 further providing information identifying a format for a record in the records section.
  • 23. The method of claim 7 further providing information identifying a size of a record type in the records section.
  • 24. The method of claim 23 further providing a record type of a record in the records section.
  • 25. The method of claim 7 further providing information identifying a format for the data.
  • 26. The method of claim 7 further providing a time stamp for the data.
  • 27. The method of claim 7 further providing information identifying a source of the data subsystem in the network.
  • 28. The method of claim 7 further providing information identifying a target for the data access terminal in the network.
  • 29. The method of claim 7 further providing information identifying a sector in a subsystem in the network.
  • 30. The method of claim 7 further providing information identifying CRC for a record in the records section.
  • 31. The method of claim 7 further providing information identifying the end of a record in the records section.
  • 32. An apparatus for formatting a data in a network, comprising: means for providing a header section; means for providing a records section; and means for providing a tail section, such that the data is applicable for both data exchange and data storage.
  • 33. A computer readable medium embodying a method for formatting a data in a network, the method comprising: providing a header section; providing a records section; and providing a tail section, such that the data is applicable for both data exchange and data storage.
  • 34. An apparatus for formatting a data in a network, comprising: a memory unit; a digital signal processor (DSP) unit communicatively coupled to the memory unit, the DSP unit being capable of: providing a header section; providing a records section; and providing a tail section, such that the data is applicable for both data exchange and data storage.
  • 35. A method for processing data in a network including a data client and a data manager, comprising: receiving, at the data client, a data from the data manager over a network connection; and storing the data in a storage unit, wherein the data received over the network connection and the data stored in the storage unit have the same format.
  • 36. A computer readable medium embodying a method for processing data in a network including a data client and a data manager, the method comprising: receiving, at the data client, a data from the data manager over a network connection; and storing the data in a storage unit, wherein the data received over the network connection and the data stored in the storage unit have the same format.
  • 37. An apparatus for processing data in a network of data processing systems including a data client and a data manager, comprising: means for receiving, at the data client, a data from the data manager over a network connection; and means for storing the data in a storage unit, wherein the data received over the network connection and the data stored in the storage unit have the same format.
  • 38. An apparatus for processing data in a network including a data client and a data manager, comprising: a memory unit; and a digital signal processor (DSP) unit communicatively coupled to the memory unit, the DSP unit being capable of: receiving, at the data client, a data from the data manager over a network connection; and storing the data in a storage unit, wherein the data received over the network connection and the data stored in the storage unit have the same format.
  • 39. A data structure for formatting data in a network, the data structure comprising: a header section for the data; a records section for the data; and a tail section for the data, such that the data is applicable for both data exchange and data storage.
  • 40. The data structure of claim 39 further including a record header, a record body, and a record tail for the records section.