The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent the work is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
The popularity of online social media services has led to explosive growth of microblog data. For example, every day, hundreds of millions of Twitter users post hundreds of millions of tweets, and more than one billion Facebook users post several billion comments. The microblog data can include various types of information such as text, location information, and user information. This information enables analysis tasks from which useful conclusions can be drawn for various purposes.
Aspects of the disclosure provide a system for visualizing microblog data. The system can include circuitry that is configured to receive a request for visual report from a user device, extract the selected microblog data from a database based on the request for visual report, create a pyramid data structure having a plurality of cells at different levels for data visualization based on microblog data within spatial and temporal ranges selected by a user, and create a visual report including a plurality of visual report interfaces based on the data structure.
In an embodiment, the request for visual report from a user device includes spatial and temporal ranges of selected microblog data and a categorical attribute to be analyzed. In an example, the categorical attribute to be analyzed is a language attribute, and the visual report shows counts and percentages of each language in each sub-region of a region corresponding to the spatial range of the selected microblog data and is used for language attribute analysis. In another embodiment, the categorical attribute to be analyzed is a source attribute, and the visual report shows counts and percentages of each operating system in each sub-region of a region corresponding to the spatial range of the selected microblog data and is used for source attribute analysis.
In an embodiment, the circuitry can be configured to compute counts of attribute categories of microblog data in each cell and store the counts of attribute categories in a hash table to create the pyramid data structure. The counts of attribute categories can be based on distinct users or distinct microblogs.
In an embodiment, the visual report interfaces can be based on a map having a plurality of zoom levels, and each zoom level corresponds to a level in the pyramid data structure. In addition, counts of attribute categories in a cell in the pyramid data structure can be displayed using a chart overlaid with a region in the map, and the region corresponds to the cell in the pyramid data structure. In an example, the visual report interfaces can include functions for a user to select the attribute categories to be displayed and choose to display the visual report interface based on distinct users or distinct microblogs.
In an embodiment, the circuitry is further configured to generate a data selection interface for a user to select spatial and temporal ranges of microblog data and a categorical attribute to be analyzed for the visual report.
In an embodiment, the circuitry is further configured to send an email to the user device, and the email can include a hyperlink to the visual interfaces generated at the visual interface generator.
Aspects of the disclosure provide a method for visualizing microblog data. The method is implemented by a system having circuitry, and can include receiving a request for visual report from a user device, extracting the selected microblog data from a database based on the request for visual report, creating a pyramid data structure having a plurality of cells at different levels for data visualization based on microblog data within spatial and temporal ranges selected by a user, and creating a visual report including a plurality of visual report interfaces based on the data structure.
Various embodiments of this disclosure that are proposed as examples will be described in detail with reference to the following figures, wherein like numerals reference like elements, and wherein:
The network 102 can be any network that allows the user 104 and the server 100 to communicate with each other, such as a wide area network, a local area network, or the Internet. The server 100 may include a CPU and a memory. The server 100 may represent one or more servers connected via the network 102.
Generally, data visualization is the presentation of data in a pictorial or graphical format, such as charts and maps. Data visualization enables users to see analytical results of data and makes complex data more understandable and usable.
In an embodiment, the data to be visualized is microblog data generated from a microblogging service, such as the Twitter microblogging service, Facebook, and the like. The microblog data can include numerous microblog entries. In addition to content generated by a user of the microblogging service, in an embodiment, each microblog entry of the microblog data can include a plurality of attributes, such as a spatial attribute, a temporal attribute, and multiple categorical attributes. The spatial attribute can include a location where the user posts the microblog, while the temporal attribute can include a time when the user posts the microblog. The categorical attributes can include, for example, a language attribute and a source attribute. The language attribute can indicate in which natural language the microblog is written, while the source attribute can indicate from which type of operating system (OS), device, or application the microblog is posted. The categorical attributes are important sources for microblog data analysis and can be exploited to draw useful conclusions from the microblog data. For example, the language attribute or the source attribute, along with geolocation information, enables various analyses, such as analysis of microblog user language usage, the spread of different devices in an area, standards of living in different regions, and the like.
Each of the categorical attributes can take one of multiple discrete values and each discrete value can indicate a particular category of the categorical attribute, referred to as an attribute category. For example, for a microblog posted in English, the language attribute can take a value indicative of English language, and the attribute category of the language attribute for this microblog is English language. For a microblog posted in Arabic, the language attribute can take a value indicative of Arabic language, and the attribute category of the language attribute for this microblog is Arabic language. Similarly, the source attribute can have different attribute categories for different operating systems, such as an Android attribute category for Android OS, an iOS attribute category for iOS operating system, and the like.
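As a non-limiting illustration, a microblog entry carrying the spatial, temporal, and categorical attributes described above can be sketched as follows. The class name, field names, and helper function are hypothetical and are chosen only for this example; the disclosure does not prescribe any particular record layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MicroblogEntry:
    """One microblog entry; all field names are illustrative assumptions."""
    user_id: str          # identifies the posting user
    text: str             # content generated by the user
    lat: float            # spatial attribute: latitude of the post
    lon: float            # spatial attribute: longitude of the post
    posted_at: datetime   # temporal attribute: time of the post
    language: str         # categorical attribute, e.g. "English" or "Arabic"
    source: str           # categorical attribute, e.g. "Android" or "iOS"


def attribute_category(entry: MicroblogEntry, attribute: str) -> str:
    """Return the attribute category of the named categorical attribute."""
    if attribute not in ("language", "source"):
        raise ValueError(f"unknown categorical attribute: {attribute}")
    return getattr(entry, attribute)
```

In this sketch, each categorical attribute holds exactly one discrete value per entry, and that value is the attribute category of the entry.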
According to an aspect of the disclosure, a user can use the system 200 to visualize and analyze categorical attributes of microblog data. For example, through a data selection interface provided by the system 200, the user can send a request for visual report to the system 200 and the request can include arbitrary spatial and temporal ranges of selected microblog data and a particular categorical attribute to be analyzed. Based on the request, the system 200 can extract selected microblog data from the database system 230. Then, the system 200 can create a data structure that provides a specific scheme for storing data to be visualized.
Next, counts of attribute categories are computed for different regions and stored in the data structure. Specifically, a count of an attribute category can be based on distinct microblogs or distinct users. A count of an attribute category based on distinct microblogs is the count of microblogs belonging to each attribute category of the selected data set, while a count of an attribute category based on distinct users is the count of users who post microblogs belonging to each attribute category of the selected data set. Subsequently, the system 200 can generate a visual report and transmit the visual report to the user. The visual report can include multiple visual report interfaces. In each visual report interface, counts of attribute categories retrieved from the data structure can be visually presented, for example, using various charts overlaid with a map, such as a Google map.
The user device 210 can be a computer, such as a desktop computer, a laptop computer, a tablet computer, a mobile phone, and the like. The user device 210 can communicate remotely with the system 200 via a communication network (not shown). The communication network can be a local area network (LAN), such as an Ethernet network, a Wi-Fi network, and the like, or a wide area network (WAN), such as the Internet, a third generation (3G) wireless mobile network, a fourth generation (4G) wireless network, and the like. In an alternative embodiment, the device 210 and part or all of the components of the system 200 can be integrated into one system, and the functions previously performed by the device 210 can be implemented as a module in the system 200. Thus, the module can communicate with other components of the system 200 locally.
The request receiver 220 can include a data selection interface 221. As a response to an initial request from the user device 210, the request receiver 220 can send the data selection interface to the user device 210. Subsequently, the request receiver 220 can receive the request for visual report from the user device and transmit the selected data ranges and the categorical attribute to be analyzed to the database system 230.
The database system 230 can include a storage and a query engine. The storage can be configured to store the microblog data, and the query engine can be used to extract microblog data according to the spatial and temporal ranges selected by the user. For example, in an embodiment, the microblog data is Twitter data, and the user chooses to analyze language attribute of Twitter data in the region of Gulf Arab states during the period from December 2013 to February 2014. Thus, Twitter data in the region of Gulf Arab states during the period from December 2013 to February 2014 is extracted from the database. The extracted data can then be transmitted to the data structure generator 240 for further processing.
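As a non-limiting illustration, the extraction performed by the query engine can be sketched as a filter over a rectangular spatial range and a half-open temporal range. The type names, field names, and the choice of a latitude/longitude bounding box are assumptions made for this example; an actual query engine may use spatial and temporal indexes rather than a linear scan.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    lat: float
    lon: float
    posted_at: datetime
    language: str


class BoundingBox(NamedTuple):
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


def extract(entries: Iterable[Entry], box: BoundingBox,
            start: datetime, end: datetime) -> List[Entry]:
    """Keep entries inside the box and within [start, end)."""
    return [
        e for e in entries
        if box.min_lat <= e.lat <= box.max_lat
        and box.min_lon <= e.lon <= box.max_lon
        and start <= e.posted_at < end
    ]
```

For the period from December 2013 to February 2014, the temporal range in this sketch would run from `datetime(2013, 12, 1)` up to, but not including, `datetime(2014, 3, 1)`.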
The database system 230 can communicate with a remote computer 203 via a network 202 to obtain the microblog data. In an embodiment, the database system 230 can use an application program interface provided by a microblog service provider to obtain microblog data from the microblog service provider's database. For example, the database system 230 can use Twitter Streaming Application Program Interfaces (APIs) provided by Twitter, Inc. to receive a Twitter microblog data stream from a remote server. Specifically, the database system 230 can use a local client application to send a request to the remote server to set up an HTTP connection. The remote server can then retrieve microblog data from a database inside a network of Twitter, Inc. and transmit the Twitter microblog data to the database system 230. In an example, the Twitter microblog data is transmitted in real time while the Twitter users are posting microblogs using the Twitter service. In an alternative embodiment, the database system 230 can access a microblog service provider's database using account information of one or more users who register for the microblog service provider's service to obtain the microblog data posted by the users.
The network 202 can be a wide area network (WAN), such as the Internet, a third generation (3G) wireless mobile network, a fourth generation (4G) wireless network, and the like.
Based on the extracted data received from the database system 230, the data structure generator 240 can create the data structure to store counts of different attribute categories of the selected categorical attribute. In an embodiment, an adaptive pyramid data structure is created to store counts of different attribute categories at different levels of granularity. Particularly, the pyramid data structure can be created through the following two phases: a structuring phase and a computation phase. During the structuring phase, the pyramid data structure is initialized as one root cell that covers the whole region corresponding to the selected spatial range and contains all the microblog entries in the extracted data. The root cell is then divided into multiple disjoint child cells, each covering a sub-region that is a portion of the whole region. The microblogs in the root cell are replicated in its child cells according to their spatial locations indicated by the spatial attribute of each microblog entry. Any child cell that has a number of microblogs larger than a predetermined capacity threshold is further divided into multiple child cells. The process is repeated recursively for each cell until the count of microblogs in each leaf cell is less than or equal to the capacity threshold. At the end of the structuring phase, the pyramid data structure containing microblog entries is created.
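As a non-limiting illustration, the structuring phase can be sketched as a recursive subdivision. The sketch assumes that each cell is divided into four equal quadrants and adds a maximum depth so that recursion terminates when many microblogs share one location; neither assumption is stated in the disclosure, which only requires multiple disjoint cells and a capacity threshold. The names `Cell`, `structure`, and `leaves` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) location taken from the spatial attribute


@dataclass
class Cell:
    x0: float
    y0: float
    x1: float
    y1: float
    level: int = 0
    points: List[Point] = field(default_factory=list)
    children: List["Cell"] = field(default_factory=list)


def structure(cell: Cell, capacity: int, max_level: int = 16) -> Cell:
    """Recursively divide any cell holding more than `capacity` points."""
    if len(cell.points) <= capacity or cell.level >= max_level:
        return cell  # leaf cell
    mx = (cell.x0 + cell.x1) / 2
    my = (cell.y0 + cell.y1) / 2
    nxt = cell.level + 1
    quads = [
        Cell(cell.x0, cell.y0, mx, my, nxt),  # low x, low y
        Cell(mx, cell.y0, cell.x1, my, nxt),  # high x, low y
        Cell(cell.x0, my, mx, cell.y1, nxt),  # low x, high y
        Cell(mx, my, cell.x1, cell.y1, nxt),  # high x, high y
    ]
    # Replicate each point of the parent into the quadrant that contains it.
    for x, y in cell.points:
        quads[(1 if x >= mx else 0) + (2 if y >= my else 0)].points.append((x, y))
    cell.children = [structure(q, capacity, max_level) for q in quads]
    return cell


def leaves(cell: Cell) -> List[Cell]:
    """Collect the leaf cells below (or equal to) the given cell."""
    if not cell.children:
        return [cell]
    return [leaf for child in cell.children for leaf in leaves(child)]
```

Because only crowded cells are divided, dense sub-regions end up deeper in the pyramid than sparse ones, which is what makes the structure adaptive.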
During the computation phase, the counts of each attribute category for the microblog data in each pyramid cell, either a leaf cell or a non-leaf cell, are computed and stored in the cell. When computing the counts based on distinct microblogs, as described above, every individual microblog in the cell is considered even if multiple microblogs are posted by the same user. In contrast, when computing the counts based on distinct users, all microblogs from the same user are considered only once. In addition, microblog entries are removed from the pyramid data structure after the computation.
Each cell stores the counts of attribute categories, either based on distinct users or distinct microblogs, in hash tables with the attribute categories as keys and the corresponding counts as values. The hash tables can be based on distinct microblogs or based on distinct users. For example, if a certain cell has 80 microblogs from the iOS operating system posted by 40 users, 60 microblogs from the Android operating system posted by 30 users, and 40 microblogs from the Windows Mobile operating system posted by 20 users, then the distinct microblog based hash table can contain three pairs of <iOS, 80>, <Android, 60>, and <Windows, 40>, while the distinct user based hash table can contain three pairs of <iOS, 40>, <Android, 30>, and <Windows, 20>. At the end of the computation phase, the pyramid data structure containing the counts based on distinct users or microblogs can be obtained, and stored into the storage module 250.
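As a non-limiting illustration, the two hash tables of a cell can be sketched with Python dictionaries. The function name and the representation of each microblog as a `(user_id, category)` pair are assumptions made for this example. In this sketch, a user who posts in more than one attribute category is counted once in each of those categories.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_categories(
    entries: List[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (counts by distinct microblogs, counts by distinct users).

    Each entry is a (user_id, category) pair for one microblog in the cell.
    """
    # Every microblog contributes once to its category.
    by_microblog = Counter(category for _, category in entries)
    # Deduplicating the pairs makes each user contribute once per category.
    by_user = Counter(category for _, category in set(entries))
    return dict(by_microblog), dict(by_user)
```

Applied to the example above, with each user posting two microblogs, the function yields the pairs <iOS, 80>, <Android, 60>, and <Windows, 40> in the first table and <iOS, 40>, <Android, 30>, and <Windows, 20> in the second.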
The storage module 250 can include nonvolatile storage and volatile storage. The nonvolatile storage can be hard disks, flash memory, optical discs, and the like, while the volatile storage can be random access memory (RAM) and the like. The storage module 250 can store the data structure created at the data structure generator 240, for example, in a disk, so that the data structure can be used for visualization operations at the visual interface generator 260.
The visual interface generator 260 can generate a visual report to visualize the content of the data structure stored in the storage module 250. The visual report can comprise multiple visual report interfaces 261. In an example, the report interfaces visualize the content of the pyramid structure on a map-based interface at different zoom levels where each map level corresponds to a level in the pyramid data structure. In addition, the report interfaces are interactive interfaces and a user can change the zoom level to be displayed. Further, the user can change the basis of the counts of the attribute categories between distinct-user-based and distinct-microblog-based.
In an example, the visual report interfaces are generated as web pages based on HTML and the visual interface generator 260 can communicate with the user device using HTTP.
In an embodiment, triggered by the data structure generator 240 after the data structure is created, the visual interface generator 260 can load the data structure from the disk in the storage module 250 to a memory (not shown), such as a random access memory (RAM). Subsequently, the visual interface generator 260 can read the counts of the attribute categories from different cells in a level of the pyramid data structure and display the counts, for example, in different pie charts overlaid with different regions of a map. Specifically, each zoom level of the map corresponds to a level of the pyramid data structure, and each region in the map corresponds to a cell in the level of the pyramid data structure. In an example, an initial visual report interface can display the counts of attribute categories with a default map zoom level, and subsequently the succeeding visual report interfaces can be displayed according to the user's choice of map zoom levels.
In an embodiment, after an initial visual report interface is generated, the visual interface generator 260 can cause the email sender 270 to send an email to the user. The email can include a hyperlink directed to the initial visual report interface. The user can access the visual report by selecting the hyperlink in the email. In another embodiment, the visual interface generator 260 can directly send the initial visual report interface to the user device 210 as a response to the user's request for visual report without sending the email. Subsequently, the visual interface generator 260 can send succeeding visual report interfaces to the user device 210 according to the user's choice of map zoom levels or other options.
The email sender 270 can be configured to send emails to an email address provided by the user when submitting the request for visualization from the user device 210 to the request receiver 220 as described earlier.
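As a non-limiting illustration, the email carrying the hyperlink can be composed with the `email.message.EmailMessage` class of the Python standard library. The function name, the sender address, the subject line, and the message wording are placeholders assumed for this example; transmission of the message, for example through an SMTP server, is omitted.

```python
from email.message import EmailMessage


def build_report_email(to_addr: str, report_url: str,
                       from_addr: str = "reports@example.com") -> EmailMessage:
    """Compose a plain-text email that links to the initial report interface."""
    msg = EmailMessage()
    msg["From"] = from_addr          # placeholder sender address
    msg["To"] = to_addr              # address supplied with the user's request
    msg["Subject"] = "Your visual report is ready"
    msg.set_content(f"The visual report can be viewed at:\n{report_url}\n")
    return msg
```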
It should be appreciated that one or more components in the system 200 could be combined into a single component providing aggregate functionality. For example, the request receiver 220 and the visual interface generator 260 can be combined as a single front end component interacting with the user device 210 to receive the user request for visual report and supply visual report interfaces to the user.
In addition, different languages can be represented with different colors in the pie charts. The interface 300B can include an attribute category selection function 320 where types of language are listed and can be included or excluded selectively such that the user can compare the aggregates of any combination of the languages.
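As a non-limiting illustration, the effect of including or excluding attribute categories can be sketched as a filter over the counts of one cell, followed by a percentage computation for the pie chart. The function name is hypothetical, and computing each percentage relative to the selected categories only, rather than to all categories in the cell, is an assumption made for this example.

```python
from typing import Dict, Set, Tuple


def pie_slices(counts: Dict[str, int],
               selected: Set[str]) -> Dict[str, Tuple[int, float]]:
    """Map each selected category to (count, percentage of selected total)."""
    kept = {c: n for c, n in counts.items() if c in selected}
    total = sum(kept.values())
    return {
        c: (n, 100.0 * n / total if total else 0.0)
        for c, n in kept.items()
    }
```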
Further, the zoom level of the map in the interface 300B can be adjusted such that finer granularity aggregate counts of attribute categories can be shown in smaller regions. Specifically, when the user changes the zoom level, a parameter indicative of the zoom level can be transmitted to the visual interface generator 260. According to the zoom level, the level in the pyramid data structure can be determined at the visual interface generator 260. Accordingly, for each cell in that determined level, the name of the sub-region and the counts of each attribute category can be obtained. These names and counts are fed to a webpage, which is, for example, written in HTML and uses Google Maps API Web Services and Google Chart libraries, to generate an interface with the changed zoom level. The interface is transmitted to the user device 210 and presented to the user.
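As a non-limiting illustration, determining the pyramid level from a zoom level and collecting the cells of that level can be sketched as follows. The linear mapping from zoom level to pyramid level, the `base_zoom` parameter, and the dictionary layout of a cell are assumptions made for this example; the disclosure only states that each zoom level corresponds to a level in the pyramid data structure. Where a branch of the adaptive pyramid ends above the requested level, the sketch returns the leaf cell of that branch.

```python
from typing import Any, Dict, List

CellDict = Dict[str, Any]  # keys assumed here: "name", "counts", "children"


def pyramid_level_for_zoom(zoom: int, base_zoom: int, max_level: int) -> int:
    """Map a map zoom level to a pyramid level, clamped to [0, max_level]."""
    return max(0, min(max_level, zoom - base_zoom))


def cells_at_level(cell: CellDict, level: int, depth: int = 0) -> List[CellDict]:
    """Return the cells at `level`, or the leaf where a branch ends earlier."""
    if depth == level or not cell["children"]:
        return [cell]
    found: List[CellDict] = []
    for child in cell["children"]:
        found.extend(cells_at_level(child, level, depth + 1))
    return found
```

The name and counts of each returned cell can then be supplied to the webpage that renders the charts over the map.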
Additionally, the interface 300B can include a function 316 through which the user can choose to display the aggregate counts of attribute categories based on distinct users or distinct microblogs. In the
At S410, a request for visual report can be received from a user device at a request receiver. The request for visual report can be generated based on a data selection interface where the user can choose the spatio-temporal ranges of the microblog data and a categorical attribute to be analyzed. The data selection interface can be generated at the request receiver and transmitted to the user device as a response to an initial request from the user device. After receiving the request for visual report, the request receiver can send the data selection requirement to a database system.
At S420, based on the data selection requirement, selected microblog data can be extracted from the database system. Data stored in the database system can be microblog data, such as Twitter tweets, Facebook comments, and the like. The selected microblog data is supplied to a data structure generator.
At S430, a data structure can be created at the data structure generator. In an embodiment, the data structure is a pyramid data structure. The pyramid data structure can be created through a structuring phase and a computation phase. During the structuring phase, in a recursive manner, a parent cell corresponding to a region is divided into multiple child cells, each corresponding to a portion of the region, until the count of microblogs in each leaf cell is less than or equal to a capacity threshold. During the computation phase, aggregate counts of each attribute category in each cell are computed based on distinct users or distinct microblogs. The counts can be stored in two hash tables, one for distinct microblogs and the other for distinct users, with attribute categories as keys and counts of microblogs or users of each attribute category as values.
In an example, the created data structure is stored in a hard disk to be used by a visual interface generator.
At S440, visual report interfaces are generated for visualizing the microblog data. In an example of the visual report interface, the counts of attribute categories of selected microblog data in different regions are displayed using pie charts overlaid with a map. The visual interface generator can interactively generate interfaces according to requests received from the user device. For example, based on user requests, the interfaces can display the map at different zoom levels, display different combinations of attribute categories, and display the aggregate counts based on distinct users or distinct microblogs.
At S450, visual interfaces are transmitted to the user device. In an example, an email including a link to the visual report is sent to the user after the visual report interface is generated. When the user clicks the link, the visual report interface is transmitted to the user device. In another example, the visual report interface can be directly transmitted to the user as a response to the original request for visual report from the user.
While, for purposes of simplicity of explanation, the process 400 is shown and described as a series of steps, it is to be understood that, in various embodiments, the steps may occur in different orders and/or concurrently with other steps from what is described above. Moreover, not all illustrated steps may be required to implement the process described above.
The system and the process described above can be implemented with any suitable software or hardware. In an embodiment, the system and the process can be implemented as an application program comprised of computer-executable instructions that can be stored in a computer-readable medium and can run on one or more computers. In alternative embodiments, the system and the process can be implemented in combination with other programs, such as operating systems and program modules of other applications, or can be implemented as a combination of hardware and software.
In addition, the system and the process described above can be implemented with various suitable computer system configurations, such as single-processor or multiprocessor computer systems, minicomputers, mainframe computers, as well as personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, and the like.
Further, the system and process described above can also be implemented in distributed computing environments. In a distributed computing environment, program modules can be located in both local and remote memory storage devices, and certain functions can be performed by remote processing devices that are linked to local processing devices through a communications network.
Still further, information such as computer-readable instructions, data structures, program modules or other data can be stored in a variety of computer storage media, such as random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, compact disc read-only memory (CD-ROM), digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage, and the like.
Further, the claimed advancements may be provided as a utility application, background daemon, or component of an operating system, or combination thereof, executing in conjunction with CPU 501 and an operating system such as Microsoft Windows 7, UNIX, Solaris, LINUX, Apple MAC-OS and other systems known to those skilled in the art.
The hardware elements of the computer 500 may be realized by various circuitry elements, known to those skilled in the art. For example, CPU 501 may be a Xeon or Core processor from Intel of America or an Opteron processor from AMD of America, or may be other processor types that would be recognized by one of ordinary skill in the art. Alternatively, the CPU 501 may be implemented on an FPGA, ASIC, PLD or using discrete logic circuits, as one of ordinary skill in the art would recognize. Further, CPU 501 may be implemented as multiple processors cooperatively working in parallel to perform the instructions of the inventive processes described above.
The computer 500 in
The computer 500 further includes a display controller 508, such as an NVIDIA GeForce GTX or Quadro graphics adaptor from NVIDIA Corporation of America, for interfacing with display 510, such as a Hewlett Packard HPL2445w LCD monitor. A general purpose I/O interface 512 interfaces with a keyboard and/or mouse 514 as well as a touch screen panel 516 on or separate from display 510. The general purpose I/O interface 512 also connects to a variety of peripherals 518 including printers and scanners, such as an OfficeJet or DeskJet from Hewlett Packard.
A sound controller 520 is also provided in the computer 500, such as Sound Blaster X-Fi Titanium from Creative, to interface with speakers/microphone 522 thereby providing sounds and/or music.
The general purpose storage controller 524 connects the storage medium disk 504 with communication bus 526, which may be an ISA, EISA, VESA, PCI, or similar, for interconnecting all of the components of the computer 500. A description of the general features and functionality of the display 510, keyboard and/or mouse 514, as well as the display controller 508, storage controller 524, network controller 506, sound controller 520, and general purpose I/O interface 512 is omitted herein for brevity as these features are known.
The exemplary circuit elements described in the context of the present disclosure may be replaced with other elements and structured differently than the examples provided herein. Moreover, circuitry configured to perform features described herein may be implemented in multiple circuit units (e.g., chips), or the features may be combined in circuitry on a single chip.
The PCI devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. The hard disk drive 660 and CD-ROM 666 can use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. In one implementation, the I/O bus can include a super I/O (SIO) device.
Further, the hard disk drive (HDD) 660 and optical drive 666 can also be coupled to the SB/ICH 620 through a system bus. In one implementation, a keyboard 670, a mouse 672, a parallel port 678, and a serial port 676 can be connected to the system bus through the I/O bus. Other peripherals and devices can be connected to the SB/ICH 620 using a mass storage controller such as SATA or PATA, an Ethernet port, an ISA bus, an LPC bridge, an SMBus, a DMA controller, and an audio codec.
Moreover, the present disclosure is not limited to the specific circuit elements described herein, nor is the present disclosure limited to the specific sizing and classification of these elements. For example, the skilled artisan will appreciate that the circuitry described herein may be adapted based on changes on battery sizing and chemistry, or based on the requirements of the intended back-up load to be powered.
The functions and features described herein may also be executed by various distributed components of a system. For example, one or more processors may execute these system functions, wherein the processors are distributed across multiple components communicating in a network. The distributed components may include one or more client and server machines, which may share processing, in addition to various human interface and communication devices (e.g., display monitors, smart phones, tablets, personal digital assistants (PDAs)). The network may be a private network, such as a LAN or WAN, or may be a public network, such as the Internet. Input to the system may be received via direct user input and received remotely either in real-time or as a batch process. Additionally, some implementations may be performed on modules or hardware not identical to those described. Accordingly, other implementations are within the scope that may be claimed.
The above-described hardware description is a non-limiting example of corresponding structure for performing the functionality described herein.
The hardware description above, exemplified by any one of the structure examples shown in
While aspects of the present disclosure have been described in conjunction with the specific embodiments thereof that are proposed as examples, alternatives, modifications, and variations to the examples may be made. Accordingly, embodiments as set forth herein are intended to be illustrative and not limiting. There are changes that may be made without departing from the scope of the claims set forth below.