Processing vast quantities of data, or so-called big data, to glean valuable insight involves first transforming the data. Data is transformed into a usable form for publication or consumption by business intelligence endpoints, such as a dashboard, by creating, scheduling, and executing one or more jobs. In this context, a job is a unit of work over data comprising one or more transformation operations. Typically, jobs are manually coded by data developers, data architects, business intelligence architects, or the like.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed subject matter. This summary is not an extensive overview. It is not intended to identify key/critical elements or to delineate the scope of the claimed subject matter. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
Briefly described, the subject disclosure pertains to job creation and reuse. Jobs can be created based on saved jobs, or portions thereof, within a visual authoring environment. In particular, a new job of a selected type can be added to a diagrammatic workspace. Subsequently, presentation of a mechanism configured to enable selection of a saved job can be triggered. Upon selection, a visual representation of the selected saved job can be added to the workspace including a representation of a job comprising one or more transformation operations and optionally one or more input data sources and an output data source. Furthermore, data sources associated with a saved job can be added to a data source designated portion of the environment for subsequent use.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the claimed subject matter are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways in which the subject matter may be practiced, all of which are intended to be within the scope of the claimed subject matter. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.
Details below generally pertain to job creation and reuse. Rather than authoring a job from scratch, a job can be created based on a saved job within a visual authoring environment. A new job can be added to a diagrammatic workspace in response to a user request. The new job can be devoid of an implementation, or more specifically, the new job lacks transformation operations. Further, the new job can be of a particular type of job. In one instance, a plurality of job types can be presented for selection in conjunction with creating a new job. In accordance with one aspect, upon selection of the new job, a previously created and saved job can be identified, loaded, and subsequently laid out as a diagram in the workspace. For instance, a dialog box can be presented that enables a user to locate and select a saved job. Upon selection, the visual representation of the new job can be replaced with a visualization of the saved job including one or more transformation operations, and optionally one or more input and result data sources. Visual representations of the one or more input data sources of the saved job can also be presented in a data source area to enable selection and utilization with respect to authoring other jobs.
In accordance with one embodiment, a saved job can correspond to a template or the like, wherein solely a portion of a job is implemented. Such a job can be configured utilizing a code editor to specify additional code manually that completes the job. Additionally or alternatively, the additional code can be generated automatically based on interaction with visualizations representing transformation operations. Furthermore, even if the entire job is implemented the same techniques can be used to alter the job, if desired.
Of course, job creation is not limited to using previously saved jobs. In particular, a new job can be authored manually with a code editor, automatically based on interactions with visualizations representing transformation operations, or a combination of manually and automatically. Further, the jobs can be created outside the disclosed visual authoring environment. In one instance, these jobs or portions thereof can subsequently be saved for later use by the creator or others in an organization, for example. In other words, a toolbox of user created and saved jobs can be built to facilitate later job authoring by way of reuse.
Various aspects of the subject disclosure are now described in more detail with reference to the annexed drawings, wherein like numerals generally refer to like or corresponding elements throughout. It should be understood, however, that the drawings and detailed description relating thereto are not intended to limit the claimed subject matter to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the claimed subject matter.
Referring initially to
The source component 120 is configured to produce a visual representation of data sources available for use in job creation. Arbitrary data sources can be acquired and made available by the source component 120 including on-premises data sources and cloud-based data sources of substantially any format (e.g., table, stream, file . . . ) and structure (e.g., structured, unstructured, semi-structured). In other words, the data sources are heterogeneous. The source component 120 can visualize data sources or datasets available to an individual or organization. Data sources can be made available by search and import functionality provided by the source component 120. Additionally, the source component 120 can be configured to monitor user or entity accounts or the like and make accessible data sources available automatically. Data sources rendered by the source component 120 are interactive and can be used as input for one or more jobs. For example, with a gesture, such as drag-and-drop, a data source from a source area can be added to a workspace.
The target component 130 is configured to provide a visual location to display final data sources after all transformations have been applied. These data sources can subsequently be published or consumed by an application, such as an analytics application. A result of a job, or series of jobs, can be dragged from the workspace and dropped in a target visualization area, for example.
The authoring component 140 is configured to enable visual authoring of jobs comprising one or more transformation operations and pipelines comprising one or more input datasets, a job, and an output dataset. In particular, the authoring component 140 can interact with at least the source component 120 and the workspace component 110 to facilitate job construction in conjunction with a diagram in a workspace from available data sources.
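By way of example, and not limitation, the relationship between a job and a pipeline described above can be sketched as a pair of simple data structures. The class names, field names, and sample values below are hypothetical and chosen only for illustration; the disclosure does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """A unit of work over data: an ordered list of transformation operations."""
    name: str
    job_type: str  # e.g. "Hive", "Pig", "SSIS" (type labels taken from the text)
    operations: List[str] = field(default_factory=list)

    def is_shell(self) -> bool:
        # A shell job is one that has no transformation operations yet.
        return not self.operations


@dataclass
class Pipeline:
    """One or more input datasets, a job, and an output dataset."""
    inputs: List[str]
    job: Job
    output: Optional[str] = None


# Hypothetical sample pipeline: one input table, two operations, one output.
pipeline = Pipeline(
    inputs=["sales_table"],
    job=Job("clean_sales", "Hive", ["filter_nulls", "aggregate_by_region"]),
    output="sales_by_region",
)
```

In this sketch a job with an empty operation list plays the role of the new or shell job discussed elsewhere in this description.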
Turning attention to
The initiation component 210 is configured to facilitate initiation of job authoring. In particular, the initiation component 210 can provide visual and interactive mechanisms to aid a user in generating a new job. For example, a new job operation can be visually presented in a toolbar. Upon selection or other activation of the new job operation, a list of a plurality of different job types can be presented. A job type can provide particular types of operations (e.g., map-reduce, machine learning, query, extract-transform-load . . . ) over specific types of data sources (e.g., tables, streams, unstructured . . . ). Examples of job types are Hive, Pig, SQL Server Integration Services (SSIS), machine learning, query, and custom. Further, a job type can correspond to a specific programming language (e.g., M-Script). A user can subsequently select one of the different job types to create. In accordance with one interaction, a user can select a job type from amongst the plurality and add it to the workspace, for example by dragging and dropping the job type onto the workspace. Regardless of gesture, selection of a job type can result in generation of a new job and visual representation thereof (e.g., node) on the workspace. The new job can be a shell job devoid of transformation operations. However, the new job need not be an empty container, but rather can include standard or boilerplate code, for example associated with all jobs or a particular job type.
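By way of example, and not limitation, the following sketch shows one way selection of a job type could produce a shell job on a workspace. The job type labels come from the description above; the function name, the node structure, and the boilerplate strings are hypothetical placeholders rather than part of the disclosed system.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Job types named in the description, mapped to placeholder boilerplate text.
# The boilerplate strings are illustrative assumptions only.
JOB_TYPES: Dict[str, str] = {
    "Hive": "-- boilerplate placeholder\n",
    "Pig": "-- boilerplate placeholder\n",
    "SSIS": "",
    "machine learning": "",
    "query": "",
    "custom": "",
}


@dataclass
class JobNode:
    """Visual node for a job on the workspace."""
    job_type: str
    code: str = ""
    operations: List[str] = field(default_factory=list)


def new_job(job_type: str, workspace: List[JobNode]) -> JobNode:
    """Create a shell job of the selected type and add its node to the workspace."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type!r}")
    # The shell has no transformation operations but may carry boilerplate code.
    node = JobNode(job_type=job_type, code=JOB_TYPES[job_type])
    workspace.append(node)
    return node
```

The shell carries no transformation operations, yet it need not be empty, which mirrors the distinction drawn above between an empty container and a job with boilerplate code.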
The code component 220 is configured to provide a mechanism to code transformation operations manually. In particular, the code component 220 can present a code editor that allows a user to specify transformation operations in a particular programming language, such as a scripting language. When finished, a user can commit the operations resulting in transformation of a new job or shell job to a particular job with one or more transformation operations. Further, the code component 220 can present a code editor in context or, in other words, in situ, with at least a visual workspace, such that a user need not move to a different context or window to specify code. For example, the code editor can be presented alongside the workspace.
The code generation component 230 is configured to generate code capturing transformation operations automatically. In accordance with one aspect, data transformation operations of a programming language can be exposed graphically. In this manner, users can author a job by selecting one or more visual representations of data transformation operations. Upon selection, code that implements the operations can be generated automatically. In accordance with one embodiment, the visual representations of operations can be presented in conjunction with a data preview that displays at least a subset of data associated with a source. Further, upon selecting a data operation the subset of data can be updated to reflect application of the operation. This enables quick sandboxing and experimenting by way of a test environment. Once a user is satisfied with the specified transformation operations, the code generation component 230 can automatically generate the corresponding code or program. Of course, code can be generated after or upon selection of an operation rather than waiting until all operations are specified.
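By way of example, and not limitation, the interplay between a data preview and automatic code generation can be sketched as follows. Each selectable operation is paired with a function that updates a preview subset and a code template that is emitted once the user is satisfied. The operation names, the row format, and the emitted code are all hypothetical assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

# Each hypothetical visual operation maps to (preview function, code template).
OPERATIONS: Dict[str, Tuple[Callable[[List[Row]], List[Row]], str]] = {
    "drop_null_amount": (
        lambda rows: [r for r in rows if r.get("amount") is not None],
        "rows = [r for r in rows if r.get('amount') is not None]",
    ),
    "keep_first_two": (
        lambda rows: rows[:2],
        "rows = rows[:2]",
    ),
}


def preview(sample: List[Row], selected: List[str]) -> List[Row]:
    """Apply the selected operations, in order, to a copy of the preview subset."""
    rows = list(sample)
    for name in selected:
        rows = OPERATIONS[name][0](rows)
    return rows


def generate_code(selected: List[str]) -> str:
    """Emit one line of code per selected operation, in selection order."""
    return "\n".join(OPERATIONS[name][1] for name in selected)
```

Because the preview function and the code template are kept side by side, the generated program is intended to produce the same result on the sample that the preview displayed. Code could equally be emitted after each selection rather than at the end.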
The saved component 240 enables saved jobs to be integrated within the visual authoring environment. A saved job refers to a job or portion of a job that was previously created and saved for subsequent use and reuse. For example, a saved job can correspond to a favorite job or template job. In one instance, a library of saved jobs can be built to enable reuse. For example, an individual user can create and save jobs or job templates for later reuse in a library or toolbox. As another example, jobs or job templates can be saved across an organization with organization users authoring and contributing jobs to an accessible library. Moreover, the saved component 240 is configured to interact with the workspace component 110 to present a saved job thereon. This enables users to quickly and visually see the flow of data through a pipeline of transformations being created. The visualization can also enable users to quickly understand the input, process, and output associated with a job and allow changes to be made inline, as needed. Further, the saved component 240 is configured to interact with the source component 120 to enable data sources associated with saved jobs to be presented and made available through a source visualization.
The acquisition component 310 is configured to enable acquisition of a saved job. A saved job comprises a set of one or more data transformation operations and optionally one or more input data sources and an output data source. In one instance, a saved job can correspond to, and be termed, a saved pipeline, if the saved job also includes one or more input data sources and one or more output data sources. A saved job is one that was partially or fully authored by a user at a prior time and saved for later use and reuse. The acquisition component 310 provides a mechanism to obtain a job from a stored location. In one instance, the acquisition component 310 can be embodied as a dialog box that is presented upon selecting a representation of a new job or shell job, in a workspace. The dialog box can enable a user to search for and locate a saved job stored locally or remotely. Other mechanisms are also contemplated including search and select functionality with respect to a library of saved jobs, for example, among others.
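By way of example, and not limitation, the search and select functionality over a library of saved jobs can be sketched as below. The record layout and the substring match on job name are hypothetical simplifications; in particular, this sketch models a single output data source, and a dialog box or remote store could stand in for the in-memory list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SavedJob:
    """A previously authored job, optionally with input and output data sources."""
    name: str
    operations: List[str]
    inputs: List[str] = field(default_factory=list)
    output: Optional[str] = None

    def is_pipeline(self) -> bool:
        # Termed a saved pipeline when inputs and an output are also included.
        return bool(self.inputs) and self.output is not None


def search_library(library: List[SavedJob], query: str) -> List[SavedJob]:
    """Return saved jobs whose name contains the query, ignoring case."""
    q = query.lower()
    return [job for job in library if q in job.name.lower()]


# Hypothetical library contents.
library = [
    SavedJob("Clean Sales", ["filter_nulls"], ["sales_table"], "sales_clean"),
    SavedJob("Score Leads", ["train", "score"]),
]
```

A caller would typically present the matches to the user and acquire whichever saved job is selected.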
The diagram component 320 is configured to generate a visual diagram of a saved job including optionally one or more input data sources and an output data source. In one instance, a saved job was authored outside the disclosed visual authoring environment. For example, a user could have manually coded a SQL Server Integration Services (SSIS) package, which is a particular type of job that performs an extract, transform, and load process, outside the visual authoring environment. The diagram component 320 can generate a diagram of the package that can be presented in the workspace. Similar operations can be performed to diagram other types of jobs such as a machine learning job, Hive job, Pig job, and M-Script job, among others. In this manner, jobs of different types can be visualized and employed in conjunction with visually authoring a workflow pipeline over arbitrary data sources.
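By way of example, and not limitation, generating a diagram of a saved job can be sketched as producing nodes and directed edges. The sketch assumes, purely for illustration, that the transformation operations form a linear chain, that every input feeds the first operation, and that the last operation feeds the output. The function name and the sample labels are hypothetical, and parsing of an actual job definition is outside the scope of the sketch.

```python
from typing import Dict, List, Optional, Tuple


def build_diagram(
    operations: List[str],
    inputs: List[str],
    output: Optional[str] = None,
) -> Dict[str, list]:
    """Lay a job out as nodes and directed edges for display in a workspace."""
    nodes: List[str] = list(inputs) + list(operations)
    if output is not None:
        nodes.append(output)

    edges: List[Tuple[str, str]] = []
    if operations:
        # Every input data source feeds the first transformation operation.
        edges.extend((src, operations[0]) for src in inputs)
        # Operations are chained in order.
        edges.extend(zip(operations, operations[1:]))
        # The last operation feeds the output data source, when present.
        if output is not None:
            edges.append((operations[-1], output))
    return {"nodes": nodes, "edges": edges}


# Hypothetical extract-transform-load job.
diagram = build_diagram(["extract", "transform", "load"], ["orders.csv"], "warehouse")
```

A job with branching or merging operations would need a richer edge model than the linear chain assumed here.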
The capture component 330 is configured to facilitate capturing and saving of data jobs. For instance, the capture component 330 can provide a mechanism for selecting one or more jobs in a workspace and saving the jobs for later use. By way of example, the capture component 330 can provide a mechanism that allows selection of a job and optionally one or more input data sources and an output data source and subsequently initiate a save thereof to a saved job library or the like. In accordance with one aspect, a user can select a job to be saved. Additionally, a job can be selected and saved automatically. Among other things, this can facilitate reuse of recent jobs or common jobs.
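By way of example, and not limitation, capturing a job from a workspace and saving it to a library can be sketched as a serialization round trip. JSON and an in-memory dictionary are assumptions made for brevity, and the function and field names are hypothetical; the disclosure does not specify a storage format or location.

```python
import json
from typing import Dict, List, Optional


def capture_job(
    library: Dict[str, str],
    name: str,
    operations: List[str],
    inputs: Optional[List[str]] = None,
    output: Optional[str] = None,
) -> None:
    """Serialize a selected job, with optional data sources, into the library."""
    record = {
        "name": name,
        "operations": list(operations),
        "inputs": list(inputs or []),
        "output": output,
    }
    library[name] = json.dumps(record)


def load_job(library: Dict[str, str], name: str) -> dict:
    """Read a saved job back from the library for later reuse."""
    return json.loads(library[name])
```

The same `capture_job` call could be invoked on an explicit user selection or automatically, for instance for recently used jobs.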
Returning to
The update component 260 is configured to update a workspace and a source portion of a visual authoring environment based on changes. For example, after a saved job is acquired and a diagram of the saved job is generated, the workspace can be automatically updated to include the diagram. Further, a new job shell can be replaced or updated with the representation of a specific job. Furthermore, a saved job can optionally include specification of one or more input data sources. In this case, the update component 260 can be configured to add the input data source or otherwise visualize the source with respect to a source panel of an interface, for example. In this manner, the input sources become available for further job authoring.
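By way of example, and not limitation, the update described above can be sketched as two steps: replacing the new job shell in the workspace with the saved job, and adding any input data sources of the saved job to the source panel. The dictionary-based records, the function name, and the duplicate check are hypothetical choices made for illustration.

```python
from typing import Dict, List


def apply_saved_job(
    workspace: List[Dict],
    shell_index: int,
    saved: Dict,
    source_panel: List[str],
) -> None:
    """Replace a shell job with a saved job and surface its input data sources."""
    # Replace in place so the job keeps its position in the workspace.
    workspace[shell_index] = saved
    # Add each input data source to the source panel once.
    for src in saved.get("inputs", []):
        if src not in source_panel:
            source_panel.append(src)


# Hypothetical workspace holding a single shell job.
workspace = [{"name": "new job", "operations": []}]
source_panel = ["customers"]
saved = {"name": "clean_sales", "operations": ["filter_nulls"],
         "inputs": ["sales_table", "customers"]}
apply_saved_job(workspace, 0, saved, source_panel)
```

After the call, the input sources of the saved job are available in the source panel for authoring other jobs.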
The aforementioned systems, architectures, environments, and the like have been described with respect to interaction between several components. It should be appreciated that such systems and components can include those components or sub-components specified therein, some of the specified components or sub-components, and/or additional components. Sub-components could also be implemented as components communicatively coupled to other components rather than included within parent components. Further yet, one or more components and/or sub-components may be combined into a single component to provide aggregate functionality. Communication between systems, components and/or sub-components can be accomplished in accordance with either a push and/or pull model. The components may also interact with one or more other components not specifically described herein for the sake of brevity, but known by those of skill in the art.
Furthermore, various portions of the disclosed systems above and methods below can include or employ artificial intelligence, machine learning, or knowledge or rule-based components, sub-components, processes, means, methodologies, or mechanisms (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, classifiers . . . ). Such components, inter alia, can automate certain mechanisms or processes performed thereby to make portions of the systems and methods more adaptive as well as efficient and intelligent. By way of example, and not limitation, the suggestion component 250 can employ such mechanisms to determine or infer data sources to suggest relevant to one or more selected operations, or other data sources already linked on an operation, among other things.
In view of the exemplary systems described above, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to the flow charts of
The subject disclosure supports various products and processes that perform, or are configured to perform, various actions regarding job creation and reuse. What follows are one or more exemplary methods and systems.
In a computer configured to provide a graphical user interface on a display, a method comprising: presenting on the display a visual representation of an operation configured to add a new job of a select type to a diagrammatic workspace; and presenting on the display a visual representation of the new job, devoid of transformation operations, in the diagrammatic workspace in response to activation of the operation. The method further comprises presenting on the display a dialog box that enables selection of a previously saved job. The method further comprises presenting on the display the dialog box upon selection of the visual representation of the new job. The method further comprises presenting on the display a visual representation of a selected saved job on the workspace. The method further comprises presenting on the display a visual representation of a job comprising one or more data transformation operations, one or more input data sources, and an output data source. The method further comprises presenting on the display a visual representation of one or more input data sources associated with the selected saved job in an area dedicated to available data sources. In the method, presenting on the display the visual representation of a selected saved job on the workspace comprises replacing the visual representation of the new job. The method further comprises presenting on the display a menu of job types associated with the new job.
A method comprising: employing at least one processor configured to execute computer-executable instructions stored in a memory to perform the following acts: requesting identification of a saved job; and presenting a visual representation of an identified saved job in a diagrammatic workspace, the visual representation includes a job comprising one or more data transformation operations, zero or more input data sources, and optionally an output data source. The method further comprises presenting a plurality of job types. The method further comprises receiving identification of one of the plurality of job types. The method further comprises presenting a visual representation of a new job of an identified type and devoid of transformation operations in the diagrammatic workspace. The method further comprises presenting a dialog box that enables identification of the saved job upon selection of the visual representation of the new job. The method further comprises replacing the visualization of the new job with the visual representation of an identified saved job on the workspace. The method further comprises presenting a visual representation of one or more input data sources for the identified job in a portion dedicated to data sources.
A system comprising: a processor coupled to a memory, the processor configured to execute the following computer-executable components stored in the memory: a first component configured to initiate acquisition of a saved job in response to addition of a representation of a new job devoid of transformation operations to a diagrammatic workspace; and a second component configured to present a visual representation of the saved job specified in code in the workspace. The system further comprises a third component configured to present a list of job types for selection associated with the new job. The system further comprises a third component configured to present a dialog box to enable selection of the saved job. The system further comprises a third component configured to present a visual representation of a data source associated with the saved job in a dedicated source area. The system further comprises a third component configured to present suggested data sources based on the saved job.
The word “exemplary” or various forms thereof are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Furthermore, examples are provided solely for purposes of clarity and understanding and are not meant to limit or restrict the claimed subject matter or relevant portions of this disclosure in any manner. It is to be appreciated that a myriad of additional or alternate examples of varying scope could have been presented, but have been omitted for purposes of brevity.
As used herein, the terms “component” and “system,” as well as various forms thereof (e.g., components, systems, sub-systems . . . ) are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an instance, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computer and the computer can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
The conjunction “or” as used in this description and appended claims is intended to mean an inclusive “or” rather than an exclusive “or,” unless otherwise specified or clear from context. In other words, “‘X’ or ‘Y’” is intended to mean any inclusive permutations of “X” and “Y.” For example, if “‘A’ employs ‘X,’” “‘A’ employs ‘Y,’” or “‘A’ employs both ‘X’ and ‘Y,’” then “‘A’ employs ‘X’ or ‘Y’” is satisfied under any of the foregoing instances.
Furthermore, to the extent that the terms “includes,” “contains,” “has,” “having” or variations in form thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.
In order to provide a context for the claimed subject matter,
While the above disclosed system and methods can be described in the general context of computer-executable instructions of a program that runs on one or more computers, those skilled in the art will recognize that aspects can also be implemented in combination with other program modules or the like. Generally, program modules include routines, programs, components, data structures, among other things that perform particular tasks and/or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the above systems and methods can be practiced with various computer system configurations, including single-processor, multi-processor or multi-core processor computer systems, mini-computing devices, mainframe computers, as well as personal computers, hand-held computing devices (e.g., personal digital assistant (PDA), phone, watch . . . ), microprocessor-based or programmable consumer or industrial electronics, and the like. Aspects can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. However, some, if not all aspects of the claimed subject matter can be practiced on stand-alone computers. In a distributed computing environment, program modules may be located in one or both of local and remote memory devices.
With reference to
The processor(s) 1620 can be implemented with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any processor, controller, microcontroller, or state machine. The processor(s) 1620 may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, multi-core processors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In one embodiment, the processor(s) can be a graphics processor.
The computer 1602 can include or otherwise interact with a variety of computer-readable media to facilitate control of the computer 1602 to implement one or more aspects of the claimed subject matter. The computer-readable media can be any available media that can be accessed by the computer 1602 and includes volatile and nonvolatile media, and removable and non-removable media. Computer-readable media can comprise two distinct types, namely computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes storage devices such as memory devices (e.g., random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM) . . . ), magnetic storage devices (e.g., hard disk, floppy disk, cassettes, tape . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), and solid state devices (e.g., solid state drive (SSD), flash memory drive (e.g., card, stick, key drive . . . ) . . . ), or any other like mediums that store, as opposed to transmit or communicate, the desired information accessible by the computer 1602. Accordingly, computer storage media excludes modulated data signals.
Communication media embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
Memory 1630 and mass storage device(s) 1650 are examples of computer-readable storage media. Depending on the exact configuration and type of computing device, memory 1630 may be volatile (e.g., RAM), non-volatile (e.g., ROM, flash memory . . . ) or some combination of the two. By way of example, the basic input/output system (BIOS), including basic routines to transfer information between elements within the computer 1602, such as during start-up, can be stored in nonvolatile memory, while volatile memory can act as external cache memory to facilitate processing by the processor(s) 1620, among other things.
Mass storage device(s) 1650 includes removable/non-removable, volatile/non-volatile computer storage media for storage of large amounts of data relative to the memory 1630. For example, mass storage device(s) 1650 includes, but is not limited to, one or more devices such as a magnetic or optical disk drive, floppy disk drive, flash memory, solid-state drive, or memory stick.
Memory 1630 and mass storage device(s) 1650 can include, or have stored therein, operating system 1660, one or more applications 1662, one or more program modules 1664, and data 1666. The operating system 1660 acts to control and allocate resources of the computer 1602. Applications 1662 include one or both of system and application software and can exploit management of resources by the operating system 1660 through program modules 1664 and data 1666 stored in memory 1630 and/or mass storage device(s) 1650 to perform one or more actions. Accordingly, applications 1662 can turn a general-purpose computer 1602 into a specialized machine in accordance with the logic provided thereby.
All or portions of the claimed subject matter can be implemented using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to realize the disclosed functionality. By way of example and not limitation, visual authoring system 100, or portions thereof, can be, or form part of, an application 1662, and include one or more modules 1664 and data 1666 stored in memory and/or mass storage device(s) 1650 whose functionality can be realized when executed by one or more processor(s) 1620.
In accordance with one particular embodiment, the processor(s) 1620 can correspond to a system on a chip (SOC) or like architecture including, or in other words integrating, both hardware and software on a single integrated circuit substrate. Here, the processor(s) 1620 can include one or more processors as well as memory at least similar to processor(s) 1620 and memory 1630, among other things. Conventional processors include a minimal amount of hardware and software and rely extensively on external hardware and software. By contrast, an SOC implementation of a processor is more powerful, as it embeds hardware and software therein that enable particular functionality with minimal or no reliance on external hardware and software. For example, the visual authoring system 100 and/or associated functionality can be embedded within hardware in an SOC architecture.
The computer 1602 also includes one or more interface components 1670 that are communicatively coupled to the system bus 1640 and facilitate interaction with the computer 1602. By way of example, the interface component 1670 can be a port (e.g., serial, parallel, PCMCIA, USB, FireWire . . . ) or an interface card (e.g., sound, video . . . ) or the like. In one example implementation, the interface component 1670 can be embodied as a user input/output interface to enable a user to enter commands and information into the computer 1602, for instance by way of one or more gestures or voice input, through one or more input devices (e.g., pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, camera, other computer . . . ). In another example implementation, the interface component 1670 can be embodied as an output peripheral interface to supply output to displays (e.g., LCD, LED, plasma . . . ), speakers, printers, and/or other computers, among other things. Still further yet, the interface component 1670 can be embodied as a network interface to enable communication with other computing devices (not shown), such as over a wired or wireless communications link.
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the disclosed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims.