This application is related to the following pending patent applications which are assigned to the assignee of the present application:
This disclosure relates generally to the field of data processing systems and more particularly to interaction with information on remote computers.
Robotic process automation (RPA) is the application of technology that allows employees in a company to configure computer software or a “robot” to capture and interpret existing applications for processing a transaction, manipulating data, triggering responses and communicating with other digital systems. Conventional RPA systems employ software robots to interpret the user interface of third party applications and to execute steps identically to a human user. While this approach can be quite useful in enabling RPA, it typically requires human usage of an application which can introduce errors due to mistakes and idiosyncrasies of a particular user.
Methods and systems that detect and define virtual objects in remote screens which do not expose objects are disclosed herein. This permits simple and reliable automation of existing applications. In certain aspects a method for detecting objects from an application program that are displayed on a computer screen is disclosed. An image displayed on the computer screen is captured. The image is analyzed to identify blobs in the image. The identified blobs are filtered to identify a set of actionable objects within the image. Optical character recognition is performed on the image to detect text fields in the image. Each actionable object is linked to a text field positioned closest to a left or top side of the actionable object. The system automatically detects the virtual objects and links each actionable object, such as a textbox, button, or checkbox, to the nearest label object. Advanced image processing and OCR technologies may advantageously be employed. In other aspects, background noise is removed from the image before analyzing the image to identify blobs in the image. Additionally, filtering the identified blobs may comprise retrieving one or more predefined filtering criteria that cause blobs larger or smaller than predefined sizes to be filtered out. Subsequently, when the application is accessed in a production environment by a bot, interaction with the application may be simplified as the bot retrieves the stored actionable objects and, in some instances, may be able to employ the previously recognized actionable objects.
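By way of illustration only, the blob identification and size-based filtering steps summarized above can be sketched as follows. The sketch assumes the captured image has already been reduced to a grid of 0 (background) and 1 (foreground) pixels, uses 4-connected flood fill as one possible blob detector, and uses hypothetical names (`Box`, `find_blobs`, `filter_blobs`) and size limits that do not appear in the disclosure.

```python
from collections import deque
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (inclusive corners)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        return self.right - self.left + 1

    @property
    def height(self) -> int:
        return self.bottom - self.top + 1


def find_blobs(image: List[List[int]]) -> List[Box]:
    """Return the bounding box of every 4-connected region of 1-pixels."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            left = right = c
            top = bottom = r
            while queue:
                y, x = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(Box(left, top, right, bottom))
    return blobs


def filter_blobs(blobs: List[Box], min_w: int, min_h: int,
                 max_w: int, max_h: int) -> List[Box]:
    """Keep only blobs whose size lies within the predefined limits."""
    return [b for b in blobs
            if min_w <= b.width <= max_w and min_h <= b.height <= max_h]


# A one-pixel speck (noise) and a 4x2 rectangle (a candidate control).
screen = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
all_blobs = find_blobs(screen)
candidates = filter_blobs(all_blobs, min_w=2, min_h=2, max_w=5, max_h=3)
```

In this example the speck is discarded as too small and only the rectangle remains as a candidate actionable object. The limits passed to `filter_blobs` stand in for the predefined filtering criteria and would in practice be retrieved from stored configuration.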
The disclosed methods and systems permit accurate identification, and hence automation, of applications for which only a screen image may be available to an automation user, as is common where the automation user is located remotely from the system on which the application to be automated is deployed. This permits a level of automation previously only available where underlying objects in the application to be automated are available to the system employed by the automation user. Moreover, the disclosed methods and systems permit automation even where changes in the application or to hardware, such as resolution of computer monitors, can cause changes in the visual image displayed by the application to be automated.
Additional aspects related to the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the description or may be learned by practice of the invention. Aspects of the invention may be realized and attained by means of the elements and combinations of various elements and aspects particularly pointed out in the following detailed description and the appended claims.
It is to be understood that both the foregoing and the following descriptions are exemplary and explanatory only and are not intended to limit the claimed invention or application thereof in any manner whatsoever.
The accompanying drawings, which are incorporated in and constitute a part of this specification exemplify the embodiments of the present invention and, together with the description, serve to explain and illustrate principles of the inventive techniques disclosed herein. Specifically:
In the following detailed description, reference will be made to the accompanying drawings, in which identical functional elements are designated with like numerals. The aforementioned accompanying drawings show by way of illustration, and not by way of limitation, specific embodiments and implementations consistent with principles of the present invention. These implementations are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other implementations may be utilized and that structural changes and/or substitutions of various elements may be made without departing from the scope and spirit of the present invention. The following detailed description is, therefore, not to be construed in a limited sense.
The innovations described herein may find numerous uses and may be particularly advantageous in the context of an RPA system 10, as shown in
In certain environments, the information provided by application 103 may contain sensitive information, the distribution or viewing of which may be subject to various regulatory or other restrictions. Automation controller 106, resident on computer system 110, operates in conjunction with RPA system 10 to interact with computer system 110. The RPA system 10 sends automation commands and queries to the automation controller 106, while respecting the security compliance protocols of computer system 110. In certain embodiments, a compliance boundary 112 may be implemented in connection with remote access module 114. Compliance boundary 112 represents a logical boundary across which any transfer of data or other information is controlled by agreements between parties. In certain embodiments, remote access module 114 may operate to prevent RPA user 102 from performing certain tasks on system 110, such as, by way of example and not limitation, copying files, loading cookies, or transmitting data from computer system 110 through or beyond compliance boundary 112, via the internet or via any other output device, in a manner that would violate the security protocols established by the computer system 110. The remote access module 114 may take the form of remote desktop products available from Citrix or Microsoft, which permit connection to a remote computer, such as computer system 110, to establish a communication link between system 10 and system 110 to permit apps, files, and network resources to be made available from computer system 110 to computer system 10.
The RPA system 10 implements a bot creator that may be used by RPA user 102 to create one or more bots that are used to automate various business processes executed by one or more computer applications such as application 103. The term “bot” as used herein refers to a set of instructions that cause a computing resource to interact with one or more user level computer applications to perform tasks provided by the one or more user level computer applications. Once created, the bot may be employed to perform the tasks as encoded by the instructions to interact with one or more user level computer applications. RPA user 102 may access application 103 remotely and may see the same screen 104 as seen by a user that may be located proximate to the computer system 110. Remote access module 114 may be configured to provide to user 102 only screen images generated by application 103. In such an instance, the bot creator employed by the user 102 will not be able to access any user interface level objects that may be employed by the application 103, such as HTML document models or an accessibility API provided by an operating system.
Conventionally, user 102 may manually identify fields provided by application 103. In the embodiments disclosed herein, RPA system 10 operates to automatically identify fields provided by application 103. A screen capture engine 120 operates conventionally to generate image file 121-1 by capturing an image displayed on computer screen 105 by operation of application 103. As will be appreciated by those skilled in the art, screen capture engine 120 operates by accessing frame buffer 107 employed by computer system 10. The frame buffer 107 contains a bitmap, a frame of data, of the image currently displayed on computer screen 105. The computer system 110 may be separate from or part of smart screen system 10.
The fingerprint generator 119 analyzes the image file 121-1 for various objects, such as automation controls (markers) and their locations. The combination of various objects, object metadata, properties and types, and location on the screen is used to generate a unique set of keys that can together represent a “fingerprint” or signature 121-2 of that screen that assists the automation application to recognize that specific screen, among a set of any other possible screens, as disclosed in pending patent application entitled “System And Method For Resilient Automation Upgrade” filed in the U.S. Patent Office on Aug. 25, 2015 and assigned application Ser. No. 14/834,773, and pending patent application entitled “System and Method for Compliance Based Automation” filed in the U.S. Patent Office on Jan. 6, 2016 and assigned application Ser. No. 14/988,877, which applications are hereby incorporated by reference in their entirety. The signature 121-2 is stored to a signature database corresponding to the application 103.
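As a rough illustration of how detected objects, their types, and their locations might be combined into a set of keys and a single signature, consider the sketch below. The disclosure does not specify the key format or any hashing scheme, and the incorporated applications may define the signature differently; the record layout, the coarse grid quantization, the use of SHA-256, and the names `screen_keys` and `screen_signature` are all assumptions made for this example.

```python
import hashlib
from typing import Iterable, List, Tuple

# Hypothetical record layout: (control type, left, top, width, height) in pixels.
DetectedObject = Tuple[str, int, int, int, int]


def screen_keys(objects: Iterable[DetectedObject],
                screen_w: int, screen_h: int, grid: int = 100) -> List[str]:
    """Build one key per object from its type and its quantized position."""
    keys = []
    for kind, left, top, width, height in objects:
        # Express the geometry on a coarse grid relative to the screen size
        # so that small pixel differences do not change the key.
        gx = left * grid // screen_w
        gy = top * grid // screen_h
        gw = width * grid // screen_w
        gh = height * grid // screen_h
        keys.append(f"{kind}:{gx}:{gy}:{gw}:{gh}")
    # Sort so the result does not depend on detection order.
    return sorted(keys)


def screen_signature(objects: Iterable[DetectedObject],
                     screen_w: int, screen_h: int) -> str:
    """Collapse the set of keys into one hexadecimal digest."""
    joined = "\n".join(screen_keys(objects, screen_w, screen_h))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# The same layout captured at 800x600 and, in a different order, at 1600x1200.
small = [("textbox", 80, 60, 160, 30), ("button", 400, 500, 80, 30)]
large = [("button", 800, 1000, 160, 60), ("textbox", 160, 120, 320, 60)]
sig_small = screen_signature(small, 800, 600)
sig_large = screen_signature(large, 1600, 1200)
```

Under these assumptions the two captures yield the same digest, while replacing the button with a different control type yields a different one. A stored digest could then serve as a lookup key into the signature database for application 103.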
The image file 121-1, representing image 104, as captured by screen capture engine 120 is shown in further detail in
The image in the image file 121-1 is enhanced by image enhancer 122 to remove background noise that may be contained in the image file 121-1, and the results are stored in file 123. For example, certain screens from which image 104 is captured may contain background colors or watermarks upon which the image 104 is overlaid. Furthermore, a document image may exist with a wide variety of fonts, font sizes, background patterns, and lines (e.g., to delineate table columns). Image enhancer 122 in certain embodiments operates conventionally to convert the image file 121-1 from color to black and white and to remove background noise from the image file 121-1 to generate image file 123. The image file 123 contains an enhanced version of the image in image file 121-1. This increases the likelihood of correct recognition of automation control elements in the image file 121-1. For example, if the image file 121-1 contains an automation control in the form of a radio button, the conversion from color to black and white and the background noise removal increases the likelihood of recognition of the radio button.
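A minimal sketch of one conventional way to perform such an enhancement is given below. It is not taken from the disclosure: it assumes the image is a grid of RGB triples, converts to gray using the common ITU-R BT.601 luma weights, treats the most frequent gray level as the background, and marks as foreground only those pixels that differ from it by more than a hypothetical `tolerance`, so that faint watermarks are suppressed.

```python
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


def to_gray(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert RGB pixels to gray levels using integer BT.601 luma weights."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in image]


def remove_background(gray: List[List[int]], tolerance: int = 40) -> List[List[int]]:
    """Return a black and white image: 1 for foreground, 0 for background."""
    counts = Counter(v for row in gray for v in row)
    if not counts:
        return [list(row) for row in gray]
    # Assume the most common gray level is the screen background.
    background = counts.most_common(1)[0][0]
    return [[1 if abs(v - background) > tolerance else 0 for v in row]
            for row in gray]


# Light gray background (B), a faint watermark pixel (W), and a dark pixel (K).
B, W, K = (200, 200, 200), (190, 190, 190), (0, 0, 0)
captured = [
    [B, B, B, B],
    [B, W, K, B],
    [B, B, B, B],
]
enhanced = remove_background(to_gray(captured))
```

Here the watermark pixel falls within the tolerance and is removed along with the background, while the dark pixel survives as foreground. A fixed global tolerance is the simplest choice; screens with gradients or patterned backgrounds would likely need a locally adaptive threshold instead.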
Image file 123 is processed by field detection module 124 to generate, as shown in
The image file 123 is also processed by text detection module 126 to generate a binarized image 125, shown in
As seen in
The blobs recognized by module 124 and text detected by module 126 are linked by smart linking engine 118 by linking each blob to its nearest text field on the left or top side. This may be performed by retrieving common patterns and applying them to link labels to values. Certain embodiments may utilize machine learning to detect patterns, and to do so on an ongoing basis to increase accuracy. In certain embodiments pretraining may be employed to increase accuracy from the outset. By this auto linking capability, the system 10 detects and identifies the same objects even if the location of the text fields changes. Because of the normalization, the same pattern can be recognized at any scale so the system 10 can also identify the correct fields even if the scaling of the screen changes.
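The linking rule described above, in which each blob is linked to its nearest text field on the left or top side, can be sketched as follows. The disclosure does not give the exact distance measure, so this example makes assumptions: a label counts as "to the left" only if it ends at or before the control's left edge and shares some rows with it, counts as "above" only if it ends at or before the control's top edge and shares some columns with it, and the nearest label is the one with the smallest edge-to-edge gap. The names `Box`, `Label`, and `nearest_label` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


class Label(NamedTuple):
    """A text field found by OCR, with its recognized text and location."""
    text: str
    box: Box


def _overlap(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> bool:
    """True if the intervals [a_lo, a_hi] and [b_lo, b_hi] intersect."""
    return a_lo <= b_hi and b_lo <= a_hi


def nearest_label(control: Box, labels: List[Label]) -> Optional[Label]:
    """Return the closest label lying to the left of or above the control."""
    best: Optional[Label] = None
    best_gap: Optional[int] = None
    for label in labels:
        lb = label.box
        if lb.right <= control.left and _overlap(lb.top, lb.bottom,
                                                 control.top, control.bottom):
            gap = control.left - lb.right      # label on the left side
        elif lb.bottom <= control.top and _overlap(lb.left, lb.right,
                                                   control.left, control.right):
            gap = control.top - lb.bottom      # label on the top side
        else:
            continue                           # right of or below: ignored
        if best_gap is None or gap < best_gap:
            best, best_gap = label, gap
    return best


labels = [
    Label("First name", Box(10, 10, 80, 25)),
    Label("Last name", Box(10, 50, 80, 65)),
    Label("Comments", Box(100, 90, 170, 105)),
]
first_box = Box(100, 8, 250, 28)       # text box right of "First name"
last_box = Box(100, 48, 250, 68)       # text box right of "Last name"
comment_box = Box(100, 110, 300, 200)  # text area below "Comments"
links = {name: nearest_label(box, labels)
         for name, box in [("first", first_box), ("last", last_box),
                           ("comment", comment_box)]}
```

Because the link depends only on the relative arrangement of labels and controls, the same pairing results if every coordinate is shifted or uniformly scaled, which is in keeping with the tolerance to layout and scaling changes described above.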
The smart screen 129 includes the image together with the recognized objects (labels and values), which are sometimes referred to collectively as automation or actionable objects. Each automation object has associated with it custom automation controls, which can take the form of a drop down, radio button, etc. The knowledge of the type of automation control permits the RPA system 10 to identify the type of input required for each automation control.
Some examples of automation objects are text fields, radio buttons, drop downs, and tabular data. In some embodiments, the system 10 may use a combination of image recognition and object recognition in a sequence to improve successful recognition. The representation of the data structures such as image file 121-1, signature 121-2, and the outputs of the various modules such as 125 and 127 may take a variety of forms such as individual files stored within a file system or may be stored within a single file or may be stored and managed by a database or may take other forms.
Computing system 400 may have additional features such as for example, non-transitory storage 410, one or more input devices 414, one or more output devices 412, and one or more communication connections 416. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing system 400. Typically, operating system software (not shown) provides an operating system for other software executing in the computing system 400, and coordinates activities of the components of the computing system 400.
The non-transitory storage 410 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information in a non-transitory way and which can be accessed within the computing system 400. The storage 410 stores instructions for the software implementing one or more innovations described herein.
The input device(s) 414 may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing system 400. For video encoding, the input device(s) 414 may be a camera, video card, TV tuner card, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video samples into the computing system 400. The output device(s) 412 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing system 400.
The communication connection(s) 416 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can use an electrical, optical, RF, or other carrier.
The innovations can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing system on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing system.
The terms “system” and “computing device” are used interchangeably herein. Unless the context clearly indicates otherwise, neither term implies any limitation on a type of computing system or computing device. In general, a computing system or computing device can be local or distributed and can include any combination of special-purpose hardware and/or general-purpose hardware with software implementing the functionality described herein.
While the invention has been described in connection with a preferred embodiment, it is not intended to limit the scope of the invention to the particular form set forth, but on the contrary, it is intended to cover such alternatives, modifications, and equivalents as may be within the spirit and scope of the invention as defined by the appended claims.