Web applications have transformed the digital landscape, offering dynamic and personalized user experiences through the Internet. Unlike traditional websites, web applications enable two-way interactions, allowing data exchange with the server and empowering users to engage with server-stored information. From simple message boards to complex e-commerce platforms, web applications have demonstrated their versatility and crucial role in the modern digital era.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
A technique to generate an application program (e.g., a web application) is disclosed herein. In some embodiments, the application program is configured to run in a web browser. In some embodiments, the application is configured to run on an electronic device, such as a computer, smartphone, tablet, etc. The technique includes receiving an input from a client device. The input specifies a schematic of user interface (UI) components of the application program. The input may be an image file that includes a wireframe or sketch design (e.g., .png, .jpg, .jpeg, .tiff, etc.), a prompt, a data file (e.g., JSON file, CSV file), or a document (e.g., text file, word document, etc.). The client device may be a server, a computer, a laptop, a desktop, a tablet, a smartphone, or any other electronic communication device.
A first group of a plurality of models is used to extract information from the input to automatically identify the user interface components for an application program and properties associated with elements in the input. The plurality of models extracts information (properties), such as counts, dimensions, width, height, x-y placement, center coordinates, labels, etc., from the input. In some embodiments, some or all of the plurality of models in the first group sequentially extract information from the input, that is, a second model extracts information from the input after a first model extracts information from the input. The second model may utilize an output from the first model when extracting information from the input. In some embodiments, some or all of the plurality of models in the first group extract information from the input in parallel.
A model of the plurality of models may be a machine learning model, a heuristic model, a statistical model, or any other mathematical model. The machine learning model may be a computer vision model, a natural language processing model, a pre-trained box detection model, a pre-trained text extraction model, an image manipulation model for identifying x-y coordinates of the extracted boxes, and/or a combination thereof. In some embodiments, the first group includes a single model that performs the functions of a plurality of models.
Each of the one or more models outputs one or more key-value pairs. For example, an optical character recognition model is configured to output a coordinate at which text was written. The key-value pair may also indicate whether the text was included in a box, a header, or a footer.
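For illustration only, such an output might be organized as follows; the field names and values are assumptions chosen for this sketch rather than fields prescribed by the system.

```python
# Hypothetical key-value output of an optical character recognition model for
# one detected element; the field names are illustrative only.
ocr_output = {
    "text": "Revenue by Region",  # the recognized text
    "x": 120, "y": 48,            # coordinate at which the text was written
    "region": "header",           # whether the text was in a box, a header, or a footer
}
```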
An object-detection model is configured to detect the number of boxes or other shapes in the input and the corresponding coordinates associated with the detected boxes or other shapes. The object-detection model may be configured to determine the width and height associated with a detected box or other shape, the boundary coordinates associated with the detected box or other shape, and a center coordinate associated with the detected box or other shape. Each detected box or other shape corresponds to a user interface component associated with the application program to be generated.
The plurality of models in the first group of models analyze the input from different perspectives, and the key-value pairs outputted from the plurality of models are provided to a second group of one or more models. A model of the plurality of models in the second group may be a machine learning model, a heuristic model, a statistical model, or any other mathematical model.
The key-value pairs provided from the plurality of models in the first group are combined by the second group of one or more models to generate an intermediate vector, which is also referred to as a domain specific language (DSL) token. A DSL token refers to the individual units or elements that make up a DSL. The tokens in a DSL are the building blocks of the language and represent the smallest meaningful units. These tokens can include keywords, identifiers, literals, operators, punctuation marks, and other language-specific symbols. Each token has its own syntactic and semantic meaning within the DSL. In some embodiments, a single model is configured to analyze the input from different perspectives and generate the intermediate vector and DSL token that includes the key-value pairs that would have been generated by the plurality of models. The intermediate vector and DSL token encapsulates the crucial information needed for creating the application program in a particular DSL.
The DSL token is compiled into a respective language code for the application program. The DSL may be utilized for a user-selected framework, such as Django, Streamlit, Flask, Gradio, etc. The code may be stored in a string format for easy manipulation and storage. An application program is generated using the generated code. The application program includes some or all of the user interface components specified in the input.
The application program may be generated for a plurality of different industries, such as predictive maintenance, agriculture, construction, consumer goods, education, energy, financial services, food and beverage, healthcare, information technology, insurance, manufacturing, media, pharmaceuticals, retail, telecommunications, etc. The appearance of the application program may differ based on a selected industry, that is, the application program for a first industry may have a different appearance than the application program for a second industry. In some embodiments, the application program is generated using sample data. In some embodiments, the sample data is for a particular industry. For example, an application program for the construction industry is generated using sample construction data. In some embodiments, the application program is generated using actual data, that is, a user may have provided input data that will be used by the application program.
By interpreting an input, such as a hand-drawn sketch, and converting it into a functional application program, the disclosed technique provides an innovative and intuitive path toward software design that significantly improves the efficiency of the development process. This framework offers notable advantages to both novice and expert developers, particularly in reducing the technical overhead involved in prototyping, thereby enabling a more streamlined creation process. Furthermore, by offering an easy way to transform abstract design ideas into tangible digital outputs, the disclosed technique paves the way for more diverse and creative applications to emerge. The utilization of application program frameworks within this system is another key highlight, allowing developers, such as Python developers, to create real-time, interactive application programs with relative ease. The reactive programming model and its comprehensive library of UI components make the creation of diverse and engaging applications possible.
Input 202 may be an image file, a prompt, a data file, a document, etc. For an image file, the input may be in different file formats, such as .png, .jpg, .jpeg, or .tiff. The image file includes a sketch or wireframe design that depicts the UI of an application program.
The wireframes include an arrangement of rectangles or “cards” (and/or other shapes), each embodying a part of the overall layout associated with the application program. These cards may be labeled to signify the type of card required, and their dimensions (width and height) reflect the size of the corresponding cards within the actual application. The placement of these cards, determined by their corresponding x-y coordinates, indicates their final positioning in the generated application program.
An app generation system may provide a UI in which a user may provide a prompt to generate the UI of the application program. The app generation system may include a large language model configured to understand the user prompt to create the UI components of the application program. In some embodiments, the UI of the application program allows the user to modify, via a prompt, the UI components of the application program after the application program is generated. For example, the prompt may include an instruction to widen or shorten one or more UI components of the application program.
An app generation system may provide a user interface in which a user may provide a data file (e.g., CSV file) that includes the data to be depicted in some or all of the user interface components of the application program. In some embodiments, an image file or other type of input is utilized to generate the user interface components of the application program and a data file is provided to populate the UI components with production data (e.g., real-world data) instead of sample data.
An app generation system may provide a user interface in which a user may provide a document that describes the application to be generated. The app generation system may include a natural language processor to understand the text included in the document and generate the application program based on an output of the natural language processor. In some embodiments, the document includes baseline code from which the application program is generated. The user interface may provide a user interface prompt in which the user describes how the baseline code is to be modified.
Input 202 is provided to models 204a, 204b, . . . , 204n. In some embodiments, input 202 is provided, in parallel, to models 204a, 204b, . . . , 204n. In some embodiments, input 202 is provided sequentially to some or all of models 204a, 204b, . . . , 204n. For example, input 202 is initially provided to model 204a, the output of model 204a and input 202 is provided to model 204b, . . . , the output of model 204n-1 and input 202 is provided to model 204n.
Models 204a, 204b, . . . , 204n may include an image processing model configured to remove noise from an image.
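A minimal sketch of such a noise-removal step is shown below, assuming an OpenCV-based approach with Gaussian blurring and Otsu thresholding; the specific operations and parameter values are illustrative assumptions rather than the required implementation.

```python
import cv2

def remove_noise(image_path: str):
    """Denoising sketch (assumed approach): smooth the sketch image and
    binarize it so that faint pencil marks and scanner speckle are suppressed."""
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    blurred = cv2.GaussianBlur(image, (5, 5), 0)  # smooth out speckle
    _, cleaned = cv2.threshold(
        blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU  # automatic binarization
    )
    return cleaned
```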
Models 204a, 204b, . . . , 204n may include an object detection model configured to identify various rectangular components or other shapes in input 202, such as plot or text cards. The object detection model may load the image using a computer vision library, such as “OpenCV”, convert the image into a grayscale version of the image to simplify the data, and apply an edge detection algorithm, such as “Canny Edge.” The object detection model identifies the edges of shapes, and their contours may be found using a method, such as the ‘findContours’ method, which also helps to establish the relationships between different contours. For example, the object detection model may detect all rectangles in an image by loading it, identifying shape edges, and discovering their contours.
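The following sketch illustrates this detection flow with OpenCV; the Canny thresholds and the minimum-area cutoff are illustrative assumptions.

```python
import cv2

def detect_boxes(image_path: str, min_area: int = 500):
    """Sketch of the detection flow described above: grayscale conversion,
    Canny edge detection, contour discovery, and bounding-box extraction."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # simplify the data
    edges = cv2.Canny(gray, 50, 150)                # edge detection
    contours, hierarchy = cv2.findContours(
        edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE  # RETR_TREE keeps the contour hierarchy
    )
    boxes = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w * h >= min_area:                       # skip tiny, noise-like contours
            boxes.append((x, y, w, h))
    return boxes, hierarchy
```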
Models 204a, 204b, . . . , 204n may include a nested object detection model configured to identify and handle nested objects (e.g., boxes or other shapes), indicative of layered UI structures like dropdown menus or dialog boxes. The nested object detection model performs its function after one or more boxes or other shapes are detected by the object detection model. The nested object detection model is configured to check the contour hierarchy from previous steps, determining if any of the detected boxes or other shapes are nested within others. If a nested box or other shape is detected, the nested box or other shape is extracted, processed, and its bounding rectangle is drawn on the image. The nested object detection model employs recursion to manage multiple nesting layers. In some embodiments, the nested object detection model determines that a nested box or other shape is also a parent and repeats the above process to find any further nested boxes or other shapes. After processing a contour and its nested boxes or other shapes, the nested object detection model proceeds to a next contour on the same hierarchical level. The nested object detection model adds the coordinates and dimensions of each detected box to a ‘boxes’ list, creating a real-time comprehensive collection of all identified box-like structures, including the nested ones.
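A recursive sketch of this hierarchy walk is shown below; it assumes the contours and hierarchy returned by cv2.findContours with cv2.RETR_TREE, where each hierarchy entry has the form [next, previous, first_child, parent].

```python
import cv2

def collect_nested_boxes(contours, hierarchy, parent_idx, boxes):
    """Walk the children of the contour at parent_idx, record each nested box,
    and recurse so that deeper nesting layers are handled as well."""
    child = hierarchy[0][parent_idx][2]      # index of the first child contour, or -1
    while child != -1:
        x, y, w, h = cv2.boundingRect(contours[child])
        boxes.append((x, y, w, h))           # add the nested box to the 'boxes' list
        # The nested box may itself be a parent; repeat the process for its children.
        collect_nested_boxes(contours, hierarchy, child, boxes)
        child = hierarchy[0][child][0]       # next contour on the same hierarchical level
    return boxes
```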
Models 204a, 204b, . . . , 204n may include a filter model configured to filter out one or more shapes (e.g., a box) from an image. Not every detected shape (box) is relevant to the application structure, as some may represent noise or insignificant sketch details. For example, a box corresponding to a bar graph may include one or more rectangles, and the one or more rectangles included in the bar graph may be filtered out. Such boxes are filtered out, while sketch lines indicating separators or sections are converted into box-like forms for uniformity in processing. The filter model is configured to remove a box based on its width and height, for example, discarding a box with dimensions below a certain threshold (e.g., a threshold width or a threshold height). In some embodiments, the filter model is configured to filter out line-like boxes, those with one dimension significantly smaller than the other, retaining only boxes with both dimensions exceeding a specific threshold (e.g., threshold width, threshold length, threshold height). In some embodiments, the filter model is configured to eliminate repetitive boxes to remove redundancy, ensuring each box in the output is unique. The filter model may identify nearly identical boxes as boxes having a center coordinate difference that is below a set threshold. These nearly identical boxes likely represent the same sketch element, but appear as separate boxes due to detection variations. These identified boxes are added to a dictionary of almost identical shapes.
In some embodiments, the input image includes similar or duplicate boxes or other shapes due to minor variations or noise. An algorithm identifies these boxes or other shapes using their characteristics and location, and these duplicates are then omitted from further processing to increase efficiency. The filter model removes duplicate boxes from the list, relying on the duplicates dictionary previously generated. By checking each box or other shape against the keys in the dictionary, it determines whether a box is a duplicate and excludes it from a final list. Unique boxes and other shapes are included in the output. This ensures the final list is free from duplicates, enhancing subsequent process efficiency and accuracy.
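A combined sketch of the size filtering and duplicate removal is shown below; the threshold values are illustrative assumptions.

```python
def filter_boxes(boxes, min_w=40, min_h=40, dup_center_dist=10):
    """Drop boxes below the size thresholds (noise or line-like shapes), then
    drop near-duplicates whose centers are within dup_center_dist pixels of a
    box that has already been kept."""
    kept = []
    for (x, y, w, h) in boxes:
        if w < min_w or h < min_h:           # dimensions below the threshold
            continue
        cx, cy = x + w / 2, y + h / 2        # center of the candidate box
        is_duplicate = False
        for (kx, ky, kw, kh) in kept:
            if (abs(cx - (kx + kw / 2)) < dup_center_dist and
                    abs(cy - (ky + kh / 2)) < dup_center_dist):
                is_duplicate = True          # nearly identical box already kept
                break
        if not is_duplicate:
            kept.append((x, y, w, h))        # only unique boxes reach the output
    return kept
```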
Models 204a, 204b, . . . , 204n may include a conversion model configured to redefine a box or other shape. For example, the box or other shape may be represented by its center coordinates and dimensions instead of top-left corner coordinates, width, and height, which aids further processing.
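A minimal sketch of this conversion is:

```python
def to_center_form(box):
    """Convert (top-left x, top-left y, width, height) into
    (center x, center y, width, height)."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2, w, h)

# Example: a 200x160 box whose top-left corner is at (40, 120).
print(to_center_form((40, 120, 200, 160)))  # (140.0, 200.0, 200, 160)
```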
Models 204a, 204b, . . . , 204n may include a text extraction model configured to extract textual information within each detected object. The textual information may include labels, button names, or other information. The textual information is used to identify the function associated with an object and the associated code that needs to be generated. The app generation system may include a library of functions. The text extraction model is configured to determine which function to select from the library of functions based on the identified function. In some embodiments, a large language model is utilized to generate code for a function that is not included in the library of functions. In some embodiments, the text extraction model is configured to extract the textual information using optical character recognition (OCR). In some embodiments, OCR is used to determine that a detected object is a header. In some embodiments, OCR is used to determine that a detected object is a footer. In some embodiments, OCR is used to determine that a detected object corresponds to a UI component. In some embodiments, the text extraction model is configured to extract the textual information using intelligent character recognition (ICR). The text extraction model is configured to open the image using a library of computer vision functions (e.g., OpenCV) and determine the corresponding area for each detected object. In some embodiments, OCR is used to extract the text from the cropped image portions, initially employing an OCR engine, such as Tesseract OCR engine, and a secondary OCR tool if necessary.
The extracted text is stored in a dictionary (with box or shape parameters as keys and associated text as values) and a list of tuples (containing box or shape parameters and related text). These tuples may be sorted based on vertical and horizontal positions, ensuring logical reading order. In summary, the text extraction model processes each object, extracts related text using OCR or ICR, and organizes this information for ease of access and manipulation in subsequent step(s).
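The sketch below illustrates this extraction and ordering step, assuming pytesseract as the OCR engine and the (x, y, w, h) boxes produced earlier; the sorting key is an illustrative choice.

```python
import cv2
import pytesseract

def extract_text(image_path: str, boxes):
    """Crop each detected box, run OCR on the crop, and organize the results in
    a dictionary and a list of tuples sorted into a logical reading order."""
    image = cv2.imread(image_path)
    text_by_box = {}
    entries = []
    for (x, y, w, h) in boxes:
        crop = image[y:y + h, x:x + w]                     # area of the detected object
        text = pytesseract.image_to_string(crop).strip()   # primary OCR pass
        text_by_box[(x, y, w, h)] = text
        entries.append(((x, y, w, h), text))
    # Sort top-to-bottom, then left-to-right, to preserve the reading order.
    entries.sort(key=lambda item: (item[0][1], item[0][0]))
    return text_by_box, entries
```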
The output of models 204a, 204b, . . . , 204n is provided to model 214. Model 214 is configured to establish horizontal and vertical zones within the wireframe or sketch design based on the positions of objects and their corresponding text, aiding understanding of the application's layout and structure.
Model 214 is configured to calculate each non-filtered object's vertical span or range of y-coordinates. Model 214 is configured to merge overlapping ranges, sorting them based on their lower limit to create non-overlapping horizontal “zones.” Model 214 is configured to group non-filtered objects into their respective horizontal zones based on these ranges. Model 214 is configured to compute each non-filtered object's horizontal span, or range of x-coordinates, within the horizontal zones. Model 214 is configured to merge these horizontal ranges, similar to the vertical ranges, to create non-overlapping vertical “zones” within each horizontal zone. Model 214 is configured to group the non-filtered objects into their respective vertical zones based on these ranges.
The final output of model 214 is a two-dimensional grid-like structure that logically organizes the non-filtered objects, following a typical reading order. This facilitates correct interpretation of the information extracted from the image. The established zones are converted into a more accessible structure, such as a dictionary. Each vertical zone may become a key mapping to another dictionary or list based on its content, providing an efficient data access method. Model 214 is configured to sort this dictionary, ensuring zones and ranges maintain the correct reading order. The dictionary keys, generated to preserve the original order, are sorted to keep zones and ranges in their original sequence. Finally, a string representation of these zones is created, serving as a template for the application's structure in the final Python code.
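A simplified sketch of the zoning logic is shown below; it operates on the (x, y, w, h) boxes from the earlier steps and is an assumption about one possible implementation rather than the exact algorithm of model 214.

```python
def merge_ranges(ranges):
    """Merge overlapping (low, high) ranges after sorting them by their lower limit."""
    merged = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1]:   # overlaps the previously merged range
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged

def group_into_zones(boxes):
    """Build horizontal zones from merged y-ranges, then vertical zones from
    merged x-ranges within each horizontal zone, yielding a grid in reading order."""
    h_zones = merge_ranges([(y, y + h) for (x, y, w, h) in boxes])
    grid = []
    for (top, bottom) in h_zones:
        row = [b for b in boxes if top <= b[1] < bottom]   # boxes in this horizontal zone
        v_zones = merge_ranges([(x, x + w) for (x, y, w, h) in row])
        grid.append([[b for b in row if left <= b[0] < right]
                     for (left, right) in v_zones])
    return grid
```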
Token generator 216 is configured to generate one or more tokens that encapsulate a range of information generated by models 204a, 204b, . . . , 204n and model 214. For example, a token may include information extracted by one or more models, such as counts (number of boxes and/or other shapes), dimensions, sizes, and locations. The token may include label(s) based on text extracted from the input to signify the purpose or content of each box or shape. Lastly, the zone identification performed by model 214 discerns the overarching layout of the wireframe or sketch design. The token is a vector that is composed of key-value pairs. For example, the token may have the form of {x, y, w, h, zone, label}. A wireframe or sketch design corresponds to a unique set of DSL entries, which serve as an intermediary language between the sketch and the final application program code. In essence, these DSL entries form the bridge that translates wireframe sketches and sketch designs into fully functional application programs.
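Building on the sketches above, token generation might look like the following; the grid is assumed to come from the zoning step, text_by_box from the text extraction step, and the exact field set is dictated by the particular DSL.

```python
def generate_tokens(grid, text_by_box):
    """Walk the zoned grid in reading order and emit one key-value token per
    box, following the {x, y, w, h, zone, label} form described above."""
    tokens = []
    for zone_index, horizontal_zone in enumerate(grid):
        for vertical_zone in horizontal_zone:
            for (x, y, w, h) in vertical_zone:
                tokens.append({
                    "x": x, "y": y, "w": w, "h": h,
                    "zone": zone_index,
                    "label": text_by_box.get((x, y, w, h), ""),  # text extracted earlier
                })
    return tokens
```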
Compiler 222 is used to translate the tokens associated with the detected shapes into code, such as Python code. This code will form the basis of application program 232. The code is stored in a string format for easy manipulation and storage.
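A compiler of this kind can be sketched as a template lookup; Streamlit is used here purely as an example target framework, and the template strings, the sample_value and sample_table helpers, and the fallback component are illustrative assumptions rather than the system's actual library.

```python
# Hypothetical mapping from token labels to code templates for the selected framework.
TEMPLATES = {
    "header": 'st.title("{label}")',
    "gauge":  'st.metric(label="{label}", value=sample_value("{label}"))',  # sample_value is a placeholder helper
    "table":  'st.dataframe(sample_table("{label}"))',                      # sample_table is a placeholder helper
}

def compile_tokens(tokens):
    """Translate each DSL token into a line of framework code and return the
    whole program as a single string, kept in string form for easy manipulation."""
    lines = ["import streamlit as st", ""]
    for token in tokens:
        template = TEMPLATES.get(token["label"], 'st.write("{label}")')  # fallback component
        lines.append(template.format(label=token["label"]))
    return "\n".join(lines)
```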
When executed by a processor, the code enables users to create an interactive application program 232. Application program 232 includes some or all of the user interface components specified in input 202. In some embodiments, large language model 242 is configured to interact with application program 232. In some embodiments, large language model 242 is configured to generate insights (e.g., statistical views, continuous variables, histograms, mean, median, modes, etc.) associated with the data used by application program 232. In some embodiments, LLM 242 is configured to modify the code for application program 232 based on the generated insights. For example, the initial code for application program 232 did not include a user interface component to display the mean associated with a particular variable. After the insights are generated, LLM 242 may update the code for application program 232 such that it includes a user interface component to display the mean associated with the particular variable.
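As a simplified illustration of such a code update, the sketch below computes one insight directly with pandas (rather than via the large language model) and appends a corresponding UI component to the generated code string; the function name and its parameters are assumptions for this example.

```python
import pandas as pd

def add_mean_component(program_code: str, data: pd.DataFrame, column: str) -> str:
    """Compute the mean of one column and append a Streamlit component that
    displays it to the generated program code."""
    mean_value = data[column].mean()
    new_line = f'st.metric(label="Mean of {column}", value={mean_value:.2f})'
    return program_code + "\n" + new_line
```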
At 302, an input is received. The input may be an image file, a prompt, a data file, a document, etc.
At 304, a first group of one or more machine learning models is used to automatically identify user interface components and associated properties specified in the input. The first group may include an image processing model configured to remove noise from an image that is provided at the input. The one or more machine learning models may include an object detection model configured to identify various rectangular components or other shapes in the input. The object-detection model is configured to detect the number of boxes or other shapes in the input and the corresponding coordinates associated with the detected boxes or other shapes. The object-detection model may be configured to determine the width and height associated with a detected box or other shape, the boundary coordinates associated with the detected box or other shape, and a center coordinate associated with the detected box or other shape.
The first group of one or more machine learning models may include a nested object detection model configured to identify and handle nested objects (e.g., boxes and/or other shapes), indicative of layered UI structures like dropdown menus or dialog boxes. The nested object detection model is configured to check the contour hierarchy from previous steps, determining if any of the detected objects are nested within others. If a nested object is detected, the nested object is extracted, processed, and its bounding rectangle is drawn on the image.
The first group of one or more machine learning models may include a filter model configured to filter out one or more shapes (e.g., a box) from an image. For example, a box corresponding to a bar graph may include one or more rectangles. The one or more rectangles included in the bar graph may be filtered out. In some embodiments, the filter model is configured to eliminate repetitive boxes to remove redundancy, ensuring each box in the output is unique. The filter model may identify nearly identical boxes as boxes having a center coordinate difference that is below a set threshold. In some embodiments, the input image includes similar or duplicate boxes due to minor variations or noise. An algorithm identifies these boxes or other shapes using their characteristics and location, and these duplicates are then omitted from further processing to increase efficiency. The filter model removes duplicate boxes from the list, relying on the duplicates dictionary previously generated. By checking each box or other shape against the keys in the dictionary, it determines whether a box is a duplicate and excludes it from a final list. Unique boxes and other shapes are included in the output. This ensures the final list is free from duplicates, enhancing subsequent process efficiency and accuracy.
The first group of one or more machine learning models may include a conversion model configured to redefine a box or other shape. For example, the box or other shape may be represented by its center coordinates and dimensions instead of top-left corner coordinates, width, and height, which aids further processing.
The first group of one or more machine learning models may include a text extraction model configured to extract textual information within each detected box or other shape. The textual information may include labels, button names, or other information. The textual information is used to identify the function associated with a box or other shape and the associated code that needs to be generated. The app generation system may include a library of functions. The text extraction model is configured to determine which function to select from the library of functions based on the identified function.
In some embodiments, the image processing model, the object detection model, the nested object detection model, the filter model, the conversion model, and the text extraction model are separate models. In some embodiments, the image processing model, the object detection model, the nested object detection model, the filter model, the conversion model, and the text extraction model are combined into a single model.
At 306, a second group of one or more machine learning models is used to automatically generate program code implementing an application program. The second group of one or more machine learning models may include a model to calculate each box or other shape's vertical span or range of y-coordinates. The model is configured to merge overlapping ranges, sorting them based on their lower limit to create non-overlapping horizontal “zones.” The model is configured to group boxes or other shapes into their respective horizontal zones based on these ranges. The model is configured to compute each box or other shape's horizontal span, or range of x-coordinates, within the horizontal zones. The model is configured to merge these horizontal ranges, similar to the vertical ranges, to create non-overlapping vertical “zones” within each horizontal zone. The model is configured to group the boxes or other shapes into their respective vertical zones based on these ranges. The final output of the model is a two-dimensional grid-like structure that logically organizes the boxes or other shapes, following a typical reading order. In some embodiments, the above functions are performed by a single machine learning model. In some embodiments, the above functions are performed by a plurality of machine learning models.
At 308, an application program is generated. The application program includes some or all of the user interface components specified in the input.
A token generator is configured to generate one or more tokens that encapsulate a range of information generated by the one or more machine learning models in the first group and the one or more machine learning models in the second group. For example, the token may include information extracted by one or more models, such as counts (number of shapes), dimensions, sizes, and locations. The token may include label(s) based on text extracted from the input to signify the purpose or content of each box or shape. Lastly, the zone identification performed by model 214 discerns the overarching layout of the wireframe or sketch design. The token is a vector that is composed of key-value pairs. The tokens are mapped into an intermediate vector representation of a DSL token. For example, the token may have the form of {x, y, w, h, zone, label}. The DSL token refers to the individual units or elements that make up a DSL. The tokens in a DSL are the building blocks of the language and represent the smallest meaningful units. These tokens can include keywords, identifiers, literals, operators, punctuation marks, and other language-specific symbols. That is, the form of a token is specific to the DSL and each token has its own syntactic and semantic meaning within the DSL.
A wireframe or sketch design corresponds to a unique set of DSL tokens, which serve as an intermediary language between the sketch and the final application program code. In essence, these DSL entries form the bridge that translates wireframe sketches and sketch designs into fully functional application programs.
A compiler is used to translate the DSL tokens into code, such as Python code. When executed by a processor, the code enables users to create an interactive application program.
The boxes may include textual information that indicates a type of UI component that should be placed at that location of the UI. Input 400 indicates that box 402 should be a header UI component, boxes 404a, 404b, 404c, 404d, 404e should be gauge UI components, boxes 406a, 406b, 406c should be vertical bar UI components, boxes 408a, 408c should be radar UI components, box 408b should be a table UI component, and box 410 should be a footer UI component.
Application program 450 includes UI components 452, 454a, 454b, 454c, 454d, 454e, 456a, 456b, 456c, 458a, 458b, 458c, 460. UI components 452, 454a, 454b, 454c, 454d, 454e, 456a, 456b, 456c, 458a, 458b, 458c, 460 correspond to boxes 402, 404a, 404b, 404c, 404d, 404e, 406a, 406b, 406c, 408a, 408b, 408c, 410 of input 400, respectively. The app generation system generated application program 450 based on input 400 utilizing the techniques disclosed herein.
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
This application claims priority to U.S. Provisional Patent Application No. 63/534,545 entitled CONVERTING WIREFRAMES AND SKETCH DESIGNS TO A FULLY FUNCTIONAL WEB APPLICATION filed Aug. 24, 2023 which is incorporated herein by reference for all purposes.
Number | Date | Country
--- | --- | ---
63/534,545 | Aug. 24, 2023 | US