The present disclosure is generally directed towards a search engine that is capable of identifying vehicles based on a photograph.
Machine learning (ML) can be applied to various computer vision applications, including object detection and image classification (or “image recognition”). General object detection can be used to locate an object (e.g., a car or a bird) within an image, whereas image classification may involve a relatively fine-grained classification of the image (e.g., a 1969 Beetle, or an American Goldfinch). Convolutional Neural Networks (CNNs) are commonly used for both image classification and object detection. A CNN is a class of deep, feed-forward artificial neural networks that has successfully been applied to analyzing visual imagery. Generalized object detection may require models that are relatively large and computationally expensive, presenting a challenge for resource-constrained devices such as some smartphones and tablet computers. In contrast, image recognition may use relatively small models and require relatively little processing.
Also, conventional search engines that identify vehicles (e.g., used car websites, car dealership websites, car financing websites, rental car services, parking services) attempt to identify vehicles based on a user input that includes the make (i.e., manufacturer) and model of the car. Often a user may not be privy to the make or model of the car they are looking for, making conventional search engines frustrating and/or impossible to use.
Conventional search engines that identify vehicles using photographs (e.g., police/federal databases, transit tolls) often take an image of a license plate and apply optical character recognition to the image in order to obtain the license plate number. The systems then look up the license plate and associated vehicle identification number (VIN) using a database. These systems are limited in that they pose privacy issues, and are only able to pull an exact vehicle. Pulling an exact vehicle may not be useful when a user is trying to locate vehicles similar to the one they photograph (rather than the exact vehicle).
Conventional products that provide comparisons between vehicles may require a user to visit a variety of websites. Conventional products that provide comparisons between vehicles may also require a user to provide answers to a plurality of data fields such as mileage, pricing, customer ratings, body style, etc. before identifying cars and providing comparison information. Often a user may not be privy to the data fields for the car they are looking for, making conventional vehicle comparison products frustrating and/or impossible to use.
According to one aspect of the present disclosure, a system for image-based vehicle identification includes a database, an image processor, and a vehicle search engine. The database includes a plurality of vehicle information. The image processor may apply one or more machine learning models on one or more images received from a user device. In some configurations, the user device includes a camera that obtains one or more images. In some configurations, the user device provides a display having one or more images of a vehicle and information associated with the vehicle through a user interface of the user device. The display may include a first portion provided at a first location of the user interface, and a second portion provided at a second location different from the first location. The user interface provides each of the first portion and the second portion at a single instance (i.e., at the same time). The vehicle search engine may identify one or more vehicles in the images received from the user device.
In some embodiments, each of the one or more machine learning models identifies a plurality of objects in the received images, at least one of the plurality of objects being a vehicle. In some embodiments, the vehicle search engine may identify a plurality of vehicle image coordinates corresponding to the one or more vehicles in the images received from the user device using a Single Shot Detector Inception machine learning model. In some embodiments, the image data processor may generate a detailed vehicle information based on the vehicle information retrieved from the database for each of the identified vehicles. For example, the detailed vehicle information may include at least one of: a mileage information, a pricing information, a vehicle stock information, a location of a vehicle dealer, a color information, one or more customer rating information, and a body style information. In some embodiments, the image data processor may generate an augmented image for each of the identified vehicles by overlaying the detailed vehicle information upon an image of at least one of the identified vehicles. In some embodiments, the user device may display the augmented image for each of the identified vehicles through the user interface of the user device. In some embodiments, the vehicle search engine may identify at least one of: a number of vehicles, a plurality of vehicle image coordinates for each vehicle, and a plurality of dimensions for each vehicle.
In some embodiments, the image data processor identifies a plurality of vehicle image co-ordinates for each identified vehicle; performs a cropping of each of the one or more received images in accordance with the identified vehicle image co-ordinates; generates one or more cropped images from the one or more received images; and stores the generated cropped images of the identified vehicle in the database. In some embodiments, the image data processor performs the cropping of each of the one or more received images based on a scaling of the identified vehicle image co-ordinates in accordance with a plurality of parameters associated with the one or more received images.
Another aspect of the present disclosure is a method for image-based vehicle identification. The method includes receiving one or more images from a user device, extracting one or more parameters corresponding to at least one of the received images, providing the determined one or more parameters as input to one or more machine learning models, obtaining, as an output from the one or more machine learning models, a prediction of one or more vehicle information, each vehicle information corresponding to a vehicle in the obtained one or more images, identifying, from the one or more predicted vehicle information obtained from the one or more machine learning models, one or more vehicles matching the vehicle in the obtained one or more images, and presenting a display with the one or more identified vehicles to the user device. In some configurations, at least one of the one or more machine learning models is a Single Shot Detector Inception machine learning model.
In some embodiments, the method includes, for each of the vehicles identified from the one or more predicted vehicle information, generating a detailed vehicle information based on a vehicle information retrieved from a database. In some embodiments, the detailed vehicle information includes at least one of: a mileage information, a pricing information, a vehicle stock information, a location of a vehicle dealer, a color information, one or more customer rating information, and a body style information. In some embodiments, the method further includes generating an augmented image for each of the identified vehicles by overlaying the detailed vehicle information upon an image of at least one of the one or more identified vehicles. In some embodiments, the method further includes displaying the augmented image for each of the identified vehicles through a user interface of the user device. In some embodiments, the one or more predicted vehicle information includes at least one of: a number of vehicles, a plurality of vehicle image coordinates for each vehicle, and a plurality of dimensions for each vehicle.
In some embodiments, the method further includes, identifying a plurality of vehicle image co-ordinates for each identified vehicle matching the vehicle in the obtained one or more images, performing a cropping of each of the one or more received images in accordance with the identified vehicle image co-ordinates, generating one or more cropped images from the one or more received images, and storing the generated cropped images of the identified vehicle in a database. In some embodiments, performing the cropping of each of the one or more received images is based on a scaling of the identified vehicle image co-ordinates in accordance with a plurality of parameters associated with the one or more received images. In some embodiments, the Single Shot Detector Inception machine learning model is configured to identify a plurality of vehicle image co-ordinates corresponding to the one or more vehicles in the one or more images received from the user device.
Another aspect of the present disclosure is a non-transitory computer-readable storage medium including instructions executable by a processor. The instructions may comprise: receiving one or more images from a user device; extracting one or more parameters corresponding to at least one of the received images; identifying, based on inputting the extracted one or more parameters to one or more machine learning models, one or more vehicles matching a vehicle in the images received from the user device, at least one of the one or more machine learning models being a Single Shot Detector Inception machine learning model that identifies vehicle image co-ordinates corresponding to the one or more vehicles in the one or more images received from the user device; generating an augmented image for each of the identified vehicles based on overlaying a vehicle information upon an image of at least one of the one or more identified vehicles; and transmitting the augmented image to the user device for display.
The drawings are not necessarily to scale, or inclusive of all elements of a system, emphasis instead generally being placed upon illustrating the concepts, structures, and techniques sought to be protected herein.
Described herein are systems and methods for object detection using image classification models. In some embodiments, an image is processed through a single-pass convolutional neural network (CNN) trained for fine-grained image classification. Multi-channel data may be extracted from the last convolution layer of the CNN. The extracted data may be summed over all channels to produce a 2-dimensional matrix referred to herein as a “general activation map.” The general activation map may indicate all the discriminative image regions used by the CNN to identify classes. This map may be upscaled and used to see the “attention” of the model and to perform general object detection within the image. The “attention” of the model refers to the segments of the image to which the model gives the most weight, based on values calculated up through the last convolutional layer, which segments the image into a grid (e.g., a 7×7 matrix). The model may give more “attention” to segments of the grid that have higher values, and this corresponds to the model predicting that an object is located within those segments. In some embodiments, object detection is performed in a single pass of the CNN, along with fine-grained image classification. In some embodiments, a mobile app may use the image classification and object detection information to provide augmented reality (AR) capability.
Some embodiments are described herein by way of example using images of specific objects, such as automobiles. The concepts and structures sought to be protected herein are not limited to any particular type of images.
Referring to
The image ingestion module 102 receives an image 112 as input. The image 112 may be provided in any suitable format, such as Joint Photographic Experts Group (JPEG), Portable Network Graphics (PNG), or Graphics Interchange Format (GIF). In some embodiments, the image ingestion module 102 includes an Application Programming Interface (API) via which users can upload images.
The image ingestion module 102 may receive images having an arbitrary width, height, and number of channels. For example, an image taken with a digital camera may have a width of 640 pixels, a height of 960 pixels, and three (3) channels (red, green, and blue) or one (1) channel (greyscale). The range of pixel values may vary depending on the image format or parameters of a specific image. For example, in some cases, each pixel may have a value between 0 and 255.
The image ingestion module 102 may convert the incoming image 112 into a normalized image data representation. In some embodiments, an image may be represented as C 2-dimensional matrices stacked over each other (one for each channel C), where each of the matrices is a WxH matrix of pixel values. The image ingestion module 102 may resize the image 112 to have dimensions WxH as needed. The values W and H may be determined by the CNN architecture. In one example, W=224 and H=224. The normalized image data may be stored in memory until it has been processed by the CNN 104.
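By way of non-limiting illustration, the normalization described above may be sketched as follows. The function name `normalize_image`, the nearest-neighbour resizing, the scaling of 0 to 255 pixel values into the range [0, 1], and the channel-first axis order are assumptions made for this sketch only; the disclosure does not prescribe a particular resizing or scaling technique.

```python
import numpy as np

def normalize_image(pixels: np.ndarray, width: int = 224, height: int = 224) -> np.ndarray:
    """Resize an H x W x C (or H x W) uint8 image into C stacked planes in [0, 1]."""
    if pixels.ndim == 2:
        # Greyscale input: add a single channel axis.
        pixels = pixels[:, :, np.newaxis]
    src_h, src_w, _ = pixels.shape
    # Nearest-neighbour sampling (assumed): choose a source row and column
    # for every target row and column.
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    resized = pixels[rows][:, cols]              # shape (height, width, C)
    # Assumes 8-bit pixel values between 0 and 255.
    scaled = resized.astype(np.float32) / 255.0
    return np.transpose(scaled, (2, 0, 1))       # shape (C, height, width)
```

For a 640 by 960 three-channel photograph, such a function would return three stacked 224 by 224 matrices that may be held in memory until processed by the CNN 104.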
The image data may be sent to an input layer of the CNN 104. In response, the CNN 104 generates one or more classifications for the image at an output layer. The CNN 104 may use a transfer-learned image classification model to perform “fine-grained” classifications.
For example, the CNN may be trained to recognize a particular automobile make, model, and/or year within the image. As another example, the model may be trained to recognize a particular species of bird within the image. In some embodiments, the trained parameters of the CNN 104 may be stored within a non-volatile memory, such as within model database 106. In certain embodiments, the CNN 104 uses an architecture similar to one described in A. Howard et al., “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” which is incorporated herein by reference in its entirety.
As will be discussed further below in the context of
The object detection module 108 may generate, as output, information describing the location of an object within the image 112. In some embodiments, the object detection module 108 outputs a bounding box that locates the object within the image 112.
The image augmentation module 110 may augment the original image to generate an augmented image 112′ based on information received from the CNN 104 and the object detection module 108. In some embodiments, the augmented image 112′ includes the original image 112 overlaid with some content (“content overlay”) 116 that is based on the CNN's fine-grained image classification. For example, returning to the car example, the content overlay 116 may include the text “1969 Beetle” if the CNN 104 classifies an image of a car as having model “Beetle” and year “1969.” The object location information received from the object detection module 108 may be used to position the content overlay 116 within the augmented image 112′. For example, the content overlay 116 may be positioned along a top edge of a bounding box 118 determined by the object detection module 108. The bounding box 118 is shown in
In some embodiments, the system 100 may be implemented as a mobile app configured to run on a smartphone, tablet, or other mobile device such as user device 600 of
The convolutional layers 202 may be arranged in series as shown, with a first convolutional layer 202a coupled to the input layer, and a last convolutional layer 202d coupled to the GAP layer 208. The layers of the CNN 200 may be implemented using any suitable hardware- or software-based data structures and coupled using any suitable hardware- or software-based signal paths. The CNN 200 may be trained for fine-grained image classification. In particular, each of the convolutional layers 202 along with the GAP layer 208 and fully connected layer 210 may have associated weights that are adjusted during training such that the output layer 212 accurately classifies images 112 received at the input layer.
Each convolutional layer 202 may include a fixed-size feature map that can be represented as a 3-dimensional matrix having dimensions W′×H′×D′, where D′ corresponds to the number of layers (or “depth”) within that feature map. The dimensions of the convolutional layers 202 may be independent of the images being classified. For example, the last convolutional layer 202d may have width W′=7, height H′=7, and depth D′=1024, regardless of the size of the image 112.
After putting an image 112 through a single pass of a CNN 200, multi-channel data may be extracted from the last convolutional layer 202d. A general activation map 206 may be generated by summing 204 over all the channels of the extracted multi-channel data. For example, if the last convolution layer 202d is structured as a 7×7 matrix with 1024 channels, then the extracted multi-channel data would be a 7×7×1024 matrix and the resulting general activation map 206 would be a 7×7 matrix of values, where each value corresponds to a sum over 1024 channels. In some embodiments, the general activation map 206 is normalized such that each of its values is in the range [0, 1]. The general activation map 206 can be used to determine the location of an object within the image. In some embodiments, the general activation map 206 can be used to determine a bounding box for the object within the image 112.
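As one hedged illustration of the summation and normalization described above, the following sketch sums a 7×7×1024 feature map over its channels and rescales the result into [0, 1]. The function name `general_activation_map` and the use of min-max rescaling are assumptions; the disclosure states only that the values are normalized into that range.

```python
import numpy as np

def general_activation_map(features: np.ndarray) -> np.ndarray:
    """Sum an (H', W', D') feature map over its channels and rescale to [0, 1]."""
    summed = features.sum(axis=-1)               # shape (H', W')
    lo, hi = summed.min(), summed.max()
    if hi == lo:
        # A flat map carries no localization signal; return all zeros.
        return np.zeros_like(summed, dtype=np.float64)
    # Min-max rescaling is an assumed normalization choice.
    return (summed - lo) / (hi - lo)
```

Higher values in the returned matrix correspond to grid segments to which the model gives more weight.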
Referring to
Referring to
The techniques described herein allow approximate object detection to be performed using a CNN that is designed and trained for image classification. In this sense, object detection can be achieved “for free” (i.e., with minimal additional resources), making it well suited for mobile apps that may be resource constrained.
At block 504, the image data may be provided to an input layer of a convolutional neural network (CNN). The CNN may include the input layer, a plurality of convolutional layers, a fully connected layer, and an output layer, where a first convolutional layer is coupled to the input layer and a last convolutional layer is coupled to the fully connected layer.
At block 506, multi-channel data may be extracted from the last convolutional layer. At block 508, the extracted multi-channel data may be summed over all channels to generate a 2-dimensional general activation map.
At block 510, the general activation map may be used to perform object detection within the image. In some embodiments, each value within the general activation map is compared to a predetermined threshold value. A bounding box may be established around the values that are above the threshold value. The bounding box may approximate the location of an object within the image. In some embodiments, the general activation map may be interpolated to determine a more accurate bounding box. In some embodiments, the general activation map and/or the bounding box may be upscaled based on the dimensions of the image.
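The thresholding and upscaling of block 510 may, for example, be sketched as below. The threshold of 0.5, the function name `bounding_box`, and the (x_min, y_min, x_max, y_max) return order are assumptions chosen for illustration; interpolation of the map is omitted from this sketch.

```python
import numpy as np

def bounding_box(activation_map, image_width, image_height, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) in pixels, or None if no value exceeds threshold."""
    rows, cols = np.nonzero(activation_map > threshold)
    if rows.size == 0:
        return None
    grid_h, grid_w = activation_map.shape
    # Each grid cell covers cell_w x cell_h pixels of the original image.
    cell_w = image_width / grid_w
    cell_h = image_height / grid_h
    x_min = int(cols.min() * cell_w)
    y_min = int(rows.min() * cell_h)
    # Add one so the box includes the full extent of the last active cell.
    x_max = int((cols.max() + 1) * cell_w)
    y_max = int((rows.max() + 1) * cell_h)
    return (x_min, y_min, x_max, y_max)
```

Because the box is snapped to grid cells, it only approximates the object location, consistent with the approximate detection described above.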
Sensors, devices, and subsystems may be coupled to the peripherals interface 606 to facilitate multiple functionalities. For example, a motion sensor 610, a light sensor 612, and a proximity sensor 614 may be coupled to the peripherals interface 606 to facilitate orientation, lighting, and proximity functions. Other sensors 616 may also be connected to the peripherals interface 606, such as a global navigation satellite system (GNSS) (e.g., GPS receiver), a temperature sensor, a biometric sensor, magnetometer, or other sensing device, to facilitate related functionalities.
A camera subsystem 620 and an optical sensor 622, e.g., a charge-coupled device (CCD) or a complementary metal-oxide semiconductor (CMOS) optical sensor, may be utilized to facilitate camera functions, such as recording photographs and video clips. The camera subsystem 620 and the optical sensor 622 may be used to collect images of a user to be used during authentication of a user, e.g., by performing facial recognition analysis.
Communication functions may be facilitated through one or more wired and/or wireless communication subsystems 624, which can include radio frequency receivers and transmitters and/or optical (e.g., infrared) receivers and transmitters. For example, the Bluetooth (e.g., Bluetooth low energy (BTLE)) and/or WiFi communications described herein may be handled by wireless communication subsystems 624. The specific design and implementation of the communication subsystems 624 may depend on the communication network(s) over which the user device 600 is intended to operate. For example, the user device 600 may include communication subsystems 624 designed to operate over a GSM network, a GPRS network, an EDGE network, a WiFi or WiMax network, and a Bluetooth™ network. For example, the wireless communication subsystems 624 may include hosting protocols such that the user device 600 can be configured as a base station for other wireless devices and/or to provide a WiFi service.
An audio subsystem 626 may be coupled to a speaker 628 and a microphone 630 to facilitate voice-enabled functions, such as speaker recognition, voice replication, digital recording, and telephony functions. The audio subsystem 626 may be configured to facilitate processing voice commands, voice printing, and voice authentication, for example.
The I/O subsystem 640 may include a touch-surface controller 642 and/or other input controller(s) 644. The touch-surface controller 642 may be coupled to a touch surface 646. The touch surface 646 and touch-surface controller 642 may, for example, detect contact and movement or break thereof using any of a plurality of touch sensitivity technologies, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with the touch surface 646.
The other input controller(s) 644 may be coupled to other input/control devices 648, such as one or more buttons, rocker switches, thumb-wheel, infrared port, USB port, and/or a pointer device such as a stylus. The one or more buttons (not shown) may include an up/down button for volume control of the speaker 628 and/or the microphone 630.
In some implementations, a pressing of the button for a first duration may disengage a lock of the touch surface 646; and a pressing of the button for a second duration that is longer than the first duration may turn power to the user device 600 on or off. Pressing the button for a third duration may activate a voice control, or voice command, module that enables the user to speak commands into the microphone 630 to cause the device to execute the spoken command. The user may customize a functionality of one or more of the buttons. The touch surface 646 can, for example, also be used to implement virtual or soft buttons and/or a keyboard.
In some implementations, the user device 600 may present recorded audio and/or video files, such as MP3, AAC, and MPEG files. In some implementations, the user device 600 may include the functionality of an MP3 player, such as an iPod™. The user device 600 may, therefore, include a 36-pin connector and/or 8-pin connector that is compatible with the iPod. Other input/output and control devices may also be used.
The memory interface 602 may be coupled to memory 650. The memory 650 may include high-speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (e.g., NAND, NOR). The memory 650 may store an operating system 652, such as Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, or an embedded operating system such as VxWorks.
The operating system 652 may include instructions for handling basic system services and for performing hardware dependent tasks. In some implementations, the operating system 652 may be a kernel (e.g., UNIX kernel). In some implementations, the operating system 652 may include instructions for performing voice authentication.
The memory 650 may also store communication instructions 654 to facilitate communicating with one or more additional devices, one or more computers and/or one or more servers. The memory 650 may include graphical user interface instructions 656 to facilitate graphic user interface processing; sensor processing instructions 658 to facilitate sensor-related processing and functions; phone instructions 660 to facilitate phone-related processes and functions; electronic messaging instructions 662 to facilitate electronic-messaging related processes and functions; web browsing instructions 664 to facilitate web browsing-related processes and functions; media processing instructions 666 to facilitate media processing-related processes and functions; GNSS/Navigation instructions 668 to facilitate GNSS and navigation-related processes and instructions; and/or camera instructions 670 to facilitate camera-related processes and functions.
The memory 650 may store instructions and data 672 for an augmented reality (AR) app, such as discussed above in conjunction with
Each of the above identified instructions and applications may correspond to a set of instructions for performing one or more functions described herein. These instructions need not be implemented as separate software programs, procedures, or modules. The memory 650 may include additional instructions or fewer instructions. Furthermore, various functions of the user device may be implemented in hardware and/or in software, including in one or more signal processing and/or application specific integrated circuits.
In some embodiments, processor 604 may perform processing including executing instructions stored in memory 650, and secure processor 605 may perform some processing in a secure environment that may be inaccessible to other components of user device 600. For example, secure processor 605 may include cryptographic algorithms on board, hardware encryption, and physical tamper proofing. Secure processor 605 may be manufactured in secure facilities. Secure processor 605 may encrypt data/challenges from external devices. Secure processor 605 may encrypt entire data packages that may be sent from user device 600 to the network. Secure processor 605 may separate a valid user/external device from a spoofed one, since a hacked or spoofed device may not have the private keys necessary to encrypt/decrypt, hash, or digitally sign data, as described herein.
Embodiments of the present disclosure are directed toward a search engine that is capable of identifying vehicles based on a photograph or image. As described below with reference to
In some embodiments, a user may take an image of one or more vehicles using a user device, and upload the image through a user interface of a server system. The server system may use one or more machine learning modules to identify the number of vehicles in the received image and generate a separate image for each of the vehicles (i.e., extracted vehicle image). The server system may then apply a machine learning module to the extracted vehicle image to identify the vehicle in the extracted vehicle image. This may generate identified vehicle information (e.g., make, model, trim, and year). The server system may then determine detailed vehicle information for each of the identified vehicles. The server system may generate an augmented image for each of the vehicles in the user provided image that includes the extracted vehicle image and identified vehicle information and/or detailed vehicle information. The augmented image(s) may be provided to the user via the user interface for the user device.
The Single Shot Detector (SSD) Inception as used herein is a method for detecting objects in images using a single deep neural network. SSD Inception discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the single deep neural network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the single deep neural network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. The SSD Inception model is simple relative to methods that require object proposals because it completely eliminates proposal generation and the subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single-stage methods, SSD has much better accuracy, even with a smaller input image size. For 300×300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on an Nvidia Titan X, and for 500×500 input, SSD achieves 75.1% mAP, outperforming a comparable state-of-the-art Faster R-CNN model.
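The discretization into default boxes described above may be illustrated by the following sketch, which places one box per aspect ratio at the center of each feature map cell. The function name `default_boxes`, the grid sizes, the scales, and the set of aspect ratios are illustrative assumptions and are not the configuration of any particular trained model.

```python
import math

def default_boxes(grid_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return (cx, cy, w, h) default boxes, in normalized image coordinates,
    for a square feature map with grid_size cells per side."""
    boxes = []
    for row in range(grid_size):
        for col in range(grid_size):
            # Center each box on its feature map cell.
            cx = (col + 0.5) / grid_size
            cy = (row + 0.5) / grid_size
            for ratio in aspect_ratios:
                # Width grows and height shrinks with the aspect ratio,
                # keeping the box area equal to scale squared.
                w = scale * math.sqrt(ratio)
                h = scale / math.sqrt(ratio)
                boxes.append((cx, cy, w, h))
    return boxes

# Combining feature maps of different resolutions (assumed sizes and scales):
# coarse maps with large scales cover large objects, fine maps small ones.
all_boxes = default_boxes(8, 0.2) + default_boxes(4, 0.5) + default_boxes(2, 0.8)
```

At prediction time the network would score each such box for every object category and regress an adjustment to its position and size; that step is not shown here.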
The server system 703 may include an image data processor 713 configured to receive and process images received from the user device 705. The server system 703 may also include an image parameter based vehicle search engine 715 that may query a database 707 to retrieve vehicle information 717 for vehicles identified as matching parameters determined by the image data processor 713.
The user device 705 may include a camera 711 capable of obtaining an image of a car. The user device 705 may also include a user interface 709 such as a website, mobile application, or the like. The user device 705 may communicate over the network 701 using programs or applications. In one example embodiment, methods of the present disclosure may be carried out by an application running on one or more mobile devices and/or a web browser running on a stationary computing device. In some embodiments, the user interface 709 may include a graphical user interface. In some embodiments, the user may have to provide login credentials to access the user interface 709. The database 707 may include one or more data tables, data storage structures, and the like.
The network 701 may include, or operate in conjunction with, an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks.
Although one computing device (e.g., server system 703 or user device 705) may be shown and/or described, multiple computing devices may be used. Conversely, where multiple computing devices are shown and/or described, a single computing device may be used.
In some embodiments, at step 801, the server system may receive an image from a user via the user device 705 that may include multiple vehicles within the same image. In such an embodiment, the image data processor at step 803 may use a library and/or object detection application interface (e.g., TensorFlow®) and a machine learning model (e.g., Single Shot Detector) to identify parameters such as the number of vehicles present in the uploaded picture, the coordinates for each identified vehicle in the image, the dimensions for each identified vehicle in the image, and the like. The image data processor may also crop or resize the obtained image to create separate images for each identified vehicle within the image. The image parameter based vehicle search engine 715 at step 805 may use the identified parameters (e.g., dimensions), a library or object detection application interface (e.g., TensorFlow®), and a machine learning model to predict the make, model, trim, and/or year of a vehicle that matches the identified parameters. The identified vehicle's image, make, model, trim, and/or year information may be displayed to the user at step 807. In some embodiments, the processes described above may utilize one or more Representational State Transfer (REST) application programming interfaces.
In some embodiments, a user may provide the server system with an image having a plurality of vehicles. In some embodiments, the image may be a photograph taken by the user using a mobile device, cell phone, tablet camera, or the like. In some embodiments, the image may be a stock photograph, an image obtained from the internet, an image from a movie, television show, or the like. The user provided image may be received at the server system. The server system may then apply one or more machine learning algorithms to the image to remove non-vehicle objects from the image. For example, in some embodiments, a Single Shot Detector Inception machine learning algorithm may be used to remove non-vehicle objects from the image. Non-vehicle objects may include, but are not limited to, people, cats, dogs, pets, trees, buildings, signs, and the like.
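By way of non-limiting illustration, the removal of non-vehicle objects may be sketched as a filter over detector output. The sketch below assumes that the detector returns a label, a confidence score, and a bounding box for each detected object. The label names, the 0.5 score threshold, and the sample detections are hypothetical and are not taken from the present disclosure.

```python
# Hypothetical label set; the label map of an actual detector may differ.
VEHICLE_LABELS = {"car", "truck", "bus", "motorcycle"}


def keep_vehicles(detections, min_score=0.5):
    """Keep detections that carry a vehicle label and meet the score threshold."""
    return [
        d for d in detections
        if d["label"] in VEHICLE_LABELS and d["score"] >= min_score
    ]


# Made-up detector output, used only to exercise the filter.
detections = [
    {"label": "car", "score": 0.92, "box": (0.10, 0.05, 0.60, 0.45)},
    {"label": "person", "score": 0.88, "box": (0.20, 0.50, 0.90, 0.65)},
    {"label": "truck", "score": 0.31, "box": (0.05, 0.70, 0.40, 0.95)},
    {"label": "car", "score": 0.77, "box": (0.55, 0.50, 0.95, 0.98)},
]
vehicles = keep_vehicles(detections)
```

In this sketch, the person is dropped because its label is not a vehicle label, and the low-confidence truck is dropped because its score falls below the assumed threshold.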
The one or more machine learning algorithms and related libraries (e.g., Single Shot Detector Inception) may also identify the number of vehicles in the image along with the location of the vehicles within the image. In one embodiment, the machine learning algorithm may be used to generate two coordinates that define two diagonal points of a rectangle that surrounds a vehicle in the image. In some embodiments, one or more coordinates may be provided corresponding to any suitable shape. In some embodiments, the generated coordinates may be represented in a float coordinate system. In some embodiments, the generated coordinates represented in a float coordinate system may be converted to coordinates in a pixel coordinate system corresponding to the user provided image.
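As a minimal sketch of the float-to-pixel conversion, the example below assumes that the float coordinates are normalized to the range 0 to 1 and ordered as (ymin, xmin, ymax, xmax), which is the convention used by the TensorFlow Object Detection API. The function name and the sample image size are hypothetical.

```python
def to_pixel_box(norm_box, image_width, image_height):
    """Convert a normalized (ymin, xmin, ymax, xmax) box to (left, top, right, bottom) pixels."""
    ymin, xmin, ymax, xmax = norm_box

    def clamp(value):
        # Keep coordinates inside the image even if the detector overshoots.
        return min(max(value, 0.0), 1.0)

    left = int(round(clamp(xmin) * image_width))
    top = int(round(clamp(ymin) * image_height))
    right = int(round(clamp(xmax) * image_width))
    bottom = int(round(clamp(ymax) * image_height))
    return left, top, right, bottom


# Two diagonal corners of a rectangle, scaled to an assumed 1000 x 800 pixel image.
pixel_box = to_pixel_box((0.25, 0.1, 0.75, 0.6), image_width=1000, image_height=800)
```

The x values are scaled by the image width and the y values by the image height, so the same normalized box maps to different pixel rectangles for images of different sizes.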
In some embodiments, the converted pixel coordinates may be used to extract one or more vehicle images from the user provided image. In some embodiments, the extracted vehicle images may be stored in a database to provide a training data set for machine learning algorithms. In such an embodiment, the extracted vehicle images may be anonymized before storage in the database. In some embodiments, the extracted vehicle images may be stored without anonymization. In some embodiments, vehicle data corresponding to the extracted vehicle images may be stored alongside the extracted vehicle images. Vehicle data may be retrieved using the processes described below.
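The extraction of vehicle images from pixel coordinates may be sketched as follows. For the sake of a dependency-free example, the image is assumed to be a row-major grid of pixel values, and each box is assumed to be a (left, top, right, bottom) tuple with exclusive right and bottom edges. An actual implementation would more likely rely on an imaging library.

```python
def crop(pixels, box):
    """Return the sub-grid of `pixels` covered by a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


# A 6-wide, 4-tall stand-in image whose pixel values record their own (x, y) position.
image = [[(x, y) for x in range(6)] for y in range(4)]

# One box per detected vehicle; these coordinates are illustrative only.
boxes = [(0, 0, 3, 2), (3, 1, 6, 4)]
crops = [crop(image, box) for box in boxes]
```

Each entry of `crops` is an independent image for one vehicle, which may then be stored or passed to a classification model.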
In some embodiments, each of the extracted vehicle images may be provided to a machine learning algorithm that is configured to identify the vehicle in the extracted vehicle image. For example, the machine learning algorithm may include a TensorFlow® model. The machine learning algorithm may be trained on images and may be configured to generate identified vehicle information including a vehicle's make, model, year, and/or trim when provided with an extracted vehicle image that shows features of the vehicle's shape (e.g., headlights, windshield shape, body style, bumper, etc.).
In some embodiments, the identified vehicle information (i.e., make, model, year and/or trim) may be transmitted to another component of the server system that is configured to retrieve detailed vehicle information. The detailed vehicle information may include mileage, pricing, vehicle stock, location of the car dealer, color, customer ratings (of the car and/or dealer), body style, and the like for each of the identified vehicles.
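The retrieval of detailed vehicle information may be sketched as a lookup keyed on the identified make, model, and year. The in-memory inventory, its field names, and every value in it are placeholders invented for this example. An actual system might instead query a database or an application programming interface.

```python
from typing import NamedTuple


class VehicleId(NamedTuple):
    make: str
    model: str
    year: int


# Placeholder listings; none of these values describe real vehicles or dealers.
INVENTORY = [
    {"make": "ExampleMake", "model": "ExampleModel", "year": 2016,
     "mileage": 42000, "price": 13500, "dealer_location": "Dealer A"},
    {"make": "ExampleMake", "model": "ExampleModel", "year": 2016,
     "mileage": 61000, "price": 11900, "dealer_location": "Dealer B"},
    {"make": "OtherMake", "model": "OtherModel", "year": 2019,
     "mileage": 18000, "price": 24750, "dealer_location": "Dealer A"},
]


def find_listings(vehicle, inventory):
    """Return every listing whose make, model, and year match the identified vehicle."""
    return [
        item for item in inventory
        if (item["make"], item["model"], item["year"]) == vehicle
    ]


listings = find_listings(VehicleId("ExampleMake", "ExampleModel", 2016), INVENTORY)
```

Because one identified vehicle may match several listings, the lookup returns a list rather than a single record.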
In some embodiments, the identified vehicle information and/or detailed vehicle information may be overlaid upon the corresponding extracted vehicle image to form an augmented image. In some embodiments, the augmented image may be saved on a user's computer device and/or a database communicatively coupled to the server system. In some embodiments the augmented image may be saved in a user profile of a mobile application or website. In some embodiments the augmented image may be generated in real time. For example, the augmented image may be generated with updated detailed vehicle information for a stored extracted vehicle image.
In some embodiments, augmented images for each of the extracted vehicles may be displayed to a user using a user interface. Augmented images may be displayed concurrently, or in series. For example, a user may flip or scroll through a collection of augmented images. In some embodiments, the augmented images may be provided to the user as an image gallery. In this manner, the described system is able to provide a user with a detailed comparison of the vehicles the user photographed. The described system may be compatible with a website, a mobile application and the like.
The example process may continue as illustrated in
The cropping width and cropping height may be applied to the original image to generate a cropped image 2421. The cropped image may then be sent to a TensorFlow® model to detect the make, model, and/or year range for the vehicle 2423. The detected make, model, and/or year range may be provided to a separate process 1925 (as shown in element C in
The process may continue by taking the original image as input 2517. The cropped coordinates may be added to a list or used to create a new list 2519. If there are additional vehicle coordinates available 2521 the process may continue at element B of
If there are no additional vehicle coordinates available 2521, the process may continue by applying the list of cropping coordinates to the original image to generate a list of new images 2523. The list of new images may be sent to a TensorFlow® machine learning model to obtain the make, model and year list 2525. The make, model and year list may be sent to an application interface to retrieve pricing, location and additional information 2527. The new images with the pricing, location, and additional information may be returned to the client (and displayed to a user) using an application interface. The user may save the new images to a preferences list and/or wishlist 2529. The process may also save the newly cropped images for further machine learning training 2531.
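The multi-vehicle flow described above may be sketched end to end as follows. The classifier and the information lookup are passed in as callables so that the sketch runs without a trained model or a live service. Both stand-ins, their return values, and the function names are hypothetical, and the image is assumed to be a row-major grid of pixel values.

```python
def identify_vehicles(image, boxes, classify, lookup):
    """Crop every detected vehicle, classify each crop, then attach listing details."""
    crop_boxes = []
    for box in boxes:  # repeat until no vehicle coordinates remain
        crop_boxes.append(box)

    # Apply the whole list of cropping coordinates to the original image.
    crops = [
        [row[left:right] for row in image[top:bottom]]
        for (left, top, right, bottom) in crop_boxes
    ]
    vehicles = [classify(c) for c in crops]   # make, model and year list
    details = [lookup(v) for v in vehicles]   # pricing, location, and so on
    return [
        {"image": c, "vehicle": v, "details": d}
        for c, v, d in zip(crops, vehicles, details)
    ]


def fake_classify(crop):
    # Stand-in for a trained model: labels a crop by its width only.
    return ("ExampleMake", "Wide" if len(crop[0]) >= 3 else "Narrow", 2018)


def fake_lookup(vehicle):
    # Stand-in for an application interface call; returns empty details.
    return {"price": None, "location": None}


image = [[0] * 6 for _ in range(4)]
results = identify_vehicles(image, [(0, 0, 4, 2), (4, 0, 6, 4)], fake_classify, fake_lookup)
```

Classifying the crops as a batch after all coordinates have been collected mirrors the list-based flow described above, in contrast to the single-vehicle flow in which one cropped image is sent to the model at a time.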
The steps illustrated by the processes depicted in
It is to be understood that the disclosed subject matter is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The disclosed subject matter is capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. As such, those skilled in the art will appreciate that the conception, upon which this disclosure is based, may readily be utilized as a basis for the designing of other structures, methods, and systems for carrying out the several purposes of the disclosed subject matter. It is important, therefore, that the claims be regarded as including such equivalent constructions insofar as they do not depart from the spirit and scope of the disclosed subject matter.
Although the disclosed subject matter has been described and illustrated in the foregoing exemplary embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the disclosed subject matter may be made without departing from the spirit and scope of the disclosed subject matter.
In some examples, each of the user device and the server system may be implemented by a computer system (or a combination of two or more computer systems). A computer system may include a machine within which a set of instructions, for causing the machine to perform any one or more of the methodologies, processes or functions discussed herein, may be executed. In some examples, the machine may be connected (e.g., networked) to other machines as described above. The machine may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be any special-purpose machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine for performing the functions described herein. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein. A computer system may include processing components, memory, data storage components, and communication components which may communicate with each other via a data and control bus. In some embodiments, a computer system may also include a display device and/or user interface.
Processing components may include, without being limited to, a microprocessor, a central processing unit, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP) and/or a network processor. Processing components may be configured to execute processing logic for performing the operations described herein. In general, processing components may include any suitable special-purpose processing device specially programmed with processing logic to perform the operations described herein.
Memory may include, for example, without being limited to, at least one of a read-only memory (ROM), a random access memory (RAM), a flash memory, a dynamic RAM (DRAM) and a static RAM (SRAM), storing computer-readable instructions executable by processing components. In general, memory may include any suitable non-transitory computer readable storage medium storing computer-readable instructions executable by processing components for performing the operations described herein. In some embodiments computer systems may include two or more memory devices (e.g., dynamic memory and static memory).
Computer systems may include communication interface devices, for direct communication with other computers (including wired and/or wireless communication), and/or for communication with network 701 (see
In some examples, computer systems may include data storage devices storing instructions (e.g., software) for performing any one or more of the functions described herein. Data storage devices may include any suitable non-transitory computer-readable storage medium, including, without being limited to, solid-state memories, optical media and magnetic media.
In some examples, some or all of the logic for the above-described techniques may be implemented as a computer program or application, or as a plug-in module or subcomponent of another application. The described techniques may be varied and are not limited to the examples or descriptions provided. In some examples, applications may be developed for download to mobile communications and computing devices (e.g., laptops, mobile computers, tablet computers, smart phones, etc.), and may be made available for download by the user either directly from the device or through a website.
Moreover, while illustrative embodiments have been described herein, the scope thereof includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations and/or alterations as would be appreciated by those in the art based on the present disclosure. For example, the number and orientation of components shown in the exemplary systems may be modified. Further, with respect to the exemplary methods illustrated in the attached drawings, the order and sequence of steps may be modified, and steps may be added or deleted.
Thus, the foregoing description has been presented for purposes of illustration. It is not exhaustive and is not limiting to the precise forms or embodiments disclosed. Modifications and adaptations will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed embodiments.
The claims are to be interpreted broadly based on the language employed in the claims and not limited to examples described in the present specification, which examples are to be construed as non-exclusive. Further, the steps of the disclosed methods may be modified in any manner, including by reordering steps and/or inserting or deleting steps.
Furthermore, although aspects of the disclosed embodiments are described as being associated with data stored in memory and other tangible computer-readable storage media, one skilled in the art will appreciate that these aspects can also be stored on and executed from many types of tangible computer-readable media, such as secondary storage devices, like hard disks, floppy disks, or CD-ROM, or other forms of RAM or ROM. Accordingly, the disclosed embodiments are not limited to the above-described examples.
The present disclosure is a continuation-in-part application of and claims the benefit of application Ser. No. 15/915,329 entitled “Object Detection Using Image Classification Models,” filed Mar. 8, 2018. The present disclosure claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application No. 62/640,437 entitled “Photograph Driven Vehicle Identification Engine,” filed Mar. 8, 2018, and U.S. Provisional Application No. 62/641,214 entitled “Photograph Driven Vehicle Identification Engine,” filed Mar. 9, 2018 and hereby incorporated by reference.
Number | Date | Country
---|---|---
62640437 | Mar 2018 | US
62641214 | Mar 2018 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 15915329 | Mar 2018 | US
Child | 16151280 | | US