The present disclosure relates generally to processing and analysis of a captured image.
After photos are captured, photographers often perform post-processing of the photos to adjust for various lighting conditions present at the time the image was captured or otherwise acquired. Based on the image content and desired effect, photographers, through photo editing software, often adjust parameters that affect how the image is rendered when displayed on a screen and/or in print.
Currently, there are lighting condition presets that can be defined or purchased that apply a set of image adjustments appropriate for various lighting conditions. The user must identify the lighting condition to achieve the stylistic effect, and then apply the corresponding preset adjustments to the photo.
Existing pre-trained classifiers are centered on image content (e.g. identification of people, animals, emotions, etc.). Other work focuses on scene recognition (e.g. the Grand Canyon, the Eiffel Tower, and so on). There is also some work on style transfer, where the goal is to transform an image of one style or type into an image of another style. While most of these operations are object based, a need exists to focus on conditions at the time of image capture. A system and method described herein remedy the above-noted drawback.
According to one aspect of the disclosure, applications and/or datasets that focus on the identification of lighting conditions, with an emphasis on post-capture photo processing, are described. In one embodiment, an apparatus and method for automatically adjusting a collection of images based on their lighting conditions are provided. The apparatus and method obtain one or more images, determine first lighting condition scores for each lighting condition and for each of the one or more images using a trained prediction model, and label each of the one or more images based on the determined first lighting condition scores.
According to another embodiment, an apparatus and method are provided that obtain meta-data associated with each of the one or more images, identify sequences of images in the one or more images, generate lighting condition predictions based on a sequence analysis of the first lighting condition scores, the sequences of images, and the respective associated image meta-data, and label each of the one or more images based on the lighting condition scores.
The apparatus obtains one or more images, makes a first lighting condition prediction for each of the one or more images using a trained prediction model, and labels each of the one or more images based on the predicted first lighting condition.
In another embodiment, the apparatus obtains meta-data associated with each of the one or more images, identifies sequences of images in the one or more images, and modifies the lighting condition predictions based on a time-series analysis of the first predicted labels, the sequences of images, and the respective associated image meta-data.
These and other objects, features, and advantages of the present disclosure will become apparent upon reading the following detailed description of exemplary embodiments of the present disclosure, when taken in conjunction with the appended drawings, and provided claims.
Throughout the figures, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the subject disclosure will now be described in detail with reference to the figures, it is done so in connection with the illustrative exemplary embodiments. It is intended that changes and modifications can be made to the described exemplary embodiments without departing from the true scope and spirit of the subject disclosure as defined by the appended claims.
Exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be noted that the following exemplary embodiment is merely one example for implementing the present disclosure and can be appropriately modified or changed depending on individual constructions and various conditions of apparatuses to which the present disclosure is applied. Thus, the present disclosure is in no way limited to the following exemplary embodiment and, according to the Figures and embodiments described below, embodiments described can be applied/performed in situations other than the situations described below as examples. Further, where more than one embodiment is described, each embodiment can be combined with one another unless explicitly stated otherwise. This includes the ability to substitute various steps and functionality between embodiments as one skilled in the art would see fit.
The process of identifying lighting conditions for hundreds or thousands of photos can take a significant amount of time and human effort. According to this disclosure, a system is provided that automates the process of identifying appropriate lighting conditions and automatically applying the appropriate image processing presets.
In some embodiments the predictions from the classifier predictor are not the final predictions given to an image. In some cases, the predictions may be adjusted further based on an analysis of a sequence of photos captured. In other words, a photo's lighting condition prediction may be improved if the predictions from previous and subsequent photos are taken into consideration. Furthermore, the strength of the consideration for previous and subsequent photo predictions can also consider the amount of time that has elapsed between the photos.
To process sequences of photos, some embodiments estimate the transition probabilities from one lighting condition to another over a given period of time.
In
When photographs are processed, they may be from one or more sessions with one or more photographers.
In
Flow then continues back to block B520, where it is determined whether there are more images to assign to their capture device configuration lists. If it is determined that all images have been considered, then flow moves to block B560. In block B560 the process starts to iterate through all of the identified capture device configurations. If capture device configurations remain to be processed, flow continues to block B570, where all images in the capture device configuration's image list are sorted by their capture timestamps. Capture timestamps may come from timestamps on the file, from EXIF data, or from other meta-data associated with the file, for example. If a timestamp is not available for a particular image, a sentinel value is assigned to the timestamp to identify that the image is not associated with a time; sometimes a timestamp representing a specific date in the past is used to indicate an image without a timestamp. Finally, the images for the capture device configuration are sorted by timestamp from oldest to newest. Flow then continues back to block B560, where it is determined whether there is another capture device configuration to consider. When all configurations have been processed and all of the configuration images have been sorted into chronological order, flow continues to block B595, where the processing ends. In some embodiments this last processing loop also sorts the images in the "unknown" configuration, while other embodiments do not sort the "unknown" configuration images.
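The chronological sorting described above can be sketched as follows. This is a hypothetical Python illustration, not the disclosed implementation; the dictionary field names and the specific sentinel date are assumptions made for the example.

```python
from datetime import datetime

# Sentinel timestamp marking images with no capture time
# (a fixed date in the past, per the description above).
SENTINEL = datetime(1900, 1, 1)

def sort_by_capture_time(images):
    """Sort one capture-device configuration's image list oldest to newest.

    `images` is a list of dicts; the 'timestamp' key may be None when no
    EXIF, file, or other meta-data timestamp is available, in which case
    the sentinel value is assigned.
    """
    for img in images:
        if img.get("timestamp") is None:
            img["timestamp"] = SENTINEL  # flag the image as having no time
    return sorted(images, key=lambda img: img["timestamp"])

config_images = [
    {"name": "b.jpg", "timestamp": datetime(2020, 7, 24, 10, 5)},
    {"name": "a.jpg", "timestamp": None},  # missing timestamp
    {"name": "c.jpg", "timestamp": datetime(2020, 7, 24, 9, 59)},
]
ordered = sort_by_capture_time(config_images)
```

Because the sentinel is a date in the past, images without timestamps sort to the front of the list and remain identifiable.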
Returning to
One example embodiment of the estimation of transition probabilities is outlined in
In the case of the stochastic matrix, the matrix can be initialized with random numbers drawn from a uniform distribution between 0 and 1. Then the matrix rows (or alternatively columns) can be renormalized so that the rows (or columns) sum to one; the resulting matrix is a valid stochastic matrix. Some embodiments recognize that the final transition matrix will typically have a strong diagonal component. In other words, the diagonal entries in the stochastic matrix tend to be close to 1 and the off-diagonal entries tend to be close to zero. Thus, in these embodiments, the matrix is first created with uniform random entries between 0 and 1, a constant is added to the diagonal, and normalization of the rows (or columns) is then performed. For example, if a value of 99 were added to the diagonal, the resulting matrix after normalization would have diagonal entries close to 0.99 and off-diagonal entries close to 0.
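The diagonal-boosted initialization above can be sketched in a few lines of NumPy. This is a minimal illustration under the stated scheme; the function name and the choice of K = 4 states are assumptions for the example.

```python
import numpy as np

def init_stochastic(K, diag_boost=99.0, rng=None):
    """Initialize a K-by-K row-stochastic matrix with a strong diagonal.

    Entries are drawn uniformly from [0, 1), a constant is added to the
    diagonal, and each row is then renormalized to sum to 1.
    """
    rng = np.random.default_rng(rng)
    M = rng.uniform(0.0, 1.0, size=(K, K))
    M += diag_boost * np.eye(K)              # bias toward staying in the same state
    return M / M.sum(axis=1, keepdims=True)  # row normalization

T0 = init_stochastic(4, diag_boost=99.0, rng=0)
```

With a boost of 99, each diagonal entry lands near 0.99 after normalization, matching the worked example in the text.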
In the case of a double stochastic matrix, a similar approach may be followed; however, this embodiment modifies the normalization step to normalize both rows and columns. One embodiment for normalizing a matrix into double stochastic form is to normalize the rows, then the columns (or the columns, then the rows), and then repeatedly perform the row and column normalizations until the matrix converges to a double stochastic matrix whose rows and columns both sum to 1.
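The alternating row/column normalization described above is commonly known as Sinkhorn iteration; a minimal sketch follows. The fixed iteration count is an assumption — the text only requires iterating until convergence.

```python
import numpy as np

def sinkhorn_normalize(M, iters=500):
    """Alternate row and column normalization until M is (near) double stochastic."""
    M = np.asarray(M, dtype=float).copy()
    for _ in range(iters):
        M /= M.sum(axis=1, keepdims=True)  # rows sum to 1
        M /= M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

rng = np.random.default_rng(0)
# Diagonal-boosted random start, as in the stochastic-matrix case above.
D = sinkhorn_normalize(rng.uniform(size=(3, 3)) + 99.0 * np.eye(3))
```

For strictly positive starting matrices this iteration converges to a double stochastic matrix, which is why the random-uniform initialization above is safe.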
These embodiments for matrix normalization can be used again in subsequent steps in this workflow.
Next the flow continues to block B620 where an iteration loop begins for a burn-in period. The exit criteria of the burn-in period may be defined as a fixed number of steps or based on a score relating to the convergence of the transition matrix estimate. If the loop has not reached its exit criteria then flow continues to block B625 where a proposal for a next transition matrix is made based on a random walk step taken from the current estimate. The proposal is based on a step generated by modifying the current transition matrix estimate through a random perturbation. The random perturbation may be generated by adding a zero mean normal random number with standard deviation of σ to each entry of the transition matrix. Resulting values that are below some small threshold ϵ or above 1−ϵ are rejected and re-sampled so that the effective resulting distribution added to each value is a truncated Gaussian. Once the matrix element perturbations are all found to be acceptable, the matrix is renormalized as was described previously in the discussion of block B615. The result is a valid stochastic or double stochastic matrix that has deviated slightly from the previous matrix.
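The proposal step of block B625 — perturb each entry with truncated Gaussian noise, then renormalize — can be sketched as follows. This is an illustrative implementation under the stated scheme (row normalization is shown; column or Sinkhorn normalization would apply for the double stochastic case), and the starting matrix is an assumption for the example.

```python
import numpy as np

def propose(T, sigma=0.01, eps=1e-6, rng=None):
    """Random-walk proposal for the next transition matrix.

    Each entry is perturbed with zero-mean normal noise of standard
    deviation sigma; entries falling below eps or above 1 - eps are
    rejected and re-sampled (a truncated Gaussian), and the matrix is
    then renormalized into a valid stochastic matrix.
    """
    rng = np.random.default_rng(rng)
    P = T + rng.normal(0.0, sigma, size=T.shape)
    bad = (P < eps) | (P > 1.0 - eps)
    while bad.any():  # reject-and-resample out-of-range entries
        P[bad] = T[bad] + rng.normal(0.0, sigma, size=int(bad.sum()))
        bad = (P < eps) | (P > 1.0 - eps)
    return P / P.sum(axis=1, keepdims=True)  # renormalize rows

T = np.full((3, 3), 1.0 / 3.0)
T_next = propose(T, sigma=0.01, rng=1)
```

The result deviates only slightly from the current estimate while remaining a valid stochastic matrix, as the text requires.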
The value of the standard deviation of the random normal perturbation, σ, may be defined by an iterative process (e.g. controlled by block B620 or B650), or may be defined dynamically based on certain acceptance criteria to ensure good progress in the random walk process (e.g. to ensure an approximate acceptance rate in blocks B625 and B670). In one embodiment the value of σ is given by σ = log(j + 2), where j is the step number in the iteration controlled by B620 or B650. In other embodiments the value of σ is decreased by a factor of F whenever a moving average of the acceptance rate of the proposal falls below a specified threshold. Of course many other strategies are possible.
Next the flow continues to block B630 where the transition matrix proposal is evaluated. Some embodiments score the transition matrix against sequences of observed data. For example,
When calculating the log-likelihood, some embodiments use the time interval between images to compute the probability of transitioning from a first label to a second label. If the transition matrix is represented by the matrix T and represents the transition probabilities in a single unit of time, then T^n is the matrix representing the transition probabilities in n units of time, where T^n is T raised to the n-th power (e.g. T³ = T·T·T). Since many cameras provide timestamps in their EXIF data at a 1-second resolution, some embodiments calculate T as the 1-second transition probabilities. If two consecutive images have a time difference of 3 seconds, then the transition probability from label j to label k is the entry in the j-th row and k-th column of the matrix T³. In some cases, images from the same camera may have the same (valid) timestamp when taken in rapid succession, such as in a burst mode. Some embodiments therefore treat the time between subsequent images as the maximum of the actual time difference in seconds and 1 second (assuming the timestamp is valid, e.g. not missing and represented as a sentinel value). Some embodiments use different units of base time.
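The time-dependent transition probability reduces to a matrix power, which can be sketched as follows; the example 2-by-2 matrix is an assumption for illustration.

```python
import numpy as np

def transition_over_time(T, delta_seconds):
    """Transition probability matrix spanning `delta_seconds`, given the
    1-second transition matrix T. Same-timestamp (burst-mode) images are
    treated as being at least 1 second apart."""
    n = max(int(delta_seconds), 1)
    return np.linalg.matrix_power(T, n)

T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
P3 = transition_over_time(T, 3)  # three seconds between images: T cubed
p_j_to_k = P3[0, 1]              # j-th row, k-th column entry
```

Raising a stochastic matrix to a power preserves its row sums, so the result remains a valid transition matrix for the longer interval.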
Some embodiments consider additional features to determine the “closeness” of two images in a sequence. In the above embodiment, the time difference determined the power of the transition matrix to arrive at a transition probability matrix for a given time period. Other embodiments consider image similarity as a measure of image “closeness”. For example, in some settings, such as a photo studio, lighting conditions can remain fairly static over time until the scene or lighting setup is rearranged. In cases like these, the transition probability is more related to the relative change in the image contents or image features rather than the actual time elapsed between photos.
One such embodiment uses a neural classifier trained to estimate lighting conditions, and an image feature vector may be obtained from an intermediate layer of the classifier. When the network is used to estimate the lighting conditions, it can also output the image feature vector. In a sequence of images, the feature vectors can be compared via a distance or dissimilarity measure to estimate the image similarity from one image to the next. A properly scaled measure can then be used in place of, or in conjunction with (e.g. in combination with), time to determine the "closeness" of images in a sequence. This measure can act as the exponent of the transition matrix, in a similar fashion as was done with the time difference, to provide transition probability estimates.
In some embodiments, the transition matrix can be simplified such that it is a single-parameter double stochastic matrix: the probability of staying within any state is the same, and every change of state is equally probable. In this case the K by K transition matrix takes on the form:

T = t·I_K + ((1 − t)/(K − 1))·(𝟙_K − I_K)

where I_K is the K×K identity matrix, 𝟙_K is the K×K ones matrix, and t is the probability of staying in a state when the closeness of the images is 1.0.
More generally, we can define a balanced double stochastic matrix as:

T = a·I_K + b·𝟙_K

We note that since each row and column must sum to one, a and b are related by:

a + K·b = 1, so that b = (1 − a)/K
Then, squaring the transition matrix T, we get:

T² = (a·I_K + b·𝟙_K)²

T² = a²·I_K + 2ab·𝟙_K + b²·𝟙_K²

T² = a²·I_K + (2ab + K·b²)·𝟙_K

since 𝟙_K² = K·𝟙_K.
Advantageously, any power of a matrix that can be described by a·I_K + b·𝟙_K can also be decomposed with an a and b coefficient, since taking the power involves summing only constant diagonal matrices and constant matrices. Moreover, any power of a double stochastic matrix is also double stochastic. Thus the b coefficient can be calculated from the a coefficient, and the transition matrix raised to any power can be described as:

T^n = a_n·I_K + b_n·𝟙_K, where b_n = (1 − a_n)/K
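The closed form for the square of a balanced double stochastic matrix can be checked numerically. The following sketch uses illustrative values of a and K that are not taken from the disclosure.

```python
import numpy as np

K = 4
a = 0.96
b = (1.0 - a) / K          # row/column sums force a + K*b = 1

I = np.eye(K)
J = np.ones((K, K))        # the ones matrix

T = a * I + b * J          # balanced double stochastic transition matrix
T2_direct = T @ T
# Closed form derived above: T^2 = a^2 * I + (2ab + K*b^2) * J
T2_closed = a**2 * I + (2 * a * b + K * b**2) * J
```

The two computations agree, and the squared matrix is again double stochastic, consistent with the claims above.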
For the case of our transition matrix defined with parameter t,
This leads to a simplified transition matrix based on time and image similarity. We define a function of two images Im_i and Im_{i+1} and their capture times t_i and t_{i+1}:

δ = ƒ(Im_i, Im_{i+1}, t_i, t_{i+1})

The corresponding transition matrix is then given by T^δ, i.e., T raised to the power δ.
In one embodiment the function ƒ is calculated by:

ƒ(Im_i, Im_{i+1}, t_i, t_{i+1}) = w_f·(1 − ⟨N(Im_i), N(Im_{i+1})⟩)² + w_t·(t_{i+1} − t_i)

where N(Im_i) is the network feature of Im_i, ⟨·,·⟩ is an inner product, and w_f and w_t are the feature and time difference weightings. Of course many other functions considering these factors are possible.
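The closeness function ƒ can be sketched directly from the formula above. Unit-normalizing the feature vectors (so the inner product behaves like a cosine similarity) and the particular weight values are assumptions for the example, not part of the disclosure.

```python
import numpy as np

def closeness(feat_i, feat_j, t_i, t_j, w_f=1.0, w_t=0.05):
    """delta = w_f * (1 - <N(Im_i), N(Im_j)>)^2 + w_t * (t_j - t_i).

    Feature vectors are unit-normalized so the inner product lies in
    [-1, 1]; identical features and zero elapsed time give delta = 0."""
    n_i = feat_i / np.linalg.norm(feat_i)
    n_j = feat_j / np.linalg.norm(feat_j)
    return w_f * (1.0 - np.dot(n_i, n_j)) ** 2 + w_t * (t_j - t_i)

f = np.array([0.2, 0.5, 0.8])
delta_same = closeness(f, f, 10.0, 10.0)  # identical image, same timestamp
```

A δ of 0 makes T^δ the identity matrix (no transition expected), while larger feature or time differences raise the exponent and spread the transition probabilities.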
The log-likelihood is the log of the product of each transition probability across all sequences:

L(T) = Σ_{m=1}^{M} Σ_{i} log( [T^{t_{i+1} − t_i}]_{l_i, l_{i+1}} )

where L(T) is the log-likelihood of the transition matrix T, M is the number of labeled sequences ordered by time, t_i is the capture time of the i-th image in a sequence, and l_i is its label.
Block B630 evaluates the proposal based on the likelihood of the transition matrix. In one embodiment, a Metropolis-Hastings method is used to evaluate the ratio of the proposed likelihood to the previous likelihood. Alternatively, but equivalently, the difference of the log-likelihoods can be examined. The evaluation of the proposal in some embodiments results in a ratio of likelihoods or a difference of log-likelihoods, and the flow continues to block B635, where the system determines whether to accept the proposed transition matrix as the new current transition matrix.
In block B635 the decision to accept the proposal can be further illustrated by
In some embodiments, when the ratio of the proposal transition likelihood to the current transition likelihood is above 1.0, block B635 always accepts the proposal as the new current transition estimate. When the ratio is below 1.0, block B635 randomly accepts the proposal with a probability equal to the ratio. Once the determination whether to accept or reject the proposal has been made, the flow continues. In the case that a proposal is accepted, flow proceeds to block B640, where the proposal matrix is made the current transition matrix. It is from this current transition matrix that the next proposal will be generated (e.g. as a perturbation of the current matrix).
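The acceptance rule of block B635 is the standard Metropolis-Hastings criterion, which is numerically safest on log-likelihoods; a minimal sketch:

```python
import math
import random

def accept_proposal(loglik_current, loglik_proposal, rng=random):
    """Metropolis-Hastings acceptance decision on log-likelihoods.

    Always accept when the proposal is more likely (ratio > 1);
    otherwise accept with probability equal to the likelihood ratio."""
    log_ratio = loglik_proposal - loglik_current
    if log_ratio >= 0.0:
        return True
    return rng.random() < math.exp(log_ratio)

always = accept_proposal(-120.0, -115.0)  # better proposal: always accepted
```

Working with the log-likelihood difference avoids underflow when the per-sequence probabilities are tiny, which is why the text notes the two formulations are equivalent.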
If the proposal is not accepted in block B635, flow returns to block B620. If the proposal is accepted, block B640 is performed before the flow returns to block B620. In block B620 a check is performed to determine whether the burn-in period has ended. If the burn-in period has ended, flow then passes to a second loop that is similar to the loop in blocks B620 through B640. In this second loop, starting in block B645, the loop is run for a certain number of iterations and the accepted proposals are recorded in block B675. Once N iterations have been performed, as checked by block B650, the flow continues to block B690 where a final transition estimate is calculated. In some embodiments the final transition matrix is the transition matrix along the random walk carried out in blocks B620 to B675 that had the maximum likelihood score. In other embodiments the transition matrix is a normalized version of the average of all accepted proposal matrices encountered after the burn-in period; these are the matrices stored by block B675. Once a final transition estimate is generated, it is stored for further use in online prediction, and flow continues to block B695 where the process terminates.
In some embodiments of
U = T^t
Then if a is the index for label A 810, b is the index for label B 820, and c is the index for label C 830 (and d is the index of label D 840), then the transition likelihood for matrix T, given a transition from labels A and B to label C in time t, is given by

(1/2)·(U_{a,c} + U_{b,c})
In the example of 802 we similarly weigh the starting labels by the reciprocal of the number of starting labels and perform an “or” operation on the subsequent labels (probabilities add) to obtain:
And in the example shown in 803, there is a single starting label and multiple subsequent labels resulting in a likelihood of
More generally, if the input set of labels is given by a set Q = {q1, q2, . . . }, the cardinality of the set is denoted by |Q|, and the subsequent label set is given by the set R = {r1, r2, . . . }, we can denote the likelihood of the transition over time t as:

(1/|Q|) · Σ_{q∈Q} Σ_{r∈R} [T^t]_{q,r}
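The set-to-set likelihood above — starting labels averaged, subsequent labels combined by adding probabilities — can be sketched directly; the 3-by-3 example matrix is an assumption for illustration.

```python
import numpy as np

def set_transition_likelihood(T, t, Q, R):
    """Likelihood of moving from any label in Q to any label in R over t units.

    Starting labels are weighted by 1/|Q|, and subsequent labels are
    combined with an "or" (their probabilities add)."""
    U = np.linalg.matrix_power(T, t)
    return sum(U[q, r] for q in Q for r in R) / len(Q)

T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
lik = set_transition_likelihood(T, 1, Q=[0, 1], R=[2])
```

With Q containing two labels and R one label, this reduces to the averaged two-term form given for the first example above.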
In some embodiments the starting label and subsequent labels are randomly sampled from the sets Q and R when estimating the likelihood of the transition matrix. In these cases a method described previously may be used to calculate the likelihood.
Turning back to
In some embodiments, the prior probabilities are considered to be all equal. Thus if there are K distinct labels then each label has a prior probability of 1/K. In this case block B310 is not needed. This embodiment may be useful when a label predictor such as a neural multi-label classifier is trained with a dataset that is unbalanced similarly to the true prior distribution. Sometimes in these cases the classifier produces predictions that are biased towards the more frequent classes, and these predictions essentially encapsulate the prior probabilities. Thus in some cases it is justifiable to use equal weighted prior probabilities for later use.
In
For block B380 several embodiments are possible. One embodiment determines the label from the classifier with the maximum score as the classifier output. In this case the probability of a true label given a predicted label is exactly the probability one would find in a confusion matrix generated from testing the classifier. Another embodiment may take each of the classifier's label scores and normalize them such that they sum to 1. In this case the probability of the true label can be estimated by the relative weight of the corresponding classifier's label score with respect to the other label scores. Some embodiments perform a soft-max operation on the prediction scores as the emission function. Other embodiments further try to characterize the distribution of scores generated by the classifier given a truth label. For example a mean and covariance matrix may be estimated for the classifier's label scores for all images labeled with each true label. Thus if there are K true labels, then there are K mean vectors and K covariance matrices. Thus the emission function returns a likelihood of a particular score by using a Gaussian formula. Some embodiments extend this function and represent it as a Gaussian Mixture Model. Of course other embodiments are possible, some of which are similar or extensions of the ones described here.
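Of the emission-function embodiments listed above, the soft-max variant is the simplest to illustrate; a minimal sketch, with example scores that are assumptions:

```python
import numpy as np

def softmax_emission(scores):
    """Emission probabilities from raw classifier label scores via soft-max."""
    z = np.asarray(scores, dtype=float)
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

probs = softmax_emission([2.0, 1.0, 0.1])
```

Unlike plain score normalization, the soft-max keeps all emission probabilities strictly positive even when some raw scores are zero or negative, which matters when the values later enter a log-likelihood.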
Flow continues to block B390 where the emission functions are saved for further use. Finally flow moves to block B395 where the process terminates.
Turning to
Typically the goal when using a Hidden Markov Model is to attempt to estimate the hidden states (true labels) given observations. In the case described herein, the system estimates the true label given a set of prediction values. The values 970, 980, and 990 could be the label of the maximum prediction scores from a multi-label classifier or could be the set of scores for all labels from a classifier. Additionally it could be a feature vector derived from the image.
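Estimating the hidden true labels from the observed prediction values is classic Hidden Markov Model decoding. The disclosure does not name a particular decoding algorithm, so the following is a sketch using the standard Viterbi algorithm, with a hypothetical two-label example (uniform prior, diagonal-heavy transition matrix, and per-image emission vectors as assumptions).

```python
import numpy as np

def viterbi(prior, T, emissions):
    """Most likely hidden-label sequence given prior probabilities,
    transition matrix T, and per-image emission probability vectors
    (one row per observation). Works in log space for stability."""
    emissions = np.asarray(emissions, dtype=float)
    n_obs, K = emissions.shape
    logp = np.log(prior) + np.log(emissions[0])
    back = np.zeros((n_obs, K), dtype=int)
    for i in range(1, n_obs):
        scores = logp[:, None] + np.log(T)   # scores[j, k]: state j -> state k
        back[i] = scores.argmax(axis=0)      # best predecessor for each state
        logp = scores.max(axis=0) + np.log(emissions[i])
    path = [int(logp.argmax())]
    for i in range(n_obs - 1, 0, -1):        # backtrack through stored choices
        path.append(int(back[i][path[-1]]))
    return path[::-1]

prior = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
obs = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
labels = viterbi(prior, T, obs)
```

In this example the sticky transition matrix overrides the weak third observation, illustrating how sequence context can correct an individual image's noisy prediction.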
In
It can be seen that the example in
The lighting-condition-detection device 1100 includes one or more processors 1101, one or more I/O components 1102, and storage 1103. Also, the hardware components of the lighting-condition-detection device 1100 communicate via one or more buses or other electrical connections. Examples of buses include a universal serial bus (USB), an IEEE 1394 bus, a PCI bus, an Accelerated Graphics Port (AGP) bus, a Serial AT Attachment (SATA) bus, and a Small Computer System Interface (SCSI) bus.
The one or more processors 1101 include one or more central processing units (CPUs), which may include one or more microprocessors (e.g., a single core microprocessor, a multi-core microprocessor); one or more graphics processing units (GPUs); one or more tensor processing units (TPUs); one or more application-specific integrated circuits (ASICs); one or more field-programmable-gate arrays (FPGAs); one or more digital signal processors (DSPs); or other electronic circuitry (e.g., other integrated circuits). The I/O components 1102 include communication components (e.g., a graphics card, a network-interface controller) that communicate with the display device 1120, the network 1199, the photo-editing device 1110, and other input or output devices (not illustrated), which may include a keyboard, a mouse, a printing device, a touch screen, a light pen, an optical-storage device, a scanner, a microphone, a drive, and a game controller (e.g., a joystick, a gamepad).
The storage 1103 includes one or more computer-readable storage media. As used herein, a computer-readable storage medium includes an article of manufacture, for example a magnetic disk (e.g., a floppy disk, a hard disk), an optical disc (e.g., a CD, a DVD, a Blu-ray), a magneto-optical disk, magnetic tape, and semiconductor memory (e.g., a non-volatile memory card, flash memory, a solid-state drive, SRAM, DRAM, EPROM, EEPROM). The storage 1103, which may include both ROM and RAM, can store computer-readable data or computer-executable instructions.
The lighting-condition-detection device 1100 also includes a communication module 1103A, a label-scoring module 1103B, a scoring-training module 1103C, a transition-training module 1103D, an emission-modeling module 1103E, a prior-training module 1103F, a transition-scaling module 1103G, a sequence-detection module 1103H, a label-adjustment module 1103I, and a photo-editing module 1103J. A module includes logic, computer-readable data, or computer-executable instructions. In the embodiment shown in
The label-scoring module 1103B includes operations programmed to carry out label prediction such as those created through
At least some of the above-described devices, systems, and methods can be implemented, at least in part, by providing one or more computer-readable media that contain computer-executable instructions for realizing the above-described operations to one or more computing devices that are configured to read and execute the computer-executable instructions. The systems or devices perform the operations of the above-described embodiments when executing the computer-executable instructions. Also, an operating system on the one or more systems or devices may implement at least some of the operations of the above-described embodiments.
Furthermore, some embodiments use one or more functional units to implement the above-described devices, systems, and methods. The functional units may be implemented in only hardware (e.g., customized circuitry) or in a combination of software and hardware (e.g., a microprocessor that executes software).
Additionally, some embodiments of the devices, systems, and methods combine features from two or more of the embodiments that are described herein. Also, as used herein, the conjunction “or” generally refers to an inclusive “or,” though “or” may refer to an exclusive “or” if expressly indicated or if the context indicates that the “or” must be an exclusive “or.”
While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments.
This application claims priority from U.S. Provisional Application Ser. No. 63/056,417 filed on Jul. 24, 2020 and U.S. Provisional Application Ser. No. 63/143,450 filed on Jan. 29, 2021, both of which are incorporated herein by reference.