The disclosure generally relates to statistical models, and more specifically to systems and methods for imputing missing values in environmental, social and governance (ESG) data.
Financial institutions and/or investors may often rely on various company data to make their investment decisions. In addition to financial data, environmental, social and governance (ESG) data has been widely used in analyzing company performance. However, ESG data often comes in incomplete or unbalanced datasets. For example, salary data of smaller private companies can often be less transparent, resulting in missing data entries in the ESG data spreadsheet. As another example, energy management data of companies can often be missing intermittently throughout a reporting period. Thus, the missing ESG data entries pose a challenge in systematic investing when an investment system ought to compare all investment companies on a financially material attribute.
Therefore, there is a need to address the issue of missing data entries in ESG data.
Embodiments of the disclosure and their advantages are best understood by referring to the detailed description that follows. It should be appreciated that like reference numerals are used to identify like elements illustrated in one or more of the figures, wherein showings therein are for purposes of illustrating embodiments of the disclosure and not for purposes of limiting the same.
The detailed description set forth below, in connection with the appended drawings, is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form in order to avoid obscuring such concepts.
As used herein, the term “network” may comprise any hardware or software-based framework that includes any artificial intelligence network or system, neural network or system and/or any training or learning models implemented thereon or therewith.
As used herein, the term “module” may comprise hardware or software-based framework that performs one or more functions. In some embodiments, the module may be implemented on one or more neural networks.
As used herein, the term “substantially” refers to a characteristic that achieves a certain property for the most part. For example, a set of variables that maximizes a numerical approximation of an objective function may be said to substantially maximize the original objective function.
Control systems can often be analyzed by prediction models. To build a prediction model, training data samples are obtained, e.g., from historical measurements and/or observations of the control system, each of which includes an input sample and a corresponding output sample. The input sample may include a plurality of input characteristics. In another example, an evaluation model may employ data samples to evaluate the performance metrics of a control system.
For example, a prediction model that predicts company performance may employ ESG data that provides information on the environmental, social, and governance factors of a company and may be used to measure how ethically viable and sustainable company operations are. Example environmental data of a company may include climate change data, greenhouse gas emissions data, waste and pollution data, deforestation data, resource depletion data, and/or the like. Example social data of the company may include working conditions data, health and safety data, local communities data, employee relations and diversity data, conflicts and humanitarian crisis data, and/or the like. Example governance data of the company may include fair tax strategy data, executive pay data, bribery and corruption data, employee pay and reward data, board diversity data, and/or the like.
Due to the nature of the ESG data, available ESG data may often come in the form of incomplete or unbalanced datasets with missing data entries. Some existing systems may choose to disregard rows with missing data entries in an ESG data spreadsheet, which largely underutilizes the available ESG data in data training.
Embodiments described herein provide a data-imputation and bias-correction approach that detects and corrects missing ESG data entries for the control system. Specifically, a matrix factorization approach is adopted for key performance indicators (KPIs) in a given descriptor (a column in the ESG data spreadsheet) to impute missing ESG values. The approach may then check if the KPIs in the descriptor exhibit systematic data gaps based on a missing completely at random (MCAR) test. Conditional on the results of the MCAR test, the system may apply Heckman's bias correction to correct for biases in the imputed data.
In one embodiment, the KPIs in a given descriptor may share a few unobservable attributes, the KPIs being chosen by experts in the field to represent the unique ESG issue captured by the descriptor. Companies are linked by these latent attributes, and so the latent attributes can explain firm-to-firm variance in ESG data (much like style factors do for conventional risk models). The number of these hidden attributes that explain similarities between firms is smaller than the number of KPIs in the descriptor and much smaller than the number of companies. In this way, these hidden attributes may be extracted from the available ESG data and used, together with the shared relationship between companies, to impute the missing ESG data entries.
In one embodiment, upon data imputation of missing entries, the system may then determine whether the missing data is randomly missing, i.e., shares the same distributional properties (mean and standard deviation) as the observed data, or is systemically missing. Companies may be grouped into two sets, one for which data is missing in a given KPI and the other for which data is available. The two groups of companies may be used to test for a difference in central tendencies for all other KPIs in the descriptor. Based on the tested difference, it may be inferred whether data in that descriptor is missing at random or systemically.
In one embodiment, if data is missing systematically, the latent attributes to explain the systemic “missingness” may be determined. For example, based on priors these attributes are possibly sector, region or size differences between firms, given that regulatory requirement and relevance for disclosure can vary along these dimensions. By regressing along these attributes, a number may be assigned to the likelihood of missingness. Heckman correction may then be applied to adjust the imputed values for disclosure bias.
In one embodiment, the server 130 may receive ESG data 102a-n relating to one or more companies from data sources 103a-n via a communication network. For example, the data source 103a-n may include data vendors such as Bloomberg®, S&P® DJI, ISS Oekom, and/or the like. The data 102a-n may include ESG data including information on the environmental, social, and governance factors of a number of target companies and may be used to measure how ethically viable and sustainable company operations are. Example environmental data of a company may include climate change data, greenhouse gas emissions data, waste and pollution data, deforestation data, resource depletion data, and/or the like. Example social data of the company may include working conditions data, health and safety data, local communities data, employee relations and diversity data, conflicts and humanitarian crisis data, and/or the like. Example governance data of the company may include fair tax strategy data, executive pay data, bribery and corruption data, employee pay and reward data, board diversity data, and/or the like.
In one embodiment, the server 130 may receive the ESG data 102a-n in the form of a database file, such as a spreadsheet, and/or the like. The server 130 may host a data imputation module 104 to impute missing data entries in the ESG data spreadsheet. For example, the ESG data 102a-n may be received in a batch in the form of a spreadsheet having rows representing a plurality of companies, and columns representing different ESG data attributes (e.g., salary, diversity, carbon footprint, and/or the like). For each row (company), one or more data entries corresponding to one or more ESG data attributes may be missing. Likewise, for each column (ESG data descriptor), some company data may not be available. In that case, the data imputation module 104 may impute the values of the missing data entries based on available entries, e.g., using a prediction model that learns the shared relationships between companies and known ESG data attributes.
In one embodiment, the data imputation module 104 may generate imputed data 116, based on which the missing data assessment module 105 may determine whether the missing data entries are randomly missing, i.e., share the same distributional properties (mean and standard deviation) as the observed entries, or are systemically missing. For example, the target companies (all rows) may be grouped into two sets: the first set having companies that miss data entries relating to a key performance indicator, and the second set having companies that have data entries available for the key performance indicator. The two groups of companies may be used to test for a difference in central tendencies for other KPIs. Based on the tested difference, it may be inferred whether data in that descriptor is missing at random or systemically.
In one embodiment, a bias correction module 106 may be employed to generate latent attributes according to the missing data entries, if data is missing systematically. For example, these latent attributes are possibly sector, region or size differences between firms, given that regulatory requirement and relevance for disclosure can vary along these dimensions. By regressing along these attributes, a number may be assigned to the likelihood of missingness. Heckman correction may then be applied to adjust the imputed values for disclosure bias. The corrected ESG data 126 post bias correction may then be used for prediction model training 115a at the server 130. Or the corrected ESG data 126 may be optionally output to a user device 110, on which prediction model training 115b may be deployed.
The user device 110, data vendor servers 145, 170 and 180, and the server 130 may communicate with each other over a network 160. User device 110 may be utilized by a user 240 (e.g., a driver, a system admin, etc.) to access the various features available for user device 110, which may include processes and/or applications associated with the server 130 to receive an output data anomaly report.
User device 110, data vendor server 145, and the server 130 may each include one or more processors, memories, and other appropriate components for executing instructions such as program code and/or data stored on one or more computer readable mediums to implement the various applications, data, and steps described herein. For example, such instructions may be stored in one or more computer readable media such as memories or data storage devices internal and/or external to various components of system 200, and/or accessible over network 160.
User device 110 may be implemented as a communication device that may utilize appropriate hardware and software configured for wired and/or wireless communication with data vendor server 145 and/or the server 130. For example, in one embodiment, user device 110 may be implemented as an autonomous driving vehicle, a personal computer (PC), a smart phone, laptop/tablet computer, wristwatch with appropriate computer hardware resources, eyeglasses with appropriate computer hardware (e.g., GOOGLE GLASS®), other type of wearable computing device, implantable communication devices, and/or other types of computing devices capable of transmitting and/or receiving data, such as an IPAD® from APPLE®. Although only one communication device is shown, a plurality of communication devices may function similarly.
User device 110 of
In various embodiments, user device 110 includes other applications 116 as may be desired in particular embodiments to provide features to user device 110. For example, other applications 116 may include security applications for implementing client-side security features, programmatic client applications for interfacing with appropriate application programming interfaces (APIs) over network 160, or other types of applications. Other applications 116 may also include communication applications, such as email, texting, voice, social networking, and IM applications that allow a user to send and receive emails, calls, texts, and other notifications through network 160. For example, the other application 116 may be an email or instant messaging application that receives a message containing corrected ESG data from the server 130. Other applications 116 may include device interfaces and other display modules that may receive input and/or output information. For example, other applications 116 may contain software programs for asset management, executable by a processor, including a graphical user interface (GUI) configured to provide an interface to the user 240 to view a report of corrected ESG data 126.
User device 110 may further include database 118 stored in a transitory and/or non-transitory memory of user device 110, which may store various applications and data and be utilized during execution of various modules of user device 110. Database 118 may store user profile relating to the user 240, predictions previously viewed or saved by the user 240, historical data received from the server 130, and/or the like. In some embodiments, database 118 may be local to user device 110. However, in other embodiments, database 118 may be external to user device 110 and accessible by user device 110, including cloud storage systems and/or databases that are accessible over network 160.
User device 110 includes at least one network interface component 119 adapted to communicate with data vendor server 145 and/or the server 130. In various embodiments, network interface component 119 may include a DSL (e.g., Digital Subscriber Line) modem, a PSTN (Public Switched Telephone Network) modem, an Ethernet device, a broadband device, a satellite device and/or various other types of wired and/or wireless network communication devices including microwave, radio frequency, infrared, Bluetooth, and near field communication devices.
Data vendor server 145 may correspond to a server that hosts one or more of the databases 103a-n (or collectively referred to as 103) to provide data 102a-n to the server 130. The database 103 may be implemented by one or more relational database, distributed databases, cloud databases, and/or the like.
The data vendor server 145 includes at least one network interface component 126 adapted to communicate with user device 110 and/or the server 130. In various embodiments, network interface component 126 may include a DSL (e.g., Digital Subscriber Line) modem, a PSTN (Public Switched Telephone Network) modem, an Ethernet device, a broadband device, a satellite device and/or various other types of wired and/or wireless network communication devices including microwave, radio frequency, infrared, Bluetooth, and near field communication devices. For example, in one implementation, the data vendor server 145 may send ESG data from the database 103, via the network interface 126, to the server 130.
The server 130 may house the data imputation module 104, the missing data assessment module 105 and the bias correction module 106. In some implementations, modules 104-106 may receive data from database 103 at the data vendor server 145 via the network 160 and build or implement a prediction model such as a regression model and/or a machine learning model to generate an imputed value for missing ESG data entries. The generated value may further be corrected by the bias correction module 106, and the data results 126 are sent to the user device 110 for review by the user 240 via the network 160.
The database 132 may be stored in a transitory and/or non-transitory memory of the server 130. In one implementation, the database 132 may store data obtained from the data vendor server 145. In one implementation, the database 132 may store parameters of the base prediction model 115. In one implementation, the database 132 may store previously predicted values generated from the prediction generation module 106, and the corresponding input feature vectors.
In some embodiments, database 132 may be local to the server 130. However, in other embodiments, database 132 may be external to the server 130 and accessible by the server 130, including cloud storage systems and/or databases that are accessible over network 160.
The server 130 includes at least one network interface component 133 adapted to communicate with user device 110 and/or data vendor servers 145, 170 or 180 over network 160. In various embodiments, network interface component 133 may comprise a DSL (e.g., Digital Subscriber Line) modem, a PSTN (Public Switched Telephone Network) modem, an Ethernet device, a broadband device, a satellite device and/or various other types of wired and/or wireless network communication devices including microwave, radio frequency (RF), and infrared (IR) communication devices.
Network 160 may be implemented as a single network or a combination of multiple networks. For example, in various embodiments, network 160 may include the Internet or one or more intranets, landline networks, wireless networks, and/or other appropriate types of networks. Thus, network 160 may correspond to small scale communication networks, such as a private or local area network, or a larger scale network, such as a wide area network or the Internet, accessible by the various components of system 200.
In one embodiment, sparsity of the data matrix 302 can range from 25-90%. For example, missingness of sustainable information may be related to Region (APAC companies have more missing values than EMEA), company size (small caps have more missing values than large caps), sector (Utility and Energy sectors have more coverage in descriptors related to energy production), and data vendor.
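The sparsity figure and the group-wise missingness described above can be measured directly on a data matrix. The following is a minimal sketch, assuming the matrix is held as a NumPy array with NaN marking a missing entry; the function names `sparsity` and `missing_rate_by_group`, the toy values and the region labels are hypothetical and used only for illustration.

```python
import numpy as np

def sparsity(matrix: np.ndarray) -> float:
    """Fraction of entries in the matrix that are missing (NaN)."""
    return float(np.isnan(matrix).mean())

def missing_rate_by_group(matrix: np.ndarray, groups) -> dict:
    """Average share of missing entries per row, grouped by a row label
    such as region, sector or size bucket (one label per company)."""
    groups = np.asarray(groups)
    row_missing = np.isnan(matrix).mean(axis=1)
    return {str(g): float(row_missing[groups == g].mean())
            for g in np.unique(groups)}

# Toy data: rows are companies, columns are KPIs (illustrative values only).
data = np.array([
    [1.0, np.nan, 3.0, np.nan],
    [np.nan, np.nan, 2.0, np.nan],
    [4.0, 5.0, 6.0, np.nan],
    [7.0, 8.0, np.nan, 1.0],
])
regions = ["APAC", "APAC", "EMEA", "EMEA"]

overall = sparsity(data)
by_region = missing_rate_by_group(data, regions)
print(overall, by_region)
```

A large gap between groups in such a breakdown is only a first hint that the missingness may be systemic; it is not a substitute for the formal test described later.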
In one embodiment, a missing value in the data matrix 302 may be estimated using known ratings made by the same user (company) on similar items (columns). A variant (the transposed problem) is based on similarity of items. Key to these approaches is a set of similar values Ω_k(i,j), which is the set of k-neighbors for user i and item j in the data matrix 302. The imputed value for the missing a_{i,j} is given by:
where W_{k,j} is a weight representing the relative importance of a_{k,j}. For example, the weights may be determined by the priority weights of the KPI, and the set Ω_k(i,j) includes all other KPIs in the same descriptor. However, neighborhood models of this kind face a common problem in that they do not provide the flexibility to distinguish between the weights and user preference.
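The neighborhood expression itself is not reproduced in the text above. One form that is consistent with the stated definitions is a weighted average over the neighbor set, shown below as an assumption rather than as the expression of the disclosure; in particular, the normalization by the sum of the weights is assumed, and the index k follows the usage in the surrounding text.

```latex
\hat{a}_{i,j} \;=\;
\frac{\sum_{k \in \Omega_k(i,j)} W_{k,j}\, a_{k,j}}
     {\sum_{k \in \Omega_k(i,j)} W_{k,j}}
```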
As shown in diagram 300, a latent factor model (LMF) is adopted to perform matrix factorization 304 on the data matrix 302. The LMF addresses the deficiency of the neighborhood models by introducing latent factors to differentiate between user preference (user factors) and item importance (item factors). Specifically, the incomplete data, represented by the sparse data matrix 302, can often be mapped into a low-dimensional space (a low rank), and the original matrix can then be recomputed from that space to recover the missing values.
In one embodiment, the objective of the LMF approach is to find the latent matrices U of size (m×k) and V of size (n×k), with k<<m, n, such that the original data matrix 302 is approximated as Y=UV^T. The latent matrix V represents the mapping to a small group of representative companies (or representative users) and the matrix U represents a small group of representative KPIs (or representative items). It is noted that, because there is noise in the data matrix 302, overfitting (to noise) may be controlled using a regularization penalty.
In one embodiment, a neural model may be engaged to generate the latent matrices U and V based on an input of the original data matrix Y. The neural model may be trained by a loss that combines the element-wise error between Y and UV^T with a regularization term. The objective function to train the LMF model is therefore:
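where the regularization term uses the 2-norm, $\lVert u_i\rVert_2=\left(\sum_j u_{ij}^2\right)^{1/2}$, and a scalar factor $\lambda$ sets the strength of regularization. Thus, in matrix form, the objective is computed as:
$$\min_{U,V}\ \lVert A-UV^{T}\rVert_F^{2}+\lambda\left(\lVert U\rVert_F^{2}+\lVert V\rVert_F^{2}\right)$$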
where A denotes the sparse data matrix 302. In this way, once the factorized matrices U and V have been determined according to the objective, the imputed data matrix Y 306 may be obtained by the multiplication of U and V. The missing entries in the original data matrix 302 are thus imputed.
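The regularized factorization can be illustrated with a short sketch. The text mentions a neural model for finding U and V; the code below instead uses alternating least squares (ALS), a different and simpler solver for the same kind of objective, and it evaluates the squared error on observed entries only. Both choices are assumptions made for illustration. The names `factorize` and `impute`, the rank, the value of `lam` and the toy data are hypothetical. Here V is stored with one row per column of A, so the reconstruction is written `U @ V.T`.

```python
import numpy as np

def factorize(A, k=2, lam=0.1, n_iters=50, seed=0):
    """Regularized matrix factorization fitted on the observed (non-NaN)
    entries of A by alternating least squares.

    Returns U (rows x k) and V (columns x k) with U @ V.T approximating A.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    mask = ~np.isnan(A)
    A0 = np.where(mask, A, 0.0)
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    reg = lam * np.eye(k)
    for _ in range(n_iters):
        # Hold V fixed: each row of U is a small ridge regression.
        for i in range(m):
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ A0[i, mask[i]])
        # Hold U fixed: each row of V is a small ridge regression.
        for j in range(n):
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ A0[mask[:, j], j])
    return U, V

def impute(A, **kwargs):
    """Fill only the missing entries of A with the low-rank reconstruction."""
    U, V = factorize(A, **kwargs)
    return np.where(np.isnan(A), U @ V.T, A)

# Toy check on a rank-2 matrix with about 20% of entries held out.
rng = np.random.default_rng(1)
truth = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 8))
A = truth.copy()
held_out = rng.random(A.shape) < 0.2
A[held_out] = np.nan

Y = impute(A, k=2, lam=0.1)
rmse_mf = float(np.sqrt(np.mean((Y[held_out] - truth[held_out]) ** 2)))
col_means = np.broadcast_to(np.nanmean(A, axis=0), A.shape)
rmse_mean = float(np.sqrt(np.mean((col_means[held_out] - truth[held_out]) ** 2)))
print(rmse_mf, rmse_mean)
```

The comparison against column-mean filling is included only to show that the low-rank structure carries information about the held-out entries; the rank and penalty would normally be tuned on real data.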
For example, ESG data is primarily based on company disclosures. Corporate disclosures are subject to regulation, so unless local policy requires a disclosure, companies would not necessarily take on the associated regulatory risk. The lack of disclosure may often be amplified by the fact that collecting, aggregating and reporting information on ESG issues requires administrative expenses that small firms often cannot afford. Therefore, it is likely that more data is available from certain countries, sectors and bigger companies than from others. Thus, under such conditions where the observations are incomplete, inferences drawn and parameters extracted from the observations may not represent the true values. Therefore, to determine whether the inferences are possibly biased, the missing data assessment module 105 assesses whether the gaps in the data entries are random or systemic.
In one embodiment, the missing values 402 and non-missing values 403 may be passed to a Missing Completely At Random (MCAR) test 410. The null hypothesis of the MCAR test is that the values are missing completely at random. For example, for every KPI in a descriptor (column), the MCAR test 410 splits companies (rows) into two groups: the first with existing data values for the KPI and the second without (values missing). The MCAR test 410 then estimates the means and covariances of the two groups using the other KPIs.
Specifically, given the two multivariate distributions (missing vs. non-missing), the MCAR test 410 tests whether the means of the two distributions are statistically different. When the two means are determined not to be statistically different, the missing data is determined to be missing completely at random. On the other hand, when a p-value (based on observed missing data values, assuming the null hypothesis is true) is less than a threshold (e.g., 0.04, 0.05, etc.), a statistically significant difference is implied, and the missing data is determined not to be missing at random.
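The group-splitting logic of the test can be sketched as follows. This is a simplified stand-in, not the multivariate MCAR test itself: it compares the means of each other KPI one column at a time with a Welch statistic, uses a normal approximation for the p-value (reasonable only for large groups), and applies a Bonferroni adjustment across columns. All of these choices, and the names `welch_p_value` and `mcar_check`, are assumptions made for illustration.

```python
import math
import numpy as np

def welch_p_value(x, y):
    """Two-sided p-value for a difference in means between two samples
    (Welch statistic with a normal approximation)."""
    x, y = x[~np.isnan(x)], y[~np.isnan(y)]
    se = math.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    t = (x.mean() - y.mean()) / se
    return math.erfc(abs(t) / math.sqrt(2.0))

def mcar_check(A, kpi, alpha=0.05):
    """Split rows by whether column `kpi` is missing, then compare the
    two groups on every other column.

    Returns (looks_mcar, p_values); looks_mcar is False when any column
    differs significantly after a Bonferroni adjustment.
    """
    missing = np.isnan(A[:, kpi])
    others = [j for j in range(A.shape[1]) if j != kpi]
    p_values = {j: welch_p_value(A[missing, j], A[~missing, j]) for j in others}
    looks_mcar = min(p_values.values()) >= alpha / len(others)
    return looks_mcar, p_values

rng = np.random.default_rng(0)
base = rng.normal(size=(2000, 3))

# Gaps in KPI 0 created independently of the data.
random_gaps = base.copy()
random_gaps[rng.random(2000) < 0.3, 0] = np.nan

# Gaps in KPI 0 created whenever KPI 1 is low (systemic missingness).
systematic_gaps = base.copy()
systematic_gaps[base[:, 1] < -0.5, 0] = np.nan

_, p_random = mcar_check(random_gaps, kpi=0)
looks_mcar_sys, p_sys = mcar_check(systematic_gaps, kpi=0)
print(p_random, looks_mcar_sys, p_sys)
```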
For example, assuming the missing data point is a random variable (‘y’). The imputed value using the method discussed in relation to
where ϕ( ) is the probability density function of the observed data; wi is a vector of the attributes of the company which explain the probability of missingness; γ is a vector of parameters for each variable in the vector wi; and σ is the standard deviation of errors in the missing variable estimation (e.g.,
is also called the inverse Mills ratio.
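For orientation, the textbook form of the selection-corrected conditional mean is shown below. It is given as background and is not a reconstruction of the expression omitted above. The symbols $x_i$ and $\beta$ (outcome regressors and coefficients), $\rho$ (correlation between the outcome error and the selection error) and $\Phi$ (standard normal cumulative distribution function) do not appear in the surrounding text and are assumptions of this sketch; $\phi$ is taken here to be the standard normal density, and $w_i$, $\gamma$ and $\sigma$ are used as defined above.

```latex
\mathbb{E}\left[\, y_i \mid y_i \text{ observed} \,\right]
  = x_i^{\top}\beta + \rho\,\sigma\,\lambda\!\left(w_i^{\top}\gamma\right),
\qquad
\lambda(c) = \frac{\phi(c)}{\Phi(c)}
```

The second term is what a naive average of disclosed values picks up in addition to the true mean, which is why a correction is needed when disclosure is not random.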
For example, if the left tail of the distribution is censored, then the sample mean from the rest of the data (dark area) is going to be higher than the true value. The difference between the guessed value (sample mean) and the true value (population mean) is the sample selection or censorship bias. This censorship bias in the imputed values may be corrected using the correction term computed using Heckman's method.
Specifically, assume that there is a set of explanatory variables in the data matrix 302, exposure to which determines whether the data from a company will be disclosed or censored. Referring back to the example shown in
In one embodiment, censorship is converted into a binary variable:
Using a probit regression with ‘z’ as the dependent variable and the sector, region and size as independent variables, a threshold below which data is likely to be censored (e.g., see the dotted line in
In one embodiment, Heckman correction is applied to the conditional mean (in this case the conditional mean is the imputed value). As shown in
$$Y_{\mathrm{corrected}} = Y_{\mathrm{MF}} + \Gamma(\mathrm{Sector}, \mathrm{Region}, \mathrm{Size})$$
where Γ(Sector, Region, Size) is the bias term to correct the gap between the imputed values of missing data and the true values. It is worth noting that the explanatory variables (sector, region, size) need to be complete. In other words, the explanatory variables must be available for both observed and unobserved data.
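The probit step and the additive bias term can be sketched with a two-step procedure in the style of Heckman. The sketch makes several assumptions that are not stated in the text: a single numeric explanatory variable stands in for sector, region and size; the imputed value for a censored company is represented by the mean of the disclosed values; and the bias term is taken as the difference between the selection-adjusted mean for a censored company and that stand-in. The function names and the simulated data are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_pdf(c):
    return np.exp(-0.5 * c ** 2) / math.sqrt(2.0 * math.pi)

def norm_cdf(c):
    return 0.5 * (1.0 + _erf(c / math.sqrt(2.0)))

def fit_probit(W, z, n_iters=25):
    """Probit coefficients by Fisher scoring; W includes an intercept column
    and z is 1 where the value was disclosed, 0 where it was censored."""
    gamma = np.zeros(W.shape[1])
    for _ in range(n_iters):
        c = W @ gamma
        p = np.clip(norm_cdf(c), 1e-9, 1 - 1e-9)
        d = norm_pdf(c)
        score = W.T @ (d * (z - p) / (p * (1 - p)))
        info = (W * (d ** 2 / (p * (1 - p)))[:, None]).T @ W
        gamma = gamma + np.linalg.solve(info, score)
    return gamma

def heckman_bias_term(y_obs, W_obs, W_mis, gamma):
    """Additive correction for censored rows, relative to the mean of the
    disclosed values (the stand-in for the imputed value)."""
    c_obs, c_mis = W_obs @ gamma, W_mis @ gamma
    imr_obs = norm_pdf(c_obs) / norm_cdf(c_obs)           # inverse Mills ratio
    imr_mis = -norm_pdf(c_mis) / (1.0 - norm_cdf(c_mis))  # censored counterpart
    # Second step: regress disclosed values on [1, inverse Mills ratio].
    X = np.column_stack([np.ones_like(imr_obs), imr_obs])
    (_, slope), *_ = np.linalg.lstsq(X, y_obs, rcond=None)
    return slope * (imr_mis - imr_obs.mean())

# Simulated disclosure: low values of y are less likely to be disclosed.
rng = np.random.default_rng(0)
n = 5000
w = rng.normal(size=n)                      # explanatory variable (e.g., size)
u = rng.normal(size=n)                      # selection error
e = 0.8 * u + 0.6 * rng.normal(size=n)      # outcome error, corr(e, u) = 0.8
y = 1.0 + e
observed = 0.5 + 0.7 * w + u > 0

W = np.column_stack([np.ones(n), w])
gamma = fit_probit(W, observed.astype(float))
y_mf = np.full((~observed).sum(), y[observed].mean())   # stand-in imputation
y_corrected = y_mf + heckman_bias_term(y[observed], W[observed], W[~observed], gamma)

true_mean = y[~observed].mean()
print(gamma, y_mf.mean(), y_corrected.mean(), true_mean)
```

Because the explanatory variables enter both the probit fit and the correction for censored rows, they must be complete for every company, as noted above.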
At step 802, a data spreadsheet (e.g., data matrix 304 in
At step 804, one or more missing data entries corresponding to one or more ESG data descriptors in the data spreadsheet may be identified. For example, as shown in
At step 806, predicted values may be determined for the missing data entries based on a matrix factorization model using existing data entries in the data spreadsheet. For example,
At step 808, the method determines whether the one or more missing data entries are randomly or systemically missing based on a mean and standard deviation of the one or more missing data entries.
At step 812, if it is determined that the missing data entries are not randomly missing, method 800 proceeds to step 814, wherein one or more ESG data descriptors that are related to a reason of systemic missingness of the one or more missing data entries are identified.
Back to step 812, if it is determined that the missing data entries are randomly missing, method 800 proceeds to step 820, where the imputed data spreadsheet is output as the corrected ESG data (e.g., 126 in
At step 816, a bias correction term is predicted based on the one or more ESG data descriptors.
At step 818, the bias correction term is added to the predicted values for the one or more missing data entries.
Method 800 then concludes at step 820, where the corrected and imputed data spreadsheet is outputted to a user device and/or a data requestor (e.g., 126 in
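The control flow of method 800 can be summarized in a short skeleton. This is a sketch only: the imputation model, the randomness test and the bias-term estimator are passed in as callables, and the function name `method_800`, its parameters and the stub callables in the usage lines are hypothetical.

```python
import numpy as np

def method_800(data, impute, is_random, bias_term):
    """Skeleton of the flow: identify gaps, impute, test, optionally correct.

    data      : 2-D array with NaN for missing entries (rows are companies)
    impute    : callable returning predicted values with the shape of data
    is_random : callable returning True if the gaps look random
    bias_term : callable returning a correction with the shape of data
    """
    missing = np.isnan(data)                         # step 804
    predicted = impute(data)                         # step 806
    completed = np.where(missing, predicted, data)
    if not is_random(data):                          # steps 808 and 812
        correction = bias_term(data)                 # steps 814 and 816
        completed = completed + np.where(missing, correction, 0.0)  # step 818
    return completed                                 # step 820

# Usage with stub components (column-mean imputation, constant correction).
data = np.array([[1.0, np.nan], [3.0, 4.0]])
column_mean = lambda d: np.broadcast_to(np.nanmean(d, axis=0), d.shape)
constant_shift = lambda d: np.full(d.shape, -1.0)

out_random = method_800(data, column_mean, lambda d: True, constant_shift)
out_systemic = method_800(data, column_mean, lambda d: False, constant_shift)
print(out_random, out_systemic)
```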
Example data experiments may be performed on simulated data to illustrate where the gaps are created using a systematic process. The impact of systematic censorship bias and the effectiveness of the methods described in
In one embodiment, a synthetic matrix A (e.g., n=5000 companies, m=20 KPIs) is generated, where each column A_j is given by:
$$A_j=\beta X_j+\epsilon,\qquad X=UV^{T},\qquad \beta=1,\qquad \epsilon\sim\mathcal{N}\left(0,\sigma^{2}I_k\right)$$
The matrix A is created by multiplying two latent matrices, U (the company exposures) and V (the KPI factors), such that X=UV^T. The value of σ² represents the amount of random noise in the data. The latent matrices $U\sim\mathcal{N}(0,I_k)$ and $V\sim\mathcal{N}(0,I_k)$ were chosen such that A has a rank k=4.
Next, the sample selection mechanism may be described by the indicator variable Z_j=0.5+0.7×W_j+μ, where corr(X_j, W_j)=0.8 and corr(ε, μ)=0.8. In this example, W_j is a latent variable (n×1) that determines whether a data point in A_j will be observed or not. In particular, if Z_j>0 the data point is observed; otherwise it is censored, so that A_j[Z_j<0]=NaN. Each column of matrix A (the data set that is provided) is thus constructed using a low rank structure on X and a systematic way of defining missingness, captured by Z and driven by W (latent variables linked to X). The resulting matrix A is thus a synthetic data set for KPIs with missing values.
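The simulation described above can be sketched as follows. The text does not fully specify how the correlated variables are drawn, so the sketch assumes standard normal draws for the latent matrices, unit variance for μ, and a simple mixing construction to obtain correlations of about 0.8; the function name `make_synthetic` and its defaults are hypothetical.

```python
import numpy as np

def make_synthetic(n=5000, m=20, k=4, sigma=1.0, seed=0):
    """Low-rank KPI matrix with systematically censored entries.

    Returns A (n x m, NaN where censored) and the noise-free signal X.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n, k))      # company exposures
    V = rng.normal(size=(m, k))      # KPI factors
    X = U @ V.T                      # rank-k signal
    A = np.empty((n, m))
    for j in range(m):
        xj = (X[:, j] - X[:, j].mean()) / X[:, j].std()
        Wj = 0.8 * xj + 0.6 * rng.normal(size=n)              # corr(X_j, W_j) ~ 0.8
        mu = rng.normal(size=n)
        eps = sigma * (0.8 * mu + 0.6 * rng.normal(size=n))   # corr(eps, mu) ~ 0.8
        Zj = 0.5 + 0.7 * Wj + mu                              # selection variable
        col = X[:, j] + eps                                   # beta = 1
        col[Zj < 0] = np.nan                                  # censored when Z_j < 0
        A[:, j] = col
    return A, X

A, X = make_synthetic()
missing_share = float(np.isnan(A).mean())
print(A.shape, np.linalg.matrix_rank(X), missing_share)
```

Under these assumptions roughly a third of the entries are censored, and the censored entries are concentrated among low values of each KPI, which is the kind of disclosure bias the experiment is meant to exhibit.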
An MCAR test is used to show that the missing values are not completely random. Then the regularized matrix factorization (RMF) is adopted to recover the missing values in the matrix A, and Heckman correction is used to adjust for the sample bias.
Note also that if the data was simulated without censorship bias (rightmost bar), the RMF method is able to do a much better job in guessing the values. This is not the case with RMF alone: all methods reviewed for the data experiment can reproduce the missing data better in the absence of disclosure bias, because the gaps they intend to fill are random. In other words, 50% of the error in imputation comes from censorship bias. However, censorship is something that has been neglected by data vendors in their imputation method. Thus, the methods described in
The computer system 1000 includes a bus 1012 or other communication mechanism for communicating data, signals, and information between various components of the computer system 1000. The components include an input/output (I/O) component 1004 that processes a user (i.e., sender, recipient, service provider) action, such as selecting keys from a keypad/keyboard, selecting one or more buttons or links, etc., and sends a corresponding signal to the bus 1012. The I/O component 1004 may also include an output component, such as a display 1002 and a cursor control 1008 (such as a keyboard, keypad, mouse, etc.). The display 1002 may be configured to present a login page for logging into a user account or a trading information page for displaying market data or portfolio data to a user. An optional audio input/output component 1006 may also be included to allow a user to use voice for inputting information by converting audio signals. The audio I/O component 1006 may allow the user to hear audio. A transceiver or network interface 1020 transmits and receives signals between the computer system 1000 and other devices, such as another user device, a merchant server, or a service provider server via network 1022. In one embodiment, the transmission is wireless, although other transmission mediums and methods may also be suitable. A processor 1014, which can be a micro-controller, digital signal processor (DSP), or other processing component, processes these various signals, such as for display on the computer system 1000 or transmission to other devices via a communication link 1024. The processor 1014 may also control transmission of information, such as cookies or IP addresses, to other devices.
The components of the computer system 1000 also include a system memory component 1010 (e.g., RAM), a static storage component 1016 (e.g., ROM), and/or a disk drive 1018 (e.g., a solid-state drive, a hard drive). The computer system 1000 performs specific operations by the processor 1014 and other components by executing one or more sequences of instructions contained in the system memory component 1010.
Logic may be encoded in a computer readable medium, which may refer to any medium that participates in providing instructions to the processor 1014 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. In various implementations, non-volatile media includes optical or magnetic disks, volatile media includes dynamic memory, such as the system memory component 1010, and transmission media includes coaxial cables, copper wire, and fiber optics, including wires that comprise the bus 1012. In one embodiment, the logic is encoded in non-transitory computer readable medium. In one example, transmission media may take the form of acoustic or light waves, such as those generated during radio wave, optical, and infrared data communications.
Some common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer is adapted to read.
In various embodiments of the present disclosure, execution of instruction sequences to practice the present disclosure may be performed by the computer system 1000. In various other embodiments of the present disclosure, a plurality of computer systems 1000 coupled by the communication link 1024 to the network (e.g., a LAN, WLAN, PSTN, and/or various other wired or wireless networks, including telecommunications, mobile, and cellular phone networks) may perform instruction sequences to practice the present disclosure in coordination with one another.
Where applicable, various embodiments provided by the present disclosure may be implemented using hardware, software, or combinations of hardware and software. Also, where applicable, the various hardware components and/or software components set forth herein may be combined into composite components comprising software, hardware, and/or both without departing from the spirit of the present disclosure. Where applicable, the various hardware components and/or software components set forth herein may be separated into sub-components comprising software, hardware, or both without departing from the scope of the present disclosure. In addition, where applicable, it is contemplated that software components may be implemented as hardware components and vice-versa.
Software in accordance with the present disclosure, such as program code and/or data, may be stored on one or more computer readable media. It is also contemplated that software identified herein may be implemented using one or more general purpose or specific purpose computers and/or computer systems, networked and/or otherwise. Where applicable, the ordering of various steps described herein may be changed, combined into composite steps, and/or separated into sub-steps to provide features described herein.
The various features and steps described herein may be implemented as systems comprising one or more memories storing various information described herein and one or more processors coupled to the one or more memories and a network, wherein the one or more processors are operable to perform steps as described herein; as a non-transitory machine-readable medium comprising a plurality of machine-readable instructions which, when executed by one or more processors, are adapted to cause the one or more processors to perform a method comprising steps described herein; and as methods performed by one or more devices, such as a hardware processor, user device, server, and other devices described herein.