The present teaching generally relates to computers. More specifically, the present teaching relates to machine learning.
With the development of different techniques in artificial intelligence (AI), learning models based on training data to perform certain decision-making tasks has become more and more prevalent. Data may be collected from past operations of a relevant application and used to generate training data to be provided to a machine learning mechanism to train a model to make decisions consistent with those represented by the training data. Training data may comprise various tuples, each of which includes situation data and decision data. The situation data may correspond to different types of information associated with a situation, and the decision data may correspond to a decision made for the underlying application based on the situation data. For example, in content recommendation (the application), situation data may include features such as an identity of a user, a personal profile of the user, the webpage the user is interacting with, the searches performed by the user, trending topics, content topics available to the operator of the webpage, etc. Decision data in this exemplary application may be a topic of content to be recommended to the user. When such situation and decision data are collected from past user content consumption activities, they may be used to train a model that, given the situation data associated with a user in each scenario, recommends a topic of content to that user.
The performance of a learned model may depend on different factors. For example, the representativeness of the training data with respect to the underlying application may have a crucial effect on the model. However, it may be difficult to obtain comprehensive training data that covers all scenarios, especially when the dynamics associated with an application are not predictable. For instance, value distributions of features related to seasons and trends may change over time, and some may not change in a predictable manner. That is, in general, training data may include unstable features. Given that, a model learned from such training data may not reflect the situation at the time the model is relied on to make a decision. Efforts have been made to overcome this issue. For example, training data may be continually collected, and the model may be completely re-trained regularly using the most relevant training data (e.g., the most recent). This approach may be expensive because of the regular re-training. Another effort is to perform time series analysis with respect to each feature in the training data, which requires data collected over a long period in order to be effective.
Thus, there is a need for a solution that addresses the problems that unstable features cause in training models.
The teachings disclosed herein relate to methods, systems, and programming for information management. More particularly, the present teaching relates to methods, systems, and programming related to content processing and categorization.
In one example, a method, implemented on a machine having at least one processor, storage, and a communication platform capable of connecting to a network, is disclosed for learning a model. Supervised training data with samples, each having feature values and a label, is received. Unlabeled data to be classified is received, having samples with values of the same features. Un-stationary features in the supervised training data are detected based on the respective feature values from the supervised training data and the unlabeled data. If any un-stationary feature exists, an adjusted training data set is created based on the supervised training data and the un-stationary features and is used to train a stationary classification model. Otherwise, the supervised training data is used to train the stationary classification model.
In a different example, a system is disclosed for learning a model that includes an un-stationary feature detector and a supervised stationary model training engine. The un-stationary feature detector is provided for receiving supervised training data and unlabeled data to be classified, where the supervised training data includes data samples, each with values of features and a label, and the unlabeled data has values of the same features; the two data sets are used to determine whether any of the features is un-stationary in the supervised training data. If any un-stationary feature is detected, adjusted supervised training data is created based on the supervised training data and the detected un-stationary features and is used for training a stationary classification model. Otherwise, the stationary classification model is trained based on the supervised training data set.
Other concepts relate to software for implementing the present teaching. A software product, in accordance with this concept, includes at least one machine-readable non-transitory medium and information carried by the medium. The information carried by the medium may be executable program code data, parameters in association with the executable program code, and/or information related to a user, a request, content, or other additional information.
Another example is a machine-readable, non-transitory, and tangible medium having information recorded thereon for learning a model. Supervised training data with samples, each having feature values and a label, is received. Unlabeled data to be classified is received, having samples with values of the same features. Un-stationary features in the supervised training data are detected based on the respective feature values from the supervised training data and the unlabeled data. If any un-stationary feature exists, an adjusted training data set is created based on the supervised training data and the un-stationary features and is used to train a stationary classification model. Otherwise, the supervised training data is used to train the stationary classification model.
Additional advantages and novel features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The advantages of the present teachings may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below.
The methods, systems and/or programming described herein are further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
In the following detailed description, numerous specific details are set forth by way of examples in order to facilitate a thorough understanding of the relevant teachings. However, it should be apparent to those skilled in the art that the present teachings may be practiced without such details. In other instances, well-known methods, procedures, components, and/or systems have been described at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.
The present teaching discloses a framework for adaptively training a stable or stationary model by dynamically detecting un-stationary features and eliminating or minimizing the impact of such un-stationary features in training the model. A training mechanism utilizes a training data set with ground truth labels for learning to obtain a model that can be used to predict labels of unlabeled data. Such a trained model works well if the feature values in the training data set have the same or similar distributions as those in the unlabeled data. An un-stationary feature is defined as one whose value distribution in the training data is not the same as or similar to that in the unlabeled data. When un-stationary features are present, the predictive power of the model degrades. However, which features are un-stationary may change over time. That is, a feature that is un-stationary at one time may no longer be un-stationary later; similarly, a feature that is stationary at one time may become un-stationary later.
The present teaching discloses methods and systems for dynamically detecting un-stationary features with respect to a training data set and an unlabeled data set and then accordingly eliminating or minimizing the impact of the detected un-stationary features in training a model. In some embodiments, an un-stationary feature may be identified by detecting a distribution change based on, e.g., statistical tests. When the remaining stationary features provide adequate predictive power, a model may be trained in a manner that removes the impact of the un-stationary features. In some embodiments, the impact of an un-stationary feature may be eliminated by excluding the part of the training data associated with that feature in training the model. In some embodiments, the impact of the un-stationary features may be minimized by adjusting the weights applied to the values of the un-stationary features. For example, a weight may be applied to each of the features in the training data and may be dynamically adjusted based on, e.g., the level of consistency between its value distribution and that of the unlabeled data set. The higher the level of consistency, the higher the predictive power of the feature; conversely, the higher the level of inconsistency, the lower the weight assigned to the feature, to reduce its impact on the model.
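By way of a non-limiting illustration, the sketch below detects un-stationary features by applying a two-sample Kolmogorov-Smirnov test to each feature, one possible choice of statistical test for detecting a distribution change. The function name, the array layout (samples in rows, features in columns), and the 0.05 significance threshold are illustrative assumptions rather than requirements of the present teaching.

```python
from scipy.stats import ks_2samp

def detect_unstationary_features(train_X, unlabeled_X, alpha=0.05):
    """Return indices of features whose value distribution differs
    significantly between the training data and the unlabeled data."""
    unstationary = []
    for j in range(train_X.shape[1]):
        # Two-sample KS test compares the empirical distributions of
        # feature j in the two data sets.
        _, p_value = ks_2samp(train_X[:, j], unlabeled_X[:, j])
        if p_value < alpha:  # significant change -> un-stationary
            unstationary.append(j)
    return unstationary
```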
As feature selection according to the present teaching is based on the predictive power of each feature with respect to a specific unlabeled data set, different features may be detected as un-stationary with respect to different unlabeled data sets. This makes it possible to adapt the model to the problem at hand or to fluctuations in the unlabeled data sets collected over time in the same application. As such, a model to be trained for an application may be dynamically adjusted to yield a stable model, in terms of predictive power, with respect to the data to be classified. The approach according to the present teaching improves on conventional techniques because it enables training of stable models without requiring continuous collection of labeled data or time series analysis based on data collected over a long period of time.
Based on the training data set 120 and the unlabeled data set 130, the un-stationary feature detector 110 is provided to identify un-stationary features [F′]. The supervised stationary model training engine 140 is provided to train the stationary classification model 150 using a modified training data set, e.g., [D, F-F′, L], derived based on the [D, F, L] from 120 and the detected un-stationary features [F′]. In some embodiments, F-F′ may be implemented by removing the features in [F′] from F. In some embodiments, F-F′ may be implemented by adjusting the weights of the features in F according to the detection result so that the un-stationary features in F′ are given minimum weights to reduce their impact during training. Training based on the modified training data set [D, F-F′, L] yields the stationary classification model 150. The stationary model-based classification engine 160 may then classify each sample in the unlabeled data set 130 based on the trained stationary classification model 150 and assign labels to the samples in the unlabeled data set 130.
Based on the detected un-stationary features [F′], the supervised stationary model training engine 140 creates, at 125, a training data set that is to be used for the actual training. As discussed herein, in some embodiments, a modified training data set [D, F-F′, L] may be created, with F-F′ being derived either by removing the un-stationary features F′ or by assigning minimized weights to the features in F′ to eliminate their impact on the training. The actual training data set so created may then be used to train, at 135, the stationary classification model. With the trained stationary classification model 150, the stationary model-based classification engine 160 classifies, at 145, the samples in the unlabeled data set 130 and assigns labels thereto.
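As a non-limiting illustration of this flow, the sketch below creates the modified training data set by dropping the detected un-stationary feature columns and then trains and applies a classifier. The use of scikit-learn's logistic regression as the learner is an illustrative assumption; any supervised machine learning mechanism may play that role.

```python
from sklearn.linear_model import LogisticRegression

def train_stationary_model(train_X, labels, unstationary):
    """Train on the modified training data set [D, F-F', L], obtained by
    removing the detected un-stationary feature columns F'."""
    keep = [j for j in range(train_X.shape[1]) if j not in set(unstationary)]
    model = LogisticRegression(max_iter=1000).fit(train_X[:, keep], labels)
    return model, keep

def classify_unlabeled(model, keep, unlabeled_X):
    # The same columns must be selected before classification.
    return model.predict(unlabeled_X[:, keep])
```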
The operation as disclosed herein to obtain a stable classification model via supervised training on stable training data according to the present teaching may be carried out regularly in an application. In some embodiments, the unlabeled data set 130 may correspond to batch data. In this case, each time there is a new batch, the adaptation of the stationary classification model 150 may be carried out by re-detecting the un-stationary features with respect to the new batch. In some embodiments, the adaptation may be carried out according to some specified time interval, e.g., daily or weekly. In some embodiments, the adaptation may be triggered by, e.g., a change in the performance of the stationary classification model 150.
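A minimal sketch of this per-batch adaptation, reusing the illustrative helpers sketched above, might look as follows; the batch iteration and the function names are assumptions for illustration only.

```python
def adapt_and_classify(train_X, labels, batches, alpha=0.05):
    """For each new unlabeled batch, re-detect the un-stationary features
    and re-derive the stationary classification model before labeling."""
    for batch_X in batches:  # e.g., one batch per day or per week
        unstat = detect_unstationary_features(train_X, batch_X, alpha)
        model, keep = train_stationary_model(train_X, labels, unstat)
        yield classify_unlabeled(model, keep, batch_X)
```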
The F-distribution change detector 260 is provided for detecting a distribution change based on two distributions represented by two sets of feature values corresponding to the same feature, one from the labeled data feature samples 220 and the other from the unlabeled data feature samples 240. As discussed herein, in some embodiments, the detection of a distribution change is based on some statistical test, configured, e.g., in a distribution change test specification 250. If a feature is determined to be un-stationary, it is identified as such. The un-stationary feature determiner 270 is provided for generating the identified set of un-stationary features [F′] based on the comparison results and outputting [F′].
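One way such a test specification might drive the detector is sketched below under illustrative assumptions: the specification is a per-feature dictionary naming a test and a significance threshold, with a KS test for numeric features and a chi-squared test for categorical ones. None of these choices is prescribed by the present teaching.

```python
import numpy as np
from scipy.stats import ks_2samp, chi2_contingency

def feature_changed(train_col, unlabeled_col, spec):
    """Apply the statistical test named in the specification and report
    whether the feature's value distribution has changed."""
    if spec["test"] == "ks":  # numeric feature: two-sample KS test
        _, p_value = ks_2samp(train_col, unlabeled_col)
    elif spec["test"] == "chi2":  # categorical feature: chi-squared test
        categories = np.union1d(train_col, unlabeled_col)
        counts = np.array([[np.sum(train_col == c) for c in categories],
                           [np.sum(unlabeled_col == c) for c in categories]])
        _, p_value, _, _ = chi2_contingency(counts)
    else:
        raise ValueError(f"unknown test: {spec['test']}")
    return p_value < spec["alpha"]

# Example: feature_changed(t_col, u_col, {"test": "ks", "alpha": 0.05})
```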
In the operation mode of weighting features according to the un-stationary feature detection result, the weight-based feature regularization unit 320 may be invoked to determine the weight for each of the features. In some embodiments, the weight for an un-stationary feature may be set to zero or to a small value determined based on the level of change of the underlying feature value distribution, indicating how much impact the un-stationary feature is allowed to have during training. The stationary features may be weighted differently, indicating that they are allowed to contribute to the model training. In some embodiments, the weights applied to the stationary features may be equal, indicating that stationary features contribute equally to the learning. Such weighted features may be incorporated into the supervised training data set 120 to generate adjusted training samples 340, which may later be used for training the stationary classification model 150. As the impact of the detected un-stationary features is minimized via weighting, the adjusted training samples enable derivation, via machine learning, of the stationary classification model 150.
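By way of illustration, the sketch below derives such weights and applies them to the training samples. Using one minus the KS statistic as the level of consistency is an illustrative assumption; any measure that decreases with the magnitude of the distribution change could serve.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_weights(train_X, unlabeled_X):
    """Assign each feature a weight in [0, 1] that shrinks as its
    distribution change grows, so un-stationary features receive
    near-zero weights and stationary features weights near one."""
    weights = np.ones(train_X.shape[1])
    for j in range(train_X.shape[1]):
        stat, _ = ks_2samp(train_X[:, j], unlabeled_X[:, j])
        weights[j] = 1.0 - stat  # more change -> lower weight
    return weights

def weight_training_samples(train_X, weights):
    # Scale each feature column by its weight; a near-zero column
    # contributes little during subsequent training.
    return train_X * weights
```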
In the operation mode of generating a modified training data set according to the un-stationary feature detection result, the training data modifier 330 may be invoked to remove the un-stationary features and their corresponding values from the supervised training data 120 to generate adjusted training samples 340, which may later be used for training the stationary classification model 150. In this operational mode, as the un-stationary features have been removed from the adjusted training samples 340, training on such data yields a stationary classification model 150 that is stable. With the adjusted training samples 340, obtained in either of the operation modes discussed herein, the machine learning engine 350 is invoked to train, via machine learning, the stationary classification model 150 using the adjusted training samples 340.
The learning scheme for learning a stable model according to the present teaching may be used to capture sudden and/or unpredictable changes between the current data to be classified and the training data. As discussed herein, although time series analysis may be used to capture such a change, it can do so only after the fact and generally requires evidence gathered over an extended period. The conventional re-training of a model also requires collecting additional labeled data. The approach as disclosed herein may be used to capture such a change on the fly, while it is happening, without needing previously collected evidential data associated with the change.
To implement various modules, units, and their functionalities described in the present disclosure, computer hardware platforms may be used as the hardware platform(s) for one or more of the elements described herein. The hardware elements, operating systems and programming languages of such computers are conventional in nature, and it is presumed that those skilled in the art are adequately familiar therewith to adapt those technologies to appropriate settings as described herein. A computer with user interface elements may be used to implement a personal computer (PC) or other type of workstation or terminal device, although a computer may also act as a server if appropriately programmed. It is believed that those skilled in the art are familiar with the structure, programming, and general operation of such computer equipment and as a result the drawings should be self-explanatory.
Computer 500, for example, includes COM ports 550 connected to and from a network connected thereto to facilitate data communications. Computer 500 also includes a central processing unit (CPU) 520, in the form of one or more processors, for executing program instructions. The exemplary computer platform includes an internal communication bus 510, program storage and data storage of different forms (e.g., disk 570, read only memory (ROM) 530, or random-access memory (RAM) 540), for various data files to be processed and/or communicated by computer 500, as well as possibly program instructions to be executed by CPU 520. Computer 500 also includes an I/O component 560, supporting input/output flows between the computer and other components therein such as user interface elements 580. Computer 500 may also receive programming and data via network communications.
Hence, aspects of the methods of information analytics and management and/or other processes, as outlined above, may be embodied in programming. Program aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of executable code and/or associated data that is carried on or embodied in a type of machine-readable medium. Tangible non-transitory “storage” type media include any or all of the memory or other storage for the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide storage at any time for the software programming.
All or portions of the software may at times be communicated through a network such as the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, in connection with information analytics and management. Thus, another type of media that may bear the software elements includes optical, electrical, and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links, or the like, also may be considered as media bearing the software. As used herein, unless restricted to tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
Hence, a machine-readable medium may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, which may be used to implement the system or any of its components as shown in the drawings. Volatile storage media include dynamic memory, such as a main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that form a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a physical processor for execution.
Those skilled in the art will recognize that the present teachings are amenable to a variety of modifications and/or enhancements. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software-only solution, e.g., an installation on an existing server. In addition, the techniques as disclosed herein may be implemented as firmware, a firmware/software combination, a firmware/hardware combination, or a hardware/firmware/software combination.
While the foregoing has described what are considered to constitute the present teachings and/or other examples, it is understood that various modifications may be made thereto and that the subject matter disclosed herein may be implemented in various forms and examples, and that the teachings may be applied in numerous applications, only some of which have been described herein. It is intended by the following claims to claim any and all applications, modifications and variations that fall within the true scope of the present teachings.