Features describe the characteristics of entities. For example, "age", "smoking or not", and "years of smoking" are features of a patient; they can be used to describe the patient's physical condition and, furthermore, to predict whether she or he is likely to develop lung cancer. A combination of features can be more helpful for prediction: e.g., "age" minus "years of smoking" is a new feature that indicates how early the patient started smoking. Creating new features from such combinations is called feature generation. In the big data era, the number of features is enormous, and it is not realistic for human experts to generate features manually. This project will build new technologies to automatically generate new features from existing ones, to better describe objects and entities, and to achieve better prediction performance. Additionally, this project aims to substantially improve the traceability, affordability, and explainability of the generation process. The developed algorithms and tools are expected to generalize to a broad range of scientific and engineering problems, not just in feature generation, but also in other domains such as data pre-processing, social analysis, intelligent transportation systems, healthcare, and the internet of things. <br/><br/>This project identifies three research tasks: (i) A Reinforcement Learning (RL) based approach to realize traceability. Two RL agents are used to select appropriate features, and one RL agent is used to select the appropriate operation. The policy network will be decomposed into two sub-networks, i.e., a representation network and a value network. Different agents will share the value network to improve training convergence. (ii) A heuristic approach to realize affordability. Information theory-based utility scores will be designed to evaluate features and feature sets, and a heuristic selection strategy will be designed for the generation process.
(iii) A Large Language Model (LLM) based approach to realize explainability. The tabular data will be serialized into natural language strings, and comprehensive prompts will be designed that incorporate feature generation expertise and domain expertise. Fine-tuned with these prompts, the LLM can generate features together with explanations. Two strategies will be proposed to compress the prompts. The proposed research will provide novel perspectives and methodologies for generating new features by advancing the understanding of feature generation and designing new generation strategies. These methodologies go beyond conventional generation approaches that are highly dependent on domain knowledge.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.