Tuning a machine learning model is an iterative process. Data scientists typically run numerous experiments to train and evaluate models, trying out different features, loss functions, and AI/ML model types, and adjusting model parameters and hyperparameters. Steps involved in tuning and training a machine learning model include feature engineering, loss function formulation, model testing and selection, regularization, and hyperparameter selection.
Feature engineering – a critical step in enhancing AI/ML models – broadly refers to mathematical transformations of raw data that feed appropriate signals into AI/ML models.
In most real-world AI/ML use cases, data are derived from a variety of source systems and typically are not reconciled or aligned in time and space. Data scientists therefore put significant effort into defining data transformation pipelines and building out their feature vectors. In addition to feature engineering, data scientists should apply feature normalization or scaling to ensure that no single feature overpowers the algorithm.
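As a minimal sketch of the normalization step, the snippet below standardizes each feature column to zero mean and unit variance (z-score scaling) so that features on very different scales contribute comparably. The feature matrix and the scenario are hypothetical; in practice a library utility such as scikit-learn's scaler would typically be used.

```python
import numpy as np

def zscore_scale(X, eps=1e-9):
    """Standardize each column (feature) to zero mean and unit variance.

    eps guards against division by zero for constant columns.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps)

# Hypothetical feature matrix: rows are samples, columns are features on
# very different scales (e.g., account balance vs. transaction count).
X = np.array([[10_000.0, 3.0],
              [25_000.0, 7.0],
              [ 5_000.0, 1.0]])

X_scaled = zscore_scale(X)
```

After scaling, a distance- or gradient-based algorithm no longer sees the balance column as thousands of times more important than the count column simply because of its units.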
For example, in a fraud detection use case, the customer’s actual account balance at a point in time may be less meaningful than the average change in their account balance over two 30-day rolling windows. Or, in a predictive maintenance use case, the vibration signal related to a bearing may be less important than a vibration signal that is normalized with respect to rotational velocity.
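The fraud-detection example above can be sketched with a time-based rolling window in pandas: rather than using the raw balance, the feature is the average daily change in balance over a trailing 30-day window. The column names and the synthetic daily series are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical daily account-balance series for one customer.
idx = pd.date_range("2024-01-01", periods=90, freq="D")
balances = pd.DataFrame({"balance": [float(i) for i in range(90)]}, index=idx)

# Engineered features: the daily change in balance, and its mean over a
# trailing 30-day window. The rolling mean smooths out single-day spikes.
balances["delta"] = balances["balance"].diff()
balances["avg_delta_30d"] = balances["delta"].rolling(window="30D").mean()
```

Comparing this rolling average across two adjacent windows, as the text suggests, is then a matter of shifting the engineered column and differencing it.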
Thoughtful feature engineering that is mindful of the underlying physics or functional domain of the problem being solved, coupled with a systematic expansion of the candidate feature space, can be a powerful tool in a data scientist's arsenal.