In most real-world machine learning use cases, raw data are derived from a variety of source systems and typically are not reconciled or aligned in time and space. Data must therefore be transformed before they are useful to machine learning models. Feature engineering is the process of applying mathematical transformations to raw data so that the models receive informative signals.
Data scientists often put significant effort into defining data transformation pipelines and building out their feature vectors, because well-chosen mathematical transformations of raw data can provide powerful signals to machine learning algorithms. In addition to feature engineering, data scientists should normalize or scale features so that no single feature overpowers the algorithm simply because of its numeric range.
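As a rough illustration of feature scaling, the sketch below standardizes two features to zero mean and unit variance (z-scores), which is one common scaling choice among several. The feature names and values are hypothetical and are not drawn from any particular data set or product API.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no signal; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features on very different numeric scales.
balances = [1200.0, 56000.0, 310.0, 8900.0]
txn_counts = [3.0, 12.0, 1.0, 7.0]

scaled_balances = standardize(balances)
scaled_txn_counts = standardize(txn_counts)
```

After scaling, both features are expressed in standard deviations from their own mean, so the account balance no longer outweighs the transaction count merely because its raw values are larger.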
For example, in a fraud detection use case, the customer’s actual account balance at a point in time may be less meaningful than the average change in their account balance over two 30-day rolling windows. Or, in a predictive maintenance use case, the vibration signal related to a bearing may be less important than a vibration signal that is normalized with respect to rotational velocity.
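The two examples above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the "two 30-day rolling windows" are read here as the most recent 30 days and the 30 days before that, balances are assumed to be one value per day, and vibration is normalized by simple division by rotational speed. All function and variable names are hypothetical.

```python
from statistics import mean


def mean_daily_change(balances):
    """Average day-over-day change across a window of daily balances."""
    deltas = [later - earlier for earlier, later in zip(balances, balances[1:])]
    return mean(deltas)


def two_window_balance_features(daily_balances, window=30):
    """Return the mean daily change for the previous and the latest window."""
    if len(daily_balances) < 2 * window:
        raise ValueError("need at least two full windows of daily balances")
    recent = daily_balances[-window:]
    previous = daily_balances[-2 * window:-window]
    return mean_daily_change(previous), mean_daily_change(recent)


def normalize_vibration(amplitude, rpm):
    """Vibration amplitude per unit of rotational speed (None if stopped)."""
    return amplitude / rpm if rpm > 0 else None


# Hypothetical account: flat for 30 days, then drained by 10 per day.
daily = [1000.0] * 30 + [1000.0 - 10.0 * (i + 1) for i in range(30)]
prev_change, recent_change = two_window_balance_features(daily)

# Hypothetical bearing reading: amplitude 4.5 at 1500 RPM.
vibration_per_rpm = normalize_vibration(4.5, 1500.0)
```

Comparing `prev_change` with `recent_change` exposes a shift in account behavior that a single balance snapshot would hide, and dividing by rotational speed lets readings taken at different operating speeds be compared on a common footing.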
C3 AI software provides powerful feature engineering tools and capabilities, enabling data scientists, developers, and analysts to explore data sets visually and mathematically; identify and manipulate features; and define data transformations quickly and easily.