Ground truth refers to the actual nature of the problem that a machine learning model targets, as reflected by the data sets associated with the use case in question. Supervised machine learning models are trained on labeled data that are treated as “ground truth”: the model learns patterns in those data and uses them to predict labels for new data.
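As a minimal sketch of this idea, the Python example below trains a one-nearest-neighbor classifier on a handful of labeled examples and checks its predictions against held-out labels, which play the role of ground truth. The feature meanings (temperature, vibration), the values, and the label names are invented for illustration and do not come from any real system.

```python
from math import dist

# Hypothetical labeled examples: (features, ground-truth label).
# Feature meanings (temperature, vibration) and values are invented.
labeled = [
    ((70.0, 0.2), "normal"),
    ((72.0, 0.3), "normal"),
    ((68.0, 0.1), "normal"),
    ((95.0, 1.4), "failure"),
    ((98.0, 1.6), "failure"),
    ((93.0, 1.2), "failure"),
]


def predict(train, x):
    """Return the label of the training example nearest to x (1-nearest-neighbor)."""
    _, label = min(train, key=lambda example: dist(example[0], x))
    return label


# Hold out two examples; their labels are the ground truth used for evaluation.
train = labeled[:2] + labeled[3:5]
held_out = [labeled[2], labeled[5]]

# Accuracy is the fraction of held-out examples whose predicted label
# matches the ground-truth label.
correct = sum(predict(train, x) == y for x, y in held_out)
accuracy = correct / len(held_out)
print(f"accuracy on held-out ground truth: {accuracy:.2f}")
```

The model is only as good as the labels it is scored against: if the held-out labels were wrong, the accuracy figure would be meaningless.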
Supervised techniques often require non-trivial dataset sizes to learn reliably from ground truth observations. For most enterprise business problems, data complexity is significant. It is reasonable to assume that data from at least five or six disparate IT and operational software systems will be required to solve most real-world enterprise AI use cases that unlock substantial business value. At most organizations, the individual IT source systems weren’t designed to interoperate, and they typically have widely varying definitions of business entities and ground truth.
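The sketch below illustrates, under simplifying assumptions, what reconciling differing definitions can involve. Two hypothetical source systems identify the same assets with different ID formats and describe failures with different vocabulary. The record layouts, field names, ID formats, and failure terms are all invented for illustration.

```python
# Hypothetical records from two source systems describing the same assets.
maintenance = [
    {"asset": "PUMP-0012", "status": "FAILED"},
    {"asset": "PUMP-0045", "status": "OK"},
]
operations = [
    {"equipment_id": "pump_12", "outcome": "breakdown"},
    {"equipment_id": "pump_45", "outcome": "running"},
    {"equipment_id": "pump_77", "outcome": "running"},
]

# Assumed mapping of each system's vocabulary onto one shared failure label.
FAILURE_TERMS = {"failed", "breakdown"}


def canonical_id(raw):
    """Map 'PUMP-0012' and 'pump_12' to the same key, ('pump', 12)."""
    kind, number = raw.replace("-", "_").lower().split("_")
    return kind, int(number)


def is_failure(term):
    """Translate a system-specific status term into a boolean failure label."""
    return term.lower() in FAILURE_TERMS


# Collect every system's label for each asset under the canonical key.
merged = {}
for row in maintenance:
    merged.setdefault(canonical_id(row["asset"]), []).append(is_failure(row["status"]))
for row in operations:
    merged.setdefault(canonical_id(row["equipment_id"]), []).append(is_failure(row["outcome"]))

# Keep a label only when all systems reporting on the asset agree.
ground_truth = {key: labels[0] for key, labels in merged.items() if len(set(labels)) == 1}
print(ground_truth)
```

Dropping assets on which the systems disagree is one possible design choice; a real project might instead route those records to a subject-matter expert for review.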
Models may require many thousands of input-output examples to perform effectively. Larger datasets, with more historical examples to learn from, allow the algorithms to capture a variety of edge cases and produce models that handle those cases well. Depending on the business problem at hand, multiple years of data may be necessary to account for seasonality.
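To make the seasonality point concrete, the sketch below compares a monthly profile estimated from a single year with one averaged over three years. The monthly values are synthetic and invented for illustration: the same seasonal pattern repeats each year, with one anomalous month injected into 2022.

```python
from statistics import mean

# Synthetic monthly values (invented): a repeating seasonal pattern.
seasonal_pattern = [10, 10, 12, 15, 20, 30, 35, 30, 20, 15, 12, 10]
history = {
    2021: list(seasonal_pattern),
    2022: list(seasonal_pattern),
    2023: list(seasonal_pattern),
}
history[2022][1] = 40  # one-off anomaly in February 2022


def monthly_profile(years):
    """Average each calendar month (index 0 = January) across the given years."""
    return [mean(history[year][month] for year in years) for month in range(12)]


one_year = monthly_profile([2022])
three_years = monthly_profile([2021, 2022, 2023])

# With a single year, the February anomaly is indistinguishable from a
# seasonal peak; averaging over several years dampens it.
print("February, one year:   ", one_year[1])
print("February, three years:", three_years[1])
```

With only 2022 available, February looks like the busiest month of the year. Across three years, the July peak is recovered as the true seasonal high, although the anomaly still inflates the February average.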
The C3 AI Platform and C3 AI Applications provide extensive capabilities that enable data scientists, developers, and analysts to manage, manipulate, explore, transform, and normalize large, diverse data sets for machine learning models. C3 AI software incorporates sophisticated capabilities to automate numerous data science operations that facilitate discovery of ground truth for machine learning applications.