Greg Council is Vice President of Product Management at Parascript, responsible for market vision and product strategy. Greg has over 20 years of experience in solution development and marketing within the information management market, including search, content management and data capture for both on-premises and SaaS solutions. To contact Greg and Parascript, please email: email@example.com.
When the future you expect never arrives and business predictions fall short of their mark, the culprit is, more often than not, bad or missing data. Procurement staff must ensure their data is accurate from start to finish so that their forecasts deliver the desired outcomes.
Idioms abound about how to approach the future, such as “past results do not guarantee future performance” or, conversely, “those who do not learn history are doomed to repeat it.” We have all heard these intuitive phrases. On the surface, they seem to be at odds. In reality, they address two different concepts associated with using the past (data) to understand, predict and influence the future.
Similarly, in any project that depends on data, whether it is predicting sales to manage inventory or training a system to automate a process, success hinges on having the right set of data as input to the decision-making process. Today, when machines often make the decisions, it becomes much harder to know what the “right set of data” is. This is because machines learn differently from people, and the rationale behind the output they produce is difficult to reconstruct.
Machines lack the intuition and critical reasoning that help a person elevate or discount one data point over another. Input data must therefore be accurate, representative, and free from bias. Here are some key guidelines to help ensure your projects succeed:
1. Accurate Data. Accurate data is essential because a machine can learn from accurate and inaccurate data alike, but only accurate data produces the desired result: a machine whose output is reliable.
2. Representative Data. For machine learning, having a representative sample set is crucial. In basic terms, a representative sample is a subgroup of a larger population that reflects the key characteristics of that population as a whole.
3. Bias-free Data. Bias is an error in output that stems from incorrect assumptions about the input data. It can originate in the algorithms themselves or in input data that, while accurate and representative, is not the right type of data for machine learning.
Getting It Right from the Get-go
As machine learning capabilities continue to be incorporated into business process automation, the risk of adverse outcomes grows, owing to limited experience with, and insufficient focus on, the input data sets used to train these new systems.
Automation systems meant to reduce costs can instead increase costly manual effort. Predictive systems meant to help plan for the future can produce strategies for outcomes that never occur. The answer to all of these problems lies in the input data: it must represent the real situation in which the problem domain lives, it must be accurate, and it must be free of hidden bias.
To learn more about this topic, visit futureofsourcing.com to read Greg’s article on ensuring your expected data outcomes are accurate.