It’s 10pm.
Do YOU know where
your training data is?


Like the kids in the television and radio public service announcements of the 1960s, ’70s, and ’80s, training data often needs adult supervision. Those PSAs were directed at parents to promote more responsibility and accountability for kids who were out after dark, when the risk of trouble increased. Similarly, a company always needs to keep tabs on where its training data has been, what it’s involved in now, and where it is going. We want that data safe, and we want to prevent it from being corrupted. And we certainly don’t want it involved in a crime.

A lot of effort goes into network protection and preventing cyber-attacks. However, regulators are becoming just as concerned with how companies use the data that consumers share with them. Consumers, to varying degrees, trust companies not to misuse their data in activities they haven’t been made aware of and agreed to. But consumers also trust the laws around the fair use of data, particularly data that contains personally identifiable information (PII). Regulators police and enforce these laws to ensure that data is used only for the purposes legally agreed to. But why is this becoming an issue now, instead of, say, ten years ago?

Artificial intelligence systems normally need a very high volume of data to operate effectively. What has changed in the last ten years is the proliferation of sensors. Your smartphone is an obvious one. But sensors now evaluate all sorts of activities and conditions because data networks can support them: they can talk to a computer in real time and communicate everything from camera footage of crowds flowing between innings at a baseball game to microphone audio used to work out the best person to talk to you online or through a call center. At the same time, these sensors have become inexpensive, so a security system with hundreds or thousands of sensors can still be cost-effective. And with computer memory costing a fraction of what it did only five years ago, and processor speeds continuing to increase, more data can be crunched in less time, creating cost-effective solutions that wouldn’t have been possible even seven or eight years ago.

But how much trouble can a company really get into for failing to keep track of its data? The harm does not come only from regulatory agencies. It’s also a public relations disaster. With social media at the forefront of shaping opinions about the companies people trust, losing that trust, and the bad press that comes with it, can cause far more damage when data ends up in the wrong hands (through cyber-crime) or is misused (through a violation of the Terms of Service). Either way, be it simple mischief impacting only a few dozen people or major harm affecting thousands, future revenue will invariably take a hit when word gets out and the company’s reputation is in tatters.