
There’s a lot of excitement in the security world today around artificial intelligence (AI) and, more specifically, machine learning (ML). CSO Online lists its top five use cases for machine learning in security, which include detecting malicious activity on the network, automating repetitive tasks, and analyzing large volumes of data for threat intelligence. But another near-term application of machine learning will be data protection and the prevention of data leaks.

Data Protection Accuracy and Usability

Data protection is always a balance between security and usability. Whether you use traditional encryption and data loss prevention (DLP) software or more modern cloud access security brokers (CASB) and Information Rights Management, these technologies are known for their usability problems. Some data protection software, especially when its settings are too aggressive, prompts users too often to identify themselves and classify their data, blocks users from data they should be able to access, and reports false positives. Too many usability problems frustrate users and slow down the business. In the worst case, unusable data protection software even drives users to circumvent the security you’ve put in place altogether. Faced with user backlash, many organizations end up turning down the security settings on their DLP, which improves the user experience but makes data protection much less effective.

Machine learning algorithms promise to strike a better balance. By learning what sensitive data looks like in your organization, an ML model has a better chance of identifying sensitive data before it leaks and of catching potential leaks with fewer false positives.
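To make “learning what sensitive data looks like” more concrete, here is a minimal sketch of one classic technique, a two-class multinomial naive Bayes text classifier, trained on a handful of labeled snippets. The class name, labels, and training examples are hypothetical; commercial products use their own (usually far more sophisticated) models, and a real deployment would train on many documents from your own organization.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a real system would use a richer tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SensitivityClassifier:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration: both labels must receive at least one
    training example before prob_sensitive() is called.
    """

    LABELS = ("sensitive", "public")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def prob_sensitive(self, text: str) -> float:
        vocab_size = len(set().union(*self.word_counts.values()))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior: fraction of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing out a class.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Turn the two log scores into a probability (log-sum-exp style).
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["sensitive"] / sum(weights.values())


model = SensitivityClassifier()
model.train("Project DaVinci roadmap, confidential, do not share", "sensitive")
model.train("Q3 salary bands and employee SSNs", "sensitive")
model.train("Lunch menu for the company picnic", "public")
model.train("Press release: new office opening", "public")

print(model.prob_sensitive("DaVinci confidential"))
print(model.prob_sensitive("Picnic lunch menu"))
```

The output is a probability rather than a yes/no verdict, which is what makes the confidence-based handling discussed below possible.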

Traditional regular expression matching doesn’t always work well for identifying personally identifiable information (PII). Take the number “4538262071598058”. Is this a credit card number (personal financial information, or PFI), an international phone number (PII), or just a random sequence of digits? Or imagine a document that mentions “Project DaVinci”, a secret project. Should it be treated as confidential? Without context, it’s hard to know, but a machine learning model that takes the surrounding context into account could classify it more accurately. And as the model is retrained on more examples over time, its accuracy can keep improving.
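The limits of pattern matching are easy to reproduce. The sketch below uses a hypothetical rule that flags any run of 13 to 19 digits as a possible card number, then applies the Luhn checksum, the standard check-digit scheme for payment card numbers and an extra signal often layered onto pattern-based DLP rules. The rule fires on the example number, yet this particular string fails the checksum; and a string that passes could still be a phone number or an internal ID. That residual ambiguity is what context-aware models aim to resolve.

```python
import re

# Hypothetical DLP-style rule: any standalone run of 13-19 digits "looks like" a card number.
CARD_PATTERN = re.compile(r"\b\d{13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not number.isdigit():
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the doubled value
        total += d
    return total % 10 == 0


text = "Call 4538262071598058 tomorrow"
print(CARD_PATTERN.findall(text))        # ['4538262071598058']
print(luhn_valid("4538262071598058"))    # False
print(luhn_valid("4111111111111111"))    # True (a widely used test card number)
```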

Machine learning can even produce a confidence rating for its prediction. For the number above, the model might be 80% sure at that moment that it is a credit card number. Depending on the level of confidence, whether 50%, 80%, or 99%, the software could choose to prompt the user for clarification or simply take action without bothering the user at all.
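One simple way to turn a confidence rating into behavior is a pair of thresholds. In this sketch the threshold values and action names are assumptions chosen for illustration, not settings from any particular product:

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"          # let the action proceed silently
    ASK_USER = "ask_user"    # prompt the user to confirm or reclassify
    BLOCK = "block"          # act automatically without prompting


# Hypothetical thresholds; real values would be tuned per organization.
ASK_THRESHOLD = 0.5
BLOCK_THRESHOLD = 0.95


def decide(confidence: float) -> Action:
    """Map a classifier's confidence that content is sensitive to an action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= BLOCK_THRESHOLD:
        return Action.BLOCK
    if confidence >= ASK_THRESHOLD:
        return Action.ASK_USER
    return Action.ALLOW


for c in (0.3, 0.8, 0.99):
    print(c, decide(c).value)
```

Widening the gap between the two thresholds sends more cases to the user; narrowing it lets the software act on its own more often, at the cost of more wrong automatic decisions. That is the same security-versus-usability trade-off, now expressed as two tunable numbers.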

Pinpoint the Source of Data Leaks

In the movies, secret agents often ‘bug’ a criminal’s car and then watch it drive back to headquarters to learn who the criminal is meeting with and what they are doing. Similarly, some data protection software, such as Information Rights Management, lets you encrypt a file and then track what happens to it when it leaks. As the content owner, you can see who is trying to access the file, even outside the organization, and get a full audit trail of how the file leaked at every touch point along the way.
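At its core, such an audit trail is a time-ordered log of access attempts against a protected file. Here is a minimal sketch of how that log might be modeled and queried; the `AccessEvent` fields, file names, and domains are hypothetical and do not reflect any vendor’s actual schema:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    """One attempt to open a protected file (hypothetical schema)."""
    file_id: str
    user: str
    domain: str          # organization the requesting user belongs to
    timestamp: datetime
    granted: bool        # whether the rights policy allowed the open


def audit_trail(events: list[AccessEvent], file_id: str) -> list[AccessEvent]:
    """Return every recorded access attempt on one file, oldest first."""
    return sorted((e for e in events if e.file_id == file_id), key=lambda e: e.timestamp)


def outside_attempts(trail: list[AccessEvent], home_domain: str) -> list[AccessEvent]:
    """Return the attempts that came from outside the owning organization."""
    return [e for e in trail if e.domain != home_domain]


events = [
    AccessEvent("roadmap.docx", "bob", "corp.example", datetime(2024, 5, 2, 9, 30), True),
    AccessEvent("roadmap.docx", "alice", "corp.example", datetime(2024, 5, 1, 14, 0), True),
    AccessEvent("budget.xlsx", "alice", "corp.example", datetime(2024, 5, 1, 15, 0), True),
    AccessEvent("roadmap.docx", "mallory", "rival.example", datetime(2024, 5, 3, 22, 15), False),
]

trail = audit_trail(events, "roadmap.docx")
for e in trail:
    print(e.timestamp.isoformat(), e.user, e.domain, "granted" if e.granted else "denied")

for e in outside_attempts(trail, "corp.example"):
    print("outside attempt:", e.user, e.domain)
```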

Information Rights Management is already very powerful in this regard, but today’s solutions still rely on a manual process of following files on a dashboard and receiving alerts when they fall into the wrong hands. Add machine learning algorithms to Information Rights Management and you can proactively track patterns in where your enterprise data is leaking, by whom, to whom, and why. Machine learning could automatically highlight potential insider threats, exfiltration attempts, and even who on the outside may be receiving the leaked data.
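As a rough illustration of tracking patterns rather than individual files, the sketch below flags users whose external sharing suddenly spikes relative to their own history. Note that it swaps in a simple statistical baseline (a per-user z-score) as a stand-in for a learned model; the function name, sample data, and threshold of three standard deviations are assumptions for illustration only:

```python
from statistics import mean, stdev


def flag_anomalies(daily_external_shares: dict[str, list[int]],
                   z_threshold: float = 3.0) -> list[str]:
    """Flag users whose most recent daily count is far above their own history."""
    flagged = []
    for user, counts in daily_external_shares.items():
        if len(counts) < 3:
            continue  # need at least two days of history plus today
        history, today = counts[:-1], counts[-1]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: treat any increase over it as anomalous.
            if today > mu:
                flagged.append(user)
        elif (today - mu) / sigma > z_threshold:
            flagged.append(user)
    return flagged


# Hypothetical counts of files shared outside the organization per day.
shares = {
    "alice": [2, 3, 2, 4, 3, 3],
    "bob":   [1, 0, 2, 1, 1, 40],
    "carol": [0, 0, 0, 0, 0, 0],
}
print(flag_anomalies(shares))
```

A production system would learn from richer features (recipients, file sensitivity, time of day) rather than a single count, but the shape is the same: model normal behavior, then surface deviations for review.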

Machine learning is an exciting field, and new breakthroughs in ML are likely to help the security world. In the short term, expect to see machine learning powering data protection solutions such as Information Rights Management, improving their accuracy, usability, and monitoring, even in the event of a leak.