Big Data vs GDPR!

So, I’m pretty sure we are all familiar with Europe’s GDPR (General Data Protection Regulation). It’s all I’ve seen on my newsfeed lately! I’d like to hear from those in my network who work in the Big Data space: how do you feel this will affect ML technologies?

Big Data is hardly the biggest fan of regulation, and it’s certainly not on the front line wearing a cheerleader costume for GDPR. We need to recognise that ML is used in more places than you might think: in your Alexa to understand speech patterns, at your bank to identify fraudulent transactions, and even during your lunchtime shop on ASOS or Facebook!

So, what does this mean for Europe?

The new regulations pose several challenges for companies in the Big Data space that use or are building Machine Learning systems. Machine Learning (ML) depends on collecting and optimising data, and it consistently does so in volumes that surpass its creators’ expectations. The data is needed to identify patterns, which is essentially how ML serves its purpose. So, in theory, regulating what data can be collected and how it will be used will restrict its capabilities, right?

Well, let’s just look at some of these new regulations…

  • Where personal data is collected, you must state what it will be used for and not use it for anything else.
  • Personal data can be edited or deleted at the request of the individual.
  • If data is being used for automated decisions, businesses must explain the logic behind the decision-making process.
  • The amount of data collected and stored is to be minimised as much as possible, allowing businesses to keep only what they need to serve the stated purpose.
  • Limits must be set on how long data is stored.
  • At any time, someone can request a copy of all data stored that relates to them.
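
To make a few of these rules concrete, here is a minimal Python sketch of how a data store might enforce them. It is an illustration only, not a compliance recipe: the names Record, PersonalDataStore and every method on them are hypothetical, and it only touches the rules on stated purpose, deletion, storage limits and access requests. It says nothing about explaining automated decisions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Record:
    # Hypothetical record shape, assumed for this sketch only.
    subject_id: str
    data: dict
    purpose: str            # the purpose stated when the data was collected
    collected_at: datetime
    retention: timedelta    # how long this record may be kept


@dataclass
class PersonalDataStore:
    records: list = field(default_factory=list)

    def collect(self, record):
        self.records.append(record)

    def read(self, subject_id, purpose):
        # Stated purpose: only return data collected for this exact purpose.
        return [r.data for r in self.records
                if r.subject_id == subject_id and r.purpose == purpose]

    def export(self, subject_id):
        # Access request: a copy of everything held about one person.
        return [dict(r.data) for r in self.records
                if r.subject_id == subject_id]

    def erase(self, subject_id):
        # Deletion request: drop every record about this person.
        self.records = [r for r in self.records
                        if r.subject_id != subject_id]

    def purge_expired(self, now):
        # Storage limit: drop records whose retention period has passed.
        self.records = [r for r in self.records
                        if r.collected_at + r.retention > now]


store = PersonalDataStore()
store.collect(Record("u1", {"email": "a@example.com"}, "billing",
                     datetime(2018, 5, 25), timedelta(days=365)))
```

With this in place, asking for the same person’s data under a different purpose (say, "marketing") returns nothing, which is the kind of bookkeeping many companies do not yet have.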

So, what’s the issue?

Not all companies have a grasp on their data, or even on what it was originally collected for. If everyone had to be as organised as this… most companies would not be in business!

As an ML system is not a rule-based piece of software, you can’t always get transparency on how a decision is reached or why the data is needed. Its learning is continuous rather than aimed at a fixed end goal, so you don’t always know what purpose you are consenting to when you agree to your data being used.

This becomes a bigger issue when brands rely on sensitive ‘personal data’ such as religious views, sexuality and political beliefs, as the restrictions there get tighter! Even anonymizing certain aspects of your data isn’t always enough. If a business holds enough data that a person can still be recognised or singled out, that data falls outside the exemption for anonymized data.
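
As a rough illustration of what “singled out” can mean in practice, the sketch below counts how many rows share the same combination of seemingly harmless fields. This is a simple k-anonymity-style check, not a legal test, and the function name singled_out, the field names and the sample rows are all made up for the example.

```python
from collections import Counter


def singled_out(rows, quasi_identifiers, k=2):
    """Return rows whose combination of quasi-identifier values
    is shared by fewer than k rows in the dataset."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] < k]


# Names and emails already removed, yet one row is still unique.
rows = [
    {"postcode": "N1", "birth_year": 1985, "gender": "F"},
    {"postcode": "N1", "birth_year": 1985, "gender": "F"},
    {"postcode": "E2", "birth_year": 1990, "gender": "M"},
]
risky = singled_out(rows, ["postcode", "birth_year", "gender"])
```

Here the third row is the only one with its postcode, birth year and gender, so that person could potentially be re-identified by anyone holding a second dataset with the same fields.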

What happens next?

I’d love to get the opinion of my network on how you think this will affect ML, and to hear what experiences you’ve had with these discussions.
