On 11/06/2020, I discussed "Data Privacy and AI" during the "Analytics and Data Privacy" conclave organized by Symbiosis University.
The audience asked some excellent questions; below are my answers addressing the "Challenges of AI and Data Privacy".
1. Can AI and data privacy co-exist?
·
For some AI/ML analyses, such as recommendation
engines, some privacy data is required to produce the recommendation. In those
cases, we ensure that only the data actually required is used and nothing beyond it.
·
We ensure that the client has accepted the Privacy Policy,
where applicable.
·
Additionally, the CISO and the Privacy Officers exercise
the required controls over all the activities carried out by the organization,
including AI projects.
·
In some cases, we make sure privacy data such as SSNs,
Protected Health Information (PHI), and other personal data is masked
before the data is provided to AI/Analytics teams. For example, we mask the first 5
digits of an SSN or the first 12 digits of a credit card number.
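The masking rule mentioned in the last point can be sketched in a few lines of Python. This is only an illustration, not the code we run in production: the function names `mask_leading_digits`, `mask_ssn` and `mask_card` and the mask character `X` are my own assumptions for the example.

```python
def mask_leading_digits(value: str, n_mask: int, mask_char: str = "X") -> str:
    """Replace the first n_mask digits of value with mask_char.

    Separators such as '-' or ' ' are kept so the format stays recognizable.
    """
    out = []
    masked = 0
    for ch in value:
        if ch.isdigit() and masked < n_mask:
            out.append(mask_char)
            masked += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_ssn(ssn: str) -> str:
    # Hypothetical helper: hide the first 5 digits, keep the last 4
    return mask_leading_digits(ssn, 5)


def mask_card(card: str) -> str:
    # Hypothetical helper: hide the first 12 digits, keep the last 4
    return mask_leading_digits(card, 12)


print(mask_ssn("123-45-6789"))           # XXX-XX-6789
print(mask_card("4111 1111 1111 1111"))  # XXXX XXXX XXXX 1111
```

Masking of this kind is applied before the data leaves the source system, so the AI/Analytics team never sees the full values.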
2. What steps are you taking in your organization to protect customers' privacy while
providing data access for AI and Machine Learning projects?
·
Knowing exactly where “privacy data” resides on our
systems, how it is stored and processed, and where it is duplicated, e.g. for
backup purposes.
·
Encrypting privacy data properly, both when it is stored on
the systems and when it is transmitted.
·
Ensuring that the privacy data is not duplicated
unless it is absolutely required for business purposes.
·
Providing access to privacy data only on a strict
business-need basis.
·
Anonymizing the privacy data, as far as possible, while
carrying out any analysis.
·
Taking utmost care to process the data only for purposes
the customer has consented to.
·
Ensuring that the data is deleted or archived on a timely
basis, in line with the IT security policies and procedures.
·
Knowing exactly what data is considered “privacy
data” under the various statutes and regulations, e.g. HIPAA and GDPR (General Data
Protection Regulation).
·
“Anonymizing” data is a common practice for protecting
“privacy data”.
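As a rough illustration of the anonymization step above, here is a minimal Python sketch that drops direct identifiers and replaces the customer ID with a keyed hash. Strictly speaking this is pseudonymization, which is weaker than full anonymization, and it is not a description of our actual pipeline. The field names, `SECRET_KEY` and the helper functions are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored outside the
# analytics environment so analysts cannot reverse the mapping.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical columns that are removed outright before analysis
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed-hash token (HMAC-SHA256) for an identifier."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_record(record: dict) -> dict:
    """Drop direct identifiers and replace customer_id with a pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["customer_id"] = pseudonymize(str(record["customer_id"]))
    return cleaned


record = {
    "customer_id": 1042,
    "name": "Jane Doe",
    "email": "jane@example.com",
    "basket_total": 87.5,
}
safe = prepare_record(record)
print(safe)  # name and email are gone; customer_id is a 64-character token
```

Because the same customer always maps to the same token, the analytics team can still join and aggregate records without seeing who the customer is.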
3. Why is Data Privacy such a concern when it comes to
AI?
·
Certain information is private, sensitive, or secret, and
access to such data is restricted.
·
Such information is classified as “confidential” and
needs to be protected.
·
Compromising sensitive personal data may damage the
organization's reputation, violate privacy rights, and lead to loss of
business.
·
If a breach of confidential data happens, senior executives
are answerable to investors, government regulators, and consumers.
·
For instance, sharing medical data with pharmaceutical
companies and hospitals for medical research purposes, or for finding usage
patterns of a particular drug, can violate a patient’s privacy even when
sensitive information is masked, because the remaining fields can sometimes be
combined to re-identify the patient.
·
Hence, data privacy is a big challenge not just for AI
projects but in general.
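To see why masked medical data can still be risky, consider the small Python sketch below. It counts how many records have a unique combination of fields that are usually left unmasked. The records, the field names and the choice of `QUASI_IDENTIFIERS` are made up for illustration only.

```python
from collections import Counter

# Hypothetical masked records: names and IDs are removed,
# but a few descriptive fields remain.
records = [
    {"zip": "411004", "birth_year": 1975, "gender": "F", "drug": "A"},
    {"zip": "411004", "birth_year": 1975, "gender": "F", "drug": "B"},
    {"zip": "411021", "birth_year": 1988, "gender": "M", "drug": "A"},
    {"zip": "411045", "birth_year": 1990, "gender": "F", "drug": "C"},
]

# Assumed set of fields an outsider might already know about a person
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def unique_combinations(rows, keys=QUASI_IDENTIFIERS):
    """Return the rows whose quasi-identifier combination appears only once."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows if counts[tuple(r[k] for k in keys)] == 1]


at_risk = unique_combinations(records)
print(len(at_risk))  # 2
```

Anyone who knows the zip code, birth year and gender of those two patients could link them to their drug usage, even though no name or ID is present.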
4. How much is the end user aware of what personal
data is being collected and shared?
·
The answer is "it depends". Over time, however,
awareness of personal data privacy among the user community
has increased.
·
Sometimes users simply accept the privacy
policy of an organization or a software download without reading it, thus allowing
the organization to use their personal data beyond what they intended.
·
How many of us actually read the End User License
Agreements (EULAs)?
·
Because of recent incidents such as
Cambridge Analytica, consumers are becoming aware of how their data is
being used.
·
Also, various laws are being introduced across the globe to protect the
privacy of personal data. Recent examples include the General Data
Protection Regulation (GDPR) and the California Consumer Privacy Act of
2018 (CCPA). Both aim to protect customer “privacy”.
·
Some personal data is forcibly collected by
organizations and some is not.
·
For example, Google Maps, Google Search, and Facebook are collecting
your data.
·
Personal data is any information that relates to an
individual who can be directly or indirectly identified. Names and email
addresses are obviously personal data. Location information, ethnicity, gender,
biometric data, religious beliefs, web cookies, and political opinions can also
be personal data.
·
Key regulatory points of GDPR include providing
transparency about how personal data is used, how it is stored, who has access to it,
and the purpose for which it is collected.
·
GDPR applies to the personal data of individuals in the EU.
·
Similarly, the California Consumer Privacy Act (CCPA 2018)
applies to California state residents.
·
In July 2019, New York passed the Stop Hacks and Improve Electronic Data Security (SHIELD) Act. This law
amends New York's existing data breach notification law and creates more data
security requirements for companies that collect information on New York
residents.
·
The consumer has the right to access, the right to rectify,
the right to delete, and the right to object.
·
These regulations give consumers more control over the personal
information that organizations are collecting.
6. Do you think the evolution of AI technology in the next
5-10 years will be blocked by Data Privacy?
·
Yes, very much. It is true that data “privacy”
sometimes blocks the use of data for AI purposes.
·
In particular, organizations
dealing with sensitive and private data, such as those in healthcare or finance, have lagged
because of regulatory constraints that protect users’ data.
·
An AI team may comprise temporary workers and contractors, not
just employees. Organizations are concerned about data leaks by people who
are working on AI projects. Hence, organizations exercise caution and provide
access to data only to the right people.
·
A leak of privacy data can lead to huge penalties and
damage the company’s reputation, and the consequences can be severe.
·
Hence, AI projects normally require data access
approval from top C-suite executives, to ensure that the data is made available for
AI purposes that are intended to solve critical business problems or
challenges.
·
I strongly suggest that the AI team report to one of
the C-suite executives, so that there is both authority and accountability.
·
One thing to understand is that AI is bringing
significant benefits to organizations: increased productivity,
process optimization, reduced downtime, increased product
reliability, more proactive preventive maintenance, prescriptive
analysis, and additional business generation.
·
But “privacy” is equally important. To avoid any
possible breach of data privacy, organizations have to invest money in
proper controls so that their IT systems are secured and protected
from threats.
·
Though there is an initial cost, the penalty for
a breach of privacy data or for negligence is much larger.
·
Hence, it is better for organizations to
spend the additional money up front and protect their data with proper controls than
to pay a huge penalty later on.
·
Once data privacy meets the standards, and all the regulations
and proper controls are in place, approval can be given for access to the data required
for solving AI/Machine Learning/Analytics business challenges.
·
Yes, consumers should be aware of what “personally identifiable information” (PII) is.
·
Consumers should also be aware of how their data is
collected and for what purpose.
·
They should know whether the data is used for research
purposes or for monetary benefit.
·
Based on all this information, the consumer should be
able to decide whether or not to give consent.
·
The consumer has to read the data agreement carefully
to check whether their data can be sold to others for monetary benefit.
·
The consumer should have the right to modify the data,
erase the data, or decline the collection of such data.
·
The consumer should be aware that most AI and
analytics work can be done on appropriately anonymized
privacy data.
Other Important factors to consider:
·
Privacy-preserving options include cryptographic methods like Homomorphic Encryption,
Garbled Circuits, and Secret Sharing, as well as Secure Processors and,
ultimately, the generation of “synthetic data”.
·
“Synthetic data” can be used without risking the privacy
of users.
·
Anonymization removes some of the informational value
of the data; it can distort or completely destroy important correlations.
·
Also, Intel’s SGX provides a secure Trusted
Execution Environment (TEE), which is designed to protect privacy through hardware.
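To give a feel for one of the cryptographic methods listed above, here is a minimal Python sketch of additive Secret Sharing. It is a toy example under my own assumptions (the modulus, the three-party split, and the salary figures are all hypothetical), not a production protocol. Each private value is split into random-looking shares, the parties add their shares locally, and only the final total is reconstructed.

```python
import secrets

# Illustrative modulus for the shares; any sufficiently large modulus works here
MODULUS = 2**61 - 1


def share(secret: int, n: int = 3) -> list[int]:
    """Split secret into n additive shares that sum to it modulo MODULUS."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    last = (secret - sum(parts)) % MODULUS
    return parts + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares; any subset smaller than n reveals nothing useful."""
    return sum(shares) % MODULUS


# Two hypothetical private values, e.g. figures held by different parties
a_shares = share(52000)
b_shares = share(61000)

# Each party adds the two shares it holds; no party sees a raw input
sum_shares = [(x + y) % MODULUS for x, y in zip(a_shares, b_shares)]

print(reconstruct(sum_shares))  # 113000
```

The analysis (here, a simple sum) is carried out on the shares, so the result is available without any single party handling the original privacy data.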