Comparing enterprise data anonymization techniques

Sooner or later, most organizations need to share data, whether to support research, to test the functionality of a new application, or for any number of other business purposes. To protect the sensitivity or confidentiality of shared data, it often needs to be sanitized before it can be distributed and analyzed.

A popular and effective method for sanitizing data is data anonymization. Also referred to as data masking, data obfuscation or data scrambling, and sometimes loosely as data cleansing, data anonymization is the process of replacing the contents of identifiable fields (such as IP addresses, usernames, Social Security numbers and ZIP codes) in a database so that records cannot be associated with a specific individual, project or company. Anonymization differs from confidentiality: under confidentiality, the subjects’ identities are often known to the person evaluating the data, who is trusted to protect them; under anonymization, the evaluator does not know the subjects’ identities at all.

Anonymization therefore allows detailed data to be distributed and used by various parties while still providing some level of privacy for sensitive information.

Data anonymization techniques
Several data anonymization techniques are available, including data encryption, substitution, shuffling, number and date variance, and nulling out specific fields or data sets.
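
To make the non-encryption techniques in that list more concrete, here is a minimal Python sketch of substitution, shuffling, number and date variance, and nulling out, as those terms are commonly understood. The sample records, the field names, the list of fake usernames and the variance ranges (10 percent, 30 days) are all assumptions made up for illustration; none of them come from the article.

```python
import random
from datetime import date, timedelta

# Hypothetical sample records; every value here is invented for illustration.
RECORDS = [
    {"username": "JohnDoe", "zip": "10001", "salary": 52000,
     "hired": date(2015, 3, 14), "ssn": "000-00-0001"},
    {"username": "JaneRoe", "zip": "60614", "salary": 61000,
     "hired": date(2018, 7, 2), "ssn": "000-00-0002"},
    {"username": "SamPoe", "zip": "94105", "salary": 47500,
     "hired": date(2020, 11, 23), "ssn": "000-00-0003"},
]

# Hypothetical lookup list used for substitution.
FAKE_USERNAMES = ["user_alpha", "user_bravo", "user_charlie", "user_delta"]


def shuffle_column(values, rng):
    """Shuffling: reorder a column so values no longer line up with their rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def vary_number(value, pct, rng):
    """Number variance: move a number by a random amount within +/- pct percent."""
    return round(value * (1 + rng.uniform(-pct, pct) / 100), 2)


def vary_date(value, max_days, rng):
    """Date variance: move a date by a random number of days within +/- max_days."""
    return value + timedelta(days=rng.randint(-max_days, max_days))


def anonymize(records, seed=0):
    """Return new records with one technique applied per field."""
    rng = random.Random(seed)
    zips = shuffle_column([r["zip"] for r in records], rng)
    result = []
    for record, shuffled_zip in zip(records, zips):
        result.append({
            "username": rng.choice(FAKE_USERNAMES),          # substitution
            "zip": shuffled_zip,                             # shuffling
            "salary": vary_number(record["salary"], 10, rng),  # number variance
            "hired": vary_date(record["hired"], 30, rng),      # date variance
            "ssn": None,                                     # nulling out
        })
    return result
```

Calling `anonymize(RECORDS)` returns records whose fields keep their original data types (apart from the nulled field), which is one reason these techniques can be easier to use for application testing than encryption.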

Data encryption is an anonymization technique that replaces sensitive data with encrypted data. The process provides effective data confidentiality, but it also transforms the data into an unreadable format. For example, once data encryption is applied to the fields containing usernames, “JohnDoe” may become “@Gek1ds%#$”. Encryption does the job from an anonymization perspective, but the result is often less suitable for practical use. Other business requirements, such as data input validation or application testing, may require a field to hold a specific kind of value, such as a number, a date, a cost or a salary, and encrypted data may look like the wrong data type to the system trying to use it.
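
The data type problem can be illustrated with a short Python sketch. The cipher below is a toy stand-in (an XOR against a SHA-256-derived keystream, then Base64) chosen only so the example runs with the standard library; it is not secure, and a real system would use a vetted encryption library. The function names, the key and the salary validation rule are all hypothetical.

```python
import base64
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Derive `length` bytes from `key`. Toy construction, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(value: str, key: bytes) -> str:
    """Encrypt a field value and return it as Base64 text (illustration only)."""
    raw = value.encode("utf-8")
    mixed = bytes(a ^ b for a, b in zip(raw, _keystream(key, len(raw))))
    return base64.b64encode(mixed).decode("ascii")


def toy_decrypt(token: str, key: bytes) -> str:
    """Reverse toy_encrypt with the same key."""
    mixed = base64.b64decode(token)
    raw = bytes(a ^ b for a, b in zip(mixed, _keystream(key, len(mixed))))
    return raw.decode("utf-8")


def is_valid_salary(field: str) -> bool:
    """Hypothetical input validation: a salary must be a whole number."""
    return field.isdigit()


key = b"demo-key"                      # hypothetical key
token = toy_encrypt("52000", key)

print(is_valid_salary("52000"))        # the original value passes validation
print(is_valid_salary(token))          # the encrypted value does not
print(toy_decrypt(token, key) == "52000")
```

The encrypted salary is no longer a string of digits, so an application that expects a numeric field would reject it even though the underlying value is still recoverable with the key.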

Read more at TechTarget’s SearchSecurity
