Pseudonymized Data
Pseudonymization takes the most identifying fields within a database and replaces them with one or more artificial identifiers, or pseudonyms. For example, a name is replaced with a unique number. The purpose is to render the data record less identifying and therefore reduce the concerns that come with data retention and data sharing. The process can also be used as part of a Data Fading policy. Pseudonymized Data is typically used for analytics and data processing, often with the aim of improving processing efficiency.
The choice of which data fields are to be pseudonymized is sometimes subjective. Pseudonymized Data should include all fields that are highly selective, for example a social security or national insurance number. Less selective fields, such as birth date, zip code or postcode, are often also included because they may retain sufficient detail to allow an Inference Attack, in which the data is cross-referenced with other data sets to reveal the replaced values. However, pseudonymizing these less identifying fields can affect analysis, so generalized fields are often inserted in their place, such as region instead of address, or year of birth instead of birth date.
Pseudonymized Data is not the same as Anonymized Data. When data has been pseudonymized, the replaced data still retains a level of detail that should allow it to be tracked back to its original state. With anonymized data the level of detail is reduced to the point that reversing the process is impossible.
Care must be taken when handling sensitive data, because patterns in the data may reveal enough to allow reconstruction of the source data. It is prudent to protect Pseudonymized Data with encryption, using schemes such as Elliptic-curve Diffie-Hellman (ECDH) and ideally with Perfect Forward Secrecy (PFS), to safeguard the data. Pseudonymization can also be used to obfuscate code to help defend against scripted hacking attacks. For example, the ShapeShifter appliance runs real-time polymorphism to continually rewrite HTML and scripts with obfuscated code that changes every time it is requested.