Using Tokenization to Protect Data
- Maor Volokh
In recent years, tokenization has been widely used in the financial industry, mainly as a way to comply with PCI DSS regulations. As a result, many companies have been asking what the benefits and weaknesses of tokenization are for protecting sensitive data in general. In this article, we will examine the advantages and drawbacks of using tokenization.
What is Tokenization?
Tokenization of data is the process of converting cleartext, such as an account number, into a random string of characters, called a token, in order to protect the underlying information. The token serves as a reference to the original data but cannot be used to derive the plaintext. Unlike encryption, tokenization does not transform the sensitive data with a key or a reversible mathematical function. Instead, the tokens are randomly generated, and the relationship between each original value and its token is stored in a database called a Token Vault. The original data in the vault is then secured using access control and other measures, with the tokenization solution vendor responsible for the security of the Token Vault.
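To make the vault model concrete, here is a minimal in-memory sketch in Python. The `TokenVault` class and its `tokenize`/`detokenize` methods are hypothetical names used for illustration. A real Token Vault is a hardened, access-controlled database, and the access control is omitted here. The token comes from Python's `secrets` module, so the only link between a token and its value is the stored mapping.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault (illustration only; a real vault
    is a hardened database protected by access control)."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Design choice (assumption): reuse the existing token so the same
        # value always maps to the same token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, with no mathematical relationship to the value.
        token = secrets.token_urlsafe(16)
        while token in self._token_to_value:  # guard against an unlikely collision
            token = secrets.token_urlsafe(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # The original value can only be recovered by looking it up in the vault.
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111111111111111")
print(token)                    # a random string that differs on every run
print(vault.detokenize(token))  # 4111111111111111
```

Reusing the token for a repeated value is only one option. Some systems issue a fresh token on every request, which hides repeated values but makes equality lookups on tokens impossible.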
The use of tokenization offers the following advantages:
· Not mathematically reversible - there is no mathematical relationship between a token and the real data it represents. Tokenization relies on a mapping that is stored only in the Token Vault. So, if the systems holding the tokenized data are compromised and the tokens are stolen, an attacker cannot use them to recover the original data.
· The structure and format can be preserved – a token can keep the same format as the original data. For example, a tokenized social security number can have the same number of characters and delimiters, while the digits themselves are random. For credit card numbers, the last four digits can be preserved so that the token still references the actual card number, and the printed value might be all asterisks followed by those last four digits. In this case, the merchant holds only a token, not a real card number.
· Minimizes compliance efforts – sensitive data is typically replicated across multiple sources for different applications and users. To comply with regulations such as PCI DSS, an organization would need to ensure that the data is encrypted across all instances and that the instances themselves are protected (i.e., infrastructure security). With tokenization, instead of protecting every instance and every copy of the data, you primarily need to protect the Token Vault, since the tokens stored across those instances are meaningless without it.
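The format-preserving behavior described above can be sketched as follows. This sketch assumes a simple scheme in which each digit is replaced by a random digit, while delimiters and, optionally, the trailing digits are kept. `format_preserving_token` and `mask_for_display` are hypothetical helpers, not any vendor's API. In a real system, the generated token would also be checked for collisions and recorded in the Token Vault, and both steps are omitted here.

```python
import secrets

DIGITS = "0123456789"


def format_preserving_token(value: str, keep_last: int = 0) -> str:
    """Hypothetical helper: replace each digit with a random digit, keeping
    delimiters and the last `keep_last` digits unchanged."""
    total_digits = sum(ch in DIGITS for ch in value)
    digits_seen = 0
    out = []
    for ch in value:
        if ch in DIGITS:
            digits_seen += 1
            if digits_seen > total_digits - keep_last:
                out.append(ch)  # preserved trailing digit
            else:
                out.append(str(secrets.randbelow(10)))  # random replacement
        else:
            out.append(ch)  # keep delimiters such as '-' or ' '
    return "".join(out)


def mask_for_display(value: str, keep_last: int = 4) -> str:
    """Hypothetical helper: show only the last `keep_last` digits."""
    digits = "".join(ch for ch in value if ch in DIGITS)
    cut = max(0, len(digits) - keep_last)
    return "*" * cut + digits[cut:]


ssn_token = format_preserving_token("123-45-6789")
card_token = format_preserving_token("4111 1111 1111 1111", keep_last=4)
print(ssn_token)                     # same shape as the input, random digits
print(mask_for_display(card_token))  # ************1111
```

Because the token keeps the original length and delimiters, it can usually be stored in existing database columns and pass existing format validation without schema changes.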
That said, there are factors to take into account when deciding whether to implement tokenization. A critical one is reduced application functionality: legacy or third-party application logic might break once the underlying data is tokenized. Search and filtering, for example, no longer work directly on tokenized fields. Keeping them working requires development resources, and performance might be impacted.
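The following toy Python example shows why search and filtering break. It uses a hypothetical dictionary-based vault and `tokenize` helper, assumed for illustration only. A prefix filter cannot run on random tokens, so the application has to detokenize each record first, which means one vault lookup per row. Equality lookups can still work if the same value always maps to the same token, but prefix, range and sort operations cannot.

```python
import secrets

# Hypothetical toy vault: token -> original card number (illustration only).
vault: dict[str, str] = {}


def tokenize(card_number: str) -> str:
    token = secrets.token_hex(8)
    while token in vault:  # guard against an unlikely collision
        token = secrets.token_hex(8)
    vault[token] = card_number
    return token


# The application's database now stores only tokens.
stored_tokens = [tokenize(n) for n in ("4111111111111111",
                                       "5500000000000004",
                                       "4012888888881881")]

# A filter such as "card numbers starting with 4" is meaningless on tokens:
# a random token reveals nothing about the original digits.
naive = [t for t in stored_tokens if t.startswith("4")]

# Keeping the feature working means detokenizing every record first,
# which costs extra development effort and one vault lookup per row.
matches = [t for t in stored_tokens if vault[t].startswith("4")]
print(len(matches))  # 2
```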
Tokenization of Data - Common Use Cases
With that limitation in mind, organizations most commonly implement tokenization for the following use cases:
· Payment card data - tokenizing payment card data reduces merchants' obligations under PCI DSS, as they no longer need to ensure that the entire technology infrastructure used to store and transmit this data meets PCI DSS requirements.
· Personally identifiable information – tokenization is also used to protect personally identifiable information (PII), including social security numbers, telephone numbers, email addresses, account numbers, etc. Because these unique identifiers are woven into many systems, it is very difficult to remove them from all of those systems or to protect the data while it is being used. Tokenization protects this data while preserving the functionality of these systems, without exposing the PII to attackers.