Data Masking Algorithms

The open source data masking framework has the potential to be exploited in industry as well as in the scientific domain. Within the framework, three main exploitable concepts can be identified.

The first is test data management, which is used to produce test data. Masking helps ensure that employees do not mishandle corporate data, for example by making private data sets public or by moving production data to insecure test environments. In reality, masking data for testing and sharing is almost a trivial subset of the full customer requirement.

The real goal is administration of the entire data security lifecycle, including locating, moving, managing, and masking data. The mature version of today’s simpler use case is a set of enterprise data management capabilities that control the flow of data to and from hundreds of different databases or flat files. This capability answers many of the most basic security questions we hear customers ask, such as “Where is my sensitive data?”, “Who is using it?”, and “How can we effectively reduce the risks to that information?”

Compliance is the second concept and, unlike with most of today’s emerging security technologies, the main reason users give for needing masking products. Early customers came specifically from finance, but adoption is now well distributed across different segments, particularly retail, telecom, health care, energy, education, and government. The diversity of customer requirements makes it difficult to pinpoint any one regulatory concern that stands out from the rest. During discussions we hear about all the usual suspects, including PCI, NERC, GLBA, FERPA, and HIPAA, and in some cases multiple requirements at the same time. These days we hear about masking being deployed as a more generic control, in the form of protection for Personally Identifiable Information (PII), health records, and general customer records, among other concerns; we no longer see every company focused on only one specific regulation or requirement. Masking is now perceived as addressing a general need to avoid unwanted data access, or to reduce exposure as part of an overall compliance posture.

The third concept is production database protection. While replacement of sensitive data, specifically through ETL-style deployments, is by far the dominant model, it is not the only way to protect data in a database. At some firms, protection of the production database is the primary goal for masking, with test data secondary. Masking can do both, which makes it attractive in these scenarios. Production data generally cannot be fully removed, so this model redirects requests to masked data where possible.
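One way to picture the production protection model is masking applied at read time: the stored record stays intact, and callers without a privileged role receive a masked copy. The sketch below is a minimal illustration under stated assumptions and is not the framework's implementation. The role names, the in-memory `store` dictionary, and the functions `mask_card` and `fetch_customer` are all hypothetical.

```python
# Hypothetical role names; a real deployment would take these from its
# own access-control system.
PRIVILEGED_ROLES = {"dba", "billing"}


def mask_card(number):
    """Keep the last four characters and replace the rest with '*'."""
    hidden = max(len(number) - 4, 0)
    return "*" * hidden + number[-4:]


def fetch_customer(store, customer_id, role):
    """Return the stored record for privileged roles, a masked copy otherwise.

    The production record in `store` is never modified.
    """
    record = store[customer_id]
    if role in PRIVILEGED_ROLES:
        return dict(record)
    masked = dict(record)
    masked["card"] = mask_card(record["card"])
    return masked
```

A tester calling `fetch_customer(store, 1, "tester")` would see only the last four digits of the card number, while the stored value stays unchanged for privileged callers.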

Within a masking solution, many methods can be used for data obfuscation, such as data shuffling, data substitution, randomizing, nullifying, and encryption.
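Four of these methods can be sketched with the Python standard library alone. The example below is illustrative only: the column names, the `FAKE_CITIES` lookup list, and all function names are assumptions, not part of the framework. Encryption is left out because the standard library has no symmetric cipher; in practice it would come from a dedicated cryptography library.

```python
import random
import string

# Hypothetical lookup list used for substitution.
FAKE_CITIES = ["Springfield", "Riverton", "Lakeside"]


def shuffle_column(values, rng):
    """Data shuffling: permute one column's values across rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def substitute(lookup, rng):
    """Data substitution: pick a realistic replacement from a lookup list."""
    return rng.choice(lookup)


def randomize_digits(value, rng):
    """Randomizing: replace each digit with a random digit, keep the format."""
    return "".join(
        rng.choice(string.digits) if ch.isdigit() else ch for ch in value
    )


def nullify(_value):
    """Nullifying: discard the value entirely."""
    return None


def mask_rows(rows, seed=0):
    """Return masked copies of the rows; the input rows are not modified."""
    rng = random.Random(seed)  # seeded so a masking run can be reproduced
    names = shuffle_column([row["name"] for row in rows], rng)
    masked = []
    for row, name in zip(rows, names):
        masked.append({
            "name": name,
            "city": substitute(FAKE_CITIES, rng),
            "phone": randomize_digits(row["phone"], rng),
            "notes": nullify(row["notes"]),
        })
    return masked
```

Shuffling keeps the real set of values but breaks the link between a value and its row, whereas substitution and randomizing replace the values while keeping them realistic in form. Which method suits a column depends on how much of the original distribution the test data needs to preserve.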

Data masking Framework