Pythian Blog: Technical Track

Improving Security and Privacy in Snowflake with Column Data Masking

As data breaches become increasingly common, protecting sensitive data has become a top priority for businesses. One way to do this is by using data masking techniques that allow us to alter data so that it is still usable but cannot be used to identify individuals.

Snowflake offers a feature called dynamic data masking, which masks column values at query time according to masking policies. This feature is helpful when you want to give specific users or groups access to a table, but don’t want them to see sensitive information such as credit card numbers, social security numbers, and full names.

How Does Column Data Masking Work in Snowflake?

In Snowflake, column data masking works by creating a masking policy, which is a schema-level object, and attaching it to one or more columns. The masking policy specifies how the data in those columns should be masked, and for whom.

There are two broad approaches to masking a column. Only the second one is implemented with Snowflake masking policies:

  • Static Data Masking: This involves permanently replacing the original value with a masked value before it is stored. For example, you could replace a credit card number with a string such as “**** **** **** 1234”. This can be done as part of your ETL pipeline using Snowpipe or any other tool. The original value cannot be recovered from the masked copy.
  • Dynamic Data Masking: This leaves the stored value unchanged and applies a masking policy each time the column is queried. The policy is a SQL expression that decides, usually based on the user’s role, whether to return the original value or a masked version of it, such as a social security number with only the last four digits visible. Different users can therefore see the masked or non-masked value from the same table. This is the feature we want to focus on in this blog post!
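To make the static approach concrete, here is a minimal sketch of masking done as a one-off transformation. The `customers` table and `credit_card` column are the ones used in the demo later in this post; `customers_masked` and `customer_id` are hypothetical names, and the sketch assumes `credit_card` is stored as a string.

```sql
-- Static masking: the masked value is what gets stored, so the original
-- digits cannot be recovered from this copy of the data.
-- customers_masked and customer_id are hypothetical names.
CREATE TABLE customers_masked AS
SELECT
    customer_id,
    '****-****-****-' || RIGHT(credit_card, 4) AS credit_card
FROM customers;
```

With dynamic masking there is no second table: `customers` keeps the full value, and the masking expression runs when a query reads the column.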

Something very important to keep in mind is that dynamic data masking requires Snowflake Enterprise Edition or higher, so you will not be able to use it on Standard Edition.

However, this functionality has multiple business use cases if you have it available. Let’s discuss some examples!

Use Cases for Dynamic Data Masking

1. Call Centers

Call centers often handle sensitive customer information, such as credit card numbers, social security numbers and account information. Column data masking can protect this information from unauthorized access by call center agents. By masking sensitive columns, such as credit card numbers or social security numbers, call center agents can still access the information they need to perform their jobs while protecting customer privacy and preventing unauthorized access to sensitive information.

2. Protecting IP

Research and development departments often handle confidential information, such as intellectual property, trade secrets and product designs. Column data masking can protect this information from unauthorized access by employees or contractors who do not need to see the full details. By masking sensitive columns, such as patent numbers or product codes, research and development teams can collaborate effectively while ensuring that only authorized personnel can access confidential data.

3. HR Departments

Finally, HR departments are a great use case for dynamic data masking. Human resources departments often handle personally identifiable information, such as social security numbers, employee identification numbers and medical records. Column data masking can protect this information from unauthorized access by employees who do not need to see the full details. By masking sensitive columns, such as social security numbers or medical records, human resources teams can manage employee data effectively while ensuring that only authorized personnel can access sensitive information.
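As a rough sketch of what this HR scenario could look like in Snowflake (the syntax is covered in the implementation section of this post), the policy below shows the full value only to an allow-listed role and masks it for everyone else. The `hr_admin` role, the `employees` table and its `ssn` column are hypothetical names, and the column is assumed to be a string.

```sql
-- Hypothetical names: hr_admin role, employees table, ssn column (string).
CREATE MASKING POLICY mask_ssn AS (val STRING) RETURNS STRING ->
    CASE
        -- Only the allow-listed role sees the stored value
        WHEN CURRENT_ROLE() = 'HR_ADMIN' THEN val
        -- Every other role sees just the last four digits
        ELSE 'XXX-XX-' || RIGHT(val, 4)
    END;

ALTER TABLE employees MODIFY COLUMN ssn SET MASKING POLICY mask_ssn;
```

Masking by default and unmasking for named roles means that a newly created role starts out seeing masked data, which is usually the safer failure mode.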

Of course, these are just a few examples. Almost every industry handles PII or other sensitive data for which dynamic data masking on columns could be implemented.

Implementing Column Data Masking in Snowflake

To implement column data masking in Snowflake, you will need to define a masking policy and apply it to the desired columns.
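Creating and applying policies requires specific privileges. The sketch below assumes a centralized setup in which a single custom role manages all masking policies; `masking_admin`, `my_db` and `my_schema` are hypothetical names, and your organization may prefer a different split of responsibilities.

```sql
-- masking_admin, my_db and my_schema are hypothetical names.
CREATE ROLE IF NOT EXISTS masking_admin;

-- Allow the role to create masking policies in one schema
GRANT CREATE MASKING POLICY ON SCHEMA my_db.my_schema TO ROLE masking_admin;

-- Allow the role to attach masking policies to columns across the account
GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE masking_admin;
```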

Here’s a small demo example of our call center scenario:

-- Create a role for call center agents
CREATE ROLE call_center_agent;

-- Grant the role read access to the table
-- (the role also needs USAGE on the table's database and schema)
GRANT SELECT ON TABLE customers TO ROLE call_center_agent;

-- Create a masking policy to mask credit card numbers for call center agents
CREATE MASKING POLICY mask_credit_cards
    AS (val STRING)
    RETURNS STRING ->
        CASE
            -- Only show the last 4 digits of the credit card number to call center agents
            -- (unquoted role names are stored in uppercase)
            WHEN CURRENT_ROLE() = 'CALL_CENTER_AGENT' THEN '****-****-****-' || RIGHT(val, 4)
            ELSE val
        END;

-- Apply the masking policy to the credit_card column
ALTER TABLE customers MODIFY COLUMN credit_card SET MASKING POLICY mask_credit_cards;

In this example, we create a role for call center agents and grant it access to the customers table. We then create a masking policy that masks credit card numbers for call center agents by only showing the last 4 digits of the credit card number. We apply this masking policy to the credit_card column of the customers table.

With this policy in place, call center agents can only see the last 4 digits of credit card numbers when they query the customers table, while other users can still see the full credit card numbers. This helps to protect sensitive customer data while still allowing call center agents to access the information they need to do their jobs.
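One way to check this behavior is to query the column under each role. The sketch below assumes your user has been granted both roles and that each role can use a warehouse and the table's database and schema; `analyst`, `my_db` and `my_schema` are hypothetical names.

```sql
-- As a call center agent: the column should show only the last 4 digits
USE ROLE call_center_agent;
SELECT credit_card FROM customers LIMIT 5;

-- As another role with SELECT on the table (analyst is hypothetical):
-- the column should show the stored values
USE ROLE analyst;
SELECT credit_card FROM customers LIMIT 5;

-- List the columns the policy is attached to
-- (my_db.my_schema is a hypothetical location for the policy)
SELECT *
FROM TABLE(INFORMATION_SCHEMA.POLICY_REFERENCES(
    POLICY_NAME => 'my_db.my_schema.mask_credit_cards'));

-- Detach the policy again if needed
ALTER TABLE customers MODIFY COLUMN credit_card UNSET MASKING POLICY;
```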

Limitations and Warnings to be Aware of When Using Column Data Masking

While column data masking is a powerful feature that can help protect sensitive data, there are some limitations and warnings that you should be aware of when using this feature. Here are a few examples:

1. Limited Masking Algorithms

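A masking policy body is a SQL expression, so Snowflake can do partial or full masking with different characters or replacement strings, as well as anything else you can express with SQL functions. If you require a specific masking algorithm that cannot be expressed this way, you may need to implement your own masking solution, for example in a user-defined function or outside Snowflake.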
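Many custom masking rules can be written directly in the policy body with built-in functions. The two policies below are illustrative sketches only: the policy names and the `support_lead` role are hypothetical, and both assume string columns.

```sql
-- Hypothetical policy: hide the local part of an email address, keep the domain
CREATE MASKING POLICY mask_email AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() = 'SUPPORT_LEAD' THEN val
        ELSE REGEXP_REPLACE(val, '^[^@]+', '*****')
    END;

-- Hypothetical policy: replace the value with a one-way hash, so equal
-- inputs still produce equal outputs. SHA2 returns a 64-character hex
-- string by default, so the column must be wide enough to hold it.
CREATE MASKING POLICY hash_value AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() = 'SUPPORT_LEAD' THEN val
        ELSE SHA2(val)
    END;
```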

2. Impact on Query Performance

When column data masking is applied to a column, it can impact query performance, especially for frequently queried columns. This is because the data must be masked on-the-fly during query execution. You can use Snowflake’s Query Profile to understand how column data masking affects your queries’ performance and adjust as needed.

3. Risk of Unintended Disclosure

Column data masking is designed to prevent unauthorized users from viewing sensitive data. However, an authorized user can still disclose sensitive data inadvertently, for example, by running a SELECT * statement that includes columns protected by a masking policy when exporting the data or during a screen-sharing session. To reduce this risk, educate users on the correct way to query masked tables, and combine masking with other security controls, such as row access policies (Snowflake’s row-level security feature), to further restrict access to sensitive data.
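Row-level restrictions in Snowflake are defined in much the same way as masking policies. A minimal sketch, assuming the customers table has a `region` column; the policy name, the `data_admin` role and the 'EMEA' value are hypothetical.

```sql
-- Hypothetical row access policy: data_admin sees every row,
-- call center agents see only rows for the EMEA region.
CREATE ROW ACCESS POLICY customers_by_region AS (region_val STRING) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'DATA_ADMIN'
    OR (CURRENT_ROLE() = 'CALL_CENTER_AGENT' AND region_val = 'EMEA');

-- Attach the policy to the table, passing the region column as its argument
ALTER TABLE customers ADD ROW ACCESS POLICY customers_by_region ON (region);
```

Rows for which the expression is not true are filtered out of query results for that role, so the two policy types can be layered: one limits which rows are returned, the other limits what each returned value looks like.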

4. Impact on Data Types

Column data masking is constrained by the data type of the column. In Snowflake, a masking policy’s return type must match its input type, so a credit card number stored in a numeric column cannot simply be replaced with a string of asterisks. You would either need to store the value as a string or return a masked value of the same numeric type, such as NULL or a fixed placeholder number. Either choice can impact downstream applications that rely on the original data type or on real values, and may require additional handling.
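A sketch of a policy that keeps a numeric column numeric by returning NULL rather than a string placeholder. The `employees` table, its `salary` column (assumed to be declared as NUMBER) and the `hr_admin` role are hypothetical names.

```sql
-- Hypothetical names: employees table, salary column (NUMBER), hr_admin role.
CREATE MASKING POLICY mask_salary AS (val NUMBER) RETURNS NUMBER ->
    CASE
        WHEN CURRENT_ROLE() = 'HR_ADMIN' THEN val
        -- NULL is valid for any data type, so the result stays numeric
        ELSE NULL
    END;

ALTER TABLE employees MODIFY COLUMN salary SET MASKING POLICY mask_salary;
```

Downstream queries then need to tolerate NULLs in this column; aggregate functions such as SUM and AVG skip them, so results computed under a masked role will differ from the unmasked figures.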

Conclusion

Column data masking is pretty simple to set up and configure while, at the same time, being a powerful tool for enhancing the security and privacy of sensitive data in your organization. By masking sensitive columns, you can protect data from unauthorized access, prevent data breaches and comply with privacy regulations. With the right policies in place, you can ensure that only authorized users can access sensitive information while allowing employees to access the data they need to perform their jobs effectively.
