TransWikia.com

Best practices for providing PHI/PII data to users in organization

Database Administrators Asked by MartyB on December 3, 2020

I am looking for best practices for how to share data from a SQL Server database that contains PHI/PII to individuals that cannot view PHI/PII. In short, we maintain a SQL server database that contains 30+ columns of PHI/PII. We need to provide datasets for certain individuals that cannot access the PHI/PII columns, but can access the other fields to conduct different types of analyses.

Current structure: The database is 100GBs and is updated 4 times per day. All data resides on Azure SQL Server. The tables should maintain metadata so blob storage is not ideal (unless someone has an idea to maintain metadata in blob storage). The data will be accessed through PowerBI or Azure Databricks.

Several options come to mind:

  1. Create a DB Role and deny access to PHI/PII columns
  2. Create a new SQL server database and an ETL that copies non-PHI/non-PII data from one database to the new database
  3. Create a new schema and then create views of the tables which do not contain PII/PHI columns and then restrict access to users for this schema only
  4. Build an script/ leverage Azure Data Factory to copy data from the database to Azure table.

Security is number one priority. Any advice is much appreciated.

2 Answers

In Azure, I would prefer

Create a new SQL server database and an ETL that copies non-PHI/non-PII data from one database to the new database

PHI is serious stuff, and segregating the users in a separate database is both more obviously secure, and has other benefits. EG the database can be scaled and billed separately if you like, and have separate CPU, Memory, and IO.

Answered by David Browne - Microsoft on December 3, 2020

My first choice would be to use the traditional security mechanisms, which has been in place for decades. I.e., only grant permissions on what they should see.

If they are about to write their own queries, they might be annoyed if you grant some columns but not all (1), since SELECT * will choke on that. And, yes, some will do SELECT *, whatever we say. So, (3) gets my vote.

You might run into the Dynamic Data Masking track, and this might be useful if they aren't writing their own queries. Since you say that security is #1 priority, then you probably aren't OK with them being able to deduce info. Basically, the data "is there" through the WHERE clause but not SELECT list. I recently read an article where CROSS APPLY was used as a type of brute force data attack to get the SSN even though it was masked.

Answered by Tibor Karaszi on December 3, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP