SENSITIVE DATA! It's an interesting subject! In this post I'm trying to explain how to hash data to increase security during ETL.

Assume that we have sensitive data stored in several secured source systems. The source systems are located in different countries and different regions. Given that the source systems themselves are secured, how can we cover data security needs during the ETL process that reads data from the source systems and loads it into the staging area? Apart from using a secured network infrastructure, VPN, network tunnelling and so on, we also need security at the data layer when extracting sensitive data. One of the best ways is to hash the data while it is being extracted from the source databases.

HASHBYTES is a T-SQL function that is available in SQL Server 2005 and later. As you might know, there are many hashing algorithms, but different SQL Server versions support different sets of them. For instance, SHA1 is supported by SQL Server 2005 and later, but if you want a more secure algorithm like SHA2, with a 256-bit (32-byte) or 512-bit (64-byte) digest, you have to use SQL Server 2012 or later. In earlier versions of SQL Server, HASHBYTES simply returns NULL for the SHA2 algorithms. If you're looking for an even higher level of security like SHA3, originally known as "Keccak", you may have to wait a long time: based on my investigations it is not supported even in SQL Server 2014. Alternatively, you can write your own SHA3 code or rely on third-party code available on the Internet!

So let's get our hands dirty using HASHBYTES in different versions of SQL Server.
SQL Server 2005:
SELECT @@VERSION [SQL Server Version]
, HASHBYTES('SHA1', '123456') [SHA1]
, HASHBYTES('SHA2_256', '123456') [SHA2_256]
, HASHBYTES('SHA2_512', '123456') [SHA2_512]
Let's run the same query in SQL Server 2008 and see the results:
Again, the result for SHA2 is NULL.
And now we're testing SQL Server 2012:
We will see the same results in SQL Server 2014.
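As a quick sanity check on the digest sizes mentioned earlier, here is a minimal sketch that measures each hash with DATALENGTH. It assumes SQL Server 2012 or later. On older versions the two SHA2 columns should come back as NULL, because HASHBYTES itself returns NULL there.

```sql
-- DATALENGTH returns the number of bytes in each digest.
-- Assumes SQL Server 2012 or later; on earlier versions the SHA2 columns are NULL.
SELECT DATALENGTH(HASHBYTES('SHA1', '123456'))     AS [SHA1 bytes]      -- 20
     , DATALENGTH(HASHBYTES('SHA2_256', '123456')) AS [SHA2_256 bytes]  -- 32
     , DATALENGTH(HASHBYTES('SHA2_512', '123456')) AS [SHA2_512 bytes]; -- 64
```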
So, the idea is: DO NOT LOAD SENSITIVE DATA AT ALL. As a result, it seems the only way the data might leak is if somebody sniffs the SQL code that retrieves the data while it is in memory (note that our assumption is that we have a secure network infrastructure). Now we can put our T-SQL code into an "OLE DB Source" component in SQL Server Integration Services (SSIS), and we will have the hashed data (VarBinary) in the staging area.
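Here is a rough sketch of what the SQL command in the "OLE DB Source" might look like. The table dbo.Customer and its columns are hypothetical names used only for illustration, and the query assumes a source running SQL Server 2012 or later so that SHA2_256 is available.

```sql
-- Hypothetical source table and column names, for illustration only.
-- Only the hashed values leave the source database; the raw values are never selected.
SELECT CustomerID
     -- Cast to one explicit type first: the same text hashed as varchar and as
     -- nvarchar gives different digests, which matters across several source systems.
     , CAST(HASHBYTES('SHA2_256', CAST(NationalID AS nvarchar(50))) AS binary(32)) AS NationalIDHash
     , CAST(HASHBYTES('SHA2_256', CAST(Email AS nvarchar(256)))     AS binary(32)) AS EmailHash
FROM dbo.Customer;
```

Casting the result to binary(32) is a design choice rather than a requirement. HASHBYTES returns a generic varbinary value, and fixing the width keeps the staging column definition tight.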