Monthly Archives: April 2016

PolyBase: I now Pronounce you SQL and Hadoop


Are you interested in Hadoop? With SQL Server 2016 just around the corner, a powerful new feature is being included called PolyBase.  To summarize what it is, just think about it as a marriage between SQL Server and Hadoop.  It is a way to store and query data on HDFS using SQL Server.

HDFS is a distributed file system that works differently than what we’re used to in the Windows OS side of things; the general principle is to use cheap commodity hardware that replicates data in order to account for availability and to prevent loss of data. With that in mind, it makes a great use case to store a lot of data cheaply for archiving purposes or can be used to store large quantities of data that been to be processed in large quantities as well.

For more information please visit:

Now if you want to try it out for yourself, make sure you install the PolyBase Engine (from the SQL Server setup) and feel free to try the modified code sample below.

Read the rest of this entry »


Posted by on April 27, 2016 in Other, SQL Server 2016


Tags: , ,

SQL Server 2016: What’s the best new feature?

With SQL 2016 around the corner, and my blog without a post this year, I thought I would write about some of the new features.  However, I’m not sure which feature I like the most to talk about! So I’m going to summarize a few of the new features I have been exposed to and see which ones my readers like the best.

Stretch Database:

The idea is simple, what if you could “stretch” your database between your local on-premise server and Azure? That would be fantastic of course!  It’s a wonderful new feature in which you configure your database to push certain “cold” data into the Cloud (specifically Azure) while your “hot” data sits on your local on premise server.  This is an excellent way for archiving data.  But how do the users access the “cold” data? Simple, they just keep querying the same way the have been! From a logical perspective, this is very similar to Table Partitioning where the DB engine understands where the data is stored and the application does not have to change the way it behaves with the database.  Just keep in mind, querying older data will result in longer wait times for the data to be retrieved from Azure.

Always Encrypted:

In today’s world, security is a very big concern. And what is the gold that the technology thieves are after? Your DATA! Nothing else is more important than your data.  Now how does this feature differ from Transparent Data Encryption? Well in a lot of ways. First, TDE encrypts the entire database at rest. So if someone pulls the Database off the server, or it’s backup, they cannot access it without the proper certificate. If they are able to gain access to your server while the database is online, they can read all of the data right off of there as plain text.  Access is usually not a problem for expert hackers.

Now Always Encrypted works in a different way. Data is encrypted, column by column, using a certificate that is not stored inside the SQL Server Engine. Ok so you may think, well TDE encrypts the entire DB not just column by column. Yes, but it does not store the data encrypted as Always Encrypted does. Also, in most cases you do not need to encrypt the entire database, you just need to encrypt the sensitive data. Additionally, with SQL 2014 and above, you can can encrypt the backups without having the need for TDE. So when your backups are moved between locations, they are encrypted.

Let’s not get away from the main point here. Now with Always Encrypted, the certificate to Encrypt and Decrypt the data is used on the client side. So Encryption and Decryption is done outside the DB engine which allows it to scale a lot better.  Encryption can also be done deterministically instead of randomly.  Let me explain that a bit more.  If you have a unique ID like a national ID (let’s use 123456789 for example) you can query against that because the hash used to encrypt it is deterministic. So if you were to store the value twice in the same table (or a different table using the same certificate) you would get the same hash.  This improves query performance tremendously.  Deterministic encryption would not be wise for a column that has few values however, such as gender, because someone can quickly figure out the pattern in the data.

So to sum up this feature, those without access to the certificate cannot decrypt the data (including your DBAs). Data is encrypted even when the database is opened, unlike TDE, so if someone was to gain access to your system they would still need the certificate to decrypt the data. You can use different certificates for different columns, even within the same table, which can further increase security.

Dynamic Data Masking:

Please don’t confuse this with a security feature, because it really is not.  This feature is designed to display a “mask” over specific data. For example, for email it can display for the email. Now the data is not stored like that, and people with elevated privileges can see the data as clear text.  However, this is very easy to implement and does not require a lot of planning.  It can be used as a “quick fix” to potentially block data that you don’t want others to see on reports and webpages that they may have access to for which you are not aware.  It’s handy to keep not-so-sensitive data blocked, but is not a replacement for something like Always Encrypted.

Row Level Security:

This feature has been around in other database engines, but is new for SQL 2016.  It’s very simple, you programmatically allow for certain users to see certain data in a table.  It’s performance is better than using views or other types of logic to do row-by-row security since the engine is optimized to process the Row Level Security features.You can even block your admins from getting access to the data if they are not supposed to. A small disclaimer, they can always disable the functions that force the security and get by it so make sure you are auditing those actions.  This is a great way to allow for multi-tenancy within a database since you can exclude results to those that do not have permissions.


I like to say it is a marriage between SQL and Hadoop.  PolyBase allows for you to query using your SQL Server Engine data that is stored in Hadoop.  It also allows for you to store data into Hadoop by creating External Tables and loading data into them.  It’s a great way for SQL Server to stretch out and touch HDFS while allowing for SQL professionals to access data on HDFS using regular T-SQL statements. It’s definitely a great bridge between the two worlds and will allow many data professionals to get closer to understanding and using other technologies outside the SQL Server landscape.

New SSRS 2016 Portal and Mobile Reports:

SSRS has finally gotten a face lift!  The portal looks much cleaner and now allows for two new types of objects to be placed. The first is the KPI (key performance indicator) which is basically just a tile on the Reports homepage that displays a value. It’s useful if you want to have specific information highlighted and readily available.  The other new object you can put up there is mobile reports.  With the acquisition of Datazen, Microsoft has basically added the functionality right into SSRS 2016. They look fantastic on a regular screen, tablet, and even phone.  It’s a great step forward in terms of making reports look great and easy to navigate.  The road map also has PowerBI reports being available to on-premise SSRS 2016 implementations. You can read all about the road map here:

The Verdict?

My favorite feature is a battle between PolyBase and Always Encrypted. I love all the new features (even the ones I did not list here) but I feel like these two specifically push the boundaries of what a DBMS can do and I love it!


Posted by on April 19, 2016 in Other, SQL Server 2016



%d bloggers like this: