RSS

Finding and removing hidden characters in your data using dynamic T-SQL

05 Oct

It has been a long morning for me. It was supposed to be a standard run for our quarterly data aggregations but one of the data import SSIS packages kept failing. The problem was a conversion problem from varchar type to float and I kept getting the following error:

Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to float.

However, my data was numeric I checked it many times… or was it really 100% numeric?

Apparently there are some crazy hidden characters that you cannot see when you just glimpse at data. One of the best ways to find out if your data has some non-numeric characters in it is by running a simple query.


SELECT ColumnName
FROM TableName
WHERE ISNUMERIC(ColumnName) = 0

Pretty straight forward. But which character in my data is non-numeric? Well you could use the CHARINDEX function to figure out where a certain character set exists in your data but you would have to know which code to use. The link below has the ASCII Character set:

http://www.asciitable.com/

I didn’t have time to test all 255 characters to see what was in my numeric field. So I had to write some dynamic t-sql to help me out. In the code below, just insert your own table name (with schema if you want) and column name.


DECLARE @tablename VARCHAR(1000) ='Schema.Table'
DECLARE @columnname VARCHAR(100)='ColumnName'
DECLARE @counter INT = 0
DECLARE @sql VARCHAR(MAX)

WHILE @counter <=255
BEGIN

SET @sql=

'SELECT TOP 10 '+@columnname+','+CAST(@counter AS VARCHAR(3))+' as CharacterSet, CHARINDEX(CHAR('+CAST(@counter AS VARCHAR(3))+'),'+@columnname+') as LocationOfChar
FROM '+@tablename+'
WHERE CHARINDEX(CHAR('+CAST(@counter AS VARCHAR(3))+'),'+@columnname+') <> 0'

PRINT (@sql)
EXEC (@sql)
SET @counter = @counter + 1
END

After running the code, you will see in the results table which character yielded results and you can go ahead and update your data accordingly. It will display the ASCII character code number and the first position of that character in your column. In my case, I had a problem with character #13, which is a carriage return that was added to each row of data in my flat file and somehow made it to my staging table during import. So I ran the following update to fix the problem:


UPDATE Schema.Table
SET ColumnName= REPLACE(Columnname,CHAR(13),'')

Previously I was cleansing data with Char 32, which is a space and cannot really be seen by just glancing at the data. So now I have a quick way to go through and figure out what characters exist in any column on my table. The code can be used for finding those annoying hidden non-alpha numeric characters or if you are just interested to see what type of characters exist in your data.

Advertisements
 

Tags: , ,

5 responses to “Finding and removing hidden characters in your data using dynamic T-SQL

  1. ToOsIK

    October 8, 2012 at 5:12 AM

    Thats a nice clever bit of SQL, thanks for sharing Ayman!

     
    • Ayman El-Ghazali

      October 8, 2012 at 7:38 AM

      Your welcome. It was a pain in the neck trying to find that darn character and it was holding up a production run.

       
  2. tony yardigan

    October 13, 2012 at 9:20 AM

    why do you store numeric data on a char column? Wouldn’t be better to use a numerical column instead?

     
    • Ayman El-Ghazali

      October 13, 2012 at 1:42 PM

      This data was coming from a flat file as part of our ETL process. In general, when we bring in flat files to our staging tables and we don’t have too many constraints on the fields. That we can do our transformations to clean up the data and convert it where necessary. In this case, the last column imported from the flat file was numeric and had the hidden character in it so it failed on conversion. Our destination table has the proper data types in it, so we are not storing numeric data in a char column, just staging it that way.

       

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: