Wednesday, April 15, 2009

SSIS: from Flat File Source to OLE DB destination, everyday problems

While performing one of the most routine and simple tasks in SSIS – transferring data from a flat file to a table – I struggled with two of probably the most common errors that come along with it. I struggled with them for hours – simply because these things are so simple and basic that most manuals don't bother to give detailed instructions on them. Anyway, no one needs to spend HOURS reinventing this wheel… The two errors are: "Output column [column name] failed because truncation occurred" and "Invalid character value for cast specification".

Let’s, first, fight the truncation problem.

So, I have a simple flat file TEST1.txt:

[Row 1] AAABBB
[Row 2] aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa8
[Row 3] bbbbbbbbb
[Row 4] 12345

The file has 4 rows; the first row contains the field names; the "" symbol is the field separator; carriage return is the row separator; there is no value in row 3, column 'BBB', and no value in row 4, column 'AAA'.
This data was supposed to go into a table TEST17 with two fields: AAA varchar(30) and BBB varchar(30). However, an immediate warning was posted on the destination:

Truncation may occur due to inserting data from data flow column "AAA" with a length of 50 to database column "AAA" with a length of 30…

(and the same for column "BBB").

Speaking of "a length of 50": beware that 50 is the default length for every column of a flat file source, and, by default, all columns are treated as strings.
Speaking of warnings... in my (admittedly small) experience, they can rarely be passed by harmlessly. And, sure enough, an attempt to run this package ended in a massive error output:

Data conversion failed. The data conversion for column "AAA" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page."
Error: 0xC020902A at Data Flow Task, Flat File Source [1]: The "output column "AAA" (198)" failed because truncation occurred, and the truncation row disposition on "output column "AAA" (198)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
Error: 0xC0202092 at Data Flow Task, Flat File Source [1]: An error occurred while processing file "C:\Test1.txt" on data row 2.


(...and so on. Thank God I was warned!) What to do now?

Option 1. So, the truncation row disposition on "output column "AAA" (198)" specifies failure on truncation? Let's change this, then.
Right-click the Flat File Source to open the Flat File Source Editor; choosing the Error Output tab opens this window:

In the Truncation column, open the drop-down list and choose Ignore Failure. Now the package runs successfully, and in the result table we have a properly truncated string. (The warning messages, however, keep showing up.)

Option 2. Bringing the column lengths into accord – instead of truncating the data – looks like a more peaceful solution. To do this we need to either shorten the column length on the flat file side or lengthen it on the destination side.

Let's try shortening first. Right-click the Flat File Source and choose Show Advanced Editor. The following window opens:


- where I've already switched to the Input and Output Properties tab and expanded the Flat File Source Output list, with both External Columns and Output Columns inside it. I'll try to change the length of the external column AAA. Highlighting it shows the column's properties in the right panel, where I change the length to 30. Then I need to do the same to the output column AAA. (Noteworthy: here, among other things, we could also change the truncation behavior of the column!)


Exiting this window with OK, however, only changes the warning message to:

Validation warning. Data Flow Task: Flat File Source [1]: The external metadata column collection is out of synchronization with the data source columns. The column "AAA" needs to be updated in the external metadata column collection.

NB! If you go back and try to edit the properties of the Flat File Source now, you risk being forced to cancel the changes you've just made, via a message like this:


Just answer “No”! (It will not bother you again).

The right place to match the metadata now is the Flat File Connection Manager. Go down to the Connection Managers panel, right-click the Flat File Connection Manager, choose Edit, and you'll see the Flat File Connection Manager Editor window, where – on the Advanced tab – we choose the 'AAA' column from the column list and can, finally, set its OutputColumnWidth property to 30:


Exit with OK… The warning message about column 'AAA' should be gone now.
Of course, I still have the old warning for the other column:

Validation warning. Data Flow Task: OLE DB Destination [158]: Truncation may occur due to inserting data from data flow column "BBB" with a length of 50 to database column "BBB" with a length of 20.

Let's see what happens if we try to fight the issue from the destination side.
The important thing here is that we cannot change the field length of the destination table from the place where it seems to be suggested – the Advanced Editor of the OLE DB Destination. I call it a "suggested" place because the value in question is there and looks open for editing:

However, this is not the case: after changing this value to 50 we only end up with yet another warning message:
Validation warning. Data Flow Task: OLE DB Destination [158]: The external metadata column collection is out of synchronization with the data source columns. The column "BBB" needs to be updated in the external metadata column collection.

Furthermore, when you open the same Advanced Editor window again, you'll see that the length has been SILENTLY changed back to 20! So, no, we cannot change the length of the destination table's column here, and maybe for a good reason: the destination table may be an inseparable part of the database. My conclusion is: we don't change the properties of a destination that already exists in the database – instead, the source's properties are to be adjusted (or truncation ignored). However, if the destination is a new table being created in the course of SSIS package execution, we have full power over it and can request whatever column length we need.
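For a package-created destination that simply means writing the CREATE TABLE statement with the widths we want – for example, widening BBB to match the full source width (just a sketch; the 50 here mirrors the flat file default length used for illustration):

CREATE TABLE [TEST17](
[AAA] VARCHAR(30),
[BBB] VARCHAR(50) )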



And after this is done, we are, finally, free of warning messages!

…Now it's time to pay attention to the fact that the source file actually suggests an integer value for column 'BBB'. The result, however, again depends entirely on the destination table's properties. Let's change them so that column 'BBB' holds an integer:
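Something along these lines (a sketch of the destination DDL – the same table as above, but with BBB as a nullable integer for now):

CREATE TABLE [TEST17](
[AAA] VARCHAR(30),
[BBB] int )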



Exit with OK – no warning messages, excellent… However, the test run generates a whole train of error messages, the most important of which is:

Invalid character value for cast specification.

As I've mentioned, I spent a long time looking for comments on this error message, while the solution was right on the surface! You need either to change the RetainNulls property to TRUE in the source's Properties window, or to check the box "Retain null values from the source as null values in the data flow" in the Flat File Source Editor. Here is the resulting table:

AAA                              BBB
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa   8
bbbbbbbbb                        NULL
NULL                             12345

However, what will happen if we change the DDL (CREATE TABLE) statement in the destination as follows?
CREATE TABLE [TEST17](
[AAA] VARCHAR(30),
[BBB] int NOT NULL )

An error, of course:

Error: 0xC020901C at Data Flow Task, OLE DB Destination [158]: There was an error with input column "BBB" (246) on input "OLE DB Destination Input" (171). The column status returned was: "The value violated the integrity constraints for the column."

This error cannot be resolved by editing source or destination properties; here, finally, we have a real SSIS task – we need to place a data transformation between the source and the destination. I used a Derived Column transformation with the following expression for the derived column:

ISNULL(BBB) ? "0" : BBB
(By the way, an excellent reference on SSIS expressions is at: http://technet.microsoft.com/en-us/library/ms141827.aspx )

Friday, August 1, 2008

"The operation has timed out" while creating a snapshot (SSRS)

I've deployed a rather big report project with several subreports and a huge main report. That main report had two possible values for its parameter and, depending on them, could run from 1 to 6 minutes. Well, for the customer 6 minutes was an unacceptable wait time; on the other hand, the data was not supposed to be updated often, so a snapshot option looked good in this case.
In the report Properties window I opened the Execution tab and chose the "Render this report from an execution snapshot" option. Then I configured a schedule to have snapshots created daily, made sure that the "Create a snapshot of the report when this page is saved" option was checked (which is the default), and clicked OK to save the properties...

NB! As I've mentioned, there were only two possible values for the only parameter of the main report. To run a parameterized report from a snapshot you need a separate version of the report with default values for each set of parameters. So I actually had TWO reports instead of one, corresponding to the two possible parameter values (each hard-coded as the default of its version). If you cannot afford such a luxury, you need to turn to the caching option instead...

So, I had to perform the above-mentioned properties setup twice. However, when I did it for the second parameter value (the one with which the report was supposed to run for 6 minutes), after only about 2 minutes of execution I got an error: "The operation has timed out"! Obviously, that pertained to the operation of saving the properties, because the report execution timeout was set to unlimited. After researching the issue on the Internet I found that other people had been fixing it by changing a timeout value in system parameters... oh well, that was too much not only for me, but for our sysadmin as well. He tried several fixes, but the result was the same. After one of those fruitless attempts I looked at the project online and found that all the components were up and running except this big report, for which I got the following message:
"The selected report is not ready for viewing. The report is still being rendered or a report snapshot is not available. (rsReportNotReady)". That looked like a dead-end situation.

But then, in due time – 6 minutes – the report became available online, and it has been available ever since with no delay! Obviously, in spite of the error message, the snapshot had kept being created behind the scenes!

CONCLUSION: don't panic upon seeing the "The operation has timed out" error message while creating the initial snapshot of a long-running report. It may well be that, in due time, the snapshot is created successfully. I killed a couple of hours (and so did our sysadmin) trying to resolve the timeout problem, while in fact it had no impact on the resulting performance.

Wednesday, July 16, 2008

Query to look for text in procedures

This is a query that I use every day! Say you've changed the datatype of a field, or renamed it, and now you need to change it everywhere it's referenced from procedures. Or you have hardcoded something (or someone before you did), and now you need to change it or, better, get rid of it. Either way, you will have to look for some string in all your procedures. There are quite a few samples of how to do this on the Internet; this is just one of them:


USE [YourDataBaseName]
GO
-- create temporary Result table to keep names and text of the procedures in the database :
CREATE TABLE #Result
(TextField varchar(max), ProcName varchar(100))


-- create temporary ProcName table with the names of all the procedures in the database:
CREATE TABLE #ProcList
(ID int IDENTITY, ProcName varchar(100))


-- populate the ProcName table with the procedure names:
INSERT #ProcList SELECT [name] from sys.procedures


-- get the number of procedures (to be used in the loop below):
DECLARE @NumberOfProcs int
SELECT @NumberOfProcs = COUNT(*) FROM sys.procedures


-- loop to populate the Result table:
DECLARE @i INT
SET @i = 1
DECLARE @ProcName varchar(100)
DECLARE @SQL varchar(2000)
WHILE @i <= @NumberOfProcs
BEGIN
SELECT @ProcName = ProcName FROM #ProcList WHERE ID = @i
-- (to narrow the search, you could add here: AND ProcName LIKE '%proc name%')

-- quote the procedure name so sp_helptext also handles unusual names:
SET @SQL = 'INSERT INTO #Result (TextField) EXEC sp_helptext ''' + @ProcName + ''''
EXEC (@SQL)
UPDATE #Result SET ProcName = @ProcName WHERE ProcName IS NULL
SET @i = @i + 1
END


-- look for a string you need [your string] in the Result table
SELECT * FROM #Result WHERE TextField LIKE '%[your string]%'


-- clean up
DROP TABLE #Result
DROP TABLE #ProcList


--
Additional observations.
1. The general idea is to use the sp_helptext procedure.
2. Now, Pinal Dave, who runs a very useful blog on SQL Server, has something much shorter for this (and guess what, it works just the same, even faster :) ):

USE [YourDataBaseName]
GO
SELECT DISTINCT o.name ,o.xtype
FROM syscomments c
INNER JOIN sysobjects o ON c.id=o.id
WHERE c.TEXT LIKE '%[your string]%'
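
On SQL Server 2005 and later, a similar search can also be run against sys.sql_modules, which stores the full (untruncated) definition of each module – just a sketch of the same idea:

USE [YourDataBaseName]
GO
SELECT OBJECT_NAME(m.object_id) AS ObjectName, o.type_desc
FROM sys.sql_modules m
INNER JOIN sys.objects o ON o.object_id = m.object_id
WHERE m.definition LIKE '%[your string]%'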

Tuesday, July 15, 2008

Error message "String or binary data would be truncated"

It is Msg 8152, and it's usually followed by the "The statement has been terminated" message. It's actually a very simple one; still, I kept getting scared of it even after the 5th (10th?) time I hit it.
--
What happened. Nothing really bad: you just tried to populate a field with a string longer than the field's defined length. It's good to know about: maybe you'd reconsider such an update.
--
Workaround. If you, however, do want to stick a long string into a short field, just issue the statement SET ANSI_WARNINGS OFF. You'll then be able to commit the updates without further interruption: the extra characters will be silently truncated.
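A minimal illustration (a sketch using a throwaway temp table):

CREATE TABLE #Short (Val varchar(5))
INSERT #Short VALUES ('123456789')   -- fails with Msg 8152
SET ANSI_WARNINGS OFF
INSERT #Short VALUES ('123456789')   -- succeeds; the value is silently truncated to '12345'
SET ANSI_WARNINGS ON
SELECT * FROM #Short
DROP TABLE #Short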
--
Additional observation. One of the web articles explaining this error had a comment: "Profiler is handy in troubleshooting this error." Never had a chance to find out what this means... (feels like it's worth trying)

Saturday, July 12, 2008

Error message "The row value(s)... alter multiple rows"

The full text of the error message is as follows:

"No row was updated.
The data in row [number] was not committed.
The row value(s) updated or deleted either do not make the row unique or they alter multiple rows ([number] rows)."
--
How it happened. I received this error message (and, indeed, was not able to commit any updates) when I was trying to change a value through the 'Open Table' option of SSMS. I saw that I was able to change values in some rows, but others gave me this message. How could I alter multiple rows if, while working directly in the open table, you can only update one value in one row at a time? And, of course, a T-SQL statement allowed me to update any of these values with no error messages.
--
Explanation. It could well be that you have several rows with exactly the same values. One case: you've recently dropped an identity field. Another case: you've just made changes to a row that turned it into a duplicate of another row. That is allowed; however, the changed row immediately becomes read-only (it is shown in a lighter grey than the other rows), and now, if you try to update its duplicate, you get this error message.
Workaround. There are cases when you want to make some hand-made changes directly in the table. If such a non-uniqueness situation could occur, create a new identity field, then do whatever updates you need, then drop the new identity field.
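A sketch of that workaround (the table and column names here are just placeholders):

ALTER TABLE [YourTable] ADD TempID int IDENTITY(1,1)   -- every row is now unique
GO
-- ...make the hand-made edits in the Open Table grid now...
ALTER TABLE [YourTable] DROP COLUMN TempID
GO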
--
Additional observations. The same can happen if there are constraints that tie the table to other tables, although I did not investigate this case too far.