Sunday, February 19, 2012

Create Dynamic table from CSV

Hi,

I'm trying to import a csv file directly to my database every month. The contents of the file stays the same, however the format of the columns may vary. For example, 1 month I can have the following:

Time, Probe1, Probe2, Probe3, Probe4

Whereas the next month I can have something like this

Time, Probe2, Probe3, Probe1, Probe4

The "Time" column will always be on the left side, but the probes may vary in their placement.

I'm importing this csv to a temp table, where I then run a query to select any new data and enter that in my main tables. The problem is that when i import the csv, if the placement of the columns has changed, the data gets entered in the wrong locations.

Is there any way to create this temp table dynamically, based on what the header columns of the csv file are?

Any help appreciated!

I think the best approach would be to configure your flat file connection manager to read each row as a single column. This column would then be input to a script transformation component which would parse the line into columns and, based on the column order in the header row, put the values into the appropriate output columns.
|||Here is a code sample for a transformation script to illustrate the idea. The Row.Line is the input column that contains an entire line of a text file. This line is taken and split into an array by the comma delimiter. The first time we see the line, it is a header with column names. A boolean flag is used to detect the header condition and we save the column names in a variable. For every subsequent row, we iterate through the column names and match them to an output column that has been defined for the script, assigning the value parsed from the file into the correct column. Using this method, the order of the columns in the text file is irrelevant.

Code Snippet


Dim Line As String = Row.Line
Dim LineTokens As String() = Line.Split(","c)

If IsHeader Then
HeaderColumns = LineTokens
IsHeader = False
Else
' match input columns to output columns
With Output0Buffer
.AddRow()

For i As Integer = 0 To HeaderColumns.Length - 1
Select Case HeaderColumns(i)
Case "Time"
.Time = LineTokens(i)
Case "Probe1"
.Probe1 = LineTokens(i)
Case "Probe2"
.Probe2 = LineTokens(i)
Case "Probe3"
.Probe3 = LineTokens(i)
Case "Probe4"
.Probe4 = LineTokens(i)
End Select
Next

End With
End If


|||

Thanks a lot for the help...

Looks like it'll solve my problem Smile

Thanks!

|||Just make sure you revist the thread and marks it as answered |||

Ok, Since i'm not very proficient with SSIS...

here's what I have going right now.

Under Data Flow, I have a Flat File Source which uses my Flat File connection manager...there's only 1 column, just like you said.

Then I have it going to the transformation script component.

Under Input Columns, I only have Column 0, and its output alias is the same.

Under Inputs & Outputs, I didn't change anything...I didn't add any output columns since I want this to be done dynamically...

Then for the script,

I put in the script you gave me, and this is where i'm having problems...

I renamed Row.Line to Row.Column0,

but there's still a few undefined...such as IsHeader, HeaderColumns, and OutputBuffer.

Where do I define these?!

Thanks!

|||IsHeader and HeaderColumns are global variables that I didn't show in the example. They are shown below would go right under "Inherits UserComponents" in the script.

Code Snippet


Dim IsHeader As Boolean = True
Dim HeaderColumns() As String


Output0Buffer (don't forget the "0") is built for you to the definition that you supply on "Inputs and Outputs". This object will have the same name as your output (without spaces), but with "Buffer" appended to it. You should have one now, but likely just forgot to add the "0" (the default output is "Output 0" ).

You can't leave your output columns blank as they can't be created dynamically. You said your columns were constant, but could appear in different orders in the source. The script will worry about the order of the input columns, but you still need to define the constant output columns.

|||

Beauty.

It's working now.

Thanks a lot for the help.

No comments:

Post a Comment