Ok, so one of the problems that we come across in the development field is the fact that data is not always consumable in formats that we would wish to get, it may come in tab delimited, pipe delimited, in CSV files, whatever – the main thing here is we need to automate a process that imports the data into your database that has as few moving parts as possible (so when not if you have to dig back in the code you are not left wondering how the hell you built the thing in the first place.)
Turns out it is very doable but a long drawn out process in SQL Server to build a .dtsx package that accomplishes the entire task! I have detailed the steps below as I just know that this will come back and be of use later.
Creating a Job in SQL Server to Import Flat Files
What You Need:
Microsoft SQL Server Management Studio
MSFT SQL server
Overview:
When data comes in from outside sources in flat files that are delimited you need a way to open the files, import the data into the database, and archive the files just in case you have a need to rerun the job, or to take a look at the raw data files in the future. The script will archive the files that have already been processed in folders bearing the month and the year that corresponds with the date the job ran that kicks off the .dtsx package.
Tables:
- processed_files – This table will hold the files that have been processed so that 1) you have a record of what has been done, and 2) if the job fails you at least know where it failed.
- data_table – This is the table that you are importing data into and will change depending on the files that you are importing.
Steps:
- Set up Variables
- Set up data connections
- Displaying and using the toolbox
- Add Foreach Loop Container
- Add Execute SQL Task (Check Processed)
- Add data Flow Task
- Add Flat File Source
- Add Multicast
- Add OLE DB Destination
- Add Row Count
- Insert Constraint between Check processed and the Import file Data Flow Task
- Add Execute SQL Task (Register Processing)
- Add Script Task
- Add File System Task (Move Files)
- Add Execute SQL Task (Update processed table)
Create processed files table:
CREATE TABLE [dbo].[file_control] ( [ImportedFilename] [varchar](100) NOT NULL, [RecordsImported] [int] NOT NULL, [ProcessDate] [datetime] NOT NULL )
Set up Variables:
To display the variables window in Microsoft SQL Server Management Studio hover over the ‘View’ menu item then scroll down to the ‘Other Windows’ option, the resulting fly-out menu should display the variables display option.
Figure 1 – View Menu Bar
The table below lists all of the variables needed to set up the file import solution on the server. These variables will be manipulated later by scripts as well as used as output and input parameters in several of the steps we are about to perform.
|
Name |
Scope |
Data Type |
Value |
Example |
| Destination Folder | DTS Package Level | String | Archive folder. However, you will need to insert the fully qualified name to the source folder or you will get the error that is visible in Figure 2 – FilePath error is variable is left empty. | Fully qualified path. End with a forward slash (\). If you are developing this on a network do not use mapped drive letters use the \\ format. |
| Destination Root | DTS Package Level | String | Path to the root of the archive folder. | Fully qualified path. End with a forward slash (\). If you are developing this on a network do not use mapped drive letters use the \\ format. |
| FilePath | DTS Package Level | String | The path and name of the current file in the Foreach loop container. You will need to insert the fully qualified name to the source folder or you will get the error that is visible in Figure 2 – FilePath error is variable is left empty. | Fully qualified path. End with a forward slash (\). If you are developing this on a network do not use mapped drive letters use the \\ format. |
| FilePattern | DTS Package Level | String | The extension or name that you want to use as a filter when determining which files the Foreach loop container will process. | *.txt, or SALES*.txt |
| MonthNameFormat | DTS Package Level | String | The Visual basic or C# format specifier for the month part of the date. More information can be found in the script section of this document. | MMMM or MMM |
| Processed | DTS Package Level | Int32 | Will hold a flag that will be set to true if the file has already been processed. | 0 |
| Records | DTS Package Level | Int32 | Will hold the record count, or the number of rows that are present on the file. | 0 |
| SourceFolder | DTS Package Level | String | The source folder where the files that you will be processing reside. will be | Fully qualified path. End with a forward slash (\). If you are developing this on a network do not use mapped drive letters use the \\ format. |
Figure 2 – FilePath error is variable is left empty
A word about scope:
Scope is very important in the processing of the files, you may be tempted to go with the thought that a variable.
Set up data connections
Figure 3 – Connection Managers tab with context menu.
This .dtsx package will need three connections to add a data connection you will need to right click on the blank area in the ‘Connection Managers’ tab. You will need three connections:
- New Flat File Connection
- New File Connection
- New OLE DB Connection
New Flat File Connection
The connection to the file is where you will have to set up the column mappings in order to get the data from the file to the table in the database. In this example give the connection the name ‘TextFile_Connection’, the rows are separated by a carriage return {CR}{LF}, the format is delimited, and there is one header row that we have to skip before we get to the actual data. See Figure 4 – Flat File Connection General Tab for a view of these settings.
The ‘Columns’ tab is a bit counterintuitive as it is not where we define the actual columns, it is where we set the column delimiter and the row delimiter for the rows that follow the header. In this instance we are dealing with a tab delimited file and each record is separated from the next by a carriage return. See Figure 5 – Flat File Connection Columns Tab for a view of these settings. You may be tempted to skip the next step, especially if the preview pane shows actual data from your input file; if you do you will miss one of the most important steps. Conversely, if you do not see any data in the preview grid after you navigate to the file you want to import then don’t worry, this is actually pretty common, you can add the columns manually in the next tab.
The ‘Advanced’ tab, allows you to change the name of the columns as well as change the data type of the columns. If you are dealing with a data file that does not include column headers, or for some other reason you are unable to get the flat file wizard to recognize the file you have selected you can press the ‘new’ button on the bottom of the screen to add columns to the connection manager. Make sure you rename these columns and match the data type and length to the destination database table. The Advanced tab is displayed in Figure 7 – Flat File Connection Advanced Tab.
Figure 4 – Flat File Connection General
Figure 5 – Flat File Connection Columns Tab
Figure 6 – A test tab delimited text file with one header row
Figure 7 – Flat File Connection Advanced Tab
New File Connection (A connection that points to the Processed Folder)
Figure 8 – New File Connection dialog box
To set up a connection to the folder that will contain all of the archived files (in their dated folders) you will need to add a file connection. This is pretty straight forward, just change the usage type to ‘Existing Folder’ and browse to the folder on the server. Note that if you are using mapped drives you will need to type the location in the Folder textbox using the server name on this format \\servername\somefolder\destfolder. The folder you are pointing to must exist before you complete this step. The File Connection Manager Editor can be seen in Figure 8 – New File Connection dialog box.
New OLE DB Connection
Figure 9 – New OLEDB Connection manager window
Displaying and using the toolbox
Figure 10 – Show / Hide the Toolbox
The toolbox provides drag and drop elements that can be added to the control flow and data flow tabs in SQL Servers Management Studio. It is important to note that the options available within the toolbox change depending on the tab that you are working in. This distinction is usually ignored in many of the online tutorials that you will find, which in leaves the reader wondering why they do not see those options.
To display the Toolbox pane click on the view menu item then press the toolbox icon, or just use the shortcut keys Ctrl+Alt+X.
Figure 11 – Toolbox when Control Flow is displayed

Figure 12 – Toolbox when Data Flow is shown.
The figures above illustrate the different toolboxes. The next section will focus on the items necessary to complete the data import task.
Add Foreach Loop Container
Figure 13 – The Foreach loop Container
The Foreach loop container allows you to loop through a collection of objects. To configure the container double click on it to display the Foreach loop editor which is shown in Figure 14 – Foreach Loop Editor General Tab.
In the General tab change the Name to ‘Loop thru files’.
The next tab is the Collection tab; this is where you will be able to let the container know which directory it is supposed to loop through, as well as which files it should process. In this tab you will have to enter a value in the Folder textbox in the Enumerator configuration section of the application. The Files textbox works just like a file search in windows explorer, so if you wanted to loop through all the text files in a folder you would have to add ‘*.txt’.
The third tab is the ‘Variable Mappings’ tab, and is where we will connect the variable that is specified in the ‘Collection’ tab under the Enumerator configuration section. The variable in this instance will be set to the fully qualified file name.
Finally, in the expressions tab we will set up the variables that will hold the Directory and FileSpec values, this way we can go in later and change them by simply updating the variable’s value. To map the variables just click on the ellipsis in the right hand column of the expressions line of the Foreach Loop editor. Once the Property Expression Editor is displayed you can select a value in the right hand box that you would like to assign the variables value to. Figure 17 – The Expressions tab of the Foreach Loop Editor shows both the Expressions tab and the Property Expression Editor.
Figure 14 – Foreach Loop Editor General Tab
Figure 15 – The Collection tab of the Foreach Loop Editor
Figure 16 – The Variable Mappings tab of the Foreach Loop Editor
Figure 17 – The Expressions tab of the Foreach Loop Editor
Add Execute SQL Task
When looping through the files the application will expect that the incoming data conform to the specifications that are set later in the data flow task. If for some reason the data does not conform to the specifications it will cause the entire job to stop executing. In order to determine where the job stopped running and to provide a means of restarting the job without deleting the data that was successfully input before the job failed it is important to log the file that has been processed and check that this file is not imported twice. This task will check the file log and determine if the file has been run already.
First drop the Execute SQL Task Control Flow Item into the Foreach loop container. Next double click on the container to view the Execute SQL Task Editor, shown in the figure below.
Figure 18 The General tab of the Execute SQL Task Editor
In the tab above change the name to read ‘Check Processed’. Change the result set from None to Single Row, then drop down the connection and if you have not added a connection to the database yet click on <New Connection> and add a connection to the database. Given the table structure for the processed files table found on page one place the following SQL in the SQL Statement area.
SELECT COUNT(*) AS Processed FROM file_control WHERE ImportedFilename = ?
Notice the question mark in the sql statement, this will receive the value in the parameter mapping tab that we will set up now. In the ‘Variable Name’ column drop down the box and select the FilePath variable that was set up earlier. Note that all variables that are user created will be prefixed with the word User and two colons (::). The directions should be Input, the Data Type is VARCHAR, and of course, this is the 0th parameter so place 0 in the Parameter Name column. Adjust Parameter Size according to your file path size; here 200 will cover anything that is thrown at it.
Figure 19 – The Parameter Mapping tab of the Execute SQL Task Editon
Next click on the ‘Result Set’ tab and in the Result Name column type in ‘Processed’, and in the variable column drop down the box and select the user defined variable named User::Processed.
Figure 20 – The Result Set tab of the Execute SQL Task Editor
Add data Flow Task
Figure 21 – Data Flow Task with Context Menu visible as well as the data Flow tab visible
Next we will add the data flow task by dragging it inside the Loop thru files container. First, drag the green arrow that comes from the Check processed task that you just created to the new data Flow Task and connect the two.
Double clicking on the data Flow Task will take you to the data Flow tab pictured in the image above, which will allow you to drag and drop additional elements inside of the data flow task. The elements that are available will differ when inside the data flow tab. To rename the Data Flow Task you will have to right click on it and select Rename, at this point give the task the name ‘Import file’.
Next , double click the Import file task to change your view to the data Flow tab. Once inside the data flow tab we will need to drag four elements from the toolbox to the tab.
- Flat File Source element
- Multicast element
- OLE DB Destination element
- Row Count element
After adding these elements connect them by dragging the green lines as shown in the figure below.
Figure 22 – Arraignment of elements in the Import file task
Add Flat File Source
Figure 23 – The Connection Manager tab of Flat File Source Editor
Double click on the Flat File source element to display the editor window. In the Connection Manager tab you should have the option of selecting the connection manager that points to the flat file you would like to import. You can refer to the file in Figure 6 – A test tab delimited text file with one header row for the file that we are using in this case. Once you have selected the connection move on to the ‘Columns’ section of the editor (pictured in the figure below). The Columns tab will allow you to rename the output from the file into names that actually mean something to you. Here I would suggest using the actual names in the destination table as it will make it easier to map the columns when you get to the OLE DB Destination part of this task. To change the names just click in the Output Column cell and type the name that you would like to be sent to the Multicast element. Once you complete these steps the small red X in the top right hand corner of the element should disappear. You can think of this X as an error in the compiler if you were writing code.
Figure 24 – The Columns tab of Flat File Source Editor
Add Multicast
The Multicast element is one of the few that does not need any changes once placed. This element will simply take the input from the Flat File Source and route it to two locations. In this instance those locations are the DB destination, to actually insert the data into the database, and the row count element to determine how many rows were inserted for logging purposes.
Add OLE DB Destination
The OLE DB Destination will point to the table that will receive the data from the file. The SQL to create the test table that corresponds with the test data file in Figure 6 – A test tab delimited text file with one header row is below:
CREATE TABLE [dbo].[file_import_test]( [TestColumn1] [int] NULL, [TestColumn2] [varchar](200) NULL, [TestColumn3] [float] NULL, [TestColumn4] [float] NULL )
First, double click on the DB connection to display the OLE DB Destination Editor (pictured in the image below). Next, drop down the connection manager and select the connection that points to your database, which should have been set up earlier. You will want to select ‘table or View’ for the data Access Mode, and finally select the actual table where you want the file data to be routed to in the name of the table or the view dropdown box. Finally, we will map the columns from the file to the table columns. Click on the Mappings tab in the left window pane.
Figure 25 – The Connection Manager tab of the OLE DB Destination Editor
If you configured your flat file column names to match the table column names then this part will be pretty easy, just verify that they link up. If not you are left with the task of connecting the file column outputs with the table columns and then connecting them by dragging and dropping the lines between them to the correct columns. Once you complete this task the small red X in the OLE DB Destination element should disappear.
Figure 26 – The Mappings tab of the OLE DB Destination Editor
Add Row Count Element
The row count element will count the rows that have been read from the file and store them in the variable that has been designated in the custom properties section of the Component properties tab pictured below. All that needs to be done here is the variable User::Records should be selected from the drop down box in the column next to variable name.
Figure 27 – The Advanced Editor for Row Count
Insert Constraint between Check processed and the Import file Data Flow Task
The final step in the addition of the Import file data flow task is to set up a constraint between the check processed SQL task and the data flow itself. The constraint will make sure that duplicate records do not get inserted into the table. To accomplish this double click the green arrow that connects the two elements to display the Precedence Constraint Editor.
Figure 28 – Precedence Constraint Editor
In the window pictured above change the drop down box from the default setting of Constraint to Expression and Constraint. Once this has been completed the Expression text box will open where you enter @Processed == 0, which corresponds to the variable being passed by the Check processed SQL task. Leave the default setting in Multiple Constrains where it is and press OK.
Add Execute SQL Task (Register Processing)
Next , drag another Execute SQL task element to the Foreach Loop Container and drop it inside the container directly under the data flow task. Right click on the task and rename it ‘Register Processing’.
Figure 29 – The General tab of the Execute SQL Task Editor
In the General tab of the Execute SQL task Editor select the database connection from the connection dropdown box. In the SQL Statement text area insert the following SQL:
INSERT INTO file_control (ImportedFilename, RecordsImported) VALUES (?, ?)
Figure 30 – The Parameter Mapping tab of the Execute SQL Task Editor
Next click on the parameter mapping tab and add the following parameters to the window on the right hand side, 1) User::FilePath with a direction of Input and a datatype of VARCHAR and a size of 200, and 2) User::Records with adirection of Input and a type of LONG and a size of -1. The parameter name of the first variable is 0 and the parameter name of the second variable is 1.
Add Script Task
Next drag the Script Task element from the toolbox and drop it directly under the Register Processing SQL task. Right click on the element and give it the name Set Destination. Double click on the element to display the Script Task Editor window. Make sure the Script language is set to Microsoft Visual Basic 2008 then press the Edit Script … button at the bottom of the window to bring up the code editor. The script that will create archive directories is below and will need to replace the code that is between the following lines:
Public Sub Main()
…..
End Sub
<b>Microsoft Visual Basic 2008 Code</b>
Dim varCollection As Variables = Nothing
Dts.VariableDispenser.LockForRead("User::FilePath")
Dts.VariableDispenser.LockForRead("User::DestinationRoot")
Dts.VariableDispenser.LockForRead("User::MonthStartPosition")
Dts.VariableDispenser.LockForRead("User::MonthNameFormat")
Dts.VariableDispenser.LockForWrite("User::DestinationFolder")
Dts.VariableDispenser.GetVariables(varCollection)
Dim SourceFilePath As String = varCollection("User::FilePath").Value.ToString()
Dim FileName As String = SourceFilePath.Substring(SourceFilePath.LastIndexOf("\") + 1)
Dim DestinationRoot As String = varCollection("User::DestinationRoot").Value.ToString()
Dim MonthStartPosition As Integer = Convert.ToInt32(varCollection("User::MonthStartPosition").Value)
Dim MonthValue As Integer = 0
Dim Yearvalue As Integer
Dim MonthNameFormat As String = varCollection("User::MonthNameFormat").Value.ToString()
Dim FolderName As String = String.Empty
Dim MonthwiseDirectory As String = String.Empty
MonthValue = Month(Now())
Yearvalue = Year(Now())
If FileName.Length > 0 AndAlso MonthValue > 0 Then
FolderName = New DateTime(1, MonthValue, 1).ToString(MonthNameFormat) & " " & Yearvalue
End If
MonthwiseDirectory = System.IO.Path.Combine(DestinationRoot, FolderName)
If Not System.IO.Directory.Exists(MonthwiseDirectory) Then
System.IO.Directory.CreateDirectory(MonthwiseDirectory)
End If
varCollection("User::DestinationFolder").Value = MonthwiseDirectory
Dts.TaskResult = ScriptResults.Success
The code above will create a folder structure for archiving the files that have been processed using a month / year naming convention as shown in Figure 31 – Example of the Archived File Structure.
Figure 31 – Example of the Archived File Structure
The parent folder for these monthly folders is determined by the value in the variable DestinationFolder.
Add File System Task (Move Files)
Figure 32 – File System Task Editor General Settings Tab
Next drag a File System Task element from the toolbox and drop it inside the Foreach Loop Container under the Script Task then connect the greed arrow from the Script Task to the File System Task. Right click on the element and rename it Move Files. This will be the element that physically moves the files to the archive location.
To move the processed files change the following settings in this order, or it will not work:
Destination Connection Section
- IsDestinationPathVariable = true, this in turn changes the text in the box below it from Destination Connection to DestinationVariable
- DestinationVariable = User::DestinationFolder
- OverwriteDestination =false
Source Connection Section
- IsSourcePathVariable = true, this in turn changes the text in the box below it from SourceConnection to SourceVariable
- Sourcevariable = User::FilePath
Add Execute SQL Task (Update processed table)
The last task that you have to complete before setting up the job that will kick off this entire process is to drag an Execute SQL Task element to the Foreach Loop Container and drop the element directly below the Move Files element that you just completed. Right click on the object and rename it ‘Update processed table’, as the name suggests this task will actually go out and rename the files appending the word processed to the imported file name so that the application will not skip the file when it is dropped in the new file location tomorrow (this assumes that you are working on files that will be coming in using the same name every day which is very common!). You should be getting pretty good at the SQL Task object by now so here are the steps required to update the file names.
Figure 33 – Your completed .dtsx package should look like this
Visual Basic and C# date Time Format Specifiers:
http://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx?cs-save-lang=1&cs-lang=vb#code-snippet-2





























