More often than not, I have been approached with the requirement of combining different dsx files into a single file, for a variety of reasons. Sometimes it is just to create a single backup from an existing code base; sometimes it is to reduce the pain of importing multiple small dsx files. Whatever the reason may be, I have found myself wanting to do the same on more than one occasion.
This is precisely why I thought it would be best to write a script to help me carry out this process. The concept is basically the same as splitting a dsx file: there is always going to be a header at the top of the dsx file. The idea is to copy this header information to a single file, then append the job information from each of the other dsx files (excluding their headers) to the file that contains the header we extracted. The script is a small, simple batch program, and I have provided the code below. Hope this helps you if you ever come across this strange requirement.
@echo off
setlocal EnableDelayedExpansion
set /a counterman=1
FOR %%X in (*.dsx) DO (
    set files=%%X
    if !counterman! EQU 1 (
        REM First file: copy it whole, header included
        cat !files! > sample.txt
        set /a counterman+=1
        echo Combined !files!
    ) else (
        REM Find the line number of the header terminator
        for /f "delims=:" %%a in ('findstr /n /c:"END HEADER" !files!') do set headerend=%%a
        set /a headerend+=1
        REM Append everything after the header (tail is from the MKS toolkit)
        tail +!headerend! %%X >> sample.txt
        set /a counterman+=1
        echo Combined !files!
    )
)
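If you do not have the MKS toolkit (which provides cat and tail on Windows), the same combining logic can be sketched in plain Python. This is an illustrative equivalent, not the batch script itself; the file names and the combined.dsx output name are my own choices.

```python
import glob

def combine_dsx(pattern="*.dsx", out_path="combined.dsx"):
    """Copy the first dsx file whole (header included), then append every
    other file starting from the line after its END HEADER marker."""
    files = sorted(glob.glob(pattern))
    with open(out_path, "w") as out:
        for i, name in enumerate(files):
            with open(name) as f:
                lines = f.readlines()
            if i == 0:
                out.writelines(lines)  # keep the header once
            else:
                # find the header terminator and skip everything up to it
                end = next(n for n, l in enumerate(lines) if "END HEADER" in l)
                out.writelines(lines[end + 1:])
            print(f"Combined {name}")
```

The one subtlety is the same as in the batch version: only the first file keeps its header, since the import application expects exactly one header per dsx file.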
Most people use the export tool available in the DataStage Designer. The problem with the tool is that it is slow and gives us very few options in terms of automation. Cases where you need to export individual jobs as separate export files will take a lot of time if you use the Designer.
The best way to take the manual effort out of this particular activity is to use the export commands that DataStage provides on the client side. There are two separate commands that are popularly used for exports. I will explain how to use both commands and which is best for preparing an automated export solution.
1) The first command is the dsexport command. This can be found on the client side at the location specified below
The syntax of the command is available on the DataStage help portal, and I won't go into the details here. However, there is one glaring limitation: the command requires manual intervention, such as logging into the server via the client when prompted. This defeats the purpose of preparing an automated solution. Personally, I would not recommend this command if you plan on automating your export process.
2) DSCMDEXPORT is the other command available in DataStage to carry out exports. This command requires no manual intervention on the part of the person carrying out the export; all it requires is that you provide your credentials when executing the command. The executable is available in the same path as the dsexport command. The command will look something like the below
X:\……….\InformationServer\Clients\Classic\dscmdexport /D=%Domain% /U=%UserId% /P=%Password% /H=%hostname% %projectname% export.dsx /V
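If you are driving the export from a script, the command line above just needs to be assembled from your environment's values. Here is a minimal Python sketch; the install path, credentials, and project name are illustrative assumptions, not real values.

```python
import subprocess

# Illustrative install path -- substitute your own client location.
CLIENT = r"C:\IBM\InformationServer\Clients\Classic\dscmdexport.exe"

def build_export_cmd(domain, user, password, host, project, out_file="export.dsx"):
    """Assemble the dscmdexport argument list in the documented order."""
    return [CLIENT, f"/D={domain}", f"/U={user}", f"/P={password}",
            f"/H={host}", project, out_file, "/V"]

cmd = build_export_cmd("services:9080", "dsadm", "secret", "dshost", "MYPROJECT")
print(" ".join(cmd))
# On a machine with the client installed, you would then run:
# subprocess.run(cmd, check=True)
```

Building the argument list in one place makes it easy to loop over several projects or read the credentials from an environment file instead of hard-coding them.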
This is by far the best option available for an automated export. The only problem is that it exports every single object in the repository; you do not have the option to choose the objects you want to export. You will then need a code snippet to extract the jobs and executables you want out of this bulk export dsx. This is quite simple, and I have already written a small batch program to carry out this activity, explained in detail as another topic on this blog. I would advise you to check it out, as you will need both pieces to build your automated export solution. The location of the post is given below.
In many cases it is not easy to extract individual objects from the repository as individual dsx files. It becomes a tedious task, especially when the number of jobs to export runs into the hundreds. The simplest solution is to take a single bulk dump dsx file and write a piece of code to split it into individual files.
This is a fairly simple task if you understand the structure of the dsx file and how the job definitions are stored in it. Each dsx file starts with header information that gives us the version number of the tool, the date on which the export was taken, the server from which the export was taken, etc. It will look something like the below
ExportingTool "IBM InfoSphere DataStage Export"
This particular header information will be present only once in the dsx file. After this the job definitions and executables will follow. Every individual dsx file needs to have this header information. Without this the dsx file will not be readable by the import application.
Once you have taken care of the header information, the rest is simple. You can identify individual job definitions by the keyword 'BEGIN DSJOB'. The name of the job is indicated on the next line by the keyword 'Identifier'. An example is shown below
The end of the job information is indicated by the keyword 'END DSJOB'. Therefore, extracting everything between 'BEGIN DSJOB' and 'END DSJOB' gives you the job definition.
The same goes for shared containers as well as job executables. For the executables you will need to extract everything between 'BEGIN DSEXECJOB' and 'END DSEXECJOB'. Similarly, shared containers have their own set of keywords; it's up to you to explore and find out.
I have written at least two scripts that carry out this logic in different ways, and I'll be sharing one of them here. This particular script uses the MKS toolkit installed on the Windows machine to cut and move the lines of data from the file.
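The MKS-based batch approach is tied to Windows, so for readers without the toolkit, here is a rough Python sketch of the same splitting logic (not the script itself). It uses the 'BEGIN DSJOB' / 'END DSJOB' and 'Identifier' markers described above, and names each output file after the job.

```python
def split_dsx(path):
    """Split a bulk dsx export into one file per job, prepending the
    shared header block to each output file."""
    with open(path) as f:
        lines = f.readlines()
    # everything up to and including END HEADER is the shared header
    header_end = next(i for i, l in enumerate(lines) if "END HEADER" in l)
    header = lines[:header_end + 1]
    block, name = [], None
    for line in lines[header_end + 1:]:
        if line.strip().startswith("BEGIN DSJOB"):
            block, name = [line], None          # start a new job block
        elif block:
            block.append(line)
            if name is None and line.strip().startswith("Identifier"):
                name = line.split()[1].strip('"')  # job name for the file
            if line.strip().startswith("END DSJOB"):
                with open(f"{name}.dsx", "w") as out:
                    out.writelines(header + block)
                block = []
```

The same loop could be extended with a 'BEGIN DSEXECJOB' branch to pull out the executables, following the keyword pairs described above.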
It is now possible to expose the DataStage jobs you have created as a web service. This is in line with the SOA architecture that IBM is aiming for. However, you cannot simply expose any job as a web service: your jobs have to be designed to receive or send web service messages. The job can either be always running or one that runs to completion a single time. If your job is meant to be triggered by web service requests, then you should set it to multiple instance mode and select 'enabled for service' in the job properties window (as shown below).
Buffering is a technique used in DataStage jobs to ensure a constant and uninterrupted flow of data to and from stages, in such a way that there are no potential deadlocks or fork-join problems. It has been implemented in DataStage keeping in mind that data has to keep moving through the process with optimized use of the server's memory. As mentioned by IBM, the ideal scenario is when data flows through the stages without being written to disk. As with buffering in any system, the upstream operators should not have to wait for the downstream operators to consume their input before starting to create their records. This is the intention in DataStage too.