Step By Step approach to solve ETL problem|ETL mini project

This blog will give insight of basic approach to solve any ETL problem:

Question: An insurance company wants to capture details regarding an agent – Agent ID, Agent name, Office address, and office contact. These details are to be captured in SCD type 2 Agent_Dim. Details to be captured once every 7 days. The agent details are obtained in a file as below:

Data File

Describe the loading process and the table structure of Agent_Dim Target table has the following structure – AGT_ID, START_DATE, END_DATE, NAME (VARCHAR), ADDRESS (VARCHAR), CITY (VARCHAR), ZIP (VARCHAR), PHONE, IND. The entire agent ID’s coming from the file needs to be checked against parent AGENT_MASTER. Also update indicator IND to Y if agent belongs to CITY 2

When we get a file and need to perform certain transformation rules on that file, first step is to profile the data in file i.e. check the quality of data as per the mapping provided. This can be done either manually using spreadsheets(if file is smaller) or with the help of ETL tool. In this example below are the points to be checked while doing profiling:

  • Check all the source data types are compatible with target data types i.e. in terms of data type compatibility e.g number cant be converted strings etc plus bigger size cant fit in smaller size
  • Check if any field which is not null in target table should not be null in source also or appropriate mapping rule should be given for defaults
  • If some special characters are coming in source data then how it should be handled

Perform any other checks if needed as per requirement.

Steps In ETL job:

  • Unload master agent data in file to perform lookup operation Which will eliminate the need to hit the database again for every record in phase 0 or as first task, also unload yesterday’s load to compare with current load (say filenames are masteragent.lkp and yesterdayagentload.lkp)
  • Read the file, perform basic checks i.e. mobile number is valid 10 digit number or not (along with any other issues found in profiling)
  • Perform the lookup operation for each incoming record and filter the data based on agents which do not have records in master data
  • Perform below cdc operation
    • create hash record of complete record say
      temp.hashrecord = trim(agent_id)+trim(covert(date,DDMMYYYY))+trim(agent_name)+trim(office_addrr)+trim(office_cntct)
      lookup.hashrecord = trim(agent_id)+trim(start_date)+trim(name)+trim(address+city+zip)+trim(phone)
      Compare temp record with all the lookup records, if match found mark ‘No change’ else if match not found check if input records agent id is present in lookup file, if yes mark update else insert
  • Discard no change records. For all other records check city is ‘CITY 2’ by spliiting input records office_addre field if yes create a new output field as IND mar it ‘Y’ if matches else ‘N’ if not
  • For insert only record perform below mappings:
    out.agent_id :: in.agent_id
    out.start_date :: convert(, ‘DDMMYYYY’)
    out.end_date :: You can leave it empty of give some future date say ’01-01-2999′ :: in.agent_name ( if $~ is correct, else put value after stripping these 2 characters using appropriate functions)
    out.adress :: do substring of office_address using delimiter as ‘,’ and take out first part :: do substring of office_address using delimiter as ‘,’ and take out 2nd part :: do substring of office_address using delimiter as ‘,’ and take out 3rd part :: office_cntc
    out.ind: temporary indicatore created
  • For update record, perform upsert for exiting record
    fill end_date as today’s date and create new record as step .

This is just to give brief idea on how to proceed with ETL problems. Kindly give me your suggestions to improve this blog if any .


For design consideration, check out previous page … Previous Page

Note: This question is taken from


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s