Pythian Blog: Technical Track

Building data tests in PowerShell

While working with a client recently, we came across a problem while testing data for completeness or errors after running an ETL process to import & manipulate the data. The main issue we ran across was that the overall client process was being managed via PowerShell, and we wanted to save the success/fail messages for later processing.

Data tests in PowerShell

The solution we settled on has the following steps: Define a PS DataTable to store results We first create a DataTable in PowerShell for storing the results of the data tests.
$results = New-Object System.Data.DataTable
 $sheetName = New-Object System.Data.DataColumn sheetName,([string]) # Name of worksheet these results will be stored in. 
 $status = New-Object System.Data.DataColumn status,([string])
 $output = New-Object System.Data.DataColumn output,([string])
 
 # Define table columns
 $results.Columns.Add($sheetName)
 $results.Columns.Add($status)
 $results.Columns.Add($output)
 
Define the test, pass/fail and status messages Here, we define the data tests query, pass/fail threshold and what to do if the query passes or fails. What happens within Get-ImportStatus is that the testQuery value is run and if the number of results is greater than or equal to the threshold, the data test fails. The errorQuery is then run to get the failed messages. If the test succeeds, the successQuery is run.
$testResults = Get-ImportStatus `
  -sqlConnection $sqlConnection `
  -testThreshold 0 `
  -testQuery "SELECT count(1) FROM [config].[IngestionProcessLog] WHERE [Error_Process] IS NOT NULL;" `
  -successMsg "No Errors encountered during processing:" `
  -successQuery "SELECT [Source_Process],[Process_Status],[Log_DateTime] FROM [config].[IngestionProcessLog] where [Error_Process] IS NULL;" `
  -errorMsg "The below Errors Have been recorded:" `
  -errorQuery "SELECT [Source_Process],[Process_Status],[Log_DateTime] FROM [config].[IngestionProcessLog] where [Error_Process] IS NOT NULL;"
 
Call a stored procedure to run the test and return results in CSV Taking advantage of a stored procedure I found on Stack Overflow, we get the results in CSV format.
function Get-ImportStatus {
 param (
  $sqlConnection,
  $testThreshold, # Test query number that the results are failed at. 
  $testQuery, # Should return a single number
  $successMsg,
  $errorMsg,
  $successQuery,
  $errorQuery
 )
 #
 # This procedure runs the testQuery sent in and passes/fails the results based on the TestThreshold. 
 # Returns a hashtable of results
 #
 
 [hashtable]$row = @{}
 
 $testStatus = Test-DataStatus -SQLConnection $sqlConnection -Query $testQuery # This just runs the SQL Query
 
 if ($testStatus -gt $testThreshold) {
  $row.status = $errorMsg
  $row.output = Get-ImportData -SQLConnection $sqlConnection -Query $errorQuery
 }
 else {
  $row.status = $successMsg
  $row.output = Get-ImportData -SQLConnection $sqlConnection -Query $successQuery
 }
 
 return $row
 
 }
 
Store results in the DataTable The returned results are then stored in the DataTable for reporting or manipulating in any way we want.
$row = $results.NewRow()
 $row.sheetName = "ProcessStatus"
 $row.status = $testResults.status 
 $row.output = $testResults.output
 $results.Rows.Add($row)
 
I hope that through this demonstration we can help others who may also encounter this problem.

No Comments Yet

Let us know what you think

Subscribe by email