Pythian Blog: Technical Track

Examining Teradata To Google BigQuery Migration

Cloud migration is hot nowadays. Enterprises are considering options to migrate on-premises data and applications to cloud (AWS/GCP/Azure) to get the benefits of quick deployments, pay-per-use models and flexibility. Recently, I got a chance to work on data migration from Teradata to Google BigQuery where the task at hand was to convert Teradata BTEQ's to BigQuery SQLs. SQL, on a higher level, stays the same with few differences. In this post, I will cover a few generic SQLs (written in Teradata) with a conversion to Google BigQuery just to give a glimpse of syntax differences between two platforms as all the differences cannot be covered in one blog post.  
Task Teradata SQL BigQuery SQL
Select Rows with a particular rank SELECT col1, col2 FROM <DB>.<TBL> QUALIFY ROW_NUMBER() OVER ( PARTITION BY col1 ORDER BY <DATE> DESC) = 1 SELECT * EXCEPT(rank) FROM ( SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY <DATE> DESC) as rank FROM <DB>.<TBL> ) where rank=1
Concatenate multiple data types select ' String 1 ' || ' String 2 '; --Multiple data types can be concatenated e.g. integer, varchar, decimal etc. select 'String 1 '|| 2 select concat(' String 1 ',' String 2 '); -- with limitation, can only concat STRING or BYTES OR cast data types first and then concatenate select concat('String 1 ',cast(2 as STRING));
Truncate Table Delete from <DB>.<TBL> OR Delete from <DB>.<TBL> ALL Delete from <DB>.<TBL> WHERE TRUE
Combine two result sets with no duplicate row SELECT * from <DB>.<TBL1> union SELECT * from <DB>.<TBL2> SELECT * from <DB>.<TBL1> union distinct SELECT * from <DB>.<TBL2>
Timestamp Difference to get Interval select (timestamp '2019-08-26 10:50:00.000000' - timestamp '2019-08-25 07:40:00.000000') DAY(4) to SECONDOutput:1 03:10:00.000000 where: 1 = day(s) 03 = Hour(s) 10 = Minute(s) 00 = Second(s) 000000 = Milliseconds WITH RESULT_IN_SECONDS AS ( SELECT TIMESTAMP_DIFF(TIMESTAMP '2019-08-26 10:50:00.000000',TIMESTAMP '2019-08-25 07:40:00.000000',SECOND) AS SECONDS ) SELECT CONCAT(CAST(DAYS AS STRING),' ',HOURS,':',MINUTES,':',SECONDS,'.000000') FROM ( SELECT CAST(DAYS AS INT64) AS DAYS , FORMAT("%02d",CAST(HOURS AS INT64)) AS HOURS, FORMAT("%02d",CAST(MINUTES AS INT64)) AS MINUTES, FORMAT("%02d",CAST((REM_SEC_2 - (MINUTES * 60)) AS INT64)) AS SECONDS FROM ( SELECT DAYS,HOURS, FLOOR(REM_SEC_2/60) AS MINUTES, REM_SEC_2 FROM ( SELECT DAYS,HOURS, (REM_SEC - (HOURS * 3600 ) ) AS REM_SEC_2 FROM ( SELECT DAYS,FLOOR((SECONDS - (DAYS * 86400))/3600) AS HOURS,SECONDS - (DAYS * 86400) AS REM_SEC FROM ( SELECT SECONDS, FLOOR(SECONDS/86400) AS DAYS FROM RESULT_IN_SECONDS )) ) ) )As there is no interval data type in Google BigQuery so above manipulation can be performed to get same 'Interval' output as we get in Teradata
Search strings with text '_AB' treating '_' as data not a wildcard character select * from ( select 'GGG_ABC' AS TXT )A where TXT LIKE '%\_AB%' ESCAPE '\' select * from ( select 'GGG_ABC' AS TXT )A where TXT LIKE '%\\_AB%'
Date Difference/Subtraction select date '2019-08-26' - date '2019-08-25' Default behavior is to return difference in days select DATE_DIFF(DATE '2019-08-26', DATE '2019-08-25',DAY) In BigQuery we have to specify DAY as second parameter to get difference in Days
 

No Comments Yet

Let us know what you think

Subscribe by email