To follow-up on my previous article on Teradata Parallel Transporter (TPT), I invited Jack Harris from WhereScape to share his latest performance tuning tips with TPT and DataDirect Connect for ODBC drivers certified for use by Teradata. I have worked with Jack on several projects for WhereScape Red shops running Teradata TPT with DataDirect connectivity to build/renovate data warehouses and marts.
Jack's contribution is specific to the DataDirect Connect for ODBC Oracle Wire Protocol driver, however ODBC performance tuning properties are available across all drivers used in TPT projects including SQL Server, Sybase, DB2, MySQL, Postgres, Salesforce, Informix, Hadoop Hive, and Cloudera Impala.
I've seen the Wherescape Wizard/Magician perform at different conferences and turn $1 bills into $100s right before my eyes; but I've got to say I'm even more impressed with Jack's performance results.
One of the most exciting enhancements to Teradata’s TPT ODBC Operator was released in Version 14.0, efix 5. While DataBlockSize may not cause you to wake up your wife in the middle of the night to make this announcement, it happened at my house. The ODBC Operator is a vital member in a Teradata data loading strategy because data is never landed – among other issues, you can provision data directly into the database without concern for those pesky Carriage Return or Line Feed Characters one may encounter loading data from a file.
At the core of TPT’s architecture are Operators, and the behavior of an operator is modified by one or more Attributes. The ability to add Attributes enables TPT to provide new features more easily than any other data provisioning technology available today.
Prior to the availability of the DataBlockSize attribute in the ODBC Operator, buffers were consumed in 64KB blocks, regardless of the size provided by the ODBC driver. This had an adverse impact on performance – which in turn lead to a variety of perceptions, some of which were not founded in fact.
With the DataBlockSize attribute, you may now change the size of the blocks being read by the ODBC Operator to match the size of the block provided by the driver. With the Oracle ODBC Wire Protocol driver, ArraySize is provided in bytes. The DataBlockSize parameter reads kilobytes, so be sure to make the conversion to match these up correctly.
The TPT release note from Teradata provides some additional detail:
This optional attribute that allows the user to dynamically adjust the data block size of the buffer that will hold multiple rows with a single fetch call. Additionally, the size of the row will affect the number of rows that can be fetched into that data block. End-users and customers will now have the ability to fine tune ODBC Operator performance by directly assigning a number value to the attribute.
In testing I’ve done with our customers, elapsed time has been reduced by over 80%. This kind of improvement is exactly why you wake up your household, regardless of the time of day. But, we know whenever a bottle neck is removed, new constraints to performance will become visible.
While the Oracle ArraySize parameter can be set to 4,294,967,296 bytes, we’ve found that probably won’t lead to optimal throughput. Here are some rules of thumb to keep in mind:
Jack Howard is a Principal Architect for Teradata Solutions, WhereScape USA.
Bio: Data Architect focused on data warehouse solutions for the last twenty years, on Teradata. In addition to Teradata, I’ve provided design, development and strategy services on Oracle, DB2, SQL Server architectures for the last thirty years at a variety of Fortune 500 companies. My last position prior to joining WhereScape was a Center of Excellence Architect for the Teradata Corporation.
Subscribe to get all the news, info and tutorials you need to build better business apps and sites