
How do I design a data ingestion process in Snowflake that includes updates/inserts and maintains optimum performance?

I will be ingesting about 20 years of data: files with millions of rows and roughly 500 columns each. Reading through the Snowflake (SF) documentation, I saw that I should load the files in an order that lets SF build micro-partitions (MP) whose metadata is well suited for pruning. However, I am concerned because I will also be updating previously loaded records, which could degrade the pruning quality of those micro-partitions.

Is there a best practice for handling updates? Might I at some point need to reorganize the table data to regain its performance structure? Are clustering keys adequate on their own, or should I consider a combination of ordered loading and clustering keys?

I am planning to split the load files into logical groupings that would also support useful partition metadata, but I am also wondering whether there is a recommended limit on the number of columns. If there is a known best-practice document, please point me to it. Thanks. hs
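For concreteness, this is roughly the pattern I have in mind; the table, stage, and column names below are placeholders rather than my real schema:

    -- Placeholder names throughout; the real table has ~500 columns.
    -- Target table clustered on the columns I expect to filter on most.
    CREATE TABLE claims_history (
        record_id   NUMBER,
        event_date  DATE,
        region      VARCHAR
        -- ...plus the remaining columns
    )
    CLUSTER BY (event_date, region);

    -- Initial bulk load from files staged (and ordered) by date.
    COPY INTO claims_history
      FROM @ingest_stage/claims/
      FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');

    -- Ongoing updates/inserts: land the delta in a staging table, then MERGE.
    MERGE INTO claims_history AS t
    USING claims_delta AS s
      ON t.record_id = s.record_id
    WHEN MATCHED THEN UPDATE SET
      t.event_date = s.event_date,
      t.region     = s.region
    WHEN NOT MATCHED THEN INSERT (record_id, event_date, region)
      VALUES (s.record_id, s.event_date, s.region);

    -- Check how much the updates have degraded clustering on the key.
    SELECT SYSTEM$CLUSTERING_INFORMATION('claims_history', '(event_date, region)');

The MERGE step is the part I am worried about, since it rewrites micro-partitions and could gradually scatter the ordering I established during the initial load.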
