When to use alter table vs create and copy for migration changes to tables?

I know that singlestore supports online alters of tables which is a nice feature. But it is my understanding that with columnstore tables, repetitive physical alters to a table structure can cause performance problems in the long term. If I remember it was said by support that this is caused by multiple columns being stored in pairs and it creates a different type of fragmentation then occurs strictly by row inserts and deletes. Is there any guidance on when a create insert into should be performed over an online alter? And if performing a create insert is there any way to perform a write lock on the source table while doing the insert into a new table?

re:
Is there any guidance on when a create insert into should be performed over an online alter?

==> Online alter is much easier to use because the table name doesn’t change and you don’t have to change your code or take the table offline. I never hear complaints about fragmentation due to alter on columnstore tables (though maybe others have). Worst-case, you could do optimize table full or do a dummy update to parts of the table to get the background merger to reorganize segments.

Some operations like backup can be blocked by ALTER or block ALTER.
The most common problem I hear about from customers is that ALTER blocks BACKUP. I expect us to allow these to run concurrently at a future date.

See here for information about operations that take a DB or cluster-level lock.

I don’t know of a way to lock the source table while you are inserting into the target. You could take write permissions away from everybody for the source table.

Hanson,

I will give you an instance when a swap needs to occur. If I need to change the leading column on a columnstore key I can’t do an online alter. So how do I make the old table available while copying data as REQUIRED with a columnstore key change?

  1. Create new table with new name and new columnstore key
  2. Select all data from old table into new table (old table is read/write accessible during)
  3. Alter original table to new name
  4. Alter new table to original name

Between steps 2 beginning and step 4 completing any updates or inserts need to be updated in the new table. Are you implying that an entire db level lock is necessary to force a read only lock on the old table while that step occurs? Ideally an additional step would occur between step 2 and 3 that would utilize the timestamp metadata stored on each row as a basis for selecting updated/inserted records with an exclusive lock as part of a final re-synch insert. But without a table level lock that allows an insert into select for update you essentially have to perform the re-synch after the swap, and that could create an eventual consistency situation in a use-case expecting consistency.

1 Like

Thanks for describing this use case. We’ll take another look at online ALTER TABLEs and locking in scenarios like this when we plan for next year.

1 Like