Traditional Situation
Let's review how an add column operation is carried out when there is no "Add Column Now" feature. This also helps us get familiar with the illustrations used in this issue:
When an add column operation is performed, all rows of data must be augmented with a segment of data (Column 4 data in the illustration)
As mentioned in the previous illustration, when the length of a data row changes, the tablespace needs to be rebuilt (the gray-blue portion of the illustration is where the change occurs)
Column definitions in the data dictionary are updated as well
All of the steps above are what happens without the "Add Column Now" feature.
The problem with this approach is that every add column operation requires a tablespace rebuild, which costs a lot of I/O and a lot of time
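To make the cost concrete, here is a minimal sketch of the traditional approach. It is a toy model, not MySQL code: the table is assumed to be a plain list of tuples, and add_column_with_rebuild is a hypothetical helper name.

```python
# Toy model (an assumption, not InnoDB internals): a table is a list of
# fixed-layout rows, each stored as a tuple.
def add_column_with_rebuild(rows, default):
    """Return a rebuilt copy of the table with one extra column per row."""
    rebuilt = []
    for row in rows:
        # Every single row is rewritten, so the work grows with table size.
        rebuilt.append(row + (default,))
    return rebuilt


old_rows = [(1, "a", 10), (2, "b", 20), (3, "c", 30)]
new_rows = add_column_with_rebuild(old_rows, 0)
rows_rewritten = len(new_rows)  # equals the number of rows in the table
```

In this model the number of rewritten rows always equals the table size, which is the I/O cost the text describes.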
Add Column Now
The "Add Column Now" process is shown below:
"Add Column Now" changes only the contents of the data dictionary, which includes:
Adding the definition of the new column to the column definition
Adding the default value of the new column
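The dictionary-only change can be sketched as follows. This is an illustrative model under stated assumptions: TableDef, instant_defaults, and add_column_now are hypothetical names, not MySQL structures.

```python
from dataclasses import dataclass, field


@dataclass
class TableDef:
    """Hypothetical stand-in for a table's data dictionary entry."""
    columns: list
    instant_defaults: dict = field(default_factory=dict)


def add_column_now(table_def, name, default):
    """Touch only the dictionary: record the new column and its default."""
    table_def.columns.append(name)
    table_def.instant_defaults[name] = default


rows = [(1, "a", 10), (2, "b", 20), (3, "c", 30)]  # stored row data
table_def = TableDef(columns=["col1", "col2", "col3"])

rows_before = list(rows)
add_column_now(table_def, "col4", 0)
# The stored rows are never passed to add_column_now, so they stay unchanged.
```

The work done here does not depend on how many rows the table holds, which is where the speed comes from.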
"Add columns now"? When you want to read data from the table after "Add Column Now":
Since "Add Column Now" does not change the row data, only three columns are read
MySQL appends the default value of the fourth new column to the read data
The procedure above describes how to read data that was written before "Add Column Now": the new column is filled in with the default value recorded in the table definition.
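Reading an "old" row can be sketched like this. It is a simplified model, assuming rows are tuples and the dictionary is a column list plus a dict of defaults; read_old_row is a hypothetical name.

```python
def read_old_row(stored, columns, defaults):
    """Read a row written before the column was added.

    Only the physically stored values are read; each missing trailing
    column is filled in from the defaults recorded in the dictionary.
    """
    values = list(stored)
    for name in columns[len(stored):]:
        values.append(defaults[name])
    return tuple(values)


columns = ["col1", "col2", "col3", "col4"]  # col4 was added "now"
defaults = {"col4": 0}                      # default recorded at add time
result = read_old_row((1, "a", 10), columns, defaults)
```

The stored tuple still has three values; the fourth exists only in the result handed back to the reader.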
So how is data that was written after "Add Column Now" read? The process is shown below:
When reading row 4:
By checking the instant flag bit in the header of a data row, you can tell that the row is in the "new format": the row header is followed by a new field, "Number of Columns"
By reading the row's "Number of Columns" field, you can tell how many columns of the row hold "real" data, and then read that many columns
As you can see, reading data written before "Add Column Now" and reading data written after it are two different processes
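The two read paths can be sketched together. This is a toy model, not the InnoDB record format: StoredRow, read_row, and the n_cols_before_instant parameter (the column count the table had before the first "Add Column Now") are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StoredRow:
    """Hypothetical stored row: values plus the two header items."""
    values: tuple
    instant_flag: bool = False  # set on rows written in the "new format"
    n_cols: int = 0             # "Number of Columns"; used only if flag is set


def read_row(row, columns, defaults, n_cols_before_instant):
    """Return the row as the current table definition sees it."""
    if row.instant_flag:
        n_real = row.n_cols               # new format: trust the row header
    else:
        n_real = n_cols_before_instant    # old format: original column count
    values = list(row.values[:n_real])
    for name in columns[n_real:]:         # fill in columns the row lacks
        values.append(defaults[name])
    return tuple(values)


columns = ["col1", "col2", "col3", "col4"]
defaults = {"col4": 0}
old_row = StoredRow(values=(1, "a", 10))
new_row = StoredRow(values=(4, "d", 40, 7), instant_flag=True, n_cols=4)

old_result = read_row(old_row, columns, defaults, 3)
new_result = read_row(new_row, columns, defaults, 3)
```

A row written between two "Add Column Now" operations would take the flagged path and still receive defaults for the columns added later, because its "Number of Columns" is smaller than the current column count.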
From the discussion above, we can summarize "Add Column Now":
It is efficient because executing "Add Column Now" does not change the structure of the existing data rows
When reading "old" data, the new column is "faked" by appending its default value, so that the results are correct
When writing "new" data, the new row format is used (with the added instant flag bit and "Number of Columns" field) to distinguish new data from old
When reading "new" data, the data can be read as is
So can the "faking" be kept up forever? When does it have to be torn down?
Consider the following scenario:
Add column A with "Add Column Now"
Write data row 1
Add column B with "Add Column Now"
Write data row 2
Remove column B
Let's speculate on the minimal cost of "removing column B": it would require modifying either the instant flag bit or the "Number of Columns" field in the data rows. That affects at least every row written after the "Add Column Now" operation, at a cost similar to rebuilding the data
From this speculation it follows that when an operation incompatible with "Add Column Now" is performed, the table needs to be rebuilt
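The speculation can be illustrated with a small sketch of the scenario. It follows the article's reasoning only and makes no claim about how MySQL actually drops columns: rows are modeled as dicts, the base table is assumed to have two columns, and rows_touched_by_drop is a hypothetical helper.

```python
def rows_touched_by_drop(rows, dropped_index):
    """Rows whose header says they physically store the dropped column."""
    return [r for r in rows if r["instant"] and r["n_cols"] > dropped_index]


# Assumed base table: col1, col2. Column A lands at index 2, B at index 3.
row0 = {"instant": False, "n_cols": 0, "values": (0, "x")}             # before A
row1 = {"instant": True, "n_cols": 3, "values": (1, "y", "a1")}        # after A
row2 = {"instant": True, "n_cols": 4, "values": (2, "z", "a2", "b2")}  # after B

touched = rows_touched_by_drop([row0, row1, row2], dropped_index=3)
```

Only row 2 is touched in this tiny example, but in a real table every row written after column B was added would need its header and data rewritten, which is why the cost approaches that of a rebuild.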
Extended thought question: can another data format be devised to replace the instant flag bit and the "Number of Columns" field, so that both adding and dropping columns can be done "right away"? (Hint: consider adding a column, dropping a column, and then adding a column again.)
Limitations
Now that we understand how it works, let's look at the limitations of "Add Column Now". The first two are easy to understand:
"Add Column Now" can only add columns at the end of the table, not between existing columns
The metadata records only how many columns a data row should have, not where those columns are, so the position of a new column cannot be specified
"Add Column Now" cannot add primary key columns
Adding a column must not involve changes to the clustered index; otherwise it becomes a "rebuild" operation, not an "immediate" one
"Add Column Now" does not support the compressed table format
According to the WL: "Compressed is no need to supported"
Summary review
Let's summarize the above discussion:
"Add Column Now" is efficient because:
When "Add Column Now" is performed, the structure of the data rows is not altered
When reading "old" data, the new column is "faked" by appending its default value, so that the result is correct
When writing "new" data, a new data format is used (with the added instant flag bit and "Number of Columns" field) to distinguish new data from old
When reading "new" data, the data can be read as is
The "faking" done by "Add Column Now" cannot be kept up forever. When an operation incompatible with "Add Column Now" is performed, the table data is rebuilt
Returning to the two remaining questions:
How does "Add Column Now" work?
We've already answered that question
Is the so-called "Add Column Now" completely free of impact on the business, and is it truly done "immediately"?
As we have seen, even "Add Column Now" still has to change the data dictionary, so locks cannot be avoided. In other words, "immediately" here means "without changing the structure of the data rows"; it does not mean "completing the task at zero cost"