Everyone who administrates databases, large or small, will eventually encounter the need to copy data from one table to another. This often occurs in application development when tweaking data model designs, copying data stored in an existing table into a newly-created table. In such cases it is often also necessary to update the existing table with the row ID of the associated data in the new table. Consider the following example:
Joe maintains a database for customer information. His database includes a table for his customers’ personal information (named Personal) as well as a table for their business information (named Business). Both tables include many similar columns (street_address, city, state, zip), and Joe realizes that it is better to contain this info in a new table called
Addresses. He decides to create a new column in both the Personal and Business tables which will store the ID of the corresponding row in the new Addresses table.
Now, Joe has to figure out how to copy the data from the existing tables into the new table. Being an application developer used to control flow structures, he conceives a loop that parses each row, copies the data from A to B, then updates the old row with the last insert ID. He sets out to find the syntax to perform loops in MySQL, but he comes away confused and dejected. That’s because although MySQL provides loop functionality through cursors, Joe has learned that cursors are inefficient, difficult to implement, and inflexible. Joe decides that there must be a better way to move his data.
Joe calls his database admin friend, Pete, and explains what he wants to do. After spending five minutes nitpicking the semantics of Joe’s explanation, Pete offers a clever solution:
“Just copy the existing row ID into a throwaway column in the new table, then perform a multi-table update to update the new ID field in the old table. Pfft…developers.”
A good developer, fluent in the best practices of procedural programming, would consider this approach to be a hack. But in the world of relational databases, it’s how to get things done. Here’s the code:
# create a new temporary column to store the existing row ID ALTER TABLE Addresses ADD old_id INT(11); # copy the data INSERT INTO Addresses (street_address, city, state, zip) SELECT street_address, city, state, zip FROM Personal; # match the rows using the ID stored in the temporary column # then update the new_id field in the old table UPDATE Addresses, Personal SET Personal.new_id = Addresses.id WHERE Personal.id = Addresses.old_id; # delete the temporary column ALTER TABLE Addresses DROP old_id;
To copy rows from both existing tables at once, add a UNION clause:
# copy the data INSERT INTO Addresses (street_address, city, state, zip) SELECT street_address, city, state, zip FROM Personal; UNION SELECT street_address, city, state, zip FROM Business;
Obviously, these queries may be adapted to include transformations, joins, or any other necessary operations.
So, the big lesson that we learned along with Joe today is that when application developers get their hands dirty in the database, it often requires a retooling of the way we think, not just a syntax reference.