After several hours of uncertainty, it seems I have managed to migrate a Drupal site from MySQL to PostgreSQL, so I want to share my experience with this group, where I think it could be useful.
There are several problems to solve: table structure translation, autoincrement values / sequences, serialized data, etc.
I tried several approaches, but I always ran up against some problem.
Database schema
MySQL and PostgreSQL use different data types for their columns and define their tables in their own way; you can't reuse the MySQL definitions as they are.
One approach is to install a clean Drupal using the codebase of the MySQL site and enable all the modules in use. But that will not give you, for example, the tables autogenerated by the Field API. Sure, you could put all those fields in a Feature and use it to create the tables, but that is not very handy.
I used a Ruby script to do the job, mysql2postgres. This script claims to migrate both structure and data, but at least with PostgreSQL 9.x you will have problems with PHP serialized data.
So, use this script only to replicate the table structure of your MySQL site. It seems to set up the sequences correctly too.
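In my case they looked fine, but if a sequence does end up out of sync with the copied data you can reset it by hand in psql. A minimal sketch for the node table (the table and column names are just an example; repeat for any serial column that is off):

SELECT setval(pg_get_serial_sequence('node', 'nid'),
              COALESCE((SELECT MAX(nid) FROM node), 1));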
Data migration
This is the most difficult part. As I mentioned above, there are problems with PostgreSQL 9.x and serialized data, at least when you try to copy it over from MySQL. There's an issue for this: http://drupal.org/node/926636
So, what to do? I decided to trust Drupal and use the drupal_write_record() function to insert every record from my MySQL database into PostgreSQL, one by one. Using this function I benefit from Drupal's wisdom about inserting data into PostgreSQL (if you read the code you will see some tweaks applied when inserting into a PostgreSQL DB).
This is the script that does the trick:
<?php
print "Migrating from MySQL to PostgreSQL!\n";
// Adding PostgreSQL database connection.
$pgsql_database = array(
  'database' => 'YOUR_PGSQL_DATABASE_NAME',
  'username' => 'YOUR_PGSQL_USERNAME',
  'password' => 'YOUR_PGSQL_PASSWORD',
  'host' => 'localhost', // Assumes PostgreSQL runs on localhost.
  'driver' => 'pgsql',
);
Database::addConnectionInfo('PGMigration', 'default', $pgsql_database);
// Setup ready, let's loop over existing tables.
$tables = db_query('SHOW TABLES');
foreach ($tables as $table) {
  // SHOW TABLES returns a single column named "tables_in_<database>", so
  // adjust the property name to match your MySQL database name (mine is "icomem").
  migrate_table($table->tables_in_icomem);
}
/**
 * Selects all records from the MySQL table and inserts them into PostgreSQL.
 *
 * Uses a modified version of drupal_write_record() to do the insert.
 */
function migrate_table($table) {
  // Skip large tables that will be regenerated on the PostgreSQL site.
  $exclude = array('search_index', 'search_dataset', 'search_total');
  if (in_array($table, $exclude)) {
    return;
  }
  // Get the table dataset from MySQL (the default connection).
  $mysql_result = db_query("SELECT * FROM $table");
  // Activate the PostgreSQL connection.
  db_set_active('PGMigration');
  print "Processing $table\n";
  // Truncate the PostgreSQL table.
  db_query("TRUNCATE TABLE $table");
  // Walk the MySQL dataset.
  try {
    foreach ($mysql_result as $mysql_row) {
      drupal_write_record_no_autoserialize($table, $mysql_row);
    }
  }
  catch (Exception $e) {
    print "There's no schema info for $table, skipping!\n";
  }
  // Switch back to the default (MySQL) connection.
  db_set_active();
}
/**
 * Modified version of drupal_write_record().
 *
 * It doesn't serialize the fields declared as serialized (because we are
 * copying from one DB to another we never unserialize the data), and it
 * switches the active connection so that the schema is read from MySQL but
 * the record is written into PostgreSQL.
 */
function drupal_write_record_no_autoserialize($table, &$record, $primary_keys = array()) {
  db_set_active();
  // Standardize $primary_keys to an array.
  if (is_string($primary_keys)) {
    $primary_keys = array($primary_keys);
  }
  $schema = drupal_get_schema($table);
  if (empty($schema)) {
    throw new Exception('No schema!');
  }
  $object = (object) $record;
  $fields = array();
  // Go through the schema to determine fields to write.
  foreach ($schema['fields'] as $field => $info) {
    if ($info['type'] == 'serial') {
      // Skip serial types if we are updating.
      if (!empty($primary_keys)) {
        continue;
      }
      // Track serial field so we can helpfully populate them after the query.
      // NOTE: Each table should come with one serial field only.
      $serial = $field;
    }
    // Skip field if it is in $primary_keys as it is unnecessary to update a
    // field to the value it is already set to.
    if (in_array($field, $primary_keys)) {
      continue;
    }
    if (!property_exists($object, $field)) {
      // Skip fields that are not provided, default values are already known
      // by the database.
      continue;
    }
    // Build array of fields to update or insert.
    $fields[$field] = $object->$field;
    // Type cast to proper datatype, except when the value is NULL and the
    // column allows this.
    //
    // MySQL PDO silently casts e.g. FALSE and '' to 0 when inserting the value
    // into an integer column, but PostgreSQL PDO does not. Also type cast NULL
    // when the column does not allow this.
    if (isset($object->$field) || !empty($info['not null'])) {
      if ($info['type'] == 'int' || $info['type'] == 'serial') {
        $fields[$field] = (int) $fields[$field];
      }
      elseif ($info['type'] == 'float') {
        $fields[$field] = (float) $fields[$field];
      }
      else {
        $fields[$field] = (string) $fields[$field];
      }
    }
  }
  if (empty($fields)) {
    return;
  }
  // Build the SQL.
  if (empty($primary_keys)) {
    // We are doing an insert.
    $options = array('return' => Database::RETURN_INSERT_ID);
    if (isset($serial) && isset($fields[$serial])) {
      // If the serial column has been explicitly set with an ID, then we don't
      // require the database to return the last insert id.
      if ($fields[$serial]) {
        $options['return'] = Database::RETURN_AFFECTED;
      }
      // If a serial column does exist with no value (i.e. 0) then remove it as
      // the database will insert the correct value for us.
      else {
        unset($fields[$serial]);
      }
    }
    db_set_active('PGMigration');
    $query = db_insert($table, $options)->fields($fields);
    db_set_active();
    $return = SAVED_NEW;
  }
  else {
    $query = db_update($table)->fields($fields);
    foreach ($primary_keys as $key) {
      $query->condition($key, $object->$key);
    }
    $return = SAVED_UPDATED;
  }
  // Execute the SQL.
  if ($query_return = $query->execute()) {
    if (isset($serial)) {
      // If the database was not told to return the last insert id, it will be
      // because we already know it.
      if (isset($options) && $options['return'] != Database::RETURN_INSERT_ID) {
        $object->$serial = $fields[$serial];
      }
      else {
        $object->$serial = $query_return;
      }
    }
  }
  // If we have a single-field primary key but got no insert ID, the
  // query failed. Note that we explicitly check for FALSE, because
  // a valid update query which doesn't change any values will return
  // zero (0) affected rows.
  elseif ($query_return === FALSE && count($primary_keys) == 1) {
    $return = FALSE;
  }
  // If we are inserting, populate empty fields with default values.
  if (empty($primary_keys)) {
    foreach ($schema['fields'] as $field => $info) {
      if (isset($info['default']) && !property_exists($object, $field)) {
        $object->$field = $info['default'];
      }
    }
  }
  // If we began with an array, convert back.
  if (is_array($record)) {
    $record = (array) $object;
  }
  return $return;
}
?>
I think it's pretty straightforward. The script is executed on the MySQL site using drush php-script. It first sets up a connection to PostgreSQL and then iterates over all the tables available in the MySQL DB. For each table it selects all records and fields and inserts them into PostgreSQL.
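For example (the script file name is just an example):

# Run from the MySQL site's docroot, or pass --root / --uri as needed.
drush php-script mysql-to-pgsql.php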
For inserting the data I use a modified drupal_write_record() function. This is because drupal_write_record() serializes the fields declared as serialized, but I don't unserialize those fields when I fetch them from MySQL (it's simpler to pass the data along as-is than to work out which columns need unserializing). I also need to change the active connection in the middle of the function: it uses the MySQL connection to retrieve the table schema and switches to PostgreSQL to insert the data.
Tables without schema information are skipped; they usually correspond to uninstalled modules.
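If you want to know beforehand which tables will be skipped, here is a quick sketch you could run with drush php-script on the MySQL site (the variable names are made up):

<?php
// List MySQL tables that have no Drupal schema definition; the migration
// script above will skip these.
$missing = array();
foreach (db_query('SHOW TABLES')->fetchCol() as $table) {
  if (!drupal_get_schema($table)) {
    $missing[] = $table;
  }
}
print implode("\n", $missing) . "\n";
?>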
Some tables, such as the search ones, are excluded explicitly because they are large and I prefer to regenerate them later on the PostgreSQL site.
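For the record, if your Drush version ships the search commands, regenerating them later should boil down to something like:

drush search-reindex  # mark all content for re-indexing
drush search-index    # index the remaining items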
Problems
The only problem I have left is with Draggable Views. I have several views that use Draggable Views so admins can reorder their items. When you add the Draggable Views weight field to the sort criteria, you can check the option 'New items appear at the bottom of the list'. If you do so, Views produces an SQL statement like this:
SELECT node.title AS node_title, draggableviews_structure.weight AS draggableviews_structure_weight
FROM
{node} node
INNER JOIN ...
[...]
ORDER BY COALESCE(draggableviews_structure_weight, 10000) ASC
PostgreSQL says: 'SQLSTATE[42703]: Undefined column: 7 ERROR: column "draggableviews_structure_weight" does not exist LINE 7: ORDER BY COALESCE(draggableviews_structure_weight, 10000) AS... '.
If you uncheck this option, the SQL statement turns into:
SELECT node.title AS node_title, draggableviews_structure.weight AS draggableviews_structure_weight
FROM
{node} node
INNER JOIN ...
[...]
ORDER BY draggableviews_structure_weight ASC
From what I've read, the problem seems to be the COALESCE() call: PostgreSQL doesn't accept an output-column alias inside an expression in ORDER BY, so the expression has to appear in the column selection list (just after the SELECT keyword).
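For illustration, a query of this shape, with the COALESCE() moved into the SELECT list and the ORDER BY pointing at its alias, is accepted by PostgreSQL (the alias name is made up):

SELECT node.title AS node_title,
  COALESCE(draggableviews_structure.weight, 10000) AS draggableviews_weight_fallback
FROM
{node} node
INNER JOIN ...
[...]
ORDER BY draggableviews_weight_fallback ASC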
I'm not sure whether this is a Draggable Views bug or a Views bug (I have to take a look at the handler behind that sort field), but for the moment I can live without that option ;)
That's all. I hope it can be useful for someone. Comments are welcome!