Converting your InnoDB MySQL server to TokuDB (how we did it)
What is TokuDB?
TokuDB is a storage engine for MySQL and MariaDB that dramatically improves the performance of large databases (typically 50 GBs or larger). By replacing 40-year-old B-tree technology with patented Fractal TreeĀ® technology, TokuDB speeds indexing by 10x or more, and reducing the size of the database by upt o 90%. Because of its exceptional indexing performance, TokuDB is ideal for applications that must simultaneously query and update large volumes of rapidly arriving data (e.g., clickstream analytics). And you do not have to change any application code.
Our case
We wanted to backup our main MySQL database server with half TB of data. Since TokuDB compresses data by a great deal, I decided to use their TokuDB engine for our slave server. In following article I’ll try to cover how migration went, and how we set it up to work.
In current setup we have Master – Master (or better, Slave – Slave) replication between our 2 main MySQL databases. (Both databases have same data) We decided to shut one down, copy it and create a third one on TokuDB, which would be slave to one of them, and backup it on regular basis. (If data is compressed, like TokuDB says, it should be pretty neat solution.)
Procedure of converting MySQL with InnoDB to MySQL with TokuDB
- Stopping slave
First of all we stopped production slave on one of our MySQL servers with stop slave
. We logged information about replication at this point with: show slave status\G
. To make sure nobody writes to our MySQL while copying, we also shut it down.
mysql:[email protected]> show slave status\G *************************** 1. row *************************** Slave_IO_State: Master_Host: server002 Master_User: replica Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.045177 Read_Master_Log_Pos: 675772374 Relay_Log_File: relay.035280 Relay_Log_Pos: 675693021 Relay_Master_Log_File: mysql-bin.045177 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: tba%.imported_%,tba%.tmp_% Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 675765524 Relay_Log_Space: 675700060 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 2 1 row in set (0.00 sec)
Important parts were bolded and they mark point, from which replication must proceed, once we setup new slave.
- Copying of MySQL data with rsync
We copied whole slave data folder with rsync. We setup rsyncd on the target server. On CentOS you need xinetd package installed to enable rsyncd daemon. Also you need to modify: /etc/xinetd.d/rsync and change line to: disable = no. Once you set that up, start up xinetd: service xinetd start
All we had to do is setup /etc/rsyncd.conf. Test config was:
[data] path = /srv/mysql/data/ hosts allow = * hosts deny = none list = true uid = root gid = root read only = false
All we had to do is copy data with command rsync --progress -rav /srv/mysql/data server003::data
This seemed as best option to copy data as fast as possible through network.
- Starting up slave
Becouse we already copied data and logged position from which we have to start replicating on new server003, we started back our production server and let it catch replication again.
- Setting up TokuDB
Once we had our data on target server and production MySQL’s were back, we could have simply started new server on the position logged one step before, but we also wanted to change engine from InnoDB to TokuDB. We made sure to put skip-slave-start in /etc/my.cnf and remove any binary log and relay log files (we want to clean this one up, before bringing it back to replication). We also removed master.info and relay-log.info files (might not be necessary if you wanted to simply start your MySQL and it would know where to connect and what position to start from)
We followed instruction for installing MySQL from binary and installed: Percona-5.6.13-tokudb-7.0.4. Let me mention also that ALTER table engine=TokuDB won’t work if you have foreign keys on your tables, becouse TokuDB does not have them implemented (yet). It’s important if you plan to drop any foreign keys to install 5.6.x version of MySQL, becouse dropping constraints/foreign keys is inline operation from 5.6 on. MySQL 5.5 does not have it and it takes ages to drop foreign keys.
Make sure that when you try to start up MySQL for the first time, you have to write: default-storage-engine = innodb into your my.cnf or it won’t start. Make sure you run mysql_upgrade
. When you have your MySQL up, you have to register TokuDB plugins with following commands:
mysql> install plugin tokudb soname 'ha_tokudb.so'; mysql> install plugin tokudb_user_data soname 'ha_tokudb.so'; mysql> install plugin tokudb_user_data_exact soname 'ha_tokudb.so'; mysql> install plugin tokudb_file_map soname 'ha_tokudb.so'; mysql> install plugin tokudb_fractal_tree_info soname 'ha_tokudb.so'; mysql> install plugin tokudb_fractal_tree_block_map soname 'ha_tokudb.so';
You can then optionally change your default engine back to TokuDB:
mysql> set global default_storage_engine=TokuDB;
Bash script I used to find all foreign keys on my databases (replace prefix_of_my_tables with your own):
#!/bin/bash echo "SELECT CONCAT('ALTER TABLE ',table_schema,'.',table_name,' DROP FOREIGN KEY ',constraint_name,';') FROM information_schema.table_constraints WHERE CONSTRAINT_TYPE = 'FOREIGN KEY' AND CONSTRAINT_SCHEMA LIKE 'prefix_of_my_tables_%'" | mysql -h localhost | grep -v CONCAT > drop_all_foreign.sql echo "We have written drop_all_foreign.sql"
We executed drop_all_foreign.sql and went on to the script to create .sql to alter all existing InnoDB tables (replace my_table_prefix with your own):
#!/bin/bash DATABASES=`echo "show databases" | mysql -h localhost | grep -v Database | grep "my_table_prefix_"` echo "Generating alter_tables.sql" echo "" > alter_tables.sql for DB in ${DATABASES[@]}; do TABLES=`echo "SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_SCHEMA = '$DB' AND engine = 'InnoDB'" | mysql -h localhost $DB | grep -v Tables_in | grep -v TABLE_NAME` for TABLE in ${TABLES[@]}; do echo "ALTER TABLE $DB.$TABLE ENGINE=TokuDB;" >> alter_tables.sql done done echo "Generated!"
So now we had alter_tables.sql generated and all we had to do at this point is execute it. We used screen
before using cat alter_tables.sql | mysql
, becouse we knew this would take quite long. Command has to compress all InnoDB tables to TokuDB and 500GB of data takes quite some time to compress.
- Getting slave to replication
After ALTER script has executed successfully, we just have to bring back our slave with command and settings for master log file and position from previous step. We also had to add new replication user for new slave on master. Command for bringing up MySQL slave to the position we have data from master we used:
CHANGE MASTER TO MASTER_HOST='master_server', MASTER_USER='slave_user', MASTER_PASSWORD='<some_password>', MASTER_LOG_FILE='mysql-bin.045177', MASTER_LOG_POS=675765524;
Final thoughts
TokuDB seems like a great solution for backuping your huge databases, but it’s not yet production ready. These builds I used were experimental, but for backup they seemed sufficient. You can always get more stable version from tokutek’s website: Link

So what were the results? Do you have the size of the TokuDB database to compare to InnoDB?
Majority of our data compresses extremely well, so we had big benefit using TokuDB. We could fit 700+ GB database on a single 250GB SSD and still have free space availiable. Main downside was, not being able to keep foreign keys. That complicated our recovery procedures and increased disaster recovery time in case we needed it. Other than that, TokuDB showed like a great engine and I already forwarded some of my fellow sysadmins to have a look at it.
Great writeup, is tokudb needed 50% RAM in order to run smoothly?
All databases perform best if they have as much data as possible in RAM, and TokuDB is no exception. It’s hard to talk in %… but I guess so, if that’s your conclusion.
With this tutorial the tables still using BTREE indexes, isn’t it?
How did you drop and recreate the indexes? This is one of the best advantages of TokuDB
I solved the problem of the BTREE indexes with this script: http://www.electricmonk.nl/log/2014/01/12/my_indexr-script-to-drop-and-recreate-mysql-indexes/
Modifying the script removing the ‘USING BTREE’ string.
Thanks for sharing an update! Yes I didn’t include that in my blog post, totally forgot about it.
I don’t understand the use case, does that mean you didn’t have any backup before?
And what about the restore process? That mean you have to restore the TokuDB files instead of the InnoDB ones on your master in case of a failure?
We didn’t have backup before. At least no backup in a way that we could recover from human error, where someone would corrupt data or something like that. TokuDB is being dumped to SQL files. We do not have to restore TokuDB… It’s used only to save space and greatly increase performance, so we can utilize much weaker server than our main database. You must know we have 2TB of data and few billion rows in databases… It’s expensive to have this many SSD’s in place, just to have ability to backup data without disrupting normal operations. You can see here an example: https://blackbird.si/wp-content/uploads/2015/02/schema.jpg , we don’t have to disrupt web interfaces, nor statistics… we simply stop replication to TokuDB and backup whole data into SQL’s. If we lost all data from My002, we would copy it from My001… but if parts of data would get corrupted (human error or whatever), we would recover from backups of TokuDB. (mysqldump on tokudb can take really long, but we don’t care about that, since it’s not impacting performance of My001 or My002…)
recently, i have change my mysql engine, innodb to tokudb. And, i would like to know, what is the impact if i converts all covered indexes to clustered indexes ?