Tuesday, June 12, 2012

HSQL in-memory database

Recently, we were experimenting with the HSQL in-memory database for a particular usecase. It was interesting to observe the default behaviour of persisting the database during a shutdown - The entire database was saved as a SQL script file ! When the server starts again, it loads the SQL script and fires all the CREATE TABLES and INSERT statements to recreate the database in memory.

Wikipedia gives a good overview of this HSQL feature and compares the default memory tables with cached tables. Snippet:

The default MEMORY type stores all data changes to the disk in the form of a SQL script. During engine start up, these commands are executed and data is reconstructed into the memory. While this behavior is not suitable for very large tables, it provides highly regarded performance benefits and is easy to debug.

Another table type is CACHED, which allows one to store gigabytes of data, at the cost of the slower performance. HSQLDB engine loads them only partially and synchronizes the data to the disk on transaction commits. 

I was a bit concerned about the viability of all-in-memory tables for large datasets, but it looks like HSQL is being actively used in projects where millions of rows are stored in memory. The only limitation is that of the Java Heap size that can be configured to be very large on a 64-bit machine.

It is possible to convert from memory tables to cached tables. You need to shutdown the database first. Then edit the .script file and modify the line "CREATE MEMORY TABLE" to "CREATE CACHED TABLE".

Snippet from the FAQ page:

If only memory tables (CREATE TABLE or CREATE MEMORY TABLE) are used then the database is limited by the memory. A minimum of about 100 bytes plus the actual data size are required for each row. If you use CREATE CACHED TABLE, then the size of the table is not limited by the memory beyond a certain minimum size. The data and indexes of cached tables are saved to disk. With text tables, indexes are memory resident but the data is cached to disk.

The current (2.0) size limit of an HSQLDB database is 16GB (by default) for all CACHED tables and 2GB for each TEXT table. If you use large MEMORY tables, memory is only limited by the allocated JVM memory, which can be several GB on modern machines and 64bit operating systems.

The statements that make up the database are saved in the *.script file (mostly CREATE statements and INSERT statements for memory tables). Only the data of cached tables (CREATE CACHED TABLE) is stored in the *.data file. Also all data manipulation operations are stored in the *.log file (mostly DELETE/INSERT) for crash recovery. When the SHUTDOWN or CHECKPOINT command is issued to a database, then the *.script file is re-created and becomes up-to-date. The .log file is deleted. When the database is restarted, all statements of the *.script file are executed first and new statements are appended to the .log file as the database is used. A popular use of HSQLDB is for OLAP, ETL, and data mining applications where huge Java memory allocations are used to hold millions of rows of data in memory.

One limitation of HSQLDB is that it currently does not support server side cursors. (This allows it to run without any writeable media). This means the result of a query must always fit in memory, otherwise an OutOfMemory error occurs. In the rare situation that a huge resultsets must be processed, then the following workaround can be used: Limit the ResultSet using Statement.setMaxRows(1024), and select multiple 'smaller' blocks. If the table is for example 'CREATE TABLE Test(Id INT IDENTITY PRIMARY KEY, Name VARCHAR)' then the first block can be selected using 'SELECT * FROM Test'. The biggest ID should be recorded and the next block should be selected using 'SELECT * FROM Test WHERE Id>(biggest_id)' until no more records are returned. Don't forget to switch off the limit using setMaxRows(0).