Tuesday, June 12, 2012

HSQL in-memory database

Recently, we were experimenting with the HSQL in-memory database for a particular use case. It was interesting to observe the default behaviour of persisting the database during a shutdown - the entire database is saved as a SQL script file! When the server starts again, it loads the SQL script and replays all the CREATE TABLE and INSERT statements to recreate the database in memory.
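A minimal JDBC sketch of this behaviour (assuming the HSQLDB jar is on the classpath; the file path and table are made up for illustration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HsqlShutdownDemo {
    public static void main(String[] args) throws Exception {
        // A file-mode database: tables default to the MEMORY type,
        // so all data lives in RAM while the engine runs.
        try (Connection con = DriverManager.getConnection(
                "jdbc:hsqldb:file:data/testdb", "SA", "");
             Statement st = con.createStatement()) {
            st.execute("CREATE TABLE person(id INT PRIMARY KEY, name VARCHAR(50))");
            st.execute("INSERT INTO person VALUES (1, 'Alice')");
            // SHUTDOWN persists the whole database as SQL statements
            // in data/testdb.script; the next startup replays them.
            st.execute("SHUTDOWN");
        }
    }
}
```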

Wikipedia gives a good overview of this HSQL feature and compares the default memory tables with cached tables. Snippet:

The default MEMORY type stores all data changes to the disk in the form of a SQL script. During engine start up, these commands are executed and data is reconstructed into the memory. While this behavior is not suitable for very large tables, it provides highly regarded performance benefits and is easy to debug.

Another table type is CACHED, which allows one to store gigabytes of data, at the cost of slower performance. The HSQLDB engine loads them only partially and synchronizes the data to disk on transaction commits.

I was a bit concerned about the viability of all-in-memory tables for large datasets, but it looks like HSQL is actively used in projects where millions of rows are stored in memory. The only real limitation is the Java heap size, which can be configured to be very large on a 64-bit machine.

It is possible to convert memory tables to cached tables. You need to shut down the database first, then edit the .script file and change each "CREATE MEMORY TABLE" line to "CREATE CACHED TABLE".
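Since the .script file is plain SQL text, the conversion itself can be scripted. A stdlib-only sketch (the file path is an assumption for illustration; run it only while the engine is stopped):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MemoryToCached {
    // Rewrites "CREATE MEMORY TABLE" to "CREATE CACHED TABLE" in a
    // *.script file. Must be run after SHUTDOWN, while the engine is stopped.
    public static void convert(Path script) throws IOException {
        String sql = new String(Files.readAllBytes(script), StandardCharsets.UTF_8);
        String converted = sql.replace("CREATE MEMORY TABLE", "CREATE CACHED TABLE");
        Files.write(script, converted.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path script = Paths.get(args.length > 0 ? args[0] : "data/testdb.script");
        if (Files.exists(script)) {
            convert(script);
        }
    }
}
```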

Snippet from the FAQ page:

If only memory tables (CREATE TABLE or CREATE MEMORY TABLE) are used then the database is limited by the memory. A minimum of about 100 bytes plus the actual data size are required for each row. If you use CREATE CACHED TABLE, then the size of the table is not limited by the memory beyond a certain minimum size. The data and indexes of cached tables are saved to disk. With text tables, indexes are memory resident but the data is cached to disk.

The current (2.0) size limit of an HSQLDB database is 16GB (by default) for all CACHED tables and 2GB for each TEXT table. If you use large MEMORY tables, memory is only limited by the allocated JVM memory, which can be several GB on modern machines and 64bit operating systems.

The statements that make up the database are saved in the *.script file (mostly CREATE statements and INSERT statements for memory tables). Only the data of cached tables (CREATE CACHED TABLE) is stored in the *.data file. Also all data manipulation operations are stored in the *.log file (mostly DELETE/INSERT) for crash recovery. When the SHUTDOWN or CHECKPOINT command is issued to a database, then the *.script file is re-created and becomes up-to-date. The .log file is deleted. When the database is restarted, all statements of the *.script file are executed first and new statements are appended to the .log file as the database is used. A popular use of HSQLDB is for OLAP, ETL, and data mining applications where huge Java memory allocations are used to hold millions of rows of data in memory.

One limitation of HSQLDB is that it currently does not support server side cursors. (This allows it to run without any writeable media). This means the result of a query must always fit in memory, otherwise an OutOfMemory error occurs. In the rare situation that huge result sets must be processed, the following workaround can be used: Limit the ResultSet using Statement.setMaxRows(1024), and select multiple 'smaller' blocks. If the table is for example 'CREATE TABLE Test(Id INT IDENTITY PRIMARY KEY, Name VARCHAR)' then the first block can be selected using 'SELECT * FROM Test'. The biggest ID should be recorded and the next block should be selected using 'SELECT * FROM Test WHERE Id>(biggest_id)' until no more records are returned. Don't forget to switch off the limit using setMaxRows(0).
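The FAQ's workaround looks roughly like this in JDBC. This is a hedged sketch: the connection URL and the Test table come from the FAQ's example, the HSQLDB driver is assumed to be on the classpath, and I have added an ORDER BY so the "biggest Id" bookkeeping is deterministic:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BlockwiseSelect {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:hsqldb:mem:testdb", "SA", "");
             Statement st = con.createStatement()) {
            st.setMaxRows(1024);              // cap each block at 1024 rows
            int biggestId = -1;
            boolean more = true;
            while (more) {
                more = false;
                try (ResultSet rs = st.executeQuery(
                        "SELECT Id, Name FROM Test WHERE Id > " + biggestId + " ORDER BY Id")) {
                    while (rs.next()) {
                        more = true;
                        biggestId = rs.getInt("Id");   // remember the largest Id seen
                        process(rs.getString("Name"));
                    }
                }
            }
            st.setMaxRows(0);                 // switch the limit back off
        }
    }

    static void process(String name) { /* handle one row */ }
}
```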

Monday, June 11, 2012

Dynamic reports using Jasper

Today, I spent a good hour brainstorming the design for building dynamic reports in Jasper. A lot of customers demand the ability to create dynamic reports at run-time; i.e. choose the number of columns, sorting order, group-by, etc.

Now as we know in Jasper, the first thing that needs to be done is to create the *.jrxml file. Typically for static reports, this is done through the report designer. But for dynamic reports, we need to create or modify a part of this jrxml file at run-time based on the user's input. For doing this, we have the following options.

1) Template Engine: Here we would have a base jrxml file and then manipulate it using a template engine such as 'Velocity' or 'FreeMarker'. The trick would be to have all possible mark-up in the base template and then remove sections as required. The drawback is that this approach works only for simple dynamic requirements, such as adding/removing columns. If we need to add dynamic groups/sub-totals, it can become cumbersome.

2) JasperDesign Object: A JasperDesign object is a run-time in-memory representation of the jrxml file. We can have a base jrxml, load it into a JasperDesign object and then manipulate that object. A good example explaining this is available here. This is a good trade-off, as you keep the basic 'layout logic' in the jrxml template and manipulate only the dynamic part of it using the JasperDesign API.
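A hedged sketch of this option - the template path, field name, and element coordinates below are made up for illustration, and the calls reflect the JasperReports API of that era:

```java
import net.sf.jasperreports.engine.JasperCompileManager;
import net.sf.jasperreports.engine.JasperReport;
import net.sf.jasperreports.engine.design.JRDesignBand;
import net.sf.jasperreports.engine.design.JRDesignExpression;
import net.sf.jasperreports.engine.design.JRDesignField;
import net.sf.jasperreports.engine.design.JRDesignTextField;
import net.sf.jasperreports.engine.design.JasperDesign;
import net.sf.jasperreports.engine.xml.JRXmlLoader;

public class DynamicColumn {
    public static JasperReport addColumn(String jrxmlPath, String fieldName) throws Exception {
        // Load the base template into its in-memory representation
        JasperDesign design = JRXmlLoader.load(jrxmlPath);

        // Declare a new field that the data source will supply
        JRDesignField field = new JRDesignField();
        field.setName(fieldName);
        field.setValueClass(String.class);
        design.addField(field);

        // Drop a text element bound to that field into the detail band
        JRDesignTextField text = new JRDesignTextField();
        text.setX(0);
        text.setY(0);
        text.setWidth(100);
        text.setHeight(20);
        text.setExpression(new JRDesignExpression("$F{" + fieldName + "}"));
        JRDesignBand detail = (JRDesignBand) design.getDetailSection().getBands()[0];
        detail.addElement(text);

        // Compile the modified design as usual
        return JasperCompileManager.compileReport(design);
    }
}
```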

Though it is possible to create the entire 'layout logic' through code, it would result in a maintenance nightmare for any small change in the future! The Jasper library also has an example that creates the full jrxml file from scratch.

3) DynamicJasper Library: An alternative approach is to use a far simpler library such as DynamicJasper to create the report template from 100% pure Java code. This API is higher-level than the low-level JasperDesign API, as it makes many default layout assumptions that should suffice for 99% of use-cases.
Also, you can use a base template *.jrxml file in which common styles, the company logo, a watermark, etc. can be pre-defined. It also supports a "Style library" from jrxml files.
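A sketch of what this looks like with DynamicJasper's FastReportBuilder (the column names, titles, and data source are assumptions for illustration):

```java
import ar.com.fdvs.dj.core.DynamicJasperHelper;
import ar.com.fdvs.dj.core.layout.ClassicLayoutManager;
import ar.com.fdvs.dj.domain.DynamicReport;
import ar.com.fdvs.dj.domain.builders.FastReportBuilder;
import net.sf.jasperreports.engine.JRDataSource;
import net.sf.jasperreports.engine.JasperPrint;

public class DynamicJasperDemo {
    public static JasperPrint build(JRDataSource data) throws Exception {
        // The builder supplies sensible layout defaults; no jrxml editing needed
        DynamicReport report = new FastReportBuilder()
                .addColumn("Name", "name", String.class.getName(), 50)
                .addColumn("Amount", "amount", Double.class.getName(), 30)
                .setTitle("Sales Report")
                .setUseFullPageWidth(true)
                .build();
        // DynamicJasper generates the JasperReports design under the hood
        return DynamicJasperHelper.generateJasperPrint(report, new ClassicLayoutManager(), data);
    }
}
```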

Tuesday, June 05, 2012

Validations - on client side or server side or both?

A few years back, many developers spent a lot of time coding validation rules for web forms - both on the client side as well as the server side. This was very tedious, and a few lazy developers would write only JavaScript validation and skip the server-side validation code, thus exposing a serious security flaw in the application.
Good design warrants us to apply the principle of 'defense in depth'.

But today, most of the web-based MVC frameworks have OOTB support for validations - both on the client side and server side; with minimal coding. The basic design concept is to annotate your domain objects with validation constraints and then let the framework create the JS code for client side validation and use the framework interceptors for server side validation.

Struts-2 is a popular Java web MVC framework that supports this feature. In fact, there is a JSR specification on the usage of annotations for bean validation, called JSR 303. Struts-2 also has a plug-in for OVal that implements JSR 303.
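For illustration, a minimal JSR-303 annotated bean with a programmatic validation call (this assumes a Bean Validation implementation such as Hibernate Validator on the classpath; the class and field names are made up):

```java
import java.util.Set;
import javax.validation.ConstraintViolation;
import javax.validation.Validation;
import javax.validation.Validator;
import javax.validation.constraints.Min;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;

public class UserForm {
    // The same declarative constraints can drive both the generated
    // client-side JS and the server-side interceptor checks.
    @NotNull
    @Size(min = 2, max = 50)
    private String name;

    @Min(18)
    private int age;

    public UserForm(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public static void main(String[] args) {
        Validator validator = Validation.buildDefaultValidatorFactory().getValidator();
        // "A" is too short and 15 is below the minimum age
        Set<ConstraintViolation<UserForm>> errors = validator.validate(new UserForm("A", 15));
        for (ConstraintViolation<UserForm> v : errors) {
            System.out.println(v.getPropertyPath() + ": " + v.getMessage());
        }
    }
}
```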

In the .NET world, ASP.NET MVC framework also supports this feature of annotation-based validations.

Object Model or Data Model - What comes first?

Throughout my career, I have asked myself this question multiple times - what to model first? The database entities or the object model? Do we start with ER diagrams (conceptual, logical) or UML class diagrams?

Well, the answer is - it depends. Many times, it also depends on the culture of the organization. In many organizations, the 'data' teams are stronger and more influential and insist on ER modeling first. It is possible to run both these work-streams in parallel, as there is a lot of conceptual commonality during the initial domain modeling. What bugs me are the differences in the semantics of UML and ER; it does not make sense to do both at the conceptual level.

Now over the past few years, there have been a plethora of ORM tools that bridge the object-relational impedance mismatch. These ORM tools have a host of features that allow us to just work with the object model and abstract away the creation of the data model.
For example, Entity Framework 4.0 has full support for the 'code-first' approach that is detailed in this article.
I simply loved the 'convention-over-configuration' approach in EF 4.0. These features enable us to work only with the object model and not worry about the data schema at all. Such features would suffice for 70-80% of business cases, I believe.

A snippet from the article will give the reader an idea of the abstraction that is provided -

In addition to supporting a designer-based development workflow, EF4 also enables a more code-centric option which we call “code first development”. Code-First Development enables a pretty sweet development workflow. It enables you to: 
  • Develop without ever having to open a designer or define an XML mapping file 
  • Define your model objects by simply writing “plain old classes” with no base classes required 
  • Use a “convention over configuration” approach that enables database persistence without explicitly configuring anything 
  • Optionally override the convention-based persistence and use a fluent code API to fully customize the persistence mapping

What exactly is Domain Driven Design (DDD)?

We have been using many of the principles and patterns of DDD over the past many years. During domain modeling we have often used the concepts of bounded contexts, entity objects, value objects, aggregates, the repository pattern, etc.

But based on my humble experience, I think DDD is much more than the usage of these patterns. DDD is a "thought-process" - the way you think about the problem domain, the way you interact with the domain experts & business stakeholders and the way you articulate the technology realization of the business need. This in a nutshell is the greatest boon of following DDD. The business and IT speak the same 'ubiquitous' language and this in turn bridges the "Business-IT gap" :)

That's the reason the famous DDD book by Eric Evans states that DDD tackles Complexity in the Heart of Software. Mapping your software model as closely to the real-life domain as possible helps us manage the complexity of our design solutions.

The Microsoft Spain team has some pretty good documentation on this philosophy of DDD, which is available for download here. Also, a neat ASP.NET example of an n-tiered DDD application is available for download.