Thursday, September 06, 2012

Ruminating on RACI matrix

The RACI matrix (Responsible, Accountable, Consulted, Informed) is an excellent tool for mapping between processes/functions and roles.
It brings immediate clarity on who is Responsible for something, who is Accountable for the final result, who needs to be Consulted and who needs to be kept Informed.

There are several templates available on the net that can be used for the RACI matrix. One good template is available here. Once data is collated in such a matrix, we can analyse questions such as -

a) Are there too many A's? Usually it's best to have only 1 person/role accountable for a process/function.
b) Are there too many I's? Too much information that is not needed? etc.

We can use RACI matrices anywhere. You could "RACI" any deliverable in your project. In EA governance, for example, we can use RACI to clearly separate the responsibilities between the centralized EA team and the project architects. Given below is an example of an EA Governance RACI matrix.





Friday, August 31, 2012

Centralized Clustered ESB vs Distributed ESB

Often folks ask me about the difference between traditional EAI (Enterprise Application Integration) products and an ESB (Enterprise Service Bus). Traditionally, EAI products followed the hub-and-spoke model, and when you look at the topology options of many ESBs, you will see that the hub-and-spoke model is still followed!

The fact that almost all EAI product vendors have metamorphosed their old products into 'ESBs' adds to the confusion. For example, the popular IBM Message Broker product line is now branded as an 'Advanced ESB', and Microsoft has released an ESB Guidance package that depends on the BizTalk platform.

In theory, there is a difference between the EAI (hub-and-spoke) architecture and the ESB (distributed bus) architecture. In a hub-and-spoke model, the hub 'broker' becomes a single point of failure, as all traffic is routed through it. This drawback can be addressed by clustering brokers for high availability.

In a distributed ESB, there is a network of brokers that collaborate to form a messaging fabric. So in essence, there is a small, lightweight ESB engine that runs on every node where a SOA component is deployed. This lightweight ESB engine does the message transformation and routing.

For example, in Apache ServiceMix you can create a network of message brokers (ActiveMQ). Multiple instances of ServiceMix can be networked together using ActiveMQ brokers, and the client application sees them as one logical Normalized Message Router (NMR). But here again, the configuration info (e.g. routing information, service names, etc.) is centralized somewhere.
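
To make the 'network of brokers' idea more concrete, here is a minimal sketch (not from the ServiceMix documentation - the broker names, hosts and ports are made up) of how an embedded ActiveMQ broker on one node could be networked to a peer broker using the ActiveMQ Java API:
-------------------------------------------------
import org.apache.activemq.broker.BrokerService;

public class BrokerNetworkDemo {
    public static void main(String[] args) throws Exception {
        // Embedded broker for this node; 'node-A' and 'node-B' are hypothetical broker/host names
        BrokerService broker = new BrokerService();
        broker.setBrokerName("node-A");
        broker.setPersistent(false);
        // Accept connections from local SOA components
        broker.addConnector("tcp://0.0.0.0:61616");
        // Forward/receive messages to and from the peer broker, forming the messaging fabric
        broker.addNetworkConnector("static:(tcp://node-B:61616)");
        broker.start();
    }
}
-------------------------------------------------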


So then what is the fundamental difference between hub-and-spoke and ESB? IBM did a good job of clearing this confusion by bringing in 2 concepts/terms - "centralization of control config" and "distribution of infrastructure". A good blog from IBM explaining these concepts is here. Loved the diagram flow on that post. Point made :)

Snippet from IBM site:
"Both ESB and hub-and-spoke solutions centralize control of configuration, such as the routing of service interactions, the naming of services, etc. 
Similarly, both solutions might deploy in a simple centralized infrastructure, or in a more sophisticated, distributed manner. In fact, if the Hub-and-spoke model has these features it is essentially an ESB."

As explained earlier, some open-source ESBs such as Apache ServiceMix and Petals ESB are lightweight and have the core ESB engine (aka service engine) deployed on each node. These products call themselves "distributed ESBs". Other vendors, such as IBM, use the concept of "federated ESBs" for distributed topologies across ESB domains.

Wednesday, August 29, 2012

Translate Chinese unicode code points to English

One of our legacy applications had its UI in Chinese, and we needed to convert it to English.
Instead of hiring a translator, we decided to use Google Translation Services.

But the application was picking up the Chinese labels/messages from a properties file. The properties file had the Chinese characters expressed as Unicode code points. The Google Translate webpage expects Chinese characters to be typed or copy-pasted into the form. We searched for a similar translation service that would accept Unicode code points, but in vain.

Finally, we decided to write a simple program that would write the Chinese Unicode code points to a file and then open the file using a program such as Notepad++ or MS Word. These programs support Chinese characters and allow you to copy-paste them onto the Google Translate page.

Given below is the simple Java code snippet to write to a file. Please open this file using MS Word (or any other program that supports UTF-8 font rendering).
-------------------------------------------------
import java.io.File;
import com.google.common.io.Files;

public class Chinese_Chars {
    public static void main(String[] args) throws Exception {
        // Unicode code points (as found in the properties file) for two Chinese characters
        String str = "\u6587\u4EF6";
        // Encode the characters as UTF-8 bytes
        byte[] array = str.getBytes("UTF-8");

        // Write the bytes to a file using Guava's Files utility
        File file = new File("d:/temp.txt");
        Files.write(array, file);
    }
}
}
---------------------------------------------------

Shown below are some screenshots of the Google Translate page and of MS Word opening the file.



Tuesday, August 21, 2012

Understanding REST

In one of my old posts, I had elaborated on the differences between REST and simple POX/HTTP.

Recently I came across an interesting post by Ryan Tomayko, wherein he tries to explain REST in a simple narrative style. A must read :)    http://tomayko.com/writings/rest-to-my-wife

Another interesting discussion thread on REST - regarding verbs and error codes - is available on StackOverflow.

File Upload Security - Restrict file types

In web applications, we often have to restrict the file types that can be uploaded to the server. One way to restrict them is by checking the file extension. But what if someone changes the file extension and tries to upload the file anyway?

For common file types such as GIF, PDF and JPEG, we can check the contents of the file for a "signature" or "magic number". More information is given in this blog post - http://hrycan.com/2010/06/01/magic-numbers/
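
As a rough illustration (signatures are shown only for PDF, GIF and JPEG; a real implementation should cover more types and corner cases), a simple signature check could look like this:
-------------------------------------------------
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class FileTypeChecker {

    // Well-known signatures: "%PDF" for PDF, "GIF8" for GIF, FF D8 FF for JPEG
    private static final byte[] PDF_MAGIC = {0x25, 0x50, 0x44, 0x46};
    private static final byte[] GIF_MAGIC = {0x47, 0x49, 0x46, 0x38};
    private static final byte[] JPEG_MAGIC = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    // Returns true if the stream begins with the given magic number
    public static boolean startsWith(InputStream in, byte[] magic) throws IOException {
        byte[] header = new byte[magic.length];
        int read = in.read(header);
        return read == magic.length && Arrays.equals(header, magic);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new FileInputStream(args[0]);
        try {
            System.out.println("Looks like a PDF? " + startsWith(in, PDF_MAGIC));
        } finally {
            in.close();
        }
    }
}
-------------------------------------------------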

The Apache Tika project can also be used to quickly extract metadata (including the content type) from a file stream.
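
For instance, a minimal sketch using the Tika facade class (the file path is taken from the command line here just for illustration):
-------------------------------------------------
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.tika.Tika;

public class TikaDetectDemo {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        try (InputStream stream = new FileInputStream(args[0])) {
            // Detects the MIME type by sniffing the content, not by trusting the file extension
            String mimeType = tika.detect(stream);
            System.out.println("Detected type: " + mimeType);
        }
    }
}
-------------------------------------------------
The detected MIME type can then be compared against a whitelist of allowed types before accepting the upload.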

List of names...

Very often, we quickly need to populate a sample database with a list of names. In the past, we often did this by using random numbers appended to some common names.

But found this cool site on the web that gives us thousands of sample names that can be used to populate our databases for demo purposes.

http://www.listofnames.info/

Monday, August 13, 2012

Ruminating on JSF

In the past, I always hated JSF, the same way I hated EJB 2.x. But of late, I am seeing a renewed interest in JSF, especially since a lot of pain areas were resolved in the JSF 2.0 specification.

Over the last couple of days, I have been evaluating PrimeFaces - an open-source JSF 2.0 component library - and I would say that I am pretty impressed. Going through the sample demo pages, I was mighty pleased with the neat and clean code in both the XHTML files and the POJO bean Java code.

Also PrimeFaces has a plethora of components that should suffice for 90-95% of all basic web application requirements.

In general, if you do not require heavy UI customization and can hence sacrifice absolute control over the generated HTML, CSS and JS, then I would say that using PrimeFaces would greatly increase the productivity of an average development team. IMHO, the productivity gain could be as high as 50% over doing the conventional plumbing using MVC frameworks and JQuery.

But if there is a special requirement that cannot be met by the standard UI components provided by PrimeFaces, then you are in trouble. You would then need deeper expertise to write your own JSF component or customize existing ones.

Based on my study and the common challenges we faced, I am jotting down some FAQs that should be useful for folks embarking on PrimeFaces.

Q) How to manipulate JSF components on the client side using plain JS or JQuery? How to use JQuery JS API or any other JS library on the client side with JSF?

Q) How to include JS file or inline JS code in a JSF XHTML page?
A) There are 3 ways to handle this.
  • Escape all special chars like 'greater than' or 'lesser than'
  • Use <![CDATA[ ... ]]> to hold your JavaScript code
  • Put the JavaScript code in a separate .js file and include it in the JSF page using <h:outputScript>
http://www.mkyong.com/jsf2/how-to-include-javascript-file-in-jsf/
Q) How do I output HTML text in JSF? Do I need to use the 'verbatim' tag?
A) <h:outputText escape="false" value="#{daBean.markedUpString}"></h:outputText>

Q) Can I mix HTML tags with JSF tags?
A) You can. It is not as much of a pain as it was in JSF 1.x, but you need to be aware of the issues.
http://stackoverflow.com/questions/5474178/jsf-facelets-why-is-it-not-a-good-idea-to-mix-jsf-facelets-with-html-tags

Friday, August 03, 2012

SOA interoperability - .NET WCF from Java

Recently, one of our teams was struggling to integrate a Java application with a .NET WCF service. The exception thrown on the Java side was a SOAPFault, as shown below:

SOAPFaultException: The message with Action 'https://tempService:48493/VirtualTechnician/AcceptVehicleTermsOfService' cannot be processed at the receiver, due to a ContractFilter mismatch at the EndpointDispatcher. This may be because of either a contract mismatch (mismatched Actions between sender and receiver) or a binding/security mismatch between the sender and the receiver. Check that sender and receiver have the same contract and the same binding (including security requirements, e.g. Message, Transport, None). 

After a lot of debugging using SOAPUI and WireShark, we found that the problem was not in the SOAP message, but in the HTTP headers. The SOAPAction HTTP header needs to be set in the HTTP POST request.

With JAX-WS, it can be done with the following code snippet:

// 'smDispatch' is the JAX-WS Dispatch/port proxy obtained from the Service
BindingProvider bp = (BindingProvider) smDispatch;
bp.getRequestContext().put(BindingProvider.SESSION_MAINTAIN_PROPERTY, Boolean.TRUE);
// Ask JAX-WS to send the SOAPAction HTTP header with the value given below
bp.getRequestContext().put(BindingProvider.SOAPACTION_USE_PROPERTY, Boolean.TRUE);
bp.getRequestContext().put(BindingProvider.SOAPACTION_URI_PROPERTY,
        "http://tempuri.org/IVtOwl/AcceptVehicleTermsOfService"); // the correct SOAP Action

Custom PMD rules using XPath

Writing custom rules in PMD using XPath is an exciting concept, but unfortunately there are not many good tutorials or reference guides available on the internet for this.

Recently, we wanted to write custom PMD rules to extract Spring JDBC calls from the code base. We utilized the PMD designer that is provided OOTB in Eclipse to easily write the rules.

Just open Eclipse -> Preferences -> PMD -> Rule Designer.
In the Rule Designer, copy-paste your source code and check the AST (Abstract Syntax Tree) that is formed. You can also copy the AST XML from the menu bar and paste it into a text editor. Writing the XPath expression then becomes very simple!

For example, for finding the Spring JDBC query calls, the XPath was:
//PrimaryPrefix[Name[starts-with(@Image,'jdbcTemplate.query')]]

Wednesday, August 01, 2012

Ruminating on "Trickle Batch Load".

Traditionally, most batch jobs were run at end-of-day, where the entire day's transaction log was pulled out and processed in some way. In the good old days, when the volume of data was low, these batch processes could easily meet their business SLAs. Even if there was a failure, there was sufficient time to correct the data, rerun the process and still meet the SLAs.

But today, the volume of data has grown exponentially. Across many of our customers, we have seen challenges around meeting SLAs due to large data volumes. Also, business stakeholders have become more demanding and want to see sales reports and spot trends early - many times during the day. To tackle these challenges, we have to resort to the 'trickle batch' load design pattern.

In trickle batch, small delta changes from the source systems are sent for processing multiple times a day. There are various advantages to such a design -
  • The business gets real-time access to critical information. For e.g. sales per hour.
  • Business SLA's can be easily met, as EOD processes are no longer a bottleneck. 
  • Operational benefits include low network latency and reduced CPU/resource utilization.
  • Early detection of issues - network issues, file corruption, etc. 
The typical design strategy used to implement "trickle batch" is to use the CDC (Change Data Capture) capabilities of databases and ETL tools.
Today almost all ETL tools such as IBM InfoSphere DataStage, Informatica, Oracle Data Integrator, etc. have in-built CDC capabilities. 
Trickle batches typically feed an ODS (Operational Data Store) that can serve transactional reports. Data from the ODS is then fed to a DW appliance for MIS reporting.
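
Real implementations rely on the log-based CDC features of the database or ETL tool, but just to illustrate the idea, a very simplistic timestamp-based delta extract (the table and column names are made up) could look like this:
-------------------------------------------------
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class DeltaExtractor {
    // Poor man's CDC: pull only the rows modified since the last trickle run
    public void extractDelta(Connection con, Timestamp lastRunTime) throws Exception {
        PreparedStatement ps = con.prepareStatement(
                "SELECT order_id, amount, last_modified FROM sales_order WHERE last_modified > ?");
        ps.setTimestamp(1, lastRunTime);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // push each delta record to the ODS / staging area
            System.out.println(rs.getString("order_id") + " : " + rs.getBigDecimal("amount"));
        }
        rs.close();
        ps.close();
    }
}
-------------------------------------------------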

Tuesday, July 31, 2012

Ruminating on Session Replication

In any web application cluster, we need to configure session replication for failover and high availability. While configuring session replication, the following common queries often come to mind. I have attempted to answer these questions in a product-agnostic way.

Q) Is the session replicated synchronously or asynchronously?
A) Session replication can occur either synchronously or asynchronously. In synchronous replication, the request does not return until the session has been replicated across all members of the cluster. This obviously has performance implications. In asynchronous replication, the response is returned and the session data is queued to be replicated across the cluster nodes. Typically, asynchronous replication is the default mode on most app servers and is configured along with "session affinity".

Q) Is the entire session replicated or only the delta of what has changed?
A) It could be quite difficult to keep track of all session data modifications accurately. For example, someone could just obtain a reference to an object in the session and change its properties, without calling setAttribute() again. Hence, app servers typically replicate the entire session object for every request.
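
As a corollary, if your app server is configured to replicate only changed attributes, the safe habit is to re-bind the attribute after mutating it. A small sketch (ShoppingCart is a hypothetical session object):
-------------------------------------------------
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class CartUpdater {
    public void addItem(HttpServletRequest request, String sku) {
        HttpSession session = request.getSession();
        ShoppingCart cart = (ShoppingCart) session.getAttribute("cart"); // hypothetical class
        cart.addItem(sku);
        // Re-bind the attribute: containers doing attribute-level (delta) replication
        // typically treat setAttribute() as the signal that the session has changed.
        session.setAttribute("cart", cart);
    }
}
-------------------------------------------------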

Q) What are the different topology options for configuring the memory-to-memory session replication?
A) For small clusters, we can set up all-to-all peer replication - i.e. the session is replicated across all the nodes of the cluster. For large clusters, we can set up "replication groups" or "buddy groups" that are essentially a group of nodes that would replicate session data between themselves. In some environments such as WebSphere Extreme Scale, one can configure dedicated JVMs in a grid for storing sessions.

Monday, July 30, 2012

Exploring Google Guava

I am pretty impressed with the simplicity of the Google Guava API. We have been using Apache Commons for many years, but Guava has good support for generics and is hence a better choice for new Java development.

For example, to read a text file as a list of strings, you need just 2 lines of code.

//read a file from the classpath, as we want to avoid absolute paths..
File file = new File(getClass().getResource("/com/company/project/test.txt").getFile());
//pass a proper charset to ensure proper encoding
List<String> lines = Files.readLines(file, Charsets.UTF_8);
  

Another alternate method to read the entire content of the file as a string -
public static String toString(File file,Charset charset)


To read a binary file into a byte array, use the following utility method -
public static byte[] toByteArray(File file)

Similar simple methods exist for writing files too!
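
For example, a small sketch (the file path is made up) of writing and appending text with an explicit charset:
-------------------------------------------------
import java.io.File;

import com.google.common.base.Charsets;
import com.google.common.io.Files;

public class GuavaWriteDemo {
    public static void main(String[] args) throws Exception {
        File file = new File("d:/guava-demo.txt");
        // Overwrites the file with the given content
        Files.write("Hello Guava!", file, Charsets.UTF_8);
        // Appending is just as simple
        Files.append("\nAnother line", file, Charsets.UTF_8);
    }
}
-------------------------------------------------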

Thursday, July 26, 2012

Some interesting code..to tickle your brain cells

Found this code on the internet :) ... It took me some time to debug. Please copy-paste the code below into Eclipse and start analysing why this happens.

public class TimePass {
    public static void main(String[] args) {
        if ( false == true ) { //these characters are ignored?: \u000a\u007d\u007b
            System.out.println("false is true!");
        }
    }
}

Hint: linefeed and curly braces :)

Using Netty - Lessons learned

We were building an ISO-8583-style card transaction simulator using the powerful Netty framework. The simplicity of the Netty design is a great wow factor. Also, most of the complexities of NIO are abstracted by the framework through concepts such as Channel, ChannelBuffer, ChannelEvent, Decoders/Encoders, etc.

While we were using the LengthFieldBasedFrameDecoder class, we faced an intriguing problem. The socket client was sending the length of the record in the first 2 bytes, e.g. "11abcdefghijk".
But strangely, the length derived by the BigEndianHeapChannelBuffer was something different.
To get to the bottom of this, we enabled debugging and saw the raw buffer byte array that reached the server.

The buffer array for "11abcdefghijk" was [49, 49, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107].
The first 2 bytes contained the character code (49) of the character '1'. In other words, the length field had also been encoded from characters into bytes using the default charset, so the decoder interpreted those 2 bytes as a very large binary number. Obviously, the LengthFieldBasedFrameDecoder failed to decode the message.

To get around this problem, we had to send the first 2 bytes without any character encoding. We achieved this using Unicode escape sequences for the length field - i.e. \u0000 and \u000b, essentially 0 and 11.
The buffer array for "\u0000\u000babcdefghijk" was [0, 11, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107].

Now the Netty decoder would work. But this approach has a serious flaw for 2 length values - \u000A and \u000D. These correspond to \n (linefeed) and \r (carriage return), and such source code would not even compile, as Unicode escapes are pre-processed before the compiler parses the code.

Hence the best approach is to write the length field to the socket stream without any charset encoding (write raw bytes) and then write the record as a string.
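
If you control the client, a minimal sketch of this approach using the Netty 3 API could look like the snippet below (with a 2-byte binary length prefix, the server side can then simply use the standard LengthFieldBasedFrameDecoder):
-------------------------------------------------
import org.jboss.netty.buffer.ChannelBuffer;
import org.jboss.netty.buffer.ChannelBuffers;
import org.jboss.netty.channel.Channel;
import org.jboss.netty.util.CharsetUtil;

public class RecordWriter {
    // Writes the length as 2 raw (binary) bytes, followed by the record payload
    public void writeRecord(Channel channel, String record) {
        byte[] payload = record.getBytes(CharsetUtil.ISO_8859_1);
        ChannelBuffer buffer = ChannelBuffers.dynamicBuffer();
        buffer.writeShort(payload.length); // raw 2-byte length, no charset encoding
        buffer.writeBytes(payload);
        channel.write(buffer);
    }
}
-------------------------------------------------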

An alternate workaround is to write your own decoder, which is also very simple.

Given below is a sample decoder that extracts the first 2 characters as the length field and returns the decoded frame upstream.

-----------------------------------------------------------------
import java.util.logging.Logger;

import org.jboss.netty.buffer.ChannelBuffer;
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.handler.codec.frame.FrameDecoder;
import org.jboss.netty.util.CharsetUtil;

public class NarenLengthFieldDecoder extends FrameDecoder {

    private static final Logger logger = Logger.getLogger(NarenLengthFieldDecoder.class.getName());

    public int lengthFieldLength = 2; // some default value

    public NarenLengthFieldDecoder(int length) {
        this.lengthFieldLength = length;
    }

    @Override
    protected Object decode(ChannelHandlerContext ctx, Channel channel, ChannelBuffer buffer) throws Exception {
        // wait until the length prefix is available.
        if (buffer.readableBytes() < lengthFieldLength) {
            // return null to inform frame decoder that frame is not yet
            // complete and to continue reading data
            return null;
        }

        // Mark the current buffer position BEFORE reading the length field,
        // because the whole frame might not be in the buffer yet.
        // We will reset the buffer position to this mark if there are not
        // enough bytes in the buffer, so that the length field is re-read
        // on the next invocation.
        buffer.markReaderIndex();

        // read the length field...create a byteArray for the length field and copy
        // the first bytes into it..
        byte[] array = new byte[lengthFieldLength];
        buffer.readBytes(array); // Imp: readBytes also forwards the readerIndex

        int dataLength = getLength(array);// length of the record

        // wait until the whole data is available.
        if (buffer.readableBytes() < dataLength) {
            // The whole bytes were not received yet - return null.
            // This method will be invoked again when more packets are received
            // and appended to the buffer.

            // Reset to the marked position to read the length field again next
            // time.
            buffer.resetReaderIndex();
            return null;
        }

        // forward remaining buffer to higher up handlers
         // There's enough bytes in the buffer. Read it.
         ChannelBuffer frame = buffer.readBytes(dataLength);
         // Successfully decoded a frame.  Return the decoded frame.
         return frame;
    }

    /**
     * Parses the length of the record from the first few bytes (the length
     * prefix characters) of the frame.
     * 
     * @param array
     *            The byte array containing the length
     * @return An integer representing the length of the record
     */
    public int getLength(byte[] array) {
        String temp = new String(array, CharsetUtil.ISO_8859_1);

        int length = 0;
        try {
            length = Integer.parseInt(temp);
        } catch (Exception ex) {
            logger.info("Could not parse the length field of the record >>>" + temp);
        }
        return length;
    }
}
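
For completeness, here is a rough sketch of how such a decoder could be wired into a Netty 3 server pipeline (RecordHandler is a hypothetical business handler and the port number is made up):
-------------------------------------------------
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.ChannelPipelineFactory;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;
import org.jboss.netty.handler.codec.string.StringDecoder;
import org.jboss.netty.util.CharsetUtil;

public class SimulatorServer {
    public static void main(String[] args) {
        ServerBootstrap bootstrap = new ServerBootstrap(
                new NioServerSocketChannelFactory(
                        Executors.newCachedThreadPool(), Executors.newCachedThreadPool()));

        bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
            public ChannelPipeline getPipeline() throws Exception {
                ChannelPipeline pipeline = Channels.pipeline();
                // Frame the stream using our custom length-field decoder (2-char length prefix)
                pipeline.addLast("frameDecoder", new NarenLengthFieldDecoder(2));
                // Convert the framed bytes into a String for the business handler
                pipeline.addLast("stringDecoder", new StringDecoder(CharsetUtil.ISO_8859_1));
                // 'RecordHandler' is a hypothetical handler that processes the decoded record
                pipeline.addLast("handler", new RecordHandler());
                return pipeline;
            }
        });

        bootstrap.bind(new InetSocketAddress(9090));
    }
}
-------------------------------------------------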

Tuesday, July 10, 2012

UIaaS - UI as a Service

Some time back, Salesforce popularized the term UIaaS (UI as a Service) when they launched VisualForce.

UIaaS essentially means the ability to create new user interfaces using pre-built components. UI components could be pages, controls, static resources, etc. So the concept is to not start from scratch, but use off-the-shelf UI widgets. Typically UI designers are web-based and allow for in-browser UI design.

The underlying technologies for creating UIaaS are –
Server Side: JSF, Portlets, .NET WebParts
Client Side: Dojo controls, JQuery controls, Ext-JS controls, etc.

IMHO, the term is just old wine in a new bottle. We used to have the concept of a UI Widget Factory, which encompassed creating reusable widgets and storing them (with meta-data) in a repository. Application developers would then pick and choose their widgets and design new pages. The reusable widgets could be technical widgets such as a "tab-bar", "menu-bar" or "calendar", or business widgets such as "Healthcare Provider Search", "Google Maps Overlay", etc.

Tuesday, June 12, 2012

HSQL in-memory database

Recently, we were experimenting with the HSQL in-memory database for a particular use case. It was interesting to observe the default behaviour of persisting the database during a shutdown - the entire database is saved as a SQL script file! When the server starts again, it loads the SQL script and fires all the CREATE TABLE and INSERT statements to recreate the database in memory.
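
For reference, a minimal sketch of creating such a database over JDBC (the file path and table are made up; rerunning it as-is would fail since the table would already exist):
-------------------------------------------------
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HsqlDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");
        // 'jdbc:hsqldb:file:' persists the .script/.log files under d:/hsqldb;
        // use 'jdbc:hsqldb:mem:testdb' for a purely in-memory database.
        Connection con = DriverManager.getConnection("jdbc:hsqldb:file:d:/hsqldb/testdb", "SA", "");
        Statement st = con.createStatement();
        st.execute("CREATE TABLE customer(id INT PRIMARY KEY, name VARCHAR(50))");
        st.execute("INSERT INTO customer VALUES(1, 'John Doe')");
        // SHUTDOWN writes the current state back to the .script file
        st.execute("SHUTDOWN");
        con.close();
    }
}
-------------------------------------------------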

Wikipedia gives a good overview of this HSQL feature and compares the default memory tables with cached tables. Snippet:

The default MEMORY type stores all data changes to the disk in the form of a SQL script. During engine start up, these commands are executed and data is reconstructed into the memory. While this behavior is not suitable for very large tables, it provides highly regarded performance benefits and is easy to debug.

Another table type is CACHED, which allows one to store gigabytes of data, at the cost of the slower performance. HSQLDB engine loads them only partially and synchronizes the data to the disk on transaction commits. 

I was a bit concerned about the viability of all-in-memory tables for large datasets, but it looks like HSQL is being actively used in projects where millions of rows are stored in memory. The only limitation is the Java heap size, which can be configured to be very large on a 64-bit machine.

It is possible to convert memory tables to cached tables. You need to shut down the database first, then edit the .script file and change the line "CREATE MEMORY TABLE" to "CREATE CACHED TABLE".

Snippet from the FAQ page:

If only memory tables (CREATE TABLE or CREATE MEMORY TABLE) are used then the database is limited by the memory. A minimum of about 100 bytes plus the actual data size are required for each row. If you use CREATE CACHED TABLE, then the size of the table is not limited by the memory beyond a certain minimum size. The data and indexes of cached tables are saved to disk. With text tables, indexes are memory resident but the data is cached to disk.


The current (2.0) size limit of an HSQLDB database is 16GB (by default) for all CACHED tables and 2GB for each TEXT table. If you use large MEMORY tables, memory is only limited by the allocated JVM memory, which can be several GB on modern machines and 64bit operating systems.

The statements that make up the database are saved in the *.script file (mostly CREATE statements and INSERT statements for memory tables). Only the data of cached tables (CREATE CACHED TABLE) is stored in the *.data file. Also all data manipulation operations are stored in the *.log file (mostly DELETE/INSERT) for crash recovery. When the SHUTDOWN or CHECKPOINT command is issued to a database, then the *.script file is re-created and becomes up-to-date. The .log file is deleted. When the database is restarted, all statements of the *.script file are executed first and new statements are appended to the .log file as the database is used. A popular use of HSQLDB is for OLAP, ETL, and data mining applications where huge Java memory allocations are used to hold millions of rows of data in memory.

One limitation of HSQLDB is that it currently does not support server side cursors. (This allows it to run without any writeable media). This means the result of a query must always fit in memory, otherwise an OutOfMemory error occurs. In the rare situation that a huge resultsets must be processed, then the following workaround can be used: Limit the ResultSet using Statement.setMaxRows(1024), and select multiple 'smaller' blocks. If the table is for example 'CREATE TABLE Test(Id INT IDENTITY PRIMARY KEY, Name VARCHAR)' then the first block can be selected using 'SELECT * FROM Test'. The biggest ID should be recorded and the next block should be selected using 'SELECT * FROM Test WHERE Id>(biggest_id)' until no more records are returned. Don't forget to switch off the limit using setMaxRows(0).

Monday, June 11, 2012

Dynamic reports using Jasper

Today, I spent a good hour brainstorming the design to build dynamic reports in Jasper. A lot of customers demand the ability to create dynamic reports at run-time, i.e. choose the number of columns, sorting order, group-by, etc.

Now, as we know, the first thing that needs to be done in Jasper is to create the *.jrxml file. Typically, for static reports, this is done through the report designer. But for dynamic reports, we need to create or modify a part of this jrxml file at run-time based on the user's input. For doing this, we have the following options.

1) Template Engine: Here we would have a base jrxml file and then manipulate the base template using a template engine such as 'Velocity' or 'FreeMarker'. The trick would be to have all possible mark-up in the base template and then remove sections as required. The drawback of this approach is that it works only for simple dynamic requirements, such as adding/removing columns. If we need to add dynamic groups/sub-totals, it becomes cumbersome.

2) JasperDesign Object: A JasperDesign object is a run-time, in-memory representation of the jrxml file. We can have a base jrxml, load it into a JasperDesign object and then manipulate that object. A good example explaining this is available here. This is a good trade-off, as you keep the basic 'layout logic' in the jrxml template and only manipulate the dynamic part of the layout using the JasperDesign API.
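
A small sketch of this approach (the template path is made up, and the matching text element in the detail band is omitted for brevity):
-------------------------------------------------
import net.sf.jasperreports.engine.JasperCompileManager;
import net.sf.jasperreports.engine.JasperReport;
import net.sf.jasperreports.engine.design.JRDesignField;
import net.sf.jasperreports.engine.design.JasperDesign;
import net.sf.jasperreports.engine.xml.JRXmlLoader;

public class DynamicReportBuilder {
    public JasperReport buildReport(String userSelectedColumn) throws Exception {
        // Load the base template into its in-memory representation
        JasperDesign design = JRXmlLoader.load("d:/reports/base_template.jrxml");

        // Add a field chosen by the user at run-time (a matching text element
        // would also need to be added to the detail band)
        JRDesignField field = new JRDesignField();
        field.setName(userSelectedColumn);
        field.setValueClassName("java.lang.String");
        design.addField(field);

        // Compile the modified design into a report that can be filled
        return JasperCompileManager.compileReport(design);
    }
}
-------------------------------------------------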

Though it is possible to fully create the 'layout logic' through code, it would result in a maintenance nightmare for any small change in the future! The Jasper library also has an example that creates the full jrxml file from scratch.

3) DynamicJasper Library: An alternative approach is to use a far simpler library such as DynamicJasper to create the report template from 100% pure Java code. This API is more high-level than the low-level JasperDesign API, as it makes many default layout assumptions that should suffice for 99% of use-cases.
You can also use a base template *.jrxml file in which common styles, the company logo, a watermark, etc. can be pre-defined. It also supports a "style library" from jrxml files.

Tuesday, June 05, 2012

Validations - on client side or server side or both?

A few years back, many developers spent a lot of time coding validation rules for web forms - both on the client side and on the server side. This was very tedious, and a few lazy developers would write only the JavaScript validation and skip the server-side validation code, thus exposing a serious security flaw in the application.
Good design warrants that we apply the principle of 'security in depth'.

But today, most web MVC frameworks have OOTB support for validations - both on the client side and the server side - with minimal coding. The basic design concept is to annotate your domain objects with validation constraints and then let the framework generate the JS code for client-side validation and use framework interceptors for server-side validation.

Struts-2 is a popular java web MVC framework that supports this feature. In fact, there is a JSR specification on the usage of annotations for bean validations called JSR 303. Struts-2 also has a plug-in for OVal that implements JSR-303.

In the .NET world, ASP.NET MVC framework also supports this feature of annotation-based validations.
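
On the Java side, the underlying JSR-303 annotations and Validator API look roughly like this (a generic sketch, not tied to any particular MVC framework; the bean and constraints are made up):
-------------------------------------------------
import java.util.Set;

import javax.validation.ConstraintViolation;
import javax.validation.Validation;
import javax.validation.Validator;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;

public class RegistrationForm {

    @NotNull
    @Size(min = 3, max = 50)
    private String userName;

    @NotNull
    private String email;

    // getters/setters omitted for brevity

    public static void main(String[] args) {
        // The MVC framework would normally invoke the validator via its interceptors
        Validator validator = Validation.buildDefaultValidatorFactory().getValidator();
        Set<ConstraintViolation<RegistrationForm>> violations =
                validator.validate(new RegistrationForm());
        for (ConstraintViolation<RegistrationForm> v : violations) {
            System.out.println(v.getPropertyPath() + " " + v.getMessage());
        }
    }
}
-------------------------------------------------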

Object Model or Data Model - What comes first?

Throughout my career, I have asked myself this question multiple times - what to model first? The database entities or the object model? Start with ER diagrams (conceptual, logical) or UML class diagrams?

Well, the answer is - it depends. Many a time, it also depends on the culture of the organization. In many organizations, the 'data' teams are stronger and more powerful and insist on ER modeling first. It is possible to run both these work-streams in parallel, as there is a lot of conceptual commonality during the initial domain modeling. What bugs me is the difference in the semantics of UML and ER; it does not make sense to do both at the conceptual level.

Now, over the past few years, there has been a plethora of ORM tools that bridge the object-relational impedance mismatch. These ORM tools have a host of features that allow us to work only with the object model and abstract away the creation of the data model.
For example, Entity Framework 4.0 has full support for the 'code-first' approach, which is detailed in this article.
I simply loved the 'convention-over-configuration' approach in EF 4.0. These features enable us to work only with the object model and not worry about the data schema at all. Such features would suffice for 70-80% of business cases, I believe.

A snippet from the article will give the reader an idea of the abstraction that is provided -

In addition to supporting a designer-based development workflow, EF4 also enables a more code-centric option which we call “code first development”. Code-First Development enables a pretty sweet development workflow. It enables you to: 
  • Develop without ever having to open a designer or define an XML mapping file 
  • Define your model objects by simply writing “plain old classes” with no base classes required 
  • Use a “convention over configuration” approach that enables database persistence without explicitly configuring anything 
  • Optionally override the convention-based persistence and use a fluent code API to fully customize the persistence mapping

What exactly is Domain Driven Design (DDD)?

We have been using many of the principles and patterns of DDD over the past many years. During domain modeling, we have often used the concepts of bounded contexts, entity objects, value objects, aggregates, the repository pattern, etc.

But based on my humble experience, I think DDD is much more than the usage of these patterns. DDD is a "thought-process" - the way you think about the problem domain, the way you interact with the domain experts & business stakeholders and the way you articulate the technology realization of the business need. This in a nutshell is the greatest boon of following DDD. The business and IT speak the same 'ubiquitous' language and this in turn bridges the "Business-IT gap" :)

That's the reason the famous DDD book by Eric Evans is subtitled "Tackling Complexity in the Heart of Software". Mapping your software model as closely to the real-life domain as possible helps us manage the complexity of our design solutions.

The Microsoft Spain team has some pretty good documentation on this philosophy of DDD, which is available for download here. Also a neat ASP.NET example of a n-tiered DDD application is available for download.

Thursday, May 31, 2012

Byte code instrumentation and the ORM magic

All ORM tools use some kind of byte-code instrumentation to do the persistence magic behind the scenes. But as an architect, it is important to understand what Hibernate (or any JPA tool) actually does to the entity classes.

Hibernate 'enhances' entity classes at runtime using a byte-code library called Javassist. For example, it adds a '_dirty' flag to each field, and it also adds a '_loaded' flag for each field to support lazy loading. A good blog explaining these concepts is here. So Hibernate reads the XML configuration or obtains the annotations at runtime using reflection, and then applies the byte-code instrumentation.

There are various libraries for doing byte-code instrumentation, such as CGLib, ASM, Javassist, etc.
This byte-code enhancement can be done at compile-time or at run-time. For Hibernate, besides a few special cases which require compile-time 'enhancement' of the byte-code, all common scenarios can be satisfied with runtime instrumentation.

The following link gives a good overview of all the enhancement options available in JPA.
http://openjpa.apache.org/builds/1.2.1/apache-openjpa-1.2.1/docs/manual/ref_guide_pc_enhance.html

In the .NET world, NHibernate uses the LinFu or Castle DynamicProxy byte-code providers.
http://nhforge.org/blogs/nhibernate/archive/2008/11/09/nh2-1-0-bytecode-providers.aspx

Mapping between Entity Objects and DTOs

Very often, we need to map between our entity objects and DTOs. This mapping code can be quite tedious to write.
There is a lot of hot debate on whether to use DTOs or just pass the entity objects directly to the view or web services. There are pros and cons to each approach. Some good links on this debate are listed here:

Data Transfer Object - MSDN

http://stackoverflow.com/questions/5216633/jpa-entities-and-vs-dtos

Pros and Cons of Data Transfer Objects 

If you are using popular ORM tools such as Hibernate, iBatis or any other JPA-compliant tool, then it may not even be possible to use the entity objects directly in your service or presentation tier. This is because these ORM toolkits typically use some kind of byte-code instrumentation to do the persistence magic behind the scenes. A good link explaining this is available here.

To avoid the drudgery of writing the 'adapter/mapping' code for each entity object and DTO, we can use some cool auto-mapper tools. These tools use reflection to automatically map the source and target object properties. Custom mapping is supported through XML configuration or through code.

In the .NET world, there is a popular AutoMapper tool that has become the de-facto standard for a lot of .NET projects. In the Java world, there are 2 popular alternatives - Dozer and ModelMapper.
I found Dozer to be more comprehensive, with some pretty good features. The usage is super-simple if you use the singleton wrapper and place the custom mapping file on the classpath.
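
A minimal sketch of the Dozer (5.x) usage - CustomerEntity and CustomerDTO are hypothetical classes with similarly named properties:
-------------------------------------------------
import org.dozer.DozerBeanMapperSingletonWrapper;
import org.dozer.Mapper;

public class CustomerAssembler {
    public CustomerDTO toDto(CustomerEntity entity) {
        Mapper mapper = DozerBeanMapperSingletonWrapper.getInstance();
        // Properties with matching names are copied automatically via reflection;
        // anything else can be configured in an optional Dozer mapping XML on the classpath.
        return mapper.map(entity, CustomerDTO.class);
    }
}
-------------------------------------------------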

If you are using the Spring Framework, then the 'BeanUtils' class has some simple static methods to copy properties from one object to the other. 

Wednesday, May 30, 2012

Performance benchmarks

Every development project needs a formal performance engineering process - one that emphasizes early performance testing and benchmarking.

For performance benchmarks, it is recommended to do a shallow and wide implementation of a few critical use-cases and then run the load tests against the target hardware. These test results would help in some basic capacity planning.

But what if you have to do some initial rough capacity planning to allocate budgets and do not have the time to do a formal benchmarking exercise? It is here that standard performance benchmarks help. These standard benchmarks take a sample transactional use-case (e.g. an order processing system) and run this workload on various platforms to gather statistics. There are 2 standards that are quite popular -

  1. TPC (Transaction Processing Performance Council) - TPC is a non-profit organization founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry. TPC-C is the benchmark for OLTP workloads. 

  2. SPECjEnterprise2010 - SPECjEnterprise2010 is an industry-standard benchmark designed to measure the performance of application servers conforming to the Java EE 5.0 or later specifications.
Interesting results of the performance benchmarks on various hardware can be found here:
http://www.tpc.org/tpcc/results/tpcc_perf_results.asp
http://www.spec.org/jEnterprise2010/results/jEnterprise2010.html

For the past few years, the Java DayTrader application and its .NET equivalent, the StockTrader application, have been used by vendors to compare the performance of Java vs .NET on their respective platforms. Jotting down some links that point to some interesting, debatable data :)

http://www.ibm.com/developerworks/opensource/library/os-perfbenchmk/index.html

http://blogs.msdn.com/b/wenlong/archive/2007/08/10/trade-benchmark-net-3-0-vs-ibm-websphere-6-1.aspx

http://msdn.microsoft.com/en-us/netframework/bb499684.aspx

https://cwiki.apache.org/GMOxDOC22/daytrader-a-more-complex-application.html

JavaDB (Derby) in JDK 1.6 and above

JDK 1.6 and above ship with a default pure-Java database called "JavaDB". It is based on the open-source Apache Derby project.

By default, on a Windows platform, JavaDB gets installed at "C:\Program Files\Sun\JavaDB".
Set the 'DERBY_HOME' environment variable to this path, and also add 'DERBY_HOME/bin' to the PATH variable.

There is a good tutorial here that should get you up and running with JavaDB in 10-15 mins :)

Derby does not have a default GUI admin tool, but one can use many third-party tools such as SQuirreL SQL and others. I think JavaDB provides a good alternative to MySQL for some scenarios.
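
A minimal sketch of using JavaDB in embedded mode (the database name and table are made up; derby.jar must be on the classpath):
-------------------------------------------------
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JavaDbDemo {
    public static void main(String[] args) throws Exception {
        // Load the embedded driver
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");

        // ';create=true' creates the database directory 'demoDB' if it does not exist
        Connection con = DriverManager.getConnection("jdbc:derby:demoDB;create=true");
        Statement st = con.createStatement();
        st.execute("CREATE TABLE employee(id INT PRIMARY KEY, name VARCHAR(50))");
        st.execute("INSERT INTO employee VALUES(1, 'Jane Doe')");

        ResultSet rs = st.executeQuery("SELECT name FROM employee");
        while (rs.next()) {
            System.out.println(rs.getString("name"));
        }
        con.close();
    }
}
-------------------------------------------------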


Monday, May 28, 2012

What is a framework?

When someone says they have defined a "framework", what does it mean? Is a framework just a library of reusable components? Or is it something more?

There is a good article on CodeProject on the same topic - http://www.codeproject.com/Articles/5381/What-Is-A-Framework

The book "Applying UML and Patterns" by Craig Larman also gives a very good understanding of the concept. Jotting down snippets from both these resources, in my own words.One may consider them the 10 guiding principles while designing a framework.
  1. At the risk of oversimplification, a framework can be defined as a cohesive set of classes/interfaces that provide services for the core part of a logical subsystem. 
  2. A framework contains both concrete and abstract classes that define interfaces to conform to, and other object interactions.
  3. Frameworks usually allow the end-users to define sub-classes of existing framework classes for customization and extension of the framework services.
  4. A framework enforces adherence to a consistent design approach.
  5. Relies on the "Hollywood Principle" - "Don't call us, we will call you". This pattern is also called IoC (Inversion of Control). 
  6. A framework makes it easier to work with complex technologies.
  7. A framework reduces/eliminates repetitive tasks.
  8. A framework is often re-usable across multiple scenarios -  regardless of high level design considerations. Frameworks offer a higher degree of reuse - much more than individual classes.
  9. A framework forces the team to implement code in a way that promotes consistent coding, fewer bugs, and more flexible applications.
  10.  A framework can be used as a software building block in the system architecture definition. 

Thursday, May 24, 2012

Eclipse Memory Analyser

Read the following good reviews on Eclipse Memory Analyser. Looks like it can read both SUN JVM HPROF memory dumps as well as IBM JDK dumps.

http://memoryanalyzer.blogspot.in/2010/01/heap-dump-analysis-with-memory-analyzer.html

http://memoryanalyzer.blogspot.in/2010/02/heap-dump-analysis-with-memory-analyzer.html#more

http://www.eclipse.org/mat/

Some other interesting blogs that would help us resolve OOM errors :)

http://www.rallydev.com/engblog/2011/09/20/outofmemoryerror-fun-with-heap-dump-analysis/

http://www.rallydev.com/engblog/2012/03/16/java-memory-problems-why-is-my-heap-exhausted/

There is also a good article that contains sample code to simulate a Java OOM error and uses the Memory Analyser tool to identify the root cause of the error - http://www.javacodegeeks.com/2012/05/gc-overhead-limit-exceeded-java-heap.html

C heap vs Java heap

Found this interesting discussion on StackOverFlow around C Heap and Java Heap.
A good read, and it's important to understand that the JVM is also ultimately a C program :)
 
http://stackoverflow.com/questions/78352/what-runs-in-a-c-heap-vs-a-java-heap-in-hp-ux-environment-jvms

Thursday, May 17, 2012

RAID basics

Found this good blog that explains, in simple terms, the various levels of RAID (Redundant Array of Independent Disks).

RAID 10 has become the de-facto standard for relational databases, due to the excellent redundancy and performance it provides. In RAID 10 (also known as RAID 1+0), blocks are mirrored and also striped.

Wednesday, April 25, 2012

'volatile' keyword in Java

Found this excellent article on the web explaining the 'volatile' keyword in Java and how it can be used for concurrency. The tutorial also explains the changes to the volatile keyword functioning in Java 5.

Also found it interesting to understand what a 'livelock' is. We often encounter deadlock and thread starvation in parallel programming, but livelock is also possible :)
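
The classic illustration of 'volatile' is a stop flag shared between threads - a minimal sketch:
-------------------------------------------------
public class Worker implements Runnable {

    // 'volatile' guarantees that a write by one thread is visible to other threads
    private volatile boolean running = true;

    public void run() {
        while (running) {
            // do some work...
        }
        System.out.println("Worker stopped.");
    }

    public void stop() {
        running = false; // called from another thread
    }

    public static void main(String[] args) throws InterruptedException {
        Worker worker = new Worker();
        Thread t = new Thread(worker);
        t.start();
        Thread.sleep(1000);
        worker.stop();
        t.join();
    }
}
-------------------------------------------------
Without 'volatile', the worker thread could keep reading a stale cached value of the flag and never stop.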

Difference between Concurrent Collections and Synchronized Collections in JDK

Traditionally, we have used object locks (intrinsic monitors) and synchronized methods to make our collections thread-safe. But holding an exclusive lock on an object brings in scalability issues.

Hence the newer versions of the JDK have a package called "java.util.concurrent". This package contains many new collection classes that are thread-safe, but not by virtue of a single exclusive lock :)

More details at this link: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/package-summary.html

Snippet from the above link:

The "Concurrent" prefix used with some classes in this package is a shorthand indicating several differences from similar "synchronized" classes. For example java.util.Hashtable and Collections.synchronizedMap(new HashMap()) are synchronized. 

But ConcurrentHashMap is "concurrent". A concurrent collection is thread-safe, but not governed by a single exclusion lock. In the particular case of ConcurrentHashMap, it safely permits any number of concurrent reads as well as a tunable number of concurrent writes.

 "Synchronized" classes can be useful when you need to prevent all access to a collection via a single lock, at the expense of poorer scalability. In other cases in which multiple threads are expected to access a common collection, "concurrent" versions are normally preferable. And unsynchronized collections are preferable when either collections are unshared, or are accessible only when holding other locks. 

Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators provide weakly consistent rather than fast-fail traversal. A weakly consistent iterator is thread-safe, but does not necessarily freeze the collection while iterating, so it may (or may not) reflect any updates since the iterator was created. 

Also a good post on Concurrency basics is available at: http://docs.oracle.com/javase/tutorial/essential/concurrency/memconsist.html (All chapters a must read :)

Another good blog explains how ConcurrentHashMap maintains several locks (lock striping) instead of one single mutex to deliver better performance.
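
To round this off, a small sketch of the kind of atomic operations that ConcurrentHashMap offers without a single global lock (a hypothetical page-hit counter):
-------------------------------------------------
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class HitCounter {

    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();

    // Thread-safe 'insert if missing' plus an optimistic increment, without locking the whole map
    public void recordHit(String page) {
        counts.putIfAbsent(page, 0);
        Integer current;
        do {
            current = counts.get(page);
            // replace() succeeds only if no other thread updated the value in between
        } while (!counts.replace(page, current, current + 1));
    }

    public Map<String, Integer> snapshot() {
        return counts;
    }
}
-------------------------------------------------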