Friday, August 31, 2012

Centralized Clustered ESB vs Distributed ESB

Folks often ask me about the difference between traditional EAI (Enterprise Application Integration) products and an ESB (Enterprise Service Bus). Traditionally, EAI products followed the hub-and-spoke model, yet when we look at the topology options of many ESBs, we see that the hub-and-spoke model is still followed!

The fact that almost all EAI product vendors have rebranded their old products as 'ESB' adds to the confusion. For e.g. the popular IBM Message Broker product line is being branded as 'Advanced ESB', Microsoft has released an ESB Guidance package that depends on the BizTalk platform, etc.

In theory, there is a difference between EAI (hub-n-spoke) architecture and ESB (distributed bus) architecture. In a hub/spoke model, the hub 'broker' becomes the single point of failure as all traffic is routed through it. This drawback can be addressed by clustering brokers for high availability.

In a distributed ESB, there is a network of brokers that collaborate to form a messaging fabric. So in essence, there is a small lightweight ESB engine that runs on every node where a SOA component is deployed. This lightweight ESB engine does the message transformation and routing.

For e.g. in Apache ServiceMix, you can create a network of message brokers (ActiveMQ). Multiple instances of ServiceMix can be networked together using ActiveMQ brokers. The client application sees it as one logical normalized message router (NMR). But here again, the configuration info (e.g. routing information, service names, etc.) is centralized somewhere.

So then what is the fundamental difference between hub-n-spoke and ESB? IBM did a good job of clearing up this confusion by bringing in 2 concepts/terms - "centralization of control config" and "distribution of infrastructure". A good blog explaining these concepts from IBM is here. Loved the diagram flow on this post. Point made :)

Snippet from IBM site:
"Both ESB and hub-and-spoke solutions centralize control of configuration, such as the routing of service interactions, the naming of services, etc. 
Similarly, both solutions might deploy in a simple centralized infrastructure, or in a more sophisticated, distributed manner. In fact, if the Hub-and-spoke model has these features it is essentially an ESB."

As explained earlier, some open-source ESBs such as Apache ServiceMix and Petals ESB are lightweight and have the core ESB engine (aka service engine) deployed on each node. These folks call themselves "distributed ESB". Other vendors, such as IBM, use the concept of "Federated ESBs" for distributed topologies across ESB domains.

Wednesday, August 29, 2012

Translate Chinese unicode code points to English

One of our legacy applications had its UI in Chinese, and we were required to convert it to English.
Instead of hiring a translator, we decided to use Google Translation Services.

But the application was picking up its Chinese labels/messages from a properties file. The properties file had the Chinese characters expressed as Unicode code points. The Google Translate webpage expected Chinese characters to be typed or copy-pasted onto the form. We searched for a similar translation service that would accept Unicode code points, but in vain.

Finally, we decided to write a simple program that would write the Chinese Unicode code points to a file and then open the file using a program such as Notepad++ or MS Word. These programs support Chinese characters and allow you to copy-paste them onto the Google Translation page.

Given below is the simple Java code snippet to write to a file. Please open this file using MS Word (or any other program that can render UTF-8 encoded Chinese text).

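import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Chinese_Chars {
    public static void main(String[] args) throws IOException {
        // Unicode escapes copied from the properties file
        String str = "\u6587\u4EF6";
        byte[] array = str.getBytes(StandardCharsets.UTF_8);
        // Write the UTF-8 bytes so that an editor can render the characters
        Files.write(Paths.get("d:/temp.txt"), array);
    }
}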

Shown below are some screenshots of the Google Translate page and of MS Word opening the file.
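For a properties file with many labels, copying each escape sequence into a string literal gets tedious. A minimal sketch of an alternative, assuming the labels are in the standard Java properties format: java.util.Properties.load() already decodes the backslash-u escapes, so the decoded values can be written straight out as UTF-8. The class name, the sample key and the temp file are all hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

public class PropertiesToUtf8 {

    // Loads the properties and returns "key=value" lines with the escapes decoded.
    // Properties.load() converts the backslash-u sequences into real characters.
    static List<String> decode(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        List<String> lines = new ArrayList<>();
        // Sort the keys so that the output order is stable
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            lines.add(key + "=" + props.getProperty(key));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample content; in practice pass a FileReader for the real file
        String sample = "label.file=\\u6587\\u4EF6\n";
        List<String> lines = decode(new StringReader(sample));

        // Write the decoded labels as UTF-8 so that an editor can display them
        Path out = Files.createTempFile("labels", ".txt");
        Files.write(out, lines, StandardCharsets.UTF_8);
        System.out.println("Wrote " + lines.size() + " label(s) to " + out);
    }
}
```

The output file can then be opened in Notepad++ or MS Word and the text pasted into the translation page in one go.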

Tuesday, August 21, 2012

Understanding REST

In one of my old posts, I had elaborated on the differences between REST and simple POX/HTTP.

Recently came across an interesting post by Ryan Tomayko, wherein he tries to explain REST in a simple narrative style. A must read :)

Another interesting discussion thread on REST is available at StackOverFlow - regarding verbs and error codes.

File Upload Security - Restrict file types

In web applications, we often have to restrict the file types that can be uploaded to the server. One way to restrict it is by checking the file extension. But what if someone changes the file extension and then tries to upload the file?

For common file types such as GIF, PDF and JPEG, we can check the first few bytes of the file for a "signature" or "magic number". More information is given in this blog post -

The Apache Tika project can be used to quickly extract meta-data information from a file stream. 
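As a rough illustration of the signature check, here is a minimal sketch in plain Java (no Tika) that compares the leading bytes of a stream against the well-known signatures for PDF ("%PDF"), GIF ("GIF8") and JPEG (FF D8 FF). The class and method names are hypothetical, and readNBytes(int) assumes Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class FileSignatureCheck {

    // True if the data begins with the given signature bytes (given as 0-255 values)
    static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads only the first few bytes of the upload and maps them to a type name
    static String detectType(InputStream in) throws IOException {
        byte[] header = in.readNBytes(8);
        if (startsWith(header, 0x25, 0x50, 0x44, 0x46)) {
            return "pdf";   // "%PDF"
        }
        if (startsWith(header, 0x47, 0x49, 0x46, 0x38)) {
            return "gif";   // "GIF8"
        }
        if (startsWith(header, 0xFF, 0xD8, 0xFF)) {
            return "jpeg";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical upload content: a PDF header, whatever the file name claims
        byte[] upload = "%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectType(new ByteArrayInputStream(upload)));
    }
}
```

A signature match only shows that the file starts like the claimed type; it does not prove that the rest of the content is harmless, so it is best treated as one check among several.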

List of names...

Very often, we quickly need to populate a sample database with a list of names. In the past, we often did this by using random numbers appended to some common names.

But found this cool site on the web that gives us thousands of sample names that can be used to populate our databases for demo purposes.

Monday, August 13, 2012

Ruminating on JSF

In the past, I always hated JSF, the same way I hated EJB 2.x. But of late, I am seeing a renewed interest in JSF, especially since a lot of pain areas were resolved in the JSF 2.0 specification.

Over the last couple of days, I have been evaluating PrimeFaces - an open-source component library for JSF 2.0 - and I would say that I am pretty impressed. Going through the sample demo pages, I was mighty pleased with the neat and clean code in both the XHTML files and the POJO beans.

Also PrimeFaces has a plethora of components that should suffice for 90-95% of all basic web application requirements.

In general, if you do not require heavy UI customization and can hence sacrifice absolute control over the generated HTML, CSS and JS, then I would say that using PrimeFaces would greatly increase the productivity of an average development team. IMHO, the productivity gain could be as high as 50% over doing the conventional plumbing using MVC frameworks and JQuery.

But if there is a special requirement that cannot be met by the standard UI components provided by PrimeFaces, then you are in trouble. You would then need deeper expertise to write your own JSF component or customize existing ones.

Based on my study and the common challenges we faced, I am jotting down some FAQs that should be useful for folks embarking on using PrimeFaces.

Q) How to manipulate JSF components on the client side using plain JS or JQuery? How to use JQuery JS API or any other JS library on the client side with JSF?

Q) How to include JS file or inline JS code in a JSF XHTML page?
A) There are 3 ways to handle this.
  • Escape all special chars like 'greater than' or 'lesser than'
  • Use <![CDATA[ ... ]]> to hold your JavaScript code
  • Put the JavaScript code in a separate .js file, and use in the JSF page
Q) How do I output HTML text in JSF? Do I need to use the 'verbatim' tag?
A) <h:outputText escape="false" value="#{daBean.markedUpString}"></h:outputText>

Q) Can I mix HTML tags with JSF tags?
A) You can. It is not as much of a pain as in JSF 1.x, but you need to be aware of the issues.

Friday, August 03, 2012

SOA interoperability - .NET WCF from Java

Recently, one of our teams was struggling to integrate a Java application with a .NET WCF service. The exception that was thrown on the Java side was a SOAPFault as shown below:

SOAPFaultException: The message with Action 'https://tempService:48493/VirtualTechnician/AcceptVehicleTermsOfService' cannot be processed at the receiver, due to a ContractFilter mismatch at the EndpointDispatcher. This may be because of either a contract mismatch (mismatched Actions between sender and receiver) or a binding/security mismatch between the sender and the receiver. Check that sender and receiver have the same contract and the same binding (including security requirements, e.g. Message, Transport, None). 

After a lot of debugging using SoapUI and Wireshark, we found out that the problem was not in the SOAP message, but in the HTTP headers. The SOAPAction HTTP header needs to be set in the HTTP POST request.

On JAX-WS, it can be done with the following code snippet:

BindingProvider bp = (BindingProvider) smDispatch;
bp.getRequestContext().put(BindingProvider.SESSION_MAINTAIN_PROPERTY, Boolean.TRUE);
bp.getRequestContext().put(BindingProvider.SOAPACTION_USE_PROPERTY, Boolean.TRUE);
bp.getRequestContext().put(BindingProvider.SOAPACTION_URI_PROPERTY, ""); // set the correct SOAP Action here
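To see what the fix amounts to on the wire, here is a minimal sketch that builds (but does not send) a SOAP 1.1 POST request using the java.net.http API from Java 11. This is not the JAX-WS code path, only an illustration of the header involved. The endpoint, the action URI and the envelope body are hypothetical placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SoapActionHeaderSketch {

    // Builds a SOAP 1.1 POST request; SOAP 1.1 carries the action as a quoted
    // URI in the SOAPAction HTTP header, not inside the envelope.
    static HttpRequest buildRequest(String endpoint, String soapAction, String envelope) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "text/xml; charset=utf-8")
                .header("SOAPAction", "\"" + soapAction + "\"")
                .POST(HttpRequest.BodyPublishers.ofString(envelope))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical values; use the Action declared in the service WSDL
        String endpoint = "https://example.invalid/VirtualTechnician";
        String action = "urn:example:AcceptVehicleTermsOfService";
        String envelope =
                "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soapenv:Body/></soapenv:Envelope>";

        HttpRequest request = buildRequest(endpoint, action, envelope);
        // Print the request line and headers that would be sent
        System.out.println(request.method() + " " + request.uri());
        request.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

If the SOAPAction value does not match an Action that the WCF contract declares, the dispatcher cannot pick an operation, which is consistent with the ContractFilter mismatch fault shown above.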

Custom PMD rules using XPath

Writing custom rules in PMD using XPath is an exciting concept, but unfortunately there are not many good tutorials or reference guides available on the internet for this.

Recently, we wanted to write custom PMD rules to extract Spring JDBC calls from the code base. We used the PMD Rule Designer, which the PMD Eclipse plugin provides out of the box, to write the rules easily.

Just open Eclipse -> Preferences -> PMD -> Rule Designer.
In Rule Designer, copy-paste your source code and check the AST (Abstract Syntax Tree) that is formed. You can also copy the AST XML from the menu bar and paste it into a text editor. Writing the XPath expression then becomes very simple!

For e.g. for finding out the Spring JDBC query calls the XPath was:

Wednesday, August 01, 2012

Ruminating on "Trickle Batch Load".

Traditionally, most batch jobs were run at end-of-day, when the entire day's transaction log was pulled out and processed in some way. During the good old days, when the volume of data was low, these batch processes could easily meet their business SLAs. Even if there was a failure, there was sufficient time to correct the data, rerun the process and still meet the SLAs.

But today, the volume of data has grown exponentially. Across many of our customers, we have seen challenges in meeting SLAs due to large data volumes. Business stakeholders have also become more demanding and want to see sales reports and spot trends early - many times during a day. To tackle these challenges, we have to resort to the 'trickle batch' load design pattern.

In trickle batch, small delta changes from source systems are sent for processing multiple times a day. There are various advantages of such a design -
  • The business gets real-time access to critical information. For e.g. sales per hour.
  • Business SLA's can be easily met, as EOD processes are no longer a bottleneck. 
  • Operational benefits include low network latency and reduced CPU/resource utilization.
  • Early detection of issues - network issues, file corruption, etc. 
The typical design strategies used to implement "trickle batch" rely on the CDC (Change Data Capture) capabilities of databases and ETL tools.
Today almost all ETL tools such as IBM InfoSphere DataStage, Informatica, Oracle Data Integrator, etc. have in-built CDC capabilities. 
Trickle batches typically feed an ODS (Operational Data Store) that can run transactional reports. Data from the ODS is then fed to a DW appliance for MIS reporting.
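The delta extraction at the heart of a trickle batch can be sketched in a few lines. The example below is a simplified, timestamp-based stand-in for CDC (real CDC tools usually read the database transaction log instead): each run picks up only the rows modified after a stored high-watermark and then advances the watermark. The class, the Row type and the in-memory source list are all hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class TrickleBatchSketch {

    // Hypothetical source row: an id plus its last-modified timestamp
    static final class Row {
        final String id;
        final Instant modifiedAt;

        Row(String id, Instant modifiedAt) {
            this.id = id;
            this.modifiedAt = modifiedAt;
        }
    }

    // High-watermark: the newest modification time that has already been processed
    private Instant watermark = Instant.EPOCH;

    // Returns the rows changed since the last run and advances the watermark
    List<Row> nextDelta(List<Row> source) {
        List<Row> delta = new ArrayList<>();
        Instant newest = watermark;
        for (Row row : source) {
            if (row.modifiedAt.isAfter(watermark)) {
                delta.add(row);
                if (row.modifiedAt.isAfter(newest)) {
                    newest = row.modifiedAt;
                }
            }
        }
        watermark = newest;
        return delta;
    }

    public static void main(String[] args) {
        TrickleBatchSketch loader = new TrickleBatchSketch();
        List<Row> source = new ArrayList<>();
        source.add(new Row("order-1", Instant.parse("2012-08-01T09:00:00Z")));
        source.add(new Row("order-2", Instant.parse("2012-08-01T09:30:00Z")));
        System.out.println("First run: " + loader.nextDelta(source).size() + " row(s)");

        // A new row arrives before the next scheduled run
        source.add(new Row("order-3", Instant.parse("2012-08-01T10:15:00Z")));
        System.out.println("Second run: " + loader.nextDelta(source).size() + " row(s)");
    }
}
```

One limitation of this simplification: a row committed later with exactly the same timestamp as the watermark would be missed, which is one reason why log-based CDC is generally preferred in practice.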