Tech Talk: February 2008

Friday, February 29, 2008

Schema validation in webservices

Recently a friend of mine was working on a .NET SOA project and was facing a peculiar problem. The XML Schema he was using had a lot of restrictions such as minOccurs, maxOccurs, int ranges, string regular expression patterns, etc.

But in .NET 2.0 webservices, there was no easy way to validate input messages against this schema. The reason has to do with XmlSerializer, the underlying plumbing that takes care of object deserialization. XmlSerializer is very forgiving by design: it happily ignores XML nodes that it doesn't expect and uses default CLR values for expected but missing XML nodes. It doesn't perform XML Schema validation, so details like structure, ordering, occurrence constraints, or simple type restrictions are not enforced during deserialization.

Unfortunately there is no switch in .NET 2.0 that can turn on schema validation. The only way to validate messages against the schema is to write a SOAP extension and validate the message in the extension code before deserialization occurs. The following links show how to do it:
1. MSDN-1
2. MSDN-2

In the J2EE world, JAX-WS 2.0 has become the default standard for developing webservices. In most toolkits, we essentially have 2 options: let the SOAP engine do the data-binding and validation, or separate the data-binding and schema validation from the SOAP engine. For example, in the Sun Metro project, validation can be enabled in the SOAP engine using the @SchemaValidation annotation on a webservice. (https://jax-ws.dev.java.net/guide/Schema_Validation.html)
In Axis2, schema validation can be enabled using the XMLBeans data-binding framework.

I found this whitepaper quite valuable in understanding the various options for schema validation in JEE webservices.
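For comparison on the Java side, the idea of validating a message against its schema before any data-binding happens can be sketched with the standard JAXP validation API (javax.xml.validation). This is a minimal sketch, not Metro's or Axis2's mechanism: the tiny order schema and the class name MessageValidator are hypothetical, and in a real service the check would sit in a handler or filter in front of the binding layer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class MessageValidator {
    // Hypothetical schema: an <order> whose <quantity> is an int restricted to 1..100.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='quantity'><xs:simpleType>"
        + "<xs:restriction base='xs:int'>"
        + "<xs:minInclusive value='1'/><xs:maxInclusive value='100'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // A compiled Schema is thread-safe and can be shared across requests.
    private static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("Invalid schema", e);
        }
    }

    /** Returns true if the payload conforms to the schema, false otherwise. */
    static boolean isValid(String xml) {
        Validator validator = SCHEMA.newValidator(); // Validator is not thread-safe: one per call
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order><quantity>5</quantity></order>"));   // true
        System.out.println(isValid("<order><quantity>500</quantity></order>")); // false: out of range
        System.out.println(isValid("<order><qty>5</qty></order>"));             // false: unexpected element
    }
}
```

Unlike a forgiving deserializer, the validator rejects both the out-of-range value and the unexpected element, which is exactly the behaviour the .NET team was missing.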

Wednesday, February 27, 2008

Problems with Struts SSLext

One of the projects I was consulting on was using the SSL-ext plug-in module of Struts to redirect all requests for secure resources to SSL. More information on how to mix HTTP and HTTPS in a web-flow can be found in my blog post here.

The SSL plug-in module used a simple technique: a custom request processor/servlet filter checks the URL of each request and then looks up in the configuration file whether the requested resource is secure and must be accessed over SSL. If a plain HTTP request arrives for a secure resource, a redirect response (HTTP 302) is sent back to the browser. The redirect URL has the HTTPS scheme.

Now, in the production environment, it was decided to move the SSL encryption/decryption to a Cisco Content Switch (at the hardware level). The Content switch would decrypt the content and then forward the request to the webserver as plain HTTP.
But this caused a problem at the AppServer level: SSL-ext only ever saw plain HTTP requests, so it kept sending redirects for every secure resource. The result was an infinite redirect loop between the browser and the servers, and the browser appeared to hang.
I was given the task of finding an innovative solution to this problem. The challenge was that we still wanted the SSL-ext plug-in functionality, so that it would remain impossible to access secure resources over plain HTTP. All we needed was some way for the content switch to tell the AppServer that the original request had reached the content switch over HTTPS.

I decided to do this using HTTP headers. We configured the content switch to set the 'Via' HTTP header to "HTTPS" whenever an SSL request reached it. We changed the source code of SSL-ext to check for this header, instead of the URL scheme and port number, before deciding whether to send a redirect. This simple solution worked just fine and earned me kudos from the team :)
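A minimal sketch of the modified check, assuming the header convention described above (the content switch sets Via to "HTTPS"). It is not the actual SSL-ext source: the class and method names are hypothetical, and in a real filter the two string arguments would come from request.getScheme() and request.getHeader("Via").

```java
class SecureRequestCheck {
    // Convention from the text: the content switch sets "Via: HTTPS"
    // when the browser's original request arrived over SSL.
    static final String SSL_MARKER_VALUE = "HTTPS";

    /** True if the browser-facing connection was HTTPS, terminated either here or at the switch. */
    static boolean arrivedOverSsl(String scheme, String viaHeader) {
        if ("https".equalsIgnoreCase(scheme)) {
            return true; // SSL terminated at the app server itself
        }
        // SSL terminated at the content switch, which marked the forwarded request
        return viaHeader != null && SSL_MARKER_VALUE.equalsIgnoreCase(viaHeader.trim());
    }

    /** Redirect only when a secure resource was requested over plain HTTP end to end. */
    static boolean needsHttpsRedirect(boolean resourceIsSecure, String scheme, String viaHeader) {
        return resourceIsSecure && !arrivedOverSsl(scheme, viaHeader);
    }

    public static void main(String[] args) {
        System.out.println(needsHttpsRedirect(true, "http", "HTTPS")); // false: switch decrypted it
        System.out.println(needsHttpsRedirect(true, "http", null));    // true: genuinely plain HTTP
    }
}
```

One caveat with this design: a browser can send any header it likes, so the content switch should overwrite (or strip) the marker header on every request; otherwise a client could forge it and skip the HTTPS redirect.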

P.S: I got the idea of using HTTP headers after reading the article below. A must read :)
http://www.nextthing.org/archives/2005/08/07/fun-with-http-headers

Tuesday, February 26, 2008

Architecture Principles

A member of my team recently asked me a simple question - "What are Architecture principles? And how are they different from the guidelines, best practices and NFRs that we already follow?"

Well, the line between them is very blurred, and in fact there can be a lot of overlap.

Principles are general rules and guidelines (that seldom change) that inform and support the way in which an organization sets about fulfilling its mission.
Architecture principles are principles that relate to architecture work. They embody the spirit and thinking of the enterprise architecture. These principles govern the architecture process, affecting the design, development and maintenance of applications in an enterprise.

The typical format in which a principle is jotted down is -
1. Statement - explaining what the principle is
2. Rationale - explaining why?

The following 2 links contain good reading material.
http://www.opengroup.org/architecture/togaf8-doc/arch/chap29.html
http://www.its.state.ms.us/its/EA.nsf/webpages/principles_home?OpenDocument

Examples of some interesting principles from the above sites:

Principle: Total cost of ownership design
In an atmosphere where complex and ever-changing systems are supporting all aspects of our business every hour of the day, it is easy to lose track of costs and benefits. And yet, these critical measures are fundamental to good decision-making. The Enterprise Architecture can and must assist in accounting for use, for change, for costs, and for effectiveness.
Rationale: Total costs of present and proposed alternatives, including unintended consequences and opportunities missed, must be a part of our decisions as we build the architecture of the future.

Principle: Mainstream technology use
Production IT solutions must use industry-proven, mainstream technologies except in those areas where advanced higher-risk solutions provide a substantial benefit. Mainstream is defined to exclude unproven technologies not yet in general use and older technologies and systems that have outlived their effectiveness.
Rationale: The enterprise may not want to be on the leading edge for its core service systems. Risk will be minimized.

Principle: Interoperability and reusability
Systems will be constructed with methods that substantially improve interoperability and the reusability of components.
Rationale: Enables the development of new inter-agency applications and services.

Principle: Open systems
Design choices prioritized toward open systems will provide the best ability to create adaptable, flexible and interoperable designs.
Rationale: An open, vendor-neutral policy provides the flexibility and consistency that allows agencies to respond more quickly to changing business requirements. This policy allows the enterprise to choose from a variety of sources and select the most economical solution without impacting existing applications. It also supports implementation flexibility because technology components can be purchased from many vendors, insulating the enterprise from unexpected changes in vendor strategies and capabilities.

Principle: Scalability
The underlying technology infrastructure and applications must be scalable in size, capacity, and functionality to meet changing business and technical requirements.
Rationale: Reduces total cost of ownership by reducing the amount of application and platform changes needed to respond to increasing or decreasing demand on the system. Encourages reuse. Leverages the continuing decline in hardware costs.

Principle: Integrated reliability, availability, maintainability
All systems, subsystems, and components must be designed with the inclusion of reliability and maintainability as an integral part. Systems must contain high-availability features commensurate with business availability needs. An assessment of business recovery requirements is mandatory when acquiring, developing, enhancing, or outsourcing systems. Based on that assessment, appropriate disaster recovery, and business continuity planning, design and testing must take place.
Rationale: Business depends upon the availability of information and services. To assure this, reliability and availability must be designed in from the beginning; they cannot be added afterward. The ability to manage and maintain all service resources must also be included in the design to assure availability.

Principle: Controlled technical diversity
Technological diversity is controlled to minimize the non-trivial cost of maintaining expertise in and connectivity between multiple processing environments.
Rationale: There is a real, non-trivial cost of infrastructure required to support alternative technologies for processing environments. There are further infrastructure costs incurred to keep multiple processor constructs interconnected and maintained. Limiting the number of supported components will simplify maintainability and reduce costs.

Monday, February 25, 2008

Tower Servers -> Rack Servers -> Blade Servers

The early servers were tower servers, so called because they were tall; they took up a lot of space and resulted in 'server sprawl' in data centers. The 90's saw the introduction of rack servers, which were compact and saved a lot of 'real estate' in the data centers. A standard rack is 19 inches wide, and rack height is measured in rack units: one rack unit (1U) is 1.75 inches high. So a server may occupy 1U, 2U or 4U. The most common rack form factor is 42U high, which allows up to 42 1U servers to be mounted in a single rack. Each server has its own power supply and its own network and switch configuration.
In the past few years, blade servers have been gaining a lot of popularity. The advantage of blade servers is that instead of having a number of separate servers with their own power supplies, many blades (each containing processors, memory, hard drives and other components) are plugged into one chassis, like books in a bookshelf. The blades share the hardware, power and cooling supplied by the rack-mounted chassis, saving energy and, ultimately, money.
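The rack-unit arithmetic above can be sketched as follows. It is a simplification that assumes the whole 42U is available for servers; in practice switches, patch panels and power distribution units also take up rack space. The class and method names are hypothetical.

```java
class RackMath {
    static final double INCHES_PER_U = 1.75; // height of one rack unit (1U)

    /** How many servers of the given height (in U) fit in a rack of the given height (in U). */
    static int serversPerRack(int rackUnits, int serverUnits) {
        if (serverUnits <= 0) {
            throw new IllegalArgumentException("serverUnits must be positive");
        }
        return rackUnits / serverUnits; // integer division: a partial slot is unusable
    }

    /** Vertical mounting space, in inches, of a rack of the given height in U. */
    static double mountingHeightInches(int rackUnits) {
        return rackUnits * INCHES_PER_U;
    }

    public static void main(String[] args) {
        System.out.println(serversPerRack(42, 1));    // 42
        System.out.println(serversPerRack(42, 2));    // 21
        System.out.println(serversPerRack(42, 4));    // 10
        System.out.println(mountingHeightInches(42)); // 73.5
    }
}
```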
Another advantage is that enterprises can buy what they need today, and plug in another blade when their processing needs increase, thus spreading the cost of capital equipment over time.

In a rack server environment, typically 44% of the electricity is consumed by components such as power supplies and fans. In a blade server, that 44% is reduced to 10% because these components are shared. This gives blade servers a tremendous advantage when it comes to electricity consumption and heat dissipation.
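The effect of the 44% vs 10% overhead figures can be worked through with a small calculation. The sketch assumes the useful (compute) load stays the same in both cases and that the quoted percentages are shares of total consumption; the 5600 W load and the class name are hypothetical. Under those assumptions the total draw falls from about 10,000 W to about 6,222 W, a saving of roughly 38%.

```java
class PowerOverhead {
    /**
     * Total draw needed to deliver a given useful (compute) load when a fixed
     * fraction of total consumption goes to power supplies and fans:
     * total = useful / (1 - overheadFraction).
     */
    static double totalWatts(double usefulWatts, double overheadFraction) {
        if (overheadFraction < 0 || overheadFraction >= 1) {
            throw new IllegalArgumentException("overheadFraction must be in [0, 1)");
        }
        return usefulWatts / (1.0 - overheadFraction);
    }

    public static void main(String[] args) {
        double useful = 5600.0;                   // hypothetical compute load, in watts
        double rack = totalWatts(useful, 0.44);   // rack servers: 44% overhead
        double blade = totalWatts(useful, 0.10);  // blade chassis: 10% overhead
        System.out.printf("rack: %.0f W, blade: %.0f W, saving: %.1f%%%n",
                rack, blade, 100 * (1 - blade / rack));
    }
}
```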

The main disadvantage of blade servers today is vendor lock-in: once you buy a blade environment, you are tied to that vendor's chassis and system management software. In a rack system, it is possible to mix and match servers within a rack and across racks at will.

Friday, February 22, 2008

Are filters invoked when I do a RequestDispatcher.forward() ?

Before the Servlet 2.4 specification, it was not clear whether filters should be invoked for forwarded requests, included requests and requests to the error pages defined in web.xml.

The Servlet 2.4 specification settles this with an extra <dispatcher> element inside the <filter-mapping> element:
<filter-mapping>
    <filter-name>DispatcherFilter</filter-name>
    <url-pattern>/products/*</url-pattern>
    <dispatcher>FORWARD</dispatcher>
    <dispatcher>REQUEST</dispatcher>
</filter-mapping>

Possible values are REQUEST, FORWARD, INCLUDE, and ERROR. If no <dispatcher> element is specified, the default is REQUEST.

First steps for diagnosis of memory leaks in Websphere Application Server v6.1

- Enable verbose GC.

- If using the Sun JVM, use the -XX:+HeapDumpOnOutOfMemoryError option to make the VM generate a heap dump when an OutOfMemoryError is thrown. If using the IBM JVM, enable automatic heap dump generation.

- Start the lightweight memory leak detection

- Generate heap dumps manually if required using the wsadmin tool.

- Use the MDD4J tool (Memory Dump Diagnostic for Java) to diagnose the root causes of memory leaks in the Java heap. The heap dumps collected in the above steps are the input to this tool. More information about this tool can be found here, here and here.
The MDD4J tool supports the following heap dump formats:
1. IBM Portable Heap Dump (.phd) format (for WebSphere Application Server Versions 6.x on most platforms)
2. IBM Text heap dump format (for WebSphere Application Server Versions 5.0 and 4.0 on most platforms)
3. HPROF heap dump format (for WebSphere Application Server on the Solaris® and HP-UX platforms)
4. SVC Dumps (WebSphere on the IBM zSeries)

On the Solaris platform, starting with Java 1.5, Sun has been shipping a cool tool called jmap, which allows you to attach to any 1.5 JVM and obtain heap layout information, class histograms and complete heap snapshots. The cool thing is that you don't have to configure the JVM with any special options, so it runs exactly as during normal operation. Note that the -dump option and the jhat tool used below ship with Java 6; on a 1.5 JVM the equivalent is jmap -heap:format=b PID_OF_PROCESS, which writes the snapshot to heap.bin.

jmap -dump:format=b,file=snapshot2.jmap PID_OF_PROCESS
jhat snapshot2.jmap

What causes memory leaks in Java?

We know that a memory leak can occur in Java applications when object references are unintentionally held after they are no longer needed. Typical examples are large collection objects, unbounded caches, large numbers of session objects, infinite loops, etc. A memory leak eventually results in an OutOfMemoryError.
Besides this core reason, other factors may also result in an OutOfMemoryError:
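To make the 'unbounded cache' example concrete, here is a minimal sketch contrasting a map that only ever grows with a size-bounded LRU cache built on java.util.LinkedHashMap. The class name and the limit of 100 entries are hypothetical; the point is only that the bounded version caps how many objects it keeps reachable.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class CacheExamples {
    // Leak-prone: entries are added but never evicted, so every key ever seen stays reachable.
    static final Map<String, byte[]> UNBOUNDED = new HashMap<>();

    /** A simple LRU cache that evicts the least-recently-used entry once it exceeds maxEntries. */
    static <K, V> Map<K, V> boundedLru(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently accessed
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // called after each put
            }
        };
    }

    public static void main(String[] args) {
        Map<String, byte[]> bounded = boundedLru(100);
        for (int i = 0; i < 10_000; i++) {
            String key = "request-" + i;
            UNBOUNDED.put(key, new byte[1024]); // keeps all 10,000 arrays alive
            bounded.put(key, new byte[1024]);   // never holds more than 100 entries
        }
        System.out.println(UNBOUNDED.size() + " vs " + bounded.size()); // 10000 vs 100
    }
}
```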

1. Java heap fragmentation: Heap fragmentation occurs when no contiguous chunk of free Java heap space is available from which to allocate a Java object. Various causes for this problem exist, including the repeated allocation of large objects (so that no single free fragment is large enough). In this case, even if a good amount of heap is free, memory allocation fails.

2. Memory leaks in the native heap: This occurs when a native component, such as a database connection, is leaking.

3. The permanent generation is exhausted: The permanent generation of the heap contains class objects. If your code uses a lot of reflection/introspection, a number of temporary class objects may be created that can exhaust the perm space. More info can be found here.

Wednesday, February 13, 2008

Can you request Google and Yahoo not to index your site?

I knew how web crawlers/bots index your website. In fact, Google also lets you submit a Sitemap so that the pages in your site are indexed better. But what if you don't want some pages to be crawled? Well, today I learned that there is a way to request crawlers to ignore certain pages in the site. The trick is to place a 'robots.txt' file in the root directory of the site. This text file lists the folders and URLs that should not be crawled.
The protocol, however, is purely advisory. It relies on the cooperation of the web robot, so marking an area of a site out of bounds with robots.txt does not guarantee privacy.
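A cooperative crawler's handling of Disallow rules can be sketched as below. This is a deliberately simplified, hypothetical parser: it only honours the wildcard (User-agent: *) group and plain prefix matching, and ignores extensions such as Allow lines or wildcards in paths that individual crawlers may support.

```java
import java.util.ArrayList;
import java.util.List;

class RobotsTxt {
    private final List<String> disallowed = new ArrayList<>();

    /** Collects the Disallow prefixes of the "User-agent: *" group only (a simplification). */
    RobotsTxt(String content) {
        boolean inWildcardGroup = false;
        boolean lastWasAgent = false;
        for (String rawLine : content.split("\r?\n")) {
            String line = rawLine.replaceAll("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // consecutive User-agent lines share one group of rules
                inWildcardGroup = (lastWasAgent && inWildcardGroup) || value.equals("*");
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // an empty Disallow value means "allow everything"
                if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    /** A cooperative robot skips any path that starts with a disallowed prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        RobotsTxt robots = new RobotsTxt(
                "User-agent: *\nDisallow: /private/\nDisallow: /tmp\n\nUser-agent: BadBot\nDisallow: /\n");
        System.out.println(robots.isAllowed("/index.html"));        // true
        System.out.println(robots.isAllowed("/private/data.html")); // false
    }
}
```

Nothing in this code stops a robot from fetching /private/data.html; the check only happens because the crawler chooses to run it, which is why robots.txt is no substitute for access control.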

Friday, February 08, 2008

Loosely typed vs Strongly typed webservices

This article gives a good overview of the difference between loosely typed and strongly typed webservices.

Jotting down some points from the article here:

A loosely typed webservice is one where the WSDL (interface definition) does not expose the XML schema of the message to be transferred. One example of a loosely typed description is a Web service that encodes the actual content of a message as a single string. The service interface describes one input parameter, of type xsd:string, and one output parameter.

Another way of indicating in a WSDL file that no strict definition of a message format exists is the use of the <xsd:any> element. Its occurrence in a schema simply indicates that any kind of XML can appear in its place at runtime. In that respect, it behaves much like the case where single string parameters are defined. The difference is that here, real XML goes into the message, as opposed to an encoded string.

Pros and Cons:
The service consumer and service provider need to agree on a common format that they both understand, and they both need to develop code that builds and interprets that format properly. There is not much a tool can do to help here, since the exact definition of the message content is not included in the WSDL definition.

On the positive side, if the message structure changes, you do not have to update the WSDL definition. You just have to ensure that all the participants are aware of such a change and can handle the updated format.

Examples where loosely typed services make sense:
1. Assume you have a set of coarse-grained services that take potentially large XML documents as input. These documents might have different structures, depending on the context in which they are used, and this structure might change often throughout the lifetime of the service. A Web service can be implemented so that it can handle all these different types of messages, possibly parsing them and routing them to their eventual destination. Changes to message formats can be made so that they are backward compatible, that is, so that existing code does not have to be updated.

2. Another example is the use of intermediaries. They have a Web service interface and receive messages. But many times, they provide some generic processing for a message before routing it to its final destination. For example, an intermediary that provides logging of messages does not need a strongly typed interface, because it simply logs the content of any message that it receives.

3. Finally, a message format might exist that cannot be described in an XML schema, or the resulting schema cannot be handled by the Web service engine of choice.
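Example 1 above (a coarse-grained, loosely typed service that parses and routes whatever arrives) might look roughly like this. The operation takes and returns a plain string, and the message type is only discovered at runtime from the root element. This is a sketch under assumptions: the class name, the element names purchaseOrder and invoice, and the routing targets are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class LooselyTypedEndpoint {
    /** String in, string out: the WSDL would only say xsd:string for both. */
    static String process(String payload) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() is populated
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(payload)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Payload is not well-formed XML", e);
        }
        // The structure is only known at runtime: route on the root element.
        String root = doc.getDocumentElement().getLocalName();
        switch (root) {
            case "purchaseOrder": return "routed to order service";
            case "invoice":       return "routed to billing service";
            default:              return "rejected: unknown message type " + root;
        }
    }

    public static void main(String[] args) {
        System.out.println(process("<purchaseOrder><item sku='42'/></purchaseOrder>"));
        System.out.println(process("<invoice/>"));
    }
}
```

The trade-off described above shows up directly: nothing in the method signature tells a client what XML to send, so both sides rely on an out-of-band agreement, but new message types can be added without touching the interface.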

A strongly typed service contains a complete definition of its input and output messages in XML Schema, a schema that is either included in the WSDL definition or referred to by that WSDL definition.

Strongly typed Web services provide both consumers and providers of services with a complete definition of the data structure that is part of a service. Tools can easily generate code from such a formal contract that makes the use of SOAP and XML transparent to the client and server-side code.

Smart clients....Has the pendulum swung back?

Before the emergence of the 'web' most of the UI development was 'rich-thick-client' based on MS platforms such as VB, MFC, etc. VB development enjoyed a lot of popularity because of the component based model and the ready availability of rich UI widgets and development environments such as Visual Studio.
Such rich clients suffered from 2 major drawbacks: DLL hell and application upgrade/deployment.
Maintaining the correct version of the application across all clients distributed across locations was a big pain.

To address this problem, many thick clients were migrated to a web-based UI with a centralized server and repository. But this 'silver bullet' had its own major drawback: the user experience was nowhere close to that of a rich client. Even with the emergence of RIA (Rich Internet Applications) and AJAX, the web UI imposed serious limitations on applications requiring tons of data input and fast keyboard-based navigation across screens.

And it is here that .NET smart clients come into the picture. A .NET smart client has the following features:
- Uses the ClickOnce deployment feature to automatically update DLLs on the client machine. Thus deployment is no longer a problem.
- Smart clients built on .NET resolve the DLL hell problem using metadata versioning of assemblies.
- Smart clients can talk with webservices to obtain data and fulfill other business requirements.
- Smart clients can work in offline mode - offering either limited or full functionality.
- Smart clients can easily integrate with other applications on the desktop providing an integrated approach to the user.
- Using WPF, users can be given the rich user interface they demand.

Tuesday, February 05, 2008

Deep linking in AJAX applications

AJAX applications face a deep-linking challenge because often the page URL does not change; only the content changes via AJAX.
And without deep linking, spiders and bots may not be able to index your page.

Currently there are 2 strategies for deep-linking AJAX sites:
- Using anchors: on load, JavaScript checks location.href for an anchor and then calls the existing AJAX methods to add and remove the appropriate content.
- Placing the stateful URL ("Link to this page") somewhere within the page, at a location and in a style that will become well-known conventions. Both Google Maps and MSN Virtual Earth follow this technique.

"Neither Google Maps nor MSN Virtual Earth records the state of a particular map view on the browser’s URL-line. Google Maps hides all the parameters. MSN Virtual Earth presents only a GUID whose purpose I don’t yet understand, but which doesn’t record the map’s state. In both cases, the stateful URL is found elsewhere than on the URL-line. Google Maps presents it directly, with a link icon and a Link to this page label. MSN Virtual Earth presents it indirectly — clicking Permalink leads to a popup window containing the stateful URL."

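The first strategy (keeping view state in the URL anchor) boils down to serialising the view state into the fragment and reading it back on load. In the browser this logic would live in JavaScript reading location.hash; it is sketched here in Java only to keep one language across the examples, and the class name and state keys (lat, zoom, q) are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class DeepLinkState {
    /** Builds a stateful URL by appending the view state as the fragment, e.g. #lat=51.5&zoom=12. */
    static String toStatefulUrl(String pageUrl, Map<String, String> state) {
        StringBuilder fragment = new StringBuilder();
        for (Map.Entry<String, String> e : state.entrySet()) {
            if (fragment.length() > 0) fragment.append('&');
            fragment.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return pageUrl + "#" + fragment;
    }

    /** On load, recovers the view state from the fragment (the part after '#'). */
    static Map<String, String> fromStatefulUrl(String url) {
        Map<String, String> state = new LinkedHashMap<>();
        int hash = url.indexOf('#');
        if (hash < 0 || hash == url.length() - 1) return state; // no view state in the URL
        for (String pair : url.substring(hash + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) continue; // skip malformed pairs
            state.put(dec(pair.substring(0, eq)), dec(pair.substring(eq + 1)));
        }
        return state;
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    private static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> view = new LinkedHashMap<>();
        view.put("lat", "51.5");
        view.put("zoom", "12");
        String url = toStatefulUrl("http://example.com/map", view);
        System.out.println(url);                  // http://example.com/map#lat=51.5&zoom=12
        System.out.println(fromStatefulUrl(url)); // {lat=51.5, zoom=12}
    }
}
```

Because the fragment is never sent to the server, this keeps the state bookmarkable in the browser without triggering page reloads, which is what makes the anchor approach attractive for AJAX pages.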
Sunday, February 03, 2008

Technical Architect responsibilities defined

This weekend, I was going through some of the old books in my library and found one that was my favourite during my developer days: The J2EE Architect's Handbook. I perused the first chapter, which tries to define the role of a technical architect. The tasks given in the book were things I was doing day in and day out on projects. The simple and lucid language used in that chapter prompted me to blog some snippets from it.

Technical Architect

Identifies the technologies that would be used for the project.
Recommends the development methodologies and frameworks for the project.
Provides the overall design and structure to the application.
Ensures that the project is adequately defined.
Ensures that the design is adequately documented.
Establishes design/coding guidelines and best practices. Drives usage of design patterns.
Mentors developers for difficult tasks.
Enforces compliance with coding guidelines using code reviews etc.
Assists the project manager in estimating project costs and efforts.
Assists management in assessing technical competence of developers.
Provides technical advice and guidance to the project manager.
Ensures that the data model is adequate.
Guides the team in doing POCs and early risk assessments.

Saturday, February 02, 2008

Basic tools for Solaris Performance monitoring

One of the development teams I was consulting for was facing some performance problems on the Solaris platform. The Websphere JVM was hanging and the performance of the application dropped.

The three basic tools on the Solaris platform that are invaluable at such times are:
1. vmstat - gives CPU and memory utilization stats
2. iostat - gives I/O stats
3. netstat - gives network stats

This link gives good info on the rules of thumb to apply when analyzing the output of these commands.

Another very powerful command on Solaris is 'prstat'. prstat is more versatile than the 'top' command that is present on most Unix systems.

To find out the top 5 processes consuming CPU time:

prstat -s cpu -a -n 5

To find out the top 5 processes consuming the most memory (-s size sorts by total virtual image size; -s rss sorts by resident set size):
prstat -s size -n 5

To dump the CPU usage of a process to a file every 15 seconds:
prstat -p 2443 15 > server.out &

A very nice feature of prstat is that by using the -L switch, prstat will report statistics for each thread of a process.
prstat -L -p 3295
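prstat -L shows per-thread (LWP) CPU usage from the operating system's side. A rough complement from inside the JVM is the standard java.lang.management.ThreadMXBean API, sketched below. Note that the numbers differ in kind: prstat reports CPU usage over its sampling interval, while getThreadCpuTime returns cumulative CPU time since each thread started. The class name and output format are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class TopThreads {
    /** Returns up to n "name: cpu ms" strings for the live threads with the most cumulative CPU time. */
    static List<String> topByCpu(int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> result = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) return result; // JVM cannot measure per-thread CPU
        if (!mx.isThreadCpuTimeEnabled()) mx.setThreadCpuTimeEnabled(true);

        List<long[]> samples = new ArrayList<>(); // each entry: {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread has died meanwhile
            if (cpu >= 0) samples.add(new long[] {id, cpu});
        }
        samples.sort((a, b) -> Long.compare(b[1], a[1])); // highest CPU first

        for (long[] s : samples.subList(0, Math.min(n, samples.size()))) {
            ThreadInfo info = mx.getThreadInfo(s[0]); // null if the thread has since died
            String name = (info == null) ? "thread-" + s[0] : info.getThreadName();
            result.add(name + ": " + (s[1] / 1_000_000) + " ms");
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : topByCpu(5)) {
            System.out.println(line);
        }
    }
}
```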

The following links also provide good info about some basic troubleshooting tips:
http://developers.sun.com/solaris/articles/prstat.html
http://www-1.ibm.com/support/docview.wss?rs=180&uid=swg21162381
http://www-1.ibm.com/support/docview.wss?uid=swg21052644

Friday, February 01, 2008

Capabilities of an ESB

Just saw Mark Richards' presentation on ESBs. The presentation is simple and easy to understand, without all the fancy jargon.
The streaming video is available at: http://www.infoq.com/presentations/Enterprise-Service-Bus

Capabilities of an ESB:

Routing: This is the ability to route a request to a particular service provider based on some criteria. Note that there are many different types of routing (e.g. static, content-based, policy-based) and not all ESB providers offer all of them. This is a core capability that an ESB has to offer (at least static routing); without it, a product cannot be considered an ESB.

Message transformation: Ability to convert the incoming business request to a format understood by the service provider (can include re-structuring the message as well).

Message enhancement: We don't want to change the client, so we often need to enhance the message sent to the service provider (removing or adding information, rules-based enhancement).

Protocol transformation: The ability to accept one type of protocol as input and communicate to the service provider through a different protocol (XML/HTTP->RMI, etc).

Message processing: Ability to perform request management by accepting an input request and ensuring delivery back through message synchronization, i.e. queue management.

Service orchestration: Low-level coordination of different service implementations. Hopefully it has nothing to do with BPEL. It's usually implemented through aggregate services or interprocess communication.

Transaction management: The ability to provide a single unit of work for a service request for the coordination of multiple services. Usually very limited since it's difficult to propagate the transaction between different services. But at least the ESB should provide a framework for compensating transactions.

Security: Provides the A's of security. There are no "silos" in SOA so the ESB has to be able to provide this.
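The routing capability described above, in its content-based form, can be sketched as a list of predicate/endpoint rules where the first match wins, with a default endpoint as the fallback. This is a toy illustration rather than how any particular ESB product implements routing: the class name and endpoint names are hypothetical, and a real ESB would typically evaluate XPath or header-based conditions rather than raw string matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ContentBasedRouter {
    private static final class Rule {
        final Predicate<String> condition;
        final String endpoint;

        Rule(Predicate<String> condition, String endpoint) {
            this.condition = condition;
            this.endpoint = endpoint;
        }
    }

    private final List<Rule> rules = new ArrayList<>(); // evaluated in order; first match wins
    private final String defaultEndpoint;

    ContentBasedRouter(String defaultEndpoint) {
        this.defaultEndpoint = defaultEndpoint;
    }

    ContentBasedRouter when(Predicate<String> condition, String endpoint) {
        rules.add(new Rule(condition, endpoint));
        return this;
    }

    String route(String message) {
        for (Rule rule : rules) {
            if (rule.condition.test(message)) return rule.endpoint;
        }
        return defaultEndpoint; // with no rules at all, this degenerates to static routing
    }

    public static void main(String[] args) {
        ContentBasedRouter router = new ContentBasedRouter("queue/standard-orders")
                .when(m -> m.contains("<priority>HIGH</priority>"), "queue/express-orders");
        System.out.println(router.route("<order><priority>HIGH</priority></order>")); // queue/express-orders
        System.out.println(router.route("<order/>"));                                  // queue/standard-orders
    }
}
```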
