Tuesday, December 15, 2009

What is longitudinal data?

A dataset is considered longitudinal if it tracks the same kind of information about the same subjects at multiple points in time. E.g. the marks of students over multiple years, patient health records over a period of time, etc.

The most important advantage of longitudinal data is that we can measure change over time and the effect of various factors on that change. E.g. what effect did a particular drug have on a cancer patient? What effect did different teachers have on a student?

So essentially, longitudinal data helps in establishing cause-and-effect relationships. Longitudinal data stores are also being used for predictive modeling and other areas, and they are very popular in the Life Sciences and Healthcare industry.
I am interested in learning the best practices for creating and optimizing a data model for longitudinal data stores.
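
To make the idea of "measuring change" concrete, here is a minimal in-memory sketch in Java. It is only an illustration and not a recommended data model; the class name, the method names and the student data are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class LongitudinalSketch {
    // subject id -> (year -> score); TreeMap keeps the years in chronological order.
    private final Map<String, TreeMap<Integer, Double>> observations = new TreeMap<>();

    // Stores one observation of a subject at one point in time.
    void record(String subjectId, int year, double score) {
        observations.computeIfAbsent(subjectId, id -> new TreeMap<>()).put(year, score);
    }

    // Change between the earliest and the latest observation of one subject.
    double change(String subjectId) {
        TreeMap<Integer, Double> series = observations.get(subjectId);
        if (series == null || series.isEmpty()) {
            throw new IllegalArgumentException("no observations for " + subjectId);
        }
        return series.lastEntry().getValue() - series.firstEntry().getValue();
    }

    public static void main(String[] args) {
        LongitudinalSketch marks = new LongitudinalSketch();
        marks.record("student-1", 2007, 62.0);
        marks.record("student-1", 2008, 70.0);
        marks.record("student-1", 2009, 75.5);
        System.out.println("change for student-1: " + marks.change("student-1"));
    }
}
```

The key point is that every observation carries both the subject and the point in time, so the same subject can be followed across time points.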

Difference between biostatistics and bioinformatics

Working in the healthcare domain, I often come across the terms biostatistics and bioinformatics, and wondered what the differences between the two branches of study were. A quick googling revealed the following:
The term Biostatistics is a combination of the words biology and statistics. So essentially it is the application of statistics to biology. The science of biostatistics encompasses the design of biological experiments, especially in medicine and agriculture; the collection, summarization, and analysis of data from those experiments; and the interpretation of, and inference from, the results.
Bioinformatics is the application of information technology and computer science to the field of molecular biology. Its primary use has been in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.
Bioinformatics focuses on applying computationally intensive techniques (e.g., pattern recognition, data mining, machine learning algorithms, and visualization) to understand biological processes.
More information can be found on Wikipedia.

Thursday, November 19, 2009

Mapping UI controls to entitlements/operations

In RBAC, we often need to enable/disable a UI control based on the user's role and entitlements. Most programmers write code that maps a UI control to an operation name, i.e. if 'submitRecord' is not allowed for the user, then disable or hide the button.

Recently I came across a neat way to handle this using attributes in C#.NET. This article describes the use of attributes to specify the 'operationName' and the 'property-value' to be set on the control when we check for entitlements.
Example code snippet:
[YourCompany.Authorization("EditSalary", "ReadOnly", true)]
private System.Windows.Forms.TextBox Salary;

Though the example is that of a .NET program, the same concept can easily be applied in Java 5, which supports annotations. Annotations in Java are roughly equivalent to Attributes in .NET.
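
To illustrate the Java side of this idea, here is a minimal sketch using a runtime annotation and reflection. Everything in it is hypothetical: the @RequiresOperation annotation, the TextControl stand-in (used instead of a real UI toolkit widget to keep the sketch self-contained) and the EntitlementBinder helper are my own names, not part of any library or of the article mentioned above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical annotation: names the operation that guards a control.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface RequiresOperation {
    String value();
}

// Stand-in for a real UI widget (hypothetical).
class TextControl {
    boolean readOnly = false;
}

class SalaryForm {
    @RequiresOperation("EditSalary")
    TextControl salary = new TextControl();

    TextControl name = new TextControl(); // not guarded by any entitlement
}

class EntitlementBinder {
    // Sets readOnly on every annotated control, depending on whether
    // the user's allowed operations contain the annotated operation name.
    static void apply(Object form, Set<String> allowedOperations) {
        for (Field field : form.getClass().getDeclaredFields()) {
            RequiresOperation rule = field.getAnnotation(RequiresOperation.class);
            if (rule == null || !TextControl.class.isAssignableFrom(field.getType())) {
                continue;
            }
            try {
                field.setAccessible(true);
                TextControl control = (TextControl) field.get(form);
                control.readOnly = !allowedOperations.contains(rule.value());
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read field " + field.getName(), e);
            }
        }
    }

    public static void main(String[] args) {
        SalaryForm form = new SalaryForm();
        // This user may view salaries but not edit them.
        apply(form, new HashSet<>(Collections.singletonList("ViewSalary")));
        System.out.println("salary read-only: " + form.salary.readOnly);
        System.out.println("name read-only: " + form.name.readOnly);
    }
}
```

The mapping between control and operation lives next to the field declaration, so no separate lookup table has to be kept in sync with the form.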

Wednesday, November 18, 2009

XSLT vs XQuery for XML transformations

During the early years of my career, I had always used XSLT for transformation purposes. Recently I have seen that a lot of applications and products are using XQuery for transformations. For e.g. BEA AquaLogic ESB and Apache ServiceMix both use XQuery for message transformation capabilities.

Although XSLT is more powerful than XQuery for transformations, it is much simpler to learn XQuery. I remember the tough learning curve I had to go through when learning XSL, but I could grasp XQuery basics in a couple of hours. Both XSLT and XQuery use XPath for querying XML.

Found this interesting discussion on the O'Reilly site that compares the two technologies. David P. in the discussion shares some interesting views on the design philosophy behind the two technologies - XQuery is a MUST UNDERSTAND language whereas XSLT is a MAY UNDERSTAND language, i.e. it is more adaptive and template driven. XSLT is untyped; conversions between nodes and strings and numbers are handled pretty much transparently. XQuery is a typed language which uses the types defined by XML Schema. XQuery will complain when it gets input that isn't of the appropriate type. XQuery is better for highly-structured data, XSLT for loosely-structured documents.

We were building a transformation engine to convert between the various ACORD formats. The ACORD committee has ensured that any new version of ACORD only 'adds' new elements to the current schema. To preserve backwards compatibility, no elements are deleted and the existing schema is not changed.
Hence for these transformations, XQuery fitted the bill. We also did not have to create a Canonical Data Model as direct transformations were possible due to the above mentioned restriction. Hence if the tool supported 15 ACORD versions, only 15 XQuery files were required.

Other links on the same subject:

Monday, October 05, 2009

Use of Constants in code

Recently there was a big debate during a code review session on the use of constants. The developers had used constants for the following purposes:

1. Each and every message key used in the i18n application was declared as a constant. The application contained around 3000 message keys and hence the same number of constants.

2. Each and every database column name was declared as a constant. There were around 5000 column names and still counting.

Does it make sense to have such a huge number of constants in any application? IMHO, common sense should prevail. Message keys just don't need to be declared as constants. A message key is already one level of indirection to the actual text - why add one more?
Regarding database column names, I have mixed opinions. If a column is being used in multiple classes, does it make sense to declare it as a global constant? Maybe declaring one class per table, with the column names as its members, would be a good idea? But if you have a large number of tables and columns, it would become very tedious to even create these constants files.
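
The "one class per table" idea could look roughly like the following Java sketch. The table and column names are hypothetical and only meant to show the shape of such a constants file.

```java
// Hypothetical table and column names, for illustration only.
final class Tables {
    private Tables() {}

    // One nested class per table; its members are the column names.
    static final class Employee {
        private Employee() {}
        static final String TABLE = "EMPLOYEE";
        static final String ID = "EMP_ID";
        static final String SALARY = "SALARY";
    }
}

class EmployeeQueries {
    // Builds a query from the constants instead of repeating string literals.
    static String selectSalaryById() {
        return "SELECT " + Tables.Employee.SALARY
             + " FROM " + Tables.Employee.TABLE
             + " WHERE " + Tables.Employee.ID + " = ?";
    }

    public static void main(String[] args) {
        System.out.println(selectSalaryById());
    }
}
```

Grouping by table at least keeps the constants discoverable through code completion, which a flat list of 5000 names would not.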

I found a few tools on the net that can automate the creation of these constants files.

- CodeProject

- MobzySystems

- ConstTool

Friday, September 25, 2009

Cool Cool JS toolbox

Just found this good website having tons of cool JavaScript tools - free for commercial use too.
Check out the dynamic menus - always needed in any web-site development.

Wednesday, September 16, 2009

Should we call Dispose() on a DataSet?

During one of my code reviews, I saw that the development team had called 'Dispose()' on all the DataSets used in the application. I knew that the DataSet was a disconnected managed object and could not understand what the Dispose method would actually be doing. Dispose() is typically called to release unmanaged resources such as file-pointers, streams, socket connections, etc. In most cases, such classes also expose a Close() method that is more appropriate.

Discussions with the developers revealed that FxCop also raises a warning when Dispose() is not called on DataSets.
Further investigation revealed that the DataSet exposes the Dispose() method as a side effect of inheritance. The DataSet class inherits from MarshalByValueComponent, which implements the IDisposable interface because it is a component. The method is not overridden in the System.Data.DataSet class and the default implementation is in the MarshalByValueComponent class. The default implementation just removes the component from the parent container it is in. But in the case of a DataSet, there is no parent container and hence the Dispose() method does nothing useful.

Conclusion: It is not necessary to call Dispose() on DataSets. Developers can safely ignore the FxCop warnings too :)

Ruminating over XML Schema concepts

In a recent discussion with my team members, I sensed a lot of confusion over basic concepts of XML Schemas and namespaces - especially over targetNamespace, the default namespace, elements vs types, etc. Well first, let's get the fundamentals right.

An XML schema defines types (complex and simple) and declares elements. All elements have a type. If the type has not been specified, it defaults to xsd:anyType. It's easy to understand this by drawing an analogy with classes and objects: a type is like a class and an element is like a named object of that type. New element declarations can also reference existing global elements.

In a WSDL message description, a WSDL part may reference an element or a type. If the SOAP binding specifies style="document", then the WSDL part must reference an element. If the SOAP binding specifies style="rpc", then the WSDL part must reference a type.
Now coming to namespaces, users typically get confused over the difference between the targetNamespace and the default namespace. The 'targetNamespace' attribute is used in schema files to name the namespace that the elements and types defined in that schema belong to. An instance document then uses that same namespace to refer to those elements. The default namespace is a different thing: it is declared simply by using 'xmlns=' without a prefix and it applies to every unprefixed element name in its scope. For e.g. an XML document may declare the schema's target namespace as its default namespace, so that its elements need no prefix.
It is important to remember that an XML schema is itself an XML document. Hence a schema can contain a default namespace declaration ('xmlns=') and also a targetNamespace attribute. Typically they are the same, so that unprefixed references to the schema's own types resolve to the schema's namespace.
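
The effect of a default namespace can be checked with a small Java sketch that uses the JDK's built-in namespace-aware DOM parser. The namespace URI and the element names are hypothetical; the sketch only shows that an unprefixed element under 'xmlns=' and a prefixed element end up in the same namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Hypothetical URI, standing in for the targetNamespace of some schema.
    static final String ORDERS_NS = "http://example.com/orders";

    // Parses the XML with namespace support switched on and returns the root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Default namespace: declared with xmlns= and no prefix.
        String defaulted = "<order xmlns=\"" + ORDERS_NS + "\"><id>1</id></order>";
        // Same namespace, but bound to the prefix 'o'.
        String prefixed = "<o:order xmlns:o=\"" + ORDERS_NS + "\"><o:id>1</o:id></o:order>";

        Element a = parseRoot(defaulted);
        Element b = parseRoot(prefixed);
        System.out.println(a.getLocalName() + " in " + a.getNamespaceURI());
        System.out.println(b.getLocalName() + " in " + b.getNamespaceURI());
    }
}
```

Both documents describe the same element as far as a schema validator is concerned; the prefix is just a local shorthand.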

Jotting down links where more explanation is given:

Friday, June 26, 2009

Creating a self-signed certificate

Jotting down the quick commands in .NET and JDK frameworks that can be used to create a self signed certificate.

In the .NET framework, open the Visual Studio command prompt and type the following:
makecert -r -pe -n "CN=www.yourserver.com" -b 01/01/2000 -e 01/01/2036 -eku 1.3.6.1.5.5.7.3.1 -ss my -sr localMachine -sky exchange -sp "Microsoft RSA SChannel Cryptographic Provider" -sy 12
Just replace the CN with the name or IP of your server. The -eku value 1.3.6.1.5.5.7.3.1 is the OID for server authentication. The certificate would be created in the default personal store on Windows. To view it, go to MMC and add the 'certificates' snap-in.

In the JDK, first use the keytool utility to generate a certificate in the keystore.
keytool -genkey -alias myalias -keystore .keystore
You would be prompted to enter the CN and other details. Once the cert is stored in the keystore, it can be exported as a file.
keytool -storepass password -alias myalias -keystore .keystore -export -rfc -file outfilename.cer

Thursday, June 11, 2009

.NET Web Methods Tips and Tricks

Is it possible to pass a parameter by reference to a Web Method? I thought this does not make sense, but behind the covers the .NET webservice proxy and runtime make this possible.
A detailed example with sample source code can be found here.
Also given in the above link is an example of supporting polymorphism in Web Methods.

Tuesday, June 09, 2009

Open Source Application Management Software

We needed a simple application management tool for monitoring our Java applications running on Tomcat servers. My team evaluated the various free/open-source options and gave a demo of the following 2 tools that proved to be interesting.

1. ManageEngine Applications Manager - This is free for use (up to 5 monitors). Very user friendly and professional look and feel.
2. JManage - (open source. Good support for Java console applications)

Passing large .NET datasets across layers

If your .NET webservice is returning datasets, then you might face performance problems with large payloads consuming the network bandwidth. There are a few strategies around this problem.

1. Convert the dataset into a byte[] using the BinaryFormatter. Compress this byte[] using the Deflate classes in .NET. On the client side, the reverse process needs to be done.
Peter Bromberg has written a nice article explaining the details of this strategy.

2. Convert the dataset into an XML string and compress the XML string. This discussion forum thread contains sample source code for the same.
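
The compress-on-server, decompress-on-client round trip behind both strategies can be sketched as follows. Note that this is a Java illustration using the JDK's java.util.zip classes, not the .NET code from the linked articles; the PayloadCodec class and the sample XML payload are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class PayloadCodec {
    // Server side: compress the serialized payload before sending it.
    // (With default settings the JDK wraps the deflate data in a small zlib header.)
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Client side: the reverse process.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical payload: repetitive XML, similar to a serialized result set.
        StringBuilder xml = new StringBuilder("<rows>");
        for (int i = 0; i < 1000; i++) {
            xml.append("<row><id>").append(i).append("</id><status>ACTIVE</status></row>");
        }
        xml.append("</rows>");

        byte[] raw = xml.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        byte[] restored = decompress(packed);
        System.out.println("raw bytes: " + raw.length + ", compressed bytes: " + packed.length);
        System.out.println("round trip ok: " + java.util.Arrays.equals(raw, restored));
    }
}
```

Tabular data serialized as XML is highly repetitive, which is why this kind of compression tends to pay off for large result sets.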

Monday, June 08, 2009

Migration Factory

Recently, I have been hearing a lot of buzz around the term - "Migration Factory". A lot of IT service companies have jumped on this buzzword and offer conversion services based on the Migration Factory model.
So what exactly is a Migration Factory? A Migration Factory is a methodology that endeavours to adopt a factory model for churning out migrated components. Essentially the migration landscape is segmented into various well-defined tasks and each segmented task is assigned to dedicated teams having specialized skills in that area. An analogy to a vehicle manufacturing factory would give a fair idea of what the methodology tries to achieve. So this essentially is an assembly line methodology with phases such as planning, assessment, conversion, implementation, etc.

The key drivers in the Migration Factory approach are the reuse of patterns, models and tools gained from past experience. Toolkits for automation of activities such as code conversion, automated regression testing, etc. form an important part of the Factory model.

In my personal experience, I have observed that no two migration projects are the same. Each project brings its own technology and business challenges, but we can reuse the best practices and other tools developed in previous engagements.

Saturday, May 16, 2009

Data compression in .NET

In .NET, developers often have to choose between 2 options for compression/decompression - GZipStream and DeflateStream. What's interesting to note is that GZipStream uses the same 'deflate' algorithm as DeflateStream; but in addition it adds a CRC check and a header that can store metadata such as the original file name, timestamp, etc.

So GZip is actually a data format. Note that a gzip stream holds a single compressed stream of data; to pack multiple files into one archive, they first have to be bundled (e.g. with tar) or written into a custom container. You can open a file written by the GZipStream using a gzip-aware utility such as WinRAR on Windows, gunzip on Linux, etc.

Sample example code can be found at MSDN. There is also a sample solution that allows working with multiple files using GZip compression. Both these classes only support max 4GB as the stream length.
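
The point that GZip is a format wrapped around deflate can be seen from the bytes themselves. The following is a Java sketch using the JDK's GZIPOutputStream and GZIPInputStream (not the .NET classes discussed above); the GzipDemo class and its method names are hypothetical. Every gzip stream starts with the two magic bytes 0x1F 0x8B, which is how tools like gunzip recognize it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipDemo {
    // Compresses the bytes into the gzip format (header + deflate data + CRC trailer).
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompresses gzip data; the stream verifies the CRC while reading.
    static byte[] gunzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Checks for the two magic bytes that open every gzip stream.
    static boolean hasGzipMagic(byte[] data) {
        return data.length >= 2 && (data[0] & 0xFF) == 0x1F && (data[1] & 0xFF) == 0x8B;
    }

    public static void main(String[] args) {
        byte[] raw = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println("starts with gzip magic: " + hasGzipMagic(packed));
        System.out.println("restored: " + new String(gunzip(packed), StandardCharsets.UTF_8));
    }
}
```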

Friday, May 15, 2009

Interoperability when using datasets in .NET webservices

Datasets are powerful data containers that are very popular in the .NET world. Quite often, .NET webservices return datasets from their web methods. This is ok, as long as we are sure that all the clients are also on the .NET platform.
But if we want our services to be interoperable with other platforms (e.g. Java), then passing datasets back to webservice clients would not work. This is because a dataset is a generic container and is populated at runtime with data. Hence at design time, it is not possible to define the dataset datatype with a schema. Also when a dataset is serialized to XML, the default .NET SOAP serializer adds .NET specific attributes and elements to the output, e.g. msdata:IsDataSet="true", the diffgram wrapper, etc.

Typed datasets do have a schema backing them, but here again there is a hack required to make a typed dataset accessible to a Java program. The hack is to return an XML string or an XML node instead of the typed dataset. We would also have to change the autogenerated WSDL and add the typed dataset schema to it. A lot of trouble for using datasets in a heterogeneous environment :(
For hassle free interop, the best option is to go for pure schema driven complex types that can be mapped to any language.
More information is available at the following links:

XML serialization tips in .NET

Today, my team ran across a strange issue. Some of the properties on a .NET class were not getting serialized to XML when used in webservices. There was no error message or any other warning.
Closer examination revealed that the properties not getting serialized were 'read-only' properties; i.e. they had getters but no setters. The default XmlSerializer in .NET only serializes public properties that have both a getter and a setter (it needs the setter to rebuild the object during deserialization), and hence a read-only property is silently skipped.
Once we added setters to the properties, XML serialization worked fine.

There is another option. We can implement the IXmlSerializable interface and write custom serialization code. But this could be very tedious.

Wednesday, May 13, 2009

Java to .NET conversion tools

Today one of our teams was looking for a Java to .NET conversion tool. They had developed reusable components in Java and the client wanted similar components in their .NET environment.

Microsoft used to ship a module called JLCA (Java Language Conversion Assistant) that enabled us to convert Java code to .NET code. This tool was shipped as part of VS 2005, but unfortunately is no longer shipped with VS 2008. Please read Microsoft's statement here regarding JLCA.

I had a quick look at the tool by launching it from VS 2005 and selecting a few Java util libraries. The results looked good, but we need to understand that 100% conversion is not possible. The tool puts a lot of comments in the generated .NET code, so all errors and warnings are marked in the generated code. Also a cool HTML report is generated that lists the errors and warnings. I could find tons of errors/warnings, but most of them fell into expected categories such as no direct mapping between Java and .NET classes, etc.

ArtinSoft offers an enhanced version of the JLCA tool called JLCA Companion. I have not evaluated this, but the documentation states interesting features for customization.

To summarize, I believe such migration tools are useful to jumpstart the process and have a foundation ready to work on.

Friday, May 08, 2009

Async calls in Java and .NET webservices

Over the past few weeks, the Architecture Support Group that I head at my organization received quite a few queries on making asynchronous web service calls in a SOA environment. So I decided to blog about the various options at hand.

To make async webservice calls in .NET, the following programming models are available. Please visit the links for further information.

1. Using Asynchronous Callback delegates.

2. Using Event Based Async methods

3. Fire and Forget mechanism: Here we can decorate the server side webmethod with the 'OneWay' attribute.

On the Java side, the popular Axis2 framework supports async web service calls out of the box, by generating callback handlers in the webservice proxy.
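
The callback style common to these models can be sketched with nothing but the JDK. To be clear, the following is not the code that Axis2 generates; it is a plain CompletableFuture illustration, and the getQuoteBlocking method is a hypothetical stand-in for a blocking webservice proxy call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class AsyncCallSketch {
    // Hypothetical stand-in for a blocking web service proxy call.
    static String getQuoteBlocking(String symbol) {
        return "quote-for-" + symbol;
    }

    // Runs the blocking call on a background thread and returns immediately.
    static CompletableFuture<String> getQuoteAsync(String symbol) {
        return CompletableFuture.supplyAsync(() -> getQuoteBlocking(symbol));
    }

    public static void main(String[] args) throws Exception {
        // Register a callback that fires when the response arrives.
        CompletableFuture<Void> done = getQuoteAsync("ACME")
            .thenAccept(result -> System.out.println("callback received: " + result));

        System.out.println("caller continues without waiting");

        // Only for the demo: keep the JVM alive until the callback has run.
        done.get(5, TimeUnit.SECONDS);
    }
}
```

The caller thread is free as soon as getQuoteAsync returns; the result is handled later in the callback, which is the essence of the callback-delegate and callback-handler approaches listed above.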

The WS-Addressing specification is also trying to bring in a standard for defining different MEP (Message Exchange Patterns).

Thursday, April 30, 2009

Smart Client Applications

Just read this blog entry about Smart Client technology and an architect's dilemma in choosing an appropriate client UI technology. I had blogged about Smart Clients earlier here.

In the blog posts, Philip discusses the scenarios wherein a thin web-based client makes sense. For Smart Clients, he says that 'offline' applications are the best fit for a thick client. I feel there is another strong reason why one would want to select a Smart Client UI - a rich user experience. Even with cutting edge Ajax solutions, it is very hard to get the same look and feel of a desktop application in a browser. We also have powerful WinForms/SmartClient third-party controls such as those from Infragistics that enable you to slice-n-dice thousands of records in a grid. Such raw power is not available over a web interface.

Another important point raised by Philip was the security aspect of Smart Client applications. How would we protect the client code and data from being compromised? Possible solutions:
- Encrypt all data
- Obfuscate the dlls
- Resolve data concurrency errors and database replication errors.

Sunday, March 15, 2009

Global exception handler in Win Forms and Smart Client

I always wondered if it is possible to add a global exception handler for a Win Form application. This global exception handler would handle all exceptions that have not been caught in the UI forms. In web forms, we have the Application_Error event handler in Global.asax, but I was not sure if Windows forms had a similar functionality.
I got the answer from Rich Newman's blog here. In short, WinForms offers the Application.ThreadException event for unhandled exceptions on the UI thread, and AppDomain.CurrentDomain.UnhandledException covers exceptions raised on other threads.
Another option I had tried out was using the Policy Injection Block to capture all unhandled exceptions. But the caveat here is that all objects need to be proxied, which may be tedious to do.

Monday, March 02, 2009

Principles of SOA

Was reading Thomas Erl's book "SOA: Principles of Service Design" over the weekend.
It's an interesting book with lucid language and simple to understand examples.
Jotting down some of the design principles for SOA:
  • Services should be platform neutral and programming language neutral
  • Services should have a standard interface contract
  • Services should be loosely coupled
  • Design for coarse grained services
  • Service Abstraction - Hide information that is not required by clients. Hide underlying technology details. Promotes loose coupling.
  • Service Reusability - Position services as enterprise resources with agnostic functional context; i.e. the service can be used in other functional scenarios too. Always design the service in such a way, that it can be used beyond its original context.
  • Service Autonomy - Services need to have a high degree of control over their underlying runtime execution environment. The higher up a service is in a typical composition hierarchy, the less autonomy it tends to have due to dependencies on other composed services.
  • Service Statelessness - defer or delegate state to databases, rather than holding it in memory.
  • Service Discovery - services can be discovered using standards such as UDDI.

Tuesday, January 20, 2009

.NET FileHelpers Library

I came across this cool and simple .NET library that I bet would be useful to everyone - FileHelpers Library 2.0
Very often, we have tasks where we need to read comma delimited files or fixed length files and load them into the database. Though this task is very straightforward, the FileHelpers Library actually makes it a breeze. Also the code looks much more clean and maintainable.

API for reading/creating Excel files in Java and .NET

Recently I was trying to find a good library for reading and writing Excel files.
In .NET, there were quite a few alternatives. Given below are 2 blog-links where the author has given a detailed analysis of the libraries.
For .NET, you also have the option of using the COM Excel library that's available if you have Excel (MS Office) installed on your box. But using the Excel COM Interop assembly has performance issues.
For Java, my friend pointed out that an excellent free open-source library is available at BIRT exchange. This library is called "e.Spreadsheet Excel API" and a cursory glance at the sample examples impressed me. The library is simple to use and completely free to use in your commercial applications too.