Tuesday, September 25, 2012

Ruminating on Hibernate Entities and DTO

Today we had a lengthy debate on the hot topic of Hibernate entities vs DTOs to be used across the layers of an n-tiered application. Summarizing the key concepts that were clarified during our brainstorming :)

Q. Is it possible to use Hibernate entities in SOA-style web services?

Ans. Yes. It is possible, but with a few important caveats and serious disadvantages.

If you are using Hibernate annotations to decorate your entities, then your service clients would also need the necessary Hibernate/JPA jar files. This can be a big issue if you have non-Java clients (e.g. a .NET web service client). If you use a mapping file (*.xml), then you are good to go, as there are no dependencies on Hibernate jars. But any change in your data model will affect your entities, and this will result in changes to all your web service consumers. So a lot of tight coupling :(

Also, you have to understand that Hibernate uses Javassist to create dynamic proxies of your Hibernate entities. So the entity you are referencing may actually be a generated dynamic proxy that contains a reference to the actual entity object. So you would need to deproxy the entity before using it - either manually or using Dozer or a similar mapping library.

Person cleanPerson = (Person) mapper.map(person, Hibernate.getClass(person));

Note: In the above code, Hibernate.getClass() returns the actual entity class of the proxy object. 

If you are using lazy loading (which most applications do), then you might encounter the famous "LazyInitializationException". This occurs when you detach the Hibernate entity and the serialization process triggers lazy loading of a property outside an open session. To understand this exception, you need to understand the concept of sessions and transactions as given here. A small sketch of how this happens is given below.
If you do not use lazy loading and eagerly load all your entities, then it would work - but this is a short-term solution that may work only for small applications.
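A minimal sketch of how the exception typically arises (Order and its lazily mapped 'items' collection are hypothetical entities; sessionFactory is assumed to be configured elsewhere):

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class LazyLoadingPitfall {

    // Load an entity and immediately close the session, returning a detached object.
    static Order loadDetachedOrder(SessionFactory sessionFactory, long id) {
        Session session = sessionFactory.openSession();
        try {
            return (Order) session.get(Order.class, id);
        } finally {
            session.close();   // the returned entity is now detached
        }
    }

    static void serializeOrder(Order order) {
        // Touching the uninitialized collection outside a session - e.g. when a
        // serializer walks the object graph - throws LazyInitializationException.
        order.getItems().size();
    }
}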


Q. Ok. I understand the pitfalls. So if I go with DTO, will my problems be solved?

Ans. Yes, DTO is the best option, but you may encounter a few issues again.

Many developers encounter the "LazyInitializationException" when they try to use libraries such as Dozer to copy properties. This happens because Dozer uses reflection behind the scenes to copy properties and again attempts to access uninitialized lazy collections. There are 2 ways to resolve this problem: use a custom field mapper as shown here, or use the latest version of Dozer, which has a HibernateProxyResolver class to get the real entity class behind the Javassist proxy. This is explained in the proxy-handling section of the Dozer site.
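A minimal sketch of such a custom field mapper, assuming Dozer 5.x (org.dozer.CustomFieldMapper) and Hibernate 3 package names - it simply skips any collection that was never initialized, so Dozer does not trigger lazy loading:

import org.dozer.CustomFieldMapper;
import org.dozer.classmap.ClassMap;
import org.dozer.fieldmap.FieldMap;
import org.hibernate.collection.PersistentCollection;

public class LazyLoadSafeFieldMapper implements CustomFieldMapper {
    public boolean mapField(Object source, Object destination, Object sourceFieldValue,
                            ClassMap classMap, FieldMap fieldMapping) {
        if (sourceFieldValue instanceof PersistentCollection
                && !((PersistentCollection) sourceFieldValue).wasInitialized()) {
            return true;   // tell Dozer the field is already "handled", i.e. leave it unmapped
        }
        return false;      // let Dozer map all other fields normally
    }
}

The custom field mapper is registered on the DozerBeanMapper instance (e.g. via setCustomFieldMapper) before any mapping calls are made.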

Are annotations used at compile-time or at runtime?

There is a lot of confusion among folks on the scope of annotations - are annotations used only at compile-time, or also at runtime?

The answer is that it depends on the RETENTION POLICY of the annotation. An annotation can have one of three retention policies.

RetentionPolicy.SOURCE: Available only during the compile phase and not at runtime, so these annotations are not written to the bytecode. Examples: @Override, @SuppressWarnings

RetentionPolicy.CLASS: Written to the *.class file, but discarded by the JVM during class loading. Useful for bytecode-level post-processing, but not available at runtime. This is the default retention policy.

RetentionPolicy.RUNTIME: Retained in the loaded class and available at runtime; can be accessed using reflection. Examples: Hibernate / Spring annotations.
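A small self-contained example of a RUNTIME-retained annotation (the @Audited annotation here is made up purely for illustration) and how it can be read back via reflection:

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

@Retention(RetentionPolicy.RUNTIME)   // survives into the loaded class
@interface Audited {
    String value() default "";
}

public class RetentionDemo {

    @Audited("checks inventory levels")
    public void checkStock() { }

    public static void main(String[] args) throws Exception {
        Method m = RetentionDemo.class.getMethod("checkStock");
        Audited audited = m.getAnnotation(Audited.class);
        // With SOURCE or CLASS retention, getAnnotation() would return null here.
        System.out.println(audited == null ? "not visible at runtime" : audited.value());
    }
}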

Serializing JPA or Hibernate entities over the wire

In my previous post, we discussed the need to map properties between Hibernate entities and DTOs. This is required because Hibernate instruments the bytecode of the Java entity classes using tools such as Javassist.

Many developers encounter Hibernate's LazyInitializationException when they try to use Hibernate/JPA entities in their web services stack. To serialize the Hibernate entities across the wire as XML, we have 2 options -
  1. Use the entity pruner library - This library essentially removes all the Hibernate dependencies from your entity classes by pruning them. Though this library is quite stable, developers should be careful about when to prune and unprune.
  2. Use auto-mapper libraries - Libraries such as Dozer make it very easy to copy properties from the Hibernate domain entity to a DTO class. For example:
PersonDTO cleanPerson = mapper.map(person, PersonDTO.class);

Out of the above 2 approaches, I would recommend the second one, as it is clean and easy to use. It also enforces a clean separation of concerns. A good discussion on Stack Overflow is available here.
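The 'mapper' referenced in the snippets above is assumed to be a shared Dozer mapper instance. A minimal sketch of wiring it up (assuming Dozer 5.x), keeping a single instance because building a mapper is expensive:

import org.dozer.DozerBeanMapper;
import org.dozer.Mapper;

public class MapperFactory {

    // Create once and reuse - DozerBeanMapper is thread-safe and costly to build.
    private static final Mapper MAPPER = new DozerBeanMapper();

    public static Mapper getMapper() {
        return MAPPER;
    }
}

Usage: PersonDTO dto = MapperFactory.getMapper().map(person, PersonDTO.class);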

Friday, September 21, 2012

How does the RSA Protected Configuration Provider work behind the scenes?

We were using the "RSA Protected Configuration provider" to encrypt sensitive information in our config files. I was suprised to see that the generated config file also had a triple-DES encrypted key.

So that means the config section is actually encrypted/decrypted using this symmetric key. But where is the key that encrypted this key? It is here that the RSA public/private key pair comes into the picture. The public key in the RSA container is used to encrypt the DES key, and the private key is used to decrypt it. A good forum thread discussing this is available here.
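The provider itself is a .NET feature, but the underlying idea is plain envelope (hybrid) encryption. A conceptual Java sketch of the same idea using standard javax.crypto APIs (not the actual provider implementation):

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.KeyPair;
import java.security.KeyPairGenerator;

public class EnvelopeEncryptionDemo {
    public static void main(String[] args) throws Exception {
        // The bulky payload (the config section) is encrypted with a symmetric 3DES key.
        SecretKey sessionKey = KeyGenerator.getInstance("DESede").generateKey();
        Cipher dataCipher = Cipher.getInstance("DESede/ECB/PKCS5Padding");
        dataCipher.init(Cipher.ENCRYPT_MODE, sessionKey);
        byte[] cipherText = dataCipher.doFinal("server=db;password=secret".getBytes("UTF-8"));

        // The small symmetric key is in turn encrypted with the RSA public key
        // from the key container; only the holder of the private key can recover it.
        KeyPair rsaPair = KeyPairGenerator.getInstance("RSA").generateKeyPair();
        Cipher keyCipher = Cipher.getInstance("RSA");
        keyCipher.init(Cipher.ENCRYPT_MODE, rsaPair.getPublic());
        byte[] encryptedKey = keyCipher.doFinal(sessionKey.getEncoded());

        // Both cipherText and encryptedKey are what end up in the protected config section.
        System.out.println(cipherText.length + " data bytes, " + encryptedKey.length + " key bytes");
    }
}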

There is also a W3C standard for XML encryption available here.

How does DPAPI work behind the scenes?

In my previous post, we saw how DPAPI can be used on Windows platforms to encrypt sensitive information without having to worry about key management. The advantage of using DPAPI is that the data protection API is a core part of the OS and no additional libraries are required.

DPAPI is essentially a password-based data protection service and hence requires a password to provide protection. By default, the logged-on user's password (hash) is used for this. A good explanation of the internal workings of DPAPI is given here: http://msdn.microsoft.com/en-us/library/ms995355.aspx

It's interesting to see how MS has used concepts such as a MasterKey and a PRNG to generate the keys that are actually used for encryption.
I was intrigued to understand how DPAPI works when the user/administrator changes the password. Snippet from the article:
"DPAPI hooks into the password-changing module and when a user's password is changed, all MasterKeys are re-encrypted under the new password."

Thursday, September 20, 2012

Can we use DPAPI from Java?

In my previous post, we discussed how DPAPI makes it simple to encrypt sensitive information without worrying about key generation and management.

I wondered if there was a Java API through which we could use DPAPI on Windows machines. It turns out there is an open-source JNI wrapper available for this:
http://jdpapi.sourceforge.net/index.html

Also worth reading is this excellent post on encryption key management. 

Encrypting sensitive information in the database

Recently, for PCI compliance, we needed to encrypt credit card information (PAN) before storing it in the database. We decided to use AES 256-bit encryption and developed a .NET component in the middle tier for the encryption/decryption.

After this, we faced the chicken-and-egg problem of storing the encryption key :)
In the past, we had used techniques such as storing the key in the Windows registry and using RBAC to control access. Another option was to split the key into multiple files and use file-based OS permissions to control access.

But in the latest version of .NET, you have another good option - DPAPI (Data Protection API). Using DPAPI, we can delegate key management to the operating system.
We do not provide a key with which to encrypt the data. Rather, the data is encrypted with a key derived from the logged-in user or system credentials. Thus we can pass any "sensitive" information to DPAPI and it will encrypt it using the "password" of the logged-in user or the machine-level credentials.

The following MSDN link gives detailed information on how to achieve this for connection strings - a common requirement scenario.

http://msdn.microsoft.com/en-us/library/ms998280.aspx


In our case, we placed our encryption key in a configuration section and encrypted that section using the methods described in the above link.
Because we were using a web farm with two servers, we had 2 options - either use DPAPI on each server to encrypt the data using the machine-specific key, or use a separate RSA key store/container for encryption. In the first option, we would end up with a different 'cipher-value' for the same input data on each server. In the second option, the 'cipher-value' would be the same, but we would need to import the RSA keys on each server in the farm.

More information at this link:   http://msdn.microsoft.com/en-us/library/ms998283.aspx 

Why do we need Base64 encoding?

We often use Base64 encoding when we need to embed an image (or any binary data) within an HTML or XML file. But why do we need to encode it as Base64? What would happen if we didn't? A good discussion on this topic is available here.

Base64 was originally devised as a way to allow binary data to be attached to emails as MIME. We need to use Base64 encoding whenever we want to send binary data over an old ASCII-text-based protocol. If we don't, then there is a risk that certain bytes may be improperly interpreted, for example:
  • Newline chars such as 0x0A and 0x0D
  • Control characters like ^C, ^D, and ^Z that are interpreted as end-of-file on some platforms
  • NULL byte as the end of a text string
  • Bytes above 0x7F (non-ASCII)
We use Base64 encoding in HTML/XML docs to avoid characters like '<' and '>' being interpreted as tags.
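A quick Java illustration (assuming Java 8's java.util.Base64; on older JDKs a library such as Apache commons-codec offers the same functionality) - 'logo.png' is just a placeholder file name:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class Base64Demo {
    public static void main(String[] args) throws Exception {
        byte[] imageBytes = Files.readAllBytes(Paths.get("logo.png"));

        // Every output character comes from a 64-character ASCII-safe alphabet, so no byte
        // can be mistaken for a newline, a control character or markup like '<' and '>'.
        String encoded = Base64.getEncoder().encodeToString(imageBytes);

        // Typical use: embed the image inline in an HTML page as a data URI.
        String imgTag = "<img src=\"data:image/png;base64," + encoded + "\"/>";
        System.out.println(imgTag.substring(0, Math.min(80, imgTag.length())) + "...");
    }
}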

Beware of using Hibernate 'increment' id generator

While reviewing one of the development projects using Hibernate as the ORM tool, I noticed that they were using the "increment" ID generation feature.

The way this feature is implemented, it reads the highest value of the ID column from the database and then increments it by one. But this approach fails miserably in a clustered environment, because many threads/nodes might be writing to the database concurrently and there is no locking.

It is best to stick to database-specific features for auto-ID generation, e.g. sequences in SQL Server 2012 and Oracle 11g, and identity columns in older versions of SQL Server.
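For illustration, a hypothetical JPA-annotated entity wired to a database sequence instead of the 'increment' generator - safe under concurrent inserts because the database hands out the IDs:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Invoice {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "invoice_seq_gen")
    @SequenceGenerator(name = "invoice_seq_gen", sequenceName = "INVOICE_SEQ", allocationSize = 1)
    private Long id;

    public Long getId() {
        return id;
    }
}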

Thursday, September 13, 2012

Delegated Authentication through a Login Provider

Nowadays it is common to see many internet sites delegating the authentication process to some third-party providers such as Google, Facebook, Twitter, etc.

So essentially you do not have to register and create a new login/password for the site. You can just use your existing login credentials from Google, Facebook, Twitter, OpenID, LiveID, etc. Behind the scenes, the application and the login provider use standards such as OAuth and OpenID to do the magic.

The advantages of delegating authentication to a popular third-party provider are:
  1. Users don't have to register and create another set of username/password for your site. Thus the user has fewer passwords to remember.
  2. The application does not have to worry about creating a login form and SSL-enabling it.
Found 2 good libraries on Google Code for enabling OAuth delegated authentication in your application:
http://code.google.com/p/socialauth/

http://code.google.com/p/socialauth-net/

Wednesday, September 12, 2012

How does Oracle RAC manage locks?

Typically, databases store locks in memory, and hence it is challenging to load-balance or cluster 'write' operations to a shared database across multiple database instances.
For example, in my previous blog, I mentioned how SQL Server does not have any Oracle RAC equivalent.

So how does Oracle RAC manage to maintain locks and data consistency across multiple nodes of the RAC cluster?

The secret sauce is the high-speed private network called the "interconnect". RAC has something called the "Cache Fusion" mechanism that allows for inter-instance block transfers and cache consistency. The Global Cache Service (GCS) is a set of processes that ensures that only one instance modifies a block at any given time. GCS also flags in-memory blocks as "invalid" whenever those blocks are changed on other nodes.

The following links have a good explanation on the RAC consistency mechanism:
http://www.rampant-books.com/art_rac_global_block_management.htm

http://www.rampant-books.com/art_burleson_rac_cache_coherency.htm

Tuesday, September 11, 2012

Ruminating on SQLServer HA and DR strategy

Recently for one of our customers, we were evaluating the options for configuring SQLServer 2008 for High Availability and Disaster Recovery.

Having successfully used Oracle RAC technology many times in the past, I was surprised to realize that SQLServer does not have any equivalent to Oracle RAC. There is essentially no concept of load-balancing "read-write" requests between server instances working on a shared database storage system (e.g. SAN, RAID-10 array).

The only near equivalent to the RAC concept is SQLServer 2008 peer-to-peer transactional replication with updatable subscriptions. This is horribly complex to configure and maintain, with data being replicated to the peer servers. Also, MS has finally decided not to support this feature anymore.

The clustering techniques of SQLServer 2008 use confusing terms such as "active/active" and "active/passive". But in reality, there is no load-balancing of database requests. The secondary server is typically in stand-by mode (as in active/passive). The 'active/active' concept essentially means that the SQL Server instances are accessing two separate databases or a partitioned database. If one server fails, the processing is offloaded to the second machine. So now the second machine is running 2 server instances, and this bogs down its resources.

In SQLServer 2012, MS has tried to move closer to the RAC feature. There is a new feature called "AlwaysOn" that enables us to create availability (HA) groups and define replication strategies between the nodes in a group. The 'secondary' server is always 'read-only' and can be used for reporting purposes or for 'read' operations. This is far better, as now we can at least offload our 'read' operations to a different server, but we still need to funnel all 'write' operations through the 'primary' server. A good discussion on this topic is available here.
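One simple way for a Java application to exploit the read-only secondary is a routing DataSource. A minimal sketch using Spring's AbstractRoutingDataSource (the target DataSources for PRIMARY and READ_REPLICA are assumed to be configured separately via setTargetDataSources):

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {

    public enum DbRole { PRIMARY, READ_REPLICA }

    // Per-thread flag; a service-layer aspect or interceptor would set this per operation.
    private static final ThreadLocal<DbRole> CURRENT = new ThreadLocal<DbRole>() {
        protected DbRole initialValue() { return DbRole.PRIMARY; }
    };

    public static void markReadOnly()  { CURRENT.set(DbRole.READ_REPLICA); }
    public static void markReadWrite() { CURRENT.set(DbRole.PRIMARY); }

    @Override
    protected Object determineCurrentLookupKey() {
        // Reads go to the AlwaysOn read-only secondary, writes to the primary.
        return CURRENT.get();
    }
}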

After HA, we moved on to define our DR strategy. We first compared log shipping vs mirroring. In log shipping, the secondary database is marked as 'read-only' and can be used for reporting, but the replication lag can be up to 30 minutes. Hence it cannot be used if you desire instantaneous failover.
In mirroring, the secondary failover database is updated almost instantaneously, but since that database is always in recovery mode, it cannot be used for any other purpose.

Again, in mirroring, we have 'synchronous' and 'asynchronous' replication. IMHO, synchronous replication is a disaster for any OLTP application and should never be used, since every commit has to wait for the mirror. Found a good case study on the MS site that details the strategy for "identity columns" when there is a potential for data loss using async mirroring. Another option that can be considered for DR is SQL replication.

Spring Security SecurityContextHolder

We have been happily using the static methods of "SecurityContextHolder" in our application. But as architects, it is important to understand what happens behind the scenes.

Where does Spring store the Principal object? My rough guess was that it would be either in the session or in a ThreadLocal container.
Found this great write-up that explains how the "Strategy" design pattern is used to store the UserDetails (Principal) object in the HTTP session, and how a filter then assigns it to a ThreadLocal container for the duration of the request.

http://stackoverflow.com/questions/6408007/spring-securitys-securitycontextholder-session-or-request-bound

The advantage of this approach is that one can obtain the current 'Principal' anywhere in the code using simple static methods, rather than being dependent on the request or session object.
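For reference, the kind of one-liner this enables anywhere in the code base (assuming Spring Security 3.x on the classpath):

import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.security.core.userdetails.UserDetails;

public class CurrentUser {

    // No HttpServletRequest or HttpSession needed - the filter has already
    // bound the security context to the current thread for this request.
    public static String currentUsername() {
        Authentication auth = SecurityContextHolder.getContext().getAuthentication();
        if (auth == null) {
            return null;   // no authenticated user on this thread
        }
        Object principal = auth.getPrincipal();
        return (principal instanceof UserDetails)
                ? ((UserDetails) principal).getUsername()
                : String.valueOf(principal);
    }
}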

Monday, September 10, 2012

Connection Strings Reference.

Found this cool site containing a consolidated listing of "connection-strings" that can be applied to most of the databases out there. Quite valuable for quick reference :)

http://www.connectionstrings.com/

Friday, September 07, 2012

Estimating the size of SQL Server Database

MSDN has a very good link that explains in simple language how we can estimate the size of a database. The size of the database depends on the size of the tables and the indexes. The link below gives easy formulas that can be used to calculate the database size.
http://msdn.microsoft.com/en-us/library/ms187445

A good samaritan has also created Excel templates that use these formulas for database sizing.

On Oracle, you can use PL/SQL procedures to calculate table and index sizes. For example, the CREATE_TABLE_COST procedure of the DBMS_SPACE package can be used to find the bytes required for a table - you just have to input the expected number of rows.
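A rough sketch of invoking it from Java over JDBC (connection details are placeholders, and this assumes the simpler overload of CREATE_TABLE_COST that takes an average row size and a row count):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class TableSizeEstimate {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
        CallableStatement cs = con.prepareCall(
                "{call DBMS_SPACE.CREATE_TABLE_COST(?, ?, ?, ?, ?, ?)}");

        cs.setString(1, "USERS");    // tablespace name
        cs.setLong(2, 120);          // average row size in bytes
        cs.setLong(3, 1000000);      // expected number of rows
        cs.setLong(4, 10);           // PCTFREE
        cs.registerOutParameter(5, Types.NUMERIC);   // used bytes
        cs.registerOutParameter(6, Types.NUMERIC);   // allocated bytes
        cs.execute();

        System.out.println("Used bytes: " + cs.getLong(5)
                + ", allocated bytes: " + cs.getLong(6));

        cs.close();
        con.close();
    }
}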

Thursday, September 06, 2012

Ruminating on RACI matrix

The RACI matrix (Responsible, Accountable, Consulted, Informed) is an excellent tool for mapping between processes/functions and roles.
It brings immediate clarity on who is Responsible for something, who is Accountable for the final result, who needs to be Consulted, and who is kept Informed.

There are several templates available on the net that can be used for the RACI matrix. One good template is available here. Once data is collated in such a matrix, we can analyse questions such as -

a) Are there too many A's? Usually it's best to have only one person/role accountable for a process/function.
b) Are there too many I's? Is too much information being shared that is not needed? etc.

We can use RACI matrices anywhere. You could "RACI" any deliverable in your project. Also, in EA governance, we can use RACI to clearly separate the responsibilities between the centralized EA team and project architects. Given below is an example of an EA Governance RACI matrix.