Tuesday, January 03, 2012

Techniques for handling very large strings in Java

In my previous post, I jotted down the perils of storing large strings in memory. So what are the alternatives? Here are a few off the top of my head.
  1. Stream the string to a file and read chunk-wise from the file when required.
  2. Store an array of smaller strings instead of one large string. A single large contiguous block of memory may not be available, but smaller chunks can still fit into the holes in a fragmented heap.
  3. Compress the string using GZIP. Use the GZIPOutputStream class (wrapped in an OutputStreamWriter) to keep appending strings to a byte buffer.
  4. If the large XML string is to be sent back as a web service response, use the streaming support in SOAP stacks such as Axis2 and CXF. Evaluate the use of MTOM for large attachments.
  5. If you are operating on a large number of files, first deal with the 'large' files. To understand why, please peruse these links - Link 1 & Link 2
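To make option 3 concrete, here is a minimal sketch of appending text to a GZIP-compressed in-memory buffer. The class name CompressedText and its methods are my own illustrative names, not part of any library, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: accumulates text as GZIP-compressed bytes
// instead of holding one large String on the heap.
final class CompressedText {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Writer writer;

    CompressedText() throws IOException {
        // Characters -> UTF-8 bytes -> GZIP -> in-memory byte buffer.
        writer = new OutputStreamWriter(new GZIPOutputStream(buffer), StandardCharsets.UTF_8);
    }

    // Appends one piece of text; only the compressed form is retained.
    void append(String piece) throws IOException {
        writer.write(piece);
    }

    // Closes the stream (which writes the GZIP trailer) and returns the compressed bytes.
    byte[] finish() throws IOException {
        writer.close();
        return buffer.toByteArray();
    }

    // Opens a Reader over compressed bytes so the text can be consumed
    // incrementally rather than re-inflated into one String.
    static Reader open(byte[] compressed) throws IOException {
        return new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8);
    }
}
```

How much this saves depends entirely on the data. Repetitive XML tends to compress well, but the compressed byte array still needs one contiguous block, so this reduces the size of the problem rather than removing it.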
In one scenario, the large XML string had to be fed to the JasperReports engine. I found a few interesting options for dealing with this challenge here.
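Going back to the first option in the list (spooling the string to a file and reading it chunk-wise), a rough sketch could look like the following. SpooledText, ChunkHandler and the temp-file prefix are hypothetical names chosen for illustration, and UTF-8 is again an assumption.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes text to a temp file piece by piece and
// reads it back in fixed-size chunks, so the full string never sits in memory.
final class SpooledText {

    // Callback invoked for each chunk; only the first 'length' chars are valid.
    interface ChunkHandler {
        void handle(char[] chunk, int length) throws IOException;
    }

    // Streams the pieces to a temporary file and returns its path.
    static Path spool(Iterable<String> pieces) throws IOException {
        Path file = Files.createTempFile("large-text", ".tmp");
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String piece : pieces) {
                out.write(piece);
            }
        }
        return file;
    }

    // Reads the file back, handing at most chunkSize characters to the handler at a time.
    static void readInChunks(Path file, int chunkSize, ChunkHandler handler) throws IOException {
        char[] chunk = new char[chunkSize];
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int n;
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                handler.handle(chunk, n);
            }
        }
    }
}
```

The caller is responsible for deleting the temp file once it is done. The chunk array is reused between calls, so a handler that needs to keep the data must copy it.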

Heap Memory in .NET

Apropos my previous post, my team was trying to resolve another memory leak problem in one of our .NET applications. It is interesting to note that a .NET program does not have any explicit way to specify the heap size. The .NET heap will keep growing until it consumes all of the memory available to the process.
A host such as IIS can control the amount of memory available to an application domain.
The following discussion threads shed more light on this: Link1 Link2

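I also found this excellent article by Andrew Hunter (ANTS profiler contributor) explaining the Large Object Heap concept in .NET. Understanding these concepts helps us see how we can get an unexpected OutOfMemory error even when our total object size is relatively small: the Large Object Heap is not compacted by default, so it can become fragmented to the point where no free block is big enough for the next large allocation.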