Friday, April 20, 2012

Google Chart APIs

When I evaluated the Google Charts API around a year ago, I was a bit disappointed. The Chart API only let you embed an image (chart) that was rendered on Google's servers.

But it looks like Google has completely revamped the concept and the branding, and now has a brand new Chart API.

The new charting API looks cool and is very easy to use. There are also samples and libraries that help you write server-side code to pass chart data to the client. I was particularly impressed with the Java DataSource library and the Oracle PL/SQL library.
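To give a feel for the Java DataSource library, here is a rough sketch of a data source servlet. The servlet class and column names are my own placeholders, so treat this as an illustration rather than gospel; the library handles query parsing and the wire format for you.

import javax.servlet.http.HttpServletRequest;
import com.google.visualization.datasource.DataSourceServlet;
import com.google.visualization.datasource.datatable.ColumnDescription;
import com.google.visualization.datasource.datatable.DataTable;
import com.google.visualization.datasource.datatable.value.ValueType;
import com.google.visualization.datasource.query.Query;

// Hypothetical example: serves a two-column table to any Google Charts client.
public class SalesDataServlet extends DataSourceServlet {
    @Override
    public DataTable generateDataTable(Query query, HttpServletRequest request) {
        DataTable data = new DataTable();
        data.addColumn(new ColumnDescription("region", ValueType.TEXT, "Region"));
        data.addColumn(new ColumnDescription("sales", ValueType.NUMBER, "Sales"));
        try {
            // In real life these rows would come from your database.
            data.addRowFromValues("East", 1200);
            data.addRowFromValues("West", 950);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return data;
    }
}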

It is important to note that, right now, Google does not allow these JS APIs to be downloaded and used offline in a web app that has no internet connection (e.g. intranet applications). Please consider this constraint before you decide to use Google Chart Tools.

The older version of the image charts is still available here. There is also an online tool for quickly creating an image of a chart. Could be useful if you quickly want to create some stuff for your PPTs :)

JavaScript coding guidelines

With RIA applications becoming the norm, developers have to deal with a lot of JavaScript code. It is important to have proper coding conventions for JS too. I was glad to see a good document posted by Google on JavaScript coding conventions.

http://google-styleguide.googlecode.com/svn/trunk/javascriptguide.xml

Also, there are tools available to check the quality of JavaScript code. Here are a few:

http://docs.codehaus.org/display/SONAR/JavaScript+Plugin

http://jslint.com/


 

JAMon lives on!

I have been a big fan of the JAMon tool for monitoring Java applications. It is lightweight and has a cool UI that gives you the stats you want. It is also very clean and simple to use.

The last time I used JAMon was around 5 years ago. I was surprised to see that the project is still alive and kicking and has a few updates that make it even more interesting. I liked the concept of Listeners, which makes it easy to customize JAMon.

The one thing that was missing in JAMon in the yesteryears was a consolidation app to read stats from multiple nodes in a cluster and present them on a unified dashboard. Luckily there seems to be a project called JARep that does just that. There is not much documentation available on JARep, but there is a good case study on DZone that explains how JARep works.

The case study is available at: http://architects.dzone.com/articles/case-study-performance-tuning--0

Though I have experimented with a lot of other tools, I found JAMon to be the best and simplest. One can get it up and running in a project within a few minutes.
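For anyone who hasn't tried it, a minimal sketch of JAMon usage looks something like this (the monitor label is an arbitrary example, and I am quoting the API from memory):

import com.jamonapi.Monitor;
import com.jamonapi.MonitorFactory;

public class JamonDemo {
    public static void main(String[] args) throws Exception {
        // Time a block of code; stats are aggregated under the given label.
        Monitor mon = MonitorFactory.start("myapp.dbQuery");
        try {
            Thread.sleep(100); // stand-in for the real work being measured
        } finally {
            mon.stop(); // hits, avg/min/max are now available for this label
        }
        System.out.println(MonitorFactory.getReport()); // dump stats as an HTML report
    }
}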

Thursday, April 19, 2012

Passing a wildcard * in a Runtime.exec() command

One of the projects I was consulting on approached me with a peculiar problem.
The application was executing Unix commands from a Java program using the Runtime.exec() APIs. But strangely, for some reason, the "rm" command was not working. The user account running the Java process had the necessary rights for deletion, so the problem was somewhere else.

A quick googling around answered the question :)
http://www.coderanch.com/t/423573/java/java/Passing-wilcard-Runtime-exec-command

It's important to remember that when we pass wildcards to the Runtime.exec() API, it treats them as literal strings. It is the Unix shell that understands the "*" syntax.
Hence something like this would work: Runtime.getRuntime().exec(new String[] { "sh", "-c", "rm /tmp/ABC*" });
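A complete (hypothetical) sketch, including waiting for the command to finish:

public class WildcardDelete {
    public static void main(String[] args) throws Exception {
        // The shell ("sh -c") expands the * glob; exec() alone would pass
        // the literal string "/tmp/ABC*" to rm, which finds no such file.
        Process p = Runtime.getRuntime().exec(
                new String[] { "sh", "-c", "rm /tmp/ABC*" });
        int exitCode = p.waitFor();
        System.out.println("rm exited with code " + exitCode);
    }
}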

Cool library for Java NIO

I have often felt the need to write a wrapper library around the Java NIO packages to reduce the complexity for developers.
I was pleased to find an open source project that does just that :)

The Netty project provides a very clean and easy API for writing network applications. Very useful if we need to implement a custom protocol for a special purpose.
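To give an idea of how clean the API is, here is a minimal echo server sketched against the Netty 3.x API (class names quoted from memory, so treat it as a sketch rather than copy-paste code):

import java.net.InetSocketAddress;
import java.util.concurrent.Executors;
import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.*;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public class EchoServer {
    public static void main(String[] args) {
        ServerBootstrap bootstrap = new ServerBootstrap(
                new NioServerSocketChannelFactory(
                        Executors.newCachedThreadPool(),   // boss threads (accept)
                        Executors.newCachedThreadPool())); // worker threads (IO)
        bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
            public ChannelPipeline getPipeline() {
                return Channels.pipeline(new SimpleChannelUpstreamHandler() {
                    @Override
                    public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
                        e.getChannel().write(e.getMessage()); // echo the bytes back
                    }
                });
            }
        });
        bootstrap.bind(new InetSocketAddress(8080)); // start listening on port 8080
    }
}

Compare this with the amount of selector and buffer plumbing you would write against raw java.nio.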

Friday, April 06, 2012

Closed Loop vs Open Loop Models in Card Processing

Found this good article on the internet that describes the differences between closed-loop and open-loop models.
Excerpts from the article:

"Open-loop payments networks, such as Visa and MasterCard, are multi-party and operate through a system that connects two financial institutions—one that issues the card to the cardholder, known as the issuing financial institution or issuer, and one that has the banking relationship with the merchant, known as the acquiring financial institution or acquirer—and manages information and the flow of value between them.
In a typical closed-loop payments network, the payment services are provided directly to merchants and cardholders by the owner of the network without involving third-party financial institution intermediaries. Closed-loop networks can range in size from networks such as American Express and Discover, which issue cards directly to consumers and serve merchants directly."


The site also has another interesting link on how companies such as American Express make money and the competitive advantage they gain because of the closed loop model.

Thursday, March 29, 2012

The Architecture of Open Source Applications

Found this cool book on the web that explains the history behind many successful open source projects.

The chapters also contain a good amount of technical information - which was a pleasure to read :)

http://www.aosabook.org/en/index.html

Monday, March 26, 2012

Techniques for website design on all devices

Today, there is a growing demand to create websites that can render across multiple devices such as desktop browsers, tablets and smartphones. New standards such as HTML5 help in this regard.

But it's important to apply proper design principles when we develop web pages that can render across a wide array of devices. Found this cool article that describes a few techniques that can be used. The author of the article, Ethan Marcotte, has also written a book on this called "Responsive Web Design".

The core philosophy is to use CSS3/HTML5 to let our application detect the capabilities (size, resolution, JS support, etc.) of the browser and adapt the layout of the page accordingly.

Friday, March 23, 2012

Lightweight UML sketching tool - UMLet

For years, I have been searching for a lightweight tool to quickly draw UML diagrams - in order to brainstorm an idea with my team. The traditional tools I have been using for UML were Rational Software Architect, IBM System Architect, Visual Studio 2010, Visio, ArgoUML, etc.

All the above tools are good, but are quite heavy to use and require installation. If you need a robust and quick UML sketching tool, then UMLet will blow your mind :)

I was particularly impressed with the simplicity of the tool. Drawing class diagrams, sequence/activity diagrams, package structures and deployment diagrams is a breeze... And it can run standalone as a JAR file or as an Eclipse plugin. Loved the 'geeky' properties editor. You need to use it to appreciate it :)
It does not support code generation, as it was not the design intention of the tool.
Highly recommended for all agile architects!

Monday, March 05, 2012

How to ensure that IOCP is used for async operations in .NET?

In my last post, I had blogged about IO Completion Ports and how they work at the OS kernel level to provide for non-blocking IO.

But how can the 'average Joe' developer ensure that IOCP is being used when he uses async operations in .NET?

Well, the good news is that a developer need not worry about the complexities of IOCP as long as he is using the BeginXXX and EndXXX methods of the objects that support async operations. For example, SqlCommand has BeginExecuteReader/EndExecuteReader that you can use to asynchronously read data from a database. The FileStream and Socket classes also have BeginXXX/EndXXX methods that use IOCP in the background. Under the bonnet, these methods use IO completion ports, which means that the thread handling the request can be returned to the threadpool while the IO operation completes.

Some versions of Windows may not support IOCP on all devices, but the developer need not worry about this. Depending on the target platform, the .NET Framework will decide whether or not to use the IO Completion Ports API, maximizing performance and minimizing resource usage.

An important caveat: avoid using general-purpose async mechanisms - such as ThreadPool.QueueUserWorkItem or Delegate.BeginInvoke - for non-blocking IO, because these do not use IOCP; they just pick up another thread from the managed thread pool. This defeats the very purpose of non-blocking IO, because the async work is then drawn from the same process-wide CLR thread pool.

Non blocking IO in .NET (Completion Ports)

Non blocking IO is implemented in Windows by a concept called 'IO Completion Ports' (IOCP).
Using IOCP, we can build highly scalable server side applications that can perform asynchronous IO to deliver maximum throughput for large workloads.

Traditionally, server side applications were written by assigning one thread to each socket connection. But this approach severely limited the number of concurrent connections a server could handle. By using IOCP, we can overcome the "one-thread-per-client" problem, because 'worker' threads are not blocked on IO. Rather, there is a separate pool of IO threads called 'Completion Port Threads' that wait on a special kernel-level object called a 'Completion Port'.

A completion port is a kernel-level object that you can bind to a file handle - a file stream, database connection or socket stream. Multiple file handles can be bound to a single completion port. The .NET CLR maintains its own completion port and can bind any file handle to it. Each completion port has a queue associated with it. Once an IO operation completes, a message (completion packet) is posted to the queue. IO threads block, or 'wait', on this completion port queue till a message is posted. The waiting IO threads (a.k.a. completion port threads) pick up the messages in the queue in FIFO order; hence any thread may handle any completion packet. It is important to note that the threads themselves are 'woken' in LIFO order, so chances are that their caches are still warm.

The following links throw more light on this:
http://blog.stevensanderson.com/2008/04/05/improve-scalability-in-aspnet-mvc-using-asynchronous-requests/
http://www.codeproject.com/Articles/1052/Developing-a-Truly-Scalable-Winsock-Server-using-I

Why does the .NET Thread Pool have a separate worker thread pool and a Completion Port pool?
I believe that technically there is no fundamental difference in the nature of the threads associated with each pool. Worker threads are meant to do active work, whereas Completion Port threads are meant to wait on completion ports. Since IO threads wait on CPs, they may block for longer periods of time; hence the .NET Framework has created separate categories for them. If there were a single pool, a spike in demand for worker threads could exhaust all the threads available to dispatch native I/O callbacks, potentially leading to deadlock.

Looks like the threading model has undergone drastic changes in IIS 7. More info is available here.

Thursday, March 01, 2012

Why no delegates in Java? And will Closures come to Java?

Having worked across Java and .NET platforms, I often compare the features of one over the other. One of the interesting features of the .NET platform is the concept of 'delegates'.

At first, a Java guy may take some time to understand the concept of delegates, but once you are hooked on to it, you tend to use it everywhere... because it is so convenient. The .NET Framework uses delegates extensively throughout its event framework. Java folks have traditionally used the 'Listener' interface pattern for eventing. Even the concurrent/parallel libraries in .NET heavily use delegates, whereas Java folks still have to stick with interfaces :( The closest equivalent to delegates in Java is the anonymous inner class - which IMHO is messy to read and write.
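For example, just wiring up a button callback in Java takes a full anonymous class, where C# needs only a one-line delegate/lambda:

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.JButton;

public class ListenerDemo {
    public static void main(String[] args) {
        JButton button = new JButton("Click me");
        // The Listener idiom: an anonymous inner class implementing a
        // single-method callback interface - verbose, but it works.
        button.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                System.out.println("Button clicked!");
            }
        });
    }
}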

StackOverFlow has a series of interesting discussion threads on this topic:
http://stackoverflow.com/questions/44912/java-delegates
http://stackoverflow.com/questions/2635013/why-not-net-style-delegates-rather-than-closures-in-java
http://stackoverflow.com/questions/1340231/is-there-an-equivilent-of-c-sharp-anonymous-delegates-in-java
http://stackoverflow.com/questions/1973579/why-doesnt-java-have-method-delegates


Another interesting feature that many dynamic languages have is 'closures'. A closure is similar to the concept of a delegate, but they are not quite the same. Martin Fowler has a good bliki post explaining the concept of closures and how they differ from delegates.

.NET supports both closures and delegates. Found this good article explaining closures in .NET.

Wednesday, February 29, 2012

Connection timeouts in a mirrored SQLServer

Recently, one of my teams was facing a connection timeout issue when we tried to implement 'parallelism' in a data-driven application.
A colleague of mine pointed out that there was a bug in ADO.NET (with a mirrored SQL Server) that could result in this weird behavior. More details are available at this link.

A quick resolution is to increase the connection timeout and allocate a greater number of connections in the pool at start-up.
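For reference, both of these knobs live in the ADO.NET connection string; the server names and values below are purely illustrative:

Data Source=PrimaryServer;Failover Partner=MirrorServer;Initial Catalog=MyDb;
Integrated Security=true;Connect Timeout=60;Min Pool Size=20;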

Monday, February 20, 2012

Business Intelligence vs Analytics

My colleague Sandeep Raut has a very simple blog post explaining the differences between traditional BI and Analytics. Summarizing a few key points from the blog below.

"BI traditionally is concerned with creating reports on past data or even current live data. We create OLAP cubes using which we can slice & dice the data, even do a drill down. Analytics is about analyzing the data using mathematics/statistics to identify patterns. These patterns can then be used to predict what may happen in the future. Analytics is about identifying relationships between key data variables that were unknown before. It is about surfacing unknown patterns."

But in my humble opinion, shouldn't Analytics be a subset of BI? I can understand the hype that product vendors create to differentiate their products in the market, but can Analytics exist in isolation from BI? Even predictive data analysis using "real-time" data/text mining techniques would logically fall under BI...
After all, BI is all about meeting business needs through actionable information!
Maybe it is just a game of words and semantics. I remember a few years back, the term DSS (Decision Support Systems) was more widely used than BI :)

Wednesday, February 15, 2012

Using Parallelism in .NET WinForm applications

We have all gone through the travails of multi-threaded programming in WinForm applications. The challenge in WinForm applications is that the UI controls are bound to the thread that created/rendered them; i.e. a UI control can only be updated by the main GUI thread that created it.

But to keep the UI responsive, we cannot execute any long running task (>0.5 sec) on the UI thread, else the GUI would hang or freeze. If we run the business logic asynchronously on another thread, then how do we pass the results back to the main GUI thread to update the UI?

Traditionally this has been done using the Control.Invoke() methods. More details on this approach are available at this link: http://msdn.microsoft.com/en-gb/magazine/cc300429.aspx

But with the introduction of the TPL, there is an alternative way of doing this. We can use the TaskScheduler and SynchronizationContext classes to run the heavy lifting on a background thread and then pass the results back to the main GUI thread.

For e.g.
// Capture the UI thread's scheduler (this line must run on the UI thread):
TaskScheduler uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();
// Do the heavy lifting on a pool thread, then marshal the result back to the
// UI thread. DoHeavyWork() and resultLabel are placeholders for your own code:
Task.Factory.StartNew(() => DoHeavyWork())
    .ContinueWith(t => resultLabel.Text = t.Result, uiScheduler);

Given below are 2 excellent articles elaborating on this in detail:
http://www.codeproject.com/Articles/152765/Task-Parallel-Library-1-of-n

http://reedcopsey.com/2010/03/18/parallelism-in-net-part-15-making-tasks-run-the-taskscheduler/

Sacha Barber has an excellent 6-part article series on the intricacies of the TPL, which I loved reading.

Parallelism in .NET

In one of my previous blogs, I had pointed to an interesting article that shows how the TPL controls the number of threads in the Thread Pool using hill-climbing heuristics.

In order to understand why the TPL (Task Parallel Library) is far superior to simple multi-threading, we need to understand the concepts of the global queue, the local queue on each thread, work-stealing algorithms, etc.
Given below are some interesting links that explain these concepts with good illustrations.

http://www.danielmoth.com/Blog/New-And-Improved-CLR-4-Thread-Pool-Engine.aspx

http://blogs.msdn.com/b/jennifer/archive/2009/06/26/work-stealing-in-net-4-0.aspx

http://udooz.net/blog/2009/08/net-4-0-work-stealing-queue-plinq/

A few important points to remember:
  • There is one global queue for the default Thread Pool in .NET 4.0
  • There is also a local queue for each thread. The Task Scheduler distributes tasks from the global queue to the local queues of the threads. Sub-tasks created by a thread are queued on that thread's local queue too. This improves performance, as there is no contention to pick up work items (tasks) from the global queue - especially in a multi-core scenario.
  • If a thread is free and there are no tasks in its local queue or the global queue, then it will steal work from other threads. This ensures that all cores are optimally utilized. This concept is called 'work stealing'.
  • Tasks from the global queue are picked up in 'FIFO' order. Tasks from the local queue are picked up in 'LIFO' order based on the assumption that the last-in is still hot in the cache. Work stealing again happens in 'FIFO' order.
There is a wonderful book on parallel computing available on MSDN that is a must-read for everyone.

Monday, February 13, 2012

Data Services in the Microsoft world

In my previous blog, I ranted about the concept of Data Services in creating a data virtualization layer. In the .NET world, data services equate to WCF Data Services (formerly known as ADO.NET Data Services).

Microsoft is propagating the use of an open standard called OData for building REST-style data services. A good article describing OData is available on MSDN. OData essentially leverages JSON/ATOM and HTTP semantics to build a simple data services layer across disparate data sources.
But it looks like, besides M$, no big vendors are jumping on the OData bandwagon. It's interesting to note that WebSphere eXtreme Scale servers also expose an OData service.
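The neat thing about OData is that the query semantics ride on plain HTTP GET URLs. A hypothetical example (the service URL is made up; $filter, $orderby and $top are standard OData system query options):

GET http://example.com/MyService.svc/Products?$filter=Price gt 20&$orderby=Name&$top=10

This would return the top 10 matching products as an ATOM feed or as JSON, depending on the Accept header.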

Ruminating on Data Virtualization

The industry is flooded with confusing terms when it comes to understanding 'Data Virtualization'. We have IaaS (Information as a Service), Data Services, EII (Enterprise Information Integration), Data Federation, and so on! The point is that there are no industry-standard definitions for these analyst-coined terms and there is a lot of overlap between them.

Rick van der Lans tries to clear the air with some simple definitions here. Another interesting post by Barry Devlin throws more light on the concept of data virtualization.

The core concept behind data virtualization is to create an abstraction layer (Data Access Layer) that hides the complexities of the underlying disparate data sources and provides a unified view of the enterprise data to the applications. This can be implemented using "SOA style" Data Services or creating a virtual data layer that can be queried using SQL-like semantics. More info can be found at these links: Link1 & Link2

RedHat has a nice whitepaper explaining the concept of Data Services in a SOA environment. This post explains the benefits of data virtualization. Composite Software is a leader in data virtualization techniques and has shared a couple of interesting case studies that demonstrate the use of their data virtualization platform.

One thought that came to my mind was regarding the challenges in accessing NoSQL data from the data virtualization layer. While some types of NoSQL data stores, such as XML documents and key/value pairs, can be exposed as a relational SQL view, it may not be possible to have a uniform query interface for unstructured data. Most NoSQL data stores expose some kind of Java API that can be used for querying. Would it be possible to create a common set of meta-data for both structured and unstructured data?
In such scenarios, IMHO, the only strategy for data virtualization is to use Data Services.

Thursday, February 09, 2012

Google Protocol Buffers

Just found a good post by the Google Engineering team ranting about the historical context of Google Protocol Buffers.
My first reaction to GPB was - "Why on earth another binary serialization format"?
I think the reason behind the popularity of GPB has been its simplicity and ease of use. 

This site has an interesting discussion comparing GPB to XML/JSON. A few snippets from the site comments/discussions -

  • A major difference between protocol buffers and JSON is that protocol buffers use a binary format, while JSON is plain text. Because it's binary, the format is more compact and easier for a computer to interpret - which makes protocol buffers faster than JSON.
  • Another reason GPB is so fast is that it uses positional binding. While JSON is less bloated than XML (which is over-bloated), it still sends the name of each attribute with every record. That creates an enormous amount of overhead. PB, on the other hand, uses positional binding (numeric field tags) and doesn't send the attribute names at all.
  • Binary protocols have to deal with portability issues like byte order (little/big-endian), but there are advantages when it comes to parsing dates, timestamps, etc.
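The 'positional binding' is visible right in the .proto schema: each field carries a numeric tag, and only the tag (never the field name) goes on the wire. A trivial example in proto2 syntax:

message Person {
  required string name  = 1;  // '1' is the wire tag; the string "name" is never sent
  required int32  id    = 2;
  optional string email = 3;
}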

Alternatives to XML Serialization

Today, there are a lot of alternatives to XML serialization of data structures. These data interchange formats are smaller and faster to process than XML.
The most popular are Google Protocol Buffers, Thrift (from Facebook), Avro and MessagePack. A good article comparing these alternatives is available here -
http://www.igvita.com/2011/08/01/protocol-buffers-avro-thrift-messagepack/

Wikipedia also has an interesting article comparing various data serialization formats.

Tuesday, January 03, 2012

Techniques for handling very large strings in Java

In my previous blog, I had jotted down the perils of storing large strings in memory. So what are the alternatives? Listing a few off the top of my head:
  1. Stream the string to a file and read chunk-wise from the file when required.
  2. Store an array of strings instead of one large string. A large contiguous block of memory may not be available, but there could be small holes in the fragmented heap.
  3. Compress the string using GZIP compression. Use the GZIPOutputStream class to keep appending strings to a byte buffer (see the sketch after this list).
  4. If the large XML string is to be sent back as a webservice response, utilize the streaming support in SOAP stacks such as Axis 2 and CXF. Evaluate the use of MTOM for large attachments.
  5. If you are operating on a large number of files, first deal with the 'large' files. To understand why, please peruse these links - Link 1 & Link 2
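Here is a minimal sketch of option 3, assuming UTF-8 content (the wrapper class is my own illustration, built on the standard GZIPOutputStream):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Accumulates string chunks as GZIP-compressed bytes instead of a huge char[].
// XML is highly repetitive, so it typically compresses very well.
public class CompressedStringBuffer {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final GZIPOutputStream gzip;

    public CompressedStringBuffer() throws IOException {
        gzip = new GZIPOutputStream(bytes);
    }

    public void append(String chunk) throws IOException {
        gzip.write(chunk.getBytes("UTF-8"));
    }

    public byte[] toCompressedBytes() throws IOException {
        gzip.finish(); // flush any pending compressed data
        return bytes.toByteArray();
    }
}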
In one of the scenarios, the large XML string had to be fed to the JasperReports engine. Found a few interesting options to deal with this challenge here.

Heap Memory in .NET

Apropos my previous post, my team was trying to resolve another memory leak problem in one of our .NET applications. It is interesting to note that a .NET program does not have any explicit way to specify the heap size. The .NET heap will keep growing till it consumes all of the available memory.
A host such as IIS can control the amount of memory allocated to an Application Domain.
The following discussion threads throw more light on this: Link1  Link2

Also found this amazing article by Andrew Hunter (ANTS profiler contributor) explaining the Large Object Heap concept in .NET. Understanding these concepts makes us appreciate how we can get an unexpected OutOfMemory error even when the total size of our objects is relatively small.

Friday, December 30, 2011

OutOfMemoryError while using StringBuilder/StringBuffer

I was helping a friend debug an OutOfMemoryError in a Java web application. The program made heavy use of StringBuilder and was appending a large number of strings. An entire record set (containing thousands of records) was essentially converted into an XML string.
Strangely, when the OOM error occurred, there was still plenty of heap memory available. Further deep-dive debugging and some googling around taught a few important lessons.
  1. Whenever the internal buffer capacity of a StringBuilder/StringBuffer is exceeded, it allocates a new character array twice the original size. A good blog explaining this is here. Hence it is better to initialize the StringBuilder with a reasonable capacity beforehand (see the sketch after this list).
  2. A StringBuilder needs a contiguous block of memory for its buffer. For example, you may have 20 MB of free heap space, but it may be fragmented; hence even a 5 MB StringBuilder allocation may fail and result in an OOM error. Links to forums - Link 1  Link2 Link3
  3. Try to use a 64-bit machine, as there are no practical limits on heap memory allocation, and it is much easier to find a contiguous block of memory with 64-bit addressing.
  4. Alter the design of the program to store the string in a file rather than in memory. Alternatively, stream it directly to the HTTP response.
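A rough illustration of point 1 - the capacity estimate below is a made-up number that you would tune for your workload:

import java.util.List;

public class XmlBuilder {
    // Pre-sizing avoids the repeated allocate-and-copy doubling described
    // in point 1: one up-front allocation instead of many growing ones.
    public static String toXml(List<String> xmlRows) {
        StringBuilder sb = new StringBuilder(16 * 1024 * 1024); // illustrative estimate
        for (String row : xmlRows) {
            sb.append(row);
        }
        return sb.toString();
    }
}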

Friday, December 02, 2011

Taxonomy of Services

In one of my previous posts, I had blogged about creating a taxonomy of services using functional categorization.
For e.g. Entity Services, Task/Activity Services, Process Services and Infrastructure services.

But services can also be categorized from other perspectives, such as layer or scope. For example:
Categorization based on Service Layer:

1. Business Services: Represent high level business functions that define an enterprise.
2. Application Services: Application specific and usually will be aggregated in a composite service at the business level.
3. Infrastructure Services: Utility functions that deliver cross-cutting functionality.

Categorization based on scope:
1. Enterprise Services: Multiple LOBs use the service.
2. Domain Services: Applicable only within a LOB.
3. Application Services: Local to the App Level.

Thursday, November 17, 2011

Analysis vs Design

This age-old debate keeps cropping up every now and then :)
Found a couple of good articles reflecting on the difference between the two.

http://butunclebob.com/ArticleS.UncleBob.AnalysisVsDesign

http://devhawk.net/2004/03/30/analysis-vs-design-modeling/

Thursday, November 03, 2011

Oracle Web Service Manager vs Oracle Enterprise Gateway

Oracle Web Service Manager (OWSM) is an integral part of the Oracle SOA Suite, and it allows us to implement security declaratively, without any coding from the developer. Security policies can be enforced at run-time using WSM agents or WSM gateways.
There is another product called "Oracle Enterprise Gateway" whose features overlap with OWSM - hence a lot of confusion.

So, let's understand the concepts one by one. A WSM agent is a component that is installed with the endpoint service, so it provides 'last-mile' security (the last security layer). OWSM also has a gateway component through which security policies can be enforced in a central location. A gateway can also perform functions that an agent cannot, such as message routing, transformations and failover. OWSM also has an extension for OSB that allows us to use OWSM policies at the OSB (ESB) layer. These capabilities would satisfy the requirements of most intranet SOA infrastructures.

Oracle positions the Oracle Enterprise Gateway as the first line of defence ("perimeter security") when SOA services are exposed to the outside world. This is the equivalent of a DMZ firewall. So it looks like Oracle Enterprise Gateway is a more expansive product that can do everything OWSM does, plus all the bells and whistles.

Tuesday, November 01, 2011

Ruminating on the Oracle MDM suite

Recently, during one of our internal brainstorming sessions, there was a lot of confusion over the various components available on the Oracle platform to build a robust MDM solution. Part of the confusion arose because Oracle has picked up best-of-breed components from various acquisitions and integrated them to form the MDM suite. It's important to understand that when someone talks about Oracle MDM, it is a suite of components and NOT just one product.

Oracle's acquisition of Hyperion has further added to the confusion, as Hyperion has full capabilities to be used as an MDM solution. Oracle promotes Hyperion Data Relationship Manager as a component in its MDM suite that can be used for managing the relationships between different attributes of master data from disparate sources.

At the fundamental level, to build an end-to-end MDM solution, you need basic components such as an ETL tool, a Data Profiler and a Data Cleansing engine that can be used for standardization, de-duplication, validation, etc. Given below are the core components of the Oracle MDM suite, followed by optional components that help jump-start your MDM journey.

  • Oracle Data Integrator Enterprise Edition: ODI can be used for ELT-style bulk data movement, near real-time updates, and data services. ODI can consolidate master data from various sources and also publish master data to downstream applications. (Note: Oracle has acquired the GoldenGate product, which enables real-time integration of data across disparate data sources. GoldenGate can also be used in conjunction with ODI for data movement.)
  • Oracle Data Quality / Data Profiling: Oracle Data Profiling allows us to profile master data and investigate the content and structure of the different data sources. It also gives users the ability to monitor the evolution of data quality over time using time series. Oracle Data Quality allows us to standardize, validate, cleanse and enrich master data - e.g. the master list of securities, issuers, the official list, etc. Using both these tools will ensure the integrity of the data stored in the MDM data store.
  • Oracle Business Intelligence Suite: OBIEE can be used for analytics and reporting on master data entities.
Besides these core components, the Oracle MDM suite also contains pre-packaged MDM solutions such as "Customer Hub", "Product Hub", "Site Hub", "Supplier Hub", etc. (Some of these are part of Siebel MDM, I believe.)

Thursday, October 27, 2011

Ruminating on IRM (Information Rights Management)

From a security architecture perspective, it is important to consider the need for using IRM technology. Traditionally we have secured access to documents using RBAC patterns for secure access and download.

But how do you control the information once it is downloaded to the user's machine? Can the user copy/paste from the document? Can the user print the document? Can the user forward the document to someone or upload it somewhere? Can he run macros on the document? So how can an enterprise have total control over sensitive information?

These questions cannot be answered by classical access control mechanisms; they need a new security framework concept called "Information Rights Management". Many traditional ECM vendors also offer IRM adapters or add-ons to help customers have total centralized control over their digital assets. For example, SharePoint 2010 has IRM protectors that can be plugged in for end-to-end protection of documents on users' computers. Oracle UCM can be extended with Oracle IRM, etc.

Across all these IRM product architectures, it is necessary to have some form of client application installed on all users' machines. Files downloaded from the DMS are special encrypted, rights-managed files. The file format contains meta-data that defines the access that can be granted to the user. The client application decrypts the file, understands the access constraints and accordingly grants rights to the user. On the Windows platform, MS long ago released Windows Rights Management Services - a comprehensive API to address IRM challenges on the Windows platform.

Thursday, August 11, 2011

How does the .NET TPL control the number of threads?

I often wondered what heuristics the Task Parallel Library (TPL) in .NET uses to control the number of threads for optimal utilization on multi-core machines.
Found a great discussion thread on StackOverFlow explaining the details.

Thursday, July 21, 2011

Techniques for Service Identification in SOA

It's very important to use proper service identification techniques to identify services in a portfolio. In fact, service identification should be the first step in your service lifecycle management process.

Jotting down some of the techniques that we have been using for identifying services:
  1. Domain decomposition approach - Look at the high level business entities and create entity services for them.
  2. Top down BPM driven approach - Start from the business processes and divide them into sub-processes. Each business process consists of tasks & activities that orchestrate across different service components.
  3. Business Goal driven approach - Derive services from business goals. Decompose the business goals into a set of services that help satisfy each goal. Provide traceability between business goals and IT services via a Goal/Business Service matrix.
  4. Existing systems -  Service wrappers are created on existing systems - to surface them for orchestration in a business process or a composite service. This technique may not be appropriate if the existing IT landscape is not aligned with business goals.
  5. UI driven approach - Identify/discover services based on the user interface requirements. UI technologies such as Flash, Silverlight, Ext-JS directly call JSON/REST services on back-end application servers.