Wednesday, April 13, 2011

Operational Reports vs MIS Reports

Once organizations create a data warehouse, a lot of people push all the reporting needs to the DW. But do all reports need to run from a DW?
The answer lies in understanding the difference between operational and informational (MIS) reports. A good article by Bill Inmon on this difference can be found here.

Operational reports are typically detail-oriented and show the latest up-to-date records. They are used by stakeholders for short-term tactical decision making. MIS reports look at summary data over a longer time horizon and are used for strategic decision making.

Examples of operational reporting include bank teller end-of-day window balancing reports, daily account audits and adjustments, daily production records, flight-by-flight traveler logs and transaction logs.

Examples of informational reporting include monthly sales trends, annual revenue, regional sales by product line for the quarter, industry production figures for the year, number of employees by quarter and weekly shipping costs by carrier.

Wednesday, March 23, 2011

Activity Diagrams vs BPMN Diagrams

For modeling business processes, there are two standards popular today – UML 2.0 Activity Diagrams and BPMN.

There are semantic differences in notation between the two standards, for example in the way OR-splits and AND-splits/joins are shown.

A detailed whitepaper showcasing the differences between the two notations can be found here.
A discussion thread at http://www.bpm-research.com/forum/index.php?showtopic=501 makes an interesting read.

Saturday, January 29, 2011

Business Function Models Vs Business Capability Models

The difference between these models boils down to the difference between a “business function” and a “business capability”. Many organizations use the terms interchangeably. For example, a business capability model may illustrate current-state business functions as well as future-state business functions that need to be built to deliver on the business vision.
But a few folks would like to draw a clear line of differentiation between the two. A capability can be defined as the ability to perform actions to achieve specific strategic goals/objectives.

The following links provide interesting reading:
Link 1
Link 2

Hence a business capability is much more than a business function – it encompasses other objects such as Actors, Services, Functions, Processes and Infrastructure. Examples of business capabilities include the ability to service customers through online channels, the capability to survive a liquidity crisis, etc.

Business Architecture Models

While defining the enterprise architecture of an organization, it is essential to understand the various processes, functions and capabilities of the business. I recently came across the FEA Business Reference Model on Wikipedia and was impressed by the simplicity of the diagram.

Every organization has different business areas or LOBs. Each business area has a set of business functions that it performs. A ‘Business Function Model’ describes these functional areas and sub-functions in a graphical representation. An example of a BFM can be found here.
Each LOB executes its business functions by following certain business processes.  A business process is an orchestration of different business activities and tasks that may be exposed as SOA services.

Tuesday, January 18, 2011

Various dimensions of Security

When we design our applications to be secure, we have to consider all aspects of security. I have often seen people associate security with just authentication and authorization, but there are other security principles to be considered, as listed below.
  1. Integrity: We have to ensure that all messages/data have not been tampered with. Integrity of messages ensures that the data has not been maliciously modified by 'man-in-the-middle'.
  2. Confidentiality: This security principle ensures that all messages are encrypted and cannot be eavesdropped. 
  3. Authentication/Authorization: Ensure that all resource access goes through a proper authentication process, and that authenticated users can access only the resources they are entitled to.
  4. Non-Repudiation: This ensures that any party involved cannot refute the validity of a message exchange.
Modern toolkits and technologies such as digital certificates and signatures help address all of the above security principles.
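As a rough illustration of the integrity and non-repudiation principles, here is a minimal sketch of digital signing using only the standard java.security API (the message and key size are illustrative):

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public class SignatureDemo {
    public static void main(String[] args) throws Exception {
        // Generate an RSA key pair; in practice the private key stays with the sender
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair kp = kpg.generateKeyPair();

        byte[] message = "transfer 100 USD to account 42".getBytes("UTF-8");

        // Sender signs with the private key (non-repudiation: only the key holder could sign)
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(kp.getPrivate());
        signer.update(message);
        byte[] signature = signer.sign();

        // Receiver verifies with the public key; any tampering breaks the check (integrity)
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(kp.getPublic());
        verifier.update(message);
        System.out.println("valid: " + verifier.verify(signature)); // prints "valid: true"

        byte[] tampered = "transfer 900 USD to account 42".getBytes("UTF-8");
        verifier.initVerify(kp.getPublic());
        verifier.update(tampered);
        System.out.println("tampered valid: " + verifier.verify(signature)); // prints "tampered valid: false"
    }
}
```

Confidentiality would additionally require encrypting the message, which is a separate step from signing.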

Monday, January 17, 2011

Concurrent Business Engineering

Some time back, I had blogged about the advantages of having a unified BPM/SOA strategy at the enterprise level. I ran through a Forrester report that touches on a similar concept and calls it "Concurrent Business Engineering".

Concurrent Business Engineering entails greater collaboration between business and IT in jointly defining new business processes as well as the technology platform for supporting those processes.
Business services are best designed with a strong understanding of the business process context; hence a top-down, BPM process-centric view helps in understanding what services need to be surfaced to provide maximum agility to the business process.

This is similar to the idea of having a unified BPM/SOA strategy and utilizes the best of top-down and bottom-up methods for executing the business strategy - as blogged earlier

Friday, January 14, 2011

Types of Services in SOA

Found a nice article on MSDN describing the various types of services - a taxonomy for services.
Jotting down the concepts explained in the article.


Entity Services - Expose/surface business entities in the system, e.g. employee, customer, sales order, etc. They contain CRUD operations and additional domain-specific operations, e.g. FindOrderByPrice. Entity Services abstract the underlying data sources and persistence mechanisms.

Capability Services - Implement a specific business capability, e.g. a Pricing Service, Credit Card Processing Service, etc. They may use Entity Services for persistence.

Thus Entity Services are "data-centric" and Capability Services are "action-centric".

Process Services - Act as a facade for a BPM process. Process services maintain state, since by its very nature a workflow has to maintain state over a long-running process. Process Services are typically implemented using BPM tools such as Windows Workflow Foundation, BizTalk, WPS, etc.

Infrastructure Services (Utility Services) - Common cross cutting functions such as Logging, Auditing, Security, Authorization, etc.
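A hypothetical Java sketch of this taxonomy (the interface names, the in-memory store and the toy pricing rule are my own illustrations, not from the MSDN article):

```java
import java.util.HashMap;
import java.util.Map;

// Entity Service: data-centric, abstracts the underlying persistence mechanism
interface CustomerEntityService {
    void create(String id, String name);   // CRUD operation
    String findNameById(String id);        // domain-specific finder
}

// Capability Service: action-centric, may use entity services underneath
interface PricingCapabilityService {
    double priceQuote(String customerId, double listPrice);
}

class InMemoryCustomerService implements CustomerEntityService {
    private final Map<String, String> store = new HashMap<>();
    public void create(String id, String name) { store.put(id, name); }
    public String findNameById(String id) { return store.get(id); }
}

class SimplePricingService implements PricingCapabilityService {
    private final CustomerEntityService customers;
    SimplePricingService(CustomerEntityService customers) { this.customers = customers; }
    public double priceQuote(String customerId, double listPrice) {
        // Toy business rule: known customers get a 10% discount
        return customers.findNameById(customerId) != null ? listPrice * 0.9 : listPrice;
    }
}

public class TaxonomyDemo {
    public static void main(String[] args) {
        CustomerEntityService customers = new InMemoryCustomerService();
        customers.create("C1", "Acme Corp");
        PricingCapabilityService pricing = new SimplePricingService(customers);
        System.out.println(pricing.priceQuote("C1", 100.0)); // known customer: 90.0
        System.out.println(pricing.priceQuote("C2", 100.0)); // unknown customer: 100.0
    }
}
```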

Thursday, January 06, 2011

Ruminating on SOA Governance

 SOA Governance has two dimensions. First – the processes and methodologies used. Second – the tools and products used for governance.

Quite often, people assume that the purchase of a SOA Governance tool would suffice for implementing SOA Governance. But the fact is that the tools would only help in automating certain enforcement policies and service lifecycle workflows. What is first needed is a framework of processes, policies and organization structure to be defined. Any governance process needs to embrace the trilogy of “people, process and technology”.

The following diagram illustrates this point and states the various activities that need to be done for implementing SOA Governance.



The Open Group has also published a SOA Governance Framework that can be accessed here.

Wednesday, December 29, 2010

SOA Registry, Repository and Service Catalog

While implementing enterprise SOA, it is important to consider deploying a service catalog for services. There is a lot of confusion between the concepts of registry, repository and service catalog.

Traditionally a registry has been a lookup service provided to service consumers. Service providers register their services in the registry and service consumers select an appropriate service for their needs. Standards such as UDDI addressed these needs. The registry would contain service descriptions, service contracts and service policies that describe a service. Service registries have also been used in practice for determining a service end-point address at runtime based on the service's unique name.

So what is a repository? As the importance of SOA Governance grew, it became necessary to capture more meta-data about a service. A service repository integrates information about a service from multiple sources and stores it in a centralized database. Service information may include design artifacts, deployment topologies, service code repositories, service monitoring stats, etc. Vendors have started positioning their generic asset management products as SOA repositories, e.g. Rational Asset Manager.

A lot of vendors now sell a combined product that consists of both the registry and the repository, e.g. IBM WebSphere Service Registry and Repository.

A service catalog is a concept that can be implemented using SOA registry/repository products.

Monday, December 27, 2010

Entities Vs Value Objects

In Domain Driven Design, we often separate Entities and Value Objects. Junior architects often get confused between these two concepts.
The essential difference is that domain entities have an identity and a lifecycle. Each Entity has a unique identity, and within a given domain no two entities can have the same identity. Value objects need not have an identity. So if we have an "equals()" method that compares the attribute values of each value object, then we can have value objects that are identical. Value objects should ideally also be immutable.
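A minimal Java sketch of the distinction, assuming illustrative Customer (entity) and Money (value object) classes:

```java
import java.util.Objects;

// Entity: identity-based equality; two customers with the same identity are the same
// entity even if their mutable state differs over the lifecycle
final class Customer {
    private final String id;   // unique identity within the domain
    private String name;       // mutable state
    Customer(String id, String name) { this.id = id; this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof Customer && ((Customer) o).id.equals(id);
    }
    @Override public int hashCode() { return id.hashCode(); }
}

// Value object: immutable, equality by attribute values, no identity
final class Money {
    private final String currency;
    private final long amountInCents;
    Money(String currency, long amountInCents) {
        this.currency = currency; this.amountInCents = amountInCents;
    }
    @Override public boolean equals(Object o) {
        if (!(o instanceof Money)) return false;
        Money m = (Money) o;
        return m.currency.equals(currency) && m.amountInCents == amountInCents;
    }
    @Override public int hashCode() { return Objects.hash(currency, amountInCents); }
}

public class DddDemo {
    public static void main(String[] args) {
        System.out.println(new Customer("C1", "Ann").equals(new Customer("C1", "Anne"))); // true: same identity
        System.out.println(new Money("USD", 500).equals(new Money("USD", 500)));          // true: same values
        System.out.println(new Money("USD", 500).equals(new Money("EUR", 500)));          // false
    }
}
```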
The following links offer interesting stuff on this concept.
1) Lostechies
2) StackOverflow
3) Devlicious

Friday, December 17, 2010

TCO of applications during Portfolio Rationalization

In my previous blog post , I had narrated the process of portfolio rationalization. During the fact finding process, we need to calculate the TCO of  an application. It’s a good idea to have a predefined template for entering all the parameters that add to the total cost of the application, i.e. hardware costs, software license costs, maintenance costs, data center costs, etc.

We should also try to collect the TCO statistics over a time period, i.e. over the last 3-5 years. This data, when plotted on a graph, would help us in identifying patterns and spotting trends. For example, if the TCO of an application shows a steep increase with every passing year, then we need to be wary of the “cost of inactivity”, i.e. the cost that will be incurred if no action is taken.

The TCO of applications should also be compared against the business value that the applications are providing. It may be that 70% of the TCO could be consumed by applications having 30% business value.
Another important dimension to capture would be the usage statistics and performance SLAs over the last few years. If the number of users is increasing and the SLAs are not being met, then it's time for some proactive action.

Monday, December 06, 2010

SOA and BPM

Yesterday, we were having a discussion with one of our customers on the hot topic of SOA and BPM strategy, i.e. can SOA/BPM initiatives be combined?, what are the challenges, pitfalls, best practices, etc. Jotting down some of the key points of the brainstorming session.

  • To start with, it's important to realize that BPM and SOA have a common goal - greater business agility and better alignment of IT with the business. SOA and BPM complement each other, and the potential benefits are compounded when you have a unified enterprise-wide strategy for them.
  • BPM drives a process-centric thought process - right from design and implementation through monitoring and continuous optimization. BPM forces a paradigm shift from an application-centric view to a process-centric view. SOA is an architectural style, whereas BPM is a management discipline.
  • A combined BPM/SOA initiative will do the delicate balancing act between incremental and transformational change. Also a combined initiative should enable stakeholders to decide what important processes need that extra agility and prioritize them to be re-engineered as services because funding is always limited.
  • A top-down BPM approach drives the discovery of services, since it provides important insights into understanding what parts of the IT portfolio can be exposed as SOA services. Thus BPM can provide a structured approach for identifying reusable business services.
  • SOA services also enable faster integration in BPM, as the need for custom integration touch points reduces, and this in turn enables faster deployment of BPM. SOA also enables rapid change of business processes, which is not possible if the business process is embedded in a lot of traditional non-SOA applications. For example, when a process needs to change to comply with a new regulation or due to a change in business strategy, a loosely coupled BPM process orchestrated using SOA services is easier to change. New services can be plugged in or existing services can be rearranged.
In today's fast changing business dynamics - "as is" and "to be" are simply temporal states of reality. The future state cannot be predicted, we can only stay prepared by keeping our business processes agile.

Thursday, December 02, 2010

SONAR tool

My team has been evaluating the SONAR tool to manage code quality. I was impressed with the features and the user-friendliness of the tool. SONAR can be used for both Java and .NET projects. It has an open plug-in architecture that allows any code quality tool to be plugged in.

For example, for static code analysis it combines the power of popular tools such as PMD, Checkstyle and FindBugs into a unified user interface that is great to use :)

SONAR also has support for free code coverage tools such as JaCoCo.  Code coverage can be measured by unit tests or integration tests. You can even drill down to source code level - something I love to do :)

SONAR also integrates with newer tools such as SQALE, which has a formal approach for defining code quality in terms of maintainability, testability, reliability, changeability, efficiency, security, portability, reusability, etc. Overall it is an invaluable tool to assess technical debt.

Monday, October 18, 2010

Activation bar in UML Sequence Diagrams

A lot of folks get confused between the "life-line" concept and "activation bar" concept in Sequence Diagrams.

The vertical lines drawn are called the lifeline of the object. When the object is no longer alive, then we can draw an 'X' at the bottom of the line. So why do we need an activation bar? I have seen a lot of architects choosing not to draw the activation bar to keep the diagrams simple.

The activation bar (a.k.a. focus of control) represents the time the object is "active", i.e. doing some processing, computing something, waiting for a response from a sub-routine, etc. So it is possible to model multiple interactions in a single diagram, e.g. a lifeline can have multiple activation bars.

Friday, October 15, 2010

Good whitepaper on WSRP Portlets

A friend of mine was confused about the concept of remote portlets and the WSRP protocol. I forwarded him this cool whitepaper that explains the concept of remote portlets in simple and lucid language.

Some snippets from the whitepaper:

Remote portlets enable dynamic integration of business applications and information sources into portals.
This approach only works if all portlets are physically installed at the employee portal; the process of making new portlets available is tedious and expensive.
 

Instead of just providing raw data or single business functions that still require special rendering on the portal side, Web Services for Remote Portlets (WSRP) are presentation-oriented, interactive web services.
They are easy to aggregate and can be invoked through a common interface using generic portlet-independent code that is built into the portal. In addition, re-implementation of the presentation layer on each portal is avoided. The use of generic portlet proxies consuming all WSRP services conforming to the common interface eliminates the need to develop service-specific portlets to run on the portal.

The big difference between WSRP services and data-oriented web services is that they are presentation-oriented, interactive services that all have one common interface. This means that they simply plug into portal servers and do not require any service-specific code on the consuming portal.


Another point to consider is the comparison of remote portlets with mashups. Are they the same? Remote portlets and the WSRP standard can be used to create mashups, but portlets also bring in the advantage of personalization and customization. Mashups also need not refer just to aggregation of content. Wikipedia describes a mashup as "a website or Web 2.0 application that uses content from more than one source to create a completely new service. This is akin to transclusion." For example, consider the tons of new applications built on top of Google Maps.

Wednesday, September 22, 2010

The case for IT Portfolio Rationalization

An un-rationalized IT portfolio is a critical issue facing many organizations. There are a plethora of reasons why an IT portfolio becomes ‘bloated’ with an ever-increasing inventory of applications running on multiple platforms: mergers and acquisitions, lack of enterprise architecture standards and IT governance, legacy applications that need to undergo a technology refresh cycle, etc. As a result, a majority of the IT budget is spent on “keeping the lights on” rather than investing in new strategic initiatives. This in turn results in poor business-IT alignment.

Continuous portfolio assessment and periodic rationalization of IT assets are needed to lower the TCO of the IT portfolio and enable business agility. Portfolio assessment and optimization should be an integral responsibility of the organization's EA group.

Any portfolio rationalization exercise would start with an assessment of the current state. Here pre-defined templates to capture the high level IT landscape would be beneficial. More comprehensive templates would be required to capture the details of each application. Information parameters such as application infrastructure, business process supported, TCO, pain areas, etc. would be captured.

The second step would be to group applications into clusters. A ‘cluster’ is nothing but a group of applications that are similar in semantics, from a business perspective or a technology architecture perspective: for example, applications collaborating or orchestrated to fulfill a business process, or a cluster of web applications running on the same technology stack, such as WebSphere App Server on Solaris.

Each application cluster is then evaluated across various dimensions – such as Business value, Business functionality adherence, Current or Future Technology standards adherence, EA standards conformance, SLA conformance, TCO, etc. Based on the rating and the weightage given to each parameter, we would arrive at final scores. Based on the scoring, application clusters would be segmented into 4 categories – retain, enhance, replace, retire/sunset.
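A hypothetical sketch of such a weighted scoring in Java (the dimension names, weights and segment thresholds are illustrative assumptions, not a standard):

```java
import java.util.Map;

public class ClusterScoring {
    // Weighted score: each dimension rated 1-5, weights sum to 1.0
    static double score(Map<String, Double> ratings, Map<String, Double> weights) {
        double s = ratings.entrySet().stream()
                .mapToDouble(e -> e.getValue() * weights.getOrDefault(e.getKey(), 0.0))
                .sum();
        return Math.round(s * 100) / 100.0;  // round to avoid floating-point noise
    }

    // Segment an application cluster based on its final score (thresholds are illustrative)
    static String segment(double score) {
        if (score >= 4.0) return "retain";
        if (score >= 3.0) return "enhance";
        if (score >= 2.0) return "replace";
        return "retire/sunset";
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of(
                "businessValue", 0.4, "standardsAdherence", 0.3, "tcoRating", 0.3);
        Map<String, Double> appCluster = Map.of(
                "businessValue", 4.0, "standardsAdherence", 2.0, "tcoRating", 3.0);
        double s = score(appCluster, weights);
        System.out.println(s + " -> " + segment(s)); // 4*0.4 + 2*0.3 + 3*0.3 = 3.1 -> enhance
    }
}
```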

The final step is to define roadmaps for the transformation – a set of project streams/initiatives that would be required. A project prioritization framework would help in sequencing the selected initiatives. Pre-defined templates for project prioritization would come in handy here. Here again a metrics driven approach would work; wherein we evaluate the priority of the selected work streams based on parameters such as business impact, risk, technology skills, cost/investment required.

It is also very important to define a governance body to manage the rationalization program and sustain it. The governance body would have participation from both business and IT stakeholders.

The end deliverables of the rationalization exercise would be:
1. Matrix of applications segmented into 4 categories – retain, enhance, replace, retire/sunset
2. Define project streams (rationalization roadmaps) and project sequencing (prioritization) to meet the future-state EA vision
3. Define opportunities/initiatives for business process optimization and reduction of TCO

Thursday, August 19, 2010

Estimates and Scheduling

Read an interesting article by Joel on software scheduling. Most organizations either use the use-case point estimation method or the function point estimation method.

But the best estimates come from experienced designers and developers - because you have done something similar in the past, you are more confident of the estimates you give.
But how can you bring these past experiences of people into your estimation process? We would need to collect data on all past and present projects and extract metrics from them.

In the article, Joel talks about "evidence-based scheduling (EBS)", i.e. basing your future estimates on past experiences and developer productivity. It might be difficult to record the productivity of each and every developer, but we can create clusters for a developer group based on years of experience.

The most interesting point regarding EBS was that you do not have a single delivery date - you only have probabilities assigned to a range of dates. I have always struggled selling this concept to program managers and customers. The point is that it's impossible to predict the exact date of shipment at the start of the project. Based on fine-grained tasks and past estimation data, we can use techniques such as Monte Carlo simulation to arrive at a range of dates, each with a probability assigned to it - similar to the image below.
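A rough sketch of the Monte Carlo idea in Java, assuming a hypothetical history of estimate-accuracy ratios (actual/estimated) sampled once per task in each simulation round:

```java
import java.util.Arrays;
import java.util.Random;

public class ShipDateSimulation {
    public static void main(String[] args) {
        // Hypothetical history of how past estimates turned out (actual / estimated)
        double[] accuracyHistory = {0.8, 1.0, 1.1, 1.3, 1.6, 2.0};
        // Remaining fine-grained task estimates, in hours
        double[] taskEstimates = {8, 16, 4, 12, 24};

        Random rnd = new Random(42);
        int rounds = 10_000;
        double[] totals = new double[rounds];
        for (int r = 0; r < rounds; r++) {
            double total = 0;
            for (double est : taskEstimates) {
                // For each task, pick a past accuracy ratio at random and apply it
                total += est * accuracyHistory[rnd.nextInt(accuracyHistory.length)];
            }
            totals[r] = total;
        }
        Arrays.sort(totals);

        // Report a range of outcomes with probabilities, not a single number
        for (int p : new int[]{50, 75, 95}) {
            System.out.printf("P%d: %.0f hours%n", p, totals[rounds * p / 100]);
        }
    }
}
```

Mapping the effort percentiles onto a calendar then gives the range of ship dates, each with its probability.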


Monday, July 05, 2010

Use of Generics in .NET webservices

Recently, one of my team members was trying out the idea of using a generic container "response" object for all return types from the webservice. The advantage of doing so is that we have just one generic "Response" object instead of many response objects for different operations - hence it is easier for clients to extract data and consistently get other meta-data in the Response object, such as "Success/Error message", "Count", etc.
Given below is sample code showing the generic response data contract.
-------------------------------------------


[DataContract]
public class Response<T>
{
    [DataMember]
    public T Data { get; set; }
    [DataMember]
    public List<T> DataCollection { get; set; }
    [DataMember]
    public string ErrorMessage { get; set; }
}

[DataContract]
public class Customer
{
    [DataMember]
    public string Name { get; set; }
    [DataMember]
    public int Age { get; set; }
    [DataMember]
    public decimal ContactNo { get; set; }
}

[DataContract]
public class Vendor
{
    [DataMember]
    public string VendorName { get; set; }
    [DataMember]
    public int Rating { get; set; }
}

-------------------------------------------
As you can see, the actual return object (e.g. vendor, customer) is wrapped in a generic response object. Given below is code snippet of the webservice/WCF service.
-------------------------------------------
public class GenericService : IGenericService
{
    public Response<Vendor> DoOtherWork()
    {
        Response<Vendor> res = new Response<Vendor>();
        Vendor v = new Vendor();
        v.VendorName = "Rocky"; v.Rating = 3;
        res.Data = v;
        res.ErrorMessage = "Big Errors";
        return res;
    }

    public Response<Customer> DoWork(int i)
    {
        Response<Customer> res = new Response<Customer>();
        List<Customer> lstcust = new List<Customer>();
        Customer c = new Customer();
        c.Name = "Rocky"; c.Age = 26; c.ContactNo = 7688890909;
        Customer c1 = new Customer();
        c1.Name = "dfgdfgRocky"; c1.Age = 26; c1.ContactNo = 7688890909;

        lstcust.Add(c); lstcust.Add(c1);
        res.DataCollection = lstcust;
        res.ErrorMessage = "No Errors";
        return res;
    }
}

-------------------------------------------
It was interesting to see how the .NET framework handled this in the WSDL. What complex types are created in the WSDL? This was important because the WCF services could potentially be used by Java and Ruby clients.
Here we found a hitch - the WSDL schema had two complex types with names such as "ResponseOfCustomer24335423" and "ResponseOfVendor24r58sdf". This WSDL worked fine when we created stubs in Java by importing the WSDL, but the random identifiers appended to the end of the complex type names were a problem from a naming-standards and clarity point of view. You would not want your callers to keep wondering why such strange names were given.
But here again, we found an easy workaround. While defining the data-contract, we have the option of specifying the name of the complex type it should appear as in WSDL.
So we just added one key value pair to the Data Contract attribute as follows.
-------------------------------------------
[DataContract(Name = "ResponseOf{0}")]
public class Response<T>
{
    [DataMember]
    public T Data { get; set; }
    [DataMember]
    public List<T> DataCollection { get; set; }
    [DataMember]
    public string ErrorMessage { get; set; }
}

-------------------------------------------
This was cool, as we could have a placeholder "{0}" for the type passed to the generic container class.
Once we did this, the complex types in the WSDL had user-friendly names such as "ResponseOfCustomer" and "ResponseOfVendor".

Wednesday, June 23, 2010

What rules to put in a BRE?

Most organizations today understand the value of externalizing rules from application code by using a Business Rules Engine. The agility and flexibility derived from BREs allows for quicker roll out of new business rules and better business IT alignment.

But how do we determine what business rules should go into the BRE? Not all business logic needs to go into a BRE. The following methodology can be used to arrive at a good set of externalized business rules.
  • Prepare a set of variable data/rules for the application. Ask questions such as: What changes? Frequency of change? Who/What triggers the change?
  • Assign an importance of agility to each item of variable data. What changes are time-critical? What is the business impact if a business rule is not changed in time? What is the business benefit if we can change the variable data quickly?
  • Check if business users can make changes to the variable data. For example, if-else conditions can be handled by a business analyst, but changing complex statistical formulas may not be easy.
If we go through the above thought process, we can arrive at a logical set of variable data and rules that should be externalized by the system.
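As a toy illustration (not a real BRE; the rule names and order attributes are hypothetical), simple if-else style rules can be kept as data and evaluated generically, which is the essence of what a rules engine externalizes:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class ExternalizedRules {
    // A trivial "order" carrying the attributes the rules inspect
    record Order(double amount, String region) {}

    public static void main(String[] args) {
        // Variable rules kept outside the application logic; entries like these could be
        // maintained by a business analyst, while complex statistical formulas stay in code
        Map<String, Predicate<Order>> rules = new LinkedHashMap<>();
        rules.put("needsManagerApproval", o -> o.amount() > 10_000);
        rules.put("qualifiesForFreeShipping", o -> o.amount() > 500 && o.region().equals("EU"));

        Order order = new Order(12_000, "EU");
        rules.forEach((name, rule) ->
                System.out.println(name + ": " + rule.test(order)));
    }
}
```

A real BRE such as Drools or ILOG adds authoring tools, rule versioning and conflict resolution on top of this basic idea.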

Tuesday, May 18, 2010

CLUE database in Insurance

The CLUE (Comprehensive Loss Underwriting Exchange) is a database of insurance losses and claims that was created and is maintained by ChoicePoint. ChoicePoint was taken over by LexisNexis some time back. LexisNexis C.L.U.E. Property reports contain up to seven years of personal property claims matching the search criteria submitted by the inquiring insurance company. So it's the equivalent of credit reports in the insurance industry. ChoicePoint also maintains a database for personal auto claims history.

When we apply for a homeowner's or auto insurance policy, the insurance company orders and reviews our CLUE report. A CLUE report is a document containing your personal information, as well as data regarding past property claims that have been paid under your previous insurance policies. A CLUE report includes the type of property (vehicle or home), dates of losses, loss descriptions and amounts paid.

Insurance companies use information on CLUE reports to evaluate homeowner's and auto insurance applicants for acceptability. Information can also be used to determine your policy premiums. CLUE reports are used almost exclusively to underwrite and rate new policies. Most insurers renewing existing policies do not access CLUE reports at renewal, largely because they already have loss histories for these properties in their own database. Only insurance companies that subscribe to CLUE can submit loss data and access CLUE reports.

A software program called Quick Connect has been designed by ChoicePoint for sending and receiving information electronically to and from ChoicePoint.

Thursday, May 13, 2010

Conceptual model for SOA

Just read a very interesting blog post from Nick Malik, wherein he provides a Common Conceptual Model for SOA. I was bowled over by the simplicity of the language and the diagram - so easy to explain to a fellow architect embarking on a design assignment based on SOA.



The most important things to work on in a SOA project:
  • Canonical Data Model

  • Canonical Message Model (Shared Message Contract)

  • Service Interfaces

  • Business Event Taxonomy
I have seen that if you get these fundamentals right during the design process, then building scalable SOA based systems is a lot easier.

Wednesday, May 12, 2010

Why Enterprise Architecture?

Recently a friend of mine asked me to explain in simple terms the benefit of having an Enterprise Architecture for an organization. The challenge was to translate the message in non-technical non-complex terms without any jargon. He apparently had read an EA white-paper by a big analyst firm and had his head spinning :) . I won’t blame him – this is a problem with ‘complex definitions’.

As architects, the most important challenge that we solve day-in and day-out is in reducing and managing IT complexity. Yet, it’s an irony that many architects/analysts use ‘complex long phrases’ to describe even the simplest of concepts.

Enterprise architecture is required to give a holistic view of the entire IT landscape of the organization. This enables the organization to view its current state and plan for the future state of its IT operations. EA is very useful to plan and prioritize your IT budgets in accordance with your important business priorities. It serves as a powerful communication tool between the business and IT teams and helps align new IT investments appropriately. Organizations have also started using EA as a tool for risk management and SWOT analysis.

Thus having an Enterprise Architecture Framework helps us in visualizing the ‘big picture’ and the relationship between different domains. It also leads to better governance and low maintenance cost enabled by enforcement of technology standards and architecture guidelines.

Quite often, EA is compared to ‘city planning’. In city planning, you look at the big picture and establish zones for specific purposes. It lays down the guidelines and best practices for buildings, roads, water supply, hospitals, etc. The design of each building is analogous to ‘Solution Architecture’. The special skill of designing a kitchen is analogous to ‘technical/application architecture’.

So to summarize, EA has the following advantages:
  • Helps build a common understanding of the future IT direction of the enterprise.
  • Provides a clear mapping of the business processes and the IT systems supporting them, along with visibility into how the business processes enable the mission of the enterprise.
  • Improves interoperability and integration by defining enterprise canonical message models and data models, as well as integration standards and guidelines, e.g. SOA, MOM, etc.
  • Enables organizational agility - if we need to respond to a business change, what is the impact on the IT systems?
  • Lower cost due to technical homogeneity, which is easier to support and maintain.
  • Powerful tool for communicating risk and change to all stakeholders in the enterprise.

There are a number of frameworks for defining EA – popular among them are the Zachman Framework and TOGAF. I had also blogged earlier about tools used for creating an EA.

OpenSource EA tools

Planning to evaluate a couple of open source Enterprise Architecture tools that are becoming popular. IBM System Architect was the de-facto commercial EA tool in the market and we were actively using it for our customers.

First is "The Essential Project", which was launched last year. It is based on the Protégé ontology tool and allows you to create a meta-model for your EA. It has an EA repository and reporting tools.

Second is a tool called "iteraplan". This tool is more popular in Europe and also has an Enterprise Edition that is free for use, but without the source code.

Tuesday, May 11, 2010

Creating Dynamic classes in Java

Earlier I had blogged about the ability to create new classes from scratch in .NET.
I was looking for something similar to .NET's Reflection.Emit in Java. I found a few examples on the web that use the Java compiler to compile source code dynamically and then use reflection to execute its methods.
http://www.javaworld.com/javaworld/jw-06-2006/jw-0612-dynamic.html?page=5
http://www.rgagnon.com/javadetails/java-0039.html
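A minimal, self-contained sketch of the compiler-based approach, using only the JDK's javax.tools API (the Hello class and its greet() method are made up for illustration; note that getSystemJavaCompiler() requires a JDK, not a bare JRE):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class DynamicClassDemo {

    // Compiles a class from a source string at runtime and invokes a method on it.
    static Object compileAndInvoke() throws Exception {
        String source = "public class Hello {"
                      + "  public static String greet() { return \"hello from a dynamic class\"; }"
                      + "}";

        // Write the source to a scratch directory; the .class file lands next to it
        Path dir = Files.createTempDirectory("dyn");
        Path file = dir.resolve("Hello.java");
        Files.write(file, source.getBytes("UTF-8"));

        // Invoke the system Java compiler (returns null when running on a JRE)
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int status = compiler.run(null, null, null, file.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed");
        }

        // Load the freshly compiled class and call Hello.greet() via reflection
        try (URLClassLoader loader =
                 URLClassLoader.newInstance(new URL[]{ dir.toUri().toURL() })) {
            Class<?> cls = Class.forName("Hello", true, loader);
            Method greet = cls.getMethod("greet");
            return greet.invoke(null); // static method, so no instance needed
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compileAndInvoke());
    }
}
```

The obvious drawback of this approach is the round-trip through the file system and the full compiler; the byte-code libraries mentioned below skip both.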

Another option is to use byte-code manipulation libraries such as BCEL, ASM, etc. A good list of open source byte-code libraries is given at:
http://www.java-opensource.com/open-source/bytecode-libraries.html

Payment Gateways

We were exploring the various options for selecting a payment gateway for a new business opportunity. Found this site that has a good explanation of payment gateways and compares the popular payment gateways available in the market.
http://www.bestpaymentgateways.com/
The site has a brief explanation on each payment gateway. The following gateways are explained.
  • Amazon Payment Gateway
  • Authorize.net
  • PayPal PayFlow Pro
  • VeriSign
  • Google Checkout
  • Moneybookers
  • VerePay
  • WorldPay

Monday, May 10, 2010

Parsing and reading text from PDF files

One of my development teams was looking for a PDF parsing library. They essentially wanted to search and extract data from PDF files. At first, I thought that OCR was the only way to achieve this, but there are libraries available to help us :)

PDFBox: This seems to be the most popular library for extracting text out of PDF files. It is a Java library, but also has a .NET wrapper built with IKVM.NET.
Simple examples using this library can be found here and here.
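To give a sense of how little code this takes, here is a minimal PDFBox sketch (a hedged example: it assumes the PDFBox 2.x jar is on the classpath, and "invoice.pdf" is a made-up file name; package names differ slightly between PDFBox versions):

```java
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfTextDemo {
    public static void main(String[] args) throws Exception {
        // Load the PDF and dump all of its text to the console
        // ("invoice.pdf" is a placeholder file name for this sketch)
        try (PDDocument document = PDDocument.load(new File("invoice.pdf"))) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(1); // optional: restrict the page range
            String text = stripper.getText(document);
            System.out.println(text);
        }
    }
}
```

Once the text is in a String, searching it is just ordinary string or regex work.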

iText & iTextSharp: These libraries are very popular for PDF generation and can also be used for extracting text from PDF files. A sample example can be found here.

I have heard that OpenOffice.org also provides a Java API that can be used to create and manipulate PDF files, but I have not tried it yet.

Thursday, May 06, 2010

Datawarehouse vs Datamart

The debate between creating a datamart and sourcing data directly from the datawarehouse springs up from time to time across organizations. Found some good links on the web on this great debate:
information-management
exforsys.com
opensourceanalytics.com

Wednesday, April 28, 2010

Performance issues with in-memory DataTable Select clauses

Very often, we cache .NET DataTable objects in memory - for lookup tables or for other frequently accessed data. The following MSDN article points out that using 'Select' on a DataTable can be very slow.

http://msdn.microsoft.com/en-us/library/dd364983.aspx

Snippet from the article:
"Select takes arbitrary criteria and returns an array of DataRows. Essentially, DataTable.Select has to walk the entire table and compare every record to the criteria that you passed in. "

The performance stats for LINQ in the article look very impressive. It looks like Microsoft has made a lot of optimizations in LINQ. It would be great to see the source code and understand what happens behind the scenes and why LINQ is so damn fast!
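The underlying reason is algorithmic: Select walks the whole table on every call (O(n)), while a pre-built index answers the same query in roughly constant time. The same trade-off is easy to demonstrate in any language; here is a small Java sketch (the id/name "table" is made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupDemo {

    // Returns true if a full scan and an indexed lookup find the same row.
    static boolean sameRow() {
        List<String[]> rows = new ArrayList<>();
        Map<String, String[]> index = new HashMap<>();

        // Build a 100,000-row "table" of (id, name) pairs, plus a hash index on id
        for (int i = 0; i < 100_000; i++) {
            String[] row = { "id" + i, "name" + i };
            rows.add(row);
            index.put(row[0], row);
        }

        // O(n) scan: every query walks the table, comparing each record
        // against the criteria - this is what DataTable.Select does
        String[] byScan = null;
        for (String[] row : rows) {
            if (row[0].equals("id99999")) { byScan = row; break; }
        }

        // O(1) lookup: build the index once, then each query is a hash probe
        String[] byIndex = index.get("id99999");

        return byScan == byIndex; // both approaches must return the same row
    }

    public static void main(String[] args) {
        System.out.println(sameRow()); // prints true
    }
}
```

The index costs memory and an upfront build, but for a cached lookup table that is queried thousands of times, it pays for itself almost immediately.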

.NET PDF creation tools

Some time back, I had blogged about the opensource tools available in Java and .NET for PDF creation. Recently I came across another commercial library called "Dynamic PDF" for PDF creation. This library has both Java and .NET versions of the API.
The library is easy to use and is very robust. It handles 'over-flowing' tables and text areas in PDF files very gracefully.
Another interesting library that Dynamic PDF sells is the DynamicPDF™ PrintManager for .NET. This library makes it very simple to print PDFs and also has callback handlers for error messages from printers - all in pure managed code. Now that's what I like :)

Two opensource .NET PDF generation tools are PDFSharp and iTextSharp.

XSD to XML and vice versa

I was looking for a quick and free tool that could convert XSD schema files to sample XML files and also generate an XSD file from a sample XML file. Commercial tools such as XML Spy and OxygenXML are powerful and provide these features, but I was looking for a free one.
First, I checked out the open source Java IDEs.
  • NetBeans had a beautiful editor to visualize and edit an XSD schema in a graphical tree structure, but unfortunately it could not generate a sample XML file from the schema (or the reverse).
  • Eclipse too had a visual editor that allowed editing of schema elements. It could generate a sample XML file from an XSD file, but had no option for the reverse, i.e. generating a schema file from a sample XML.
  • VS 2008 SP1 had both options - conversion between XML and XSD. For schema files, right-click on a node in the XSD Explorer view and select "Generate XML". For XML files, select "Tools -> Generate schema" to create the XSD file. Both operations are very quick in Visual Studio.
  • VS 2010 has extensive support for XML tooling. It gives you 3 different views for schema design, which should suffice for even the most complex schema definitions.

Besides these free tools, there are other command line tools that can be used. For e.g.

The XSD.exe tool that ships with the .NET SDK can generate an XSD schema from a sample XML file (and generate classes or typed DataSets from a schema).

There was another .NET tool that I found on MSDN for generating XML documents from XSD.

Another cool Java desktop tool that supported these features was XMLSpear.
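A related task that needs no external tool at all is checking whether an XML instance actually conforms to an XSD - the JDK's built-in javax.xml.validation API handles it directly. A minimal sketch (the one-element 'note' schema is made up for illustration):

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.StringReader;

public class XsdValidateDemo {

    // A toy schema that allows a single string element named 'note'
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    // Validates an XML string against the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // a SAXException is thrown for invalid documents
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<note>hello</note>")); // true
        System.out.println(isValid("<memo>hello</memo>")); // false: element not in schema
    }
}
```

It won't generate sample XML for you the way the IDE tools above do, but it is handy for quickly sanity-checking generated instances against a schema.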