Thursday, December 29, 2005

About Daemon Threads in Java

Came across this cool article about Daemon Threads at

I knew that the VM does not exit as long as there is one non-daemon thread running. But there were a couple of new things I learned. Some snippets from the above article:

There are two ways a thread can become a daemon thread (or a user thread, for that matter) without putting your soul at risk. First, you can explicitly specify a thread to be a daemon thread by calling setDaemon(true) on a Thread object. Note that the setDaemon() method must be called before the thread's start() method is invoked.Once a thread has started executing (i.e., its start() method has been called) its daemon status cannot be changed.

The second technique for creating a daemon thread is based on an often overlooked feature of Java's threading behavior: If a thread creates a new thread and does not call setDaemon() to explicitlly set the new thread's "daemon status", the new thread inherits the "daemon-status" of the thread that created it. In other words, unless setDaemon(false) is called, all threads created by daemon threads will be daemon threads; similarly, unless setDaemon(true) is called, all threads created by user threads will be user threads.

Monday, December 26, 2005

Debug information in Java .class files

I have often wanted to learn more about the class file format of Java class files. This study let me to interesting discoveries - regarding what debug information is stored in class files and how it is stored.

When we compile a Java source using the 'javac' exe, the generated class file by default contains some debug info - By default, only the line number and source file information is generated.
Hence when a stack trace is printed on the screen, we can see the source file name and the line number also printed on the screen. Also, when using log4J, I remember using a layout which can print the line number of each log statement - I bet log4J uses the debug infomation present in the class file to do this.

The javac exe also has a '-g' option - This option generates all debugging information, including local variables. To understand this better, I compiled a java source twice; once with the -g option and once without. I then decompiled both the class files to see how the decompiled output differs. The class file compiled with the debug option, showed all the local variable names same as in the original Java file, whereas the one without the debug option, had the local variables named by the decompiler as the local variable information was not stored in the class file.

Another interesting fact is that the class file can contain certain attributes that are non-standard; i.e. vendor specific. Suprised !!!...So was I..Please find below a snippet from the VM spec :

Compilers for Java source code are permitted to define and emit class files containing new attributes in the attributes tables of class file structures. Java Virtual Machine implementations are permitted to recognize and use new attributes found in the attributes tables of class file structures. However, all attributes not defined as part of this Java Virtual Machine specification must not affect the semantics of class or interface types. Java Virtual Machine implementations are required to silently ignore attributes they do not recognize.

For instance, defining a new attribute to support vendor-specific debugging is permitted. Because Java Virtual Machine implementations are required to ignore attributes they do not recognize, class files intended for that particular Java Virtual Machine implementation will be usable by other implementations even if those implementations cannot make use of the additional debugging information that the class files contain

Friday, December 23, 2005

Select second highest record with a query

A friend of mine was asked in an interview to write a query to fetch the second highest marks in a table of students.

Well, I found out 2 ways to do this:

1) Using TOP to get the topmost 2 rows from the table in descending order and then again getting a TOP of the resultset.

FROM Table1
ORDER BY Marks desc)

2) Using the MAX function. (I was suprised to see this work)

SELECT Max(Marks) FROM Table1
WHERE Marks Not In (SELECT MAX(Marks) FROM Table1);

Hierarchical databases Vs Relational Databases

While designing a systems, I often came across the moot point of choosing btw a heirarchical database and a relational flat table database. There are some domain problems, where heirarchical databases such as LDAP and MS Directory Services make sense over RDBs.

The hierarchical database model existed before the far more familiar relational model in early mainframe days. Hierarchical databases were blown away by relational versions because it was difficult to model a many-to-many relationship - the very basis of the hierarchical model is that each child element has only one parent element.

In his article, Scott Ambler makes the following statement:
Hierarchical databases fell out of favor with the advent of relational databases due to their lack of flexibility because it wouldn’t easily support data access outside the original design of the data
structure. For example, in the customer-order schema you could only access an order through a customer, you couldn’t easily find all the orders that included the sale of a widget because the schema isn’t designed to all that.

I found an article on the microsoft site which discusses the areas where directory services can be used in place of Relational databases.

Some good candidates for heirarchical databases are- White pages" information, User credential & security group information, Network configuration and service policies, Application deployment policies etc.

Thursday, December 22, 2005

The beauty of Log4J

It took me sometime to appreciate the inheritance heirarchy in Log4J. It's quite simple to understand the Log Levels and their usage.

It is because of this heirarchy that we can have any arbitraty level of granularity in logging. For e.g. suppose we have 10 components in a distributed environment and we want to shutdown the logging of one component. This can be easily done in Log4J using config files. We can also disable logging at a class level - provided we have a Logger for that class - using the Logger.getLogger(this.class) method. Another example is if U want to switch off logging of one or more modules in a application.

We can even make dymanic changes in the configuration file at runtime and these changes would get reflected provided we have used the configureAndWatch() method.

Ruminating on ClassLoaders in JVM.

One interesting point about ClassLoaders is that it is possible for the same class to be loaded separately by different class-loaders, resulting in multiple copies of the class object.
I came across a good synopsis on classloaders here :

Here are some snippets from the above article:

Classloaders, as the name suggests, are responsible for loading classes within the JVM. Before your class can be executed or accessed, it must become available via a classloader. Given a class name, a classloader locates the class and loads it into the JVM. Classloaders are Java classes themselves. This brings the question: if classloaders are Java classes themselves, who or what loads them?
When you execute a Java program (i.e., by typing java at a command prompt), it executes and launches a native Java launcher. By native, I mean native to your current platform and environment. This native Java launcher contains a classloader called the bootstrap classloader. This bootstrap classloader is native to your environment and is not written in Java. The main function of the bootstrap classloader is to load the core Java classes.

The JVM implements two other classloaders by default. The bootstrap classloader loads the extension and application classloaders into memory. Both are written in Java. As mentioned before, the bootstrap classloader loads the core Java classes (for example, classes from the java.util package). The extension classloader loads classes that extend the core Java classes (e.g., classes from the javax packages, or the classes under the ext directory of your runtime). The application classloader loads the classes that make up your application.

bootstrap classloader <-- extension classloader <-- application classloader All three default classloaders follow the delegation model. Before a child classloader tries to locate a class, it delegates that task to a parent. When your application requests a particular class, the application classloader delegates this request to the extension classloader, which in turn delegates it to the bootstrap classloader. If the class that you requested is a Java core class, the bootstrap classloader will make the class available to you. However, if it cannot find the class, the request returns to the extension classloader, and from there to the application classloader itself. The idea is that each classloader first looks up to its parent for a particular class. Only if the parent does not have the class does the child classloader try to look it up itself.

In application servers, each separately-deployed web application and EJB gets its own classloader (normally; this is certainly the case in WebLogic). This classloader is derived from the application classloader and is responsible for that particular EJB or web application. This new classloader loads all classes that the webapp or EJB require that are not already part the Java core classes or the extension packages. It is also responsible for loading and unloading of classes, a feature missing from the default classloaders. This feature helps in hot deploy of applications. When WebLogic starts up, it uses the Java-supplied application classloader to load the classes that make up its runtime. It then launches individual classloaders, derived from the Java application classloader, which load the classes for individual applications. The individual classloaders are invisible to the classloaders of the other applications; hence, classes loaded for one particular application will not be seen by another application. What if you want to make a single class available to all applications? Load it in a top-level classloader. This could be in the classpath of WebLogic. When WebLogic starts, it will automatically load this class in memory using the Java-supplied application classloader. All sub-application classloaders get access to it. However, the negatives of this approach are clear too. First, you lose the capability of hot deploy for this particular class in all individual applications. Second, any change in this class means that the server needs to be restarted, as there is no mechanism for a Java application classloader to reload classes. You will need to weigh in the pros and cons before you take this approach. WebLogic Server allows you to deploy newer versions of application modules such as EJBs while the server is running. This process is known as hot-deploy or hot-redeploy and is closely related to classloading.
Java classloaders do not have any standard mechanism to undeploy or unload a set of classes, nor can they load new versions of classes. In order to make updates to classes in a running virtual machine, the classloader that loaded the changed classes must be replaced with a new classloader. When a classloader is replaced, all classes that were loaded from that classloader (or any classloaders that are offspring of that classloader) must be reloaded. Any instances of these classes must be re-instantiated.

Wednesday, December 21, 2005

O/R mapping tools

I have had some experience with some O/R mapping tools such as TopLink and Hibernate.
But I did not know that the market is flooded with a variety of OR tools.
Here are a few of them:

Abra -
BasicWebLib -
Castor -
Cayenne -
DataBind -
DB VisualArchitect -
EnterpriseObjectsFramework -
Expresso -
FireStorm -
Hibernate -
Hydrate -
iBATIS DB Layer -
Infobjects ObjectDRIVER-
InterSystems Cache -
Jakarta ObjectRelationalBridge -
Java Ultra-Lite Persistence (JULP) -
Jaxor -
JdoGenie -
JDOMax -
KodoJdo -
LiDO -
O/R Broker -
Persistence EdgeXtend -
PlanetJ's WOW =
PowerMapJdo -
Signsoft intelliBO -
SimpleOrm -
SpadeSoft XJDO -
Sql2java -
Sun Microsystem certified (SunTone?) - FrontierSuite? for J2EE & J2SE -
Sun Microsystem certified (SunTone?) - FrontierSuite? for JDO -
The ProductivityEnvironmentForJava (PE:J) -
TopLink -

Also the following articles give some some good introduction to O-R mapping tools and their comparisions:

Tuesday, December 20, 2005

Is Swing really slow?

Often I have heard people complain that Swing Apps are slow. I also have had some bad experiences with slow Swing Apps, but then I have also seen some very cool high performant applications. So, what's the secret of these fast Swing apps. I think the reason can be found in James Gosling's answer:

The big problem that Swing has is that it's big and complicated. And the reason that it's big and complicated is that it is the most fully featured, flexible, whatever-way-you-want-to-measure-it UI [user interface] tool kit on the planet. You can do the most amazing things with Swing. It's sort of the 747 cockpit of UI design tool kits, and flying a 747 is a little intimidating, and what people really want is a Cessna.

It is definately possible to write good speedy Swing applications, but U need to be a master of Swing and threading and know all the potholes which hog memory.

Here are a few links which contain some good info on fast Swing apps:

Wednesday, December 14, 2005

Duplicate Form Submission and the Synchronizer Token pattern

The Synchronizer Token pattern addresses the problem of duplicate form submissions. A synchronizer token is set in a user's session and included with each form returned to the client. When that form is submitted, the synchronizer token in the form is compared to the synchronizer token in the session. The tokens should match the first time the form is submitted. If the tokens do not match, then the form submission may be disallowed and an error returned to the user. Token mismatch may occur when the user submits a form, then clicks the Back button in the browser and attempts to resubmit the same form.

On the other hand, if the two token values match, then we are confident that the flow of control is exactly as expected. At this point, the token value in the session is modified to a new value and the form submission is accepted.

Struts has inbuild support for the Synchronizer Token pattern. Method such as "saveToken()" and "isTokenValid()" are present in the ActionForm class.

Other ways of preventing a duplicate submit is using javascript to disable the submit button after it has been pressed once. But this does not work well with all browsers.

Proxy setting for .NET programs

To access an internet site thru a .NET program U may have to go thru Ur company proxy.
Here's the sample code for doing so:

string query="";
WebProxy proxyObj = new WebProxy("", 8080) ;
NetworkCredential networkCredential = new NetworkCredential("username", "secret_password") ;
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(query) ; proxyObj.Credentials = networkCredential ;
req.Proxy = proxyObj ;

For proxy setting in Java programs click here.

Locking in Databases

Databases support different types of locking strategies. We have optimistic locking strategies where concurrency would be more, but data integrity would be less. We have pessimistic locking strategies where concurrency is less, but there is more data integrity. Basically when it comes to locking and transaction isolation levels, its a trade-off between data integrity and concurrency.

Databases supports various levels of lock granularity. The lowest level is a single row. Sometimes, the database doesn't have enough resources to lock each individual
row. In such cases, the database can acquire locks on a single data or index page, group of pages, or an entire table. The granularity of locks depends on the memory available to the database. Servers with more memory can support more concurrent users because they can acquire and release more locks.

We can give locking hints to the database while sending queries that would help us override locking decisions made by the database. For instance, in SQL Server. we can specify the ROWLOCK hint with the UPDATE statement to convince SQL Server to lock each row affected by that data modification.

While dealing with transactions, the type of locking used depends on the Transaction Isolation Level specified for that particular transaction.
SQL Server supports implicit and explicit transactions. By default, each INSERT, UPDATE, and DELETE statement runs within an implicit transaction.Explicit transactions, on the other hand, must be specified by the programmer. Such transactions are included in

There are two common types of locks present in all databases- Shared locks and Exclusive locks.
Shared locks (S) allow transactions to read data with SELECT statements. Other connections are allowed to read the data at the same time; however, no transactions are allowed to modify data until the shared locks are released.
Exclusive locks (X) completely lock the resource from any type of access including reads. They are issued when data is being modified through INSERT, UPDATE and DELETE statements

Locking/Transaction Isolation levels solve the following concurrency problems:
  • Lost Updates: Lost updates occur when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of other transactions. The last update overwrites updates made by the other transactions, which results in lost data.
  • Dirty Read: Uncommitted dependency occurs when a second transaction selects a row that is being updated by another transaction. The second transaction is reading data that has not been committed yet and may be changed by the transaction updating the row.
  • Nonrepeatable Read: Inconsistent analysis occurs when a second transaction accesses the same row several times and reads different data each time.
  • Phantom reads: These occur when an insert or delete action is performed against a row that belongs to a range of rows being read by a transaction. The transaction's first read of the range of rows shows a row that no longer exists in the second or succeeding read, as a result of a deletion by a different transaction. Similarly, as the result of an insert by a different transaction, the transaction's second or succeeding read shows a row that did not exist in the original read.

These concurrency problems can be solved using the correct transaction isolation level in our programs. To balance between high concurrency and data integrity, we can choose from the following isolation levels:

  • READ UNCOMMITTED: This is the lowest level where transactions are isolated only enough to ensure that physically corrupt data is not read. Does not protect against dirty read, phantom read or non-repeatable reads.
  • READ COMMITTED: The shared lock is held for the duration of the transaction, meaning that no other transactions can change the data at the same time. Protects against dirty reads, but the other problems remain.
  • REPEATABLE READ: This setting disallows dirty and non-repeatable reads. But phantom reads are still possible.
  • SERIALIZABLE: This is the highest level, where transactions are completely isolated from one another. Protects against dirty read, phantom read and non-repeatable read.

There are other types of locks available in the database such as 'update lock', 'intent lock', 'schema lock' etc. The update lock prevents deadlocks in the database. For more information about these check out the following links:

Tuesday, December 13, 2005

Deleting duplicate rows from a table

This seems to a favourite question during interviews - How to delete duplicate rows in a table?
Well, we can have many strategies:

- Capture one instance of the unique rows using a SELECT DISTINCT .., and dump the results into a temp table. Delete all of the rows from the original table and then insert the rows from the temp table back into the original table.

- If there is an identity column, the we can also write a query to get all the rows that have duplicate entries :
SELECT Field1, Field2, Count(ID)
GROUP BY Foo1.Field1, Foo1.Field2
HAVING Count(Foo1.ID) > 1
Then loop thru the resultset and get a cursor for the query
"SELECT Field1, Field2, ID
WHERE Field1 = @FIELD1 and Field2 = @FIELD2".
Use the cursor to delete all the duplicate rows returned except one.

The following links discuss the above techniques and some more:


Thursday, December 08, 2005

Tools used in SDLC

I decided to list down all the available tools for different phases of a SDLC. Here it goes:

Requirements Management:
- DOORS: Dynamic Object Oriented Requirements System
- RequisitePro : A requirements management tool from Rational Software
- Requirements Traceability Management (RTM)

- ArgoUML:
-Rational Rose
-Enterprise Architect
-Rational Rose Real Time
-Visio :

Code Generation:
- Rational Application Developer
-Object Frontier
-JVider is Java Visual Interface Designer which helps with GUI building for Swing applications

Unit Testing:

Load Testing:
SilkPerformer is load performance testing tool.
Microsoft Web Application Stress Tool

Web Testing/ GUI testing:
SilkTest from Segue Software
Rational ROBOT

Configuration Management Tools:
Concurrent Version System - CVS
Visual SourceSafe


Will keep augmenting this list...

Cool Free Data Models

While browsing the web, I came across this cool site which has over 200 data-models samples.
Great for anyone who wants to learn about data-modelling or have a template for a given problem domain.

The link is here :

Some interesting data models I found in this site are - datamodels for Hierarchies , e-commerce etc. Worth a perusal.

Data Modelling Vs OO Modelling

While working on projects, often I had tussles with erst-while database developers who used to drive the design of the system using Data Models. Most of them came from non-OO background and were apt in ER diagrams. But the porblem is that a Data Model cannot and should not drive the OO model. I came across this excellent article by Scott Ambler discussing this dilemma.

In the above article, the author argues that object schemas can be quite different from physical data schemas. Using two lucid examples he shows how different object schemas can map to the same data schema and how it is possible for a single object schema to correctly map to several data schemas.
There are situations where OO varies significantly from ERD, for example in cases of inheritance and non-persistent classes.
Aslo, OO models can convey more specification on the class diagram in representing certain things like associations than ERD can.

I personally believe that if U are building a OO system, then we should first go for OO Class modelling and then think about resolving the OO-relational database impedance mismatch.

What are Multi-dimensional databases?

A multidimensional database (MDB) is a type of database that is optimized for data warehouse and online analytical processing (OLAP) applications. Multidimensional databases are frequently created using input from existing relational databases.

A multidimensional database - or a multidimensional database management system (MDDBMS) - implies the ability to rapidly process the data in the database so that answers can be generated quickly. A number of vendors provide products that use multidimensional databases. Approaches to how data is stored and the user interface vary.

Conceptually, a multidimensional database uses the idea of a data cube to represent the dimensions of data available to a user. For example, "sales" could be viewed in the dimensions of product model, geography, time, or some additional dimension. In this case, "sales" is known as the measure attribute of the data cube and the other dimensions are seen as feature attributes. Additionally, a database creator can define hierarchies and levels within a dimension (for example, state and city levels within a regional hierarchy).

More information can be at:

Tuesday, December 06, 2005

Ruminating on Database Cursors

Cursors are database objects that allow us to manipulate data in a set in a row-by-row basis or on a group of rows at one time. Cursors are quite popular among database developers as the row can be updated at the same time itself. There is no need to fire a separate SQL query.

But there is also a lot of confusion regarding the exact definition of cursors, because different people use it in different contexts. For e.g. when a database administrator talks of cursors, he means the server-side cursor he uses inside stored procedures. But if a JDBC application developer talks about a cursor, he may mean the pointer in the JDBC ResultSet object.
I did a bit of reseach on the web, and finally cleared a few cobwebs from the head. Here's a snapshot of what I learned:
  • There are implicit cursors and explicit cursors. An implicit cursor is - well, that's just a piece of memory used by the database server to work with resulsets internally. Implicit cursors are not accessible via an externally exposed API, whereas explicit cursors are.
  • Server-side cursors and Client-side cursors: The difference between a client-side cursor and a server-side cursor in classic ADO is substantial and can be confusing. A server-side cursor allows you to manipulate rows on the server through calls on the client, usually storing all or a portion of the data in TEMPDB. The client-side cursor fetches data into a COM object on the client. Its name comes from the fact that the buffered data on the client exhibits cursor-like behaviors you can scroll through and, potentially, update it. The behavior difference manifests itself in a few ways. Fetching a large resultset into a client cursor causes a big performance hit on the initial fetch, and server cursors result in increased memory requirements for SQL Server and require a dedicated connection all the time you're fetching. Client-side cursors can be dangerous because using them has the side effect of retrieving all records from the server that are a result of your command/SQL statement. If you execute a procedure or SQL statement that could retrieve 10,000 records, and use a client-side cursor, you had better have enough memory to hold 10,000 records on the client. Also, control will not return to your code/application until all of these records are retrieved from the server, which can make your application appear slow to users. If you suspect that you are going to retrieve a large number of rows, client-side cursors are not the way to go.
  • DYNAMIC, STATIC, and KEYSET cursors: Dynamic cursors will show changes made on the base table as you scroll through the cursor. Static cursors copy the base table, to the tempdb. The cursor then reads from the tempdb, so any changes happening on the base table will not be reflected in the cursors scrolling. Keysets are in between Dynamic and Static cursors. A keyset will copy the base table's selected records keys into the tempdb, so the cursor will select the rows from the tempdb, but the data from the base table. So change to the base table will be seen, but new record inserts will not be.

Next, I tried to find out how these cursors can be manipulated using .NET and Java APIs.

In .NET, the DataReader object acts as a wrapper class for a server-side cursor. The DataReader provides a read-only and forward-only cursor. More info about this is here.

In JDBC, we handle everything using ResultSet objects. Since JDBC 2.0/3.0, we have ResultSets that are scrollable backwards and also that allow update operations. So I believe if U configure a ResultSet to be scrollable backwards and allow update opeartions, U are essentially dealing with a server-side cursor.

Code snippet:

Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE,ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("SELECT COF_NAME, PRICE FROM COFFEES");
The above code makes the ResultSet as scroll-sensitive, so it will reflect changes made to it while it is open. This is the equivalent of a dynamic cursor. In a dynamic cursor, the engine repeats the query each time the cursor is accessed so new members are added and existing members are removed as the database processes changes to the database.
The second argument is one of two ResultSet constants for specifying whether a result set is read-only or updatable: CONCUR_READ_ONLY and CONCUR_UPDATABLE.
Specifying the constant TYPE_FORWARD_ONLY creates a nonscrollable result set, that is, one in which the cursor moves only forward. If you do not specify any constants for the type and updatability of a ResultSet object, you will automatically get one that is TYPE_FORWARD_ONLY and CONCUR_READ_ONLY (as is the case when you are using only the JDBC 1.0 API). You might keep in mind, though, the fact that no matter what type of result set you specify, you are always limited by what your DBMS and driver actually provide.

Monday, December 05, 2005

Quick HEX dump utility in DOS

Recently, I wanted to see the hex dump of a binary file on a server. The server machine did not have my favourite hex GUI editors. I struggled to download a hex dump utility from the web, when my friend showed me how to use the 'debug.exe' tool in DOS to get a hex dump of a file.

On the DOS prompt type :
The 'd' (dump) option would dump the first 128 bytes. Typing 'd' again would give the hex dump of the next block of bytes.

U can type '?' to get a list of all the commands. Type 'q' to quit from debug mode.

Cool APIs for packet sniffing

A few years back, I often used to lament on the lack of packet sniffing API's available in Java or C#.NET. I knew of the WinPcap library, but I wanted a wrapper around it. Fortunately some good guys in the open source community have build these wrappers around WinPcap.

Check out the following links: (A C# API) (A Java API)

Friday, December 02, 2005

DataReader Vs DataSet in ADO.NET

Came across a wonderful article on MSDN which contrasts the use of DataReader Vs DataSet.
The article is available at :

Salient points of the article:
  • A DataReader is a stream of data that is returned from a database query. When the query is executed, the first row is returned to the DataReader via the stream. The stream then remains connected to the database, poised to retrieve the next record. The DataReader reads one row at a time from the database and can only move forward, one record at a time. As the DataReader reads the rows from the database, the values of the columns in each row can be read and evaluated, but they cannot be edited.
  • The DataReader can only get its data from a data source through a managed provider. The DataSet can also get its data from a data source via a managed provider, but the data source can also be loaded manually, even from an XML file on a hard drive.
  • The DataReader supports access to multiple resultsets, one at a time, in the order they are retrieved. (Methods to checkout are NextResult() and Read() )
  • When the SqlDataAdapter's Fill method is executed it opens its associated SqlConnection object (if not already open) and issues its associated SqlCommand object against the SqlConnection. Behind the scenes, a SqlDataReader is created implicitly and the rowset is retrieved one row at a time in succession and sent to the DataSet. Once all of the data is in the DataSet, the implicit SqlDataReader is destroyed and the SqlConnection is closed.
  • The DataSet's disconnected nature allows it to be transformed into XML and sent over the wire via HTTP if appropriate. This makes it ideal as the return vehicle from business-tier objects and Web services. A DataReader cannot be serialized and thus cannot be passed between physical-tier boundaries where only string (XML) data can go.
  • There are other times when a DataReader can be the right choice, such as when populating a list or retrieving 10,000 records for a business rule. When a huge amount of data must be retrieved to a business process, even on a middle tier, it can take a while to load a DataSet, pass the data to it on the business tier from the database, and then store it in memory. The footprint could be quite large and with numerous instances of it running (such as in a Web application where hundreds of users may be connected), scalability would become a problem. If this data is intended to be retrieved and then traversed for business rule processing, the DataReader could speed up the process as it retrieves one row at a time and does not require the memory resources that the DataSet requires.

User Defined Functions Vs Stored Procedures

Most databases have built-in functions, such as GetDate, DateAdd, and ObjectName. Such built-in functions are useful, but you can’t alter their functionality in any way, and that’s why UDFs are so powerful and necessary. UDFs allow you to add custom solutions for unique application-specific problems.
A UDF is actually a kind of subroutine that contains T-SQL statements and can return a scalar value or a table value. Hence U can call a UDF from a SELECT statement. Whereas a Stored Procedure needs to be invoked using the 'EXEC' command.
So when to use what? The answer depends on the problem situation. If you have an operation such as a query with a FROM clause that requires a rowset be drawn from a table or set of tables, then a function will be your appropriate choice. However, when you want to use that same rowset in your application the better choice would be a stored procedure.

The other differences btw a UDF and a Stored Procedure are as follows:
  • A stored prodecure supports output parameters, whereas a UDF does not.
  • A UDF can have a return value (scalar or table). A SPROC can create a table, but cannot return it. It can only return a integer (as status code)
  • SPROCs are powerful enough to change server environment variables. UDFs are not.
  • SPROCs can be used in the FOR XML clause but UDFs can't be.
  • Whenever an error occurs in a UDF, the function stops immediately. In a SPROC, we must include error handling code, or the next statement is processed.
  • U can 'join' to a UDF, but cannot 'join' to a SPROC.